IMO, access to DeepMind and Google infra is a hugely understated advantage Waymo has that no other competitor can replicate.
Subtle brag that Waymo could drive in camera-only mode if they chose to. They've stated as much previously, but that doesn't seem widely known.
Talk about edge cases.
But, what would you do? Trust the Waymo, or get out (or never get in) at the first sign of trouble?
[1] I've seen a couple of them but they're not available to hire yet and are still very rare.
Human drivers routinely do worse than Waymo, which I take 2 or 3 times a week. Is it perfect? No. Does it handle the situation better than most Lyft or Uber drivers? Yes.
As a bonus: unlike some of those drivers the Waymo doesn't get palpably angry at me for driving the route.
"we’re excited to continue effectively adapting to Boston’s cobblestones, narrow alleyways, roundabouts and turnpikes."
This does bring up something, though: Waymo has a "pull over" feature, but it's hidden behind a couple of touch screen actions involving small virtual buttons and it does not pull over immediately. Instead, it "finds a spot to pull over". I would very much like a big red STOP IMMEDIATELY button in these vehicles.
Humans do this, just in the sense of depth perception with both eyes.
(edit - I'm referring to deployed Tesla vehicles, I don't know what their research fleet comprises, but other commenters explain that this fleet does collect LIDAR)
https://youtu.be/LFh9GAzHg1c?t=872
They've also built it into a full neural simulator.
https://youtu.be/LFh9GAzHg1c?t=1063
I think what we are seeing is that they both converged on the correct approach, one of them decided to talk about it, and it triggered disclosure all around since nobody wants to be seen as lagging.
Also, we'd record body position, actuation, and self-speech as output, then put this on thousands of people to get as much data as Waymo gets.
I mean, that's what we need to imitate AGI, right? I guess the only thing missing is the memory mechanism. We train everything as if it's an input/output function without accounting for memory.
I started working heavily on realizing them in 2016 and it is unquestionably (finally) the future of AI
Or the most realistic game of SimCity you could imagine.
Vivaldi 7.8.3931.63 on iOS 26.2.1, iPhone 16 Pro
2. No seriously, is the Filipino driver thing confirmed? It really feels like they're trying to bury that.
As to the revolt, America doesn't do that any more. Years of education have removed both the vim and vigor of our souls. People will complain. They will do a TikTok dance as protest. Some will go into the streets. No meaningful uprising will occur.
The poor and the affected will be told to go to the trades. That's the new learn to program. Our tech overlords will have their media tell us that everything is ok (packaging it appropriately for the specific side of the aisle).
Ultimately the US will go downhill to become a Belgium. Not terrible, but not the world-dominating, hand-cutting entity it once was.
Humans do this with vibes and instincts, not just depth perception. When I can't see the lines on the road because there's too much snow, I can still interpret where they would be based on my familiarity with the roads and my implicit knowledge of how roads work. We do similar things for heavy rain or fog, although sometimes those situations truly necessitate pulling over, or slowing down and turning on your four-ways. Lidar might genuinely give an advantage there.
Why should you be able to do that, exactly? Human vision is frequently tricked by its lack of depth data.
Google/Alphabet are so vertically integrated for AI when you think about it. Compare what they're doing: their own power generation, their own silicon, their own data centers; Search, Gmail, YouTube, Gemini, Workspace, Wallet; billions and billions of Android and Chromebook users; their ads everywhere, their browser everywhere; Waymo; probably buying back Boston Dynamics soon enough (they recently partnered); fusion research, drug discovery... and then look at ChatGPT's chatbot or Grok's porn. Pales in comparison.
https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...
Discussed here, e.g.:
Genie 3: A new frontier for world models (1510 points, 497 comments)
https://news.ycombinator.com/item?id=44798166
Project Genie: Experimenting with infinite, interactive worlds (673 points, 371 comments)
(simulations) -> (real world data) -> (simulations)
Seems like it, no? We started with physics-based simulators for training policies. Then put them in the real world using modular perception/prediction/planning systems. Once enough data was collected, we went back to making simulators. This time, they're physics "informed" deep learning models.
A power outage feels like a baseline scenario—orders of magnitude more common than the disasters in this demo. If the system can’t degrade gracefully when traffic lights go dark, what exactly is all that simulation buying us?
[*] https://futurism.com/advanced-transport/waymos-controlled-wo...
It's much easier to build everything into the compressed latent space of physical objects and how they move, and operate from there.
Everyone jumped on the end-to-end bandwagon, which locks you into vision as the input to your driving model, which means you need things like Genie to generate vision data, which is wasteful.
Self-driving cars are a dead-end technology that will introduce a whole host of new problems which are already solved with public transit, better urban planning, etc.
I'm curious why you say this given you start by highlighting several characteristics that are not like Belgium (to wit, poor education, political media capture, effective oligarchy). I feel there are several other nations that may be better comparators, just want to understand your selection.
They should be bought by a rocket company. Then they would stand a chance.
Once it gets unstuck, it runs autonomously.
For context, my "driver's test" was going to the back of the office, and driving some old car backwards and forwards a few meters.
Sharing one's opinion in a respectful way is possible. Less spectacle, so less eyeballs, but worth it. Try it.
Google's been thinking about world models since at least 2018: https://arxiv.org/abs/1803.10122
But eventually I think we will get there. Human drivers will be banned, the roads will be exclusively used by autonomous vehicles that are very efficient drivers (we could totally remove stoplights, for example. Only pedestrian crossing signs would be needed. Robo-vehicles could plug into a city-wide network that optimizes the routing of every vehicle.) At that point, public transit becomes subsidized robotaxi rides. Why take a subway when a car can take you door to door with an optimized route?
So in terms of why it isn’t a waste of time, it’s a step along the path towards this vision. We can’t flip a switch and make this tech exist, it will happen in gradual steps.
Anyway you can think it's a waste but they're wasting their money, not yours. If you want a train in your town, go get one. Waymo has only spent, cumulatively, about 4 months of the budgets of American transit agencies. If you had all that money it wouldn't amount to anything.
That is, both are true: this high-fidelity simulation is valuable and it won't catch all failure modes. Or in other words, it's still on Waymo for failing during the power outage, but it's not uniquely on Waymo's simulation team.
https://www.reddit.com/r/SelfDrivingCars/comments/1pem9ep/hm...
> there's probably no examples in the training data where the car is behind a stopped car, and the driver pulls over to another lane and another car comes from behind and crashes into the driver because it didn't check its blindspot
This specific scenario is in the examples: https://videos.ctfassets.net/7ijaobx36mtm/3wK6IWWc8UmhFNUSyy...
It doesn't show the failure mode; it demonstrates the successful crash avoidance.
As always, though, the devil lies in the details: is an LLM-based generation pipeline good enough? What even is the definition of "good enough"? Even with good prompts, will the world model output something sufficiently close to reality that it can be used as a good virtual driving environment for further training/testing of autonomous cars? Or do the kinds of limitations you mentioned still mean subtle but dangerous imprecisions will slip through and make the data distribution too poor for this to be a truly viable approach?
My personal feeling is that we will land somewhere in between: I think approaches like this one will be very useful, but I also don't think the current state of AI models means we can have something 100% reliable with this.
The question is: is 100% reliability a realistic goal? Human drivers are definitely not 100% reliable. If we come up with a solution 10x more reliable than the best human drivers, and maybe also some hard proof that it cannot exhibit certain classes of catastrophic failure modes (probably with verified-code approaches that, for instance, guarantee that even if the NN output is invalid the car doesn't try to make moves outside a verifiably safe envelope), then I feel like the public and regulators would be much more inclined to authorize full autonomy.
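To make that safe-envelope idea concrete, here is a minimal sketch; every name, limit, and threshold is a hypothetical illustration of the pattern (bound and override the network's output with a small, separately verifiable function), not any vendor's actual system.

```python
# Minimal sketch of a verified safety envelope around an NN planner.
# All limits and thresholds are hypothetical illustration values.
import math
from dataclasses import dataclass

@dataclass
class Command:
    accel: float       # m/s^2 (negative = braking)
    steer_rate: float  # rad/s

MAX_ACCEL, MAX_BRAKE, MAX_STEER_RATE = 2.0, -8.0, 0.5

def clamp(x: float, lo: float, hi: float) -> float:
    # Treat NaN/inf from a misbehaving network as "no command".
    if not math.isfinite(x):
        return 0.0
    return max(lo, min(hi, x))

def safe_envelope(proposed: Command, gap_m: float, speed_mps: float) -> Command:
    """Project the planner's output into a bounded, verifiable safe set.

    Even if `proposed` is invalid, the returned command is bounded, and
    firm braking is forced when the time gap to the lead vehicle is too
    small. This small function is the part you would formally verify.
    """
    accel = clamp(proposed.accel, MAX_BRAKE, MAX_ACCEL)
    steer = clamp(proposed.steer_rate, -MAX_STEER_RATE, MAX_STEER_RATE)
    time_gap_s = gap_m / max(speed_mps, 0.1)
    if time_gap_s < 1.5:               # hypothetical minimum time gap
        accel = min(accel, -4.0)       # brake regardless of the NN's output
    return Command(accel, steer)

# A garbage NN output still yields a bounded, braking command:
print(safe_envelope(Command(float("nan"), 9.9), gap_m=5.0, speed_mps=15.0))
```

The appeal of this structure is that only the tiny wrapper needs formal verification; the network itself stays an untrusted component.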
Not for the rendering (that's still way too expensive), but for the initial world generation that gets iteratively refined and then still ultimately gets converted into textured triangles.
This is legit hilarious to read from some random HN account.
Edit: or are you talking about the allegations of workers in the Philippines controlling the Waymos: https://futurism.com/advanced-transport/waymos-controlled-wo... I guess both are valid.
“When the Waymo vehicle encounters a particular situation on the road, the autonomous driver can reach out to a human fleet response agent for additional information to contextualize its environment,” the post reads. “The Waymo Driver [software] does not rely solely on the inputs it receives from the fleet response agent and it is in control of the vehicle at all times.” [from Waymo's own blog https://waymo.com/blog/2024/05/fleet-response/]
What's the problem with this?
> After being pressed for a breakdown on where these overseas operators operate, Peña said he didn’t have those stats, explaining that some operators live in the US, but others live much further away, including in the Philippines.
> “They provide guidance,” he argued. “They do not remotely drive the vehicles. Waymo asks for guidance in certain situations and gets an input, but the Waymo vehicle is always in charge of the dynamic driving tasks, so that is just one additional input.”
We've simply relabeled the "Mechanical Turk" into "AI."
The rest is built on stolen copyrighted data.
The new corporate model: "just lie, the government clearly doesn't give a shit anymore."
>I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.
So for the record, with this realization you're 3+ years behind Tesla.

So many people advocate for public transit, but are unwilling to deal with the current market tradeoffs and decisions people are making on the ground. As long as that keeps happening, expect modes of transit -- like Waymo -- that deliver the level of service that they promise to keep exceeding expectations.
I've spent my entire adult life advocating for transportation alternatives, and at every turn in America, the vast majority of other transit advocates just expect people to be okay with anti-social behavior going completely unenforced, and expect "good citizens" to keep paying when the expected value for any rational person is to freeload. Then they point to "enforcing the fare box" as a tradeoff between money collected vs cost of enforcement, when the actual tradeoff is signalling to every anti-social actor in the system that they can do whatever they want without any consequences.
I currently only see a future in bike-share, because it's the only system that actually delivers on what it promises.
I don't want to hear tiktok or full volume soap operas blasting at some deaf mouth breather.
I don't want to be near loud chewing of smelly leftovers.
I don't want to be begged for money, or interact with high or psychotic people.
The current culture doesn't allow enforcement of social behaviour: so public transport will always be a miserable containment vessel for the least functional, and everyone with sense avoids the whole thing.
Seems like there ought to be a name for this, like so-and-so's law.
Listen to the statement.
The operators help when the Waymo is in a "difficult situation".
Car drives itself 99% of the time, long tail of issues not yet fixed have a human intervene.
Everyone is making out like it's an RC car, completely false.
Having humans in the loop at some level is necessary for handling rare edge cases safely.
[1] https://people.com/waymo-exec-reveals-company-uses-operators...
edit: fixed kill -> hit
It'll be interesting to see which pays off and which becomes Quibi
Automation makes public transit better. There will be automated minibuses that are more flexible and frequent than today's buses. Automation also means that buses get a virtual bus lane. Taxis solve the last-mile problem: take a taxi to the station, ride the train with thousands of people, then take more transit.
Also, we might discover the advantage of human-powered transit. Ebikes are more efficient than cars and give health benefits. They will be much safer than automated cars. We could use the extra capacity for bike and bus lanes.
Oh come on -- of course they are. That's precisely why you put it in a "white paper" and not, you know, ads.
The technology "feels" way less cool knowing that there are human backups, which would absolutely in turn make its perceived value go down.
I don't think Google is targeting developers with their AI, they are targeting their product's users.
1) is a bit simplistic though. I don't know of any European system that would cover even operating costs out of fare/commercial revenue. Potentially the London Underground - but not London buses. UK National Rail had higher success rates
The better way to look at it imo is looking at the economic loss as well of congestion/abandoned commutes. To do a ridiculous hypothetical, London would collapse entirely if it didn't have transit. Perhaps 30-40% of inner london could commute by car (or walk/bike), so the economic benefit of that variable transit cost is in the hundreds of billions a year (compared to a small subsidy).
It's not the same in SFBA so I guess it's far easier to just "write off" transit like that, it is theoretically possible (though you'd probably get some quite extreme additional congestion on the freeways as even that small % moving to cars would have an outsized impact on additional congestion).
I quite agree with the overall point but can we leave this kind of discourse on X, please? It doesn't add much, it just feels caustic for effect and engagement farming.
As soon as a mode of transport actually has to compete in a market for scarce & valuable land to operate on, trains and other forms of transit (publicly or privately owned) win every time.
It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
But they are in full gear now that there is real competition, and it’ll be cool to see what they release over the next few years.
I basically agree with your premise that public transit as it exists today will be rendered obsolete, but I think this point here is where your prediction hits a wall. I would be stunned if we agreed to eliminate human drivers from the road in my lifetime, or the lifetime of anyone alive today. Waymo is amazing, but still just at the beginning of the long tail.
As soon as Waymo's massive robotaxi lead became undeniable, he pivoted from robotaxis to humanoid robots.
Why do you expect them to make money? Roads don't make money and no one thinks to complain about that. One of the purposes of government is to make investment in things that have more nebulous returns. Moving more people to public transit makes better cities, healthier and happier citizens, stronger communities, and lets us save money on road infrastructure.
Is there a magic road wand?
Don't they have those somewhere in South America?
The first is the DDT control loop, what a human driver does. Waymo's remote assistants aren't involved in that. The computer always has responsibility for the safety of the vehicle and decisionmaking while operating, which is why Waymo's humans are remote assistants and not remote drivers. Their safety drivers do participate in the DDT loop, hence the name.
But there's also another "loop" of human involvement. Sometimes the vehicle doesn't understand the scene and asks humans for advice about the appropriate action to take. It's vaguely similar to captchas. The human will usually confirm the computer's proposed actions, but they can also suggest different actions. The computer uses the advice as a prior to continue operating instead of giving up the DDT responsibility. There's very likely a closely monitored SLA, between a few seconds and a few minutes, on how long it takes humans to start looking at the scene.
If something causes the computer to believe the advice isn't safe, it will ignore it. There have been cases where Waymos have erroneously detected collisions and remote assistants were unable to override that decisionmaking. When that happens, a vehicle recovery team is physically sent out to the location. The SLA here is likely between tens of minutes and a couple hours.
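A rough sketch of those two loops as code, with every name invented for illustration from Waymo's public description (this is not their architecture):

```python
# Toy sketch of the "advice as a prior" loop. All names are invented.
def propose_action(scene: dict) -> tuple[str, float]:
    """Stand-in for the onboard planner: returns (action, confidence)."""
    confidence = 0.3 if scene.get("ambiguous") else 0.95
    return "proceed", confidence

def ask_fleet_response(scene: dict, proposed: str) -> str:
    """Stand-in for the remote assistant, who answers within some SLA.
    They usually confirm, but can suggest a different action."""
    return "reroute" if scene.get("road_closed") else proposed

def is_safe(action: str, scene: dict) -> bool:
    """Onboard safety check: always has the final say."""
    return not (action == "proceed" and scene.get("obstacle_ahead"))

def control_step(scene: dict) -> str:
    action, confidence = propose_action(scene)
    if confidence < 0.5:                      # vehicle asks for advice
        advice = ask_fleet_response(scene, action)
        if is_safe(advice, scene):            # advice is a prior, not a command
            action = advice
    if not is_safe(action, scene):            # second loop: give up and wait
        action = "stop_and_request_recovery"  # physical recovery team
    return action

print(control_step({"ambiguous": True, "road_closed": True}))  # -> reroute
```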
It probably doesn't matter though, "this general blob over there"
[1] https://people.com/waymo-car-hits-child-walking-to-school-du...
Under the same circumstances (kid suddenly emerging between two parked cars and running out onto the street), it could be debated that the outcome could have been worse if a human were driving.
The Waymo Driver has traveled nearly 200 million fully autonomous miles, becoming a vital part of the urban fabric in major U.S. cities and improving road safety. What riders and local communities don’t see is our Driver navigating billions of miles in virtual worlds, mastering complex scenarios long before it encounters them on public roads. Today, we are excited to introduce the Waymo World Model, a frontier generative model that sets a new bar for large-scale, hyper-realistic autonomous driving simulation.
Simulation of the Waymo Driver evading a vehicle going in the wrong direction. The simulation initially follows a real event, and seamlessly transitions to using camera and lidar images automatically generated by an efficient real-time Waymo World Model.
Simulation is a critical component of Waymo’s AI ecosystem and one of the three key pillars of our approach to demonstrably safe AI. The Waymo World Model, which we detail below, is the component that is responsible for generating hyper-realistic simulated environments.
The Waymo World Model is built upon Genie 3—Google DeepMind's most advanced general-purpose world model that generates photorealistic and interactive 3D environments—and is adapted for the rigors of the driving domain. By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events—from a tornado to a casual encounter with an elephant—that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.
This combination of broad world knowledge, fine-grained controllability, and multi-modal realism enhances Waymo’s ability to safely scale our service across more places and new driving environments. In the following sections we showcase the Waymo World Model in action, featuring simulations of the Waymo Driver navigating diverse rare edge-case scenarios.
Most simulation models in the autonomous driving industry are trained from scratch based on only the on-road data they collect. That approach means the system only learns from limited experience. Genie 3’s strong world knowledge, gained from its pre-training on an extremely large and diverse set of videos, allows us to explore situations that were never directly observed by our fleet.
Through our specialized post-training, we are transferring that vast world knowledge from 2D video into 3D lidar outputs unique to Waymo’s hardware suite. While cameras excel at depicting visual details, lidar sensors provide valuable complementary signals like precise depth. The Waymo World Model can generate virtually any scene—from regular, day-to-day driving to rare, long-tail scenarios—across multiple sensor modalities.
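In data terms, "multi-sensor outputs" presumably means paired, time-aligned camera and lidar tensors per simulated step. A toy sketch of such a frame follows; the field names and shapes are my guesses, not Waymo's schema:

```python
# Hypothetical shape of one multi-sensor simulation frame.
from dataclasses import dataclass
import numpy as np

@dataclass
class SimFrame:
    timestamp_s: float
    camera_rgb: np.ndarray    # (H, W, 3) uint8; real rigs have several cameras
    lidar_points: np.ndarray  # (N, 4): x, y, z in meters plus intensity

def make_dummy_frame(t: float) -> SimFrame:
    rng = np.random.default_rng(0)
    return SimFrame(
        timestamp_s=t,
        camera_rgb=rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8),
        lidar_points=rng.normal(size=(2048, 4)).astype(np.float32),
    )

frame = make_dummy_frame(0.1)
print(frame.camera_rgb.shape, frame.lidar_points.shape)  # (480, 640, 3) (2048, 4)
```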
Simulation: Driving on the Golden Gate Bridge, covered in light snow. Waymo's shadow is visible in the front camera footage.
Simulation: Encountering a tornado.
Simulation: A suburban cul-de-sac completely submerged in stagnant flood water with floating furniture.
Simulation: Driving on a street with lots of palm trees in a tropical city, strangely covered in snow.
Simulation: Driving out of a raging fire.
Simulation: A reckless driver going off-road.
Simulation: The leading vehicle driving into tree branches.
Simulation: Driving behind a vehicle with precariously positioned furniture on top.
Simulation: A malfunctioning truck facing the wrong way, blocking the road.
Simulation: Encountering a huge tumbleweed the size of a car.
Simulation: An encounter with a friendly elephant.
Simulation: An encounter with a Texas longhorn.
Simulation: An encounter with a lion.
Simulation: A pedestrian dressed up as a T-rex.
In the interactive viewers below, you can immersively view the realistic 4D point clouds generated by the Waymo World Model.
Interactive 3D visualization of an encounter with an elephant.
Interactive 3D visualization of a drive through a city street.
The Waymo World Model offers strong simulation controllability through three main mechanisms: driving action control, scene layout control, and language control.
Driving action control allows us to have a responsive simulator that adheres to specific driving inputs. This enables us to simulate “what if” counterfactual events such as whether the Waymo Driver could have safely driven more confidently instead of yielding in a particular situation.
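Mechanically, a counterfactual is just an action-conditioned rollout: replay the same initial state under a different control sequence and compare outcomes. A toy kinematic stand-in (nothing here is Waymo's model):

```python
# Action-conditioned rollout with a trivial 1-D kinematic "world model".
def rollout(model, state, actions):
    states = [state]
    for a in actions:
        state = model(state, a)
        states.append(state)
    return states

def toy_model(state, accel):
    x, v = state                     # position (m), speed (m/s)
    v = max(0.0, v + accel)          # one action = acceleration per step
    return x + v, v

start = (0.0, 5.0)
recorded = [0.0] * 10                # original drive: hold speed (yield)
counterfactual = [0.5] * 10          # "what if" we had driven more assertively

print(rollout(toy_model, start, recorded)[-1])        # (50.0, 5.0)
print(rollout(toy_model, start, counterfactual)[-1])  # (77.5, 10.0)
```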
Counterfactual driving. We demonstrate simulations both under the original route from a past recorded drive and under a completely new route. While purely reconstructive simulation methods (e.g., 3D Gaussian Splats, or 3DGS) suffer from visual breakdowns due to missing observations when the simulated route differs too much from the original drive, the fully learned Waymo World Model maintains good realism and consistency thanks to its strong generative capabilities.
Scene layout control allows for customization of the road layouts, traffic signal states, and the behavior of other road users. This way, we can create custom scenarios via selective placement of other road users, or applying custom mutations to road layouts.
Scene layout conditioning.
Language control is our most flexible tool that allows us to adjust time-of-day, weather conditions, or even generate an entirely synthetic scene (such as the long-tail scenarios shown previously).
World Mutation - Time of Day
Dawn · Morning · Noon · Afternoon · Evening · Night
World Mutation - Weather
Cloudy · Foggy · Rainy · Snowy · Sunny
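One way to picture language control is as a mutation request applied to a base scene. The blog describes the capability, not an API, so every key below is invented for illustration:

```python
# Purely illustrative: a language-controlled scene mutation request.
mutation_request = {
    "base_scene": "recorded_drive_0421",           # hypothetical scene id
    "prompt": "same street at dusk in heavy fog",  # free-form language control
    "preserve": ["road_layout", "agent_trajectories"],
    "outputs": ["camera", "lidar"],                # multi-sensor generation
}
print(mutation_request["prompt"])
```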
During a scenic drive, it is common to record videos of the journey on mobile devices or dashcams, perhaps capturing piled up snow banks or a highway at sunset. The Waymo World Model can convert those kinds of videos, or any taken with a regular camera, into a multimodal simulation—showing how the Waymo Driver would see that exact scene. This process enables the highest degree of realism and factuality, since simulations are derived from actual footage.
Norway.
Arches National Park, Utah, USA.
Death Valley, California, USA.
Some scenes we want to simulate may take longer to play out, for example, negotiating passage in a narrow lane. That's harder to do because the longer the simulation, the tougher it is to compute and maintain stable quality. However, through a more efficient variant of the Waymo World Model, we can simulate longer scenes with a dramatic reduction in compute while maintaining high realism and fidelity, enabling large-scale simulations.
Navigating around an in-lane stopper and fast traffic on the freeway.
Navigating a busy neighborhood.
Driving up a steep street and safely navigating around motorcyclists.
SUV U-turning.
By simulating the “impossible”, we proactively prepare the Waymo Driver for some of the most rare and complex scenarios. This creates a more rigorous safety benchmark, ensuring the Waymo Driver can navigate long-tail challenges long before it encounters them in the real world.
The Waymo World Model is enabled by the key research, engineering and evaluation contributions from James Gunn, Kanaad Parvate, Lu Liu, Lucas Deecke, Luca Bergamini, Zehao Zhu, Raajay Viswanathan, Jiahao Wang, Sakshum Kulshrestha, Titas Anciukevičius, Luna Yue Huang, Yury Bychenkov, Yijing Bai, Yichen Shen, Stefanos Nikolaidis, Tiancheng Ge, Shih-Yang Su and Vincent Casser.
We thank Chulong Chen, Mingxing Tan, Tom Walters, Harish Chandran, David Wong, Jieying Chen, Smitha Shyam, Vincent Vanhoucke and Drago Anguelov for their support in defining the vision for this project, and for their strong leadership and guidance throughout.
We would like to additionally thank Jon Pedersen, Michael Dreibelbis, Larry Lansing, Sasho Gabrovski, Alan Kimball, Dave Richardson, Evan Birenbaum, Harrison McKenzie Chapter and Pratyush Chakraborty, Khoa Vo, Todd Hester, Yuliang Zou, Artur Filipowicz, Sophie Wang and Linn Bieske for their invaluable partnership in facilitating and enabling this project.
We thank our partners from Google DeepMind: Jack Parker-Holder, Shlomi Fruchter, Philip Ball, Ruiqi Gao, Songyou Peng, Ben Poole, Fei Xia, Allan Zhou, Sean Kirmani, Christos Kaplanis, Matt McGill, Tim Salimans, Ruben Villegas, Xinchen Yan, Emma Wang, Woohyun Han, Shan Han, Rundi Wu, Shuang Li, Philipp Henzler, Yulia Rubanova, and Thomas Kipf for helpful discussions and for sharing invaluable insights for this project.
I always thought they deliberately tried to keep the genie in the bottle as long as they could.
- I would be stunned if we agree to eliminate human drivers from 100% of roads in the lifetime of anyone alive today.
or
- I would be stunned if we agree to eliminate human drivers from 10% of roads...
...or is there some other percentage to qualify this? I guess I wouldn't expect there to be a decree that makes it happen all at once for a country, especially a large one like the U.S. More like: some really dense city will decide to make a tiny core autonomous-vehicles-only, and then some other cities do the same years later. And then maybe it expands to something larger than just the core after 5 or 10 years. And so on...
I thought it was the Nazi salutes on stage and backing neo-nazi groups everywhere around the world, but you know, I guess the lidar thing too.
The key question is whether general purpose robots can outcompete on sheer economies of scale alone.
I don't.
That's why I said "variable cost of operations."
If a system doesn't generate enough revenue to cover the variable costs of operation, then every single new passenger drives the system closer to bankruptcy. The more "successful" the system is -- the more people depend on it -- the more likely it is to fail if anything happens to the underlying funding source, like a regular old local recession. This simple policy decision can create a downward economic spiral when a recession leads to service cuts, which leads to people unable to get to work reliably, which creates more economic pain, which leads to a bigger recession... rinse/repeat. This is why a public transit system should cover variable costs so that a successful system can grow -- and shrink -- sustainably.
When you aren't growing sustainably, you open yourself up to the whims of the business cycle literally destroying your transit system. It's happening right now with SF MUNI, where we've had so many funding problems that they've consolidated bus lines. I use the 38R, and it's become extremely busy. These buses are getting so packed that people don't want to use them, but the point is they can't expand service because each expansion loses them more money, again, because the system doesn't actually cover those variable costs.
The public should be completely covering the fixed capital costs of the system. Ideally, while there is a bit of wiggle room, the ridership should be 100% covering the variable operating costs. That way the system can expand when it's successful, and contract when it's less popular. Right now in the Bay Area, you have the worst of both worlds: an underutilized system with absolutely spiraling costs, simply because there is zero connection between "people actually wanting to use the system" and "where the money comes from."
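A toy calculation of why the variable-cost gap matters; all figures are invented for illustration:

```python
# When fares don't cover variable costs, growth widens the deficit.
fare = 2.50                  # revenue per ride (invented)
variable_cost = 4.00         # per-ride operating cost (invented)
riders_per_day = 100_000

daily_deficit = (variable_cost - fare) * riders_per_day
print(f"deficit: ${daily_deficit:,.0f}/day")  # $150,000/day
# Each additional rider adds $1.50 of loss, so the more "successful"
# the system gets, the faster it hits the wall -- the spiral
# described above.
```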
I think you'd be surprised. Look at the difference in cost per passenger mile.
[1]: https://research.google/blog/towards-a-conversational-agent-...
It basically happened for horses.
The issue with lidar is that many of the difficult edge cases of FSD are all visible-light vision problems. Lidar might be able to tell you there's a car up front, but it can't tell you that the car has its hazard lights on and a flat tire. Lidar might see a human-shaped thing in the road, but it cannot tell whether it's a mannequin leaning against a bin or a human about to cross the road.
Lidar gets you most of the way there when it comes to spatial awareness on the road, but you need cameras for most of the edge-cases because cameras provide the color data needed to understand the world.
You could never have FSD with just lidar, but you could have FSD with just cameras if you can overcome all of the hardware and software challenges with accurate 3D perception.
Given Lidar adds cost and complexity, and most edge cases in FSD are camera problems, I think camera-only probably helps force engineers to focus their efforts in the right place rather than hitting bottlenecks from over-depending on Lidar data. This isn't an argument for camera-only FSD, but from Tesla's perspective it does keep costs down and allows them to continue to produce appealing cars, which is obviously important if you're coming at FSD from the perspective of an automaker trying to sell cars.
Finally, adding lidar as a redundancy once you've "solved" FSD with cameras isn't impossible. I personally suspect Tesla will eventually do this with their robotaxis.
That said, I have no real experience with self-driving cars. I've only worked on vision problems and while lidar is great if you need to measure distances and not hit things, it's the wrong tool if you need to comprehend the world around you.
It sounds like they removed Lidar due to supplier issues and availability, not because they're trying to build self-driving cars and have determined they don't need it anymore.
Using vision only is so ignorant of what driving is all about: sound, vibration, vision, heat, cold... these are all clues to road conditions. If the car isn't feeling all these things as part of the model, you're handicapping it. In a brilliant way, Lidar is the missing piece of information a car needs without relying on multiple sensors; it's probably superior to what a human can do, whereas vision only is clearly inferior.
IMO the presence of safety chase vehicles is just a sensible "as low as reasonably achievable" measure during the early rollout. I'm not sure that can (fairly) be used as a point against them.
I'm comfortable with Tesla sparing no expense for safety, since I think we all (including Tesla) understand that this isn't the ultimate implementation. In fact, I think it would be a scandal if Tesla failed to do exactly that.
Damned if you do and damned if you don't, apparently.
This isn't just happening in America. Train systems are in rough shape in the UK and Germany too.
Ebike shares are a much more sustainable system with a much lower cost, and achieve about 90% of the level of service in temperate regions of the country. Even the ski-lift guy in this thread has a much more reasonable approach to public transit, because ski lifts actually have extremely low cost for the level of service they provide. Their only real shortcoming is that they don't handle peak demand well, and are not flexible enough to handle their own success.
And apparently some people still haven't caught on.
Have a look if you don't believe me:
https://hn.algolia.com/?dateRange=custom&page=0&prefix=false...
Google Reader is a simple example: Google had by far the most popular RSS reader, and they just threw it away. A single intern could have kept the whole thing running, and Google has literal billions, but they couldn't see the value in it.
I mean, it's not like being able to see what a good portion of America is reading every day could have any value for an AI company, right?
Google has always been terrible about turning tech into (viable, maintained) products.
https://www.yellowscan.com/knowledge/how-weather-really-affe...
Seeing how it's by a lidar vendor, I don't think they're biased against it. It seems lidar is not a panacea: it struggles with heavy rain and snow much more than cameras do, and is affected by cold weather or any contamination on the sensor.
So lidar will only get you so far. I'm far more interested in mmWave radar, which, while much worse in spatial resolution, isn't affected by light conditions or weather, and can directly measure properties of the thing it's illuminating, like material properties, the speed it's moving at, its thickness.
Fun fact: mmWave-based presence sensors can measure your heartbeat, as the micro-movements show up as a frequency component. So I'd guess it would have a very good chance of detecting a human.
I'm pretty sure even with much more rudimentary processing, it'll be able to tell if it's looking at a living being.
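A toy signal-processing illustration of the heartbeat claim, assuming (as the parent says) that chest micro-motion shows up as a periodic component in the radar's displacement signal:

```python
# Recover a simulated 1.2 Hz (72 bpm) chest motion from a noisy signal.
import numpy as np

fs = 100.0                                      # sample rate, Hz
t = np.arange(0, 10, 1 / fs)                    # 10 s observation
heartbeat = 1e-4 * np.sin(2 * np.pi * 1.2 * t)  # ~0.1 mm displacement
noise = 1e-5 * np.random.default_rng(0).normal(size=t.size)

spectrum = np.abs(np.fft.rfft(heartbeat + noise))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
peak = freqs[np.argmax(spectrum[1:]) + 1]       # skip the DC bin
print(f"dominant frequency: {peak:.2f} Hz (~{peak * 60:.0f} bpm)")
```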
By the way: what happened to the idea that self-driving cars would be able to talk to each other and combine each other's sensor data, so that if multiple cars are looking at the same spot, you'd get a much better chance of not making a mistake?
I will never trust 2D camera-only; it can be covered or blocked physically, and when that happens FSD fails.
As cheap as LIDAR has gotten, adding it to every new tesla seems to be the best way out of this idiotic position. Sadly I think Elon got bored with cars and moved on.
But Codex/5.2 was substantially more effective than Claude at debugging complex C++ bugs until around Fall, when I was writing a lot more code.
I find Gemini 3 useless. It has regressed on hallucinations from Gemini 2.5, to the point where its output is no better than a random token stream despite all its benchmark outperformance. I would use Gemini 2.5 to help write papers and such, but I can't seem to use Gemini 3 for anything. Gemini CLI is also very non-compliant and crazy.
In many ways, turning tech into products that are useful, good, and don't make life hell is a more interesting issue of our times than the core research itself. We probably want to avoid the value-capturing platform problem, as otherwise we'll end up seeing governments use ham-fisted tools to punish winners in ways that aren't helpful either.
But the Tesla engineers are "in the right place rather than hitting bottlenecks from over depending on Lidar data"? What?
7 cameras x 36fps x 5Mpx x 30s
48kHz audio
Nav maps and route for next few miles
100Hz kinematics (speed, IMU, odometry, etc)
Source: https://youtu.be/LFh9GAzHg1c?t=571

See also: any programming thread and Rust.
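Back-of-envelope on the numbers in that list, assuming (my assumption) 3 bytes per pixel uncompressed and 16-bit mono audio:

```python
# Rough raw size of one 30-second clip at the quoted rates.
cameras, fps, mpx, seconds = 7, 36, 5e6, 30
video_bytes = cameras * fps * mpx * 3 * seconds   # 3 B/px assumed
audio_bytes = 48_000 * 2 * seconds                # 16-bit mono assumed
kinematics_bytes = 100 * 64 * seconds             # ~64 B per sample assumed

total_gb = (video_bytes + audio_bytes + kinematics_bytes) / 1e9
print(f"~{total_gb:.0f} GB raw per 30 s clip")    # ~113 GB
# Video dwarfs everything else; compression clearly does heavy
# lifting before any of this leaves the car.
```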
Reader had to be killed because it [was seen as] a suboptimal ad monetization engine. Page views were superior.
Was Google going to support minimizing ads in any way?
I don't think Tesla is that far behind Waymo, though, given Waymo's significant head start, the fact that Waymo has always been a taxi-first product, and that Waymo is using significantly more expensive tech than Tesla is.
Additionally, it's not like this is a lidar vs cameras debate. Waymo also uses and needs cameras for FSD for the reasons I mentioned, but they supplement their robotaxis with lidar for accuracy and redundancy.
My guess is that Tesla will experiment with lidar on their robotaxis this year, because design decisions should differ from those of a consumer automobile. But I could be wrong, because if Tesla wants FSD to work well on visually appealing and affordable consumer vehicles, then they'll probably have to solve some of the additional challenges with a camera-only FSD system. I think it will depend on how much Elon decides Tesla needs to pivot into robotaxis.
Either way, what is undebatable is that you can't drive with lidar only. If the weather is so bad that cameras are useless then Waymos are also useless.
The real question is whether doing so is smart or dumb. Is Tesla hiding big show-stopper problems that will prevent them from scaling without a safety driver? Or are the big safety problems solved and they are just finishing the Robotaxi assembly line that will crank out more vertically-integrated purpose-designed cars than Waymo's entire fleet every day before lunch?
None of these technologies can ever be 100%, so we’re basically accepting a level of needless death.
Musk has even shrugged off FSD related deaths as, “progress”.
Having a self-driving solution that can be totally turned off with a speck of mud, heavy rain, morning dew, bright sunlight at dawn and dusk.. you can't engineer your way out of sensor-blindness.
I don't want a solution that is available to use 98% of the time, I want a solution that is always-available and can't be blinded by a bad lighting condition.
I think he did it because his solution always used the crutch of "FSD Not Available, Right hand Camera is Blocked" messaging and "Driver Supervision" as the backstop to any failure anywhere in the stack. Waymo had no choice but to solve the expensive problem of "Always Available and Safe" and work backwards on price.
What good is a huge fleet of Robotaxis if no one will trust them? I won't ever set foot in a Robotaxi, as long as Elon is involved.
FSD: 2 deaths in 7 billion miles
Looks like FSD saves lives by a margin so fat it can probably survive most statistical games.
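Making the comparison explicit: the commonly cited US baseline is roughly 1.3 deaths per 100 million vehicle miles; the FSD figures here are the comment's, not verified.

```python
# Expected human-driver deaths over the same mileage, for scale.
fsd_deaths, fsd_miles = 2, 7e9           # the comment's numbers
human_rate = 1.3 / 1e8                   # deaths per vehicle mile (approx.)

print(f"human baseline: ~{human_rate * fsd_miles:.0f} deaths")  # ~91 vs 2
# Caveat: FSD miles are supervised and skew toward easier driving,
# so this is not an apples-to-apples comparison.
```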
0: https://techcrunch.com/2019/04/22/anyone-relying-on-lidar-is...
1: https://static.mobileye.com/website/corporate/media/radar-li...
2: https://www.luminartech.com/updates/luminar-accelerates-comm...
3: https://www.youtube.com/watch?v=Vvg9heQObyQ&t=48s
4: https://ir.innoviz.tech/news-events/press-releases/detail/13...
Um, yes they did.
No idea if it had any relation to Tesla though.
Also, integration effort went down but it never disappeared. Meanwhile, opportunity cost skyrocketed when vision started working. Which layers would you carve resources away from to make room? How far back would you be willing to send the training + validation schedule to accommodate the change? If you saw your vision-only stack take off and blow past human performance on the march of 9s, would you land the plane just because red paint became available and you wanted to paint it red?
I wouldn't completely discount ego either, but IMO there's more ego in the "LIDAR is necessary" case than the "LIDAR isn't necessary" case at this point. FWIW, I used to be an outspoken LIDAR-head before 2021, when monocular depth estimation became a solved problem. It was funny watching everyone around me convert in the opposite direction at around the same time, probably driven by politics. I get it, I hate Elon's politics too, I just try very hard to keep his shitty behavior from influencing my opinions on machine learning.
[*] Failing to solve the impossible situation FSD dropped them into, that is.
Then that guy got decapitated when his Model S drove under a semi-truck that was crossing the highway, and Mobileye terminated the contract. Weirdly, the same fatal edge case occurred at least 2 more times on Tesla's newer hardware.
https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashe...
It's still rather weak, and true monocular depth estimation really wasn't anything spectacular in 2021. It's fundamentally ill-posed, and any priors you use to get around that will come back to bite you in the long tail of things a driver will encounter on the road.
The way it got good is by using camera overlap in space and over time while in motion to figure out metric depth over the entire image. Which is, humorously enough, sensor fusion.
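The textbook identity behind "overlap in space and over time": with a known baseline between two views, disparity gives metric depth with no learned prior. Toy numbers below; in motion, the baseline is simply how far the car moved between frames.

```python
# Metric depth from two views: Z = f * B / d.
focal_px = 1000.0      # focal length in pixels (assumed)
baseline_m = 0.5       # distance between the two camera positions
disparity_px = 25.0    # feature shift between the views

depth_m = focal_px * baseline_m / disparity_px
print(f"depth: {depth_m:.1f} m")  # 20.0 m
```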
https://www.nhtsa.gov/laws-regulations/standing-general-orde...
If there's gamesmanship going on, I'd expect the antifan site linked below to have different numbers, but it agrees with the 2 deaths figure for FSD.
There are two deaths associated with FSD.
https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashe...
Your link agrees with me:
> 2 fatalities involving the use of FSD
Your link agrees with me:
> two that NHTSA's Office of Defect Investigations determined as happening during the engagement of Full Self-Driving (FSD) after 2022.