Machines Training Machines
The promises of synthetic data for computer vision
Why aren’t self-driving cars on the roads yet?
Let’s ignore the hundreds of self-driving car startups launched over the last decade, several of which are billion-dollar unicorns. Let’s just focus on the OG of them all, Waymo.
Waymo started training their self-driving cars 12 years ago. Founded, led, and built by the smartest minds of our generation, their fleets have collectively driven over 20 million miles on public roads.
Waymo’s self-driving cars would actually work for your day-to-day driving: their fleet averaged almost 30,000 miles between disengagements in 2020. Depending on where you live, you could go days or weeks without having to touch the wheel. Waymo is that good.
And yet - no public product. No self-driving cars the public can jump in and try. (The few that we can jump in and try contain a human driver ready to take over at the slightest mistake, or are geofenced to an extremely limited, known area).
Why is this?
Self-driving cars require a huge amount of data (think: video and LiDAR recordings of good human driving; exabytes, if not zettabytes, of it).
A self-driving car has to encounter a vast variety of situations, and it has to ace all of them, even the rare ones. Why? Because the cost of not acing them might mean death to its passengers, and death to the company.
The rarer the scenario, the harder it is to collect real-world data on it. Rare scenarios also tend to be the complicated ones (construction zones, emergency-vehicle pullovers, a vehicle stalled on the highway), which further exacerbates the need for large amounts of data.
It turns out that it’s not just hard to collect data on the edge cases, the rare scenarios. When you factor in the time and money required, it’s nigh impossible.
Enter synthetic data. Yeah, you heard me. I’m talking about computer-generated data. For self-driving cars. Instead of humans driving through cities, think computers driving through Grand Theft Auto’s open world. (Oops, spoiler alert!)
One might laugh at the thought of video games being seriously used to train something as complex and fine-tuned as a self-driving car. Let the following two videos forever put these notions to rest.
The gap between virtual simulation and physical reality is closing, and with it, our dependence on physically collected data for any robotic application.
Why is this important? Because it’s faster, cheaper, and more scalable to have computers generate data than to have humans generate it. Let’s say that you want to find ambulance data to train your self-driving car with. You can either
Stick one hundred cars and safety drivers on the road and drive them around for a week straight for ~$850,000 (100 cars * 24 hours * 7 days * ($21/hour/driver + ~$30/hour/car)) to get 16,800 hours of driving data. An extremely generous 1% would be ambulance data, resulting in 168 hours of data. This data would then need to be manually annotated by humans for an ungodly sum of money and time. To illustrate the ungodliness of data labeling: manually annotating a single street-scene image with semantic segmentation would take 60-90 minutes. And it would contain mistakes.
And that’s just semantic segmentation - there are also 2D bounding boxes, 3D bounding boxes, instance segmentation, point cloud annotations, and even depth or optical flow annotations which humans cannot do. Even if you parallelized the annotation across a fleet of humans, the turnaround time for a dataset of 168 hours would be weeks, not days.
Alternatively, generate hundreds of days of ambulance data with pixel-perfect annotations in a computer cluster within a few hours for $1,000.
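The back-of-the-envelope math behind that ~$850,000 figure can be sketched in a few lines. (The hourly rates and the 1% ambulance share are the rough estimates from above, not measured figures.)

```python
# Cost of physically collecting one week of fleet driving data,
# using the article's rough estimates.

CARS = 100
HOURS = 24 * 7               # one week of round-the-clock driving
DRIVER_RATE = 21             # $/hour per safety driver (estimate)
CAR_RATE = 30                # $/hour per vehicle (estimate)

fleet_hours = CARS * HOURS                          # raw driving data collected
collection_cost = fleet_hours * (DRIVER_RATE + CAR_RATE)
ambulance_hours = fleet_hours * 0.01                # a generous 1% features ambulances

print(f"Raw driving data: {fleet_hours:,} hours")   # 16,800 hours
print(f"Collection cost:  ${collection_cost:,}")    # $856,800
print(f"Ambulance data:   {ambulance_hours:.0f} hours")  # 168 hours
# Versus: hundreds of days of synthetic ambulance data,
# pixel-perfect annotations included, for ~$1,000.
```

Note that the ~$850,000 buys only the raw footage; annotation is a separate, larger bill, while the synthetic pipeline emits its labels for free.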
The release of self-driving cars isn’t being blocked by figuring out what to do at a traffic light or stop sign. It’s being blocked by what to do when a literal chicken crosses the road. While physically-collected data gets us through the first 90 or 95 percent of encounterable scenarios, synthetic data will get us through the last 5-10 percent.
Cool, right? But let’s take it a step further. Why not eliminate the physical data altogether? What if the realism, scalability, and lower cost of synthetic data eventually lead to robots such as self-driving cars being trained entirely on synthetic data?
I need you to understand how cool this is. The human equivalent would be downloading the collective life experiences of thousands of people into your brain before you’re born, so that from the first day you enter this world, you’re capable of solving calculus problems, doing ballet, building wristwatches, and creating original works of music.
From training self-driving cars to GPT-3 coding websites, computers are increasingly pushing humans out of the one thing that computers shouldn’t be able to do: build themselves!
It’s cool, and it’s scary. If you can’t beat ‘em, join ‘em.