What is synthetic data?
A primer for beginners
When I arrived at Parallel Domain a year and a half ago, I hadn’t grasped the full implications of synthetic data. It was my first time in a product management role, so most of my effort was spent on just learning the ropes. As I grew more comfortable and began to understand what synthetic data was really capable of, I became more and more fascinated. Synthetic data for computer vision is a complex field spanning graphics, robotics, and machine learning. This post is the teaser I wish someone had shown me before I started.
Okay. What is synthetic data?
Glad you asked. Let me show you.


Hard time seeing that some of this data isn’t real? Same.
The crazy thing is that it’s only going to get better from here.
Synthetic data is going to change everything.
Over the last ten years, we’ve seen computer vision improve at breakneck speed… except when it comes to data.
To get data, humans have to manually run or drive around with cameras. Then, we have to contract hundreds of workers in developing countries to label the data, which can take months of effort and headaches. In other words, it’s painful for humans to teach machines how to see.
But synthetic data is made by computers. This means that we can:
Automatically create data at the speed of cloud computing
Create any data we can dream of with extreme programmability
Automatically label data with impossible speed and granularity
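To make those three points concrete, here is a toy sketch in Python. It is my own illustration, not how Parallel Domain or any real rendering engine works: the `make_sample` function, the class names, and the gray levels are all made up for the example. It "renders" random rectangles into a tiny grayscale image, and because the program decides where every object goes, it can write out a pixel-perfect mask and bounding boxes at the same moment, with no human labeling step.

```python
import random

# Hypothetical class ids, names, and gray levels, chosen only for this illustration.
CLASSES = {1: ("car", 200), 2: ("pedestrian", 120)}


def make_sample(width=32, height=24, num_objects=3, seed=0):
    """Draw random rectangles and return (image, mask, boxes).

    image: rows of pixel intensities (0 is background)
    mask:  rows of per-pixel class ids (0 is background)
    boxes: one dict per drawn object, with its label and position
    """
    rng = random.Random(seed)  # seeded, so the same sample can be regenerated
    image = [[0] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    boxes = []
    for _ in range(num_objects):
        class_id = rng.choice(sorted(CLASSES))
        name, gray = CLASSES[class_id]
        w = rng.randint(3, 8)
        h = rng.randint(3, 8)
        x0 = rng.randint(0, width - w)
        y0 = rng.randint(0, height - h)
        # The generator paints the pixel and its label in the same step.
        # Later objects overwrite earlier ones where they overlap.
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                image[y][x] = gray
                mask[y][x] = class_id
        boxes.append({"label": name, "x": x0, "y": y0, "w": w, "h": h})
    return image, mask, boxes


image, mask, boxes = make_sample(seed=42)
print(len(boxes), "objects drawn, every pixel already labeled")
```

Changing `seed`, `num_objects`, or the image size gives a different labeled sample each time, which is the "programmability" part in miniature. Real synthetic data pipelines swap the rectangles for photorealistic 3D scenes, but the idea that the labels come for free from the generator is the same.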
With synthetic data, machines are no longer learning only from humans. Machines are learning from other machines.
We’re already building indoor robots, self-driving cars, and autonomous delivery drones. Tomorrow, we’ll be creating robots for construction, mining, agriculture, and space. Collecting data at scale for every use case just isn’t feasible. At each step along the way, synthetic data will help those robots learn to see.