Over the past decade, Artificial Intelligence (AI) has transformed from a topic that most people were only exposed to in science fiction books and movies, to a technology that serves as the engine behind some of the most innovative products we see and interact with on a daily basis. One of the most tangible examples of AI in society today is autonomous vehicles, cars that have the ability to conduct some or most of the tasks that a human driver can do.
With the advent of deep learning and the rapid growth of computing power, most major automakers and several start-ups have brought to market vehicles and systems that achieve what is known as “Level 2” autonomy. The Society of Automotive Engineers (SAE) categorizes autonomous systems on a scale that ranges from Level 0 to Level 5. A Level 2 system meets the criteria for “partial driving automation”: it can handle some driving tasks, including steering and acceleration, but it still requires human supervision and, at times, intervention.
While the systems on the market today provide an enhanced driving experience, they are still prone to failure. Failures typically arise when the system encounters a situation it has too little experience with to make an effective decision. For self-driving cars to reach higher levels of autonomy, fundamental computer vision and data pipeline challenges must be overcome.
As with any other machine learning (ML) problem, one of the most important aspects of building strong autonomous systems for cars is having strong, representative data to train models with.
For autonomous vehicles, training data is usually gathered in one of two ways. The first is to drive around and have the car collect data through various sensors such as cameras, LIDAR, radar, and sonar. Afterward, this data is annotated (labeled) so that the system can recognize each object in a frame (trees, pedestrians, other cars, and so on).
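To make this concrete, here is a minimal sketch of what one annotated camera frame might look like, loosely following the widely used COCO annotation format. The file name, category ids, and pixel coordinates are illustrative, not from any real dataset.

```python
# One annotated camera frame, COCO-style (illustrative values only).
frame_annotation = {
    "image": {"id": 1, "file_name": "drive_0001/frame_0042.jpg",
              "width": 1920, "height": 1080},
    "annotations": [
        # Each detected object gets a class id and a bounding box [x, y, w, h].
        {"image_id": 1, "category_id": 1, "bbox": [880, 300, 60, 140]},   # pedestrian
        {"image_id": 1, "category_id": 3, "bbox": [400, 500, 320, 180]},  # car
        {"image_id": 1, "category_id": 10, "bbox": [1500, 200, 45, 45]},  # stop sign
    ],
    "categories": [
        {"id": 1, "name": "pedestrian"},
        {"id": 3, "name": "car"},
        {"id": 10, "name": "stop_sign"},
    ],
}

# Quick lookup from category id to human-readable name, e.g. for rendering.
id_to_name = {c["id"]: c["name"] for c in frame_annotation["categories"]}
labels = [id_to_name[a["category_id"]] for a in frame_annotation["annotations"]]
print(labels)  # ['pedestrian', 'car', 'stop_sign']
```

In practice, annotation at this level of detail is done for millions of frames, which is why labeling is one of the most expensive steps in the pipeline.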
The second approach, which has gained popularity in recent years, is synthetic data: frames captured from video games and other simulators. Advancements in computer graphics have led to photo-realistic games such as Grand Theft Auto, which has been used to create entire datasets for self-driving cars. Synthetic datasets make it easy and inexpensive to control for variables such as weather and environment, and data from synthetic environments is also easier to annotate than data from the real world. Leaders in the industry (such as Tesla) are actively investing in simulators for their data pipelines, and this method promises to play an important role in the future of self-driving car development.
Machine learning algorithms and methods are usually developed in a research environment. Such environments are predictable and easy to control. Bringing a method to production in a real-world setting is a much harder challenge. An autonomous vehicle could encounter scenarios where streets are unmarked, or where a sign looks significantly different from the ones in its training data. In machine learning, this problem is called a data distribution shift.
Data distribution shifts (or data shifts) occur when the data a machine learning system sees in production drifts away from the data it was trained on. As the system becomes less and less familiar with the data it is working with, its accuracy decreases. The machine learning community has begun paying much closer attention to this problem in recent years as models have moved from research labs into production in the real world.
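A common way to catch such a shift is to compare the distribution of a feature at training time against its distribution in production. The sketch below uses a two-sample Kolmogorov–Smirnov test on a single scalar feature (here, a simulated "average scene brightness"); the feature, the simulated shift, and the alert threshold are all illustrative assumptions.

```python
# Minimal drift-detection sketch: compare a feature's training-time
# distribution against its production distribution with a KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Simulated "average scene brightness" per frame. Production values are
# deliberately shifted (e.g. more night-time driving than in training).
train_brightness = rng.normal(loc=0.6, scale=0.1, size=2000)
prod_brightness = rng.normal(loc=0.45, scale=0.1, size=2000)

stat, p_value = ks_2samp(train_brightness, prod_brightness)
drift_detected = p_value < 0.01  # the alert threshold is a design choice
print(f"KS statistic={stat:.3f}, drift detected: {drift_detected}")
```

In a real pipeline this check would run continuously over a sliding window of production data, and a detected drift would trigger data collection and retraining rather than just an alert.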
In the case of self-driving cars, the autonomous system is solving several computer vision problems in order to drive: object recognition, object tracking, and segmentation (detecting multiple entities and distinguishing between them). For most humans, it is easy to detect traffic signs in the cities we are used to driving in. We can tell the difference between a pedestrian and a tree, and between a yield sign and a stop sign. If we travel to a different country, we might occasionally have to ask what an unfamiliar sign means. Much like humans, self-driving cars require these explanations when exposed to new environments.

For instance, a system trained primarily on data from San Francisco would recognize a stop sign as an octagonal red sign with the word STOP on it. In some European countries, where the stop sign is circular with a triangle inside it, the system could be less accurate at detecting stop signs. The same applies to road systems: many European cities have narrow one-way roads, whereas North American cities have larger, wider road systems. Unlike laboratory environments, the real world is not static; it changes all the time. For these reasons, having diverse datasets that represent all of the environments in which the system will operate is crucial to creating systems that generalize well and adapt to their domains.
Currently, some MLOps companies, such as Arize and Arthur, are trying to solve this problem by allowing machine learning teams to observe model performance and detect data drift. Further development of stronger model performance monitoring systems is an important part of building more reliable and safer machine learning systems. As autonomous vehicles operate in the real world, it is important not only to detect when a system fails but also to identify, through proper monitoring, the attribute that caused the failure. This enables engineering teams to determine what data is missing and to improve the data pipeline so that the model experiences less data shift and achieves higher overall accuracy.
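The per-attribute attribution described above can be sketched with a simple drift score computed for each monitored feature. The example below uses the Population Stability Index (PSI), a common drift metric, and ranks features by it; the feature names and data are illustrative and do not reflect any vendor's actual API.

```python
# Per-feature drift attribution sketch using the Population Stability
# Index (PSI). Feature names and distributions are illustrative.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI between a reference sample and a production sample of one feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor bin proportions to avoid log(0) for empty bins.
    e_pct = np.clip(e_pct, 1e-4, None)
    a_pct = np.clip(a_pct, 1e-4, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
reference = {  # distributions observed at training time
    "brightness": rng.normal(0.6, 0.1, 5000),
    "sign_width_px": rng.normal(45, 5, 5000),
}
production = {  # brightness has shifted; sign size has not
    "brightness": rng.normal(0.4, 0.1, 5000),
    "sign_width_px": rng.normal(45, 5, 5000),
}

scores = {f: psi(reference[f], production[f]) for f in reference}
worst = max(scores, key=scores.get)
print(worst)  # the feature contributing most to the drift
```

Ranking features this way points engineers at the missing data: a high brightness score here would suggest collecting more night-time footage, exactly the kind of targeted pipeline fix the monitoring tools aim to enable.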
Throughout history, the world has seen several major technological and economic revolutions. More than 10,000 years ago, the Agricultural Revolution transformed society from a hunter-gatherer one to an agriculture- and settlement-based one.
Computer vision models are increasingly being adopted across industries and verticals to optimize workflows and solve challenging problems that in the past were cumbersome and time-consuming. Whether it is self-driving cars...