Autonomous Driving: How Do Autonomous Vehicles See the World?

How do autonomous vehicles actually see the world? Let's take a look!

The equipment on an autonomous vehicle that differs from a human-piloted vehicle can be classified into two primary categories: sensory equipment and mechanical actuators. The actuators simply convey the driving system's decisions to the physical parts of the car, and can be very tightly integrated with the car itself. Motors that assist human drivers with steering can also be used to turn the steering rack directly. Most modern cars are already drive-by-wire (no physical cable runs from the gas pedal to a butterfly valve in the fuel system; the pedal just sends electronic signals to increase or decrease throttle), so having something other than a human control the functions of a car is pretty straightforward. This part of the equation is by far the easy part.

The hard part is getting the car to perceive and understand the ever-changing, ever-moving world around it. That's where the sensory equipment comes in. Here's what that equipment usually consists of:

Ultrasonic Sensors

You know those little round button-like things you see on some cars' bumpers? Those are ultrasonic sensors, and they're most often used as parking-assist sensors, since they're good at telling what's close to you at low speeds. They bounce ultrasonic sound waves off objects to determine how close you are to them.

These don't actually have much use in fully autonomous driving, but they still help a car understand its environment, so I thought they were worth a mention. Automatic parallel-parking systems do use them, so there are some autonomous-driving/parking contexts where they come into play.

We can't hear the pulses they make, since those pulses, while loud, tend to fall between 40 kHz and 48 kHz (or higher, with newer sensors) or so. Human hearing stops at about 20 kHz.
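The ranging itself is simple time-of-flight math: the sensor emits a pulse, times the echo, and converts that delay to a distance. Here's a minimal sketch (the function name and the example timing value are my own, illustrative and not from any real sensor's datasheet):

```python
# Hypothetical sketch: converting an ultrasonic sensor's echo delay to distance.
# The 5.8 ms round-trip value below is illustrative, not a real measurement.

SPEED_OF_SOUND_M_S = 343.0  # in dry air at roughly 20 °C

def echo_to_distance_m(round_trip_s: float) -> float:
    """The pulse travels to the obstacle and back, so halve the round trip."""
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0

# A 5.8 ms round trip puts the obstacle roughly 1 meter away:
print(round(echo_to_distance_m(0.0058), 2))  # 0.99
```

This is also why ultrasonic sensors are a low-speed tool: sound is slow, so at highway speeds the car covers a meaningful distance while waiting for the echo.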
Dogs, cats, and bats, though, should be able to hear them, which must be pretty annoying.

Cameras

Vision is, of course, the most important sense we use when driving, so most self-driving machines need a way to replicate it. Modern technology is capable of making some very small, high-resolution camera systems, and modern cars are already getting pretty laden with cameras, even if they have no interest in driving themselves.

Cameras, usually mounted just above the inside rear-view mirror at the top-center of the windshield, are used for lane-departure systems, where software analyzes each frame of video to identify the lines painted on a highway and makes sure the car stays inside them. These cameras may also be used for emergency braking systems and traffic-sign identification. All of these examples involve camera systems with some degree of artificial intelligence, since they're actually attempting to make some sort of sense out of the images they capture.

"Sense" is a bit of an anthropomorphizing term, of course: they're really just analyzing frames of video for a very specific set of criteria, and acting on those criteria in very defined ways. Most autonomous vehicle camera systems use two cameras to get binocular vision for real depth perception. While the cameras are good, they're usually not as good as the one in, say, your phone. Most tend to be between 1 and 2 megapixels, which means they're imaging the world at a resolution of about 1600 x 1200 pixels. Not bad, but much less than human vision. Still, this seems to be good enough to resolve what's needed for driving, and is small enough to allow for image processing at the sorts of speeds driving requires.

Really, it's not about image quality or color saturation or any of the criteria we normally use when evaluating cameras for our own use.
For driving a car, you want fast image acquisition: the more frames per second you can capture and evaluate, the quicker the car's reaction time will be.

When processing images from the camera, the car's artificial vision system has to look out for and identify a number of things:

• Road markings
• Road boundaries
• Other vehicles
• Cyclists, pedestrians, pets, discarded mattresses, and anything else in the road that is not a vehicle
• Street signs, traffic signs, traffic signals
• Other cars' signal lamps

To identify these objects and people, the camera system must figure out which pixels in the image are background and which are the actual things that need attention. Humans can do this instinctively, but a machine doesn't inherently understand that a 1600 x 1200 matrix of colored pixels that we see as a Porsche 356 parked in front of the burned remains of a Carl's Jr. is actually a vehicle parked in front of a sub-par fast-food restaurant that fell victim to a grease fire.

To get a computer to understand what it's seeing through its cameras, a number of different methods have to be employed. Objects are identified as separate from their surroundings via algorithms and processes like edge detection, a complex and math-intensive way for a computer to look at a given image and find where there are boundaries between areas, usually based on differences in image brightness between regions of pixels. As you can imagine, this process is non-trivial, since any given scene viewed through a camera is full of gradients of color and shade, shadows, bright spots, confusing boundaries, and so on. But complicated math like that is precisely the sort of thing computers are good at, so, generally, this process works quite well.

Once individual objects are separated from their background, they then need to be identified. Size and proportion are big factors in this, as most cars are, very roughly, similarly sized and proportioned, as are most people or cyclists and so on.
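To make the edge-detection idea above concrete, here's a toy sketch of the classic Sobel operator in pure Python. Real vision systems use heavily optimized image-processing libraries, but the core idea is just this: weigh a pixel's 3x3 neighborhood so that a sharp brightness change produces a large number.

```python
# Toy sketch of Sobel-style edge detection; the tiny "image" is made up.
# A real system would run an optimized version of this over every pixel.

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left-right brightness change
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to up-down brightness change

def gradient_magnitude(img, x, y):
    """Edge strength at pixel (x, y), from its 3x3 neighborhood."""
    gx = sum(GX[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
    gy = sum(GY[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
    return (gx * gx + gy * gy) ** 0.5

# A tiny image with a vertical boundary between dark (0) and bright (9) regions:
img = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

print(gradient_magnitude(img, 1, 1))  # flat region: 0.0
print(gradient_magnitude(img, 3, 1))  # on the boundary: 36.0
```

Flat regions score zero; the boundary lights up. String enough of those bright responses together and you have the outline of an object.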
Things that are large 12-foot-by-five-foot-by-six-foot rectangles are likely cars, narrow things shaped like a book on its spine are probably bicycles or motorcycles, and tall oblongs that move around are probably people or magic walking cacti.

While most autonomous systems are pretty good at identifying cars, people, and bikes, they're still pretty stupid compared to humans. A crude mock-up that no human would ever mistake for a real car can be absolutely good enough to fool an autonomous system. This object identification is accomplished via lots of training and machine learning on thousands and thousands of pre-categorized example images, and while it's extremely impressive that it works at all, it can be fooled in troubling ways. For example, it's hard for these systems to tell a picture of a bicycle from a real bicycle, especially if the bike is an image on the back of a moving car, which lets the bike image move the way the computer expects a real one to. An example covered in MIT Technology Review highlights the biggest issue with cameras and image-identification systems: they're easy to fool, and even without any deliberate foolery, they can simply get confused. The solution is to not rely on cameras alone, but to use cameras as part of a larger suite of sensors.

There are lots of good reasons to have as many different world-sensing options as possible: you want to be able to "see" what's going on even in conditions where visibility is limited. Darkness is a factor, of course, but so is bad weather. We've all been driving and gotten caught in torrential rains that render the view out the windshield into something like what you'd see trying to view the world through a nice, cold gin and tonic. You can't really see, and a computer wouldn't be able to either. But other systems, like radar or lidar, both of which we'll get to soon, may not be as affected.
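The rough size-and-shape heuristic described above can be sketched in a few lines. To be clear, the function and every threshold below are my own illustrative guesses, not values from any production driving system:

```python
# Hypothetical sketch of classifying objects by bounding-box size and shape.
# All thresholds (in meters) are illustrative guesses, not production values.

def classify_by_size(length_m: float, width_m: float, height_m: float) -> str:
    """Guess an object class from its rough bounding-box dimensions."""
    if length_m > 3.0 and width_m > 1.4:
        return "vehicle"       # big rectangle: likely a car
    if length_m > 1.2 and width_m < 0.8 and height_m > 0.9:
        return "two-wheeler"   # narrow, like a book standing on its spine
    if height_m > 1.2 and length_m < 1.0 and width_m < 1.0:
        return "pedestrian"    # tall oblong (or magic walking cactus)
    return "unknown"

print(classify_by_size(4.5, 1.8, 1.5))   # vehicle
print(classify_by_size(1.8, 0.6, 1.1))   # two-wheeler
print(classify_by_size(0.5, 0.5, 1.7))   # pedestrian
```

Real systems layer machine-learned classifiers on top of cues like these, which is exactly why a flat picture of a bicycle, with plausible size and motion, can slip through.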
As far as how many cameras are used: at minimum, an autonomous vehicle needs a pair of stereo cameras facing forward, though having rear and side cameras to get as close to a 360° view as possible would be ideal.
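That forward-facing stereo pair is what buys depth perception: the same object lands on slightly different pixel columns in the left and right images, and distance falls out of the geometry. A minimal sketch of the standard pinhole-stereo relation (the focal length, baseline, and disparity numbers here are made up for illustration):

```python
# Hypothetical sketch: recovering depth from stereo disparity.
# depth = focal_length (pixels) * baseline (meters) / disparity (pixels)
# All numbers below are illustrative, not from a real camera rig.

def depth_from_disparity_m(focal_px: float, baseline_m: float,
                           disparity_px: float) -> float:
    """Classic pinhole-stereo relation; larger disparity means a closer object."""
    return focal_px * baseline_m / disparity_px

# With a 1400 px focal length and cameras mounted 12 cm apart,
# an object shifted 21 px between the two images is 8 meters away:
print(depth_from_disparity_m(1400.0, 0.12, 21.0))  # 8.0
```

Note the tradeoff hiding in that formula: distant objects produce tiny disparities, so a wider camera baseline or higher resolution is needed to judge their distance accurately.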
