How AVs See the Road
James Carter | 26-05-2026 · Automobile team
A self-driving vehicle can't blink, get distracted, or misjudge distance the way a human can. But it also can't rely on intuition, context, or the kind of split-second judgment that experienced drivers apply constantly without thinking.
What it has instead is a system of multiple sensors, each capturing a different dimension of the environment, feeding data into AI models that construct a real-time picture of everything around the vehicle and decide how to respond.
Understanding how that system works explains both how capable autonomous vehicles have become and where the remaining challenges lie.
The core architecture is called sensor fusion — the integration of data streams from cameras, lidar, radar, and ultrasonic sensors into a unified environmental model. Each sensor type has distinct strengths and limitations. No single sensor can provide everything an autonomous system needs to operate safely. The combination of all of them, processed by AI in real time, is what makes reliable perception possible.
Cameras: Visual Detail and Object Classification
Cameras are the richest source of visual information available to an autonomous vehicle. Capturing high-resolution imagery at up to 120 frames per second, cameras excel at reading traffic signs, interpreting traffic light states, detecting lane markings, and classifying objects — distinguishing a pedestrian from a cyclist from a parked car. Tesla's Autopilot system processes data from eight cameras simultaneously, handling 2.5 billion data points per second.
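To put the frame rate in perspective, the short calculation below shows how far a vehicle travels between two consecutive frames. It is only an illustrative sketch: the 30 m/s speed and the function name are assumptions chosen for the example, not figures from any manufacturer.

```python
def distance_between_frames(speed_m_s: float, fps: float) -> float:
    """Distance the vehicle travels between two consecutive camera frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return speed_m_s / fps


# Illustrative highway speed (assumed): 30 m/s is roughly 108 km/h.
highway_speed_m_s = 30.0

for fps in (30, 60, 120):
    gap_m = distance_between_frames(highway_speed_m_s, fps)
    print(f"{fps:>3} fps: {gap_m:.2f} m travelled between frames")
```

At the assumed speed, a 120 fps camera sees the scene every quarter of a meter, while a 30 fps camera sees it once per meter.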
The limitation of cameras is their dependence on lighting and visibility conditions. At night, in heavy rain, or in dense fog, camera performance degrades. Shadows, lens glare, and low-contrast scenes make visual classification unreliable. Cameras also struggle to provide accurate distance measurements on their own: they see detail but not depth.
Lidar: Three-Dimensional Mapping in Real Time
Lidar — Light Detection and Ranging — fires laser pulses at up to one million per second and measures the time it takes for each pulse to return after reflecting off a surface. The result is a continuous, precise three-dimensional map of the vehicle's surroundings, accurate to centimeter-level distances within a range of up to 200 meters. Waymo's systems use 360-degree lidar coverage, processing millions of data points per second to build a complete spatial model of the environment.
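The time-of-flight principle described above reduces to one line of arithmetic: the pulse travels to the surface and back, so the range is half the round-trip time multiplied by the speed of light. The sketch below is a minimal illustration with hypothetical function names, and it ignores real-world effects such as atmospheric refraction and detector latency.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def range_from_round_trip(round_trip_s: float) -> float:
    """Range to the reflecting surface from the pulse's round-trip time."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def round_trip_for_range(range_m: float) -> float:
    """Round-trip time a pulse needs to reach a surface and return."""
    return 2.0 * range_m / SPEED_OF_LIGHT_M_S


# Round-trip time at the 200 m range quoted above.
t_200 = round_trip_for_range(200.0)
print(f"200 m target: round trip of {t_200 * 1e6:.3f} microseconds")

# Timing resolution needed to tell apart ranges that differ by 1 cm.
t_1cm = round_trip_for_range(0.01)
print(f"1 cm of range: {t_1cm * 1e12:.1f} picoseconds of round-trip time")
```

The numbers show why centimeter-level accuracy is demanding: a pulse returns from 200 meters in a little over a microsecond, and one centimeter of range corresponds to only tens of picoseconds.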
Lidar can accurately map the visible shape of a pedestrian or obstacle even when it is partially hidden behind another object, which supports the occlusion handling that is critical in complex urban environments. It maintains high accuracy in low-light conditions. Its weaknesses are cost (high-resolution lidar units have historically been expensive, though prices have fallen significantly) and performance in heavy precipitation, where laser pulses scatter off raindrops or snowflakes.
Radar: Velocity and All-Weather Reliability
Radar uses radio waves in the 76 to 81 GHz frequency range to measure both the position and velocity of objects. It operates reliably at distances up to 250 meters and is minimally affected by rain, fog, dust, or darkness. This makes radar the most reliable sensor in poor visibility conditions, and the primary basis for emergency braking and collision avoidance systems.
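The article does not say how radar obtains velocity. The usual mechanism is the Doppler effect: a target moving toward or away from the sensor shifts the frequency of the reflected wave. The sketch below assumes a 77 GHz carrier (inside the 76 to 81 GHz band mentioned above) and uses hypothetical function names. It covers only radial velocity, the component along the line of sight.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
CARRIER_HZ = 77e9  # assumed carrier frequency within the 76-81 GHz band


def doppler_shift_hz(radial_speed_m_s: float, carrier_hz: float = CARRIER_HZ) -> float:
    """Frequency shift of the echo from a target closing at radial_speed_m_s."""
    return 2.0 * radial_speed_m_s * carrier_hz / SPEED_OF_LIGHT_M_S


def radial_speed_m_s(shift_hz: float, carrier_hz: float = CARRIER_HZ) -> float:
    """Radial speed recovered from a measured Doppler shift."""
    return shift_hz * SPEED_OF_LIGHT_M_S / (2.0 * carrier_hz)


# A target closing at 30 m/s (illustrative value).
shift = doppler_shift_hz(30.0)
print(f"Doppler shift: {shift / 1e3:.2f} kHz")
print(f"Recovered speed: {radial_speed_m_s(shift):.2f} m/s")
```

Because the shift depends on the radio wave itself and not on visible light, this measurement works just as well in darkness or fog.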
Radar's limitation is resolution. It cannot classify objects with the detail that cameras provide — it detects that something is present and moving at a certain speed, but not whether it is a pedestrian, a cyclist, or a piece of road debris. Different radar configurations serve different purposes: long-range units handle highway emergency detection; medium-range units monitor adjacent traffic; short-range units assist with parking and close-quarters maneuvering.
AI and the Problem of Edge Cases
Sensor fusion algorithms, often based on Kalman filters and Bayesian networks, take the continuous data streams from all sensors and reconcile them into a single, high-confidence environmental model. When a camera reports a pedestrian at one location and lidar confirms the same detection with consistent shape data, confidence rises. When sensors disagree, the system applies statistical weighting based on which sensor is most reliable in current conditions.
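One simple form of this statistical weighting is inverse-variance fusion, which is also what a Kalman filter's measurement update reduces to for a single static quantity. The sketch below is an assumption-laden illustration, not any vendor's algorithm: the class name, function name, and all uncertainty values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    sensor: str
    range_m: float  # reported distance to the object
    std_m: float    # assumed 1-sigma uncertainty in current conditions


def fuse(measurements: list[Measurement]) -> tuple[float, float]:
    """Inverse-variance weighted mean and its standard deviation."""
    weights = [1.0 / (m.std_m ** 2) for m in measurements]
    total = sum(weights)
    fused = sum(w * m.range_m for w, m in zip(weights, measurements)) / total
    fused_std = (1.0 / total) ** 0.5
    return fused, fused_std


# Same raw readings, different (invented) uncertainties per condition.
clear_day = [
    Measurement("camera", 42.0, 2.0),
    Measurement("lidar", 40.1, 0.05),
    Measurement("radar", 40.5, 0.5),
]
dense_fog = [
    Measurement("camera", 42.0, 8.0),  # camera degrades badly in fog
    Measurement("lidar", 40.1, 1.0),   # lidar pulses scatter
    Measurement("radar", 40.5, 0.5),   # radar is largely unaffected
]

for label, readings in (("clear day", clear_day), ("dense fog", dense_fog)):
    estimate, std = fuse(readings)
    print(f"{label}: fused range {estimate:.2f} m (std {std:.2f} m)")
```

In clear weather the fused estimate sits almost on the lidar reading. In fog it moves toward the radar reading, because radar now has the smallest assumed uncertainty. In both cases the fused uncertainty is smaller than that of any single sensor.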
Deep learning models, particularly convolutional neural networks (CNNs), analyze camera imagery for pattern recognition, identifying pedestrians, cyclists, vehicles, and obstacles based on features learned from training on billions of real-world driving scenarios. The persistent challenge is edge cases: novel situations the training data didn't adequately represent. A child running between parked vehicles, a construction scene with temporary signage, an obstacle in an unexpected location: these are the situations where autonomous systems are most likely to struggle.
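The basic operation inside a convolutional network is a small kernel sliding across the image and responding to a local pattern. The sketch below applies one hand-written vertical-edge kernel to a tiny synthetic image. It is a toy illustration with hypothetical names: a real network learns thousands of kernels from data instead of having them written by hand, and runs on optimized libraries.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Synthetic 4x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-chosen kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

for row in relu(conv2d_valid(image, vertical_edge)):
    print(row)
```

The response is strongest where the kernel straddles the boundary between dark and bright and is zero over the uniform regions. A trained network stacks many such layers, which is also why it can only respond well to patterns that resemble what it saw in training.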