
What is Sensor Fusion?

A technical reference on multi-sensor fusion: state estimation, data association, multi-hypothesis tracking, identity provenance, and why fusion has to run at the edge. Written for operators and engineers who are tired of marketing hand-waving about AI.

Sensor fusion is the process of combining data from multiple sensors into a unified picture that is more accurate, more complete, and more reliable than any individual sensor could produce alone. In operational systems it means correlating detections from radar, RF, EO/IR, acoustic, cooperative broadcasts, ISR, and whatever else you have into fused tracks with position, velocity, classification, and identity provenance. It is a physics problem with mathematical solutions. The math is decades old, battle-proven, and in some cases written in blood. It is not a problem to throw an LLM at, regardless of how much of the current marketing suggests otherwise.

For quick-reference answers on fusion algorithms, techniques, and related terms, see the Sensor Fusion FAQ.

The problem fusion solves

Every sensor lies. Not maliciously - but every sensor measurement contains noise, bias, and uncertainty. A radar return has range and bearing accuracy measured in meters and degrees. An RF direction-finding array has bearing uncertainty often in double-digit degrees. An EO/IR camera has pixel precision inside a narrow field of view. An acoustic array has range estimates with hundreds of meters of error. Cooperative broadcasts like Remote ID have asserted positions that can be completely fabricated.

No single sensor gives you the truth. Each gives you a noisy, partial, sometimes contradictory view of reality. The job of fusion is to combine those partial views into a picture that is closer to truth than any individual sensor can achieve - and, critically, to flag when the sensors disagree in ways that can't be reconciled so the operator sees the conflict rather than a silently-picked answer.

In operational terms: a radar sees something at range 1.2 km, bearing 047. An RF array hears a drone control link bearing northeast, with a coarse range estimate of 1-2 km. A cooperative RF sensor sees a Remote ID broadcast claiming to be a DJI Mavic 3 at 1.4 km, bearing 050. Are these three detections the same object, two objects, or three? Answering that question correctly, continuously, across potentially dozens of simultaneous detections, is the actual job.
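A first step toward answering that question is putting the detections into a common frame. The following is a minimal sketch, not a fusion algorithm: it assumes both sensors sit at the same site and that a flat local east/north frame is good enough at these ranges. Neither assumption is stated in the scenario, and `polar_to_en` is a hypothetical helper name.

```python
import math

def polar_to_en(range_m, bearing_deg):
    """Convert range and compass bearing (degrees clockwise from north)
    into (east, north) offsets in meters in a flat local frame."""
    theta = math.radians(bearing_deg)
    return range_m * math.sin(theta), range_m * math.cos(theta)

# Values from the scenario above; co-located sensors are an assumption.
radar = polar_to_en(1200.0, 47.0)
remote_id = polar_to_en(1400.0, 50.0)

separation_m = math.dist(radar, remote_id)
print(f"radar vs. Remote ID separation: {separation_m:.0f} m")
```

Under those assumptions the two reports land a little over 200 m apart. Whether that means one object or two depends entirely on each sensor's uncertainty, which is why the distance has to be judged statistically rather than by eye.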

Without fusion, an operator is staring at three separate screens mentally correlating three partial pictures. That works fine in a quiet environment with one or two objects. It collapses the moment the scene gets dense: a drone swarm, contested airspace, a real incursion. Putting the three sensor feeds on one screen is not fusion. It is a dashboard. The operator is still the fusion engine, and the operator is overloaded.

How fusion actually works

Fusion operates in three layers, each solving a different class of problem. They interact continuously, which is part of what makes getting this right non-trivial.

Layer 1: State estimation

The foundation of fusion is state estimation - maintaining a mathematical model of each tracked object's position, velocity, and (sometimes) acceleration over time. The workhorse algorithm is the Kalman filter. Variants handle specific problems: Extended Kalman (EKF) for nonlinear dynamics or measurement models, Unscented Kalman (UKF) for strongly nonlinear systems where EKF linearization breaks down, and Interacting Multiple Model (IMM) for targets that transition between flight regimes.

The Kalman filter operates in a predict-update cycle:

Predict. Using the target's last known state and a kinematic motion model (constant velocity, constant acceleration, coordinated turn, etc.), project the state forward to the current time. The uncertainty in the estimate - represented by the state covariance matrix - grows during prediction because no new information has arrived.

Update. When a new measurement arrives from any sensor, compare the predicted state to the measurement. The residual (the "innovation") is weighted by the relative confidence in the prediction versus the measurement. An accurate sensor measurement pulls the estimate strongly. A noisy measurement pulls it weakly. The uncertainty shrinks after update because new information has been incorporated.
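The predict-update cycle can be sketched in a few lines. This is a deliberately small illustration, assuming a one-dimensional constant-velocity target, a position-only measurement, and a simplified process noise `q` added to the covariance diagonal. The numbers (initial covariance, `r`, `q`, the measurements) are made up for the example.

```python
def predict(x, P, dt, q):
    """Constant-velocity prediction. x = [pos, vel], P = 2x2 covariance."""
    pos, vel = x
    x_pred = [pos + vel * dt, vel]
    # P' = F P F^T + Q with F = [[1, dt], [0, 1]]
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1]
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1]
    # Simplified process noise: q on the diagonal (an assumption).
    return x_pred, [[p00 + q, p01], [p10, p11 + q]]

def update(x, P, z, r):
    """Position-only measurement z with variance r (H = [1, 0])."""
    innovation = z - x[0]
    S = P[0][0] + r                     # innovation variance
    K = [P[0][0] / S, P[1][0] / S]      # Kalman gain
    x_new = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]
    # P_new = (I - K H) P
    P_new = [
        [(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
        [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]],
    ]
    return x_new, P_new

x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for z in (20.0, 40.0, 60.0, 80.0):      # target moving ~10 m/s, 2 s scans
    x_pred, P_pred = predict(x, P, dt=2.0, q=0.1)
    x, P = update(x_pred, P_pred, z, r=25.0)
    print(f"predicted var {P_pred[0][0]:6.1f} -> updated var {P[0][0]:5.1f}")
```

The printed variances show the pattern described above: uncertainty grows on predict and shrinks on update. A smaller `r` (a more accurate sensor) makes the gain larger and pulls the estimate harder toward the measurement.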

This cycle runs continuously, at the sensor's update rate, for every tracked object. A radar that updates every 2 seconds triggers a predict-update cycle every 2 seconds. An EO/IR tracker at 30 Hz triggers 30 cycles per second. Between updates from any given sensor, the filter predicts forward, maintaining the best estimate possible given the information available.

The Kalman filter naturally handles sensors with different update rates, different accuracies, and different measurement types. A radar reports range and bearing. An RF array reports bearing only. An EO/IR camera reports pixel coordinates that map to azimuth and elevation. The filter's measurement model translates between each sensor's native output and the common state space and weights each measurement appropriately.
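To make the measurement-model idea concrete, here is a hedged sketch of two measurement functions over a shared state. It assumes a state of `(east, north, v_east, v_north)` in meters relative to a sensor at the origin. The function names are hypothetical, and a real EKF would also need the Jacobians of these functions.

```python
import math

def h_radar(state):
    """Predicted radar measurement: range (m) and bearing (rad, clockwise from north)."""
    east, north = state[0], state[1]
    return math.hypot(east, north), math.atan2(east, north)

def h_rf_bearing(state):
    """Predicted bearing-only measurement for an RF direction-finding array."""
    east, north = state[0], state[1]
    return (math.atan2(east, north),)

def wrap_angle(a):
    """Wrap an angle difference into [-pi, pi) so bearing innovations stay small."""
    return (a + math.pi) % (2 * math.pi) - math.pi

state = (877.6, 818.4, 0.0, 0.0)
rng, brg = h_radar(state)
print(f"predicted range {rng:.0f} m, bearing {math.degrees(brg):.1f} deg")

# Bearing innovation across the +/-180 degree seam: 2 degrees, not -358.
innovation = wrap_angle(math.radians(-179.0) - math.radians(179.0))
print(f"wrapped innovation {math.degrees(innovation):.1f} deg")
```

The same state feeds both functions. Only the shape of the predicted measurement changes, which is what lets one filter absorb sensors that report different quantities.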

IMM deserves special mention. Against maneuvering targets - a drone transitioning from cruise to attack dive, a cruise missile going terminal, an OWA executing evasive jinks - a single motion model will lag when the target's behavior changes. IMM runs several filters in parallel (typically a constant-velocity, a constant-acceleration, and a coordinated-turn model) and blends their outputs by how well each is currently explaining the measurements. The filter automatically shifts weight toward whichever model is tracking best. This is what keeps track quality high when targets don't fly in straight lines, which is to say, most of the time.
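The weight-shifting step can be illustrated on its own. The sketch below covers only the IMM mode-probability update. A full IMM also mixes the state estimates between filters, which is omitted here. The transition matrix and likelihood values are invented for the example.

```python
def imm_mode_update(mu, transition, likelihoods):
    """Update model probabilities.
    mu[i]: prior probability of model i
    transition[i][j]: probability of switching from model i to model j
    likelihoods[j]: likelihood of the latest measurement under model j"""
    n = len(mu)
    # Predicted mode probabilities after Markov mixing.
    c = [sum(transition[i][j] * mu[i] for i in range(n)) for j in range(n)]
    unnormalized = [likelihoods[j] * c[j] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

models = ["constant-velocity", "constant-acceleration", "coordinated-turn"]
mu = [0.8, 0.1, 0.1]                       # target has been cruising
transition = [[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]]
likelihoods = [0.01, 0.20, 0.05]           # hypothetical: target starts a dive

mu_new = imm_mode_update(mu, transition, likelihoods)
for name, p in zip(models, mu_new):
    print(f"{name:22s} {p:.2f}")
```

With these made-up numbers, one measurement that the constant-velocity model explains poorly is enough to move most of the weight onto the constant-acceleration model.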

Layer 2: Data association

State estimation assumes you already know which measurement goes with which track. Data association is the problem of figuring that out, and it is where most fusion systems fail in ways that matter.

When a radar completes a scan and reports 15 detections, and you are already tracking 12 objects, which detection goes with which track? Some are updates to existing tracks. Some are new objects. Some are false alarms from clutter, noise, or multipath. Some tracks have no corresponding detection this scan - the target was in a null, or the sensor was pointed elsewhere, or the target was briefly masked by terrain.

Greedy (nearest-neighbor) association processes detections one at a time. For each detection, find the closest track within a gating threshold, assign it, move on. Fast, simple, and order-dependent. The first detection processed gets the best available match. The second gets whatever is left. In dense environments with tightly-spaced targets, an early assignment can steal the correct match from a later detection, producing a cascade of mis-associations that propagates through the entire track picture.

Greedy works fine when targets are well-separated. It falls apart when they are not - which is exactly the scenario that matters. Drone swarms, missile salvos, dense commercial air traffic with threats embedded, coordinated approaches from multiple axes. The systems that claim to "fuse" via greedy matching are fine in the demo and fail in contact.
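The order dependence is easy to reproduce. A minimal sketch, assuming a hypothetical 2x2 matrix of statistical distances and a hypothetical gate value:

```python
def greedy_associate(cost, gate, order):
    """Nearest-neighbor association.
    cost[d][t]: statistical distance between detection d and track t
    order: the sequence in which detections are processed
    Returns {detection: track}; detections with no in-gate track are left out."""
    assigned, taken = {}, set()
    for d in order:
        candidates = [(c, t) for t, c in enumerate(cost[d])
                      if t not in taken and c <= gate]
        if candidates:
            _, t = min(candidates)
            assigned[d] = t
            taken.add(t)
    return assigned

# Hypothetical distances: two detections, two tracks.
cost = [[1.0, 2.0],
        [1.5, 9.0]]

first = greedy_associate(cost, gate=5.0, order=[0, 1])
second = greedy_associate(cost, gate=5.0, order=[1, 0])
print(first)   # {0: 0}  detection 1 is left with nothing inside the gate
print(second)  # {1: 0, 0: 1}
```

Same detections, same tracks, different answer, purely because of processing order. In the first ordering, detection 0 takes track 0 and leaves detection 1 unmatched, which downstream logic would likely treat as a new track or a false alarm.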

Optimal global assignment considers all detections and all tracks simultaneously. It formulates the association problem as an optimization: find the assignment of detections to tracks (including "new track" and "false alarm" hypotheses) that minimizes the total statistical distance across the entire scene. The Hungarian algorithm (Munkres) or an auction algorithm solves this in polynomial time. The difference is not subtle. In a dense scenario - say 30 targets tracked by 10 sensor classes - greedy can produce several mis-associations per scan cycle, and they compound over time. Optimal global assignment produces the lowest-total-cost assignment every cycle, given the gating and cost model. The 25th detection gets the same association quality as the first.
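To show what "minimize the total across the scene" means, the sketch below solves a tiny assignment problem by exhaustive search. This is not the Hungarian or auction algorithm. Brute force is exponential and only viable for toy sizes, and a real system would use a polynomial-time solver (SciPy's `scipy.optimize.linear_sum_assignment` is one such option). The cost matrix and `miss_cost` are hypothetical.

```python
from itertools import permutations

def optimal_associate(cost, miss_cost):
    """Exhaustive global assignment.
    cost[d][t]: statistical distance between detection d and track t
    miss_cost: cost of declaring a detection a new track or false alarm (None)
    Returns ({detection: track or None}, total_cost)."""
    n_det, n_trk = len(cost), len(cost[0])
    slots = list(range(n_trk)) + [None] * n_det   # each track usable once
    best_total, best = float("inf"), None
    for choice in permutations(slots, n_det):
        total = sum(miss_cost if t is None else cost[d][t]
                    for d, t in enumerate(choice))
        if total < best_total:
            best_total, best = total, choice
    return dict(enumerate(best)), best_total

# Hypothetical distances: detection 0 is close to both tracks,
# detection 1 is only plausible for track 0.
cost = [[1.0, 2.0],
        [1.5, 9.0]]

assignment, total = optimal_associate(cost, miss_cost=5.0)
print(assignment, total)  # {0: 1, 1: 0} 3.5
```

Detection 0 gives up its individually best match (track 0 at 1.0) so that detection 1 can have the only track it plausibly belongs to. No per-detection rule can make that trade, because it only shows up when the whole scene is scored at once.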

Joint Probabilistic Data Association (JPDA) is an intermediate approach worth mentioning. Instead of committing to a single assignment, JPDA weights each track's update by the probability that each nearby detection is the correct one. It handles clutter well and is computationally cheaper than full multi-hypothesis tracking. For many operational contexts JPDA is enough. For dense, high-ambiguity scenes it isn't.
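The weighting idea can be sketched for a single track. This is closer to plain PDA than to full JPDA, which additionally enumerates joint events across tracks that share detections. The `miss_weight` term is a hypothetical lumped stand-in for clutter density and missed-detection probability, and the distances are invented.

```python
import math

def pda_weights(d2_list, miss_weight):
    """Association probabilities for one track.
    d2_list: squared Mahalanobis distance from the track to each gated detection
    miss_weight: hypothetical lumped weight for 'none of these is my target'
    Returns (per-detection probabilities, probability that none is correct)."""
    scores = [math.exp(-0.5 * d2) for d2 in d2_list]
    total = miss_weight + sum(scores)
    return [s / total for s in scores], miss_weight / total

# Two detections in the gate: innovations of +0.5 and -2.0, unit innovation variance.
innovations = [0.5, -2.0]
betas, beta_none = pda_weights([v * v for v in innovations], miss_weight=0.1)

# The track is updated with the probability-weighted innovation
# instead of committing to either detection.
combined = sum(b * v for b, v in zip(betas, innovations))
print([round(b, 3) for b in betas], round(beta_none, 3), round(combined, 3))
```

The nearer detection dominates the update, but the farther one still contributes a little, and some probability mass stays on "neither". That soft commitment is what makes the approach tolerant of clutter.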

Layer 3: Multi-hypothesis tracking (MHT)

Even optimal global assignment commits to a single association per scan cycle. Multi-hypothesis tracking (MHT) goes further: it maintains multiple possible association hypotheses in parallel and defers commitment until the data resolves the ambiguity.

The canonical example: two aircraft in close formation. Their radar returns merge into a single detection. A single-hypothesis tracker must decide - Track A, Track B, or new track? Whichever it picks, at least one real track goes without its update. When the aircraft separate, the tracker has corrupted whichever track it guessed wrong on. Track-swap. Broken picture. Operator confusion at the worst possible moment.

MHT maintains all three hypotheses in parallel, each with its own independent state estimate. When the aircraft separate and produce distinct detections again, the hypothesis that correctly maintained both tracks scores highest. The losing hypotheses get pruned. Both tracks emerge from the merge event with accurate state estimates. No track swap. No corrupted picture.

MHT is more computationally expensive than single-hypothesis tracking because the hypothesis tree grows. Practical implementations use pruning strategies (N-scan-back, hypothesis merging, aggressive gating) to keep the tree bounded. With reasonable pruning, MHT runs comfortably against dozens of targets at sensor update rates on modern edge hardware. It is not a mainframe-scale algorithm. It is table stakes for any system operating in a dense, contested, or adversarial environment.
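Deferred commitment with a bounded tree can be sketched as follows. This is a toy, not an MHT implementation: the log-likelihoods are invented rather than computed from filter residuals, and simple keep-the-best-K pruning stands in for the strategies named above. All function and label names are hypothetical.

```python
import heapq

def expand_and_prune(hypotheses, branch_fn, max_hypotheses):
    """hypotheses: list of (log_score, history) tuples.
    branch_fn(history): list of (label, log_likelihood) options for this scan.
    Expands every hypothesis, then keeps only the best-scoring children."""
    children = [
        (score + ll, history + (label,))
        for score, history in hypotheses
        for label, ll in branch_fn(history)
    ]
    return heapq.nlargest(max_hypotheses, children, key=lambda h: h[0])

def merged_scan(history):
    # One merged return from two aircraft: three nearly equal explanations.
    return [("update_A", -1.1), ("update_B", -1.1), ("merged_AB", -1.3)]

def separation_scan(history):
    # Two distinct returns reappear. They fit well only if both tracks were kept alive.
    if history[-1] == "merged_AB":
        return [("both_confirmed", -0.2)]
    return [("both_confirmed", -4.0)]

hyps = [(0.0, ())]
hyps = expand_and_prune(hyps, merged_scan, max_hypotheses=3)
hyps = expand_and_prune(hyps, separation_scan, max_hypotheses=2)
best_score, best_history = hyps[0]
print(best_history, round(best_score, 2))
```

At the merged scan, "merged_AB" is the lowest-scoring of the three options, so a tracker forced to commit immediately would have discarded it. Keeping it alive for one more scan lets the separation data promote it to the top.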

Trust and identity provenance

Fusion isn't only about position and velocity. Operational systems must also fuse identity - what is this thing? - and manage the trust level of that identity assessment.

A radar classification says "air track, small RCS, micro-Doppler signature consistent with rotary-wing." An RF sensor says "control link matches DJI OccuSync 3." A Remote ID broadcast says "registered commercial drone, operator ID 12345, position X, Y." An EO/IR track says "quadcopter, no visible payload." An acoustic sensor says "multi-rotor propeller signature, four-blade."

These identity claims come from different sources with very different reliability characteristics. The radar classification is inferred from physics. The RF fingerprint is observed from the actual emitted signal. The Remote ID is claimed by the drone itself and can be spoofed by a $30 ESP32. The EO/IR classification is observed visually. The acoustic match is derived from pattern recognition on the sound signature.

A real fusion system maintains identity provenance: where each piece of identity information came from, how it was derived, and how much to trust it. Claimed identity (Remote ID) is treated differently from observed identity (RF fingerprint), which is treated differently from inferred identity (radar RCS + kinematic behavior). When claims conflict - the Remote ID says "friendly commercial platform" but the RF signature matches a known threat control stack, or the claimed kinematics violate the performance envelope of the claimed airframe - the fusion system surfaces the conflict. It does not silently pick one and move on.
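One way to picture provenance is as a tag that travels with every identity assertion. The data model below is a hypothetical sketch, not any platform's schema: the class names, the three provenance kinds as an enum, and the coarse "friendly"/"threat" assessments are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    CLAIMED = "claimed"     # asserted by the object itself (e.g. Remote ID)
    OBSERVED = "observed"   # measured from its emissions or appearance
    INFERRED = "inferred"   # derived from physics and behavior

@dataclass(frozen=True)
class IdentityEvidence:
    source: str
    provenance: Provenance
    assessment: str         # hypothetical coarse label

def find_conflicts(evidence):
    """Return every pair of evidence items whose assessments disagree."""
    conflicts = []
    for i, a in enumerate(evidence):
        for b in evidence[i + 1:]:
            if a.assessment != b.assessment:
                conflicts.append((a, b))
    return conflicts

track_evidence = [
    IdentityEvidence("remote_id", Provenance.CLAIMED, "friendly"),
    IdentityEvidence("rf_fingerprint", Provenance.OBSERVED, "threat"),
    IdentityEvidence("radar_kinematics", Provenance.INFERRED, "threat"),
]

conflicts = find_conflicts(track_evidence)
for a, b in conflicts:
    print(f"CONFLICT: {a.source} ({a.provenance.value}) says {a.assessment}, "
          f"{b.source} ({b.provenance.value}) says {b.assessment}")
```

The point of the structure is that the conflict is reported with its sources attached, so the operator can see that the only "friendly" vote is a self-asserted one.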

Trust is not static, either. A track corroborated by three sensors five minutes ago might have only one sensor confirming it now. Trust level should degrade as corroborating sensors go stale, and should restore when sensors reacquire. It's a lifecycle, not a binary. How a given platform encodes that lifecycle, and how different evidence streams combine mathematically, is where Dempster-Shafer evidence combination, Yager's rule, and related evidential reasoning frameworks come in. The math is public and well-studied. The operational tuning - what weights to apply, how zones should affect trust policy, how to detect spoof conditions and react to them - is where real engineering lives.
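As an example of the public math, here is a small sketch of Dempster's rule of combination over a two-hypothesis frame. The mass values are invented, and this shows only the combination rule. It says nothing about how any particular platform assigns masses or tunes policy.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.
    m1, m2: dict mapping frozenset-of-hypotheses -> mass (each sums to 1).
    Returns (combined mass function, conflict mass K)."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + wa * wb
            else:
                conflict += wa * wb       # the two sources flatly contradict
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}, conflict

THREAT = frozenset({"threat"})
BENIGN = frozenset({"benign"})
EITHER = THREAT | BENIGN                  # mass left on 'don't know'

# Hypothetical masses: RF fingerprint leans threat, Remote ID claims benign.
rf = {THREAT: 0.7, EITHER: 0.3}
remote_id = {BENIGN: 0.4, EITHER: 0.6}

fused, conflict = dempster_combine(rf, remote_id)
print({tuple(sorted(k)): round(v, 3) for k, v in fused.items()})
print(f"conflict mass K = {conflict:.2f}")
```

The conflict mass `K` is worth keeping rather than discarding after normalization: it is a direct measure of how strongly the sources disagree. Yager's rule differs at exactly this step, assigning the conflicting mass to the "don't know" set instead of normalizing it away.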

Why fusion has to run at the edge

The physics argument for edge-deployed fusion is straightforward: latency. A rotating radar paints a target roughly every 2 seconds. The fusion engine has to process that detection, associate it with existing tracks, update the state estimate, and present the result to the operator before the next scan completes. If the fusion engine lives in a cloud data center, the round trip adds anywhere from 200 milliseconds to 2 full seconds depending on the link. For an intelligence picture, that's acceptable. For a fire control solution against a target moving at Mach 3 with a closing engagement window, it isn't.
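The arithmetic behind that claim is short. A back-of-envelope sketch, assuming Mach 1 is about 343 m/s (sea level, roughly 20 °C) and that the target holds speed for the whole delay:

```python
MACH_1_MPS = 343.0   # approximate speed of sound at sea level, ~20 C (assumption)

def distance_during_latency(speed_mps, latency_s):
    """How far a target travels while the picture is in transit."""
    return speed_mps * latency_s

target_speed = 3 * MACH_1_MPS
for latency_s in (0.2, 2.0):
    moved = distance_during_latency(target_speed, latency_s)
    print(f"{latency_s:.1f} s of added latency -> target has moved {moved:.0f} m")
```

That works out to roughly 200 m of stale position at the low end and about 2 km at the high end, before any processing time is counted.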

The operational argument is stronger: survivability. A cloud-hosted fusion engine is a single point of failure. When the satellite link drops - whether from adversary action, weather, or equipment failure - every unit depending on that fusion engine loses its operational picture simultaneously. Edge-deployed fusion means each unit maintains its own fused picture from its organic sensors. Connectivity enriches the picture. Its absence does not destroy it.

The computational argument kills the cloud case entirely. Fusion is computationally modest. Kalman filtering is matrix multiplication and inversion per track per update cycle. Optimal global assignment is polynomial in the number of tracks and detections. MHT with pruning is bounded. Full multi-hypothesis fusion against dozens of targets from many sensor classes runs comfortably on a COTS edge compute module drawing well under 100 watts. There is no computational reason to put fusion in the cloud. The only reason it ends up there is architectural inertia from platforms that were designed cloud-first, back when edge compute was less capable and bandwidth was not yet the scarce resource it is in contested environments.

For the longer architectural argument, see Why JADC2 Needs Sensor Fusion at the Edge and What is JADC2?.

Data normalization: the unglamorous prerequisite

Before you can fuse anything, you must normalize everything. This is less glamorous than talking about Kalman filters and MHT, but it is where a depressing number of "integrated" solutions fall apart.

Every sensor vendor reports differently. Coordinate systems vary (WGS84 geodetic, MGRS, local Cartesian, ENU). Update rates range from sub-second to multi-second. Uncertainty is expressed inconsistently - some vendors give you a full covariance matrix, some give you CEP, some give you nothing and expect you to treat their detections as gospel. Transports are fragmented: one sensor streams JSON over MQTT, another uses a proprietary SDK with a gRPC interface, another pushes ASTERIX CAT-062 over UDP, another wants you to poll a REST API, another has a serial interface with a custom pinout. Schemas are lossy - Cursor-on-Target and Link 16 J-messages both have structural limitations that drop meaningful vendor metadata into unstructured extension fields or off the table entirely.

Before a single track can be correlated, all of this has to be ingested, parsed, timestamped to a common clock (or at least well-characterized time offsets), projected into a shared spatiotemporal frame, tagged with a normalized uncertainty envelope, and preserved with enough semi-structured context that classification and identity logic downstream has something to work with. Get the normalization wrong and everything downstream is downstream of garbage. Get it right and the rest of the fusion math has a chance.
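A sketch of what one normalization step might look like, under loud assumptions: the vendor field names (`lat`, `lon`, `cep_m`, `ts`, `model`) are invented, the clock offset is assumed to be known, the position error is assumed circular Gaussian (so CEP50 = sigma * sqrt(2 ln 2)), and a flat-earth projection is assumed adequate over short ranges.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000.0
CEP_TO_SIGMA = 1.0 / math.sqrt(2.0 * math.log(2.0))   # ~0.849 for a circular Gaussian

@dataclass
class NormalizedDetection:
    sensor_id: str
    time_s: float            # on the common clock
    east_m: float            # local frame, relative to a reference point
    north_m: float
    sigma_m: float           # 1-sigma position uncertainty
    extras: dict = field(default_factory=dict)   # preserved vendor metadata

def geodetic_to_local(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Flat-earth (equirectangular) approximation. Short ranges only."""
    north = math.radians(lat_deg - ref_lat_deg) * EARTH_RADIUS_M
    east = (math.radians(lon_deg - ref_lon_deg) * EARTH_RADIUS_M
            * math.cos(math.radians(ref_lat_deg)))
    return east, north

def normalize(sensor_id, raw, ref_lat_deg, ref_lon_deg, clock_offset_s):
    """Map one hypothetical vendor record into the common representation."""
    east, north = geodetic_to_local(raw["lat"], raw["lon"], ref_lat_deg, ref_lon_deg)
    known = {"lat", "lon", "cep_m", "ts"}
    return NormalizedDetection(
        sensor_id=sensor_id,
        time_s=raw["ts"] + clock_offset_s,
        east_m=east,
        north_m=north,
        sigma_m=raw["cep_m"] * CEP_TO_SIGMA,
        extras={k: v for k, v in raw.items() if k not in known},
    )

raw = {"lat": 40.01, "lon": -75.00, "cep_m": 10.0, "ts": 1000.0, "model": "vendor-x"}
det = normalize("radar-1", raw, ref_lat_deg=40.00, ref_lon_deg=-75.00, clock_offset_s=-0.25)
print(det)
```

Note the `extras` field: anything the common schema has no slot for is carried along rather than dropped, so downstream classification and identity logic still has it.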

Sensor fusion versus "AI-powered" fusion

A recurring complaint worth making explicit: the defense tech industry has started marketing "AI-powered sensor fusion" as if frontier LLMs were the right tool for multi-hypothesis tracking across dozens of targets at millisecond resolution. They aren't. LLMs are astonishingly useful for many problems - explainability, natural-language interaction with curated intelligence, summarization of structured inputs, agent-based workflow orchestration. They are not a replacement for the linear algebra and probability theory that have been solving the tracking problem for decades.

Throwing detections into an LLM and asking what it thinks is not sensor fusion. It is a tech demo. Render unto math what is rightfully a math problem. Use AI for what it's actually good at. The two are complementary, not substitutable.

Where Empyrean fits

Empyrean's Fusion & Decision Engine implements multi-hypothesis tracking with optimal global assignment, zone-scoped trust policy, identity-provenance lifecycle management, and spoof detection at the edge. It feeds fused track output into every other workspace on the platform: the Common Operational Picture, EMSO, SSA, Narrative Intelligence, CUAS and Force Protection, Cyber / Physical Convergence, and the Policy & Decision Layer that automates response chains across all of them. The Simulation & Wargaming engine runs the same fusion pipeline against synthetic sensor data so operator training transfers directly to operations. All of it runs at the edge on hardware that deploys with the unit, with no cloud dependency.

Going deeper

For the architectural case on why fusion has to run at the edge rather than the cloud, read Why JADC2 Needs Sensor Fusion at the Edge. For how sensor fusion applies specifically to counter-UAS operations in operational depth, read The Ultimate Guide to Counter-UAS Operations. For the broader JADC2 context, see What is JADC2?. For the counter-UAS reference page, see What is Counter-UAS?.

Empyrean Defense
