Field Notes · Systems · 9 min read · April 15, 2026

What We Learned Putting Nexus on a Real Floor

Dylan McCarthy

Founder & Engineer

Field notes from running our operational intelligence system on a working manufacturing floor. The surprises were not about the model. They were about what diagnosis actually is, and where a system has to hand back to a person.

Nexus is the first place our research meets a working floor. It reads a plant's control logic, supervisory systems, and documentation together, and reasons over them to diagnose faults and answer questions in plain language. Putting it on a real 14-line operation, rather than a simulated one, changed what we believed about the problem. These are the field notes.

Diagnosis Is Mostly Retrieval

We went in expecting the hard part to be reasoning. It was not. On a real floor, the information needed to resolve most faults already exists inside the control system. The alarm points at a tag, the tag maps to a rung, the rung depends on a permissive, the permissive depends on a module's state. The chain is fully determined. Nobody needs to be clever to follow it. They need to follow all of it, quickly, without skipping a link.

That reframed the whole system for us. The value of Nexus is not that it reasons better than an experienced engineer. It is that it traverses the complete dependency graph every time, while a person under pressure pattern-matches to the most likely cause and stops. Most of the time the shortcut is right. When it is wrong, it adds twenty minutes. A system that never takes the shortcut turns a graph-traversal problem that takes a person twenty minutes into one that takes seconds.
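To make "follow all of it" concrete, here is a minimal sketch of an exhaustive dependency walk. It is an illustration, not Nexus's implementation: the graph, the node names, and the `HEALTHY` state table are all hypothetical stand-ins for what would come from the control system.

```python
from collections import deque

# Hypothetical dependency graph: each node lists what it depends on.
DEPENDS_ON = {
    "alarm:conveyor_stopped": ["tag:conv_run_permit"],
    "tag:conv_run_permit": ["rung:conv_start"],
    "rung:conv_start": ["permissive:guard_closed", "permissive:safety_ok"],
    "permissive:guard_closed": ["module:input_card_2"],
    "permissive:safety_ok": ["module:safety_relay_1"],
    "module:input_card_2": [],
    "module:safety_relay_1": [],
}

# Hypothetical live state for leaf nodes (True = healthy).
HEALTHY = {"module:input_card_2": True, "module:safety_relay_1": False}


def trace_all(start, depends_on, healthy):
    """Walk every node reachable from `start` and return the full path
    to each unhealthy leaf. Nothing is skipped, even after a first hit."""
    findings = []
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        children = depends_on.get(node, [])
        # A leaf with no dependencies is a candidate root cause.
        if not children and not healthy.get(node, True):
            findings.append(path)
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return findings


findings = trace_all("alarm:conveyor_stopped", DEPENDS_ON, HEALTHY)
```

The point of the sketch is the loop, not the data: the walk does not stop at the first plausible cause, which is the difference between this and a person's shortcut.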

The Model Has to Read What the Engineer Reads

The second lesson was about inputs. Early on we were tempted to feed Nexus clean, normalized data. The floor does not have clean, normalized data. It has an Ignition project, a set of Studio 5000 exports full of nested Add-On Instructions, and device diagnostics from the network, none of which agree on naming and all of which assume a human will bridge the gaps.

Nexus only became useful when it read those sources the way the engineer does, together and in their native messiness, rather than waiting for someone to clean them first. The messiness is the job. A system that requires the operation to be tidy before it can help will never help, because operations are never tidy.
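As a deliberately naive illustration of the naming gap, the sketch below groups names from three sources by a stripped-down canonical form. The tag names, the source labels, and the `canonical` rule are all assumptions made for the example. Real mismatches are far messier than punctuation and case, and this is not how Nexus reconciles them.

```python
import re
from collections import defaultdict


def canonical(name):
    """Reduce a name to lowercase alphanumerics so that differences in
    separators and case do not prevent a match."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_by_canonical(sources):
    """Map each canonical form to the original name used in each source."""
    groups = defaultdict(dict)
    for source, names in sources.items():
        for name in names:
            groups[canonical(name)][source] = name
    return dict(groups)


# Hypothetical names for the same signal as three systems might spell it.
SOURCES = {
    "scada": ["Line3/Conveyor1/Run_Permit"],
    "plc": ["Line3_Conveyor1.RunPermit"],
    "network": ["LINE3-CONVEYOR1-RUNPERMIT", "LINE3-OVEN2-TEMP"],
}

groups = group_by_canonical(SOURCES)

# Names that only one source knows about still need a human or a smarter rule.
unmatched = {key: seen for key, seen in groups.items() if len(seen) == 1}
```

Even in this toy case, one name is left unmatched. That leftover is the part the engineer has always bridged by hand.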

Completeness Beats Confidence

The most valuable diagnoses were not the obvious single-sensor faults. An experienced engineer finds those quickly with or without a system. The high-value cases were the faults with two or three contributing conditions, where the first cause looked sufficient and the second one was hiding.

On a fault traced to a safety module, the system cross-referenced that module against every PLC and surfaced two other machines that shared it in their interlock chain and were one timeout from faulting under load. No one had asked it to. That is completeness, and it is worth more than confidence.

That kind of systematic cross-referencing is exactly what a person cannot do in real time while a production manager is asking for an ETA. It is also exactly what a system holding the full model does for free.
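The cross-reference itself is simple once the full model is in one place. A minimal sketch, assuming each machine's interlock chain is available as a list of component names; the machine and module names here are hypothetical:

```python
# Hypothetical interlock chains, one per machine.
INTERLOCK_CHAINS = {
    "machine_a": ["guard_a", "safety_module_7"],
    "machine_b": ["estop_b", "safety_module_7"],
    "machine_c": ["guard_c", "safety_module_2"],
    "machine_d": ["light_curtain_d", "safety_module_7"],
}


def machines_sharing(module, chains, exclude=None):
    """Return every machine whose interlock chain includes `module`,
    leaving out the machine that already faulted."""
    return sorted(
        machine
        for machine, chain in chains.items()
        if module in chain and machine != exclude
    )


# The fault was traced to safety_module_7 on machine_a: who else depends on it?
at_risk = machines_sharing("safety_module_7", INTERLOCK_CHAINS, exclude="machine_a")
```

The lookup is trivial. What a person cannot do under pressure is run it against every PLC on the floor, every time.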

Knowing Where to Stop

The lesson that mattered most was about the boundary. There is a clean line between the part of diagnosis that is retrieval and the part that is judgment. Tracing the dependency chain to a root cause is retrieval, and Nexus owns it. Deciding whether a degraded module should be replaced now or nursed to the end of the run is judgment, and that stays with the engineer.

We did not have to argue people into trusting the system. We had to make the boundary legible. When Nexus shows its trace, the path from symptom to cause, the engineer can see exactly what it did and confirm it against one physical observation. Trust came from transparency, not from accuracy claims. The engineers who relied on it most were the ones who could see its work.
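One way to picture a legible trace is as a numbered path that ends in a single thing to check by hand. This is a sketch of the idea only; the step names, the wording of the check, and the `render_trace` helper are hypothetical, not the format Nexus uses.

```python
def render_trace(path, check):
    """Format a symptom-to-cause path as numbered steps, followed by the
    one physical observation that would confirm it."""
    lines = [f"{i}. {step}" for i, step in enumerate(path, start=1)]
    lines.append(f"Confirm on the floor: {check}")
    return "\n".join(lines)


# Hypothetical trace from an alarm down to a module.
trace = render_trace(
    ["alarm: conveyor stopped", "rung: conveyor start", "module: safety relay 1"],
    "status LED on safety relay 1",
)
print(trace)
```

The design choice is that every intermediate link is shown, so the engineer verifies the reasoning rather than taking the conclusion on faith.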

What It Changed

Three months in, average fault-to-resolution time on that floor went from 28 minutes to under 6. Engineers got roughly eight hours a week back, which went to capital projects and preventive work instead of staring at logic under time pressure. Those numbers matter to the operation. To the research, the more important result is that a system holding a model of how the floor works, and showing that model honestly, gets trusted to use it.

Nexus is one expression of our Operational Intelligence research, not the whole of it. But it is the one that taught us the difference between studying understanding and watching it hold up on a floor where being wrong costs real money.

Keep reading the work.

This is one of a series of field notes and essays on building systems that understand and act in real operations. Nexus is where the ideas get tested.