Undocumented PLC programs don't announce their debt. It accumulates silently until a process change, a personnel transition, or a fault exposes exactly how much institutional knowledge is locked inside a ladder diagram.
Every plant has at least one PLC program that nobody fully understands anymore. The engineer who wrote it retired. The documentation was never completed, or was completed and then lost. The program works, mostly, and so it runs, untouched, until the day something changes and the lack of clarity becomes a very expensive problem.
The industry term for this is technical debt, but that understates the specific nature of the risk. Software technical debt usually means slower development velocity. PLC technical debt means slower fault recovery, higher risk during process changes, and single points of failure in the people who carry the institutional knowledge that makes the system safe to operate.
Understanding What PLC Code Looks Like After 20 Years
Ladder logic for simple sequencing is readable. A rung that says "if this input contact is closed and this timer has not expired, energize this output coil" is clear. Ladder logic for a process that's been in production for 20 years, patched by eight engineers for twelve different change requests, is not.
The specific patterns that accumulate in long-lived PLC programs:
Bit flags used as implicit state variables. Rather than implementing a proper state machine with explicit state transitions, engineers add intermediate bits that serve as flags between rungs. A flag addressed as B3:2/14 carries no semantic information. Its meaning is implicit in which rungs energize it and which rungs it controls. Tracing the logic of a 400-rung program with 30 unnamed flags is an hours-long exercise, and the trace has to be repeated every time anyone needs to modify anything in that logic path.
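Expressed outside ladder for illustration, the contrast is easy to see. A minimal Python sketch of an explicit state machine, with invented state and event names, carries its own narrative in a way a field of anonymous flags cannot:

```python
from enum import Enum, auto

class ConveyorState(Enum):
    """Explicit, named states replace anonymous flags like B3:2/14."""
    IDLE = auto()
    STARTING = auto()
    RUNNING = auto()
    FAULTED = auto()

# Named transitions make the control narrative visible in the code itself.
TRANSITIONS = {
    (ConveyorState.IDLE, "start_cmd"): ConveyorState.STARTING,
    (ConveyorState.STARTING, "at_speed"): ConveyorState.RUNNING,
    (ConveyorState.RUNNING, "stop_cmd"): ConveyorState.IDLE,
    (ConveyorState.RUNNING, "fault"): ConveyorState.FAULTED,
}

def step(state: ConveyorState, event: str) -> ConveyorState:
    """Return the next state; stay put if the event is not valid here."""
    return TRANSITIONS.get((state, event), state)
```

On a real controller the same idea is a single state register plus one rung (or one CASE branch in Structured Text) per transition; the point is that every valid transition is enumerated in one place instead of being implied by scattered flag rungs.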
Repurposed timers. Timers allocated for one purpose and later reused for another. The timer name (T4:12) reflects its original allocation, not its current function. The preset value may have been changed multiple times without documentation. Understanding the timing behavior of a sequence requires tracing every reference to every timer, cross-referencing with the physical process timing you observe during runtime, and hoping the two match.
Comments that describe old behavior. Comment blocks added when a rung was written, describing what it did before the last change order. The comment says "check conveyor speed before energizing zone 2." The rung no longer controls zone 2. The conveyor has been replaced. The comment is actively misleading to anyone who reads it.
Hardcoded parameters. Recipe parameters, process limits, and calibration constants that should be in a data table are instead hardcoded as constants within the logic. When the process changes, these constants have to be found across hundreds of rungs. When a new product is added to the line, there's no recipe structure to extend.
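The recipe-structure alternative can be sketched in a few lines. The parameter names and values below are invented, and on a real controller this would live in a user-defined data type or data block rather than Python:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Recipe:
    """Process parameters pulled out of the logic into one structure."""
    fill_time_s: float       # was a hardcoded timer preset
    max_pressure_bar: float  # was a hardcoded compare constant
    target_temp_c: float

# Adding a product means adding a row, not hunting constants across rungs.
RECIPES = {
    "PRODUCT_A": Recipe(fill_time_s=12.0, max_pressure_bar=6.5, target_temp_c=80.0),
    "PRODUCT_B": Recipe(fill_time_s=9.5, max_pressure_bar=5.0, target_temp_c=72.0),
}

def active_recipe(product_id: str) -> Recipe:
    """Look up the parameter set the logic should be comparing against."""
    return RECIPES[product_id]
```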
Interlocks that no longer match the physical system. Interlocks added for equipment that has since been removed, upgraded, or relocated. The interlock logic still runs every scan, creating dependencies on inputs that may no longer reflect the physical state the interlock was originally designed to protect against. Worse, the interlock may still fire occasionally because the input it's monitoring still changes state, but for completely different reasons than the original design assumed.
The dangerous property of all of these patterns is that they're invisible during normal operation. The code runs, the process runs, and the implicit assumptions baked into the code remain hidden until a fault, a process change, or a personnel transition exposes them.
The Real Cost Surface
Inherited PLC debt surfaces cost in predictable categories that most plants underestimate until they measure them directly.
Fault resolution time. When a fault occurs, the technician's first task is understanding what the program expects the process state to be. In well-documented code, this takes minutes. In undocumented code with accumulated complexity, it can take hours. The time to first diagnosis is dominated by code interpretation, not by physical troubleshooting. The wrench doesn't move until the logic is understood.
A direct measurement: track fault-to-resolution time on equipment with well-documented controls versus equipment with inherited, undocumented code. The delta is the carrying cost of the debt, expressed in labor hours and unplanned downtime per fault event. Most plants that perform this measurement are surprised by how large the difference is, typically two to four times longer fault resolution on undocumented code.
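The measurement itself is simple arithmetic once the fault log distinguishes the two populations. A sketch with made-up numbers:

```python
from statistics import mean

# Hypothetical fault log: hours from fault annunciation to return-to-run,
# grouped by whether the equipment's controls are documented.
fault_hours = {
    "documented":   [0.5, 0.8, 0.4, 1.1, 0.6],
    "undocumented": [2.2, 1.5, 3.8, 2.9, 1.9],
}

avg_doc = mean(fault_hours["documented"])
avg_undoc = mean(fault_hours["undocumented"])

# The carrying cost of the debt, per fault event, in labor/downtime hours.
delta_per_fault = avg_undoc - avg_doc
ratio = avg_undoc / avg_doc
print(f"{delta_per_fault:.2f} extra hours per fault ({ratio:.1f}x longer)")
```

Multiplying the per-fault delta by annual fault count and the fully loaded cost of a downtime hour turns the debt into a figure that survives a budget meeting.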
Process change risk premium. Modifying a PLC program you don't fully understand requires conservative changes and extensive testing, because you can't predict what else the code you're changing affects. Change orders that should take one day take three. Changes that should require a two-hour commissioning window require a full shift. The risk premium on process changes compounds as the program grows more opaque.
Personnel dependency. If one engineer on your team is the only person who can navigate the program under fault conditions, that engineer is load-bearing infrastructure. Every vacation, illness, and eventual departure represents a risk event. When a fault occurs during that engineer's absence, the cost of the dependency becomes concrete and immediate. The cost to bring a replacement up to speed on undocumented code can exceed the cost of the fault itself.
Training time for new staff. A new controls engineer joining a site with well-documented programs can become productive on that equipment in weeks. The same engineer joining a site with undocumented inherited programs may take months to reach the same confidence level, and may never fully replicate the knowledge the original engineer carried implicitly.
Platform-Specific Considerations
The challenge of inherited PLC code is not uniform. Allen-Bradley and Siemens, the dominant PLC platforms in US and European manufacturing respectively, present different difficulties for inherited code.
Allen-Bradley / Studio 5000. Rockwell's tag-based programming model, introduced with ControlLogix, supports descriptive tag names and data type definitions that make well-written programs significantly more readable than the bit-addressed programs common in older PLC-5 and SLC 500 code. However, many plants running ControlLogix hardware have programs that were migrated from older platforms and retain the bit-addressed style of the original code. The platform supports good documentation practices; migration projects often didn't implement them.
Older Allen-Bradley PLC-5 and SLC 500 programs are particularly difficult to work with. The integer file addressing (N7:0, B3:1/15) is opaque without external documentation, and the flat program structure doesn't support the modular organization that makes larger programs maintainable. These platforms are also past their vendor support lifecycles, which means security patches, firmware updates, and replacement parts require planning rather than ordering.
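During documentation work on these platforms, an external alias table is often the first artifact worth building: a mapping from raw addresses to documented meanings. A minimal sketch, with invented addresses and names:

```python
# Alias table recovered during documentation of bit-addressed logic.
# Every address and name here is illustrative, not from a real program.
ALIASES = {
    "B3:1/15": "ZONE2_PERMISSIVE",
    "N7:0":    "BATCH_STEP_NUMBER",
    "T4:12":   "PURGE_DELAY",  # original allocation; verify current use
}

def describe(address: str) -> str:
    """Translate a raw address into its documented meaning, if known."""
    return ALIASES.get(address, f"{address} (UNDOCUMENTED)")
```

The gaps the lookup exposes (every address that comes back UNDOCUMENTED) double as a worklist for the documentation effort.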
Siemens S7 / TIA Portal. Siemens' block-based program organization (OB, FC, FB, DB) supports modular program structure, and TIA Portal's commenting and library features enable reasonable documentation when used. Older S7-300 and S7-400 programs written in STL (Statement List) are particularly difficult for engineers unfamiliar with the platform. STL is a low-level assembly-style language that expresses logic with no visual representation. Engineers who are comfortable with ladder logic or FBD can usually parse individual STL statements, but reasoning about a complex STL program requires significant additional effort.
In both ecosystems, the programs that are hardest to work with are those written in older languages and migrated to newer platforms without being restructured. The new platform runs the old logic correctly, but the old logic's readability and maintainability problems migrate with it.
Retrofit vs Rewrite: Getting the Decision Right
The instinct when inheriting a messy program is to rewrite it from scratch. That instinct is often wrong for specific reasons.
A rewrite requires fully understanding the current behavior before you can replicate it correctly. That's exactly what you lack. Programs that have been in production for years carry implicit knowledge in their logic: edge cases handled for equipment characteristics that were never documented, timing behaviors tuned for specific mechanical response times, interlocks that exist for reasons that have been forgotten but that protected against real failure modes. A rewrite without full understanding of the current behavior will get most of it right and miss some of it, and you won't know what you missed until production discovers it.
A rewrite also introduces all-new bugs. The clean version you write will have different failure modes than the code it replaces. You won't understand those failure modes until they manifest in production, typically at the worst possible time.
Retrofit is usually safer: document the existing code through careful static analysis and supervised runtime observation, identify the highest-risk areas (bit flags with broad influence, interlocks with opaque logic, hardcoded parameters in critical process paths), and progressively improve those areas while maintaining the current logic structure. This is slower and less satisfying than a clean rewrite, but it preserves institutional knowledge while incrementally reducing risk.
The cases where rewrite is justified are more specific: programs so corrupted that static analysis cannot establish a reliable control narrative, programs running on hardware past end of life that require a platform migration anyway (which resets the cost of restructuring), or programs for processes that have changed so significantly that most of the current logic no longer reflects the physical system. Even in these cases, the rewrite process should include extensive observation of the current program under production conditions before the replacement is written.
What Good Documentation Actually Looks Like
PLC documentation is not a printout of the ladder diagram with a cover page. It's a control narrative: a written description of what the program is supposed to do, organized by functional area. A control narrative describes:
- The states the process can be in and the complete set of valid transitions between them
- The conditions that trigger each transition, including the specific inputs, timer states, and bit flag values required
- The interlocks, what physical condition each one protects against, and what the consequence of an interlock fault is
- The timing behaviors that matter: when does a timer need to expire before the next step can proceed, and what happens if it doesn't?
- The parameters expected to change (recipe values, process limits) and where they live in the program
- The known edge cases and how the program handles them
This document should be written at a level of detail that allows a qualified controls engineer who has never seen the program to understand its intent before opening the programming software. If you can write that document for your existing code, you've started paying down the debt. If you can't, you're carrying more of it than you may realize.
The control narrative, kept current with program changes, is worth more to the next engineer who needs to work on the system than any amount of inline commenting. Inline comments describe what individual rungs do. The control narrative describes what the system is supposed to accomplish and why, which is the information that matters most under fault conditions and during process changes.
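One way to keep a narrative consistent across functional areas is to give it a fixed shape. The sketch below expresses that shape in Python with illustrative field names and an invented example; a real narrative is prose, but the structure transfers directly to a document template:

```python
from dataclasses import dataclass, field

@dataclass
class Interlock:
    """One interlock, described in physical terms, not logic terms."""
    name: str
    protects_against: str
    fault_consequence: str

@dataclass
class FunctionalArea:
    """One section of a control narrative; field names are illustrative."""
    name: str
    states: list
    transitions: dict  # (state, triggering condition) -> next state
    interlocks: list = field(default_factory=list)
    timing_notes: list = field(default_factory=list)
    parameters: dict = field(default_factory=dict)  # name -> program location
    edge_cases: list = field(default_factory=list)

# An invented example area, showing the level of detail intended.
fill_station = FunctionalArea(
    name="Fill Station",
    states=["IDLE", "FILLING", "DRAINING"],
    transitions={("IDLE", "start cmd AND level below low probe"): "FILLING"},
    interlocks=[Interlock(
        name="high_level_cutout",
        protects_against="tank overflow onto the mezzanine",
        fault_consequence="fill valve closes, sequence holds in FILLING",
    )],
)
```

Empty fields are as informative as filled ones: an area with no documented edge cases either has none or hasn't been studied, and the template makes that gap visible.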
Getting Started on Legacy Code
The practical first step for a plant dealing with inherited PLC code is a structured assessment: for each PLC program in the facility, evaluate how well the control narrative is understood and documented, and rate the risk exposure. The relevant factors are how many engineers can confidently troubleshoot this program under pressure, how frequently it requires modification, and how significant the consequences of an error during modification would be.
Programs with high modification frequency and low documentation quality are the priority. Not because they're the most likely to fail spontaneously, but because modification risk is where undocumented complexity becomes actual cost. A program that runs stably and never needs to change carries its technical debt quietly; a program that changes quarterly and isn't documented pays the debt every change cycle.
The goal of the assessment is a prioritized list, not a complete audit of every program at once. Starting with the two or three programs that represent the highest combination of modification risk and documentation gap produces visible improvement without requiring a facility-wide documentation project that never gets funded.
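The triage described above can be reduced to a simple score. A sketch, where the 1-to-5 scales, the multiplicative weighting, and the program names are all assumptions rather than a standard:

```python
# Each entry: (name, modifications/yr, doc quality 1-5 (5=good),
#              consequence of a modification error 1-5 (5=severe)).
programs = [
    ("palletizer",  6, 1, 4),
    ("boiler_mgmt", 1, 2, 5),
    ("conveyor_a",  4, 4, 2),
]

def risk_score(changes_per_year: int, doc_quality: int, consequence: int) -> int:
    """Higher score = higher priority; documentation gap is (5 - quality)."""
    return changes_per_year * (5 - doc_quality) * consequence

# Rank programs by exposure; the top two or three become the project scope.
ranked = sorted(programs, key=lambda p: risk_score(*p[1:]), reverse=True)
for name, *factors in ranked:
    print(f"{name}: {risk_score(*factors)}")
```

The multiplicative form encodes the argument in the text: a program that never changes scores low regardless of how bad its documentation is, because unused debt costs little until it is touched.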
