OT security frameworks are easier to audit than to implement in a plant that hasn't shut down in six years. Here's what separates the guidance that holds up from what gets deferred indefinitely.
OT security advice usually comes from one of two places: IT security teams who understand threat models but not the operational constraints of a plant floor, or compliance frameworks that describe what security should look like without explaining why implementation is difficult in practice. The plants that make real progress learn a third way, from field experience: less clean, but more honest about what actually works.
The framework gap is real. NIST SP 800-82, IEC 62443, and NERC CIP are not wrong, but they describe security postures without engaging with the engineering constraints that make implementation difficult in brownfield industrial environments. A plant that has been running continuously for eight years has a fundamentally different path to compliance than a greenfield facility, and the practical guidance for that path doesn't appear in the framework documents.
The Operational Reality of OT Security
The constraints that make OT security different from IT security are engineering facts, not excuses. Any realistic security program has to account for them explicitly, or it will be bypassed by operations staff who can't afford the downtime it creates.
Availability is the primary constraint. The goal in OT is not confidentiality first. It's keeping the process running. A security control that introduces unplanned downtime risk will be disabled. Security architecture that doesn't account for operational availability requirements doesn't last in production.
Patch cadence is different. OT systems don't have maintenance windows measured in hours. Rebooting a PLC mid-run is not an option. Vendor qualification requirements mean that an OS patch applied without ICS vendor approval may void support contracts and invalidate the safety case for certified equipment. The IT instinct to patch immediately is not transferable to OT without modification.
Change control is slow by design. A network change that takes two hours to plan and approve in IT may take two weeks in OT, because the blast radius of an incorrect configuration on a control network can include production downtime and process safety events. This is appropriate caution, not bureaucratic obstruction.
Legacy systems are the rule, not the exception. Most production OT environments run operating systems that are past Microsoft's end of life. Windows XP and Windows 7 HMI workstations are genuinely common. This isn't negligence. It's the result of vendor qualification processes that lag OS releases, hardware that was certified for specific OS versions, and risk aversion about testing HMI software on updated platforms while production depends on it.
Understanding these constraints is the starting point for building a security program that operations will actually support rather than work around.
Network Segmentation: The Foundation and the Challenge
Every OT security framework starts with network segmentation: separate your operational technology from your corporate IT network and from the internet, using firewalls and demilitarized zones. This is correct. It's also harder to execute in a plant where the network was built incrementally over 20 years and where any change carries downtime risk.
In practice, segmentation projects begin with discovery, not firewall rules. You cannot segment what you haven't mapped. The first deliverable in any OT security engagement is a current-state network diagram showing every connection between the OT environment and anything external to it: connections to the corporate WAN, remote access paths, historian connections to the enterprise data tier, cloud connections for vendor remote support.
The Purdue Model provides the structural framework: Level 0 (field devices), Level 1 (basic control), Level 2 (supervisory control), Level 3 (site operations), and Level 4 (enterprise). The goal is controlled data flow between levels with no direct connections that skip levels. In practice, most brownfield plants have connections that jump levels entirely: a Level 1 PLC with a direct connection to an enterprise historian, a Level 2 SCADA workstation with a VPN client for remote vendor access, a Level 3 historian server accessible from the corporate LAN without a DMZ.
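Level-jumping connections can be found mechanically once assets are mapped to Purdue levels. The sketch below, with illustrative asset names, levels, and flows, flags any observed connection whose endpoints differ by more than one level:

```python
# Sketch: flag network connections that skip Purdue Model levels.
# Asset names, levels, and the connection list are illustrative examples.

ASSET_LEVELS = {
    "plc-7":          1,  # basic control
    "scada-ws-2":     2,  # supervisory control
    "historian-site": 3,  # site operations
    "historian-entp": 4,  # enterprise
    "corp-lan-gw":    4,
}

# Observed connections (source, destination), e.g. from passive monitoring.
connections = [
    ("plc-7", "historian-entp"),       # Level 1 -> Level 4: skips two levels
    ("scada-ws-2", "historian-site"),  # Level 2 -> Level 3: adjacent, OK
    ("historian-site", "corp-lan-gw"), # Level 3 -> Level 4: should pass a DMZ
]

def level_jumps(conns, levels, max_span=1):
    """Return connections whose endpoints differ by more than max_span levels."""
    flagged = []
    for src, dst in conns:
        span = abs(levels[src] - levels[dst])
        if span > max_span:
            flagged.append((src, dst, span))
    return flagged

for src, dst, span in level_jumps(connections, ASSET_LEVELS):
    print(f"FLAG: {src} (L{ASSET_LEVELS[src]}) -> {dst} (L{ASSET_LEVELS[dst]}) spans {span} levels")
```

The output is a review list, not an automatic firewall change: some flagged paths will turn out to be sanctioned, but each one should end up either behind a DMZ or documented as an accepted exception.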
The practical starting point is removing vestigial connections before adding new controls. A significant portion of legacy connections in most plants are no longer needed: remote access credentials for vendors who no longer support the system, historian connections that are no longer polled, test ports opened during commissioning and never closed. Removing these is lower risk than adding firewall rules, produces visible security improvements, and typically doesn't require the same level of change control scrutiny because nothing operational depends on them.
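One way to build the vestigial-connection candidate list is to sort connections by observed idle time. A minimal sketch, assuming connection records with a last-seen timestamp from boundary monitoring and an illustrative 180-day threshold:

```python
# Sketch: rank candidate vestigial connections by how long they've been idle.
# Connection records and the 180-day threshold are illustrative assumptions.
from datetime import datetime, timedelta

connections = [
    {"name": "vendor-vpn-acme",   "last_seen": datetime(2021, 3, 2)},
    {"name": "hist-poll-line4",   "last_seen": datetime(2025, 4, 10)},
    {"name": "commissioning-tap", "last_seen": datetime(2019, 7, 14)},
]

def stale(conns, now, idle_days=180):
    """Connections with no observed traffic inside the idle window,
    sorted oldest first. Review candidates, not automatic removals."""
    cutoff = now - timedelta(days=idle_days)
    return sorted(
        (c for c in conns if c["last_seen"] < cutoff),
        key=lambda c: c["last_seen"],
    )

now = datetime(2025, 6, 1)
for c in stale(connections, now):
    print(c["name"], "idle since", c["last_seen"].date())
```

A connection idle for years is a strong removal candidate, but each one still deserves a quick check with operations before it is cut, since some paths are legitimately used only during annual turnarounds.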
Patch Management in OT: The Constraint Is Real
Compensating controls are the correct response to the OT patching problem. If a system cannot be patched on the normal IT schedule, and most OT systems cannot, the question is not how to force the patch through but what controls reduce the impact if the unpatched vulnerability is exploited.
Effective compensating controls for unpatched OT systems:
Network isolation. A system that cannot reach the internet directly, has no lateral connectivity outside its defined communication paths, and is accessible only via a controlled jump host is substantially harder to exploit remotely than the same unpatched system on a flat network. Network isolation doesn't eliminate the vulnerability but it eliminates most remote exploitation paths.
Application whitelisting on engineering workstations. Engineering workstations (EWS) that connect to PLCs and SCADA systems are a primary lateral movement path in OT environments. An EWS running only vendor-approved software, enforced by application whitelisting via Microsoft AppLocker or commercial OT-aware tools, prevents malware installed via phishing or USB from pivoting to the control network even if the workstation's OS is unpatched.
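The audit side of whitelisting can be sketched as a simple set difference between installed software and the vendor-approved list; the enforcement itself belongs to AppLocker or an OT-aware tool. Product names in the lists below are illustrative:

```python
# Sketch: compare an engineering workstation's software inventory against
# the vendor-approved list. Names are illustrative; enforcement is done by
# AppLocker or a commercial whitelisting tool, not by this check.
approved = {"RSLogix 5000", "FactoryTalk View SE", "VendorOPCServer"}
installed = {"RSLogix 5000", "FactoryTalk View SE", "TeamViewer", "uTorrent"}

unapproved = sorted(installed - approved)
print("Unapproved software on EWS:", unapproved)
```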
Removable media controls. USB-delivered malware is a documented and recurring OT threat vector. The Stuxnet delivery mechanism is the well-known example, but the vector remains common in industrial environments where USB drives are used for PLC programming and HMI updates. Policies prohibiting unapproved USB devices, combined with USB port blocking where operationally feasible, address a significant attack surface that doesn't require any network connectivity.
Passive network monitoring. OT-specific network monitoring tools including Claroty, Dragos, and Nozomi Networks can detect anomalous behavior on the control network without requiring host-based agents on PLCs. They work by passively observing network traffic and alerting on communications that deviate from the established baseline: a PLC initiating an outbound connection it has never made before, a programming connection from an unauthorized IP address, a new device appearing on the network. This doesn't stop an attack but it reduces dwell time significantly, which is the measure that determines how much damage a successful intrusion causes.
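The baseline-deviation idea these tools implement can be sketched in a few lines: learn the set of flows seen during a clean baseline window, then alert on anything outside that set. The flow tuples and port numbers below are illustrative:

```python
# Sketch of baseline-deviation alerting: learn (src, dst, port) flows during
# a baseline window, then alert on any flow outside that set.
# Flow tuples and port numbers below are illustrative examples.

baseline_flows = {
    ("plc-7", "scada-ws-2", 44818),          # EtherNet/IP messaging
    ("scada-ws-2", "historian-site", 5450),  # historian interface
}

def check_flow(flow, baseline):
    """Return an alert string for a flow not seen during baselining, else None."""
    if flow in baseline:
        return None
    src, dst, port = flow
    return f"ALERT: new flow {src} -> {dst}:{port} not in baseline"

# A PLC initiating an outbound connection it has never made before:
print(check_flow(("plc-7", "203.0.113.9", 443), baseline_flows))
```

Real products add protocol-aware inspection (distinguishing a read from a program download, for example), but the core value is the same: any deviation from a known-good baseline becomes visible within minutes rather than months.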
Document compensating controls explicitly alongside the known unpatched vulnerabilities. The goal is visible, owned risk, not hidden risk. A list that says "this HMI workstation runs Windows 7, cannot be patched due to vendor certification requirements, and is compensated by application whitelisting, network isolation, and passive monitoring" is a manageable risk. The same workstation without documentation or compensating controls is an unmanaged one.
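That documentation is most useful when it is machine-readable, so the register can be queried during audits and incidents. A minimal sketch, with illustrative field names and asset details:

```python
# Sketch of a machine-readable risk-register entry pairing each unpatched
# asset with its compensating controls. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class UnpatchedAsset:
    name: str
    os: str
    reason_unpatched: str
    compensating_controls: list = field(default_factory=list)
    owner: str = "unassigned"

    def is_managed(self):
        """Managed risk = documented reason, named owner, at least one control."""
        return bool(self.compensating_controls) and self.owner != "unassigned"

hmi = UnpatchedAsset(
    name="hmi-line2",
    os="Windows 7 SP1",
    reason_unpatched="vendor certification pins OS version",
    compensating_controls=["application whitelisting", "network isolation",
                           "passive monitoring"],
    owner="controls engineering",
)
print(hmi.name, "managed risk" if hmi.is_managed() else "UNMANAGED risk")
```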
The most common gap found during OT security assessments isn't missing firewalls or unpatched systems. It's undocumented remote access: vendor jump boxes, TeamViewer sessions left running, cellular modems installed during commissioning and forgotten. Map these before anything else.
Credential Hygiene: The Persistent Gap
SCADA and HMI systems are frequently configured with shared accounts, default credentials, and passwords that haven't changed since commissioning. This is not an edge case. It's common enough in production environments to be a predictable finding rather than a surprising one.
The operational reason is legitimate: in an emergency, multiple people need system access quickly, and individual accounts with strong passwords can become a barrier when time matters. The answer is not shared accounts. It's emergency access procedures that are fast and auditable.
Individual named accounts with role-based access control provide two things that shared accounts don't: audit trails and targeted revocation. When a contractor's engagement ends, you revoke their account. When investigating an incident, you have a log of who accessed what and when. Neither capability exists with shared credentials, and both are essential for incident response and compliance auditing.
The specific access control model for SCADA and HMI systems should reflect the operational reality:
- Operators need read access and setpoint authority within defined ranges for their process area
- Engineers need programming access, scoped to engineering workstations and the PLCs they're responsible for
- Vendor remote access should be provisioned on demand with automatic expiration rather than persistent credentials
- Administrator-level access should be restricted to named individuals with explicit justification, reviewed quarterly
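The model above can be expressed as checkable policy. The sketch below, with illustrative role definitions and an assumed eight-hour vendor support window, shows the two properties that matter most: permissions follow the role, and vendor grants expire automatically:

```python
# Sketch of the access model as checkable policy. Role definitions and the
# eight-hour vendor window are illustrative assumptions.
from datetime import datetime, timedelta

ROLES = {
    "operator": {"read": True, "setpoint": True, "program": False, "admin": False},
    "engineer": {"read": True, "setpoint": True, "program": True,  "admin": False},
    "admin":    {"read": True, "setpoint": True, "program": True,  "admin": True},
}

def vendor_grant(now, window_hours=8):
    """Provision on-demand vendor access that expires automatically."""
    return {"role": "engineer", "expires": now + timedelta(hours=window_hours)}

def can(account, action, now):
    """Deny expired grants first, then check the role's permission."""
    exp = account.get("expires")
    if exp is not None and now >= exp:
        return False
    return ROLES[account["role"]].get(action, False)

now = datetime(2025, 6, 1, 9, 0)
vendor = vendor_grant(now)
print(can(vendor, "program", now))                       # inside the window
print(can(vendor, "program", now + timedelta(hours=9)))  # expired
```

In practice this logic lives inside the HMI/SCADA platform's security model rather than in custom code; the sketch just makes the policy explicit enough to review and test.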
Most modern HMI and SCADA platforms support role-based access control natively. FactoryTalk Security (Rockwell), Ignition's identity provider integration, and Wonderware's security model all provide the mechanisms. The gap is usually configuration, not capability.
Password management for OT systems that can't integrate with Active Directory (common in older PLC and HMI platforms) requires a documented manual rotation schedule. Quarterly rotation for accounts with elevated access is a reasonable starting point. For sites moving toward more systematic credential management, OT-aware privileged access management platforms can manage credentials for systems that support it, while maintaining manual processes for systems that don't.
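A documented manual schedule is easy to let slip, so it helps to generate the overdue list mechanically. A minimal sketch, assuming account records with a last-rotated date and the quarterly interval from above:

```python
# Sketch: flag elevated accounts overdue for manual quarterly rotation.
# Account records and the 90-day interval are illustrative assumptions.
from datetime import date, timedelta

accounts = [
    {"name": "hmi-admin-line1", "elevated": True,  "last_rotated": date(2025, 1, 10)},
    {"name": "op-display-3",    "elevated": False, "last_rotated": date(2023, 5, 2)},
    {"name": "eng-ws-local",    "elevated": True,  "last_rotated": date(2025, 5, 20)},
]

def overdue(accts, today, interval_days=90):
    """Elevated accounts whose last rotation is older than the interval."""
    cutoff = today - timedelta(days=interval_days)
    return [a["name"] for a in accts if a["elevated"] and a["last_rotated"] < cutoff]

print(overdue(accounts, date(2025, 6, 1)))
```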
Remote Access: The Attack Surface Nobody Mapped
Remote access to OT environments is the single most common attack vector in documented ICS incidents, and it's frequently the least-managed part of the security architecture. Most production OT environments have more remote access paths than the people responsible for security are aware of. The recurring patterns:
Persistent vendor VPN accounts. Vendor support accounts provisioned for initial commissioning that are still active years after the engagement ended. These accounts often have elevated privileges (required for the commissioning work) and may not have been subject to standard credential rotation processes.
Screen sharing and remote desktop tools on engineering workstations. TeamViewer, AnyDesk, and similar tools installed for convenience during commissioning and left in place. These create persistent outbound connections that can be used for remote access by anyone with the session credentials, bypassing VPN controls entirely.
Cellular modems on PLCs or routers. Installed during commissioning to allow remote vendor support or data collection, and often not documented in the network architecture. Common in remote or unmanned facilities. These represent a path into the OT network that bypasses the corporate perimeter.
Shared jump server credentials. A jump server that provides controlled access to the OT network but uses shared credentials, doesn't require MFA, isn't monitored for session activity, or maintains persistent connections to multiple OT segments simultaneously.
The remediation approach varies by access type, but the discovery approach is the same: passive network monitoring on the OT network boundary combined with active enumeration of accounts with OT network access. Most organizations are surprised by the number of access paths that exist relative to the number they knew about before the assessment.
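The core of that discovery step is a diff: observed access paths from boundary monitoring versus the documented inventory. A minimal sketch with illustrative entries:

```python
# Sketch of the discovery step: diff observed remote-access paths (from
# boundary monitoring) against the documented inventory. Entries are
# illustrative examples.
documented = {"jump-host-1", "vendor-vpn-acme"}
observed = {"jump-host-1", "vendor-vpn-acme",
            "teamviewer-ews-4", "cell-modem-pump-sta"}

undocumented = sorted(observed - documented)
print("Undocumented remote access paths:", undocumented)

# The reverse diff is also useful: documented paths with no observed
# traffic are candidates for removal.
unused = sorted(documented - observed)
```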
The target architecture for remote access: all remote sessions require MFA, no persistent connections exist from the corporate network or internet directly into the OT network, vendor access is provisioned on demand through a monitored jump server and expires automatically after the support window, and all sessions are logged with user identity, source IP, duration, and a record of systems accessed.
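That target architecture can double as an automated compliance check against session records. A sketch, with illustrative record fields, that returns every policy requirement a session fails:

```python
# Sketch: validate a remote session record against the target architecture
# (MFA, no persistent connections, automatic expiry, full session logging).
# Record field names are illustrative assumptions.
REQUIRED_LOG_FIELDS = {"user", "source_ip", "start", "end", "systems_accessed"}

def violations(session):
    """Return the policy requirements this session record fails."""
    problems = []
    if not session.get("mfa"):
        problems.append("no MFA")
    if session.get("persistent"):
        problems.append("persistent connection")
    if session.get("expires") is None:
        problems.append("no automatic expiration")
    missing = REQUIRED_LOG_FIELDS - session.get("log", {}).keys()
    if missing:
        problems.append(f"log missing {sorted(missing)}")
    return problems

session = {
    "mfa": True, "persistent": False, "expires": "2025-06-01T17:00",
    "log": {"user": "acme-tech-2", "source_ip": "198.51.100.7",
            "start": "2025-06-01T09:00", "end": None,
            "systems_accessed": ["plc-7"]},
}
print(violations(session))  # -> []
```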
Incident Response for OT: Building the Playbook Before the Incident
IT incident response doctrine, which tells you to isolate the affected system and contain the threat, doesn't translate cleanly to OT. Isolating a compromised PLC may mean halting production. Taking a SCADA server offline during an investigation may mean losing visibility into an active process. The decision tree is different when the systems you're protecting are also controlling physical processes.
An OT incident response plan needs to be specific about decisions that are too consequential to make under pressure:
Decision authority during an incident. Who can authorize taking a specific piece of equipment offline during an active cyber incident? What is the escalation path if that person is unavailable? These decisions should be pre-approved for likely scenarios, not improvised in real time.
Preserve-versus-respond tradeoffs. Evidence preservation doctrine (don't touch the affected system until forensics is complete) conflicts with the operational need to restore production. The plan should specify, by asset type, which approach takes priority and under what conditions.
Backup and recovery capabilities. PLC program backups, HMI configuration backups, historian configuration backups. If a ransomware event encrypts engineering workstations and SCADA servers, the recovery path depends entirely on having current backups in a location that can't be reached from the compromised network. Most OT environments have backup processes; fewer have tested the restore process against a known-clean state.
External notification requirements. Critical infrastructure sectors have mandatory incident reporting requirements. Manufacturing facilities involved in defense supply chains may have additional DFARS obligations. Know what your reporting obligations are before an incident, not during one.
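The backup-and-recovery point above is testable in code: backups should be both recent and verifiable against a known-clean state. A sketch, with illustrative program images, hashes, and a 30-day freshness window:

```python
# Sketch: audit PLC/HMI backup freshness and integrity against hashes
# recorded when the program was last verified clean. Program images,
# hashes, and the 30-day window are illustrative assumptions.
import hashlib
from datetime import date, timedelta

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

catalog = [
    {"asset": "plc-7", "taken": date(2025, 5, 28),
     "data": b"...program image...",
     "known_good_hash": sha256_of(b"...program image...")},
    {"asset": "hmi-line2", "taken": date(2024, 12, 1),
     "data": b"...hmi config...",
     "known_good_hash": sha256_of(b"different config")},  # simulated drift
]

def audit(entries, today, max_age_days=30):
    """Return (asset, problem) findings for stale or mismatched backups."""
    findings = []
    for e in entries:
        if today - e["taken"] > timedelta(days=max_age_days):
            findings.append((e["asset"], "stale backup"))
        if sha256_of(e["data"]) != e["known_good_hash"]:
            findings.append((e["asset"], "hash mismatch vs known-clean state"))
    return findings

for asset, problem in audit(catalog, date(2025, 6, 1)):
    print(asset, "->", problem)
```

The hash check catches silent drift between the stored backup and the verified-clean image, which is exactly the gap a restore test would otherwise discover at the worst possible time.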
The test for an OT incident response plan is a tabletop exercise with operations leadership, IT security, and the control system engineering team together. Most plants that do this exercise discover gaps that are easiest to close in a conference room and very difficult to close during an actual incident.
Connecting Security Investment to Operations Outcomes
OT security programs that succeed treat security as an operations capability, not a compliance exercise. The visible deliverables of a well-executed program, including network segmentation, documented remote access, named accounts, and tested incident response, also improve operational visibility, reduce unauthorized changes, and provide audit trails useful for maintenance and troubleshooting.
The network mapping required for segmentation is the same network documentation that commissioning engineers need. The access control model required for security is the audit trail that maintenance managers need. The incident response plan required for security is the emergency operations procedure that operations leadership needs. The security investment and the operations investment are largely the same investment, which is the most important framing for getting operations leadership support for work that might otherwise look like IT overhead.
Teams building or upgrading SCADA and HMI systems have the best opportunity to implement these controls correctly from the start. Retrofitting them into a system that's been running for a decade is possible but substantially harder. Security architecture decisions made during the design and commissioning phase are the ones that produce durable results.
