‘OT’ Cyber Security for Modern Grid Operations – Effective Design Methods

Abstract: The risk posed by various cyber threats to modern Grid telemetry and control systems makes those systems challenging to protect and operate properly. This paper describes approaches that can be used to deploy effective cyber protection and defenses for the IP-based SCADA and Industrial Control Systems we use. These methods were used in 2018-2019 to build a best-in-class cyber security threat protection and management system in a generation and transmission grid environment. The author designed and oversaw the implementation of these systems as a consultant, then as a cooperative engineer, over a six-year period from 2014 to 2019.

Keywords—GRID IP, Cyber Security, Reliable Network 

I  History of Grid IP development in East Texas

The 1980s – Analog microwave systems were constructed, or leased carrier systems were deployed, to enable the early remote SCADA systems. These were four-wire analog 300 and 1200 baud FSK communication links.

The 1990s – The transition began from analog to digital TDM and RS-232 interfaces, with slightly faster speeds of up to 9600 bps.

The 2000s – The first decade completed the move from analog to digital transport on carrier and microwave systems. ATM and Frame Relay were in use on some carrier and VSAT systems. The second decade saw the move from TDM to IP-based transport and SCADA interfacing. The demand for real-time modeling metering data from Points of Delivery (POD) requires a more dynamic, reliable, and robust network that uses TCP/IP technology.

2013 to present – Dramatic changes in the substation and transport systems. RTUs and POD meters communicate using DNP over IP. First deployments of TDM encapsulated IP relaying over MPLS.

2009 to 2014 – The IP/TDM infrastructure was constructed in three of the T&D coop members: dozens of towers, microwave paths, and Ethernet routers.

2015 to 2016 – The IP/TDM infrastructure was extended to two additional T&D coop members: dozens of towers, microwave paths, and Ethernet routers.

2017 to present – Installed the first planned, Grid-focused deployment of firewalls at edges, external-facing points, and some internal locations.

2016 to present – Implement best-practice cybersecurity measures and systems, and aim at being NERC CIP compliant for the Low and Medium classifications.

II  Cyber Security – Treat the entire system holistically across the multiple-organization transmission communications systems within the G&T

Definition of holistic – ‘characterized by comprehension of the parts of something as intimately interconnected and explicable only by reference to the whole.’

System Design Goals

  •  Implement a Reliable Network
  •  Implement End to End Cyber Security
  •  Implement Reliable Grid OPS Data Centers
  •  Reliable Communications
  •  System Tuning and Optimization
  •  Training and testing
  1. Implement the five essential functions of cybersecurity by doing the following:
[Figure: basic cyber functions]
    1. Prevention
    2. Intrusion detection 
    3. Threat hunting
    4. Incident response
    5. Remediation
  2. This process is implemented at all levels of the GRid Operational Network (GRON):
    1. Points of Delivery of wholesale electricity: substations and meter points
    2. Points of Interconnection, the sources of wholesale electricity: switchyards and generation assets
    3. Grid Data Centers, which run the threat detection, prevention, and hunting/monitoring systems in addition to other core functions of a Transmission and Distribution system

III What is different about Grid OT versus Enterprise IT and Why it matters

  1. Monitoring and Security in the GRON

a.   Providing visibility and security to Grid OT networks is not merely a matter of taking Enterprise IT tools and practices and applying them to Grid IP networks.  There are unique Industrial Control System (ICS) requirements that should be addressed.

b.  Safety and Reliability – Many industrial systems (in our case the Grid electrical system) operate 24/7/365 and involve processes with significant safety risks. Network interruptions or system failures may harm people, cause service or production disruptions, and result in negative economic consequences.

c.  Industrial Protocols – Industrial Control Systems (ICS, SCADA, DCS) use many protocols that are unknown in the Enterprise IT world, and that are inherently insecure.  These systems and connections are what the Enterprise IT world calls in many cases legacy systems that still use older forms of communications and operating systems.  They range from RS232 to RS485 copper based serial communications to Modbus and DNP protocols, to earlier Windows OS OEM embedded systems, to various forms of UNIX and Linux OS and embedded specialty firmware.  These all have vulnerabilities that differ significantly from the Enterprise IT environment, and many times require special handling.

d.  Scale of Geography, Heterogeneity, and Legacy Systems – Grid OT industrial networks are usually large, include many diverse assets, and often consist of multiple connected architectures. In the age of the Industrial Internet of Things (IIoT), they also change frequently, with new endpoint devices added regularly. In the case of the electric Grid network, large also means geography that can span hundreds of miles.
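
To make the point about inherently insecure industrial protocols concrete, the sketch below builds a standard Modbus/TCP "Read Holding Registers" request using only Python's struct module. The frame carries a transaction ID, protocol ID, length, unit ID, function code, and register range, and nothing else: there is no field for credentials or a message signature, so any host that can reach the device on the network can issue the request. The function name and the example values are illustrative assumptions and are not taken from the systems described in this paper.

```python
import struct


def build_read_holding_registers(transaction_id: int, unit_id: int,
                                 start_address: int, quantity: int) -> bytes:
    """Build a Modbus/TCP 'Read Holding Registers' (function 0x03) request."""
    function_code = 0x03
    # PDU: function code (1 byte), starting address (2 bytes), quantity (2 bytes)
    pdu = struct.pack(">BHH", function_code, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0), length, unit id.
    # The length field counts the unit id byte plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Illustrative values only: read 10 registers starting at address 0.
frame = build_read_holding_registers(transaction_id=1, unit_id=1,
                                     start_address=0, quantity=10)
print(frame.hex())  # 00010000000601030000000a
```

The whole request is 12 bytes, which is why isolation, firewalls, and protocol-aware inspection have to supply the protection that the protocol itself lacks.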

IV  The GRid Operational Network (GRON) differs from the Enterprise IT Network for reliability and Cyber protections

The GRid Operational Network, or GRON, is multi-organization at all levels: power generation, transmission, and distribution. This connectedness has grown over the past 15 years due to the Wholesale Power Markets and the resulting complex physical and logical real-time telemetry interconnects that each organization is mandated to make and maintain, both for the operational benefit and prudence of the regional and local Grid OT networks and under various purchasing agreements for wholesale power.

  • Connections between Distribution coops and Generation and Transmission (G&T) coops are common and necessary, as are connections between G&T coops and Regional Transmission Organizations (RTOs).
    • These are both electrical (substation and transmission line) and digital / IP network connections.
    • A variety of telemetry and control system communications occur between the coop organizations and neighbors such as RTOs, Meter Data Management companies, Investor Owned Utilities (IOUs), and other G&T coops.
  • These interconnections increase the Cyber threat landscape that needs to be monitored and protected.
    • Firewalls become a substation fixture in basic design
    • Routing security is as vital as the firewall and threat management design choices
    • Logical and physical separations are designed and chosen based on many factors that answer the following design questions:
      • NERC CIP – where does that apply?
      • Cyber Security Best Practices for SCADA/ICS
        • NERC CIP compliance AND Proper Protection/Monitoring

V.  Deploying Grid Cyber Security and Resilient Network Operations

  1. The project started by doing the more straightforward things:

A  Several IP/routing/cyber security vendors and contractors/consultants were evaluated. Two were employed before finding the one with the right skill set to develop the cyber and routing systems.

B  In 2016, it became apparent that the tasks that needed to be done were more extensive than the organization had the staff, or the qualified staff, to carry out.

C  Through trial and error, the hardware and software platforms were selected to start building a resilient and effective cyber defense system.

D  Resilient Routing and Reliable Communications

E  Open Shortest Path First (OSPF) was chosen as the primary transport IP routing protocol.

F  The infrastructure build plan resulted in several physical loops and enabled some mesh routing connections, which made OSPF very reliable as a routing protocol for N-1 loss of paths in the IP network.

G  The focus changed with the realization in mid-2016 that the resilient, reliable IP transport also increased the attack surface and needed a design focused on cyber security isolation methods. Our efforts also coincided with the first implementations of the NERC Low Impact regulations, which raised awareness of potential vulnerabilities.

H  The sources and resources used for cyber security design and implementation:

  1. National Institute of Standards and Technology (NIST)
  2. North American Electric Reliability Corporation (NERC)
  3. MITRE Corporation Cyber Threat Recommendations
  4. Center for Internet Security CIS Top 20 Controls
  5. North American Transmission Forum (NATF)
  6. Reviews by Cyber Design Consultants
  7. Active Penetration testing in late 2018

VI Mistakes That Were Made Early

  1. 2016, early Phase 1 – Until early 2017, we failed to look comprehensively enough beyond NERC CIP for Low Impact at the cyber security defenses that were necessary.
  2. The changes and lessons learned resulted in some do-overs.
  3. The rate of change of the threat landscape, and new developments in products and features for ICS/SCADA, changed what was required after 2017.
  4. 2017, late Phase 1 – The focus shifted to a core routing redesign to make network resiliency a primary design factor (RELIABLE NETWORK). The protocol choice changed late in 2017 to migrate from OSPF to MPLS as the primary transport protocol.

VII Other Observations

  1. Underestimating the level of effort in general. The project was underway for over a year before we realized that the scope of the undertaking was far greater than initially envisioned.
  2. The ETEC Grid IP network covers over 40 East Texas counties and includes approximately eighty microwave radio towers, small sections of single-mode fiber (OPGW and underbuilt aerial) on transmission and distribution rights-of-way, about eighty-two microwave paths, eight ISP injection points, several private-carrier Ethernet lines, numerous RF multipoint systems, and approximately forty-five LTE routers as backup routing nodes.
  3. We chose to identify and deploy best-of-breed routing and cybersecurity solutions. The all-in-one solutions were, in our view, not complete enough in critical areas.

VIII  Phase 1 Cyber Security and Reliable Network

The following outlines what was deployed from late 2016 to late 2017:

  1. Secure and isolate external access to Bulk Electric System (BES) networks and devices from any outside network, device, organization, or individual, using the latest Unified Threat Management (UTM) firewalls at each potential entry point (what NERC calls, or called, a routable node).
  2. Build a resilient network that can survive N-1 failures.
  3. Move from layer 2 to layer 3 OSPF routing.
  4. Place essential internal firewalls to provide the major isolations.
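
The N-1 goal for the transport network can be checked against a topology map before relying on the routing protocol to deliver it. The following is a minimal sketch, assuming the network is modeled as a list of sites and a list of point-to-point paths; the site names and the topology are hypothetical, and the check covers the loss of a single path only, not the loss of a whole site or router.

```python
from collections import deque


def is_connected(nodes, links):
    """Return True if every node is reachable from the first node."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def single_points_of_failure(nodes, links):
    """List the links whose individual loss would partition the network."""
    critical = []
    for i, link in enumerate(links):
        remaining = links[:i] + links[i + 1:]
        if not is_connected(nodes, remaining):
            critical.append(link)
    return critical


# Hypothetical topology: a four-site microwave ring plus one spur site.
sites = ["DC", "SubA", "SubB", "SubC", "SubD"]
paths = [("DC", "SubA"), ("SubA", "SubB"), ("SubB", "SubC"),
         ("SubC", "DC"), ("SubC", "SubD")]

print(single_points_of_failure(sites, paths))  # [('SubC', 'SubD')]
```

In this example the ring survives the loss of any one path, while the spur to SubD does not, which is the kind of location where a backup route (for instance an LTE router) would be needed to meet the N-1 goal.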

Phase 2 Cyber Security and Reliable Communications

  1. Displace OSPF with MPLS as the primary dynamic, traffic-engineered routing system, to further secure the transport system with a more secure and controllable routing protocol.
  2. Reduce the attack surface where possible without negative impact on the Enterprise functions required for the delivery of electricity.
  3. Implement system-wide monitoring for Reliable Network, Reliable Communications, and threat management.
  4. Use a multi-vendor approach and solutions.
  5. Apply physical isolation, logical isolation, and deep packet inspection using Unified Threat Management (UTM) systems that are designed for ICS/SCADA Grid IP systems.
  6. Harden the systems.
  7. Implement operating system hardening best practices for endpoint user devices and server-based systems.
  8. Apply two-factor authentication and trusted-zone restriction policies for maintenance and configuration access by authorized users.
  9. Tighten firewall rules internally between Grid IP trusted networks to better protect critical devices and systems, including deep packet inspection of ICS/DNP/Modbus/SEL protocol zones.
  10. Implement 802.1X network access control on all compatible systems.
  11. Implement a Network Operations Center and Security Operations Center for 24×7 coverage and support.
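
The idea behind tightened zone-to-zone firewall rules is default deny: traffic between trusted zones is dropped unless a rule names it explicitly. The sketch below illustrates that logic only. The zone names and the rule set are hypothetical, the ports are the commonly registered ones for DNP3 (20000) and Modbus/TCP (502), and a real UTM firewall would additionally apply deep packet inspection to the permitted flows, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src_zone: str
    dst_zone: str
    protocol: str
    port: int


# Hypothetical allowlist between trusted zones; anything not listed is denied.
ALLOWED = {
    Rule("scada_servers", "substation_rtu", "tcp", 20000),  # DNP3 over TCP
    Rule("scada_servers", "substation_rtu", "tcp", 502),    # Modbus/TCP
    Rule("engineering", "substation_relay", "tcp", 22),     # maintenance access
}


def is_permitted(src_zone: str, dst_zone: str, protocol: str, port: int) -> bool:
    """Default deny: permit a flow only if an explicit rule matches it exactly."""
    return Rule(src_zone, dst_zone, protocol, port) in ALLOWED


print(is_permitted("scada_servers", "substation_rtu", "tcp", 20000))  # True
print(is_permitted("corporate_it", "substation_rtu", "tcp", 502))     # False
```

Keeping the rule set this explicit also makes it easy to review which zones are allowed to reach critical devices at all.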

IX  Threat hunting and Intrusion Detection and Prevention

Implement systems to watch the behavior of users, devices, and traffic by deploying the following:

  • Firewall logging
  • Syslog output from network routers, servers, and similar devices
  • Event logs from SCADA servers
  • Active Domain tracking of devices, user access, and logs
  • Firewall monitoring of zone-to-zone traffic to establish baselines
  • Feeding all of this data into a common Security Information and Event Management (SIEM) system

The final stages of implementing all of these systems and processes are underway.
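
Establishing a zone-to-zone baseline and then looking for departures from it can be sketched in a few lines. This is a simplified illustration, assuming flow records have already been parsed from firewall logs into dictionaries; the field names, zone names, and records are hypothetical, and a production SIEM would correlate far more context than this.

```python
from collections import Counter


def build_baseline(flow_records):
    """Count how often each (src_zone, dst_zone, port) flow was observed."""
    return Counter((r["src_zone"], r["dst_zone"], r["port"]) for r in flow_records)


def find_new_flows(baseline, flow_records, min_seen=1):
    """Return the distinct flows that the baseline saw fewer than min_seen times."""
    unusual = []
    for r in flow_records:
        key = (r["src_zone"], r["dst_zone"], r["port"])
        if baseline[key] < min_seen and key not in unusual:
            unusual.append(key)
    return unusual


# Hypothetical records from a baseline period.
training = [
    {"src_zone": "scada", "dst_zone": "substation", "port": 20000},
    {"src_zone": "scada", "dst_zone": "substation", "port": 20000},
    {"src_zone": "corp", "dst_zone": "scada", "port": 443},
]
# Hypothetical records from the period being examined.
today = [
    {"src_zone": "scada", "dst_zone": "substation", "port": 20000},
    {"src_zone": "corp", "dst_zone": "substation", "port": 502},
]

baseline = build_baseline(training)
print(find_new_flows(baseline, today))  # [('corp', 'substation', 502)]
```

A flow that never appeared during the baseline period, such as a corporate host speaking Modbus to a substation in this example, is the kind of event that would be raised for threat hunting.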

X.   2019 plans

  • Final deployments of all cyber monitoring and threat management systems
  • Isolation segmentation at each substation in most of the ETEC coop areas

[Figure: example of an ETEC Cyber Secure Substation logical container diagram]
