Why the Best Engineers Ask “What Fails?” Before “What Works?”
By John E. Hargrove, P.E.
Early in my career at Gulf States Utilities, I learned to measure signals. Received signal level. Fade margin. Bit error rate. Crosstalk isolation at 1000 Hz. These were the metrics that mattered in telecommunications—numbers you could trace on test equipment, specifications you could verify against datasheets.
I was good at it. I understood the physics. I knew how to read oscilloscopes and spectrum analyzers, calculate path loss, and interpret eye diagrams and jitter measurements. I thought this was engineering.
It wasn’t. It was the entry point to engineering.
The Insufficient Truth of Specifications
A specification tells you what a component should do under controlled conditions. It doesn’t tell you what happens when that component is asked to operate for twenty years in a substation control panel with inadequate ventilation, marginal grounding, and legacy protocols it was never designed to handle.
It doesn’t tell you what happens when the person who installed it retires without documenting the workarounds they built into the configuration. It doesn’t tell you what happens when that component becomes part of a system where failures cascade in ways no single vendor anticipated.
Specifications describe signals. Engineering is about systems. And stewardship is about consequences.
From Signals to Systems
Somewhere around my tenth year in the field, I stopped seeing microwave paths and started seeing microwave networks. I stopped seeing RTUs and started seeing SCADA architectures. The shift wasn’t about learning more—it was about asking different questions.
Instead of “Does this radio meet spec?” I started asking:
- What happens when this link goes down during a hurricane, lightning strike, or ice storm?
- How does this failure propagate to the control center?
- Can operators tell the difference between a communications failure and a substation failure?
- What do we lose if we can’t trust this path?
These are systems questions. They can’t be answered by looking at data sheets. They require understanding integration, interaction, and failure modes—how components behave when combined, how signals become information, how information becomes control, and how control affects physical systems.
Idaho National Laboratory’s cyber-informed engineering methodology formalizes what I learned through experience: the question isn’t “was there a cyber intrusion?” The question is “what physical consequence could this cyber pathway cause, and how do we design it out?”
This is the shift from signals to systems. It’s the shift from what works to what fails.
The Uncomfortable Reality of Legacy
Modern engineering culture celebrates innovation, but critical infrastructure relies on legacy systems. Substations commissioned in 1978. Relay protection schemes with decades of operational history. SCADA protocols that predate TCP/IP. Windows operating systems long past manufacturer support.
These systems weren’t poorly designed. They were designed for a different world—a world where networks were simpler, threats were different, and “airgap” meant something real.
The temptation is to replace them. However, replacement introduces new risks: configuration errors, unfamiliar failure modes, loss of institutional knowledge, and the hubris of assuming that the new is inherently better.
Systems thinking recognizes that legacy is a design constraint, not a failure. The question isn’t “how do we eliminate legacy?” It’s “how do we ensure legacy behavior remains confined, observable, and non-propagating?”
This requires architectural containment: zones and conduits, protocol isolation, deny-by-default boundaries, and compensating controls. It requires accepting that you cannot patch some systems without creating operational risk—and designing so that unpatchable systems cannot threaten the broader network.
You design for reality, not for the ideal.
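The deny-by-default boundary described above can be sketched in a few lines. This is a minimal illustration, not a real site's configuration: the zone names, protocols, and allowed conduits are hypothetical, and a production boundary would live in firewall or data-diode policy, not application code.

```python
# Minimal sketch of a deny-by-default conduit policy between zones.
# Zone names and protocols here are hypothetical illustrations.

# Explicit allowlist: (source zone, destination zone, protocol) tuples.
# Anything not declared here is denied.
ALLOWED_CONDUITS = {
    ("control_center", "substation", "dnp3"),
    ("substation", "control_center", "dnp3"),
    ("engineering", "control_center", "https"),
}

def is_permitted(src_zone: str, dst_zone: str, protocol: str) -> bool:
    """Deny by default: traffic passes only if explicitly allowed."""
    return (src_zone, dst_zone, protocol) in ALLOWED_CONDUITS

# An unpatchable legacy device in the substation zone cannot reach the
# corporate zone, because no such conduit was ever declared.
print(is_permitted("substation", "corporate", "smb"))        # False
print(is_permitted("control_center", "substation", "dnp3"))  # True
```

The point of the pattern is that containment does not depend on knowing every failure mode of the legacy device; it depends only on never having granted the pathway in the first place.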
From Systems to Stewardship
Stewardship is what happens when you recognize that the systems you design will outlive your involvement with them. They will be operated by people you’ve never met, maintained by technicians trained on different principles, and stressed by conditions you didn’t anticipate.
Engineering becomes stewardship when you ask:
- Can someone else understand this ten years from now?
- Will this design make sense when I’m not here to explain it?
- Have I made it possible to recover from failure, or only to prevent it?
- What evidence will exist to support decisions under pressure?
Stewardship rejects the idea that “it works” is sufficient. It demands that systems be operable, understandable, recoverable, and defensible.
Stewardship in Practice
Stewardship shows up in mundane decisions that matter deeply over time:
- Documentation that explains why, not just what. Future engineers need to understand the rationale for design choices, not merely the configuration.
- Architecture that isolates consequences. Broadcast traffic doesn’t propagate across zones. Field devices can’t initiate lateral movement. Failures are observable before they become crises.
- Monitoring that produces evidence, not just alerts. When something goes wrong, you need to know what happened, what was blocked, and what was never allowed.
- Restoration procedures that don’t require guessing. Safe reconnection depends on staged validation, not hope.
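The last point — staged validation rather than hope — amounts to an ordered gate: each check must pass, and be logged as evidence, before the next is attempted. A minimal sketch follows; the stage names and checks are hypothetical placeholders for real field measurements.

```python
# Illustrative sketch of staged restoration: each validation stage must
# pass before the next is attempted, and every outcome is recorded so
# the decision trail exists after the fact. Stages are hypothetical.

def check_power_quality() -> bool:
    return True  # placeholder for a real voltage/frequency measurement

def check_comms_path() -> bool:
    return True  # placeholder: verify link integrity and error rates

def check_protection_scheme() -> bool:
    return True  # placeholder: confirm relay settings match the record

STAGES = [
    ("power quality", check_power_quality),
    ("communications path", check_comms_path),
    ("protection scheme", check_protection_scheme),
]

def staged_restoration() -> list[str]:
    """Run stages in order; halt at the first failure and report it."""
    log = []
    for name, check in STAGES:
        if not check():
            log.append(f"HOLD: {name} failed validation")
            return log  # never proceed past a failed stage
        log.append(f"OK: {name} validated")
    log.append("RECONNECT: all stages validated")
    return log
```

What matters is the shape, not the checks themselves: the procedure produces a log either way, so the operator who reconnects and the reviewer who asks why both work from the same evidence.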
Stewardship is the recognition that reliability isn’t just about uptime—it’s about confidence in decision-making under stress. It’s the difference between “the lights are on” and “we know why the lights are on, and we know what to do if they go out.”
The Professional Standard
Engineering is a licensed profession for a reason. A Professional Engineer’s seal carries legal weight because it represents accountability for consequences. You cannot sign off on a design and disclaim responsibility for its failure.
This is why consequence-driven design isn’t optional. It’s the ethical foundation of the profession.
When I design a SCADA network, I’m not just enabling communications—I’m creating pathways that could, if misused or compromised, affect grid stability, personnel safety, and public trust. I’m responsible for understanding what happens when those pathways are exploited, misconfigured, or degraded by time and entropy.
The question isn’t whether the design meets compliance requirements. The question is whether I can defend it to a board of inquiry, a regulatory body, or the families of people affected by a failure I should have anticipated.
This is not paranoia. It’s professionalism.
Why This Matters Now
Critical infrastructure is under pressure from multiple directions: aging systems, growing cyber threats, workforce transitions, and accelerating technology change. The instinct is to rush toward modernization—cloud-hosted systems, AI-driven analytics, software-defined everything.
These technologies have value. But adopting them without consequence-driven design is reckless.
Modern systems differ from legacy systems in how they fail, but they still fail. The question isn’t whether to modernize—it’s whether modernization is being approached with the same rigor we once applied to physical infrastructure.
Are we asking what fails? Are we designing for containment, observability, and recovery? Are we creating systems that future engineers can understand and maintain? Or are we creating dependencies on vendors, platforms, and expertise that may not exist under system stress?
Stewardship demands that we ask these questions before deployment, not after failure.
A Final Thought
I started my career measuring signals because that’s what young engineers do. You learn to trust the numbers, verify the specs, and take pride in technical precision.
But over time, you learn that the numbers don’t tell the whole story. Systems are more than the sum of their components. Failures are rarely single-point events. And the best engineering isn’t about making things work—it’s about understanding what happens when they don’t.
That’s the progression: from signals to systems to stewardship. From what works to what fails to what we owe the people who depend on this.
It’s not a destination. It’s a standard.
And it’s the standard we should hold ourselves to—not because regulators demand it, but because the work matters.