The quiet single point of failure
When fire-fighting is mistaken for resilience
Some systems don’t rely on clear ownership. They rely on people stepping in.
Work gets done because someone notices a problem and intervenes - not because responsibility is explicit or designed.
At first, this can look like flexibility or resilience. Over time, it becomes risky and exhausting, quietly creating a single point of failure.
The cost of stopping at the fix
Most organisations are good at identifying problems. Something breaks, an incident occurs, or a disruption forces attention. The issue is named, the immediate failure is addressed, and work continues.
In many cases, that response is entirely reasonable. When systems are under pressure, the priority is restoring stability, not stepping back to conduct a deeper diagnosis while everything else waits.
The real problem comes later - not in the moment of response, but in what happens next.
Fixing a problem and understanding it are often treated as the same thing, but they aren’t. Identifying an issue answers a narrow question: what went wrong this time?
Understanding it properly requires answering harder ones.
What does this cost each time it happens?
Is this a one-off failure, or a recurring class of problem?
What would actually stop it happening again - and what would that require in terms of skills, time, authority, or disruption?
Many organisations never complete this second step. Not because they don’t care, but because it demands capacity and attention precisely when those are already stretched. Doing this work properly means slowing down, surfacing trade-offs, and sometimes admitting that the current system can’t support the change needed to stop the problem recurring.
So the response stops at the fix.
When fixing becomes the operating mode
Each fix restores stability. Service resumes, pressure drops, and attention moves on to the next priority.
The larger change that might prevent the issue recurring is deferred, not because it’s misunderstood, but because it feels slower, riskier, or harder to justify in the moment. Long-term fixes often require skills the organisation doesn’t yet have, time it can’t spare, or changes that would unsettle the system elsewhere.
Over time, this pattern becomes normal. The organisation learns how to cope with the problem rather than remove it. What began as a pragmatic response to uncertainty hardens into an operating mode. This is where fire-fighting emerges - not as chaos or incompetence, but as repeated intervention standing in for deeper diagnosis and redesign.
When the deeper work never happens, the cost doesn’t disappear. It simply shows up elsewhere. Each recurrence consumes time that could have gone into planned work, attention that could have been used to improve the system, and energy that gradually drains the people involved.
Because these costs are distributed and informal, they are rarely accounted for explicitly. They don’t appear on balance sheets or planning documents. Instead, they accumulate in diaries, inboxes, and evenings.
From the outside, the system appears to function. But it is doing so by quietly borrowing capacity from the people inside it.
How the real risk accumulates
Repeated issues create familiarity. The person who fixed the problem last time remembers where to look, what worked, and who needs to be involved.
When it happens again, involving them is simply the fastest route back to stability. Not because they own the problem, and not because they have a complete view of the system, but because they were asked to fix it before - and they remember how.
Over time, responsibility concentrates informally. Not through role design or intention, but through habit. Context accumulates in individuals because there is no space to turn it into something structural.
This is how organisations end up with single points of failure they never consciously chose to create.
From the outside, everything still looks fine. Problems are handled, deadlines are met, and the organisation keeps moving. That endurance is often interpreted as flexibility or resilience.
In reality, the system isn’t absorbing risk; it’s transferring it. Stability is being maintained by individuals compensating for gaps the system itself hasn’t addressed.
This pattern rarely stays confined to one kind of issue. Operationally, it leads to recurring incidents and fragile processes. Structurally, it prevents roles and responsibilities from ever settling properly. For people, it creates overload and burnout, and it leaves the organisation quietly dependent on those who keep stepping in.
Choosing not to do the deeper work is still a choice, even when it feels unavoidable. It preserves the current system by postponing the moment when its limits would need to be confronted - whether that means hiring expertise, creating slack, redesigning responsibilities, or accepting short-term disruption in exchange for long-term stability.
Until that moment arrives, fire-fighting remains rational. It just isn’t free. The cost is cumulative, and it accrues to the people who make the system work despite its constraints.
Originally published at lyndseyburton.com
This piece is part of a notebook on telecoms markets, regulation, complex systems and organisational behaviour.
