Behind Human Error
D**H
Important reading for modern enterprise IT
Written by a group of leading error researchers, this was by no means an "easy read", as it required significant attention to fully understand the arguments and implications, but I found it very rewarding and almost immediately applicable to my day-to-day work within a very large enterprise IT organization that is rapidly shifting to the *-as-a-Service delivery model. The book uses a number of fascinating (and occasionally terrifying) real-world examples from aviation, medicine, nuclear power, the space program, and others to explain that the act of attributing a failure to "human error" does not do anything to explain why the failure happened, nor does it generally lead to any constructive responses or improvements.
Below are 3 ideas from the book that I found particularly useful and insightful.
#1: Goal Conflicts
"Perhaps the most common hazard in the analysis of incidents is the naive assessment of the strategic issues that confront practitioners."
When investigating a failure, it is crucial to recognize that system operators are often dealing with multiple, competing goals. Operators must regularly assess and resolve these goal conflicts by making trade-off decisions which necessarily involve risk, and these decisions must frequently be made under time pressure. Operators are normally able to skillfully and successfully balance these conflicting goals and risks as part of their daily routine. Failures occur when the risks are unsuccessfully balanced, but that does not mean the operators were not skillful. In many cases, the actions which led to failure were the exact same actions which previously led to success. As investigators, we need to fully understand these goal conflicts in order to avoid hindsight bias and in order to improve the team's strategies for assessing and balancing risks. Also, it is important to understand that a team's strategies for balancing risks must evolve as the operating context, and the related goals and risks, change.
Putting this into practice, my team recently had a postmortem discussion about an outage that involved unplanned changes to a production system. During the discussion, it was enlightening to discuss our goal conflicts and our decision-making process around making unplanned changes. We determined that eliminating unplanned changes was not the right course of action - in fact unplanned changes are sometimes essential. Instead we refined our decision-making process for unplanned changes to the production system; and acknowledging that our operating context is subject to change, we set a checkpoint to re-assess this process 3 months later.
#2: Distancing through Differencing
"Do not discard other events because they appear on the surface to be dissimilar. At some level of analysis, all events are unique; while at other levels of analysis, they reveal common patterns."
"The obstacles to learning from failure are nearly as complex and subtle as the circumstances that surround a failure itself. Because accidents always involve multiple contributors, the decision to focus on one or another of the set, and therefore what will be learned, is largely socially determined."
I was fascinated by one of the book's case studies, that of a chemical fire that occurred during routine machine maintenance in a high-tech product manufacturing plant in the US. This company was one that took safety very seriously, with good working conditions, significant investment in safety, and strong motivation to examine all accidents promptly and thoroughly.
"The manufacturer had an extensive safety program that required immediate and high-level responses to an incident such as this, even though no personal injury occurred and damage was limited to the machine involved. High-level management directed immediate investigations, including detailed debriefings of participants, reviews of corporate history for similar events, and a “root cause” analysis.
Company policy required completion of this activity within a few days and formal, written notification of the event and related findings to all other manufacturing plants in the company. The cost of the incident may have been more than a million dollars."
The company's investigation of this accident focused on the machine, the maintenance procedures, and the operators who performed the maintenance, and identified multiple deficiencies that were corrected quickly. The fascinating part of this case study was that a broader review by outside investigators found that a very similar chemical fire had occurred in one of the company's other manufacturing plants in another country earlier that same year, and that this prior event was well known by practitioners at the US plant. Both the practitioners and the internal investigators considered the prior event to be irrelevant because it had occurred in a non-US plant with a different safety system to contain fires and involved a different model of the machine. Later, the accident occurred again in the US plant, this time during a different shift, and this third event was rationalized as having been due to the lower skill level of the workers in that shift. The authors use the term "Distancing through Differencing" to label the tendency of organizations and individuals to distance themselves from failures (i.e. "that could never happen here").
My takeaway is that there is a great opportunity across the many teams now providing cloud services in enterprise IT organizations such as my own to share details about each other's failures, look for the general patterns, and avoid repeating those incidents that have occurred within other services.
#3: Design-induced failures
"Automation surprises begin with miscommunication and misassessments between the automation and users, which lead to a gap between the user’s understanding of what the automated systems are set up to do, what they are doing, and what they are going to do."
The book contains several chapters devoted to the ways in which the design of computer systems used by operators can induce failures. These chapters detail several different aspects of this issue, which is vitally important to enterprise IT as both a technology provider and a technology consumer. Among the many points raised here was that automation often introduces new burdens on the same operators that it is intended to assist. I have seen this principle in action when teams implement automation to accomplish manual tasks but unfortunately do so in a way that does not provide users/operators with sufficient feedback to understand what is going on when it doesn't work. This is an example of automation written without regard for its users, and it can add significant complexity and brittleness to the system.
C**R
A key resource on this topic
This is an excellent book on why failures occur and general approaches that can be used to reduce the incidence of failures. I highly recommend it. The book benefits from having multiple authors while still being coherent and clear. The general idea is that, instead of looking for scapegoats and attributing 'human error' to them, it's more productive to look at the systemic factors that influence the behavior of individuals and groups, and make systemic changes accordingly. I agree with this, though I would also note that there are still cases of gross negligence, incompetence, and corruption which need to be addressed accordingly.
K**R
Good book; be careful, the text is pretty small
Good book; be careful, the text is pretty small
M**N
Five Stars
Great book and great insight into an interesting area for the Health and Safety professional
G**P
Best book on human error ever!
This is by far the most intelligent and comprehensive explanation of "error" in print. The authors are mostly first class experts on the subject (Cook is a poseur and wannabe) and the material is both coherent and complete.
The idea that error is a discrete, scientific category of human performance is finally buried by this work. Although better human performance researchers have known and accepted this for more than a decade, the message has percolated through the larger research community slowly, mostly for want of a single text that covers the history and experience that led to this conclusion.
Still, many people will continue to misunderstand the nature of "error", as the first review of this book demonstrates.
This is a perfect book for a class on human performance assessments that stray into the thicket of troubles that surround the term "error".
A**S
Behind Human Error
Good, but most material is available elsewhere in other standard texts (with the exception of a few interesting case studies, e.g. the Apollo 13 EECOM's telemetry display).
J**U
Solid reference on human error
This text is a well-structured reference on human error, starting with a summary of how our views of human error have changed over the last decades, summarizing and bringing together much of the recent work in the field, and finally bringing it all into focus at the end by relating human error to overall safety. A fine text for anyone involved in accident or incident investigation and a solid background read for human factors professionals in general.
P**P
Five Stars
Good buy....