Controls Assessments
The level of risk
will depend on the adequacy and effectiveness of existing controls. When
conducting a controls assessment, questions to be addressed include:
What are the existing controls for a particular risk?
Are those controls capable of adequately treating the risk so that it is controlled to a level that is tolerable?
Are the controls operating in the manner intended, and can they be demonstrated to be effective when required (in practice)?
These questions can
only be answered with confidence if there are proper documentation and
assurance processes in place. The level of effectiveness for a particular
control, or suite of related controls, may be expressed qualitatively,
semi-quantitatively or quantitatively. In most cases, a high level of accuracy
is not warranted. However, it may be valuable to express and record a measure
of risk control effectiveness so that judgments can be made on whether effort
is best expended in improving a control or providing a different risk
treatment. The following methods are given in ISO 31010:2011 under control
assessment tools, although some authors have categorized them in different
ways in different scenarios. However, we consider the following methods to be
controls assessment techniques or tools that can be used in risk assessments.
18. Layer of Protection Analysis (LOPA)
Layer of Protection Analysis is a risk
management technique commonly used in the chemical process industry that can
provide a more detailed, semi-quantitative assessment of the risks and layers
of protection associated with hazard scenarios. LOPA allows the safety review
team an opportunity to discover weaknesses and strengths in the safety systems
used to protect employees, the plant, and the public. LOPA is a means to
identify the scenarios that present the most significant risk and determine if
the consequences could be reduced by the application of inherently safer design
principles. LOPA can also be used to identify the need for safety instrumented
systems (SIS) or other protection layers to improve process safety.
A variety of approaches are employed which
could use order-of-magnitude, half order-of-magnitude, and decimal math.
Conceptually, LOPA is used to understand how a process deviation can lead to a
hazardous consequence if not interrupted by the successful operation of a
safeguard called an independent protection layer (IPL). An IPL is a safeguard
that can prevent a scenario from propagating to a consequence of concern
without being adversely affected by either the initiating event or by the
action (or inaction) of any other protection layer in the same scenario.
The basic steps are:
A cause-consequence pair is selected and the layers of protection which prevent the cause leading to the undesired consequence are identified.
An order of magnitude calculation is then carried out to determine whether the protection is adequate to reduce risk to a tolerable level.
LOPA is a less resource-intensive process
than a fault tree analysis or a fully quantitative form of risk assessment, but
it is more rigorous than qualitative subjective judgments alone. It focuses
efforts on the most critical layers of protection, identifying operations,
systems and processes for which there are insufficient safeguards and where
failure will have serious consequences. However, this technique looks at one
cause-consequence pair and one scenario at a time and, therefore, does not
apply to complex scenarios where there are many cause-consequence pairs or
where a variety of consequences affect different stakeholders.
Normally multiple Protection Layers (PLs) are
provided in the process industry, in which each protection layer consists of a
grouping of equipment and/or administrative controls that function in concert
with the other layers. Protection layers that perform their function with a
high degree of reliability may qualify as Independent Protection Layers (IPL).
The criteria to qualify a Protection Layer (PL) as an IPL are:
The protection provided reduces the identified risk by a large amount, that is, a minimum of a 10-fold reduction.
The protective function is provided with a high degree of availability (90% or greater).
It has the following important
characteristics:
Specificity
An IPL is designed solely to prevent or to
mitigate the consequences of one potentially hazardous event (e.g., a runaway
reaction, release of toxic material, a loss of containment, or a fire).
Multiple causes may lead to the same hazardous event; and, therefore, multiple
event scenarios may initiate action of one IPL.
Independence
An IPL is independent of the other protection
layers associated with the identified danger.
Dependability
It can be counted on to do what it was
designed to do. Both random and systematic failure modes are addressed in the
design.
Auditability
It is designed to facilitate regular
validation of the protective functions. Proof testing and maintenance of the
safety system is necessary. Only those protection layers that meet the tests of
availability, specificity, independence, dependability, and auditability are
classified as Independent Protection Layers.
19. Decision Tree
Decision trees are a simple but powerful form of multiple-variable analysis.
They provide unique capabilities to supplement, complement, and substitute for
traditional statistical forms of analysis (such as multiple linear
regression), for a variety of data mining tools and techniques (such as neural
networks), and for recently developed multidimensional forms of reporting and
analysis found in the field of business intelligence. A decision tree is a
graph that uses a branching method to illustrate every possible outcome of a
decision. A decision tree builds classification or regression models in the
form of a tree structure, breaking a data set down into smaller and smaller
subsets while an associated decision tree is incrementally developed. The
final result is a tree with decision nodes and leaf nodes: a decision node
(e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), a
leaf node (e.g., Play) represents a classification or decision, and the
topmost decision node, which corresponds to the best predictor, is called the
root node.
Decision trees can
handle both categorical and numerical data. Decision trees can be drawn by hand
or created with a graphics program or specialized software. Informally,
decision trees are useful for focusing discussion when a group must make a
decision. Programmatically, they can be used to assign monetary/time or other
values to possible outcomes so that decisions can be automated. Decision
tree software is used in data mining to simplify complex
strategic challenges and evaluate the cost-effectiveness of research and
business decisions. Uncertain outcomes (chance nodes) in a decision tree are
usually represented by circles.
Drawing a Decision Tree
You start a Decision Tree with a decision that you need to make, and draw a small square to represent this towards the left of a large piece of paper.
From this box draw out lines towards the right for each possible solution, and write that solution along the line. Keep the lines as far apart as possible so that you can expand your thoughts.
At the end of each line, consider the results. If the result of taking that decision is uncertain, draw a small circle.
If the result is another decision that you need to make, draw another square. Squares represent decisions, and circles represent uncertain outcomes.
Write the decision or factor above the square or circle. If you have completed the solution at the end of the line, just leave it blank.
Starting from the new decision squares on your diagram, draw out lines representing the options that you could select. From the circles draw lines representing possible outcomes.
Again make a brief note on the line saying what it means.
Keep on doing this until you have drawn out as many of the possible outcomes and decisions as you can see leading on from the original decisions.
20. Human Reliability Analysis
Human
reliability assessment (HRA) is the common name for an assortment of methods
and models that are used to predict the occurrence of ‘human errors’. While the
origin of HRA is in Probabilistic Safety Assessment (PSA), HRA is increasingly
being used on its own both as a way to assess the risks from ‘human error’ and
as a way to reduce system vulnerability. The three principal functions of HRA
are:
Identifying what errors can occur (Human Error Identification),
Deciding how likely the errors are to occur (Human Error Quantification), and,
Enhancing human reliability by reducing the given error likelihood (Human Error Reduction), if appropriate.
Thus
anticipating failures of joint human-machine systems requires an underlying
model. This should not be a model of human information processing in disguise,
but a model of how human performance is determined by – hence reflects – the
context or circumstances, i.e., a model of joint system performance rather than
of individual human actions.
Human reliability
assessment deals with the impact of humans on system performance and can be
used to evaluate human error influences on the system. At the risk of stating
the obvious, human reliability is very important due to the contributions of
humans to the resilience of systems and to possible adverse consequences of
human errors or oversights, especially when the human is a crucial part of
today’s large socio-technical systems. A variety of methods exist for human
reliability analysis. These break down into two basic classes of assessment
methods, that are probabilistic risk assessment (PRA) and those based on a
cognitive theory of control. In 2009, the Health and Safety Laboratory
compiled a report for the Health and Safety Executive (HSE) outlining HRA
methods for review. They identified 35 tools that constituted true HRA
techniques and that could be used effectively in the context of health and
safety management.
Conducting an HRA requires people with expertise in such methods. The
following steps are used to prepare a risk model based on HRA:
1. Understand the actions being investigated
2. Use a structured approach to investigate and
represent the actions (task analysis)
3. Consider the level of detail needed (compare
with description detail of available data)
4. Understand the failure criteria
5. Select an appropriate technique(s)
6. Represent the identified errors in a risk model
Application of HRA
i). Eliminate Error Occurrence
This is the first
preference, where design features known to be a source of human error are
eliminated (e.g., lack of feedback, lack of differentiation, inconsistent or
unnatural mappings). Design choices available for error elimination include:
Replacement of error-inducing design features (e.g., physical device separation, physical guards, application of validity and range checks)
Restructuring of the task so that error-prone behaviour is no longer performed (e.g., by information filtering, only the information needed for the task is provided).
Automation to change the role of human involvement in support of task performance.
ii). Reduce Error
Occurrence
Consider this
approach, if complete error elimination is not possible or feasible through
design choices. Design features which can reduce error occurrence include:
Identification (e.g., device labeling)
Constraints (i.e., build in constraints to limit operation to acceptable ranges)
Coding (i.e., aid in choice differentiation and selection)
Consistent feedback (i.e., convey device and system state directly in the interface)
Predictability (i.e., design system responses so that operators can associate specific control actions with system response)
iii). Eliminate
Error Consequence
The third approach
is to eliminate error consequences, where there are three categories of design
features that reflect the components of the consequence prevention strategy:
A. Error detection design features (to promote detection prior to consequence occurrence):
Feedback (i.e., status information in relation to operational goals and potential side-effects of an action)
Alert of Unacceptable Device States (e.g., visual/auditory feedback of off-normal or unacceptable device states)
Confirmation (i.e., support of self-checking and independent verification practices)
Prediction (i.e., providing information on the outcome of an action prior to its implementation, or with sufficient time for correction)
B. Error recovery design features (to enable recovery prior to consequence occurrence):
Undo (e.g., facilities for reversing recent control actions to promote error recovery)
Guidance (i.e., alternative forms of guidance for cases where reversing a recent control action is not the preferred action)
C. Consequence prevention design features:
Interlocks
Margins and Delays (i.e., these features can provide more time before an unacceptable consequence is realized, thus increasing the chances of error detection and recovery prior to consequence occurrence)
Fail-Safe Features
iv). Reduce Error
Consequence
If errors and
consequences cannot be completely eliminated, consider measures that enable
consequence reduction. This may be achieved through application of additional
design features that allow operators or automation to recognize the occurrence
of an error consequence, and to take action to mitigate the consequences.
Additional design features include:
Margins (i.e., apply larger design margins to allow some consequences to be accommodated by normal system function and capacities).
Engineered Mitigating Systems (e.g., automatic special safety system actions, such as CANDU Automatic Stepback and Setback).
Human Intervention (i.e., the operations team can readily adapt to both predefined and undefined operating situations).
Response Teams (i.e., the organizational structure is prepared and coordinated to deal with the predefined consequences).
Consequence Prediction (e.g., aids can assist operations staff in predicting the extent of consequences of operating actions and assist in selection and execution of mitigating actions).
Backup Replacement Function (i.e., provision of equipment and/or human intervention to mitigate consequences).
21. Bow Tie Analysis
Bow-tie analysis is
a simple diagrammatic way to display the pathways of a risk showing a range of
possible causes and consequences. It is used in situations when a complex fault
tree analysis is not justified or to ensure that there is a barrier or control
for each of the possible failure pathways. The Bow-tie analysis starts with the
risk at the “knot” of the tie, and then moves to the left to identify and
describe the events or circumstances that may cause the risk event to occur,
paying particular attention to root causes. Once those causes have been
identified, the analysis then identifies preventive measures that could be
implemented. At this point there could be an evaluation of the actual
preventive measures that the organization has in place to determine whether
additional measures should be implemented. The analysis then moves to the right
to look at the potential consequences that would result after the risk event
happens, and the plans the organization either has or should have in place to
minimize the negative effects of the risk.
Bow-tie Diagram Construction
1. Define the Hazard and the Top Event, which is the initial consequence.
"What happens when the hazard is released?"
2. Identify the Threats, which are the causes of the Top Event.
"What causes the release of the hazard?" "How can control be lost?"
3. Identify the existing Protection Barriers for each Threat.
These prevent the occurrence of the Top Event and can be independent or dependent.
"How can the controls fail?" "How can their effectiveness be compromised?"
4. Identify the Escalation Factors for each Barrier.
These are factors that make the Barrier fail.
"How can we avoid the hazard being released?" "How can we keep control?"
5. Identify the Escalation Factor Controls for each Barrier.
These are factors that prevent or minimize the possibility of the Barrier or the Recovery Measures becoming ineffective.
"How do we ensure that the controls will not fail?"
6. Identify the Consequences.
The Top Event could have several consequences.
7. Identify the Recovery Measures.
These are measures that limit the escalation of the Top Event into its consequences.
"How can we limit the severity of the event?" "How can we minimize the effects?"
8. For each Barrier, Recovery Measure and Escalation Factor Control, identify the Critical Safety Tasks.
Critical Safety Tasks are tasks that prevent and/or minimize the possibility of the Barrier, the Escalation Factor Control or the Recovery Measures failing or becoming ineffective. They span project engineering, operation, maintenance and management.
"What tasks can be taken to ensure that the control is working?"
"How can we ensure that these tasks are done?" "Who does these tasks?"
"How do you know when to do the tasks?" "How do you know what to do?"
"Is there a procedure, checklist or instruction?"
22. Reliability Centered Maintenance (RCM)
Reliability
centered maintenance (RCM) is a corporate-level maintenance strategy that is
implemented to optimize the maintenance program of a company or facility. RCM
is the process of determining the most effective maintenance approach, where
RCM philosophy employs Preventive Maintenance (PM), Predictive Maintenance
(PdM), Real-time Monitoring (RTM), Run-to-Failure (RTF- also called reactive
maintenance) and Proactive Maintenance techniques in an integrated manner to
increase the probability that a machine or component will function in the
required manner over its design life cycle with a minimum of maintenance. The
goal of the philosophy is to provide the stated function of the facility, with
the required reliability and availability at the lowest cost. RCM requires that
maintenance decisions be based on maintenance requirements supported by sound
technical and economic justification.
RCM allows you to
identify applicable and effective preventive maintenance requirements for
equipment “…in accordance with the safety, operational and economic
consequences of identifiable failures, and the degradation mechanism
responsible for those failures”. RCM uses a failure mode, effect and
criticality analysis (FMECA) type of risk assessment that requires a specific
approach to analysis in this context. From a quality management standpoint,
it’s worth being aware that RCM identifies required functions and performance
standards and failures of equipment and components that can interrupt those
functions. The final result of an RCM program is the implementation of a
specific maintenance strategy on each of the assets of the facility.
The maintenance strategies are optimized so that the productivity of
the plant is maintained using cost-effective maintenance techniques.
There are four
principles that are critical for a reliability centered maintenance program.
1.
The primary objective is to preserve system function
2.
Identify failure modes that can affect the system function
3.
Prioritize the failure modes
4.
Select applicable and effective tasks to control the failure modes
An effective
reliability centered maintenance implementation examines the facility as a
series of functional systems, each of which has inputs and outputs contributing
to the success of the facility. It is the reliability, rather than the
functionality, of these systems that is considered. There are seven basic
questions that need to be asked for each asset:
1.
What are the functions and desired performance standards of each asset?
2.
How can each asset fail to fulfill its functions?
3.
What are the failure modes for each functional failure?
4.
What causes each of the failure modes?
5.
What are the consequences of each failure?
6.
What can and/or should be done to predict or prevent each failure?
7.
What should be done if a suitable proactive task cannot be determined?
23. Sneak Circuit Analysis
Sneak Circuit
Analysis (SCA) is used in safety-critical systems to identify sneak (or hidden)
paths in electronic and electro-mechanical systems that may cause unwanted
action or inhibit desired functions. Sneak analysis can locate problems in both
hardware and software using any technology. The sneak analysis tools can
integrate several analyses such as fault trees, failure mode and effects
analysis (FMEA), reliability estimates, etc. into a single analysis saving time
and project expenses. The technique helps in identifying design errors
and works best when applied in conjunction with HAZOP. It is very good for
dealing with systems which have multiple states such as batch and semi-batch
plant.
The analysis is
based on identification of designed-in inadvertent modes of operation and is
not based on failed equipment or software. SCA is most applicable to circuits
that can cause irreversible events. These include:
a.
Systems that control or perform active tasks or functions
b.
Systems that control electrical power and its distribution
c.
Embedded code which controls and times system functions
The SCA process
differs depending on whether it is applied to electrical circuits, process
plants, mechanical equipment or software technology, and the method used is
dependent on establishing correct network trees. When evaluating sneak
circuits, the problem-causing path may consist of hardware, software, operator
actions, or combinations of these elements. Sneak circuits are not the result
of hardware failure but are latent conditions, inadvertently designed into the
system, coded into the software program, or triggered by human error. Four
categories of sneak circuits are:
1. Sneak Paths
Unexpected paths along which current, energy, or logical sequence flows in an unintended direction.
2. Sneak Timing
Events occurring in an unexpected or conflicting sequence.
3. Sneak Indications
Ambiguous or false displays of system operating conditions that may cause the system or an operator to take an undesired action.
4. Sneak Labels
Incorrect or imprecise labeling of system functions (e.g., system inputs, controls, display buses) that may cause an operator to apply an incorrect stimulus to the system.
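A sneak-path search can be sketched as a graph problem: model the circuit as directed connections and enumerate every path from a source to a load, then compare against the intended ones. The circuit topology below is illustrative:

```python
# Depth-first enumeration of all simple paths from src to dst in a directed
# graph of circuit connections; paths not in the intended set are sneak paths.
def find_paths(graph, src, dst, path=None):
    path = (path or []) + [src]
    if src == dst:
        return [path]
    paths = []
    for nxt in graph.get(src, []):
        if nxt not in path:                # avoid revisiting nodes (cycles)
            paths += find_paths(graph, nxt, dst, path)
    return paths

# Illustrative topology: the designer intends the lamp to be powered only
# through switch_A, but a relay creates an unintended second path.
graph = {"battery": ["switch_A", "switch_B"],
         "switch_A": ["lamp"],
         "switch_B": ["relay"],
         "relay": ["lamp"]}
intended = [["battery", "switch_A", "lamp"]]
sneaks = [p for p in find_paths(graph, "battery", "lamp") if p not in intended]
```

Here `sneaks` exposes the path through switch_B and the relay, the kind of latent, designed-in condition (not a hardware failure) that SCA is meant to surface.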