Thursday, March 23, 2017

ISO 31010:2011 Risk Assessment Techniques – IV

Controls Assessments
The level of risk will depend on the adequacy and effectiveness of existing controls. When conducting a controls assessment, questions to be addressed include:
What are the existing controls for a particular risk?
Are those controls capable of adequately treating the risk so that it is controlled to a level that is tolerable?
Are the controls operating in the manner intended and can they be demonstrated to be effective when required (in practice)?

These questions can only be answered with confidence if there are proper documentation and assurance processes in place. The level of effectiveness for a particular control, or suite of related controls, may be expressed qualitatively, semi-quantitatively or quantitatively. In most cases, a high level of accuracy is not warranted. However, it may be valuable to express and record a measure of risk control effectiveness so that judgments can be made on whether effort is best expended in improving a control or providing a different risk treatment. ISO 31010:2011 lists the following methods under controls assessment tools, although some authors have categorized them differently in different scenarios. Here, we treat the following methods as controls assessment techniques or tools that can be used in risk assessments.
 
18. Layer of Protection Analysis (LOPA)
Layer of Protection Analysis is a risk management technique commonly used in the chemical process industry that can provide a more detailed, semi-quantitative assessment of the risks and layers of protection associated with hazard scenarios. LOPA allows the safety review team an opportunity to discover weaknesses and strengths in the safety systems used to protect employees, the plant, and the public. LOPA is a means to identify the scenarios that present the most significant risk and determine if the consequences could be reduced by the application of inherently safer design principles. LOPA can also be used to identify the need for safety instrumented systems (SIS) or other protection layers to improve process safety.

A variety of approaches are employed which could use order-of-magnitude, half order-of-magnitude, and decimal math. Conceptually, LOPA is used to understand how a process deviation can lead to a hazardous consequence if not interrupted by the successful operation of a safeguard called an independent protection layer (IPL). An IPL is a safeguard that can prevent a scenario from propagating to a consequence of concern without being adversely affected by either the initiating event or by the action (or inaction) of any other protection layer in the same scenario.

The basic steps are:
A cause-consequence pair is selected and the layers of protection which prevent the cause leading to the undesired consequence are identified.
An order of magnitude calculation is then carried out to determine whether the protection is adequate to reduce risk to a tolerable level.
LOPA is a less resource-intensive process than a fault tree analysis or a fully quantitative form of risk assessment, but it is more rigorous than qualitative subjective judgments alone. It focuses effort on the most critical layers of protection, identifying operations, systems and processes for which there are insufficient safeguards and where failure would have serious consequences. However, the technique looks at one cause-consequence pair and one scenario at a time and, therefore, does not apply to complex scenarios where there are many cause-consequence pairs or where a variety of consequences affect different stakeholders.

Normally multiple Protection Layers (PLs) are provided in the process industry, in which each protection layer consists of a grouping of equipment and/or administrative controls that function in concert with the other layers. Protection layers that perform their function with a high degree of reliability may qualify as Independent Protection Layers (IPL). The criteria to qualify a Protection Layer (PL) as an IPL are:
The protection provided reduces the identified risk by a large amount, that is, a minimum of a 10-fold reduction.
The protective function is provided with a high degree of availability (90% or greater).
It has the following important characteristics:

Specificity
An IPL is designed solely to prevent or to mitigate the consequences of one potentially hazardous event (e.g., a runaway reaction, release of toxic material, a loss of containment, or a fire). Multiple causes may lead to the same hazardous event; and, therefore, multiple event scenarios may initiate action of one IPL.

Independence
An IPL is independent of the other protection layers associated with the identified danger.

Dependability
It can be counted on to do what it was designed to do. Both random and systematic failure modes are addressed in the design.

Auditability
It is designed to facilitate regular validation of the protective functions. Proof testing and maintenance of the safety system is necessary. Only those protection layers that meet the tests of availability, specificity, independence, dependability, and auditability are classified as Independent Protection Layers.

19. Decision Tree
Decision trees are a simple but powerful form of multiple-variable analysis. They provide unique capabilities to supplement, complement, and substitute for traditional statistical forms of analysis (such as multiple linear regression), for a variety of data mining tools and techniques (such as neural networks), and for recently developed multidimensional forms of reporting and analysis found in the field of business intelligence. A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision. A decision tree builds classification or regression models in the form of a tree structure, breaking a data set down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes: a decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), a leaf node (e.g., Play) represents a classification or decision, and the topmost decision node, which corresponds to the best predictor, is called the root node.

Decision trees can handle both categorical and numerical data. Decision trees can be drawn by hand or created with a graphics program or specialized software. Informally, decision trees are useful for focusing discussion when a group must make a decision. Programmatically, they can be used to assign monetary, time or other values to possible outcomes so that decisions can be automated. Decision tree software is used in data mining to simplify complex strategic challenges and evaluate the cost-effectiveness of research and business decisions. In a decision tree diagram, decisions are usually represented by squares and uncertain outcomes by circles.
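As a minimal sketch of such automation, the expected-value roll-back of a small decision tree can be computed programmatically; the node names and payoffs below are hypothetical:

```python
# Sketch: automating a decision tree by expected monetary value (EMV).
# Decision nodes (squares) take the best branch; chance nodes (circles)
# take the probability-weighted average; leaves return their payoff.

def evaluate(node):
    """Roll back the tree from the leaves to get the value of a node."""
    kind = node["kind"]
    if kind == "leaf":
        return node["value"]
    if kind == "chance":
        # branches are (probability, child) pairs
        return sum(p * evaluate(child) for p, child in node["branches"])
    # decision node: branches are (label, child) pairs; pick the best option
    return max(evaluate(child) for _, child in node["branches"])

# Hypothetical choice: launch a product into an uncertain market, or hold.
tree = {
    "kind": "decision",
    "branches": [
        ("launch", {"kind": "chance", "branches": [
            (0.6, {"kind": "leaf", "value": 100.0}),   # good market
            (0.4, {"kind": "leaf", "value": -50.0}),   # poor market
        ]}),
        ("hold", {"kind": "leaf", "value": 0.0}),
    ],
}
print(evaluate(tree))  # launch EMV = 0.6*100 - 0.4*50 = 40.0
```

The same roll-back logic underpins decision tree software: chance nodes average, decision nodes maximize.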

Drawing a Decision Tree
You start a Decision Tree with a decision that you need to make, and draw a small square to represent this towards the left of a large piece of paper.
From this box draw out lines towards the right for each possible solution, and write that solution along the line. Keep the lines apart as far as possible so that you can expand your thoughts.
At the end of each line, consider the results. If the result of taking that decision is uncertain, draw a small circle.
If the result is another decision that you need to make, draw another square. Squares represent decisions, and circles represent uncertain outcomes.
Write the decision or factor above the square or circle. If you have completed the solution at the end of the line, just leave it blank.
Starting from the new decision squares on your diagram, draw out lines representing the options that you could select. From the circles draw lines representing possible outcomes.
Again make a brief note on the line saying what it means.
Keep on doing this until you have drawn out as many of the possible outcomes and decisions as you can see leading on from the original decisions.

20. Human Reliability Analysis
Human reliability assessment (HRA) is the common name for an assortment of methods and models that are used to predict the occurrence of ‘human errors’. While the origin of HRA is in Probabilistic Safety Assessment (PSA), HRA is increasingly being used on its own, both as a way to assess the risks from ‘human error’ and as a way to reduce system vulnerability. The three principal functions of HRA are:
Identifying what errors can occur (Human Error Identification),
Deciding how likely the errors are to occur (Human Error Quantification), and,
Enhancing human reliability by reducing the given error likelihood (Human Error Reduction), if appropriate.
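As a hedged illustration of the quantification function, one common style (similar in spirit to the SPAR-H method) scales a nominal human error probability (HEP) by performance shaping factor multipliers; the task and values below are assumptions for illustration only:

```python
# Illustrative Human Error Quantification sketch in the style of SPAR-H:
# a nominal human error probability (HEP) is scaled by performance shaping
# factor (PSF) multipliers. All numbers are hypothetical.

def adjusted_hep(nominal_hep, psf_multipliers):
    """Scale the nominal HEP by each PSF multiplier; a probability
    cannot exceed 1.0, so cap the result there."""
    hep = nominal_hep
    for m in psf_multipliers:
        hep *= m
    return min(hep, 1.0)

# Hypothetical diagnosis task: nominal HEP 0.01, degraded by high stress (x2)
# and barely adequate time (x10).
print(adjusted_hep(0.01, [2, 10]))  # 0.2
```

The resulting HEP would then feed into the risk model built in the steps described below, e.g., as a basic event probability in a fault tree.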

Thus anticipating failures of joint human-machine systems requires an underlying model. This should not be a model of human information processing in disguise, but a model of how human performance is determined by – hence reflects – the context or circumstances, i.e., a model of joint system performance rather than of individual human actions.

Human reliability assessment deals with the impact of humans on system performance and can be used to evaluate human error influences on the system. At the risk of stating the obvious, human reliability is very important due to the contributions of humans to the resilience of systems and to the possible adverse consequences of human errors or oversights, especially when the human is a crucial part of today’s large socio-technical systems. A variety of methods exist for human reliability analysis. These break down into two basic classes: those based on probabilistic risk assessment (PRA) and those based on a cognitive theory of control. In 2009, the Health and Safety Laboratory compiled a report for the Health and Safety Executive (HSE) outlining HRA methods for review. They identified 35 tools that constituted true HRA techniques and that could be used effectively in the context of health and safety management.

Conducting an HRA requires people with expertise in the chosen methods. The following steps prepare a risk model based on HRA:
1.   Understand the actions being investigated
2.   Use a structured approach to investigate and represent the actions (task analysis)
3.   Consider the level of detail needed (compare with description detail of available data)
4.   Understand the failure criteria
5.   Select an appropriate technique(s)
6.   Represent the identified errors in a risk model

Application of HRA
i). Eliminate Error Occurrence
This is the first preference, where design features known to be a source of human error are eliminated (e.g., lack of feedback, lack of differentiation, inconsistent or unnatural mappings). Design choices available for error elimination include:
Replacement of error-inducing design features (e.g., physical device separation, physical guards, application of validity and range checks)
Restructuring of task where error prevalent behaviour is no longer performed (e.g., by information filtering, only the information needed for the task is provided).
Automate to change the role of human involvement in support of task performance.
                                                                                                
ii). Reduce Error Occurrence

Consider this approach, if complete error elimination is not possible or feasible through design choices. Design features which can reduce error occurrence include:
Identification (e.g., device labeling)
Constraints (i.e., build in constraints to limit operation to acceptable ranges)
Coding (i.e., aid in choice differentiation and selection)
Consistency feedback (i.e., convey device and system state directly in the interface)
Predictability (i.e., design system responses so that operators can associate specific control actions with system response)

iii). Eliminate Error Consequence
The third approach is to eliminate error consequences, where there are three categories of design features that reflect the components of the consequence prevention strategy:
A. Error detection design features (to promote detection prior to consequence/ occurrence):
Feedback (i.e., status information in relation to operational goals and potential side-effects of an action)
Alert of Unacceptable Device States (e.g., visual/auditory feedback of off-normal or unacceptable device states)
Confirmation (i.e., support of self-checking and independent verification practices)
Prediction (i.e., providing information on the outcome of an action prior to its implementation, or with sufficient time for correction)
B. Error recovery design features (to enable recovery prior to consequence/ occurrence):
Undo (e.g., facilities for reversing recent control actions to promote error recovery)
Guidance (i.e., alternative forms of guidance for cases where reversing a recent control action is not the preferred action)
C. Consequence Prevention Design Features
Interlocks
Margins and Delays (i.e., these features provide more time before an unacceptable consequence is realized, increasing the chances of error detection and recovery before the consequence occurs)
Fail Safe Features

iv). Reduce Error Consequence
If errors and consequences cannot be completely eliminated, consider measures that enable consequence reduction. This may be achieved through application of additional design features that allow operators or automation to recognize the occurrence of an error consequence, and to take action to mitigate the consequences.

Additional design features include:
Margins (i.e., apply larger design margins to allow some consequences to be accommodated by normal system function and capacities).
Engineered Mitigating Systems (e.g., automatic special safety systems actions, such as CANDU Automatic Stepback and Setback).
Human Intervention (i.e., operations team can readily adapt to both predefined and undefined operating situations).
Response Teams (i.e., organizational structure is prepared and coordinated to deal with the predefined consequences).
Consequence Prediction (e.g., aids can assist operations staff in predicting the extent of consequences of operating actions and assist in selection and execution of mitigating actions).
Backup Replacement Function (i.e., provision of equipment and/or human intervention to mitigate consequences).

21. Bow Tie Analysis

Bow-tie analysis is a simple diagrammatic way to display the pathways of a risk showing a range of possible causes and consequences. It is used in situations when a complex fault tree analysis is not justified or to ensure that there is a barrier or control for each of the possible failure pathways. The Bow-tie analysis starts with the risk at the “knot” of the tie, and then moves to the left to identify and describe the events or circumstances that may cause the risk event to occur, paying particular attention to root causes. Once those causes have been identified, the analysis then identifies preventive measures that could be implemented. At this point there could be an evaluation of the actual preventive measures that the organization has in place to determine whether additional measures should be implemented. The analysis then moves to the right to look at the potential consequences that would result after the risk event happens, and the plans the organization either has or should have in place to minimize the negative effects of the risk.

Bow-tie Diagram Construction
1. Define the Hazard and the Top Event, which is the initial consequence
"What happens when the hazard is released?"

2. Identify the Threats, which are the causes of the Top Event
"What causes the release of the hazard?"
"How can control be lost?"

3. Identify the existing Protection Barriers for each Threat
They prevent the occurrence of the Top Event
They can be independent or dependent
"How can the controls fail?"
"How can their effectiveness be compromised?"

4. Identify the Escalation Factors for each Barrier
Factors that make the Barrier fail
"How can we avoid the hazard being released?"
"How can we keep control?"

5. Identify the Escalation Factor Controls for each Barrier
Factors that prevent or minimize the possibility that the Barrier or the Recovery Measures become ineffective
"How do we ensure that the controls will not fail?"

6. Identify the Consequences
The Top Event could have several consequences

7. Identify the Recovery Measures
Measures that limit the severity of the consequences once the Top Event has occurred
"How can we limit the severity of the event?"
"How can we minimize the effects?"

8. Identify the Escalation Factors and Escalation Factor Controls for each Recovery Measure

9. For each Barrier, Recovery Measure and Escalation Factor Control, identify the Critical Safety Tasks

Critical Safety Tasks
Tasks that prevent or minimize the possibility that a Barrier, an Escalation Factor Control or a Recovery Measure fails or becomes ineffective. They span project engineering, operation, maintenance and management.
"What tasks can be performed to ensure that the control is working?"
"How can we ensure that these tasks are done?"
"Who does these tasks?"
"How do you know when to do the tasks?"
"How do you know what to do?"
"Is there a procedure, checklist or instruction?"
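The construction steps above can be captured in a simple data structure; a minimal sketch, with all hazard, barrier and measure names invented for illustration:

```python
# Minimal bow-tie record (hypothetical names): threats and their preventive
# barriers sit on the left of the Top Event, consequences and their recovery
# measures on the right.

bow_tie = {
    "hazard": "flammable solvent storage",
    "top_event": "loss of containment",
    "threats": [
        {"threat": "tank overfill",
         "barriers": ["high-level alarm", "automatic inlet shutoff"]},
        {"threat": "corrosion",
         "barriers": ["inspection programme", "corrosion allowance"]},
    ],
    "consequences": [
        {"consequence": "pool fire",
         "recovery_measures": ["containment dike", "foam system"]},
    ],
}

def unprotected_threats(bt):
    """Flag any threat line that has no preventive barrier at all."""
    return [t["threat"] for t in bt["threats"] if not t["barriers"]]

print(unprotected_threats(bow_tie))  # []
```

A check like this reflects the purpose stated earlier: ensuring there is a barrier or control on each possible failure pathway.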

22. Reliability Centered Maintenance (RCM)
Reliability centered maintenance (RCM) is a corporate-level maintenance strategy that is implemented to optimize the maintenance program of a company or facility. RCM is the process of determining the most effective maintenance approach, where RCM philosophy employs Preventive Maintenance (PM), Predictive Maintenance (PdM), Real-time Monitoring (RTM), Run-to-Failure (RTF- also called reactive maintenance) and Proactive Maintenance techniques in an integrated manner to increase the probability that a machine or component will function in the required manner over its design life cycle with a minimum of maintenance. The goal of the philosophy is to provide the stated function of the facility, with the required reliability and availability at the lowest cost. RCM requires that maintenance decisions be based on maintenance requirements supported by sound technical and economic justification.

RCM allows you to identify applicable and effective preventive maintenance requirements for equipment “…in accordance with the safety, operational and economic consequences of identifiable failures, and the degradation mechanism responsible for those failures”. RCM uses a failure mode, effect and criticality analysis (FMECA) type of risk assessment that requires a specific approach to analysis in this context. From a quality management standpoint, it’s worth being aware that RCM identifies required functions and performance standards and failures of equipment and components that can interrupt those functions. The final result of an RCM program is the implementation of a specific maintenance strategy on each of the assets of the facility. The maintenance strategies are optimized so that the productivity of the plant is maintained using cost-effective maintenance techniques.
There are four principles that are critical for a reliability centered maintenance program.
1. The primary objective is to preserve system function
2. Identify failure modes that can affect the system function
3. Prioritize the failure modes
4. Select applicable and effective tasks to control the failure modes

An effective reliability centered maintenance implementation examines the facility as a series of functional systems, each of which has inputs and outputs contributing to the success of the facility. It is the reliability, rather than the functionality, of these systems that is considered. Seven basic questions need to be asked for each asset:
1. What are the functions and desired performance standards of each asset?
2. How can each asset fail to fulfill its functions?
3. What are the failure modes for each functional failure?
4. What causes each of the failure modes?
5. What are the consequences of each failure?
6. What can and/or should be done to predict or prevent each failure?
7. What should be done if a suitable proactive task cannot be determined?
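A simplified sketch of the task-selection step (questions 6 and 7) might map each failure mode's consequence category and detectability to a maintenance strategy; the rules below are an illustrative simplification, not the full RCM decision diagram:

```python
# Illustrative RCM task-selection sketch: pick a maintenance approach for one
# failure mode from its consequence category and whether its onset can be
# detected. Categories and rules are simplified assumptions for illustration.

def select_strategy(consequence, detectable_onset):
    """Map one failure mode to a maintenance strategy."""
    if consequence in ("safety", "environmental"):
        # safety/environmental consequences must be prevented outright,
        # or the design changed
        return "preventive maintenance or redesign"
    if detectable_onset:
        # degradation can be monitored, so maintain on condition
        return "predictive (condition-based) maintenance"
    if consequence == "operational":
        # production impact with no detectable onset: restore on a schedule
        return "scheduled restoration/replacement"
    # minor economic consequence only: repair after failure
    return "run to failure"

print(select_strategy("economic", detectable_onset=False))  # run to failure
print(select_strategy("safety", detectable_onset=True))
```

Applying such logic across every failure mode of every asset yields the optimized, asset-specific maintenance strategies described above.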

23. Sneak Circuit Analysis
Sneak Circuit Analysis (SCA) is used in safety-critical systems to identify sneak (or hidden) paths in electronic and electro-mechanical systems that may cause unwanted action or inhibit desired functions. Sneak analysis can locate problems in both hardware and software using any technology. Sneak analysis tools can integrate several analyses, such as fault trees, failure mode and effects analysis (FMEA) and reliability estimates, into a single analysis, saving time and project expense. The technique helps in identifying design errors and works best when applied in conjunction with HAZOP. It is very good for dealing with systems that have multiple states, such as batch and semi-batch plants.

The analysis is based on identification of designed-in inadvertent modes of operation and is not based on failed equipment or software. SCA is most applicable to circuits that can cause irreversible events. These include:
a. Systems that control or perform active tasks or functions
b. Systems that control electrical power and its distribution
c. Embedded code which controls and times system functions

The SCA process differs depending on whether it is applied to electrical circuits, process plants, mechanical equipment or software technology, and the method used is dependent on establishing correct network trees. When evaluating sneak circuits, the problem causing path may consist of hardware, software, operator actions, or combinations of these elements. Sneak circuits are not the result of hardware failure but are latent conditions, inadvertently designed into the system, coded into the software program, or triggered by human error. Four categories of sneak circuits are:
1. Sneak Paths 
Unexpected paths along which current, energy, or logical sequence flows in an unintended direction
2. Sneak Timing 
Events occurring in an unexpected or conflicting sequence
3. Sneak Indications 
Ambiguous or false displays of system operating conditions that may cause the system or an operator to take an undesired action
4. Sneak Labels 
Incorrect or imprecise labeling of system functions (e.g., system inputs, controls, display buses) that may cause an operator to apply an incorrect stimulus to the system
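As a toy illustration of the sneak-path category only (real SCA relies on network trees and topological clue lists), a circuit can be modeled as a directed graph and searched for routes that bypass the intended control; the node names are invented:

```python
# Toy sneak-path sketch: model the circuit as a directed graph and enumerate
# every simple path from the power source to the load. Any path other than
# the intended one is a candidate sneak path. Node names are hypothetical.

from collections import deque

def all_paths(graph, start, goal):
    """Breadth-first enumeration of simple paths from start to goal."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            paths.append(path)
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no revisited nodes)
                queue.append(path + [nxt])
    return paths

# Hypothetical wiring: power should reach the lamp only via the switch, but a
# shared return wire inadvertently creates a second, unintended route.
circuit = {
    "power": ["switch", "shared_wire"],
    "switch": ["lamp"],
    "shared_wire": ["lamp"],
}
intended = ["power", "switch", "lamp"]
sneaks = [p for p in all_paths(circuit, "power", "lamp") if p != intended]
print(sneaks)  # [['power', 'shared_wire', 'lamp']]
```

This mirrors the definition above: the sneak path is a latent design condition, present without any component failing.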
