Systems Failures & System Reliability Flashcards
What is the meaning of ‘System’?
A system is a group of interacting, interrelated, or interdependent elements forming a complex whole
A set or arrangement of things so related or connected as to form a unity or organic whole
Explain the meaning of a ‘Holistic Approach’ to system failure analysis
A holistic approach requires looking at the behaviour of the total system rather than the isolated workings of individual components.
Holistic means trying to understand all the interactions between the separate components as they work together as a whole.
Explain the meaning of a ‘Reductionist Approach’ to system failure analysis
In this approach the system is divided into its components, which are analysed individually to identify system or subsystem failures, e.g. by a HAZOPS or FMEA study.
Explain what is meant by Hazard and Operability Studies (HAZOPS)
HAZOPS is a powerful tool, developed primarily for use on chemical process plants but now with wide applicability.
It employs a methodical approach using specialists guided by a formal system.
The process critically examines sub-components of the process system using guide words such as ‘High, Low, More, Less’ applied to key parameters such as pressure, temperature, flow etc.
The aim is to identify deviations from design intent that could have critical consequences and establish necessary safeguards at the design stage.
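As a rough illustration of how guide words and parameters combine, the sketch below pairs the guide words and parameters named above into candidate deviations for a study team to review. The function name is hypothetical, and in a real study not every pairing is meaningful, so the list is only a starting prompt.

```python
from itertools import product

# Guide words and parameters taken from the card above.
GUIDE_WORDS = ["High", "Low", "More", "Less"]
PARAMETERS = ["pressure", "temperature", "flow"]


def candidate_deviations(guide_words, parameters):
    """Pair every guide word with every parameter (hypothetical helper)."""
    return [f"{word} {parameter}" for parameter, word in product(parameters, guide_words)]


deviations = candidate_deviations(GUIDE_WORDS, PARAMETERS)
for deviation in deviations:
    print(deviation)
```

Each candidate deviation would then be examined for possible causes, consequences and safeguards.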
Explain what is meant by Failure Mode and Effects Analysis (FMEA)
FMEA is a simple but effective tool to improve reliability.
The purpose of the analysis is to explore the effect of failures or malfunctions of individual components within a system.
Consequently, the system needs to be broken down into sub-components which can then be analysed for failure.
For each sub-component, we examine the possible failure modes, the effect of each failure, and its consequences in terms of severity and likelihood of detection; each failure mode can then be allocated a risk priority code.
This analytical approach allows us to focus on the critical failure modes where we need to improve reliability.
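One common way to produce a risk priority code, assumed here rather than stated in the card, is to score severity, occurrence and detection from 1 to 10 and multiply them into a risk priority number (RPN). The components, failure modes and scores below are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certain to be detected) to 10 (unlikely to be detected)

    @property
    def rpn(self) -> int:
        # Assumed convention: RPN = severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and scores.
modes = [
    FailureMode("pump", "seal leak", 7, 4, 3),
    FailureMode("relief valve", "stuck closed", 9, 2, 8),
    FailureMode("level sensor", "reads low", 5, 5, 2),
]

# Highest RPN first: these are the failure modes to address first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(m.component, m.mode, m.rpn)
```

Sorting by RPN is what lets the analysis focus on the critical failure modes.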
Explain what is meant by Fault Tree Analysis (FTA)
FTA acknowledges the fact that most accidents are multi-causal, and employs analytical techniques to trace the events that could contribute.
The fault tree is a logic diagram which traces all the branches of events that could contribute to an accident or failure.
Consequently, we need to be able to identify the sub-elements that have a bearing on the final event, e.g. for an explosion we need a flammable atmosphere, a source of ignition and enough oxygen.
We then examine each of these sub-components to identify how they could arise.
We can use quantified techniques, if necessary, to establish the critical events where reliability needs to be improved and introduce measures which will make the original accident or failure less likely.
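A minimal sketch of the quantified step, assuming all basic events are independent: an AND gate multiplies the probabilities of its inputs, and an OR gate gives the probability that at least one input occurs. The basic events and their probabilities are hypothetical; only the explosion example itself comes from the card.

```python
from math import prod


def and_gate(probabilities):
    """All input events must occur (independence assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """At least one input event occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities.
flammable_atmosphere = or_gate([0.01, 0.02])  # e.g. pipe leak OR spill
ignition_source = or_gate([0.05, 0.10])       # e.g. static discharge OR hot work
enough_oxygen = 1.0                           # normally present in air

# Top event: an explosion needs all three conditions together.
explosion = and_gate([flammable_atmosphere, ignition_source, enough_oxygen])
print(explosion)
```

Changing one basic-event probability at a time shows which events have the most influence on the top event.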
Explain what is meant by Event Tree Analysis (ETA)
ETA starts with a primary event, then develops the resulting sequence of events that describe potential accidents, examining both the success and failure of safeguards as the accident sequence progresses.
Event trees provide a methodical way of recording accident sequences and defining the relationships between initiating events and subsequent events within the system under study.
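The sketch below enumerates every success/failure branch for a primary event followed by a sequence of safeguards. It assumes the safeguards act independently, and the frequency, safeguard names and probabilities are hypothetical.

```python
from itertools import product


def event_tree(initiating_frequency, safeguards):
    """Return {branch: frequency}, where branch is a tuple of True/False
    (True = that safeguard worked). safeguards is a list of
    (name, probability_of_success) pairs; independence is assumed."""
    outcomes = {}
    for states in product([True, False], repeat=len(safeguards)):
        frequency = initiating_frequency
        for (_name, p_success), works in zip(safeguards, states):
            frequency *= p_success if works else (1.0 - p_success)
        outcomes[states] = frequency
    return outcomes


# Hypothetical example: a fire starts 0.1 times per year.
safeguards = [("alarm", 0.9), ("sprinkler", 0.8)]
outcomes = event_tree(0.1, safeguards)
for states, frequency in outcomes.items():
    print(states, frequency)
```

The branch frequencies always add up to the frequency of the primary event, which is a useful check on the tree.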
How may each element of a system be connected?
In Series i.e. one after the other
In Parallel i.e. side by side
As a combination of both, which is quite common
How does a parallel system work?
In a parallel system, the failure of one component will not stop the system functioning.
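In reliability terms, a parallel system fails only if every component fails. Assuming independent components, a minimal sketch of the calculation is shown below (the function name and figures are illustrative).

```python
from math import prod


def parallel_reliability(reliabilities):
    """System works if at least one component works (independence assumed)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Two hypothetical components, each 90% reliable.
two_in_parallel = parallel_reliability([0.9, 0.9])
print(two_in_parallel)  # approximately 0.99
```

The system is more reliable than either component on its own.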
How does a series system work?
In a series system, the components are joined to each other such that all must function for the system to operate.
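Because every component must work, the reliability of a series system is the product of the component reliabilities, assuming the components fail independently. A minimal sketch with illustrative figures:

```python
from math import prod


def series_reliability(reliabilities):
    """System works only if every component works (independence assumed)."""
    return prod(reliabilities)


# Two hypothetical components, each 90% reliable.
two_in_series = series_reliability([0.9, 0.9])
print(two_in_series)  # approximately 0.81
```

The system is less reliable than its weakest component, and adding more components in series lowers reliability further.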
How does a mixed system work?
In practice, systems are rarely composed solely of series arrangements or solely of parallel ones; most are mixed.
Less reliable components are duplicated in parallel to increase the reliability of that part of the system, while components with good reliability are left in series.
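A mixed system can be evaluated by reducing each parallel group to a single equivalent reliability and then treating the result as a series chain. The sketch below assumes independent components; the pump, valve and controller figures are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """All components must work (independence assumed)."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one component must work (independence assumed)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


pump, valve, controller = 0.80, 0.99, 0.98  # hypothetical reliabilities

# Everything in series, with a single unreliable pump.
all_series = series_reliability([pump, valve, controller])

# Mixed: the pump is duplicated in parallel, the rest stay in series.
pump_pair = parallel_reliability([pump, pump])
mixed = series_reliability([pump_pair, valve, controller])

print(all_series, mixed)  # approximately 0.776 and 0.931
```

Duplicating only the weakest component gives most of the benefit without the cost of duplicating the whole system.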
Describe the methodology for Human Reliability Analysis/Assessment
Determine the scope of assessment
Gather information
Describe the tasks
Identify any potential human errors
Estimate overall human error probabilities for the task
Give the result to the system analyst to incorporate into the overall risk assessment of the system, and consider whether human error has a significant impact on the system
Develop control measures
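The estimation step above can be sketched very simply if we assume the task fails when any one step is performed wrongly and that step errors are independent. Both assumptions are simplifications, and the step names and human error probabilities (HEPs) below are hypothetical.

```python
from math import prod


def task_error_probability(step_heps):
    """Probability that at least one step is performed incorrectly
    (independent step errors assumed)."""
    return 1.0 - prod(1.0 - hep for hep in step_heps.values())


# Hypothetical task description with an assumed HEP for each step.
step_heps = {
    "read gauge": 0.003,
    "open valve": 0.001,
    "record reading": 0.01,
}

overall = task_error_probability(step_heps)
# The step contributing most is the first candidate for control measures.
dominant = max(step_heps, key=step_heps.get)
print(overall, dominant)
```

The overall figure is what would be passed to the system analyst, while the dominant step shows where control measures are likely to have the most effect.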
Explain methods for improving system reliability
USE OF RELIABLE COMPONENTS
A system is only as reliable as the components that make it up. It is important that quality checks are carried out on the parts to ensure they meet legal specifications as well as any additional specified requirements. Suppliers can be asked to provide details of their quality assurance procedures and testing regimes.
QUALITY ASSURANCE
Materials will be delivered to the factory for processing into the finished product. Checks should be conducted and recorded at each phase, and a management process introduced; this is called quality control. The system will probably be one based on the BS ISO 9001 series of documents detailing quality assurance.
PARALLEL REDUNDANCY
Additional components are added in parallel so that if one fails, the other will keep the system going.
STANDBY SYSTEMS
In order to prevent a system failure, a standby system can be installed so that should part of the system or a component stop working, then an alternative system automatically steps in to continue operation. This is invaluable where failure of the system could affect safety.
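A standby arrangement only helps if the changeover itself works. As a simplified on-demand sketch, assuming independent failures, the system operates if the primary works, or if the primary fails and both the changeover and the standby work. The function name and figures are hypothetical.

```python
def standby_reliability(primary, standby, changeover=1.0):
    """Primary works, OR primary fails and changeover and standby both work
    (independence assumed)."""
    return primary + (1.0 - primary) * changeover * standby


perfect_switch = standby_reliability(0.9, 0.9)
imperfect_switch = standby_reliability(0.9, 0.9, changeover=0.95)
print(perfect_switch, imperfect_switch)  # approximately 0.99 and 0.9855
```

An unreliable changeover mechanism reduces the benefit of the standby, so it needs the same attention as the standby unit itself.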
MINIMIZING FAILURES TO DANGER
When a system does fail, it is important that the failure does not result in a hazardous situation. For this reason, it is vital that systems fail to safety. This can be achieved by good design, e.g. ensuring that dangerous machinery has an automatic cut-out that operates as soon as a hazardous component fails.
PLANNED PREVENTIVE MAINTENANCE
Planned preventive maintenance improves safety and plant integrity as well as reliability. It may be conducted at specific intervals to prevent failure of the system.
MINIMIZING HUMAN ERROR
Making sure the right person is doing the right job
Making sure the individual has adequate training and instruction
Making sure the individual receives appropriate rest breaks
Making sure the man-machine interface is ergonomically suitable
Making sure the working environment is comfortable, e.g. in terms of noise, lighting and heating