Managing Infrastructure and Systems Flashcards
Why do we want components and systems to be reliable?
- Safety reasons
- Financial reasons (maintenance and delays are costly)
What is the purpose of system reliability analysis?
To evaluate the performance of a system using known
information about the system components and structure.
Why perform reliability analyses?
To assess the adequacy of engineering systems
– at the design stage or to assess upgrades.
To satisfy regulatory requirements
– demonstrate that a system is fit for purpose.
To support decision making, e.g.:
– to find a balance between safety and cost,
– to determine optimal maintenance strategies.
What can happen without reliability analyses?
• Decisions can be subjective and based on biased information.
• Decisions can be inconsistent and based on qualitative
measures or prejudices.
• Available finances can be used inefficiently.
What is the definition for Reliability?
Reliability, Rsys(t) : the probability that the system failure mode does not occur from 0 to time t, given that the system worked at time 0.
What is the definition for Unreliability?
Unreliability, Fsys(t) : the probability that the system failure mode occurs at least once from 0 to time t given that it worked at time 0.
What is the definition for Availability?
Availability, Asys(t) : the probability that the system is operational at
time t, given that it was operational at time 0.
What is the definition for Unavailability?
Unavailability, Qsys(t) : the probability that the system failure mode
exists at time t
What is the definition for Failure rate?
Failure rate : the rate at which the system failure mode occurs.
What is the definition of a path set?
A path set is a list of components, such that if they all work then the system is also in the working state.
What is the definition of a minimal path set?
A path set such that if any component is removed from it, the remaining set is no longer a path set (the system is no longer guaranteed to function).
What is the definition of a cut set?
A list of components such that if they all fail, the system is also in the failed state.
What is the definition of a minimal cut set?
A cut set such that if any component is removed from it, the remaining set is no longer a cut set (the system is no longer guaranteed to fail).
Describe what goes inside the connectivity matrix.
cij = k, where k is the number of edges from node i to node j (0 on the diagonal).
Describe what goes inside the connection matrix.
1 for diagonal terms (a node is certain to connect to itself)
0 if there is no connection
A, where A is the component linking node i to node j
Which 3 methods could you use to calculate the unreliability of an RBD that cannot be represented as a combination of series and parallel networks?
Solution Methods:
– Key Element Method.
– Conversion from Deltas to Stars.
– Minimal Path Set / Cut Set Evaluation.
What is the expression for…
a) system success using key element method
b) system failure using key element method
… given that E is the key element
a) RSYS = (RX)(RE) + (RY)(QE)
b) QSYS = (QX)(RE) + (QY)(QE)
where X is the system with E assumed to work and Y is the system with E assumed to have failed.
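As a sketch of the key element method, the Python below applies it to a hypothetical bridge network (links A and B leave the source, C and D enter the sink, and E bridges the two middle nodes), taking E as the key element. X is the network with E assumed to work and Y the network with E assumed failed. Independent components are assumed, and the function names are illustrative only.

```python
def series(*rs):
    """Reliability of independent components in series: product of the R values."""
    p = 1.0
    for r in rs:
        p *= r
    return p


def parallel(*rs):
    """Reliability of independent components in parallel: 1 - product of the Q values."""
    q = 1.0
    for r in rs:
        q *= 1.0 - r
    return 1.0 - q


def bridge_reliability(ra, rb, rc, rd, re):
    """Key element method on a hypothetical bridge network, with E as the key element."""
    # X: E works, so the two middle nodes merge -> (A || B) in series with (C || D)
    rx = series(parallel(ra, rb), parallel(rc, rd))
    # Y: E has failed -> path A-C in parallel with path B-D
    ry = parallel(series(ra, rc), series(rb, rd))
    qe = 1.0 - re
    # RSYS = RX * RE + RY * QE
    return rx * re + ry * qe


r = 0.9
print(round(bridge_reliability(r, r, r, r, r), 5))  # 0.97848
```

For identical components this reproduces the known bridge polynomial 2R^2 + 2R^3 - 5R^4 + 2R^5, and QSYS follows as 1 - RSYS.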
Give three matrix methods you can use to find the minimal path sets and state which matrix each method relates to
General Algorithm - using the connectivity matrix
Node Removal - using the connection matrix
Matrix multiplication - using the connection matrix
Give two uses of RBD analysis
- Helps to find points of failure and identifies what is making the system unreliable
- Gives a visual representation of a system, so reliability can be assessed qualitatively without numbers
How could you improve a system’s reliability? (Once completing RBD analysis)
- Reduce the number of components in series
- Increase redundancy
- Upgrade components to more reliable ones
- Improve the accuracy of the reliability data for the components
How can you improve RBD analysis?
- RBDs are often simplified. To improve analysis, create a more accurate model
- Minimise assumptions and make sure any you do make are reasonable
Why might an RBD analysis not be accurate?
- RBDs don’t take into account the effect of not having all 3 components of a 2-out-of-3 system working (i.e. if 1 fails the system might still work, but with less power)
- The analysis doesn’t take into account time variations in demand
What is FMEA?
FMEA is a powerful design tool that analyses each potential failure mode in the system to examine the
effects on the system
What is FMECA?
When FMEA is extended to classify each potential failure effect according to its severity, the method is known as FMECA (Failure Mode, Effects and Criticality Analysis).
Which stage in a product’s life cycle is it best to carry out FMEA?
Design stage
What are the advantages of FMEA?
- Good data gathering process on existing systems
- Rigorous
- Systematic
What are the disadvantages of FMEA?
- Time-consuming
- Expensive
What type of analysis is FMEA? And what does it identify?
Qualitative analysis that identifies:
– potential system failure modes,
– the causes of the failure modes
– the effects on the system operation associated with the failure modes’ occurrence.
What are the two types of FMEA?
1) Product FMEA
– Analyses the product and how failure modes affect its operation.
E.g. Determine causes and effects of fire protection system failure
2) Process FMEA
– Analyses the process by which the product is built, maintained and
used.
– Examines how failures in the process affect product operation.
E.g. Determine causes and effects of failures while maintaining a fire protection system
What are the two approaches for FMEA?
- Functional (top-down):
– System decomposed to sub-assemblies (sub-system, modules, components)
– Level of decomposition depends on the information available and the study objective
– Consider effects of loss of inputs and sub-assembly failures
– Used in the early design stages
- Hardware (bottom-up):
– Detailed system breakdown
– Consider each individual component and effects of its failure modes
– Used in detailed design
What are the eight steps involved in FMEA procedure?
- Define system:
– Components, boundaries and interfaces
– All modes of operation
– Environment profile
– Mission / phases and times in each phase
– Mission / phase objectives
- Construct functional block diagrams:
– Functional connection between sub-systems / components
– Hierarchy level at which the analysis is done
- Note assumptions:
– System and sub-system boundaries
– Failure modes/failure rates, etc.
- Define system failure modes
- List component (sub-system) failure modes:
– Review failure information prior to commencing study (failure modes can be found by investigation of failure data)
- Complete FMEA worksheets:
– Analyse the effect at LOCAL and SYSTEM level for each component (sub-system) failure mode
– Assume worst potential consequences
- Review worksheets to determine the reliability critical components
- Make recommendations for design improvements and further work
What do the symbols stand for under the ‘Failure Mode Criticality Number’ section of the formula sheet?
λo (lambda o) = failure mode rate
λp (lambda p) = failure rate
α (alpha) = proportion of failures in the specified failure mode
β (beta) = probability that the expected failure effect will result
Cm = failure mode criticality number
t = mission or phase time period
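The formula sheet itself is not reproduced here. Assuming it uses the usual MIL-STD-1629A form Cm = β α λp t (with the failure mode rate taken as λo = α λp), a minimal Python sketch of the calculation is below; the input values are illustrative only.

```python
def failure_mode_criticality(beta, alpha, lambda_p, t):
    """Failure mode criticality number, ASSUMING Cm = beta * alpha * lambda_p * t."""
    lambda_o = alpha * lambda_p  # assumed: rate of failures in this particular failure mode
    return beta * lambda_o * t


# Illustrative values: beta = 0.5, alpha = 0.3, lambda_p = 2e-5 per hour, t = 1000 hours
cm = failure_mode_criticality(0.5, 0.3, 2e-5, 1000.0)
print(cm)
```

Computing Cm for each failure mode of each component lets the modes be ranked, which is what the worksheet review step uses to pick out the reliability critical components.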
What does a vote gate mean in a fault tree?
Output event occurs if at least k of the n input events occur.
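For independent input events, the probability of the vote gate output can be found by summing over every combination of inputs in which at least k occur. A brute-force Python sketch (function name illustrative; practical only for small n):

```python
from itertools import combinations


def vote_gate_probability(qs, k):
    """P(at least k of the n independent input events occur); qs[i] = P(input i occurs)."""
    n = len(qs)
    total = 0.0
    for m in range(k, n + 1):
        # Every way of choosing exactly m occurring inputs
        for occurring in combinations(range(n), m):
            p = 1.0
            for i in range(n):
                p *= qs[i] if i in occurring else 1.0 - qs[i]
            total += p
    return total


print(round(vote_gate_probability([0.1, 0.1, 0.1], 2), 6))  # 0.028
```

With k = 1 the gate behaves like an OR gate and with k = n like an AND gate.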
What does a Priority AND gate mean in a fault tree?
Output event occurs if all input events occur in sequential order from left to right.
What does a NOT gate mean in a fault tree?
Output event occurs if input event does not occur.
What does an intermediate event represent in a fault tree?
A system or component event that results from a combination of other events and is developed further through a logic gate.
What does a basic event represent in a fault tree?
Basic event for which failure and repair data is available.
Usually represents a component failure.
What does a house event represent in a fault tree?
Represents definitely occurring or definitely not occurring events.
What does a transfer event represent in a fault tree?
Indicates that this part of the fault tree is developed in a different part of the diagram or on a different page.
What do dual fault trees allow you to find?
Minimal path sets (the minimal cut sets of the dual fault tree are the minimal path sets of the original system).
What are the three possible maintenance policies?
- No repair
- Repair when failure is revealed (unscheduled maintenance)
- Repair when failure is unrevealed and must be discovered by inspection (scheduled maintenance)
What is the symbol for:
a) Failure rate (hazard rate)
b) Repair rate
c) Mean time to failure (MTTF)
d) Mean time to repair (MTTR)
e) Inspection interval
a) λ (lambda)
b) ν (nu)
c) μ (mu)
d) τ (tau)
e) θ (theta)
What does steady state mean?
The long-term condition as t tends to infinity, when probabilities such as Q(t) settle to constant values.
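A minimal sketch of how the three maintenance policies change unavailability, assuming constant failure and repair rates and the standard expressions usually quoted for them: Q(t) = 1 - e^(-λt) with no repair; Q(t) = λ/(λ+ν)(1 - e^(-(λ+ν)t)) for revealed failures, which tends to λ/(λ+ν) at steady state; and a mean unavailability of about λ(θ/2 + τ) for unrevealed failures found by inspection every θ. The parameter values and function names are illustrative only.

```python
import math


def q_no_repair(lam, t):
    """No repair: Q(t) = 1 - exp(-lambda * t), which tends to 1 as t -> infinity."""
    return 1.0 - math.exp(-lam * t)


def q_revealed(lam, nu, t):
    """Revealed failures repaired at constant rate nu (nu = 1 / MTTR)."""
    return lam / (lam + nu) * (1.0 - math.exp(-(lam + nu) * t))


def q_revealed_steady(lam, nu):
    """Steady state (t -> infinity) of q_revealed."""
    return lam / (lam + nu)


def q_unrevealed_mean(lam, theta, tau):
    """Approximate mean unavailability for unrevealed failures: found at inspections
    every theta, then repaired with mean time tau (reasonable when lam * theta is small)."""
    return lam * (theta / 2.0 + tau)


lam, tau, theta = 1e-4, 10.0, 720.0  # per hour, hours, hours (illustrative)
nu = 1.0 / tau
print(q_revealed(lam, nu, 100.0), q_revealed_steady(lam, nu))
print(q_unrevealed_mean(lam, theta, tau))
```

With no repair Q(t) simply climbs towards 1, so only the repairable policies have a useful steady state value.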
What are the drawbacks when using fault trees to calculate top event probability?
Using the inclusion-exclusion expansion to calculate the exact top event probability for large fault trees is not practical. Even for fast modern digital computers a calculation involving many cut sets can take a great deal of processing time.
How do the approximations to the top event probability compare to each other?
Qsys ≤ minimal cut set upper bound ≤ rare event approximation (upper bound)
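To see the ordering, the sketch below compares the exact inclusion-exclusion result with the minimal cut set upper bound, 1 - Π(1 - P(Ci)), and the rare event approximation, ΣP(Ci), for a hypothetical 2-out-of-3 system with independent basic events. For these values the three results are approximately 0.028, 0.029701 and 0.03. The exact expansion needs 2^N - 1 terms for N cut sets, which is why it becomes impractical for large trees.

```python
from itertools import combinations


def cut_set_prob(cut_set, q):
    """Probability that every basic event in the set occurs (independent events)."""
    p = 1.0
    for event in cut_set:
        p *= q[event]
    return p


def exact_inclusion_exclusion(cut_sets, q):
    """Exact Qsys: alternating sum over all intersections of minimal cut sets."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1.0 if k % 2 else -1.0
        for combo in combinations(cut_sets, k):
            # All cut sets in combo occur <=> every event in their union occurs
            total += sign * cut_set_prob(frozenset().union(*combo), q)
    return total


def min_cut_upper_bound(cut_sets, q):
    prod = 1.0
    for cs in cut_sets:
        prod *= 1.0 - cut_set_prob(cs, q)
    return 1.0 - prod


def rare_event(cut_sets, q):
    return sum(cut_set_prob(cs, q) for cs in cut_sets)


cut_sets = [frozenset("AB"), frozenset("AC"), frozenset("BC")]
q = {"A": 0.1, "B": 0.1, "C": 0.1}
print(exact_inclusion_exclusion(cut_sets, q), min_cut_upper_bound(cut_sets, q), rare_event(cut_sets, q))
```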
What do importance measures tell us?
Indicate, in some sense, the contribution each element of the system makes to the system failure event.
What is the definition of ‘Critical State’?
A critical system state for component i is a state for the remaining n-1 components such that failure of component i causes the system to go from a working to a failed state.
Give an example of a deterministic importance measure.
Structural Importance Measure
Give examples of probabilistic availability importance measures.
Birnbaum’s Measure/Criticality Function
Criticality Measure of Importance
Fussell-Vesely Measure of Importance
Fussell-Vesely Measure of Cut Set Importance
Give examples of probabilistic reliability importance measures.
Barlow-Proschan Measure of Initiator Importance
Sequential Contributory Measure of Enabler Importance
How do you calculate the structural importance measure?
(number of critical states for component i) / (total number of states for the n-1 remaining components)
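The count can be done by brute force: for each of the 2^(n-1) states of the other components, check whether component i failing takes the system from working to failed. A Python sketch, assuming the system is given as a structure function phi (hypothetical name) that returns 1 when the system works:

```python
from itertools import product


def structural_importance(phi, n, i):
    """Fraction of the 2**(n-1) states of the other components that are critical for i."""
    others = [j for j in range(n) if j != i]
    critical = 0
    for states in product((0, 1), repeat=n - 1):
        x = [0] * n
        for j, s in zip(others, states):
            x[j] = s
        x[i] = 1
        works_if_i_works = phi(x)
        x[i] = 0
        works_if_i_fails = phi(x)
        # Critical state: failure of i takes the system from working to failed
        if works_if_i_works == 1 and works_if_i_fails == 0:
            critical += 1
    return critical / 2 ** (n - 1)


def phi(x):
    """Hypothetical system: component 0 in series with the parallel pair 1 and 2."""
    return int(x[0] and (x[1] or x[2]))


print([structural_importance(phi, 3, i) for i in range(3)])  # [0.75, 0.25, 0.25]
```

The series component scores highest, as expected, because it is critical whenever at least one of the parallel pair works.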
What is the definition of Birnbaum’s Measure/Criticality Function?
The Criticality Function for component i, Gi(q), is the probability that the system is in a critical state for component i.
What is the definition of Criticality Measure of Importance?
The probability that the system is in a critical state for component i and i has failed. (Weighted by QSYS).
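Both measures can be sketched by computing QSYS exactly over all component states, then using Gi(q) = QSYS(qi = 1) - QSYS(qi = 0) for the criticality function and Gi(q)·qi / QSYS for the criticality measure. The structure function and component unavailabilities below are hypothetical, and independent components are assumed.

```python
from itertools import product


def q_sys(phi, q):
    """Exact system unavailability by enumerating all 2**n component states."""
    total = 0.0
    for x in product((0, 1), repeat=len(q)):  # 1 = working, 0 = failed
        p = 1.0
        for xi, qi in zip(x, q):
            p *= (1.0 - qi) if xi else qi
        if phi(x) == 0:
            total += p
    return total


def birnbaum(phi, q, i):
    """Criticality function Gi(q): Qsys with i certainly failed minus Qsys with i certainly working."""
    failed, working = list(q), list(q)
    failed[i], working[i] = 1.0, 0.0
    return q_sys(phi, failed) - q_sys(phi, working)


def criticality(phi, q, i):
    """Criticality measure: Gi(q) * qi, weighted by (divided by) Qsys."""
    return birnbaum(phi, q, i) * q[i] / q_sys(phi, q)


def phi(x):
    """Hypothetical system: component 0 in series with the parallel pair 1 and 2."""
    return int(x[0] and (x[1] or x[2]))


q = [0.01, 0.1, 0.1]
print(q_sys(phi, q), [birnbaum(phi, q, i) for i in range(3)], [criticality(phi, q, i) for i in range(3)])
```

Here component 0 has a much larger Birnbaum value, but weighting by its small qi brings its criticality measure level with components 1 and 2.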
What is the definition of Fussell-Vesely Measure of Importance?
Probability of the union of all minimal cut sets containing i given that the system has failed.
What is the definition of Fussell-Vesely Measure of Cut Set Importance?
Probability of occurrence of cut set i given that the system has failed.
What is the definition of frequency/unconditional failure intensity?
The frequency of an event is the probability that the event occurs per unit time at time t.
What assumption do we make when calculating unconditional failure intensity?
Basic events cannot occur simultaneously, i.e. only one basic event can occur in any time interval dt.
What is the formula for unconditional failure intensity in terms of lambda and Q(t)?
w(t) = λ(1 − Q(t))
How do you calculate minimal cut set frequency?
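For each component i in the cut set, multiply its failure intensity wi(t) by the unavailabilities qj(t) of all the other components in the cut set, then sum these terms: wC(t) = Σ(i in C) wi(t) × Π(j in C, j ≠ i) qj(t). That is, every combination of one w and the remaining qs, using all components in the cut set.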
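A minimal Python sketch of the cut set frequency calculation, using wi = λi(1 - qi) for each component. The component names, rates and unavailabilities are hypothetical values chosen for illustration.

```python
def w_component(lam, q):
    """Unconditional failure intensity of a component: w = lambda * (1 - Q)."""
    return lam * (1.0 - q)


def cut_set_frequency(cut_set, lam, q):
    """wC = sum over i in C of w_i multiplied by q_j for every other j in C."""
    total = 0.0
    for i in cut_set:
        term = w_component(lam[i], q[i])  # i is the last component to fail
        for j in cut_set:
            if j != i:
                term *= q[j]  # the others must already be failed
        total += term
    return total


lam = {"A": 1e-3, "B": 2e-3}  # failure rates (illustrative)
q = {"A": 0.01, "B": 0.02}    # unavailabilities at time t (illustrative)
print(cut_set_frequency(["A", "B"], lam, q))
```

Each term corresponds to one component being the last to fail while the rest are already down. Summing wC over all minimal cut sets is a common first approximation to the top event frequency.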
What is top event frequency? (Also known as system unconditional failure intensity)
The probability per unit time that the top event occurs at time t.
Why is it more accurate to use initiators and enablers in some cases?
As with many safety protection systems, the order of events is actually of vital importance to the occurrence of the top event.
If a hazardous event occurs after the protection systems have failed then there will be a dangerous system failure.
If the hazardous event occurs before the protection systems fail then the dangerous system failure will be avoided.
What is an initiating event?
An event that perturbs system variables and places a demand on control/protection systems to respond.
What is an enabling event?
An event, such as an inactive control/protection system, that permits initiating events to cause the top event.
What is the definition of Barlow-Proschan Measure of Initiator Importance
The probability that initiating event i causes the system failure over the interval (0, t). [Weighted by W(0,t)].
What is the definition of Sequential Contributory Measure of Enabler Importance
The probability that enabling event i permits an initiating event to cause system failure in (0, t). [Weighted by W(0,t)].
Why would you use FTAs in real life?
- To better understand a system and how it can fail
- To calculate system unreliability and its unconditional failure intensity
- To look at the way safety systems operate (through using initiators and enablers)
- To identify areas of improvement for the system (through using importance measures)
What does simulation seek to do and how?
Simulation seeks to “duplicate” the behaviour of the system under investigation by studying interactions among its components.
Give an example of ‘next event scheduling’
Customers arrive and either go into service immediately if the server is idle or join a waiting line (queue) if the server is busy.
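A minimal Python sketch of next event scheduling for this single-server queue: the event list is a priority queue ordered by time, and the clock jumps straight from one event to the next instead of advancing in fixed steps. Exponential inter-arrival and service times (an M/M/1 queue) and the function name are assumptions for illustration.

```python
import heapq
import random
from collections import deque


def simulate_queue(arrival_rate, service_rate, n_customers, seed=1):
    """Next event simulation of a single-server queue; returns (served, mean wait in queue)."""
    rng = random.Random(seed)
    events = [(rng.expovariate(arrival_rate), "arrive")]  # (time, kind), earliest first
    waiting = deque()  # arrival times of customers in the queue
    busy = False
    arrivals, served, total_wait = 1, 0, 0.0
    while events:
        now, kind = heapq.heappop(events)  # jump to the next scheduled event
        if kind == "arrive":
            if arrivals < n_customers:  # schedule the next arrival
                heapq.heappush(events, (now + rng.expovariate(arrival_rate), "arrive"))
                arrivals += 1
            if busy:
                waiting.append(now)  # server busy: join the queue
            else:
                busy = True  # server idle: go into service immediately
                heapq.heappush(events, (now + rng.expovariate(service_rate), "depart"))
        else:  # departure
            served += 1
            if waiting:
                total_wait += now - waiting.popleft()  # next customer starts service
                heapq.heappush(events, (now + rng.expovariate(service_rate), "depart"))
            else:
                busy = False
    return served, total_wait / served


print(simulate_queue(arrival_rate=1.0, service_rate=2.0, n_customers=500))
```

The same pattern carries over to reliability models by replacing arrivals and departures with component failure and repair events.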
Why might you use simulation/the Monte Carlo method instead of the analytical methods?
Problem Areas for Analytical Methods:
• Large Fault Trees / Networks.
• Complex Component Failure Distributions.
• Investigating Maintenance Philosophy (Queues).
• Considering more than two component states.
• Dependencies - failures or repairs.
Give some desirable properties of random numbers.
- Uniform ~ U [0,1]
- Independent.
- Fast
Additional Desirable Properties (Pseudo-random numbers):
• Long cycle lengths.
• Repeatable.
What are four of the methods that can be used to generate random numbers?
- Die.
- Tables.
- Calculators (electronic noise).
- Computer algorithms (pseudo-random numbers).
What does modulo m mean?
The remainder after dividing by m (equivalently, keep subtracting m until the result is less than m), e.g. 38 mod 16 = 6.
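The modulo operation is what keeps a linear congruential generator, a common pseudo-random number algorithm, inside a fixed range: x(i+1) = (a·x(i) + c) mod m, with r = x/m giving a number in [0, 1). The tiny parameters below (a = 5, c = 3, m = 16) are illustrative only; practical generators use a much larger m to get long cycle lengths.

```python
from itertools import islice


def lcg(seed, a=5, c=3, m=16):
    """Mixed linear congruential generator yielding pseudo-random numbers in [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m  # remainder after division by m, so 0 <= x < m
        yield x / m


first = list(islice(lcg(7), 4))
print(first)  # [0.375, 0.0625, 0.5, 0.6875]
```

The sequence is repeatable (the same seed always gives the same numbers), and with these parameters it visits all 16 values before cycling.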
What is discrete event simulation?
• Component failure and repair characteristics can be represented by probability distributions and in this case simulation steps through time dealing with events chronologically.
• Discrete event simulation is used to model a system in this way.
• The times at which events occur are obtained by random
sampling from the appropriate probability distributions.
What are the steps to complete the convolution method?
- Generate 12 random numbers from U[0,1]
- Calculate their sum, x
- t = ((x − 6) × standard deviation) + μ
(remember, μ in this case can be the MTTF or MTTR, depending on what you’re looking at)
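The steps above can be sketched in Python as follows. The sum of 12 independent U[0,1] numbers has mean 6 and variance 12 × 1/12 = 1, so x - 6 is approximately standard normal. The function name and the rng parameter are illustrative.

```python
import random


def convolution_normal_time(mu, sigma, rng=random):
    """Approximately normal time sample by the convolution (sum of 12 uniforms) method."""
    x = sum(rng.random() for _ in range(12))  # mean 6, variance 1
    return (x - 6.0) * sigma + mu


rng = random.Random(42)  # seeded so the run is repeatable
print(convolution_normal_time(mu=500.0, sigma=50.0, rng=rng))  # e.g. a time to repair in hours
```

Samples always fall within μ ± 6σ, and a negative time can be produced if μ is less than 6σ, so such samples may need rejecting.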
Which time to failure would you take as the system time to failure from multiple component times in series?
The minimum
Which time to failure would you take as the system time to failure from multiple component times in parallel?
The maximum
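A small Python sketch of these two rules, estimating the mean time to failure of a series system by simulation. Component times are drawn from exponential distributions by the inverse transform t = -ln(U)/λ, which is an assumption here (not covered above), and the function names are illustrative. For exponential components in series the estimate should approach 1/Σλ.

```python
import math
import random


def sample_exponential(lam, rng):
    """Inverse transform sample of an exponential time to failure."""
    return -math.log(1.0 - rng.random()) / lam  # 1 - U lies in (0, 1], avoiding log(0)


def series_failure_time(times):
    return min(times)  # series: the first component failure fails the system


def parallel_failure_time(times):
    return max(times)  # parallel: the system fails only when the last component fails


def mean_series_ttf(lams, trials=20000, seed=2):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += series_failure_time([sample_exponential(lam, rng) for lam in lams])
    return total / trials


print(mean_series_ttf([0.01, 0.02]))  # should be close to 1 / 0.03, about 33.3
```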
What are the advantages of simulation?
- Model Complex Systems.
- Systems can be stochastic (random).
- Vary operating conditions.
- Vary system designs.
- Vary maintenance strategies.
- Can change time scales.
What are the disadvantages of simulation
- Exact solutions are not obtained.
- Cost.
- Time.
- Validation process difficult.
What should the initial conditions be?
Choose conditions that are “typical” or “representative” of the real system.
If the initial conditions are not obvious, what could you do?
– Ignore.
– “Warm Up” simulation
– Sample from a range of conditions.