Safety-critical systems Flashcards
Critical system
A system where failure or lack of availability has a serious human, environmental or economic effect
Critical system essentials
1) Safety: the system should not harm people or the system’s environment
2) Reliability: the system must operate without serious failures
3) Availability: the system must be available to deliver services when requested to do so
4) Security: the system must be able to protect itself and its data from malicious use
Reliability regime
- A description of how the system should operate when it fails
- There are five key regimes that many safety-critical systems satisfy
1) Fail-operational system
2) Fail-safe system
3) Fail-secure system
4) Fail-passive system
5) Fault-tolerant systems
Fail-operational system
- Fail-operational systems continue to operate when their control systems fail
- This failure mode is sometimes unsafe, and is hence not applicable for all systems
- Sometimes, such systems are ‘fail deadly’
Fail-safe system
- Fail-safe systems become safe when they are unable to operate
- This often involves disabling functionality and alerting operating staff
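A minimal sketch of the fail-safe reaction described above, in Python. The heater/sensor setup, function names and threshold are illustrative assumptions, not material from the source.

```python
# Sketch of a fail-safe reaction: when the controller can no longer operate
# correctly (here, its sensor cannot be read), it drives the output to the
# safe, de-energised state and alerts operating staff instead of acting on
# bad data.

def fail_safe_step(read_temperature, set_heater, alert_staff):
    try:
        temperature = read_temperature()
    except Exception as fault:
        set_heater(False)                       # safe state: heater off
        alert_staff(f"sensor fault: {fault}")   # disable functionality + alert
        return
    set_heater(temperature < 60.0)              # normal control action

def broken_sensor():
    raise IOError("sensor offline")

# Example wiring with stand-in callables:
fail_safe_step(
    read_temperature=broken_sensor,
    set_heater=lambda on: print("heater on" if on else "heater off"),
    alert_staff=print,
)
# prints: heater off / sensor fault: sensor offline
```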
Fail-secure system
- Fail-secure systems maintain maximum security when they are unable to operate correctly
Fail-passive system
- Fail-passive systems continue to operate in the event of a system failure
- They will alert the operator to allow manual control to resume safely
Fault-tolerant systems
- Fault-tolerant systems avoid service failure when faults are introduced to the system
- This can often involve redundancy and hot-failover, but may also be used to describe systems that operate correctly at reduced capacity
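A minimal sketch of the redundancy idea above, in Python. The sensor setup, median vote and exception handling are illustrative assumptions rather than material from the source.

```python
# Sketch of fault tolerance through redundancy: three redundant sensors are
# polled, failed units are ignored, and the median of the surviving readings
# is used, so a single sensor fault does not become a service failure.
from statistics import median

def read_sensor(sensor):
    # Return a reading, or None if this (hypothetical) sensor has failed.
    try:
        return sensor()          # each 'sensor' is modelled as a callable
    except Exception:
        return None              # treat any exception as a failed unit

def redundant_reading(sensors):
    readings = [r for r in (read_sensor(s) for s in sensors) if r is not None]
    if not readings:
        # every redundant unit is down, e.g. a common-cause failure
        raise RuntimeError("all redundant sensors failed")
    return median(readings)

def dead_sensor():
    raise IOError("sensor offline")

# One unit has failed, but the service still delivers a value.
print(redundant_reading([lambda: 20.0, dead_sensor, lambda: 21.0]))  # 20.5
```

Note that the sketch still fails outright if all the units share a common cause, which is exactly the over-reliance-on-redundancy point made later in these cards.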
Risk factors in technological societies
1) Increasing Complexity
2) Increasing Exposure
3) Increasing Automation
4) Increasing Centralisation and Scale
5) Increasing pace of technological change
Increasing complexity
- The first risk factor in technological societies
- High tech systems are often made up of networks of closely related subsystems
- A problem in one subsystem may cause problems in other subsystems
- Analyses of major industrial accidents invariably reveal highly complex sequences of events leading up to accidents rather than single component failures
- More subcomponents = More complexity
Increasing exposure
- The second risk factor in technological societies
- More people today may be exposed to a given hazard than in the past
Increasing automation
- The third risk factor in technological societies
- Automation might appear to reduce the risk of human operator error
- However, it just moves humans to other functions: maintenance, supervision, higher-level decision-making, etc.
- The effects of human decisions and actions can then be extremely serious
- Automation has not removed human error but moved it elsewhere - someone will have designed the automated system (or the system which designed the system etc.)
Increasing centralisation and scale
- The fourth risk factor in technological societies
- Increasing automation has led to centralisation of industrial production in very large plants, giving the potential for great loss and damage in the event of an accident
Increasing pace of technological change
- The fifth risk factor in technological societies
- The average time to translate a basic technical discovery into a commercial product was, in the early part of the twentieth century, 30 years.
- Nowadays it takes less than 5 years
- Economic pressures to change with technology
- May lead to less extensive testing
- Lessens the opportunity to learn from experience
- The emphasis is placed on shipping and selling, so there is less time to iron out bugs or learn from experience
Common oversimplifications
- Interpreting statements about the cause of an accident requires care (the cause may be oversimplified or biased)
- Out of a large number of necessary conditions, one is often chosen as the cause, even though all factors were indispensable
There are three types:
1) Assuming human error
2) Assuming technical failures
3) Ignoring organisational factors
Assuming human error
- Often means that "the operator failed to step in and prevent the accident", which is not helpful when investigating the accident
- This explanation is used far too often; in most accident investigations it is unhelpful to blame the human controller
- E.g. a Tesla driver died in the first fatal autonomous car crash in May 2016
Assuming technical failure
- Don’t concentrate on only immediate physical factors such as component failure
- E.g. Flixborough Chemical Works explosion, 1974 (28 deaths). There were errors in design and modification: a pipe that had been put in by management ruptured. Other factors were involved - there was no qualified engineer on site, and far more chemicals were on site than the licence allowed
Ignoring organisational factors
- Accidents are often blamed on computer/operator/equipment error, ignoring the underlying factors which make such accidents inevitable
- Accident causes are very frequently rooted in the organisation - its culture, management and structure
- E.g. Three Mile Island nuclear power plant: the investigation produced 19 pages of recommendations, of which only 2 were technical; the other 17 were organisational
Causality
- The agency or efficacy that connects one process (the cause) with another process or state (the effect) where the first state is understood to be at least partially responsible for the second
- A given process may have many causes that lie in its past, known as causal factors
Cause and effect
- A cause must precede a related effect, but problems finding causes arise because of two factors
1) A condition or event may precede another event without causing it
2) A condition may be considered to cause an event without the event occurring every time the condition holds
- The cause of an event is composed of a set of conditions, each of which is necessary and which together are sufficient for the event to occur
General classification of accident causes
- Accidents, in general, are the result of a complex process that causes system behaviour to violate safety constraints
- These constraints were put in place during the design, development, manufacturing and operation of a particular system
- For a failure to occur, one or more of the following must have happened:
1) Safety constraints not enforced
2) Appropriate control actions provided but not followed
Safety constraints are not enforced
- The necessary control actions to enforce the safety constraint at each level of the control structure were not provided
- The necessary actions were provided at the wrong time or not allowed to complete
- Unsafe control actions were provided, causing a violation of the safety constraints
Appropriate control actions provided but not followed
- The control system/structure provided the correct control actions to rectify the situation, but they were not followed in the system's context
Causal factors in accidents
1) Controller operation
2) Behaviour of actuators and controlled processes
3) Communication and coordination
Controller operation
- Controller operation has three main parts, each of which may contribute to inadequate control actions
1) Control Inputs and Other External Information
- Control actions flow through a system, so there is a risk of incorrect information being provided by another level or component of the system
- In these cases, the controller may act correctly on the incorrect information and still produce an unsafe outcome
2) Control Algorithms
- Algorithms may not enforce safety constraints, either because the initial design was inadequate or because a safe design was modified unsafely
- Time delays and lag must be taken into account when designing control routines, and sometimes these delays need to be inferred
3) The Process Model
- The model used by the controller must be consistent with the actual process state, otherwise discrepancies between the two may contribute to an accident through erroneous actions being taken (see the sketch below)
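A minimal sketch of the process-model point above (names, values and the pressure scenario are assumptions for illustration, not from the source): the control algorithm is correct with respect to the controller's internal model, but the model has drifted from the real process state, so an unsafe command is issued.

```python
# Sketch: a controller acting on its process model rather than the real state.
# If the model is stale (e.g. a lost sensor update), a control algorithm that
# is correct w.r.t. the model can still violate the safety constraint.

SAFETY_LIMIT = 100.0   # illustrative constraint: pressure must stay below this

class Controller:
    def __init__(self):
        self.model_pressure = 50.0         # controller's belief about the process

    def update_model(self, sensor_reading):
        if sensor_reading is not None:     # None models a lost/late update
            self.model_pressure = sensor_reading

    def control_action(self):
        # Only open the inlet when the *modelled* pressure is well below the limit
        return "open_inlet" if self.model_pressure < SAFETY_LIMIT - 20 else "hold"

ctrl = Controller()
actual_pressure = 95.0
ctrl.update_model(None)             # sensor update lost: model stays at 50.0
action = ctrl.control_action()      # "open_inlet" - unsafe given the real state
print(action, "while actual pressure is", actual_pressure)
```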
Hierarchical model of accident causes
Level 1: Mechanisms
- The succession of events
Level 2: Conditions
- Conditions (or lack of conditions) that allowed the events at Level 1 to occur
Level 3: Constraints
- Constraints (or lack of constraints) that allowed conditions to cause the events, or that allowed conditions to exist
Behaviour of actuators and controlled processes
- It is sometimes the case that, even though the controller issues commands that maintain the constraints, the controlled process is unable to act on those commands
- This can stem from multiple causes, including:
1) Communication channel failure
2) Mechanical failure
3) Correct execution of safety inputs may depend on input from other system components
- These kinds of flaws arise from system design and development
Communication and coordination
- When there are multiple sources of control it can be the case that control actions are not properly coordinated. This may result in unexpected side-effects or conflicts between control actions. This usually arises from communication flaws.
- Accidents appear to be more likely in boundary areas where multiple controllers control the same process (or processes with common boundaries). This is due to the potential for ambiguity and conflicts between decisions and often occurs due to poorly defined boundaries.
- Overlap areas occur where a function is achieved through the cooperation of multiple controllers or where multiple controllers influence the same object. This also creates the potential for conflicting control actions.
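A minimal sketch of the coordination problem above (the valve, the two controllers and the "last command wins" behaviour are illustrative assumptions): two controllers share authority over the same component without a defined boundary, so their control actions conflict and one silently overrides the other.

```python
# Sketch of a coordination flaw: two controllers command the same valve with
# no agreed boundary or protocol, so the later command silently overrides the
# earlier one and the controllers' goals conflict.
class Valve:
    def __init__(self):
        self.open = False

    def command(self, should_open, source):
        self.open = should_open
        print(f"{source} set valve open={should_open}")

valve = Valve()
valve.command(True, source="pressure_controller")       # open to relieve pressure
valve.command(False, source="temperature_controller")   # close to conserve heat
print("final state: open =", valve.open)                # pressure goal defeated
```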
Root causes of accidents
1) Deficiencies in the safety culture of the industry or organisation
2) Flawed organisational structures
3) Superficial or ineffective technical activities
Deficiencies in the safety culture of the industry or organisation
- Safety culture: the general attitude and approach to safety reflected by those who participate in that industry. e.g. management, industry regulators, government regulators
- In an ideal world all participants are equally concerned about safety, both in the processes they use and in the final product. But this is not always the case; that’s why there is a requirement to have industry and government regulators
Deficiencies in the safety culture
Major accidents often stem from flaws in this culture, especially:
• Overconfidence and complacency
a) Discounting risk.
b) Over-reliance on redundancy.
c) Unrealistic risk assessment.
d) Ignoring high-consequence, low probability events.
e) Assuming risk decreases over time.
f) Underestimating software related risks.
g) Ignoring warning signs.
• Disregard or low priority for safety
• Flawed resolution of conflicting goals
Discounting risks
Major accidents are often preceded by the belief that they cannot happen
Over-reliance on redundancy
- Redundant systems use extra components to ensure that failure of one component doesn’t result in failure of the whole system.
- Many accidents can be traced back to common-cause failures in redundant systems.
- Common-cause failure happens when multiple redundant components fail at the same time for the same reason (e.g. fire, or electric outage)
- Providing redundancy may help if a component fails, BUT we must be aware that all redundant components may fail
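A worked example of the common-cause point above, with assumed (illustrative) probabilities: multiplying per-channel failure probabilities is only valid for independent failures, and a shared cause puts a floor under the combined figure.

```python
# Worked example (illustrative numbers): why common-cause failures undermine
# redundancy. Independent channel failures multiply; a shared cause such as a
# fire or power outage dominates the combined probability.

p_single = 1e-3    # assumed probability that one channel fails on demand
p_common = 1e-4    # assumed probability of a cause that takes out both channels

p_independent       = p_single ** 2                              # both fail independently
p_with_common_cause = p_common + (1 - p_common) * p_single ** 2

print(f"independent channels only: {p_independent:.1e}")        # 1.0e-06
print(f"with a common cause:       {p_with_common_cause:.1e}")  # ~1.0e-04
```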
Unrealistic risk assessment
It is quite common for developers to state that the probability of a fault occurring in a piece of software is 10⁻⁴, usually with little or no justification
Example:
Therac-25 software risk assessment was 10⁻¹¹ for the event "computer selects wrong energy level"
Instead of launching an investigation when informed about possible overdoses, the manufacturer of the Therac-25 responded that the risk assessment showed the accidents were impossible
Ignoring high-consequence, low-probability events
- A common discovery after accidents is that the events involved were recognized as being very hazardous before the accident, but were dismissed as incredible
Assuming risk decreases over time
- A common thread in accidents is the belief that a system must be safe because it has operated without any accidents for many years
- Risk may decrease, remain constant or increase over time
- It can increase due to operators becoming over-familiar with safety procedures and hence becoming lax or even skipping them
Underestimating software related risks
- There is a pervasive belief that software cannot fail, and that all errors will be removed by testing
Ignoring warning signs
- Accidents are frequently preceded by public warnings or a series of minor occurrences
- Many basic mechanical safety devices are well tested, cheap, reliable and failsafe (based on physical principles to fail in a safe state)
Disregard or low priority for safety
- Problems will occur if management is not interested in safety, because the workers will not be encouraged to think about safety
- The Government may disregard safety and ignore the need for government/industry watchdogs and standards committees
- In fact these often only appear after major accidents
• The entire organisation must have a high level of commitment to safety in order to prevent accident.
• The lead must come from the top and permeate every organizational level
Flawed resolution of conflicting goals
- The most common conflict is the cost-safety trade-off: safety costs, or appears to cost, more money at the time of development
- Often cost becomes more important and safety may therefore be compromised in the rush for greater profits
Flawed organisational structures
Many accident investigations uncover a sincere concern for safety in the organisation, but find organisational structures in place that were ineffective in implementing this concern.
1) Diffusion of responsibility and authority
2) Lack of independence and low status of safety personnel
3) Poor and limited communications channels
Diffusion of responsibility and authority
- Accidents are often associated with ill-defined responsibility and authority for safety matters
- There should be at least one person with overall responsibility for safety, and they must have real power within the company
Lack of independence and low status of safety personnel
- This leads to their inability or unwillingness to bring up safety issues - e.g. Safety officers should not be under the supervision of the groups whose activities they must check
- Low status means no involvement in decision making
Poor and limited communication channels
- In some industries, strict line management means that workers report only to their direct superiors
- Problems with safety may not be reported to interested parties and as a result, safety decisions may not be reported back to the workers
- All staff should have direct access to safety personnel and vice versa
Superficial or Ineffective Technical Activities
This is concerned with poor implementation of all the activities necessary to achieve an acceptable level of safety
1) Superficial safety efforts
2) Ineffective risk control
3) Failure to evaluate changes
4) Information deficiencies
Superficial safety efforts
- Any efforts to ensure safety only take place at a superficial level, with no substantive action taken about any issues discovered and recorded
Example:
- Hazard logs kept but no description of design decisions taken or trade-offs made to mitigate/control the recognised hazards
- No follow-ups to ensure hazards have ever been controlled
- No follow-ups to ensure safety devices are kept in working order
Ineffective risk control
- Risks are known, but little effort is put into controlling them
- The majority of accidents do not result from a lack of knowledge about how to prevent them
- They result from a failure to use that knowledge effectively when trying to fix the problem
Failure to evaluate changes
- Accidents often involve a failure to re-evaluate safety after changes are made
- Any changes in hardware or software must be re-evaluated to determine whether safety has been compromised
- Quick fixes often affect safety because they are not evaluated properly
- For software, this would comprise a regression test plus a system and software safety analysis
Information deficiencies
- Feedback of operational experience is one of the most important sources of information for designing, maintaining and improving safety, but is often overlooked
- Case studies are valuable for assessing hypotheses and forming intuition for mistakes
- There are 2 types of data that are important:
• Information about accidents/incidents for the system itself
• Information about accidents/incidents for similar systems
ADS
- Stands for Automated Driving System
Modelling accidents
- Accident models attempt to reduce an accident description to a series of events and conditions that account for the outcome
- Such models are used to:
1) Understand past accidents (a way to organise data and set priorities)
2) Learn how to prevent future accidents (predictive modeling)
Model
A representation of a system, made from the composition of concepts, that is used to help people understand and simulate the subject represented by the model
Domino model
- The general accident sequence is mapped onto five “dominoes” in the following order:
1) Ancestry or, social environment
2) Fault of person
3) Unsafe act or condition
4) Accident
5) Injury
- Once one domino "falls", it causes a chain of falling dominoes until the accident occurs
- Removing any of the dominoes will break the sequence
- Although removing any domino will prevent the accident, it is generally considered that the easiest and most effective domino to remove is domino 3: unsafe act or condition
- This model has been very influential in accident investigations, but has often been wrongly used to look for a single unsafe act or condition, when causes were actually more complex
Chain of events model
- Organise causal factors into chains of events
- Events are chained in chronological order, but there is often no obvious stopping point when tracing back from the cause of an accident
- This model is very close to our view of accidents, where we often try to rationalise it into a series of events
- As with the domino model, if the chain is broken, the accident won’t happen
- Thus, accident prevention measures concentrate on either:
1) Eliminating certain events or conditions
2) Intervening between events of the chain
3) Adding enough AND gates, so that several independent events must all occur before the chain can continue (worked example below)
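A worked example of the AND-gate point above, with assumed probabilities and assuming the two events are independent: requiring both events to occur makes the accident far less likely than when either event alone can continue the chain.

```python
# Worked example (illustrative numbers): the effect of an AND gate in an event
# chain. If either of two independent events lets the chain continue, their
# probabilities roughly add; if both must occur, they multiply.
p_a, p_b = 1e-2, 1e-3

p_or  = p_a + p_b - p_a * p_b   # either event continues the chain
p_and = p_a * p_b               # AND gate: both events are required

print(f"OR  of the two events: {p_or:.2e}")   # 1.10e-02
print(f"AND of the two events: {p_and:.2e}")  # 1.00e-05
```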
Failure
- Failure is the inability of a system or component to fulfil its operational requirement, i.e., to perform its intended function for a specified time under specified environmental conditions.
- Failure is an event or behaviour which occurs at a particular instant in time
Error
An error is a design flaw or deviation from a desired or intended state
- An error is a static condition, a state, which remains until it is removed (usually through human intervention)
- An error may lead to an operational failure
Fault
- A fault is a hardware or software defect which resides temporarily or permanently in the system
- Faults are higher order events
- All failures are faults, but not all faults are failures
Example:
- If a relay fails to close properly when a voltage is impressed across its terminals, then this event is a relay failure
- If the relay closes at the wrong time due to the improper functioning of some upstream component, then the relay has not failed, but untimely relay operation may well cause the entire circuit to enter an unsatisfactory state - this event is called a fault
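A software analogue of the relay example above (the code and the empty-list defect are illustrative assumptions): the defect is a static error residing in the code, and the failure is the event that occurs at the instant the defective path is exercised.

```python
# Illustrative fault/error/failure distinction in software. The defect below is
# a static condition (an error) residing in the code; nothing observable happens
# until the faulty path is exercised, at which point the service deviates from
# its intended function - that event is the failure.

def average(values):
    # Error: the empty-list case was never handled at design time
    return sum(values) / len(values)

print(average([4, 8, 6]))    # defect present, yet no failure is observed

try:
    average([])              # the failure occurs here, at a particular instant
except ZeroDivisionError as failure:
    print("failure event:", failure)
```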
Accident
- An undesired and unplanned (but not necessarily unexpected) event that results in (at least) a specified level of loss
- An accident for any particular system must be defined by level of loss involved
- Loss: life, property or environment; immediate or long-term
- Example: two aircraft coming within a pre-defined distance of each other and colliding
Near-miss or incident
- An event which involves no loss (or only minor loss) but with the potential for loss under different circumstances
- An incident for a particular system depends on how accidents are defined for the system
- Example: two aircraft coming within a pre-defined distance of each other but not colliding
Hazard
- A state of the system that may give rise to an accident
- Hazard is a situation in which there is actual or potential danger to people or to the environment
- Hazard is specific to a particular system and is defined with respect to the environment of the system/object
Risk
A combination of the likelihood of an accident and the severity of the potential consequences
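A minimal sketch of combining likelihood and severity into a risk figure. The ordinal scales, the multiplication and the acceptability threshold are assumptions for illustration; real schemes define their own categories and criteria.

```python
# Sketch: risk as a combination of likelihood and severity, using assumed
# ordinal scales and an assumed acceptability threshold.

LIKELIHOOD = {"improbable": 1, "remote": 2, "occasional": 3, "frequent": 4}
SEVERITY   = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}

def risk_score(likelihood, severity):
    return LIKELIHOOD[likelihood] * SEVERITY[severity]

def acceptable(likelihood, severity, threshold=6):
    return risk_score(likelihood, severity) < threshold

print(risk_score("remote", "catastrophic"))   # 8
print(acceptable("remote", "catastrophic"))   # False -> risk must be reduced
```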
Safety
- Freedom from accidents or losses
- It is often argued that there is no such thing as absolute safety, but instead a thing is safe if the attendant risks are judged acceptable