Availability/Reliability Flashcards

1
Q

The most important aspect of availability

A

to avoid a single point of failure

2
Q

What two methods are used to avoid a single point of failure

A

redundancy and diversity

3
Q

What is SW redundancy

A

have multiple copies of the system (or of a component) running at once, so another copy can take over if one fails

4
Q

What is SW diversity

A

having different ways of doing the same work (e.g. independently written implementations of the same specification)

5
Q

4 system architectures for ensuring availability

A
  • hot backup (running multiple versions)
  • triple voting system, aka triple modular redundancy (run 3 versions of the system on the same input and take the majority output, i.e. the result at least 2 of them agree on)
  • module monitoring code (code that monitors the running of a module)
  • dynamic post-test monitor (self-testing of hardware)
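The triple voting idea can be sketched as a majority voter. This is a minimal Python sketch under stated assumptions: `majority_vote` and the three `square_*` replicas are hypothetical names, and real voters usually run on separate hardware rather than in one process.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def majority_vote(replicas: Sequence[Callable[[int], T]], x: int) -> T:
    """Run every replica on the same input and return the majority output."""
    outputs = [replica(x) for replica in replicas]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        # No majority (e.g. all 3 disagree): the fault cannot be masked, so report it.
        raise RuntimeError(f"no majority among outputs {outputs}")
    return value

# Hypothetical replicas: two correct versions and one with an injected fault.
def square_a(x: int) -> int:
    return x * x

def square_b(x: int) -> int:
    return x ** 2

def square_faulty(x: int) -> int:
    return x * x + 1  # injected fault

print(majority_vote([square_a, square_b, square_faulty], 7))  # 49
```

The faulty replica is outvoted 2 to 1, so a single failing version does not become a system failure.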
6
Q

2 characteristics of a dependable SW development process

A

1) process is explicitly defined (aka well-documented)

2) process is repeatable and not subjective (does not depend on individual interpretation or judgment, so different teams and projects get consistent results)

7
Q

5 characteristics of a dependable (well-documented) process

A
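  • auditable (outsiders can check that the process and its standards are being followed)
  • diverse (includes redundant and diverse verification and validation activities)
  • documentable (defines process activities and the documentation each one produces)
  • robust (can recover from failures of individual process activities)
  • standardized (backed by a comprehensive set of development and documentation standards)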
8
Q

3 types of formal methods

A
  • consistency proof
  • refinement
  • model checking
9
Q

what is FM consistency proof

A
  • develop a math/logic model of the finished SW program
  • prove the program is consistent with the model

10
Q

what is FM refinement

A

the generation of a program from a math/logic specification using trusted correctness-preserving transformations

11
Q

what is FM model checking

A

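develop a math/logic (state-machine) model of the program, then use a model-checking tool to explore every reachable state and show that the safety constraints/requirements/invariants hold in all of them, or get a counterexample trace when one does not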
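A tiny explicit-state model checker shows the idea: enumerate every reachable state breadth-first and check an invariant in each. This is a hypothetical sketch; the model (two processes sharing a lock with a non-atomic check-then-set) and all names are made up for illustration, and real model checkers work on far larger models with specialized state-space techniques.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Tuple

State = Tuple[str, str, int]  # (pc of process 0, pc of process 1, lock)

def successors(state: State) -> Iterator[State]:
    """All states reachable in one step (either process may move)."""
    pcs, lock = list(state[:2]), state[2]
    for i in (0, 1):
        pc = pcs[i]
        if pc == "idle" and lock == 0:
            nxt, new_lock = "ready", lock   # sees the lock is free...
        elif pc == "ready":
            nxt, new_lock = "critical", 1   # ...then takes it in a separate step
        elif pc == "critical":
            nxt, new_lock = "idle", 0       # releases the lock
        else:
            continue                        # idle but lock held: cannot move
        new_pcs = pcs.copy()
        new_pcs[i] = nxt
        yield (new_pcs[0], new_pcs[1], new_lock)

def mutual_exclusion(state: State) -> bool:
    """Safety invariant: never both processes in the critical section."""
    return not (state[0] == "critical" and state[1] == "critical")

def check(initial: State) -> Optional[List[State]]:
    """BFS over all reachable states; return a shortest counterexample path or None."""
    parent: Dict[State, Optional[State]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not mutual_exclusion(state):
            path: List[State] = []
            cur: Optional[State] = state
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None  # invariant holds in every reachable state

path = check(("idle", "idle", 0))
for step in path or []:
    print(step)
```

Because the check and the set are separate steps, both processes can see the lock free before either takes it; the checker finds this race automatically and prints the trace leading to it.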

12
Q

6 cons of using FM

A
  • domain experts cannot check formal specs because they don’t know formal notations
  • hard to estimate cost/effort savings from FMs
  • few programmers have FM skills
  • hard to apply to large systems (the techniques don’t scale well)
  • few FM tools available
  • hard to fit with Agile (FMs assume a complete, stable specification up front)
13
Q

system fault vs system error vs system failure

A

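fault - a defect in the system (e.g. wrong code) that can cause an error when it is executed

error - an erroneous system state that results from executing a fault

failure - the system does not deliver the service users expect because the bad state reaches its output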
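The fault/error/failure chain can be seen in a few lines of code. This is an illustrative sketch; the `average` function and its bug are hypothetical.

```python
from typing import List

def average(values: List[float]) -> float:
    """Intended to return the arithmetic mean of values."""
    total = 0.0
    for v in values[1:]:  # FAULT: a defect in the code; the first element is skipped
        total += v
    return total / len(values)

# Executing the fault leaves an ERROR (erroneous state): total is 50.0, not 60.0.
# Delivering that value to the user is the FAILURE: expected 20.00, shown 16.67.
print(f"{average([10.0, 20.0, 30.0]):.2f}")  # 16.67

# The same faulty code runs here, but the skipped element is 0.0, so the state
# stays correct and the user sees no failure.
print(f"{average([0.0, 3.0, 6.0]):.2f}")  # 3.00
```

The second call shows why not every executed fault becomes a failure, which is what makes faults hard to find by testing alone.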

14
Q

3 approaches to make systems more reliable

A
  • fault avoidance (write better code)
  • fault detection and correction (use verification and validation, e.g. reviews and testing, to find and fix faults before they cause failures)
  • fault tolerant (accept faults but have workarounds for when they arise)
15
Q

what is POFOD and what does it mean?

A

Probability of Failure on Demand

probability that the system fails when a service request (demand) is made; e.g. POFOD = 0.001 means about 1 in 1000 requests is expected to fail
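POFOD can be estimated from observed data as failed demands divided by total demands. A minimal sketch; the function name and the sample numbers are hypothetical.

```python
def pofod(failed_requests: int, total_requests: int) -> float:
    """Probability of failure on demand: fraction of service requests that fail."""
    if total_requests <= 0:
        raise ValueError("need at least one request")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    return failed_requests / total_requests

# Hypothetical measurement: 2 failures observed in 1000 demands.
print(pofod(2, 1000))  # 0.002
```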

16
Q

what is ROCOF and what does it mean?

A

Rate of Occurrence of Failures

how often, on average, the system fails per unit of time across all operations (e.g. 2 failures per 1000 hours of operation)
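ROCOF is a rate, so it is failures divided by operating time, scaled to a convenient unit. A minimal sketch; the function name and the "per 1000 hours" default are assumptions for illustration.

```python
def rocof(failure_count: int, operating_hours: float, per_hours: float = 1000.0) -> float:
    """Rate of occurrence of failures, expressed as failures per `per_hours` of operation."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failure_count * per_hours / operating_hours

# Hypothetical log: 3 failures over 1000 hours of operation.
print(rocof(3, 1000.0))  # 3.0 failures per 1000 hours
```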

17
Q

Which reliability metric should be tracked for rare operations, frequent short operations, and frequent long operations, respectively?

A

POFOD for rare

ROCOF for common and short

MTTF (mean time to failure) for common and long
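MTTF is the average operating time between failures, and the rule of thumb on this card can be written as a small lookup. A hypothetical sketch; function names and data are illustrative only.

```python
from statistics import mean
from typing import List

def mttf(times_to_failure_hours: List[float]) -> float:
    """Mean time to failure: average operating time before each observed failure."""
    if not times_to_failure_hours:
        raise ValueError("need at least one observed failure")
    return mean(times_to_failure_hours)

def choose_metric(frequent: bool, long_running: bool) -> str:
    """Rule of thumb: POFOD for rare demands, ROCOF for frequent short ones, MTTF for long ones."""
    if not frequent:
        return "POFOD"
    return "MTTF" if long_running else "ROCOF"

# Hypothetical uptimes (hours) before each of three failures.
print(mttf([120.0, 80.0, 100.0]))                         # 100.0
print(choose_metric(frequent=True, long_running=False))   # ROCOF
```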

18
Q

3 categories of fault-tolerant architectures

A

validation filter = check every input and reject anything that could break the system before it is processed

recovery from failure = have a way for the system to continue on in the face of failure

redundancy = duplicate (or diverse) components so that no single point of failure can crash the system
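A validation filter sits at the system boundary and turns raw input into a trusted value or an explicit rejection. This sketch assumes a hypothetical temperature-sensor input; the field names and the -40..125 °C range are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    sensor_id: str
    celsius: float

# Hypothetical plausible range for this sensor; real limits come from the domain.
MIN_CELSIUS = -40.0
MAX_CELSIUS = 125.0

def validation_filter(raw: dict) -> Reading:
    """Reject malformed or out-of-range input before it reaches the rest of the system."""
    sensor_id = raw.get("sensor_id")
    if not isinstance(sensor_id, str) or not sensor_id:
        raise ValueError("missing or invalid sensor_id")
    value = raw.get("celsius")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("celsius must be a number")
    if not (MIN_CELSIUS <= value <= MAX_CELSIUS):  # NaN also fails this check
        raise ValueError(f"celsius {value} outside [{MIN_CELSIUS}, {MAX_CELSIUS}]")
    return Reading(sensor_id, float(value))

print(validation_filter({"sensor_id": "t1", "celsius": 21.5}))
```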

19
Q

What is a protection system

A

a normal software system plus separate monitoring and correction software that runs alongside it

will shut down, reboot, and recover to the last good state in the case of failure
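The recover-to-last-good-state part of a protection system can be sketched as a wrapper that checks a safety predicate after every step and rolls back on a violation. This is a hypothetical design: the class name, the pressure example, and the threshold are assumptions, and it models only rollback, not shutdown or reboot.

```python
import copy
from typing import Callable, Dict

State = Dict[str, float]

class ProtectionSystem:
    """Separate monitor wrapped around a controlled system (illustrative only)."""

    def __init__(self, initial: State, is_safe: Callable[[State], bool]) -> None:
        self.state = copy.deepcopy(initial)
        self.checkpoint = copy.deepcopy(initial)  # last known-good state
        self.is_safe = is_safe
        self.recoveries = 0

    def step(self, update: Callable[[State], None]) -> None:
        update(self.state)  # the normal system does its work
        if self.is_safe(self.state):
            self.checkpoint = copy.deepcopy(self.state)
        else:
            # Failure detected: discard the bad state and restore the last good one.
            self.state = copy.deepcopy(self.checkpoint)
            self.recoveries += 1

guard = ProtectionSystem({"pressure": 1.0}, lambda s: s["pressure"] < 5.0)
guard.step(lambda s: s.update(pressure=3.0))  # safe: becomes the new checkpoint
guard.step(lambda s: s.update(pressure=9.0))  # unsafe: rolled back
print(guard.state, guard.recoveries)  # {'pressure': 3.0} 1
```

Keeping the monitor separate from the code it protects means a fault in the normal system does not also disable the check.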

20
Q

What is a self-monitoring architecture

A

each input is copied into several streams, and each stream performs the same operation in a different way (diverse implementations). The outputs of the streams are then compared; any discrepancy is treated as a failure and triggers a reboot and recovery
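A self-monitoring check can be sketched by running two differently written channels on copies of the same input and comparing results. The channel functions and `DiscrepancyError` are hypothetical; in a real system the caller would react to the exception by restarting and recovering.

```python
from typing import Callable, List

def channel_loop(values: List[int]) -> int:
    """Channel 1: sum of squares with an explicit loop."""
    total = 0
    for v in values:
        total += v * v
    return total

def channel_builtin(values: List[int]) -> int:
    """Channel 2: same computation written differently."""
    return sum(v ** 2 for v in values)

class DiscrepancyError(RuntimeError):
    """Raised when the diverse channels disagree; the caller should restart and recover."""

def self_monitoring(values: List[int], channels: List[Callable[[List[int]], int]]) -> int:
    # Give each channel its own copy of the input so no channel can corrupt another's.
    outputs = [channel(list(values)) for channel in channels]
    if any(out != outputs[0] for out in outputs[1:]):
        raise DiscrepancyError(f"channels disagree: {outputs}")
    return outputs[0]

print(self_monitoring([1, 2, 3], [channel_loop, channel_builtin]))  # 14
```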

21
Q

How to create software diversity from the SW process

A

have independent teams implement the same unit from the same specification without communicating with each other (N-version programming)

22
Q

4 ways to achieve SW diversity

A
  • use different software design styles
  • use different programming langs
  • use different development tools
  • use different algorithms
23
Q

8 good programming practices to ensure reliability in code

A
  • control data visibility
  • validate all inputs
  • provide handlers for all thrown exceptions
  • minimize the use of error-prone constructs
  • make SW restartable
  • check array bounds
  • include timeouts on RPCs and other calls to external components
  • name all constants that represent real-world values (and include their units)