Chapter 5 - Availability Flashcards

Question

10 tactics for preparation and repair for recovering from faults

Answer 1

- active redundancy
- passive redundancy
- spare
- exception handling
- rollback
- software upgrade
- retry
- ignore faulty behavior
- degradation
- reconfiguration

Answer 2

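keeping hot backups: redundant components that run in parallel with the active one and process the same inputs, so they can take over almost immediately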

Answer 3

the backups are warm rather than hot: only the active component processes inputs, and it feeds the backups state information through periodic updates

Answer 4

a completely offline (or cold) version that undergoes a power-on-reset procedure when a fail-over occurs before it goes into service

Answer 5

in-service upgrades to executable code images in a non-service-affecting manner (think of an iOS update)

Answer 6

trying an operation again may lead to success if the failure is transient
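
A minimal sketch of the retry idea, assuming a hypothetical `TransientError` marks failures worth retrying and that exponential backoff between attempts is acceptable (the card does not specify a delay policy):

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that may succeed on a later attempt."""


def retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # give up: the fault does not look transient
            sleep(base_delay * 2 ** (attempt - 1))  # back off before retrying


calls = {"n": 0}


def flaky():
    """Fails twice, then succeeds, to imitate a transient fault."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("temporary glitch")
    return "ok"


result = retry(flaky, sleep=lambda seconds: None)  # skip real sleeping in the demo
```

Capping the number of attempts matters: a fault that is not transient should surface rather than be retried forever.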

Answer 7

reassigning responsibilities to the resources left functioning while maintaining as much functionality as possible
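
One way to picture this, as a sketch only: the node names, the responsibility names, and the "give it to the least-loaded survivor" rule below are all assumptions made for illustration.

```python
def reconfigure(assignments, failed):
    """Move the failed node's responsibilities onto the surviving nodes.

    assignments maps a node name to its list of responsibilities.
    """
    survivors = {node: list(tasks)
                 for node, tasks in assignments.items() if node != failed}
    if not survivors:
        raise RuntimeError("no functioning resources left")
    for task in assignments.get(failed, []):
        # Pick the least-loaded survivor; break ties by node name.
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(task)
    return survivors


after = reconfigure(
    {"n1": ["auth", "billing"], "n2": ["search"], "n3": []},
    failed="n1",
)
```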

Answer 8

- shadow
- state resynchronization
- escalating restart
- non-stop forwarding

Answer 9

operating a previously failed or in-service upgraded component in a “shadow mode” for a predefined time prior to reverting the component back to an active role

Answer 10

partner to active redundancy and passive redundancy where state information is sent from active to standby components
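
A rough sketch of the passive-redundancy case, assuming a hypothetical `Replica` class and a `checkpoint` function that copies the whole state at once (real systems often send incremental updates instead):

```python
import copy


class Replica:
    """Hypothetical component holding key/value state."""

    def __init__(self):
        self.state = {}
        self.pending = 0  # updates applied since the last checkpoint

    def apply(self, key, value):
        self.state[key] = value
        self.pending += 1


def checkpoint(active, standby):
    """Send the active component's state to the standby."""
    standby.state = copy.deepcopy(active.state)
    active.pending = 0


active, standby = Replica(), Replica()
active.apply("a", 1)
checkpoint(active, standby)
active.apply("b", 2)  # not yet checkpointed: lost if the active fails now
```

The `pending` counter makes the trade-off visible: the longer the interval between checkpoints, the more work a fail-over can lose.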

Answer 11

recover from faults by varying the granularity of the component(s) restarted and minimizing the level of service affected
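
As an illustration, the escalation can be sketched as a loop over restart levels ordered from least to most disruptive. The level names and the `is_healthy` probe are hypothetical.

```python
def escalating_restart(levels, is_healthy):
    """Try each restart level in order and stop at the first that restores health.

    levels is a list of (name, restart_function) pairs, least disruptive first.
    Returns the names of the levels that were tried.
    """
    tried = []
    for name, restart in levels:
        restart()
        tried.append(name)
        if is_healthy():
            return tried
    raise RuntimeError("all restart levels exhausted")


health = {"ok": False}
levels = [
    ("thread", lambda: None),                     # too narrow to fix this fault
    ("process", lambda: health.update(ok=True)),  # restores health
    ("node", lambda: health.update(ok=True)),     # never reached in this demo
]
tried = escalating_restart(levels, lambda: health["ok"])
```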

Answer 12

functionality is split into supervisory and data. If a supervisor fails, a router continues forwarding packets along known routes while protocol information is recovered and validated.

Answer 13

- removal from service
- transactions
- predictive model
- exception prevention
- increase competence set

Answer 14

bundling state updates so that asynchronous messages exchanged between distributed components are atomic, consistent, isolated, and durable
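
The atomicity part can be shown with a single-process sketch using Python's `sqlite3` module. This is an assumption-laden stand-in: the card is about messages between distributed components, while the example uses one local database, and the `account` table and `transfer` function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Bundle both balance updates so they succeed or fail together."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


transfer(conn, "a", "b", 30)
try:
    transfer(conn, "a", "b", 500)  # violates the CHECK, so both updates are undone
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
```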

Answer 15

monitor the state of health of a process to ensure that the system is operating within nominal parameters, and take some action if a dangerous state is near (think of the banker's algorithm)
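
A very small sketch of the monitoring half, assuming made-up metric names and a hypothetical `margin` that defines how close to a limit counts as "near":

```python
def check_health(metrics, limits, margin=0.9):
    """Return the names of metrics that have reached margin * limit."""
    return sorted(
        name for name, value in metrics.items()
        if value >= margin * limits[name]
    )


warnings = check_health(
    {"cpu": 0.95, "mem": 0.5},
    {"cpu": 1.0, "mem": 1.0},
)
```

A caller would then take corrective action (shed load, restart, alert) for each name returned, before the limit itself is hit.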

Answer 16

preventing system exceptions from occurring by masking a fault, or preventing it via smart pointers, abstract data types, or wrappers.
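
For the wrapper variant, a sketch along these lines may help. `SafeList` is a hypothetical class that masks out-of-range reads by returning a default instead of raising.

```python
class SafeList:
    """Hypothetical wrapper: out-of-range reads return a default value."""

    def __init__(self, items, default=None):
        self._items = list(items)
        self._default = default

    def get(self, index):
        if 0 <= index < len(self._items):
            return self._items[index]
        return self._default  # mask the fault instead of raising IndexError


readings = SafeList([3, 5, 8], default=0)
values = [readings.get(1), readings.get(10)]
```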

Answer 17

designing a component to handle more cases/faults as part of its normal operation.

Answer 18

- determine the system responsibilities that need to be highly available
- allocate responsibilities for detecting the 4 possible stimuli
- allocate responsibilities for performing some combination of the 6 possible responses

Answer 19

- ensure that coordination mechanisms can detect the 4 possible stimuli
- ensure the coordination mechanisms enable the 6 responses
- ensure the coordination model supports the replacement of any of the 4 artifacts
- determine if the coordination model will work under any of the 6 environments

Answer 20

- determine which data abstractions could cause a stimulus
- ensure that the 4 repair from recovery actions can be used on the data abstractions

Answer 21

- determine which artifacts may produce a stimulus
- ensure that mapping/re-mapping of architectural elements is flexible enough to permit recovery from a fault

Answer 22

- determine what critical resources are necessary to continue operating in the presence of one of the 4 stimuli
- ensure there are sufficient resources after a fault to perform any of the 6 responses
- determine availability time for critical resources
- specify time intervals in which critical resources must be available in any of the system environments

Answer 23

- ensure the availability strategy is sufficient to cover faults introduced by late binding

Answer 24

- determine if available technologies can detect faults, recover, and reintroduce failed components
- determine what technologies can help the response to a fault
- determine availability characteristics of chosen technologies themselves

(48 cards)