unit 8: distributed DBMS reliability Flashcards
Topics included in this unit
reliability concepts
failures and fault tolerance
failures in Distributed DBMS
Local reliability protocol
Distributed reliability protocol
dealing with site failure
dealing with network partitioning
what will happen if system has some inputs
system will produce some outputs
what is reliability
reliability is a measure of how well a system keeps working as intended
OR
it is a measure used to indicate how successful a system is in providing the service it was intended to provide
proper definition of a reliable system
a system is considered reliable if it functions as per its specifications and produces correct output values for a given set of input values
what is the purpose of reliability
the purpose of reliability is to work out how we can maintain the atomicity and durability of the database
both atomicity and durability are properties of a transaction
what is a reliable DDBMS
a reliable DDBMS is one which continues to process user requests even when the underlying system is unreliable, i.e. even if components of the distributed computing environment fail, the DDBMS should be able to continue executing user requests without violating database consistency
what does reliability refers to ?
reliability refers to atomicity and durability of transaction.
explain what is an availability ?
- the fraction of the time that a system meets its specification
- the probability that the system is operational at a given time t.
which protocols address these issues of reliability
the protocols which address these issues (atomicity and durability) are the commit and recovery protocols, respectively
what is the meaning of failure
a failure occurs when the system does not function according to its specifications
OR
the failure of a system occurs when the system does not function according to its specifications and fails to deliver the service for which it was intended
what is an erroneous state
an internal state of a system such that there exist circumstances in which further processing, by the normal algorithms of the system, will lead to a failure which is not attributed to a subsequent fault
===
in other words: an internal condition (some internal fault or incorrect state) because of which further processing goes wrong and the system can head toward a failure
what is an error
an error in the system occurs when a component of the system assumes a state that is not desirable; the fact that the state is undesirable is a subjective judgement
what are the types of errors / how are errors distinguished
errors are distinguished as those which are permanent and those which are non-permanent
what is a fault
a fault is the cause of an error; it is detected either when an error is propagated from one component to another or when a failure of the component is observed
fault to failure diagram
Fault [causes] Error [results in] Failure
types of faults
Hard faults
Soft faults
explain hard faults
- permanent
- resulting failures are called hard failures
explain soft faults
- transient or intermittent
- account for more than 90% of all failures
- resulting failures are called soft failures
faults classification
three types of errors
1. permanent error
2. intermittent error
3. transient error
a permanent fault or an incorrect design causes a permanent error, which leads to system failure
unstable or marginal components cause an intermittent error, which leads to system failure
an unstable environment causes a transient error, which leads to system failure
an operator mistake leads to system failure
what is a fault-tolerant system
in addition to a fault detection scheme, the system has redundant components and sub-systems built in
on detection of a fault, these redundant components are used to replace the faulty components
full form of MTBF
mean time between failures
MTTR full form
mean time to repair
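A minimal sketch of how the two measures are commonly combined into a steady-state availability figure, MTBF / (MTBF + MTTR). The formula is a standard one that these cards do not state, it treats MTBF as the mean up-time between failures, and the function name and sample numbers are illustrative assumptions.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected fraction of time the system is operational.

    Assumption: availability = MTBF / (MTBF + MTTR), with both
    values given in the same unit (hours here).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system: fails every 990 hours on average, 10 hours to repair.
print(f"{availability(990.0, 10.0):.2%}")
```

A shorter repair time raises availability even when failures are just as frequent, which is why both numbers are tracked.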
types of failure in DBMS
- hardware failures
- software failures
- storage medium failures
- implementation of stable storage
- communication failure
- transaction failure
how are hardware failures further divided
- design errors
- poor quality control
- over utilization and overloading
- wear out
system / site failure
- failures of processor, main memory, power supply
- main memory contents are lost, but secondary storage contents are safe
- partial vs total failure
Software failures
- design errors
- poor quality control
- over utilization and overloading
- wear out
how are storage media further divided
- volatile storage
- non-volatile storage
- permanent or stable storage
which types of failures are common to both environments (centralized and distributed)
- hardware failures
- software failures
- system failures
these failures are common to both environments
which failure is specific to the distributed environment
- all the previous failures are common to both environments, but communication failure is specific to the distributed environment only
what are the types of communication failure
- error in messages
- improper ordering of messages
- lost messages
- line failures
which failures are handled by the communication network
- errors in messages
- improper ordering of messages
these errors are handled by the communication software
what is the responsibility of network software and hardware
network hardware and software are responsible for ensuring that messages travel from source to destination correctly and in order
how are messages lost
messages are lost due to line or site failures
if a communication link fails, the messages in transit are lost, and in addition the network might get divided into disjoint parts, called network partitions
what are network partitions
network partitions are the disjoint parts into which a network gets divided when a communication link fails
how do network partitions create problems
if the network gets partitioned, the sites in each partition can keep working, but a transaction that tries to access data from two or more partitions runs into problems
- thus maintaining mutual consistency is a problem when the database is replicated
whose responsibility is it to handle the loss of messages
handling the loss of messages is the responsibility of the network software
what is a performance failure
the failure of the communication network to deliver and receive messages within a certain time period is called a performance failure
media failures
- failures of secondary storage devices such that the stored data is lost
- head crash / controller failure
communication failures
- lost / undeliverable messages
- network partitioning
explain implementation of stable storage
- the same block of data is written from volatile storage to stable storage two or more times (as multiple copies), and each write is checked so that a successfully written copy of the block can always be identified
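The idea of keeping several verified copies of a block can be sketched in memory as follows. This is only an illustration under stated assumptions: the class name `StableBlock`, the use of two copies, and the CRC32 checksum are hypothetical choices, and a real implementation would place the copies on separate non-volatile devices.

```python
import zlib


class StableBlock:
    """One logical block kept as two copies, each with a checksum (assumed layout)."""

    def __init__(self) -> None:
        # Each entry is (data, checksum) or None if never written.
        self.copies = [None, None]

    def write(self, data: bytes) -> None:
        # Write the copies one after the other, so that a crash in the
        # middle still leaves at least one complete copy (old or new).
        for i in range(len(self.copies)):
            self.copies[i] = (data, zlib.crc32(data))

    def read(self) -> bytes:
        # Return the first copy whose checksum still matches its data.
        for copy in self.copies:
            if copy is not None:
                data, checksum = copy
                if zlib.crc32(data) == checksum:
                    return data
        raise OSError("no valid copy of the block")


block = StableBlock()
block.write(b"page-7 v1")
# Simulate damage to the first copy: one byte changes, the checksum does not.
block.copies[0] = (b"page-7 v2", block.copies[0][1])
print(block.read())  # falls back to the intact second copy
```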
what is an audit trail
an audit trail is a record which is generated for each and every transaction
it keeps certain information about the transaction
the information it stores includes
- who initiated it
- when it was initiated
- what its purpose is
- where it was initiated
audit trail
the audit trail records who ( user or the application program and a transaction number ), when ( time and date ), from where ( location of the data affected, as well as a before and after image of that portion of the database that was affected by the update operation )
in addition, a DBMS contains routines that make a backup copy of the data that is modified; this is done by taking a snapshot of the before and after image of that portion of the database that is modified.
For obvious reasons, the backups are produced on a separate storage medium.
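A rough sketch of what one audit trail record could look like as a data structure. The class `AuditRecord`, the helper `log_update`, and all field names are hypothetical; the cards only say that who, when, where, and the before and after images are recorded.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditRecord:
    user: str               # who: user or application program
    transaction_id: int     # who: transaction number
    timestamp: datetime     # when: time and date
    location: str           # from where
    before_image: dict      # affected data before the update
    after_image: dict       # affected data after the update


audit_trail: list[AuditRecord] = []


def log_update(user, transaction_id, location, row, changes, now=None):
    """Apply `changes` to `row` and append a matching audit record."""
    before = dict(row)                  # snapshot before the update
    row.update(changes)
    record = AuditRecord(user, transaction_id, now or datetime.now(),
                         location, before, dict(row))
    audit_trail.append(record)
    return record


account = {"balance": 100}
entry = log_update("alice", 1, "site-1", account, {"balance": 80})
print(entry.before_image, entry.after_image)
```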
Explain local reliability protocols
what is Local Recovery Manager
LRM is a module of DBMS, which exists at each of the site.
what is function of Local recovery manager
Its function is to maintain atomicity and durability of local transactions.
what are the commands handled by the local recovery manager
the commands it handles are
- begin transaction
- read
- write
- commit
- abort
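architecture of local reliability protocol
secondary storage
- stable database
main memory
- local recovery manager
  [ issues fetch / flush to the database buffer manager ]
- database buffer manager
  [ reads / writes pages between the stable database and the database buffers ]
- database buffers ( volatile database )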
what is the stable database
the database stored on a stable storage device is known as the stable database
what is the volatile database
the data loaded in internal memory is called the volatile database
when does the LRM issue the fetch command
when the Local Recovery Manager ( LRM ) wants to read a page on behalf of a transaction, it issues a Fetch command, specifying the page number, to the buffer manager
What is use of Flush command
LRM can also force the buffer manager to write the page on to disk.
This can be done using Flush command
how buffer manager responds to fetch command
buffer manager responds to fetch command in following manner
- searches buffer pool for required page
- if not found allocates a free buffer page and loads required database page into it
- if no free buffer is available, selects a buffer, vacates it & uses to load the page from stable database
how allocation of buffer pages is done
allocation of buffer pages is done dynamically
how does the buffer manager allocate pages to a process
the buffer manager finds out the number of pages each process will need and accordingly attempts to allocate that many pages to each process
which is the best known technique to replace buffer pages
the best known technique to replace buffer pages is the least recently used ( LRU ) algorithm
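The fetch and flush behaviour described in the cards above, together with least recently used replacement, can be sketched as follows. This is a simplified illustration: the class `BufferManager`, its method names, and the use of a plain dictionary as the stable database are assumptions, and the victim page is always written back because the sketch keeps no dirty flag.

```python
from collections import OrderedDict


class BufferManager:
    def __init__(self, stable_db: dict, capacity: int) -> None:
        self.stable_db = stable_db      # page number -> page contents
        self.capacity = capacity        # number of buffer pages
        self.buffers = OrderedDict()    # least recently used page first

    def fetch(self, page_no):
        if page_no in self.buffers:                 # search the buffer pool
            self.buffers.move_to_end(page_no)       # now most recently used
            return self.buffers[page_no]
        if len(self.buffers) >= self.capacity:      # no free buffer left
            victim, contents = self.buffers.popitem(last=False)
            self.stable_db[victim] = contents       # vacate the LRU buffer
        self.buffers[page_no] = self.stable_db[page_no]  # load the page
        return self.buffers[page_no]

    def write(self, page_no, contents) -> None:
        self.fetch(page_no)                 # make sure the page is buffered
        self.buffers[page_no] = contents    # change only the volatile copy

    def flush(self, page_no) -> None:
        if page_no in self.buffers:         # force the page to stable storage
            self.stable_db[page_no] = self.buffers[page_no]


db = {1: "a", 2: "b", 3: "c"}
manager = BufferManager(db, capacity=2)
manager.fetch(1)
manager.fetch(2)
manager.fetch(1)            # page 1 becomes most recently used
manager.fetch(3)            # page 2 is the LRU page and is replaced
print(list(manager.buffers))  # [1, 3]
```

Until `flush` is called, an update made through `write` exists only in the volatile database, which is why the LRM needs the Flush command.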
what is recovery information
when a system fails it loses all volatile data, so the system must maintain some information about its state at the time of failure in order to restore consistency
this is called recovery information
https://www.youtube.com/watch?v=eq2EMu1Mh-w
on which update methods does recovery information depend
- in-place update
- out-of-place update
reliability issues are simpler if out-of-place updates are used,
but most systems use in-place updates because of their efficiency
what are in-place updates
out of place update
explain shadowing
- when an update is done, the old page is not overwritten; the updated page is written elsewhere and the old copy is kept as a duplicate page called the shadow page
https://www.youtube.com/watch?v=YA0sXVDoHig
what is differential file
https://www.youtube.com/watch?v=1xX68YYAMAM&t=147s
execution of LRM commands
log based database recovery
https://www.youtube.com/watch?v=0_DnBLn3nqg
explain redo and undo in recovery of database
https://www.youtube.com/watch?v=NzQetfezwp0
what is primary copy
https://youtu.be/RsudXqML-M8?list=PLV8vIYTIdSnbAW2wj_TiHyrFJId5zkhz2
what is primary site
checkpointing
https://www.youtube.com/watch?v=cQHriQKfA_c
what is majority locking
https://youtu.be/RsudXqML-M8?list=PLV8vIYTIdSnbAW2wj_TiHyrFJId5zkhz2
what is timestamping
- the idea is that each transaction in the system is assigned a unique timestamp, which determines the serialization order
how timestamping works in centralized scheme
how timestamping works in distributed scheme
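The cards do not say how the timestamps are generated. One common approach, shown here only as a sketch, is a single counter in the centralized scheme and a (local counter, site id) pair in the distributed scheme, so that two sites never produce the same timestamp. The class names and the `observe` helper are hypothetical.

```python
import itertools


class CentralTimestamper:
    """Centralized scheme: one site hands out all timestamps."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_timestamp(self) -> int:
        return next(self._counter)


class SiteTimestamper:
    """Distributed scheme: each site combines its own counter with its id."""

    def __init__(self, site_id: int) -> None:
        self.site_id = site_id
        self.local_counter = 0

    def next_timestamp(self) -> tuple[int, int]:
        self.local_counter += 1
        # The site id breaks ties between equal counters at different sites.
        return (self.local_counter, self.site_id)

    def observe(self, remote_ts: tuple[int, int]) -> None:
        # Keep a slow site's counter from lagging behind the others.
        self.local_counter = max(self.local_counter, remote_ts[0])


s1, s2 = SiteTimestamper(1), SiteTimestamper(2)
t_a = s1.next_timestamp()   # (1, 1)
t_b = s2.next_timestamp()   # (1, 2)
print(sorted([t_b, t_a]))   # serialization order: t_a before t_b
```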
2 Phase commit protocol
write a short note on two phase commit protocol
- the two phase commit protocol is used in distributed database systems
- it ensures that a distributed transaction either commits at all sites or aborts at all sites, which is what recovery relies on
- it has two phases
1. voting phase
2. decision phase
there are two different types of sites
- participant site
- coordinator site
in the voting phase, the participants vote on whether they want to commit or abort
in the decision phase, the coordinator site decides whether the transaction is committed or aborted
- suppose we have a transaction T1, which is started at site S1
- the site where the transaction is started is known as the coordinator site
- this transaction is running on 3 different sites, S2, S3 and S4
- all other sites where the transaction is running are known as participating sites
now who will vote and who will decide
- S2, S3 and S4 are participating sites; on the basis of their votes, S1 will decide whether to commit or abort the transaction
- because S1 is the coordinator site
how voting takes place
- in voting, the coordinator site ( S1 in this case ) makes an entry in its log record, called prepare T: [ T, prepare ]
- this means that this site is ready to commit or abort the transaction
[ Ready to commit ] - and S1 is now waiting to see whether the other sites will commit or abort
- once S1 has made the entry in its log, it sends the message [ T, prepare ] to the other sites S2, S3 and S4
- if S2 is ready, it writes [ T, ready ] in its log record
- if S3 is ready, it writes [ T, ready ]
- S4 is not ready, so it writes [ T, not-ready ]
===
depending on these votes, S1 will decide whether to commit or abort this transaction
decision phase of 2 phase commit
there are two rules in the decision phase
- if a [ T, ready ] message is received from every participant, then commit
- if at least one [ T, not-ready ] message is received, then abort the transaction
S1 will abort the transaction if any one of the participants has voted not-ready
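The voting and decision rules above can be sketched as a small single-process simulation. It is an illustration only: the class `Participant`, the function `two_phase_commit`, and the step that sends the final decision back to the participants are assumptions, and timeouts, message loss, and site failures are left out.

```python
class Participant:
    def __init__(self, name: str, ready: bool) -> None:
        self.name = name
        self.ready = ready      # whether this site is able to commit
        self.log = []

    def on_prepare(self, txn: str) -> str:
        vote = "ready" if self.ready else "not-ready"
        self.log.append((txn, vote))        # e.g. [T, ready]
        return vote

    def on_decision(self, txn: str, decision: str) -> None:
        self.log.append((txn, decision))


def two_phase_commit(txn, coordinator_log, participants) -> str:
    # Phase 1 (voting): log [T, prepare], then collect one vote per site.
    coordinator_log.append((txn, "prepare"))
    votes = [p.on_prepare(txn) for p in participants]

    # Phase 2 (decision): commit only if every participant voted ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append((txn, decision))
    for p in participants:
        p.on_decision(txn, decision)
    return decision


# Same situation as in the cards: S2 and S3 are ready, S4 is not.
sites = [Participant("S2", True), Participant("S3", True),
         Participant("S4", False)]
s1_log = []
print(two_phase_commit("T1", s1_log, sites))  # abort
```

A single not-ready vote is enough to abort, which keeps the transaction atomic across all sites.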
why is it called 2 phase commit
because there are two phases
- voting phase
- decision phase