Shambhu Jha: Introductory Concept of Database Failures and Recovery

Database operations cannot be insulated from the system on which they run (both the hardware and the software, including the operating system). The system should therefore ensure that any transaction submitted to it terminates in one of the following two ways.

a) All the operations listed in the transaction are completed, the changes are recorded permanently in the database, and the database is informed that the operations are complete.

b) If the transaction fails to achieve its objective, the system should ensure that no change whatsoever is reflected in the database. Any intermediate changes made to the database are restored to their original values before the transaction is called off, and the database is informed of the cancellation.

In the second case, we say the system should be able to “Recover” from the failure.
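The two outcomes above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The account table, the transfer function and the fail_midway switch are hypothetical names invented for this illustration; they are not part of the original discussion.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from account `src` to `dst` as one transaction.

    Returns True if the transaction committed (case a) and False if it
    was rolled back (case b).
    """
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        if fail_midway:
            # Hypothetical switch: simulate a failure after the first update.
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.commit()      # case a: changes recorded permanently
        return True
    except Exception:
        conn.rollback()    # case b: intermediate changes restored
        return False

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

Calling transfer(conn, 1, 2, 30) commits both updates, while transfer(conn, 1, 2, 30, fail_midway=True) leaves the balances exactly as they were before the call, even though the first update had already been executed.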

Database Failure

Database failures can occur in a variety of ways.

i) A system crash: A hardware, software or network error can make completion of the transaction impossible.

ii) A transaction or system error: The transaction submitted may be faulty, for example by causing a division by zero or by producing a negative number that cannot be handled (in a reservation system, a negative number of seats conveys no meaning). In such cases, the system simply discontinues the transaction and reports an error.

iii) User interruption: Some programs allow the user to interrupt during execution. If the user changes their mind during execution (but before the transaction is complete), they may opt out of the operation.

iv) Local exceptions: Certain conditions during operation may force the system to raise what are known as “exceptions”. For example, a bank account holder may not have sufficient balance for a transaction, or special instructions may have been given in a bank transaction that prevent the process from continuing. In all such cases, the transaction is terminated.

v) Concurrency control enforcement: In certain cases, when concurrency constraints are violated, the enforcement mechanism simply aborts the transaction so that it can be restarted later.

Other causes include physical problems such as theft or fire, and system problems such as disk failure or viruses. In all such cases of failure, a recovery mechanism must be in place.
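As a rough illustration of cases ii) to iv), the sketch below wraps each piece of work in a transaction that restores the data whenever an error, an exception or a user interruption occurs. It is an in-memory toy, not how a real database system is built; the names transaction, TransactionAborted, reserve and the flight code are all assumptions made for this example.

```python
import copy
from contextlib import contextmanager

class TransactionAborted(Exception):
    """Reports that a transaction was discontinued and its changes discarded."""

@contextmanager
def transaction(db):
    """Run a block of work against the dict `db`; restore it on any failure."""
    snapshot = copy.deepcopy(db)          # remember the consistent state
    try:
        yield db
    except BaseException as exc:          # BaseException also covers Ctrl-C
        db.clear()
        db.update(snapshot)               # discard intermediate changes
        raise TransactionAborted(type(exc).__name__) from exc

def reserve(db, flight, count):
    """Book `count` seats; a negative seat total is a transaction error."""
    with transaction(db):
        db[flight] -= count               # intermediate change
        if db[flight] < 0:
            raise ValueError("negative number of seats conveys no meaning")

# Illustrative data only.
seats = {"AI101": 2}
reserve(seats, "AI101", 1)                # succeeds, one seat left
try:
    reserve(seats, "AI101", 5)            # would leave -4 seats
except TransactionAborted as err:
    print("aborted:", err)                # the error is reported
print(seats)                              # the failed booking left no trace
```

Catching BaseException rather than Exception is a deliberate choice here so that a user interruption (KeyboardInterrupt) is treated like any other reason to call off the transaction.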

Database Recovery

Recovery most often means bringing the database back to the most recent consistent state after a transaction failure. This requires that status information about the previous consistent states be available in the form of a “log” (which has been discussed in some detail in one of the previous sections).

A typical recovery algorithm proceeds along the following lines.

a) If the database has been physically damaged, or there has been a catastrophic failure such as a disk crash, the database has to be recovered from the archives. In many cases, a reconstruction process has to be adopted using various other sources of information.

b) In situations where the database is not damaged but has lost consistency because of transaction failures, the method is to retrace the steps from the state at the time of the crash (which created the inconsistency) until the previously encountered state of consistency is reached. This normally involves undoing certain operations and restoring previous values using the log.

In general, two broad categories of these retracing operations can be identified. As we have seen previously, transactions most often do not update the database as and when they perform each operation; the changes are applied to the database only after the transaction commits. So, if a transaction fails or the system crashes before the commit operation, the database holds none of its values and nothing needs to be retraced; no “undo” operation is needed. However, if the results of transactions that had already committed are still wanted but had not all reached the database, a “redo” operation has to be carried out using the log. Hence, this type of retracing is often called the “no-undo/redo algorithm”. The whole concept works only when the system operates in “deferred update” mode.
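A minimal sketch of no-undo/redo recovery is given below, assuming a simplified log made of tuples. The record layout ("start", "write", "commit") and the function name recover_deferred are assumptions for this illustration and do not describe any particular database product.

```python
def recover_deferred(db, log):
    """No-undo/redo recovery for deferred update (simplified sketch).

    `db` maps item names to values as found on disk after the crash.
    `log` is a list of tuples in the order they were written:
        ("start", txn), ("write", txn, item, new_value), ("commit", txn)
    """
    # Only transactions with a commit record in the log count as committed.
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo: reapply the writes of committed transactions in log order.
    # Uncommitted transactions never touched the database, so nothing
    # has to be undone; their log records are simply ignored.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, new_value = rec
            db[item] = new_value
    return db

# Illustrative data only: T1 committed before the crash, T2 did not.
log = [
    ("start", "T1"),
    ("write", "T1", "A", 70),
    ("write", "T1", "B", 80),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "A", 0),
]
disk = {"A": 100, "B": 50}    # T1's changes had not yet reached the database
recover_deferred(disk, log)   # disk now holds T1's values; T2 is ignored
```

Redoing a write is idempotent in this sketch: running the recovery a second time, for example after a crash during recovery itself, yields the same database state.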

However, this is not always the case. When the system operates in “immediate update” mode, transactions keep updating the database without waiting for the commit operation, and these updates normally reach the disk as well. Hence, if the system fails while immediate updates are being made, it becomes necessary to undo the operations of the unfinished transactions using the entries recorded on disk. This brings us back to the previous consistent state. From there onwards, the transactions will have to be redone. Hence, this method of recovery is often termed the “undo/redo algorithm”.
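The undo/redo idea can be sketched in the same simplified style. Here each write record is assumed to carry both the old and the new value so that it can be applied in either direction; the record layout and the name recover_immediate are again hypothetical.

```python
def recover_immediate(db, log):
    """Undo/redo recovery for immediate update (simplified sketch).

    `db` maps item names to values as found on disk after the crash.
    `log` is a list of tuples in the order they were written:
        ("start", txn), ("write", txn, item, old_value, new_value),
        ("commit", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo: walk the log backwards and restore the old value of every
    # write made by a transaction that never committed.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, item, old_value, _new = rec
            db[item] = old_value

    # Redo: walk the log forwards and reapply the writes of committed
    # transactions, in case some of them had not reached the disk.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, _old, new_value = rec
            db[item] = new_value
    return db

# Illustrative data only: T1 committed, T2 was still running at the crash.
log = [
    ("start", "T1"),
    ("write", "T1", "A", 100, 70),
    ("write", "T1", "B", 50, 80),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "A", 70, 0),
]
# On disk: T2's uncommitted write to A is present, T1's write to B is missing.
disk = {"A": 0, "B": 50}
recover_immediate(disk, log)  # undoes T2's write to A, then redoes T1
```

The undo pass runs in reverse log order so that, if an unfinished transaction wrote the same item several times, the earliest old value is the one that finally remains.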

Monday, 4 April 2016
