
Lesson 2: Listing methods for minimizing downtime
Objective: Identify methods to minimize downtime

List Methods for Minimizing Downtime

What should I do if there is a media failure and I find out that several database files are damaged?
Encountering media failure and resultant database file damage is a serious issue. However, it's not an uncommon scenario in database administration. By following a systematic recovery process, you can often restore your database to a functional state. Here's a broad outline of the steps you should take:
  1. Identify the Extent of the Damage: Use database utility tools and logs to identify the scope of the damage. Understand which files are damaged and whether they affect system-critical data or operations.
  2. Quarantine the Affected System: To prevent further damage or data loss, isolate the affected system immediately. This could involve taking it offline or disconnecting it from your network.
  3. Notify Key Stakeholders: Inform the relevant parties about the situation. This could include IT managers, database users, and potentially customers if the outage impacts service delivery.
  4. Begin the Recovery Process: Implement your disaster recovery plan. This should involve the following steps:
    1. Restore from Backup: Hopefully, you have a recent backup of your database. Restore the damaged files from this backup. The restore process will depend on your DBMS (Database Management System) and backup solution.
    2. Apply Transaction Logs: If your backup isn't current up to the point of failure, apply transaction logs to bring the database up to date. This assumes you're using a form of logging that supports Point In Time Recovery (PITR).
    3. Check Data Integrity: After restoring the data, perform integrity checks to validate that the data is consistent and correct. Most database management systems have built-in tools for integrity checks.
  5. Post-Recovery Analysis: After the recovery process, analyze the incident to understand why the failure occurred and how a similar situation can be prevented in the future. Check if your backup strategy is sufficient and whether you need to improve hardware or system software to prevent such failures.
  6. Review Disaster Recovery Plan: Use this opportunity to review and improve your disaster recovery plan. If there were hiccups during the recovery, address them in the revised plan.

Remember, the best defense against media failure is a robust backup strategy, including regular backups and transaction log backups, coupled with a well-planned and frequently tested disaster recovery strategy. It's always better to prevent disaster than to have to recover from one.
What would you do if a media failure damaged several database files? As an experienced DBA, you know that you must act quickly to keep database downtime to a minimum. Before you start media recovery, determine what kind of file each damaged file is and, for datafiles, which tablespace it belongs to. Then choose a recovery method based on whether the damaged files are essential (the database cannot open without them) or non-essential. Generally speaking, there are three methods that you can incorporate into your recovery strategy to minimize database downtime:
  1. Start the database with missing datafiles
  2. Apply parallel recovery to the damaged database files
  3. Avoid the need for recoveries by multiplexing the online redo logs and control files
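
The choice between these methods can be sketched as a small decision function. This is a simplified illustration in Python, not an Oracle tool: the `DamagedFile` type, the result labels, and the tablespace name `RBS` are all assumptions made for the example, and a real decision would also weigh backup availability and archive log mode.

```python
from dataclasses import dataclass

# Tablespaces treated as essential in this sketch. "RBS" is a hypothetical name
# for a rollback segment tablespace; substitute the names used at your site.
ESSENTIAL_TABLESPACES = {"SYSTEM", "RBS"}


@dataclass
class DamagedFile:
    path: str
    kind: str                  # "datafile", "controlfile", or "redolog"
    tablespace: str = ""       # only meaningful for datafiles
    multiplexed: bool = False  # True if an intact mirrored copy survives


def choose_method(f: DamagedFile) -> str:
    """Map one damaged file to the approach described in this lesson."""
    if f.kind in ("controlfile", "redolog"):
        # A surviving mirror avoids media recovery altogether.
        return "restore from mirror" if f.multiplexed else "closed recovery"
    if f.tablespace.upper() in ESSENTIAL_TABLESPACES:
        # The database cannot open without these files.
        return "closed recovery"
    # Non-essential datafile: open without it and recover it afterwards.
    return "open with file offline"


# Hypothetical damage report.
damaged = [
    DamagedFile("/u02/oradata/users01.dbf", "datafile", "USERS"),
    DamagedFile("/u01/oradata/system01.dbf", "datafile", "SYSTEM"),
    DamagedFile("/u03/oradata/control02.ctl", "controlfile", multiplexed=True),
]
plan = {f.path: choose_method(f) for f in damaged}
```

Files that map to "closed recovery" are the candidates for parallel recovery, since the database stays unavailable until they are recovered.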

Start the database with missing datafiles

If the damaged files are not control files and do not belong to the system or rollback segment tablespaces, you can start the database without them. In this way, you make the database accessible to users who do not need these datafiles. After the database is open, you can recover the damaged datafiles and bring them back online when recovery is complete.

Parallel recovery

Parallel recovery is a very effective way to minimize recovery time. It allows several datafiles located on different disks to be recovered at the same time. It is especially useful when you have to perform a closed database recovery, for example, when most of the datafiles are damaged, or when the damaged datafiles belong to the system or rollback segment tablespaces.
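
One way to reason about the degree of parallelism is to count the distinct disks that hold damaged datafiles. The sketch below does this in Python; the "one recovery stream per disk" rule, the cap of 8, and the file-to-disk mapping are assumptions for illustration, not Oracle recommendations. The `PARALLEL` clause of the SQL*Plus `RECOVER` command has changed form across releases, so check the reference for your version before using the generated statement.

```python
from typing import Dict


def recovery_degree(file_to_disk: Dict[str, str], max_degree: int = 8) -> int:
    """Suggest one recovery stream per distinct disk, capped at max_degree."""
    disks = set(file_to_disk.values())
    return max(1, min(len(disks), max_degree))


def recover_statement(degree: int) -> str:
    """Build a RECOVER DATABASE command for the given degree."""
    if degree <= 1:
        return "RECOVER DATABASE"
    return f"RECOVER DATABASE PARALLEL {degree}"


# Hypothetical layout: three damaged datafiles spread over two disks.
layout = {
    "/u01/oradata/system01.dbf": "disk1",
    "/u01/oradata/rbs01.dbf": "disk1",
    "/u02/oradata/users01.dbf": "disk2",
}
statement = recover_statement(recovery_degree(layout))
```

With this layout the two files on `disk1` share one stream, so a degree higher than 2 would add contention on the same disk rather than speed.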

Multiplexing Control Files and redo logs

The most efficient way to minimize database downtime for recovery is to avoid the need to perform one. If the damaged or lost files are online redo logs or control files, and you have multiplexed these files in your database configuration, you do not need media recovery at all. You copy an intact mirrored copy over the damaged file (for a control file, with the instance shut down) and make the database available again right away.
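
To check whether the online redo logs are actually multiplexed, you can count the members in each group. The Python sketch below assumes its input rows come from a query such as `SELECT group#, member FROM v$logfile`; the sample rows, the helper names, and the new member path are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def unprotected_groups(rows: Iterable[Tuple[int, str]], min_members: int = 2) -> List[int]:
    """Return redo log groups that have fewer than min_members members."""
    members = defaultdict(list)
    for group, member in rows:
        members[group].append(member)
    return sorted(g for g, m in members.items() if len(m) < min_members)


def add_member_statement(group: int, new_member: str) -> str:
    """Build the statement that adds a mirrored member to a redo log group."""
    quoted = new_member.replace("'", "''")
    return f"ALTER DATABASE ADD LOGFILE MEMBER '{quoted}' TO GROUP {group};"


# Hypothetical rows: group 1 is mirrored, group 2 has a single member.
rows = [
    (1, "/u01/oradata/redo01a.log"),
    (1, "/u02/oradata/redo01b.log"),
    (2, "/u01/oradata/redo02a.log"),
]
for g in unprotected_groups(rows):
    print(add_member_statement(g, f"/u02/oradata/redo0{g}b.log"))
```

Counting members only shows that a mirror exists. It does not confirm that the members sit on different disks, which is what protects against a media failure.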
To minimize database downtime during recovery, establish a sound strategy for your database configuration and recovery process ahead of time. If you spread the database files across different disks, or multiplex the online redo logs and the control files, you will be better prepared in the event of a media failure. The next lesson demonstrates how to start a database with missing datafiles.