Backup Recovery

Lesson 2: Introduction to Fast-Start Recovery
Objective: Identify the advantages and components of Fast-Start recovery.

Identify Components of Fast-Start Recovery

Real Application Clusters (RAC) replaced Oracle Parallel Server (OPS). OPS was Oracle's earlier clustering technology, which allowed multiple Oracle instances running on separate nodes to share a single database. However, OPS had a number of limitations, including:
  1. It was difficult to configure and manage.
  2. It scaled poorly under write contention, because a changed block had to be written to disk before another instance could modify it (block "pinging").
  3. It offered fewer availability features than its successor.

RAC was released in 2001 with Oracle9i as the successor to OPS. RAC addresses the limitations of OPS and provides a number of additional benefits, such as:
  1. It is easier to configure and manage.
  2. It is more scalable than OPS, because Cache Fusion transfers blocks between instances over the cluster interconnect instead of through disk.
  3. It is more reliable than OPS.
  4. It supports a wider range of platforms and operating systems.

As a result, RAC has become the standard for clustered Oracle databases. OPS was last shipped with Oracle8i; from Oracle9i onward, RAC took its place.

Minimize failure or downtime

Minimizing database failures and downtime is an important goal for a database administrator. To help meet this goal, Oracle introduced Transparent Application Failover (TAF).
TAF can be used with Oracle Real Application Clusters (RAC). In fact, TAF is a key feature of RAC, as it allows applications to fail over to a surviving node if one of the nodes in the cluster fails.
TAF is a client-side feature of the Oracle Call Interface (OCI). When the client detects that its connection to an instance has been lost, the OCI library automatically reconnects the application to a surviving instance. The application can then continue running without the user having to log in again, although any uncommitted transaction is rolled back and must be resubmitted. To use TAF with RAC, you will need to configure the following:
  • The Oracle Net listeners: Each node runs a listener that accepts incoming connections from applications. The database service must be registered with the listeners on all of the nodes in the cluster so that a client can reach a surviving instance.
  • The Oracle Net connect descriptor: The connect descriptor is used by applications to connect to the database. It lists the host and port of the listener on each node and names the database service. Its CONNECT_DATA section carries the FAILOVER_MODE parameter, which turns TAF on.
  • The Oracle Net TNSNAMES.ORA file: The TNSNAMES.ORA file is used by clients to resolve a net service name to a connect descriptor. A single entry that lists the address of every node is sufficient.

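Once you have configured these settings, you enable TAF for a client by adding a FAILOVER_MODE clause to the CONNECT_DATA section of its connect descriptor. In the following TNSNAMES.ORA entry, the alias, host names, service name, and retry values are placeholders:
SALES =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = sales)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))
    )
  )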

Once TAF is enabled, applications will automatically fail over to a surviving node if one of the nodes in the cluster fails.
Here are some of the benefits of using TAF with RAC:
  • Reduced downtime: TAF can significantly reduce the downtime that occurs after a node failure.
  • Improved availability: TAF can help to improve the availability of your applications by reducing the amount of downtime that occurs after a node failure.
  • Reduced administrative overhead: Once TAF has been configured in the connect descriptor, failover happens automatically, with no intervention from the administrator at the time of the failure.
The result is a largely transparent migration of users to the failover node, providing continued availability of the database during an instance outage. With the SELECT failover type, the client also keeps track of its open queries and re-executes them on the failover node, so that fetching can resume where it left off.
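
The reconnect behavior described above can be illustrated with a small toy model. This is only a sketch of the retry-and-delay idea, not Oracle's implementation and not an Oracle API: the `connect` function, the `NodeDown` exception, and the node names are all hypothetical stand-ins.

```python
import time


class NodeDown(Exception):
    """Raised by the toy cluster when a node cannot be reached."""


def connect(node, up_nodes):
    # Hypothetical stand-in for opening a database session on one node.
    if node not in up_nodes:
        raise NodeDown(node)
    return node


def connect_with_failover(addresses, up_nodes, retries=2, delay=0.0):
    """Try each address in order; repeat the whole list up to `retries` more times."""
    for attempt in range(retries + 1):
        for node in addresses:
            try:
                return connect(node, up_nodes)
            except NodeDown:
                continue  # this node is down, try the next address
        if attempt < retries:
            time.sleep(delay)  # wait before walking the address list again
    raise NodeDown("no surviving node")


# node1 has failed, so the session is re-established on node2.
session = connect_with_failover(["node1", "node2"], up_nodes={"node2"})
print(session)
```

The `retries` and `delay` arguments play the role that the retry count and delay play in a failover configuration: they bound how long the client keeps trying before it reports the failure to the application.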

Historical Note

Oracle8i introduced Fast-Start Fault Recovery to help systems administrators quickly recover the database from system faults and further minimize database downtime. Fast-Start Fault Recovery improves the recovery time when a database crashes, and consists of the following features:
  1. Fast-Start checkpointing
  2. Fast-Start on-demand rollback
  3. Fast-Start parallel rollback

Checkpointing

One way to shorten instance recovery is through checkpointing, which limits the number of I/O operations needed to perform the recovery. In Oracle8i, this was accomplished by the Fast-Start checkpointing[1] component of the Fast-Start Fault Recovery feature. We will look at checkpointing in more detail later in this module.

Checkpointing is still central to the recovery performance of an Oracle database. During a checkpoint, the database writer process (DBWn) writes modified ("dirty") buffers from the database buffer cache to the datafiles, and the checkpoint position is recorded in the control file (and, for a full checkpoint, in the datafile headers). This allows the database to recover quickly from a system failure or other unplanned outage.
Checkpointing is an important part of the Oracle recovery mechanism. When the instance crashes, changes that were in the buffer cache but had not yet been written to the datafiles are lost from memory, so instance recovery must read the redo log files and reapply them. Recovery starts at the checkpoint position. If checkpoints are infrequent, the database has to apply far more redo, which can take a long time. Checkpointing also allows redo log files to be reused: once every change recorded in a redo log file has been written to the datafiles, that file is no longer needed for instance recovery and can be overwritten.
There are two types of checkpoints:
  1. Full checkpoints: A full checkpoint writes all of the dirty buffers to the datafiles. It occurs, for example, at a clean shutdown or when you issue ALTER SYSTEM CHECKPOINT. After it completes, the datafiles contain every change made up to that point.
  2. Incremental checkpoints: An incremental checkpoint writes only the oldest dirty buffers, continuously advancing the checkpoint position recorded in the control file. It writes less at a time than a full checkpoint, so it can run more often without slowing down the database.
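The frequency of checkpoints can be controlled using the `LOG_CHECKPOINT_INTERVAL` parameter. Its value is a number of operating system blocks of redo, not a time; in recent releases the default is 0, which disables interval-based checkpointing. In addition to the `LOG_CHECKPOINT_INTERVAL` parameter, there are a number of other parameters that can affect the frequency of checkpoints. These parameters include:
  • LOG_CHECKPOINT_TIMEOUT: The maximum number of seconds that the checkpoint position may lag behind the end of the redo log. The default in recent releases is 1800 seconds (30 minutes).
  • FAST_START_IO_TARGET: Introduced in Oracle8i. An upper bound on the number of I/O operations that instance recovery should need.
  • FAST_START_MTTR_TARGET: Introduced in Oracle9i. The target time, in seconds, for instance recovery; the database adjusts incremental checkpointing to meet it.
  • ARCHIVE_LAG_TARGET: The number of seconds after which the database forces a log switch, so that redo is archived at least that often.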

By carefully configuring these parameters, you can optimize the frequency of checkpoints for your specific database environment.
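
To see why checkpoint frequency matters for recovery time, consider the following toy model. It is a sketch under simplifying assumptions, not a description of Oracle internals: redo records are plain dictionaries, the sequence numbers stand in for positions in the redo log, and the function names are invented for this illustration.

```python
def run_workload(num_changes, checkpoint_every):
    """Generate changes, checkpoint every `checkpoint_every` changes, then 'crash'."""
    redo_log = []
    checkpoint_position = 0
    for seq in range(1, num_changes + 1):
        redo_log.append({"seq": seq, "block": seq % 10})
        if seq % checkpoint_every == 0:
            # All dirty buffers up to this change are now in the datafiles.
            checkpoint_position = seq
    return redo_log, checkpoint_position


def redo_to_apply(redo_log, checkpoint_position):
    """Return the redo records that instance recovery would have to apply."""
    return [rec for rec in redo_log if rec["seq"] > checkpoint_position]


# The same workload crashes after 1,999 changes under two checkpoint settings.
for interval in (1000, 100):
    log, position = run_workload(1999, interval)
    print(interval, len(redo_to_apply(log, position)))
# prints:
# 1000 999
# 100 99
```

Checkpointing ten times as often leaves roughly a tenth of the redo to apply after the crash. The cost, which the model does not show, is the extra write activity that more frequent checkpoints cause during normal operation.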

The following table summarizes the benefits of checkpointing:
Benefit: Description
Reduced recovery time: Checkpointing limits the amount of redo that must be applied after an instance failure, so the database recovers more quickly.
Improved performance: Incremental checkpointing spreads datafile writes over time instead of producing large bursts of I/O.
Reduced I/O load during recovery: Fewer data blocks have to be read and updated when the instance is recovered.
Buffer and redo log reuse: Once a dirty buffer has been written to disk, the buffer can be reused for other blocks, and redo log files that are no longer needed for instance recovery can be overwritten.

On-demand Rollback

Fast-Start Fault Recovery also provides an on-demand rollback feature to help minimize database downtime. When a row is locked by a dead transaction (one that was active when the instance failed) and that row is required by another transaction, on-demand rollback recovers only the data block required by the new transaction and leaves the rest of the data blocks to be recovered in the background. This improves the availability of data to users.
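
The idea can be sketched with a toy model in which blocks are dictionary entries and a dead transaction is represented by the undo it still has pending. The class and function names are hypothetical and chosen for this illustration only.

```python
class DeadTransaction:
    """Toy model: undo left behind by a transaction that died in a crash."""

    def __init__(self, undo):
        # Maps block id -> the value the block held before the transaction.
        self.pending_undo = dict(undo)


def read_block(block_id, blocks, dead_txn):
    """Read a block, first rolling back the dead transaction's change to it."""
    if block_id in dead_txn.pending_undo:
        # On-demand rollback: recover only the block that is needed right now.
        blocks[block_id] = dead_txn.pending_undo.pop(block_id)
    return blocks[block_id]


def background_rollback(blocks, dead_txn):
    """Roll back everything still pending (a background task in the real system)."""
    while dead_txn.pending_undo:
        block_id, old_value = dead_txn.pending_undo.popitem()
        blocks[block_id] = old_value


blocks = {1: "new-a", 2: "new-b", 3: "new-c"}
dead = DeadTransaction({1: "old-a", 2: "old-b", 3: "old-c"})

print(read_block(2, blocks, dead))   # block 2 is recovered immediately
print(sorted(dead.pending_undo))     # blocks 1 and 3 are still pending
background_rollback(blocks, dead)
print(blocks)
```

The new transaction waits only for the single block it touches, while the remaining blocks are restored later without blocking anyone.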

Parallel Rollback

Recovery of a database includes two phases:
  1. rolling forward and
  2. rolling back.
During the roll forward phase, all the changes recorded in the redo log files, whether committed or not, are reapplied to the database. For example, suppose that before the crash there were 1,500 transactions, comprising 500 inserts, 500 updates, and 500 deletes on different tables, whose changes were in memory and had not yet been written to the datafiles. When the database crashes, these changes are still recorded in the redo log files, and during the roll forward phase they are written to the database. During the roll back phase, all the uncommitted transactions are rolled back.
These phases can happen serially or in parallel. When the process is serial, only one server process is associated with it, and it processes one transaction at a time. When the process is parallel, several server processes share the work. In Oracle8, rolling forward could be done in parallel, but rolling back was done serially. Oracle8i allowed rolling back to be done in parallel as well. Parallel rollback uses a group of server processes to recover transactions in parallel. The database determines whether parallel or serial rollback is faster, thus minimizing database downtime.
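
The two phases can be sketched as follows. This is a toy model, not Oracle's recovery code: each redo record here carries both the old and the new value, whereas Oracle keeps old values in rollback (undo) segments, and the thread pool merely stands in for a group of server processes. The sketch also assumes that uncommitted transactions touch different keys, as row locks would guarantee.

```python
from concurrent.futures import ThreadPoolExecutor


def roll_forward(datafile, redo_log):
    """Apply every redo record, committed or not, in log order."""
    for rec in redo_log:
        datafile[rec["key"]] = rec["new"]


def undo_transaction(datafile, records):
    """Restore the old values for one transaction, newest change first."""
    for rec in reversed(records):
        datafile[rec["key"]] = rec["old"]


def roll_back(datafile, redo_log, committed, workers=1):
    """Undo all uncommitted transactions, serially or with several workers."""
    by_txn = {}
    for rec in redo_log:
        if rec["txn"] not in committed:
            by_txn.setdefault(rec["txn"], []).append(rec)
    if workers == 1:
        for records in by_txn.values():  # serial rollback
            undo_transaction(datafile, records)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:  # parallel rollback
            list(pool.map(lambda recs: undo_transaction(datafile, recs),
                          by_txn.values()))
    return sorted(by_txn)


datafile = {"a": 1, "b": 2, "c": 3}
redo_log = [
    {"txn": "T1", "key": "a", "old": 1, "new": 10},
    {"txn": "T2", "key": "b", "old": 2, "new": 20},
    {"txn": "T3", "key": "c", "old": 3, "new": 30},
]

roll_forward(datafile, redo_log)  # every change is reapplied
rolled_back = roll_back(datafile, redo_log, committed={"T1"}, workers=2)
print(rolled_back, datafile)
```

After recovery, only the change made by the committed transaction T1 remains, and the result is the same whether `workers` is 1 or greater.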

  1. Roll forward phase: In this phase of recovery, all changes recorded in the redo log files are applied to the database.
  2. Rollback: The process in which the Oracle server restores the old values of a record that was changed by a transaction that did not commit.
  3. Parallel rollback: Rolling back transactions using multiple processes in parallel.
  4. Serial rollback: Rolling back transactions using a single process, one transaction at a time.


Fast-Start Fault Recovery Components

The next lesson discusses Fast-Start checkpointing in more detail.

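[1] Checkpointing: The process of writing modified (dirty) buffers from the database buffer cache to the datafiles and recording the checkpoint position, so that instance recovery needs to apply only the redo generated after that position.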