Lesson 5: The changing attitudes regarding distributed data
Objective: Explain why Replication and Distribution are now viable Elements of Database Design.
Replication and Distribution as Elements of Database Design
Prior to the advent of cheap hard disks in the 1980s, an important design requirement for all databases was to minimize the amount of redundant information.
All databases were kept in centralized mainframe environments and distributed processing was very rare.
However, once hard disks became cheap enough to permit replicated data, Oracle introduced the concept of snapshots and its first distributed-database tool,
called SQL*Net. A snapshot is an Oracle construct whereby remote tables are refreshed from a master table. This allows a table to be replicated on many Oracle databases.
SQL*Net permits geographically distributed databases to be "linked" such that they function as a single database.
The first version of SQL*Net was quite primitive when compared to Net8, but it did have the advantage of being simple and functional.
In the next lesson, we will look at how Oracle implements the features of a distributed database.
Options for Distributing a Database
How should a database be distributed among the sites (or nodes) of a network?
We addressed this important issue of physical database design in Chapter 5, which introduced
an analytical procedure for evaluating alternative distribution strategies. In that
chapter, we noted that there are four basic strategies for distributing databases:
- Data replication
- Horizontal partitioning
- Vertical partitioning
- Combinations of the above
We will explain and illustrate each of these approaches using relational databases.
The same concepts apply (with some variations) for other data models, such as hierarchical and network models.
Suppose that a bank has numerous branches located throughout a state.
One of the base relations in the bank's database is the Customer relation.
For simplicity, the sample data in the relation apply to only two of the branches (Lakeview and Valley).
The primary key in this relation is account number (AcctNumber).
BranchName is the name of the branch where customers have opened their accounts (and therefore where they presumably
perform most of their transactions).
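The distribution strategies listed above can be sketched against a small Customer relation. This is a minimal illustration in Python rather than in any particular DBMS: only AcctNumber and BranchName come from the text, while the other columns, the sample rows, and the function names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows; only AcctNumber and BranchName appear in the text.
CUSTOMER = [
    {"AcctNumber": 200, "CustomerName": "Jones", "BranchName": "Lakeview", "Balance": 1000},
    {"AcctNumber": 324, "CustomerName": "Smith", "BranchName": "Valley", "Balance": 250},
    {"AcctNumber": 153, "CustomerName": "Gray", "BranchName": "Valley", "Balance": 38},
    {"AcctNumber": 426, "CustomerName": "Dorman", "BranchName": "Lakeview", "Balance": 796},
]


def replicate(rows, sites):
    """Data replication: every site stores its own full copy of the relation."""
    return {site: [dict(r) for r in rows] for site in sites}


def horizontal_partition(rows, column):
    """Horizontal partitioning: rows are split by the value of one column."""
    fragments = defaultdict(list)
    for r in rows:
        fragments[r[column]].append(dict(r))
    return dict(fragments)


def vertical_partition(rows, key, column_groups):
    """Vertical partitioning: each fragment keeps the key plus some columns."""
    return [[{c: r[c] for c in (key, *cols)} for r in rows] for cols in column_groups]


def rejoin(fragments, key):
    """Rebuild full rows from vertical fragments by joining on the key."""
    merged = defaultdict(dict)
    for fragment in fragments:
        for r in fragment:
            merged[r[key]].update(r)
    return list(merged.values())


by_branch = horizontal_partition(CUSTOMER, "BranchName")
by_columns = vertical_partition(
    CUSTOMER, "AcctNumber", [("CustomerName", "BranchName"), ("Balance",)]
)
```

Partitioning on BranchName places each row at the branch where its transactions presumably occur. The vertical fragments both carry AcctNumber, which is what allows the original rows to be rebuilt with a join.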
Lazy or Asynchronous Replication
Eager Replication update strategies are synchronous, in the sense that they require
the atomic updating of some number of copies. Lazy Group Replication and Lazy Master Replication
both operate asynchronously.
If the users of distributed database systems are willing to pay the price of some inconsistency in
exchange for the freedom to do asynchronous updates, they will insist that:
- the degree of inconsistency be bounded precisely, and that
- the system guarantees convergence to standard notions of correctness.
Without such properties, the system in effect becomes partitioned as the replicas diverge more and
more from one another (Davidson et al., 1985).
Lazy Group Replication
Lazy Group Replication, however, allows any node to update any local data. When the transaction
commits, a transaction is sent to every other node to apply the root transaction's updates to the replicas
at the destination node. It is possible for two nodes to update the same object and race each other to
install their updates at other nodes. The replication mechanism must detect this and reconcile the two
transactions so that their updates are not lost.
Timestamps are commonly used to detect and reconcile lazy-group transactional updates. Each object
carries the timestamp of its most recent update. Each replica update carries the new value and is tagged
with the old object timestamp. Each node detects incoming replica updates that would overwrite earlier
committed updates. The node tests if the local replica's timestamp and the update's old timestamp
are equal. If so, the update is safe. The local replica's timestamp advances to the new transaction's
timestamp and the object value is updated. If the current timestamp of the local replica does not match
the old timestamp seen by the root transaction, then the update may be dangerous. In such cases, the
node rejects the incoming transaction and submits it for reconciliation. The reconciliation process is
then responsible for applying all waiting update transactions in their correct time sequence.
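The timestamp test described above can be sketched as follows. This is an illustrative model only, not the mechanism of any specific product: the names ReplicaUpdate, Node, apply and to_reconcile are hypothetical, timestamps are plain integers, and an object that has never been written is assumed to carry timestamp 0.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaUpdate:
    key: str
    new_value: object
    old_ts: int  # object timestamp the root transaction saw before updating
    new_ts: int  # timestamp of the root transaction itself


@dataclass
class Node:
    store: dict = field(default_factory=dict)         # key -> (value, timestamp)
    to_reconcile: list = field(default_factory=list)  # rejected updates

    def apply(self, update: ReplicaUpdate) -> bool:
        """Install the update if it is safe; otherwise queue it for reconciliation."""
        # Assumption: a missing object counts as having timestamp 0.
        _, local_ts = self.store.get(update.key, (None, 0))
        if local_ts == update.old_ts:
            # Safe: the root transaction saw the same version this replica holds.
            self.store[update.key] = (update.new_value, update.new_ts)
            return True
        # Dangerous: another update was committed here in the meantime.
        self.to_reconcile.append(update)
        return False


node = Node(store={"x": (10, 1)})
first = node.apply(ReplicaUpdate("x", 11, old_ts=1, new_ts=2))   # accepted
second = node.apply(ReplicaUpdate("x", 12, old_ts=1, new_ts=3))  # rejected
```

Both updates were based on version 1 of x. Once the first is installed, the replica's timestamp is 2, so the second no longer matches and is handed to reconciliation instead of silently overwriting the first.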
Transactions that would wait in an Eager Replication system face reconciliation in a Lazy Group
Replication system. Waits are much more frequent than deadlocks because it takes two waits to make
a deadlock.
Lazy Master Replication
Another alternative to Eager Replication is Lazy Master Replication.
This replication method assigns an owner to each object and the owner stores the object's correct value.
Updates are first done by the owner and then propagated to other replicas. When a transaction wants
to update an object, it sends a Remote Procedure Call (RPC) to the node owning the object. To
achieve serialisability, a read action should send read-lock RPCs to the masters of any objects it reads.
After the master transaction commits, the node originating the transaction broadcasts the replica updates
to all the slave replicas. The originating node sends one slave transaction to each slave
node. Slave updates are time-stamped to assure that all the replicas converge to the same final state.
If the record timestamp is newer than the replica update's timestamp, the update is stale and can be
ignored. Alternatively, each master node sends replica updates to slaves in sequential commit order.
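The stale-update rule for slave replicas can be sketched in a few lines. This is a simplified model under stated assumptions: Master and SlaveReplica are hypothetical classes, the master uses a simple integer counter as its timestamp source, and network delivery is simulated by calling apply directly, possibly out of order.

```python
class SlaveReplica:
    def __init__(self):
        self.records = {}  # key -> (value, timestamp)

    def apply(self, key, value, ts):
        """Apply a replica update unless the stored record is already newer."""
        current = self.records.get(key)
        if current is not None and current[1] > ts:
            return False  # stale update: the record timestamp is newer, so ignore it
        self.records[key] = (value, ts)
        return True


class Master:
    """Owner of the objects: updates are performed here first."""

    def __init__(self):
        self.records = {}
        self.clock = 0  # assumption: a local counter stands in for commit timestamps

    def update(self, key, value):
        """Commit an update at the master and return the replica update to propagate."""
        self.clock += 1
        self.records[key] = (value, self.clock)
        return (key, value, self.clock)


master = Master()
slave = SlaveReplica()
u1 = master.update("x", "a")
u2 = master.update("x", "b")

# Simulate out-of-order delivery: the later update arrives first.
applied_late_first = slave.apply(*u2)
applied_early_second = slave.apply(*u1)
```

Because the older update is discarded on arrival, the slave ends in the same state as the master regardless of delivery order. Sending updates in sequential commit order, the alternative mentioned above, would make the staleness check unnecessary.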