Distributed Fragmentation Independence

Fragmentation Independence and Logically Related Data

Fragmentation independence refers to the ability of end users to store logically related information at different physical locations. There are two types of fragmentation independence: vertical partitioning and horizontal partitioning. Horizontal partitioning permits different rows of the same table to be stored at different remote sites. This is commonly done by organizations that maintain several branch offices, each with an identical set of table structures.
Vertical partitioning refers to the ability of a distributed system to fragment information so that columns of the same logical table are stored at different sites across the network. Oracle accomplishes this with views that hide specific columns and rows in a table.
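
To make the two kinds of fragmentation concrete, the following is a minimal sketch in Python using the standard sqlite3 module. It is an illustration under stated assumptions, not Oracle's mechanism: the "sites" are simulated as separate tables inside one in-memory SQLite database, and every table, view, and column name (customer_east, customer_west, emp_public, emp_private, and so on) is hypothetical. The views play the role of the single logical table that the end user sees.

```python
import sqlite3


def build_fragments():
    """Create horizontal and vertical fragments plus views that reunite them."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Horizontal fragments: identical columns, different rows per branch office.
    # (Assumption: each table stands in for a table at a remote site.)
    cur.execute("CREATE TABLE customer_east (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    cur.execute("CREATE TABLE customer_west (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    cur.execute("INSERT INTO customer_east VALUES (1, 'Ames', 'EAST')")
    cur.execute("INSERT INTO customer_west VALUES (2, 'Boyd', 'WEST')")
    # The logical table is the union of the row fragments.
    cur.execute(
        "CREATE VIEW customer AS "
        "SELECT * FROM customer_east UNION ALL SELECT * FROM customer_west"
    )

    # Vertical fragments: the same rows (shared key), different columns.
    cur.execute("CREATE TABLE emp_public (emp_id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("CREATE TABLE emp_private (emp_id INTEGER PRIMARY KEY, salary INTEGER)")
    cur.execute("INSERT INTO emp_public VALUES (10, 'Cole')")
    cur.execute("INSERT INTO emp_private VALUES (10, 52000)")
    # The logical table is the join of the column fragments on the shared key.
    cur.execute(
        "CREATE VIEW employee AS "
        "SELECT p.emp_id, p.name, s.salary "
        "FROM emp_public p JOIN emp_private s ON s.emp_id = p.emp_id"
    )
    conn.commit()
    return conn


conn = build_fragments()
customers = conn.execute("SELECT id, name, region FROM customer ORDER BY id").fetchall()
employees = conn.execute("SELECT emp_id, name, salary FROM employee").fetchall()
print(customers)
print(employees)
```

A query against customer or employee does not need to know how the data is split, which is the point of fragmentation independence. In a real distributed system the fragments would live on different servers and the views would reference them over the network.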

Definition Of Distributed Databases

There is an ongoing debate over the standard definition of distributed database systems, and vendors have implemented distributed database technology in different ways. To many database vendors, a distributed database is a geographically distributed system composed entirely of one brand of database product. Front-end application vendors, on the other hand, define a distributed database as a system that is distributed architecturally, using a blend of database products and access methods. Finally, to hardware vendors, a distributed database is a system composed of different databases running on the same hardware platform.

Distributed Database Defined

When an organization is geographically dispersed, it may choose to store its databases on a central database server, to distribute them to local servers, or both.

A distributed database is a single logical database that is spread physically across computers in multiple locations that are connected by a data communications network^[1]. I emphasize that a distributed database is truly a database, not a loose collection of files. The distributed database is still centrally administered as a corporate resource while providing local flexibility and customization. The network must allow the users to share the data; thus, a user (or program) at location X must be able to access (and perhaps update) data at location Y. The sites of a distributed system may be spread over a large area (e.g., the United States or the world) or over a small area (e.g., a building or campus). The computers may range from PCs to large-scale servers or even supercomputers. A distributed database requires multiple instances of a database management system (or several DBMSs), one running at each remote site. Different types of distributed database environments are distinguished by the degree to which these DBMS instances cooperate, or work in partnership, and by whether there is a master site that coordinates requests involving data from multiple sites. It is important to distinguish between 1) distributed and 2) decentralized databases.
A decentralized database is also stored on computers at multiple locations. However, the computers are not interconnected by network and database software that make the data appear to be in one logical database. Thus, users at the various sites cannot share data. A decentralized database is best regarded as a collection of independent databases, rather than as a single database that is geographically distributed.
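
The difference between the two can be sketched in a few lines of Python. This is only an analogy under loud assumptions: SQLite's ATTACH DATABASE statement stands in for the network and database software that link sites together, there is no real network involved, and the names site_x, site_y, island, and orders are hypothetical.

```python
import sqlite3

# "Distributed": the connection at site X has site Y attached, so both
# sets of data are reachable through one logical database.
site_x = sqlite3.connect(":memory:")
site_x.execute("ATTACH DATABASE ':memory:' AS site_y")
site_x.execute("CREATE TABLE main.orders (order_id INTEGER, status TEXT)")
site_x.execute("CREATE TABLE site_y.orders (order_id INTEGER, status TEXT)")
site_x.execute("INSERT INTO main.orders VALUES (1, 'open')")
site_x.execute("INSERT INTO site_y.orders VALUES (2, 'open')")

# A user at X can update data held at Y ...
site_x.execute("UPDATE site_y.orders SET status = 'shipped' WHERE order_id = 2")

# ... and can read data from both locations in a single query.
all_orders = site_x.execute(
    "SELECT order_id, status FROM main.orders "
    "UNION ALL "
    "SELECT order_id, status FROM site_y.orders "
    "ORDER BY order_id"
).fetchall()
print(all_orders)

# "Decentralized": an independent database with no link to the others.
island = sqlite3.connect(":memory:")
try:
    island.execute("SELECT order_id FROM orders")
    island_can_see_orders = True
except sqlite3.OperationalError:
    # The table lives in another, unconnected database.
    island_can_see_orders = False
print(island_can_see_orders)
```

The first connection behaves like a site in a distributed database, since data at another location can be read and updated through it. The second behaves like one member of a decentralized collection: it holds its own data and has no way to reach anyone else's.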

Conditions which encourage Distributed Databases

Various business conditions encourage the use of distributed databases:

Distribution and autonomy of business units: Divisions, departments, and facilities in modern organizations are frequently geographically distributed, sometimes across national boundaries. Each unit often has the authority to create its own information systems and wants local data over which it has control. Business mergers and acquisitions commonly create this environment.
Data sharing: Even moderately complex business decisions require sharing data across business units, so it must be convenient to consolidate data across local databases on demand.
Data communications costs and reliability: The cost to ship large quantities of data across a communications network or to handle a large volume of transactions from remote sources can still be high, even though data communication costs have decreased substantially in recent years. It is in many cases more economical to locate data and applications close to where they are needed. Also, dependence on data communications always involves an element of risk, so keeping local copies or fragments of data can be a reliable way to support the need for rapid access to data across the organization.
Multiple application vendor environment: Today, many organizations purchase packaged application software from several different vendors. Each package is designed to work with its own database, and possibly with a different database management system. A distributed database can possibly be defined to provide functionality that cuts across the separate applications.
Database Recovery: Replicating data on separate computers is one strategy for ensuring that a damaged database can be quickly recovered and users can have access to data while the primary site is being restored. Replicating data across multiple computer sites is one natural form of a distributed database.
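
The database recovery condition in the list above can be illustrated with a small, purely in-memory Python sketch. It is a toy model, not any vendor's replication feature: the Site and ReplicatedStore classes and their methods are hypothetical names introduced here, and the sketch deliberately ignores resynchronizing a site that missed writes while it was down.

```python
class Site:
    """A hypothetical database site holding one copy of the data."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.available = True


class ReplicatedStore:
    """Writes go to every available site; reads use the first available one."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, key, value):
        # Replicate the write to each site that is currently up.
        for site in self.sites:
            if site.available:
                site.rows[key] = value

    def read(self, key):
        # Fall through to the next copy when a site is unavailable.
        for site in self.sites:
            if site.available:
                return site.name, site.rows[key]
        raise RuntimeError("no site available")


primary = Site("primary")
standby = Site("standby")
store = ReplicatedStore([primary, standby])

store.write("acct-1", 250)
before_failure = store.read("acct-1")

# Simulate damage to the primary site; users keep reading from the copy.
primary.available = False
during_restore = store.read("acct-1")

print(before_failure)
print(during_restore)
```

While the primary is marked unavailable, reads are served by the standby copy, which is the behavior the recovery argument relies on. A production system would also have to bring the restored primary back in step with writes it missed.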

[1] communications network: A local area network (LAN) is a communications network that interconnects a variety of data communications devices within a small geographic area and transmits data at high data transfer rates.