Lesson 9: The Downside to Horizontal Partitioning
Objective: Describe the trade-offs that occur with horizontal partitioning.
Disadvantages of Horizontal Partitioning
Splitting table records into two or more tables reduces the amount of data the RDBMS must work through in most searches and joins.
The drawback of horizontally partitioning a table shows up when a query must examine every row of the original table (to find, for example, the total of all orders to a given distributor). The RDBMS must then search each of the partition tables for the appropriate records.
For such a query, the RDBMS must:
- open one table,
- find all appropriate records,
- close the table,
- find and open the next table, and
- repeat these steps until every record has been searched.
Because of this extra work, the search takes much longer than it would if the table were in one piece.
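The open-search-close loop above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not how any particular RDBMS schedules its work: the partition names `orders_2023` and `orders_2024`, their columns, and the sample rows are all hypothetical. The point is that one logical question ("total for a distributor") turns into one separate scan per partition.

```python
import sqlite3

# Hypothetical partitions of one logical "orders" table (split by year here).
# Names, columns, and data are illustrative assumptions, not from the lesson.
PARTITIONS = ["orders_2023", "orders_2024"]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few sample rows per partition."""
    conn = sqlite3.connect(":memory:")
    for name in PARTITIONS:
        conn.execute(
            f"CREATE TABLE {name} (order_numb INTEGER, distributor TEXT, amount REAL)"
        )
    conn.executemany(
        "INSERT INTO orders_2023 VALUES (?, ?, ?)",
        [(1, "Acme", 100.0), (2, "Zenith", 40.0)],
    )
    conn.executemany(
        "INSERT INTO orders_2024 VALUES (?, ?, ?)",
        [(3, "Acme", 250.0), (4, "Acme", 10.0)],
    )
    return conn


def total_for_distributor(conn: sqlite3.Connection, distributor: str) -> float:
    """Sum one distributor's orders by searching every partition in turn."""
    total = 0.0
    for name in PARTITIONS:
        # Table names come from the fixed list above, so the f-string is safe;
        # the distributor value is still passed as a bound parameter.
        (partial,) = conn.execute(
            f"SELECT COALESCE(SUM(amount), 0) FROM {name} WHERE distributor = ?",
            (distributor,),
        ).fetchone()
        total += partial
    return total


conn = build_demo_db()
print(total_for_distributor(conn, "Acme"))  # 360.0
```

Adding a third partition would add a third query to the loop, which is the cost the lesson describes.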
The only way to determine whether horizontal partitioning will improve database performance is to analyze usage patterns. If users rely on one identifiable set of rows much more than on the others, it may be worthwhile to partition the table horizontally so that the heavily used rows sit in a smaller table of their own and can be accessed faster.
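One rough way to start such an analysis is to log which rows queries touch and measure how concentrated those accesses are. The sketch below assumes a hypothetical access log of row dictionaries and a caller-supplied `group_of` function that names each row's group; the 0.8 threshold is an arbitrary illustrative cut-off, not a published rule. A fuller analysis would also weigh how many rows each group holds.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Optional, Tuple


def hottest_row_group(
    accessed_rows: Iterable[dict],
    group_of: Callable[[dict], Hashable],
) -> Optional[Tuple[Hashable, float]]:
    """Return the most-accessed row group and its share of all accesses."""
    counts = Counter(group_of(row) for row in accessed_rows)
    total = sum(counts.values())
    if total == 0:
        return None  # nothing logged, nothing to decide
    group, hits = counts.most_common(1)[0]
    return group, hits / total


def suggests_partitioning(
    accessed_rows: Iterable[dict],
    group_of: Callable[[dict], Hashable],
    threshold: float = 0.8,
) -> bool:
    """True when one row group receives at least `threshold` of all accesses.

    The 0.8 default is an arbitrary assumption chosen for illustration.
    """
    result = hottest_row_group(accessed_rows, group_of)
    return result is not None and result[1] >= threshold


# Hypothetical log: each entry is a row touched by some query.
log = [{"status": "open"}] * 9 + [{"status": "filled"}]
print(hottest_row_group(log, lambda row: row["status"]))  # ('open', 0.9)
print(suggests_partitioning(log, lambda row: row["status"]))  # True
```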
The next lesson introduces vertical partitioning.
Relational Database Design and Implementation
Horizontal partitioning involves creating two or more tables with exactly the same structure and splitting rows between those tables.
DistributedNetworks might use this technique to solve the problem of the order and order items tables growing increasingly large.
The database design might be modified as follows:
open_order (order_numb, customer_numb, order_date)
open_order_items (order_numb, item_numb, quantity,
filled_order (order_numb, customer_numb, order_date)
filled_order_items (order_numb, item_numb, quantity,
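Because the open and filled sets must have exactly the same structure, one way to express the design is to generate both from a single DDL template. This is a sketch in SQLite via Python's `sqlite3`; the item tables' column lists are cut off in the design above, so only the visible columns are declared, and the primary keys are assumptions rather than part of the lesson.

```python
import sqlite3

# One template, two prefixes: both partitions get identical structure.
# Only the columns visible in the (truncated) design are declared;
# the primary keys are assumed for illustration.
TEMPLATE = """
CREATE TABLE {prefix}_order (
    order_numb    INTEGER PRIMARY KEY,
    customer_numb INTEGER,
    order_date    TEXT
);
CREATE TABLE {prefix}_order_items (
    order_numb INTEGER,
    item_numb  INTEGER,
    quantity   INTEGER,
    PRIMARY KEY (order_numb, item_numb)
);
"""


def create_partitions(conn: sqlite3.Connection) -> None:
    """Create the open_* and filled_* tables from the shared template."""
    for prefix in ("open", "filled"):
        conn.executescript(TEMPLATE.format(prefix=prefix))


conn = sqlite3.connect(":memory:")
create_partitions(conn)
```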
Whenever all items in an open order have shipped, an application program moves the order's rows out of the open_order and open_order_items tables (deleting them there) and inserts them into the filled_order and filled_order_items tables. The open_order and open_order_items tables therefore remain relatively small, speeding up both retrieval and modification. Although retrieval from filled_order and filled_order_items will be slower, DistributedNetworks uses those tables much less frequently.
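The move can be sketched as one transaction, so that an order is never lost or duplicated if a step fails halfway. This assumes SQLite through Python's `sqlite3`, abbreviated item columns, and a hypothetical `move_to_filled` function whose caller has already confirmed that every item shipped; the lesson does not specify how that check is made.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory open/filled tables (item columns abbreviated, keys assumed)."""
    conn = sqlite3.connect(":memory:")
    for prefix in ("open", "filled"):
        conn.execute(
            f"CREATE TABLE {prefix}_order (order_numb INTEGER PRIMARY KEY, "
            "customer_numb INTEGER, order_date TEXT)"
        )
        conn.execute(
            f"CREATE TABLE {prefix}_order_items (order_numb INTEGER, "
            "item_numb INTEGER, quantity INTEGER, "
            "PRIMARY KEY (order_numb, item_numb))"
        )
    return conn


def move_to_filled(conn: sqlite3.Connection, order_numb: int) -> None:
    """Move one fully shipped order from the open tables to the filled tables.

    SELECT * is safe here only because both partitions have identical columns
    in identical order.
    """
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT INTO filled_order SELECT * FROM open_order WHERE order_numb = ?",
            (order_numb,),
        )
        conn.execute(
            "INSERT INTO filled_order_items "
            "SELECT * FROM open_order_items WHERE order_numb = ?",
            (order_numb,),
        )
        conn.execute("DELETE FROM open_order_items WHERE order_numb = ?", (order_numb,))
        conn.execute("DELETE FROM open_order WHERE order_numb = ?", (order_numb,))


conn = make_db()
with conn:
    conn.execute("INSERT INTO open_order VALUES (1, 10, '2024-01-05')")
    conn.executemany(
        "INSERT INTO open_order_items VALUES (?, ?, ?)", [(1, 100, 2), (1, 101, 1)]
    )
move_to_filled(conn, 1)
```

Copying before deleting, inside a single transaction, keeps the row in exactly one partition at every committed point.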
The drawback to this solution occurs when DistributedNetworks needs to access all of the orders and/or order items at the same time.
A query whose result table includes data from both the open and the filled sets of tables must actually be two queries connected by the union operator. (The union operation creates one table by merging the rows of two tables with the same structure.) Such a query will perform worse than a query against either set of tables individually. Nonetheless, if an analysis of DistributedNetworks' data access patterns reveals that such queries occur rarely and that most retrieval involves the open set of tables, then horizontal partitioning is worth doing.
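A union query over both partitions might look like the sketch below, again using SQLite through Python's `sqlite3`. The schema is cut down to the two order tables, and the sample rows and the helper name `all_orders_for_customer` are hypothetical.

```python
import sqlite3

# Two SELECTs with matching column lists, merged by UNION.
QUERY = """
SELECT order_numb, customer_numb, order_date FROM open_order
WHERE customer_numb = ?
UNION
SELECT order_numb, customer_numb, order_date FROM filled_order
WHERE customer_numb = ?
ORDER BY order_numb
"""


def make_db() -> sqlite3.Connection:
    """In-memory open/filled order tables with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    for prefix in ("open", "filled"):
        conn.execute(
            f"CREATE TABLE {prefix}_order (order_numb INTEGER PRIMARY KEY, "
            "customer_numb INTEGER, order_date TEXT)"
        )
    conn.execute("INSERT INTO open_order VALUES (2, 10, '2024-02-01')")
    conn.executemany(
        "INSERT INTO filled_order VALUES (?, ?, ?)",
        [(1, 10, "2024-01-05"), (3, 20, "2024-01-20")],
    )
    return conn


def all_orders_for_customer(conn: sqlite3.Connection, customer_numb: int) -> list:
    """Return one customer's orders from both partitions, sorted by number."""
    return conn.execute(QUERY, (customer_numb, customer_numb)).fetchall()


conn = make_db()
print(all_orders_for_customer(conn, 10))
# [(1, 10, '2024-01-05'), (2, 10, '2024-02-01')]
```

Since an order lives in exactly one of the two sets, the partitions never share a row, so `UNION ALL` would return the same result here while skipping the duplicate-elimination step that `UNION` performs.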
The only way you can determine whether horizontal partitioning will increase performance is to examine the ways in which your database applications access data. If a group of rows is accessed together significantly more often than the rest of the rows in a table, then horizontal partitioning may make sense.
Horizontal partitioning: Distributing the tuples of a relation across a number of (smaller) relations.