The term STAR was coined by data warehouse researcher Dr. Ralph Kimball. Kimball described the data warehouse table structure as a star formation, where the central FACT table is surrounded by smaller points called DIMENSION or lookup tables.
Data Warehouse Schema
Question: What are the characteristics of a star schema used in a data warehouse?
A star schema is a popular database design for a data warehouse due to its simplicity and efficiency in handling analytical queries and reporting tasks. It is characterized by the following key features:
Central Fact Table: At the heart of the star schema is the fact table, a large table that stores the primary transactional data. This table includes measures, which are often numeric and additive in nature, such as sales amounts, quantities, or counts. Fact tables typically have a composite primary key made up of foreign keys referencing the associated dimension tables.
Dimension Tables: Surrounding the fact table are one or more dimension tables. These tables provide descriptive, categorical information related to the measures stored in the fact table. Dimension tables typically have a simple primary key and contain attributes such as dates, geographic information, product details, or customer demographics. Each row in a dimension table represents a unique combination of attributes and is linked to the fact table via foreign key relationships.
Denormalization: In contrast to the third normal form (3NF) typically used in OLTP databases, star schemas use denormalized data. Denormalization reduces the number of tables and joins required to answer a query, which significantly speeds up data retrieval in analytical and reporting queries.
Hierarchies: Dimension tables in a star schema often represent hierarchies. For instance, a time dimension table might include columns for day, month, quarter, and year, enabling analysis at different levels of granularity.
Data Integrity: While the star schema uses denormalized data, it still maintains referential integrity through the use of primary and foreign keys. However, it usually does not enforce domain integrity (a function of normalization) since it prioritizes query performance over update anomalies.
Query Performance: Star schemas are optimized for query performance. By reducing the number of joins and the complexity of queries, they enable faster data retrieval. This is particularly beneficial for complex and ad-hoc analytical queries common in business intelligence and reporting applications.
Bitmap Indexing: Star schemas often use bitmap indexes on the foreign key columns in the fact table. When a query joins the fact table with dimension tables, Oracle can use these bitmap indexes to speed up data retrieval significantly.
Data Warehouse Environments: Star schemas are most commonly found in data warehouse environments, which are characterized by large volumes of data and a read-intensive workload. They are less suited to transactional (OLTP) environments with frequent, small updates.
By understanding these characteristics, database administrators and data architects can effectively design, implement, and manage star schemas in data warehouse environments, optimizing data retrieval and analysis performance.
This is an example of a star schema. Notice that the DIMENSION tables are not joined to each other.
A single set of data about an item described in a fact table, a dimension is usually a denormalized table. A dimension table holds a key value and a numerical measurement or set of related measurements about the fact table object. A measurement is usually a sum but could also be an average, a mean or a variance. A dimension can have many attributes, 50 or more is the norm, since they are denormalized structures.
The star schema is one of the simplest of data warehouse schemas. It is known as star because it appears like a star with points expanding from a center. Figure 6.4 represents the star schema in which the fact table is at the center and the dimension tables are the nodes of the star. Each dimension in a star schema represents a one-dimensional table only and the dimension table consists of a set of attributes.
Dimension tables comprise of relatively small numbers of records in comparison to fact tables, but each record may consist of a large number of attributes to describe the fact data. Fact tables usually consist of numeric facts and foreign keys to dimensional data.
Generally, fact tables are in (3NF) third normal form in the case of star schema while dimensional tables are in de-normalized form.
Although the star schema is one of the simplest structures, it is still extensively used nowadays and recommended by Oracle.
About Bitmap Indexes on Partitioned Tables
You can create bitmap indexes on partitioned tables but they must be local to the partitioned table, they cannot be global indexes.
A partitioned table can only have global B-tree indexes, partitioned or nonpartitioned.
Benefits of Indexes for Data Warehousing Applications
Bitmap indexes are primarily intended for data warehousing applications where users query the data rather than update it.
They are not suitable for OLTP applications with large numbers of concurrent transactions modifying the data. Indexes are more beneficial for high cardinality columns.
The STAR Query
A STAR query is a join between a FACT table and a number of DIMENSION tables. The first Oracle data warehouses lacked an optimized execution strategy to efficiently support access for queries on a STAR data warehouse table structure. In running STAR queries, early query optimizers often chose to process substantial amounts of data from the FACT table, multiple times. They would first join the smallest DIMENSION table against the FACT table, and then join each of the other DIMENSION tables in turn against the intermediate table. These n-way table joins were performed very slowly.
The STAR Schema Method
To remedy this problem, Oracle developed the STAR schema method for performing large n-way joins of very large FACT tables. The following series of images below examines the STAR query process.
Star Query Process
A Star query is a join between a fact table and a number of dimension tables.
Each dimension table is joined to the fact table using a primary key to foreign key join
The dimension tables are not joined to each other
The Oracle 11g cost-based optimizer recognizes star queries and generates the most efficient execution plans for them
Question:Is a star query a join between a fact table and a number of dimension tables in a data warehouse environment?
Yes, in a data warehousing context, a star query is a query that joins a central fact table with one or more dimension tables.
This type of query is named after the star schema, a common design pattern in data warehouses.
The star schema consists of one or more fact tables referencing any number of dimension tables, creating a structure that resembles a star. The fact table at the center of the star contains the primary transactional data, often numeric measures, that are the focus of the analysis. Each dimension table connected to the fact table contains descriptive attributes related to the measures stored in the fact table.
A star query typically starts with a SELECT statement that retrieves data from the fact table and uses JOIN statements to combine this data with relevant information from the associated dimension tables. The WHERE clause in a star query often includes conditions on columns from the dimension tables. This type of query is designed to facilitate the analysis of measures in the fact table based on various dimensions.
For instance, in a sales data warehouse, the fact table might store sales transactions with measures like units sold and revenue, and dimension tables might include date, product, customer, and region details. A star query might join the fact table with the date and region dimension tables to analyze total revenue per region for a specific time period.
Oracle's query optimizer can process star queries very efficiently by using a method called star transformation. When a star query is executed, Oracle can leverage bitmap indexes on the join columns in the fact table to retrieve the data more efficiently, optimizing the execution plan and enhancing query performance. This makes star queries a powerful tool in the data warehousing and business intelligence context.
Conditions for STAR Query indexing Technique
The STAR query requires that a single concatenated index reside on the FACT table for all keys. To invoke the STAR query path, the following characteristics must be present:
At least three tables in the join, including one large FACT table and several smaller DIMENSION tables
A concatenated index on the FACT table with at least three columns, one for each of the table join keys
An EXPLAIN PLAN to verify that the NESTED LOOPS operation is being used to perform the join
The speed of the star join technique is a result of reducing the physical I/O.
The STAR Query Process
In the STAR query below, the indexes are read to gather the virtual table in memory. The FACT table is not accessed until the virtual index has everything it requires to go directly to the requested rows via the composite index on the FACT table.