Lesson 4	STAR Index Queries
Objective	Describe STAR Query Operation

Describe STAR Index Query Operation

The term STAR was coined by data warehouse researcher Dr. Ralph Kimball. Kimball described the data warehouse table structure as a star formation, where the central FACT table is surrounded by smaller points called DIMENSION or lookup tables.
Data Warehouse Schema

A STAR query operation is not limited to a specific RDBMS like Oracle or SQL Server. It can be used in any relational database management system (RDBMS) that supports star schema design for querying data, particularly in the context of data warehousing.
Key Points about STAR Queries:

Star Schema: A star schema is a widely used design pattern in data warehouses, consisting of a central fact table and several dimension tables. The fact table contains measurable data (e.g., sales, transactions), and the dimension tables store descriptive data (e.g., time, location, product).
Star Query Operation:
- A star query typically involves joining a fact table to multiple dimension tables based on foreign keys.
- The goal is to retrieve aggregated data by navigating through dimensions, such as sales by region and time period.
- These queries are highly optimized in RDBMSs for fast performance.

Applicability Across RDBMS Systems:

Oracle: Oracle supports star queries and optimizations like star transformation to improve the performance of queries on star schemas.
SQL Server: SQL Server also supports star schema queries and provides indexing strategies and partitioning to optimize performance.
Other RDBMSs:
- PostgreSQL, MySQL, IBM Db2, and other relational databases can also implement star schema queries
- Although specific query optimization techniques may differ across RDBMSs, the general principle of executing star queries remains the same.

Optimizations in RDBMS:

Star Transformation (Oracle): Oracle can transform star queries into more efficient forms using bitmap indexing or star join optimization, allowing the database engine to quickly navigate the fact and dimension tables.
Query Optimization (SQL Server): SQL Server uses indexed views, clustered indexes, and partitioning to optimize star queries for performance.

Key Use Cases: Star queries are most effective in data warehouse environments where large datasets are stored for reporting and analytics. These environments typically use RDBMSs that are designed to handle complex, multi-table joins with large amounts of data.
Summary:

Star queries are not restricted to any one RDBMS like Oracle or SQL Server.
They are commonly used in any data warehouse setting, regardless of the RDBMS.
The schema design (fact and dimension tables) drives the use of star queries, and most modern RDBMSs support this pattern with varying degrees of optimization.

This is an example of a star schema. Notice that the DIMENSION tables are not joined to each other.

Oracle Indexing and Access Paths

Dimension Table

A single set of data about an item described in a fact table, a dimension is usually a denormalized table. A dimension table holds a key value and a numerical measurement or set of related measurements about the fact table object. A measurement is usually a sum but could also be an average, a mean or a variance. A dimension can have many attributes, 50 or more is the norm, since they are denormalized structures.

This is an example of a star schema. — This is an example of a star schema

The star schema is one of the simplest of data warehouse schemas. It is known as star because it appears like a star with points expanding from a center. Figure 6.4 represents the star schema in which the fact table is at the center and the dimension tables are the nodes of the star. Each dimension in a star schema represents a one-dimensional table only and the dimension table consists of a set of attributes. Dimension tables comprise of relatively small numbers of records in comparison to fact tables, but each record may consist of a large number of attributes to describe the fact data. Fact tables usually consist of numeric facts and foreign keys to dimensional data. Generally, fact tables are in (3NF) third normal form in the case of star schema while dimensional tables are in de-normalized form. Although the star schema is one of the simplest structures, it is still extensively used nowadays and recommended by Oracle.

Figure 6.4 Graphical representation of Star schema

About Bitmap Indexes on Partitioned Tables

You can create bitmap indexes on partitioned tables but they must be local to the partitioned table, they cannot be global indexes. A partitioned table can only have global B-tree indexes, partitioned or nonpartitioned.

Benefits of Indexes for Data Warehousing Applications: Bitmap indexes are primarily intended for data warehousing applications where users query the data rather than update it. They are not suitable for OLTP applications with large numbers of concurrent transactions modifying the data. Indexes are more beneficial for high cardinality columns.

The STAR Query

A STAR query is a join between a FACT table and a number of DIMENSION tables. The first Oracle data warehouses lacked an optimized execution strategy to efficiently support access for queries on a STAR data warehouse table structure. In running STAR queries, early query optimizers often chose to process substantial amounts of data from the FACT table, multiple times. They would first join the smallest DIMENSION table against the FACT table, and then join each of the other DIMENSION tables in turn against the intermediate table. These n-way table joins were performed very slowly.

The STAR Schema Method:
To remedy this problem, Oracle developed the STAR schema method for performing large n-way joins of very large FACT tables. The following series of images below examines the STAR query process.

1) Oracle will first service the queries against the smaller DIMENSION tables, combining the result set into a Cartesian product table that is held in Oracle RAM in the shared pool. This virtual table will contain all of the columns from all of the participating DIMENSION tables.

2) The primary key for this virtual table will be a composite of all of the keys for the DIMENSION tables.

3) If this key matches the composite index on the FACT table, then the query will be able to process very quickly.

4) Once the sum of the reference tables has been addressed, Oracle will perform a nested-loop join of the intermediate table against the FACT table.

Star Query Process

A Star query is a join between a fact table and a number of dimension tables.

Each dimension table is joined to the fact table using a primary key to foreign key join
The dimension tables are not joined to each other
The Oracle 11g cost-based optimizer recognizes star queries and generates the most efficient execution plans for them

Question:Is a star query a join between a fact table and a number of dimension tables in a data warehouse environment?
Yes, in a data warehousing context, a star query is a query that joins a central fact table with one or more dimension tables. This type of query is named after the star schema, a common design pattern in data warehouses.
The star schema consists of one or more fact tables referencing any number of dimension tables, creating a structure that resembles a star. The fact table at the center of the star contains the primary transactional data, often numeric measures, that are the focus of the analysis. Each dimension table connected to the fact table contains descriptive attributes related to the measures stored in the fact table.
A star query typically starts with a SELECT statement that retrieves data from the fact table and uses JOIN statements to combine this data with relevant information from the associated dimension tables. The WHERE clause in a star query often includes conditions on columns from the dimension tables. This type of query is designed to facilitate the analysis of measures in the fact table based on various dimensions.
For instance, in a sales data warehouse, the fact table might store sales transactions with measures like units sold and revenue, and dimension tables might include date, product, customer, and region details. A star query might join the fact table with the date and region dimension tables to analyze total revenue per region for a specific time period.
Oracle's query optimizer can process star queries very efficiently by using a method called star transformation. When a star query is executed, Oracle can leverage bitmap indexes on the join columns in the fact table to retrieve the data more efficiently, optimizing the execution plan and enhancing query performance. This makes star queries a powerful tool in the data warehousing and business intelligence context.

Conditions for STAR Query indexing Technique

The STAR query requires that a single concatenated index reside on the FACT table for all keys. To invoke the STAR query path, the following characteristics must be present:

At least three tables in the join, including one large FACT table and several smaller DIMENSION tables
A concatenated index on the FACT table with at least three columns, one for each of the table join keys
An EXPLAIN PLAN to verify that the NESTED LOOPS operation is being used to perform the join

The speed of the star join technique is a result of reducing the physical I/O.

The STAR Query Process

In the STAR query below, the indexes are read to gather the virtual table in memory. The FACT table is not accessed until the virtual index has everything it requires to go directly to the requested rows via the composite index on the FACT table.

NESTED LOOPS
   MERGE JOIN CARTESIAN
      MERGE JOIN CARTESIAN
      SORT JOIN 
         TABLE ACCESS ... DIMENSION1
      SORT JOIN 
         TABLE ACCESS ... DIMENSION2
      SORT JOIN 
         TABLE ACCESS ... DIMENSION3
   TABLE ACCESS BY ROWID FACT_TABLE
INDEX RANGE SCAN FACT_TABLE_CONCAT_IDX

In summary, the STAR query is extremely useful in cases where large, fully populated data warehouse tables are joined together. The next lesson describes the STAR transformation.