| Lesson 6 | Create an index |
| Objective | Create an Index on a Column in the Database Table. |
BasicTable, defined in lesson 4, has three columns: Firstname, Lastname, and PhoneNumber. The most common query against a contact table is a search by last name — finding everyone named Smith, or everyone whose last name starts with H. Without an index on Lastname, every such query performs a full table scan, reading every row to find the matches. As BasicTable grows from dozens of rows to thousands, this scan grows proportionally slower.
As with the CREATE TABLE statement from lesson 4, the CREATE INDEX statement begins with the CREATE keyword followed by the object type — INDEX — and then the object name and its definition:
CREATE INDEX index_name
ON table_name (column_name);
CREATE INDEX LastnameIndex
ON BasicTable (Lastname);
LastnameIndex on the Lastname column of BasicTable. The index is built immediately when the statement executes — the engine reads all existing Lastname values, sorts them into the index structure, and stores the row pointers. From that point forward, the engine maintains the index automatically: every INSERT that adds a new row also adds an entry to LastnameIndex; every UPDATE that changes a Lastname value updates the corresponding index entry; every DELETE removes the entry.
After LastnameIndex is created, queries that filter or sort on Lastname can use it without any change to the SQL syntax. The query:
SELECT Firstname, Lastname, PhoneNumber
FROM BasicTable
WHERE Lastname = 'Hilbert';
If queries frequently sort or filter on both Lastname and Firstname together — for example, finding all contacts with the last name Hull and then sorting by first name — a composite index on both columns is more efficient than two separate single-column indexes:
CREATE INDEX idx_basictable_lastname_firstname
ON BasicTable (Lastname, Firstname);
This composite index supports queries that filter on Lastname alone, or on Lastname and Firstname together. The Lastname column is the leading column — the index is sorted first by Lastname, then by Firstname within each Lastname group. A query that filters only on Firstname without Lastname in the WHERE clause cannot use this composite index efficiently, because Firstname is not the leading column.
-- Single-column index on the foreign-key lookup column
CREATE INDEX idx_contacts_lastname
ON Contacts (Lastname);
-- Composite index for full-name searches and sorts
CREATE INDEX idx_contacts_lastname_firstname
ON Contacts (Lastname, Firstname);
LastnameIndex name in the original lesson conveys the indexed column but does not identify the table. In a database with dozens of tables, an index name like LastnameIndex is ambiguous — which table's Lastname column does it index? The standard convention used throughout this module follows the pattern idx_tablename_columnname:
-- Pattern: idx_tablename_columnname
CREATE INDEX idx_basictable_lastname
ON BasicTable (Lastname);
-- Composite: idx_tablename_col1_col2
CREATE INDEX idx_contacts_lastname_firstname
ON Contacts (Lastname, Firstname);
This convention makes the index's purpose self-documenting. Any developer reading a list of indexes in the schema browser immediately knows which table each index covers and which column or columns it indexes.
Index names appear in error messages, query execution plans, monitoring dashboards, and DROP INDEX statements. An ambiguous name like index1 or LastnameIndex forces developers to look up the index definition to understand what it covers. A descriptive name like idx_contacts_lastname_firstname communicates the full context without additional lookup.
Consistent naming also makes it easier to identify unused indexes — if EXPLAIN output or the database's index usage statistics show that idx_basictable_lastname has never been used in six months of query traffic, you know immediately which table and column to examine and whether dropping it makes sense.
-- Standard index (B-tree by default)
CREATE INDEX idx_basictable_lastname
ON BasicTable (Lastname);
-- Explicitly specify index type
CREATE INDEX idx_basictable_lastname
ON BasicTable (Lastname)
USING BTREE;
MySQL InnoDB stores indexes as part of the table's data file — the clustered primary key index and all secondary indexes live together. Secondary indexes store the primary key value as the row pointer rather than a physical row address, which means secondary index lookups involve two B-tree traversals: one to find the key in the secondary index, and one to find the full row via the primary key.
-- Standard index
CREATE INDEX idx_basictable_lastname
ON BasicTable (Lastname);
-- Build without locking the table (takes longer but allows concurrent writes)
CREATE INDEX CONCURRENTLY idx_basictable_lastname
ON BasicTable (Lastname);
PostgreSQL also supports several index types beyond B-tree: Hash (equality-only), GiST (geometric and full-text), and GIN (for array and JSONB columns). The default B-tree index is appropriate for the column types covered in this course.
-- Standard non-clustered index (default)
CREATE INDEX idx_basictable_lastname
ON BasicTable (Lastname);
-- SQL Server online index build (equivalent to PostgreSQL CONCURRENTLY)
CREATE INDEX idx_basictable_lastname
ON BasicTable (Lastname)
WITH (ONLINE = ON);
ALTER TABLE BasicTable
ADD INDEX idx_basictable_lastname (Lastname);
-- Or as a unique index
ALTER TABLE BasicTable
ADD UNIQUE INDEX idx_basictable_phone (PhoneNumber);
This syntax is MySQL-specific. PostgreSQL and SQL Server use CREATE INDEX exclusively for index creation outside of table definition.
-- List all indexes on a table
SHOW INDEX FROM BasicTable;
-- Or using information_schema (standard SQL, works across engines)
SELECT index_name, column_name, non_unique
FROM information_schema.statistics
WHERE table_name = 'BasicTable'
ORDER BY index_name, seq_in_index;
-- In psql client: show table structure including indexes
\d BasicTable
-- Query the pg_indexes system view
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'basictable'
ORDER BY indexname;
-- List all indexes on a table
EXEC sp_helpindex 'BasicTable';
-- Or query sys.indexes
SELECT i.name, i.type_desc, c.name AS column_name
FROM sys.indexes i
JOIN sys.index_columns ic ON i.object_id = ic.object_id
AND i.index_id = ic.index_id
JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE OBJECT_NAME(i.object_id) = 'BasicTable'
ORDER BY i.name, ic.key_ordinal;
-- MySQL / MariaDB: requires table name
DROP INDEX idx_basictable_lastname ON BasicTable;
-- PostgreSQL: index name only (indexes are schema-scoped, not table-scoped)
DROP INDEX idx_basictable_lastname;
-- PostgreSQL: safe drop — no error if index does not exist
DROP INDEX IF EXISTS idx_basictable_lastname;
-- SQL Server: requires table-qualified name
DROP INDEX BasicTable.idx_basictable_lastname;
performance_schema.table_io_waits_summary_by_index_usage view shows how many times each index has been used for reads. In PostgreSQL, pg_stat_user_indexes shows the idx_scan count for each index. An index with zero or near-zero scans over a representative time period is a candidate for removal.
ORDER BY Lastname against an indexed Lastname column can read the index entries in sorted order rather than reading all rows and sorting them in memory. Range queries using BETWEEN or comparison operators (WHERE Price BETWEEN 10 AND 25) can use the index to jump to the start of the range and read forward, stopping when the range is exceeded — without examining rows outside the range at all.
JOIN operations between tables are significantly faster when the join columns are indexed. When two tables are joined on ON Orders.ContactID = Contacts.ContactID, an index on Contacts.ContactID allows the engine to look up each ContactID value from the Orders table in the Contacts index rather than scanning the entire Contacts table for each Orders row. Without the index, a JOIN between a large Orders table and a large Contacts table requires a full scan of one table for every row in the other — an O(n×m) operation. With an index, it becomes O(n log m).
Unique indexes enforce uniqueness at the database level. When a UNIQUE INDEX exists on a column, the engine checks the index on every INSERT and UPDATE to verify that the new value does not duplicate an existing entry. This check is fast — O(log n) — because the index is sorted. Without the index, enforcing uniqueness would require a full table scan on every write operation. Primary keys are always backed by a unique index; CREATE UNIQUE INDEX allows the same enforcement on any other column that requires uniqueness.
Every INSERT, UPDATE, and DELETE operation must update all indexes defined on the table. An INSERT into BasicTable with three indexes — a primary key index, LastnameIndex, and a composite index on (Lastname, Firstname) — performs three index updates in addition to the base table insert. On tables with very high write rates — transaction logs, real-time event streams, sensor data tables — this per-write overhead accumulates and can become a bottleneck.
The trade-off is straightforward: indexes improve read performance at the cost of write performance. A table used primarily for reads (a reporting table, a reference data table, a historical archive) can sustain aggressive indexing. A table with extremely high write rates should be indexed minimally, with only the indexes that directly support the most critical read queries.
There is no universal answer to how many indexes a table should have — it depends on the read-to-write ratio, the query patterns, the data volume, and the performance requirements. A practical starting point: index the primary key (automatic), index foreign key columns that appear in JOIN conditions, index columns that appear in WHERE clauses of your most frequent or most critical queries, and index columns used in ORDER BY on large result sets. Measure performance before and after creating each index using EXPLAIN and query timing. Remove indexes that are not used.
Lesson 7 examines how the database index engine uses indexes internally — the physical storage structures, the query optimizer's index selection logic, and the conditions under which the engine chooses a full table scan even when an index is available.