This module introduces the fundamentals of SQL: what it is, why it matters, and how it differs from “traditional” programming. You will also set up a practice database and load the PUBS sample data so you can run consistent queries throughout the course.
SQL (Structured Query Language) is the standard language used to define, query, and control data in a relational database management system (RDBMS). SQL includes statements for data definition (creating and changing schema objects), data manipulation (querying and modifying rows), and data control (permissions and access control). SQL is standardized by ANSI and ISO, which is why core SQL skills transfer across many platforms, even though each vendor adds its own dialect features.
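The three statement categories above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table is loosely modeled on a PUBS-style authors table, the rows and IDs are made up, and SQLite itself has no GRANT or REVOKE, so the data control statement is shown as text rather than executed.

```python
import sqlite3

# In-memory SQLite database, used only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Data definition (DDL): create a schema object.
# Table and column names are illustrative, loosely modeled on a PUBS-style table.
conn.execute("""
    CREATE TABLE authors (
        au_id    TEXT PRIMARY KEY,
        au_lname TEXT NOT NULL,
        state    TEXT
    )
""")

# Data manipulation (DML): insert rows, modify one, then query.
# The rows below are made-up sample data.
conn.executemany(
    "INSERT INTO authors (au_id, au_lname, state) VALUES (?, ?, ?)",
    [("A001", "White", "CA"), ("A002", "Green", "CA"), ("A003", "Smith", "KS")],
)
conn.execute("UPDATE authors SET state = 'UT' WHERE au_id = 'A003'")

ca_authors = conn.execute(
    "SELECT au_lname FROM authors WHERE state = 'CA' ORDER BY au_lname"
).fetchall()
print(ca_authors)

# Data control (DCL): not executed here because SQLite has no permission system.
# On a server RDBMS the statement would look roughly like this
# (reporting_user is a hypothetical account name):
dcl_example = "GRANT SELECT ON authors TO reporting_user"
```

On a server platform the same DDL and DML would run with minor dialect changes, and the GRANT statement would be executed by an account that has the right to delegate permissions.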
SQL is one of the foundational building blocks of modern data architecture. It defines how applications and users create tables, enforce data rules, and retrieve information efficiently. By the end of this module, you should be comfortable reading SQL and writing basic queries—enough to start working effectively with databases in real systems.
SQL emerged from relational theory, specifically relational algebra and tuple relational calculus. In the relational model, data is represented as relations (tables), and queries are operations that take one or more relations as input and produce a new relation as the result. Edgar F. Codd’s 1970 paper introduced the relational model, and SQL became the first broadly successful commercial language influenced by that work.
SQL is often described as a declarative language: you describe the result you want (the set of rows and columns), and the database engine decides how to produce it (index usage, join order, access paths). That is one of the biggest differences between SQL and general-purpose programming languages.
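One way to see the engine choosing its own strategy is to ask it for a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table, index name, and search value are assumptions made for illustration, and other engines expose plans through their own commands with different output. The query text never changes, yet the plan does once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative table; column names are assumptions, not a required schema.
conn.execute(
    "CREATE TABLE authors (au_id TEXT PRIMARY KEY, au_lname TEXT, au_fname TEXT)"
)

query = "SELECT au_fname FROM authors WHERE au_lname = ?"


def plan_for(sql, params):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Without an index on au_lname, the engine has to scan the table.
plan_before = plan_for(query, ("Ringer",))

# Add an index; the query itself is not modified.
conn.execute("CREATE INDEX idx_au_lname ON authors (au_lname)")
plan_after = plan_for(query, ("Ringer",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed strings as diagnostic output rather than a stable format.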
It’s also worth knowing that SQL is not a perfect “pure relational” language. For example, SQL permits duplicate rows unless you explicitly eliminate them (for example via DISTINCT), and it includes NULL to represent missing/unknown values. Even with these practical compromises, SQL remains the dominant language for relational systems and has evolved over decades with new capabilities.
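Both compromises are easy to observe. The following sketch, again using Python's sqlite3 module with a made-up sales table, shows a duplicate row surviving a plain SELECT, DISTINCT removing it, and a comparison against NULL matching nothing because the result of the comparison is unknown rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative table with no key, so duplicate rows are allowed.
conn.execute("CREATE TABLE sales (stor_id TEXT, title_id TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("S1", "T1", 5), ("S1", "T1", 5), ("S2", "T2", None)],  # None is stored as NULL
)

# A plain SELECT returns the duplicate row twice.
all_rows = conn.execute(
    "SELECT stor_id, title_id, qty FROM sales ORDER BY stor_id"
).fetchall()

# DISTINCT collapses identical rows into one.
distinct_rows = conn.execute(
    "SELECT DISTINCT stor_id, title_id, qty FROM sales ORDER BY stor_id"
).fetchall()

# Comparing with = NULL evaluates to unknown, so no row qualifies.
eq_null = conn.execute("SELECT COUNT(*) FROM sales WHERE qty = NULL").fetchone()[0]

# IS NULL is the predicate that actually tests for a missing value.
is_null = conn.execute("SELECT COUNT(*) FROM sales WHERE qty IS NULL").fetchone()[0]

print(len(all_rows), len(distinct_rows), eq_null, is_null)
```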
Information is one of the most valuable assets in modern organizations. Businesses store and analyze data about customers, orders, inventory, finances, operations, and more. A database system exists to store that data safely, allow many users to query it concurrently, and keep it consistent as it changes over time.
SQL is useful because it gives you:
General-purpose languages (Java, C#, Python, etc.) are usually imperative: you write step-by-step instructions that specify control flow (loops, branches, algorithms). SQL is typically declarative: you specify the data set you want, and the database figures out the retrieval strategy.
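To make the contrast concrete, the sketch below answers the same question twice: once with an explicit Python loop and once with a SQL query run through sqlite3. The titles data, the category names, and the price threshold are invented for illustration.

```python
import sqlite3

# Made-up sample rows: (title_id, type, price).
titles = [
    ("T1", "business", 19.99),
    ("T2", "psychology", 7.00),
    ("T3", "business", 11.95),
    ("T4", "mod_cook", 2.99),
]

# Imperative: spell out how to walk the data and what to test at each step.
business_imperative = []
for title_id, category, price in titles:
    if category == "business" and price > 15:
        business_imperative.append(title_id)

# Declarative: describe the rows wanted and let the engine find them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_id TEXT, type TEXT, price REAL)")
conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", titles)
business_declarative = [
    row[0]
    for row in conn.execute(
        "SELECT title_id FROM titles "
        "WHERE type = 'business' AND price > 15 "
        "ORDER BY title_id"
    )
]

print(business_imperative, business_declarative)
```

Both approaches return the same identifiers here. The loop fixes the access strategy in application code, while the query leaves the engine free to change strategy as the data or the available indexes change.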
That difference has a huge practical impact:
Related topic: most platforms also provide procedural extensions for database-side logic (for example PL/SQL in Oracle, T-SQL in SQL Server, PL/pgSQL in PostgreSQL). Those extensions add loops and other control-flow constructs, but they are still best used when the work is data-centric and close to the database.
Data volume and variety have grown massively. Along with traditional structured data (tables of numbers and text), modern systems manage semi-structured and unstructured content such as documents, logs, messages, images, and event streams.
Relational databases evolved to meet this reality. Many engines now support JSON storage and querying, full-text search, spatial data, and integration with external runtimes and services. Even when data begins “unstructured,” organizations often need to model, validate, and query it reliably—which is exactly where relational design and SQL remain valuable.
A core design goal stays the same: reduce redundancy and improve correctness. Normalization and keys are not just academic topics; they are practical tools for preventing update anomalies and ensuring your data stays trustworthy over time.
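A small sketch shows how keys and normalization pay off in practice. It assumes a two-table design in the spirit of PUBS (publishers and titles) with invented rows. The publisher name is stored once, so a rename is a single-row update, and a foreign key rejects a title that points at a publisher that does not exist. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other engines enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

# Illustrative schema and rows; names are assumptions for this sketch.
conn.executescript("""
    CREATE TABLE publishers (
        pub_id   TEXT PRIMARY KEY,
        pub_name TEXT NOT NULL
    );
    CREATE TABLE titles (
        title_id TEXT PRIMARY KEY,
        title    TEXT NOT NULL,
        pub_id   TEXT NOT NULL REFERENCES publishers (pub_id)
    );
    INSERT INTO publishers VALUES ('P1', 'New Moon Books');
    INSERT INTO titles VALUES ('T1', 'First Title', 'P1');
    INSERT INTO titles VALUES ('T2', 'Second Title', 'P1');
""")

# The name lives in one row, so one UPDATE renames it for every title.
conn.execute("UPDATE publishers SET pub_name = 'New Moon Press' WHERE pub_id = 'P1'")
names = conn.execute("""
    SELECT DISTINCT p.pub_name
    FROM titles AS t
    JOIN publishers AS p ON p.pub_id = t.pub_id
""").fetchall()

# The foreign key blocks a title that references an unknown publisher.
try:
    conn.execute("INSERT INTO titles VALUES ('T3', 'Orphan Title', 'P9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(names, orphan_rejected)
```

If the publisher name were instead repeated on every title row, the rename would have to touch each copy, and missing one would leave the data contradicting itself. That is the update anomaly that normalization is meant to prevent.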