How are Table Joins Formed in SQL: Understanding the Basics
Table joins can be a bit challenging for beginners, but with the right guidance and knowledge, you will be joining tables like an experienced SQL Developer in no time. In this article, we will cover everything you need to know about how table joins are formed in SQL, from the basics to the most complex scenarios.
Table Joins Theory
When working with databases, it is common to have data distributed across multiple tables. For example, in an e-commerce platform, customer information is stored in one table, order information in another table, and product information in yet another table. To get meaningful insights and answers from this data, we need to combine or join these tables.
SQL (Structured Query Language) provides us with various types of table joins, such as
inner joins,
left joins,
right joins, and
full outer joins.
Understanding how to use each of these joins is crucial when working with databases.
In this article, we will focus on the fundamentals of how table joins are formed in SQL.
We will explain the syntax, types of joins, and provide examples to help you understand how it all works.
How are Table Joins Formed in SQL?
At its core, a table join in SQL is a way of combining two or more tables based on a common column or set of columns. The result of the join is a new table that contains all the columns from both tables, where rows from both tables with matching values in the specified columns are combined into a single row.
The basic syntax for joining two tables in SQL is as follows:
SELECT column_name(s)
FROM table1
JOIN table2
ON table1.column_name = table2.column_name;
Let us break down the above syntax:
The SELECT statement specifies the columns to be returned in the result set.
The FROM statement specifies the first table to be joined.
The JOIN statement specifies the second table to be joined.
The ON statement specifies the condition for the join. It specifies the columns in both tables to be used to match rows.
The JOIN statement can be replaced with other types of joins such as LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN. Each of these joins returns a different result set, which we'll explain in the following sections.
When designing a database you must ensure that every table represents a single entity. The other side of that design goal is that, while you will occasionally need to look up data about a single entity (such as a CD or Distributor), you will often need to combine records from more than one table to derive the information you need. Designers create tables with data about more than one entity in order to include frequently needed information in the same table. Joins are operations that link table records based on data in common fields and enable you to meet your design goal of representing a single entity in each table and still produce meaningful information based on data from more than one table.
As an example, consider the ER diagram for the course project,
which has separate entities representing Customers, CDs, Distributors, Orders, and order Line Items.
Entity Diagram
Every entity in the project ER diagram has a field in common with at least one other entity. For example, the CD and Distributor entities have the DistID field in common. You can use these common fields to create joins between tables based on those entities.
Remember that in the Stories on CD database, every CD has exactly one Distributor. Consider the sample records from the truncated CD and Distributor tables in the following illustration.
You can create a join between records in these two tables whenever the records have the same value. The new table created by the join is called a virtual table because it exists only in the computer's memory and is not part of the database's permanent structure. This would look like the one illustrated below.
Primary Key Field
It is very important that the field used to create the join be a primary-key field in one of the joined tables.
Because the DistID field is the primary-key field of the Distributor table, every DistID value will point to a single distributor.
If a value in the DistID field occurred in records for more than one distributor, the results would contain more than one distributor for each CD, which is invalid. The next lesson describes the different types of joins.