Many database designs separate information into different tables based on the relationships between certain data points. Even in cases like this, it’s likely that there will be times when someone will want to retrieve information from more than one table at a time.
A common way of accessing data from multiple tables in a single Structured Query Language (SQL) operation is to combine the tables with a JOIN
clause. Based on join operations in relational algebra, a JOIN
clause combines separate tables by matching up rows in each table that relate to one another. Usually, this relationship is based on a pair of columns — one from each table — that share common values, such as one table’s foreign key and the primary key of another table that the foreign key references.
This guide outlines how to construct a variety of SQL queries that include a JOIN
clause. It also highlights different types of JOIN
clauses, how they combine data from multiple tables, and how to alias column names to make writing JOIN
operations less tedious.
In order to follow this guide, you will need a computer running some type of relational database management system (RDBMS) that uses SQL. The instructions and examples in this guide were validated using the following environment:
Note: Please note that many RDBMSs use their own unique implementations of SQL. Although the commands outlined in this tutorial will work on most RDBMSs, the exact syntax or output may differ if you test them on a system other than MySQL.
JOIN
operations. We encourage you to go through the following Connecting to MySQL and Setting up a Sample Database section for details on how to connect to a MySQL server and create the testing database used in examples throughout this guide.If your SQL database system runs on a remote server, SSH into your server from your local machine:
- ssh sammy@your_server_ip
Then open up the MySQL server prompt, replacing sammy
with the name of your MySQL user account:
- mysql -u sammy -p
Create a database named joinsDB
:
- CREATE DATABASE joinsDB;
If the database was created successfully, you’ll receive output like this:
OutputQuery OK, 1 row affected (0.01 sec)
To select the joinsDB
database, run the following USE
statement:
- USE joinsDB;
OutputDatabase changed
After selecting joinsDB
, create a few tables within it. For the examples used in this guide, imagine that you run a factory and have decided to begin tracking information about your product line, employees on your sales team, and your company’s sales in an SQL database. You plan to start off with three tables, the first of which will store information about your products. You decide this first table needs three columns:
productID
: each product’s identification number, expressed with the int
data type. This column will serve as the table’s primary key, meaning that each value will function as a unique identifier for its respective row. Because every value in a primary key must be unique, this column will also have a UNIQUE
constraint applied to itproductName
: each product’s name, expressed using the varchar
data type with a maximum of 20 charactersprice
: the price of each product, expressed using the decimal
data type. This statement specifies that any values in this column are limited to a maximum of four digits in length with two of those digits to the right of the decimal point. Thus, the range of values allowed in this column goes from -99.99
to 99.99
Create a table named products
that has these three columns:
- CREATE TABLE products (
- productID int UNIQUE,
- productName varchar(20),
- price decimal (4,2),
- PRIMARY KEY (productID)
- );
The second table will store information about the employees on your company’s sales team. You decide this table also needs three columns:
empID
: similar to the productID
column, this column will hold a unique identification number for each employee on the sales team expressed with the int
data type. Likewise, this column will have a UNIQUE
constraint applied to it and will serve as the primary key for the team
tableempName
: the name of each salesperson, expressed using the varchar
data type with a maximum of 20 charactersproductSpecialty
: each member of your sales team has been assigned a product as their specialty; they can sell any product your company makes, but their overall focus will be on whatever product they specialize in. To indicate this in the table, you create this column which holds the productID
value of whatever product each employee specializes inTo ensure that the productSpecialty
column only holds values that represent valid product ID numbers, you decide to apply a foreign key constraint to the column that references the products
table’s productID
column. A foreign key constraint is a way to express a relationship between two tables by requiring that values in the column on which it applies must already exist in the column that it references. In the following CREATE TABLE
statement, the FOREIGN KEY
constraint requires that any value added to the productSpecialty
column in the team
table must already exist in the products
table’s productID
column.
Create a table named team
with these three columns:
- CREATE TABLE team (
- empID int UNIQUE,
- empName varchar(20),
- productSpecialty int,
- PRIMARY KEY (empID),
- FOREIGN KEY (productSpecialty) REFERENCES products (productID)
- );
The last table you create will hold records of the company’s sales. This table will have four columns:
saleID
: similar to the productID
and empID
columns, this column will hold a unique identification number for each sale expressed with the int
data type. This column will also have a UNIQUE
constraint so it can serve as the primary key for the sales
tablequantity
: the number of units of each product sold, expressed with the int
data typeproductID
: the identification number of the product sold, expressed as an int
salesperson
: the identification number of the employee who made the saleLike the productSpecialty
column from the team
table, you decide to apply FOREIGN KEY
constraints to both the productID
and salesperson
columns. This will ensure that these columns only contain values that already exist in the products
table’s productID
column and the team
table’s empID
columns, respectively.
Create a table named sales
with these four columns:
- CREATE TABLE sales (
- saleID int UNIQUE,
- quantity int,
- productID int,
- salesperson int,
- PRIMARY KEY (saleID),
- FOREIGN KEY (productID) REFERENCES products (productID),
- FOREIGN KEY (salesperson) REFERENCES team (empID)
- );
Following that, load the products
table with some sample data by running the following INSERT INTO
operation:
- INSERT INTO products
- VALUES
- (1, 'widget', 18.99),
- (2, 'gizmo', 14.49),
- (3, 'thingamajig', 39.99),
- (4, 'doodad', 11.50),
- (5, 'whatzit', 29.99);
Then load the team
table with some sample data:
- INSERT INTO team
- VALUES
- (1, 'Florence', 1),
- (2, 'Mary', 4),
- (3, 'Diana', 3),
- (4, 'Betty', 2);
Load the sales
table with some sample data as well:
- INSERT INTO sales
- VALUES
- (1, 7, 1, 1),
- (2, 10, 5, 4),
- (3, 8, 2, 4),
- (4, 1, 3, 3),
- (5, 5, 1, 3);
Lastly, imagine that your company makes a few sales without the involvement of anyone on your sales team. To record these sales, run the following operation to add three rows to the sales
table that don’t include a value for the salesperson
column:
- INSERT INTO sales (saleID, quantity, productID)
- VALUES
- (6, 1, 5),
- (7, 3, 1),
- (8, 4, 5);
With that, you’re ready to follow the rest of the guide and begin learning about how to join tables together in SQL.
JOIN
OperationsJOIN
clauses can be used in a variety of SQL statements, including UPDATE
and DELETE
operations. For illustration purposes, though, the examples in this guide use SELECT
queries to demonstrate how JOIN
clauses work.
The following example shows the general syntax of a SELECT
statement that includes a JOIN
clause:
- SELECT table1.column1, table2.column2
- FROM table1 JOIN table2
- ON search_condition;
This syntax begins with a SELECT
statement that will return two columns from two separate tables. Note that because JOIN
clauses compare the contents of more than one table, this example syntax specifies which table to select each column from by preceding the name of the column with the name of the table and a period. This is known as a fully qualified column reference.
You can use fully qualified column references like these in any operation, but doing so is technically only necessary in operations where two columns from different tables share the same name. It’s good practice to use them when working with multiple tables, though, as they can help make JOIN
operations easier to read and understand.
After the SELECT
clause comes the FROM
clause. In any query, the FROM
clause is where you define the data set that should be searched in order to return the desired data. The only difference here is that the FROM
clause includes two tables separated by the JOIN
keyword. A helpful way to think of writing queries is to remember that you SELECT
which columns to return FROM
which table you’d like to query.
Following that is an ON
clause, which describes how the query should join the two tables together by defining a search condition. A search condition is a set of one or more predicates, or expressions that can evaluate whether a certain condition is “true,” “false,” or “unknown.” It can be helpful to think of a JOIN
operation as combining every row from both tables, and then returning any rows for which the search condition in the ON
clause evaluates to “true”.
In an ON
clause, it usually makes sense to include a search condition that tests whether two related columns — like one table’s foreign key and the primary key of another table that the foreign key references — have values that are equal. This is sometimes referred to as an equi join.
As an example of how equi joins match data from multiple tables, run the following query using the sample data you added previously. This statement will join the products
and team
tables with a search condition that tests for matching values in their respective productID
and productSpecialty
columns. It will then return the names of every member of the sales team, the name of each product they specialize in, and the price of those products:
- SELECT team.empName, products.productName, products.price
- FROM products JOIN team
- ON products.productID = team.productSpecialty;
Here is this query’s result set:
Output+----------+-------------+-------+
| empName | productName | price |
+----------+-------------+-------+
| Florence | widget | 18.99 |
| Mary | doodad | 11.50 |
| Diana | thingamajig | 39.99 |
| Betty | gizmo | 14.49 |
+----------+-------------+-------+
4 rows in set (0.00 sec)
To illustrate how SQL combines these tables to form this result set, let’s take a closer look at this process. To be clear, the following isn’t exactly what happens when a database management system joins two tables together, but it can be helpful to think of JOIN
operations as following a procedure like this.
First, the query prints every row and column in the first table in the FROM
clause, products
:
JOIN Process Example+-----------+-------------+-------+
| productID | productName | price |
+-----------+-------------+-------+
| 1 | widget | 18.99 |
| 2 | gizmo | 14.49 |
| 3 | thingamajig | 39.99 |
| 4 | doodad | 11.50 |
| 5 | whatzit | 29.99 |
+-----------+-------------+-------+
Then, it looks at each of these rows and matches any row from the team
table whose productSpecialty
is equal to the productID
value in that row:
JOIN Process Example+-----------+-------------+-------+-------+----------+------------------+
| productID | productName | price | empID | empName | productSpecialty |
+-----------+-------------+-------+-------+----------+------------------+
| 1 | widget | 18.99 | 1 | Florence | 1 |
| 2 | gizmo | 14.49 | 4 | Betty | 2 |
| 3 | thingamajig | 39.99 | 3 | Diana | 3 |
| 4 | doodad | 11.50 | 2 | Mary | 4 |
| 5 | whatzit | 29.99 | | | |
+-----------+-------------+-------+-------+----------+------------------+
Then, it cuts any rows that don’t have a match and rearranges the columns based on their order in the SELECT
clause, drops any columns that weren’t specified, resorts the rows, and returns the final result set:
JOIN Process Example+----------+-------------+-------+
| empName | productName | price |
+----------+-------------+-------+
| Florence | widget | 18.99 |
| Mary | doodad | 11.50 |
| Diana | thingamajig | 39.99 |
| Betty | gizmo | 14.49 |
+----------+-------------+-------+
4 rows in set (0.00 sec)
Using equi joins is the most common way to join tables, but it’s possible to use other SQL operators such as <
, >
, LIKE
, NOT LIKE
, or even BETWEEN
in ON
clause search conditions. Be aware, though, that using more complicated search conditions can make it difficult to predict what data will appear in the result set.
In most implementations, you can join tables with any set of columns that have what the SQL standard refers to as “JOIN
eligible” data type. This means that, in general, it’s possible to join a column that holds numeric data with any other column that holds numeric data, regardless of their respective data types. Likewise, it’s usually possible to join any column that holds character values with any other column holding character data. As stated previously, though, the columns you match to join two tables will typically be ones that already signify a relationship between the tables, like a foreign key and the primary key of another table that it references.
Many SQL implementations also allow you to join columns that have the same name with the USING
keyword instead of ON
. This is how the syntax for such an operation might look:
- SELECT table1.column1, table2.column2
- FROM table1 JOIN table2
- USING (related_column);
In this example syntax, the USING
clause is equivalent to ON table1.related_column = table2.related_column;
.
Because sales
and products
each have a column named productID
, you can join them by matching these columns with the USING
keyword. The following command does this, and returns the saleID
of each sale, the quantity of units sold, the name of each product sold, and its price. Also, it sorts the result set in ascending order based on the saleID
value:
- SELECT sales.saleID, sales.quantity, products.productName, products.price
- FROM sales JOIN products
- USING (productID)
- ORDER BY saleID;
Output+--------+----------+-------------+-------+
| saleID | quantity | productName | price |
+--------+----------+-------------+-------+
| 1 | 7 | widget | 18.99 |
| 2 | 10 | whatzit | 29.99 |
| 3 | 8 | gizmo | 14.49 |
| 4 | 1 | thingamajig | 39.99 |
| 5 | 5 | widget | 18.99 |
| 6 | 1 | whatzit | 29.99 |
| 7 | 3 | widget | 18.99 |
| 8 | 4 | whatzit | 29.99 |
+--------+----------+-------------+-------+
8 rows in set (0.00 sec)
When joining tables, the database system will sometimes rearrange rows in ways that aren’t easy to predict. Including an ORDER BY
clause like this can help make result sets more coherent and readable.
There may be times when you need to combine data from more than just two tables. You can join any number of tables together by embedding JOIN
clauses within other JOIN
clauses. The following syntax is an example of how this can look when joining three tables:
- SELECT table1.column1, table2.column2, table3.column3
- FROM table1 JOIN table2
- ON table1.related_column = table2.related_column
- JOIN table3
- ON table3.related_column = table1_or_2.related_column;
This example syntax’s FROM
clause starts by joining table1
with table2
. After this joining’s ON
clause, it starts a second JOIN
that combines the initial set of joined tables with table3
. Note that the third table can be joined to a column in either the first or second table.
To illustrate, imagine that you want to know how much revenue your employee’s sales have brought in, but you only care about sales records that involve an employee selling the product they specialize in.
To get this information, you could run the following query. This query starts by joining the products
and sales
tables together by matching their respective productID
columns. It then joins the team
table to the first two by matching each row in the initial JOIN
to its productSpecialty
column. The query then filters the results with a WHERE
clause to only return rows where the matched employee was also the person who made the sale. This query also includes an ORDER BY
clause that sorts the final results in ascending order based on the value in the saleID
column:
- SELECT sales.saleID,
- team.empName,
- products.productName,
- (sales.quantity * products.price)
- FROM products JOIN sales
- USING (productID)
- JOIN team
- ON team.productSpecialty = sales.productID
- WHERE team.empID = sales.salesperson
- ORDER BY sales.saleID;
Note that among the columns listed in this query’s SELECT
clause is an expression that multiplies the values in the sales
table’s quantity
column by the products
table’s price
values. It returns the products of these values in the matched rows:
Output+--------+----------+-------------+-----------------------------------+
| saleID | empName | productName | (sales.quantity * products.price) |
+--------+----------+-------------+-----------------------------------+
| 1 | Florence | widget | 132.93 |
| 3 | Betty | gizmo | 115.92 |
| 4 | Diana | thingamajig | 39.99 |
+--------+----------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
All the examples so far have featured the same type of JOIN
clause: the INNER JOIN
. For an overview of INNER
joins, OUTER
joins, and how they differ, continue reading the next section.
JOIN
OperationsThere are two main types of JOIN
clauses: INNER
joins and OUTER
joins. The difference between these two types of joins has to do with what data they return. INNER
join operations return only matching rows from each joined table, while OUTER
joins return both matching and non-matching rows.
The example syntaxes and queries from the previous sections all used INNER JOIN
clauses even though none of them include the INNER
keyword. Most SQL implementations treat any JOIN
clause as an INNER
join unless explicitly stated otherwise.
Queries that specify an OUTER JOIN
combine multiple tables and return any rows that match as well as rows that do not match. This can be useful for finding rows with missing values, or in cases where partial matches are acceptable.
OUTER
join operations can be further divided into three types: LEFT OUTER
joins, RIGHT OUTER
joins, and FULL OUTER
joins. LEFT OUTER
joins, or just LEFT
joins, return every matching row from the two joined tables, as well as every non-matching row from the “left” table. In the context of JOIN
operations, the “left” table is always the first table specified immediately after the FROM
keyword and to the left of the JOIN
keyword. Likewise, the “right” table is the second table, or the one immediately following JOIN
, and RIGHT OUTER
joins return every matching row from the joined tables along with every non-matching row from the “right” table. A FULL OUTER JOIN
returns every row from both tables, including any rows from either table that don’t have matches.
To illustrate how these different types of JOIN
clauses return data, run the following example queries on the tables created in the previous Connecting to and Setting up a Sample Database subsection. These queries are identical except that each specifies a different type of JOIN
clause.
This first example uses an INNER JOIN
to combine the sales
and team
tables together by matching their respective salesperson
and empID
columns. Again, the INNER
keyword is implied even though it’s not explicitly included:
- SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
- FROM sales JOIN team
- ON sales.salesperson = team.empID;
Because this query uses an INNER JOIN
clause, it only returns matching rows from both tables:
Output+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName |
+--------+----------+-------------+----------+
| 1 | 7 | 1 | Florence |
| 4 | 1 | 3 | Diana |
| 5 | 5 | 3 | Diana |
| 2 | 10 | 4 | Betty |
| 3 | 8 | 4 | Betty |
+--------+----------+-------------+----------+
5 rows in set (0.00 sec)
This version of the query uses a LEFT OUTER JOIN
clause instead:
- SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
- FROM sales LEFT OUTER JOIN team
- ON sales.salesperson = team.empID;
Like the previous query, this one also returns every matching value from both tables. However, it also returns any values from the “left” table (in this case, sales
) that don’t have matches in the “right” table (team
). Because these rows in the left table don’t have matches in the right, the unmatched values are returned as NULL
:
Output+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName |
+--------+----------+-------------+----------+
| 1 | 7 | 1 | Florence |
| 2 | 10 | 4 | Betty |
| 3 | 8 | 4 | Betty |
| 4 | 1 | 3 | Diana |
| 5 | 5 | 3 | Diana |
| 6 | 1 | NULL | NULL |
| 7 | 3 | NULL | NULL |
| 8 | 4 | NULL | NULL |
+--------+----------+-------------+----------+
8 rows in set (0.00 sec)
This next version of the query instead uses a RIGHT JOIN
clause:
- SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
- FROM sales RIGHT JOIN team
- ON sales.salesperson = team.empID;
Notice that this query’s JOIN
clause reads RIGHT JOIN
instead of RIGHT OUTER JOIN
. Similarly to how the INNER
keyword isn’t required to specify an INNER JOIN
clause, OUTER
is implied any time you write LEFT JOIN
or RIGHT JOIN
.
This query’s result is the opposite of the previous one in that it returns every row from both tables, but only the unmatched rows from the “right” table:
Output+--------+----------+-------------+----------+
| saleID | quantity | salesperson | empName |
+--------+----------+-------------+----------+
| 1 | 7 | 1 | Florence |
| NULL | NULL | NULL | Mary |
| 4 | 1 | 3 | Diana |
| 5 | 5 | 3 | Diana |
| 2 | 10 | 4 | Betty |
| 3 | 8 | 4 | Betty |
+--------+----------+-------------+----------+
6 rows in set (0.00 sec)
Note: Be aware that MySQL doesn’t support FULL OUTER JOIN
clauses. To illustrate what data this query would return if it used a FULL OUTER JOIN
clause, here’s what the result set would look like on a PostgreSQL database:
- SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
- FROM sales FULL OUTER JOIN team
- ON sales.salesperson = team.empID;
Output saleid | quantity | salesperson | empname
--------+----------+-------------+----------
1 | 7 | 1 | Florence
2 | 10 | 4 | Betty
3 | 8 | 4 | Betty
4 | 1 | 3 | Diana
5 | 5 | 3 | Diana
6 | 1 | |
7 | 3 | |
8 | 4 | |
| | | Mary
(9 rows)
As this output indicates, the FULL JOIN
returns every row in both tables including the unmatched ones.
JOIN
ClausesWhen joining tables with long or highly descriptive names, having to write multiple fully qualified column references can become tedious. To avoid this, users sometimes find it helpful to provide table or column names with shorter aliases.
You can do this in SQL by following any table definition in the FROM
clause with the AS
keyword, and then following that with an alias of your choice:
- SELECT t1.column1, t2.column2
- FROM table1 AS t1 JOIN table2 AS t2
- ON t1.related_column = t2.related_column;
This example syntax uses aliases in the SELECT
clause even though they aren’t defined until the FROM
clause. This is possible because, in SQL queries, the order of execution begins with the FROM
clause. This can be confusing, but it’s helpful to remember this and think of your aliases before you begin writing the query.
As an example, run the following query which joins the sales
and products
tables and provides them with the aliases S
and P
, respectively:
- SELECT S.saleID, S.quantity,
- P.productName,
- (P.price * S.quantity) AS revenue
- FROM sales AS S JOIN products AS P
- USING (productID);
Notice that this example also creates a third alias, revenue
, for the product of the values in the sales
table’s quantity
column and their matching values from the products
table’s price
column. This is only apparent in the column name in the result set, but providing aliases like this can be helpful for conveying the meaning or purpose behind query results:
Output+--------+----------+-------------+---------+
| saleID | quantity | productName | revenue |
+--------+----------+-------------+---------+
| 1 | 7 | widget | 132.93 |
| 2 | 10 | whatzit | 299.90 |
| 3 | 8 | gizmo | 115.92 |
| 4 | 1 | thingamajig | 39.99 |
| 5 | 5 | widget | 94.95 |
| 6 | 1 | whatzit | 29.99 |
| 7 | 3 | widget | 56.97 |
| 8 | 4 | whatzit | 119.96 |
+--------+----------+-------------+---------+
8 rows in set (0.00 sec)
Note that when defining an alias the AS
keyword is technically optional. The previous example could also be written like this:
- SELECT S.saleID, S.quantity, P.productName, (P.price * S.quantity) revenue
- FROM sales S JOIN products P
- USING (productID);
Even though the AS
keyword isn’t needed to define an alias, it’s considered a good practice to include it. Doing so can help keep the query’s purpose clear and improve its readability.
By reading this guide, you learned how to use JOIN
operations to combine separate tables into a single query result set. While the commands shown here should work on most relational databases, be aware that every SQL database uses its own unique implementation of the language. You should consult your DBMS’s official documentation for a more complete description of each command and their full sets of options.
If you’d like to learn more about working with SQL, we encourage you to check out the other tutorials in this series on How To Use SQL.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
Structured Query Language — commonly known as SQL — is a language used to define, control, manipulate, and query data held in a relational database. SQL has been widely adopted since it was first developed in the 1970s, and today it’s the predominant language used to manage relational database management systems.
Ideal for managing structured data (data that can fit neatly into an existing data model), SQL is an essential tool for developers and system administrators in a wide variety of contexts. Also, because of its maturity and prevalence, candidates with SQL experience are highly sought after for jobs across a number of industries.
This series is intended to help you get started with using SQL. It includes a mix of conceptual articles and tutorials which provide introductions to various SQL concepts and practices. You can also use the entries in this series for reference while you continue to hone your skills with SQL.
Note: Please be aware that the tutorials in this series use MySQL in examples, but many RDBMSs use their own unique implementations of SQL. Although the commands outlined in this tutorial will work on most RDBMSs, the exact syntax or output may differ if you test them on a system other than MySQL.
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
Thank, your article safe my days. I learn and sharping more about sql at online database design from dynobird
Thanks for your sharing