Overview
I want to cover a fundamental topic for SQL programming. This is the topic you should start studying right after you have a basic understanding of what a relational database, what tables are, and that tables can be related to each other. If you’re a fan of Duct tape, then Joins will become the duct tape of your SQL tool belt.
To download a script that will give you a copy of all the tables used as examples in this post, click here.
When two tables are defined that have a column in common, then you can write a query that will tell you the information in the one table that relates to information in the other table. There are three basic types of joins:
- INNER – show only the records where there is a match in both tables.
- OUTER – show the records from one table, even if there isn’t a match in the other table.
- CROSS – Show all the records of the first table, matching each record of the first table to each record of the second table.
Take a minute to process those options. You’ll find that if two pieces of data are truly related, any relationship you can think of will fall into one of these three categories. In future lessons I’ll go into much greater detail about when you would want to use each of these three joins. Also, in later posts, I’ll introduce you to performance implications of each of these joins. But for now, I’m going to give you examples of each of these types. You will start to learn how to use them, and you may even begin to think about when you would want to use each of these in your own programming.
INNER JOIN
Whenever you’re looking to get data from one table, where there’s matching data in the other, you want to use an INNER JOIN. Let’s revisit a problem from the last post. In our previous post we wanted to see all the data in the productSale table, where there was a match in the product table. Here is the code we used
SELECT * FROM productSale WHERE productName IN ( SELECT productName FROM products WHERE price <= '1.00') buyer productName purchaseDate qtypurchased pricePaid Shannon Lowder paper 2000-01-01 00:00:00.000 2 1.00 Shannon Lowder pencil 2000-01-05 00:00:00.000 1 0.25 Shannon Lowder pencil 2000-01-07 00:00:00.000 1 0.25
This same query can now be re-written as an INNER JOIN and show us the same records.
SELECT productSale.* FROM productSale INNER JOIN products on productSale.productName = products.productName WHERE products.price <= 1.00 buyer productName purchaseDate qtypurchased pricePaid Shannon Lowder paper 2000-01-01 00:00:00.000 2 1.00 Shannon Lowder pencil 2000-01-05 00:00:00.000 1 0.25 Shannon Lowder pencil 2000-01-07 00:00:00.000 1 0.25
In small record sets there won’t be a difference in the performance of these two queries. But when you start dealing with hundreds of thousands of rows. You’ll find there is a difference in performance.
Another good point to take from this example is there is almost always more than one way to write a query, and get the same results. When discussing queries with other developers, keep that in mind. New techniques will come to you all the time, you have to learn to take the strengths of these new techniques and compare them with the way you’re doing things now. You never know when a difference of method will teach you a better method for doing something.
OUTER JOIN
While INNER JOIN will show you records where there is a match, OUTER JOIN will show you data, even if there isn’t a match. There are three variations of the OUTER JOIN in Microsoft SQL. LEFT, RIGHT, and FULL OUTER joins will combine one table’s data with another, and give you three slightly different versions of the data.
LEFT JOIN
A LEFT JOIN will show you all the data from the first (or LEFT) table, even if that record doesn’t have a match in the second (or RIGHT) table.
SELECT * FROM leftTable LEFT JOIN rightTable on leftTable.value = rightTable.value id value id value 1 a NULL NULL 2 b 10 b 3 C 20 c
Notice that there is a record for “a” in leftTable, but there isn’t a record for “a” in rightTable.
RIGHT JOIN
A RIGHT JOIN will show you all the data from the second (or RIGHT) table, even if that record doesn’t have a match in the first (or LEFT) table.
SELECT * FROM leftTable RIGHT JOIN rightTable on leftTable.value = rightTable.value id value id value 2 b 10 b 3 C 20 c NULL NULL 30 d
Notice there is a record for “d” in rightTable, but there isn’t a record for “d” in leftTable.
FULL OUTER JOIN
A FULL OUTER JOIN will show you all the records of the first (or LEFT) table and all the records of the second (or RIGHT) table.
SELECT * FROM leftTable FULL OUTER JOIN rightTable on leftTable.value = rightTable.value id value id value 1 a NULL NULL 2 b 10 b 3 C 20 c NULL NULL 30 d
With the FULL OUTER JOIN you see all the data from both tables, even where that value is missing from the other.
OUTER JOIN can be a very useful tool when you’re trying to locate missing data. This will be a reoccurring task you’ll face as a database administrator. Practice with these joins until you know the difference by memory. It will serve you well!
CROSS JOIN
The CROSS JOIN is the least used query. At least in my opinion, it is. In a CROSS JOIN, ach record of the first table is joined to each record of the second table. This join is also called a cross-product. The only time I’ve found it useful to use a CROSS JOIN is to create a lot of data quickly for test scenarios.
The reason is simple, if you have 10 test records, and you want 1000, simply CROSS JOIN the 10 records to a table with 100, and INSERT the results into a new table. Let’s look at a CROSS JOIN now.
SELECT * FROM leftTable CROSS JOIN rightTable id value id value 1 a 10 b 2 b 10 b 3 C 10 b 1 a 20 c 2 b 20 c 3 C 20 c 1 a 30 d 2 b 30 d 3 C 30 d
I’d also like to point out a query style that can cause you problems. If you see a query like the following, try to rewrite it as an OUTER or INNER JOIN.
SELECT * FROM leftTable, rightTable
Occasionally you’ll see this style join (referred to Microsoft style joins) in code, and the person who wrote it, thinks it’s an INNER JOIN. The problem is, just this part of the query is actually a CROSS JOIN. It will save you a lot of effort and grief to use explicitly defined joins, rather than not knowing for certain what JOIN you’ve written.
Otherwise you might get duplicates, and not understand why.
Conclusion
Adding a JOIN to your query will dramatically increase the types of data you can pull from your database. It’s the foundation of relationships in relational database management systems. Learn when to use each type of JOIN. More importantly, learn when you don’t want to use a certain JOIN.
Master using it, and you can have this.
As always, if you have any questions, send them in! I’m here to help!
Previous: Sub Queries | Next: INSERT |