Earlier today I was discussing methods you can use to differences in data sets between two tables or views. My colleague was discussing the usual method
SELECT columName(s) FROM table1 t1 (READCOMMITTED) LEFT JOIN table2 (READCOMMITTED) ON t1.sharedKey = t2.sharedKey WHERE t2.sharedKey IS NULL
While this is a perfectly cromulent way of finding out what records are missing from table2, I mentioned the EXCEPT operator and got a questioning look, so I took the time to explain UNION, INTERSECT, and EXCEPT to him. Now, I’d like to share this with you.
UNION
The UNION operator allows you to combine the output of two SELECT statements into one result set. There are a few catches that you need to know about. First, the number of columns must be the same between the result sets. Second, the data types for the columns returned must me the same, or able to be converted to the same. Finally you should also know that UNION will return only a distinct record set, if you want to keep your duplicate rows, you must use UNION ALL instead.
SELECT colmn1, column2 FROM testing.dbo.table1 t1 (READCOMMITTED) UNION --ALL --all is optional SELECT colmn1, column2 FROM testing.dbo.table1 t1 (READCOMMITTED)
INTERSECT
Just like the UNION operator, INTERSECT allows you to join two data sets into one. The difference is, The INTERSECT operator will only return records from the first table or view where the records exist in the second table or view. This can be used like an INNER JOIN, except each column of the first table is compared to the columns of the second table. This can allow you to perform quicker comparisons between the two data sets.
I will issue a word of caution, don’t use SELECT * with the INTERSECT or EXCEPT function, if the columns are in different orders, or have slightly different data types on the columns, you may get results different than you’d expect. Since the difference will be seen by SQL, it may ignore records based on a precision change, and you wouldn’t want that, right?
SELECT colmn1, column2 FROM testing.dbo.table1 t1 (READCOMMITTED) INTERSECT SELECT colmn1, column2 FROM testing.dbo.table1 t1 (READCOMMITTED)
EXCEPT
Finally, there’s EXCEPT, this would be similar to a LEFT JOIN with a WHERE clause that looks for NULL values in the second table. This is the solution I introduced my colleague to. By running a SELECT on both tables, and including all the key columns, he could quickly look for records that were missing from the first table. Again, please keep in mind SELECT * is a bad idea when using EXCEPT. Explicitly calling column names will save you much grief in the long haul.
SELECT colmn1, column2 FROM testing.dbo.table1 t1 (READCOMMITTED) EXCEPT SELECT colmn1, column2 FROM testing.dbo.table1 t1 (READCOMMITTED)
As always, if you have any questions, send them in! I look forward to helping you add more tools to your SQL tool belt!