This week it’s going to be a short blog post, touching on the difference between using UNION and UNION ALL operators in SQL Server.
UNION and UNION ALL are set operators that we use to concatenate two data sets together.
To showcase their differences I created two tables, Table1 and Table2, naming was never my strong point. These tables hold random first and last names.
SELECT * FROM Table1
SELECT * FROM Table2
As you can see between the two tables, ‘Will Smith’, ‘Bobby Brown’, ‘Luffy Monkey’ and ‘Chris Rock’ exist in both tables.
Now we will see how the operators interact with those duplicated records.
SELECT * FROM Table1 UNION ALL SELECT * FROM Table2
All 12 records are returned when UNION ALL is used. Both full data sets are concatenated.
SELECT * FROM Table1 UNION SELECT * FROM Table2
When we use UNION, there are only 8 records. This is because no duplicate records are returned.
Do they matter in terms of Performance?
There is a performance difference between the operators.
Below you see the execution plans for UNION and UNION ALL respectively
The main difference between the two plans is the SORT operator in the UNION execution plan. It makes sense why this happens as SQL Server needs to remove duplicate records when the UNION operator is used.
This adds an additional overhead for SQL Server when executing UNION in comparison to UNION ALL.
There is also a difference when Collation comes into play
-- Query 1 UNION SELECT 'abc' UNION SELECT 'ABC' UNION SELECT 'AbC ' UNION SELECT 'abC' -- Query 2 UNION ALL SELECT 'abc' UNION ALL SELECT 'ABC' UNION ALL SELECT 'AbC ' UNION ALL SELECT 'abC'
Query 2 returns all the records as UNION ALL is used, but in query 1 only ‘abc’ is returned. This is due to the default collation of SQL Server, Latin1_General_CI_AS.
So what’s the difference between these two?
UNION ALL – Includes duplicate records
UNION – No duplicate records
UNION ALL is also better in terms of performance
Have a great week!