This week it’s going to be a short blog post, touching on the difference between using UNION and UNION ALL operators in SQL Server.
UNION and UNION ALL are set operators that we use to concatenate two data sets together.
To showcase their differences I created two tables, Table1 and Table2, naming was never my strong point. These tables hold random first and last names.
SELECT *
FROM Table1

SELECT *
FROM Table2

As you can see between the two tables, ‘Will Smith’, ‘Bobby Brown’, ‘Luffy Monkey’ and ‘Chris Rock’ exist in both tables.
Now we will see how the operators interact with those duplicated records.
UNION ALL:
SELECT *
FROM Table1
UNION ALL
SELECT *
FROM Table2

All 12 records are returned when UNION ALL is used. Both full data sets are concatenated.
UNION:
SELECT *
FROM Table1
UNION
SELECT *
FROM Table2

When we use UNION, there are only 8 records. This is because no duplicate records are returned.
Do they matter in terms of Performance?
There is a performance difference between the operators.
Below you see the execution plans for UNION and UNION ALL respectively

The main difference between the two plans is the SORT operator in the UNION execution plan. It makes sense why this happens as SQL Server needs to remove duplicate records when the UNION operator is used.
This adds an additional overhead for SQL Server when executing UNION in comparison to UNION ALL.
There is also a difference when Collation comes into play
-- Query 1 UNION
SELECT 'abc'
UNION
SELECT 'ABC'
UNION
SELECT 'AbC '
UNION
SELECT 'abC'
-- Query 2 UNION ALL
SELECT 'abc'
UNION ALL
SELECT 'ABC'
UNION ALL
SELECT 'AbC '
UNION ALL
SELECT 'abC'

Query 2 returns all the records as UNION ALL is used, but in query 1 only ‘abc’ is returned. This is due to the default collation of SQL Server, Latin1_General_CI_AS.
Summary
So what’s the difference between these two?
Simply put:
UNION ALL – Includes duplicate records
UNION – No duplicate records
UNION ALL is also better in terms of performance
Have a great week!