Example of a join with column filters
The presence of a column filter can change the query plan. A column filter is a WHERE expression that reduces the number of rows that a table contributes to a join.
The following example shows the query described in Example of query-plan execution with a filter added:
SELECT C.customer_num, O.order_num
FROM customer C, orders O, items I
WHERE C.customer_num = O.customer_num
AND O.order_num = I.order_num
AND O.paid_date IS NULL
The expression O.paid_date IS NULL filters out
some rows, reducing the number of rows that are used from the orders table.
Consider a plan that starts by reading from orders. Figure 1 displays this sample
plan in pseudocode.
Figure 1. Query plan that uses a column filter
for each row in the orders table do:
read the row into O
if O.paid_date is null then
for each row in the customer table do:
read the row into C
if O.customer_num = C.customer_num then
for each row in the items table do:
read the row into I
if I.order_num = O.order_num then
accept row and return to user
end if
end for
end if
end for
end if
end for
Let pdnull represent the number of rows in orders that
pass the filter. It is the value of COUNT(*) that results from
the following query:
SELECT COUNT(*) FROM orders WHERE paid_date IS NULL
If one customer exists for every order, the plan in Figure 1 reads the following
rows:
- All rows of the orders table once
- All rows of the customer table, pdnull times
- All rows of the items table, pdnull times
Figure 2 shows an alternative
execution plan that reads from the customer table first.
Figure 2. The alternative
query plan in pseudocode
for each row in the customer table do:
read the row into C
for each row in the orders table do:
read the row into O
if O.paid_date is null and
O.customer_num = C.customer_num then
for each row in the items table do:
read the row into I
if I.order_num = O.order_num then
accept row and return to user
end if
end for
end if
end for
Because the filter is not applied in the first step that Figure 2 shows, this plan reads
the following rows:
- All rows of the customer table once
- All rows of the orders table once for every row of customer
- All rows of the items table, pdnull times
The query plans in Figure 1 and Figure 2 produce the same output in a different sequence. They differ in that one reads a table pdnull times, and the other reads a table SELECT COUNT(*) FROM customer times. By choosing the appropriate plan, the optimizer can save thousands of disk accesses in a real application.