Sunday, 31 January 2016

SQL Subqueries

Introduction

Before we start discussing what a subquery is, we need to have a sample database schema.
Diagram was created with Vertabelo

What is a subquery?

A subquery is a SELECT statement with another SQL statement, like in the example below:
select *
from product
where id in
           (select product_id
            from provider_offer
            where provider_id = 156);
Subqueries are further classified as either a correlated subquery or a nested subquery. They are usually constructed in such a way to return:
  • a table:
    select max(average.average_price)
    from (select product_category, avg(price) as average_price
          from product
          group by product_category) average;      
  • or a value :
    select id
    from purchase
    where  value > (select avg(value)
                    from purchase )

Nested Subqueries

Nested subqueries are subqueries that don't rely on an outer query. In other words, both queries in a nested subquery may be run as separate queries.
This type of subquery could be used almost everywhere, but it usually takes one of these formats:
SELECT
FROM
WHERE [NOT] IN (subquery)

Select *
from client
where city in (select city
               from provider);
The example subquery returns all clients that are from the same city as the product providers.
The IN operator checks if the value is within the table and retrieves the matching rows.

Correlated Subqueries

Subqueries are correlated when the inner and outer queries are interdependent, that is, when the outer query is a query that contains a subquery and the subquery itself is an inner query. Users that know programming concepts may compare it to a nested loop structure.
Let's start with a simple example. The inner query calculates the average value and returns it. In the outer query’s where clause, we filter only those purchases which have a value greater than the inner query’s returned value.

Subquery Correlated in Where Clause

select id
from purchase p1
where date > '2013-07-15'
and value > (select avg(value)
             from purchase p2
             where p1.date = p2.date)
The query returns purchases after 15/07/2014 with a total price greater than the average value from the same day.
The equivalent example, but with joining tables.
Select  p1.id
from purchase p1, purchase p2
where p1.date = p2.date
and p1.date > '2013-07-15'
group by p1.id
having p1.value > avg(p2.value);
This example can also be written as a select statement with a subquery correlated in a from clause.
The subquery returns the table that contains the average value for each purchase for each day. We join this result with the Purchase table on column 'date' to check the condition date > '15/07/2014'.
select id
from purchase, (select date, avg(value) as average_value
                from purchase
                  where date > '2013-07-15'
                  group by date) average
where purchase.date = average.date
and purchase.date > '2013-07-15'
and purchase.value > average.average_value;
Usually, this kind of subquery should be avoided because indexes can't be used on a temporary table in memory.
When a subquery is used, the query optimizer performs additional steps before the results from the subquery are used. If a query that contains a subquery can be written using a join, it should be done this way. Joins usually allow the query optimizer to retrieve the data in a more efficient way.
You can find an extended article as well as more examples here.

No comments:

Post a Comment