Subscribe RSS
Database Primer – Part 3 Oct 25
She didnt get it till she read this

The moment the left join becomes clear

Part 1 | Part 2 | Part 3

Mastering the LEFT join

The left join is very powerful, very useful and very under used. In my experience few people have even heard of the left join, let alone used one. The left join comes in handy when you are trying to join tables together where you have unreliable joins or where you are managing subscriptions between two objects. A similar result to the left join can be achieved programatically using nested queries. I’ve seen many a PHP script that uses nested loops of SQL queries to accomplish a similar result. However performing these joins programtically is of course far slower than using a database that has been specially designed for the purpose. Using a query is simpler too, once you know how it works.

As with the standard join, its important to know when a left join is applicable. There ae many situations where the use of a left join is applicable, but i’ll just explain the simple, most obvious and most common ones I come across almost daily.

Unreliable Joins

This is a term I have made up. Please feel free to suggest a more technically correct word but this does it for me. An unreliable join is where we have a relationship between two objects but the relationship is optional – it does not always exist. Its ‘unreliable’.

The objects and tables we have so far have reliable joins, that is the relationships between them are considered always relevant and required.

e.g. All surfer’s have a gender. The model makes no sense if you have a surfer without a gender. If we have a case where the value for gender_id in the surfer table does not match an id value in the gender table then, well, the data is corrupt. This condition shouldnt exist. (Some databases let you check and enforce this but lets not go there now!)

In order to demonstrate an unreliable join we need a scenario that fits: As you know, surfers like their surfer mobiles. Unfortunately our surf shack on has a small car park so to manage the parking we need to keep a list of each surfers motors. We can then clamp and charge extortianate rates to anyone not on the list!

With this scenario in mind, the following information about a vehicle needs to be stored:
* unique id
* id of the surfer who owns the vehicle
* description of vehicle
* license plate number of the vehicle

An object model diagram for the relationship between surfers and their vehicles can be written as follows:

	SURFER 1-------* VEHICLE

Read from left to right: “Each surfer owns zero or more vehicles”
Read from right to left: “Each vehicle has exactly one owner”

Notice how this relationship differs to the relationship to gender:

	SURFER *-------1 GENDER

The most important thing to realise about this new relationship then is that NOT ALL SURFERS HAVE TO OWN A VEHICLE. And this is true for any object related to surfer where the relationship is zero or more (line ENDS in a *).

The table is created with the following vlaues:

+----+-----------+----------------------+--------------+
| id | surfer_id | description          | licenceplate |
+----+-----------+----------------------+--------------+
|  1 |         2 | Blue and white van   | SRFR 1       |
|  2 |         4 | Black Hummer         | W4V3 42      |
|  3 |         5 | Red Ferarri          | P0ZR 12      |
|  4 |         7 | Yellow Reliant Robin | TR0TR 8      |
+----+-----------+----------------------+--------------+

If we wanted to generate a list of Surfers and their vehicles we could use a standard join:

select s.name, v.description, v.licenceplate from surfer as s, vehicle as v where v.surfer_id=s.id;

However this only gives us results where a surfer owns a vehicle, as the standard join works as a restrictive filter:

+-------+----------------------+--------------+
| name  | description          | licenceplate |
+-------+----------------------+--------------+
| Kylie | Blue and white van   | SRFR 1       |
| Hank  | Black Hummer         | W4V3 42      |
| Brad  | Red Ferarri          | P0ZR 12      |
| Tom   | Yellow Reliant Robin | TR0TR 8      |
+-------+----------------------+--------------+

If we want a list of ALL of the surfers, displaying their vehicle details where there are some, we need to use a left join. The left join is aptly named – it returns results for all records for the table “on the left” (subject to restrictions in the WHERE clause), displaying values for where ther is a match in the table on the right of the join or a NULL otherwise.

select s.name, v.description, v.licenceplate from surfer as s left join vehicle as v on v.surfer_id=s.id;

In this query the “surfer as s” is on the left side of the join, with “vehicle as v” on the right. When joining using a left join you define the join conditions in the ON clause rather that the WHERE clause. The general format is:

SELECT fields FROM table_a LEFT JOIN table_b ON left_join_condition WHERE conditions

Whilst we’re looking at the syntax, its worth noting that if you want multiple ‘chained’ left joins then you should have an ON clause for each left join but only ever one WHERE clause:

SELECT fields FROM table_a LEFT JOIN table_b ON left_join_condition LEFT JOIN table_c ON left_join_condition LEFT JOIN table_d ON left_join_condition WHERE conditions

As you can see from the results, we now get a full list of surfers with data filled in where appropriate for vehicle information.

+--------+----------------------+--------------+
| name   | description          | licenceplate |
+--------+----------------------+--------------+
| Sheila | NULL                 | NULL         |
| Kylie  | Blue and white van   | SRFR 1       |
| Danni  | NULL                 | NULL         |
| Hank   | Black Hummer         | W4V3 42      |
| Brad   | Red Ferarri          | P0ZR 12      |
| Joey   | NULL                 | NULL         |
| Tom    | Yellow Reliant Robin | TR0TR 8      |
| Jason  | NULL                 | NULL         |
+--------+----------------------+--------------+

You can use the fact that there are NULL results to generate useful queries. For example the following query returns as list of surfers and indicates wheter they own a vehicle or not:

select s.name, if(v.id is null,"No","Yes") as owns_vehicle from surfer as s left join vehicle as v on v.surfer_id=s.id;

Which returns:

+--------+--------------+
| name   | owns_vehicle |
+--------+--------------+
| Sheila | No           |
| Kylie  | Yes          |
| Danni  | No           |
| Hank   | Yes          |
| Brad   | Yes          |
| Joey   | No           |
| Tom    | Yes          |
| Jason  | No           |
+--------+--------------+

The standard join we did earlier listed all the surfers who owned a vehicle. But what if you want a list of people who DO NOT own a vehicle? The last query can be amended to return this information by adding a reference to the vehicle table in the WHERE clause:

select s.name, if(v.id is null,"No","Yes") as owns_vehicle from surfer as s left join vehicle as v on v.surfer_id=s.id where v.id is null
+--------+--------------+
| name   | owns_vehicle |
+--------+--------------+
| Sheila | No           |
| Danni  | No           |
| Joey   | No           |
| Jason  | No           |
+--------+--------------+

This demonstrates that the WHERE clause effectively performs a final filter across the result set after all the left joins etc have taken place.
Mixing Joins

You can, of course, combine standard joins and left joins together. Here is a more comprehensive query:

SELECT s.name, g.gender, sk.level as skill_level, IF(v.id is NULL,"No vehicle",v.licenceplate) as licenceplate
FROM (surfer as s, gender as g, skill as sk)
LEFT JOIN vehicle as v ON v.surfer_id=s.id
WHERE s.gender_id=g.id and s.skill_id=sk.id;

Note the way we have grouped the standard joins together in brackets. This returns the following result:

+--------+--------+--------------+--------------+
| name   | gender | skill_level  | licenceplate |
+--------+--------+--------------+--------------+
| Sheila | Babe   | Mental       | No vehicle   |
| Kylie  | Babe   | Spectator    | SRFR 1       |
| Danni  | Babe   | Advanced     | No vehicle   |
| Hank   | Dude   | Spectator    | W4V3 42      |
| Brad   | Dude   | Beginner     | P0ZR 12      |
| Joey   | Dude   | Mental       | No vehicle   |
| Tom    | Dude   | Professional | TR0TR 8      |
| Jason  | Dude   | Professional | No vehicle   |
+--------+--------+--------------+--------------+

If we look back at the scenario and the object diagram we can see that a surfer may own zero or MORE vehicles. So far we have only had a surfer owning zero or 1 vehicle. If we add another vehicle for Tom, so that he now has two vehicles related to him this changes the dynamics of the join somewhat.

The same query with the extra vehicle added for Tom returns:

+--------+--------+--------------+--------------+
| name   | gender | skill_level  | licenceplate |
+--------+--------+--------------+--------------+
| Sheila | Babe   | Mental       | No vehicle   |
| Kylie  | Babe   | Spectator    | SRFR 1       |
| Danni  | Babe   | Advanced     | No vehicle   |
| Hank   | Dude   | Spectator    | W4V3 42      |
| Brad   | Dude   | Beginner     | P0ZR 12      |
| Joey   | Dude   | Mental       | No vehicle   |
| Tom    | Dude   | Professional | TR0TR 8      |  <<--
| Tom    | Dude   | Professional | Z1PPY 14     |  <<--
| Jason  | Dude   | Professional | No vehicle   |
+--------+--------+--------------+--------------+
(note the additional row here for Tom)

So if there are more than one matches in the ‘right’ table each match is returned and you get an extra row for each additional match – but you always get at least one result for each record in the left table. The same is true of the standard join, you get addintional rows, but you DONT get rows where there is no match (SQL from above):

select s.name, v.description, v.licenceplate from surfer as s, vehicle as v where v.surfer_id=s.id;
+-------+----------------------+--------------+
| name  | description          | licenceplate |
+-------+----------------------+--------------+
| Kylie | Blue and white van   | SRFR 1       |
| Hank  | Black Hummer         | W4V3 42      |
| Brad  | Red Ferarri          | P0ZR 12      |
| Tom   | Yellow Reliant Robin | TR0TR 8      |  <<--
| Tom   | Mini Metro           | Z1PPY 14     |  <<--
+-------+----------------------+--------------+
(note the missing rows here)

Managing Subscriptions with the Left Join

Unreliable joins are one very good reason to use left joins. Another is in a subscription scenario. This type of scenario is found frequently in even the most basic of applications. The most easily recognisable is an HTML form with a series of check boxes where you check all the applicable options.

To bring this into our scenario, imagine that each surfer may own multiple surf boards of different brands. Some surfers own a board from just one brand whereas others may have boards from a variety of manufacturers.

Heres the simple data table for this:

[brand]
+----+------------+
| id | brandname  |
+----+------------+
|  1 | Billabong  |
|  2 | Rip Curl   |
|  3 | Reef       |
|  4 | O'Neil     |
|  5 | Quiksilver |
+----+------------+

The relationship between surfer and brands is many to many: A surfer may have zero or more brands of board, and for each board there may be zero or more ‘owners’. This can be illustrated as follows:

	SURFER *--------* BRAND

Now, unlike the previous relationships neither of these objects is directly related to each other. By this I mean that surfer does not have a reference to brand and brand does not have a reference to surfer. In order to manage this relationship in a database we need an extra table to manage the relationship between the objects. I call this a subscription table.

	SURFER 1------* SURFERBOARDS *-------1 BRAND
		       (subscription)

You can always convert an many-many relationship into a subscriber model by simply inserting the subscription object and changing the relationships as we have above.

We’ll call our subscription management table “surferboards”, it needs the following fields:
* id of the surfer
* id of the brand

[surferboards]
+-----------+----------+
| surfer_id | brand_id |
+-----------+----------+
|         1 |        3 |
|         1 |        4 |
|         2 |        5 |
|         4 |        3 |
|         5 |        6 |
|         7 |        3 |
|         7 |        5 |
|         7 |        6 |
|         8 |        2 |
+-----------+----------+

We can see from this table that surfer 2 (Kylie) owns a board of type 2 (Rip Curl) and surfer 7 (Tom) owns three types of board. Surfers 3 and 6 (Danni and Joey) do not own a surfboard of any type.

We can use a standard join to get some information from this data model. To list all the types of board for a particular surfer (Tom) the following query could be used:

select b.brandname from brand b, surferboards s where b.id=s.brand_id and s.surfer_id=7;

+------------+
| brandname  |
+------------+
| Rip Curl   |
| Reef       |
| Quiksilver |
+------------+

Whilst this is useful for display, the left join allows us to retrieve more structured data. If you wanted to display a HTML form with checkboxes for each brand you need to have a full list of the brands and an indication as to whether the checkbox should be checked or not. Here is the query to achieve that, again for Tom:

select b.brandname, if(s.brand_id is null,"Not Checked","Checked") as chk from brand b left join surferboards s on b.id=s.brand_id and s.surfer_id=7;
+------------+-------------+
| brandname  | chk         |
+------------+-------------+
| Billabong  | Not Checked |
| Rip Curl   | Checked     |
| Reef       | Checked     |
| O'Neil     | Not Checked |
| Quiksilver | Checked     |
+------------+-------------+

For each record in the result table you would display a label and a check box, marking the check box as selected where indicatd in the chk column.

Say you wanted to know what brands were owned by each surfer. You can do this in one query:

select s.name, if(b.brandname is null, "--none--",b.brandname) as brand from surfer s left join surferboards sb on s.id=sb.surfer_id left join brand b on b.id=sb.brand_id order by s.name;
+--------+------------+
| name   | brand      |
+--------+------------+
| Brad   | Rip Curl   |
| Danni  | --none--   |
| Hank   | Reef       |
| Jason  | Rip Curl   |
| Joey   | --none--   |
| Kylie  | Quiksilver |
| Sheila | Reef       |
| Sheila | O'Neil     |
| Tom    | Rip Curl   |
| Tom    | Reef       |
| Tom    | Quiksilver |
+--------+------------+

As pointed out before, the left join gives us a row for every record in the left table and an additional row for each additional match in the right table. We can also simply summarise the data display how many types of board each member has by using the COUNT function and grouping by surfer:

select s.name, count(sb.surfer_id) as numboards from surfer s left join surferboards sb on s.id=sb.surfer_id left join brand b on b.id=sb.brand_id group by s.id order by s.name;
+--------+-----------+
| name   | numboards |
+--------+-----------+
| Brad   |         1 |
| Danni  |         0 |
| Hank   |         1 |
| Jason  |         1 |
| Joey   |         0 |
| Kylie  |         1 |
| Sheila |         2 |
| Tom    |         3 |
+--------+-----------+

As you can see, the left join has made this possible in one query, returning all of the surfers.

Out of interest, when I first prepared this SQL statment I made an error in the GROUP clause, take care to group by the correct field (out of the left table):

//This statment gives an erroneous result
select s.name, count(sb.surfer_id) as numboards from surfer s left join surferboards sb on s.id=sb.surfer_id left join brand b on b.id=sb.brand_id group by sb.surfer_id order by s.name;
+--------+-----------+
| name   | numboards |
+--------+-----------+
| Brad   |         1 |
| Danni  |         0 |
| Hank   |         1 |
| Jason  |         1 |
| Kylie  |         1 |   <<-- where's 'Joey' ?
| Sheila |         2 |
| Tom    |         3 |
+--------+-----------+

Final Model

For completeness sake, here is our final model for the system:

GENDER 1------* SURFER *------1 SKILL
                   1
                   |
                   *
              SURFERBOARDS
                   *
                   |
                   1
                 BRAND
Category: SQL, Tutorials  | Tags: , , ,
You can follow any responses to this entry through the RSS 2.0 feed. You can skip to the end and leave a response. Pinging is currently not allowed.
Leave a Reply