I often have cause to query a database table and join it with a subtable of related data (i.e. a table with which this one has a one-to-many relationship), and to retrieve only the highest (or lowest) of related records (based on some sortable field; often a date). Usually in MySQL, and often to be used in a view and so sub-queries are not an option.
The key is to
Outer join
with the subtable twice (well, the first join can be inner, if a subrecord result is required);- add, to the second join condition, the constraint that this second-subtable’s sorting field must be greater than its first-subtable counterpart; and
- add a
where
condition that the second-subtable’s primary key be null.
Why this is so hard to remember, I do not know.
For example, for these two tables
<th> <code>id</code> </th> <th> <code>name</code> </th>
<td> 1 </td> <td> Record One </td>
<td> 2 </td> <td> Record Two </td>
<td> 3 </td> <td> Record Three </td>
<th> <code>id</code> </th> <th> <code>record_id</code> </th> <th> <code>date</code> </th>
<td> 1 </td> <td> 1 </td> <td> 2013-01-10 </td>
<td> 2 </td> <td> 1 </td> <td> 2013-01-09 </td>
<td> 3 </td> <td> 1 </td> <td> 2013-01-05 </td>
<td> 4 </td> <td> 3 </td> <td> 2013-01-10 </td>
the SQL is
SELECT *
FROM records r
LEFT JOIN subrecords s1 ON (s1.record_id = r.id)
LEFT JOIN subrecords s2 ON (s2.record_id = r.id AND s2.date > s1.date)
WHERE s2.id IS NULL
ORDER BY r.id
to give a result of
<th colspan="2"> <code>r</code> </th> <th colspan="3"> <code>s1</code> </th> <th colspan="3"> <code>s2</code> </th>
<th> <code>id</code> </th> <th> <code>name</code> </th> <th> <code>id</code> </th> <th> <code>record_id</code> </th> <th> <code>date</code> </th> <th> <code>id</code> </th> <th> <code>record_id</code> </th> <th> <code>date</code> </th>
<td> 1 </td> <td> Record One </td> <td> 1 </td> <td> 1 </td> <td> 2013-01-10 </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td>
<td> 2 </td> <td> Record Two </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td>
<td> 3 </td> <td> Record Three </td> <td> 4 </td> <td> 3 </td> <td> 2013-01-10 </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td> <td> <em>NULL</em> </td>
Of course the list of selected columns needs to be changed.