Joining DB records to the ‘highest’ of their related (1:m) data

2013 January 10 (Thursday), 11:43PM

I often have cause to query a database table and join it with a subtable of related data (i.e. a table with which this one has a one-to-many relationship), and to retrieve only the highest (or lowest) of related records (based on some sortable field; often a date). Usually in MySQL, and often to be used in a view and so sub-queries are not an option.

The key is to

Outer join with the subtable twice (well, the first join can be inner, if a subrecord result is required);
add, to the second join condition, the constraint that this second-subtable’s sorting field must be greater than its first-subtable counterpart; and
add a where condition that the second-subtable’s primary key be null.

Why this is so hard to remember, I do not know.

For example, for these two tables

<th>
  <code>id</code>
</th>

<th>
  <code>name</code>
</th>

<td>
  1
</td>

<td>
  Record One
</td>

<td>
  2
</td>

<td>
  Record Two
</td>

<td>
  3
</td>

<td>
  Record Three
</td>

<th>
  <code>id</code>
</th>

<th>
  <code>record_id</code>
</th>

<th>
  <code>date</code>
</th>

<td>
  1
</td>

<td>
  1
</td>

<td>
  2013-01-10
</td>

<td>
  2
</td>

<td>
  1
</td>

<td>
  2013-01-09
</td>

<td>
  3
</td>

<td>
  1
</td>

<td>
  2013-01-05
</td>

<td>
  4
</td>

<td>
  3
</td>

<td>
  2013-01-10
</td>

the SQL is

SELECT *
FROM records r
    LEFT JOIN subrecords s1 ON (s1.record_id = r.id)
    LEFT JOIN subrecords s2 ON (s2.record_id = r.id AND s2.date > s1.date)
WHERE s2.id IS NULL
ORDER BY r.id

to give a result of

<th colspan="2">
  <code>r</code>
</th>

<th colspan="3">
  <code>s1</code>
</th>

<th colspan="3">
  <code>s2</code>
</th>

<th>
  <code>id</code>
</th>

<th>
  <code>name</code>
</th>

<th>
  <code>id</code>
</th>

<th>
  <code>record_id</code>
</th>

<th>
  <code>date</code>
</th>

<th>
  <code>id</code>
</th>

<th>
  <code>record_id</code>
</th>

<th>
  <code>date</code>
</th>

<td>
  1
</td>

<td>
  Record One
</td>

<td>
  1
</td>

<td>
  1
</td>

<td>
  2013-01-10
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  2
</td>

<td>
  Record Two
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  3
</td>

<td>
  Record Three
</td>

<td>
  4
</td>

<td>
  3
</td>

<td>
  2013-01-10
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

<td>
  <em>NULL</em>
</td>

Of course the list of selected columns needs to be changed.