MediaWiki with two database servers

I’ve been trying to replicate locally a bug with MediaWiki’s GlobalPreferences extension. The bug is about the increased number of database reads that happen when the extension is loaded, and the increase happens not on the database table that stores the global preferences (as might be expected) but rather on the ‘local’ tables. However, locally I’ve had all of these running on the same database server, which makes it hard to watch the standard monitoring tools to see differences; so, I set things up on two database servers locally.

Firstly, this was a matter of starting a new MySQL server in a Docker container (accessible at and with its data in a local directory so I could destroy and recreate the container as required):

docker run -it -e MYSQL_ROOT_PASSWORD=pwd123 -p3305:3306 -v$PWD/mysqldata:/var/lib/mysql mysql

(Note that because we’re keeping local data, root’s password is only set on the first set-up, and so the MYSQL_ROOT_PASSWORD can be left off future invocations of this command.)

Then it’s a matter of setting up MediaWiki to use the two servers:

$wgLBFactoryConf = [
	'class' => 'LBFactory_Multi',
	'sectionsByDB' => [
		// Map of database names to section names.
		'mediawiki_wiki1' => 's1',
		'wikimeta' => 's2',
	'sectionLoads' => [
		// Map of sections to server-name/load pairs.
		'DEFAULT' => [ 'localdb'  => 0 ],
		's1' => [ 'localdb'  => 0 ],
		's2' => [ 'metadb' => 0 ],
	'hostsByName' => [
		// Map of server-names to IP addresses (and, in this case, ports).
		'localdb' => '',
		'metadb' => '',
	'serverTemplate' => [
		'dbname'        => $wgDBname,
		'user'          => $wgDBuser,
		'password'      => $wgDBpassword,
		'type'          => 'mysql',
		'flags'         => DBO_DEFAULT,
		'max lag'       => 30,
$wgGlobalPreferencesDB = 'wikimeta';

Should Tabulate support ENUM columns?

I’m trying to figure out if it’s worthwhile adding better support for enumerated fields in Tabulate. MySQL’s ENUM type is useful when one has an immutable list of options such as days of the week, seasons of the year, planets in the solar system, or suits in a deck of cards. It can also be good for making ternary options more distinct and communicable than a nullable boolean field.

But really, I’ve never used them! I mean, I have in one place (which is why this is coming up for me at all, because I’m trying to do some work with an ancient database that I’ve pulled into WordPress + Tabulate) but I’ve never used them in any situation that I couldn’t have more easily solved by adding a cross-reference to another table.

Reference tables are far easier to work with, and allow other metadata to be attached to the referenced values (such as colour, in the card-suit example).

However, ENUMs are already supported by Tabulate for the display of data, so I guess I should just do the little extra bit of work required to add support to the table-structure editing as well. Even if no one uses it.

(On a related note, I don’t think SET fields are going to get the same treatment!)

Tabulate 1.0.0

I’ve just realeased version 1.0.0 of Tabulate, a WordPress plugin for working with data in a site’s MySQL database. I’ve been using it for a few months in production, and the shift from 0.* to 1.0 was fairly arbitrary — it just seemed stable enough now. The new feature that got included in this release is the ability to export to OpenStreetMap XML (not a great leap ahead of the KML export that was already done).

Any problems with Tabulate can be lodged on the issue tracker at Github, or on the normal WordPress support forum.

Here’s a suburb’s worth of power pole locations, exported from Tabulate and opened in JOSM:

Poles in Hamilton Hill


Loading spatial data into MySQL with LOAD DATA INFILE

Just a note for my future reference: importing an Excel CSV into MySQL. The WKT column has been constructed by hand to be POINT(lng lat) and the CSV contains headers.

LOAD DATA INFILE '/full_path/to/file-on-server.csv'
INTO TABLE the_table
SET geographic_location = GeomFromText(@geographic_location)

Joining DB records to the ‘highest’ of their related (1:m) data

I often have cause to query a database table and join it with a subtable of related data (i.e. a table with which this one has a one-to-many relationship), and to retrieve only the highest (or lowest) of related records (based on some sortable field; often a date). Usually in MySQL, and often to be used in a view and so sub-queries are not an option.

The key is to

  1. Outer join with the subtable twice (well, the first join can be inner, if a subrecord result is required);
  2. add, to the second join condition, the constraint that this second-subtable’s sorting field must be greater than its first-subtable counterpart; and
  3. add a where condition that the second-subtable’s primary key be null.

Why this is so hard to remember, I do not know.

For example, for these two tables

records table
id name
1 Record One
2 Record Two
3 Record Three
subrecords table
id record_id date
1 1 2013-01-10
2 1 2013-01-09
3 1 2013-01-05
4 3 2013-01-10

the SQL is

FROM records r
    LEFT JOIN subrecords s1 ON (s1.record_id =
    LEFT JOIN subrecords s2 ON (s2.record_id = AND >

to give a result of

r s1 s2
id name id record_id date id record_id date
1 Record One 1 1 2013-01-10 NULL NULL NULL
3 Record Three 4 3 2013-01-10 NULL NULL NULL

Of course the list of selected columns needs to be changed.

Finding all date ranges (which may be open-ended) that overlap with a given range

Given a database table listing events and their date ranges: events { id, start_date, end_date, }, where either or both of the dates can be null, how is one to find all of the events that fall within (even partially) a given date range? (This is pretty much what Kieran Benton asked on Stack Overflow, with the addition of the nullability.)

There are twelve possibilities for ranges with respect to the given range A-B:

Note that the gradients indicate null start or end dates. PNG

The red ranges are the ones that should be included in the result; the orange ones should be omitted. It’s easier to query for the smaller set, which satisfy the following conditions:

> A < A > B < B NULL
Start Date x

End Date

Start Date

End Date

Start Date


End Date


Start Date


End Date


Which corresponds to the following SQL:

SELECT * FROM events
  (start_date IS NOT NULL AND start_date < :A AND end_date IS NOT NULL AND end_date < :A)
  OR (start_date IS NOT NULL AND start_date > :B AND end_date IS NOT NULL AND end_date > :B)
  OR (start_date IS NULL AND end_date IS NOT NULL AND end_date < :A)
  OR (start_date IS NOT NULL AND start_date > :B and end_date IS NULL)