Tabulate 2.10.3

I fixed a bug in Tabulate last night, and released version 2.10.3. I’ve rather been neglecting this project for a while, so it’s been nice to come back to it. There are a few things that I might try to get done on it in the next few days or weeks (starting with more documentation).

The bug was to do with the change in WordPress’ default database character set and how that affected the reports ‘title’ column. Previously, a key length on this column of 200 was okay as it was under MySQL’s maximum of 767 (i.e. 767 bytes / 3 bytes for utf8 = max. 255 characters), but now the default is utf8mb4 each character can use up to 4 bytes instead of 3, and so the length must be reduced to 191 (i.e. 767 bytes / 4 bytes for utf8mb4 = max. 191 characters). WordPress core only does this for sites running on MySQL 5.5 and above, but it seemed easier just to reduce the key length for all Tabulate installations; it doesn’t seem likely to be an annoyance to anyone.

MediaWiki with two database servers

I’ve been trying to replicate locally a bug with MediaWiki’s GlobalPreferences extension. The bug is about the increased number of database reads that happen when the extension is loaded, and the increase happens not on the database table that stores the global preferences (as might be expected) but rather on the ‘local’ tables. However, locally I’ve had all of these running on the same database server, which makes it hard to watch the standard monitoring tools to see differences; so, I set things up on two database servers locally.

Firstly, this was a matter of starting a new MySQL server in a Docker container (accessible at 127.0.0.1:3305 and with its data in a local directory so I could destroy and recreate the container as required):

docker run -it -e MYSQL_ROOT_PASSWORD=pwd123 -p3305:3306 -v$PWD/mysqldata:/var/lib/mysql mysql

(Note that because we’re keeping local data, root’s password is only set on the first set-up, and so the MYSQL_ROOT_PASSWORD can be left off future invocations of this command.)

Then it’s a matter of setting up MediaWiki to use the two servers:

$wgLBFactoryConf = [
	'class' => 'LBFactory_Multi',
	'sectionsByDB' => [
		// Map of database names to section names.
		'mediawiki_wiki1' => 's1',
		'wikimeta' => 's2',
	],
	'sectionLoads' => [
		// Map of sections to server-name/load pairs.
		'DEFAULT' => [ 'localdb'  => 0 ],
		's1' => [ 'localdb'  => 0 ],
		's2' => [ 'metadb' => 0 ],
	],
	'hostsByName' => [
		// Map of server-names to IP addresses (and, in this case, ports).
		'localdb' => '127.0.0.1:3306',
		'metadb' => '127.0.0.1:3305',
	],
	'serverTemplate' => [
		'dbname'        => $wgDBname,
		'user'          => $wgDBuser,
		'password'      => $wgDBpassword,
		'type'          => 'mysql',
		'flags'         => DBO_DEFAULT,
		'max lag'       => 30,
	],
];
$wgGlobalPreferencesDB = 'wikimeta';

‘Reports’ in Tabulate

I’ve been working on a system for integrating custom queries and outputs into Tabulate. So far, it’s only been possible to use MySQL views to combine data from different tables, and the output has been just the same as for base tables. I keep wanting to be able to apply custom formatting to columns or to whole tables — the data is fine, it can almost always be derived with a single SQL statement.

Currently, I’m working on a system of ‘reports’, where one can define a template (using Twig) and a set of SQL queries whose data will be given to the template. Pretty simple, really. I thought about (actually, still wonder if it’s not a better way) making it possible to define custom formats/templates for individual columns and column types, but everything that that approach can do can be done with these ‘reports’, plus a whole lot more.

So, a report (a record in the $prefix_tabulate_reports table) has a title, description, and a Twig template. Then it has a set of SQL queries (in the $prefix_tabulate_report_queries table), each of which has a name (which is what the query is passed into the template as).

It’s hard to know if this is enough. It might even be too much — Tabulate is not meant to be anything more than a CRUD tool, and customisation above and beyond what it does can easily be handled with custom plugins. It does seem that it’s worth making Tabulate able to do some basic display-customisation though, so that site authors can more easily work with data without having to write any custom code.

Because that’s really the goal of Tabulate: it should provide simple ways to work with tabualar data within WordPress in much the same way that spreadsheet programes do on the desktop (well, that’s perhaps a bit of an overstatement, considering the crazy things that people do with spreadsheets). It should easily insert a list of statistics into a post; or add an always-up-to-date chart to a page; or invite submissions of data points from registered users; and generally make it easier (than spreadsheets do) to enforce referential integrity, proper data types, and other essential constraints.

Drupal terminology vs old-fashioned DB words

Drupal’s entity model is pretty confusing if one is used to the strict world of ‘proper’ RDBMSs, but it does start to make sense after a while. It’s best to just forget about all the cruft that comes with the standard installation, and work with the base functionality (and some that’s currently provided by modules but seems will be in D8 core).

Taxonomy, for instance, provides a subset of the functionality that can be built with entity references (a.k.a. foreign keys in the relational model, except there’s no integrity!).

The commenting system (which is also in core) can be constructed with basic Drupal things like content types, views, blocks, and rules.

Same goes for Book pages. Everything, really.

(At least, this is my current thesis; an attempt to reduce the number of modules I have to get my head around to under a thousand…)

Basically one needs to just know of the following:


Database term Drupal equivalent (sort of)
Table Content (or Node) Type
Row Node
Column Field
Foreign key Entity reference field
View View
Enum field type list_text field type
Boolean field type list_boolean field type

Integration (and no unit) tests

I’ve been writing lots of integration tests lately, for a system that has zero unit tests. Does this make me a bad programmer? Probably. But it’s so easy! This is in Kohana, using ORM, and so the model basically is the database (which idea I rather like), and mocking it or splitting it out to be separate is just a pain. Far less code to write if one can test the whole interaction of the system at once.

I am being slightly tongue-in-cheek here, because I do realise that the maintenance burden of a system built with tight coupling between the various layers is likely to increase contiunually (to a point where someone at somepoint says “oh sod it, let’s rebuild from scratch on Drupal”). But for the multitude of systems that are basically just CRUD, the approach of writing tests that mimic the code seen in controllers is pretty simple and neat.

Entity Relationship Diagrams for WebDB

twyne_dev_erd

I have added a ERD module to WebDB, for producing simple little relationship diagrams like the one above. As with WebDB generally, it’s not about displaying the actual database schema but rather just a simple way to display the semantics of the schema (often, to users who don’t want to see even a single underscore!).

Initially, it selects all tables that are linked to at least one other table (either inwards or outwards), and then gives the user the option of hiding or showing each table.

I’m moving WebDB towards a modular structure, and one that can be installed with Packagist. Not there yet though.

Joining DB records to the ‘highest’ of their related (1:m) data

I often have cause to query a database table and join it with a subtable of related data (i.e. a table with which this one has a one-to-many relationship), and to retrieve only the highest (or lowest) of related records (based on some sortable field; often a date). Usually in MySQL, and often to be used in a view and so sub-queries are not an option.

The key is to

  1. Outer join with the subtable twice (well, the first join can be inner, if a subrecord result is required);
  2. add, to the second join condition, the constraint that this second-subtable’s sorting field must be greater than its first-subtable counterpart; and
  3. add a where condition that the second-subtable’s primary key be null.

Why this is so hard to remember, I do not know.

For example, for these two tables

records table
id name
1 Record One
2 Record Two
3 Record Three
subrecords table
id record_id date
1 1 2013-01-10
2 1 2013-01-09
3 1 2013-01-05
4 3 2013-01-10

the SQL is

SELECT *
FROM records r
    LEFT JOIN subrecords s1 ON (s1.record_id = r.id)
    LEFT JOIN subrecords s2 ON (s2.record_id = r.id AND s2.date > s1.date)
WHERE s2.id IS NULL
ORDER BY r.id

to give a result of

r s1 s2
id name id record_id date id record_id date
1 Record One 1 1 2013-01-10 NULL NULL NULL
2 Record Two NULL NULL NULL NULL NULL NULL
3 Record Three 4 3 2013-01-10 NULL NULL NULL

Of course the list of selected columns needs to be changed.

On What Gets Kept, and Changing How Over Time

“Make things that can be archived (databases cannot be, not if you don’t also store the application that reads them). Make it possible to change one’s data structures (the ways in which things are stored — not the file formats, so much), and leave old data alone. To update, copy and morph; don’t try to force everything into the new system. Files are good for this; their formats should be standardised though, of course.”

WebDB updated

Yesterday I added a little fix to WebDB for a bug that prevented filtering on column names that occur in both the table being filtered and any of the tables referred to by foreign keys. This most often applied to id columns.

[Update:] And today I’ve (hopefully) fixed the problem with foreign-key fields in edit forms not being able to be emptied. How did that live this long?! Sorry.

How to kill WordPress

(Other than mis-capitalising its ‘P’, that is.)

I have had a wordpress site (see, I’m failing even to give it a capital ‘W’ now) for many years, since about… umm… November 3rd 2003 at 16:01:23 or thereabouts. I’m finally sick of it. It’s grown and grown and is trying so hard to be everything to every blogger out there, that I don’t know where I stand with it. It used to be fun, y’know?! A codebase I could fiddle with, and make do whatever I wanted. Now, I just find it very good for doing things that I don’t need to do.

So, I’m dumping it. No more WordPress.

Now the question is: how to migrate away from it? The important thing (although, really, I’m not actually that fussed about it; it’s more a pride thing — a web geek hardly wants to go against Tim BL’s advice, does he?) is to preserve URIs, at least the important ones.

So I started by making a final backup — all files, WP core included, and the database dump — and moving that tarball out of my usual backup rotation. So I’ve got a snapshot of the site, that will never fall off the far end of my backups. You never know (to quote Duane Dibbley).

Then, I inserted all of the WP posts into my new system’s database::

INSERT INTO journal_entries (id, title, date_and_time, entry_text)
  SELECT id+1000, post_title, post_date, post_content
    FROM wp_posts
    WHERE post_type = 'post' AND post_status = 'publish'
    ORDER BY post_date ASC

The +1000 on the ID was to ensure that I could refer to the new IDs of the imported posts in the next section, the redirections (there were fewer than 1000 records already in the journal_entries table)::

  SELECT CONCAT(
    'Redirect permanent /',
    YEAR(post_date),'/',
    LPAD(MONTH(post_date),2,'0'),'/',
    LPAD(DAY(post_date),2,'0'),'/',
    post_name,' ',
    'http://samwilson.id.au/journal/view/',
    id+1000
  ) AS redirection
  FROM wp_posts
    WHERE post_type = 'post' AND post_status = 'publish'
    ORDER BY post_date ASC

And I dumped all that into .htaccess.