A Tuesday morning

The internet seems to be full of bad news at the moment, so I’m ignoring it and focussing on some nice little code projects that are calming and friendly and only slightly maniacally frustrating.

SVG Translate tickets are petering out; version 0.10.3 is just out with some login fixes and language updates.

This morning I’m working on adding a ‘messages exist’ notification icon to the PageTriage toolbar.

And I think I’ve cracked the search-debouncing for my Embed Wikimedia plugin, so I might be able to move on with the metadata- and appearance-improving stuff that I’ve been trying to do for weeks.

MediaLoader extension

There’s a new MediaWiki extension that’s just been published: MediaLoader. It looks like it’s supposed to load media items such as images, videos, etc. on demand. I haven’t been able to get it to actually work (there’s some strange Composer loading stuff going on in its code) but I think it works by displaying a clickable bit of text such as ‘Load example.jpg’ (not actually a link) that, when clicked, turns into the image or whatever. All it’s doing for me right now is turning into the raw wikitext, but maybe there’s something I’m missing.

I guess the idea is to not download/display the image if it’s not wanted by the user?

Anyway, it’s new, and it’s always nice to see a new extension being made. Huzza!

Embedding Wikimedia URLs in WordPress

I’ve had a stab at a WordPress plugin for embedding Wikimedia URLs: https://github.com/samwilson/embed-wikimedia

It’s of course just a draft and proof-of-concept and beta and rough at the moment. It only supports Wikipedia and Wikimedia Commons; I’m going to add Wikidata next, I think, and then Wikisource (although that will mostly be a reformatted version of the Wikidata one, because all relevant metadata about Wikisource items is in Wikidata).

I have no idea if it’s very useful. I mainly want it for Commons photos, and Wikisource books.

Wrong date? Just add 3½ days

More PHP date weirdness, this time in the Cargo extension for MediaWiki:

+		// 'o' is better than 'Y' because it does not add leading
+		// zeroes to years with fewer than four digits.
+		// For some reason, though, this fails for some years -
+		// returning one year lower than it's supposed to - unless you
+		// add the equivalent of 3 days or more to the number of
+		// seconds. Is that a leap day thing? Weird PHP bug? Who knows.
+		// Anyway, it's easy to get around.
+		$yearString = date( 'o', $seconds + 300000 );
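For what it’s worth, this doesn’t look like a leap-day thing or a PHP bug. PHP documents 'o' as the ISO 8601 week-numbering year: it matches 'Y' except when the date’s ISO week belongs to the previous or next year. The first few days of January often fall in the last ISO week of the previous year, and 4 January is always in week 1, so pushing a 1 January timestamp forward by 300000 seconds (about 3½ days) lands it safely in the right year. Here’s a small sketch of the same rule in Python (not PHP, and not Cargo’s code); the function names are my own, and I’m assuming the timestamps are in UTC.

```python
from datetime import datetime, timezone

OFFSET_SECONDS = 300000  # the value used in the Cargo change, about 3.47 days


def calendar_year(seconds: float) -> int:
    """Ordinary calendar year (roughly what PHP's 'Y' reports)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).year


def iso_week_year(seconds: float) -> int:
    """ISO 8601 week-numbering year (the rule PHP documents for 'o')."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isocalendar()[0]


# 1 January 2021 was a Friday, so it belongs to ISO week 53 of 2020.
new_year_2021 = datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp()

print(calendar_year(new_year_2021))                   # 2021
print(iso_week_year(new_year_2021))                   # 2020
print(iso_week_year(new_year_2021 + OFFSET_SECONDS))  # 2021
```

The offset only helps when the timestamp is near the start of a year. A date in the last days of December can belong to ISO week 1 of the following year (30 December 2024 does), and adding 3½ days doesn’t change that, so for arbitrary dates the calendar year is the safer thing to ask for.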

New MediaWiki extension: AutoCategoriseUploads

AutoCategoriseUploads is another new MediaWiki extension. It “automatically adds categories to new file uploads based on keyword metadata found in the file. The following metadata types are supported: XMP (many file types, including JPG, PNG, PDF, etc.); IPTC (JPG); ID3 (MP3)”.

Unfortunately there’s no code yet in the repository, so there’s nothing to test. Sounds interesting though.

Extension:DocBookExport

There’s a new extension that has recently been added to mediawiki.org, called DocBookExport. It provides a system of defining a book’s structure (a set of pages and some title and other metadata) and then pipes the pages’ HTML through Pandoc and out into DocBook format, from where it can be turned into PDF or just downloaded as-is.

There are a few issues with getting the extension to run (e.g. it wants to write to its own directory, rather than a normal place for temporary files), and I haven’t actually managed to get it fully functioning. But the idea is interesting. Certainly, there are some limitations with Pandoc, but mostly it’s remarkably good at converting things.

It seems that DocBookExport, and any other MediaWiki export or format conversion system, works best when the wiki pages (and their templates etc.) are written with the output formats in mind. Then, one can avoid things such as web-only formatting conventions that make PDF (or epub, or man page) generation trickier.

Wikisource books for binding

I have been experimenting with turning Wikisource works into LaTeX-formatted bindable PDFs. My initial idea was to produce quarto or octavo layout sheets (i.e. 8 or 16 book pages to a sheet of paper that’s printed on both sides, with the pages laid out so that they end up in the correct order when the sheet is folded) but now I’m thinking of just using a print-on-demand service (hopefully PediaPress, because they seem pretty brilliant).
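To give a feel for the layout problem, here’s a rough Python sketch that works out which pages go on which side of a sheet. It assumes the sheet is folded by simple repeated half-folds with its first page on the outside, in which case facing pages always share a side and the two pages of a leaf never do. It doesn’t attempt the harder part (where on the sheet each page sits, and which ones are upside down), and the function name is just something I made up.

```python
def sheet_sides(first_page: int, pages_per_sheet: int) -> tuple[list[int], list[int]]:
    """Split one sheet's pages into (front, back) sides.

    pages_per_sheet is 8 for quarto and 16 for octavo. The front is taken
    to be the side carrying the sheet's first page.
    """
    if pages_per_sheet <= 0 or pages_per_sheet % 4 != 0:
        raise ValueError("pages_per_sheet must be a positive multiple of 4")
    front: list[int] = []
    back: list[int] = []
    for offset in range(pages_per_sheet):
        # Pages alternate sides in the pattern front, back, back, front.
        side = front if offset % 4 in (0, 3) else back
        side.append(first_page + offset)
    return front, back


front, back = sheet_sides(1, 8)
print(front)  # [1, 4, 5, 8]
print(back)   # [2, 3, 6, 7]
```

The second quarto sheet would start at page 9, i.e. sheet_sides(9, 8).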

Basically, my tool downloads all of a work’s pages and subpages (in the main namespace only; it doesn’t care about the method of construction of the work) and saves the HTML for these, in order, to a html/ directory. Then (here’s the crux of the thing) it uses Pandoc to create a set of matching TeX files in an adjacent latex/ directory.
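The conversion step is roughly the following. This is a simplified Python sketch rather than the tool’s actual code: the function names are invented for illustration, it assumes the saved files end in .html, and it needs the pandoc executable to be on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def latex_path_for(html_file: Path, html_dir: Path, latex_dir: Path) -> Path:
    """Mirror a file under html/ to the matching .tex path under latex/."""
    return latex_dir / html_file.relative_to(html_dir).with_suffix(".tex")


def pandoc_command(src: Path, dest: Path) -> list[str]:
    """Build the Pandoc invocation that converts one HTML file to LaTeX."""
    return ["pandoc", "--from=html", "--to=latex", f"--output={dest}", str(src)]


def convert_all(html_dir: Path, latex_dir: Path) -> None:
    """Convert every saved HTML page, keeping the same directory layout."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on the PATH")
    for src in sorted(html_dir.rglob("*.html")):
        dest = latex_path_for(src, html_dir, latex_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(pandoc_command(src, dest), check=True)
```

Calling convert_all(Path("html"), Path("latex")) from the work’s directory would then fill in latex/ with one TeX file per downloaded page.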

So far, so obvious. But the trouble with this approach of wanting to create a separate source format for a work is that there are changes that one wants to make to the work (either formatting or structural) that can’t be made upstream on Wikisource — but we also want to be able to bring down updates at any time from Wikisource. That is to say, this is creating a fork of the work in a different format, but it’s a fork that needs to be able to be kept up to date.

My current solution to this is to save the HTML and LaTeX files in a Git repository (one per work) and have two branches: one containing the raw un-edited HTML and LaTeX, on which the download operation can be re-run at any time; and the other being based off this, being a place to make any edits required, and which can have the first merged into it whenever that’s updated. This will sometimes result in merge conflicts, but for the most part (because the upstream changes are generally small typo fixes and the like) will happen without error.
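In outline, an update looks something like this Python sketch. The branch names (raw and edited), the commit message, and the download callback are all placeholders of mine rather than anything the tool defines, and the git runner is passed in as a parameter so the sequence of steps can be tried out without a real repository.

```python
import subprocess
from typing import Callable


def run_git(repo: str, *args: str) -> str:
    """Run one git command inside the given repository and return its output."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def refresh_work(
    repo: str,
    download: Callable[[str], None],
    run: Callable[..., str] = run_git,
    raw_branch: str = "raw",
    edited_branch: str = "edited",
) -> None:
    """Re-download onto the raw branch, then merge it into the edited branch."""
    run(repo, "checkout", raw_branch)
    download(repo)  # rewrites html/ and latex/ from Wikisource
    # Only commit if the download actually changed something.
    if run(repo, "status", "--porcelain").strip():
        run(repo, "add", "--all")
        run(repo, "commit", "--message", "Update from Wikisource")
    run(repo, "checkout", edited_branch)
    run(repo, "merge", raw_branch)
```

If the merge hits a conflict, git exits with an error and run_git raises subprocess.CalledProcessError, at which point the conflict has to be sorted out by hand in the edited branch.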

Now I just want to automate all this a little bit more, so a new project can be created (with GitHub repo and all) with a single (albeit slow!) command.

The output ends up something like The Nether World by George Gissing.pdf.

Display Title extension

The MediaWiki Display Title extension is pretty cool. It uses a page’s display title in all links to that page. That might not sound like much, but it’s really useful to only have to change the title in one place, and have it show correctly all over the wiki. (This is much the same as DokuWiki with the useheading configuration variable set to 1.)

This is the sort of extension that I really like: it does a small thing, but does it well, and it makes sense as an addition to the core software. It’s not trying to do something completely different and just sit on top of or inside MediaWiki. It’s also not something that everyone would want, and so does belong as an extension and not an addition to core (even though the display title feature is part of core).

The other thing the Display Title extension provides is a parser function for retrieving the display title of any page: {{#getdisplaytitle:A page name}}, so you can use the display title without creating a link.