Piwigo Embeds (for WordPress)

Here’s my first draft at making Piwigo sites embeddable in WordPress: github.com/samwilson/piwigo-embeds

‘Embed’ here is what WordPress calls the ability to put the URL of a site on its own line in a post or page and have a nice rendering of that site provided automatically. Core WordPress supports this for sites like YouTube and Flickr, and to some extent for other sites if they provide the right metadata. Piwigo does not yet provide particularly rich metadata (there are some ideas to do so, though), but anyway it’s nicer to be able to do something more complicated that uses the Piwigo API.

As a first hack at this, my plugin just shows the medium-sized image, with title below and description as the tooltip, and the image linked to the page on the Piwigo site. I plan on introducing caching, and perhaps some nicer display (dates, comment count, etc.). Ideas welcome!
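To illustrate the rendering step, here is a minimal sketch in Python (WordPress plugins are PHP, so this is not the plugin's code). It assumes the photo's details have already been fetched with the Piwigo API's pwg.images.getInfo method, and that the result carries name, comment, page_url and derivatives.medium.url fields. Those field names, the CSS class and the sample data are my assumptions for the example.

```python
from html import escape


def render_piwigo_embed(info: dict) -> str:
    """Build embed HTML from the result of a pwg.images.getInfo call.

    The field names used here (name, comment, page_url,
    derivatives.medium.url) are assumptions about the API response.
    """
    title = escape(info.get("name") or "")
    description = escape(info.get("comment") or "")
    page_url = escape(info["page_url"])
    image_url = escape(info["derivatives"]["medium"]["url"])
    return (
        '<figure class="piwigo-embed">'
        f'<a href="{page_url}">'
        # Medium-sized image, with the description as the tooltip.
        f'<img src="{image_url}" alt="{title}" title="{description}" />'
        "</a>"
        # Title below the image.
        f"<figcaption>{title}</figcaption>"
        "</figure>"
    )


# Hypothetical data, shaped like the assumed API result.
sample = {
    "name": "Jetty at dusk",
    "comment": "Taken from the beach.",
    "page_url": "https://photos.example.org/picture.php?/42",
    "derivatives": {"medium": {"url": "https://photos.example.org/i/42-me.jpg"}},
}
print(render_piwigo_embed(sample))
```

Everything is passed through html.escape because titles and descriptions come from a remote site and shouldn't be trusted as markup.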

New MediaWiki extension: AutoCategoriseUploads

New MediaWiki extension: AutoCategoriseUploads. It “automatically adds categories to new file uploads based on keyword metadata found in the file. The following metadata types are supported: XMP (many file types, including JPG, PNG, PDF, etc.); IPTC (JPG); ID3 (MP3)”.
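The general idea of going from keywords to categories might look something like the Python sketch below. This is only my guess at the shape of it, not the extension's code, and it starts from a list of keywords that has already been read out of the file's metadata (the reading part is not shown). The function name and sample keywords are made up.

```python
def category_wikitext(keywords):
    """Turn keyword strings into de-duplicated [[Category:...]] links."""
    seen = set()
    links = []
    for keyword in keywords:
        # Trim the keyword and collapse runs of whitespace.
        name = " ".join(keyword.split())
        if not name:
            continue
        # MediaWiki capitalises the first letter of titles by default.
        name = name[0].upper() + name[1:]
        if name not in seen:
            seen.add(name)
            links.append(f"[[Category:{name}]]")
    return "\n".join(links)


# Hypothetical keywords, as they might appear in a photo's metadata.
print(category_wikitext(["lighthouse", " Rottnest  Island ", "Lighthouse"]))
```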

Unfortunately there’s no code yet in the repository, so there’s nothing to test. Sounds interesting though.

Self-hosted websites are doomed to die

I keep wanting to be able to recommend the ‘best’ way for people (who don’t like command lines) to get research stuff online. Is it Flickr, Zenodo, Internet Archive, Wikimedia, and GitHub? Or is it a shared hosting account on Dreamhost, running MediaWiki, WordPress, and Piwigo? I’d rather the latter! Is it really that hard to set up your own website? (I don’t think so, but I probably can’t see what I can’t see.)

Anyway, even if running your own website, one should still be putting stuff on Wikimedia projects. And even if not using it for everything, Flickr is a good place for photos (in Australia) because you can add them to the Australia in Pictures group and they’ll turn up in searches on Trove. The Internet Archive, even if not a primary and cited place for research materials, is a great place to upload wikis’ public page dumps. So it really seems that the remaining trouble with self-hosting websites is that they’re fragile and subject to complete loss if you abandon them (i.e. stop paying the bills).

My current mitigation to my own sites’ reliance on me is to create annual dumps in multiple formats, including uploading public stuff to IA, and printing some things, and burning all to Blu-ray discs that get stored in polypropylene sleeves in the dark in places I can forget to throw them out. (Of course, I deal in tiny amounts of data, and no video.)

What was it Robert Graves said in I, Claudius about the best way to ensure the survival of a document being to just leave it sitting on one’s desk and not try to do anything special at all, because it’s perfectly random what persists, and we cannot influence the universe in any meaningful way?

Extension:DocBookExport

A new extension called DocBookExport has recently been added to mediawiki.org. It provides a system for defining a book’s structure (a set of pages, plus a title and other metadata) and then pipes the pages’ HTML through Pandoc and out into DocBook format, from where it can be turned into PDF or just downloaded as-is.

There are a few issues with getting the extension to run (e.g. it wants to write to its own directory, rather than a normal place for temporary files), and I haven’t actually managed to get it fully functioning. But the idea is interesting. Certainly, there are some limitations with Pandoc, but mostly it’s remarkably good at converting things.

It seems that DocBookExport, and any other MediaWiki export or format conversion system, works best when the wiki pages (and their templates etc.) are written with the output formats in mind. Then, one can avoid things such as web-only formatting conventions that make PDF (or epub, or man page) generation trickier.

Wikisource books for binding

I have been experimenting with turning Wikisource works into LaTeX-formatted bindable PDFs. My initial idea was to produce quarto or octavo layout sheets (i.e. 8 or 16 book pages to a sheet of paper that’s printed on both sides, with the pages laid out so that when the sheet is folded they are in the correct order), but now I’m thinking of just using a print-on-demand service (hopefully PediaPress, because they seem pretty brilliant).
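The arithmetic of the sheet layouts is simple enough to sketch in Python. This assumes each sheet is folded as its own gathering with the usual right-angle folds, and it only works out how many sheets are needed and which pages share a side of a sheet. Where each page sits on that side, and which way up, is not computed. The function names are mine.

```python
import math

# Book pages per printed sheet (both sides counted).
PAGES_PER_SHEET = {"quarto": 8, "octavo": 16}


def sheets_needed(page_count: int, fmt: str) -> int:
    """Number of sheets required; the last one may be partly blank."""
    return math.ceil(page_count / PAGES_PER_SHEET[fmt])


def sheet_sides(sheet_index: int, fmt: str):
    """Return (outer, inner) page numbers for a sheet, counting from 0."""
    per_sheet = PAGES_PER_SHEET[fmt]
    first = sheet_index * per_sheet + 1
    pages = range(first, first + per_sheet)
    # Pages alternate between the two sides in the pattern 1, 2, 2, 1.
    outer = [p for p in pages if (p - first) % 4 in (0, 3)]
    inner = [p for p in pages if (p - first) % 4 in (1, 2)]
    return outer, inner


print(sheets_needed(300, "octavo"))
print(sheet_sides(0, "quarto"))
```

So the first quarto sheet has pages 1, 4, 5 and 8 on one side and 2, 3, 6 and 7 on the other.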

Basically, my tool downloads all of a work’s pages and subpages (in the main namespace only; it doesn’t care about the method of construction of the work) and saves the HTML for these, in order, to a html/ directory. Then (here’s the crux of the thing) it uses Pandoc to create a set of matching TeX files in an adjacent latex/ directory.
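The conversion step might look roughly like the following Python sketch. It is not the tool's actual code. It assumes the HTML files are already sitting in html/ with names that sort into reading order, and just builds one Pandoc command per file. The file names are hypothetical.

```python
from pathlib import Path


def pandoc_commands(html_dir: Path, latex_dir: Path):
    """Build one pandoc command per HTML file, in file-name order."""
    commands = []
    for html_file in sorted(html_dir.glob("*.html")):
        # html/01_Chapter_1.html becomes latex/01_Chapter_1.tex
        tex_file = latex_dir / (html_file.stem + ".tex")
        commands.append([
            "pandoc",
            "--from=html",
            "--to=latex",
            f"--output={tex_file}",
            str(html_file),
        ])
    return commands


for command in pandoc_commands(Path("html"), Path("latex")):
    print(" ".join(command))
```

Each command could then be handed to subprocess.run(command, check=True), after making sure the latex/ directory exists.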

So far, so obvious. But the trouble with this approach of wanting to create a separate source format for a work is that there are changes that one wants to make to the work (either formatting or structural) that can’t be made upstream on Wikisource — but we also want to be able to bring down updates at any time from Wikisource. That is to say, this is creating a fork of the work in a different format, but it’s a fork that needs to be able to be kept up to date.

My current solution to this is to save the HTML and LaTeX files in a Git repository (one per work) and have two branches: one containing the raw un-edited HTML and LaTeX, on which the download operation can be re-run at any time; and the other being based off this, being a place to make any edits required, and which can have the first merged into it whenever that’s updated. This will sometimes result in merge conflicts, but for the most part (because the upstream changes are generally small typo fixes and the like) will happen without error.
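As a rough Python sketch of that update cycle: the branch names (upstream and edited) and the commit message are made up for the example, and download stands in for whatever re-fetches the HTML and regenerates the LaTeX.

```python
import subprocess


def update_work(repo_dir, download, raw_branch="upstream",
                edit_branch="edited", run=subprocess.run):
    """Refresh the raw branch from Wikisource, then merge it into the edits."""

    def git(*args):
        # check=True raises CalledProcessError if git exits with an error.
        return run(["git", *args], cwd=repo_dir, check=True)

    git("checkout", raw_branch)
    download()  # re-fetch the HTML and regenerate the LaTeX
    git("add", "--all", "html", "latex")
    # --allow-empty keeps this from failing when nothing changed upstream.
    git("commit", "--allow-empty", "--message", "Update from Wikisource")
    git("checkout", edit_branch)
    git("merge", raw_branch)
```

If the merge hits a conflict, git exits with an error and the function raises, leaving the repository mid-merge on the edited branch so the conflict can be sorted out by hand.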

Now I just want to automate all this a little bit more, so a new project can be created (with GitHub repo and all) with a single (albeit slow!) command.

The output ends up something like The Nether World by George Gissing.pdf.

CFB Folder 1 done

The first folder of the C.F. Barker Archives’ material is done: finished scanning and initial entry into ArchivesWiki. This is my attempt to use MediaWiki as a digital archive platform for physical records (and digitally-created ones, although they don’t feature as much in the physical folders). It’s reasonably satisfactory so far, although there’s lots that’s a bit frustrating. I’m attempting to document what I’m doing (in a Wikibook), and there’s more to figure out.

There are a few key parts to it; two stand out as a bit weird. Firstly, the structure of access control is that completely separate wikis are created for each group of access required. This can make it tricky linking things together, but makes for much clearer separation of privacy, and almost removes the possibility of things being inadvertently made public when they shouldn’t be. The second is that the File namespace is not used at all for file descriptions. Files are considered more like ‘attachments’ and their metadata is contained on main-namespace pages, where the files are displayed. This means that files are not considered to be archival items (except of course when they are; i.e. digitally-created ones!), but just representations of them, and for example multiple file types or differently cropped photos can all appear on a single item’s record. The basic idea is to have a single page that encapsulates the entire item (it doesn’t matter if the item is just a single photograph, and the system also works when the ‘item’ is an aggregate item of, for example, a whole box of photos being accessioned into ArchivesWiki).

Display Title extension

The MediaWiki Display Title extension is pretty cool. It uses a page’s display title in all links to that page. That might not sound like much, but it’s really useful to only have to change the title in one place, and have it show correctly all over the wiki. (This is much the same as DokuWiki with the useheading configuration variable set to 1.)

This is the sort of extension that I really like: it does a small thing, but does it well, and it makes sense as an addition to the core software. It’s not trying to do something completely different and just sit on top of or inside MediaWiki. It’s also not something that everyone would want, and so does belong as an extension and not an addition to core (even though the display title feature is part of core).

The other thing the Display Title extension provides is a parser function for retrieving the display title of any page: {{#getdisplaytitle:A page name}}, so you can use the display title without creating a link.

On not hosting everything

I’ve been moving all my photos to Flickr lately. It’s been a long process, one complicated by the fact that it seems silly to run my own WordPress installation (and things like ArchivesWiki) if I’m not going to bother hosting everything myself. Of course, that’s not really very logical, and so I’ve decided that it’s perfectly okay to host photos on Flickr, videos on YouTube, and all the text (and miscellaneous) stuff here on my own server.