WikiCite 2017

(Firefox asked me to rate it this morning, with a little picture of a broken heart and five stars to select from. I gave it five (’cause it’s brilliant) and then it sent me to a survey on mozilla.com titled “Heavy User V2”, which sounds like the name of a confused interplanetary supply ship.)

Today WikiCite17 begins. Three days of talking and hacking about the galaxy that comprises Wikipedia, Wikidata, Wikisource, citations, and all bibliographic data. There are lots of different ways into this topic, and I’m focusing not on Wikipedia citations (which is the main drive of the conference, I think), but on getting (English) Wikisource metadata a tiny bit further along (e.g. figure out how to display work details on a Wikisource edition page); and on a little side project of adding a Wikidata-backed citation system to WordPress.

The former is currently stalled on me not understanding the details of P629 ‘edition or translation of’ — specifically whether it should be allowed to have multiple values.

The latter is rolling on quite well, and I’ve got it searching and displaying and the beginnings of updating ‘book’ records on Wikidata. Soon it shall be able to make lists of items, and insert the lists (or individual citations of items on them) into blog posts and pages. I’m not sure what the state of the art is in PHP of packages for formatting citations, but I’m hoping there’s something good out there.
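For the curious, the searching part boils down to a call to Wikidata's `wbsearchentities` API action. The plugin itself is PHP; this is just a minimal Python sketch of the idea, and the function names (`build_search_url`, `summarise_results`) are made up for illustration rather than taken from the plugin.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=10):
    """Build a wbsearchentities request URL for items matching `term`."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def summarise_results(response):
    """Reduce a decoded wbsearchentities response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]
```

Fetching the URL and decoding the JSON is left out here on purpose, so the sketch runs without touching the network.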

And here is a scary chicken I saw yesterday at the Naturhistorisches Museum:

Scary chicken (Deinonychus antirrhopus)

Editing MediaWiki pages in an external editor

I’ve been working on a MediaWiki gadget lately, for editing Wikisource authors’ metadata without leaving the author page. It’s fun working with and learning more about OOjs-UI, but it’s also a pain because gadget code is kept in JavaScript pages in the MediaWiki namespace, and so every single time you want to change something it’s a matter of saving the whole page, then clicking ‘edit’ again, and scrolling back down to find the spot you were at. The other end of things (the re-loading of whatever test page is running the gadget) is annoying and slow enough, without having to do much the same thing at the source end too.

So I’ve added a feature to the ExternalArticles extension that allows a whole directory full of text files to be imported at once (namespaces are handled as subdirectories). More importantly, it also ‘watches’ the directories and every time a file is updated (i.e. with Ctrl-S in a text editor or IDE) it is re-imported. So this means I can have MediaWiki:Gadget-Author.js and MediaWiki:Gadget-Author.css open in PhpStorm, and just edit from there. I even have these files open inside a MediaWiki project and so autocompletion and documentation look-up works as usual for all the library code. It’s even quite a speedy set-up, luckily: I haven’t yet noticed having to wait at any time between saving some code, alt-tabbing to the browser, and hitting F5.
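The general shape of the watching can be sketched like this. It is not the extension's actual code (that is PHP), just a rough Python illustration of the two ideas: the first subdirectory names the namespace, and a file gets re-imported whenever its modification time changes. The function names, and the choice to treat deeper subdirectories as subpages, are my assumptions for the example.

```python
import os


def title_from_path(root, path):
    """Map a file under `root` to a page title, using the first subdirectory as the namespace."""
    parts = os.path.relpath(path, root).split(os.sep)
    if len(parts) == 1:
        # A file directly in the root is assumed to be a main-namespace page.
        return parts[0]
    # Assumption: any deeper directories become subpage separators.
    return parts[0] + ":" + "/".join(parts[1:])


def changed_files(root, seen):
    """Return files that are new or modified since the last call.

    `seen` is a dict of path -> modification time, updated in place.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if seen.get(path) != mtime:
                seen[path] = mtime
                changed.append(path)
    return sorted(changed)
```

A watcher would then call `changed_files()` in a loop with a short sleep, and hand each changed file's title and contents to whatever does the importing.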

I dare say my bodged-together script has many flaws, but it’s working for me for now!

New feature for ia-upload

I have been working on an addition to the IA Upload tool these last few days, and it’s ready for testing. Hopefully we’ll merge it tomorrow or the next day.

This is the first time I’ve done much work with the internal structure of DjVu files, and really it’s all been pretty straightforward. There were a couple of odd bits about matching element and page names up between things, but once that was sorted it all seems to be working as it should.

It’s a shame that the Internet Archive has discontinued their production of DjVu files, but I guess they’ve got their reasons, and it’s not like anyone’s ever heard of DjVu anyway. I don’t suppose anyone other than Wikisource was using those files. Thankfully they’re still producing the DjVu XML that we need to make our own DjVus, and it sounds like they’re going to continue doing so (because they use the XML to produce the text versions of items).

Wikisource Hangout

I wonder how long it takes, after someone first starts editing a Wikimedia project, before they figure out that they can read lots of Wikimedia news on https://en.planet.wikimedia.org/ (and when, after that, they realise they can also post to the news there)? (At which point they probably give up if they haven’t already got a blog.)

Anyway, I forgot that I can post news, but then I remembered. So:

There’s going to be a Wikisource meeting next weekend (28 January, on Google Hangouts), if you’re interested in joining:
https://meta.wikimedia.org/wiki/Wikisource_Community_User_Group/January_2017_Hangout

Penguin Classics portal on Wikisource

I’ve made a start of a system to pull data from Wikidata and generate a portal for the Penguin Classics, with appropriate links for those that are on Wikisource or are ready to be transcribed.

I’m a bit of a SPARQL newbie, so perhaps this could’ve been done in a single query. However, I’m doing it in two stages: first, gathering all the ‘works’ that have at least one edition published by Penguin Classics, and then finding all editions of each of those works and seeing if any of them are on Wikisource. Oh, and including the ones that aren’t, too!
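Roughly, the two stages look something like the following. This is a sketch rather than my actual script: it assumes P123 (‘publisher’) and P629 (‘edition or translation of’), the prefixes that the Wikidata Query Service predefines, and English Wikisource as the target. The helper names are invented for the example, and the publisher's item ID has to be supplied by the caller.

```python
import re

# Stage 1: every work with at least one edition from the given publisher.
WORKS_QUERY = """
SELECT DISTINCT ?work WHERE {{
  ?edition wdt:P123 wd:{publisher} .   # publisher
  ?edition wdt:P629 ?work .            # edition or translation of
}}
"""

# Stage 2: every edition of one work, with its Wikisource page if it has one.
# The OPTIONAL keeps the editions that are not on Wikisource.
EDITIONS_QUERY = """
SELECT ?edition ?page WHERE {{
  ?edition wdt:P629 wd:{work} .
  OPTIONAL {{
    ?page schema:about ?edition ;
          schema:isPartOf <https://en.wikisource.org/> .
  }}
}}
"""


def _check_qid(qid):
    """Refuse anything that is not a plain item ID, to keep the query well-formed."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError("Not a Wikidata item ID: %r" % (qid,))
    return qid


def works_query(publisher_qid):
    return WORKS_QUERY.format(publisher=_check_qid(publisher_qid))


def editions_query(work_qid):
    return EDITIONS_QUERY.format(work=_check_qid(work_qid))


def split_by_wikisource(rows):
    """Partition simplified result rows into (on Wikisource, not on Wikisource)."""
    on = [r for r in rows if r.get("page")]
    off = [r for r in rows if not r.get("page")]
    return on, off
```

The rows passed to `split_by_wikisource()` are assumed to be already flattened into plain dicts with `edition` and `page` keys, which is simpler than the JSON that the query service really returns.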

Wikidata:WikiProject Books sort of uses the FRBR model to represent primarily books and editions (‘editions’ being a combination of manifestation and expression levels of the FRBR; i.e. an edition realises and embodies a work). So most of the metadata we want exists at the ‘work’ level: title, author, date of first publication, genre, etc.

At the ‘edition’ level we look for a link to Wikisource (because a main-namespace item on Wikisource is an edition… although this gets messy; see below), and a link to the edition’s transcription project. Actually, we also look for these on the work itself, because often Wikidata has these properties there instead or as well — which is wrong.

Strictly speaking, the work metadata shouldn’t have anything about where the work is on Wikisource (either mainspace or Index file). The problem with adhering to this, however, is that by doing so we break interwiki links from Wikisource to Wikipedia. This is because a Wikipedia article is (almost always) about a work, and we want to link a top-level Wikisource mainspace page to this work… and the existing systems for doing this don’t allow for the intermediate step of going from Wikisource to the edition, then to the work and then to Wikipedia.

So for now, my scruffy little script looks for project links at both levels, and seems to do so successfully.
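The both-levels lookup is simple enough to sketch. This is an illustration only: the entity dicts here are a much-simplified stand-in for real Wikidata JSON (sitelinks as plain titles, P629 as a plain list of item IDs), and the function names are invented.

```python
def linked_works(edition, entities):
    """Return the work entities that this edition points to via P629."""
    work_ids = edition.get("claims", {}).get("P629", [])
    return [entities[qid] for qid in work_ids if qid in entities]


def find_sitelink(edition, entities, site):
    """Look for a sitelink on the edition first, then on its work(s).

    Returns a (level, title) tuple, or None if nothing is found.
    """
    title = edition.get("sitelinks", {}).get(site)
    if title:
        return ("edition", title)
    for work in linked_works(edition, entities):
        title = work.get("sitelinks", {}).get(site)
        if title:
            return ("work", title)
    return None
```

Calling it with `"enwiki"` as the site walks the same edition-to-work step that the interwiki links would need in order to reach Wikipedia.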

The main problem now is that there’s just not much data about these books on Wikidata! I’ll get working on that next…

2016 begins

It’s 2016 and it seems like a good time to attempt some new type of explanation of things. Things in general, I mean, and things internety. Or, maybe not ‘explanation’ so much as formless rambling. That’s easier on the brain, given the amount of sleep I’ve been getting (i.e. sod all).

I’m four days into the new working year, and some good bits of code are already shaping up (file attachment fields and schema-editing in Tabulate, hopefully both ready to roll before too much longer). Some odd bits of enterprise bureaucracy have nearly fallen on my head but for the most part missed me (whereon I’ve attempted the old I-didn’t-see-anything trick, and carried on regardless).

I had a couple of weeks off, and explored some great bits of the south west. So nice to be back at Wilyabrup (not climbing, just looking, and some mapping). And I didn’t even take my GPS to Walpole; good to be not attempting to Record Everything for a while.

Things for this year, perhaps: Wikisource proofreading; importing Nyunga words into Wiktionary; carry on with Tabulate; print CFB at long last; go to Wikimania; try to write every day; get MoonMoon working again properly for Planet Freo. But mostly: stop re-evaluating everything and just get on with what’s (reasonably and probably not perfectly) good enough and worthwhile. Code less! Work on content and data more; code only what’s required.

Wikisource category browser now has other languages

I’ve updated the Wikisource validated works’ category browser tool to include other languages. So far it’s just Italian, and to some perhaps-incorrect extent French (there’s only four? that’s not right).

I just need more Wikisources to tell me the names of their validated works and root categories, and it’ll just be a matter of adding these to the config to get them running.

The category list is updated weekly.

Wikisource needs your input « Wikimedia blog

A new Wikisource survey is being conducted!

During the survey, you will be asked questions regarding your personal involvement with the Wikisource project, your preferences regarding governance and technology, and your opinion on how a Wikisource Conference should be shaped. With the support of Wikimedia Österreich and Wikimedia Italia, a Project and Event Grant proposal is to be presented for such a conference. We would like to involve Wikisourcers in a joint venture both to spread knowledge about the project and to strengthen community bonds. This,

Read more: Wikisource needs your input « Wikimedia blog

I now use curly quotation marks when proofreading

There are currently two things that are annoying me about Wikisource books. These are: the inclusion of hyperlinks (to be all 1990s about it, with using that word); and the usage of straight quotation marks.

Links I can forgive, or even actively enjoy, in non-fiction; but in fiction, they have no place. (So think I, anyway.) Especially when they link to a sodding dictionary term! I know how to look up a word I don’t know. Sigh.

The curly-vs-straight argument is an odd one. We only have straight ones thanks to typewriters (or their manufacturers, I guess) not wanting to have two sorts for each type of quotation mark. So why we persist I cannot say! No, I can say… it’s mostly to do with ease of typing, on common systems, I think. It’s annoying to type the opening and the closing glyphs, when there’s only one button on the keyboard. But really! That might hold sway where there’s no automatic system for handling these things, but we have those systems and they work admirably. And certainly, when it comes to typesetting books that are going to be read by (we hope) very many people, it’s worth putting a bit more effort in to make them look nice.
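To show how little such an automatic system needs, here is a minimal Python sketch. It is not what any particular tool does, just the usual rule of thumb: a quote mark at the start of the text, or after whitespace or an opening bracket, is an opening one, and everything else is closing. It will get leading apostrophes (as in ’tis) wrong, so a human still needs to check the result.

```python
import re

LSQUO, RSQUO = "\u2018", "\u2019"  # single curly quotes
LDQUO, RDQUO = "\u201c", "\u201d"  # double curly quotes


def curl_quotes(text):
    """Replace straight quotation marks with curly ones using a simple heuristic."""
    # Opening double quotes: at the start, or after whitespace or an opening bracket.
    text = re.sub(r'(^|[\s(\[{])"', r"\1" + LDQUO, text)
    # Any double quotes left over are closing ones.
    text = text.replace('"', RDQUO)
    # Opening single quotes: as above, and also directly after an opening double quote.
    text = re.sub(r"(^|[\s(\[{" + LDQUO + r"])'", r"\1" + LSQUO, text)
    # Any single quotes left over are closing ones or apostrophes.
    text = text.replace("'", RSQUO)
    return text
```

Doing the doubles before the singles is what lets a single quote nested immediately inside a double one come out as an opening mark.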

Because that’s what it’s about, ultimately: making the text beautiful! For how many hundreds of years have people been taking terrific care over making books look nice?! Let’s not give up on that.

I’m not really sure why I’m writing this, today. (Probably due to the glass of White Rabbit I’ve got just here.) It’s that I’m firing with the zeal of the converted! I am, you see. I used to not care about quotes, and think they should be left straight; now, I stand on speakers’ corner and holler to confused passersby!

So, would that ye enjoy yr ebooks?! Then set them with loveliness!

Right… where’s that beer…