Welcome

My coffee mug

Hello world, and welcome to my corner of the web. This is where I write words about what I'm working on, and post photographs of things I've seen.

I'm a Software Engineer at the Wikimedia Foundation, and so of course my personal website is a wiki (running on MediaWiki). In my spare time I volunteer with WikiClubWest to work on Wikimedia projects, mostly around my family's genealogy and local Western Australian history (especially to do with Fremantle). I try to keep up with issues on all the things I maintain (but usually fail), as well as listing the software that I use.

I try to find time to work in my workshop on various woodworking projects. Recently that's been focused on building a metalworking bench, and it will soon move on to a set of campaign-style drawers that's in the works. I've a good-sized workshop because I don't have a car.

Travel features in my life, not because I really hugely want to go elsewhere but because I just do — and also because then I can do some interesting mapping on OpenStreetMap, and take photos for Wikimedia Commons. Sometimes I ride my bike to get there, or walk, but more often it's planes, trains and ferries.

I'm currently reading the following books: A Puritan Bohemia (Margaret Sherwood, 1896), Arrowsmith (Anon), Doctor Thorne (Anthony Trollope), and The Countryside Companion (Tom Stephenson).

To contact me, you can email me, find me on Matrix as '@samwilson:matrix.org', the fediverse as @samwilson@wikis.world, or Telegram as @freosam. If you want to leave a comment on this site (by creating an account), you need to know the secret code Tuart (it's not very secret, but seems to be confusing enough for most spammers).

Below are my recent blog posts.




Switching from WordPress to an SSG

Fremantle

· WordPress · websites · hosting ·

More discussion today about WordPress, and people who suggest switching to a static site generator:

Jason Lefkowitz on 15 October 2024:

The WordPress drama has brought forward a bunch of nerds advocating different systems they think WordPress users should switch to, which mostly have illustrated how few nerds understand what makes WordPress appealing to its users in the first place

And:

Like, if your pitch for a system to replace WordPress starts with "first, learn Markdown and Git," I need you to understand that you are living in a completely different galaxy than the median WordPress user

One of the replies was from tante:

And people who think "dump a bunch of PHP files in a folder" can be replaced by a bunch of dockerized microservices and a textmode readme of 30 lines with 3 subtle mistakes really need to get out more.

All very true, I think. As much as I love the simplicity of a bunch of Markdown files in a Git repository, it's a way of working that doesn't seem to have captured the imaginations of a big swathe of bloggers. Not only that, but it doesn't solve a big part of blogging: managing photos and other files.

The idea of "put these files in a folder on your server, and do everything else via the browser" has been pretty powerful for the last couple of decades. I think its time is probably over, and it's going to take another ten years to really decline fully. I suspect whatever replaces it will also not involve the command line.


When the bank opens

Fremantle

· timezones ·

I know I shouldn't be an internet pedant, but I am a programmer so it sort of comes with the role. My bank has sent me a message telling me to call them and that their opening hours are "9AM to 6:30PM AEST" (i.e. UTC+10). That's fine, you might think — but AEST finished about a week ago! T'otherside, or most of it, is now in AEDT (UTC+11). So do they actually mean that they're open from 10AM? No, of course not (well, I assume not…). They're just using the "AEST" abbreviation as a way of saying "Melbourne time".
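
For the record, the one-hour gap can be checked on the command line. This is only a sketch: it assumes GNU date (for the -d option), the date itself is made up, and the two offsets are just the UTC+10 and UTC+11 mentioned above rather than anything looked up in a timezone database.

```shell
# Opening time as the bank states it: 9:00 at a fixed UTC+10 offset ("AEST").
# The date is a hypothetical one, chosen to fall after daylight saving began.
stated='2024-10-14 09:00 +1000'

# POSIX TZ strings use an inverted sign: "AEDT-11" means UTC+11, labelled AEDT.
local_open=$(TZ='AEDT-11' date -d "$stated" '+%H:%M %Z')

echo "9AM AEST is $local_open"
```

Taken literally, "9AM AEST" comes out as 10:00 on a Melbourne clock that is running on AEDT.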

Anyway, I've ranted long enough that it's now after 10 over there and I can call them, regardless of what they mean.


WordPress plugin politics

Fremantle

· WordPress ·

I used to really like working on WordPress plugins (I did it as a large part of my job for quite a few years, and maintained a few open source plugins for much longer), and the recent news about Matt Mullenweg being a complete muppet is making me sad. It used to seem that there was a real separation between wordpress.com and wordpress.org, and that the latter was a community organisation that existed for the benefit of the users and developers of the software. But now it seems that if Matt Mullenweg decides so, your plugin can be not just banned but actually usurped in place and renamed. That's not how you fork something.

I don't know much about WP Engine, and I suspect they may well be a pretty uncaring company (i.e. a normal capitalist one) but it never mattered. There have always been people trying to make shit plugins and services on top of WordPress, and it's never been a problem. There have always been far more who are small businesses doing great stuff, and the fact that wordpress.org has provided hosting and distribution for their (open source) work has never been an issue. If WP Engine was really using resources that were costing too much, the sensible thing would've been to put rate limits in place, or demand that they have their own plugin cache (why are all WP Engine blogs updating independently via the main repo anyway?).

None of it would've been worth commenting on, but now that wordpress.org has got this stupid checkbox on the login form, and a plugin has been forcibly taken over, it just feels horrible. This one boring techbro has come and stuffed everything up! And after twenty years of everything generally going well, it's a shame.

I only maintain a few WordPress blogs now, for family and friends, and I don't think I'll be recommending that they migrate away from it. But I certainly won't be adding any new ones or bothering to update any of my old plugins (not that I was going to anyway).


OCR on Wikisource

Fremantle

· Wikisource · OCR · Wikimedia · transcription ·

I've been attempting this weekend to get back to sorting out some of the OCR tool's nomenclature around languages and text recognition models. It's the sort of job that's not too hard but touches lots of bits of code, and in this case two separate codebases, so any changes are easier to do piecemeal and must maintain backwards compatibility. When the first Wikisource OCR tools were built, they used Tesseract initially, and Google Cloud Vision after that, and both of those talk about 'languages' as one of the parameters to set when OCRing an image. Google goes as far as saying you must use BCP-47 identifiers.

This is what the on-wiki dialog looks like (with the new label).

But they're not really 'languages' — you can, for instance, tell Tesseract to use Cyrillic (i.e. a writing system used by quite a few languages) — and when we added Transkribus it started to become even clearer that we needed to do something to reduce the confusion around this (Transkribus puts the idea of trained models front and centre).

After all, it does make sense to not think of OCR in terms of language — many languages are written with similar scripts, and OCR is all about shapes and patterns and the likelihood of certain blobs of ink being intended to be particular characters or lines of text. It doesn't care about grammar or meanings or syntax or morphology (although do note that I'm not a linguist nor do I actually know anything about OCR or computer vision!).

Does "text recognition model" mean anything to Wikisource users though? I guess the term 'model' is pretty widespread at the moment (thanks to all this AI bollocks), so perhaps it's clear enough. And it will hopefully separate the ideas of a given Wikisource's content language from what OCR model should be picked for any given work (i.e. they're often the same, and we do set a default for each Wikisource, but a different model might work better for any particular scanned work).


Railfest 2024

Fremantle

· trains ·

Flyer for Railfest 2024.

We had a bit of a chat about the WAGR Fs 460 loco, and how it could be set up on Commons and Wikidata.

I have a bunch of other photos to upload (and items and categories to create)… so will tag this as [todo].


WA Biographical Index

Fremantle

· Wikimedia · Western Australia · datasets · open data ·

I noticed the other day that the Western Australian Biographical Index is licensed under CC-BY, so I thought I'd try to copy relevant entries to Freopedia (and other things). I downloaded the 18 CSV files:

   A-final.csv  DE-final.csv  H-final.csv   L-final_edited.csv  O-final.csv   S-final.csv
   B-final.csv  F-final.csv   IJ-final.csv  M-final.csv         PQ-final.csv  T-final.csv
   C-final.csv  G-final.csv   K-final.csv   N-final.csv         R-final.csv   UVXYZ-final.csv

Combined them into one, keeping the header row from the first file and dropping it from the rest (after confirming that every file had one):

$ awk '(NR == 1) || (FNR > 1)' *.csv > wabi.csv
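
To see what that condition does, here is a small sketch on two invented files (the file names and rows are made up for illustration, not taken from the WABI data). NR counts lines across all of the input, while FNR restarts at 1 for each file, so the first line overall is kept and line 1 of every file after that is skipped.

```shell
# Use a throwaway directory so nothing real is touched.
tmp=$(mktemp -d)

# Two hypothetical input files, each starting with the same header row.
printf 'id,text\nA1,first card\n' > "$tmp/A-demo.csv"
printf 'id,text\nB1,second card\n' > "$tmp/B-demo.csv"

# NR == 1 keeps the header of the first file only;
# FNR > 1 keeps every non-header line of every file.
combined=$(awk '(NR == 1) || (FNR > 1)' "$tmp"/*.csv)
printf '%s\n' "$combined"

rm -r "$tmp"
```

This should print the header once, followed by the two data rows. One thing to watch if the original command is re-run in the same directory: wabi.csv then already exists and would be matched by *.csv as well, so writing the output somewhere else is probably safer.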

This was imported into OpenRefine, and resulted in 85,403 records.

Found duplicates by sorting, applying "reorder rows permanently", and then "edit cells" > "blank down". The blanks can then be faceted on, and 421 duplicates were found, e.g. PQ/P2626 (where the second here is the correct record):

POCOCK Ruth Elsie May b. 1900. m. N.S.W. 1928 Edwin Lennard MINCHIN

vs.

PLUSH Edward, son of Thomas Hall (artist). arr 18.3.1886 per Albany (steerage) from SA - listed as G. Plush. m. 1.1.1890 (Perth C/E) Amelia GOLDING, dtr. of William (gardener). PERTH painter. Joined the Police force 1886.

That meant there were 84,982 unique cards.
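
The arithmetic is 85,403 − 421 = 84,982. Going by the PQ/P2626 example, the duplicates appear to be cards that share an identifier rather than identical rows, so a rough command-line cross-check could look like the sketch below. It is not what I actually ran (that was all in OpenRefine): the sample rows and the "id,text" column layout are invented, and it assumes one record per line with no commas or newlines inside the ID column.

```shell
# Hypothetical sample in an assumed "id,text" layout: ID P2 is used twice.
sample=$(printf '%s\n' \
  'P1,first card' \
  'P2,one card text' \
  'P2,a different card text' \
  'P3,third card')

total=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')

# Distinct IDs: take column 1, then sort and drop repeats.
unique=$(printf '%s\n' "$sample" | cut -d, -f1 | sort -u | wc -l | tr -d ' ')

# Rows that repeat an earlier ID, i.e. the cells "blank down" would empty.
repeats=$((total - unique))

# The IDs that occur more than once (uniq -d needs sorted input).
dupe_ids=$(printf '%s\n' "$sample" | cut -d, -f1 | sort | uniq -d)

echo "total=$total unique=$unique repeats=$repeats duplicated: $dupe_ids"
```

Here "repeats" plays the role of the 421 and "unique" the role of the 84,982.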

These were imported to a Mix'n'Match catalogue: https://mix-n-match.toolforge.org/#/catalog/6490 For this, the card text had to be truncated.

I proposed a new property on Wikidata, and it was approved and created a week or so after.

Now the task is to link items to the WABI, probably starting with any mentioning Fremantle.


Freo skin for MediaWiki

Fremantle

· MediaWiki · skins · Freopedia · Wikimedia ·

I've been working a bit lately on Freopedia, and have started building a new MediaWiki skin for it. I just want something very simple. I'd go with any of the existing ones, but it's nice to have different sites looking different I think. I've also been wanting to experiment with putting things in the menus that make most sense to me; I've never really understood why we have 'page information' next to 'upload file'. I'm putting all "per page" actions together in a page menu, and all "whole of site" actions in a site menu. The user menu stays pretty much the same (user page, log in/out, preferences, etc.). The page information and Cargo page data links will sit along with edit, move, delete, etc.

I'll get around to setting up mw:Skin:Freo soon. Once I've convinced myself this isn't all a bit silly.


Southern side of the old synagogue

Fremantle

· Fremantle ·

I should probably take some more photos of things that will be harder to see when the new police station is built. If it ever is.

The beer garden side.

View older posts: ·1998 · 1999 · 2000 · 2001 · 2002 · 2003 · 2004 · 2005 · 2006 · 2007 · 2008 · 2009 · 2010 · 2011 · 2012 · 2013 · 2014 · 2015 · 2016 · 2017 · 2018 · 2019 · 2020 · 2021 · 2022 · 2023 · 2024 ·