Hello world, and welcome to my corner of the web. This is where I write words about what I'm working on, and post photographs of things I've seen.
I'm a Software Engineer at the Wikimedia Foundation, and so of course my personal website is a wiki (running on MediaWiki). In my spare time I volunteer with WikiClubWest to work on Wikimedia projects, mostly around my family's genealogy and local Western Australian history (especially to do with Fremantle). I try to keep up with issues on all the things I maintain (but usually fail), as well as listing the software that I use.
I try to find time to work in my workshop on various woodworking projects. Recently, that's been focused on building a metalworking bench, and will soon be about a set of campaign-style drawers that's in the works. I've a good-sized workshop because I don't have a car.
Travel features in my life, not because I really hugely want to go elsewhere but because I just do — and also because then I can do some interesting mapping on OpenStreetMap, and take photos for Wikimedia Commons. Sometimes I ride my bike to get there, or walk, but more often it's planes, trains and ferries.
I'm currently reading the following books: A Puritan Bohemia (Margaret Sherwood, 1896), Arrowsmith (Anon), Doctor Thorne (Anthony Trollope), and The Countryside Companion (Tom Stephenson).
To contact me, you can email me, find me on Matrix as '@samwilson:matrix.org', the fediverse as @samwilson@wikis.world, or Telegram as @freosam. If you want to leave a comment on this site (by creating an account), you need to know the secret code Tuart (it's not very secret, but seems to be confusing enough for most spammers).
Below are my recent blog posts.
Adding photos to a wiki
Fremantle
I've been attempting to do a few things lately that involve adding a whole bunch of photos to a wiki, with the aim being that there will be more information added about them at a later date.
The idea is to make it as easy as possible, so that the photography can continue. This sometimes depends on access to a decent internet connection, so it's not a synchronisation process (although I've experimented with that as well). The usual approach is to create a new wiki page, and use the MsUpload extension to dump 25 files at a time into a gallery, creating subpages that list 100 files each. The files are given names that we know are unique on the wiki, so there's no need to use the annoying interface to rename them (so that saves a fair bit of time). The slow parts are the uploading, as well as after saving the page when all the thumbnails are generated. But it mostly all works well enough.
The trouble comes after the files have been uploaded. MediaWiki's gallery syntax isn't the most flexible: it basically lets you have a thumbnail with a caption, and optionally a link that can go to somewhere other than the file's own page. Additionally, editing large galleries like these via VisualEditor is really not fun. So how best to quickly add notes about each photo (or sometimes, groups of photos), when those notes actually properly belong on per-photo description pages? We don't have time at the point of uploading to create those other pages (the idea is that all the extra metadata can be derived from the photos, and it's only the 'special' metadata that we want to capture right away). It looks like it's going to be some sort of custom 'gallery-like' template, a bit like how the G template on Commons does it, with weird custom parsing of pipes and newlines.
Then each photo can be rendered with an accession number that links (as does the thumbnail) to the record page once it exists or, more usefully, before that page exists, to a preloaded edit link at which the caption and other info can be added.
(I'd planned to put the actual template here… but there's other stuff going on today, so it'll have to wait….)
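In the meantime, here's a rough Python sketch of the parsing-and-linking idea. It isn't the template, just the logic it would need. The wiki URL, the preload page name, and the rule that the accession number is the file name without its extension are all made up for illustration; `preload` and `preloadparams[]` are the standard MediaWiki edit-URL parameters.

```python
from urllib.parse import urlencode

# Hypothetical values, for illustration only.
WIKI_INDEX = "https://wiki.example.org/index.php"
PRELOAD_PAGE = "Template:Photo record/preload"


def parse_gallery(text):
    """Parse gallery-like input: one 'filename|caption' per line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between photos
        filename, _, caption = line.partition("|")
        entries.append((filename.strip(), caption.strip()))
    return entries


def accession_number(filename):
    """Assumption: the accession number is the file name minus its extension."""
    return filename.rsplit(".", 1)[0]


def record_link(filename, caption, existing_pages):
    """Link to the record page if it exists, else to a preloaded edit form."""
    title = accession_number(filename)
    if title in existing_pages:
        return f"{WIKI_INDEX}?{urlencode({'title': title})}"
    query = [
        ("title", title),
        ("action", "edit"),
        ("preload", PRELOAD_PAGE),
        ("preloadparams[]", filename),  # becomes $1 in the preload page
        ("preloadparams[]", caption),   # becomes $2 in the preload page
    ]
    return f"{WIKI_INDEX}?{urlencode(query)}"


if __name__ == "__main__":
    source = "A001.jpg|Front gate\nA002.jpg\n"
    for name, caption in parse_gallery(source):
        print(accession_number(name), record_link(name, caption, {"A001"}))
```

A real template would do this in wikitext or Lua rather than Python, and would ask the wiki which record pages exist instead of taking a set.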
FOSS4G Perth 2024
Fremantle
· FOSS4G · Geogeeks · conferences ·
Restoration of the Crafts Council Centre
Perth
I was waiting at the bus stop on Wellington Street, and noticed this plaque on the end wall of the railway station. There wasn't really any information about what the Crafts Council is or was. I guess it's something related to World Crafts Council Australia, or maybe the Australia Council, but I don't think I'm interested enough to find out more.
The Crafts Council Centre of Western Australia was officially opened on Wednesday February 8th 1984 by the Hon. Ron Davies MLA Minister for the Arts. Funding for the restoration of this centre was made available from the Government of Western Australia through Instant Lottery funds. Phillip Douglas, President. Anna S. Petterson, Executive Director.
Fremantle Studies and FOSS4G-Perth
Fremantle
· Fremantle · OSM · Geogeeks ·
There is not one but two exciting conferences happening this coming week, and both in Fremantle! (Amazing. Nothing ever happens actually in Freo.)
The first is on Sunday (from 1PM), with Fremantle Studies Day, an afternoon of four talks at the History Centre:
- Simon Meath and Anne Smith: The Boys Reformatory, Rottnest Island Prison: Forgotten genocide site of the Frontier Wars
- Caroline Ingram: Dead in the water: The life and trial of Margaret Cody
- Nick Everett: Wobblies on the waterfront: The Industrial Workers of the World in Fremantle during WWI
- Kiara Gormlie: The founding women of Soroptimist International Fremantle: Early intentions and lasting legacies
Then, on Wednesday (10:30AM–9PM) as part of the ISPRS Technical Commission IV Symposium, is FOSS4G Perth:
- Andrew Dowding, Kass Boladeras and Tim Cable: Empowering Indigenous Communities with GIS: Micro-credentialing for First Nations Land Management
- Patrick Morrison: Discovering shipwrecks using open datasets
- Hidenori Fujimura: Smart Maps Portable: JICA Enhances Geospatial Capacity using Raspberry Pi
- Anna Ischenko: Mapping water trees and tracing travel routes in Noongar Boodjar
- Cholena Smart: An Overview of Open Source Web Mapping Tools
- Adam Abdul Razak: Building Identification on Campus: A CityGML-Based AR Smartphone App
- Nathan Regan: Critical analysis of methods used in public transport accessibility
- John Lang: QGIS for Subsea Route Analysis using Projection for Vertical Exaggeration
- Chris Scott: How open source transformed MNG's PIT reports: From QGIS to Mergin Maps
- Michel M. Nzikou: The geologist toolbox QGIS plugin
- Grant Boxer: Hyperspectral satellite imagery in QGIS
- Gabriel Diosan: Building the Network
- Jack Green: QGIS Plugins for Mineral Exploration
- Roberto Lujan Rocha: Remote sensing for scalable weed mapping in Agriculture
- Monica Danilevicz: Exploring open-source methods for anomaly identification in agricultural fields
- Nimalika Fernando: Lost in indoors? Indoor mapping for navigation using FOSS4G tools
- Prabhjot Kaur Virk: Smartphone based Indoor Pathfinding Application for the Visually Impaired
- Alexandra Maskell: Harnessing GIS and Free Open-Source Data for Flood Risk Assessments
- Ana Carvalho: Validation of 32 Years of Fire Records in the Mundaring Catchment
- Diana Ong: LLM generated python for geospatial analysis in GDAL native environment
- Lavender (Qingxiang) Liu: EO-Insights: Accelerating Open Earth Observation Data Management & Analysis
- Nick Wright: Training sensor-agnostic deep learning models for remote sensing
- Duncan Kinnear: Canopy Conundrum: How FOSS Helps us to See the Forest Through the Trees
Switching from WordPress to an SSG
Fremantle
· WordPress · websites · hosting ·
More discussion today about WordPress, and people who suggest switching to a static site generator:
Jason Lefkowitz on 15 October 2024:
The WordPress drama has brought forward a bunch of nerds advocating different systems they think WordPress users should switch to, which mostly have illustrated how few nerds understand what makes WordPress appealing to its users in the first place
And:
Like, if your pitch for a system to replace WordPress starts with "first, learn Markdown and Git," I need you to understand that you are living in a completely different galaxy than the median WordPress user
One of the replies was from tante:
And people who think "dump a bunch of PHP files in a folder" can be replaced by a bunch of dockerized microservices and a textmode readme of 30 lines with 3 subtle mistakes really need to get out more.
All very true, I think. As much as I love the simplicity of a bunch of Markdown files in a Git repository, it's a way of working that doesn't seem to have captured the imaginations of a big swathe of bloggers. Not only that, but it doesn't solve a big part of blogging: managing photos and other files.
The idea of "put these files in a folder on your server, and do everything else via the browser" has been pretty powerful for the last couple of decades. I think its time is probably over, but it's going to take another ten years to really decline fully. I suspect whatever replaces it will also not involve the command line.
When the bank opens
Fremantle
· timezones ·
I know I shouldn't be an internet pedant, but I am a programmer so it sort of comes with the role. My bank has sent me a message telling me to call them and that their opening hours are "9AM to 6:30PM AEST" (i.e. UTC+10). That's fine, you might think — but AEST finished about a week ago! T'otherside, or most of it, is now in AEDT (UTC+11). So do they actually mean that they're open from 10AM? No, of course not (well, I assume not…). They're just using the "AEST" abbreviation as a way of saying "Melbourne time".
Anyway, I've ranted long enough that it's now after 10 over there and I can call them, regardless of what they mean.
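For fellow pedants, the arithmetic can be checked with Python's standard `zoneinfo` module. This is only a sketch: the date is an arbitrary one I've picked from after the 2024 changeover, and it assumes the system has the IANA timezone database available.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# "AEST" taken literally: a fixed UTC+10 offset, with no daylight saving.
AEST = timezone(timedelta(hours=10), "AEST")
# What the bank presumably means: whatever the clocks in Melbourne say.
MELBOURNE = ZoneInfo("Australia/Melbourne")


def literal_aest_in_melbourne(year, month, day, hour, minute=0):
    """Convert a time stated in literal AEST to Melbourne wall-clock time."""
    stated = datetime(year, month, day, hour, minute, tzinfo=AEST)
    return stated.astimezone(MELBOURNE)


if __name__ == "__main__":
    # An arbitrary date during daylight saving (assumed for illustration).
    summer = literal_aest_in_melbourne(2024, 10, 14, 9)
    # An arbitrary date outside daylight saving, for comparison.
    winter = literal_aest_in_melbourne(2024, 7, 15, 9)
    print("During daylight saving:", summer.strftime("%H:%M"), summer.utcoffset())
    print("Outside daylight saving:", winter.strftime("%H:%M"), winter.utcoffset())
```

During daylight saving, a literal 9AM AEST comes out as 10AM on Melbourne clocks; in winter the two agree.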
WordPress plugin politics
Fremantle
· WordPress ·
I used to really like working on WordPress plugins (I did it as a large part of my job for quite a few years, and maintained a few open source plugins for much longer), and the recent news about Matt Mullenweg being a complete muppet is making me sad. It used to seem that there was a real separation between wordpress.com and wordpress.org, and that the latter was a community organisation that existed for the benefit of the users and developers of the software. But now it seems that if Matt Mullenweg decides so, then you can get your plugin not even just banned but actually usurped in-place and renamed. That's not how you fork something.
I don't know much about WP Engine, and I suspect they may well be a pretty uncaring company (i.e. a normal capitalist one) but it never mattered. There have always been people trying to make shit plugins and services on top of WordPress, and it's never been a problem. There have always been far more who are small businesses doing great stuff, and the fact that wordpress.org has provided hosting and distribution for their (open source) work has never been an issue. If WP Engine was really using resources that were costing too much, the sensible thing would've been to put rate limits in place, or demand that they have their own plugin cache (why are all WP Engine blogs updating independently via the main repo anyway?).
None of it would've been worth commenting on, but now that wordpress.org has got this stupid checkbox on the login form, and a plugin has been forcibly taken over, it just feels horrible. This one boring techbro has come and stuffed everything up! And after twenty years of everything generally going well, it's a shame.
I only maintain a few WordPress blogs now, for family and friends, and I don't think I'll be recommending that they migrate away from it. But I certainly won't be adding any new ones or bothering to update any of my old plugins (not that I was going to anyway).
OCR on Wikisource
Fremantle
· Wikisource · OCR · Wikimedia · transcription ·
I've been attempting this weekend to get back to sorting out some of the OCR tool's nomenclature around languages and text recognition models. It's the sort of job that's not too hard but touches lots of bits of code, and in this case two separate codebases, so any changes are easier to do piecemeal and must maintain backwards compatibility. When the first Wikisource OCR tools were built, they used Tesseract initially, and Google Cloud Vision after that, and both of those talk about 'languages' as one of the parameters to set when OCRing an image. Google goes as far as saying you must use BCP-47 identifiers.
But they're not really 'languages': you can, for instance, tell Tesseract to use Cyrillic (i.e. a writing system used by quite a few languages). When we added Transkribus it started to become even clearer that we needed to do something to reduce the confusion around this (Transkribus puts the idea of trained models front and centre).
After all, it does make sense to not think of OCR in terms of language — many languages are written with similar scripts, and OCR is all about shapes and patterns and the likelihood of certain blobs of ink being intended to be particular characters or lines of text. It doesn't care about grammar or meanings or syntax or morphology (although do note that I'm not a linguist nor do I actually know anything about OCR or computer vision!).
Does "text recognition model" mean anything to Wikisource users though? I guess the term 'model' is pretty widespread at the moment (thanks to all this AI bollocks), so perhaps it's clear enough. And it will hopefully separate the ideas of a given Wikisource's content language from what OCR model should be picked for any given work (i.e. they're often the same, and we do set a default for each Wikisource, but a different model might work better for any particular scanned work).
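To make that separation concrete, here's a minimal Python sketch of "a default model per Wikisource, overridable per work". It's not how the OCR tool is actually written: the class, the function, and the mapping are hypothetical, and the model names are just examples in the style of Tesseract's.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical defaults: Wikisource content language -> OCR model name.
DEFAULT_MODEL_BY_WIKISOURCE = {
    "en": "eng",
    "ru": "rus",
    "uk": "ukr",
}


@dataclass
class ScannedWork:
    title: str
    wikisource: str              # content language code of the hosting Wikisource
    model: Optional[str] = None  # explicit per-work choice, if someone made one


def choose_model(work: ScannedWork) -> str:
    """Use the per-work model if one is set, else the Wikisource's default."""
    if work.model is not None:
        return work.model
    return DEFAULT_MODEL_BY_WIKISOURCE[work.wikisource]


if __name__ == "__main__":
    ordinary = ScannedWork("A modern novel", "ru")
    # A script-level model chosen for one work, regardless of the site's language.
    mixed = ScannedWork("A multilingual gazette", "ru", model="Cyrillic")
    print(choose_model(ordinary))
    print(choose_model(mixed))
```

The point is only that the site's language picks a starting value, and the model is a separate field that anyone can change for a particular work.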