Why don’t we just start blogging and rolling again? Why not go back to making content on my own, er leased, servers? It was never that hard to make a blog and it’s even easier now. Within a few minutes, good ‘ol Dreamhost (proud customer since 2001) and WordPress had me back up again. So I’m going to start cranking out blog posts again even if only two people read them. I’ll tell you, it’s much easier to write and read a blog post than a tweet storm.
— Mike Tatum, https://www.miketatum.com/2018/08/20/think/
How about if, instead of ditching Twitter for Mastodon, we all start blogging and subscribing to each other's Atom feeds again instead? The original distributed social network could still work pretty well if we actually start using it
— Simon Willison (@simonw) August 18, 2018
How about if, instead of ditching Twitter for Mastodon, we all start blogging and subscribing to each other’s Atom feeds again instead? The original distributed social network could still work pretty well if we actually start using it
I finally closed my Dreamhost account yesterday, after nearly 11 years of hosting my websites, email accounts & lists, Jabber, and code repos with them. Over the last couple of years I’ve been migrating things to other providers: the web stuff to Digital Ocean, email to Fastmail, and a few bits and pieces to Google and Github. It’s a nice feeling to shed this old account! And a bit sad.
I now have just three main computers in my life: my phone, laptop, and VPS.
It’s not because Dreamhost is bad that I’m leaving, not at all: it’s just about simplifying things, gaining control, and feeling confident enough to manage a Linux server on my own. If I wanted a shared hosting account, I’d still use them, and I dare say I’ll carry on recommending them to anyone looking for a place to host their own WordPress, MediaWiki, or Piwigo etc.
I get the impression that self-hosting isn’t as popular as it once was, and that people prefer to hand their data off to firms who will dictate structure and offer reliability and ease. I think that’s partly because of the network effect of it being somtimes better to use the same service as everyone else, and partly because self-hosted software isn’t as easy to use as it should be. The former is hard to fix (see Mastadon etc), but the latter is about building quality software that’s easy to install and maintain. That’s something I want to help with.
‘Embed’ here is what WordPress calls the ability to add a URL of a site on its own line in a post or page, and for a nice rendering of the site at that URL to be provided automatically. It works with core WordPress with sites like Youtube and Flickr, and somewhat for random other sites if they provide the right metadata. Piwigo does not yet provide particularly rich metadata (there are some ideas to do so, though), but anyway it’s nicer to be able to do something more complicated that uses the Piwigo API.
As a first hack at this, my plugin just shows the medium-sized image, with title below and description as the tooltip, and the image linked to the page on the Piwigo site. I plan on introducing caching, and perhaps some nicer display (dates, comment count, etc.). Ideas welcome!
New MediaWiki extension: AutoCategoriseUploads. It “automatically adds categories to new file uploads based on keyword metadata found in the file. The following metadata types are supported: XMP (many file types, including JPG, PNG, PDF, etc.); ITCP (JPG); ID3 (MP3)”.
Unfortunately there’s no code yet in the repository, so there’s nothing to test. Sounds interesting though.
Midwest Heritage of Western Australia is a terrific database of records of graves and deceased people in the mid-west region of WA.
I keep wanting to be able to recommend the ‘best’ way for people (who don’t like command lines) to get research stuff online. Is it Flickr, Zenodo, Internet Archive, Wikimedia, and Github? Or is it a shared hosting account on Dreamhost, running MediaWiki, WordPress, and Piwigo? I’d rather the latter! Is it really that hard to set up your own website? (I don’t think so, but I probably can’t see what I can’t see.)
Anyway, even if running your own website, one should still be putting stuff on Wikimedia projects. And even if not using it for everything, Flickr is a good place for photos (in Australia) because you can add them to the Australia in Pictures group and they’ll turn up in searches on Trove. The Internet Archive, even if not a primary and cited place for research materials, is a great place to upload wikis’ public page dumps. So it really seems that the remaining trouble with self-hosting websites is that they’re fragile and subject to complete loss if you abandon them (i.e. stop paying the bills).
My current mitigation to my own sites’ reliance on me is to create annual dumps in multiple formats, including uploading public stuff to IA, and printing some things, and burning all to Blu-ray discs that get stored in polypropylene sleeves in the dark in places I can forget to throw them out. (Of course, I deal in tiny amounts of data, and no video.)
What was it Robert Graves said in I, Claudius about the best way to ensure the survival of a document being to just leave it sitting on ones desk and not try at all to do anything special — because it’s all perfectly random anyway as to what persists, and we can not influence the universe in any meaningful way?
There’s a new extension recently been added to mediawiki.org, called DocBookExport. It provides a system of defining a book’s structure (a set of pages and some title and other metadata) and then pipes the pages’ HTML through Pandoc and out into DocBook format, from where it can be turned into PDF or just downloaded as-is.
There are a few issues with getting the extension to run (e.g. it wants to write to its own directory, rather than a normal place for temporary files), and I haven’t actually managed to get it fully functioning. But the idea is interesting. Certainly, there are some limitations with Pandoc, but mostly it’s remarkably good at converting things.
It seems that DocBookExport, and any other MediaWiki export or format conversion system, works best when the wiki pages (and their templates etc.) are written with the output formats in mind. Then, one can avoid things such as web-only formatting conventions that make PDF (or epub, or man page) generation trickier.
I have been experimenting with turning Wikisource works into LaTeX-formatted bindable PDFs. My initial idea was to produce quatro or octavo layout sheets (i.e. 8 or 16 book pages to a sheet of paper that’s printed on both sides and has the pages layed out in such a way as when the sheet is folded the pages are in the correct order) but now I’m thinking of just using a print-on-demand service (hopefully Pediapress, because they seem pretty brilliant).
Basically, my tool downloads all of a work’s pages and subpages (in the main namespace only; it doesn’t care about the method of construction of the work) and saves the HTML for these, in order, to a html/ directory. Then (here’s the crux of the thing) it uses Pandoc to create a set of matching TeX files in an adjacent latex/ directory.
So far, so obvious. But the trouble with this approach of wanting to create a separate source format for a work is that there are changes that one wants to make to the work (either formatting or structural) that can’t be made upstream on Wikisource — but we also want to be able to bring down updates at any time from Wikisource. That is to say, this is creating a fork of the work in a different format, but it’s a fork that needs to be able to be kept up to date.
My current solution to this is to save the HTML and LaTeX files in a Git repository (one per work) and have two branches: one containing the raw un-edited HTML and LaTeX, on which the download operation can be re-run at any time; and the other being based off this, being a place to make any edits required, and which can have the first merged into it whenever that’s updated. This will sometimes result in merge conflicts, but for the most part (because the upstream changes are generally small typo fixes and the like) will happen without error.
Now I just want to automate all this a little bit more, so a new project can be created (with GitHub repo and all) with a single (albeit slow!) command.
The output ends up something like The Nether World by George Gissing.pdf.
The first folder of the C.F. Barker Archives’ material is done: finished scanning and initial entry into ArchivesWiki. This is my attempt to use MediaWiki as a digital archive platform for physical records (and digitally-created ones, although they don’t feature as much in the physical folders). It’s reasonably satisfactory so far, although there’s lots that’s a bit frustrating. I’m attempting to document what I’m doing (in a Wikibook), and there’s more to figure out.
There are a few key parts to it; two stand out as a bit weird. Firstly, the structure of access control is that completely separate wikis are created for each group of access required. This can make it tricky linking things together, but makes for much clearer separation of privacy, and almost removes the possibility of things being inadvertently made public when they shouldn’t be. The second is that the File namespace is not used at all for file descriptions. Files are considered more like ‘attachments’ and their metadata is contained on main-namespace pages, where the files are displayed. This means that files are not considered to be archival items (except of course when they are; i.e. digitally-created ones!), but just representations of them, and for example multiple file types or differently cropped photos can all appear on a single item’s record. The basic idea is to have a single page that encapsulates the entire item (it doesn’t matter if the item is just a single photograph, and the system also works when the ‘item’ is an aggregate item of, for example, a whole box of photos being accessioned into ArchivesWiki).