Self-hosted websites are doomed to die

I keep wanting to be able to recommend the ‘best’ way for people (who don’t like command lines) to get research stuff online. Is it Flickr, Zenodo, Internet Archive, Wikimedia, and Github? Or is it a shared hosting account on Dreamhost, running MediaWiki, WordPress, and Piwigo? I’d rather the latter! Is it really that hard to set up your own website? (I don’t think so, but I probably can’t see what I can’t see.)

Anyway, even if running your own website, one should still be putting stuff on Wikimedia projects. And even if not using it for everything, Flickr is a good place for photos (in Australia) because you can add them to the Australia in Pictures group and they’ll turn up in searches on Trove. The Internet Archive, even if not a primary and cited place for research materials, is a great place to upload wikis’ public page dumps. So it really seems that the remaining trouble with self-hosting websites is that they’re fragile and subject to complete loss if you abandon them (i.e. stop paying the bills).

My current mitigation to my own sites’ reliance on me is to create annual dumps in multiple formats, including uploading public stuff to IA, and printing some things, and burning all to Blu-ray discs that get stored in polypropylene sleeves in the dark in places I can forget to throw them out. (Of course, I deal in tiny amounts of data, and no video.)
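For what it's worth, the "annual dump" step can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the function name make_annual_dump, the site-dump-YEAR.tar.gz naming, and the choice of a gzipped tarball with a SHA-256 checksum beside it are all hypothetical, not a description of the scripts actually in use. The checksum is there so that a copy pulled off a disc years later can be checked against what was written.

```python
import hashlib
import tarfile
from datetime import date
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_annual_dump(source_dir: Path, dest_dir: Path, year: Optional[int] = None) -> Path:
    """Archive source_dir as dest_dir/site-dump-YEAR.tar.gz (hypothetical naming)
    and write a checksum file next to it. Returns the archive path."""
    if year is None:
        year = date.today().year
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"site-dump-{year}.tar.gz"
    # tar.gz is an assumption; any widely readable open format would do.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=f"site-dump-{year}")
    # Same "digest  filename" layout that the sha256sum tool emits.
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(f"{sha256_of(archive)}  {archive.name}\n")
    return archive
```

Calling make_annual_dump(Path("exports"), Path("dumps")) once a year would leave a dated archive and its checksum in dumps/, ready to be uploaded, burnt, or forgotten in a drawer.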

What was it Robert Graves said in I, Claudius about the best way to ensure the survival of a document being to just leave it sitting on one’s desk and not try at all to do anything special, because it’s all perfectly random anyway as to what persists, and we cannot influence the universe in any meaningful way?

On not hosting everything

I’ve been moving all my photos to Flickr lately. It’s been a long process, one complicated by the fact that it seems silly to run my own WordPress installation (and things like ArchivesWiki) if I’m not going to bother hosting everything myself. Of course, that’s not really very logical, and so I’ve decided that it’s perfectly okay to host photos on Flickr, videos on YouTube, and all the text (and miscellaneous) stuff here on my own server.

What goes Where on the Web

Every now and then I recap on where and what I store online. Today I do so again, while I’m rather feeling that there should be discrete and specific tools for each of the things.

Firstly there are the self-hosted items:

  1. WordPress for blogging (where photo and file attachments should be customized to the exact use in question, not linked from external sites). It is also my OpenID provider.
  2. Piwigo as the primary location for all photographs.
  3. MoonMoon for feed reading (and, hopefully one day, archiving).
  4. MediaWiki for family history sites that are closed-access.
  5. My personal DokuWiki for things that need to be collaboratively edited.

Then the third-party hosts:

  1. OpenStreetMap for map data (GPX traces) and blogging about map-making.
  2. Wikimedia Commons for media of general interest.
  3. The NLA’s Trove for correcting newspaper texts.
  4. Wikisource as a library.
  5. Twitter (although I’m not really sure why I list this here at all).

Finally, I’m still trying to figure out the best system for:

  1. Public family history research. There’s some discussion about this on Meta.

Don’t Write Code (write descriptions of things)

I wish I didn’t know how to code.

For a programmer, the solution to every problem is to write more code.

But sometimes, all that is needed is to write proper words. To explain things and explore them through prose.

Not to remove oneself to the meta-realm of trying to understand the general structure of the problem and model it accordingly. (And then build something that resembles that model, and hope that the people using it see through the layers back to what the buggery’s trying to be done!)

Just write some nice, verbose, rambling blather about what it is and how it works and where we’re trying to go from here. Nothing too technical, and hopefully actually interesting to read. At least, linear, in that old-fashioned way of real writing. Interesting is probably too much to aim for… just words, then.

I was reading Phoebe Ayers’ recent post about the task of archiving the Wikimedia Foundation’s material. My first thought was “what sort of database/catalogue would be useful for this sort of thing?” Which is quite the wrong question, of course. There’s a whole world of wikis (both instances and engines) out there, perfect for this sort of variably-structured data. (If there’s one thing that constantly amazes me about Wikipedia it’s the fact that so much structure and repeated data is contained in what is basically an immense flat list of lone text files, and that it does rather work! The database geek in me shudders.)

I think a basic tenet for archiving physical and digital resources is that each object, and each grouping of objects, needs to have its own web page. In most cases, I use this both as a catalogue entry for the object or group, and as a printable coversheet to store along with the physical objects (or, in the case of digital-only objects, to be a physical placeholder or archive copy, if they warrant it).
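As a rough illustration of the one-page-per-object idea, here is a small Python sketch that turns a catalogue record into plain wiki-style text that could be pasted into a page and printed as a coversheet. Everything about it is assumed for the example: the CatalogueEntry fields, the identifier scheme, and the loosely MediaWiki-flavoured markup are hypothetical, and a real catalogue would likely want different fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueEntry:
    """One object, or one grouping of objects (all field names are hypothetical)."""
    identifier: str
    title: str
    description: str
    location: str
    members: List[str] = field(default_factory=list)  # identifiers of contained items


def render_coversheet(entry: CatalogueEntry) -> str:
    """Return wiki-style text usable as both catalogue page and printed coversheet."""
    lines = [
        f"= {entry.title} =",
        "",
        f"Identifier: {entry.identifier}",
        f"Location: {entry.location}",
        "",
        entry.description,
    ]
    # Groupings list their members; single objects skip this section.
    if entry.members:
        lines += ["", "Contents:"]
        lines += [f"* {member}" for member in entry.members]
    return "\n".join(lines) + "\n"
```

A folder of letters might then be described as CatalogueEntry("F01", "Letters", "Family correspondence.", "Box 1", ["F01/001", "F01/002"]), with each letter getting an entry (and a page) of its own.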

The other thing I try to stick to is that a fonds and its catalogue (i.e. a pile of folders/boxes and the website that indexes them and adds whatever other digital material to the mix) should be able to be shifted off to someone else to maintain! That not everything should live in the same system, nor require particularly technical skills to maintain.

I know that there are a dozen formalised ways of doing this stuff, and I wish I knew the details of them more thoroughly! For now, I’ll hope that a non-structured catalogue can work, and continue to write little printable English-language wiki pages to collate in amongst my folders of polypropylene document sleeves. And I’ll keep checking back to en.wikibooks.org/wiki/Subject:Library_and_Information_Science for instructions on how to do it better…

On What Gets Kept, and Changing How Over Time

“Make things that can be archived (databases cannot be, not if you don’t also store the application that reads them). Make it possible to change one’s data structures (the ways in which things are stored — not the file formats, so much), and leave old data alone. To update, copy and morph; don’t try to force everything into the new system. Files are good for this; their formats should be standardised though, of course.”

Digital Permanence

Manton Reece wrote some sensible words about the permanence of material on the Internet, and Dave Winer followed suit shortly after (and then again). It’s an important topic.

We need places to store things: secure, digital, permanent places. It’s not a particularly difficult problem, at least to attempt to solve. (Of course, we won’t really know if we’ve succeeded for another few hundred years.) So we should try!

A couple of ideas that I’m using as a baseline these days:

  1. Store things in open formats, so we can continue to read them.
  2. Store things in a small number of large (and non-esoteric!) repositories (i.e. filesystems, or drives, or websites, or whatever), so they’re easy to migrate to other places.

The latter is, I think, important: it means that the data can be easily handed over to someone else.

Catching up…

I have been playing around with a different form for this website for the last couple of weeks. Because I don’t particularly care if people don’t have access to it all the time, I made the changes to the live site, and so it’s looked pretty bad lately. Lots of changes behind the scenes, though, for me at least (I’m working on my email archiving system, and that’s taken priority).

Apologies to the only people who might actually have tried looking for this site — those looking for my WordPress plugins. But all’s back and well now; I’ll be posting a couple of updates to a couple of plugins sometime in the next fortnight. Maybe.

* * *

I’m leaving soon; I think I’ve mentioned that. No more shall my daily view be this:

…and I’m saying goodbye to here:

…and moving in to a lovely little house in White Gum Valley! Katie’s found us somewhere to live, and she’s moving in today! Good news.

But I won’t go on now; I’m waiting for the removalists to arrive and then I’m off to work. And the blog-self-consciousness has set in (so please don’t read this).

What I Did/Read/Thought Today

Hurrying off to uni after remembering the chai & cake stall, I forgot my lunch and the honey (not sweet mate), but managed to prove to myself the wisdom in having a slow bike. [Oh how I wish I could get my digital camera to work with these uni computers!] I got the chai on, retired to the Greens office to help with some ICT stuff and to do a bit of reading (more Bachelard), and headed to Civic Square for a Save The Ridge rally. I would post shots of that too, if only…

Reading. I pay close attention to my body when reading — how I’m sitting, where the pressure is, the weight of the book in my hands, my hands on my arms, where the forces are going. The intellectual exercise of entertaining the author[‘s ideas] is balanced (of course only partially — one still needs to swing from the trees shouting) by the awareness of my physical body.

The first graduate seminar that I’ve been to for weeks. Lenticulars are those pictures that move! Ooh err! (So I didn’t take many notes — eh!) The second talk set me thinking about media-independent replication of art; we’ve been doing it with text for ever, and that’s one of the aspects that draws me to the web-based data-gathering/page layout/hand-binding process: anyone anywhere could be doing similar things in totally different ways, but the ideas captured within the text would remain totally intact (much of why we marvel at digital storage). M. spoke about seating and stools and inspired me about sit/standing postures; I should like that for computeranating…

Enough for tonight!!!