forking wikis

I wish wikis were less collaborative! I wish they were more like software projects, where if one wants to modify anything, one gets one’s own copy and does anything at all to it.

No, I’m not really saying that there should be fewer centralised places of communal effort, these things are great… I just want a good way to branch and modify non-code content.

A cross between the Internet Archive’s system for uploading content into their collections, and Github’s user-centric arrangement.

The problem seems to often come back to the formats that things are in. It’s easy in the text-only code world; but wikis each have their own markup…

I wondered about the use of MediaWiki, and pulling in remote articles (periodic synchronisation), but of course there’s no merging in that idea, so it doesn’t work. It’s what Printable WeRelate does, but I’m yet to quite figure out how that’s going to deal with local additions to the data (probably, pages will be quite separate, with links only going from the local-only content to the remote-sync’d stuff; because we can’t modify the remote articles locally, and links in them when they’re elsewhere wouldn’t make sense).

So, there’s no solution: I’ll stick to centralised editing and storage, but carry on pulling backups (huzza to Wikiteam).

Don’t Write Code (write descriptions of things)

I wish I didn’t know how to code.

For a programmer, the solution to every problem is to write more code.

But sometimes, all that is needed is to write proper words. To explain things and explore them through prose.

Not to remove oneself to the meta-realm of trying to understand the general structure of the problem and model it accordingly. (And then build something that resembles that model, and hope that the people using it see through the layers back to what the buggery’s trying to be done!)

Just write some nice, verbose, rambling blather about what it is and how it works and where we’re trying to go from here. Nothing too technical, and hopefully actually interesting to read. At least, linear, in that old-fashioned way of real writing. Interesting is probably too much to aim for… just words, then.

I was reading Phoebe Ayers’ recent post about the task of archiving the Wikimedia Foundation’s material. My first thought was “what sort of database/catalogue would be useful for this sort of thing?” Which is quite the wrong question, of course. There’s a whole world of wikis (both instances and engines) out there, perfect for this sort of variably-structured data. (If there’s one thing that constantly amazes me about Wikipedia it’s the fact that so much structure and repeated data is contained in what is basically an immense flat list of lone text files, and that it does rather work! The database geek in me shudders.)

I think a basic tenet for archiving physical and digital resources is that each object, and each grouping of objects, needs to have its own web page. In most cases, I use this both as a catalogue entry for the object or group, and as a printable coversheet to store along with the physical objects (or, in the case of digital-only objects, to be a physical placeholder or archive copy, if they warrant it).

The other thing I try to stick to is that a fonds and its catalogue (i.e. a pile of folders/boxes and the website that indexes them and adds whatever other digital material to the mix) should be able to be shifted off to someone else to maintain! That not everything should live in the same system, nor require particularly technical skills to maintain.

I know that there’s a dozen formalised ways of doing this stuff, and I wish I knew the details of them more thoroughly! For now, I’ll hope that a non-structured catalogue can work, and continue to write little printable English-language wiki pages to collate in amongst my folders of polypropylene document sleeves. And I’ll keep checking back to en.wikibooks.org/wiki/Subject:Library_and_Information_Science for instructions on how to do it better…

On What Gets Kept, and Changing How Over Time

“Make things that can be archived (databases cannot be, not if you don’t also store the application that reads them). Make it possible to change one’s data structures (the ways in which things are stored — not the file formats, so much), and leave old data alone. To update, copy and morph; don’t try to force everything into the new system. Files are good for this; their formats should be standardised though, of course.”

Digital Permanence

Manton Reece wrote some sensible words about the permanence of material on the Internet, and Dave Winer followed suit shortly after (and then again). It’s an important topic.

We need places to store things: secure, digital, permanent places. It’s not a particularly difficult problem, at least to attempt to solve. (Of course, we won’t really know if we’ve succeeded for another few hundred years.) So we should try!

A couple of ideas that I’m using as a baseline these days:

  1. Store things in open formats, so we can continue to read them.
  2. Store things in a small number of large (and non-esoteric!) repositories (i.e. filesystems, or drives, or websites, or whatever), so they’re easy to migrate to other places.

The latter is, I think, important: it means that the data can be easily handed over to someone else.
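One way to make such a handover checkable is to ship a checksum manifest alongside the files. What follows is only a minimal sketch, assuming GNU coreutils and findutils (sha256sum, sort -z, xargs -r) are available; the function names make_manifest and verify_manifest are made up for illustration.

```shell
# Write a SHA-256 manifest for every file under DIR to stdout.
# Paths are recorded relative to DIR so the manifest survives a move.
# Usage: make_manifest DIR > manifest.sha256
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# Check the files under DIR against a manifest made by make_manifest.
# Exits non-zero if any listed file is missing or has changed.
# Usage: verify_manifest DIR MANIFEST
verify_manifest() {
    # Resolve the manifest to an absolute path before changing directory.
    manifest=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    ( cd "$1" && sha256sum --check --quiet "$manifest" )
}
```

Whoever receives the copy can run verify_manifest against their own directory. Note that this only checks the files listed in the manifest; it will not notice extra files that have been added.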

Archiving a password-protected site with wget

The combination of wget and the Export Cookies add-on for Firefox is useful for creating offline, complete, static archives of websites that are only accessible with a password:

  1. First log in to the site and export cookies.txt,
  2. Then run
    wget \
    	--recursive \
    	--no-clobber \
    	--page-requisites \
    	--html-extension \
    	--convert-links \
    	--restrict-file-names=windows \
    	--domains example.com \
    	--no-parent \
    	--load-cookies cookies.txt \
    	--reject 'logout,admin*' \
    	example.com/sub/dir
    

The rejection of logout URLs is especially useful, because otherwise one will probably be logged out by wget accessing the logout link.
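Before starting a long crawl it can be worth checking that the exported cookies.txt actually holds cookies for the site. This is a minimal sketch, assuming the file is in the tab-separated, seven-column Netscape format that wget’s --load-cookies reads; count_cookies is a hypothetical helper name, not part of wget.

```shell
# Count the cookie entries in a Netscape-format cookies.txt that belong
# to DOMAIN or one of its subdomains.
# Usage: count_cookies FILE DOMAIN
count_cookies() {
    awk -F '\t' -v d="$2" '
        # Treat lines starting with # as comments; skip anything that
        # does not have the seven expected tab-separated fields.
        /^#/ || NF != 7 { next }
        {
            host = $1
            sub(/^\./, "", host)    # ".example.com" -> "example.com"
            tail = substr(host, length(host) - length(d))
            if (host == d || tail == ("." d)) n++
        }
        END { print n + 0 }
    ' "$1"
}
```

If count_cookies cookies.txt example.com prints 0, the export probably missed the login session and the crawl will only see the public pages.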

Brewster Kahle on the Internet Archive

We offered unlimited storage, unlimited bandwidth, for ever, for free — to anybody who has something to share that belongs in a library.

Brewster Kahle, Entertainment Gathering Conference 2007 (republished as a TED Talk). The above quote is at 14:19.

The crux of it is of course “something that belongs in a library”. If one has something that could conceivably be held in a library, then there should be a library in which it can be held; the Internet Archive is one possibility.

Archive Team

A while ago I came across Archive Team, a group of people dedicated to *saving* any and all digital cultural heritage materials (read: websites) that are being deleted (or in danger of being deleted; or just because they want to).

Like GeoCities. Or those BBC sites that were closed. Or a dozen other things that no one is likely to think about, nor would’ve missed, but they’re saving them nonetheless.

Brilliant work! Not only necessary and sensible and useful, but done with such style!

Who else is making ctrl-s an act of defiance?! (And, yes, legitimising digital hoarding, perhaps…)

Indexing Newspapers

I have been working again this morning down at the Local History Collection at the library.  The newspaper clippings’ catalogue is progressing — up to a hundred and thirty clippings so far — and proving to be quite an interesting project.  This morning I got up to the end of 1953, the beginning of ’54, and the Royal Visit (I’m working through a chronological scrapbook of old clippings).  If the selection of news that was considered worthy of preservation is anything to go by (and it probably isn’t), the whole of Fremantle was happy and excited about the Queen’s passage through the city, to the exclusion of everything else.

But there was other stuff happening, such as the seemingly never-ending discussions about the new bus terminal outside the train station, and someone’s idea to amalgamate East Fremantle and the FCC (they even wanted a referendum).

I was playing a bit with adding notes about the people in these articles to pages on ArchivesWiki. Generally they’re not notable enough for Wikipedia, and I haven’t yet found a good, similar, project that accepts random little snippets about random people. I’ve a slight idea of working on some sort of ‘local history wiki’ for Fremantle, with pages about any and all people, places, buildings, etc., but I don’t suppose it’ll take off.

It’s frustrating, reading through these newspaper clippings and not being able to put the full text up anywhere (although I have put the East Freo one above on Wikisource), and not assimilating their information into relevant, composite, articles. It just feels more satisfying if, when I find a reference to some doing of Mr. McCombe the Town Clerk, I note it down on his biography. So I think I’ll do more of that.

Further Afield

I think there is a need for a general, world-wide, catalogue of newspaper articles, both historical and modern. Wikisource can’t be it, because it strives for full texts, and all modern newspaper material is under copyright. I envisage something pretty simple, that just catches headlines, summaries, and keywords (and of course source data). It’s not that hard to find libraries that have access to newspaper material, but it’s usually in microform and so utterly unusable if you don’t know what date/page you’re looking for. An index is needed!
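To give a rough idea of how simple such an index could be, here is a sketch that keeps one clipping per line in a tab-separated text file and looks entries up by keyword. The column layout and the name find_clippings are assumptions made up for illustration, not an existing standard or tool.

```shell
# Hypothetical index layout, one clipping per line, tab-separated:
#   date (YYYY-MM-DD), newspaper, page, headline,
#   keywords (semicolon-separated), summary

# Print date, newspaper, page and headline of every clipping tagged
# with KEYWORD (case-insensitive), oldest first.
# Usage: find_clippings INDEX_FILE KEYWORD
find_clippings() {
    awk -F '\t' -v kw="$2" '
        {
            n = split($5, tags, ";")
            for (i = 1; i <= n; i++) {
                tag = tags[i]
                gsub(/^ +| +$/, "", tag)    # trim spaces around each keyword
                if (tolower(tag) == tolower(kw)) {
                    printf "%s\t%s\tp.%s\t%s\n", $1, $2, $3, $4
                    break
                }
            }
        }
    ' "$1" | sort
}
```

Because the date comes first in ISO form, a plain sort puts the results in chronological order, which is exactly the date and page information one needs before going to the microform.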

The National Library of Australia’s new Australian Newspapers site looks pretty fantastic, and assuming they do end up digitising everything (which I think is the aim), will effectively supplant things like Wikisource so far as public domain material goes. But they’re still stuck when it comes to contemporary newspapers.

But I won’t ramble on about this any more; I’ve got daft blathering about systems development to get on with.

if:book: ephemera

I’m bored and tired this Monday morning, but still I flick through my blog feeds; I found this: [if:book: ephemera] from the Institute for the Future of the Book.

It’s an interesting idea: that the inconsequential, unconsidered, printed matter of the day gives ‘the future’ (the people, that is) insight into how normal lives were led. Does it? And if it does, does that mean that it’s a good idea to collect modern ephemera? (oh, incidentally: I have read in old books the word ‘ephemera’ treated as singular in number, so maybe I mean: ‘should ephemeras be collected?’…) I think I am, at heart, something of a hoarder of things — words, pictures, stories, whatever — that seem in danger of otherwise going unrecorded; I must think that there’s some value in these things…

I’m not convinced that we can actually choose what records are left for the future, however. Much extant documentation from the past was never set aside for preservation; much that was set aside has disappeared without trace (well, obviously not totally without trace: we know a certain library might have existed, but not what stories its books held). Isn’t this what Claudius (well, Mister Graves, anyway) said in the preface to I, Claudius? That it’d be better to leave his memoirs lying on a table somewhere, and leave their preservation to chance, than to entomb them under stone and law?

But maybe we can choose, a bit, or at least make it easier for things to survive (by not destroying them!). Leaving aside the question of why it’s worth doing, I wonder about how. They say that the internet, and computer-based documentation in general, is making the printed record of modern times sparser and maybe less meaningful than that of other times (the eighteenth century, for example). In the post I’ve linked above they ask “what provisions are we making for our own mass memory?” Some people say that computers should be used to solve the problem of things existing only on computers, which seems a little contradictory to me, but (being the geek I am) I also at times think this. So I write programs that help me order things and decide what not to keep.

Oh, then I get confused and wonder why I ever bother keeping a blog…

Sorry.

Where I Write

I have often thought that one of the greatest attractions for me to writing in ink, on paper, in a properly-bound book, is that where one writes the words is where they will remain, and the only place they will ever be. That’s not the case when writing on a screen: I often write a post for this blog, for example, whilst off-line, and in different editors, on different computers. Then I paste it into here, save it, and it appears in its final place where you’re reading it now.

That last sentence betrays where I’m writing this, but had I transcribed these words from a book that I’d written in whilst sitting on the ground on Mount Ainslie being buffeted by a strong wind, how would it be different? How does being exposed to the original copy of a piece of writing affect how it is read? What is the ‘original copy’ on the web?

Sometimes (this morning, for example) I’m into the fact that the web separates content from medium: the written word certainly remains, and the loss of the detail of the act of writing is good, because we then focus on what is written, and not how it was composed. Of course (like so much of the blogosphere), this does allow for this sort of introspective post that really does no one any good and is rightly ignored by the whole planet. But that doesn’t matter. The point is that I generally have in mind the final resting place of my words as I write them, and that changes what I write about and how I write it. My problem at the moment (oh, yeah, it’s a real problem!) is that I sometimes write good stuff on paper, and there it languishes forever and is never read; conversely, I (often?) write poor ramblings on screen (generally on this blog, or on my wiki) that should never have been written, let alone read. So whereto from that?

It feels more ‘pure’ to write with a fountain pen in a book, more active and engaging to tap at a keyboard on the web. A rough draft composed in pencil[1] up a tree in a disposable notebook which is then posted here with accompanying images? Or carefully-formed words in a Moleskine that are never to be seen again? Given my current desire to not encumber myself with Stuff, the former is where I’m at.

[1] The Faber-Castell ‘E-Motion’ is in my pocket always.