Sam Wilson's Journal


This is a project that I’ve wanted for years, and now it’s here:

Project GITenberg is a Free and Open, Collaborative, Trackable and Scriptable digital library. It leverages the power of the Git version control system and the collaborative potential of GitHub to make books more open.

40,000 Project Gutenberg books have been uploaded to GitHub, and can now be forked, fixed, and fed back to the world’s biggest library of public domain ebooks. Other alliteration is also possible.

I’ve just sent my first pull request, for a typo I found in Gissing’s The Paying Guest.

The only things lacking now are the original scans of these books, so that the ebooks can be verified against their sources.

[No comments] [Permanent link] [2 views]

WikiTeam has released an update of the chronological archive of all Wikimedia Commons files, up to 2013. It is now about 34 TB in total.

Just seed one or more of these torrents (typically 20-40 GB) and you’ll be like a brick in the Library of Alexandria (or something), doing your bit for permanent preservation of this massive archive.

From this post to wikimedia-l.

[One comment] [Permanent link] [926 views]

I’ve just been tinkering with writing a new Dokuwiki plugin that I’ve wanted for a while: log404. It logs all not-found page hits (as in, HTTP status 404 Not Found), and gives an admin page for viewing (and deleting) them. Soon to come is a way to add a not-found page ID to the redirect plugin’s list of redirects. That’ll be about all the thing shall do. Oh, and keep a list of 404s to ignore. And have a nicer-looking admin page! That’s all…
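The plugin itself is PHP, but the bookkeeping described above is simple enough to sketch in a few lines of Python: count each not-found page ID, remember when it was last hit, skip anything on an ignore list, and support listing and deleting for an admin page. This is a hypothetical illustration, not the log404 code; the class and method names are invented, and a real plugin would persist the data rather than hold it in memory.

```python
from collections import Counter
from datetime import datetime, timezone


class NotFoundLog:
    """Hypothetical in-memory store of 404 hits, keyed by page ID."""

    def __init__(self, ignore=None):
        self.hits = Counter()             # page ID -> number of 404 hits
        self.last_seen = {}               # page ID -> time of the most recent hit
        self.ignore = set(ignore or ())   # page IDs that are never recorded

    def record(self, page_id, when=None):
        """Log one not-found hit, unless the page ID is on the ignore list."""
        if page_id in self.ignore:
            return
        self.hits[page_id] += 1
        self.last_seen[page_id] = when or datetime.now(timezone.utc)

    def listing(self):
        """Rows for an admin view: (page ID, hits, last seen), most-hit first."""
        return [(pid, n, self.last_seen[pid]) for pid, n in self.hits.most_common()]

    def delete(self, page_id):
        """Forget a page ID, e.g. once a redirect has been added for it."""
        self.hits.pop(page_id, None)
        self.last_seen.pop(page_id, None)
```

For example, after two hits on wiki:old_page, one on wiki:typo and one on an ignored favicon.ico, listing() puts wiki:old_page (2 hits) ahead of wiki:typo (1 hit), and the ignored ID never appears.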

I’ve not yet created a page for it (because it’s not finished enough).

I have, however, added it to Travis CI—the first thing I’ve done that with. It’s jolly nice having a little green badge in the readme!

Right then. Friday arvo—time to get a beer.

[No comments] [Permanent link] [404 views]

Task Scheduler is failing to run a particular task. Or, rather, it’s running it, the task is exiting immediately, and Task Scheduler is refusing to log the output. (I don’t know why Microsoft came up with the brilliant idea of Task Scheduler not having a MAILTO feature!) This is a PHP command, something like php file.php --param=val, and all that Task Scheduler makes of it, according to its log, is: Task Scheduler successfully completed task "\Stage\DB\Generate_VU_files" , instance "{3676cfd5-9fc0-460d-9738-1b1b5347ecb9}" , action "C:\Program Files (x86)\PHP\php.exe" with return code 255. Agh! Tell me more!

When the same ‘action’ command is run from the command line, by the same user that Task Scheduler runs the task as, everything works fine. There is no error, and the command does its stuff. Running it through Task Scheduler, on the other hand… :-(

The chances are that this is a permissions problem.

Yes, it would seem that attempting to change permissions on a parent directory of the directory to which the command is trying to write gives “Error Applying Security”.

An answer on Server Fault suggests taking ownership. (Only in Windows-world does one see things like “This can happen if you really don’t have access to that directory.”! One’s first thought is always that the error is just stupid and isn’t telling the whole story: I’m an admin, and admins have write access, but do I have write access? No!)

So I changed the owner of the target directory, clicking “Replace owner on subcontainers and objects”.
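Since Task Scheduler records only the return code, one workaround is to point the task at a small wrapper that runs the real command and appends its output to a file. Below is a minimal sketch in Python, assuming a Python interpreter is available on the server; the function name run_and_log and the log path are hypothetical, not part of Task Scheduler. (PHP’s command-line binary generally exits with 255 after a fatal error, so the captured output is where the real message may show up, depending on how display_errors and error logging are configured.)

```python
import subprocess
from datetime import datetime


def run_and_log(command, log_path):
    """Run command (a list of arguments) and append its output to log_path.

    Task Scheduler only records the exit code, so this keeps stdout and
    stderr somewhere readable. The child's exit code is returned unchanged
    so that failures are still visible to the scheduler.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    with open(log_path, "a", encoding="utf-8") as log:
        # One header line per run: timestamp, the command, and its exit code.
        log.write(f"{datetime.now().isoformat()} {command!r} -> {result.returncode}\n")
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode


# Hypothetical entry point when this file is the task's action, e.g.
#   python.exe wrapper.py C:\logs\task.log php file.php --param=val
# would be:
#   import sys
#   sys.exit(run_and_log(sys.argv[2:], sys.argv[1]))
```

Passing the exit code straight through is the design choice that matters here: the task still shows as failed in Task Scheduler’s history, but now there is a log file to go and read.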


[One comment] [Permanent link] [589 views]

Sometimes, just impossibly gently and in a way so fragile, things seem like they might almost make sense. Mostly they don’t, of course. Mostly, it’s all just a blur of panic, distraction, and confusion. But then some piece of code comes along, clear and simple and smilingly happy, and I can breathe for a little while. Breathe calmly, and type bits of text onto a clean screen and know that the database doesn’t hate me for squashing it into a form that it never liked. That things are bit-by-bit going into their correct places, and progress is being made.

Tomorrow—no: later this morning probably—the code, the database, and I shall slide back down the muddy cold slope, from the bottom of which no clear view is possible. There will just be a memory of a thought that sometimes we can see clearly, and that’ll be enough to keep us clawing back up.

[2 comments] [Permanent link] [664 views]

Reading on an ereader, I seem to lose all of the “publisher’s metadata”: there is no longer any hint of what type of book this is — no cover to judge, no binding, no typography to tell if it’s a serious literary thing or a pulpy time-passer or an old forgotten once-loved.

It’s probably good this way: it lets the text speak for itself. Mainly the loss harms my ability to recall a book, more than the way I receive its words. No more recollection of 20th-century authors as dusty orange Penguins with failing glue. Now they sit alongside every other author, of any era, whose surname begins as theirs does, or whose book is (as arbitrarily) co-alphabetically titled.

Perhaps what I’m looking for is a chronology of literature? Victorians vs. post-war makes more sense than the alphabet as a reading criterion!

[2 comments] [Permanent link] [412 views]

Photo of a page of the Guardian Weekly
The danger is that we are becoming ever more disconnected from place: “Most modern intellectuals and scientists,” he tells us, “have hardly any interest in place, for they consider their theories to be applicable everywhere.”

—From a review of Off the Map by Alastair Bonnett.

[5 comments] [Permanent link] [968 views]

Every now and then I recap where and what I store online. Today I do so again, while rather feeling that there should be a discrete, specific tool for each kind of thing.

Firstly there are the self-hosted items:

  1. WordPress for blogging (where photo and file attachments should be customized to the exact use in question, not linked from external sites). It is also my OpenID provider.
  2. Piwigo as the primary location for all photographs.
  3. MoonMoon for feed reading (and, hopefully one day, archiving).
  4. MediaWiki for family history sites that are closed-access.
  5. My personal DokuWiki for things that need to be collaboratively edited.

Then the third-party hosts:

  1. OpenStreetMap for map data (GPX traces) and blogging about map-making.
  2. Wikimedia Commons for media of general interest.
  3. The NLA’s Trove for correcting newspaper texts.
  4. Wikisource as a library.
  5. Twitter (although I’m not really sure why I list this here at all).

Finally, I’m still trying to figure out the best system for:

  1. Public family history research. There’s some discussion about this on Meta.

[One comment] [Permanent link] [4,948 views]

Grand Budapest Hotel ticket

[2 comments] [Permanent link] [486 views]

I’ve been writing lots of integration tests lately, for a system that has zero unit tests. Does this make me a bad programmer? Probably. But it’s so easy! This is in Kohana, using ORM, and so the model basically is the database (which idea I rather like), and mocking it or splitting it out to be separate is just a pain. Far less code to write if one can test the whole interaction of the system at once.

I am being slightly tongue-in-cheek here, because I do realise that the maintenance burden of a system built with tight coupling between the various layers is likely to increase continually (to a point where someone at some point says “oh sod it, let’s rebuild from scratch on Drupal”). But for the multitude of systems that are basically just CRUD, the approach of writing tests that mimic the code seen in controllers is pretty simple and neat.
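Kohana and its ORM are PHP, so what follows is only an analogue in Python: a test that drives controller-style functions against a real, throwaway SQLite database instead of mocking the model layer. The schema and the functions add_book and rename_book are hypothetical, invented for illustration.

```python
import sqlite3
import unittest


def create_schema(db):
    db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT)")


def add_book(db, title, author):
    """Hypothetical 'controller action': insert a row and return its ID."""
    cur = db.execute("INSERT INTO books (title, author) VALUES (?, ?)", (title, author))
    db.commit()
    return cur.lastrowid


def rename_book(db, book_id, new_title):
    """Hypothetical 'controller action': update one row's title."""
    db.execute("UPDATE books SET title = ? WHERE id = ?", (new_title, book_id))
    db.commit()


class BookCrudTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test: no mocks, the real SQL runs.
        self.db = sqlite3.connect(":memory:")
        create_schema(self.db)

    def tearDown(self):
        self.db.close()

    def test_create_then_rename(self):
        # Mimic what a controller would do, then check the database directly.
        book_id = add_book(self.db, "The Payng Guest", "George Gissing")
        rename_book(self.db, book_id, "The Paying Guest")
        row = self.db.execute(
            "SELECT title, author FROM books WHERE id = ?", (book_id,)
        ).fetchone()
        self.assertEqual(row, ("The Paying Guest", "George Gissing"))
```

It can be run with python -m unittest. Because setUp builds a new in-memory database for every test, each test sees a clean slate without any mocking, which is most of what makes this style cheap to write.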

[3 comments] [Permanent link] [816 views]