Uploading to the IA

Fremantle

· archiving · Internet Archive · digitisation ·

I've been sorting out a better workflow for uploading to the Internet Archive. It's only just dawned on me that TIFF files are much better supported than PNGs, and so I'm now going to stick to those (actually I have been for quite a while now, but there are still lots of files that I scanned a few years ago that I'm getting sorted out).

If you upload a TIFF, a smaller JPEG gets derived with the same name (just with `.jpg` in place of the `.tif` or `.tiff` extension). I've not yet looked into what size it aims to generate, but the derived files seem to be about one megabyte.
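
As a rough sketch of that naming rule, the function below works out the JPEG name to expect for a given TIFF. The function name `derived_jpeg_name` is made up for illustration, and it only encodes the behaviour described above (same base name, `.jpg` extension), not anything taken from the Internet Archive's documentation.

```shell
# Hypothetical helper: print the JPEG filename expected for a given TIFF.
# Assumes the derivative keeps the base name and only swaps the extension.
derived_jpeg_name() {
    case "$1" in
        *.tiff) printf '%s.jpg\n' "${1%.tiff}" ;;
        *.tif)  printf '%s.jpg\n' "${1%.tif}" ;;
        *)      echo "not a TIFF: $1" >&2; return 1 ;;
    esac
}
```

Calling `derived_jpeg_name ArchiveName1234_front.tif` would give the name to put in the item file; anything that isn't a TIFF is rejected with a non-zero exit status.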

But the overall workflow needs to be more like:

  • Scan a few images (e.g. multiple pages of a letter, front and back) and give them meaningful names (spaces and other special characters are allowed, but usually best avoided; the actual item title is given separately).
  • An item accession number is assigned by looking at the previous highest (actually more normally this is by looking at the count of files in the items/ directory). This is confirmed to be unique and unused by making sure the items/1234.md file doesn't exist yet.
  • The item file is created, and given a title.
  • The files are uploaded to IA with a command like this:
    $ ia upload ArchiveName1234 -m mediatype:image ~/path/to/scans/ArchiveName1234*.tif
    Note that the mediatype doesn't actually have to be supplied, but if it's left out the item will end up as 'data', and that can't be changed later, so it's best to set it to 'image' (for photos) or 'texts' (for letters etc.).
  • The item file is updated with the ArchiveName1234*.jpg filenames. At the moment this is all a bit manual: there's no system that confirms that they're correct, nor one that makes sure there's a connection between the JPEG and the TIFF; I'll probably improve that at some point.
  • The IA item is updated with a URL pointing to the item's generated HTML file.
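
The accession-number step above could be sketched roughly as follows. This is only an illustration: the function name `next_accession` is hypothetical, it assumes item files are named `<number>.md` directly inside the items directory, and skipping forward when the candidate number is already taken is my own assumption rather than part of the workflow as described.

```shell
# Hypothetical helper: propose the next unused accession number.
# Usage: next_accession path/to/items
next_accession() {
    items_dir=$1
    count=0
    # Count the existing item files (the glob stays literal if none match,
    # hence the -e check).
    for f in "$items_dir"/*.md; do
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    next=$((count + 1))
    # Confirm the number is unused: move on while items/<next>.md exists.
    while [ -e "$items_dir/$next.md" ]; do
        next=$((next + 1))
    done
    echo "$next"
}

# Possible use with the upload command from the list above:
#   n=$(next_accession items)
#   ia upload "ArchiveName$n" -m mediatype:image ~/path/to/scans/"ArchiveName$n"*.tif
```

Counting files is quick but breaks down if an item file is ever deleted or numbered out of sequence, which is why the existence check is still worth keeping.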

The Internet Archive is only used here for files that cannot be uploaded to Wikimedia Commons. Commons is preferred because it's much nicer that anyone can help improve the metadata there. IA items can be edited only by the uploader, so they're somewhat immutable blobs (Flickr is similar, in this workflow). It does mean that non-commercial and orphan works can be uploaded though, and that's very useful.
