Penguin Classics portal on Wikisource

I’ve made a start on a system to pull data from Wikidata and generate a portal for the Penguin Classics, with appropriate links for those that are on Wikisource or are ready to be transcribed.

I’m a bit of a SPARQL newbie, so perhaps this could’ve been done in a single query. However, I’m doing it in two stages: first, gathering all the ‘works’ that have at least one edition published by Penguin Classics, and then finding all editions of each of those works and seeing if any of them are on Wikisource. Oh, and including the ones that aren’t, too!
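To make the two stages concrete, here is a minimal Python sketch of how they could be wired together. It is not the actual script: the function names, the User-Agent string, the choice of English Wikisource, and the use of sitelinks to detect a mainspace page are all assumptions made for illustration. The property and item IDs (P123, P629, P1957, Q11281443) are the ones used in the query further down.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
PENGUIN_CLASSICS = "Q11281443"  # publisher item, as used in the query below


def works_query(publisher_qid=PENGUIN_CLASSICS):
    """Stage 1: every work with at least one edition from the publisher."""
    return f"""
SELECT DISTINCT ?work WHERE {{
  ?edition wdt:P123 wd:{publisher_qid} ;
           wdt:P629 ?work .
}}"""


def editions_query(work_qid):
    """Stage 2: all editions of one work, with optional Wikisource links.

    Assumption: only English Wikisource sitelinks are of interest.
    """
    return f"""
SELECT ?edition ?wikisourcePage ?wikisourceIndex WHERE {{
  ?edition wdt:P629 wd:{work_qid} .
  OPTIONAL {{
    ?wikisourcePage schema:about ?edition ;
                    schema:isPartOf <https://en.wikisource.org/> .
  }}
  OPTIONAL {{ ?edition wdt:P1957 ?wikisourceIndex . }}
}}"""


def run_query(sparql):
    """Send a query to the Wikidata Query Service and return its result rows."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": sparql, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        # Hypothetical User-Agent; replace with something identifying you.
        headers={"User-Agent": "penguin-classics-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def qid(uri):
    """Turn an entity URI such as .../entity/Q42 into the bare ID Q42."""
    return uri.rsplit("/", 1)[-1]


def list_editions():
    """Chain both stages: yield (work QID, edition rows) pairs."""
    for row in run_query(works_query()):
        work = qid(row["work"]["value"])
        yield work, run_query(editions_query(work))
```

Calling `list_editions()` sends one request per work, which is simple but slow. Editions with no Wikisource link still come back, because both link patterns are wrapped in `OPTIONAL`.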

Wikidata:WikiProject Books sort of uses the FRBR model to represent primarily books and editions (‘editions’ being a combination of the manifestation and expression levels of FRBR; i.e. an edition realises and embodies a work). So most of the metadata we want exists at the ‘work’ level: title, author, date of first publication, genre, etc.

At the ‘edition’ level we look for a link to Wikisource (because a main-namespace item on Wikisource is an edition… although this gets messy; see below), and a link to the edition’s transcription project. Actually, we also look for these on the work itself, because often Wikidata has these properties there instead or as well — which is wrong.

Strictly speaking, the work metadata shouldn’t have anything about where the work is on Wikisource (either mainspace or Index file). The problem with adhering to this, however, is that doing so breaks interwiki links from Wikisource to Wikipedia. This is because a Wikipedia article is (almost always) about a work, and we want to link a top-level Wikisource mainspace page to that work, but the existing systems for doing this don’t allow for the intermediate step of going from Wikisource to the edition, then to the work, and then to Wikipedia.

So for now, my scruffy little script looks for project links at both levels, and seems to do so successfully.
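The ‘look at both levels’ logic can be sketched roughly as below, in Python. This is an illustration rather than the real script: the variable names (`editionIndex`, `workIndex`, `editionPage`, `workPage`) are hypothetical, the example row is made up, and it assumes the rows arrive in the standard SPARQL JSON results format, where a variable that was not bound is simply absent from the row.

```python
def value_of(row, name):
    """Return the plain value of one variable in a result row, or None."""
    cell = row.get(name)
    return cell["value"] if cell else None


def wikisource_links(row):
    """Prefer edition-level links, falling back to work-level ones."""
    return {
        "index": value_of(row, "editionIndex") or value_of(row, "workIndex"),
        "page": value_of(row, "editionPage") or value_of(row, "workPage"),
    }


# Made-up row: the Index link sits on the work, the mainspace page on the edition.
example_row = {
    "workIndex": {
        "type": "uri",
        "value": "https://en.wikisource.org/wiki/Index:Example.djvu",
    },
    "editionPage": {
        "type": "uri",
        "value": "https://en.wikisource.org/wiki/Example",
    },
}
links = wikisource_links(example_row)
```

Preferring the edition-level value keeps the output closer to the stricter model, while still picking up links that have only been recorded on the work.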

The main problem now is that there’s just not much data about these books on Wikidata! I’ll get working on that next…

Penguin Classics on Wikisource

As a way of learning SPARQL and more about Wikidata, I’m trying to make a list of which pre-1924 Penguin Classics are on Wikisource.

Penguin lists their books at penguin.com.au/browse/by-imprint/penguin-classics.

The following Wikidata Query Service query lists all editions published by Penguin Books or Penguin Classics, the date of original publication of the work each one belongs to, and whether there’s an Index page on Wikisource for the work or the edition.

SELECT ?edition ?editionLabel ?work ?workLabel ?originalPublicationDate ?wikisourceIndexForWork ?wikisourceIndexForEdition
WHERE
{
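  ?edition wdt:P31 wd:Q3331189 . # instance of: version, edition, or translation
  ?edition wdt:P577 ?publicationDate . # publication date of the edition
  ?edition wdt:P123 ?publisher . # publisher
  FILTER(
    ?publisher = wd:Q1336200 # Penguin Books Q1336200
    || ?publisher = wd:Q11281443 # Penguin Classics Q11281443
  )
  ?edition wdt:P629 ?work . # edition or translation of
  OPTIONAL{ ?work wdt:P577 ?originalPublicationDate } . # publication date of the work
  OPTIONAL{ ?work wdt:P1957 ?wikisourceIndexForWork } . # Wikisource index page, on the work
  OPTIONAL{ ?edition wdt:P1957 ?wikisourceIndexForEdition } . # Wikisource index page, on the edition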
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

I’m not sure how often the WDQS data is updated, but so far it hasn’t been very useful for on-the-fly checking of recent updates. I’m sure there’s a better way of doing that though.