WA Biographical Index

Fremantle

· Wikimedia · Western Australia · datasets · open data ·

I noticed the other day that the Western Australian Biographical Index is licensed under CC-BY, so I thought I'd try to copy relevant entries to Freopedia (and other things). I downloaded the 18 CSV files:

   A-final.csv  DE-final.csv  H-final.csv   L-final_edited.csv  O-final.csv   S-final.csv
   B-final.csv  F-final.csv   IJ-final.csv  M-final.csv         PQ-final.csv  T-final.csv
   C-final.csv  G-final.csv   K-final.csv   N-final.csv         R-final.csv   UVXYZ-final.csv

Combined them into one file, keeping the header row from the first file only and dropping it from all the others (after confirming that every file actually had one):

$ awk '(NR == 1) || (FNR > 1)' *.csv > wabi.csv
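
The header check mentioned above can be done from the shell as well. This is only a rough sketch of one way to do it, assuming the files sit in the current directory; the function name `distinct_headers` is made up for illustration and is not part of the original workflow.

```shell
# Print the distinct first lines of the given files, each with a count.
# If every file starts with the same header row, exactly one line is printed.
distinct_headers() {
    for f in "$@"; do
        head -n 1 "$f"
    done | sort | uniq -c
}

# Hypothetical usage, before combining:
#   distinct_headers *.csv
```

If more than one line comes back, the headers differ somewhere and the awk command would silently keep mismatched columns. Also worth noting: on a second run `wabi.csv` would itself match `*.csv`, so a narrower pattern or a different output directory avoids feeding the combined file back in.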

This was imported into OpenRefine, and resulted in 85,403 records.

Found duplicates by sorting, applying "reorder rows permanently", and then "edit cells" > "blank down". The blanks can then be faceted on, and 421 duplicates were found, e.g. PQ/P2626 (where the second here is the correct record):

POCOCK Ruth Elsie May b. 1900. m. N.S.W. 1928 Edwin Lennard MINCHIN

vs.

PLUSH Edward, son of Thomas Hall (artist). arr 18.3.1886 per Albany (steerage) from SA - listed as G. Plush. m. 1.1.1890 (Perth C/E) Amelia GOLDING, dtr. of William (gardener). PERTH painter. Joined the Police force 1886.

That meant there were 84,982 unique cards.
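
The same kind of duplicate check can be roughed out on the command line. This sketch assumes the card ID is in the first column of `wabi.csv`, is not quoted, and that no record spans more than one line; none of that is confirmed by the data, and the function name `duplicate_ids` is hypothetical. OpenRefine's sort and blank-down approach is the more robust one for real CSV.

```shell
# Print every value of the first CSV column that occurs more than once,
# skipping the header row. Only the first field is inspected, so commas
# inside later (quoted) fields do not matter here.
duplicate_ids() {
    awk -F',' '
        NR > 1 { seen[$1]++ }
        END    { for (id in seen) if (seen[id] > 1) print id }
    ' "$1" | sort
}

# Hypothetical usage:
#   duplicate_ids wabi.csv | wc -l
```

The arithmetic above is then just 85,403 records minus 421 duplicates, giving 84,982.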

These were imported to a Mix'n'Match catalogue: https://mix-n-match.toolforge.org/#/catalog/6490 For this, the card text had to be truncated.
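
For the truncation, something like the following would do. It is a sketch under assumptions: the data has been exported as tab-separated values with the ID in the first column and the card text in the second, and the default limit of 250 is a placeholder rather than Mix'n'Match's actual limit. The function name `truncate_cards` is made up.

```shell
# Shorten the second tab-separated field to at most `max` characters,
# ending with "..." when something was cut off.
# $1 = input file, $2 = max length (placeholder default of 250).
# Note: some awk implementations count bytes rather than characters,
# which matters for non-ASCII text.
truncate_cards() {
    awk -F'\t' -v OFS='\t' -v max="${2:-250}" '
        {
            if (length($2) > max) $2 = substr($2, 1, max - 3) "..."
            print
        }
    ' "$1"
}

# Hypothetical usage:
#   truncate_cards wabi.tsv 250 > wabi-truncated.tsv
```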

I proposed a new property on Wikidata, and it was approved and created a week or so after.

Now the task is to link items to the WABI, probably starting with any mentioning Fremantle.
