Nyunga words on Wiktionary

I’ve pretty much finished moving a set of ‘template’ Nyunga-language Wiktionary entries into my userspace on Wiktionary, from where they can be copied into mainspace. There are a few dramas with differing character sets between definitions in some of the word lists I’ve got, so a couple of letters are missing. There are plenty of entries there though, and mainly I’m interested now to see if this idea of copying, pasting, and then copy-editing these entries is going to be a sensible workflow.

I thought about bulk importing these directly into place, but the problem with that is (quite apart from the fact that none of these wordlists have machine-readable part-of-speech data) that almost all of them are going to need cleaning up and improving. For example, “kabain nin nana kulert” is in there as an entry. It means “perhaps someone ate it and went away”, and (I’m guessing) isn’t an idiom and so really oughtn’t have its own entry. It can however be used as a citation for every single one of its constituent words. That’s something that I think is best left up to a human, rather than forcing a human to clean up a bot’s mistakes. Or take “tandaban” which has a definition of “jump, to [9]” (and the square bracket references are throughout this dataset and are not explained anywhere that I’ve been able to find). This should just be translated as “jump” with a link to the English verb; again, a script could handle that, but the myriad incoming formats would take too much time to code for.
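To give a feel for the single “jump, to [9]” case, here is a minimal sketch in Python. It assumes only two things about the raw definitions: an optional trailing reference in square brackets, and an inverted “, to” marking a verb. The function names are made up for illustration, and every other format in the wordlists would need its own rule, which is exactly the cost described above.

```python
import re

# Assumption: a reference is a number in square brackets at the very end,
# as in "jump, to [9]". Its meaning is unknown, so it is only carried along.
_TRAILING_REF = re.compile(r"\s*\[(\d+)\]\s*$")


def normalise_definition(raw):
    """Split a raw wordlist definition into (gloss, part_of_speech, reference).

    part_of_speech is only a guess: "verb" when the gloss is written
    in the inverted "jump, to" style, otherwise None.
    """
    text = raw.strip()
    reference = None
    match = _TRAILING_REF.search(text)
    if match:
        reference = int(match.group(1))
        text = text[:match.start()]

    part_of_speech = None
    if text.endswith(", to"):
        text = text[: -len(", to")]
        part_of_speech = "verb"

    return text.strip(), part_of_speech, reference


def wikilink(gloss):
    """Wrap a gloss in wikitext link brackets, e.g. [[jump]]."""
    return "[[" + gloss + "]]"
```

Calling `normalise_definition("jump, to [9]")` gives the gloss “jump”, a guessed part of speech of “verb”, and the reference number 9, and `wikilink` turns the gloss into a link to the English entry. Anything that doesn’t match these two patterns comes back unchanged, ready for a human to look at.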

Maybe I’m just not being clever enough about preparing the data, and an import script, in a rich enough way. But that could take ages before this data ever sees the light of day on Wiktionary; the approach I’ve used means that it’s there now for anyone who wants to work with it. There are also so very many improvements that a human editor can make along the way that it seems we’ll have better data for fewer words… and that seems to be the correct trade-off. Wiktionary is a ‘forever’ project, after all!

Of course, the plan is to be able to extract the data after it’s been put in its proper place, and I’ve started work on a PHP library for doing just that. I’d rather do the code-work on that end of it, and put in the time for a human-mediated import at the beginning end.
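As a rough idea of what extraction at that end might involve, here is a minimal sketch, written in Python rather than as part of the PHP library, and not based on it. It assumes the usual Wiktionary entry layout: a level-two heading per language, deeper headings for parts of speech and other sections, and definition lines starting with “# ”. The function name and the sample wikitext are hypothetical, and real entries will have plenty of variations this doesn’t handle.

```python
import re

# Matches wikitext headings such as "==Nyunga==" or "===Verb===".
_HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def extract_definitions(wikitext, language="Nyunga"):
    """Return (section_heading, definition) pairs for one language section.

    Assumption: definitions are lines beginning with "# "; quotation and
    example lines ("#*", "#:") are skipped.
    """
    definitions = []
    in_language = False
    section = None
    for line in wikitext.splitlines():
        heading = _HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            title = heading.group(2)
            if level == 2:
                # A new language section starts here.
                in_language = title == language
                section = None
            elif in_language:
                # Remember the nearest sub-heading, e.g. "Verb".
                section = title
            continue
        if in_language and line.startswith("# "):
            definitions.append((section, line[2:].strip()))
    return definitions


sample = (
    "==English==\n===Noun===\n# something\n"
    "==Nyunga==\n===Verb===\n# [[jump]]\n#* a quotation line\n"
)
print(extract_definitions(sample))
```

The nearest sub-heading is kept alongside each definition because that’s where the part of speech lives once a human has tidied the entry up, which is the very information the original wordlists lack.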

All of this is a long-winded way of putting out there on the web, in this tiny way, an invitation for anyone to come and help see if this import is going to work at all! Will you help?

2016 begins

It’s 2016 and it seems like a good time to attempt some new type of explanation of things. Things in general, I mean, and things internety. Or, maybe not ‘explanation’ so much as formless rambling. That’s easier on the brain, given the amount of sleep I’ve been getting (i.e. sod all).

I’m four days into the new working year, and some good bits of code are already shaping up (file attachment fields and schema-editing in Tabulate, hopefully both ready to roll before too much longer). Some odd bits of enterprise bureaucracy have nearly fallen on my head but for the most part missed me (whereupon I’ve attempted the old I-didn’t-see-anything trick, and carried on regardless).

I had a couple of weeks off, and explored some great bits of the south west. So nice to be back at Wilyabrup (not climbing, just looking, and some mapping). And I didn’t even take my GPS to Walpole; good to be not attempting to Record Everything for a while.

Things for this year, perhaps: Wikisource proofreading; importing Nyunga words into Wiktionary; carry on with Tabulate; print CFB at long last; go to Wikimania; try to write every day; get MoonMoon working again properly for Planet Freo. But mostly: stop re-evaluating everything and just get on with what’s (reasonably and probably not perfectly) good enough and worthwhile. Code less! Work on content and data more; code only what’s required.