This is a project that I’ve wanted for years, and now it’s here:

Project GITenberg is a Free and Open, Collaborative, Trackable and Scriptable digital library. It leverages the power of the Git version control system and the collaborative potential of Github to make books more open.

40,000 Project Gutenberg books have been uploaded to GitHub, and can now be forked, fixed, and fed back to the world’s biggest library of public domain ebooks. Other alliteration is also possible.

I’ve just sent my first pull request, for a typo I found in Gissing’s The Paying Guest.

The only thing lacking now is the original scans of these books, so that the ebooks can be verified against the source.

forking wikis

I wish wikis were less collaborative! I wish they were more like software projects, where if one wants to modify anything, one gets one’s own copy and does anything at all to it.

No, I’m not really saying that there should be fewer centralised places of communal effort, these things are great… I just want a good way to branch and modify non-code content.

What I have in mind is a cross between the Internet Archive’s system for uploading content into their collections and GitHub’s user-centric arrangement.

The problem often seems to come back to the formats that things are in. It’s easy in the text-only code world, but wikis each have their own markup…

I wondered about using MediaWiki and pulling in remote articles by periodic synchronisation, but of course there’s no merging in that idea, so it doesn’t work. It’s what Printable WeRelate does, but I’ve yet to figure out how that’s going to deal with local additions to the data. Probably the pages will be kept quite separate, with links going only from the local-only content to the remote-synced pages, because we can’t modify the remote articles locally, and links in them wouldn’t make sense when they’re elsewhere.

So, there’s no solution: I’ll stick to centralised editing and storage, but carry on pulling backups (huzzah to WikiTeam).

Contributing to GitHub-hosted projects

Some projects provide information about how people should fork and contribute to them. This is my general approach (included here, obviously, for my own edification):

  1. Fork a project: GitHub clickity-click
  2. Clone it locally:
    git clone
  3. Add the upstream project:
    git remote add upstream
  4. Do not commit to the master branch; it is to be kept up-to-date with upstream master:
    git pull upstream master
  5. Create branches that solve one feature or issue each, named whatever:
    git branch new-branch-name master
  6. Create a ‘personal master’ named with your username:
    git branch username master
  7. Do not merge master into feature branches, rather rebase these on top of master:
    git rebase master new-branch-name
  8. Merge all personal feature branches into your personal master branch, so you’ve got a branch that represents all your development.
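The steps above can be tried end to end without touching GitHub. What follows is only a sketch under stated assumptions: local throwaway repositories stand in for the upstream project and for the fork, and every name in it (upstream, fork.git, work, fix-typo, username) is hypothetical. It also uses --ff-only on the pulls, which is my addition to make sure master never picks up local commits.

```shell
#!/bin/sh
# Sketch of the workflow above in a throwaway sandbox.
# Every name here (upstream, fork.git, work, fix-typo, username) is hypothetical.
set -e

# Identity for the demo commits, so no global git config is needed.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the upstream project, with one commit on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo "first" > upstream/a.txt
git -C upstream add a.txt
git -C upstream commit -q -m "Upstream: first commit"

# Step 1: "fork" it (a bare clone stands in for the GitHub fork).
git clone -q --bare upstream fork.git
# Step 2: clone the fork locally.
git clone -q fork.git work
cd work
# Step 3: add the upstream project as a second remote.
git remote add upstream ../upstream

# Upstream moves on while we are busy.
echo "second" > ../upstream/b.txt
git -C ../upstream add b.txt
git -C ../upstream commit -q -m "Upstream: second commit"

# Step 4: master only ever follows upstream master.
git checkout -q master
git pull -q --ff-only upstream master

# Steps 5 and 6: one branch per feature, plus a personal master.
git branch fix-typo master
git branch username master

# Do some work on the feature branch.
git checkout -q fix-typo
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix a typo"

# Upstream moves on again; bring master up to date.
echo "third" > ../upstream/c.txt
git -C ../upstream add c.txt
git -C ../upstream commit -q -m "Upstream: third commit"
git checkout -q master
git pull -q --ff-only upstream master

# Step 7: replay the feature branch on top of the updated master.
git rebase -q master fix-typo

# Step 8: collect the feature branches in the personal master.
git checkout -q username
git merge -q fix-typo

# Report how far the feature branch is from master after the rebase.
ahead=$(git rev-list --count master..fix-typo)
behind=$(git rev-list --count fix-typo..master)
echo "fix-typo is $ahead ahead of and $behind behind master"
```

Because the feature branch is rebased rather than merged into, it stays a clean run of commits on top of upstream master, which is what makes it easy to push back upstream.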

My main goal is to create discrete branches, based on the upstream master, for features that I want to push back upstream.

(No doubt I’m missing obvious things, and any git-geek will see instantly the gaps in my knowledge.)


To combine the last three commits (and write a new commit message):

git reset --soft HEAD~3
git commit
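To see what those two commands do, here is a small sketch that builds a scratch repository with four commits and then squashes the last three. The file names and commit messages are made up for the demonstration, and it passes -m to git commit only so that it runs without opening an editor.

```shell
#!/bin/sh
# Sketch: squash the last three commits in a scratch repository.
# File names and commit messages are hypothetical.
set -e

# Identity for the demo commits, so no global git config is needed.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Four small commits, one new file each.
for n in 1 2 3 4; do
    echo "change $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "Commit $n"
done
before=$(git rev-list --count HEAD)

# Move the branch back three commits; the index and the working tree
# keep everything those commits changed, staged and ready to commit.
git reset -q --soft HEAD~3
git commit -q -m "Commits 2 to 4, combined"
after=$(git rev-list --count HEAD)

echo "commits before: $before, after: $after"
```

This rewrites history, so it is best kept to commits that haven’t been pushed anywhere yet.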