Importing to Piwigo

Piwigo is pretty good!

I mean, I mostly use Flickr at the moment, because it is quick, easy to recommend to people, and allows photos to be added to Trove. But I’d rather host things myself. Far easier for backups, and so nice to know that if the software doesn’t do a thing then there’s a possibility of modifying it.

To bulk import into Piwigo one must first rsync all photos into the galleries/ directory. Then, rename them all to not have any unwanted characters (such as spaces or accented characters). To do this, first have a look at the files that will fail:

find . -regex '.*[^a-zA-Z0-9_./-].*'

(The regex is determined by $conf['sync_chars_regex'] in include/config_default.inc.php which defaults to ^[a-zA-Z0-9-_.]+$.)
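For anyone who would rather do this check from a script, here is a minimal Python sketch of the same idea. The function names are made up for illustration, and the pattern is my own spelling of the default character set quoted above (hyphen moved to the end of the class); if you have changed $conf['sync_chars_regex'], the pattern would need changing to match.

```python
import os
import re

# Same character set as the default quoted above, with the hyphen moved last
# so it is unambiguously a literal. Assumes the Piwigo default is in use.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def is_sync_safe(name):
    """True if a single file or directory name uses only allowed characters."""
    return ALLOWED_NAME.fullmatch(name) is not None


def offending_paths(root):
    """List every path under root whose own name (not its parents') fails."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_sync_safe(name):
                bad.append(os.path.join(dirpath, name))
    return sorted(bad)


if __name__ == "__main__":
    # Hypothetical usage: run from inside the galleries/ directory.
    for path in offending_paths("."):
        print(path)
```

Checking each name on its own, rather than the whole path, sidesteps the question of whether the directory separator counts as an allowed character.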

Then you can rename the offending files (replace unwanted characters with underscores) by extending the above command with an exec option:

find . -regex '.*[^a-zA-Z0-9_./-].*' -exec rename -v -n "s/[^a-zA-Z0-9\-\._\/]/_/g" {} \;

(I previously used a more complicated for-loop for this, that didn’t handle directories.)

Once this command is showing what you expect, remove the -n (“no action”) switch and run it for real. Note also that the second regex includes the forward slash, to not replace directory separators. And don’t worry about it overwriting files whose normalized names match; rename will complain if that happens (unless you pass the --force option).
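If rename isn't available, the same dry-run-then-rename routine can be sketched in Python. This is only an illustration, not what I actually ran: the function names are invented, and it differs from the one-liner in two ways that are my own choices. It renames one path component at a time, deepest first, so that a renamed directory never invalidates the paths of things inside it, and it skips (rather than overwrites) anything whose normalized name is already taken.

```python
import os
import re

# Anything outside the allowed set becomes an underscore (assumes the
# default Piwigo character set discussed above).
UNWANTED = re.compile(r"[^A-Za-z0-9_.-]")


def normalize(name):
    """Replace every unwanted character in a single name with '_'."""
    return UNWANTED.sub("_", name)


def plan_renames(root):
    """Return (old, new) path pairs, deepest paths first."""
    plan = []
    # topdown=False yields the contents of a directory before the
    # directory itself appears in its parent's listing.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = normalize(name)
            if new != name:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, new)))
    return plan


def apply_renames(root, dry_run=True):
    """Print each rename; only touch the disk when dry_run is False."""
    taken = set()
    done = []
    for old, new in plan_renames(root):
        if new in taken or os.path.exists(new):
            print(f"skipping {old}: {new} already exists")
            continue
        taken.add(new)
        print(f"{old} -> {new}")
        if not dry_run:
            os.rename(old, new)
        done.append((old, new))
    return done


if __name__ == "__main__":
    apply_renames(".", dry_run=True)  # switch to False once the output looks right
```

The dry_run flag plays the part of rename's -n switch: look at the output first, then run it for real.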

Once all the names are normalized, use the built-in synchronization feature to update Piwigo’s database.

At this point, all photos should be visible in your albums, but there is one last step to take before all is done, for maximum Piwigo-grooviness. This is to use the Virtualize plugin to turn all of these ‘physical’ photos into ‘virtual’ ones (so they can be added to multiple albums etc.). This plugin comes with a warning to ensure that your database is backed up etc. but personally I’ve used it dozens of times on quite large sets of files and never had any trouble. It seems that even if it runs out of memory and crashes halfway, it doesn’t leave anything in an unstable state (of course, you shouldn’t take my word for it…).

Deleting files with special characters in their names, in Windows

A couple of directories in Windows couldn’t be deleted by Windows Explorer, because they had unprintable characters (I’m assuming) in their names.

D:\tmp>dir
 Volume in drive D is Data
 Volume Serial Number is 8C47-34BD

 Directory of D:\tmp

28/09/2012  11:34 AM    <DIR>          .
28/09/2012  11:34 AM    <DIR>          ..
26/10/2010  01:51 PM    <DIR>          954321.
               0 File(s)              0 bytes
               3 Dir(s)  89,164,262,548 bytes free

On hitting Delete, Explorer replied “Could not find this item. This is no longer located in D:\tmp”. I tried on the command line and got a similar error:

D:\tmp>rd 954321.
The system cannot find the file specified.

The security properties of the folder looked weird, saying “The requested security information is either unavailable or can’t be displayed.”:

A screenshot of the top part of the properties dialog, showing the Security tab.

So I faffed around trying to change ownership, filenames, etc. all with no luck. Nothing seemed to see these files as existing except for Windows Explorer and ls -force.

In the end Superuser came to the rescue, as it often does, with the suggestion of referring to the file by its shortname, which can be got via dir /x.

D:\tmp>rd /s 954321~1
954321~1, Are you sure (Y/N)? y

Agh. Why are the simple things so hard to remember sometimes?…
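For scripted clean-ups there is another approach that is sometimes suggested for names like this, different from the short-name trick above: prefixing the absolute path with \\?\, which asks Windows to skip its usual path normalization (the step that drops a trailing dot). The Python sketch below is untested against the directory in this post and assumes the trailing dot visible in the listing is the real culprit; the function names are invented, and the deletion itself can only work on Windows.

```python
import ntpath
import shutil


def extended_path(path):
    # Build the \\?\ form of an absolute drive path such as D:\tmp\name.
    # UNC shares and already-prefixed paths are deliberately rejected to
    # keep the sketch small.
    drive, rest = ntpath.splitdrive(path)
    if len(drive) != 2 or drive[1] != ":" or not rest.startswith("\\"):
        raise ValueError("expected an absolute drive path such as D:\\tmp\\name")
    return "\\\\?\\" + path


def remove_stubborn_dir(path):
    # Recursively delete the directory via its un-normalized path.
    # Windows only; there is no confirmation prompt, so be careful.
    shutil.rmtree(extended_path(path))


if __name__ == "__main__":
    # Only prints the path that would be used; nothing is deleted here.
    print(extended_path("D:\\tmp\\954321."))
```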

On What Gets Kept, and Changing How Over Time

“Make things that can be archived (databases cannot be, not if you don’t also store the application that reads them). Make it possible to change one’s data structures (the ways in which things are stored — not the file formats, so much), and leave old data alone. To update, copy and morph; don’t try to force everything into the new system. Files are good for this; their formats should be standardised though, of course.”

Getting sorted for the new year, in the Mocca Lounge

A new cafe, on the way home from a ride this morning: the Mocca Lounge, it seems to be called. I guess they mean brown and not quite one thing nor another, but at least relaxing. It’s a reasonable place to sit for a while and read a book. It’s an inside cafe with no windows (can you believe such a thing?!), but it is at least dim and carpeted and large and mostly empty, which are good things. And I’ve a coffee and a book and time, which are also good things.

So, three cheers for all that, then.

I’ve been sorting out a new filesystem nomenclature, these last few months…

  1. The top level (my home directory, /home/sam) contains one directory per year and ~/tmp, and a pile of other stuff, as usual, but that’s all maintained by various programmes and the OS.

    ~/
        1995/
        1996/
        …
        2011/
        2012/
        tmp/
    
  2. Each year has only a single level below it, topically- and old-fashionedly-named to maintain alphabetical sorting:

        2011/
            Subject, clarification/
            Subject, another aspect of it/
            Another subject/
            Again, something else/
    

    There are no files at that level, only directories.

  3. At the turn of the year, items which are of continuing activity are moved to the new year. All else stays put. This means that the current year only ever contains things that are useful, while whatever is old but still needs to be kept (and will rarely be looked at) disappears out of sight into the old years.

    I’ve always found it annoying that computer organisation systems don’t allow things to moulder away in boxes in sheds (as it were), instead forcing everything to be current and visible, and thus liable to be thrown away once no longer useful. A core part of my archival system is to hide things from my own penchant for disposal.

  4. Within each item, and within the tmp directory, there is no prescribed ordering. Files take whatever names and arrangements seem suitable.

  5. File and directory names contain whatever characters they want, with the exception of quotation marks, slashes, colons, asterisks, octothorpes, and anything else I think is likely to be annoying in scripts, moving between filesystems, or other filename handling.

  6. There is no rule six. :-)
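A scheme like this is easy to let slip, so here is a rough Python sketch of a checker for rules 2 and 5. It is an illustration only: the function names are invented, the exact list of "annoying" characters is my guess at a reasonable reading of rule 5 (both kinds of quotation mark, both kinds of slash, colon, asterisk, octothorpe), and it only inspects the level directly below each year, leaving tmp/ and everything inside an item alone.

```python
import os
import re

# Assumed reading of rule 5; adjust to taste.
FORBIDDEN = re.compile(r"""["'/\\:*#]""")
YEAR = re.compile(r"[0-9]{4}")


def name_problems(name):
    """Return the sorted, de-duplicated forbidden characters found in name."""
    return sorted(set(FORBIDDEN.findall(name)))


def check_home(home):
    """Report stray files and awkward names directly under each year."""
    problems = []
    for year in sorted(os.listdir(home)):
        year_path = os.path.join(home, year)
        if not (YEAR.fullmatch(year) and os.path.isdir(year_path)):
            continue  # tmp/ and the OS's own clutter are outside the scheme
        for item in sorted(os.listdir(year_path)):
            item_path = os.path.join(year_path, item)
            if not os.path.isdir(item_path):
                problems.append(f"{item_path}: file directly under a year")
            bad = "".join(name_problems(item))
            if bad:
                problems.append(f"{item_path}: annoying characters {bad}")
    return problems


if __name__ == "__main__":
    for line in check_home(os.path.expanduser("~")):
        print(line)
```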