Loading spatial data into MySQL with LOAD DATA INFILE

Just a note for my future reference: importing an Excel CSV into MySQL. The WKT column has been constructed by hand to be POINT(lng lat) and the CSV contains headers.

LOAD DATA INFILE '/full_path/to/file-on-server.csv'
REPLACE
INTO TABLE the_table
COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
IGNORE 1 ROWS
(name,description,@geographic_location)
SET geographic_location = GeomFromText(@geographic_location)
;

Joining DB records to the ‘highest’ of their related (1:m) data

I often have cause to query a database table and join it with a subtable of related data (i.e. a table with which this one has a one-to-many relationship), and to retrieve only the highest (or lowest) of related records (based on some sortable field; often a date). Usually in MySQL, and often to be used in a view and so sub-queries are not an option.

The key is to

  1. Outer join with the subtable twice (well, the first join can be inner, if a subrecord result is required);
  2. add, to the second join condition, the constraint that this second-subtable’s sorting field must be greater than its first-subtable counterpart; and
  3. add a where condition that the second-subtable’s primary key be null.

Why this is so hard to remember, I do not know.

For example, for these two tables

records table
id name
1 Record One
2 Record Two
3 Record Three
subrecords table
id record_id date
1 1 2013-01-10
2 1 2013-01-09
3 1 2013-01-05
4 3 2013-01-10

the SQL is

SELECT *
FROM records r
    LEFT JOIN subrecords s1 ON (s1.record_id = r.id)
    LEFT JOIN subrecords s2 ON (s2.record_id = r.id AND s2.date > s1.date)
WHERE s2.id IS NULL
ORDER BY r.id

to give a result of

r s1 s2
id name id record_id date id record_id date
1 Record One 1 1 2013-01-10 NULL NULL NULL
2 Record Two NULL NULL NULL NULL NULL NULL
3 Record Three 4 3 2013-01-10 NULL NULL NULL

Of course the list of selected columns needs to be changed.

How to kill WordPress

(Other than mis-capitalising its ‘P’, that is.)

I have had a wordpress site (see, I’m failing even to give it a capital ‘W’ now) for many years, since about… umm… November 3rd 2003 at 16:01:23 or thereabouts. I’m finally sick of it. It’s grown and grown and is trying so hard to be everything to every blogger out there, that I don’t know where I stand with it. It used to be fun, y’know?! A codebase I could fiddle with, and make do whatever I wanted. Now, I just find it very good for doing things that I don’t need to do.

So, I’m dumping it. No more WordPress.

Now the question is: how to migrate away from it? The important thing (although, really, I’m not actually that fussed about it; it’s more a pride thing — a web geek hardly wants to go against Tim BL’s advice, does he?) is to preserve URIs, at least the important ones.

So I started by making a final backup — all files, WP core included, and the database dump — and moving that tarball out of my usual backup rotation. So I’ve got a snapshot of the site, that will never fall off the far end of my backups. You never know (to quote Duane Dibbley).

Then, I inserted all of the WP posts into my new system’s database::

INSERT INTO journal_entries (id, title, date_and_time, entry_text)
  SELECT id+1000, post_title, post_date, post_content
    FROM wp_posts
    WHERE post_type = 'post' AND post_status = 'publish'
    ORDER BY post_date ASC

The +1000 on the ID was to ensure that I could refer to the new IDs of the imported posts in the next section, the redirections (there were fewer than 1000 records already in the journal_entries table)::

  SELECT CONCAT(
    'Redirect permanent /',
    YEAR(post_date),'/',
    LPAD(MONTH(post_date),2,'0'),'/',
    LPAD(DAY(post_date),2,'0'),'/',
    post_name,' ',
    'http://samwilson.id.au/journal/view/',
    id+1000
  ) AS redirection
  FROM wp_posts
    WHERE post_type = 'post' AND post_status = 'publish'
    ORDER BY post_date ASC

And I dumped all that into .htaccess.

Finding all date ranges (which may be open-ended) that overlap with a given range

Given a database table listing events and their date ranges: events { id, start_date, end_date, }, where either or both of the dates can be null, how is one to find all of the events that fall within (even partially) a given date range? (This is pretty much what Kieran Benton asked on Stack Overflow, with the addition of the nullability.)

There are twelve possibilities for ranges with respect to the given range A-B:


Note that the gradients indicate null start or end dates. PNG
SVG

The red ranges are the ones that should be included in the result; the orange ones should be omitted. It’s easier to query for the smaller set, which satisfy the following conditions:







































































> A < A > B < B NULL
Start Date x


End Date
x


Start Date



x
End Date
x


Start Date

x

End Date

x

Start Date

x

End Date



x

Which corresponds to the following SQL:

SELECT * FROM events
WHERE NOT (
  (start_date IS NOT NULL AND start_date < :A AND end_date IS NOT NULL AND end_date < :A)
  OR (start_date IS NOT NULL AND start_date > :B AND end_date IS NOT NULL AND end_date > :B)
  OR (start_date IS NULL AND end_date IS NOT NULL AND end_date < :A)
  OR (start_date IS NOT NULL AND start_date > :B and end_date IS NULL)
)