I've worked out what the problem is: how Lua handles Unicode/non-ANSI strings.
Given a cache with the name
name = "Église Saint-Germain-l'Auxerrois" (GC2FNN9)
If you do either
firstLetter = StringByte(name, 1)
or
_, _, firstLetter = StringFind(name, "^(.)")
then you would expect
firstLetter == "É" to be true
In fact the result is false. The reason is that the name is stored as a UTF-8 byte stream, and É (U+00C9) is encoded as 2 bytes (C3 89). Byte and Find, however, treat name as a string of 8-bit bytes and return the first byte, not the first UTF-8 character. When I then Print(firstLetter), the output is a lone UTF-8 lead byte, which is not a valid UTF-8 sequence on its own and presumably breaks string conversion/validation somewhere further down the chain.
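A minimal sketch of the difference, and one way around it, follows. It assumes the string is valid UTF-8 and uses the standard Lua 5.1+ functions string.byte and string.match rather than the StringByte/StringFind calls above. The helper name firstUtf8Char is hypothetical. The pattern matches one UTF-8 sequence: a lead byte (ASCII 0x01-0x7F or 0xC2-0xF4) followed by any continuation bytes (0x80-0xBF).

```lua
-- Sketch only: assumes valid UTF-8 input and plain Lua 5.1 or later.
-- Uses standard string.byte / string.match, not StringByte / StringFind.

-- "Église..." written with explicit byte escapes: É is U+00C9 = 0xC3 0x89.
local name = "\195\137glise Saint-Germain-l'Auxerrois"

-- Byte-oriented calls only see the first byte of the two-byte sequence.
print(string.byte(name, 1))  -- 195, i.e. 0xC3, the lead byte on its own
print(#"\195\137")           -- 2: É occupies two bytes

-- One UTF-8 character: a lead byte, then zero or more continuation bytes.
local UTF8_CHAR = "^[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: returns the first UTF-8 character of s (or nil).
local function firstUtf8Char(s)
  return string.match(s, UTF8_CHAR)
end

local firstLetter = firstUtf8Char(name)
print(firstLetter == "\195\137")  -- true: both bytes of É are returned
```

The same pattern without the leading ^ can be passed to string.gmatch to walk a string one character at a time instead of one byte at a time.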
There's a bit more detail here:
http://lua-users.org/wiki/LuaUnicode