[mkgmap-dev] Copyright & License file reader improvements
From Marko Mäkelä marko.makela at iki.fi on Tue Dec 27 20:29:48 GMT 2016
On Tue, Dec 27, 2016 at 06:07:11PM -0000, Mike Baggaley wrote:
>Hi Gerd, please find attached a small patch that improves the loading of
>copyright and license data when the --copyright-file and --license-file
>options are used. It will attempt to load the data using ANSI, UTF-8, UTF-16
>and the default code page. If it fails, more information is provided as to
>the reason why.

I am not Gerd, and I am not that active with mkgmap any more, but I have
some interest in character encodings. I had a quick look at the patch.

It first tries ASCII (which is a proper subset of UTF-8), then UTF-8,
UTF-16 and the default code page.

I do not think that there is any need to try ASCII separately. Any valid
ASCII input is also valid UTF-8.

If the input is not valid UTF-8, things get tricky. I am not sure if UTF-16
is a good thing to try. Here is an example where 6 ASCII characters (which
could be part of a non-ASCII, non-UTF-8 input) get misinterpreted as 3
Chinese glyphs in UTF-16:

$ echo -n foobar|recode utf16..utf8;echo
景潢慲

Because of this, I would omit the UTF-16 pass altogether. If UTF-16 input is
truly needed, the default code page could be set to it.

Also, some non-UTF-8 superset of ASCII could accidentally look like valid
UTF-8. For example, the bytes 0xc2 0xa0 could represent the two-character
string U+00C2 U+00A0 in ISO 8859-1. But the same bytes could also be
interpreted as the single UTF-8 encoded character U+00A0.

I think that if multiple input formats are supported (which would be against
the Unix philosophy of keeping programs simple), the selection must be
explicit, by some command line switch that chooses to use the default code
page instead of UTF-8.

In my opinion, the current code is good as it is. Because mkgmap already
deals with mostly UTF-8 input (the OSM data), I think it is consistent to
assume that all text files are encoded in UTF-8.

Best regards,

	Marko
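The two pitfalls Marko describes can be reproduced in a few lines of Java (the language mkgmap is written in). This is only a minimal sketch: the class `EncodingPitfalls` and the helper `asciiMisreadAsUtf16` are hypothetical names, not part of mkgmap, and the sketch assumes (consistent with the `recode` output in the message) that BOM-less UTF-16 is read as big-endian.

```java
import java.nio.charset.StandardCharsets;

public class EncodingPitfalls {
    // Hypothetical helper: encodes ASCII text to bytes, then (wrongly)
    // decodes those same bytes as big-endian UTF-16.
    static String asciiMisreadAsUtf16(String asciiText) {
        byte[] bytes = asciiText.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Each pair of ASCII bytes becomes one 16-bit code unit:
        // "fo" -> U+666F, "ob" -> U+6F62, "ar" -> U+6172.
        String misread = asciiMisreadAsUtf16("foobar");
        System.out.println(misread.equals("\u666F\u6F62\u6172")); // true

        // The same two bytes are valid in both encodings but mean different things.
        byte[] ambiguous = {(byte) 0xC2, (byte) 0xA0};
        String latin1 = new String(ambiguous, StandardCharsets.ISO_8859_1); // U+00C2 U+00A0
        String utf8 = new String(ambiguous, StandardCharsets.UTF_8);        // U+00A0
        System.out.println(latin1.length() + " chars as ISO 8859-1, "
                + utf8.length() + " char as UTF-8");
    }
}
```

Running `main` prints `true` followed by `2 chars as ISO 8859-1, 1 char as UTF-8`. Neither decode reports an error, which is why "try encodings until one succeeds" cannot tell these cases apart.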
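Marko's recommendation (assume UTF-8, and report invalid input instead of guessing, with any other encoding chosen explicitly) might look roughly like the following Java sketch. It is not mkgmap's actual reader: `StrictTextReader`, `decodeStrict` and `readTextFile` are hypothetical names, and the `Charset` parameter merely stands in for the explicit command-line switch he proposes, which mkgmap does not necessarily have.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StrictTextReader {
    // Decodes bytes with the given charset, throwing on malformed or
    // unmappable input instead of silently substituting U+FFFD.
    public static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Reads a whole text file in exactly one charset (UTF-8 unless the caller
    // explicitly chooses another); no fallback chain of guesses.
    public static String readTextFile(Path path, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        try {
            return decodeStrict(bytes, charset);
        } catch (CharacterCodingException e) {
            throw new IOException(path + " is not valid " + charset.name() + ": " + e, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The text "cafe!" with an e-acute, encoded in ISO 8859-1 (0xE9 = e-acute).
        byte[] latin1Cafe = {'c', 'a', 'f', (byte) 0xE9, '!'};
        System.out.println(decodeStrict(latin1Cafe, StandardCharsets.ISO_8859_1).length()); // 5

        try {
            // 0xE9 starts a 3-byte UTF-8 sequence, but '!' is not a continuation byte.
            decodeStrict(latin1Cafe, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }

        if (args.length > 0) {
            System.out.println(readTextFile(Paths.get(args[0]), StandardCharsets.UTF_8));
        }
    }
}
```

A strict `CharsetDecoder` configured with `CodingErrorAction.REPORT` is used rather than `new String(bytes, charset)`, because the `String` constructor always replaces invalid sequences with U+FFFD and would hide exactly the encoding mistakes a better error message is meant to expose. Plain ASCII input passes through the UTF-8 decoder unchanged, so no separate ASCII pass is needed.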