[mkgmap-dev] character repertoires
From Steve Ratcliffe steve at parabola.me.uk on Mon Feb 25 20:03:40 GMT 2013
Hi

> It actually is the CP1252 superset of ISO 8859-1, I see the printable
> characters in the range 128–159 (which ISO 8859 reserves for a second
> set of control characters).

Your observations neatly illustrate the way that the code works.
This is the current algorithm:

1a. if ascii (no code-page): all characters > 0x7f are transliterated
    into ascii characters.
1b. if code-page=1252: all characters > 0xff are transliterated
    into latin1 characters.
1c. all other code pages: no transliteration.

2. Create a character set name by prepending "cp" to the code-page
   (eg. cp1252).

3. Use the standard java character set conversion with that name to
   convert the result of step 1. Any character that cannot be
   converted is replaced with a '?' symbol. This may possibly vary
   with java version and platform.

That explains most of the observations I think.

U+2021 is transliterated to ++ for 1252, but not for any other 125x.
Same for the Euro symbol to Eu.

> The micro sign U+00B5 μ becomes a ? on most code page maps, except for
> the Greek one, even though it is at the same position in all code pages.

U+00b5 is upper cased to GREEK CAPITAL LETTER MU, which is only
present in the Greek code page.

> And in the Arabic map’s upper half, the latin based characters show up
> as “?”.

That's because only lower case characters are included.

> Another peculiar thing: while the Garmin does its usual weird
> upper/lower casing, TWO LABELS ARE ALL CAPS, namely those containing
> the ª feminine and º masculine ordinal indicators.

I don't know about this. Possibly a device thing?

> Asian:
> A map with CP1258 shows up with totally unlabeled streets, not even
> anything from the ASCII range.

Strange - are labels correct in the file? If you run strings on the
img do you see the ascii labels? If so then it is a device thing.
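
Steps 2 and 3 of the algorithm, together with the micro sign behaviour, can be reproduced with plain JDK calls. The following is only a rough sketch and not the mkgmap source: the class name `LabelEncodingSketch` and the helper `roundTrip` are made up for illustration, and it assumes the JDK in use knows the "cp" + code-page aliases (cp1252 and cp1253 here).

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class LabelEncodingSketch {

    // Step 2: build the charset name by prepending "cp" to the code page.
    // Step 3: encode with the standard Java conversion. String.getBytes(Charset)
    // substitutes the charset's replacement byte for unmappable characters,
    // which is '?' for the single-byte Windows code pages used here.
    // Decoding again makes the result easy to inspect as a String.
    static String roundTrip(String label, int codePage) {
        Charset cs = Charset.forName("cp" + codePage);
        byte[] encoded = label.getBytes(cs);
        return new String(encoded, cs);
    }

    public static void main(String[] args) {
        // Upper casing MICRO SIGN (U+00B5) yields GREEK CAPITAL LETTER MU (U+039C).
        String upper = "\u00b5".toUpperCase(Locale.ROOT);
        System.out.printf("upper-cased micro sign: U+%04X%n", (int) upper.charAt(0));

        // U+039C has no slot in cp1252, but it does in the Greek page cp1253.
        System.out.printf("cp1252: U+%04X%n", (int) roundTrip(upper, 1252).charAt(0));
        System.out.printf("cp1253: U+%04X%n", (int) roundTrip(upper, 1253).charAt(0));
    }
}
```

The micro sign itself sits at 0xB5 in both pages; it is the upper casing before conversion that moves it to a character only the Greek page can hold.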

So currently ascii and 1252 are better than the other code pages
since just about every unicode character can be represented, whereas
in the other code pages you are limited to characters from that page.

It looks possible to fix this by removing the transliteration step
from where it is and only using it when a character that is
un-mappable into the target code page is encountered.

..Steve
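
One way the fix suggested in the message could look is sketched here: each character is first tested against the target code page, and transliteration is consulted only for the unmappable ones. This is an assumption about the shape of such a change, not mkgmap code. The class `FallbackTransliterationSketch`, the method `prepare` and the tiny table are hypothetical; the "++" and "Eu" replacements are the ones mentioned in the message, while the entry for U+0142 is invented purely so the fallback has something to do on cp1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Map;

public class FallbackTransliterationSketch {

    // Hypothetical stand-in for a real transliteration table.
    private static final Map<Character, String> TABLE = Map.of(
            '\u2021', "++",   // DOUBLE DAGGER, replacement taken from the message
            '\u20ac', "Eu",   // EURO SIGN, replacement taken from the message
            '\u0142', "l");   // LATIN SMALL LETTER L WITH STROKE, invented for the demo

    // Keep every character the code page can represent; transliterate only
    // the unmappable ones, and fall back to '?' when the table has no entry.
    // Works char by char, so characters outside the BMP end up as '?'.
    static String prepare(String label, int codePage) {
        CharsetEncoder encoder = Charset.forName("cp" + codePage).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                out.append(TABLE.getOrDefault(c, "?"));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0142 is missing from cp1252 but present in cp1250.
        System.out.println(prepare("Z\u0142oty", 1252).equals("Zloty"));
        System.out.println(prepare("Z\u0142oty", 1250).equals("Z\u0142oty"));
    }
}
```

With this ordering the Euro sign would stay a real Euro sign on any page that contains it, and would only fall back to "Eu" on a page that does not.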