[mkgmap-dev] patched polish file charset and multipolygon handling
From Steve Ratcliffe steve at parabola.me.uk on Mon Feb 21 22:46:37 GMT 2011
On 21/02/11 12:22, Kolesár András wrote:

Hello,

Welcome to the list.

> Finally, I have modified READING_CHARSET in
> mkgmap/reader/polish/PolishMapDataSource.java from "UTF-8" to
> "ISO-8859-2" and accented characters started to work. The used config

Yes, you are correct.

The way it was meant to work was that, since you don't know the codepage before reading the file, you always read the file as iso-8859-1. When you save a label, you recover the bytes from the string that you have read (it was read incorrectly because the character set is different, but you can always recover the actual bytes that were in the file) and decode them into Unicode using the correct charset. The recode() method does this.

But then READING_CHARSET was changed to utf-8 to deal with a commonly found kind of file, and the recode() method only works properly if READING_CHARSET is iso-8859-1 (or a similar 8-bit-only charset).

The change to utf-8 was made, I believe, because there are files that do not contain a CodePage and have the strings in utf-8 (produced by osm2mp). I've never used cgpsmapper, so I don't know if there is a standard way to say that the file is in utf-8 for this case.

So I guess we should change READING_CHARSET back to iso-8859-1 and find some other way to deal with utf-8 files, if that is still an important use case.

Best wishes

..Steve
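
To illustrate the read-as-iso-8859-1-then-recode idea described above, here is a minimal sketch. It is not mkgmap's actual recode() code: the class name RecodeSketch, the method signature and the sample byte are assumptions made only for this example. It shows that a string decoded as iso-8859-1 can be turned back into the original file bytes and decoded again with the real codepage, while a string decoded as utf-8 cannot.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the mkgmap implementation.
class RecodeSketch {

    // 'misread' was decoded as ISO-8859-1 before the real codepage was known.
    // ISO-8859-1 maps every byte 0x00-0xFF to the code point of the same value,
    // so encoding it back returns exactly the bytes that were in the file.
    static String recode(String misread, Charset actual) {
        byte[] raw = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");

        // 0xF5 is U+0151 (o with double acute) in ISO-8859-2.
        byte[] fileBytes = {(byte) 0xF5};

        // Lossless path: read as ISO-8859-1, then recode with the real charset.
        String misread = new String(fileBytes, StandardCharsets.ISO_8859_1);
        String label = recode(misread, latin2);
        System.out.println(label.equals("\u0151")); // true

        // Lossy path: a lone 0xF5 is not valid UTF-8, so the decoder replaces it
        // with U+FFFD and the original byte can no longer be recovered.
        String viaUtf8 = new String(fileBytes, StandardCharsets.UTF_8);
        byte[] back = viaUtf8.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(fileBytes, back)); // false
    }
}
```

The second half of main shows why recode() stops working once READING_CHARSET is utf-8: the bytes of an 8-bit codepage are destroyed by the utf-8 decoder before recode() gets a chance to see them.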