[mkgmap-dev] TYP files and character encoding
From Ticker Berkin rwb-mkgmap at jagit.co.uk on Sat Dec 21 16:11:32 GMT 2019
Hi Gerd

Attached is a patch that:

Doesn't use the 'CodePage=' command in the typ-file to determine the
output character encoding of the typ-file; rather, it uses the main map
encoding from the --code-page argument.

log.warn's any typ labels that can't be encoded in the --code-page,
rather than just giving up with a message like:
> TYP file cannot be written in code page 1252

The message:
> WARNING: SortCode in TYP txt file different from command line setting
that was written directly to System.out is changed to a log.warn, and it
shouldn't happen anyway now.

For the moment, the 'CodePage=' command in the typ-file is, under some
circumstances, used to determine the encoding of the typ-file itself, and
I've left this alone for compatibility with existing usage. Sometime in
January I'll provide a better method for this.

Ticker

On Wed, 2019-12-18 at 19:54 +0000, Ticker Berkin wrote:
> Hi Gerd
>
> I think it is best to continue with the ideas for typ-files that:
>
> 1/ they can be in any character set and we just need a better way of
> working out the correct one - see my posting earlier today.
>
> 2/ it can include as many languages as anyone can be bothered to add,
> and so has to be in a character set that allows the languages to be
> added, implying unicode for a common one (more particularly, UTF-8)
>
> 3/ the codepage= statement should be redundant and ignored for
> controlling the output character set, which should be taken from the
> map, but its use for determining the input coding might need to be
> kept for a while for compatibility.
>
> 4/ the messages my hack generates should be turned into 1 warning or
> information message per language or maybe suppressed altogether. If
> someone is generating a map with a character set that doesn't support
> a particular language, they really won't care that the data for other
> languages that have an incompatible representation with their language
> won't be there.
>
> Ticker
>
> On Wed, 2019-12-18 at 19:08 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> >
> > I think I understand now why we didn't have a default typ file ;)
> > If I got that right I should revert the changes in r4395 and mkgmap
> > should not allow or warn loudly when a typ file with a different
> > codepage is merged?
> > Or should we force the usage of unicode codepage?
> > Or is it possible to compile mapnik.txt with cp 1252 (or any other)
> > in a way that only those lines which contain non-matching characters
> > are ignored?
> >
> > Gerd
> >
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> > Sent: Wednesday, 18 December 2019 19:46
> > To: mkgmap development
> > Subject: [mkgmap-dev] TYP files and character encoding
> >
> > Hi
> >
> > A couple of problems with typ-files and unicode.
> >
> > With 'Codepage=65001' the final contents of the labels in mapnik.typ
> > that is included with the composite map is unicode, but if the map is
> > codepage 1252, the unicode characters with the top bit set are simply
> > displayed as if in 1252.
> >
> > Removing the codepage statement from mapnik.txt and making fixes
> > elsewhere to ensure that the file is read correctly as utf-8 and then
> > generating a map with --code-page=1252, it gives the error:
> >
> > SEVE: uk.me.parabola.imgfmt.MapFailedException
> > ../svn/trunk/resources/typ-files/mapnik.txt:
> > (thrown in TypCompiler.makeMap())
> > TYP file cannot be written in code page 1252
> >
> > Changing the exception handling in imgfmt/app/typ/TypElement.java, so
> > that makeLabelBlock() reads as
> > ...
> >     CharBuffer cb = CharBuffer.wrap(tl.getText());
> >     try {
> >         ByteBuffer buffer = encoder.encode(cb);
> >         out.put((byte) tl.getLang());
> >         out.put(buffer);
> >         out.put((byte) 0);
> >     } catch (CharacterCodingException ignore) {
> >         // ignore.printStackTrace();
> >         String name = encoder.charset().name();
> >         System.out.println("Cannot represent String=" +
> >             tl.getLang() + "," + tl.getText() +
> >             " in CodePage=" + name);
> >         // throw new TypLabelException(name);
> >     }
> > ...
> >
> > It gives output like:
> > Cannot represent String=21,Gara|e in CodePage=windows-1252
> > Cannot represent String=21,Obszar przemysBowy in CodePage=windows-1252
> > Cannot represent String=21,ZieleD in CodePage=windows-1252
> > Cannot represent String=21,Zaro[la in CodePage=windows-1252
> > Cannot represent String=21,MokradBa in CodePage=windows-1252
> > Cannot represent String=21,Droga wojew\363dzka (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga szybkiego ruchu (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga szybkiego ruchu (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Zcie|ka rowerowa in CodePage=windows-1252
> > Cannot represent String=21,Wybrze|e in CodePage=windows-1252
> > Cannot represent String=21,Zcie|ka in CodePage=windows-1252
> > Cannot represent String=21,StrumieD in CodePage=windows-1252
> > Cannot represent String=21,Granica paDstwa in CodePage=windows-1252
> > Cannot represent String=21,Rzeka, KanaB in CodePage=windows-1252
> > Cannot represent String=21,StrumieD in CodePage=windows-1252
> > Cannot represent String=21,Ruroci^Eg in CodePage=windows-1252
> > Cannot represent String=21,Kabel wysokiego napi^Ycia in CodePage=windows-1252
> > Cannot represent String=21,Tor wy[cigowy in CodePage=windows-1252
> > Cannot represent String=21,Droga szybkiego ruchu (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga krajowa (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga wojew\363dzka (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Wie[ (>5 tys.) in CodePage=windows-1252
> > Cannot represent String=21,Wie[ (>5 tys.) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (AmerykaDska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (ChiDska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (Mi^Ydzynarodowa) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (WBoska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (MeksykaDska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (P^Eczki) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (WegetariaDska) in CodePage=windows-1252
> > Cannot represent String=21,Kr^Ygle in CodePage=windows-1252
> > Cannot represent String=21,Sklep odzie|owy in CodePage=windows-1252
> > Cannot represent String=21,Wypo|yczalnia samochod\363w in CodePage=windows-1252
> > Cannot represent String=21,Gara| in CodePage=windows-1252
> > Cannot represent String=21,Sprzeda| samochod\363w in CodePage=windows-1252
> > Cannot represent String=21,Sklep |eglarski in CodePage=windows-1252
> > Cannot represent String=21,S^Ed in CodePage=windows-1252
> > Cannot represent String=21,O[rodek kultury in CodePage=windows-1252
> > Cannot represent String=21,Wi^Yzienie in CodePage=windows-1252
> > Cannot represent String=21,Stra| po|arna in CodePage=windows-1252
> > Cannot represent String=21,SBupek in CodePage=windows-1252
> > Cannot represent String=21,PrzystaD in CodePage=windows-1252
> > Cannot represent String=21,L^Edowisko helikopterowe in CodePage=windows-1252
> > Cannot represent String=21,Wie|a in CodePage=windows-1252
> > Cannot represent String=21,yr\363dBo in CodePage=windows-1252
> > Cannot represent String=21,Pla|a in CodePage=windows-1252
> > Cannot represent String=21,Przyl^Edek in CodePage=windows-1252
> > Cannot represent String=21,SkaBa in CodePage=windows-1252
> >
> > Which makes sense if codepage 1252 doesn't handle Polish (hex 0x15,
> > decimal 21).
> >
> > NB the non-ASCII characters in the above are messed up by my cutting
> > and pasting.
> >
> > Checking the French, on my Garmin device, the type descriptions now
> > display accents correctly.
> >
> > Ticker
> >
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: typCodePage.patch
Type: text/x-patch
Size: 4774 bytes
Desc: not available
URL: <http://www.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20191221/fc56a67f/attachment.bin>
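
The behaviour discussed in this thread (encode each label in the map's code page, skip labels that cannot be represented, and report once per language rather than once per label) can be sketched in standalone Java. This is only an illustration under stated assumptions, not the attached patch: LabelBlockSketch, Label and the skippedPerLang map are hypothetical names, and the language code 0x01 is chosen purely for the example. Only Polish = 0x15 comes from the thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelBlockSketch {

    /** Hypothetical stand-in for a TYP label: a language code plus its text. */
    public static final class Label {
        final int lang;
        final String text;

        public Label(int lang, String text) {
            this.lang = lang;
            this.text = text;
        }
    }

    /**
     * Writes each label as: language byte, encoded text, NUL terminator.
     * A label that the charset cannot represent is left out entirely and
     * counted in skippedPerLang (language code -> number of skipped labels).
     */
    public static byte[] makeLabelBlock(List<Label> labels, Charset charset,
                                        Map<Integer, Integer> skippedPerLang) {
        // REPORT makes encode() throw instead of silently substituting '?'.
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Label label : labels) {
            try {
                // Encode first, so nothing is written for a label that fails.
                ByteBuffer encoded = encoder.encode(CharBuffer.wrap(label.text));
                byte[] bytes = new byte[encoded.remaining()];
                encoded.get(bytes);
                out.write(label.lang);
                out.write(bytes, 0, bytes.length);
                out.write(0);
            } catch (CharacterCodingException e) {
                skippedPerLang.merge(label.lang, 1, Integer::sum);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<Label> labels = new ArrayList<>();
        labels.add(new Label(0x01, "For\u00eat"));            // accented Latin-1 text
        labels.add(new Label(0x15, "\u0179r\u00f3d\u0142o")); // Polish text

        Map<Integer, Integer> skipped = new TreeMap<>();
        byte[] block = makeLabelBlock(labels, Charset.forName("windows-1252"), skipped);
        System.out.println("windows-1252 block length: " + block.length);

        // One summary line per language instead of one line per label.
        for (Map.Entry<Integer, Integer> e : skipped.entrySet()) {
            System.out.println("WARNING: " + e.getValue() + " label(s) for language "
                    + e.getKey() + " cannot be represented in " + charset(block, "windows-1252"));
        }

        byte[] utf8 = makeLabelBlock(labels, StandardCharsets.UTF_8, new TreeMap<>());
        System.out.println("UTF-8 block length: " + utf8.length);
    }

    // Small helper so the warning text names the charset; the block is unused.
    private static String charset(byte[] block, String name) {
        return name;
    }
}
```

With windows-1252 the Polish label is dropped and counted under language 21, while the accented Latin-1 label is written; with UTF-8 both labels are written and nothing is skipped.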