[mkgmap-dev] Twülpstedt, Normalisation of unicode strings
From Ticker Berkin rwb-mkgmap at jagit.co.uk on Tue Nov 16 10:42:03 GMT 2021
Hi Gerd

If it is standard that in, for example, cp1252, rendering a letter followed by an accent looks the same as the equivalent Unicode sequence (i.e. it merges them), normalisation could be delayed until after an attempt has been made to encode the whole string. That is, in AnyCharSetEncoder, after

    if (result.isUnmappable()) {

do the normalisation and try to encode the whole string again, and only go on to transliterate the normalised string if that fails.

I couldn't find any pointers to the expected behaviour in these circumstances, so it's probably best to use your version.

Agree with Format6Encoder.

Utf8Encoder: be consistent with AnyCharSetEncoder, i.e. agree with your version if you keep your version of AnyCharSetEncoder. If you change it as above, then don't normalise here.

In the "latin1" part of the test, depending on the editor, it might be difficult to see that the test result contains the single char "ü", or that the starting string contains 2 chars, "u" and "¨". Worse, an editor might change them. Maybe there should be a test on the string lengths.

Ticker

On Tue, 2021-11-16 at 09:27 +0000, Gerd Petermann wrote:
> Patch was missing...
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> of Gerd Petermann <gpetermann_muenchen at hotmail.com>
> Sent: Tuesday, 16 November 2021 10:27
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] Twülpstedt, Normalisation of unicode strings
>
> Hi,
>
> please review my patch. I had some problems adding the Twülpstedt
> example to the existing unit test. I think the new code is closer to
> what should be tested.
> Did I miss something?
>
> Gerd
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> of Gerd Petermann <gpetermann_muenchen at hotmail.com>
> Sent: Monday, 15 November 2021 17:22
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] Twülpstedt, Normalisation of unicode strings
>
> Hi Ticker,
>
> OK, I had the same thoughts.
>
> Gerd
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> Sent: Monday, 15 November 2021 16:19
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] Twülpstedt, Normalisation of unicode strings
>
> Hi
>
> I'd vote for normalisation when the label is generated.
>
> If the un-normalised string can be represented in the target charset,
> there is no need for normalisation.
>
> I don't see that styles should be testing names like this, and, if
> they really need to, clauses for alternate representations could be
> added.
>
> The proportion of input tag values that never make it into the final
> .img must be quite high, so doing it early could be costly.
>
> Ticker
>
> On Mon, 2021-11-15 at 11:01 +0000, Gerd Petermann wrote:
> > Hi all,
> >
> > see also https://forum.openstreetmap.org/viewtopic.php?id=74231
> > mkgmap sometimes fails to encode correct strings for a given codepage
> > like 1252 (latin1).
> > I've uploaded a file that contains an area in Germany where the
> > u-umlaut in the name Twülpstedt is encoded in two different ways,
> > either as ü (0xfc) or as u + "COMBINING DIAERESIS" (0x75 + 0x308).
> > See umlaut.osm at https://files.mkgmap.org.uk/detail/537
> >
> > With the current code the 2nd variant is displayed as Twu?lpstedt.
> > This 1-liner
> >     name = Normalizer.normalize(name, Normalizer.Form.NFC);
> > helps to change the name to the usual encoding, which works well with
> > the codepage translation.
> >
> > So far so good. Now I wonder where exactly this call should be
> > placed.
> > My first idea was the code where the string is converted to a Garmin
> > label, but maybe it should happen much earlier so that the style rules
> > also "see" the normalized form.
> >
> > Any thoughts?
> >
> > Gerd
> >
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
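To illustrate Gerd's 1-liner together with Ticker's suggestion to test string lengths rather than characters an editor may display identically, here is a minimal standalone Java sketch. It does not use mkgmap's own classes: the class name NfcLabelDemo is hypothetical, and it assumes the JRE's "windows-1252" charset is a fair stand-in for codepage 1252. The two spellings of Twülpstedt are written as Unicode escapes so no editor can merge or split them.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

// Hypothetical demo class, not part of mkgmap.
public class NfcLabelDemo {
    // Assumption: the JRE's windows-1252 charset matches codepage 1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    static boolean canEncodeCp1252(String s) {
        // Fresh encoder per call, since CharsetEncoder instances are stateful.
        return CP1252.newEncoder().canEncode(s);
    }

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Escapes instead of literal characters, so the source stays unambiguous.
        String precomposed = "Tw\u00FClpstedt"; // u-umlaut as one char, U+00FC
        String decomposed = "Twu\u0308lpstedt"; // 'u' followed by U+0308 COMBINING DIAERESIS

        // Length checks make the difference visible in a test.
        System.out.println(decomposed.length());                 // 11
        System.out.println(nfc(decomposed).length());            // 10
        System.out.println(nfc(decomposed).equals(precomposed)); // true

        // The combining mark has no cp1252 byte; the composed form does (0xFC).
        System.out.println(canEncodeCp1252(decomposed));         // false
        System.out.println(canEncodeCp1252(nfc(decomposed)));    // true
    }
}
```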
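Ticker's proposed order (encode the label as it is, normalise to NFC only if something is unmappable, and transliterate only if the normalised string still fails) could look roughly like the following standalone Java sketch. It deliberately does not reproduce mkgmap's AnyCharSetEncoder: the class EncodeFirstSketch and its helpers tryEncode, simpleTransliterate and encodeLabel are hypothetical, "windows-1252" is assumed as the target codepage, and simpleTransliterate (strip accents from characters that don't fit, otherwise '?') is only a crude stand-in for mkgmap's real transliteration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.text.Normalizer;

// Hypothetical sketch, not mkgmap's AnyCharSetEncoder.
public class EncodeFirstSketch {
    // Assumption: windows-1252 stands in for the map's target codepage.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Strict encode: returns null instead of substituting '?' for unmappable chars. */
    static byte[] tryEncode(String s) {
        CharsetEncoder enc = CP1252.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            return null; // e.g. UnmappableCharacterException
        }
    }

    /** Hypothetical, crude transliteration: keep what fits, drop accents, else '?'. */
    static String simpleTransliterate(String s) {
        CharsetEncoder enc = CP1252.newEncoder();
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (enc.canEncode(ch)) {
                sb.append(ch);
                return;
            }
            // Decompose and remove combining marks, e.g. z-acute (U+017A) -> "z"
            String base = Normalizer.normalize(ch, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
            sb.append(base.isEmpty() || enc.canEncode(base) ? base : "?");
        });
        return sb.toString();
    }

    /** Order proposed by Ticker: as-is, then NFC, then transliterate the NFC form. */
    static byte[] encodeLabel(String label) {
        byte[] bytes = tryEncode(label); // most labels need no normalisation
        if (bytes != null) {
            return bytes;
        }
        String nfc = Normalizer.normalize(label, Normalizer.Form.NFC);
        bytes = tryEncode(nfc); // e.g. 'u' + U+0308 becomes the single byte 0xFC
        if (bytes != null) {
            return bytes;
        }
        return tryEncode(simpleTransliterate(nfc)); // only encodable chars remain
    }

    public static void main(String[] args) {
        byte[] twuelpstedt = encodeLabel("Twu\u0308lpstedt");
        System.out.println(twuelpstedt.length);                // 10
        System.out.printf("0x%02X%n", twuelpstedt[2] & 0xFF);  // 0xFC

        // L-stroke has no cp1252 byte and no decomposition; z-acute loses its accent.
        byte[] lodz = encodeLabel("\u0141\u00F3d\u017A");
        System.out.println(new String(lodz, CP1252).equals("?\u00F3dz")); // true
    }
}
```

With this ordering the NFC call is only reached for labels that the target charset rejects as they stand. For a UTF-8 target the first attempt never fails, so the decomposed spelling would be written unchanged, consistent with Ticker's remark that Utf8Encoder should then not normalise either.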