[mkgmap-dev] java.lang.AssertionError while building index from unicode tiles
From Ticker Berkin rwb-mkgmap at jagit.co.uk on Mon Oct 18 09:44:19 BST 2021
Hi Gerd

Yes, I don't know how we could test Garmin device/software use of these indexes either. Does the mkgmap ordering have to agree with something Garmin is going to presume? Maybe it doesn't matter, as long as there is consistency where one ordered mdr structure points into another ordered mdr structure.

So, I propose not to worry about the actual ordering, but just to make it use all available information so that sort/unique de-duplication works correctly, and to do this consistently wherever necessary. This also side-steps the issue of surrogate pairs, which would need more significant changes in code structure to deal with.

It's interesting that the existing code would have generated a more-or-less unsorted mdr5 and rubbish mdr25/mdr29 with -unicode for characters without sort entries, and no one has complained.

Ticker

On Mon, 2021-10-18 at 08:12 +0000, Gerd Petermann wrote:
> Hi Ticker,
>
> thanks for looking into this. I have no clue how to test if the index really works with those characters, as I don't know how to type them. If I understood you correctly, mkgmap isn't able to sort the city names, so I wonder how the index can be of any use. I assume we have the same problem for other names, like those for highways, POIs etc.?
>
> Gerd
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> Sent: Monday, 18 October 2021 09:58
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles
>
> Hi
>
> Although two 16-bit items (surrogate pairs in UTF-16 speak) are required to represent many Chinese characters, this isn't the significant problem in this case.
>
> The problem is that resources/sort/cp65001.txt doesn't define an ordering for lots of characters; it looks like it covers only about 10,500 of the 1,112,064 possible code points.
> Many of these non-ordered characters are being used by the names in the tile in question.
>
> The basic handling for other code pages (e.g. cp125*) uses a missing sort entry as the basis for ignoring the character: it won't be represented in the output, so there is no point in considering it in the sorting.
>
> This isn't the case with Unicode, as all characters should show. More importantly for this crash, a consistent sort order is required for de-duplication of some of the index structures, and this isn't happening because characters are being ignored.
>
> Assuming the actual ordering of unspecified code points doesn't really matter, I propose to change the logic slightly so that undefined Unicode characters are sorted on their 16-bit value, after the range of known sort positions.
>
> I also need to make SortKey generation consistent in a similar way, fix some of the uniqueness tests to be consistent with the sort, and verify that the size of mdr5 is >= that of mdr25, so that this type of problem is detected before it is exposed as mdr25 indexes that can't be represented in the same number of bytes as mdr5 indexes.
>
> Ticker
>
>
> On Sun, 2021-10-17 at 11:16 +0100, Ticker Berkin wrote:
> > Hi
> >
> > It is most likely that this problem is because Chinese requires two UTF-16 chars to encode many of its characters; see
> >
> > https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
> >
> > I think it is only --index processing where this is a problem in mkgmap.
> >
> > I'll investigate further.
> >
> > Ticker
> >
> >
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
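For illustration, here is a minimal Java sketch of the ordering proposed in the thread: characters that have a sort-table entry keep their table position, and any other 16-bit unit sorts after all known positions by its raw value, with the same comparator used for both sorting and de-duplication. The class UnicodeSortSketch, its methods and the three-letter "table" are hypothetical stand-ins, not mkgmap's actual Sort, SortKey or cp65001.txt handling. Because it compares UTF-16 units one at a time, a surrogate pair is simply treated as two unknown units. The resulting order is consistent, though not linguistically meaningful.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only: this is NOT mkgmap's Sort/SortKey code.
// Characters listed in a sort table keep their table position; any other
// UTF-16 unit sorts after every known position, ordered by its raw 16-bit
// value, so no character is ignored and compare() == 0 only for equal strings.
public class UnicodeSortSketch implements Comparator<String> {
    private final Map<Character, Integer> positions = new HashMap<>();

    // 'order' stands in for the character order a sort description would define.
    public UnicodeSortSketch(String order) {
        for (char c : order.toCharArray()) {
            // Assign consecutive positions 0, 1, 2, ... to first occurrences.
            positions.putIfAbsent(c, positions.size());
        }
    }

    // Known char: its table position. Unknown char (including either half of a
    // surrogate pair): placed past all known positions, by 16-bit value.
    int weight(char c) {
        Integer pos = positions.get(c);
        return pos != null ? pos : positions.size() + c;
    }

    @Override
    public int compare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = Integer.compare(weight(a.charAt(i)), weight(b.charAt(i)));
            if (d != 0) {
                return d;
            }
        }
        // Equal prefix: the shorter string sorts first.
        return Integer.compare(a.length(), b.length());
    }

    // Sort and de-duplicate with the same comparator, so "unique" agrees with "sorted".
    public List<String> sortUnique(List<String> names) {
        TreeSet<String> set = new TreeSet<>(this);
        set.addAll(names);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        UnicodeSortSketch sort = new UnicodeSortSketch("abc");
        String extB = "\uD840\uDC00"; // U+20000, outside the Basic Multilingual Plane
        // One code point, but two UTF-16 units.
        System.out.println(extB.length() + " " + extB.codePointCount(0, extB.length())); // prints "2 1"
        List<String> names = List.of("b", "\u5317\u4eac", "a", extB, "\u5317\u4eac", "\u4e0a\u6d77");
        // Order: a, b, U+4E0A U+6D77, U+5317 U+4EAC, U+20000 (one duplicate removed)
        System.out.println(sort.sortUnique(names).size()); // prints "5"
    }
}
```

Since every distinct string gets a distinct position, a TreeSet built with this comparator only merges true duplicates. A comparator that skipped characters without sort entries would instead report two different Chinese names as equal and silently drop one, which is roughly the de-duplication failure described in the thread.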