[mkgmap-dev] Increasing the performance of whole-planet splits.
From Scott Crosby scrosby06 at gmail.com on Thu Jun 10 20:35:44 BST 2010
Hello! I would like to submit the first of a set of performance optimizations to the splitter that improve the performance and reduce memory usage. The end result is that a core 2 duo can split the entire planet in 3.5 hours with -Xmx1700m and without using any temporary space. (67 minutes on a 8gb core i7) Everything is up on github. I'm sending the first set of patches now. Perhaps the biggest change is that I've made the splitter accept my binary OSM format. I created a binary OSM format that is 30-50% smaller than the bzipped planet (about 5.3gb without metadata) and 5x-10x faster to parse than the gzipped planet. As that format is substantially smaller than the splitter's cache, and to simplify my patch, I removed the cache before refactoring the splitter. My original announcement is at http://lists.openstreetmap.org/pipermail/dev/2010-April/019370.html Since that post I have fixed the issue of unwanted extra precision. I want your thoughts on the binary format before submitting it. Please have a look and tell me what you think. In this email I am submitting some other optimizations and patches. The performance numbers I quote assume that I was using the binary format; I am unsure what benefit they have on trunk. In IntIntMap, the current offset value for rehashing is way to low, given the dense nature of OSM nodeID values and a sequential insertion order. This problem is most obvious on a whole-planet split. It results in huge numbers of collisions; requiring 30+ probes on average. Note when benchmarking that the collision rate is irregular throughout the planet file; with a --max-nodes=4000000 whole planet split, there's virtually no performance differences on the first 90M nodes, yet by 120M nodes, using a random 31-bit prime number is over 3x faster. My multithread changes reduced the real and user CPU time by 30-50% on a uk extract. Other tests, e.g., a whole planet, show almost no effect. I think this is because a lot of my changes increase the efficiency of code that is parallelized, which paradoxically ends up reducing the relative benefits of parellelization. The single biggest remaining slowdown is that for each point, the splitter sends each Node to each area (represented by an OSMWriter), which writes it in the output file if it is in the bounds of the region (with overlap), or does nothing otherwise. On a whole planet, with 840 areas and 550M nodes, that is over 5 trillion bounds-checks that are not run in parallel. An index that can can map a node to the regions that may contain it could avoid 99% of these checks. Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: 0004-Lazily-generate-the-tag-hash-table.patch Type: text/x-patch Size: 1595 bytes Desc: not available Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: 0005-Change-the-offset-for-the-hash-table.patch Type: text/x-patch Size: 827 bytes Desc: not available Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment-0001.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: 0006-IntIntMap-Simplify-hash-search-function.patch Type: text/x-patch Size: 1740 bytes Desc: not available Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment-0002.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: 0008-New-thread-design.patch Type: text/x-patch Size: 7485 bytes Desc: not available Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment-0003.bin
- Previous message: [mkgmap-dev] Getting rid of the 'Node in too many areas' warning.
- Next message: [mkgmap-dev] Increasing the performance of whole-planet splits.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the mkgmap-dev mailing list