[mkgmap-dev] Splitting the entire planet in one pass.
From Scott Crosby scrosby at cs.rice.edu on Thu Sep 16 22:59:33 BST 2010
Well, technically two passes; the first pass is needed to figure out where to split. A new set of patches and line of development in my tree. Features: About 1/3 of the RAM usage. An entire planet can be split into 1200 areas in one pass with -Xmx5000m --max-nodes=1000000 in one hour. No upper bound on how many areas a node/way/relation can occur in. I've tested large overlaps with nodes in >10 regions. Generate up to 6700 areas in a single pass. (Yes, -Xmx5000m --max-nodes=50000 --max-regions=6700 on a 8gb machine generates 20,000 regions in three passes in 4 hours!) If areas.list says so, areas can overlap. The underlying algorithm can also handle non-rectangular areas. Quite a bit faster RAM usage proportional to the total number of nodes in the output files, at about 4 bytes per output node. Issues: 1. Two passes may be needed for a whole planet with 4gb of physical RAM. 2. Scaling to thousands of areas/output files in one pass can cause problems. a. There may be a system limit on number of open files. b. Per-file memory overhead of Java's IO layer and GzipOutputStream can cause unexpectedly high memory usage. On my machine, this limits me from doing >=10,000 areas per pass. c. The current threading code behaves badly, causing >300k context switches/second. My previous threading code also behaves badly, requiring one thread per output stream. I have new threading code that overcomes both problems and is 15% faster. 3. Dependencies on fastutils and dsiutils jars. 4. Depends on the refactoring I did to support the binary format. 5. Splitter does not find good splits with --max-nodes =< 50000 in the default resolution. Changes: The main memory overhead in the splitter is to track which nodes are in which area. It formerly used a custom IntIntMap. I refactored it to use an Int2ShortMultiMap, then implemented that MutiMap by composing two underlying Int2ShortMap implementations with different space efficiency tradeoffs, a custom sparsearray implementation based on http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html, and one imported from a library of java collections specialized to primitive types. The hybrid has a memory overhead of about 4 bytes per output, storing 750m keys with 800m vals in 3.2gb of heap. Testing: I split the cloudmade extract of NY into 160 areas of --max-nodes=100000 and compared my integration branch 'dev' versus SVN trunk. I then compared 3 of the area files. Files have minor differences. My branch reorders the tags within each node compared to trunk. Also, my XML output routines *always* output 7 degrees of precision, even if the data has only 6 degrees of precision (i.e. '123.1234560' instead of '123.123456'). Other notes: I just found out about a curious option -XX:+UseCompressedOops for recent Java VM's. It lets 32-bit pointers be used in a 64-bit VM, which can, for pointer-heavy uses, cut down on memory usage, cache pressure, memory bandwidth, etc. My SparseInt2ShortMap is fairly heavy on pointers and that option appears to offer a 10% memory reduction. Code is in branch 'perf/multiregion'. Code integrating all of my patches is currently available in branch 'dev' in my git repository on github and will be in SVN within a few weeks. Scott
- Previous message: [mkgmap-dev] Binary format update
- Next message: [mkgmap-dev] Splitting the entire planet in one pass.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the mkgmap-dev mailing list