[mkgmap-dev] Splitter details question
From Chris Miller chris.miller at kbcfp.com on Thu Jan 14 10:36:54 GMT 2010
JG> Couldn't you ignore the overlapping region completely and insert JG> instead a step 3a) After lookup the nodes of a way, add them to all JG> touched tiles too. Sort of... this is roughly the approach I'm considering, but it's not as straightforward as it first appears. The problem is that it's not possible to hold all the information required in the XML file in memory for each node. Currently that's not a problem; the first pass of the splitter figures out what the resultant split areas will be and the second pass scans through the source OSM file (or the disk cache), reads in each node and writes all of its info to the appropriate output file. But as soon as you need to write out additional nodes during processing of the ways, the following happens: 1) Nodes and ways get intermingled in the output OSM file. I don't think this is a problem at all since there still won't be any forward references from eg ways to nodes, but currently the splitter writes out <node/> tags first, then <way/> tags, then <rel/> tags so maybe this will break/annoy something/someone? 2) We have to somehow find out all the information about the additional node so we can write it out as XML. This is the killer. It means the cache becomes required rather than optional because it's impractical to rescan the source OSM file, and a disk-based index into the cache is required because even custom code mapping from node ID to cache index (long) takes up several GB of memory when splitting a large OSM file. There will be a lot of these lookups required, so a very fast index with minimum disk access is essential. My intention is to experiment with a b-tree variant that uses node IDs as keys (the example you posted uses Strings for both keys and values so isn't very suitable), and additionally is able to take some advantage of the fact that sequential node IDs are strongly correlated to their coordinates (and hence the tile they'll end up in). Careful prefetch caching will make a huge difference here. Another optimisation will be to cache either all the node data (or at least the IDs, coordinates, and cache index) in memory for nodes that fall into the overlap area of each tile since these will almost always be the nodes we're most interested in when processing the ways. Only if we can't find the node in that cache would we need to hit the disk. The downside is that this could add significant memory overhead to an already strained heap. An unrelated improvement I'd also like to build is an R-tree (in memory) of the split tiles to further speed mapping nodes to areas. JG> If the line crosses an corner of a tile, it will be clipped at the JG> boundaries or the other tiles. So there will be a small gap exactly JG> in the corner. I have never seen such a thing and think this is JG> neglectable. (On the other hand, if this gap is a piece of a JG> motorway, routing will became a little unexpected) You don't see this happen currently because the overlap almost always catches both of the nodes that fall outside the tile. Without the overlap, for each line segment in a way that crossed a tile boundary we'd need to test whether it clipped another tile or not. As far as I can see, this test would be required even if I implement all the above but disable the current overlap fucntionality. Hope that wasn't too much of a waffle and helps explain where I see this heading. Chris
- Previous message: [mkgmap-dev] Splitter details question
- Next message: [mkgmap-dev] Splitter details question
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the mkgmap-dev mailing list