[mkgmap-dev] splitter that generates problem list
From Gerd Petermann gpetermann_muenchen at hotmail.com on Tue Nov 6 08:08:53 GMT 2012
Hello Henning,

> for each node: calculate the area that contains it and save it in a map
>
> So at the end of reading nodes, RAM knows where a node is, or am I wrong?

Yes, that is right.

> Then it should be possible to check for each relation/way/node in which
> tiles they have at least one node. The result would be that an object
> has at least one node in tiles x, y and z. I don't see a reason why it
> should be problematic if these tiles overlap. But maybe this is caused
> by limitations I don't see.

The limit is the huge amount of data:

- To identify a node we need its id and a value representing the
  position. If we store the complete coordinates, we need around 20
  bytes for each node. We have to store all nodes that are elements of
  a way or relation, so I expect
  20 * 0.5 * number_of_nodes_in_planet --> 16 GB.
- To identify the nodes of a way we have to store the list of node ids
  for each way; to calculate the tiles of a way we need fast random
  access to the list of saved nodes. A simple solution stores the list
  of way nodes in an array, which again requires around 8 bytes for
  each node, or 8 * 0.5 * number_of_nodes_in_planet --> 6 GB.
- To calculate the tiles of a relation we also need fast random access
  to the list of ways. If we use a HashMap for that, we need approx.
  40 bytes for each way, giving 40 * num_of_ways_in_planet --> 6 GB.

Besides that we have to store the resulting list of problem polygons,
so you will need > 32 GB to produce the problem list. A big problem is
the fast random access; this is likely to require even more memory.

The big advantage of this simple solution is that it can be implemented
with a few lines of code, and maybe optimized data structures will
reduce the number of bytes. I will add this to splitter to give you the
chance to try it. (A rough sketch of these structures follows at the
end of this mail.)

> Your first solution won't have any real benefit for me. It will end in
> five splitter runs for ten mapsets. This will need too much time.
> Already splitting the whole planet with a given list of problematic
> polygons needs about 1:40 h. With automatic calculation of problematic
> polygons this time will increase, and then it must be multiplied by
> five. 12 h is too much time for this.

I don't see that we need 5 runs instead of one, but anyway, I have to
implement the first solution to make splitter robust. Note that the
additional passes are only needed to create the problem list; the rest
of the split process will be done in the same way as with patch v2.
I agree that 12 h is too much. I'd like to see the log of the split
process with patch v2. Please, could you run it with patch v2 and these
additional parameters for the JVM:

-Xrunhprof:cpu=samples,depth=20 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

and send me the complete log plus the created java.hprof.txt file
(maybe on Linux the name of the latter is different). An example of the
full invocation is given below.

> Running several splitters in parallel won't be clever because of the
> disk I/O limit. A solution could be: splitter reads an object and then
> spreads it to several "sub-splitters", e.g. one for each mapset. But
> this will also need much RAM.

On my machine, with pbf input and output, the disk speed is no
bottleneck at all, at least not for the read processes. With o5m this
could be different because
- the files are a bit larger and
- the CPU costs are much smaller.
We don't have to guess; the log file will show the details.

Ciao,
Gerd
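For illustration, here is a minimal Java sketch of the "simple
solution" sized above. All names (ProblemListData, tilesOfWay, pack)
are hypothetical, not taken from splitter's actual code: node positions
in two parallel sorted arrays (16 bytes per node plus overhead, close
to the ~20 bytes estimated), way node ids as plain arrays, and a
binary search for the fast random access.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not splitter's real code.
public class ProblemListData {

    // Node positions: parallel arrays sorted by id, ~16 bytes per node.
    private final long[] nodeIds;   // sorted node ids
    private final long[] positions; // packed lat/lon, same index as ids

    // Node ids of each way, kept as plain arrays for fast random
    // access (~8 bytes per node id).
    private final Map<Long, long[]> wayNodes = new HashMap<>();

    public ProblemListData(long[] sortedNodeIds, long[] packedPositions) {
        this.nodeIds = sortedNodeIds;
        this.positions = packedPositions;
    }

    // Pack two int coordinates into one long.
    public static long pack(int lat, int lon) {
        return ((long) lat << 32) | (lon & 0xffffffffL);
    }

    public void addWay(long wayId, long[] nodeRefs) {
        wayNodes.put(wayId, nodeRefs);
    }

    // In which tiles does this way have at least one node?
    public Set<Integer> tilesOfWay(long wayId) {
        Set<Integer> tiles = new HashSet<>();
        for (long ref : wayNodes.get(wayId)) {
            int i = Arrays.binarySearch(nodeIds, ref);
            if (i >= 0)
                tiles.add(tileOf(positions[i]));
        }
        return tiles;
    }

    private int tileOf(long packedPos) {
        // placeholder: unpack lat/lon and look up the tile grid here
        return 0;
    }
}

Relations would get the same treatment as ways, with a
Map<Long, long[]> from relation id to way ids; that is where the
~40 bytes per HashMap entry come from.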
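For reference, a complete invocation with those JVM options might look
like the line below. The three profiling/GC flags are exactly those
given above; the heap size, jar name, and file names are only
placeholders. The hprof agent writes java.hprof.txt to the working
directory, while the GC details go to stdout, hence the redirect.

java -Xmx4000m -Xrunhprof:cpu=samples,depth=20 -XX:+PrintGCDetails \
     -XX:+PrintGCTimeStamps -jar splitter.jar planet.osm.pbf > split.log 2>&1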