[mkgmap-dev] splitter that generates problem list
From Gerd Petermann gpetermann_muenchen at hotmail.com on Tue Nov 6 08:08:53 GMT 2012
Hello Henning,

> for each node: calculate the area that contains it and save it in a map
>
> So at the end of reading nodes, RAM knows where a node is, or am I wrong?

Yes, that is right.

> Then it should be possible to check for each relation/way/node in which
> tiles they have at least one node. The result would be that an object
> has at least one node in tiles x, y and z. I don't see a reason why it
> should be problematic if these tiles overlap. But maybe this is caused
> by limitations I don't see.

The limit is the huge amount of data:

- To identify a node we need its id and a value representing the
  position. If we store the complete coordinates, we need around 20
  bytes for each node. We have to store all nodes that are elements of
  a way or relation, so I expect
  20 * 0.5 * number_of_nodes_in_planet --> 16 GB.
- To identify the nodes of a way we have to store the list of node ids
  for each way; to calculate the tiles of a way we need fast random
  access to the list of saved nodes. A simple solution stores the list
  of way nodes in an array, which again requires around 8 bytes for
  each node, or 8 * 0.5 * number_of_nodes_in_planet --> 6 GB.
- To calculate the tiles of a relation we also need fast random access
  to the list of ways. If we use a HashMap for that, we need approx.
  40 bytes for each way, giving 40 * num_of_ways_in_planet --> 6 GB.

Besides that we have to store the resulting list of problem polygons,
so you will need > 32 GB to produce the problem list. A big problem is
the fast random access; this is likely to require even more memory.

The big advantage of this simple solution is that it can be implemented
with a few lines of code, and maybe optimized data structures will
reduce the number of bytes. I will add this to splitter to give you the
chance to try it. (A rough sketch of these structures follows at the
end of this mail.)

> Your first solution won't have any real benefit for me. It will end in
> five splitter runs for ten mapsets. This will need too much time.
> Already splitting the whole planet with a given list of problematic
> polygons needs about 1:40 h. With automatic calculation of problematic
> polygons this time will increase, and then it must be multiplied by
> five. 12 h is too much time for this.

I don't see that we need 5 runs instead of one, but anyway, I have to
implement the first solution to make splitter robust. Note that the
additional passes are only needed to create the problem list; the rest
of the split process will be done in the same way as with patch v2.
I agree that 12 h is too much. I'd like to see the log of the split
process with patch v2. Please, could you run it with patch v2 and these
additional parameters for the JVM:

-Xrunhprof:cpu=samples,depth=20 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

and send me the complete log plus the created java.hprof.txt file
(maybe on Linux the name of the latter is different). An example of the
full invocation is given below.

> Running several splitters in parallel won't be clever because of the
> disk I/O limit. A solution could be: splitter reads an object and then
> spreads it to several "sub-splitters", e.g. one for each mapset. But
> this will also need much RAM.

On my machine, with pbf input and output, the disk speed is no
bottleneck at all, at least not for the read processes. With o5m this
could be different because
- the files are a bit larger and
- the CPU costs are much smaller.
We don't have to guess; the log file will show the details.

Ciao,
Gerd
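For illustration, here is a minimal Java sketch of the "simple
solution" sized above. All names (ProblemListData, tilesOfWay, pack)
are hypothetical, not taken from splitter's actual code: node positions
in two parallel sorted arrays (16 bytes per node plus overhead, close
to the ~20 bytes estimated), way node ids as plain arrays, and a
binary search for the fast random access.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not splitter's real code.
public class ProblemListData {

    // Node positions: parallel arrays sorted by id, ~16 bytes per node.
    private final long[] nodeIds;   // sorted node ids
    private final long[] positions; // packed lat/lon, same index as ids

    // Node ids of each way, kept as plain arrays for fast random
    // access (~8 bytes per node id).
    private final Map<Long, long[]> wayNodes = new HashMap<>();

    public ProblemListData(long[] sortedNodeIds, long[] packedPositions) {
        this.nodeIds = sortedNodeIds;
        this.positions = packedPositions;
    }

    // Pack two int coordinates into one long.
    public static long pack(int lat, int lon) {
        return ((long) lat << 32) | (lon & 0xffffffffL);
    }

    public void addWay(long wayId, long[] nodeRefs) {
        wayNodes.put(wayId, nodeRefs);
    }

    // In which tiles does this way have at least one node?
    public Set<Integer> tilesOfWay(long wayId) {
        Set<Integer> tiles = new HashSet<>();
        for (long ref : wayNodes.get(wayId)) {
            int i = Arrays.binarySearch(nodeIds, ref);
            if (i >= 0)
                tiles.add(tileOf(positions[i]));
        }
        return tiles;
    }

    private int tileOf(long packedPos) {
        // placeholder: unpack lat/lon and look up the tile grid here
        return 0;
    }
}

Relations would get the same treatment as ways, with a
Map<Long, long[]> from relation id to way ids; that is where the
~40 bytes per HashMap entry come from.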
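For reference, a complete invocation with those JVM options might look
like the line below. The three profiling/GC flags are exactly those
given above; the heap size, jar name, and file names are only
placeholders. The hprof agent writes java.hprof.txt to the working
directory, while the GC details go to stdout, hence the redirect.

java -Xmx4000m -Xrunhprof:cpu=samples,depth=20 -XX:+PrintGCDetails \
     -XX:+PrintGCTimeStamps -jar splitter.jar planet.osm.pbf > split.log 2>&1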