[mkgmap-dev] splitter - performance and memory
From Lambertus osm at na1400.info on Sun Jul 19 21:48:31 BST 2009
By all means go ahead. I think many people would welcome a Splitter that can handle the whole world. I think that Steve Ratcliffe also announced that he would be working on a new version, but he will no doubt respond. If your expectation of less than 50% memory usage is correct, then you should be able to handle the world on your 6GB machine, as North and South America (~2/3 of the data) can be split using 8GB RAM with the current splitter.

Chris Miller wrote:
> First off, let me say a quick thanks to everyone involved in this
> project. I've only just discovered it recently but wish I had found it
> much earlier; it really is incredible to see what has been done so far.
>
> I've downloaded the complete planet-090715.osm.bz2 file, and have been
> looking at splitting it. I read the description and limitations of the
> splitter.jar tool but decided to give it a go anyway since I have a
> 64-bit OS with 6GB RAM. Unfortunately it still failed with a -Xmx5200m
> heap. I have a 16GB machine at work I could try it on that might work;
> however, instead I decided to take a look at the source code to see if
> there's any possibility of reducing the memory requirements.
>
> I've only spent a short time looking at the code, but as far as I can
> tell the whole first step (computing the areas.list file) is using far
> more memory than it actually needs. The SplitIntMap (which is what
> takes up all the memory) isn't required here for two reasons. One is
> that the code never retrieves entries via .get(); rather, it only uses
> an iterator, so a list/array would suffice. Second, the node IDs aren't
> used in this stage, so we don't even need to parse them, let alone hold
> on to them. Assuming we replace the SplitIntMap with a wrapper around
> an int[] (or multiple int[]s to mitigate the double-memory-on-copy
> problem), we'd be looking at memory savings of >50%.
>
> Does that make sense, or have I missed something? If it sounds sensible
> I'd be happy to have a go at implementing it. Also, given the nature of
> the algorithm, it wouldn't be too hard on performance if the lat+long
> values were written out to disk rather than held in memory, which would
> mean splitting the whole dataset would be feasible even on a 32-bit
> machine.
>
> I haven't yet looked at possibilities for tuning the second step, but I
> assume that some sort of map/lookup is still required here. I figure
> there are a few options - perform multiple passes processing a subset
> of the splits at a time (limited by the total number of nodes we can
> hold in memory), optimise the existing data structures further, page
> some out to disk, etc.
>
> I was also thinking a little about performance. Given the enormous size
> of the full .osm file, I'd suggest a move away from SAX to a pull
> parser (http://www.extreme.indiana.edu/xgws/xsoap/xpp/mxp1/index.html).
> It's even faster than SAX and uses very little memory. In my job we use
> it to parse many GB of XML daily with very good results. Another idea
> is to parallelise the code by running parts of the split on different
> threads to take advantage of multi-core CPUs. Possibly the best gain
> here would be when writing the files, since gzip compression is fairly
> CPU intensive.
>
> What do people think? I'm happy to work on the above, though I must
> confess up front that my spare time is quite limited, so please don't
> expect too much too soon!
>
> Chris
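For illustration, here is a minimal sketch of the kind of int[]-backed store described above: append-only, chunked so that growing never copies (and briefly doubles) one huge array, and readable only by sequential scan, which is all the first step needs. The class and method names are invented for the example and are not taken from the splitter source.

import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class CoordStore {
    private static final int CHUNK_SIZE = 1 << 20;   // 1M ints per chunk

    private final List<int[]> chunks = new ArrayList<int[]>();
    private long size;                               // total ints stored

    /** Append one packed value, e.g. a latitude mapped to an int. */
    public void add(int value) {
        int chunk = (int) (size / CHUNK_SIZE);
        if (chunk == chunks.size()) {
            chunks.add(new int[CHUNK_SIZE]);         // grow without copying
        }
        chunks.get(chunk)[(int) (size % CHUNK_SIZE)] = value;
        size++;
    }

    /** Sequential scan - the only read pattern the first step needs. */
    public void forEach(IntConsumer action) {
        long remaining = size;
        for (int[] chunk : chunks) {
            int n = (int) Math.min(chunk.length, remaining);
            for (int i = 0; i < n; i++) {
                action.accept(chunk[i]);
            }
            remaining -= n;
        }
    }

    public long size() {
        return size;
    }
}

Two parallel stores (one for lat, one for lon) would stand in for the SplitIntMap in step one; at 4 bytes per coordinate with no keys or map overhead, the >50% saving mentioned above looks plausible.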
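The disk-spooling idea could look something like the following sketch: stream each lat/lon pair to a temporary file as two ints during the first pass, then re-read it sequentially. The file handling shown is an assumption for illustration, not how the splitter currently works.

import java.io.*;

public class CoordSpool implements Closeable {
    private final File file;
    private final DataOutputStream out;

    public CoordSpool(File file) throws IOException {
        this.file = file;
        this.out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)));
    }

    /** Append one coordinate pair; nothing is kept on the heap. */
    public void write(int lat, int lon) throws IOException {
        out.writeInt(lat);
        out.writeInt(lon);
    }

    /** Finish writing and hand back a reader for a sequential second scan. */
    public DataInputStream openForReading() throws IOException {
        out.close();
        return new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)));
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}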
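For the multi-pass option on the second step, one possible shape is sketched below: re-scan the input once per batch of output areas, so memory is bounded by one batch of splits rather than by the whole planet. Node, Area and the parse/write placeholders are invented for the example, not the splitter's real types.

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class MultiPassSplitter {
    static class Node {
        final int lat, lon;
        Node(int lat, int lon) { this.lat = lat; this.lon = lon; }
    }

    static class Area {
        final int minLat, minLon, maxLat, maxLon;
        Area(int minLat, int minLon, int maxLat, int maxLon) {
            this.minLat = minLat; this.minLon = minLon;
            this.maxLat = maxLat; this.maxLon = maxLon;
        }
        boolean contains(Node n) {
            return n.lat >= minLat && n.lat < maxLat
                && n.lon >= minLon && n.lon < maxLon;
        }
    }

    void split(File planet, List<Area> areas, int areasPerPass) throws IOException {
        for (int start = 0; start < areas.size(); start += areasPerPass) {
            List<Area> batch =
                    areas.subList(start, Math.min(start + areasPerPass, areas.size()));
            // One full scan of the input per batch: only nodes falling in
            // the current batch ever need to be held or routed in memory.
            for (Node node : parseNodes(planet)) {
                for (Area area : batch) {
                    if (area.contains(node)) {
                        writeTo(area, node);
                    }
                }
            }
        }
    }

    Iterable<Node> parseNodes(File planet) { return Collections.emptyList(); } // placeholder
    void writeTo(Area area, Node node) {}                                      // placeholder
}

The trade-off is clear: with N areas per pass, the planet is parsed ceil(total/N) times, so the choice of batch size is a direct swap of wall-clock time for heap.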
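The pull-parser loop suggested above would look roughly like this with the XmlPullParser API (org.xmlpull.v1) from the linked XPP project. The element and attribute names match the .osm format; error handling is omitted.

import java.io.FileReader;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserFactory;

public class OsmPullReader {
    public static void main(String[] args) throws Exception {
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        XmlPullParser parser = factory.newPullParser();
        parser.setInput(new FileReader(args[0]));

        long nodes = 0;
        for (int event = parser.getEventType();
                 event != XmlPullParser.END_DOCUMENT;
                 event = parser.next()) {
            if (event == XmlPullParser.START_TAG && "node".equals(parser.getName())) {
                // Attributes are read on demand and nothing else is buffered,
                // which is where the low memory use comes from; these values
                // would feed the area computation in the splitter.
                double lat = Double.parseDouble(parser.getAttributeValue(null, "lat"));
                double lon = Double.parseDouble(parser.getAttributeValue(null, "lon"));
                nodes++;
            }
        }
        System.out.println("nodes: " + nodes);
    }
}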
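Finally, a sketch of one way the gzip writing could be parallelised: give each output area its own single-threaded writer, so compression for different tiles runs on different cores while the parsing thread just hands off strings. The queue-per-writer layout is only an assumption about how this might be bolted on, not the splitter's actual design.

import java.io.*;
import java.util.concurrent.*;
import java.util.zip.GZIPOutputStream;

public class AreaWriter implements Closeable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Writer out;

    public AreaWriter(File file) throws IOException {
        out = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(file)), "UTF-8");
    }

    /** Queue a chunk of XML; compression happens on this writer's thread. */
    public void write(String xml) {
        executor.submit(() -> {
            try {
                out.write(xml);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    @Override
    public void close() throws IOException {
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.close();
    }
}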