[mkgmap-dev] [PATCH v12] make maps in parallel
From Mark Burton markb at ordern.com on Sun May 31 21:49:41 BST 2009
v12

Great stuff, I found out why I was getting occasional differences in the output files. It was another case of non-deterministic HashMap iteration order, but this one was really hard to track down because it only had an effect if the OSM map data contained a badly formed restriction relation. Now that I understand what was going on, it's cool. And yes, the re-ordering was completely harmless given that the OSM data wasn't valid anyway (a restriction with multiple "to" ways).

I intend to commit this stuff to trunk by the end of this week unless any negative reports come in.

----------

v11

Patch applies to r1052.

This seems to me to be well behaved. However, I am still detecting some changes in the NOD data between runs, although at this time I have no evidence that the resulting maps are broken in any way. Before committing this patch to trunk, I would like to find the cause of the reordering, but it's proving elusive. I have checked all the obvious stuff, like non-deterministic iteration order and shared (static) data, and I believe that's all good.

In the meantime, please use this patch if you have a multi-core machine (especially one with more than 2 cores), specify --max-jobs and report any notable findings.

---------

Changed the default number of threads to 1. If you specify --max-jobs without a value, you get one thread per core; --max-jobs=N means use N threads.

When comparing the output with known good maps to see whether the parallel processing is corrupting anything, one problem is that the files contain timestamps. I have test code that zeroes the timestamps, and with it I have been able to compare the output from different runs. What I have seen is that sometimes there are differences that appear to be due to the order in which the labels are written to the output file. If only the order is changing, that is harmless, but it would be nice to understand how it's happening (I have a theory about this, yet to be proven).
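The HashMap problem described in v12 can be illustrated in miniature. A plain HashMap makes no promise about iteration order, and when its keys are objects that do not override hashCode(), the order can change from one JVM run to the next because the default hash is identity-based. The sketch below is not mkgmap's actual code: the Way class and the toWayIds helper are hypothetical stand-ins for the members of a restriction relation. It shows the usual remedy, a LinkedHashMap, whose iteration order is the insertion order and therefore depends only on the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeterministicOrder {

    // Hypothetical stand-in for a map element whose class does NOT override
    // hashCode(), so it falls back to an identity-based hash that can differ
    // from one JVM run to the next.
    static final class Way {
        final long id;
        Way(long id) { this.id = id; }
    }

    // Hypothetical helper: collects the "to" ways of a (possibly malformed)
    // restriction relation. LinkedHashMap iterates in insertion order, so the
    // result depends only on the input order, not on hash codes. A plain
    // HashMap keyed on Way gives no such guarantee.
    static List<Long> toWayIds(List<Way> members) {
        Map<Way, String> roles = new LinkedHashMap<>();
        for (Way w : members) {
            roles.put(w, "to");
        }
        List<Long> ids = new ArrayList<>();
        for (Way w : roles.keySet()) {
            ids.add(w.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Way> members = List.of(new Way(30), new Way(10), new Way(20));
        System.out.println(toWayIds(members)); // prints [30, 10, 20] on every run
    }
}
```

If a sorted order is preferred over insertion order, a TreeMap with an explicit Comparator (for example on the way id) gives the same run-to-run stability.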
---------

Now preserves the order in which files are combined (thanks Steve for the tweak).

---------

Now serialises the reading of style files and map source to avoid a reentrancy issue in GType. Reworked the top-level loop that waits for the parallel jobs to complete; it appears to use a lot less CPU and could possibly have a bearing on the weird problems some were reporting on Windows/Mac, so please retest with this version.

Steve, I haven't incorporated your changed options-handling stuff yet, but will do in the future if (a) you don't commit it separately and (b) we can fix the reliability issues with this parallelisation code.

---------

Now respects --num-jobs again (broken in the last patch).

---------

Now reports exceptions in the worker threads.

---------

Here's a better fix than last night's effort for the problem where the mapname and description for each job were getting clobbered due to the way the command args are processed. Each job now gets a "snapshot" of the command args, so it doesn't matter if they subsequently get changed.

---------

Whoops! Fixed a bad bug whereby each map was being output to the same file. Not sure the fix is very elegant, but at least it's not being silly any more.

Now limits the default value of max-jobs to 4 no matter how many cores you have, as further testing shows that having more threads just burns CPU cycles but doesn't actually finish any quicker. I guess the memory system is limiting the performance and the CPUs are spinning waiting for access. Now showing a real speedup of around 240% (my earlier, higher claim was based on CPU usage, and I now realise that was erroneous, sorry).

--------

Now defaults to creating a thread per core, so without doing anything you should see a speedup on an SMP box when processing multiple maps. You can use --max-jobs=N to limit the concurrency; you may want to specify that if you can't increase the VM size to what is required.
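Several of the changes above (a per-job snapshot of the command args, combining results in the original order, reporting worker exceptions, and a waiting loop that doesn't burn CPU) fit a common java.util.concurrent pattern. The sketch below is only an illustration of that pattern under stated assumptions, not the patch's code: JobArgs, makeMap and runAll are hypothetical names, the per-map work is faked, and the record syntax assumes Java 16 or later. Future.get() blocks the waiting thread instead of busy-polling, and reading the futures in submission order keeps the combined output stable however the jobs finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMaps {

    // Hypothetical immutable snapshot of the per-map command-line options.
    // Taking a copy when the job is created means later changes to the
    // shared options (e.g. the next mapname) cannot clobber this job.
    record JobArgs(String mapName, String description, String inputFile) {}

    // Hypothetical stand-in for "build one map and return its output file".
    // Each job gets its own output name, so maps never overwrite each other.
    static String makeMap(JobArgs args) {
        if (args.inputFile().isEmpty()) {
            throw new IllegalArgumentException("no input for " + args.mapName());
        }
        return args.mapName() + ".img";
    }

    // Runs the jobs on at most maxJobs threads and returns the output files
    // in submission order, whatever order the jobs actually finish in.
    static List<String> runAll(List<JobArgs> jobs, int maxJobs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxJobs);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (JobArgs job : jobs) {
                futures.add(pool.submit(() -> makeMap(job)));
            }
            List<String> outputs = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    // get() blocks (rather than busy-polling) until this job is done
                    outputs.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    // Report the worker's exception instead of losing it silently
                    System.err.println("job " + jobs.get(i).mapName() + " failed: " + e.getCause());
                }
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<JobArgs> jobs = List.of(
                new JobArgs("63240001", "Tile 1", "tile1.osm"),
                new JobArgs("63240002", "Tile 2", ""),
                new JobArgs("63240003", "Tile 3", "tile3.osm"));
        // Illustrates the "one thread per core, capped at 4" default described above
        int maxJobs = Math.min(Runtime.getRuntime().availableProcessors(), 4);
        System.out.println(runAll(jobs, maxJobs)); // prints [63240001.img, 63240003.img]
    }
}
```

Here a failed job is reported and skipped; a real tool might instead abort the whole run, which is a policy choice the sketch does not try to settle.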
However, it occurs to me that if you can afford a box with more than 2 cores, then you can probably afford a reasonable amount of memory (otherwise, what's the point in having more cores?).

Added help blurb.

--------

OK, let it not be said that I don't listen to others! The attached patch provides support for making maps in parallel. By default, the behaviour is the same as before, but if you specify --num-threads=N where N is greater than 1, it will process N maps at the same time and then combine the results (if required). Don't forget to increase the heap size appropriately.

A quick test on the big box, specifying --num-threads=4 and a 2GB VM size, shows good speedup. I was seeing better than 380% utilisation with 8 cores in use. I suspect the performance limitation here will be VM size and memory system bandwidth.

BTW, I don't think num-threads is actually the best name for the option, so please suggest alternatives.

Cheers,

Mark

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mb-parallel-maps-v12.patch
Type: text/x-patch
Size: 9139 bytes
Desc: not available
Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20090531/26d750f7/attachment.bin