
[mkgmap-dev] Patch to reduce memory usage by interning strings.

From WanMil wmgcnfg at web.de on Wed Mar 31 22:58:45 BST 2010

On 31.03.2010 22:10, Scott A Crosby wrote:
> On Wed, 31 Mar 2010 21:13:49 +0200, WanMil <wmgcnfg at web.de> writes:
>
>>> I noticed that mkgmap does not intern any strings. In particular, this
>>> tile, generated by the splitter, fails to build with -Xmx3000m on
>>> 64-bit jdk under linux. With my patch, mkgmap generates the tile with
>>> -Xmx1000m.
>>>
>>>       <bounds minlat='55.1953125' minlon='9.4921875' maxlat='56.6015625'
>>>       maxlon='11.513671875'/>
>>>
>>> This tile has 1m nodes. Among the nodes and ways on this tile, there
>>> are 12m tags, yet only 100k distinct tag key/value pairs; on average
>>> each value occurs 120 times.
>>>
>>> I explicitly do not use normal string interning because
>>> String.intern() strings are kept forever, and I want these strings to
>>> be GC'able after the tile is done. I trade GCability for having the
>>> occasional string duplicated in memory by flushing the interning table
>>> every 10k unique strings.
>>>
>>> This code is not presently thread safe; ideally there should be
>>> one string interning table for each parser/thread.
>>>
>>> Scott
>>>
>>
>> Hi Scott!
>>
>> I think that's a good idea to intern the strings.
>> As far as I know the LossyIntern class is not needed. The .intern()
>> function of a string does exactly the same.
>
> You are right. String intern does not intern forever at least since
> Java 1.2.
>
>> Some time ago I sent a very similar patch to the mailing list which
>> is not yet committed. Could you please test with your use case if it
>> performs a similar memory reduction?
>
> You can run it if you want, but from the numbers I gave above for this
> tile, interning values as in my patch will decrease the number of
> strings in RAM from 12M to <100k values. Interning only keys would
> reduce the number of Strings in RAM from 24M to 12M.
>
>
>> The patch is thread safe and does not intern all strings. In my
>> opinion the value of a name tag should not be interned because there
>> is a high probability that this tag is used once only.
>
> That's probably true for many or most tiles, but not for the tile I
> referenced above, where on average each value occurs 120 times. That
> tile is unbuildable with a 3 GB heap without my patch and buildable
> with a 1 GB heap with my patch.
>
> Shall I post an updated patch without FuzzyIntern?
>
> Scott

Scott,

My patch interned all keys and, in addition, the values of a limited
number of keys. Maybe it's not necessary to limit the interning of
values. So I have attached a very simple patch that I hope will be
very effective in reducing the memory footprint of mkgmap.
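
To illustrate the selective variant (all keys interned, values only for
a limited set of keys), here is a rough sketch. It is not taken from the
attached patch: the class name SelectiveTagInterner and the key list are
made up for illustration, and only String.intern() is a real API here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; names and key list are assumptions, not mkgmap code. */
final class SelectiveTagInterner {
    // Assumed example list of keys whose values tend to repeat often.
    private static final Set<String> INTERN_VALUES_FOR =
            new HashSet<String>(Arrays.asList("highway", "natural", "landuse"));

    /** Every tag key is interned, since keys repeat across most elements. */
    static String key(String key) {
        return key.intern();
    }

    /** Values are interned only for the listed keys; others are returned as-is. */
    static String value(String key, String value) {
        return INTERN_VALUES_FOR.contains(key) ? value.intern() : value;
    }
}
```

Dropping the limit would simply mean calling intern() on every value,
which is what makes sense for a tile where each value occurs many times.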

Regarding your patch: I don't understand the purpose of the FuzzyIntern
class. You build a HashMap from (uninterned) Strings to the interned
String. Then you look up new strings in this HashMap and use the
interned variant. How does that differ from the (hopefully) highly
optimized intern() method?
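
For comparison, this is roughly how I picture such a table. It is a
hypothetical sketch based only on your description (a HashMap that is
flushed after a fixed number of unique strings), not the code from your
patch; the class name LossyInterner and the maxEntries parameter are my
own assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a lossy interning table; not thread safe. */
final class LossyInterner {
    private final int maxEntries;
    private final Map<String, String> table = new HashMap<String, String>();

    LossyInterner(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns a canonical instance equal to s, remembering s if it is new. */
    String intern(String s) {
        String canonical = table.get(s);
        if (canonical != null) {
            return canonical;
        }
        // Flush when full: strings seen again later get duplicated once,
        // but nothing is kept alive by the table longer than necessary.
        if (table.size() >= maxEntries) {
            table.clear();
        }
        table.put(s, s);
        return s;
    }

    /** Number of strings currently remembered. */
    int size() {
        return table.size();
    }
}
```

With new LossyInterner(10000) per parser, the table would be flushed
every 10k unique strings as you describe, and the whole table becomes
garbage once the parser for a tile is dropped.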

 > String intern does not intern forever
I didn't know that. Do you have a link to where this is specified?

WanMil


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mem_tag_reduce_v2.patch
Url: http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100331/eabc1c76/attachment.pl 

