[email protected]
[Top] [All Lists]

Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage (even with cache-size-b

Subject: Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage even with cache-size-bytes
From: Ryan Noon
Date: Tue, 18 May 2010 11:14:47 -0700
Hi All,

I converted my code to use LOBTrees holding LLTreeSets and it sticks to the memory bounds and performs admirably throughout the whole process.  Unfortunately opening the database afterwards seems to be really really slow.  Here's what I'm doing:

from ZODB.FileStorage import FileStorage
from ZODB.DB import DB

storage = FileStorage('attempt3_wordid_to_docset', pack_keep_old=False)

I think the file in question is about 7 GB in size.  It's using 100 percent of a core and I've never seen it get past the FileStorage object creation.  Is there something I'm doing wrong when I initially fill this storage that makes it so hard to index, or is there something wrong with the way I'm creating the new FileStorage?
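(For context: FileStorage keeps an on-disk index in a side-car `.index` file that it saves on clean shutdown; if that file is missing or stale, opening the storage means scanning the entire data file to rebuild the index, which on a 7 GB file can easily peg a core for a long time. Setting ZODB's internals aside, the pattern can be sketched with a plain-Python analogy; everything here, the `TinyLog` class and its record format, is made up for illustration and is not ZODB code:)

```python
import os
import pickle

class TinyLog:
    """Toy append-only record store with a side-car offset index,
    loosely analogous to FileStorage's .index file (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.index_path = path + '.index'
        self.index = {}  # key -> byte offset of its record
        if os.path.exists(self.index_path):
            # Fast path: reuse the index saved on the last clean close.
            with open(self.index_path, 'rb') as f:
                self.index = pickle.load(f)
        elif os.path.exists(self.path):
            # Slow path: rebuild the index by scanning every record.
            with open(self.path, 'rb') as f:
                while True:
                    offset = f.tell()
                    try:
                        key, _value = pickle.load(f)
                    except EOFError:
                        break
                    self.index[key] = offset

    def store(self, key, value):
        with open(self.path, 'ab') as f:
            self.index[key] = f.tell()
            pickle.dump((key, value), f)

    def load(self, key):
        with open(self.path, 'rb') as f:
            f.seek(self.index[key])
            _key, value = pickle.load(f)
            return value

    def close(self):
        # Persist the index so the next open can skip the full scan.
        with open(self.index_path, 'wb') as f:
            pickle.dump(self.index, f)
```

So one thing worth checking is whether the process that filled the storage actually called `db.close()` when it finished; an unclean shutdown would leave no usable index and force the rebuild on the next open.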

Thanks for everything, you guys have really been great.


On Wed, May 12, 2010 at 3:48 AM, Jim Fulton <[email protected]> wrote:
Perhaps I should have picked up on this, but it wasn't clear that you
were referring to wordid_to_docset. I couldn't see that in the code and I
didn't get an answer to my question.

> wordid_to_docset is a "ZMap", which just wraps the ZODB
> boilerplate/connection and forwards dictionary methods to the root.

This is the last piece of the puzzle.  The root object is a persistent
mapping object that is a single database object and is thus not a
scalable data structure.  As Lawrence pointed out, this, together with
the fact that you're using non-persistent arrays as mapping values,
means that all your data is in a single object.

> but I'm still sorta worried because in my experimentation with ZODB
> so far I've never been able to observe it sticking to any cache limits, no
> matter how often I tell it to garbage collect (even when storing very small
> values that should give it adequate granularity...see my experiment at the
> end of my last email).

The unit of granularity is the persistent object.  It is persistent
objects that are managed by the cache, not individual Python objects
like strings.  If your entire database is in a single persistent
object, then your entire database will be in memory.
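(A minimal illustration of that granularity rule in plain Python, with `pickle` standing in for ZODB's serialization; this is an analogy, not how ZODB's cache is actually implemented:)

```python
import pickle

# One persistent object: the whole mapping travels as a single record,
# so touching any key drags the entire structure into memory.
big = {i: list(range(10)) for i in range(1000)}
one_record = pickle.dumps(big)
whole_thing = pickle.loads(one_record)   # everything materialized at once

# Many persistent objects: one record per key, so a lookup only
# deserializes the single record it actually needs.
records = {i: pickle.dumps(v) for i, v in big.items()}
just_one = pickle.loads(records[42])     # only key 42's value
```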

If you want a scalable mapping and your keys are stably ordered (as
strings and numbers are), then you should use a BTree.  BTrees spread
their data over multiple data records, so you can have massive
mappings without keeping massive amounts of data in memory.
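(A toy sketch of that spreading, using only the stdlib; the bucket capacity and the `ToyBTreeMap` class are invented for illustration and are nothing like ZODB's real BTree implementation, but they show why a lookup or an insert only touches one small bucket rather than the whole mapping:)

```python
import bisect

class ToyBTreeMap:
    """Toy sorted mapping that splits its data across small buckets,
    loosely mimicking how a BTree keeps each data record small."""

    MAX_BUCKET = 4  # tiny on purpose; real buckets hold many more entries

    def __init__(self):
        # Parallel lists: the minimum key of each bucket, and the buckets.
        self.mins = []
        self.buckets = []   # each bucket is a sorted list of (key, value)

    def __setitem__(self, key, value):
        if not self.buckets:
            self.mins, self.buckets = [key], [[(key, value)]]
            return
        i = max(0, bisect.bisect_right(self.mins, key) - 1)
        bucket = self.buckets[i]
        j = bisect.bisect_left(bucket, (key,))
        if j < len(bucket) and bucket[j][0] == key:
            bucket[j] = (key, value)
        else:
            bucket.insert(j, (key, value))
        if len(bucket) > self.MAX_BUCKET:
            # Split: only this one bucket is rewritten, not the whole map.
            half = len(bucket) // 2
            self.buckets[i:i + 1] = [bucket[:half], bucket[half:]]
            self.mins[i:i + 1] = [bucket[0][0], bucket[half][0]]

    def __getitem__(self, key):
        i = max(0, bisect.bisect_right(self.mins, key) - 1)
        bucket = self.buckets[i]
        j = bisect.bisect_left(bucket, (key,))
        if j < len(bucket) and bucket[j][0] == key:
            return bucket[j][1]
        raise KeyError(key)
```

In ZODB each of those buckets would be its own persistent object with its own record, which is exactly what lets the cache evict untouched parts of a huge mapping.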

If you want a set and the items are stably ordered, then use a TreeSet
(or a Set if the set is known to be small).

There are built-in BTrees and sets that support compact storage of
signed 32-bit or 64-bit ints.
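(These are the integer families the poster is already using, LOBTree and LLTreeSet among them. As a rough stdlib illustration of why fixed-width integer storage is so much more compact than generic Python objects, compare a list of ints with an `array` of packed signed 64-bit values; the exact byte counts are CPython-specific, so treat the numbers as indicative only:)

```python
import sys
from array import array

# A Python list stores a pointer per element, each pointing at a full
# int object; array('q') packs signed 64-bit ints contiguously instead.
ids = list(range(100_000))
packed = array('q', ids)

list_bytes = sys.getsizeof(ids) + sum(sys.getsizeof(i) for i in ids)
packed_bytes = sys.getsizeof(packed)   # ~8 bytes per element plus overhead
```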


Jim Fulton

Ryan Noon
Stanford Computer Science
BS '09, MS '10
For more information about ZODB, see the ZODB Wiki:

ZODB-Dev mailing list  -  [email protected]