zodb-dev@zope.org

Subject: Re: [ZODB-Dev] ZODB Ever-Increasing Memory Usage even with cache-size-bytes
From: Ryan Noon
Date: Tue, 11 May 2010 16:37:20 -0700
Hi Jim,

I'm really sorry for the miscommunication; I thought I had made that clear in my last email:

"I'm wrapping ZODB in a 'ZMap' class that just forwards all the dictionary methods to the ZODB root and allows easy interchangeability with my old sqlite OODB abstraction."

wordid_to_docset is a "ZMap", which just wraps the ZODB boilerplate/connection and forwards dictionary methods to the root.  If this seems superfluous, it was just to maintain backwards compatibility with all of the code I'd already written for the sqlite OODB I was using before I switched to ZODB.  Whenever you see something like wordid_to_docset[id], it's just doing self.root[id] behind the scenes in a __getitem__ or __setitem__ call inside the ZMap class, which I've pasted below.

The db is just storing longs mapped to array('L')s with a few thousand longs in them.  I'm going to try switching to the persistent data structure that Laurence suggested (a pointer to relevant documentation would be really useful), but I'm still somewhat worried: in my experimentation with ZODB so far I've never been able to observe it sticking to any cache limit, no matter how often I tell it to garbage collect (even when storing very small values that should give it adequate granularity; see my experiment at the end of my last email).  If the problem were only the memory that Python 2.6 reports to the OS, I'd understand, but memory usage goes up the second I start adding new things (which indicates that Python is asking for more and not actually freeing internally, no?).
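
A rough way to see why value granularity matters here, using only the standard library. As far as I understand it, ZODB loads and evicts whole persistent objects, and values that are not themselves persistent (such as array objects) are pickled inside the record of the object that holds them. The object returned by connection.root() is a single persistent mapping, so everything assigned directly into it ends up in one record. The sketch below only mimics that with plain pickle; build_mapping and the sizes are made up for illustration and are not taken from the real data set.

```python
import pickle
from array import array


def build_mapping(n_keys, ids_per_key):
    # Hypothetical stand-in for the word id -> docset mapping:
    # integer keys mapped to array('L') values.
    return dict((k, array('L', range(ids_per_key))) for k in range(n_keys))


small = build_mapping(10, 1000)
large = build_mapping(100, 1000)

# A container and the plain values inside it serialize as one blob,
# so the blob grows with the total content, not with a single entry.
one_entry = pickle.dumps(small[0], 2)
small_blob = pickle.dumps(small, 2)
large_blob = pickle.dumps(large, 2)

print('%d %d %d' % (len(one_entry), len(small_blob), len(large_blob)))
```

If the same thing happens to the root mapping, a cache limit has nothing smaller than "the whole mapping" to evict, which would be consistent with the memory growth described above.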

If you feel there's something pathological about my memory access patterns in this operation, I can just do the actual inversion step in Hadoop and load the output into ZODB for my application later.  I was just hoping to keep all of my data in OODBs the entire time.

Thanks again to all of you for your collective time.  I really like ZODB so far, and it bugs me that I'm likely screwing it up somewhere.

Cheers,
Ryan



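from os import sep  # assumed: the path separator used in the name handling below

import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage

class ZMap(object):
    
    __hash__ = None #can't hash this (has to be set on the class, not the instance)
    
    def __init__(self, name=None, dbfile=None, cache_size_mb=512, autocommit=True):
        self.name = name
        self.dbfile = dbfile
        self.autocommit = autocommit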
        
        #first things first, figure out if we need to make up a name
        if self.name is None:
            self.name = make_up_name() # helper defined elsewhere (not shown)
        if sep in self.name:
            if self.name[-1] == sep:
                self.name = self.name[:-1]
            self.name = self.name.split(sep)[-1]
        
            
        if self.dbfile is None:
            self.dbfile = self.name + '.zdb'
        
        self.storage = FileStorage(self.dbfile, pack_keep_old=False)
        self.cache_size = cache_size_mb * 1024 * 1024
        
        self.db = DB(self.storage, pool_size=1, cache_size_bytes=self.cache_size, historical_cache_size_bytes=self.cache_size, database_name=self.name)
        self.connection = self.db.open()
        self.root = self.connection.root()
        
        print 'Initializing ZMap "%s" in file "%s" with %dmb cache. Current %d items' % (self.name, self.dbfile, cache_size_mb, len(self.root))

        

    # basic operators
    def __eq__(self, y): # x == y
        return self.root == y
    def __ge__(self, y): # x >= y
        return len(self) >= len(y)
    def __gt__(self, y): # x > y
        return len(self) > len(y)
    def __le__(self, y): # x <= y
        return not self.__gt__(y)
    def __lt__(self, y): # x < y
        return not self.__ge__(y)
    def __len__(self): # len(x)
        return len(self.root)
        
        
    # dictionary stuff    
    def __getitem__(self, key): # x[key]
        return self.root[key]

    def __setitem__(self, key, value): # x[key] = value
        self.root[key] = value
        self.__commit_check() # write back if necessary 
        
    def __delitem__(self, key): # del x[key]
        del self.root[key]
        

    def get(self, key, default=None): # x[key] if key in x, else default
        return self.root.get(key, default)

    def has_key(self, key): # True if x has key, else False
        return self.root.has_key(key)

    def items(self): # list of key/val pairs
        return self.root.items()

    def keys(self):
        return self.root.keys()

    def pop(self, key, default=None): # remove key and return its value, else default
        return self.root.pop(key, default)

    def popitem(self): #remove and return an arbitrary key/val pair
        return self.root.popitem()

    def setdefault(self, key, default=None):
        #D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
        return self.root.setdefault(key, default)

    def values(self):
        return self.root.values()
    
    def copy(self): #copy it? dubiously necessary at the moment
        raise NotImplementedError('copy')
        
        
    # iteration
    def __iter__(self): # iter(x)
        return self.root.iterkeys()
        
    def iteritems(self): #iterator over items, this can be hellaoptimized
        return self.root.iteritems()


    def itervalues(self):
        return self.root.itervalues()

    def iterkeys(self):
        return self.root.iterkeys()
    

    # practical realities of the abstraction
    def garbage_collect(self):
        self.root._p_jar.cacheGC()
        #self.connection.cacheGC()
    
    def commit(self):
        return self.__commit_check(force=True)
    
    def __commit_check(self, force=False):
        if self.autocommit or force:
            transaction.commit()
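
A more compact way to get the same dictionary interface might be to subclass MutableMapping, which derives get, pop, popitem, setdefault, update, keys, items and values from five core methods. This is only a sketch under assumptions: ForwardingMap, backend and on_write are illustrative names, not part of ZMap or ZODB, and a plain dict stands in for the ZODB root so the example runs on its own.

```python
try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping      # Python 2.6 / 2.7


class ForwardingMap(MutableMapping):
    """Hypothetical, simplified stand-in for ZMap: forwards to any mapping."""

    def __init__(self, backend, on_write=None):
        self._backend = backend    # e.g. connection.root() in the ZODB case
        self._on_write = on_write  # e.g. transaction.commit, or None

    def __getitem__(self, key):
        return self._backend[key]

    def __setitem__(self, key, value):
        self._backend[key] = value
        self._written()

    def __delitem__(self, key):
        del self._backend[key]
        self._written()

    def __iter__(self):
        return iter(self._backend)

    def __len__(self):
        return len(self._backend)

    def _written(self):
        # Called after every mutation; the derived pop/setdefault/update
        # methods go through __setitem__/__delitem__, so they trigger it too.
        if self._on_write is not None:
            self._on_write()


writes = []
m = ForwardingMap({}, on_write=lambda: writes.append(1))
m[1] = 'a'
m.setdefault(2, 'b')
popped = m.pop(1)
```

One design consequence: because every derived method funnels through the five core methods, the write hook fires consistently for deletes and pops as well as assignments.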



On Tue, May 11, 2010 at 3:50 AM, Jim Fulton <jim@xxxxxxxx> wrote:
On Mon, May 10, 2010 at 8:20 PM, Ryan Noon <rmnoon@xxxxxxxxx> wrote:
> P.S. About the data structures:
> wordset is a freshly unpickled python set from my old sqlite oodb thingy.
> The new docsets I'm keeping are 'L' arrays from the stdlib array module.
>  I'm up for using ZODB's builtin persistent data structures if it makes a
> lot of sense to do so, but it sorta breaks my abstraction a bit and I feel
> like the memory issues I'm having are somewhat independent of the container
> data structures (as I'm having the same issue just with fixed size strings).

This is getting tiresome.  We can't really advise you because we can't
see what data structures you're using and we're wasting too much time
guessing. We wouldn't have to guess and grill you if you showed a
complete demonstration program, or at least one that showed what the
heck you're doing.

The program you've shown so far is so incomplete, perhaps we're
missing the obvious.

In your original program, you never actually store anything in the
database. You assign the database root to self.root, but never use
self.root. (The variable self is not defined and we're left to assume
that this disembodied code is part of a method definition.) In your
most recent snippet, you don't show any database access. If you
never actually store anything in the database, then nothing will be
removed from memory.

You're inserting data into wordid_to_docset, but you don't show its
definition and won't tell us what it is.

Jim

--
Jim Fulton



--
Ryan Noon
Stanford Computer Science
BS '09, MS '10
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@xxxxxxxx
https://mail.zope.org/mailman/listinfo/zodb-dev