Sunday, March 02, 2014

Should the Python garbage collector be disabled?

OK, tough question... so, first a little bit of background:

In CPython, objects are freed as soon as their reference count drops to 0, so most objects are reclaimed without any help from the garbage collector. The collector is only needed for reference cycles, where objects keep each other's reference counts above 0.
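A minimal sketch of that difference, assuming CPython (implementations such as PyPy don't use reference counting, so the first result can differ there). The Node class is just an illustration, and weakref.ref() is used only to observe whether an object is still alive:

```python
import gc
import weakref

class Node:
    """A minimal object that can take part in a reference cycle."""
    def __init__(self):
        self.other = None

gc.disable()  # from here on only reference counting frees objects automatically

# No cycle: freed as soon as the last reference goes away.
a = Node()
ref_a = weakref.ref(a)
del a
freed_without_cycle = ref_a() is None

# A cycle: each object keeps the other's reference count above 0.
b, c = Node(), Node()
b.other, c.other = c, b
ref_b = weakref.ref(b)
del b, c
freed_before_collect = ref_b() is None   # the cycle is still alive
gc.collect()                             # explicit collections work even while gc is disabled
freed_after_collect = ref_b() is None    # the collector found and freed the cycle

gc.enable()
print(freed_without_cycle, freed_before_collect, freed_after_collect)  # True False True
```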

Also, there's no guarantee about when it'll kick in to do a collection: automatic collections are triggered by object allocations, so they run in whatever thread happens to be allocating at that moment. So, if you're doing UI programming (e.g., using something like Qt) and you use multiple threads, a cycle may be broken during a collection in a secondary thread, which can crash your application if a UI object is collected there!
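To see that the automatic collector runs in whichever thread happens to be allocating, a sketch can register a gc.callbacks hook (available since Python 3.3) that records the current thread's name. The Node class and the make_cyclic_garbage() helper are hypothetical, and exactly when the collection fires depends on the interpreter version and its thresholds:

```python
import gc
import threading

# Names of the threads in which the cyclic collector started a collection.
collector_threads = []

def on_gc(phase, info):
    # Functions in gc.callbacks are called with phase 'start' or 'stop'.
    if phase == 'start':
        collector_threads.append(threading.current_thread().name)

class Node:
    def __init__(self):
        self.me = self  # self-reference: only the cyclic collector can free it

def make_cyclic_garbage():
    # Keep allocating cyclic garbage until an automatic collection runs.
    for _ in range(1_000_000):
        Node()
        if collector_threads:
            break

gc.enable()
gc.collect()               # reset the allocation counters before starting
gc.callbacks.append(on_gc)
try:
    worker = threading.Thread(target=make_cyclic_garbage, name='worker')
    worker.start()
    worker.join()
finally:
    gc.callbacks.remove(on_gc)

print('collections ran in:', collector_threads)
```

With a UI toolkit, 'worker' here would be the secondary thread in which the UI object's cycle gets broken.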

In this case, even if you're careful about cycles, an exception can keep objects alive for much longer than you'd intend: the traceback in sys.exc_info() references the frames, and the frames reference their local variables. So, if you are using a UI framework, making sure that collections only happen in the UI thread is a must (the code below helps with that).
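A minimal sketch of how an exception can keep an object alive, assuming CPython; Widget and handle_click() are hypothetical stand-ins for a UI object and an event handler:

```python
import gc
import sys
import weakref

class Widget:
    """Hypothetical stand-in for a UI object (not a real Qt class)."""

def handle_click():
    widget = Widget()
    widget_ref = weakref.ref(widget)
    try:
        raise ValueError('boom')
    except ValueError:
        # Keeping exc_info in a local creates a cycle:
        # frame -> info -> traceback -> frame (which also holds `widget`).
        # Deleting `info` before returning would avoid it.
        info = sys.exc_info()
    return widget_ref

gc.disable()
widget_ref = handle_click()
alive_after_return = widget_ref() is not None    # still alive: held by the cycle
gc.collect()
alive_after_collect = widget_ref() is not None   # freed only once the collector ran
gc.enable()
print(alive_after_return, alive_after_collect)   # True False
```

Whichever thread runs that gc.collect() is the one in which the widget is actually destroyed.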

So, personally, I think that in Python the garbage collector should always be turned off (which can also make your code noticeably faster, especially when it allocates many container objects), and the gc module should be used as a debug tool to find cycles that may occur, and those should be treated as application errors!
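One way to use the gc module as that kind of debug tool is a helper that runs some code with the automatic collector off and then checks whether gc.collect() found any unreachable objects. assert_no_cycles() is a hypothetical helper (not part of the gc module), and since gc.collect() sees garbage from the whole process, cycles created by other threads at the same time would also be reported:

```python
import gc

def assert_no_cycles(func, *args, **kwargs):
    """Call func with automatic collection off; fail if it left cyclic garbage.

    Hypothetical helper, not part of the gc module.
    """
    was_enabled = gc.isenabled()
    gc.collect()                 # start from a clean slate
    gc.disable()
    try:
        result = func(*args, **kwargs)
        found = gc.collect()     # number of unreachable objects found
    finally:
        if was_enabled:
            gc.enable()
    if found:
        raise AssertionError(
            '%s left %d unreachable object(s) in reference cycles'
            % (func.__name__, found))
    return result

def no_cycle():
    return [1, 2, 3]

def leaky():
    node = []
    node.append(node)            # a list that contains itself

assert_no_cycles(no_cycle)       # passes
try:
    assert_no_cycles(leaky)
except AssertionError as exc:
    print(exc)
```

In a test suite, wrapping the main scenarios of the application like this turns a new cycle into a failing test instead of a silent collection.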

weakref.ref() is one of the most useful tools for breaking cycles, and if you need references to bound methods, use a WeakMethodRef: http://code.activestate.com/recipes/81253/

Below is some code to do manual garbage collection (credit to Erik Janssens). While developing, check() should usually just return self.debug_cycles(); in a real application you can keep the remaining code as a safety net that collects cycles only in the UI thread, if you want to play it safe (although I think disabling the collector altogether is better once you've made sure you don't have cycles).

Also, while we're talking about cycles and garbage collection, make sure you never define __del__: before Python 3.4 (PEP 442), objects in a cycle that define __del__ can't be collected at all and end up in gc.garbage. Instead, weakref.ref() accepts an optional callback that is called when the object is about to be finalized, which can be used to do cleanup without the problems related to __del__.


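import gc

# PyQt5 is assumed here; PyQt4 and PySide expose the same QObject and QTimer classes.
from PyQt5.QtCore import QObject, QTimer


class GarbageCollector(QObject):
    '''
    Disable automatic garbage collection and instead collect manually
    every INTERVAL milliseconds.

    This is done to ensure that garbage collection only happens in the GUI
    thread, as otherwise Qt can crash.
    '''

    INTERVAL = 10000

    def __init__(self, parent, debug=False):
        QObject.__init__(self, parent)
        self.debug = debug

        # The timer fires in the thread this object lives in, so create the
        # collector from the GUI thread.
        self.timer = QTimer(self)
        self.timer.timeout.connect(self.check)

        # Reuse the automatic collector's thresholds for the manual checks.
        self.threshold = gc.get_threshold()
        gc.disable()
        self.timer.start(self.INTERVAL)

    def check(self):
        # return self.debug_cycles()  # uncomment to just debug cycles
        l0, l1, l2 = gc.get_count()
        if self.debug:
            print('gc_check called:', l0, l1, l2)
        # Roughly mirror CPython's own policy: an older generation is only
        # considered once the younger one has exceeded its threshold.
        if l0 > self.threshold[0]:
            num = gc.collect(0)
            if self.debug:
                print('collecting gen 0, found:', num, 'unreachable')
            if l1 > self.threshold[1]:
                num = gc.collect(1)
                if self.debug:
                    print('collecting gen 1, found:', num, 'unreachable')
                if l2 > self.threshold[2]:
                    num = gc.collect(2)
                    if self.debug:
                        print('collecting gen 2, found:', num, 'unreachable')

    def debug_cycles(self):
        # Keep everything the collector finds in gc.garbage instead of freeing it.
        gc.set_debug(gc.DEBUG_SAVEALL)
        gc.collect()
        for obj in gc.garbage:
            print(type(obj), repr(obj))
        obj = None  # drop the loop variable's reference to the last object
        # Free what was reported so the same cycles aren't printed on every tick.
        gc.set_debug(0)
        del gc.garbage[:]
        gc.collect()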

5 comments:

Tuure Laurinolli said...

You say that there are problems with threads, Qt and garbage collection. What exactly are these problems?

Fabio Zadrozny said...

Hi Tuure, I think that's explained in the post, but I'll try to rephrase it:

If you have 2 threads, the UI thread and a secondary thread, and you create a UI object that is part of a reference cycle, then if gc.collect() is called from the secondary thread, your UI objects will be collected outside the main thread, which may crash your application!

Sometimes you don't even have to create a cycle yourself: an exception in the main thread can keep some UI object alive in a frame referenced from sys.exc_info, and sys.exc_info may then be cleared in a non-UI thread (I saw this recently using Qt: my shutdown crashed because the exit process was clearing it in a non-UI thread. Kudos to the faulthandler module: https://pypi.python.org/pypi/faulthandler/ for helping me find the culprit!).

See also http://www.riverbankcomputing.com/pipermail/pyqt/2011-August/030378.html as a reference on that...

Fabio Zadrozny said...

Also, the automatic garbage collector itself can add quite a lot of overhead (even if you make sure you have no cycles), since it periodically traverses the container objects it tracks.

See: http://dsvensson.wordpress.com/2010/07/23/the-garbage-garbage-collector-of-python/ for details on that...

In short, if you make sure you have no cycles, the garbage collector is just overhead for your application...
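A rough way to measure that overhead on your own machine is to time an allocation-heavy function with the automatic collector on and off. build_containers() and time_build() are hypothetical helpers, and the numbers vary a lot with the Python version and the workload, so treat them only as an indication (timeit turns the collector off by default while timing, which is why this uses time.perf_counter() directly):

```python
import gc
import time

def build_containers(n=300_000):
    # Every small list is a container tracked by the cyclic collector, so
    # while they're alive each full collection has to scan them again.
    return [[i] for i in range(n)]

def time_build(collector_enabled):
    gc.collect()
    if collector_enabled:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        data = build_containers()
        elapsed = time.perf_counter() - start
    finally:
        gc.enable()              # always restore the default
    return elapsed, len(data)

with_gc, n = time_build(True)
without_gc, _ = time_build(False)
print('%d lists: gc on %.3fs, gc off %.3fs' % (n, with_gc, without_gc))
```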

Anonymous said...

So, how does one actually use your garbage collector? Should I create it within my class, or next to it?

So far I have a class that contains the GUI (using PySide), and I've put your garbage collector inside the __init__ of my main class. It seems to be working (I see the debug messages), but it didn't eliminate the segfaults.

Also, do I need some special code to disable the default garbage collector, or does your class do that as well (I see a line gc.disable())?

Fabio Zadrozny said...

@Anonymous To use it, just instantiate it and keep a live reference to it (it takes care of disabling the default garbage collector and doing collections in the main thread).

If you still have segfaults, that's probably for another reason: this fix is for when you have multiple threads and a cycle involving a QWidget could be broken in a secondary thread, which could cause a segfault (I recommend taking a look at faulthandler to find where the segfault is happening).
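Since Python 3.3, faulthandler is part of the standard library (the PyPI package linked earlier is the backport for older versions). A minimal way to turn it on from code:

```python
import faulthandler
import sys

# On a fatal error (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL) dump the
# Python traceback of every thread to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)
```

It can also be enabled without touching the code, by running python -X faulthandler app.py or by setting the PYTHONFAULTHANDLER environment variable.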