Thursday, February 26, 2015

Design for client-side applications in Python (but applicable to other languages too)


Ok, so, this is a post I've wanted to write for quite some time but never really got around to...

First off, this is a design I've used several times over the years on client-side applications (Python and Javascript). It revolves around some simple concepts which can be used to develop small to large applications, and is suited to places where there are many long-lived instances -- which is usually the case in client-side applications (and usually not in server-side web applications).

These are also the concepts I used when implementing PyVmMonitor (http://pyvmmonitor.com).

For those interested, although PyVmMonitor is closed source, the non-domain specific bits are actually open source and may be found at the links below (so, they may be cloned in git and I'll reference them in this post):

https://github.com/fabioz/pyvmmonitor-framework
https://github.com/fabioz/pyvmmonitor-core

1. Plugin system based on interfaces (interfaces in this case being the Java concept of interfaces, or ABCs in Python) -- usually I like to be explicit about the interfaces provided and about registering the implementors for those interfaces (if you want your programs to be extensible, you really should be programming based on interfaces -- for me it helps to think about how a client would consume it, instead of thinking about how to actually implement it).

This is also the mechanism that provides dependency-injection (so, you can swap out some implementor -- although it's not the classic dependency-injection because you still ask for things instead of them magically appearing in some variable).

The structure would be something like:


pm = PluginManager()
pm.register(EPView, 'my.View') # As a note, EP is a short for 'Extension Point'.
pm.register(EPMenu, 'my.Menu1') # Implementors are registered as strings to avoid 
pm.register(EPMenu, 'my.Menu2') # having to import the classes (to cut on startup time).


An example of use would be:


# Keeps the EPView instance alive inside the PluginManager 
# (more details ahead on item #2) and starts the view main loop.
view = pm.get_instance(EPView)
view.main_loop()

# The EPView implementation could create its menus by asking the EPMenus registered.
menus = pm.get_implementations(EPMenu)
for menu in menus:
    view.create_menu(menu)


The actual implementation that PyVmMonitor is using can be found at https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/plugins.py

You can see that we're not worried about discoverability as most plugin frameworks are, since we're being explicit about registering things (and it should be easy to add discovery to the structure if we do need that extensibility for clients).

Also, in this particular implementation, I actually skipped the plugins themselves as things are self-contained and I didn't want the added complexity, but usually you'd register plugins and then the extension points and implementations would be registered only through plugins (which would also specify their dependencies on other plugins).
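Just to make the idea more concrete, below is a minimal sketch of what such a PluginManager could look like (note that this is an illustration only, not the actual plugins.py linked above):

import importlib

class PluginManager(object):

    def __init__(self):
        self._ep_to_impls = {}  # extension point -> class paths (strings)
        self._ep_to_instances = {}  # extension point -> instances kept alive

    def register(self, ep, impl_class_path):
        # Implementors are plain strings such as 'my_module.MyView', so
        # nothing is imported at registration time.
        self._ep_to_impls.setdefault(ep, []).append(impl_class_path)

    def _instantiate(self, class_path):
        module_name, class_name = class_path.rsplit('.', 1)
        return getattr(importlib.import_module(module_name), class_name)()

    def get_implementations(self, ep):
        # Lazily imports/instantiates on the first use; after that, the
        # instances are owned by the PluginManager (see item #2).
        try:
            return self._ep_to_instances[ep]
        except KeyError:
            instances = self._ep_to_instances[ep] = [
                self._instantiate(p) for p in self._ep_to_impls.get(ep, [])]
            return instances

    def get_instance(self, ep):
        # Convenience for extension points expected to have a single
        # implementation.
        impls = self.get_implementations(ep)
        assert len(impls) == 1
        return impls[0]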

2. A place to hold your instances:

I like to keep track of where the instances I create live... it's usually difficult to track things in long-lived applications (which is usually not a problem in web-based applications because the model is really the database and objects are short-lived).

In the Eclipse SDK, for instance, it's easy to see many singleton-like structures spread across many places and there's no unified approach to it. Every place has its own API, so, there's a platform which has a workbench which has a part which has an editor... (I know, that's plain object orientation, but I find it lacking when extending things and there's always a different API for accessing anything).

So, instead of using many different APIs there are 2 main places where instances live -- and that's also the place to query for instances:

  •  The PluginManager itself: it's the place to ask for any extension/service in the application, so, it's fair that it can also keep the needed references alive (and we could also query for existing services/extensions in a shell/debugger).
  •  Inside the PluginManager, an extension named EPModelsContainer is provided and it's the place to put the actual models (in a tree-like structure) to compose your domain (for instance, in PyVmMonitor a new AttachedProcess is created for each attached process and has as a child a model for the actual monitoring, where the statistics gathered are kept on the client side).

Note that this means that the 'ownership' of the items is one of these 2 places -- and they're quite different: when it's in the PluginManager, it'll be kept alive until the application finishes (although it's lazily started). For items in the EPModelsContainer, when any instance leaves the EPModelsContainer it should be readily deleted as well as anything it holds (so, other places should only keep a weak-reference or should monitor the removal of the item from the models container to make the proper cleanup so that the object can be garbage collected at that point).

The actual extension point that PyVmMonitor is using can be found at https://github.com/fabioz/pyvmmonitor-framework/blob/master/pyvmmonitor_framework/extensions/ep_models_container.py (note that it provides class-based filtering and tree-iteration and could be extended to provide a jQuery-like API to query it).
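To illustrate the concept, a minimal sketch of such a container follows (the names here are hypothetical, the linked ep_models_container.py is the real implementation):

import itertools

class ModelsContainer(object):

    def __init__(self):
        self._id_to_obj = {}
        self._counter = itertools.count()
        self._removal_listeners = []

    def add(self, obj):
        obj_id = next(self._counter)
        self._id_to_obj[obj_id] = obj
        return obj_id

    def get(self, obj_id):
        return self._id_to_obj[obj_id]

    def iter_instances(self, cls):
        # Class-based filtering: yields (obj_id, obj) for instances of cls.
        for obj_id, obj in self._id_to_obj.items():
            if isinstance(obj, cls):
                yield obj_id, obj

    def add_removal_listener(self, listener):
        self._removal_listeners.append(listener)

    def remove(self, obj_id):
        obj = self._id_to_obj.pop(obj_id)
        # Listeners do their cleanup here so that obj can be
        # garbage-collected right afterwards.
        for listener in self._removal_listeners:
            listener(obj_id, obj)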

Also, test cases should verify that, after each test, all the instances created in the PluginManager and in the EPModelsContainer have been garbage-collected! Note that on implementations with non-deterministic garbage-collection such as PyPy/Jython this is not feasible because objects aren't immediately collected, but on CPython, with reference counting, this works great. So, at least test on CPython -- then, if you don't have binary dependencies, run on PyPy :)
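As an example, on CPython such a check can be as simple as keeping a weak reference and asserting it's dead after the removal (using the hypothetical ModelsContainer sketched above):

import weakref

def test_model_is_collected():
    container = ModelsContainer()

    class MyModel(object):
        pass

    obj_id = container.add(MyModel())
    ref = weakref.ref(container.get(obj_id))
    container.remove(obj_id)
    # With reference counting, the instance dies right after the removal
    # (provided nothing else kept a strong reference to it).
    assert ref() is None, 'MyModel instance leaked!'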

3. Callbacks:

As we've just defined that the instances' ownership is really in the EPModelsContainer, there's room for callbacks which only keep weak-references to bound methods when you're interested in something -- in which case https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py is a pretty good implementation for that... note that for top-level functions, strong references are kept.

Why, you may ask? Well, the main reason is that this use-case is usually for closures, so, it may be hard to find a place to keep the function alive -- and if it's a top-level function, it will be alive until the end of times anyway (i.e.: process shutdown).

Anyways, this is probably a case that should only be used with care, as unregistering must be explicit and anything in the closure scope will be kept alive!
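A simplified sketch of the concept below (it assumes Python 3, where weakref.WeakMethod is available -- the linked callback.py is the actual implementation):

import inspect
import weakref

class Callback(object):

    def __init__(self):
        self._refs = []

    def register(self, func):
        if inspect.ismethod(func):
            # Bound method: keep only a weak reference so that listening
            # doesn't keep the listener (nor its model) alive.
            self._refs.append(weakref.WeakMethod(func))
        else:
            # Top-level function or closure: keep a strong reference.
            self._refs.append(lambda f=func: f)

    def unregister(self, func):
        self._refs = [r for r in self._refs if r() != func]

    def __call__(self, *args, **kwargs):
        still_alive = []
        for ref in self._refs:
            func = ref()
            if func is not None:  # Skip (and drop) dead weak references.
                func(*args, **kwargs)
                still_alive.append(ref)
        self._refs = still_alive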

4. A standard selection mechanism.

Again, as we just defined that all our model lives inside the EPModelsContainer, selection usually means simply keeping the id(s) of the object(s) selected.

In PyVmMonitor, the base extension for this is the EPSelectionService. The concept is pretty simple: clients can listen to changes in the selection (through a Callback), can trigger selection changes and can get the current selection. As PyVmMonitor accepts multiple selection to inspect multiple processes at once, it always deals with a list(obj_id) and clients act according to the selection to show the proper UI.

The extension interface used in PyVmMonitor lives at https://github.com/fabioz/pyvmmonitor-framework/blob/master/pyvmmonitor_framework/extensions/ep_selection_service.py
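The concept fits in a few lines -- the sketch below is a hypothetical, simplified version (reusing the Callback sketched in item #3):

class SelectionService(object):

    def __init__(self):
        self._selection = []  # list(obj_id)
        self.on_selection_changed = Callback()

    def get_selection(self):
        return list(self._selection)

    def set_selection(self, source, obj_ids):
        if list(obj_ids) != self._selection:
            self._selection = list(obj_ids)
            # Passing the source lets listeners skip notifications they
            # triggered themselves.
            self.on_selection_changed(source, list(obj_ids))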

5. Undo/redo: Well, in PyVmMonitor there's really no undo/redo functionality because it's not really required for what it does, but on other applications I worked on that had undo/redo, the basis was actually the model entities that entered the EPModelsContainer: if they implemented an interface saying that they provided changes in them, they'd be tracked, and commands would be automatically added to a command list for undo/redo purposes when the model changed (simply by recording the id of the object and the attribute's new/previous values -- as well as providing a memento for the specific object when it entered/left the EPModelsContainer). This is a bit different from using the command pattern because it's all done from the outside: we're actually listening to changes in the model instead of changing how we code to add the command pattern (which IMHO makes code much more verbose than it needs to be).
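A rough sketch of that idea (all names here are hypothetical -- in particular, the models are assumed to provide an on_attr_changed callback as in item #3):

class AttrChangedCommand(object):
    # Records just enough to undo/redo one attribute change: the object
    # is always looked up by id in the models container.

    def __init__(self, container, obj_id, attr, old, new):
        self._container = container
        self._obj_id = obj_id
        self._attr = attr
        self._old = old
        self._new = new

    def undo(self):
        setattr(self._container.get(self._obj_id), self._attr, self._old)

    def redo(self):
        setattr(self._container.get(self._obj_id), self._attr, self._new)


class UndoRedoTracker(object):
    # Tracks changes from the outside: models don't know about commands
    # at all, they just announce their changes.

    def __init__(self, container):
        self._container = container
        self._undo_stack = []
        self._redo_stack = []

    def track(self, model):
        model.on_attr_changed.register(self._on_attr_changed)

    def _on_attr_changed(self, obj_id, attr, old, new):
        self._undo_stack.append(
            AttrChangedCommand(self._container, obj_id, attr, old, new))
        self._redo_stack = []  # A new change invalidates the redo stack.

    def undo(self):
        if self._undo_stack:
            cmd = self._undo_stack.pop()
            cmd.undo()
            self._redo_stack.append(cmd)

    def redo(self):
        if self._redo_stack:
            cmd = self._redo_stack.pop()
            cmd.redo()
            self._undo_stack.append(cmd)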

6. The actual UI... Well, for PyVmMonitor I didn't go through the effort of actually creating a reusable UI, and other projects I worked on which did have that concern aren't actually mine to open source, but the idea here would be making a view which would query for EPMenu, EPToolbar, EPAction, EPCentralWidget, EPDock, etc. and would create the actual UI from that. For PyVmMonitor I just have a simple non-extensible UI which listens to the EPSelectionService and updates its central editor accordingly, and the UI has a 'def set_model' which receives the actual model or models it should show.

Well, that's it, hope you liked the approach. As I said, I've used it several times over the last years and it works great for me (but I'm interested to hear if there are better approaches out there which could be reused or improve on those aspects).

Friday, February 06, 2015

PyDev 3.9.2 released

PyDev 3.9.2 is now out!

This version has many enhancements. The nicest one for me is that when debugging, a console prompt which has history, code-completion (with Tab and Ctrl+Space), send line from editor to console with F2, and PageUp to select multiple lines from the previous history (in sum, everything that the interactive console has) will appear for the user by default.

The image below shows it in action... As a note, the themed scrollbars are a courtesy of the latest LiClipse: http://liclipse.com (so, if you're not using LiClipse, the scrollbars will probably take up more space).


Besides this, there were many other enhancements:

  • PyVmMonitor: http://www.pyvmmonitor.com/ is now on public beta, so, you can now use the integrated profile view (Window > Show view > Other > PyDev > Profile) to profile your code.
  • The module rename refactoring can now switch to a regular rename (so it's easier to change the extension of a Python file).
  • The interactive console sends a real signal on interrupt when possible (so, it's handled immediately and works even during a sleep() call -- and not only when there's some Python code executing).
  • The Cython parsing was enhanced a bit.
  • When pasting contents directly in the PyDev Package Explorer, the new line delimiters are properly respected.
  • Tab settings may be saved in the project or user settings.
  • 2 critical deadlock conditions were fixed.
  • And many other things (see http://pydev.org/ for more info).
I'd also like to thank the PyDev supporters: https://sw-brainwy.rhcloud.com/supporters/PyDev/ -- without your support it wouldn't be possible to keep PyDev going!

Sunday, January 18, 2015

Using tail-recursion on plain Python (hack)

Ok, just to note, I don't think this should be actually used anywhere, but I thought it was a nice hack to share :)

The idea is using the Python debug utilities to go back to the start of the frame... it probably won't win any speed competition and will break your debugger (so, really, don't use it).

The idea is setting "frame.f_lineno" to set the next line to be executed (but that can only be done inside a tracing function, so, we do have to set the tracing just for that and remove it right afterwards).

As a note, setting the next line to execute is available in the PyDev debugger, so, you could jump backward to some place in the code through Ctrl+Alt+R to see how some code behaved -- combined with the possibility of changing local vars, this is usually pretty handy.

So, now on to the code -- I think it's pretty straightforward:

import sys
from functools import partial


def trisum(n, csum):
    f = sys._getframe()
    
    if n == 0:
        return csum
    else:
        n -= 1
        csum += 1
        print(n, csum)
        
        f.f_trace = retrace_to_trisum
        sys.settrace(retrace_to_trisum)
        raise AssertionError('Never gets here!')
        
        # Originally the 3 lines above would be the recursive call below.
        # It's possible to see that commenting the 3 lines above
        # and executing the code will make Python throw a
        # RuntimeError: maximum recursion depth exceeded.
        trisum(n, csum)

def reuse_frame(line, frame, *args):
    # Reusing a frame means setting the lineno and stopping the trace.
    frame.f_lineno = line + 2 # +2 to skip the 'f = sys._getframe()' line
    sys.settrace(None)
    frame.f_trace = None
    return None

retrace_to_trisum = partial(reuse_frame, trisum.__code__.co_firstlineno)
print(trisum(1000, 0))

Friday, January 09, 2015

Creating safe cyclic reference destructors (without requiring __del__)

Well, it seems common for people to use __del__ in Python, but that should be a no-go mainly for the reasons below:

1. If there's a cycle, the Python VM won't be able to decide in what order elements should be deleted and will keep them alive forever (unless you manually clear that cycle through the gc module)... Yes, you shouldn't create a cycle in the first place, but you can hardly guarantee that some client of your library won't create one when he shouldn't.

2. There are caveats related to reviving self during the __del__ (i.e.: say making a new reference to it somewhere else during its __del__ -- which you should definitely not do...).

3. Not all Python VMs work the same, so, unless you explicitly release the resources on the object, some resource may stay alive much longer than it should (i.e.: PyPy, Jython...).

4. If an exception is thrown in the context, all the objects may stay alive for much longer than anticipated (because the exception keeps a reference to the frame that has thrown it).

Now, if you still want to manage things that way (say, to play it safe in case the user forgets to use a context manager), at least there's a relatively easy solution for points 1 and 2: instead of using __del__, use the weakref module to get a callback when the object dies and do the needed clearing there...

The only thing to make sure here is that you don't use 'self' directly inside the callback, only the things it has to clear (otherwise you'd create a cycle to 'self', which is something you want to avoid here).

The example below shows what I mean (StreamWrapperDel is the __del__ based solution which shouldn't be used and StreamWrapperNoDel is the solution you should use):

import weakref

class StreamWrapperDel(object):
    
    def __init__(self, stream):
        self.stream = stream

    def __del__(self):
        print('__del__')
        self.stream.close()
        
class StreamWrapperNoDel(object):
    
    def __init__(self, stream):
        self.stream = stream
        def on_die(killed_ref):
            print('on_die')
            stream.close()
        self._del_ref = weakref.ref(self, on_die)


if __name__ == '__main__':
    class Stub(object):
        def __init__(self):
            self.closed = False
        
        def close(self):
            self.closed = True
            
    s = Stub()
    w = StreamWrapperDel(s)
    del w
    assert s.closed
    
    s = Stub()
    w = StreamWrapperNoDel(s)
    del w
    assert s.closed

Given that, personally I think Python shouldn't allow __del__ at all as there's another way to do it which doesn't have the related caveats.

For some real-world code which uses that approach, see: https://code.activestate.com/recipes/578998-systemmutex (recipe for a system wide mutex).

p.s.: Thanks to Raymond Hettinger the code above is colorized: https://code.activestate.com/recipes/578178-colorize-python-sourcecode-syntax-highlighting

Thursday, January 08, 2015

PyDev 3.9.1 released

PyDev 3.9.1 has just been released.

There are some noteworthy improvements done:
  • Preferences may now be saved and persisted per project or to the user settings.

For configuring the preferences, the approach is a bit different from most other Eclipse plugins, as it extends the existing preferences pages instead of creating project property pages, and allows saving the options to multiple projects or to the user settings from there.

  • The pytest integration had some critical issues fixed (expected failures are no longer reported as regular failures, and conftest loading was fixed by automatically running tests from the proper folder).


  • The attach to process is now working on Mac OS.
See: http://pydev.org for more details on the release.


Wednesday, November 19, 2014

pytest fixtures: When does it make sense to use them?

Just a bit of background: pytest (http://pytest.org) is one of the main Python testing frameworks, and it provides many new ways to write tests (compared with xUnit based frameworks), so, the idea here is to explore one of its ideas a bit: fixtures.

Personally, I think fixtures are a pretty good idea, but seeing some real-world code with it has led me to believe it's often abused...

So, below I'll try to list what I think are the PROs and CONs of fixtures and give some examples to back up those points...

PROs: 

  • It's a good way to provide setup/tear down for tests with little boilerplate code.

The example below (which is based on pytest-qt: https://github.com/nicoddemus/pytest-qt) shows a nice case where fixtures are used to set up the QApplication and provide an API to deal with testing Qt.

  
from PyQt4 import QtGui
from PyQt4.QtGui import QPushButton
import pytest

@pytest.yield_fixture(scope='session')
def qapp():
    app = QtGui.QApplication.instance()
    if app is None:
        app = QtGui.QApplication([])
        yield app
        app.exit()
    else:
        yield app

class QtBot(object):

    def click(self, widget):
        widget.click()
        
@pytest.yield_fixture
def qtbot(qapp, request):
    result = QtBot()
    yield result
    qapp.closeAllWindows()

def test_button_clicked(qtbot):
    button = QPushButton()
    clicked = [False]
    def on_clicked():
        clicked[0] = True

    button.clicked.connect(on_clicked)
    qtbot.click(button)
    assert clicked[0]


  • autouse is especially useful for providing global setup/tear down affecting tests without making any change to existing tests.

The example below shows a fixture which verifies that after each test all files are closed (it's added by default to all tests by using autouse=True to make sure no test has such a leak).

  
import os
import psutil
import pytest

@pytest.fixture(autouse=True)
def check_no_files_open(request):
    process = psutil.Process(os.getpid())
    open_files = set(tup[0] for tup in process.open_files())

    def check():
        assert set(tup[0] for tup in process.open_files()) == open_files

    request.addfinalizer(check)

def test_create_array(tmpdir): # tmpdir is also a nice fixture which creates a temporary dir for us and gives an easy to use API.
    stream = open(os.path.join(str(tmpdir.mkdir("sub")), 'test.txt'), 'w')
    test_create_array.stream = stream  # Example to keep handle open to make test fail


Now, on to the CONs of fixtures...

  • Fixtures can make the code less explicit and harder to follow.

  
--- my_window.py file:

from PyQt4.QtCore import QSize
from PyQt4 import QtGui

class MyWindow(QtGui.QWidget):
    def sizeHint(self, *args, **kwargs):
        return QSize(200, 200)

--- conftest.py file:

import pytest
from my_window import MyWindow

@pytest.fixture()
def window(qtbot):
    return MyWindow()

--- test_window.py file:

def test_window_size_hint(window):
    size_hint = window.sizeHint()
    assert size_hint.width() == 200


Note that this example uses the qtbot shown in the first example (and that's a good thing), but the bad part is that if we had fixtures coming from many places, it'd be hard to know what the window fixture does... in this case, it'd be more straightforward to simply have a test which imports MyWindow and does window = MyWindow() instead of using that fixture. Note that if a custom teardown was needed for the window, it could make sense to create a fixture with a finalizer to do a proper teardown (see the sketch below), but in this example, it's clearly too much for too little...

Besides, by just looking at the test, what's this window we're dealing with? Where's it defined? So, if you really want to use fixtures like that, at the very least add some documentation on the type you're expecting to receive in the fixture!
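For the record, the sketch below shows the teardown case mentioned above, where a fixture with a finalizer does pay off (assuming the MyWindow from the previous example):

import pytest
from my_window import MyWindow

@pytest.fixture()
def window(qtbot, request):
    win = MyWindow()
    # The finalizer guarantees the window is closed even if the test fails.
    request.addfinalizer(win.close)
    return win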

  • It's usually easy to overuse fixtures when a simple function call would do...

The example below shows a Comparator being created where no special setup/teardown is needed and we're just using a stateless object...

  
import pytest

class Comparator(object):

    def assert_almost_equal(self, o1, o2):
        assert abs(o1 - o2) < 0.0001


@pytest.fixture()
def comparator():
    return Comparator()

def test_numbers(comparator):
    comparator.assert_almost_equal(0.00001, 0.00002)

I believe in this case it'd make much more sense to create a simple 'def assert_almost_equal' function to be imported and used as needed, instead of having a fixture to provide this kind of function...

Or, if the Comparator object was indeed needed, the code below would make it clearer what the comparator is, while having it as a parameter in a test makes it much harder to know what exactly you're getting (mostly because it's pretty hard to reason about parameter types in Python).

  
def test_numbers():
    comparator = Comparator()
    comparator.assert_almost_equal(0.00001, 0.00002)
 
That's it, I think this sums up my current experience in dealing with fixtures -- I think it's a nice mechanism, but has to be used with care because it can be abused and make your code harder to follow!

Now, I'm curious about other points of view too :)

Tuesday, November 11, 2014

Vertical indent guides (PyDev 3.9.0)

The latest PyDev release (3.9.0) is just out. 

The major feature added is that it now has vertical indent guides -- see screenshot below -- they're turned on by default and may be configured in the preferences: PyDev > Editor > Vertical Indent Guide.

This has actually been a long-awaited feature and was added as one of the targets in the last crowdfunding!


Besides this, the 3.9.0 release is packed with many bug-fixes:

  • A critical issue with the minimap on Ubuntu 12 was fixed.
  • Some issues in the interactive console (which were introduced due to the latest enhancements related to asynchronous output) were also fixed.
  • A bunch of others -- which may be seen at the release notes in http://pydev.org.

Also, this release makes the horizontal scrollbar visible by default again... this is mostly because many users seemed to be confused by not having it (personally, as the editor still scrolls with the cursor and my lines usually aren't that long, it doesn't really bother me, but I can see that many users expect having it -- so, those that want it hidden have to disable it in the minimap preferences and not the other way around).

-- Note that the vertical scrollbar is still hidden as the minimap is enabled by default.