Sunday, January 18, 2015

Using tail-recursion on plain Python (hack)

Ok, just to note, I don't think this should actually be used anywhere, but I thought it was a nice hack to share :)

The idea is using the Python debug utilities to go back to the start of the frame... it probably won't win any speed competition and will break your debugger (so, really, don't use it).

The trick is setting "frame.f_lineno" to define the next line to be executed -- that can only be done inside a tracing function, so we have to set the tracing just for that and remove it right afterwards.

As a note, setting the next line to execute is available in the PyDev debugger, so you can jump back to an earlier place in the code through Ctrl+Alt+R to see how some code behaved -- combined with the possibility of changing local vars, this is usually pretty handy.

So, now on to the code -- I think it's pretty straightforward:

import sys
from functools import partial

def trisum(n, csum):
    f = sys._getframe()
    if n == 0:  # reuse_frame jumps back to this line (f_lineno = firstlineno + 2).
        return csum
    else:
        n -= 1
        csum += 1
        print(n, csum)
        
        # Set the frame's local trace and the global trace: the next 'line'
        # event in this frame calls retrace_to_trisum, which moves f_lineno
        # back to the start, so the raise below is never actually executed.
        f.f_trace = retrace_to_trisum
        sys.settrace(retrace_to_trisum)
        raise AssertionError('Never gets here!')
        
        # Originally the 3 lines above would be the recursion call
        # It's possible to see that commenting the above lines
        # and executing the code will make Python throw a
        # RuntimeError: maximum recursion depth exceeded.
        trisum(n, csum)

def reuse_frame(line, frame, *args):
    # Reusing a frame means setting its lineno back and stopping the trace.
    # 'line' is trisum.__code__.co_firstlineno (the 'def' line), so +2 skips
    # the 'f = sys._getframe()' line and lands on the 'if n == 0:' check.
    frame.f_lineno = line + 2
    sys.settrace(None)
    frame.f_trace = None
    return None

retrace_to_trisum = partial(reuse_frame, trisum.__code__.co_firstlineno)
print(trisum(1000, 0))

Friday, January 09, 2015

Creating safe cyclic reference destructors (without requiring __del__)

Well, it seems common for people to use __del__ in Python, but that should be a no-go mainly for the reasons below:

1. If there's a cycle, the Python VM won't be able to decide in what order the elements should be deleted and will keep them all alive forever (unless you manually clear the cycle through the gc module)... Yes, you shouldn't create a cycle in the first place, but you can hardly guarantee that some client of your library won't create one when he shouldn't (see the sketch after this list).

2. There are caveats related to reviving self during the __del__ (i.e.: making a new reference to it somewhere else during its __del__ -- which you should definitely not do...).

3. Not all Python VMs work the same way, so, unless you explicitly release the resources held by the object, some resource may stay alive much longer than it should (i.e.: on PyPy, Jython...).

4. If an exception is thrown in the context, all the objects may stay alive for much longer than anticipated (because the exception keeps a reference to the frame that has thrown it).
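
To illustrate the first point, below is a minimal sketch (the Leaky class is just for this demonstration): on Python 2 -- and on Python 3 before 3.4/PEP 442 -- objects in a cycle which have a __del__ are moved to gc.garbage and never collected.

import gc

class Leaky(object):
    def __del__(self):
        print('__del__')

a = Leaky()
b = Leaky()
a.other = b  # Create the cycle: a -> b -> a.
b.other = a
del a, b
gc.collect()
# On Python 2 (and Python 3 before 3.4), the collector can't decide the
# deletion order, so both objects are kept alive here forever and
# __del__ is never called.
print(gc.garbage)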

Now, if you still want to manage things that way (say, to play it safe in case the user forgets to use a context manager), at least there's a relatively easy solution for points 1 and 2: instead of using __del__, use the weakref module to register a callback for when the object dies and do the needed clearing there...

The only thing to make sure of here is that you don't use 'self' directly inside the callback, only the things it has to clear (otherwise you'd create a cycle to 'self', which is exactly what you want to avoid here).

The example below shows what I mean (StreamWrapperDel is the __del__-based solution which shouldn't be used and StreamWrapperNoDel is the solution you should use):

import weakref

class StreamWrapperDel(object):
    
    def __init__(self, stream):
        self.stream = stream

    def __del__(self):
        print('__del__')
        self.stream.close()
        
class StreamWrapperNoDel(object):
    
    def __init__(self, stream):
        self.stream = stream
        def on_die(killed_ref):
            # Note: the closure captures 'stream', not 'self' -- capturing
            # 'self' would create the very cycle we're trying to avoid.
            print('on_die')
            stream.close()
        self._del_ref = weakref.ref(self, on_die)


if __name__ == '__main__':
    class Stub(object):
        def __init__(self):
            self.closed = False
        
        def close(self):
            self.closed = True
            
    s = Stub()
    w = StreamWrapperDel(s)
    del w
    assert s.closed
    
    s = Stub()
    w = StreamWrapperNoDel(s)
    del w
    assert s.closed
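
As a side note (this is not part of the original recipe): Python 3.4+ ships weakref.finalize, which packages exactly this pattern -- a sketch of the same wrapper using it:

import weakref

class StreamWrapperFinalize(object):

    def __init__(self, stream):
        self.stream = stream
        # Pass 'stream.close' (a bound method of the stream, not of self)
        # so that no reference cycle to self is created.
        self._finalizer = weakref.finalize(self, stream.close)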

Given that, personally I think Python shouldn't allow __del__ at all, as there's another way to do it which doesn't have the related caveats.

For some real-world code which uses that approach, see: https://code.activestate.com/recipes/578998-systemmutex (recipe for a system wide mutex).

p.s.: Thanks to Raymond Hettinger, the code above is colorized with: https://code.activestate.com/recipes/578178-colorize-python-sourcecode-syntax-highlighting

Thursday, January 08, 2015

PyDev 3.9.1 released

PyDev 3.9.1 has just been released.

There are some noteworthy improvements done:
  • Preferences may now be saved and persisted per project or to the user settings.

For configuring the preferences, the approach is a bit different from most other Eclipse plugins, as it extends the existing preferences pages instead of creating project property pages, and allows saving the options to multiple projects or to the user settings from there.

  • The pytest integration had some critical issues fixed: expected failures are no longer reported as regular failures, and conftest loading was fixed by automatically running tests from the proper folder.


  • Attach to process is now working on Mac OS.

See http://pydev.org for more details on the release.


Wednesday, November 19, 2014

pytest fixtures: When does it make sense to use them?

Just a bit of background: pytest (http://pytest.org) is one of the main Python testing frameworks, and it provides many new ways to write tests (compared with xUnit-based frameworks), so the idea here is to explore a bit one of its main concepts: fixtures.

Personally, I think fixtures are a pretty good idea, but seeing some real-world code with it has led me to believe it's often abused...

So, below I'll try to list what I think are the PROs and CONs of fixtures and give some examples to back up those points...

PROs: 

  • It's a good way to provide setup/teardown for tests with little boilerplate code.

The example below (based on pytest-qt: https://github.com/nicoddemus/pytest-qt) shows a nice case where fixtures are used to set up the QApplication and provide an API to help in testing Qt.

from PyQt4 import QtGui
from PyQt4.QtGui import QPushButton
import pytest

@pytest.yield_fixture(scope='session')
def qapp():
    # scope='session': the QApplication is created only once for the whole
    # test session (Qt doesn't support creating more than one).
    app = QtGui.QApplication.instance()
    if app is None:
        app = QtGui.QApplication([])
        yield app
        app.exit()
    else:
        yield app

class QtBot(object):

    def click(self, widget):
        widget.click()
        
@pytest.yield_fixture
def qtbot(qapp, request):
    result = QtBot()
    yield result
    qapp.closeAllWindows()

def test_button_clicked(qtbot):
    button = QPushButton()
    clicked = [False]
    def on_clicked():
        clicked[0] = True

    button.clicked.connect(on_clicked)
    qtbot.click(button)
    assert clicked[0]


  • autouse is especially useful for providing a global setup/teardown which affects tests without requiring any change to the existing tests.

The example below shows a fixture which verifies that after each test all files are closed (it's applied to all tests through autouse=True, to make sure no test has such a leak).

import os
import psutil
import pytest

@pytest.fixture(autouse=True)
def check_no_files_open(request):
    process = psutil.Process(os.getpid())
    open_files = set(tup[0] for tup in process.open_files())

    def check():
        assert set(tup[0] for tup in process.open_files()) == open_files

    request.addfinalizer(check)

def test_create_array(tmpdir):
    # tmpdir is also a nice fixture: it creates a temporary dir for us and
    # gives an easy to use API.
    stream = open(os.path.join(str(tmpdir.mkdir("sub")), 'test.txt'), 'w')
    test_create_array.stream = stream  # Keep the handle open on purpose to make the test fail.


Now, on to the CONs of fixtures...

  • Fixtures can make the code less explicit and harder to follow.

--- my_window.py file:

from PyQt4.QtCore import QSize
from PyQt4 import QtGui

class MyWindow(QtGui.QWidget):
    def sizeHint(self, *args, **kwargs):
        return QSize(200, 200)

--- conftest.py file:

import pytest
from my_window import MyWindow

@pytest.fixture()
def window(qtbot):
    return MyWindow()

--- test_window.py file:

def test_window_size_hint(window):
    size_hint = window.sizeHint()
    assert size_hint.width() == 200


Note that this example uses the qtbot shown in the first example (and that's a good thing), but the problem is that, if we had fixtures coming from many places, it'd be hard to know what the window fixture actually provides... in this case, it'd be more straightforward to simply have a test which imports MyWindow and does window = MyWindow() instead of using the fixture. Note that if a custom teardown was needed for the window, it could make sense to create a fixture with a finalizer to do a proper teardown, but in this example it's clearly too much for too little...

Besides, by just looking at the test, what's this window we're dealing with? Where's it defined? So, if you really want to use fixtures like that, at the very least add some documentation on the type you're expecting to receive in the fixture!
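
For instance, even a simple docstring on the fixture already helps a lot -- a minimal sketch (reusing the my_window module from the example above):

import pytest
from my_window import MyWindow

@pytest.fixture()
def window(qtbot):
    """Provides a my_window.MyWindow instance (a QWidget), with the
    QApplication already set up through the qtbot fixture."""
    return MyWindow()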

  • It's usually easy to overuse fixtures when a simple function call would do...

The example below shows a Comparator being created through a fixture where no special setup/teardown is needed and we're just dealing with a stateless object...

import pytest

class Comparator(object):

    def assert_almost_equal(self, o1, o2):
        assert abs(o1 - o2) < 0.0001


@pytest.fixture()
def comparator():
    return Comparator()

def test_numbers(comparator):
    comparator.assert_almost_equal(0.00001, 0.00002)

I believe in this case it'd make much more sense to create a simple 'def assert_almost_equal' function which is imported and used as needed, instead of having a fixture to provide that kind of function...
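
Something along these lines (just a sketch; the explicit tolerance parameter is my addition):

def assert_almost_equal(o1, o2, tolerance=0.0001):
    assert abs(o1 - o2) < tolerance

def test_numbers():
    assert_almost_equal(0.00001, 0.00002)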

Or, if the Comparator object was indeed needed, the code below would make it clearer what the comparator is, whereas having it as a parameter in a test makes it much harder to know what exactly you are getting (mostly because it's pretty hard to reason about parameter types in Python).

def test_numbers():
    comparator = Comparator()
    comparator.assert_almost_equal(0.00001, 0.00002)
 
That's it, I think this sums up my current experience in dealing with fixtures -- it's a nice mechanism, but it has to be used with care, because it can be abused and make your code harder to follow!

Now, I'm curious about other points of view too :)

Tuesday, November 11, 2014

Vertical indent guides (PyDev 3.9.0)

The latest PyDev release (3.9.0) is just out. 

The major feature added is that it now has vertical indent guides -- see screenshot below -- they're turned on by default and may be configured in the preferences: PyDev > Editor > Vertical Indent Guide.

This has actually been a long-awaited feature and was added as one of the targets in the last crowdfunding!


Besides this, the 3.9.0 release is packed with many bug-fixes:

  • A critical issue with the minimap on Ubuntu 12 was fixed.
  • Some issues in the interactive console (which were introduced due to the latest enhancements related to asynchronous output) were also fixed.
  • A bunch of others -- which may be seen in the release notes at http://pydev.org.

Also, this release makes the horizontal scrollbar visible by default again... mostly because many users seemed to be confused by not having it (personally, as the editor still scrolls with the cursor and my lines usually aren't that long, it doesn't really bother me, but I can see that many users expect to have it -- so, those that want it hidden now have to disable it in the minimap preferences and not the other way around).

-- Note that the vertical scrollbar is still hidden, as the minimap is enabled by default.

Thursday, September 25, 2014

Attaching debugger to running process (PyDev 3.8.0)

The latest PyDev 3.8.0 has just been released... along with a bunch of bugfixes, the major feature added is the possibility of attaching the debugger to a running process.

So, I thought about explaining a bit about how to use it (and later a bit about how it was done).

The first thing is that the Debug perspective must be activated (as the attach to process menu is only shown by default in the Debug perspective). Then, in the Debug perspective, select PyDev > Attach to Process (as the image below shows).

When that action is activated, a list with the active processes is shown so that the process to attach to can be selected (as a note, by default the list is filtered to show only Python processes, but if you have an executable running Python under a different name, it's also possible to select it).


After selecting the executable, the user must provide a Python interpreter that's compatible with the interpreter we'll attach to (i.e.: if we're attaching to a 64 bit process, the interpreter selected must also be a 64 bit interpreter -- and likewise for 32 bits).
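
As a side note, a quick way to check the bitness of a given interpreter (a tiny snippet, not something PyDev itself requires you to run):

import struct
print(struct.calcsize('P') * 8)  # Prints 64 on a 64 bit interpreter, 32 on a 32 bit one.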


And if everything went right, we'll be connected to the process after that step (you can see in the screenshot below that in the Debug View we have a process running regularly and we connected to it through the remote debugger) -- after it's connected, it's possible to pause the running process (through right-click process > Suspend).


So, this was how to use it within PyDev... now, I'll explain a bit on the internals as I had a great time implementing that feature :)

The first thing to note is that we currently support (only) Windows and Linux... although the basic idea is the same in both cases (we get a dll loaded into the target process and then execute our attach code through it), the implementations are actually pretty different (as it's Open Source, it can be seen at: https://github.com/fabioz/PyDev.Debugger/tree/development/pydevd_attach_to_process).

So, let me start with the Windows implementation...

In the Windows world, I must thank a bunch of people that came before me in order to make it work:
First, Mario Vilas for Winappdbg: https://github.com/MarioVilas/winappdbg -- mostly, this is a Windows debugger written in Python. We use it to load a dll into the target process. Then, after the dll is loaded, some hand-crafted shellcode is used to execute a function from that dll (the shellcode is actually different for 32 bits and for 64 bits -- I did spend quite some time here, since my assembly knowledge was mostly theoretical, so it was nice to see it working in practice: that's the GenShellCodeHelper class in add_code_to_python_process.py, used in the run_python_code_windows function).

Ok, so, that gives us the basic hackery needed to execute some code in a different process (which Winappdbg does through the Windows CreateRemoteThread API) -- initially I had a simple version working which did (all in shellcode) an acquire of the Python GIL, ran some hand-crafted Python code through PyRun_SimpleString and released the GIL... but there are a number of problems with that simple approach. The first is that although we'll execute some Python code there, we'll do it in a new thread, which under Python 3 is no good if the Python threading machinery wasn't initialized yet (we have to initialize it first). Also, to set up the debugger we need to call sys.settrace on the existing threads (while executing in each of those threads -- or at least making Python think we're at that thread, as there's no API for that)... at that point I decided on having the dll (and not only shellcode).

So, here I must thank the PTVS guys (which actually have an attach to process on Windows which does mixed-mode debugging): the attach.cpp file which generates our dll is adapted from PTVS to the PyDev use case. It goes through a number of hoops to initialize the threading facility in Python if it wasn't initialized yet, and does the sys.settrace while having all threads suspended, making Python think we're actually at the proper thread to make that call... looking at it, I find it really strange that Python itself doesn't have an API for that (it would be easy to do inside Python, but it's a major hack in the debugger because that API is not available).

Now, on to Linux: there the approach is simpler, as we reuse a debugger that should be readily available to Linux developers: gdb. Also, because gdb stops the threads for us and executes the code in an existing thread, things become much simpler. First, since we're executing in an existing thread, we don't have to start the threading machinery if it's not started (the only reason this is needed on Windows is because there we execute the code in a new thread created through CreateRemoteThread). And second, as gdb can be scripted in Python in a way which lets us switch threads and execute something on each of them while the other threads are stopped, we also don't need the trick of making Python think we're in a given thread to execute sys.settrace: we can switch to a thread and really execute code on it.
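
To give an idea of the core of the Linux approach, here's a rough sketch in Python (not the actual PyDev code -- the function name and the exact gdb commands are simplified for illustration):

import subprocess

def run_python_code_linux(pid, python_code):
    # Sketch: drive gdb to execute 'python_code' inside the process 'pid'.
    # gdb attaches (stopping all threads), we grab the GIL through the C API
    # and run the code with PyRun_SimpleString ($1 in gdb's value history is
    # the result of the PyGILState_Ensure call). The escaping below is naive
    # and just for illustration.
    gdb_commands = [
        'call (int)PyGILState_Ensure()',
        'call (int)PyRun_SimpleString("%s")' % python_code.replace('"', '\\"'),
        'call (void)PyGILState_Release($1)',
    ]
    cmd = ['gdb', '-p', str(pid), '--batch']
    for gdb_command in gdb_commands:
        cmd.extend(['--eval-command', gdb_command])
    subprocess.check_call(cmd)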

So, all in all, it should be working properly, although there are a number of caveats... it may fail on Windows if we don't have the permissions needed for CreateRemoteThread, or on Linux if the ptrace permissions aren't set so that we can attach with gdb -- and probably a bunch of other things I still didn't think of :)

Still, it's nice to see it working!

Also, the last thanks goes to JetBrains/PyCharm, which helped in sponsoring the work on the debugger -- as I mentioned earlier (http://pydev.blogspot.com.br/2014/08/pydev-370-pydevpycharm-debugger-merge.html), the debugger in PyDev/PyCharm is now merged :)

Tuesday, August 26, 2014

PyDev 3.7.0, PyDev/PyCharm Debugger merge, Crowdfunding

PyDev 3.7.0 was just released.

There are some interesting things to talk about in this release...

The first is that the PyDev debugger was merged with the fork which was used in PyCharm. The final code for the debugger (and the interactive console) now lives at: https://github.com/fabioz/PyDev.Debugger. This effort was backed by IntelliJ, and from now on, work on the debugger from either front (PyDev or PyCharm) should benefit both -- pull requests are also very welcome :)

With this merge, PyDev users gain GEvent debugging and breakpoints in Django templates (note that the breakpoints can only be added through the LiClipse HTML/Django Templates editor), and on the interactive console front (which was also part of this merge), the asynchronous output and console interrupt are new.

This release also changed the default UI of the PyDev editor (and of the LiClipse editors too): the minimap (which had a bunch of enhancements) is now turned on by default and the scrollbars are hidden by default -- those that prefer the old behavior must change the settings in the minimap preferences to match the old style.

Also noteworthy is that code-completion for all letter chars is turned on by default (again, users that want the old behavior have to uncheck that setting in the code completion preferences page), and this release also has a bunch of bugfixes.

Now, I haven't talked about the crowdfunding for keeping the support on PyDev and a new profiler UI (https://sw-brainwy.rhcloud.com/support/pydev-2014) since it finished... well, it didn't reach its full goal. In practice that means the profiler UI will still be done, and users which supported it will receive a license to use it, but it won't be open source... all in all, it wasn't that bad either: it got halfway to its target and many people seemed to like the idea. In the end, I'll only know whether keeping it going without the full target reached was a good idea after it's commercially available (as the idea is that new licenses will cover its development expenses and keep it going afterwards).

A note for profiler contributors is that I still haven't released an early-release version, but I'm working on it :)

As for PyDev, the outcome of the funding also means I have fewer resources to support it than I'd like, but given that LiClipse (http://brainwy.github.io/liclipse/) provides a share of its earnings to support PyDev, and latecomers can still contribute through http://pydev.org/, I still hope I won't need to lower its support (which would mean taking on other projects in the time I currently have for PyDev), and I think it'll still be possible to do the things outlined in the crowdfunding regarding it.