PyDev adventures

Tuesday, March 13, 2012

PyCon retrospect

Ok, so, this was my first time at PyCon and I must say it was pretty nice.

Initially, it started out kind of lousy for me: United seems to have 'misplaced' my bag and I only got it back in time to check it in again for the flight back to Brazil, so, that sucked. On the other hand, to balance things out, I won a Kindle DX in a raffle from http://www.truecar.com :)

Mainly, it was awesome talking to so many great names in the Python world... Meeting people face to face does make a great difference (some things are simply not as well expressed in an e-mail).

On the PyDev side, it was pretty nice seeing many people using it while hacking away during the conference and even having some people asking 'how can I help to make PyDev better?'.

I guess I wasn't properly prepared and, mostly, my answer was: read the developers guide to grab the code and compile it, find an itch to scratch and mail me about it so that I can help you find your way around... Some things can be done in Java, some in Jython; it depends a lot on what you want to do.

Aside from that, documentation-wise, I guess the major gap is a section explaining how to configure each major library to work with PyDev (most work out of the box, but some need minor adjustments such as adding a token to the forced builtins, and some may even require more than that). So, if someone is willing to help with the PyDev documentation on that topic, that would be a great help. And if someone doesn't have an itch to scratch and still wants to contribute code, there are things I can point to that need to be done and are relatively simple to get started with.

On the scientific computing side, I liked Numba (https://github.com/ContinuumIO/numba), which seems to be a pretty nice idea... it's still in its early stages, but I think it could be a nice substitute for Psyco (but with knowledge about numpy types and maybe a bit more strict -- if I understood it properly)... I'd much rather use something like Numba for speedups than Cython (as Cython code is only 'close' to Python, but doesn't really run on a Python interpreter) and PyPy is just not an option right now...

Although PyPy seemed strong at the conference, I must say that it'll probably be a long, long time until I'm able to use it in production... I rely on lots of code with C/C++ bindings, so, this is a killer CPython feature which I'm not sure PyPy will ever be able to provide (and the 'current' workaround of porting a library to RPython, as is being done for numpy in PyPy, doesn't really seem feasible).

One thing I thought was a bit strange was the lack of projects with Python on mobile platforms, as I was guessing it should be straightforward to compile Python for Android/iPhone (so, I'm guessing the issue is probably the lack of proper multi-platform bindings in that area). In contrast, Python seems very strong in the server-side and scientific computing fields.

And I think Python 3 does deserve a special note here: I think Guido made it pretty clear (as much of his keynote was about the subject) that Python 2 is really deprecated. So, although it's currently the production version that's most widely deployed/used, people really have to start getting used to the idea that a port to Python 3 is the way to go -- although in the real world, I still expect this to take quite a few years (probably more than was initially expected by the Python-core folks). The fact that Guido made this topic such a big portion of his keynote does make me feel like many Python lovers were against this move, and the actual response from the community in this respect has been mixed. Regardless of that, the direction Python-core is going is 100% Python 3 (maybe doing something here or there to ease the port or to support a shared Python 2-3 codebase).

And there are still some talks I'll watch later at: http://pyvideo.org (because many interesting talks took place at the same time).

Tuesday, March 06, 2012

PyCon

Tomorrow I'll be traveling to PyCon and I'll be staying during the 3 days of the main conference (Friday to Sunday).

So, anyone who wants to talk about PyDev, scientific programming, how to best structure a Python development environment -- or just wants to say hi, I'll be there this time :)

Just to note, currently the major projects I'm working on are:

PyDev (http://pydev.org), which I guess is pretty well known (especially if you're reading this blog), and I've been developing it for 9 years already :)

and Kraken (https://www.esss.com.br/kraken/), which is software for post-processing reservoir simulations (this may be less known, but actually, I started working on PyDev to scratch my own itches while doing scientific programming in Python, and so far, Kraken is the most advanced/complex software I've worked on in this area and I've been involved with its development since its beginning, around 5 years ago).

See you in Santa Clara :)

Wednesday, February 08, 2012

PyDev forums -> StackOverflow

The PyDev forums at SourceForge are now officially deprecated :)

So, anyone with a question about PyDev should now ask at StackOverflow and add a 'PyDev' tag.

I think this will be a real improvement over the current status quo... Some reasons I see for that are:

1. Many PyDev users follow StackOverflow and do answer things there, whereas in the PyDev forum, many questions were asked, but I was almost the only one answering... (I think the real plus here is the 'gaming' features that StackOverflow has, which make more people inclined to participate actively).

2. As people started asking there anyways, I really had to follow StackOverflow closely too, so, deprecating the PyDev forums means I'll be able to follow a single place again :)

3. Interacting with StackOverflow as a whole seems a nice improvement over the SourceForge forum (its editor is nicer, it accepts pictures, etc.)

And now on to something a bit unrelated... the PyDev homepage (http://pydev.org) is now being generated from a wiki (it's still a read-only wiki -- but hopefully that'll change soon -- but at least, I feel it'll be easier for me to edit things there and later have the homepage updated from it). So, if someone finds something strange in the homepage, please let me know :)

Wednesday, February 01, 2012

PyDev 2.4.0 released

This release was mostly focused on performance and memory optimizations.

On the performance front, the major focus was on start up (i.e.: start up Eclipse, open an editor, request a code-completion and show the globals token browser (Ctrl+Shift+T)) -- which should've become pretty fast (tested only with the Eclipse runtime and PyDev -- as things in other plugins can't really be controlled -- the subversive plugin for subversion seems to be especially slow to startup).

Memory-wise, things have been improved too, with the AST taking up less memory and a 'pseudo-intern' being done for some rather large caches (it's a pseudo-intern because the String.intern() function is not used: a HashMap is created and strings are reused through it for some processes -- and later that HashMap is thrown away), and the Jython plugin was fine-tuned to make fewer plugins visible to save on memory (and startup time).

Just to note: the real memory used can be seen by going to window > preferences > general > show heap status (the real size of the java process in the OS will probably be bigger, as java will usually grow to the size specified by -Xmx, regardless of how much memory it's really using at a given time). Personally, on large projects I allocate 300 Mb for the process, but this is mostly because the subversion plugin seems to be rather resource hungry -- migrating to git on some of those projects seems to be making things better :)

Aside from that, this time I spent some time migrating the PyDev homepage to a wiki ( https://wiki.appcelerator.org/display/tis/Python+Development ) -- right now it's not available for external editing, but that should happen soon (hopefully), and the idea is that the PyDev homepage will be generated mostly from that wiki.
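To make the 'pseudo-intern' idea concrete, here is a minimal sketch in Python (PyDev itself does this in Java with a HashMap; the class name `StringPool` and the sample strings are made up for illustration). Equal strings are swapped for one shared instance while a cache is being built, and the pool itself is discarded afterwards:

```python
class StringPool:
    """Temporary pool that reuses equal strings while a cache is built.

    Hypothetical helper for illustration; not PyDev's actual code.
    """

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # setdefault returns the instance already stored for an equal key,
        # or stores and returns `value` when it is seen for the first time.
        return self._pool.setdefault(value, value)


pool = StringPool()

# Build two equal strings at runtime so they start out as separate objects.
first = "".join(["my_module", ".", "MyClass"])
second = "".join(["my_module", ".", "MyClass"])

shared_first = pool.intern(first)
shared_second = pool.intern(second)
same_object = shared_first is shared_second

# Throwing the pool away releases its bookkeeping; whatever cache was
# filled keeps only the shared string instances it references.
del pool
```

Unlike a global intern table, nothing outlives the operation except the strings the caches actually hold on to.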

And as usual, a bunch of bugs were fixed :P

Thursday, January 05, 2012

Code-completion strategies in PyDev

I believe one of the strong points in PyDev is its code-completion, so, I thought a bit about giving some details on it :)

The main preference page for code completion is: Window > Preferences > PyDev > Editor > Code Completion (my preferred configuration is setting the 'Request completions on all letter chars and '_'', so that completions appear automatically when typing, otherwise Ctrl+Space would need to be used to request the completions -- I was actually thinking about making that the default and decided against it to conform to other editors in Eclipse).

* Word completion (also called Hippie Completion):

This is probably the simplest one and is provided by Eclipse itself (through Alt+/). It provides a simple word-based completion which uses all the currently opened editors in Eclipse.

I've actually provided a patch for Eclipse to improve the speed of this completion (https://bugs.eclipse.org/bugs/show_bug.cgi?id=270385), which was added in Eclipse 3.6.
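As a rough illustration of what a word-based completion does, here is a simplified Python sketch (not the Eclipse implementation, which is written in Java and orders proposals differently; `word_completions` and the sample buffers are made up for this example). It scans the current editor first and then the other open editors for words starting with the typed prefix:

```python
import re

WORD_RE = re.compile(r"\w+")


def word_completions(prefix, current_text, other_texts=()):
    """Return words starting with `prefix`, current editor first, no repeats."""
    seen = set()
    proposals = []
    for text in (current_text, *other_texts):
        for word in WORD_RE.findall(text):
            if word.startswith(prefix) and word != prefix and word not in seen:
                seen.add(word)
                proposals.append(word)
    return proposals


# Hypothetical contents of two open editors.
open_editors = [
    "def compute_total(values):\n    return sum(values)\n",
    "compute_average = None\ncomputed = True\n",
]
proposals = word_completions("comp", open_editors[0], open_editors[1:])
```

No parsing or type information is involved, which is why this kind of completion works in any editor and stays cheap.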

* Templates completion:

These are user-defined templates that may be configured at PyDev > Editor > Templates (most of the base for this completion is provided by Eclipse... PyDev uses a subclass: PyTemplateCompletionProcessor and some of the available variables may be defined in Jython code -- see: pytemplate_defaults.py for details).

* Common tokens completion:

When you start typing in PyDev, some common tokens (i.e.: keywords, self, etc) start appearing directly. Those can be configured in PyDev > Editor > Code Completion (ctx insensitive and common tokens).

Its implementation is pretty simple (may be seen at: KeywordsSimpleAssist)

* Context insensitive completion:

This completion goes through all the tokens available for a given project (which may need to consider project dependencies and which interpreter is being used) and shows those tokens as a completion (i.e.: top-level tokens such as classes or methods and the modules themselves).

If one of those is selected, the token will be completed and an import will be added for it too (if the preference in PyDev > Editor > Auto Imports > "Do auto import?" is marked as true -- in that same preferences page, the number of chars that need to be available in a word so that these completions start appearing may be specified).

Note that if the option was set not to do the auto-import, one could just add the token, let it be marked as an unrecognized variable by PyDev and later do an Organize Imports (Ctrl+O), or a Quick Fix in that line (Ctrl+1), to add the import.

The major issue in this completion isn't actually the completion per se (implemented in ImportsCompletionParticipant and CtxParticipant), but the structure which needs to be kept to have it as a fast and efficient completion.

Mainly, PyDev has a concept called 'AdditionalInfo' (this was done when PyDev Extensions was separated from the PyDev Open Source, so, the name is a bit strange now, but the general idea is that it was additional information related to a given project or interpreter), which keeps the following information:

- Two TreeMaps (AbstractAdditionalTokensInfo.topLevelInitialsToInfo and AbstractAdditionalTokensInfo.innerInitialsToInfo) which map token names to information of the places where the token may be found (i.e.: module and structure inside that module). Those are all kept in memory and are pretty fast to access (AbstractAdditionalTokensInfo.getTokensStartingWith is what's interesting for a code-completion and AbstractAdditionalTokensInfo.getTokensEqualTo is interesting when doing a quick fix or organize imports). This structure is also used in the global tokens browser (Ctrl+Shift+T).

- Note that it also has a structure (AbstractAdditionalDependencyInfo.completeIndex) which maps a module to all the available tokens in it. This structure is kept in memory only as a SoftHashMap (so, it's only kept in memory while there's enough space for it) and persisted to the disk. It's also only lazily created on operations that need it (currently only a project-wide rename refactoring or a find references (Ctrl+Shift+G) would use it as it's basically a structure which is a bit faster for doing exact match searches than actually doing a search in Eclipse -- especially if the SoftHashMap is still in memory, so, if many find references are done in succession, if there's enough memory, from the 2nd attempt onwards, things should be fast).
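The two kinds of lookup on the sorted maps can be sketched as follows. This is a Python approximation, assuming a sorted list plus `bisect` as a stand-in for a Java TreeMap; the class `TokenIndex`, its method names and the sample tokens are hypothetical and do not reproduce PyDev's actual key layout:

```python
import bisect


class TokenIndex:
    """Sorted token names mapped to the places where each token is defined."""

    def __init__(self):
        self._names = []  # token names, kept sorted
        self._info = {}   # name -> list of (module, path inside module)

    def add(self, name, module, path_in_module=""):
        if name not in self._info:
            bisect.insort(self._names, name)
            self._info[name] = []
        self._info[name].append((module, path_in_module))

    def tokens_starting_with(self, prefix):
        # In sorted order, all names sharing a prefix are contiguous and
        # begin at the first position that is >= the prefix.
        start = bisect.bisect_left(self._names, prefix)
        found = []
        for name in self._names[start:]:
            if not name.startswith(prefix):
                break
            found.extend((name, mod, path) for mod, path in self._info[name])
        return found

    def tokens_equal_to(self, name):
        return [(name, mod, path) for mod, path in self._info.get(name, [])]


index = TokenIndex()
index.add("OrderedDict", "collections")
index.add("Order", "shop.models")
index.add("os", "os")
index.add("OrderedDict", "typing")

starting = index.tokens_starting_with("Ord")  # what a completion would need
exact = index.tokens_equal_to("OrderedDict")  # what a quick fix would need
```

The prefix query only walks the matching slice of the sorted names, so it stays fast even when the index holds every top-level token of a project.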

On a project build, the tokens of the completeIndex are simply all removed (to be recalculated when some action that needs it is called). As for the maps, those are always kept up to date when a file is changed. The strategy for keeping the build fast is that the in-memory cache is directly updated (which is reasonably fast) and, instead of saving the whole map, only the delta information is saved; when restoring the info, those deltas are applied to bring it to its last state (and from time to time the whole structure is dumped and the deltas are removed). Also, it runs in a separate thread (not actually in the thread that's doing the build), and a singleton, RunnableAsJobsPoolThread, makes sure that only some of those, depending on the number of processors in your machine, are running at the same time, so, if you change 200 files at once, your computer won't come to a halt.
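The delta strategy could look roughly like the sketch below. It is written in Python for brevity and rests on assumptions: the file names, the JSON format, the `max_deltas` threshold and the class `DeltaPersistedIndex` are all invented for illustration and are not how PyDev stores its data:

```python
import json
import os
import tempfile


class DeltaPersistedIndex:
    """In-memory map that persists changes as deltas and compacts now and then."""

    def __init__(self, directory, max_deltas=3):
        self._snapshot_path = os.path.join(directory, "snapshot.json")
        self._delta_path = os.path.join(directory, "deltas.jsonl")
        self._max_deltas = max_deltas
        self._delta_count = 0
        self.tokens = {}  # module name -> list of token names
        self._restore()

    def _restore(self):
        # Load the last full dump, then replay the deltas written after it.
        if os.path.exists(self._snapshot_path):
            with open(self._snapshot_path) as stream:
                self.tokens = json.load(stream)
        if os.path.exists(self._delta_path):
            with open(self._delta_path) as stream:
                for line in stream:
                    self._apply(json.loads(line))
                    self._delta_count += 1

    def _apply(self, delta):
        if delta["op"] == "set":
            self.tokens[delta["module"]] = delta["tokens"]
        else:  # "remove"
            self.tokens.pop(delta["module"], None)

    def _record(self, delta):
        self._apply(delta)  # the in-memory state is updated right away
        with open(self._delta_path, "a") as stream:
            stream.write(json.dumps(delta) + "\n")
        self._delta_count += 1
        if self._delta_count >= self._max_deltas:
            self._compact()

    def _compact(self):
        # Dump the whole structure and drop the deltas it now includes.
        with open(self._snapshot_path, "w") as stream:
            json.dump(self.tokens, stream)
        with open(self._delta_path, "w"):
            pass
        self._delta_count = 0

    def set_module(self, module, tokens):
        self._record({"op": "set", "module": module, "tokens": list(tokens)})

    def remove_module(self, module):
        self._record({"op": "remove", "module": module})


with tempfile.TemporaryDirectory() as directory:
    index = DeltaPersistedIndex(directory)
    index.set_module("a", ["Foo"])
    index.set_module("b", ["bar"])
    index.remove_module("a")        # third delta triggers a full dump
    index.set_module("c", ["Baz"])  # one delta pending after the dump
    restored = DeltaPersistedIndex(directory).tokens
```

Each file change costs one small append instead of a rewrite of the whole map, at the price of replaying a few deltas on restore.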

* Context sensitive completion:

This is by far the most complex completion available as it analyzes the context where you're requesting a completion and provides tokens based on it. Basically, PyDev has an internal type-inference engine to do that (which is also used by actions such as find definition or TDD actions such as create method).

Internally it uses an LRU structure which maps module names to the module AST (Abstract Syntax Tree) and in a pretty recursive algorithm finds out about the available tokens needed for a given context and provides completions based on that (thankfully it has a huge amount of unit tests holding it all together). That process starts in PyCodeCompletion.getCodeCompletionProposals(ITextViewer, CompletionRequest) and the type inference engine main classes are: ASTManager and ProjectModulesManager.
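A minimal sketch of an LRU map from module names to ASTs is shown below, using Python's own `ast` module and an `OrderedDict` (PyDev's cache is Java code working on its own AST classes; `ModuleAstCache`, `top_level_tokens` and the sample sources are hypothetical):

```python
import ast
from collections import OrderedDict


class ModuleAstCache:
    """Keeps the most recently used module ASTs, parsing only on a miss."""

    def __init__(self, load_source, capacity=2):
        self._load_source = load_source  # callable: module name -> source
        self._capacity = capacity
        self._entries = OrderedDict()
        self.parse_count = 0

    def get(self, module_name):
        if module_name in self._entries:
            self._entries.move_to_end(module_name)  # mark as recently used
            return self._entries[module_name]
        tree = ast.parse(self._load_source(module_name))
        self.parse_count += 1
        self._entries[module_name] = tree
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return tree


def top_level_tokens(tree):
    """Collect names of top-level classes, functions and simple assignments."""
    names = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.append(node.name)
        elif isinstance(node, ast.Assign):
            names.extend(t.id for t in node.targets if isinstance(t, ast.Name))
    return names


# Hypothetical module sources standing in for files on disk.
SOURCES = {
    "shapes": "class Circle:\n    pass\n\ndef area(shape):\n    return 0\n",
    "config": "DEBUG = True\n",
    "util": "def helper():\n    pass\n",
}

cache = ModuleAstCache(SOURCES.__getitem__, capacity=2)
tokens = top_level_tokens(cache.get("shapes"))
cache.get("config")
cache.get("shapes")  # hit: "shapes" becomes the most recently used entry
cache.get("util")    # miss: evicts "config", the least recently used
cache.get("shapes")  # still cached, so no new parse
parses = cache.parse_count
```

The real engine goes much further (following imports, assignments and class hierarchies recursively), but the cache is what keeps repeated requests from re-parsing the same modules.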

On some occasions some modules may be pretty hard to analyze, in which case PyDev resorts to launching a shell and querying it for the needed tokens (those modules are pre-specified in window > preferences > PyDev > Interpreter > Forced Builtins, and the communication happens on the Java side through the AbstractShell class) -- it's also probably one of the main causes of problems when configuring PyDev, as it's common to have a firewall blocking that communication (in which case PyDev wouldn't even be able to get common builtins such as len, object, etc).

On the good side, this also makes it possible for PyDev to analyze .pyd modules (although if you're developing such a module as a part of your project, you have to remember to call Ctrl+2, kill so that PyDev will kill those shells before you actually build it, otherwise that module will be locked and you won't be able to link it -- and tokens wouldn't be updated).
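The general idea of asking a live interpreter for the tokens of a module can be sketched like this. Note the assumptions: this version starts a one-shot subprocess and reads JSON from its standard output, whereas the shells described above are kept running and queried from the Java side; `tokens_from_shell` and `QUERY` are names invented for this example:

```python
import json
import subprocess
import sys

# Code run inside the child interpreter: import the module named in argv[1]
# and print the names it exposes as a JSON list.
QUERY = (
    "import importlib, json, sys; "
    "print(json.dumps(dir(importlib.import_module(sys.argv[1]))))"
)


def tokens_from_shell(module_name, interpreter=sys.executable, timeout=30):
    """Import `module_name` in a separate interpreter and return its tokens."""
    completed = subprocess.run(
        [interpreter, "-c", QUERY, module_name],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return json.loads(completed.stdout)


builtin_tokens = tokens_from_shell("builtins")
```

Because the module is really imported in the other process, this works for compiled extensions too, which is also why that process keeps such a module loaded until it exits.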