Wednesday, November 26, 2008

Making code work in Python 2 and 3

Below are some tips for those interested in writing code that runs on both Python 2 and Python 3. There are probably many other issues, but these are the ones I ran into while porting the code-completion code in Pydev (so there will probably be a part 2 of this when I go on to port the debugger).

0. This may be the most important piece of advice: breathe regularly while doing the porting... and be prepared to have uglier code waiting for you if you want to support both Python 2 and Python 3 -- if you do have a choice, don't try to support both versions of Python -- the way Python 3 is implemented, no one's supposed to do that.

1. Print can NEVER be used (use the write() method from objects or create your own print function -- with a different name and use it everywhere)
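A minimal sketch of the "own print function" idea, just to make it concrete. The name write_line and its optional stream keyword are made up for this example (they are not from Pydev); the only real API used is the write() method of file-like objects:

```python
import sys

def write_line(*args, **kwargs):
    """Write the arguments separated by spaces, followed by a newline.

    Hypothetical helper: the optional 'stream' keyword defaults to sys.stdout.
    """
    stream = kwargs.get('stream')
    if stream is None:
        stream = sys.stdout
    stream.write(' '.join([str(arg) for arg in args]) + '\n')

# Same call in Python 2 and Python 3, with no print statement or function.
write_line('code completion ready, port:', 8080)
```

Passing stream=sys.stderr (or any object with a write() method) sends the output somewhere else.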

2. Catching an exception and binding it to a variable can NEVER be used (there's no syntax for it that's accepted by both versions, so, just deal with it as you can using the traceback module)
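One way this can look in practice is sketched below, assuming that fetching the active exception through sys.exc_info() is good enough for your case. The parse_port function is a made-up example, not Pydev code:

```python
import sys
import traceback

def parse_port(text):
    """Return text converted to an int, or None if it is not a number."""
    try:
        return int(text)
    except ValueError:
        # 'except ValueError, e' only parses on Python 2 and
        # 'except ValueError as e' needs Python 2.6 or newer,
        # so the active exception is fetched from sys.exc_info() instead.
        exc_value = sys.exc_info()[1]
        sys.stderr.write('Invalid port: %s\n' % (exc_value,))
        traceback.print_exc()  # full traceback goes to stderr
        return None
```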

3. socket.send needs bytes in Python 3 (a bytearray works): socket.send(bytearray(str, 'utf-8'))

4. socket.recv gets bytes, so they need to be decoded: socket.recv(size).decode('utf-8')
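Items 3 and 4 together can be sketched as a pair of small helpers. Note the assumptions: send_text and recv_text are hypothetical names, the sketch uses str.encode instead of bytearray (so it also runs where bytearray doesn't exist), and socket.socketpair() is there only to keep the example self-contained:

```python
import socket

def send_text(sock, text):
    # encode() gives a str on Python 2 and bytes on Python 3;
    # sendall() accepts both. On Python 2 this is only safe for
    # ASCII-only str objects (or for unicode objects).
    sock.sendall(text.encode('utf-8'))

def recv_text(sock, size=1024):
    # recv() returns raw bytes, which are decoded back to text.
    return sock.recv(size).decode('utf-8')

left, right = socket.socketpair()
try:
    send_text(left, 'hello')
    reply = recv_text(right)
finally:
    left.close()
    right.close()
```

In real code a multi-byte character may be split across two recv() calls, so decoding each chunk separately is only safe when the whole message is known to arrive at once.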

5. Some imports:

try:
    import StringIO
except ImportError:
    import io as StringIO #Python 3.0

try:
    from urllib import quote_plus, unquote_plus
except ImportError:
    from urllib.parse import quote_plus, unquote_plus #Python 3.0

try:
    import __builtin__
except ImportError:
    import builtins as __builtin__ #Python 3.0

There are way too many others, so, the approach for getting it right is basically running the 2to3 with that import to see the new version of it and then making it work as it used to.

6. True, False assign: in some scripts, to support older python/jython versions, the following construct was used:
__builtin__.True = 1
__builtin__.False = 0

As True and False are keywords now, this assignment will give a syntax error, so, to keep it working, one must do:
setattr(__builtin__, 'True', 1) -- as this will only be executed if True is not defined, that should be ok.

7. The long representation for values is not accepted anymore, so, 2L would not be accepted in python 3.

The solution for something like long_2 = 2L may be something like:
try:
    long
except NameError:
    long = int
long_2 = long(2)

Note that if you want to define a number that's already higher than the int limit, that won't actually help you (in my particular case, that was used on some arithmetic, just to make sure that the number would be coerced to a long, so, that solution is ok -- note: if you were already on python 2.5, that would not be needed as the conversion int -> long is already automatic)

8. raw_input is now input and input should be written explicitly as eval(raw_input('enter value'))
So, to keep backwards compatibility, I think the best approach would be keeping on with the raw_input (and writing the "old input" explicitly, while removing the "new input" reference from the builtins)

try:
    raw_input
except NameError:
    import builtins

    original_input = builtins.input
    del builtins.input
    def raw_input(*args, **kwargs):
        return original_input(*args, **kwargs)
    builtins.raw_input = raw_input

9. The compiler module is gone. So, to parse something, the solution seems to be using ast.parse and to compile, there's the builtins.compile (NOTE: right now, the 2to3 script doesn't seem to get this correctly)
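A small sketch of what that replacement looks like, assuming an interpreter that ships the ast module (Python 2.6 and later, so this does not help on older versions). The source string and the '<example>' file name are made up for the example:

```python
import ast

source = 'answer = 6 * 7\n'

# Parsing: ast.parse() returns the root node of the syntax tree.
tree = ast.parse(source, '<example>')

# Walk the tree and collect the names being assigned to.
assigned_names = [
    target.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Assign)
    for target in node.targets
    if isinstance(target, ast.Name)
]

# Compiling: the builtin compile() accepts source text or an AST.
code = compile(tree, '<example>', 'exec')
namespace = {}
exec(code, namespace)  # this call form is accepted by both Python 2 and 3
```

After running it, assigned_names holds the name 'answer' and namespace['answer'] holds the computed value.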


Anonymous said...

I really hope nobody will do such tricks to maintain compatibility among the two incompatible versions.

I think the best way to go is to have two different branches and back/foreporting fixes or just using 2to3 from a real Python 2.6 codebase.

Unknown said...

You can use "print", but only in a restricted way.

print ("Hello World")

works in both 2.x and 3.0. The basic point is that a single string surrounded by parentheses and separated from "print" by one or more spaces will work.

See the "Backwards Compatibility" section of PEP 3105 for more details.

Michael Watkins said...

Fabio, re your point #1, it's not that ``print`` can't be used, it just can't be used as a statement (as it was in Python < 3) because it is now a function.

print "boo" therefore becomes print("boo").

Not very scary and relatively minor in the grand scheme of things.

@eddy: there is no need to separate the parenthesis from the print statement in 2.x versions of Python:

>>> import sys
>>> sys.version
'2.5.2 ...'
>>> print("foo")
foo

This has been the case since 2.4x at least, perhaps much longer than that.

@Lawrence: I really hope nobody will do such tricks

Perhaps the right answer is not being categorical about such things. I have been testing one full-stack web application framework and object database which supports 2.x and 3.x; by luck or design (I think the latter, and it's not my code, so the praise is not self-centred) very few "compatibility shims" were required: nine 'if sys.version < "3"' checks out of 64 modules. Not bad.

There have frequently been language and library changes over the years that require such tests for a package to support more than one version of 2.x, so the precedent is set and is not all bad.

If a package can achieve dual 2 and 3 compatibility with a minor shim here and there, I don't see anything untoward in carrying on with that tradition.

That said, I'm sure many non-trivial packages will not have such a smooth transition.

Fabio Zadrozny said...

Yes, I agree, those are the last resort -- but for Pydev not doing so would complicate things the other way:

One would have to know the version of Python before choosing which script to run -- and add a considerable boilerplate for doing so -- so, back / foreporting fixes seems worse for me.

Actually, keeping separate branches for the same code is not something I usually consider a good solution for anything unless you want to keep something stuck to a given version and start applying only high-priority patches, as happens with older versions of Python -- e.g.: 2.3.xx, 2.4.xx.

Also, the 2to3 is not really 100% correct. Even after running it, there are a number of things that have to be manually patched before all things work, so, maintaining those 2 versions up to date is actually considerably more tricky than just running 2to3.

As for the print, you're right, when a single string is printed to sys.stdout that solution can be adopted (thanks for the tip).

Michael Watkins said...

I was thinking of your list of tips Fabio and decided to fully document another 2 and 3 hurdle: metaclasses.

The syntax change introduced in Python 3 precludes doing an "if sys.version <" style hack, but there is a simple solution.


Python 2 and 3: Metaclasses

Maybe a 2 and 3 'best practices' FAQ somewhere might be in order.

Anonymous said...

It can be noted that most of these tricks aren't necessary under Python 2.6, so this specifically is for Python 2.5 or earlier.

Personally I would use 2to3 to support 2.5 and 3.0 at the same time, but there may be reasons why you really want code to run on both without conversion tools in some limited cases.

Fabio Zadrozny said...

That's right... Actually, in my case, I want to support from Python 2.1 to Python 3.0 with a single codebase.

Andrew said...

Thanks for all of the tips. By the way, it would be great to see more information like this make its way to the official Porting to Python 3 wiki.