Friday, June 23, 2023

robocorp.log: a library to answer what happened in a Python run.

It's been a while since I don't post, so, I decided to shed some details on what I'm working now (which I think is a really nice -- open source -- tool for the Python ecosystem -- in my view it's something close to a "Time travel debugger" for Python, although it's mostly labelled as automatic logging for Python 😄).

I'll start with a bit of history though: I'm working with Robocorp right now (Robocorp is a company which focuses on automation projects -- it leverages and provides many open source tools, especially in the Python ecosystem).

A few years back Robocorp approached me to do a language server for Robot Framework (which started as a testing framework done in Python and right now is also used for doing automations, which Robocorp recommends for many clients doing automations -- although many clients also prefer to just use Python directly). 

Now, one of the strong points of Robot Framework is that it generates an output which provides information on everything that happened during its execution (Keyword calls -- which is how "methods" are called in Robot Framework, if statements, return values, etc), so, it's almost a "Time travel" debugger as it records and shows information on everything that happened in the execution.

Now, this brings me to what I'm working right now: a library which records what happens inside a Python process -- along with a UI which makes it possible to inspect it afterwards (its the Python log counterpart of Robot Framework).

For those curious I've created a repo with one example showing the output from Robot Framework and from the one generated by the Robocorp's Python Framework: https://github.com/fabioz/log_examples.

You can also see a live example (which solves the https://rpachallenge.com/) output of Robot Framework log.html as well as the output of Robocorp's Python Framework log.html.

The easiest way to use it right now is by using the tasks from Robocorp's Python Framework (see: https://github.com/robocorp/robo/tree/master/tasks as the log will be automatically generated for tasks run through it -- mostly, mark your entry points with @task (from robocorp.tasks import task) and run with `python -m robocorp.tasks run` and get your `log.html` in the `output` showing method calls, if statements, assigns, returns, etc that happened in your run).

One question I got a few times is how does it work... Well, after working quite a bit on pydevd and debugpy one of the things scratched right from the start was trying to use Python debugger infrastructure due to a simple fact: if the debugger infrastructure was used then no one could actually debug the code while using the logging framework.

In the end, inspiration ended up coming from PyTest. p.s.: thanks to Bruno Oliveira for a discussion back in the day about how assertions were rewritten using import hooks to provide nicer messages in assertion failures in PyTest.

robocorp.log uses the same approach as PyTest (import hooks + ast rewriting), but instead of rewriting asserts it rewrites the whole method to add callbacks on what's happening.

So, something as:

def method():
    a = 1

roughly becomes something as:

def method():
    report_method_start(..., 'method')
    a = 1
    report_assign(..., 'a', a)
    report_method_end()

The real output is a bit more contrived as it needs to deal with exceptions, yields, making sure the stack is correct, but I hope you get the idea.

Then, the callbacks are converted into a kind of journal of what happens and that is then fed to the log.html (the idea is having it directly in a view in VSCode in the future so that you can see the log being created in real time -- right now the info is added to a bunch of "robolog" files and embedded into the final log.html).

Now, the approach does come with one caveat: import hooks need to be setup prior to importing a module, code imported before setting it up won't be traced -- it's one of the reasons why it's recommended to use robocorp.tasks to do the launching instead of bolting the logging manually as it makes sure things happen in the proper order.

The second caveat is related to the object __repr__. The framework is quite keen on getting the representation from objects at various times, so, if the __repr__ is too slow the execution may be much slower or even worse, if it has side-effects bad things will happen (thankfully most objects do the right thing as a __repr__ with side effects is a bug and the program would misbehave on debuggers too).

The third caveat is that right now it needs to be told what needs to be traced (by default full logging is available for all code which is considered user code and libraries need to be manually specified to be logged when it's called directly from a function which is considered user code -- this may change in the future, but it's the current state of affairs).

The final caveat is that right now it'll only trace what's happening in the main thread. Other threads won't appear in the log.

Well, I guess this post is already a bit bigger than I planned so I'll stop here. For those interested in testing or reporting enhancements/bugs see: https://github.com/robocorp/robo/.

Enjoy!