Tuesday, May 10, 2005

Pydev parser... the future??!?!?!

Yesterday I ran some experiments with ANTLR as a possible replacement for the current parser, which uses JavaCC.

Well, let me talk a little more about the current parser (it is borrowed from Jython)...

It uses JavaCC and asdl (which stands for Abstract Syntax Description Language).

Basically, the JavaCC-generated parser builds the asdl structure as it parses the code, so once parsing finishes you have an asdl tree data structure. The tree is traversed with the visitor pattern, which is how Pydev currently finds things out about the code, such as tokens and definitions.

The asdl structure gives us very complete information about the code: the tokens are organized hierarchically, and each one carries its starting line and column in the code.
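To make the visitor idea concrete, here is a minimal sketch. It is not the Pydev/Jython Java code: it assumes CPython's own `ast` module as a stand-in, since that tree is also described by a Python.asdl file. The class name `DefinitionCollector` and the helper `collect_definitions` are made up for illustration.

```python
import ast

SOURCE = '''\
class Greeter:
    def greet(self, name):
        return "hi " + name

def main():
    pass
'''


class DefinitionCollector(ast.NodeVisitor):
    """Collects (kind, name, line, column) for each class and function definition."""

    def __init__(self):
        self.found = []

    def visit_ClassDef(self, node):
        # lineno is 1-based, col_offset is 0-based; both mark the START of the node.
        self.found.append(("class", node.name, node.lineno, node.col_offset))
        self.generic_visit(node)  # keep walking into nested definitions

    def visit_FunctionDef(self, node):
        self.found.append(("def", node.name, node.lineno, node.col_offset))
        self.generic_visit(node)


def collect_definitions(source):
    """Parse the source and return the definitions found by the visitor."""
    collector = DefinitionCollector()
    collector.visit(ast.parse(source))
    return collector.found


for kind, name, line, col in collect_definitions(SOURCE):
    print(kind, name, line, col)
```

The visitor only has to name the node types it cares about; everything else is walked by the default traversal, which is what makes this kind of structure easy to build tools on.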

The drawbacks (for me) right now in this structure are:

- We do not get the end of the token, just its start;
- We do not get any indentation info, because indent and dedent tokens are not supported in the asdl structure (as it is now);
- There is a huge lack of documentation for asdl;
- In my opinion, JavaCC is not as easy to work with as ANTLR;
- ANTLR seems to have much better error recovery than JavaCC.
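To illustrate the kind of information the first two points are about (token end positions and indent/dedent tokens), here is a small sketch. It uses CPython's standard `tokenize` module purely as an example of a token stream that carries this data; it says nothing about how JavaCC or ANTLR would expose it, and `layout_tokens` is a made-up helper name.

```python
import io
import token
import tokenize

SOURCE = "if x:\n    y = 1\nz = 2\n"


def layout_tokens(source):
    """Return (token name, text, start, end) for each token in the source.

    start and end are (line, column) pairs, so every token has both ends.
    """
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string, tok.start, tok.end)
        for tok in tokenize.generate_tokens(readline)
    ]


# Show only the names plus the indentation tokens.
for name, text, start, end in layout_tokens(SOURCE):
    if name in ("INDENT", "DEDENT", "NAME"):
        print(name, repr(text), start, end)
```

With start, end, and explicit INDENT/DEDENT entries available, a tool can map any token back to the exact span of text it came from, which is what refactoring would need.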

So, I did some experiments and discovered I could do most of what I want with ANTLR, but I still need to decide how I want to represent the code after parsing. I guess I could use ANTLR to generate the asdl data structure, but as I said, it is missing some things.

Options I have:

- Make ANTLR generate a structure of my own. It would need all the info available in asdl (otherwise, I won't be able to do things such as refactoring in the future), plus token end and indentation data. Alternatively, I could extend the asdl structure a bit and keep it, as I think it is easy to deal with.

- Just extend the Jython JavaCC grammar. This would not give better error recovery; the only thing I would gain is the decorators.

- Use ANTLR to generate the same structure I have now for asdl. I would get error recovery and decorators.

Other tools would be built upon this structure, to get completions, definitions, references, etc.

Some notes:
asdlGen location:
http://asdl.sourceforge.net/

Python.asdl location:
http://tinyurl.com/4mj39

To generate the asdl structure from python asdl: asdl_java.py python.asdl

ANTLR:
http://www.antlr.org

Python ANTLR:
http://www.antlr.org/grammar/list

JavaCC:
https://javacc.dev.java.net/

Jython (the JavaCC grammar file for Python is here):
http://www.jython.org/
