paul@810 | 1 | = History = |
paul@810 | 2 | |
paul@861 | 3 | Lichen is one of a number of experiments focused on the analysis of Python |
paul@861 | 4 | source code with potential reliability and performance improvements in mind. |
paul@861 | 5 | There have been a number of projects that can be regarded as its predecessors, |
paul@861 | 6 | the first of these being a project with the name "Analysis" that was inspired |
paul@861 | 7 | by the then-recent announcement of the |
paul@861 | 8 | [[https://dspace.mit.edu/handle/1721.1/16688|Starkiller]] tool, which |
paul@861 | 9 | attempted to compile Python to C++ (and was never publicly released). With the |
paul@861 | 10 | emergence of the [[http://pypy.org/|PyPy]] project, the notion of writing a |
paul@861 | 11 | Python implementation in a dialect of Python was also considered attractive. |
paul@810 | 12 | |
paul@810 | 13 | == Initial Experiments == |
paul@810 | 14 | |
paul@861 | 15 | After some experience gained, a successor to "Analysis" was attempted with the |
paul@861 | 16 | uninspired, but distinct, name of "analysis". This attempted to apply type |
paul@861 | 17 | inference and also generate C programs that could be compiled, albeit with a |
paul@861 | 18 | very simple run-time model that was not really amenable to further |
paul@861 | 19 | improvement. However, some useful concepts were explored, such as abstract |
paul@861 | 20 | syntax tree (AST) node annotations that could be used to direct code |
paul@861 | 21 | generation activities and to provide annotated source code and documentation |
paul@861 | 22 | summaries. |
paul@810 | 23 | |
paul@861 | 24 | Various experiences with specialising functions and methods, plus general |
paul@861 | 25 | experience working with the AST led to a successor called "simplify", which |
paul@861 | 26 | attempted to extend specialisation to classes, thus introducing the notion of |
paul@861 | 27 | multiple forms of a single class whose attributes have differing |
paul@861 | 28 | characteristics. It also introduced the idea of a simplified instruction set |
paul@861 | 29 | for program operations, in contrast to Python bytecode whose instructions can |
paul@861 | 30 | represent relatively complicated operations, along with a simplified program |
paul@861 | 31 | representation that was intended to make program analysis easier. Issues |
paul@861 | 32 | around "data polymorphism" led to the realisation that attempting to identify |
paul@861 | 33 | specific types, in order to facilitate insights into program behaviour or |
paul@861 | 34 | optimisations of programs, was costly and not necessarily effective. |
paul@810 | 35 | |
paul@861 | 36 | (Conversations with the author of [[http://shedskin.github.io/|Shed Skin]], a |
paul@861 | 37 | tool already available at this time, which compiles Python programs complying |
paul@861 | 38 | to certain restrictions to C++, were very useful in considering data |
paul@861 | 39 | polymorphism issues and others. Shed Skin must effectively be what Starkiller |
paul@861 | 40 | promised to be, and it has even been available in Debian for a number of years |
paul@861 | 41 | so that others can evaluate it for their own needs.) |
paul@810 | 42 | |
paul@810 | 43 | == Further Experiences == |
paul@810 | 44 | |
paul@861 | 45 | An alternative approach was taken in the next successor, "micropython" (not to |
paul@861 | 46 | be confused with any other, more popular, project of the same name), where the |
paul@861 | 47 | focus of analysis shifted to defining the attributes provided by program |
paul@861 | 48 | objects and then determining which objects might be used in different places |
paul@861 | 49 | in the program according to the attributes mentioned. It introduced a strategy |
paul@861 | 50 | for representing program objects at run-time, and continued the work on a |
paul@861 | 51 | simplified instruction set, defining its instructions in terms of the run-time |
paul@861 | 52 | structures. The ability to generate annotated source code summaries was |
paul@861 | 53 | enhanced, and the software could also generate programs in the simplified |
paul@861 | 54 | instruction set that could be executed using an interpreter. Although |
paul@861 | 55 | substantial progress was made, the increasing resistance experienced in |
paul@861 | 56 | adapting a growing system to further forms of optimisations, and the need to |
paul@861 | 57 | implement and test run-time functionality on a somewhat unique instruction set |
paul@861 | 58 | architecture meant that progress became slower over time. |
paul@810 | 59 | |
paul@861 | 60 | The dependence of "micropython" on costly whole-program analysis to identify |
paul@861 | 61 | optimisations led to some further realisations. The notion of "attribute |
paul@861 | 62 | usage" had allowed program namespaces to have their names characterised, and |
paul@861 | 63 | the notion of well-defined structures had allowed the identification of |
paul@861 | 64 | specific object types throughout programs, albeit with a degree of |
paul@861 | 65 | understandable inaccuracy. Meanwhile, the run-time data model had relied on |
paul@861 | 66 | restrictions on structures to permit relatively efficient structure access |
paul@861 | 67 | operations even when object types cannot be identified. It thus appeared that |
paul@861 | 68 | such techniques might be almost as potent on their own as the combination of |
paul@861 | 69 | more costly techniques explored in earlier projects. |
paul@810 | 70 | |
paul@810 | 71 | == Current Work == |
paul@810 | 72 | |
paul@861 | 73 | It was with such realisations that a new project was effectively born. |
paul@861 | 74 | Tentatively called "!PythonLight" but renamed to "Lichen" as the code matured, |
paul@861 | 75 | the objectives now involved a simpler processing framework that merely |
paul@861 | 76 | attempted to catalogue structure members, to determine the origins of such |
paul@861 | 77 | members, and to record data flow within namespaces in order to determine |
paul@861 | 78 | attribute usage on names. The name "Lichen" can be pronounced in a way that is |
paul@861 | 79 | a contraction of "like Python", indicating that although the language being |
paul@861 | 80 | analysed is similar to Python, restrictions apply that make it not identical |
paul@861 | 81 | to it. |
paul@810 | 82 | |
paul@861 | 83 | Unlike previous efforts, instead of annotating the AST built by the Python |
paul@861 | 84 | parser, separate catalogues would record details of program operations for |
paul@861 | 85 | more convenient consumption by potentially separate analysis tools. It is |
paul@861 | 86 | worth emphasising that much of the processing work done to characterise |
paul@861 | 87 | program behaviour is effectively database work: searching for information that |
paul@861 | 88 | matches particular criteria. Emitting tabular data for perusal and potential |
paul@861 | 89 | use by other tools is a recognition of this, and it would not be unreasonable |
paul@861 | 90 | to expect future versions of tools of this nature to actually employ more |
paul@861 | 91 | conventional database management systems to facilitate program analysis. |
paul@810 | 92 | |
paul@861 | 93 | Instead of targeting a project-specific virtual machine, the ultimate |
paul@861 | 94 | objective of processing was amended to generate a C program that could be |
paul@861 | 95 | compiled using existing tools. Although care must be taken to generate C |
paul@861 | 96 | source code, delegating the low-level work of actually producing an executable |
paul@861 | 97 | reduces the scope of the project and its associated demands. |
paul@810 | 98 | |
paul@861 | 99 | Despite reasonable progress being made, certain features retained from Python |
paul@861 | 100 | proved to be a burden, notably the behaviour of Python's import system and the |
paul@861 | 101 | way in which programs can quickly accumulate superfluous code through that |
paul@861 | 102 | system's operation. To focus on genuinely important functionality and on |
paul@861 | 103 | producing working code, the project was [[../Restarted|restarted]] with |
paul@861 | 104 | various additional simplifications made to the supported language. |
paul@861 | 105 | Fortunately, this helped to drive progress, resulting in the development of a |
paul@861 | 106 | toolchain that can generate C programs for compilation and execution. |
paul@810 | 107 | |
paul@861 | 108 | Lichen attempts to balance the goals of requiring relatively inexpensive |
paul@861 | 109 | analysis, offering meaningful insights into program behaviour, providing |
paul@861 | 110 | output in the form of working, translated, standalone programs, and |
paul@861 | 111 | encouraging further experimentation in the areas of language design, program |
paul@861 | 112 | analysis and optimisation. Much of the standard library is itself written in |
paul@861 | 113 | Lichen, even functionality that would not be written in Python for CPython, |
paul@861 | 114 | thus realising one of the original ambitions for this work and its |
paul@861 | 115 | predecessors. |