paul@810 | 1 | = Toolchain = |
paul@810 | 2 | |
paul@861 | 3 | The toolchain implements the process of analysing Lichen source files, |
paul@861 | 4 | compiling information about the structures and routines expressed in each |
paul@861 | 5 | program, and generating output for further processing that can produce an |
paul@861 | 6 | executable program. |
paul@810 | 7 | |
paul@810 | 8 | <<TableOfContents(2,3)>> |
paul@810 | 9 | |
paul@810 | 10 | == Compiling Programs == |
paul@810 | 11 | |
paul@861 | 12 | The principal interface to the toolchain is the `lplc` command, run on source |
paul@861 | 13 | files as in the following example: |
paul@810 | 14 | |
paul@810 | 15 | {{{ |
paul@810 | 16 | lplc tests/unicode.py |
paul@810 | 17 | }}} |
paul@810 | 18 | |
paul@861 | 19 | There is no need to specify all the files that might be required by the |
paul@861 | 20 | complete program. Instead, the toolchain identifies files in the program by |
paul@861 | 21 | searching its module search path. The search path can be configured using |
paul@861 | 22 | the `LICHENPATH` environment variable and the `-E` option. |
paul@810 | 23 | |
paul@861 | 24 | Various [[../Prerequisites|prerequisites]] are needed for the toolchain to |
paul@861 | 25 | work properly. When the `-c` option is specified, the program is translated |
paul@861 | 26 | to a C programming language representation but not built, which avoids the |
paul@861 | 27 | need to install some of the development tools. |
paul@810 | 28 | |
paul@861 | 29 | The default output file from a successful compilation is a file called |
paul@861 | 30 | `_main`, but this can be overridden using the `-o` option. For example: |
paul@810 | 31 | |
paul@810 | 32 | {{{ |
paul@810 | 33 | lplc -o unicode tests/unicode.py |
paul@810 | 34 | }}} |
paul@810 | 35 | |
paul@861 | 36 | The complete set of options can be viewed by specifying the `--help` option, |
paul@861 | 37 | and a manual page is also provided in the `docs` directory of the source |
paul@861 | 38 | distribution: |
paul@810 | 39 | |
paul@810 | 40 | {{{ |
paul@810 | 41 | man -l docs/lplc.1 |
paul@810 | 42 | }}} |
paul@810 | 43 | |
paul@861 | 44 | This page may already be installed if the software was provided as a package |
paul@861 | 45 | as part of an operating system distribution: |
paul@810 | 46 | |
paul@810 | 47 | {{{ |
paul@810 | 48 | man lplc |
paul@810 | 49 | }}} |
paul@810 | 50 | |
paul@810 | 51 | == Toolchain Implementation == |
paul@810 | 52 | |
paul@861 | 53 | The toolchain itself is currently written in Python, but it is envisaged that |
paul@861 | 54 | it will eventually be written in the Lichen language. Ideally, only minor |
paul@861 | 55 | modifications would be needed for it to accept its own source files as |
paul@861 | 56 | input and ultimately produce a representation of itself as an executable |
paul@861 | 57 | program. Since the Lichen language is based on Python, it is convenient to use |
paul@861 | 58 | existing Python implementations to access libraries that support the parsing |
paul@861 | 59 | of Python source files into useful representations. |
paul@810 | 60 | |
paul@861 | 61 | The Python 2 standard library provides two modules or packages of particular |
paul@861 | 62 | relevance: the `compiler` package and the `parser` module. The `parser` module |
paul@861 | 63 | is employed by `compiler` to decode source text, whereas `compiler` takes the |
paul@861 | 64 | concrete syntax tree representation from `parser` and produces an abstract |
paul@861 | 65 | syntax tree (AST), which is particularly helpful to software of the |
paul@861 | 66 | nature described here. (Contrary to impressions that |
paul@861 | 67 | [[http://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts/|some |
paul@861 | 68 | articles]] might give, the `ast` module available in Python 2.5 and later was |
paul@861 | 69 | not the first module to offer AST representations of Python programs in |
paul@861 | 70 | Python, nor was it even the first such module in the standard library.) |
paul@810 | 71 | |
paul@861 | 72 | However, it is not desirable to have a dependency on a Python implementation, |
paul@861 | 73 | which the `parser` module effectively is (as the `ast` module would also be if |
paul@861 | 74 | it were used here), since it is typically implemented as an extension module |
paul@861 | 75 | in a non-Python language (in C for CPython, in Java for Jython, and so on). |
paul@933 | 76 | Fortunately, the [[http://pypy.org/|PyPy]] project implemented its own |
paul@933 | 77 | parsing module, `pyparser`, which is intended to be used within the PyPy |
paul@933 | 78 | environment together with its own `ast` equivalent. It has nevertheless been |
paul@933 | 79 | possible to rework `pyparser` to produce representations that are compatible |
paul@933 | 80 | with the `compiler` package, which has itself been modified in various ways to |
paul@933 | 81 | achieve compatibility (and also to provide various other conveniences). |
paul@810 | 82 | |
paul@810 | 83 | == Program Analysis == |
paul@810 | 84 | |
paul@861 | 85 | With the means of inspecting source files available through a `compiler` |
paul@861 | 86 | package producing a usable representation of each file, it becomes possible to |
paul@861 | 87 | identify the different elements in each file and to collect information that |
paul@861 | 88 | may be put to use later. But before any files are inspected, it must be |
paul@861 | 89 | determined ''which'' files are to be inspected, since together these comprise |
paul@861 | 90 | the complete program to be analysed. |
paul@810 | 91 | |
paul@861 | 92 | Both Lichen and Python support the notion of a main source file (sometimes |
paul@861 | 93 | called the "script" file or the main module or `__main__`) and of imported |
paul@861 | 94 | modules and packages. The complete set of modules employed in a program is |
paul@861 | 95 | defined as those imported by the main module, then those imported by those |
paul@861 | 96 | modules, and so on. Thus, the complete set is not known without inspecting |
paul@861 | 97 | part of the program, and this set must be built incrementally until no new |
paul@861 | 98 | modules are encountered. |
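
The incremental discovery described above can be sketched as a simple worklist loop. This is only an illustration under stated assumptions, not the toolchain's own code: it uses the `ast` module from a current Python 3 installation rather than the reworked `compiler` package, and the in-memory `SOURCES` table and the function names `imported_names` and `collect_modules` are hypothetical stand-ins for files found on the module search path.

```python
import ast

# Hypothetical in-memory program: module name -> source text.
# A real toolchain would locate these files on its module search path.
SOURCES = {
    "__main__": "import shapes\nfrom util import helper\n",
    "shapes": "import util\n",
    "util": "",
    "unused": "import shapes\n",
}

def imported_names(source):
    """Return the module names mentioned by import statements in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names

def collect_modules(sources, main="__main__"):
    """Build the set of modules reachable from main, one module at a time."""
    found = {main}
    pending = [main]
    # Keep inspecting newly found modules until no new ones are encountered.
    while pending:
        name = pending.pop()
        for imported in imported_names(sources[name]):
            if imported in sources and imported not in found:
                found.add(imported)
                pending.append(imported)
    return found

print(sorted(collect_modules(SOURCES)))
```

Here the `unused` module is never reached from the main module, so it does not become part of the program even though it is present in the table.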
paul@810 | 99 | |
paul@861 | 100 | Where Lichen and Python differ is in the handling of [[../Imports|imports]] |
paul@861 | 101 | themselves. Python [[https://docs.python.org/3/reference/import.html|employs]] |
paul@861 | 102 | an intricate mechanism that searches for modules and packages, loading modules |
paul@861 | 103 | encountered when descending into packages to retrieve specific modules. In |
paul@861 | 104 | contrast, Lichen only imports the modules that are explicitly mentioned in |
paul@861 | 105 | programs. Thus, a Lichen program will not accumulate potentially large numbers |
paul@861 | 106 | of superfluous modules. |
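
The difference can be illustrated with a dotted module name. The sketch below is an assumption-laden illustration rather than a description of either implementation: `python_loads` lists the modules that Python's import system loads for a statement such as `import a.b.c` (each parent package, then the module itself), while `explicit_only` is a hypothetical reading of the Lichen policy stated above, in which only the module actually named is added to the program.

```python
def python_loads(dotted_name):
    """Modules Python loads for an import: every parent package, then the module."""
    parts = dotted_name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

def explicit_only(dotted_name):
    """Hypothetical Lichen-style policy: only the explicitly mentioned module."""
    return [dotted_name]

print(python_loads("a.b.c"))
print(explicit_only("a.b.c"))
```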
paul@810 | 107 | |
paul@861 | 108 | With a given module identified as being part of a program, the module will |
paul@861 | 109 | then be [[../Inspection|inspected]] for the purposes of gathering useful |
paul@861 | 110 | information. Since the primary objectives are to characterise the structure of |
paul@861 | 111 | the objects in a program and to determine how such objects are used, certain |
paul@861 | 112 | kinds of program constructs will be inspected more closely than others. Note |
paul@861 | 113 | that this initial inspection activity is not concerned with the translation of |
paul@861 | 114 | program operations to other forms: such [[../Translation|translation]] will |
paul@861 | 115 | occur later. The initial inspection is purely concerned with obtaining enough |
paul@861 | 116 | information to inform such later activities, and the original program is |
paul@861 | 117 | later revisited to provide the detail required to translate it. |
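
As a rough illustration of an inspection pass that gathers structural information without translating anything, the following sketch records the classes, attributes and functions found in a module. It is a minimal example under stated assumptions: it uses the `ast` module from a current Python 3 installation rather than the toolchain's own `compiler` package, and `SOURCE`, `inspect_module` and the layout of the returned summary are hypothetical choices made for this example.

```python
import ast

# Hypothetical module source used only for this illustration.
SOURCE = '''
class Point:
    dimensions = 2
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

def origin():
    return Point(0, 0)
'''

def inspect_module(source):
    """Record class attributes, instance attributes and top-level functions."""
    summary = {"classes": {}, "functions": []}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            class_attrs, instance_attrs = set(), set()
            # Names bound directly in the class body are class attributes.
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    class_attrs.add(item.name)
                elif isinstance(item, ast.Assign):
                    class_attrs.update(
                        t.id for t in item.targets if isinstance(t, ast.Name))
            # Assignments of the form self.<name> = ... suggest instance attributes.
            for sub in ast.walk(node):
                if (isinstance(sub, ast.Attribute)
                        and isinstance(sub.ctx, ast.Store)
                        and isinstance(sub.value, ast.Name)
                        and sub.value.id == "self"):
                    instance_attrs.add(sub.attr)
            summary["classes"][node.name] = {
                "class_attributes": sorted(class_attrs),
                "instance_attributes": sorted(instance_attrs),
            }
    return summary

print(inspect_module(SOURCE))
```

Nothing in the summary describes how the operations in `norm` or `origin` would be translated. It only captures the shape of the objects, which is the kind of information a later translation stage could consult when revisiting the program.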