paul@810 | 1 | = Translating Programs = |
paul@810 | 2 | |
paul@861 | 3 | The Lichen toolchain currently targets the C programming language, generating |
paul@861 | 4 | programs that are then compiled using existing C compilers. The process of |
paul@861 | 5 | translation involves the optimisation and generation of program structures, |
paul@861 | 6 | and the translation of Lichen language constructs into equivalent C language |
paul@861 | 7 | constructs. |
paul@810 | 8 | |
paul@810 | 9 | <<TableOfContents(2,3)>> |
paul@810 | 10 | |
paul@810 | 11 | == Attribute Access Plans == |
paul@810 | 12 | |
paul@861 | 13 | During [[../Inspection|inspection]], each attribute access is associated with |
paul@861 | 14 | a unique location and its details stored. During [[../Deduction|deduction]], |
paul@861 | 15 | such accesses are resolved and characterised. During |
paul@861 | 16 | [[../Optimisation|optimisation]], such accesses are encoded as '''instruction |
paul@861 | 17 | plans''' indicating the operations or instructions required to achieve the |
paul@861 | 18 | access. During translation, these instruction plans are generated as part of |
paul@861 | 19 | the final program. |
paul@810 | 20 | |
paul@810 | 21 | == Structure Optimisation == |
paul@810 | 22 | |
paul@861 | 23 | All object structures need to support the attributes defined for them, and |
paul@861 | 24 | although it may be possible to determine the identity of attributes in advance |
paul@861 | 25 | of a program being run, there generally remains a need to provide mechanisms |
paul@861 | 26 | to determine the nature of an object at run-time and to access its attributes |
paul@861 | 27 | dynamically. To achieve this, an [[../Optimisation|optimisation]] process |
paul@861 | 28 | catalogues the presence of each attribute name on all program objects, then |
paul@861 | 29 | deciding upon a position for the structure member corresponding to that |
paul@861 | 30 | attribute name in all objects. |
paul@810 | 31 | |
paul@810 | 32 | == Structure Generation == |
paul@810 | 33 | |
paul@861 | 34 | Each program consists of a number of structures providing the program's |
paul@861 | 35 | objects and defining classes, functions, modules and any predefined instances. |
paul@810 | 36 | |
paul@810 | 37 | === Attribute Representation === |
paul@810 | 38 | |
paul@861 | 39 | Attributes most typically provide references to objects within a certain |
paul@861 | 40 | optional context. Since attributes are employed to encode various other kinds |
paul@861 | 41 | of information, the data structure may be interpreted in other ways within |
paul@861 | 42 | suitably controlled environments. For example, the `__data__` attribute found |
paul@861 | 43 | on instances of certain built-in classes will encode references to other kinds |
paul@861 | 44 | of information that are used within native functions. |
paul@810 | 45 | |
paul@810 | 46 | === Object Representation === |
paul@810 | 47 | |
paul@861 | 48 | Objects provide collections of attributes, and the object representation is |
paul@861 | 49 | meant to support efficient computed access to attributes whilst also allowing |
paul@861 | 50 | a reasonably efficient means of testing and accessing attributes at run-time. |
paul@810 | 51 | |
paul@810 | 52 | === Structure Constants === |
paul@810 | 53 | |
paul@861 | 54 | For clarity, several classes of symbolic constant are defined in the |
paul@861 | 55 | translated program to help define structures or to refer to structure members. |
paul@810 | 56 | |
paul@810 | 57 | || '''Constant Class''' || '''Purpose''' || '''Example''' || '''Application''' || |
paul@810 | 58 | || `csize` || Indicates the number of attributes in a class || `__csize___builtins___list_list` ||<|3> Structure definition || |
paul@810 | 59 | || `isize` || Indicates the number of attributes in an instance || `__isize___builtins___list_list` || |
paul@810 | 60 | || `msize` || Indicates the number of attributes in a module || `__msize___builtins___list` || |
paul@810 | 61 | || `pminsize` || Indicates the minimum number of parameters expected by a callable || `__pminsize___builtins___list_list_append` ||<|2> Invocation || |
paul@810 | 62 | || `pmaxsize` || Indicates the maximum number of parameters expected by a callable || `__pmaxsize___builtins___list_list_append` || |
paul@810 | 63 | || `pcode` || Assigns a code to a particular parameter name || `__pcode_value` ||<|2> Keyword argument positioning || |
paul@810 | 64 | || `ppos` || Indicates the table position used by a particular parameter name || `__ppos_value` || |
paul@810 | 65 | || `code` || Assigns a code to a particular attribute name || `__code_value` ||<|2> Attribute access || |
paul@810 | 66 | || `pos` || Indicates the table position used by a particular attribute name || `__pos_value` || |
paul@810 | 67 | |
paul@861 | 68 | Such constants are encoded using helper functions, producing names resembling |
paul@861 | 69 | those above that are intended to be distinct and not to conflict with other |
paul@861 | 70 | defined names in the final program. |
paul@810 | 71 | |
paul@810 | 72 | === Instance Structures === |
paul@810 | 73 | |
paul@861 | 74 | Like all program objects, instances are defined in C according to a generic |
paul@861 | 75 | structure, meaning that all instances have the same generic members from the |
paul@861 | 76 | perspective of a C program. However, specific structures are defined for each |
paul@861 | 77 | class in order to conveniently describe the dimensions of instances of that |
paul@861 | 78 | class. For example, for the built-in list class, a declaration resembling the |
paul@861 | 79 | following is issued: |
paul@810 | 80 | |
paul@810 | 81 | {{{ |
paul@810 | 82 | typedef struct { |
paul@810 | 83 | const __table * table; |
paul@810 | 84 | unsigned int pos; |
paul@810 | 85 | __attr attrs[__isize___builtins___list_list___init__]; |
paul@810 | 86 | } __obj___builtins___list_list___init__; |
paul@810 | 87 | }}} |
paul@810 | 88 | |
paul@810 | 89 | == Program Translation == |
paul@810 | 90 | |
paul@810 | 91 | === Useful C Language Features === |
paul@810 | 92 | |
paul@861 | 93 | The translated code relies on the availability of certain useful features of |
paul@861 | 94 | C, some of them supported only in modern compilers: |
paul@810 | 95 | |
paul@861 | 96 | * Array and structure literals: these are used to define values within |
paul@861 | 97 | expressions that would otherwise require separate definition; invocation |
paul@861 | 98 | arguments are defined using inline expressions and the `__ARGS` macro (for |
paul@861 | 99 | unknown invocation targets and for keyword arguments) |
paul@861 | 100 | * [[https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros| |
paul@861 | 101 | Variadic macros]]: such macros permit variable numbers of arguments, thus |
paul@861 | 102 | making the `__ARGS` macro possible |
paul@861 | 103 | * [[WikiPedia:Comma operator]] sequences: these are used to construct |
paul@861 | 104 | instruction sequences within expressions, providing a mechanism for |
paul@861 | 105 | formulating attribute accesses |
paul@810 | 106 | |
paul@810 | 107 | === Function Naming === |
paul@810 | 108 | |
paul@861 | 109 | A naming scheme is applied to functions defined in the final C program to |
paul@861 | 110 | support different kinds of callables present in the original program. |
paul@810 | 111 | |
paul@810 | 112 | || '''Name Class''' || '''Purpose''' || '''Example''' || |
paul@810 | 113 | || `fn` || Indicates a plain function || `__fn___builtins___str_str` || |
paul@810 | 114 | || `main` || Indicates a module main program || `__main_sys` || |
paul@810 | 115 | || `new` || Indicates a class instantiator || `__new___builtins___list_list` || |
paul@810 | 116 | || `newliteral` || Provides an instantiator for literals mentioned in programs || `__newliteral___builtins___list_list` || |
paul@810 | 117 | |
paul@810 | 118 | === Generic Program Functionality === |
paul@810 | 119 | |
paul@861 | 120 | Given the representation of the fundamental program objects, a suite of |
paul@861 | 121 | generic operations is provided to support the activities necessary to inspect |
paul@861 | 122 | and update program data. Such operations are largely concerned with generic |
paul@861 | 123 | object and attribute representations and are practically identical from |
paul@861 | 124 | program to program. |
paul@810 | 125 | |
paul@861 | 126 | For example, an attribute provided by an object can be accessed in several |
paul@861 | 127 | different ways depending on the information known about the object when the |
paul@861 | 128 | program is translated. Consequently, a suite of attribute access operations is |
paul@861 | 129 | provided to support each different scenario: |
paul@810 | 130 | |
paul@810 | 131 | ||<-2|2> ||<-2> '''Attribute provider''' || |
paul@810 | 132 | || '''Class''' || '''Instance''' || |
paul@810 | 133 | ||<|3> '''Accessor''' || '''Class''' || `load_via_object` || || |
paul@810 | 134 | || '''Instance''' || `load_via_class` || `load_via_object` || |
paul@810 | 135 | || '''Class or instance''' || `get_class_and_load` || `check_and_load_via_any` || |
paul@810 | 136 | |
paul@810 | 137 | === Specific Program Functionality === |
paul@810 | 138 | |
paul@861 | 139 | Since objects at the program level have their layouts |
paul@861 | 140 | [[../Optimisation|optimised]], and since these layouts may differ from program |
paul@861 | 141 | to program, a suite of specific operations is provided to support a range of |
paul@861 | 142 | activities within programs, such operations being configured with |
paul@861 | 143 | program-specific information in order to do the right thing within a given |
paul@861 | 144 | program. |
paul@810 | 145 | |
paul@810 | 146 | || '''Function''' || '''Purpose''' || |
paul@810 | 147 | || `__invoke` || A generic invocation function that populates the argument array and obtains the appropriate target || |
paul@810 | 148 | || `__new` || Creates a new object, setting the `__class__` attribute to the indicated class identity || |
paul@810 | 149 | || `__BOOL` || Tests the identity of an attribute value against the `True` constant's address || |
paul@810 | 150 | || `__GETDEFAULT` || Obtains a function default from additional members of a function instance beyond the core members || |
paul@810 | 151 | || `__SETDEFAULT` || Updates a function default from additional members of a function instance beyond the core members || |
paul@810 | 152 | |
paul@810 | 153 | === Module Files === |
paul@810 | 154 | |
paul@861 | 155 | Each program module is defined in its own file in the translated program. |
paul@861 | 156 | Since C has no notion of general module-level code, a special main function is |
paul@861 | 157 | generated for each module to perform the operations defined at the module |
paul@861 | 158 | level in the original program. These main functions are called in turn by the |
paul@861 | 159 | principal `main` function of the final C program, and the order of invocation |
paul@861 | 160 | is defined by the module initialisation ordering established when resolving |
paul@861 | 161 | inter-module dependencies. |
paul@810 | 162 | |
paul@810 | 163 | === Statement Translation === |
paul@810 | 164 | |
paul@861 | 165 | Statements in the original program are generally translated to statements in |
paul@861 | 166 | the final C program. During inspection and translation, certain constructs are |
paul@861 | 167 | transformed to produce the operations necessary to implement such constructs. |
paul@861 | 168 | For instance, operators need to be rewritten as function invocations, and thus |
paul@861 | 169 | the form of the original program changes before it is translated to C. |
paul@810 | 170 | |
paul@861 | 171 | When translating certain operations that may appear within a statement, a |
paul@861 | 172 | sequence of instructions is generated, and such instructions must be performed |
paul@861 | 173 | in turn. Fortunately, C supports a wide variety of operations - such as |
paul@861 | 174 | assignments - within expressions, and it also provides the comma operator |
paul@861 | 175 | which permits sequences of operations to be performed in order and to yield a |
paul@861 | 176 | result that can then be used in the context of an expression. Such sequences |
paul@861 | 177 | of operations are employed extensively to realise attribute access instruction |
paul@861 | 178 | plans. |
paul@810 | 179 | |
paul@810 | 180 | === Name References === |
paul@810 | 181 | |
paul@861 | 182 | Names can be translated in different ways for the final C program. Function |
paul@861 | 183 | locals and parameters correspond directly to locals in C functions. |
paul@861 | 184 | Module-level globals and class-level locals are translated to structure |
paul@861 | 185 | accesses. However, temporary names that are used in sequence assignment are |
paul@861 | 186 | translated to C function locals, either in functions corresponding to their |
paul@861 | 187 | equivalents in the original program, or in module initialisation functions |
paul@861 | 188 | corresponding to the module scope in the original program. |
paul@810 | 189 | |
paul@861 | 190 | Certain special names that are formulated during [[../Inspection|inspection]] |
paul@861 | 191 | are converted to constant or static references. |
paul@810 | 192 | |
paul@810 | 193 | === Attribute Accesses === |
paul@810 | 194 | |
paul@861 | 195 | Each attribute access is associated with a particular access location, |
paul@861 | 196 | corresponding to an operation occurring in the original program identified |
paul@861 | 197 | during inspection. This location can be used as a key to access a |
paul@861 | 198 | [[#Attribute_Access_Plans|plan]] that describes how the access may be realised |
paul@861 | 199 | in the final program. Since most of the work computing the nature of the |
paul@861 | 200 | access is done during the preceding deduction and optimisation activities, |
paul@861 | 201 | what remains is mostly the emission of the plan into the generated program |
paul@861 | 202 | text with some substitutions performed. |
paul@810 | 203 | |
paul@810 | 204 | === Invocations === |
paul@810 | 205 | |
paul@861 | 206 | Each invocation involves an object whose identity may or may not be known at |
paul@861 | 207 | compile-time, plus a collection of arguments. The translation process is |
paul@861 | 208 | concerned with obtaining a target to be invoked in the generated program, |
paul@861 | 209 | populating an argument array appropriately for the target, and providing the |
paul@861 | 210 | means of invocation. |
paul@810 | 211 | |
paul@861 | 212 | Where the identity of the object is not known at all, the information provided |
paul@861 | 213 | is effectively passed on to the special `__invoke` function which is then |
paul@861 | 214 | required to do all the work at run-time. However, one of the goals of |
paul@861 | 215 | analysing and compiling the program is to avoid doing such work at run-time |
paul@861 | 216 | and to take advantage of those cases where the identity of the object can be |
paul@861 | 217 | determined. |
paul@810 | 218 | |
paul@861 | 219 | Thus, where the identity of the called object is known, details of the |
paul@861 | 220 | object's signature - its parameters and their positions - are used to populate |
paul@861 | 221 | the argument array and to determine whether this can be done successfully. |
paul@861 | 222 | Some callables require a context to be provided, and knowledge about the |
paul@861 | 223 | callable can be used to obtain or omit such a context from the arguments. |
paul@810 | 224 | |
paul@861 | 225 | Finally, the invocation target must be obtained. Where some knowledge of the |
paul@861 | 226 | identity of the object is available, it may be sufficient to access the |
paul@861 | 227 | special `__fn__` member directly (potentially testing for its presence if this |
paul@861 | 228 | is not certain) and to call the appropriate function with the assembled |
paul@861 | 229 | argument array. Where a definite identity is available, the actual function |
paul@861 | 230 | itself may be named and thus called with the arguments. Such operations are |
paul@861 | 231 | mere fragments of the total work performed by the `__invoke` function in the |
paul@861 | 232 | least optimal case. |
paul@810 | 233 | |
paul@810 | 234 | === Assignments === |
paul@810 | 235 | |
paul@861 | 236 | Assignments of objects to names occur for the following kinds of statements: |
paul@861 | 237 | assignments, definition statements (`class` and `def`), import statements |
paul@861 | 238 | (`from` and `import`). |
paul@810 | 239 | |
paul@810 | 240 | === Guards === |
paul@810 | 241 | |
paul@861 | 242 | Guards are identified during [[../Deduction|deduction]] as being necessary to |
paul@861 | 243 | uphold deductions about the nature of objects referenced via names that may be |
paul@861 | 244 | undermined at run-time by potentially erroneous program behaviour. |
paul@810 | 245 | |
paul@810 | 246 | === Exceptions === |
paul@810 | 247 | |
paul@861 | 248 | The C programming language does not support exceptions in a form supported by |
paul@861 | 249 | more modern programming languages. Consequently, lower-level mechanisms need |
paul@861 | 250 | to be employed to reproduce the semantics of the `try` statement, its `except` |
paul@861 | 251 | and `finally` clauses, and the `raise` statement, as well as considering the |
paul@861 | 252 | effect of exceptions on the `return` statement. |
paul@810 | 253 | |
paul@861 | 254 | Fortunately, some consideration has been given to this topic before. The |
paul@861 | 255 | [[http://www.nicemice.net/cexcept/|cexcept]] library - really a collection of |
paul@861 | 256 | convenience macros - provides a way of expressing exception constructs in a |
paul@861 | 257 | natural way in C while translating those constructs to usage of the `setjmp` |
paul@861 | 258 | and `longjmp` functions. Although direct usage of such functions could be |
paul@861 | 259 | generated during program translation, for convenience and to focus on other |
paul@861 | 260 | implementation concerns, it was decided to employ the cexcept macros and to |
paul@861 | 261 | build any additional functionality in terms of their usage. |
paul@810 | 262 | |
paul@810 | 263 | A structure is defined to hold exception-related information: |
paul@810 | 264 | |
paul@810 | 265 | {{{ |
paul@810 | 266 | typedef struct |
paul@810 | 267 | { |
paul@810 | 268 | __attr arg; |
paul@810 | 269 | int raising; |
paul@810 | 270 | int raising_else; |
paul@810 | 271 | int completing; |
paul@810 | 272 | } __exc; |
paul@810 | 273 | }}} |
paul@810 | 274 | |
paul@861 | 275 | Here, the `arg` member holds a raised exception object reference. Meanwhile, |
paul@861 | 276 | the remaining members interact with language constructs as follows: |
paul@810 | 277 | |
paul@810 | 278 | || '''Member''' || '''Description''' || '''Output Program Operation''' || |
paul@810 | 279 | || `raising` || Indicates an exception that has been raised and not yet handled || `__Raise` || |
paul@810 | 280 | || `raising_else` || Indicates an exception that must be re-raised because it occurred in an `else` clause || `__RaiseElse` || |
paul@810 | 281 | || `completing` || Indicates the completion of, or return from, a `try` clause requiring entry into a `finally` clause || `__Complete` and `__Return` || |
paul@810 | 282 | |
paul@810 | 283 | === Memory Allocation and Garbage Collection === |
paul@810 | 284 | |
paul@861 | 285 | To avoid having to write a garbage collector, the |
paul@861 | 286 | [[http://www.hboehm.info/gc/|Boehm-Demers-Weiser garbage collector]] is |
paul@861 | 287 | employed by programs to allocate and free memory needed for its objects. |