Lichen

Annotated docs/wiki/Translation

825:8e9566713e1a
2018-06-22 Paul Boddie Fixed accessor tests to raise type errors instead of returning zero.
paul@810 1
= Translating Programs =
paul@810 2
paul@810 3
The Lichen toolchain currently targets the C programming language, generating programs that are then compiled using existing C compilers. The process of translation involves the optimisation and generation of program structures, and the translation of Lichen language constructs into equivalent C language constructs.
paul@810 4
paul@810 5
<<TableOfContents(2,3)>>
paul@810 6
paul@810 7
== Attribute Access Plans ==
paul@810 8
paul@810 9
During [[../Inspection|inspection]], each attribute access is associated with a unique location and its details stored. During [[../Deduction|deduction]], such accesses are resolved and characterised. During [[../Optimisation|optimisation]], such accesses are encoded as '''instruction plans''' indicating the operations or instructions required to achieve the access. During translation, these instruction plans are generated as part of the final program.
paul@810 10
paul@810 11
== Structure Optimisation ==
paul@810 12
paul@810 13
All object structures need to support the attributes defined for them, and although it may be possible to determine the identity of attributes in advance of a program being run, there generally remains a need to provide mechanisms to determine the nature of an object at run-time and to access its attributes dynamically. To achieve this, an [[../Optimisation|optimisation]] process catalogues the presence of each attribute name on all program objects, then deciding upon a position for the structure member corresponding to that attribute name in all objects.
paul@810 14
paul@810 15
== Structure Generation ==
paul@810 16
paul@810 17
Each program consists of a number of structures providing the program's objects and defining classes, functions, modules and any predefined instances.
paul@810 18
paul@810 19
=== Attribute Representation ===
paul@810 20
paul@810 21
Attributes most typically provide references to objects within a certain optional context. Since attributes are employed to encode various other kinds of information, the data structure may be interpreted in other ways within suitably controlled environments. For example, the `__data__` attribute found on instances of certain built-in classes will encode references to other kinds of information that are used within native functions.
paul@810 22
paul@810 23
=== Object Representation ===
paul@810 24
paul@810 25
Objects provide collections of attributes, and the object representation is meant to support efficient computed access to attributes whilst also allowing a reasonably efficient means of testing and accessing attributes at run-time.
paul@810 26
paul@810 27
=== Structure Constants ===
paul@810 28
paul@810 29
For clarity, several classes of symbolic constant are defined in the translated program to help define structures or to refer to structure members.
paul@810 30
paul@810 31
|| '''Constant Class''' || '''Purpose''' || '''Example''' || '''Application''' ||
paul@810 32
|| `csize` || Indicates the number of attributes in a class || `__csize___builtins___list_list` ||<|3> Structure definition ||
paul@810 33
|| `isize` || Indicates the number of attributes in an instance || `__isize___builtins___list_list` ||
paul@810 34
|| `msize` || Indicates the number of attributes in a module || `__msize___builtins___list` ||
paul@810 35
|| `pminsize` || Indicates the minimum number of parameters expected by a callable || `__pminsize___builtins___list_list_append` ||<|2> Invocation ||
paul@810 36
|| `pmaxsize` || Indicates the maximum number of parameters expected by a callable || `__pmaxsize___builtins___list_list_append` ||
paul@810 37
|| `pcode` || Assigns a code to a particular parameter name || `__pcode_value` ||<|2> Keyword argument positioning ||
paul@810 38
|| `ppos` || Indicates the table position used by a particular parameter name || `__ppos_value` ||
paul@810 39
|| `code` || Assigns a code to a particular attribute name || `__code_value` ||<|2> Attribute access ||
paul@810 40
|| `pos` || Indicates the table position used by a particular attribute name || `__pos_value` ||
paul@810 41
paul@810 42
Such constants are encoded using helper functions, producing names resembling those above that are intended to be distinct and not to conflict with other defined names in the final program.
paul@810 43
paul@810 44
=== Instance Structures ===
paul@810 45
paul@810 46
Like all program objects, instances are defined in C according to a generic structure, meaning that all instances have the same generic members from the perspective of a C program. However, specific structures are defined for each class in order to conveniently describe the dimensions of instances of that class. For example, for the built-in list class, a declaration resembling the following is issued:
paul@810 47
paul@810 48
{{{
paul@810 49
typedef struct {
paul@810 50
    const __table * table;
paul@810 51
    unsigned int pos;
paul@810 52
    __attr attrs[__isize___builtins___list_list___init__];
paul@810 53
} __obj___builtins___list_list___init__;
paul@810 54
}}}
paul@810 55
paul@810 56
== Program Translation ==
paul@810 57
paul@810 58
=== Useful C Language Features ===
paul@810 59
paul@810 60
The translated code relies on the availability of certain useful features of C, some of them supported only in modern compilers:
paul@810 61
paul@810 62
 * Array and structure literals: these are used to define values within expressions that would otherwise require separate definition; invocation arguments are defined using inline expressions and the `__ARGS` macro (for unknown invocation targets and for keyword arguments)
paul@810 63
 * [[https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros|Variadic macros]]: such macros permit variable numbers of arguments, thus making the `__ARGS` macro possible
paul@810 64
 * [[WikiPedia:Comma operator]] sequences: these are used to construct instruction sequences within expressions, providing a mechanism for formulating attribute accesses
paul@810 65
paul@810 66
=== Function Naming ===
paul@810 67
paul@810 68
A naming scheme is applied to functions defined in the final C program to support different kinds of callables present in the original program.
paul@810 69
paul@810 70
|| '''Name Class''' || '''Purpose''' || '''Example''' ||
paul@810 71
|| `fn` || Indicates a plain function || `__fn___builtins___str_str` ||
paul@810 72
|| `main` || Indicates a module main program || `__main_sys` ||
paul@810 73
|| `new` || Indicates a class instantiator || `__new___builtins___list_list` ||
paul@810 74
|| `newliteral` || Provides an instantiator for literals mentioned in programs || `__newliteral___builtins___list_list` ||
paul@810 75
paul@810 76
=== Generic Program Functionality ===
paul@810 77
paul@810 78
Given the representation of the fundamental program objects, a suite of generic operations is provided to support the activities necessary to inspect and update program data. Such operations are largely concerned with generic object and attribute representations and are practically identical from program to program.
paul@810 79
paul@810 80
For example, an attribute provided by an object can be accessed in several different ways depending on the information known about the object when the program is translated. Consequently, a suite of attribute access operations is provided to support each different scenario:
paul@810 81
paul@810 82
||<-2|2> ||<-2> '''Attribute provider''' ||
paul@810 83
|| '''Class''' || '''Instance''' ||
paul@810 84
||<|3> '''Accessor''' || '''Class''' || `load_via_object` || ||
paul@810 85
|| '''Instance''' || `load_via_class` || `load_via_object` ||
paul@810 86
|| '''Class or instance''' || `get_class_and_load` ||  `check_and_load_via_any` ||
paul@810 87
paul@810 88
=== Specific Program Functionality ===
paul@810 89
paul@810 90
Since objects at the program level have their layouts [[../Optimisation|optimised]], and since these layouts may differ from program to program, a suite of specific operations is provided to support a range of activities within programs, such operations being configured with program-specific information in order to do the right thing within a given program.
paul@810 91
paul@810 92
|| '''Function''' || '''Purpose''' ||
paul@810 93
|| `__invoke` || A generic invocation function that populates the argument array and obtains the appropriate target ||
paul@810 94
|| `__new` || Creates a new object, setting the `__class__` attribute to the indicated class identity ||
paul@810 95
|| `__BOOL` || Tests the identity of an attribute value against the `True` constant's address ||
paul@810 96
|| `__GETDEFAULT` || Obtains a function default from additional members of a function instance beyond the core members ||
paul@810 97
|| `__SETDEFAULT` || Updates a function default from additional members of a function instance beyond the core members ||
paul@810 98
paul@810 99
=== Module Files ===
paul@810 100
paul@810 101
Each program module is defined in its own file in the translated program. Since C has no notion of general module-level code, a special main function is generated for each module to perform the operations defined at the module level in the original program. These main functions are called in turn by the principal `main` function of the final C program, and the order of invocation is defined by the module initialisation ordering established when resolving inter-module dependencies.
paul@810 102
paul@810 103
=== Statement Translation ===
paul@810 104
paul@810 105
Statements in the original program are generally translated to statements in the final C program. During inspection and translation, certain constructs are transformed to produce the operations necessary to implement such constructs. For instance, operators need to be rewritten as function invocations, and thus the form of the original program changes before it is translated to C.
paul@810 106
paul@810 107
When translating certain operations that may appear within a statement, a sequence of instructions is generated, and such instructions must be performed in turn. Fortunately, C supports a wide variety of operations - such as assignments - within expressions, and it also provides the comma operator which permits sequences of operations to be performed in order and to yield a result that can then be used in the context of an expression. Such sequences of operations are employed extensively to realise attribute access instruction plans.
paul@810 108
paul@810 109
=== Name References ===
paul@810 110
paul@810 111
Names can be translated in different ways for the final C program. Function locals and parameters correspond directly to locals in C functions. Module-level globals and class-level locals are translated to structure accesses. However, temporary names that are used in sequence assignment are translated to C function locals, either in functions corresponding to their equivalents in the original program, or in module initialisation functions corresponding to the module scope in the original program.
paul@810 112
paul@810 113
Certain special names that are formulated during [[../Inspection|inspection]] are converted to constant or static references.
paul@810 114
paul@810 115
=== Attribute Accesses ===
paul@810 116
paul@810 117
Each attribute access is associated with a particular access location, corresponding to an operation occurring in the original program identified during inspection. This location can be used as a key to access a [[#Attribute_Access_Plans|plan]] that describes how the access may be realised in the final program. Since most of the work computing the nature of the access is done during the preceding deduction and optimisation activities, what remains is mostly the emission of the plan into the generated program text with some substitutions performed.
paul@810 118
paul@810 119
=== Invocations ===
paul@810 120
paul@810 121
Each invocation involves an object whose identity may or may not be known at compile-time, plus a collection of arguments. The translation process is concerned with obtaining a target to be invoked in the generated program, populating an argument array appropriately for the target, and providing the means of invocation.
paul@810 122
paul@810 123
Where the identity of the object is not known at all, the information provided is effectively passed on to the special `__invoke` function which is then required to do all the work at run-time. However, one of the goals of analysing and compiling the program is to avoid doing such work at run-time and to take advantage of those cases where the identity of the object can be determined.
paul@810 124
paul@810 125
Thus, where the identity of the called object is known, details of the object's signature - its parameters and their positions - are used to populate the argument array and to determine whether this can be done successfully. Some callables require a context to be provided, and knowledge about the callable can be used to obtain or omit such a context from the arguments.
paul@810 126
paul@810 127
Finally, the invocation target must be obtained. Where some knowledge of the identity of the object is available, it may be sufficient to access the special `__fn__` member directly (potentially testing for its presence if this is not certain) and to call the appropriate function with the assembled argument array. Where a definite identity is available, the actual function itself may be named and thus called with the arguments. Such operations are mere fragments of the total work performed by the `__invoke` function in the least optimal case.
paul@810 128
paul@810 129
=== Assignments ===
paul@810 130
paul@810 131
Assignments of objects to names occur for the following kinds of statements: assignments, definition statements (`class` and `def`), import statements (`from` and `import`).
paul@810 132
paul@810 133
=== Guards ===
paul@810 134
paul@810 135
Guards are identified during [[../Deduction|deduction]] as being necessary to uphold deductions about the nature of objects referenced via names that may be undermined at run-time by potentially erroneous program behaviour.
paul@810 136
paul@810 137
=== Exceptions ===
paul@810 138
paul@810 139
The C programming language does not support exceptions in a form supported by more modern programming languages. Consequently, lower-level mechanisms need to be employed to reproduce the semantics of the `try` statement, its `except` and `finally` clauses, and the `raise` statement, as well as considering the effect of exceptions on the `return` statement.
paul@810 140
paul@810 141
Fortunately, some consideration has been given to this topic before. The [[http://www.nicemice.net/cexcept/|cexcept]] library - really a collection of convenience macros - provides a way of expressing exception constructs in a natural way in C while translating those constructs to usage of the `setjmp` and `longjmp` functions. Although direct usage of such functions could be generated during program translation, for convenience and to focus on other implementation concerns, it was decided to employ the cexcept macros and to build any additional functionality in terms of their usage.
paul@810 142
paul@810 143
A structure is defined to hold exception-related information:
paul@810 144
paul@810 145
{{{
paul@810 146
typedef struct
paul@810 147
{
paul@810 148
    __attr arg;
paul@810 149
    int raising;
paul@810 150
    int raising_else;
paul@810 151
    int completing;
paul@810 152
} __exc;
paul@810 153
}}}
paul@810 154
paul@810 155
Here, the `arg` member holds a raised exception object reference. Meanwhile, the remaining members interact with language constructs as follows:
paul@810 156
paul@810 157
|| '''Member''' || '''Description''' || '''Output Program Operation''' ||
paul@810 158
|| `raising` || Indicates an exception that has been raised and not yet handled || `__Raise` ||
paul@810 159
|| `raising_else` || Indicates an exception that must be re-raised because it occurred in an `else` clause || `__RaiseElse` ||
paul@810 160
|| `completing` || Indicates the completion of, or return from, a `try` clause requiring entry into a `finally` clause || `__Complete` and `__Return` ||
paul@810 161
paul@810 162
=== Memory Allocation and Garbage Collection ===
paul@810 163
paul@810 164
To avoid having to write a garbage collector, the [[http://www.hboehm.info/gc/|Boehm-Demers-Weiser garbage collector]] is employed by programs to allocate and free memory needed for its objects.