Lichen

docs/wiki/Optimisation

1022:582d834d392d
14 months ago Paul Boddie Merged changes from the value-replacement branch. value-replacement-for-wrapper
     1 = Optimisation =     2      3 In the process of generating an output representation of a Lichen program, the     4 process of optimisation is concerned with positioning members within program     5 structures. Attributes are positioned in object structures, and '''object     6 tables''' are defined so that attributes can be located in such objects.     7 Similarly, parameters, whose positions are determined by their appearance in     8 function and other callable signatures, have positioning information defined     9 in '''parameter tables''' that can be used to position arguments in parameter    10 arrays when their names are used in argument lists.    11     12 Also performed during optimisation is the consolidation of constant    13 information, initially discussed [[../Inspection#Literals_and_Constants|in the    14 context of inspection]].    15     16 <<TableOfContents(2,3)>>    17     18 == Populating Objects ==    19     20 A program will have objects that are classes, modules and instances, and each    21 such object will provide a number of attributes. Each object's attributes are    22 stored in an array. For simplicity, each attribute having a given name will    23 always be positioned at the same place in any array provided by an object    24 featuring an attribute with that name. Even different object types will use    25 the same position.    26     27 Consider an attribute called `elephant` provided by a number of types:    28     29 || '''Type''' ||<-5> '''Attributes''' ||    30 || class C || `__class__` || `ant` || `dog` ||<style="background-color: #ddd"> `elephant` || `cat` ||    31 || class D || `__class__` || `ant` || `duck` ||<style="background-color: #ddd"> `elephant` || `horse` ||    32 || class E || `__class__` || `ant` || `dog` || `giraffe` || `horse` ||    33 || module M || `__class__` || `rhino` || `hippo` ||<style="background-color: #ddd"> `elephant` || `zebra` ||    34     35 Where `elephant` is provided as an attribute, it will always appear in the    36 same position - in this case as the fourth attribute - in any object attribute    37 array.    38     39 Consequently, any operation involving an object of unknown identity that    40 employs `elephant` can employ the same position information to determine    41 whether the attribute is supported and to retrieve or modify its value. Where    42 an object does not support `elephant`, it may use the same attribute position    43 for another attribute. Determining whether objects support attributes is done    44 using tables, described below.    45     46 It should be noted that the positions of attributes in object structures are    47 the same as the positions of attribute identifiers in object tables. With    48 attribute positions allocated, table definition is trivial.    49     50 === Allocation Considerations ===    51     52 Such common positioning of attributes demands a degree of coordination between    53 objects. If `elephant` is positioned in a given place in one object, then    54 given the constraint of it only ever being positioned in a single location for    55 all objects, it may not then be positioned in a different place in another    56 object. Thus, where two attributes co-exist in an object, their positions must    57 be different and will affect all objects supporting those attributes,    58 regardless of whether they support them both. For example:    59     60 || '''Type''' ||<-5> '''Attributes''' ||    61 || class C || `__class__` ||<style="background-color: #ddd"> `ant` || `dog` ||<style="background-color: #ddd"> `elephant` || `cat` ||    62 || class D || `__class__` ||<style="background-color: #ddd"> `ant` || `duck` ||<style="background-color: #ddd"> `elephant` || `horse` ||    63 || class E || `__class__` ||<style="background-color: #ddd"> `ant` || `dog` || `giraffe` || `horse` ||    64 || module M || `__class__` || `rhino` || `hippo` ||<style="background-color: #ddd"> `elephant` || `zebra` ||    65 || module N || `__class__` || || ||<style="background-color: #ddd"> `elephant` || `zebra` ||    66     67 Here, module N still positions `elephant` in the fourth position. If    68 `elephant` were moved to the second position, it would conflict with `ant` or    69 `rhino` in those objects supporting those attributes. Such coordination    70 therefore has an impact on allocation strategies. Care has to be taken to    71 allocate attributes in such a way that small objects (having few attributes)    72 do not have their attributes positioned far from the start of their attribute    73 arrays, because failing to do so effectively makes those objects large,    74 inefficiently-represented objects.    75     76 A reasonable solution is to consider the following when allocating attribute    77 positions:    78     79  * Attribute frequency (or ubiquity)    80  * Object size    81  * Object kind (whether the object is a class, module or instance)    82     83 More frequently-occurring (or ubiquitous) attributes may need to appear in    84 both large and small objects and should probably be allocated in lower    85 positions (`__class__` being an extreme example of needing to be allocated for    86 all objects). Attributes featured in small objects should also be given    87 priority for lower positions due to the desirability of keeping such objects    88 small. Meanwhile, classes and modules only appear once in a program, whereas    89 there may be countless instances allocated during the lifetime of a program's    90 execution. Therefore, attributes featured in instances should have priority    91 over other attributes for lower positions in object structures.    92     93 == Populating Signatures ==    94     95 The positions of parameters in callable signatures are determined by the    96 definitions of those signatures in the source program. When callables are    97 invoked using an argument list, arguments that are not specified using    98 keywords are merely copied into the parameter array in the same position.    99 However, keyword arguments will need to have their positions looked up in the   100 appropriate parameter table for the callable involved.   101    102 Consequently, no allocation needs to be performed on the parameters   103 themselves: they have specific positions for each callable. However, just as   104 attribute names must yield positions when accessing attributes on objects, a   105 similar mechanism that takes parameter names and yields positions must be   106 provided. It is instead the positions of parameter details that must be   107 allocated in structures to be consulted as if parameter names were attribute   108 names, with attributes providing parameter position details.   109    110 Consider the following functions:   111    112 {{{#!python numbers=disable   113 def f(a, b, c, d):   114     ...   115    116 def g(p, q, r):   117     ...   118    119 def h(d, p, v):   120     ...   121 }}}   122    123 In order to find the position of each parameter using its name, the following   124 table could be provided:   125    126 || '''Table''' ||<-4> '''Parameters''' ||   127 || function f || `a` at 1 || `b` at 2 || `c` at 3 || `d` at 4 ||   128 || function g || `q` at 2 ||<style="background-color: #ddd"> `p` at 1 || `r` at 3 || ||   129 || function h || `v` at 2 ||<style="background-color: #ddd"> `p` at 2 || || `d` at 1 ||   130    131 Such a table behaves like an object structure (or an object table) with   132 parameters acting like attributes in such a structure. Here, the attributes   133 yield the positions of parameters within parameter arrays. Note how `p` always   134 resides in the same location but may yield different positions depending on   135 the actual callable involved (as is also observable with `d`).   136    137 === Allocation Considerations ===   138    139 Parameter table allocation involves similar considerations to those   140 influencing object table allocation. In order to keep parameter tables small,   141 more frequently appearing parameters should be positioned earlier in arrays. A   142 specific consideration of importance is that of the number of tables   143 generated. There may be many callables with the same signature, and such   144 callables therefore do not need their own parameter tables since they will   145 merely be duplicates. An attempt is therefore made to identify distinct   146 signatures and to generate parameter tables only for these signatures, instead   147 of providing a one-to-one mapping between callables and tables.   148    149 == Populating Tables ==   150    151 With names allocated to positions in each kind of table, the straightforward   152 task of generating such tables proceeds as follows.   153    154 === Object Tables and Attribute Codes ===   155    156 Object tables consist of locations directly corresponding to structure member   157 locations. Each location in a table will correspond to a specific name, but   158 since potentially many names may have the same position, a table must provide   159 identifying details for the name that is actually supported.   160    161 In object tables, such identifying details take the form of '''attribute   162 codes'''. Each attribute name is mapped to a distinct identifier, and upon   163 consulting an object table, the lookup process must read the stored attribute   164 code and compare it to the code for the attribute it intends to access. If   165 these codes match, then it can be assumed that the object involved supports   166 the named attribute. Otherwise, a different attribute (or even no attribute at   167 all) resides at that location in the object structure, making the access   168 inappropriate.   169    170 A more comprehensive object table resembles the following:   171    172 || '''Type''' ||<-5> '''Attributes (Codes)''' ||   173 || class C || `__class__` (1) || `ant` (2) || `dog` (4) || `elephant` (6) || `cat` (3) ||   174 || class D || `__class__` (1) || `ant` (2) || `duck` (5) || `elephant` (6) || `horse` (9) ||   175 || class E || `__class__` (1) || `ant` (2) || `dog` (4) || `giraffe` (7) || `horse` (9) ||   176 || module M || `__class__` (1) || `rhino` (10) || `hippo` (8) || `elephant` (6) || `zebra` (11) ||   177 || module N || `__class__` (1) || || || `elephant` (6) || `zebra` (11) ||   178    179 The following attribute codes would be employed:   180    181 || '''Attribute Name''' || '''Attribute Code''' ||   182 || `__class__` || 1 ||   183 || `ant` || 2 ||   184 || `cat` || 3 ||   185 || `dog` || 4 ||   186 || `duck` || 5 ||   187 || `elephant` || 6 ||   188 || `giraffe` || 7 ||   189 || `hippo` || 8 ||   190 || `horse` || 9 ||   191 || `rhino` || 10 ||   192 || `zebra` || 11 ||   193    194 === Parameter Tables and Parameter Codes ===   195    196 Parameter tables consist of locations yielding parameter position information.   197 Each location in a table will correspond to a particular name, but since   198 potentially many names may have the same position, a table must provide   199 identifying details for the name that is actually supported.   200    201 Just as with object tables, in parameter tables, such identifying details take   202 the form of '''parameter codes'''. Each parameter name is mapped to a distinct   203 identifier, and upon consulting a parameter table, the lookup process must   204 read the stored parameter code and compare it to the code for the parameter it   205 is attempting to position in a parameter list. If these codes match, then it   206 can be assumed that the signature supports the named parameter. Otherwise, a   207 different parameter (or even no parameter at all) resides at that location in   208 the parameter table, making any attempt to set such a parameter inappropriate.   209    210 Since parameter tables provide both identifying information and parameter   211 positions, a more comprehensive parameter table resembles the following:   212    213 || '''Table''' ||<-4> '''Parameters (Codes)''' ||   214 || function f || `a` at 1 (1) || `b` at 2 (2) || `c` at 3 (3) || `d` at 4 (4) ||   215 || function g || `q` at 2 (6) ||<style="background-color: #ddd"> `p` at 1 (5) || `r` at 3 (7) || ||   216 || function h || `v` at 2 (8) ||<style="background-color: #ddd"> `p` at 2 (5) || || `d` at 1 (4) ||   217    218 The following parameter codes would be employed:   219    220 || '''Parameter Name''' || '''Parameter Code''' ||   221 || `a` || 1 ||   222 || `b` || 2 ||   223 || `c` || 3 ||   224 || `d` || 4 ||   225 || `p` || 5 ||   226 || `q` || 6 ||   227 || `r` || 7 ||   228 || `v` || 8 ||   229    230 == Consolidating Constants ==   231    232 With knowledge about all constants used in a program, it becomes possible to   233 prepare a catalogue of distinct constants and to assign each of them a unique   234 name for convenient referencing in the generated program code. All constants   235 are inspected in turn and a content digest prepared for each of them,   236 establishing a mapping from values to such digests which act as global   237 identifiers. Since local names are associated with constants, a mapping is   238 also established from each local name to the corresponding global identifier.   239    240 Considering the [[../Inspection#Literals_and_Constants|previous example]], the   241 following mappings would be established, from values to global identifiers:   242    243 || '''Type''' || '''Encoding''' || '''Value''' || '''Global Identifier (Content Digest)''' ||   244 || `__builtins__.str.string` || `iso-8859-15` || `'\xc6\xd8\xc5'` || `OeyoRfPI__XqwJcPgTDTItX9v__OM` ||   245 || `__builtins__.int.int` || || `123` || `qPRjUheGUngBhhVaHVqYNbBu4m8` ||   246    247 And from local names or identifiers to global identifiers:   248    249 || '''Identifier''' || '''Global Identifier (Content Digest)''' ||   250 || `__main__.$c0` || `OeyoRfPI__XqwJcPgTDTItX9v__OM` ||   251 || `__main__.$c1` || `qPRjUheGUngBhhVaHVqYNbBu4m8` ||   252    253 == Optimisation Products ==   254    255 The optimisation process should produce catalogues of attribute and parameter   256 codes plus positioning information for object tables, object structures and   257 parameter tables. It should also produce a catalogue of distinct constant   258 identifiers.