# HG changeset patch # User Paul Boddie # Date 1534498888 -7200 # Node ID f745d151a441a4a22f348cc243d1c783e19e9ac6 # Parent 8f8361de472ae4b511cd393f7a8e6fe400eddca0 Wrapped text in the documentation for more convenient "offline" editing. diff -r 8f8361de472a -r f745d151a441 docs/wiki/APIs --- a/docs/wiki/APIs Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/APIs Fri Aug 17 11:41:28 2018 +0200 @@ -1,25 +1,31 @@ = APIs = -The principal application programming interfaces (APIs) within Lichen are described below. +The principal application programming interfaces (APIs) within Lichen are +described below. == Modules == -When [[../Inspection|inspecting]] and [[../Translation|translating]] modules, common abstractions are used so that elements of the program are handled in similar ways. Various useful attributes and methods are provided by the `CommonModule` abstraction for use within `InspectedModule` and `TranslatedModule` methods: +When [[../Inspection|inspecting]] and [[../Translation|translating]] modules, +common abstractions are used so that elements of the program are handled in +similar ways. Various useful attributes and methods are provided by the +`CommonModule` abstraction for use within `InspectedModule` and +`TranslatedModule` methods: {{{#!table '''Attribute or Method''' || '''Purpose''' == -`name` || -An attribute providing the module name. +`name` || An attribute providing the module name. == -`get_global_path(name)` || -A method returning the qualified name of the given global name within the module being processed. +`get_global_path(name)` || A method returning the qualified name of the given + .. global name within the module being processed. == -`get_namespace_path()` || -A method returning the qualified name of the namespace being processed. +`get_namespace_path()` || A method returning the qualified name of the + .. namespace being processed. + * For modules, this is the module name - * For classes, functions and methods, the path incorporates the module name and namespaces leading to the current namespace itself + * For classes, functions and methods, the path incorporates the module name + and namespaces leading to the current namespace itself == -`get_object_path(name)` || -A method returning the qualified name of the given local name within the namespace being processed. +`get_object_path(name)` || A method returning the qualified name of the given + .. local name within the namespace being processed. }}} diff -r 8f8361de472a -r f745d151a441 docs/wiki/Builtins --- a/docs/wiki/Builtins Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Builtins Fri Aug 17 11:41:28 2018 +0200 @@ -1,36 +1,65 @@ = Built-Ins = -The "built-ins" are a collection of special names that do not need to be explicitly imported. For example: +The "built-ins" are a collection of special names that do not need to be +explicitly imported. For example: {{{#!python numbers=disable biggest = max([23, 19, 27]) # max is a built-in function }}} -Such names are always available as if they were defined in the current module. However, they are provided by a package hierarchy that seeks to divide them into isolated areas of functionality that can be included or excluded depending on the facilities employed by each program. For example, complex numbers are provided in the `__builtins__.complex` module, but for any program not employing complex numbers, this module will be excluded and its functionality not appear in the final program. (The exclusion of modules is achieved using the hidden module functionality provided by the [[../Imports|import mechanism]].) +Such names are always available as if they were defined in the current module. +However, they are provided by a package hierarchy that seeks to divide them +into isolated areas of functionality that can be included or excluded +depending on the facilities employed by each program. For example, complex +numbers are provided in the `__builtins__.complex` module, but for any program +not employing complex numbers, this module will be excluded and its +functionality not appear in the final program. (The exclusion of modules is +achieved using the hidden module functionality provided by the +[[../Imports|import mechanism]].) -The principal, "top-level" module providing built-ins is the `__builtins__` module whose only role is to expose the actual built-in names. It does so by importing names directly from the different submodules, such as `__builtins__.complex`, so that attempts to import names from `__builtins__` may provide such attempts with the named objects. The `__builtins__` module looks like this in such cases: +The principal, "top-level" module providing built-ins is the `__builtins__` +module whose only role is to expose the actual built-in names. It does so by +importing names directly from the different submodules, such as +`__builtins__.complex`, so that attempts to import names from `__builtins__` +may provide such attempts with the named objects. The `__builtins__` module +looks like this in such cases: {{{#!python numbers=disable from __builtins__.complex import complex }}} -Accesses to built-in names use the same technique of importing a name (`complex`) rather than the module providing it (`__builtins__`). It is as if the following code would appear before usage of a built-in name: +Accesses to built-in names use the same technique of importing a name +(`complex`) rather than the module providing it (`__builtins__`). It is as if +the following code would appear before usage of a built-in name: {{{#!python numbers=disable from __builtins__ import complex }}} -Since it is the specific name that is being referenced, not the module, the other contents of the module can be ignored and the reference to the named object followed to its actual definition location. Thus, usage of `complex` causes `__builtins__.complex` to be included in the program so that it may provide the `complex` type. +Since it is the specific name that is being referenced, not the module, the +other contents of the module can be ignored and the reference to the named +object followed to its actual definition location. Thus, usage of `complex` +causes `__builtins__.complex` to be included in the program so that it may +provide the `complex` type. -Thus, it becomes possible to keep a module like `__builtins__` out of the program since its only role would be to hold references to other modules' objects, but such specific imports permit the module to be bypassed by just following the declared import relationships. However, consider the consequences of `__builtins__` being imported as follows: +Thus, it becomes possible to keep a module like `__builtins__` out of the +program since its only role would be to hold references to other modules' +objects, but such specific imports permit the module to be bypassed by just +following the declared import relationships. However, consider the +consequences of `__builtins__` being imported as follows: {{{#!python numbers=disable import __builtins__ }}} -Its entire contents would then need to be exposed because it would then be possible to access any name provided by the module via the module. This was not the case with a specific name import because there was no way of referencing the module itself as a result of such an import. +Its entire contents would then need to be exposed because it would then be +possible to access any name provided by the module via the module. This was +not the case with a specific name import because there was no way of +referencing the module itself as a result of such an import. -This would then cause all of the referenced modules to be imported because it would no longer be possible to readily identify the modules that would actually be needed by the program. For example: +This would then cause all of the referenced modules to be imported because it +would no longer be possible to readily identify the modules that would +actually be needed by the program. For example: {{{#!python numbers=disable def get_things(obj): @@ -41,4 +70,8 @@ get_things(__builtins__) }}} -Of course, no module in a program should be referencing the `__builtins__` module by explicitly importing it, anyway. Given that all the names provided by that module are already available without any need to perform an import operation, such an import operation would have rather limited additional benefits. +Of course, no module in a program should be referencing the `__builtins__` +module by explicitly importing it, anyway. Given that all the names provided +by that module are already available without any need to perform an import +operation, such an import operation would have rather limited additional +benefits. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Cache --- a/docs/wiki/Cache Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Cache Fri Aug 17 11:41:28 2018 +0200 @@ -1,6 +1,13 @@ = Inspection Cache Files = -The results of inspection for each module are written out to cache files, and these files should be able to provide all the information that is gathered during inspection without having to inspect the source code again. One minor benefit of using cached data, instead of having to parse and inspect the source code for a given module, is that of a slightly reduced processing time for the inspection activity. However, an arguably greater benefit is that of being able to see the outcome of the activity as a summary of accumulated data. +The results of inspection for each module are written out to cache files, and +these files should be able to provide all the information that is gathered +during inspection without having to inspect the source code again. One minor +benefit of using cached data, instead of having to parse and inspect the +source code for a given module, is that of a slightly reduced processing time +for the inspection activity. However, an arguably greater benefit is that of +being able to see the outcome of the activity as a summary of accumulated +data. Each cache file has the following general format: @@ -34,7 +41,8 @@ ... }}} -|| Special name and corresponding reference plus comma-separated list of usage namespaces +|| Special name and corresponding reference plus comma-separated list of usage +.. namespaces == {{{ members: @@ -115,7 +123,8 @@ }}} || -Comma-separated parameter definitions for each function, with each definition being of the form... +Comma-separated parameter definitions for each function, with each definition +being of the form... {{{ = }}} @@ -144,7 +153,10 @@ }}} || -Attribute usage details for the given name in the given namespace, with usages being a semicolon-separated list of usage alternatives, each being a comma-separated list of attribute names or {} (meaning no attribute names used), attribute names employing ! if invoked +Attribute usage details for the given name in the given namespace, with usages +being a semicolon-separated list of usage alternatives, each being a +comma-separated list of attribute names or {} (meaning no attribute names +used), attribute names employing ! if invoked == {{{ attribute accesses: @@ -160,7 +172,9 @@ ... }}} -|| Identity of the given attribute chain in the given namespace, with any unresolved attribute chain provided +|| +Identity of the given attribute chain in the given namespace, with any +unresolved attribute chain provided == {{{ attribute access usage: @@ -168,7 +182,9 @@ ... }}} -|| Indicates, for each access involving the given name and first attribute name in the given namespace, the definitions that may provide the name +|| +Indicates, for each access involving the given name and first attribute name +in the given namespace, the definitions that may provide the name == {{{ attribute access-modifiers: @@ -176,7 +192,10 @@ ... }}} -|| Indicates, for accesses involving the given name and first attribute name in the given namespace, the modifiers applying to each access, where = indicates assignment, ! indicates invocation, and _ indicates access +|| +Indicates, for accesses involving the given name and first attribute name in +the given namespace, the modifiers applying to each access, where = indicates +assignment, ! indicates invocation, and _ indicates access == {{{ constant literals: @@ -184,7 +203,9 @@ ... }}} -|| Describes a constant literal in the given namespace having the indicated type, encoding (if a string), and value +|| +Describes a constant literal in the given namespace having the indicated type, +encoding (if a string), and value == {{{ constant values: diff -r 8f8361de472a -r f745d151a441 docs/wiki/Changelog --- a/docs/wiki/Changelog Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Changelog Fri Aug 17 11:41:28 2018 +0200 @@ -1,3 +1,4 @@ = Changelog = -Currently, only an initial release has been made. See the repository history for change-related information. +Currently, only an initial release has been made. See the repository history +for change-related information. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Deduction --- a/docs/wiki/Deduction Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Deduction Fri Aug 17 11:41:28 2018 +0200 @@ -1,23 +1,35 @@ = Deducing Program Behaviour = -With information from [[../Inspection|inspection]] available, the isolated observations about operations in a program may now be combined with knowledge about the program's structure to produce deductions about the nature of each operation. +With information from [[../Inspection|inspection]] available, the isolated +observations about operations in a program may now be combined with knowledge +about the program's structure to produce deductions about the nature of each +operation. <> == The Deduction Process == -The deduction process takes observations made during the [[../Inspection|inspection process]] and attempts to form deductions about the behaviour of the program primarily in terms of the nature of the attribute '''accesses''', with their corresponding '''accessors''', featuring in the program. Where attributes are used in conjunction with names, accessors are name versions; where attributes are used in conjunction with other expressions, accessors are '''anonymous'''. +The deduction process takes observations made during the +[[../Inspection|inspection process]] and attempts to form deductions about the +behaviour of the program primarily in terms of the nature of the attribute +'''accesses''', with their corresponding '''accessors''', featuring in the +program. Where attributes are used in conjunction with names, accessors are +name versions; where attributes are used in conjunction with other +expressions, accessors are '''anonymous'''. === Indexes === -For efficiency, indexes are defined to establish relationships between the following things: +For efficiency, indexes are defined to establish relationships between the +following things: {{{#!table '''Indexes''' || '''Details''' == -`access_index` || Which accessors (name versions) are involved with access operations +`access_index` || Which accessors (name versions) are involved with access + .. operations == -`location_index` || Which attribute usage patterns are supported by accessors (name versions) +`location_index` || Which attribute usage patterns are supported by accessors + .. (name versions) == `attr_class_types`<
>`attr_instance_types`<
>`attr_module_types` || Which types support which attribute names @@ -35,7 +47,9 @@ || Which names are aliases for other names, accesses or invocations }}} -The objective of deduction is to combine these indexes to establish new relationships between the different participants of these basic index relationships. +The objective of deduction is to combine these indexes to establish new +relationships between the different participants of these basic index +relationships. === Building Indexes === @@ -138,7 +152,11 @@ === Converting Usage to Types === -A particularly important operation in the deduction process is that of converting attribute usage information to a set of types supporting such usage. Taking the mapping of attribute names to types, each attribute name provided by a usage observation can be applied, progressively narrowing the set of types available to support the complete attribute usage collection. +A particularly important operation in the deduction process is that of +converting attribute usage information to a set of types supporting such +usage. Taking the mapping of attribute names to types, each attribute name +provided by a usage observation can be applied, progressively narrowing the +set of types available to support the complete attribute usage collection. {{{#!graphviz //format=svg @@ -188,9 +206,14 @@ } }}} -The types supporting attribute usage are the attribute '''providers'''. Where providers are classes, the affected accessors in the program may also be instances, since instances also support access to attributes of the instantiated class (and its ancestors). +The types supporting attribute usage are the attribute '''providers'''. Where +providers are classes, the affected accessors in the program may also be +instances, since instances also support access to attributes of the +instantiated class (and its ancestors). -Indexes mapping attributes to types must combine consideration of class and instance attributes in order to correctly identify instance providers. Consider the following example: +Indexes mapping attributes to types must combine consideration of class and +instance attributes in order to correctly identify instance providers. +Consider the following example: {{{#!graphviz //format=svg @@ -235,13 +258,23 @@ } }}} -To recognise support by instance accessors for the usage pattern concerned, attribute details must be obtained from classes as well as instances. Note that the identified type cannot support such usage purely by providing class or instance attributes alone. +To recognise support by instance accessors for the usage pattern concerned, +attribute details must be obtained from classes as well as instances. Note +that the identified type cannot support such usage purely by providing class +or instance attributes alone. === Attribute Assignments === -Since attribute access operations may occur as part of assignments, the nature of accesses is recorded during inspection. Combining the indexed information for assignment accesses allows the deducer to determine the most pessimistic effects on the program structure of such assignments. +Since attribute access operations may occur as part of assignments, the nature +of accesses is recorded during inspection. Combining the indexed information +for assignment accesses allows the deducer to determine the most pessimistic +effects on the program structure of such assignments. -Taking each attribute usage set employed by accessors involved in assignments, the types are deduced for such accessors, and each individual attribute known to be used in such assignments is then applied to the deduced types, '''mutating''' them in such a way that deductions may no longer assume a fixed identity for such attributes when obtained from these types. +Taking each attribute usage set employed by accessors involved in assignments, +the types are deduced for such accessors, and each individual attribute known +to be used in such assignments is then applied to the deduced types, +'''mutating''' them in such a way that deductions may no longer assume a fixed +identity for such attributes when obtained from these types. {{{#!graphviz //format=svg @@ -277,13 +310,19 @@ In the context of a specific access, the types may be resolved further: - * Any name whose initialisation could be determined during inspection can be associated with its initialised type - * Any name referring to a constant object can be associated with the type of that object - * Usage of `self` in methods can result in only compatible class and instance types being retained from the types obtained from usage deductions + * Any name whose initialisation could be determined during inspection can be + associated with its initialised type + * Any name referring to a constant object can be associated with the type of + that object + * Usage of `self` in methods can result in only compatible class and instance + types being retained from the types obtained from usage deductions === Reference Identification === -The basic information about every accessor and accessed attribute in a program can now be combined to produce specific '''references''' to identities consistent with the inspection observations. Several different kinds of accessors and access situations exist: +The basic information about every accessor and accessed attribute in a program +can now be combined to produce specific '''references''' to identities +consistent with the inspection observations. Several different kinds of +accessors and access situations exist: * Name-based accesses involving attribute usage * Aliases to names, possibly accompanied by accesses @@ -292,72 +331,136 @@ === Aliases === -Names initialised using other names or attribute accesses, or using the invocation of either of these things, are known as '''aliases'''. Information about aliases originates during inspection when such names are initialised with expression results within the program. During the resolution activity finalising the inspected details, this initialisation information is used to define relationships between aliases and other names and accesses. +Names initialised using other names or attribute accesses, or using the +invocation of either of these things, are known as '''aliases'''. Information +about aliases originates during inspection when such names are initialised +with expression results within the program. During the resolution activity +finalising the inspected details, this initialisation information is used to +define relationships between aliases and other names and accesses. -During deduction, attribute accesses are investigated and type information computed for both attribute accesses and accessors. The relationships defined between accesses and aliases can then be employed to propagate such deduced information to the aliases and to define appropriate type characteristics for them. +During deduction, attribute accesses are investigated and type information +computed for both attribute accesses and accessors. The relationships defined +between accesses and aliases can then be employed to propagate such deduced +information to the aliases and to define appropriate type characteristics for +them. -Where aliases are used as the basis of attribute accesses, this propagation refines the previously deduced information about the aliases and may also refine the details of the accesses with which the alias is involved. +Where aliases are used as the basis of attribute accesses, this propagation +refines the previously deduced information about the aliases and may also +refine the details of the accesses with which the alias is involved. === Invocation Target Suitability === -Having identified accessed attributes, invocation information can be applied in cases where such attributes immediately participate in an invocation, comparing the specified argument details against the parameter details defined for the identified attribute, which must be a callable object for this technique to work. Where arguments do not appear to be suitable - either the number of arguments is incorrect or any keyword argument refer to non-existent parameters - the attribute providing the parameter details can be considered unsuitable for the access. +Having identified accessed attributes, invocation information can be applied +in cases where such attributes immediately participate in an invocation, +comparing the specified argument details against the parameter details defined +for the identified attribute, which must be a callable object for this +technique to work. Where arguments do not appear to be suitable - either the +number of arguments is incorrect or any keyword argument refer to non-existent +parameters - the attribute providing the parameter details can be considered +unsuitable for the access. === Classifying Accessors === -Each accessor, being a particular version of a name, will have type information associated with it as a consequence of the deduction process. Such information takes the following forms: +Each accessor, being a particular version of a name, will have type +information associated with it as a consequence of the deduction process. Such +information takes the following forms: - * Provider types, indicating which types may provide the attributes used by the accessor + * Provider types, indicating which types may provide the attributes used by + the accessor * Accessor types, indicating which types will actually appear as the accessor This information can be processed in a number of ways to produce the following: - * All types (from all kinds of type) of providers able to provide attributes via the accessor - * All types (from all kinds of type) of accessors compatible with the accessor + * All types (from all kinds of type) of providers able to provide attributes + via the accessor + * All types (from all kinds of type) of accessors compatible with the + accessor * The most general types of accessors compatible with the accessor -Where many types may be associated with an accessor, identifying the most general types involves eliminating those which are derived from others. Given that descendant types may not remove support for attributes provided by their ancestors, then where an ancestor type has been identified as a possible accessor, it should follow that all of its descendants may also have been identified as possible accessors. Since these descendants should be compatible, identifying them individually is unnecessary: merely specifying that the common ancestor or ''any'' descendant could provide an accessor is sufficient. +Where many types may be associated with an accessor, identifying the most +general types involves eliminating those which are derived from others. Given +that descendant types may not remove support for attributes provided by their +ancestors, then where an ancestor type has been identified as a possible +accessor, it should follow that all of its descendants may also have been +identified as possible accessors. Since these descendants should be +compatible, identifying them individually is unnecessary: merely specifying +that the common ancestor or ''any'' descendant could provide an accessor is +sufficient. ==== Defining Guards ==== -A '''guard''' is a constraint supported by a run-time test indicating the compliance of an accessor with a particular type. Thus, where a single accessor type has been identified, a guard may be established for an accessor that tests the type of the accessor against a specific type. Where a single ''general'' accessor type is identified, a guard is established that tests the accessor against a "common" type: that is, an ancestor type with which other descendant types may comply. +A '''guard''' is a constraint supported by a run-time test indicating the +compliance of an accessor with a particular type. Thus, where a single +accessor type has been identified, a guard may be established for an accessor +that tests the type of the accessor against a specific type. Where a single +''general'' accessor type is identified, a guard is established that tests the +accessor against a "common" type: that is, an ancestor type with which other +descendant types may comply. === Classifying Accesses === -At the point of classifying accesses, information is available about the following: +At the point of classifying accesses, information is available about the +following: * The accessors potentially involved in each access - * The types of accessors and the types providing attributes via those accessors + * The types of accessors and the types providing attributes via those + accessors * Any guards applying to the accessors - * Whether an access is constrained by certain program characteristics and is thus guaranteed to be as deduced + * Whether an access is constrained by certain program characteristics and is + thus guaranteed to be as deduced * The possible attributes referenced by the access -This information can be processed in a number of ways to produce the following: +This information can be processed in a number of ways to produce the +following: * The types of accessors, both general and specific, applying to each access - * The attributes that can be provided by each access, consolidating existing referenced attribute details + * The attributes that can be provided by each access, consolidating existing + referenced attribute details * The general types providing the attributes -Since more than one accessor may be involved, information from all accessors must be combined to determine whether guard information still applies to the access, or whether it is possible for an accessor to be used that has an uncertain type at run-time. +Since more than one accessor may be involved, information from all accessors +must be combined to determine whether guard information still applies to the +access, or whether it is possible for an accessor to be used that has an +uncertain type at run-time. ==== Defining Tests ==== -A '''test''' at the access level is a necessary check performed on an accessor before an access to determine whether it belongs to a certain type or group of types. +A '''test''' at the access level is a necessary check performed on an accessor +before an access to determine whether it belongs to a certain type or group of +types. -Where guards applying to accessors apply by extension to an access, it may not be enough to assume that the the attributes exposed by the accessor are the same as those considered acceptable through deduction. Therefore, attributes are obtained for the access using the applicable guard types as accessors. If this set of attributes does not include potentially accessible attributes (perhaps because the guard types are broadly defined and do not support all attribute usage), a test must be generated. +Where guards applying to accessors apply by extension to an access, it may not +be enough to assume that the the attributes exposed by the accessor are the +same as those considered acceptable through deduction. Therefore, attributes +are obtained for the access using the applicable guard types as accessors. If +this set of attributes does not include potentially accessible attributes +(perhaps because the guard types are broadly defined and do not support all +attribute usage), a test must be generated. -Where a single attribute provider can be identified, a test for a specific type is generated. Where a single general type can be identified as a provider, a test against a "common" type is generated. Tests involving the built-in `object` are not generated since it is the root of all classes and such tests would merely identify an accessor as a class. In all cases where a single, specific type cannot be tested for, the general attribute validation mechanism must be used instead. +Where a single attribute provider can be identified, a test for a specific +type is generated. Where a single general type can be identified as a +provider, a test against a "common" type is generated. Tests involving the +built-in `object` are not generated since it is the root of all classes and +such tests would merely identify an accessor as a class. In all cases where a +single, specific type cannot be tested for, the general attribute validation +mechanism must be used instead. == Preparing Access Descriptions == -The accumulated deduced knowledge is directed towards producing access descriptions or plans which characterise each access in terms of the following: +The accumulated deduced knowledge is directed towards producing access +descriptions or plans which characterise each access in terms of the +following: - * The initial accessor, being the starting object for attribute accesses, unless a static accessor has been identified + * The initial accessor, being the starting object for attribute accesses, + unless a static accessor has been identified * Details of any test required on the initial accessor * Details of any type employed by the test * Any identified static accessor (to be used as the initial accessor) - * Attributes needing to be traversed from the accessor that yield unambiguous objects + * Attributes needing to be traversed from the accessor that yield unambiguous + objects * Access modes for each of the unambiguously-traversed attributes - * Remaining attributes needing to be tested and traversed (after having traversed the above attributes) + * Remaining attributes needing to be tested and traversed (after having + traversed the above attributes) * Details of the context * Any test to apply to the context (to ensure its validity) * The method of obtaining the first attribute @@ -367,58 +470,98 @@ === Characterising the Accessor === -For a given access location, the referenced attribute details established during deduction are used to determine... +For a given access location, the referenced attribute details established +during deduction are used to determine... - * Whether the initial accessor is static, originating from a constant access or involving an identifiable static object + * Whether the initial accessor is static, originating from a constant access + or involving an identifiable static object * Whether the initial accessor is dynamic but has a known, deduced identity -Some useful information about the accessor and about the actual provider of the first accessed attribute can be defined: +Some useful information about the accessor and about the actual provider of +the first accessed attribute can be defined: || || '''Class''' || '''Instance''' || '''Module''' || || '''Accessor''' || Class accessor || Instance accessor || Module accessor || || '''Provider''' || Class provider || Instance provider || || -Depending on which of these criteria are satisfied, some properties of the access operation can be established: +Depending on which of these criteria are satisfied, some properties of the +access operation can be established: - * Object-relative accesses occur with class accessors or module accessors or when attributes are provided by instances - * Class-relative accesses occur with instance accessors when attributes are provided by classes + * Object-relative accesses occur with class accessors or module accessors or + when attributes are provided by instances + * Class-relative accesses occur with instance accessors when attributes are + provided by classes -Object-relative accesses merely involve obtaining attributes directly from accessors. Class-relative accesses involve obtaining the class of each accessor and then obtaining an attribute from that class. +Object-relative accesses merely involve obtaining attributes directly from +accessors. Class-relative accesses involve obtaining the class of each +accessor and then obtaining an attribute from that class. === Defining the First Access Method === -For dynamic or unidentified accessors, the above access properties define the access method on the first attribute in the chain of attributes provided. However, any redefinition of the accessor to a static accessor may influence the first method. For static accessors, the first method is always object-relative since classes and modules do not offer transparent mechanisms to expose the attributes on their own classes. +For dynamic or unidentified accessors, the above access properties define the +access method on the first attribute in the chain of attributes provided. +However, any redefinition of the accessor to a static accessor may influence +the first method. For static accessors, the first method is always +object-relative since classes and modules do not offer transparent mechanisms +to expose the attributes on their own classes. -Static and identified, dynamic accessors should already be known to support the specified attributes. Other accessors require an access method to be used that also checks whether the accessor supports a given attribute. +Static and identified, dynamic accessors should already be known to support +the specified attributes. Other accessors require an access method to be used +that also checks whether the accessor supports a given attribute. === Redefining the Accessor === -With knowledge about the initial accessor, the attributes involved in the access operation are then considered in the context of the accessor. For each attribute name in the chain described for an access, an attempt is made to obtain the details of the attribute of the given name from the accessor, determining whether these details can be used as an accessor to obtain subsequent attribute details. +With knowledge about the initial accessor, the attributes involved in the +access operation are then considered in the context of the accessor. For each +attribute name in the chain described for an access, an attempt is made to +obtain the details of the attribute of the given name from the accessor, +determining whether these details can be used as an accessor to obtain +subsequent attribute details. -Where more than one attribute identity is obtained, traversal is terminated: all remaining attributes must be traversed at run-time. If an attribute identified during traversal is static, the first access method changes accordingly. +Where more than one attribute identity is obtained, traversal is terminated: +all remaining attributes must be traversed at run-time. If an attribute +identified during traversal is static, the first access method changes +accordingly. === Defining the Final Access Method === -An access can also involve an assignment to the accessed attribute or the invocation of the accessed attribute. Such operations change the nature of the access method used on the final attribute in a chain of attribute names. +An access can also involve an assignment to the accessed attribute or the +invocation of the accessed attribute. Such operations change the nature of the +access method used on the final attribute in a chain of attribute names. === Identifying Context Information === -Final attribute accesses involving callables need to yield context information that can subsequently be used to invoke those callables. Where the nature of an accessed attribute is not known, a simplistic attempt can be made to look up all attributes stored using the attribute name in the program. +Final attribute accesses involving callables need to yield context information +that can subsequently be used to invoke those callables. Where the nature of +an accessed attribute is not known, a simplistic attempt can be made to look +up all attributes stored using the attribute name in the program. Of particular interest are the following situations: - * Where class attributes are being accessed via instances, whether the attributes are all methods that can be bound upon access - * Where class attributes may be accessed via instances, whether any attributes could be methods + * Where class attributes are being accessed via instances, whether the + attributes are all methods that can be bound upon access + * Where class attributes may be accessed via instances, whether any + attributes could be methods -Such considerations dictate whether the context information originates from the attribute or from the accessor and whether any run-time test is required to determine this. +Such considerations dictate whether the context information originates from +the attribute or from the accessor and whether any run-time test is required +to determine this. == Preparing Instruction Plans == -Instruction plans are sequences of program operations, encoded in a higher-level form than actual program instructions, that describe the steps needed to access attributes. Such sequences are produced from the details provided by individual access plans. +Instruction plans are sequences of program operations, encoded in a +higher-level form than actual program instructions, that describe the steps +needed to access attributes. Such sequences are produced from the details +provided by individual access plans. === Original Accessor === -The expression providing the object from which the first attribute is obtained (or the only attribute if only one is specified) is known as the original accessor. Where this accessor can be identified, the expression describing it in the program can potentially be replaced with a static reference, and subsequent mentions of the accessor can potentially be replaced with such references or names used as expressions. +The expression providing the object from which the first attribute is obtained +(or the only attribute if only one is specified) is known as the original +accessor. Where this accessor can be identified, the expression describing it +in the program can potentially be replaced with a static reference, and +subsequent mentions of the accessor can potentially be replaced with such +references or names used as expressions. || '''Access Plan Information''' || '''Original Accessor''' || || Static accessor identified || Identified accessor || @@ -427,19 +570,33 @@ || Named accessor invocation, accessor not known to provide the attribute || Accessor expression || || Other accessors || Accessor expression || -By using names or static references, the need to store the result of evaluating an accessor expression is eliminated because such labels can be inserted in the instruction sequence as required. +By using names or static references, the need to store the result of +evaluating an accessor expression is eliminated because such labels can be +inserted in the instruction sequence as required. === First Access Operation === -Although it may be possible to convert accesses into single instructions that obtain attributes directly, many accesses will involve access operations that must consult an accessor object and obtain an attribute from that object, at least for the first attribute in a chain of attributes. This occurs when the access plan indicates the following situations: +Although it may be possible to convert accesses into single instructions that +obtain attributes directly, many accesses will involve access operations that +must consult an accessor object and obtain an attribute from that object, at +least for the first attribute in a chain of attributes. This occurs when the +access plan indicates the following situations: - * Final method is an access (meaning that an attribute cannot be directly obtained) - * Final method is an assignment (requiring the object whose attribute will be updated) + * Final method is an access (meaning that an attribute cannot be directly + obtained) + * Final method is an assignment (requiring the object whose attribute will be + updated) * Attributes (identified or otherwise) need traversing === Accessor Nature === -Attribute assignments involve a single '''target accessor''' and potentially many other accessors, depending on how many distinct expressions are evaluated to yield the value to be set in the assignment. Such a target accessor will usually be derived from the evaluation of an expression, and in some cases the expression will be the result of an opaque operation such as the invocation of a function. In such cases, the target accessor is stored in a temporary variable for subsequent use in access operations. +Attribute assignments involve a single '''target accessor''' and potentially +many other accessors, depending on how many distinct expressions are evaluated +to yield the value to be set in the assignment. Such a target accessor will +usually be derived from the evaluation of an expression, and in some cases the +expression will be the result of an opaque operation such as the invocation of +a function. In such cases, the target accessor is stored in a temporary +variable for subsequent use in access operations. === Context === @@ -455,4 +612,11 @@ == Deduction Products == -The deduction process should produce a complete catalogue of accessor and access references that may then be consulted by the [[../Translation|translation]] process needing to know the nature of any operation within the program. Central to the translation process's understanding of references is the '''attribute access plan''' for each reference which characterises each access and provides the basis for the formulation of the '''instruction plan''' used to replicate it in the final program. +The deduction process should produce a complete catalogue of accessor and +access references that may then be consulted by the +[[../Translation|translation]] process needing to know the nature of any +operation within the program. Central to the translation process's +understanding of references is the '''attribute access plan''' for each +reference which characterises each access and provides the basis for the +formulation of the '''instruction plan''' used to replicate it in the final +program. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Design --- a/docs/wiki/Design Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Design Fri Aug 17 11:41:28 2018 +0200 @@ -1,14 +1,28 @@ = Design Decisions = -The Lichen language design involves some different choices to those taken in Python's design. Many of these choices are motivated by the following criteria: +The Lichen language design involves some different choices to those taken in +Python's design. Many of these choices are motivated by the following +criteria: + + * To simplify the language and to make what programs do easier to understand + and to predict + * To make analysis of programs easier, particularly + [[../Deduction|deductions]] about the nature of the code + * To simplify and otherwise reduce the [[../Representations|representations]] + employed and the operations performed at run-time - * To simplify the language and to make what programs do easier to understand and to predict - * To make analysis of programs easier, particularly [[../Deduction|deductions]] about the nature of the code - * To simplify and otherwise reduce the [[../Representations|representations]] employed and the operations performed at run-time +Lichen is in many ways a restricted form of Python. In particular, +restrictions on the attribute names supported by each object help to clearly +define the object types in a program, allowing us to identify those objects +when they are used. Consequently, optimisations that can be employed in a +Lichen program become possible in situations where they would have been +difficult or demanding to employ in a Python program. -Lichen is in many ways a restricted form of Python. In particular, restrictions on the attribute names supported by each object help to clearly define the object types in a program, allowing us to identify those objects when they are used. Consequently, optimisations that can be employed in a Lichen program become possible in situations where they would have been difficult or demanding to employ in a Python program. - -Some design choices evoke memories of earlier forms of Python. Removing nested scopes simplifies the [[../Inspection|inspection]] of programs and run-time [[../Representations|representations]] and mechanisms. Other choices seek to remedy difficult or defective aspects of Python, notably the behaviour of Python's [[../Imports|import]] system. +Some design choices evoke memories of earlier forms of Python. Removing nested +scopes simplifies the [[../Inspection|inspection]] of programs and run-time +[[../Representations|representations]] and mechanisms. Other choices seek to +remedy difficult or defective aspects of Python, notably the behaviour of +Python's [[../Imports|import]] system. <> @@ -32,7 +46,13 @@ === Fixed Attribute Names === -Attribute names are bound for classes through assignment in the class namespace, for modules in the module namespace, and for instances in methods through assignment to `self`. Class and instance attributes are propagated to descendant classes and instances of descendant classes respectively. Once bound, attributes can be modified, but new attributes cannot be bound by other means, such as the assignment of an attribute to an arbitrary object that would not already support such an attribute. +Attribute names are bound for classes through assignment in the class +namespace, for modules in the module namespace, and for instances in methods +through assignment to `self`. Class and instance attributes are propagated to +descendant classes and instances of descendant classes respectively. Once +bound, attributes can be modified, but new attributes cannot be bound by other +means, such as the assignment of an attribute to an arbitrary object that +would not already support such an attribute. {{{#!python numbers=disable class C: @@ -44,7 +64,10 @@ C().y = 567 # not allowed (y not bound for C instances) }}} -Permitting the addition of attributes to objects would then require that such addition attempts be associated with particular objects, leading to a potentially iterative process involving object type deduction and modification, also causing imprecise results. +Permitting the addition of attributes to objects would then require that such +addition attempts be associated with particular objects, leading to a +potentially iterative process involving object type deduction and +modification, also causing imprecise results. === No Shadowing === @@ -57,11 +80,15 @@ self.a = 234 # not allowed (attribute shadows class attribute) }}} -Permitting this would oblige instances to support attributes that, when missing, are provided by consulting their classes but, when not missing, may also be provided directly by the instances themselves. +Permitting this would oblige instances to support attributes that, when +missing, are provided by consulting their classes but, when not missing, may +also be provided directly by the instances themselves. === No Dynamic Attributes === -Instance attributes cannot be provided dynamically, such that any missing attribute would be supplied by a special method call to determine the attribute's presence and to retrieve its value. +Instance attributes cannot be provided dynamically, such that any missing +attribute would be supplied by a special method call to determine the +attribute's presence and to retrieve its value. {{{#!python numbers=disable class C: @@ -70,31 +97,41 @@ return 123 }}} -Permitting this would require object types to potentially support any attribute, undermining attempts to use attributes to identify objects. +Permitting this would require object types to potentially support any +attribute, undermining attempts to use attributes to identify objects. == Naming == {{{#!table '''Lichen''' || '''Python''' || '''Rationale''' == -Names may be local, global or built-in: nested namespaces must be initialised explicitly +Names may be local, global or built-in: nested namespaces must be initialised +explicitly || Names may also be non-local, permitting closures || Limited name scoping simplifies program inspection and run-time mechanisms == `self` is a reserved name and is optional in method parameter lists -|| `self` is a naming convention, but the first method parameter must always refer to the accessed object -|| Reserving `self` assists deduction; making it optional is a consequence of the method binding behaviour +|| `self` is a naming convention, but the first method parameter must always +.. refer to the accessed object +|| Reserving `self` assists deduction; making it optional is a consequence of +.. the method binding behaviour == Instance attributes can be initialised using `.name` parameter notation -|| [[https://stackoverflow.com/questions/1389180/automatically-initialize-instance-variables|Workarounds]] involving decorators and introspection are required for similar brevity +|| [[https://stackoverflow.com/questions/1389180/automatically-initialize-instance-variables|Workarounds]] +.. involving decorators and introspection are required for similar brevity || Initialiser notation eliminates duplication in program code and is convenient }}} === Traditional Local, Global and Built-In Scopes Only === -Namespaces reside within a hierarchy within modules: classes containing classes or functions; functions containing other functions. Built-in names are exposed in all namespaces, global names are defined at the module level and are exposed in all namespaces within the module, locals are confined to the namespace in which they are defined. +Namespaces reside within a hierarchy within modules: classes containing +classes or functions; functions containing other functions. Built-in names are +exposed in all namespaces, global names are defined at the module level and +are exposed in all namespaces within the module, locals are confined to the +namespace in which they are defined. -However, locals are not inherited by namespaces from surrounding or enclosing namespaces. +However, locals are not inherited by namespaces from surrounding or enclosing +namespaces. {{{#!python numbers=disable def f(x): @@ -108,11 +145,16 @@ return i }}} -Needing to access outer namespaces in order to access any referenced names complicates the way in which such dynamic namespaces would need to be managed. Although the default initialisation technique demonstrated above could be automated, explicit initialisation makes programs easier to follow and avoids mistakes involving globals having the same name. +Needing to access outer namespaces in order to access any referenced names +complicates the way in which such dynamic namespaces would need to be managed. +Although the default initialisation technique demonstrated above could be +automated, explicit initialisation makes programs easier to follow and avoids +mistakes involving globals having the same name. === Reserved Self === -The `self` name can be omitted in method signatures, but in methods it is always initialised to the instance on which the method is operating. +The `self` name can be omitted in method signatures, but in methods it is +always initialised to the instance on which the method is operating. {{{#!python numbers=disable class C: @@ -120,11 +162,18 @@ self.x = y # self is the instance }}} -The assumption in methods is that `self` must always be referring to an instance of the containing class or of a descendant class. This means that `self` cannot be initialised to another kind of value, which Python permits through the explicit invocation of a method with the inclusion of the affected instance as the first argument. Consequently, `self` becomes optional in the signature because it is not assigned in the same way as the other parameters. +The assumption in methods is that `self` must always be referring to an +instance of the containing class or of a descendant class. This means that +`self` cannot be initialised to another kind of value, which Python permits +through the explicit invocation of a method with the inclusion of the affected +instance as the first argument. Consequently, `self` becomes optional in the +signature because it is not assigned in the same way as the other parameters. === Instance Attribute Initialisers === -In parameter lists, a special notation can be used to indicate that the given name is an instance attribute that will be assigned the argument value corresponding to the parameter concerned. +In parameter lists, a special notation can be used to indicate that the given +name is an instance attribute that will be assigned the argument value +corresponding to the parameter concerned. {{{#!python numbers=disable class C: @@ -132,7 +181,10 @@ self.c = c # a traditional assignment using a parameter }}} -To use the notation, such dot-qualified parameters must appear only in the parameter lists of methods, not plain functions. The qualified parameters are represented as locals having the same name, and assignments to the corresponding instance attributes are inserted into the generated code. +To use the notation, such dot-qualified parameters must appear only in the +parameter lists of methods, not plain functions. The qualified parameters are +represented as locals having the same name, and assignments to the +corresponding instance attributes are inserted into the generated code. {{{#!python numbers=disable class C: @@ -147,7 +199,9 @@ pass }}} -Naturally, `self`, being a reserved name in methods, can also be omitted from such parameter lists. Moreover, such initialising parameters can have default values. +Naturally, `self`, being a reserved name in methods, can also be omitted from +such parameter lists. Moreover, such initialising parameters can have default +values. {{{#!python numbers=disable class C: @@ -165,9 +219,13 @@ {{{#!table '''Lichen''' || '''Python''' || '''Rationale''' == -Class attributes are propagated to class hierarchy members during initialisation: rebinding class attributes does not affect descendant class attributes -|| Class attributes are propagated live to class hierarchy members and must be looked up by the run-time system if not provided by a given class -|| Initialisation-time propagation simplifies access operations and attribute table storage +Class attributes are propagated to class hierarchy members during +initialisation: rebinding class attributes does not affect descendant class +attributes +|| Class attributes are propagated live to class hierarchy members and must be +.. looked up by the run-time system if not provided by a given class +|| Initialisation-time propagation simplifies access operations and attribute +.. table storage == Unbound methods must be bound using a special function taking an instance || Unbound methods may be called using an instance as first argument @@ -175,20 +233,24 @@ == Functions assigned to class attributes do not become unbound methods || Functions assigned to class attributes become unbound methods -|| Removing method assignment simplifies deduction: methods are always defined in place +|| Removing method assignment simplifies deduction: methods are always defined +.. in place == Base classes must be well-defined || Base classes may be expressions -|| Well-defined base classes are required to establish a well-defined hierarchy of types +|| Well-defined base classes are required to establish a well-defined +.. hierarchy of types == Classes may not be defined in functions || Classes may be defined in any kind of namespace -|| Forbidding classes in functions prevents the definition of countless class variants that are awkward to analyse +|| Forbidding classes in functions prevents the definition of countless class +.. variants that are awkward to analyse }}} === Inherited Class Attributes === -Class attributes that are changed for a class do not change for that class's descendants. +Class attributes that are changed for a class do not change for that class's +descendants. {{{#!python numbers=disable class C: @@ -201,11 +263,20 @@ print D.a # remains 123 in Lichen, becomes 456 in Python }}} -Permitting this requires indirection for all class attributes, requiring them to be treated differently from other kinds of attributes. Meanwhile, class attribute rebinding and the accessing of inherited attributes changed in this way is relatively rare. +Permitting this requires indirection for all class attributes, requiring them +to be treated differently from other kinds of attributes. Meanwhile, class +attribute rebinding and the accessing of inherited attributes changed in this +way is relatively rare. === Unbound Methods === -Methods are defined on classes but are only available via instances: they are instance methods. Consequently, acquiring a method directly from a class and then invoking it should fail because the method will be unbound: the "context" of the method is not an instance. Furthermore, the Python technique of supplying an instance as the first argument in an invocation to bind the method to an instance, thus setting the context of the method, is not supported. See [[#ReservedSelf|"Reserved Self"]] for more information. +Methods are defined on classes but are only available via instances: they are +instance methods. Consequently, acquiring a method directly from a class and +then invoking it should fail because the method will be unbound: the "context" +of the method is not an instance. Furthermore, the Python technique of +supplying an instance as the first argument in an invocation to bind the +method to an instance, thus setting the context of the method, is not +supported. See [[#Reserved Self|"Reserved Self"]] for more information. {{{#!python numbers=disable class C: @@ -217,13 +288,24 @@ get_using(C.f, self)(123) # binds C.f to self, then the result is called }}} -Binding methods to instances occurs when acquiring methods via instances or explicitly using the `get_using` built-in. The built-in checks the compatibility of the supplied method and instance. If compatible, it provides the bound method as its result. +Binding methods to instances occurs when acquiring methods via instances or +explicitly using the `get_using` built-in. The built-in checks the +compatibility of the supplied method and instance. If compatible, it provides +the bound method as its result. -Normal functions are callable without any further preparation, whereas unbound methods need the binding step to be performed and are not immediately callable. Were functions to become unbound methods upon assignment to a class attribute, they would need to be invalidated by having the preparation mechanism enabled on them. However, this invalidation would only be relevant to the specific case of assigning functions to classes and this would need to be tested for. Given the added complications, such functionality is arguably not worth supporting. +Normal functions are callable without any further preparation, whereas unbound +methods need the binding step to be performed and are not immediately +callable. Were functions to become unbound methods upon assignment to a class +attribute, they would need to be invalidated by having the preparation +mechanism enabled on them. However, this invalidation would only be relevant +to the specific case of assigning functions to classes and this would need to +be tested for. Given the added complications, such functionality is arguably +not worth supporting. === Assigning Functions to Class Attributes === -Functions can be assigned to class attributes but do not become unbound methods as a result. +Functions can be assigned to class attributes but do not become unbound +methods as a result. {{{#!python numbers=disable class C: @@ -238,11 +320,18 @@ C().f(123) # permitted: f has merely been exposed via C.f }}} -Methods are identified as such by their definition location, they contribute information about attributes to the class hierarchy, and they employ certain structure details at run-time to permit the binding of methods. Since functions can defined in arbitrary locations, no class hierarchy information is available, and a function could combine `self` with a range of attributes that are not compatible with any class to which the function might be assigned. +Methods are identified as such by their definition location, they contribute +information about attributes to the class hierarchy, and they employ certain +structure details at run-time to permit the binding of methods. Since +functions can defined in arbitrary locations, no class hierarchy information +is available, and a function could combine `self` with a range of attributes +that are not compatible with any class to which the function might be +assigned. === Well-Defined Base Classes === -Base classes must be clearly identifiable as well-defined classes. This facilitates the cataloguing of program objects and further analysis on them. +Base classes must be clearly identifiable as well-defined classes. This +facilitates the cataloguing of program objects and further analysis on them. {{{#!python numbers=disable class C: @@ -255,11 +344,19 @@ pass }}} -If base class identification could only be done reliably at run-time, class relationship information would be very limited without running the program or performing costly and potentially unreliable analysis. Indeed, programs employing such dynamic base classes are arguably resistant to analysis, which is contrary to the goals of a language like Lichen. +If base class identification could only be done reliably at run-time, class +relationship information would be very limited without running the program or +performing costly and potentially unreliable analysis. Indeed, programs +employing such dynamic base classes are arguably resistant to analysis, which +is contrary to the goals of a language like Lichen. === Class Definitions and Functions === -Classes may not be defined in functions because functions provide dynamic namespaces, but Lichen relies on a static namespace hierarchy in order to clearly identify the principal objects in a program. If classes could be defined in functions, despite seemingly providing the same class over and over again on every invocation, a family of classes would, in fact, be defined. +Classes may not be defined in functions because functions provide dynamic +namespaces, but Lichen relies on a static namespace hierarchy in order to +clearly identify the principal objects in a program. If classes could be +defined in functions, despite seemingly providing the same class over and over +again on every invocation, a family of classes would, in fact, be defined. {{{#!python numbers=disable def f(x): @@ -268,7 +365,9 @@ return f }}} -Moreover, issues of namespace nesting also arise, since the motivation for defining classes in functions would surely be to take advantage of local state to parameterise such classes. +Moreover, issues of namespace nesting also arise, since the motivation for +defining classes in functions would surely be to take advantage of local state +to parameterise such classes. == Modules and Packages == @@ -276,25 +375,33 @@ '''Lichen''' || '''Python''' || '''Rationale''' == Modules are independent: package hierarchies are not traversed when importing -|| Modules exist in hierarchical namespaces: package roots must be imported before importing specific submodules -|| Eliminating module traversal permits more precise imports and reduces superfluous code +|| Modules exist in hierarchical namespaces: package roots must be imported +.. before importing specific submodules +|| Eliminating module traversal permits more precise imports and reduces +.. superfluous code == -Only specific names can be imported from a module or package using the `from` statement +Only specific names can be imported from a module or package using the `from` +statement || Importing "all" from a package or module is permitted -|| Eliminating "all" imports simplifies the task of determining where names in use have come from +|| Eliminating "all" imports simplifies the task of determining where names in +.. use have come from == Modules must be specified using absolute names || Imports can be absolute or relative || Using only absolute names simplifies the import mechanism == -Modules are imported independently and their dependencies subsequently resolved +Modules are imported independently and their dependencies subsequently +resolved || Modules are imported as import statements are encountered -|| Statically-initialised objects can be used declaratively, although an initialisation order may still need establishing +|| Statically-initialised objects can be used declaratively, although an +.. initialisation order may still need establishing }}} === Independent Modules === -The inclusion of modules in a program affects only explicitly-named modules: they do not have relationships implied by their naming that would cause such related modules to be included in a program. +The inclusion of modules in a program affects only explicitly-named modules: +they do not have relationships implied by their naming that would cause such +related modules to be included in a program. {{{#!python numbers=disable from compiler import consts # defines consts @@ -305,11 +412,15 @@ consts # is defined }}} -Where modules should have relationships, they should be explicitly defined using `from` and `import` statements which target the exact modules required. In the above example, `compiler` is not routinely imported because modules within the `compiler` package have been requested. +Where modules should have relationships, they should be explicitly defined +using `from` and `import` statements which target the exact modules required. +In the above example, `compiler` is not routinely imported because modules +within the `compiler` package have been requested. === Specific Name Imports Only === -Lichen, unlike Python, also does not support the special `__all__` module attribute. +Lichen, unlike Python, also does not support the special `__all__` module +attribute. {{{#!python numbers=disable from compiler import * # not permitted @@ -318,11 +429,22 @@ interpreter # undefined in compiler (yet it might be thought to reside there) and in this module }}} -The `__all__` attribute supports `from ... import *` statements in Python, but without identifying the module or package involved and then consulting `__all__` in that module or package to discover which names might be involved (which might require the inspection of yet other modules or packages), the names imported cannot be known. Consequently, some names used elsewhere in the module performing the import might be assumed to be imported names when, in fact, they are unknown in both the importing and imported modules. Such uncertainty hinders the inspection of individual modules. +The `__all__` attribute supports `from ... import *` statements in Python, but +without identifying the module or package involved and then consulting +`__all__` in that module or package to discover which names might be involved +(which might require the inspection of yet other modules or packages), the +names imported cannot be known. Consequently, some names used elsewhere in the +module performing the import might be assumed to be imported names when, in +fact, they are unknown in both the importing and imported modules. Such +uncertainty hinders the inspection of individual modules. === Modules Imported Independently === -When indicating an import using the `from` and `import` statements, the [[../Toolchain|toolchain]] does not attempt to immediately import other modules. Instead, the imports act as declarations of such other modules or names from other modules, resolved at a later stage. This permits mutual imports to a greater extent than in Python. +When indicating an import using the `from` and `import` statements, the +[[../Toolchain|toolchain]] does not attempt to immediately import other +modules. Instead, the imports act as declarations of such other modules or +names from other modules, resolved at a later stage. This permits mutual +imports to a greater extent than in Python. {{{#!python numbers=disable # Module M @@ -344,7 +466,10 @@ import N }}} -Such flexibility is not usually needed, and circular importing usually indicates issues with program organisation. However, declarative imports can help to decouple modules and avoid combining import declaration and module initialisation order concerns. +Such flexibility is not usually needed, and circular importing usually +indicates issues with program organisation. However, declarative imports can +help to decouple modules and avoid combining import declaration and module +initialisation order concerns. == Syntax and Control-Flow == @@ -353,11 +478,14 @@ == If expressions and comprehensions are not supported || If expressions and comprehensions are supported -|| Omitting such syntactic features simplifies program inspection and translation +|| Omitting such syntactic features simplifies program inspection and +.. translation == The `with` statement is not supported -|| The `with` statement offers a mechanism for resource allocation and deallocation using context managers -|| This syntactic feature can be satisfactorily emulated using existing constructs +|| The `with` statement offers a mechanism for resource allocation and +.. deallocation using context managers +|| This syntactic feature can be satisfactorily emulated using existing +.. constructs == Generators are not supported || Generators are supported @@ -369,12 +497,16 @@ == All parameters must be specified || Catch-all parameters (`*` and `**`) are supported -|| Omitting catch-all parameter population simplifies generic invocation handling +|| Omitting catch-all parameter population simplifies generic invocation +.. handling }}} === No If Expressions or Comprehensions === -In order to support the classic [[WikiPedia:?:|ternary operator]], a construct was [[https://www.python.org/dev/peps/pep-0308/|added]] to the Python syntax that needed to avoid problems with the existing grammar and notation. Unfortunately, it reorders the components from the traditional form: +In order to support the classic [[WikiPedia:?:|ternary operator]], a construct +was [[https://www.python.org/dev/peps/pep-0308/|added]] to the Python syntax +that needed to avoid problems with the existing grammar and notation. +Unfortunately, it reorders the components from the traditional form: {{{#!python numbers=disable # Not valid in Lichen, only in Python. @@ -386,7 +518,9 @@ true_result if (inner_true_result if condition else inner_false_result) else false_result }}} -Since if expressions may participate within expressions, they cannot be rewritten as if statements. Nor can they be rewritten as logical operator chains in general. +Since if expressions may participate within expressions, they cannot be +rewritten as if statements. Nor can they be rewritten as logical operator +chains in general. {{{#!python numbers=disable # Not valid in Lichen, only in Python. @@ -399,9 +533,17 @@ a = x and 0 or 1 # not valid }}} -But in any case, it would be more of a motivation to support the functionality if a better syntax could be adopted instead. However, if expressions are not particularly important in Python, and despite enhancement requests over many years, everybody managed to live without them. +But in any case, it would be more of a motivation to support the functionality +if a better syntax could be adopted instead. However, if expressions are not +particularly important in Python, and despite enhancement requests over many +years, everybody managed to live without them. -List and generator comprehensions are more complicated but share some characteristics of if expressions: their syntax contradicts the typical conventions established by the rest of the Python language; they create implicit state that is perhaps most appropriately modelled by a separate function or similar object. Since Lichen does not support generators at all, it will obviously not support generator expressions. +List and generator comprehensions are more complicated but share some +characteristics of if expressions: their syntax contradicts the typical +conventions established by the rest of the Python language; they create +implicit state that is perhaps most appropriately modelled by a separate +function or similar object. Since Lichen does not support generators at all, +it will obviously not support generator expressions. Meanwhile, list comprehensions quickly encourage barely-readable programs: @@ -412,11 +554,21 @@ a = [z for y in x if y for z in y if z] }}} -Supporting the creation of temporary functions to produce list comprehensions, while also hiding temporary names from the enclosing scope, adds complexity to the toolchain for situations where programmers would arguably be better creating their own functions and thus writing more readable programs. +Supporting the creation of temporary functions to produce list comprehensions, +while also hiding temporary names from the enclosing scope, adds complexity to +the toolchain for situations where programmers would arguably be better +creating their own functions and thus writing more readable programs. === No With Statement === -The [[https://docs.python.org/2.7/reference/compound_stmts.html#the-with-statement|with statement]] introduced the concept of [[https://docs.python.org/2.7/reference/datamodel.html#context-managers|context managers]] in Python 2.5, with such objects supporting a [[https://docs.python.org/2.7/library/stdtypes.html#typecontextmanager|programming interface]] that aims to formalise certain conventions around resource management. For example: +The +[[https://docs.python.org/2.7/reference/compound_stmts.html#the-with-statement|with +statement]] introduced the concept of +[[https://docs.python.org/2.7/reference/datamodel.html#context-managers|context +managers]] in Python 2.5, with such objects supporting a +[[https://docs.python.org/2.7/library/stdtypes.html#typecontextmanager|programming +interface]] that aims to formalise certain conventions around resource +management. For example: {{{#!python numbers=disable # Not valid in Lichen, only in Python. @@ -426,7 +578,11 @@ cursor.execute(query, args) }}} -Although this makes for readable code, it must be supported by objects which define the `__enter__` and `__exit__` special methods. Here, the `connect` method invoked in the first `with` statement must return such an object; similarly, the `cursor` method must also provide an object with such characteristics. +Although this makes for readable code, it must be supported by objects which +define the `__enter__` and `__exit__` special methods. Here, the `connect` +method invoked in the first `with` statement must return such an object; +similarly, the `cursor` method must also provide an object with such +characteristics. However, the "pre-with" solution is as follows: @@ -442,11 +598,23 @@ connection.close() }}} -Although this seems less readable, its behaviour is more obvious because magic methods are not being called implicitly. Moreover, any parameterisation of the acts of resource deallocation or closure can be done in the `finally` clauses where such parameterisation would seem natural, rather than being specified through some kind of context manager initialisation arguments that must then be propagated to the magic methods so that they may take into consideration contextual information that is readily available in the place where the actual resource operations are being performed. +Although this seems less readable, its behaviour is more obvious because magic +methods are not being called implicitly. Moreover, any parameterisation of the +acts of resource deallocation or closure can be done in the `finally` clauses +where such parameterisation would seem natural, rather than being specified +through some kind of context manager initialisation arguments that must then +be propagated to the magic methods so that they may take into consideration +contextual information that is readily available in the place where the actual +resource operations are being performed. === No Generators === -[[https://www.python.org/dev/peps/pep-0255/|Generators]] were [[https://docs.python.org/release/2.3/whatsnew/section-generators.html|added]] to Python in the 2.2 release and became fully part of the language in the 2.3 release. They offer a convenient way of writing iterator-like objects, capturing execution state instead of obliging the programmer to manage such state explicitly. +[[https://www.python.org/dev/peps/pep-0255/|Generators]] were +[[https://docs.python.org/release/2.3/whatsnew/section-generators.html|added]] +to Python in the 2.2 release and became fully part of the language in the 2.3 +release. They offer a convenient way of writing iterator-like objects, +capturing execution state instead of obliging the programmer to manage such +state explicitly. {{{#!python numbers=disable # Not valid in Lichen, only in Python. @@ -477,11 +645,20 @@ i += 1 }}} -However, generators make additional demands on the mechanisms provided to support program execution. The encapsulation of the above example generator in a separate class illustrates the need for state that persists outside the execution of the routine providing the generator's results. Generators may look like functions, but they do not necessarily behave like them, leading to potential misunderstandings about their operation even if the code is superficially tidy and concise. +However, generators make additional demands on the mechanisms provided to +support program execution. The encapsulation of the above example generator in +a separate class illustrates the need for state that persists outside the +execution of the routine providing the generator's results. Generators may +look like functions, but they do not necessarily behave like them, leading to +potential misunderstandings about their operation even if the code is +superficially tidy and concise. === Positional and Keyword Arguments Only === -When invoking callables, only positional arguments and keyword arguments can be used. Python also supports `*` and `**` arguments which respectively unpack sequences and mappings into the argument list, filling the list with sequence items (using `*`) and keywords (using `**`). +When invoking callables, only positional arguments and keyword arguments can +be used. Python also supports `*` and `**` arguments which respectively unpack +sequences and mappings into the argument list, filling the list with sequence +items (using `*`) and keywords (using `**`). {{{#!python numbers=disable def f(a, b, c, d): @@ -494,11 +671,15 @@ f(2, 4, **m) # not permitted }}} -While convenient, such "unpacking" arguments obscure the communication between callables and undermine the safety provided by function and method signatures. They also require run-time support for the unpacking operations. +While convenient, such "unpacking" arguments obscure the communication between +callables and undermine the safety provided by function and method signatures. +They also require run-time support for the unpacking operations. === Positional Parameters Only === -Similarly, signatures may only contain named parameters that correspond to arguments. Python supports `*` and `**` in parameter lists, too, which respectively accumulate superfluous positional and keyword arguments. +Similarly, signatures may only contain named parameters that correspond to +arguments. Python supports `*` and `**` in parameter lists, too, which +respectively accumulate superfluous positional and keyword arguments. {{{#!python numbers=disable def f(a, b, *args, **kw): # not permitted @@ -508,4 +689,11 @@ f(1, 2, c=3, d=4) }}} -Such accumulation parameters can be useful for collecting arbitrary data and applying some of it within a callable. However, they can easily proliferate throughout a system and allow erroneous data to propagate far from its origin because such parameters permit the deferral of validation until the data needs to be accessed. Again, run-time support is required to marshal arguments into the appropriate parameter of this nature, but programmers could just write functions and methods that employ general sequence and mapping parameters explicitly instead. +Such accumulation parameters can be useful for collecting arbitrary data and +applying some of it within a callable. However, they can easily proliferate +throughout a system and allow erroneous data to propagate far from its origin +because such parameters permit the deferral of validation until the data needs +to be accessed. Again, run-time support is required to marshal arguments into +the appropriate parameter of this nature, but programmers could just write +functions and methods that employ general sequence and mapping parameters +explicitly instead. diff -r 8f8361de472a -r f745d151a441 docs/wiki/FrontPage --- a/docs/wiki/FrontPage Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/FrontPage Fri Aug 17 11:41:28 2018 +0200 @@ -2,41 +2,92 @@ || [[/Downloads|Downloads]] || [[#Language|Language]] || [[#Toolchain|Toolchain]] || [[#Rationale|Rationale]] || [[#Documents|Documents]] || -Lichen is both a Python-like [[/Design|language]] and a [[/Toolchain|toolchain]] for that language. +Lichen is both a Python-like [[/Design|language]] and a +[[/Toolchain|toolchain]] for that language. Some objectives: - * Perform analysis on programs to better understand program structure and behaviour + * Perform analysis on programs to better understand program structure and + behaviour * Develop code generation capabilities - * Provide a platform for experimentation independent of existing Python language and library implementations + * Provide a platform for experimentation independent of existing Python + language and library implementations * Provide independence from Python language evolution * Learn things about writing compilers -Despite building on a long [[/History|history]] of experimentation, Lichen still requires some [[/ToDo|work to be done]] for it to be more widely usable. +Despite building on a long [[/History|history]] of experimentation, Lichen +still requires some [[/ToDo|work to be done]] for it to be more widely usable. <> == Language == -The Lichen language [[/Design|foregoes]] various dynamic aspects of Python to provide a foundation upon which more predictable programs can be built, while preserving essential functionality to make the core of the language seem very much "like Python" (thus yielding the name "Lichen"). The general syntax is largely identical to Python, with only certain syntactic constructs being deliberately unsupported, largely because the associated features are not desired. +The Lichen language [[/Design|foregoes]] various dynamic aspects of Python to +provide a foundation upon which more predictable programs can be built, while +preserving essential functionality to make the core of the language seem very +much "like Python" (thus yielding the name "Lichen"). The general syntax is +largely identical to Python, with only certain syntactic constructs being +deliberately unsupported, largely because the associated features are not +desired. <> == Toolchain == -The Lichen [[/Toolchain|toolchain]] employs existing tokeniser and parser software to obtain an abstract syntax tree which is then inspected to provide data to support deductions about the structure and nature of a given program. With the information obtained from these processes, a program is then constructed, consisting of a number of source files in the target compilation language (which is currently the C programming language). This generated program may be compiled and run, hopefully producing the results intended by the source program's authors. +The Lichen [[/Toolchain|toolchain]] employs existing tokeniser and parser +software to obtain an abstract syntax tree which is then inspected to provide +data to support deductions about the structure and nature of a given program. +With the information obtained from these processes, a program is then +constructed, consisting of a number of source files in the target compilation +language (which is currently the C programming language). This generated +program may be compiled and run, hopefully producing the results intended by +the source program's authors. -Lichen source files use the `.py` suffix since the language syntax is superficially compatible with Python, allowing text editors to provide highlighting and editing support for Lichen programs without the need to reconfigure such tools. However, an alternative, recommended suffix is likely to be introduced in future. +Lichen source files use the `.py` suffix since the language syntax is +superficially compatible with Python, allowing text editors to provide +highlighting and editing support for Lichen programs without the need to +reconfigure such tools. However, an alternative, recommended suffix is likely +to be introduced in future. <> == Library == -Unlike other Python compilation efforts, Lichen programs employ a newly-written library that is distinct from the CPython standard library distribution and completely independent from the CPython extension and object libraries on which Python programs being run with CPython must depend. Thus, there is no dependency on any `libpython` for run-time functionality. Since the parts of the Python standard library that are written in Python tend to be rather variable in quality, there has been no real inclination to re-use modules from that particular source, noting that they would need modifying to be compatible with Lichen, anyway. However, rewriting such modules provides opportunities to "do things right": with some functionality being over twenty years old and in bad shape, this is arguably something that should have been done for Python, anyway. +Unlike other Python compilation efforts, Lichen programs employ a newly-written +library that is distinct from the CPython standard library distribution and +completely independent from the CPython extension and object libraries on which +Python programs being run with CPython must depend. Thus, there is no +dependency on any `libpython` for run-time functionality. Since the parts of +the Python standard library that are written in Python tend to be rather +variable in quality, there has been no real inclination to re-use modules from +that particular source, noting that they would need modifying to be compatible +with Lichen, anyway. However, rewriting such modules provides opportunities to +"do things right": with some functionality being over twenty years old and in +bad shape, this is arguably something that should have been done for Python, +anyway. <> == Rationale == -Python has proven to be a readable, productive, comfortable-to-use and popular programming language. However, as it has accumulated features, the precise behaviour of programs making use of many of these features has become more difficult to predict. Features added to provide even more convenience to the programmer have often incurred run-time costs, introduced layers of indirection, and have made programs even more inscrutable. Instead of development tools reaching a point of being able to infer information about programs, it has been suggested that programmers annotate their programs in order to help tools understand those programs instead. Beyond superficial code style analysis and providing tooltips in integrated development environments, Python code analysis is often portrayed as a lost cause. +Python has proven to be a readable, productive, comfortable-to-use and popular +programming language. However, as it has accumulated features, the precise +behaviour of programs making use of many of these features has become more +difficult to predict. Features added to provide even more convenience to the +programmer have often incurred run-time costs, introduced layers of +indirection, and have made programs even more inscrutable. Instead of +development tools reaching a point of being able to infer information about +programs, it has been suggested that programmers annotate their programs in +order to help tools understand those programs instead. Beyond superficial code +style analysis and providing tooltips in integrated development environments, +Python code analysis is often portrayed as a lost cause. -By employing a refined language [[/Design|design]], Lichen aims to let each program define its [[/Structure|structure]] conveniently, be readily [[/Inspection|inspected]], and thus support [[/Deduction|deductions]] about the use of the program's objects by the code. The result should be more predictable programs that can be [[/Translation|translated]] to other, more efficient, [[/Representations|representations]]. The Lichen toolchain should be able to tell the programmer useful things about their programs that it may also be able to make use of itself. It not only aims to report information about programs that might be of interest to the developer, but it seeks to produce functioning, translated programs that can actually be run. +By employing a refined language [[/Design|design]], Lichen aims to let each +program define its [[/Structure|structure]] conveniently, be readily +[[/Inspection|inspected]], and thus support [[/Deduction|deductions]] about the +use of the program's objects by the code. The result should be more predictable +programs that can be [[/Translation|translated]] to other, more efficient, +[[/Representations|representations]]. The Lichen toolchain should be able to +tell the programmer useful things about their programs that it may also be able +to make use of itself. It not only aims to report information about programs +that might be of interest to the developer, but it seeks to produce +functioning, translated programs that can actually be run. <> == Document Index == diff -r 8f8361de472a -r f745d151a441 docs/wiki/History --- a/docs/wiki/History Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/History Fri Aug 17 11:41:28 2018 +0200 @@ -1,29 +1,115 @@ = History = -Lichen is one of a number of experiments focused on the analysis of Python source code with potential reliability and performance improvements in mind. There have been a number of projects that can be regarded as its predecessors, the first of these being a project with the name "Analysis" that was inspired by the then-recent announcement of the [[https://dspace.mit.edu/handle/1721.1/16688|Starkiller]] tool, which attempted to compile Python to C++ (and was never publicly released). With the emergence of the [[http://pypy.org/|PyPy]] project, the notion of writing a Python implementation in a dialect of Python was also considered attractive. +Lichen is one of a number of experiments focused on the analysis of Python +source code with potential reliability and performance improvements in mind. +There have been a number of projects that can be regarded as its predecessors, +the first of these being a project with the name "Analysis" that was inspired +by the then-recent announcement of the +[[https://dspace.mit.edu/handle/1721.1/16688|Starkiller]] tool, which +attempted to compile Python to C++ (and was never publicly released). With the +emergence of the [[http://pypy.org/|PyPy]] project, the notion of writing a +Python implementation in a dialect of Python was also considered attractive. == Initial Experiments == -After some experience gained, a successor to "Analysis" was attempted with the uninspired, but distinct, name of "analysis". This attempted to apply type inference and also generate C programs that could be compiled, albeit with a very simple run-time model that was not really amenable to further improvement. However, some useful concepts were explored, such as abstract syntax tree (AST) node annotations that could be used to direct code generation activities and to provide annotated source code and documentation summaries. +After some experience gained, a successor to "Analysis" was attempted with the +uninspired, but distinct, name of "analysis". This attempted to apply type +inference and also generate C programs that could be compiled, albeit with a +very simple run-time model that was not really amenable to further +improvement. However, some useful concepts were explored, such as abstract +syntax tree (AST) node annotations that could be used to direct code +generation activities and to provide annotated source code and documentation +summaries. -Various experiences with specialising functions and methods, plus general experience working with the AST led to a successor called "simplify", which attempted to extend specialisation to classes, thus introducing the notion of multiple forms of a single class whose attributes have differing characteristics. It also introduced the idea of a simplified instruction set for program operations, in contrast to Python bytecode whose instructions can represent relatively complicated operations, along with a simplified program representation that was intended to make program analysis easier. Issues around "data polymorphism" led to the realisation that attempting to identify specific types, in order to facilitate insights into program behaviour or optimisations of programs, was costly and not necessarily effective. +Various experiences with specialising functions and methods, plus general +experience working with the AST led to a successor called "simplify", which +attempted to extend specialisation to classes, thus introducing the notion of +multiple forms of a single class whose attributes have differing +characteristics. It also introduced the idea of a simplified instruction set +for program operations, in contrast to Python bytecode whose instructions can +represent relatively complicated operations, along with a simplified program +representation that was intended to make program analysis easier. Issues +around "data polymorphism" led to the realisation that attempting to identify +specific types, in order to facilitate insights into program behaviour or +optimisations of programs, was costly and not necessarily effective. -(Conversations with the author of [[http://shedskin.github.io/|Shed Skin]], a tool already available at this time, which compiles Python programs complying to certain restrictions to C++, were very useful in considering data polymorphism issues and others. Shed Skin must effectively be what Starkiller promised to be, and it has even been available in Debian for a number of years so that others can evaluate it for their own needs.) +(Conversations with the author of [[http://shedskin.github.io/|Shed Skin]], a +tool already available at this time, which compiles Python programs complying +to certain restrictions to C++, were very useful in considering data +polymorphism issues and others. Shed Skin must effectively be what Starkiller +promised to be, and it has even been available in Debian for a number of years +so that others can evaluate it for their own needs.) == Further Experiences == -An alternative approach was taken in the next successor, "micropython" (not to be confused with any other, more popular, project of the same name), where the focus of analysis shifted to defining the attributes provided by program objects and then determining which objects might be used in different places in the program according to the attributes mentioned. It introduced a strategy for representing program objects at run-time, and continued the work on a simplified instruction set, defining its instructions in terms of the run-time structures. The ability to generate annotated source code summaries was enhanced, and the software could also generate programs in the simplified instruction set that could be executed using an interpreter. Although substantial progress was made, the increasing resistance experienced in adapting a growing system to further forms of optimisations, and the need to implement and test run-time functionality on a somewhat unique instruction set architecture meant that progress became slower over time. +An alternative approach was taken in the next successor, "micropython" (not to +be confused with any other, more popular, project of the same name), where the +focus of analysis shifted to defining the attributes provided by program +objects and then determining which objects might be used in different places +in the program according to the attributes mentioned. It introduced a strategy +for representing program objects at run-time, and continued the work on a +simplified instruction set, defining its instructions in terms of the run-time +structures. The ability to generate annotated source code summaries was +enhanced, and the software could also generate programs in the simplified +instruction set that could be executed using an interpreter. Although +substantial progress was made, the increasing resistance experienced in +adapting a growing system to further forms of optimisations, and the need to +implement and test run-time functionality on a somewhat unique instruction set +architecture meant that progress became slower over time. -The dependence of "micropython" on costly whole-program analysis to identify optimisations led to some further realisations. The notion of "attribute usage" had allowed program namespaces to have their names characterised, and the notion of well-defined structures had allowed the identification of specific object types throughout programs, albeit with a degree of understandable inaccuracy. Meanwhile, the run-time data model had relied on restrictions on structures to permit relatively efficient structure access operations even when object types cannot be identified. It thus appeared that such techniques might be almost as potent on their own as the combination of more costly techniques explored in earlier projects. +The dependence of "micropython" on costly whole-program analysis to identify +optimisations led to some further realisations. The notion of "attribute +usage" had allowed program namespaces to have their names characterised, and +the notion of well-defined structures had allowed the identification of +specific object types throughout programs, albeit with a degree of +understandable inaccuracy. Meanwhile, the run-time data model had relied on +restrictions on structures to permit relatively efficient structure access +operations even when object types cannot be identified. It thus appeared that +such techniques might be almost as potent on their own as the combination of +more costly techniques explored in earlier projects. == Current Work == -It was with such realisations that a new project was effectively born. Tentatively called "!PythonLight" but renamed to "Lichen" as the code matured, the objectives now involved a simpler processing framework that merely attempted to catalogue structure members, to determine the origins of such members, and to record data flow within namespaces in order to determine attribute usage on names. The name "Lichen" can be pronounced in a way that is a contraction of "like Python", indicating that although the language being analysed is similar to Python, restrictions apply that make it not identical to it. +It was with such realisations that a new project was effectively born. +Tentatively called "!PythonLight" but renamed to "Lichen" as the code matured, +the objectives now involved a simpler processing framework that merely +attempted to catalogue structure members, to determine the origins of such +members, and to record data flow within namespaces in order to determine +attribute usage on names. The name "Lichen" can be pronounced in a way that is +a contraction of "like Python", indicating that although the language being +analysed is similar to Python, restrictions apply that make it not identical +to it. -Unlike previous efforts, instead of annotating the AST built by the Python parser, separate catalogues would record details of program operations for more convenient consumption by potentially separate analysis tools. It is worth emphasising that much of the processing work done to characterise program behaviour is effectively database work: searching for information that matches particular criteria. Emitting tabular data for perusal and potential use by other tools is a recognition of this, and it would not be unreasonable to expect future versions of tools of this nature to actually employ more conventional database management systems to facilitate program analysis. +Unlike previous efforts, instead of annotating the AST built by the Python +parser, separate catalogues would record details of program operations for +more convenient consumption by potentially separate analysis tools. It is +worth emphasising that much of the processing work done to characterise +program behaviour is effectively database work: searching for information that +matches particular criteria. Emitting tabular data for perusal and potential +use by other tools is a recognition of this, and it would not be unreasonable +to expect future versions of tools of this nature to actually employ more +conventional database management systems to facilitate program analysis. -Instead of targeting a project-specific virtual machine, the ultimate objective of processing was amended to generate a C program that could be compiled using existing tools. Although care must be taken to generate C source code, delegating the low-level work of actually producing an executable reduces the scope of the project and its associated demands. +Instead of targeting a project-specific virtual machine, the ultimate +objective of processing was amended to generate a C program that could be +compiled using existing tools. Although care must be taken to generate C +source code, delegating the low-level work of actually producing an executable +reduces the scope of the project and its associated demands. -Despite reasonable progress being made, certain features retained from Python proved to be a burden, notably the behaviour of Python's import system and the way in which programs can quickly accumulate superfluous code through that system's operation. To focus on genuinely important functionality and on producing working code, the project was [[../Restarted|restarted]] with various additional simplifications made to the supported language. Fortunately, this helped to drive progress, resulting in the development of a toolchain that can generate C programs for compilation and execution. +Despite reasonable progress being made, certain features retained from Python +proved to be a burden, notably the behaviour of Python's import system and the +way in which programs can quickly accumulate superfluous code through that +system's operation. To focus on genuinely important functionality and on +producing working code, the project was [[../Restarted|restarted]] with +various additional simplifications made to the supported language. +Fortunately, this helped to drive progress, resulting in the development of a +toolchain that can generate C programs for compilation and execution. -Lichen attempts to balance the goals of requiring relatively inexpensive analysis, offering meaningful insights into program behaviour, providing output in the form of working, translated, standalone programs, and encouraging further experimentation in the areas of language design, program analysis and optimisation. Much of the standard library is itself written in Lichen, even functionality that would not be written in Python for CPython, thus realising one of the original ambitions for this work and its predecessors. +Lichen attempts to balance the goals of requiring relatively inexpensive +analysis, offering meaningful insights into program behaviour, providing +output in the form of working, translated, standalone programs, and +encouraging further experimentation in the areas of language design, program +analysis and optimisation. Much of the standard library is itself written in +Lichen, even functionality that would not be written in Python for CPython, +thus realising one of the original ambitions for this work and its +predecessors. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Imports --- a/docs/wiki/Imports Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Imports Fri Aug 17 11:41:28 2018 +0200 @@ -1,17 +1,21 @@ = Imports = -An '''import''' is a declaration of one or more names that are provided by another source file or '''module''': +An '''import''' is a declaration of one or more names that are provided by +another source file or '''module''': * `import` statements declare names that correspond to modules * `from` statements declare names provided by modules -Imports occur either through explicit import operations initiated by the `from` and `import` statements, or through implicit import operations occurring to satisfy the requirements of another kind of operation. +Imports occur either through explicit import operations initiated by the +`from` and `import` statements, or through implicit import operations +occurring to satisfy the requirements of another kind of operation. <> == Packages and Submodules == -A '''package''' is a collection of modules whose names are all prefixed by the package name. For example: +A '''package''' is a collection of modules whose names are all prefixed by the +package name. For example: {{{ compiler @@ -19,30 +23,45 @@ compiler.transformer }}} -Here, the `compiler` package is said to contain the `compiler.ast` and `compiler.transformer` modules. +Here, the `compiler` package is said to contain the `compiler.ast` and +`compiler.transformer` modules. === Defining Packages === -The package root or top-level module is defined in a file called `__init__.py` inside the directory bearing the package's name, and this file provides a namespace for the top-level module. However, a package does not expose its member modules ('''submodules''') as members of its top-level module. Instead, the hierarchical relationship between a package and its submodules exists purely in the naming of those modules, and where submodules are imported they must be done so using their full names. +The package root or top-level module is defined in a file called `__init__.py` +inside the directory bearing the package's name, and this file provides a +namespace for the top-level module. However, a package does not expose its +member modules ('''submodules''') as members of its top-level module. Instead, +the hierarchical relationship between a package and its submodules exists +purely in the naming of those modules, and where submodules are imported they +must be done so using their full names. -Thus, relationships between packages and modules must be explicitly defined in module namespaces. For example, in the `compiler` module, the following would define relationships to the submodules: +Thus, relationships between packages and modules must be explicitly defined in +module namespaces. For example, in the `compiler` module, the following would +define relationships to the submodules: {{{ from compiler.ast import ast from compiler.transformer import transformer }}} -Without such import statements, no attempt will be made upon importing `compiler` to access the submodules and automatically populate the package. +Without such import statements, no attempt will be made upon importing +`compiler` to access the submodules and automatically populate the package. === Accessing Submodules Directly === -Importing of submodules from packages will not cause the package itself to be imported. For example: +Importing of submodules from packages will not cause the package itself to be +imported. For example: {{{ import compiler.ast }}} -This initialises the name `ast` which refers to the `compiler.ast` module, but the `compiler` package and its top-level module will not be imported. Thus, submodules can be considered independent of their packages, although they may seek to import their package top-level module should they need to access objects provided by that module. +This initialises the name `ast` which refers to the `compiler.ast` module, but +the `compiler` package and its top-level module will not be imported. Thus, +submodules can be considered independent of their packages, although they may +seek to import their package top-level module should they need to access +objects provided by that module. == Implicit Imports == @@ -56,11 +75,20 @@ || Unary operators || `operator.unary` || || Access to built-in name || `__builtins__` || (various modules in the [[../Builtins|built-ins]] package hierarchy) || -Operator usage will cause a local name referring to an `operator` module function to be created, with the appropriate function being exposed by the `operator` module itself. However, the inspection process will seek to obtain a reference to the function in its actual definition location, ultimately referencing the function in one of the modules indicated above. +Operator usage will cause a local name referring to an `operator` module +function to be created, with the appropriate function being exposed by the +`operator` module itself. However, the inspection process will seek to obtain +a reference to the function in its actual definition location, ultimately +referencing the function in one of the modules indicated above. == Import Sequencing == -In order to populate modules, other modules may themselves be required to provide names to a given module, and in turn these other modules may rely on yet more modules, and so on. One logical consequence of this is that circular imports become possible, but the resulting mutual dependencies may not be easily untangled without careful attention to the state of each of the participating modules. Consider the following situation: +In order to populate modules, other modules may themselves be required to +provide names to a given module, and in turn these other modules may rely on +yet more modules, and so on. One logical consequence of this is that circular +imports become possible, but the resulting mutual dependencies may not be +easily untangled without careful attention to the state of each of the +participating modules. Consider the following situation: {{{{#!table {{{#!graphviz @@ -116,25 +144,65 @@ }}} }}}} -If modules were loaded upon being encountered in import statements, module A would not be completely processed when attempting to import from module B, and thus the import within module B of module A would only yield some information about module A. Consequently, the details of class D might not be available, and this would then have an impact on whether module B could even be completely processed itself. +If modules were loaded upon being encountered in import statements, module A +would not be completely processed when attempting to import from module B, and +thus the import within module B of module A would only yield some information +about module A. Consequently, the details of class D might not be available, +and this would then have an impact on whether module B could even be +completely processed itself. -The approach taken to generally deal with such situations is to defer resolution until all modules have been populated. Then, names are resolved with any names employing kinds of references specified as `` (instead of, for example, ``) being resolved according to the recorded import dependencies. +The approach taken to generally deal with such situations is to defer +resolution until all modules have been populated. Then, names are resolved +with any names employing kinds of references specified as `` (instead +of, for example, ``) being resolved according to the recorded import +dependencies. -Since the classes in one module may depend on those in other modules, it is not always possible to finalise the details of classes in a module context. And since modules may depend on each other, it is not always possible to finalise the details of classes until the details of all classes in a program are known. +Since the classes in one module may depend on those in other modules, it is +not always possible to finalise the details of classes in a module context. +And since modules may depend on each other, it is not always possible to +finalise the details of classes until the details of all classes in a program +are known. === Module Initialisation === -Although static objects can be defined with interdependencies in a declarative fashion, the initialisation of objects in modules may require the availability of completely-initialised objects defined in other modules. Thus, an initialisation order needs to be established, with some modules being initialised before others, so that all modules do not encounter uninitialised names when they are expecting those names to provide valid objects. +Although static objects can be defined with interdependencies in a declarative +fashion, the initialisation of objects in modules may require the availability +of completely-initialised objects defined in other modules. Thus, an +initialisation order needs to be established, with some modules being +initialised before others, so that all modules do not encounter uninitialised +names when they are expecting those names to provide valid objects. -The most obvious example of a module requiring the initialisation of others before it is itself evaluated is, of course, the `__main__` module. Given that it may import instances defined as attribute on other modules, it clearly requires those modules to have been initialised and those instances to have been created. It would be absurd to consider running the body of the `__main__` module before such other modules. Similarly, such dependencies exist between other modules, and consequently, an appropriate initialisation ordering must be defined for them. In its entirety, then, a program must define a workable ordering for all of its modules, signalling a concrete error if no such ordering can be established. +The most obvious example of a module requiring the initialisation of others +before it is itself evaluated is, of course, the `__main__` module. Given that +it may import instances defined as attribute on other modules, it clearly +requires those modules to have been initialised and those instances to have +been created. It would be absurd to consider running the body of the +`__main__` module before such other modules. Similarly, such dependencies +exist between other modules, and consequently, an appropriate initialisation +ordering must be defined for them. In its entirety, then, a program must +define a workable ordering for all of its modules, signalling a concrete error +if no such ordering can be established. == Hidden Modules == -Imports that do not obtain the imported module name itself, such as those initiated by the `from` statement and by implicit operations, keep the imported module '''hidden'''. Unless other operations expose hidden modules, they will remain hidden and may consequently be omitted from the final generated program: there would be no way of referencing such modules and they would therefore be unable to contribute their contents to the rest of the program. +Imports that do not obtain the imported module name itself, such as those +initiated by the `from` statement and by implicit operations, keep the +imported module '''hidden'''. Unless other operations expose hidden modules, +they will remain hidden and may consequently be omitted from the final +generated program: there would be no way of referencing such modules and they +would therefore be unable to contribute their contents to the rest of the +program. -However, where an object provided by a module is referenced, a module cannot remain hidden, since the provided object may depend on other parts of the module in order to function correctly. And since a provided object might reference or return other objects in the module, the general module contents must also be exposed. +However, where an object provided by a module is referenced, a module cannot +remain hidden, since the provided object may depend on other parts of the +module in order to function correctly. And since a provided object might +reference or return other objects in the module, the general module contents +must also be exposed. -Import dependencies are defined for namespaces indicating modules that are required by each namespace. By following dependency relationships, it is possible to determine the eventual target of an import and to potentially skip over modules that merely import and expose names. For example: +Import dependencies are defined for namespaces indicating modules that are +required by each namespace. By following dependency relationships, it is +possible to determine the eventual target of an import and to potentially skip +over modules that merely import and expose names. For example: {{{{#!table {{{#!graphviz @@ -184,4 +252,8 @@ }}} }}}} -Here, B is never explicitly referenced, nor does it provide any referenced objects other than an imported name. Consequently, B is hidden and ultimately excluded from the final program. Such techniques are employed in the [[../Builtins|built-ins]] package hierarchy to reduce the amount of functionality employed by (and bundled in) a generated program. +Here, B is never explicitly referenced, nor does it provide any referenced +objects other than an imported name. Consequently, B is hidden and ultimately +excluded from the final program. Such techniques are employed in the +[[../Builtins|built-ins]] package hierarchy to reduce the amount of +functionality employed by (and bundled in) a generated program. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Inspection --- a/docs/wiki/Inspection Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Inspection Fri Aug 17 11:41:28 2018 +0200 @@ -1,6 +1,12 @@ = Inspecting Programs = -A program's source code is inspected module by module. As [[../Imports|imports]] of modules are encountered, other modules are added to the program. The inspection process for each module involves top-to-bottom traversal of the code using depth-first processing of the abstract syntax tree, with a stack of namespaces being employed to record definitions and names within namespaces are they are visited. Inspection continues until all modules in the program have been imported and inspected. +A program's source code is inspected module by module. As +[[../Imports|imports]] of modules are encountered, other modules are added to +the program. The inspection process for each module involves top-to-bottom +traversal of the code using depth-first processing of the abstract syntax +tree, with a stack of namespaces being employed to record definitions and +names within namespaces are they are visited. Inspection continues until all +modules in the program have been imported and inspected. <> @@ -12,14 +18,20 @@ * Identifying the names defined by each namespace * Recording relationships between namespaces * Describing the operations in each namespace - * Identifying the names in each namespace and the attributes used by those names + * Identifying the names in each namespace and the attributes used by those + names * Recording details of assignments and invocations -The results of inspection are written out to [[../Cache|cache]] files, one for each module in the program. +The results of inspection are written out to [[../Cache|cache]] files, one for +each module in the program. === Program Units for Inspection === -The `inspector` module performs much of the inspection work, relying on the `common` module for certain tasks, with the `modules` module providing the relevant module abstractions including those writing to and reading from cache files, and the `resolving` module completing each module's inspection. The `importer` module coordinates inspection across the whole program. +The `inspector` module performs much of the inspection work, relying on the +`common` module for certain tasks, with the `modules` module providing the +relevant module abstractions including those writing to and reading from cache +files, and the `resolving` module completing each module's inspection. The +`importer` module coordinates inspection across the whole program. === Name Identification === @@ -31,15 +43,31 @@ * Importing of modules * Importing of names from modules -The origins of names are discovered by considering local, global and built-in namespaces. Where a name is '''local''', it is defined in the same namespace. Where a name is '''global''', it is defined in the same module at the module level. Where a name is '''built-in''', it is defined in a module in the `__builtins__` package and exposed via the `__builtins__` namespace. +The origins of names are discovered by considering local, global and built-in +namespaces. Where a name is '''local''', it is defined in the same namespace. +Where a name is '''global''', it is defined in the same module at the module +level. Where a name is '''built-in''', it is defined in a module in the +`__builtins__` package and exposed via the `__builtins__` namespace. -Initial tentative identification of names will sort names into two categories: locals and external names. Global variables employed in function (or method) namespaces may not be defined when a function body is inspected, either within a module or being imported from another module, and so it is not initially possible to more specifically determine the nature of a name. +Initial tentative identification of names will sort names into two categories: +locals and external names. Global variables employed in function (or method) +namespaces may not be defined when a function body is inspected, either within +a module or being imported from another module, and so it is not initially +possible to more specifically determine the nature of a name. -These categories are later refined to distinguish between globals and built-in names as external names. The built-in origin of a name is only suggested when no locals or globals can provide the name, and the final identification of such names, plus other external names introduced as locals or globals via imports, will occur at the end of the inspection activity. Names that are unrecognised by then may be treated like unrecognised built-ins. +These categories are later refined to distinguish between globals and built-in +names as external names. The built-in origin of a name is only suggested when +no locals or globals can provide the name, and the final identification of +such names, plus other external names introduced as locals or globals via +imports, will occur at the end of the inspection activity. Names that are +unrecognised by then may be treated like unrecognised built-ins. === Name Restrictions and Resolution === -'''Static''' objects, forming the fundamental [[../Structure|structure]] of the program, expose their names through the general structure hierarchy. Classes, which are defined as part of this structure, depend on well-defined names for any base classes they employ. For example: +'''Static''' objects, forming the fundamental [[../Structure|structure]] of +the program, expose their names through the general structure hierarchy. +Classes, which are defined as part of this structure, depend on well-defined +names for any base classes they employ. For example: {{{#!python numbers=disable class C(A): # depends on A being a name that can be resolved (and being a class) @@ -49,50 +77,83 @@ ... }}} -Base classes must be identifiable and unambiguous. Since base classes may be imported, their identification may not necessarily occur immediately, but it must be possible once all information about a program is known and when all such dependencies are resolved. +Base classes must be identifiable and unambiguous. Since base classes may be +imported, their identification may not necessarily occur immediately, but it +must be possible once all information about a program is known and when all +such dependencies are resolved. === Attribute Identification === Attributes are introduced to namespaces as follows: - * For classes and modules, the definition of names defines attributes in those namespaces - * Functions do not generally support attributes, although [[../Representations#Special_Members|representation-specific attributes]] do exist on functions - * For instances, assignments to attributes of the special `self` name, performed in methods, define attributes in all instances of the method's class + * For classes and modules, the definition of names defines attributes in + those namespaces + * Functions do not generally support attributes, although + [[../Representations#Special_Members|representation-specific attributes]] + do exist on functions + * For instances, assignments to attributes of the special `self` name, + performed in methods, define attributes in all instances of the method's + class -Attributes are only supported on program objects if they have been defined or '''bound''' as described above. Any attempt to access or set an attribute on an object using a name that was not determined through the above process is considered an invalid operation. Thus, augmenting the attributes available on an object (so-called "monkeypatching") is not possible. +Attributes are only supported on program objects if they have been defined or +'''bound''' as described above. Any attempt to access or set an attribute on +an object using a name that was not determined through the above process is +considered an invalid operation. Thus, augmenting the attributes available on +an object (so-called "monkeypatching") is not possible. -When an attribute is accessed, the location of its use is recorded and ultimately associated with assignments of the name involved, just as is done for accesses of the plain name itself, but the attribute details are subsequently collected together for each assignment or '''version''' of the name. This is discussed below. +When an attribute is accessed, the location of its use is recorded and +ultimately associated with assignments of the name involved, just as is done +for accesses of the plain name itself, but the attribute details are +subsequently collected together for each assignment or '''version''' of the +name. This is discussed below. === Inherited Attributes === -In order to support inheritance, a process of propagation makes class and instance attributes available to any given class from its ancestor classes according to a depth-first traversal of a class's base class hierarchy. Thus, for each class, given the definition of its base classes, a complete collection of class and instance attribute names can be determined. The actual mechanism of propagation occurs during the consolidation phase of inspection, principally because class bases are not generally immediately identifiable upon completing the inspection of any given class. +In order to support inheritance, a process of propagation makes class and +instance attributes available to any given class from its ancestor classes +according to a depth-first traversal of a class's base class hierarchy. Thus, +for each class, given the definition of its base classes, a complete +collection of class and instance attribute names can be determined. The actual +mechanism of propagation occurs during the consolidation phase of inspection, +principally because class bases are not generally immediately identifiable +upon completing the inspection of any given class. === Name Assignments and Accesses === -Each assignment of a name defines a '''version''' of the name within a namespace. The location of this definition consists of the following: +Each assignment of a name defines a '''version''' of the name within a +namespace. The location of this definition consists of the following: * The namespace in which the assignment appears * The name itself * The version or assignment instance number -When a name is used, the location of its use is recorded and is ultimately associated with the assignments that may be providing the name at that location. This permits information about the type of the name to become available for each usage location. The location of each name usage consists of the following: +When a name is used, the location of its use is recorded and is ultimately +associated with the assignments that may be providing the name at that +location. This permits information about the type of the name to become +available for each usage location. The location of each name usage consists of +the following: * The namespace in which the name appears * The name itself * The access instance number -Name accesses are described as special cases of attribute accesses: where attributes would be indicated, none are specified. +Name accesses are described as special cases of attribute accesses: where +attributes would be indicated, none are specified. === Attribute Accesses === -Attribute '''accesses''' are operations involving attributes where those attributes are obtained or set in conjunction with an '''accessor''' expression. They are recorded in terms of location tuples, each describing an access as follows: +Attribute '''accesses''' are operations involving attributes where those +attributes are obtained or set in conjunction with an '''accessor''' +expression. They are recorded in terms of location tuples, each describing an +access as follows: * The namespace in which the access occurs * Any named accessor or an anonymous accessor * The attribute (or chain of attributes) involved (omitted for name accesses) * The access instance number -As with name accesses, the access instance number distinguishes between different accesses employing the same details. +As with name accesses, the access instance number distinguishes between +different accesses employing the same details. Consider the following example: @@ -197,7 +258,11 @@ }}} }}}} -Since the names involved may be provided by different name versions, accesses are counted independently. Meanwhile, a non-name or '''anonymous''' accessor may be involved in an access. Such anonymous accessors are independent and do not accumulate attribute usage because they potentially involve completely different objects. +Since the names involved may be provided by different name versions, accesses +are counted independently. Meanwhile, a non-name or '''anonymous''' accessor +may be involved in an access. Such anonymous accessors are independent and do +not accumulate attribute usage because they potentially involve completely +different objects. || '''Namespace''' || '''Name''' || '''Attribute''' || '''Access number''' || || `module.f` || `p` || `a` || 0 || @@ -215,11 +280,19 @@ || `module.f` || `q` || `{}` || 0 || || `module.f` || `q` || `{}` || 1 || -Here, the attribute field is left empty and indicates that name definitions are being described. Although the data is superficially similar to name accesses, it should be remembered that accesses employ an access number whereas accessors employ a name version, with such identifiers being different things. +Here, the attribute field is left empty and indicates that name definitions +are being described. Although the data is superficially similar to name +accesses, it should be remembered that accesses employ an access number +whereas accessors employ a name version, with such identifiers being different +things. === Names and Attribute Usage === -Within each scope, the names employed by the program and the attributes used with those names are tracked. As the path of execution diverges and converges again through control-flow constructs, the tracking attempts to maintain the '''attribute usage''' associated with each name, or specifically each '''version''' of each name. +Within each scope, the names employed by the program and the attributes used +with those names are tracked. As the path of execution diverges and converges +again through control-flow constructs, the tracking attempts to maintain the +'''attribute usage''' associated with each name, or specifically each +'''version''' of each name. {{{{#!table @@ -285,21 +358,45 @@ }}} }}}} -The outcome of such tracking should be an indication of the attribute usage with each name based on the shortest routes that names can take through the control-flow structure. Such shortest-route usage defines the minimal selection of attributes that can be considered used with a name, and thus such usage defines the broadest selection of object types that can be identified as supporting such attributes. In the above example, the following minimal selections of attributes apply: +The outcome of such tracking should be an indication of the attribute usage +with each name based on the shortest routes that names can take through the +control-flow structure. Such shortest-route usage defines the minimal +selection of attributes that can be considered used with a name, and thus such +usage defines the broadest selection of object types that can be identified as +supporting such attributes. In the above example, the following minimal +selections of attributes apply: || '''Name Version''' || '''Minimal Set of Attributes''' || '''Types''' || || `y` (version 0) || ''empty set'' || ''any object'' || || `y` (version 1) || `a2` || ''only objects providing the single attribute'' || -The assumption is made that any attribute used with a name is always provided by the object referenced by that name and that the correct execution of the code does not rely on an attribute being absent (and thus raising an exception). By defining usage for each name, the toolchain can determine whether any type can provide the attributes used with a name, producing a compile-time error if no type supports such usage. +The assumption is made that any attribute used with a name is always provided +by the object referenced by that name and that the correct execution of the +code does not rely on an attribute being absent (and thus raising an +exception). By defining usage for each name, the toolchain can determine +whether any type can provide the attributes used with a name, producing a +compile-time error if no type supports such usage. -It is possible that certain routes taken by names might define attribute usage that cannot be supported by types that do support the shortest-route usage, yet it might not be appropriate to forbid usage of such types with such names: the program logic may intend that such types do not visit the regions of the code that employ the attributes that such types cannot support. However, as a consequence, run-time tests will be required to prevent attribute accesses that are inappropriate for such types from taking place. In the above example, the following maximal selections apply: +It is possible that certain routes taken by names might define attribute usage +that cannot be supported by types that do support the shortest-route usage, +yet it might not be appropriate to forbid usage of such types with such names: +the program logic may intend that such types do not visit the regions of the +code that employ the attributes that such types cannot support. However, as a +consequence, run-time tests will be required to prevent attribute accesses +that are inappropriate for such types from taking place. In the above example, +the following maximal selections apply: || '''Name Version''' || '''Maximal Set of Attributes''' || '''Types''' || || `y` (version 0) || `a1`, `a3` || ''only objects providing both attributes'' || || `y` (version 1) || `a1`, `a2`, `a3` || ''only objects providing all three attributes'' || -Tracking occurs separately in function or method namespaces and at a level combining the static namespaces in a module. This latter combination of namespaces permits the flow of global name details through class namespaces. Tracking employs special '''tracking names''' with which usage is associated, with globals and class attributes employing complete object paths to describe their names, whereas locals merely employ the plain names defined and used in local namespaces. Some examples follow: +Tracking occurs separately in function or method namespaces and at a level +combining the static namespaces in a module. This latter combination of +namespaces permits the flow of global name details through class namespaces. +Tracking employs special '''tracking names''' with which usage is associated, +with globals and class attributes employing complete object paths to describe +their names, whereas locals merely employ the plain names defined and used in +local namespaces. Some examples follow: || '''Namespace''' || '''Name''' || '''Name Scope''' || '''Tracking Name''' || || `__main__` (module) || `x` || global || `__main__.x` || @@ -309,14 +406,21 @@ === Name Initialisation and Aliases === -Each name version may be associated with a value upon assignment provided by an expression known as an '''initialiser'''. Such values may be used to inform the interpretation of operations on names, restricting the types involved to the initialised values. Some examples are as follows: +Each name version may be associated with a value upon assignment provided by +an expression known as an '''initialiser'''. Such values may be used to inform +the interpretation of operations on names, restricting the types involved to +the initialised values. Some examples are as follows: {{{#!python numbers=disable x = 123 # initialiser is a constant, indicating a known type x = classname() # initialiser refers to a class, indicating instantiation }}} -Names may also be assigned the values of other names, and such a name becomes an '''alias''' for such other names. Aliases can therefore be used to combine attribute usage observations across names. Other forms of aliases involve assigning attributes accessed via other names to any given name. Some examples are as follows: +Names may also be assigned the values of other names, and such a name becomes +an '''alias''' for such other names. Aliases can therefore be used to combine +attribute usage observations across names. Other forms of aliases involve +assigning attributes accessed via other names to any given name. Some examples +are as follows: {{{#!python numbers=disable y = x # y @@ -324,95 +428,180 @@ z = x.q # z can be restricted to the types deduced for x.q }}} -Initialising values are retained for later resolution because their identities are not generally known until certain name resolution activities have taken place. +Initialising values are retained for later resolution because their identities +are not generally known until certain name resolution activities have taken +place. === Invocations === -During inspection only limited information is available about the nature of invocations. However, some information will already be available about global names and these may be defined by classes. Thus, invocations that cause the instantiation of classes may be determined even during the inspection phase. +During inspection only limited information is available about the nature of +invocations. However, some information will already be available about global +names and these may be defined by classes. Thus, invocations that cause the +instantiation of classes may be determined even during the inspection phase. -Information about class instantiation is most useful in combination with the initialisation of names. When an assignment occurs, any initialising expression that provides enough information can be evaluated to see if it describes instantiation. If so, the nature of the instantiation can be associated with the name and used in the deduction process to constrain any usage of that name and its attributes. Such restrictions on the types associated with names are applied in the deduction phase. +Information about class instantiation is most useful in combination with the +initialisation of names. When an assignment occurs, any initialising +expression that provides enough information can be evaluated to see if it +describes instantiation. If so, the nature of the instantiation can be +associated with the name and used in the deduction process to constrain any +usage of that name and its attributes. Such restrictions on the types +associated with names are applied in the deduction phase. === Literals and Constants === -Literal values or '''literals''' are specific values that appear in the program, typically representing numbers, character strings, mappings or collections. Literal numbers and strings are effectively '''constants''', meaning that their values are unambiguously defined and can eventually be referenced using a unique identifier that applies throughout the program and which can refer to an initialised object in any generated program. Initially, they are recorded for the namespace in which they appear. For example: +Literal values or '''literals''' are specific values that appear in the +program, typically representing numbers, character strings, mappings or +collections. Literal numbers and strings are effectively '''constants''', +meaning that their values are unambiguously defined and can eventually be +referenced using a unique identifier that applies throughout the program and +which can refer to an initialised object in any generated program. Initially, +they are recorded for the namespace in which they appear. For example: || '''Namespace''' || '''Identifier''' || '''Type''' || '''Encoding''' || '''Value''' || || `__main__` || `__main__.$c0` || `__builtins__.str.string` || `iso-8859-15` || `'\xc6\xd8\xc5'` || || `__main__` || `__main__.$c1` || `__builtins__.int.int` || || `123` || -Since values may appear again in other namespaces, a mapping is generated from such local identifiers to common global identifiers for constants. Where such an identifier is derived from the value content, this can potentially occur immediately, although in practice (and in general) such a mapping will be generated after all modules have been inspected. +Since values may appear again in other namespaces, a mapping is generated from +such local identifiers to common global identifiers for constants. Where such +an identifier is derived from the value content, this can potentially occur +immediately, although in practice (and in general) such a mapping will be +generated after all modules have been inspected. -Collections and mappings may also be initialised using literal syntax but may contain non-constant information such as names or expressions whose values are unknown before the program is run. Such values can be represented as instantiation operations of the appropriate type in any generated program, and the instances of the type concerned can be associated with any names to which such literals are assigned. Special names are used to refer to literal types for collections and mappings. For example: +Collections and mappings may also be initialised using literal syntax but may +contain non-constant information such as names or expressions whose values are +unknown before the program is run. Such values can be represented as +instantiation operations of the appropriate type in any generated program, and +the instances of the type concerned can be associated with any names to which +such literals are assigned. Special names are used to refer to literal types +for collections and mappings. For example: || '''Identifier''' || '''Type''' || || `$Ldict` || `__builtins__.dict.dict` || || `$Llist` || `__builtins__.list.list` || || `$Ltuple` || `__builtins__.tuple.tuple` || -Such special names merely serve as convenient placeholders for the types of any literals mentioned in a module. +Such special names merely serve as convenient placeholders for the types of +any literals mentioned in a module. == Consolidating Inspection Details == -As briefly discussed above, certain activities occur after information has been collected from an input program. Within each module, names that are external to namespaces are recorded in a '''name reference''' collection. These references identify unrecognised names but do not generally define their origins. Examples of name references are as follows: +As briefly discussed above, certain activities occur after information has +been collected from an input program. Within each module, names that are +external to namespaces are recorded in a '''name reference''' collection. +These references identify unrecognised names but do not generally define their +origins. Examples of name references are as follows: || '''Name''' || '''Identity''' || || `__main__.repr` || `:__builtins__.repr` || || `__main__.sys` || `:sys` || -In order to obtain the final identities, deferred references may need to be resolved, yielding concrete references: +In order to obtain the final identities, deferred references may need to be +resolved, yielding concrete references: || '''Name''' || '''Identity''' || || `__main__.repr` || `:__builtins__.identity.repr` || || `__main__.sys` || `:sys` || -This process of resolution cannot be performed immediately with only the knowledge available in a single module. Only with all program modules loaded can such resolution occur. +This process of resolution cannot be performed immediately with only the +knowledge available in a single module. Only with all program modules loaded +can such resolution occur. === Identifying Deferred References === -After all modules are loaded, and with all objects present in the program at this point, each deferred reference defined within a module should ultimately yield an identity. Any failure to do so indicates usage of a name not defined in the program. The process of identifying them by resolving what each of them points to, is illustrated by the following example: +After all modules are loaded, and with all objects present in the program at +this point, each deferred reference defined within a module should ultimately +yield an identity. Any failure to do so indicates usage of a name not defined +in the program. The process of identifying them by resolving what each of them +points to, is illustrated by the following example: || '''Reference''' || '''Object Path to Search''' || || `:__builtins__.repr` || `__builtins__.repr` || || `:__builtins__.identity.repr` || `__builtins__.identity.repr` || || `:__builtins__.identity.repr` || ''identification has occurred'' || -Initially, the origin information from the reference is used to find the details of the referenced object. However, in this case, the details yield another deferred reference that has not yet been resolved. Consequently, the origin information must be used to find the details of the object referenced in this case, finally yielding a reference with concrete object information. Deferred references are mutated to concrete references, thus changing the nature of such references throughout the accumulated data. +Initially, the origin information from the reference is used to find the +details of the referenced object. However, in this case, the details yield +another deferred reference that has not yet been resolved. Consequently, the +origin information must be used to find the details of the object referenced +in this case, finally yielding a reference with concrete object information. +Deferred references are mutated to concrete references, thus changing the +nature of such references throughout the accumulated data. === Identifying Module Dependencies === -During deferred reference resolution, relationships are catalogued between modules in which deferred references are found and those providing the corresponding referenced objects. In addition, module ordering dependencies are established on the basis that some kinds of objects must be initialised before being used in other parts of the program. As a result, the modules providing such objects must themselves be initialised before the modules referencing such objects. More information on this topic can be found in the [[../Imports#Import_Sequencing|import sequencing]] documentation. +During deferred reference resolution, relationships are catalogued between +modules in which deferred references are found and those providing the +corresponding referenced objects. In addition, module ordering dependencies +are established on the basis that some kinds of objects must be initialised +before being used in other parts of the program. As a result, the modules +providing such objects must themselves be initialised before the modules +referencing such objects. More information on this topic can be found in the +[[../Imports#Import_Sequencing|import sequencing]] documentation. === Resolving Module Details === -With all dependencies resolved, further work can be done with the details within each module. The `resolving` module provides the functionality to each module instance to perform such work. +With all dependencies resolved, further work can be done with the details +within each module. The `resolving` module provides the functionality to each +module instance to perform such work. 1. Class bases can be identified. - 1. Special names (referring to built-in objects employed by certain operations) are resolved. - 1. Each access involving an external name is investigated to see if it provides an effectively constant object (as described below). - 1. Invocation references are converted to provide instantiation information, if appropriate. - 1. Initialisers are investigated to see if they provide concrete object details and can thus indicate the nature of a name. + 1. Special names (referring to built-in objects employed by certain + operations) are resolved. + 1. Each access involving an external name is investigated to see if it + provides an effectively constant object (as described below). + 1. Invocation references are converted to provide instantiation information, + if appropriate. + 1. Initialisers are investigated to see if they provide concrete object + details and can thus indicate the nature of a name. 1. Constant value literals are associated with the appropriate types. -Thus, attribute details for each module and its classes are finalised. Meanwhile, modules that are still hidden are removed from the program. +Thus, attribute details for each module and its classes are finalised. +Meanwhile, modules that are still hidden are removed from the program. ==== Investigating Accesses ==== -External names will not have been resolved during the inspection process, but with information about the whole program, it becomes possible to identify names and to determine whether attribute accesses involving those names can also be identified. Thus, name references from each namespace are investigated in turn as follows. +External names will not have been resolved during the inspection process, but +with information about the whole program, it becomes possible to identify +names and to determine whether attribute accesses involving those names can +also be identified. Thus, name references from each namespace are investigated +in turn as follows. -Each name referencing a static object is considered a constant access of that object. However, given a number of accompanying attributes describing an access, each attribute is added in turn, and the resulting object path identified. If the path identifies a constant object, the next attribute in the access chain is added and the resulting object path identified, and so on. The maximal collection of attributes identifying a constant object is then recorded as a constant access, with any remaining attributes in the access chain retained for traversal in a later phase. +Each name referencing a static object is considered a constant access of that +object. However, given a number of accompanying attributes describing an +access, each attribute is added in turn, and the resulting object path +identified. If the path identifies a constant object, the next attribute in +the access chain is added and the resulting object path identified, and so on. +The maximal collection of attributes identifying a constant object is then +recorded as a constant access, with any remaining attributes in the access +chain retained for traversal in a later phase. -Names yielding constant accesses do not need to have their identity deduced through the application of attribute usage. +Names yielding constant accesses do not need to have their identity deduced +through the application of attribute usage. ==== Investigating Initialisers ==== -During inspection, initialisers will have been recorded in terms of expression results such as names, invocations, readily-identifiable instances, and attribute accesses. With details of the whole program plus constant accesses having been made available, such results may be definitively identified and associated with initialised names. +During inspection, initialisers will have been recorded in terms of expression +results such as names, invocations, readily-identifiable instances, and +attribute accesses. With details of the whole program plus constant accesses +having been made available, such results may be definitively identified and +associated with initialised names. -Alias information may also be refined at this point by identifying attribute accesses that are used to initialise names. +Alias information may also be refined at this point by identifying attribute +accesses that are used to initialise names. === Finalising Program Details === -With class contents and relationships identified, class attribute inheritance and instance attribute availability can be defined. Some classes may depend on external state for their initialisation, and so additional module dependencies may be defined at this stage. The `importer` module contains the functionality to conduct these whole-program activities. +With class contents and relationships identified, class attribute inheritance +and instance attribute availability can be defined. Some classes may depend on +external state for their initialisation, and so additional module dependencies +may be defined at this stage. The `importer` module contains the functionality +to conduct these whole-program activities. == Inspection Outcome == -Once inspection is complete, the available knowledge concerns the whole program and not merely individual modules or parts of the program. However, only limited efforts have been made until this point, notably in the acquisition of statically-defined object details when referencing names or attributes, to integrate the different modules. The [[../Deduction|deduction]] process is concerned with such integration. +Once inspection is complete, the available knowledge concerns the whole +program and not merely individual modules or parts of the program. However, +only limited efforts have been made until this point, notably in the +acquisition of statically-defined object details when referencing names or +attributes, to integrate the different modules. The [[../Deduction|deduction]] +process is concerned with such integration. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Optimisation --- a/docs/wiki/Optimisation Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Optimisation Fri Aug 17 11:41:28 2018 +0200 @@ -1,14 +1,28 @@ = Optimisation = -In the process of generating an output representation of a Lichen program, the process of optimisation is concerned with positioning members within program structures. Attributes are positioned in object structures, and '''object tables''' are defined so that attributes can be located in such objects. Similarly, parameters, whose positions are determined by their appearance in function and other callable signatures, have positioning information defined in '''parameter tables''' that can be used to position arguments in parameter arrays when their names are used in argument lists. +In the process of generating an output representation of a Lichen program, the +process of optimisation is concerned with positioning members within program +structures. Attributes are positioned in object structures, and '''object +tables''' are defined so that attributes can be located in such objects. +Similarly, parameters, whose positions are determined by their appearance in +function and other callable signatures, have positioning information defined +in '''parameter tables''' that can be used to position arguments in parameter +arrays when their names are used in argument lists. -Also performed during optimisation is the consolidation of constant information, initially discussed [[../Inspection#Literals_and_Constants|in the context of inspection]]. +Also performed during optimisation is the consolidation of constant +information, initially discussed [[../Inspection#Literals_and_Constants|in the +context of inspection]]. <> == Populating Objects == -A program will have objects that are classes, modules and instances, and each such object will provide a number of attributes. Each object's attributes are stored in an array. For simplicity, each attribute having a given name will always be positioned at the same place in any array provided by an object featuring an attribute with that name. Even different object types will use the same position. +A program will have objects that are classes, modules and instances, and each +such object will provide a number of attributes. Each object's attributes are +stored in an array. For simplicity, each attribute having a given name will +always be positioned at the same place in any array provided by an object +featuring an attribute with that name. Even different object types will use +the same position. Consider an attribute called `elephant` provided by a number of types: @@ -18,15 +32,30 @@ || class E || `__class__` || `ant` || `dog` || `giraffe` || `horse` || || module M || `__class__` || `rhino` || `hippo` || `elephant` || `zebra` || -Where `elephant` is provided as an attribute, it will always appear in the same position - in this case as the fourth attribute - in any object attribute array. +Where `elephant` is provided as an attribute, it will always appear in the +same position - in this case as the fourth attribute - in any object attribute +array. -Consequently, any operation involving an object of unknown identity that employs `elephant` can employ the same position information to determine whether the attribute is supported and to retrieve or modify its value. Where an object does not support `elephant`, it may use the same attribute position for another attribute. Determining whether objects support attributes is done using tables, described below. +Consequently, any operation involving an object of unknown identity that +employs `elephant` can employ the same position information to determine +whether the attribute is supported and to retrieve or modify its value. Where +an object does not support `elephant`, it may use the same attribute position +for another attribute. Determining whether objects support attributes is done +using tables, described below. -It should be noted that the positions of attributes in object structures are the same as the positions of attribute identifiers in object tables. With attribute positions allocated, table definition is trivial. +It should be noted that the positions of attributes in object structures are +the same as the positions of attribute identifiers in object tables. With +attribute positions allocated, table definition is trivial. === Allocation Considerations === -Such common positioning of attributes demands a degree of coordination between objects. If `elephant` is positioned in a given place in one object, then given the constraint of it only ever being positioned in a single location for all objects, it may not then be positioned in a different place in another object. Thus, where two attributes co-exist in an object, their positions must be different and will affect all objects supporting those attributes, regardless of whether they support them both. For example: +Such common positioning of attributes demands a degree of coordination between +objects. If `elephant` is positioned in a given place in one object, then +given the constraint of it only ever being positioned in a single location for +all objects, it may not then be positioned in a different place in another +object. Thus, where two attributes co-exist in an object, their positions must +be different and will affect all objects supporting those attributes, +regardless of whether they support them both. For example: || '''Type''' ||<-5> '''Attributes''' || || class C || `__class__` || `ant` || `dog` || `elephant` || `cat` || @@ -35,21 +64,48 @@ || module M || `__class__` || `rhino` || `hippo` || `elephant` || `zebra` || || module N || `__class__` || || || `elephant` || `zebra` || -Here, module N still positions `elephant` in the fourth position. If `elephant` were moved to the second position, it would conflict with `ant` or `rhino` in those objects supporting those attributes. Such coordination therefore has an impact on allocation strategies. Care has to be taken to allocate attributes in such a way that small objects (having few attributes) do not have their attributes positioned far from the start of their attribute arrays, because failing to do so effectively makes those objects large, inefficiently-represented objects. +Here, module N still positions `elephant` in the fourth position. If +`elephant` were moved to the second position, it would conflict with `ant` or +`rhino` in those objects supporting those attributes. Such coordination +therefore has an impact on allocation strategies. Care has to be taken to +allocate attributes in such a way that small objects (having few attributes) +do not have their attributes positioned far from the start of their attribute +arrays, because failing to do so effectively makes those objects large, +inefficiently-represented objects. -A reasonable solution is to consider the following when allocating attribute positions: +A reasonable solution is to consider the following when allocating attribute +positions: * Attribute frequency (or ubiquity) * Object size * Object kind (whether the object is a class, module or instance) -More frequently-occurring (or ubiquitous) attributes may need to appear in both large and small objects and should probably be allocated in lower positions (`__class__` being an extreme example of needing to be allocated for all objects). Attributes featured in small objects should also be given priority for lower positions due to the desirability of keeping such objects small. Meanwhile, classes and modules only appear once in a program, whereas there may be countless instances allocated during the lifetime of a program's execution. Therefore, attributes featured in instances should have priority over other attributes for lower positions in object structures. +More frequently-occurring (or ubiquitous) attributes may need to appear in +both large and small objects and should probably be allocated in lower +positions (`__class__` being an extreme example of needing to be allocated for +all objects). Attributes featured in small objects should also be given +priority for lower positions due to the desirability of keeping such objects +small. Meanwhile, classes and modules only appear once in a program, whereas +there may be countless instances allocated during the lifetime of a program's +execution. Therefore, attributes featured in instances should have priority +over other attributes for lower positions in object structures. == Populating Signatures == -The positions of parameters in callable signatures are determined by the definitions of those signatures in the source program. When callables are invoked using an argument list, arguments that are not specified using keywords are merely copied into the parameter array in the same position. However, keyword arguments will need to have their positions looked up in the appropriate parameter table for the callable involved. +The positions of parameters in callable signatures are determined by the +definitions of those signatures in the source program. When callables are +invoked using an argument list, arguments that are not specified using +keywords are merely copied into the parameter array in the same position. +However, keyword arguments will need to have their positions looked up in the +appropriate parameter table for the callable involved. -Consequently, no allocation needs to be performed on the parameters themselves: they have specific positions for each callable. However, just as attribute names must yield positions when accessing attributes on objects, a similar mechanism that takes parameter names and yields positions must be provided. It is instead the positions of parameter details that must be allocated in structures to be consulted as if parameter names were attribute names, with attributes providing parameter position details. +Consequently, no allocation needs to be performed on the parameters +themselves: they have specific positions for each callable. However, just as +attribute names must yield positions when accessing attributes on objects, a +similar mechanism that takes parameter names and yields positions must be +provided. It is instead the positions of parameter details that must be +allocated in structures to be consulted as if parameter names were attribute +names, with attributes providing parameter position details. Consider the following functions: @@ -64,28 +120,52 @@ ... }}} -In order to find the position of each parameter using its name, the following table could be provided: +In order to find the position of each parameter using its name, the following +table could be provided: || '''Table''' ||<-4> '''Parameters''' || || function f || `a` at 1 || `b` at 2 || `c` at 3 || `d` at 4 || || function g || `q` at 2 || `p` at 1 || `r` at 3 || || || function h || `v` at 2 || `p` at 2 || || `d` at 1 || -Such a table behaves like an object structure (or an object table) with parameters acting like attributes in such a structure. Here, the attributes yield the positions of parameters within parameter arrays. Note how `p` always resides in the same location but may yield different positions depending on the actual callable involved (as is also observable with `d`). +Such a table behaves like an object structure (or an object table) with +parameters acting like attributes in such a structure. Here, the attributes +yield the positions of parameters within parameter arrays. Note how `p` always +resides in the same location but may yield different positions depending on +the actual callable involved (as is also observable with `d`). === Allocation Considerations === -Parameter table allocation involves similar considerations to those influencing object table allocation. In order to keep parameter tables small, more frequently appearing parameters should be positioned earlier in arrays. A specific consideration of importance is that of the number of tables generated. There may be many callables with the same signature, and such callables therefore do not need their own parameter tables since they will merely be duplicates. An attempt is therefore made to identify distinct signatures and to generate parameter tables only for these signatures, instead of providing a one-to-one mapping between callables and tables. +Parameter table allocation involves similar considerations to those +influencing object table allocation. In order to keep parameter tables small, +more frequently appearing parameters should be positioned earlier in arrays. A +specific consideration of importance is that of the number of tables +generated. There may be many callables with the same signature, and such +callables therefore do not need their own parameter tables since they will +merely be duplicates. An attempt is therefore made to identify distinct +signatures and to generate parameter tables only for these signatures, instead +of providing a one-to-one mapping between callables and tables. == Populating Tables == -With names allocated to positions in each kind of table, the straightforward task of generating such tables proceeds as follows. +With names allocated to positions in each kind of table, the straightforward +task of generating such tables proceeds as follows. === Object Tables and Attribute Codes === -Object tables consist of locations directly corresponding to structure member locations. Each location in a table will correspond to a specific name, but since potentially many names may have the same position, a table must provide identifying details for the name that is actually supported. +Object tables consist of locations directly corresponding to structure member +locations. Each location in a table will correspond to a specific name, but +since potentially many names may have the same position, a table must provide +identifying details for the name that is actually supported. -In object tables, such identifying details take the form of '''attribute codes'''. Each attribute name is mapped to a distinct identifier, and upon consulting an object table, the lookup process must read the stored attribute code and compare it to the code for the attribute it intends to access. If these codes match, then it can be assumed that the object involved supports the named attribute. Otherwise, a different attribute (or even no attribute at all) resides at that location in the object structure, making the access inappropriate. +In object tables, such identifying details take the form of '''attribute +codes'''. Each attribute name is mapped to a distinct identifier, and upon +consulting an object table, the lookup process must read the stored attribute +code and compare it to the code for the attribute it intends to access. If +these codes match, then it can be assumed that the object involved supports +the named attribute. Otherwise, a different attribute (or even no attribute at +all) resides at that location in the object structure, making the access +inappropriate. A more comprehensive object table resembles the following: @@ -113,11 +193,22 @@ === Parameter Tables and Parameter Codes === -Parameter tables consist of locations yielding parameter position information. Each location in a table will correspond to a particular name, but since potentially many names may have the same position, a table must provide identifying details for the name that is actually supported. +Parameter tables consist of locations yielding parameter position information. +Each location in a table will correspond to a particular name, but since +potentially many names may have the same position, a table must provide +identifying details for the name that is actually supported. -Just as with object tables, in parameter tables, such identifying details take the form of '''parameter codes'''. Each parameter name is mapped to a distinct identifier, and upon consulting a parameter table, the lookup process must read the stored parameter code and compare it to the code for the parameter it is attempting to position in a parameter list. If these codes match, then it can be assumed that the signature supports the named parameter. Otherwise, a different parameter (or even no parameter at all) resides at that location in the parameter table, making any attempt to set such a parameter inappropriate. +Just as with object tables, in parameter tables, such identifying details take +the form of '''parameter codes'''. Each parameter name is mapped to a distinct +identifier, and upon consulting a parameter table, the lookup process must +read the stored parameter code and compare it to the code for the parameter it +is attempting to position in a parameter list. If these codes match, then it +can be assumed that the signature supports the named parameter. Otherwise, a +different parameter (or even no parameter at all) resides at that location in +the parameter table, making any attempt to set such a parameter inappropriate. -Since parameter tables provide both identifying information and parameter positions, a more comprehensive parameter table resembles the following: +Since parameter tables provide both identifying information and parameter +positions, a more comprehensive parameter table resembles the following: || '''Table''' ||<-4> '''Parameters (Codes)''' || || function f || `a` at 1 (1) || `b` at 2 (2) || `c` at 3 (3) || `d` at 4 (4) || @@ -138,9 +229,16 @@ == Consolidating Constants == -With knowledge about all constants used in a program, it becomes possible to prepare a catalogue of distinct constants and to assign each of them a unique name for convenient referencing in the generated program code. All constants are inspected in turn and a content digest prepared for each of them, establishing a mapping from values to such digests which act as global identifiers. Since local names are associated with constants, a mapping is also established from each local name to the corresponding global identifier. +With knowledge about all constants used in a program, it becomes possible to +prepare a catalogue of distinct constants and to assign each of them a unique +name for convenient referencing in the generated program code. All constants +are inspected in turn and a content digest prepared for each of them, +establishing a mapping from values to such digests which act as global +identifiers. Since local names are associated with constants, a mapping is +also established from each local name to the corresponding global identifier. -Considering the [[../Inspection#Literals_and_Constants|previous example]], the following mappings would be established, from values to global identifiers: +Considering the [[../Inspection#Literals_and_Constants|previous example]], the +following mappings would be established, from values to global identifiers: || '''Type''' || '''Encoding''' || '''Value''' || '''Global Identifier (Content Digest)''' || || `__builtins__.str.string` || `iso-8859-15` || `'\xc6\xd8\xc5'` || `OeyoRfPI__XqwJcPgTDTItX9v__OM` || @@ -154,4 +252,7 @@ == Optimisation Products == -The optimisation process should produce catalogues of attribute and parameter codes plus positioning information for object tables, object structures and parameter tables. It should also produce a catalogue of distinct constant identifiers. +The optimisation process should produce catalogues of attribute and parameter +codes plus positioning information for object tables, object structures and +parameter tables. It should also produce a catalogue of distinct constant +identifiers. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Prerequisites --- a/docs/wiki/Prerequisites Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Prerequisites Fri Aug 17 11:41:28 2018 +0200 @@ -1,7 +1,17 @@ = Prerequisites = -The software currently requires Python to operate. All recent development has been done using Python 2.7 which should be available for most operating system distributions. In future, Lichen will hopefully be compiled using its own toolchain and not require Python. See the Debian `python-minimal` or `python` packages or their equivalents. +The software currently requires Python to operate. All recent development has +been done using Python 2.7 which should be available for most operating system +distributions. In future, Lichen will hopefully be compiled using its own +toolchain and not require Python. See the Debian `python-minimal` or `python` +packages or their equivalents. -To produce executable programs, a C compiler and GNU make are required. The GNU C compiler, GCC, has been used for development and testing, and it should be available in various forms for most operating system distributions. Indeed, cross-compiling form of GCC has been used to test the toolchain output. See the Debian `gcc` and `make` packages or their equivalents. +To produce executable programs, a C compiler and GNU make are required. The GNU +C compiler, GCC, has been used for development and testing, and it should be +available in various forms for most operating system distributions. Indeed, +cross-compiling form of GCC has been used to test the toolchain output. See the +Debian `gcc` and `make` packages or their equivalents. -As Free Software, priority has been given to supporting Lichen on Free Software platforms, with Debian GNU/Linux being the development platform. Proprietary platforms are neither supported nor recommended. +As Free Software, priority has been given to supporting Lichen on Free Software +platforms, with Debian GNU/Linux being the development platform. Proprietary +platforms are neither supported nor recommended. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Representations --- a/docs/wiki/Representations Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Representations Fri Aug 17 11:41:28 2018 +0200 @@ -1,12 +1,16 @@ = Representing Program Objects = -Certain representations have been chosen for program objects that attempt to support efficient access to those objects during the execution of a program. +Certain representations have been chosen for program objects that attempt to +support efficient access to those objects during the execution of a program. <> == Attributes == -The principal means of referring to an object in a program is by using an '''attribute''', having this name because it is the representation of an attribute in classes, instances and modules. Each attribute can hold a reference to an object, known as the '''value''', or other kinds of data: +The principal means of referring to an object in a program is by using an +'''attribute''', having this name because it is the representation of an +attribute in classes, instances and modules. Each attribute can hold a +reference to an object, known as the '''value''', or other kinds of data: {{{#!graphviz //format=svg @@ -25,19 +29,49 @@ } }}} -Although a value indicates a specific object of interest for an attribute, if the object is callable then additional '''context''' information may be required to call the object. Such context information is not stored in an attribute record but is instead obtained from the object itself, if appropriate. +Although a value indicates a specific object of interest for an attribute, if +the object is callable then additional '''context''' information may be +required to call the object. Such context information is not stored in an +attribute record but is instead obtained from the object itself, if +appropriate. === Integer Representation === -The `intvalue` member of the attribute structure is employed instead of the `value` member to store integer values. Since `value` normally holds a pointer, and since pointers are often aligned to certain address boundaries on many modern platforms (usually four-byte boundaries on 32-bit platforms, eight-byte boundaries on 64-bit platforms, two-byte boundaries on platforms with 16-bit addressing), the lowest significant bit (bit 0) will typically be zero for a valid pointer. Consequently, by setting bit 0 to 1, other data can be stored in the remaining bits and be distinguished from pointer information. Obviously, operations on attributes first need to test whether the `value` member or the `intvalue` member is in use by testing bit 0. +The `intvalue` member of the attribute structure is employed instead of the +`value` member to store integer values. Since `value` normally holds a +pointer, and since pointers are often aligned to certain address boundaries on +many modern platforms (usually four-byte boundaries on 32-bit platforms, +eight-byte boundaries on 64-bit platforms, two-byte boundaries on platforms +with 16-bit addressing), the lowest significant bit (bit 0) will typically be +zero for a valid pointer. Consequently, by setting bit 0 to 1, other data can +be stored in the remaining bits and be distinguished from pointer information. +Obviously, operations on attributes first need to test whether the `value` +member or the `intvalue` member is in use by testing bit 0. -Thus, integers are transformed and stored directly within attributes, and they are therefore passed around by value. When an attribute of an integer needs to be accessed, the operation usually providing the `value` member, thus obtaining an instance under normal circumstances, instead provides the address of a common integer instance. This instance may then provide instance attributes, whose values will be the same for all integers, or class attributes through a reference to the integer (`int`) class. +Thus, integers are transformed and stored directly within attributes, and they +are therefore passed around by value. When an attribute of an integer needs to +be accessed, the operation usually providing the `value` member, thus +obtaining an instance under normal circumstances, instead provides the address +of a common integer instance. This instance may then provide instance +attributes, whose values will be the same for all integers, or class +attributes through a reference to the integer (`int`) class. -Each method provided by `int`, when called, will be given the original attribute for which the method was obtained as its context. In such methods, operations on the context via `self` will either involve the propagation of the attribute to other functions or methods or attribute accesses on `self`, yielding common instance attributes or class attributes as already described. Only native functions will attempt to interpret the attribute in a different way, decoding the representation, performing arithmetic or logical operations, and encoding the result in a new attribute. +Each method provided by `int`, when called, will be given the original +attribute for which the method was obtained as its context. In such methods, +operations on the context via `self` will either involve the propagation of +the attribute to other functions or methods or attribute accesses on `self`, +yielding common instance attributes or class attributes as already described. +Only native functions will attempt to interpret the attribute in a different +way, decoding the representation, performing arithmetic or logical operations, +and encoding the result in a new attribute. == Contexts and Wrappers == -The context of an object is of significance if that object is callable. For example, an instance may permit access to a method via an attribute. Since the method will be callable, and since the method is accessed via the instance, the context of that method will be the instance. In order to retain both context and value information, a '''wrapper''' may be created. +The context of an object is of significance if that object is callable. For +example, an instance may permit access to a method via an attribute. Since the +method will be callable, and since the method is accessed via the instance, +the context of that method will be the instance. In order to retain both +context and value information, a '''wrapper''' may be created. {{{#!graphviz //format=svg @@ -57,11 +91,14 @@ } }}} -The context is not always the accessor of the object - in this case, the instance - because the object may already be a wrapper with its own context. +The context is not always the accessor of the object - in this case, the +instance - because the object may already be a wrapper with its own context. == Objects == -Classes, instances and modules are objects, and all of these kinds of objects provide metadata describing the type of each object, together with a collection of attributes forming the contents of such objects. +Classes, instances and modules are objects, and all of these kinds of objects +provide metadata describing the type of each object, together with a +collection of attributes forming the contents of such objects. {{{#!graphviz //format=svg @@ -84,31 +121,52 @@ } }}} -Classes are represented by structures whose members reference class attributes and class metadata (the class table), as well as incorporating invocation metadata (the `__args__` and `__fn__` special attributes). +Classes are represented by structures whose members reference class attributes +and class metadata (the class table), as well as incorporating invocation +metadata (the `__args__` and `__fn__` special attributes). -Instances are represented by structures whose members reference instance attributes (including `__class__` which indicates the class instantiated by a given instance) and instance metadata (the instance table), as well as incorporating invocation metadata (the `__args__` and `__fn__` special attributes). +Instances are represented by structures whose members reference instance +attributes (including `__class__` which indicates the class instantiated by a +given instance) and instance metadata (the instance table), as well as +incorporating invocation metadata (the `__args__` and `__fn__` special +attributes). -Functions are instances of a general function type that does not permit additional, general instance attributes. However, function instance structures may have extra members corresponding to default parameter values. Access to such extra members is performed using the minimum and maximum values provided via `__args__` and with knowledge of the number of declared instance attributes indicated by the instance table for the function type. +Functions are instances of a general function type that does not permit +additional, general instance attributes. However, function instance structures +may have extra members corresponding to default parameter values. Access to +such extra members is performed using the minimum and maximum values provided +via `__args__` and with knowledge of the number of declared instance +attributes indicated by the instance table for the function type. -Modules are represented by structures whose members reference module attributes and module metadata (the module table). +Modules are represented by structures whose members reference module +attributes and module metadata (the module table). == Special Members == -All object representations support the following special members describing the invocation properties of each object: +All object representations support the following special members describing +the invocation properties of each object: {{{#!table -`__args__` || the minimum number of arguments supported in an invocation and a reference to the parameter table for the object +`__args__` || the minimum number of arguments supported in an invocation and a + .. reference to the parameter table for the object == -`__fn__` || a reference to a native function containing the actual code run when calling the object +`__fn__` || a reference to a native function containing the actual code run + .. when calling the object }}} -Classes are invoked in order to create instances of each class ('''instantiation'''). Instances may support invocation by providing a `__call__` method. Functions are supported by instances of a general function class. Modules are generally not callable and will not actually provide these special members in practice. +Classes are invoked in order to create instances of each class +('''instantiation'''). Instances may support invocation by providing a +`__call__` method. Functions are supported by instances of a general function +class. Modules are generally not callable and will not actually provide these +special members in practice. All object representations support information about their type: {{{#!table `__class__` -|| the class of the object: instances refer to their classes, classes refer to the `type` class, functions are instances that refer to the `function` class, modules refer to the `module` class +|| the class of the object: instances refer to their classes, classes refer to +.. the `type` class, functions are instances that refer to the `function` +.. class, modules refer to the `module` class }}} Certain kinds of object support other descriptive attributes: @@ -119,31 +177,48 @@ `__parent__` || the parent scope of a class or a function }}} -Objects supported by native, system-level functionality require a means of retaining information in special attributes: +Objects supported by native, system-level functionality require a means of +retaining information in special attributes: {{{#!table `__data__` || private data manipulated at the native level }}} -Strings support special annotation attributes that permit their use in dynamically resolving attributes: +Strings support special annotation attributes that permit their use in +dynamically resolving attributes: {{{#!table -`__key__` || the code and position of the attribute whose name is represented by the string +`__key__` || the code and position of the attribute whose name is represented + .. by the string }}} -Such "key" attributes provide information that can be used to inspect an object table and to test for the presence of an attribute. With such information, the `getattr` and `hasattr` functions can be supported. +Such "key" attributes provide information that can be used to inspect an +object table and to test for the presence of an attribute. With such +information, the `getattr` and `hasattr` functions can be supported. == Attribute Tables == -In order to provide information about the attributes supported by each object, the structure of each object will reference a table containing entries describing these supported attributes. The size of this table is declared within the table structure, and for each position in the table an entry corresponding to the same position within an actual object structure describes the nature of the attribute at that position. +In order to provide information about the attributes supported by each object, +the structure of each object will reference a table containing entries +describing these supported attributes. The size of this table is declared +within the table structure, and for each position in the table an entry +corresponding to the same position within an actual object structure describes +the nature of the attribute at that position. == Parameter Tables == -In order to support argument validation and keyword arguments in invocations, a structure is referenced by `__args__` that indicates... +In order to support argument validation and keyword arguments in invocations, +a structure is referenced by `__args__` that indicates... * The minimum number of parameters supported by a callable * The maximum number of parameters supported by a callable * The size of the table describing the parameters - * A table of entries with each entry indicating the nature of a parameter (in effect, which name it uses, albeit as a generated code instead of a string) and the position of the parameter in any argument list prepared for an invocation + * A table of entries with each entry indicating the nature of a parameter (in + effect, which name it uses, albeit as a generated code instead of a string) + and the position of the parameter in any argument list prepared for an + invocation -Parameter tables only need to be consulted at run-time if the nature of a callable is undetermined. By supporting a uniform interface, the arguments used in an invocation can be tested against the description provided by `__args__` and the table. +Parameter tables only need to be consulted at run-time if the nature of a +callable is undetermined. By supporting a uniform interface, the arguments +used in an invocation can be tested against the description provided by +`__args__` and the table. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Restarted --- a/docs/wiki/Restarted Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Restarted Fri Aug 17 11:41:28 2018 +0200 @@ -1,8 +1,16 @@ = Lichen Restarted = -Originally, lots of work was being put in to support various Python features that are arguably superfluous. The realisation was had that a lot of effort was being made for little practical benefit by trying to support things that are, [[../Design|in the larger picture]], not that important. Consequently, Lichen was refocused on a smaller set of more useful Python features. +Originally, lots of work was being put in to support various Python features +that are arguably superfluous. The realisation was had that a lot of effort +was being made for little practical benefit by trying to support things that +are, [[../Design|in the larger picture]], not that important. Consequently, +Lichen was refocused on a smaller set of more useful Python features. -This document is of historical interest only, with the [[../Design|design]] and other documents attempting to communicate the results of this restarting effort. Some obsolete information is therefore preserved below. For example, attributes hold context information in the diagrams, but context information is now held in wrappers or is maintained separately within programs. +This document is of historical interest only, with the [[../Design|design]] +and other documents attempting to communicate the results of this restarting +effort. Some obsolete information is therefore preserved below. For example, +attributes hold context information in the diagrams, but context information +is now held in wrappers or is maintained separately within programs. Objectives: @@ -10,7 +18,8 @@ * No importing triggered during module inspection * All unresolved external references are set to `` * Hierarchical module namespaces are not exposed in programs - * Modules are independent: package hierarchies are not traversed when importing + * Modules are independent: package hierarchies are not traversed when + importing * Nested scopes will be dropped * If expressions and comprehensions will be dropped * `self` is a reserved name and is optional in method parameter lists @@ -19,40 +28,51 @@ == Names == -Names are locals, globals or built-ins. (Special names exist internally to support certain operations.) +Names are locals, globals or built-ins. (Special names exist internally to +support certain operations.) -Locals inside functions are dynamic; locals outside functions are static, as are module globals. Built-ins are defined statically in the `__builtins__` package. +Locals inside functions are dynamic; locals outside functions are static, as +are module globals. Built-ins are defined statically in the `__builtins__` +package. == Imports == -Imports provide access to external references. The "leaf" module in a module path is the module returned by the statement. +Imports provide access to external references. The "leaf" module in a module +path is the module returned by the statement. Indicate that module "compiler" is accessed via compiler... + {{{#!python numbers=disable import compiler }}} Indicate that module "compiler" is accessed via comp... + {{{#!python numbers=disable import compiler as comp }}} -Indicate that module "compiler.ast" is accessed via ast; module "compiler.transformer" is accessed via tr... +Indicate that module "compiler.ast" is accessed via ast; module +"compiler.transformer" is accessed via tr... + {{{#!python numbers=disable import compiler.ast as ast, compiler.transformer as tr }}} Import compiler.ast, access Function... + {{{#!python numbers=disable from compiler.ast import Function }}} Import compiler.ast, access Function as F... + {{{#!python numbers=disable from compiler.ast import Function as F }}} -This causes some semantic differences with Python, with the most significant one being the following: +This causes some semantic differences with Python, with the most significant +one being the following: {{{{#!table '''Python''' || '''Lichen''' @@ -108,10 +128,12 @@ Some notes: - * Names not defined in a module and not declared in import statements are unresolved + * Names not defined in a module and not declared in import statements are + unresolved * Modules are identified during inspection but are not loaded * Instead, modules are added to a list and are imported later - * Names imported from modules are set to `` (since the contents of modules will not generally be known) + * Names imported from modules are set to `` (since the contents of + modules will not generally be known) * Names are resolved in a later activity == Self == @@ -119,22 +141,37 @@ In Python: * The `self` name provides access to the instance associated with a method - * The instance is supplied by a "context", initialised when a method is obtained from an instance (or through other attribute accesses) - * Upon invocation, any instance context must be assigned to the `self` parameter, provided the callable is a method - * Meanwhile, any non-instance context is not assigned to the `self` parameter, which should be provided explicitly for class-accessed methods - * Plain functions never expose `self` or have `self` initialised, even if they have been assigned to an instance + * The instance is supplied by a "context", initialised when a method is + obtained from an instance (or through other attribute accesses) + * Upon invocation, any instance context must be assigned to the `self` + parameter, provided the callable is a method + * Meanwhile, any non-instance context is not assigned to the `self` + parameter, which should be provided explicitly for class-accessed methods + * Plain functions never expose `self` or have `self` initialised, even if + they have been assigned to an instance -Apart from tests for the nature of the context and the callable, the argument list is effectively variable. +Apart from tests for the nature of the context and the callable, the argument +list is effectively variable. With `self` as a ubiquitous, hidden parameter: - * The `self` name still provides access to the instance associated with a method - * The instance is still supplied by a "context", initialised when a method is obtained from an instance (or through other attribute accesses) - * Upon invocation, `self` is included in the argument list regardless of the nature of the callable - * Class-accessed methods would have their class as context, following from the above, but `self` may not refer to a class: it must be an instance - * To combine class-accessed methods with instance contexts, a special function is employed + * The `self` name still provides access to the instance associated with a + method + * The instance is still supplied by a "context", initialised when a method is + obtained from an instance (or through other attribute accesses) + * Upon invocation, `self` is included in the argument list regardless of the + nature of the callable + * Class-accessed methods would have their class as context, following from + the above, but `self` may not refer to a class: it must be an instance + * To combine class-accessed methods with instance contexts, a special + function is employed -The argument list for each callable thus remains static, at a cost of allocating an extra argument that may not be used. (Various calling conventions for certain processor architectures employ potentially unused registers, anyway.) Note that a callable may support defaults, however, and thus any argument list may need extending to include default values for parameters without corresponding arguments. +The argument list for each callable thus remains static, at a cost of +allocating an extra argument that may not be used. (Various calling +conventions for certain processor architectures employ potentially unused +registers, anyway.) Note that a callable may support defaults, however, and +thus any argument list may need extending to include default values for +parameters without corresponding arguments. {{{{#!table '''Python''' || '''Without self''' @@ -185,7 +222,13 @@ }}}} -To avoid usage of `self` in undefined ways, only methods are able to use `self` and are not allowed to redefine it. Consequently, when invoking a callable, the context is set (where the callable is unknown until run-time; it is not set if compile-time knowledge indicates that it is not needed), and in situations where `self` is not permitted, the context is therefore safely ignored. Meanwhile, methods are always supplied with a context compatible with `self`. +To avoid usage of `self` in undefined ways, only methods are able to use +`self` and are not allowed to redefine it. Consequently, when invoking a +callable, the context is set (where the callable is unknown until run-time; it +is not set if compile-time knowledge indicates that it is not needed), and in +situations where `self` is not permitted, the context is therefore safely +ignored. Meanwhile, methods are always supplied with a context compatible with +`self`. || '''Callable''' || '''self''' || '''Remarks''' || || Class || context || context discarded and replaced by allocated instance || @@ -196,13 +239,24 @@ || Method (via class) || class as context || method not called (see "unbound methods") || || Method (via instance) || instance as context || `self` set to instance || -Note that the treatment of functions stored on classes differs from Python. In Python, such functions would become unbound methods (see below) and would employ their first parameter as an effective `self` parameter (regardless of name). +Note that the treatment of functions stored on classes differs from Python. In +Python, such functions would become unbound methods (see below) and would +employ their first parameter as an effective `self` parameter (regardless of +name). == Unbound Methods == -Since methods acquired directly from classes ("unbound methods" in Python) are meant to be combined with an instance as context (using the `get_using` function), they must be uncallable until combined with the appropriate context, yet the same methods when acquired via instances ("bound methods" in Python) must be immediately callable. +Since methods acquired directly from classes ("unbound methods" in Python) are +meant to be combined with an instance as context (using the `get_using` +function), they must be uncallable until combined with the appropriate +context, yet the same methods when acquired via instances ("bound methods" in +Python) must be immediately callable. -To support the two different states of methods, the principal structure of a class has attributes referencing uncallable versions of its methods. Meanwhile, such uncallable methods reference callable versions and when instances are employed to access the class attributes, it is these callable versions that are retrieved. For example: +To support the two different states of methods, the principal structure of a +class has attributes referencing uncallable versions of its methods. +Meanwhile, such uncallable methods reference callable versions and when +instances are employed to access the class attributes, it is these callable +versions that are retrieved. For example: {{{#!graphviz //format=svg @@ -253,7 +307,12 @@ } }}} -Callable methods provide a reference to a callable routine in its special callable member, just as functions and classes do. Uncallable methods populate the callable member with a reference to an error routine. Thus, any attempt to call an uncallable method would cause the error routine to be invoked. In addition, uncallable methods reference the corresponding callable methods so that the callable methods can be found and referenced. +Callable methods provide a reference to a callable routine in its special +callable member, just as functions and classes do. Uncallable methods populate +the callable member with a reference to an error routine. Thus, any attempt to +call an uncallable method would cause the error routine to be invoked. In +addition, uncallable methods reference the corresponding callable methods so +that the callable methods can be found and referenced. || '''Accessor''' || '''Provider''' || '''Attribute''' || '''Context''' || '''Summary''' || ||<|4> Instance ||<|4> Instance || Function || ''not used'' ||<|6> Preserve context || @@ -269,14 +328,30 @@ || Unbound method || Providing class || || Other || Same as value || -When obtaining an unbound method from an instance attribute, the context of the method attribute is provided. Indeed, the context is always preserved when accessing instance attributes. +When obtaining an unbound method from an instance attribute, the context of +the method attribute is provided. Indeed, the context is always preserved when +accessing instance attributes. -When obtaining an unbound method from a class attribute via an instance, the context of the method attribute is tested against the accessing instance. If compatible, an attribute is copied containing the instance as context and a callable method reference as value. +When obtaining an unbound method from a class attribute via an instance, the +context of the method attribute is tested against the accessing instance. If +compatible, an attribute is copied containing the instance as context and a +callable method reference as value. -When obtaining an unbound method from a class attribute, the context of the method attribute is provided. Indeed, the context is always preserved when accessing class attributes directly. +When obtaining an unbound method from a class attribute, the context of the +method attribute is provided. Indeed, the context is always preserved when +accessing class attributes directly. -When combining an unbound method obtained from a class with an instance using `get_using`, the context of the method attribute is tested against the supplied instance. If compatible, an attribute is copied containing the instance as context and a callable method reference as value. +When combining an unbound method obtained from a class with an instance using +`get_using`, the context of the method attribute is tested against the +supplied instance. If compatible, an attribute is copied containing the +instance as context and a callable method reference as value. === Functions as Unbound Methods === -Functions not defined within classes could be treated as unbound methods if they were to employ `self` (thus indicating that they are intended as methods). Such functions would then be recorded as uncallable in the module namespace, needing to be explicitly bound to a class using a special function. However, there appears to be limited utility in defining functions in this way, instead of defining them directly as methods, or instead of merely using such generic functions from existing methods. +Functions not defined within classes could be treated as unbound methods if +they were to employ `self` (thus indicating that they are intended as +methods). Such functions would then be recorded as uncallable in the module +namespace, needing to be explicitly bound to a class using a special function. +However, there appears to be limited utility in defining functions in this +way, instead of defining them directly as methods, or instead of merely using +such generic functions from existing methods. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Structure --- a/docs/wiki/Structure Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Structure Fri Aug 17 11:41:28 2018 +0200 @@ -1,12 +1,28 @@ = Program Structure = -A program consists of a number of '''modules''' with each module providing its own namespace. Modules can be organised within '''packages''' which define a hierarchical relationship between them. However, the relationship between modules is not automatically exposed within the program: it is more appropriate to consider modules as independent entities that have a particular naming scheme. +A program consists of a number of '''modules''' with each module providing its +own namespace. Modules can be organised within '''packages''' which define a +hierarchical relationship between them. However, the relationship between +modules is not automatically exposed within the program: it is more +appropriate to consider modules as independent entities that have a particular +naming scheme. -Within each module a hierarchy of namespaces is provided by '''classes''', with each class potentially containing other classes, and so on. Also appearing within modules and classes are '''functions''', with those appearing within classes being regarded as '''methods'''. +Within each module a hierarchy of namespaces is provided by '''classes''', +with each class potentially containing other classes, and so on. Also +appearing within modules and classes are '''functions''', with those appearing +within classes being regarded as '''methods'''. -Function namespaces are considered separately from module and class namespaces and are not considered part of the general namespace hierarchy, instead merely appearing as objects at the edges of that hierarchy. Functions may also be defined within functions, and such inner functions will be referenced within their parent functions, but no hierarchical relationship will exist between their namespaces. +Function namespaces are considered separately from module and class namespaces +and are not considered part of the general namespace hierarchy, instead merely +appearing as objects at the edges of that hierarchy. Functions may also be +defined within functions, and such inner functions will be referenced within +their parent functions, but no hierarchical relationship will exist between +their namespaces. -Thus, a program provides a static namespace hierarchy consisting of modules containing classes (containing other classes, and so on) plus functions. Objects residing within namespaces are accessed via '''attributes''' of those namespaces. +Thus, a program provides a static namespace hierarchy consisting of modules +containing classes (containing other classes, and so on) plus functions. +Objects residing within namespaces are accessed via '''attributes''' of those +namespaces. {{{{#!table {{{#!graphviz @@ -88,7 +104,9 @@ == Referencing the Structure == -Each part of the structure is catalogued using an '''object path''', indicating its location in the structure hierarchy, and a reference indicating its nature and origin. For example: +Each part of the structure is catalogued using an '''object path''', +indicating its location in the structure hierarchy, and a reference indicating +its nature and origin. For example: || '''Object Path''' || '''Reference''' || '''Explanation''' || || `M.A` || `:M.A` || The definition of class `A` in module `M` || @@ -99,17 +117,36 @@ || `__main__.M` || `:M` || A reference to module `M` || || `__main__.counter` || `` || An undetermined object called `counter` in the `__main__` module || -The reference therefore expresses the '''kind''' of object (class, function, instance, module or undetermined variable), possibly accompanied by an object type, and also possibly accompanied by an alias indicating where the reference was obtained. +The reference therefore expresses the '''kind''' of object (class, function, +instance, module or undetermined variable), possibly accompanied by an object +type, and also possibly accompanied by an alias indicating where the reference +was obtained. == Classes == -Classes are regarded as statically-defined objects, meaning that they are only evaluated once and must not be defined within conditional sections or within functions. Base classes must be expressed using readily-identifiable class names, since any potential ambiguity or uncertainty with the identity of base classes could result in a class inheritance hierarchy that is effectively dynamic. +Classes are regarded as statically-defined objects, meaning that they are only +evaluated once and must not be defined within conditional sections or within +functions. Base classes must be expressed using readily-identifiable class +names, since any potential ambiguity or uncertainty with the identity of base +classes could result in a class inheritance hierarchy that is effectively +dynamic. -When '''instantiated''', classes provide '''instances''' that provide the attributes of each class's namespace plus any '''inherited''' attributes from base classes that would not be provided by the class itself. In addition, instances provide their own instance attributes. +When '''instantiated''', classes provide '''instances''' that provide the +attributes of each class's namespace plus any '''inherited''' attributes from +base classes that would not be provided by the class itself. In addition, +instances provide their own instance attributes. == Functions == -Functions are regarded as statically-defined objects, but they may be defined either with names within other named functions or as '''lambdas''' (functions without names) within named functions or other lambdas. Thus, unlike classes, functions may be defined once but replicated many times, each time with different accompanying state information. Such state information needs to be provided via default parameters: it is not detected and propagated automatically. Closures are not supported: any state from enclosing scopes must be supplied at the point of definition of a function; it is not acquired from outer functions by name. +Functions are regarded as statically-defined objects, but they may be defined +either with names within other named functions or as '''lambdas''' (functions +without names) within named functions or other lambdas. Thus, unlike classes, +functions may be defined once but replicated many times, each time with +different accompanying state information. Such state information needs to be +provided via default parameters: it is not detected and propagated +automatically. Closures are not supported: any state from enclosing scopes +must be supplied at the point of definition of a function; it is not acquired +from outer functions by name. {{{#!python numbers=disable def outer(a): @@ -125,11 +162,17 @@ === Lambdas === -Lambdas are given special names for the purposes of identification within the program structure, being named relative to the scope in which they are defined. For example, the first lambda appearing within module `N` would have an object path of `N.$l0`, and a subsequent lambda within the same scope would have an object path of `N.$l1`. Lambdas may appear within classes and functions, including lambdas themselves. For example: +Lambdas are given special names for the purposes of identification within the +program structure, being named relative to the scope in which they are +defined. For example, the first lambda appearing within module `N` would have +an object path of `N.$l0`, and a subsequent lambda within the same scope would +have an object path of `N.$l1`. Lambdas may appear within classes and +functions, including lambdas themselves. For example: {{{#!python numbers=disable def f(x): return lambda y, x=x: lambda z, x=x, y=y: (x, y, z) }}} -Here, within a module `M`, the outer lambda would have the object path `M.f.$l0` whereas the inner lambda would have the object path `M.f.$l0.$l0`. +Here, within a module `M`, the outer lambda would have the object path +`M.f.$l0` whereas the inner lambda would have the object path `M.f.$l0.$l0`. diff -r 8f8361de472a -r f745d151a441 docs/wiki/ToDo --- a/docs/wiki/ToDo Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/ToDo Fri Aug 17 11:41:28 2018 +0200 @@ -1,57 +1,102 @@ = To Do = -As always with software, there are still many things that need to be done. Here is a list of a few of them. +As always with software, there are still many things that need to be done. +Here is a list of a few of them. <> == Finish the Core Standard Library == -Lichen provides its own core standard library featuring the [[../Builtins|built-ins]] and other essential modules, with only a small amount of native code supporting implementations written in the Lichen language. +Lichen provides its own core standard library featuring the +[[../Builtins|built-ins]] and other essential modules, with only a small +amount of native code supporting implementations written in the Lichen +language. === Numeric Types === -Support all the numeric types. Currently, only `int` is supported, but `float` merely requires some native code handling operations and testing for exception conditions. A representation for `long` needs to be determined, and `complex` probably just needs some methods implementing. +Support all the numeric types. Currently, only `int` is supported, but `float` +merely requires some native code handling operations and testing for exception +conditions. A representation for `long` needs to be determined, and `complex` +probably just needs some methods implementing. -Support promotion between some of the numeric types. Currently, `int` values that overflow raise `OverflowError`, as was done in Python before automatic promotion to `long`. +Support promotion between some of the numeric types. Currently, `int` values +that overflow raise `OverflowError`, as was done in Python before automatic +promotion to `long`. === String Types === -The remaining methods need defining for byte and Unicode strings. Some methods are more complicated than others, particularly where some interpretation of the content is required: identification of case, whitespace, punctuation, and so on. +The remaining methods need defining for byte and Unicode strings. Some methods +are more complicated than others, particularly where some interpretation of +the content is required: identification of case, whitespace, punctuation, and +so on. Some Unicode operations could be implemented in the Lichen language, not in C. === Sequence and Mapping Types === -Various methods still need defining in the dictionary, list, tuple and set classes to provide parity with Python. +Various methods still need defining in the dictionary, list, tuple and set +classes to provide parity with Python. === Hashing === -Unlike Python, the hashing approaches currently employed are not well-tuned. One important difference between Lichen and Python is that the former does not use hashtables as a general structure for objects, meaning that certain desirable criteria for Python hashtables (randomising certain aspects of their behaviour to prevent the exploitation of poorly performing cases) are less important for Lichen. +Unlike Python, the hashing approaches currently employed are not well-tuned. +One important difference between Lichen and Python is that the former does not +use hashtables as a general structure for objects, meaning that certain +desirable criteria for Python hashtables (randomising certain aspects of their +behaviour to prevent the exploitation of poorly performing cases) are less +important for Lichen. == Provide Peripheral Library Support == -The Python standard library offers support for things like networking which require a degree of system-level integration. The Lichen libraries could be expanded to offer coverage of many of the same APIs. +The Python standard library offers support for things like networking which +require a degree of system-level integration. The Lichen libraries could be +expanded to offer coverage of many of the same APIs. === Versatile Native Libraries === -Currently, the native functionality already exposes things like file descriptors, and a relatively simple versatile interface might be sufficient for many future libraries. +Currently, the native functionality already exposes things like file +descriptors, and a relatively simple versatile interface might be sufficient +for many future libraries. == Incremental Compilation == -To some extent, the toolchain supports the caching of inspection results with selective regeneration upon changes to source files. However, deduction and subsequent activities are performed in their entirety every time. +To some extent, the toolchain supports the caching of inspection results with +selective regeneration upon changes to source files. However, deduction and +subsequent activities are performed in their entirety every time. === Preserving Structures === -Where source files have changed, it may be possible that the structure details remain the same. Consequently, it is unnecessary to regenerate structures, and it is only necessary to determine the nature of modified program operations. +Where source files have changed, it may be possible that the structure details +remain the same. Consequently, it is unnecessary to regenerate structures, and +it is only necessary to determine the nature of modified program operations. -Where structures have changed, certain existing operations may need to be updated because such operations may no longer support certain types of object or may support additional types. Meanwhile, changes to structures may be compatible with existing structure layouts and not require such layouts to be recomputed. +Where structures have changed, certain existing operations may need to be +updated because such operations may no longer support certain types of object +or may support additional types. Meanwhile, changes to structures may be +compatible with existing structure layouts and not require such layouts to be +recomputed. == Exploit Deductions Further == -So far, deductions have been used to inform the translation of certain operations and to identify situations where suitable operations cannot be generated. However, no attempt has been made to broaden the application of such information. +So far, deductions have been used to inform the translation of certain +operations and to identify situations where suitable operations cannot be +generated. However, no attempt has been made to broaden the application of +such information. === Parameter Type Checking === -Elementary knowledge about function parameters is currently available when inspecting a program, but the deducer does not attempt to propagate such information to likely callers of such functions. Such propagation in its simplest form would merely use the list of potential targets for each invocation, obtain the parameter types from each target, compare these types to any readily-comparable argument (such as a name), and then determine which targets may still be invoked successfully, potentially imposing restrictions on the suitable targets as a result. +Elementary knowledge about function parameters is currently available when +inspecting a program, but the deducer does not attempt to propagate such +information to likely callers of such functions. Such propagation in its +simplest form would merely use the list of potential targets for each +invocation, obtain the parameter types from each target, compare these types +to any readily-comparable argument (such as a name), and then determine which +targets may still be invoked successfully, potentially imposing restrictions +on the suitable targets as a result. -Such propagation activities can be time-consuming because they can lead to iterative whole-program propagation occurring. Consider the restriction of an invocation target: this may indicate a restriction on a name that provides such a target, thus affecting the function parameter providing the name; such a restriction could then be propagated to callers of that function, initiating further refinement of other invocations, and so on. +Such propagation activities can be time-consuming because they can lead to +iterative whole-program propagation occurring. Consider the restriction of an +invocation target: this may indicate a restriction on a name that provides +such a target, thus affecting the function parameter providing the name; such +a restriction could then be propagated to callers of that function, initiating +further refinement of other invocations, and so on. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Toolchain --- a/docs/wiki/Toolchain Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Toolchain Fri Aug 17 11:41:28 2018 +0200 @@ -1,34 +1,48 @@ = Toolchain = -The toolchain implements the process of analysing Lichen source files, compiling information about the structures and routines expressed in each program, and generating output for further processing that can produce an executable program. +The toolchain implements the process of analysing Lichen source files, +compiling information about the structures and routines expressed in each +program, and generating output for further processing that can produce an +executable program. <> == Compiling Programs == -The principal interface to the toolchain is the `lplc` command, run on source files as in the following example: +The principal interface to the toolchain is the `lplc` command, run on source +files as in the following example: {{{ lplc tests/unicode.py }}} -There is no need to specify all the files that might be required by the complete program. Instead, the toolchain identifies files in the program by searching its module search path. This can be configured using the `LICHENPATH` environment variable and the `-E` option. +There is no need to specify all the files that might be required by the +complete program. Instead, the toolchain identifies files in the program by +searching its module search path. This can be configured using the +`LICHENPATH` environment variable and the `-E` option. -Various [[../Prerequisites|prerequisites]] are needed for the toolchain to work properly. By specifying the `-c` option, the specified program will be translated to a C programming language representation but not built, avoiding the need for some development tools to be installed if this is desirable. +Various [[../Prerequisites|prerequisites]] are needed for the toolchain to +work properly. By specifying the `-c` option, the specified program will be +translated to a C programming language representation but not built, avoiding +the need for some development tools to be installed if this is desirable. -The default output file from a successful compilation is a file called `_main`, but this can be overridden using the `-o` option. For example: +The default output file from a successful compilation is a file called +`_main`, but this can be overridden using the `-o` option. For example: {{{ lplc -o unicode tests/unicode.py }}} -The complete set of options can be viewed by specifying the `--help` option, and a manual page is also provided in the `docs` directory of the source distribution: +The complete set of options can be viewed by specifying the `--help` option, +and a manual page is also provided in the `docs` directory of the source +distribution: {{{ man -l docs/lplc.1 }}} -This page may already be installed if the software was provided as a package as part of an operating system distribution: +This page may already be installed if the software was provided as a package +as part of an operating system distribution: {{{ man lplc @@ -36,18 +50,68 @@ == Toolchain Implementation == -The toolchain itself is currently written in Python, but it is envisaged that it will eventually be written in the Lichen language, hopefully needing only minor modifications so that it may be able to accept its own source files as input and ultimately produce a representation of itself as an executable program. Since the Lichen language is based on Python, it is convenient to use existing Python implementations to access libraries that support the parsing of Python source files into useful representations. +The toolchain itself is currently written in Python, but it is envisaged that +it will eventually be written in the Lichen language, hopefully needing only +minor modifications so that it may be able to accept its own source files as +input and ultimately produce a representation of itself as an executable +program. Since the Lichen language is based on Python, it is convenient to use +existing Python implementations to access libraries that support the parsing +of Python source files into useful representations. -The Python standard library provides two particularly useful modules or packages of relevance: the `compiler` package and the `parser` module; `parser` is employed by `compiler` to decode source text, whereas `compiler` takes the concrete syntax tree representation from `parser` and produces an abstract syntax tree (AST) which is particularly helpful to software of the nature described here. (Contrary to impressions that [[http://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts/|some articles]] might give, the `ast` module available in Python 2.5 and later was not the first module to offer AST representations of Python programs in Python, nor was it even the first such module in the standard library.) +The Python standard library provides two particularly useful modules or +packages of relevance: the `compiler` package and the `parser` module; +`parser` is employed by `compiler` to decode source text, whereas `compiler` +takes the concrete syntax tree representation from `parser` and produces an +abstract syntax tree (AST) which is particularly helpful to software of the +nature described here. (Contrary to impressions that +[[http://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts/|some +articles]] might give, the `ast` module available in Python 2.5 and later was +not the first module to offer AST representations of Python programs in +Python, nor was it even the first such module in the standard library.) -However, it is not desirable to have a dependency on a Python implementation, which the `parser` module effectively is (as would the `ast` module also be if it were used here), with it typically being implemented as an extension module in a non-Python language (in C for CPython, in Java for Jython, and so on). Fortunately, the !PyPy project implemented their own parsing module, `pyparser`, that is intended to be used within the !PyPy environment together with their own `ast` equivalent, but it has been possible to rework `pyparser` to produce representations that are compatible with the `compiler` package, itself being modified in various ways to achieve compatibility (and also to provide various other conveniences). +However, it is not desirable to have a dependency on a Python implementation, +which the `parser` module effectively is (as would the `ast` module also be if +it were used here), with it typically being implemented as an extension module +in a non-Python language (in C for CPython, in Java for Jython, and so on). +Fortunately, the !PyPy project implemented their own parsing module, +`pyparser`, that is intended to be used within the !PyPy environment together +with their own `ast` equivalent, but it has been possible to rework `pyparser` +to produce representations that are compatible with the `compiler` package, +itself being modified in various ways to achieve compatibility (and also to +provide various other conveniences). == Program Analysis == -With the means of inspecting source files available through a `compiler` package producing a usable representation of each file, it becomes possible to identify the different elements in each file and to collect information that may be put to use later. But before any files are inspected, it must be determined ''which'' files are to be inspected, these comprising the complete program to be analysed. +With the means of inspecting source files available through a `compiler` +package producing a usable representation of each file, it becomes possible to +identify the different elements in each file and to collect information that +may be put to use later. But before any files are inspected, it must be +determined ''which'' files are to be inspected, these comprising the complete +program to be analysed. -Both Lichen and Python support the notion of a main source file (sometimes called the "script" file or the main module or `__main__`) and of imported modules and packages. The complete set of modules employed in a program is defined as those imported by the main module, then those imported by those modules, and so on. Thus, the complete set is not known without inspecting part of the program, and this set must be built incrementally until no new modules are encountered. +Both Lichen and Python support the notion of a main source file (sometimes +called the "script" file or the main module or `__main__`) and of imported +modules and packages. The complete set of modules employed in a program is +defined as those imported by the main module, then those imported by those +modules, and so on. Thus, the complete set is not known without inspecting +part of the program, and this set must be built incrementally until no new +modules are encountered. -Where Lichen and Python differ is in the handling of [[../Imports|imports]] themselves. Python [[https://docs.python.org/3/reference/import.html|employs]] an intricate mechanism that searches for modules and packages, loading modules encountered when descending into packages to retrieve specific modules. In contrast, Lichen only imports the modules that are explicitly mentioned in programs. Thus, a Lichen program will not accumulate potentially large numbers of superfluous modules. +Where Lichen and Python differ is in the handling of [[../Imports|imports]] +themselves. Python [[https://docs.python.org/3/reference/import.html|employs]] +an intricate mechanism that searches for modules and packages, loading modules +encountered when descending into packages to retrieve specific modules. In +contrast, Lichen only imports the modules that are explicitly mentioned in +programs. Thus, a Lichen program will not accumulate potentially large numbers +of superfluous modules. -With a given module identified as being part of a program, the module will then be [[../Inspection|inspected]] for the purposes of gathering useful information. Since the primary objectives are to characterise the structure of the objects in a program and to determine how such objects are used, certain kinds of program constructs will be inspected more closely than others. Note that this initial inspection activity is not concerned with the translation of program operations to other forms: such [[../Translation|translation]] will occur later; this initial inspection is purely concerned with obtaining enough information to inform such later activities, with the original program being revisited to provide the necessary detail required to translate it. +With a given module identified as being part of a program, the module will +then be [[../Inspection|inspected]] for the purposes of gathering useful +information. Since the primary objectives are to characterise the structure of +the objects in a program and to determine how such objects are used, certain +kinds of program constructs will be inspected more closely than others. Note +that this initial inspection activity is not concerned with the translation of +program operations to other forms: such [[../Translation|translation]] will +occur later; this initial inspection is purely concerned with obtaining enough +information to inform such later activities, with the original program being +revisited to provide the necessary detail required to translate it. diff -r 8f8361de472a -r f745d151a441 docs/wiki/Translation --- a/docs/wiki/Translation Sat Jul 21 23:19:26 2018 +0200 +++ b/docs/wiki/Translation Fri Aug 17 11:41:28 2018 +0200 @@ -1,32 +1,58 @@ = Translating Programs = -The Lichen toolchain currently targets the C programming language, generating programs that are then compiled using existing C compilers. The process of translation involves the optimisation and generation of program structures, and the translation of Lichen language constructs into equivalent C language constructs. +The Lichen toolchain currently targets the C programming language, generating +programs that are then compiled using existing C compilers. The process of +translation involves the optimisation and generation of program structures, +and the translation of Lichen language constructs into equivalent C language +constructs. <> == Attribute Access Plans == -During [[../Inspection|inspection]], each attribute access is associated with a unique location and its details stored. During [[../Deduction|deduction]], such accesses are resolved and characterised. During [[../Optimisation|optimisation]], such accesses are encoded as '''instruction plans''' indicating the operations or instructions required to achieve the access. During translation, these instruction plans are generated as part of the final program. +During [[../Inspection|inspection]], each attribute access is associated with +a unique location and its details stored. During [[../Deduction|deduction]], +such accesses are resolved and characterised. During +[[../Optimisation|optimisation]], such accesses are encoded as '''instruction +plans''' indicating the operations or instructions required to achieve the +access. During translation, these instruction plans are generated as part of +the final program. == Structure Optimisation == -All object structures need to support the attributes defined for them, and although it may be possible to determine the identity of attributes in advance of a program being run, there generally remains a need to provide mechanisms to determine the nature of an object at run-time and to access its attributes dynamically. To achieve this, an [[../Optimisation|optimisation]] process catalogues the presence of each attribute name on all program objects, then deciding upon a position for the structure member corresponding to that attribute name in all objects. +All object structures need to support the attributes defined for them, and +although it may be possible to determine the identity of attributes in advance +of a program being run, there generally remains a need to provide mechanisms +to determine the nature of an object at run-time and to access its attributes +dynamically. To achieve this, an [[../Optimisation|optimisation]] process +catalogues the presence of each attribute name on all program objects, then +deciding upon a position for the structure member corresponding to that +attribute name in all objects. == Structure Generation == -Each program consists of a number of structures providing the program's objects and defining classes, functions, modules and any predefined instances. +Each program consists of a number of structures providing the program's +objects and defining classes, functions, modules and any predefined instances. === Attribute Representation === -Attributes most typically provide references to objects within a certain optional context. Since attributes are employed to encode various other kinds of information, the data structure may be interpreted in other ways within suitably controlled environments. For example, the `__data__` attribute found on instances of certain built-in classes will encode references to other kinds of information that are used within native functions. +Attributes most typically provide references to objects within a certain +optional context. Since attributes are employed to encode various other kinds +of information, the data structure may be interpreted in other ways within +suitably controlled environments. For example, the `__data__` attribute found +on instances of certain built-in classes will encode references to other kinds +of information that are used within native functions. === Object Representation === -Objects provide collections of attributes, and the object representation is meant to support efficient computed access to attributes whilst also allowing a reasonably efficient means of testing and accessing attributes at run-time. +Objects provide collections of attributes, and the object representation is +meant to support efficient computed access to attributes whilst also allowing +a reasonably efficient means of testing and accessing attributes at run-time. === Structure Constants === -For clarity, several classes of symbolic constant are defined in the translated program to help define structures or to refer to structure members. +For clarity, several classes of symbolic constant are defined in the +translated program to help define structures or to refer to structure members. || '''Constant Class''' || '''Purpose''' || '''Example''' || '''Application''' || || `csize` || Indicates the number of attributes in a class || `__csize___builtins___list_list` ||<|3> Structure definition || @@ -39,11 +65,18 @@ || `code` || Assigns a code to a particular attribute name || `__code_value` ||<|2> Attribute access || || `pos` || Indicates the table position used by a particular attribute name || `__pos_value` || -Such constants are encoded using helper functions, producing names resembling those above that are intended to be distinct and not to conflict with other defined names in the final program. +Such constants are encoded using helper functions, producing names resembling +those above that are intended to be distinct and not to conflict with other +defined names in the final program. === Instance Structures === -Like all program objects, instances are defined in C according to a generic structure, meaning that all instances have the same generic members from the perspective of a C program. However, specific structures are defined for each class in order to conveniently describe the dimensions of instances of that class. For example, for the built-in list class, a declaration resembling the following is issued: +Like all program objects, instances are defined in C according to a generic +structure, meaning that all instances have the same generic members from the +perspective of a C program. However, specific structures are defined for each +class in order to conveniently describe the dimensions of instances of that +class. For example, for the built-in list class, a declaration resembling the +following is issued: {{{ typedef struct { @@ -57,15 +90,24 @@ === Useful C Language Features === -The translated code relies on the availability of certain useful features of C, some of them supported only in modern compilers: +The translated code relies on the availability of certain useful features of +C, some of them supported only in modern compilers: - * Array and structure literals: these are used to define values within expressions that would otherwise require separate definition; invocation arguments are defined using inline expressions and the `__ARGS` macro (for unknown invocation targets and for keyword arguments) - * [[https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros|Variadic macros]]: such macros permit variable numbers of arguments, thus making the `__ARGS` macro possible - * [[WikiPedia:Comma operator]] sequences: these are used to construct instruction sequences within expressions, providing a mechanism for formulating attribute accesses + * Array and structure literals: these are used to define values within + expressions that would otherwise require separate definition; invocation + arguments are defined using inline expressions and the `__ARGS` macro (for + unknown invocation targets and for keyword arguments) + * [[https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros| + Variadic macros]]: such macros permit variable numbers of arguments, thus + making the `__ARGS` macro possible + * [[WikiPedia:Comma operator]] sequences: these are used to construct + instruction sequences within expressions, providing a mechanism for + formulating attribute accesses === Function Naming === -A naming scheme is applied to functions defined in the final C program to support different kinds of callables present in the original program. +A naming scheme is applied to functions defined in the final C program to +support different kinds of callables present in the original program. || '''Name Class''' || '''Purpose''' || '''Example''' || || `fn` || Indicates a plain function || `__fn___builtins___str_str` || @@ -75,9 +117,16 @@ === Generic Program Functionality === -Given the representation of the fundamental program objects, a suite of generic operations is provided to support the activities necessary to inspect and update program data. Such operations are largely concerned with generic object and attribute representations and are practically identical from program to program. +Given the representation of the fundamental program objects, a suite of +generic operations is provided to support the activities necessary to inspect +and update program data. Such operations are largely concerned with generic +object and attribute representations and are practically identical from +program to program. -For example, an attribute provided by an object can be accessed in several different ways depending on the information known about the object when the program is translated. Consequently, a suite of attribute access operations is provided to support each different scenario: +For example, an attribute provided by an object can be accessed in several +different ways depending on the information known about the object when the +program is translated. Consequently, a suite of attribute access operations is +provided to support each different scenario: ||<-2|2> ||<-2> '''Attribute provider''' || || '''Class''' || '''Instance''' || @@ -87,7 +136,12 @@ === Specific Program Functionality === -Since objects at the program level have their layouts [[../Optimisation|optimised]], and since these layouts may differ from program to program, a suite of specific operations is provided to support a range of activities within programs, such operations being configured with program-specific information in order to do the right thing within a given program. +Since objects at the program level have their layouts +[[../Optimisation|optimised]], and since these layouts may differ from program +to program, a suite of specific operations is provided to support a range of +activities within programs, such operations being configured with +program-specific information in order to do the right thing within a given +program. || '''Function''' || '''Purpose''' || || `__invoke` || A generic invocation function that populates the argument array and obtains the appropriate target || @@ -98,47 +152,113 @@ === Module Files === -Each program module is defined in its own file in the translated program. Since C has no notion of general module-level code, a special main function is generated for each module to perform the operations defined at the module level in the original program. These main functions are called in turn by the principal `main` function of the final C program, and the order of invocation is defined by the module initialisation ordering established when resolving inter-module dependencies. +Each program module is defined in its own file in the translated program. +Since C has no notion of general module-level code, a special main function is +generated for each module to perform the operations defined at the module +level in the original program. These main functions are called in turn by the +principal `main` function of the final C program, and the order of invocation +is defined by the module initialisation ordering established when resolving +inter-module dependencies. === Statement Translation === -Statements in the original program are generally translated to statements in the final C program. During inspection and translation, certain constructs are transformed to produce the operations necessary to implement such constructs. For instance, operators need to be rewritten as function invocations, and thus the form of the original program changes before it is translated to C. +Statements in the original program are generally translated to statements in +the final C program. During inspection and translation, certain constructs are +transformed to produce the operations necessary to implement such constructs. +For instance, operators need to be rewritten as function invocations, and thus +the form of the original program changes before it is translated to C. -When translating certain operations that may appear within a statement, a sequence of instructions is generated, and such instructions must be performed in turn. Fortunately, C supports a wide variety of operations - such as assignments - within expressions, and it also provides the comma operator which permits sequences of operations to be performed in order and to yield a result that can then be used in the context of an expression. Such sequences of operations are employed extensively to realise attribute access instruction plans. +When translating certain operations that may appear within a statement, a +sequence of instructions is generated, and such instructions must be performed +in turn. Fortunately, C supports a wide variety of operations - such as +assignments - within expressions, and it also provides the comma operator +which permits sequences of operations to be performed in order and to yield a +result that can then be used in the context of an expression. Such sequences +of operations are employed extensively to realise attribute access instruction +plans. === Name References === -Names can be translated in different ways for the final C program. Function locals and parameters correspond directly to locals in C functions. Module-level globals and class-level locals are translated to structure accesses. However, temporary names that are used in sequence assignment are translated to C function locals, either in functions corresponding to their equivalents in the original program, or in module initialisation functions corresponding to the module scope in the original program. +Names can be translated in different ways for the final C program. Function +locals and parameters correspond directly to locals in C functions. +Module-level globals and class-level locals are translated to structure +accesses. However, temporary names that are used in sequence assignment are +translated to C function locals, either in functions corresponding to their +equivalents in the original program, or in module initialisation functions +corresponding to the module scope in the original program. -Certain special names that are formulated during [[../Inspection|inspection]] are converted to constant or static references. +Certain special names that are formulated during [[../Inspection|inspection]] +are converted to constant or static references. === Attribute Accesses === -Each attribute access is associated with a particular access location, corresponding to an operation occurring in the original program identified during inspection. This location can be used as a key to access a [[#Attribute_Access_Plans|plan]] that describes how the access may be realised in the final program. Since most of the work computing the nature of the access is done during the preceding deduction and optimisation activities, what remains is mostly the emission of the plan into the generated program text with some substitutions performed. +Each attribute access is associated with a particular access location, +corresponding to an operation occurring in the original program identified +during inspection. This location can be used as a key to access a +[[#Attribute_Access_Plans|plan]] that describes how the access may be realised +in the final program. Since most of the work computing the nature of the +access is done during the preceding deduction and optimisation activities, +what remains is mostly the emission of the plan into the generated program +text with some substitutions performed. === Invocations === -Each invocation involves an object whose identity may or may not be known at compile-time, plus a collection of arguments. The translation process is concerned with obtaining a target to be invoked in the generated program, populating an argument array appropriately for the target, and providing the means of invocation. +Each invocation involves an object whose identity may or may not be known at +compile-time, plus a collection of arguments. The translation process is +concerned with obtaining a target to be invoked in the generated program, +populating an argument array appropriately for the target, and providing the +means of invocation. -Where the identity of the object is not known at all, the information provided is effectively passed on to the special `__invoke` function which is then required to do all the work at run-time. However, one of the goals of analysing and compiling the program is to avoid doing such work at run-time and to take advantage of those cases where the identity of the object can be determined. +Where the identity of the object is not known at all, the information provided +is effectively passed on to the special `__invoke` function which is then +required to do all the work at run-time. However, one of the goals of +analysing and compiling the program is to avoid doing such work at run-time +and to take advantage of those cases where the identity of the object can be +determined. -Thus, where the identity of the called object is known, details of the object's signature - its parameters and their positions - are used to populate the argument array and to determine whether this can be done successfully. Some callables require a context to be provided, and knowledge about the callable can be used to obtain or omit such a context from the arguments. +Thus, where the identity of the called object is known, details of the +object's signature - its parameters and their positions - are used to populate +the argument array and to determine whether this can be done successfully. +Some callables require a context to be provided, and knowledge about the +callable can be used to obtain or omit such a context from the arguments. -Finally, the invocation target must be obtained. Where some knowledge of the identity of the object is available, it may be sufficient to access the special `__fn__` member directly (potentially testing for its presence if this is not certain) and to call the appropriate function with the assembled argument array. Where a definite identity is available, the actual function itself may be named and thus called with the arguments. Such operations are mere fragments of the total work performed by the `__invoke` function in the least optimal case. +Finally, the invocation target must be obtained. Where some knowledge of the +identity of the object is available, it may be sufficient to access the +special `__fn__` member directly (potentially testing for its presence if this +is not certain) and to call the appropriate function with the assembled +argument array. Where a definite identity is available, the actual function +itself may be named and thus called with the arguments. Such operations are +mere fragments of the total work performed by the `__invoke` function in the +least optimal case. === Assignments === -Assignments of objects to names occur for the following kinds of statements: assignments, definition statements (`class` and `def`), import statements (`from` and `import`). +Assignments of objects to names occur for the following kinds of statements: +assignments, definition statements (`class` and `def`), import statements +(`from` and `import`). === Guards === -Guards are identified during [[../Deduction|deduction]] as being necessary to uphold deductions about the nature of objects referenced via names that may be undermined at run-time by potentially erroneous program behaviour. +Guards are identified during [[../Deduction|deduction]] as being necessary to +uphold deductions about the nature of objects referenced via names that may be +undermined at run-time by potentially erroneous program behaviour. === Exceptions === -The C programming language does not support exceptions in a form supported by more modern programming languages. Consequently, lower-level mechanisms need to be employed to reproduce the semantics of the `try` statement, its `except` and `finally` clauses, and the `raise` statement, as well as considering the effect of exceptions on the `return` statement. +The C programming language does not support exceptions in a form supported by +more modern programming languages. Consequently, lower-level mechanisms need +to be employed to reproduce the semantics of the `try` statement, its `except` +and `finally` clauses, and the `raise` statement, as well as considering the +effect of exceptions on the `return` statement. -Fortunately, some consideration has been given to this topic before. The [[http://www.nicemice.net/cexcept/|cexcept]] library - really a collection of convenience macros - provides a way of expressing exception constructs in a natural way in C while translating those constructs to usage of the `setjmp` and `longjmp` functions. Although direct usage of such functions could be generated during program translation, for convenience and to focus on other implementation concerns, it was decided to employ the cexcept macros and to build any additional functionality in terms of their usage. +Fortunately, some consideration has been given to this topic before. The +[[http://www.nicemice.net/cexcept/|cexcept]] library - really a collection of +convenience macros - provides a way of expressing exception constructs in a +natural way in C while translating those constructs to usage of the `setjmp` +and `longjmp` functions. Although direct usage of such functions could be +generated during program translation, for convenience and to focus on other +implementation concerns, it was decided to employ the cexcept macros and to +build any additional functionality in terms of their usage. A structure is defined to hold exception-related information: @@ -152,7 +272,8 @@ } __exc; }}} -Here, the `arg` member holds a raised exception object reference. Meanwhile, the remaining members interact with language constructs as follows: +Here, the `arg` member holds a raised exception object reference. Meanwhile, +the remaining members interact with language constructs as follows: || '''Member''' || '''Description''' || '''Output Program Operation''' || || `raising` || Indicates an exception that has been raised and not yet handled || `__Raise` || @@ -161,4 +282,6 @@ === Memory Allocation and Garbage Collection === -To avoid having to write a garbage collector, the [[http://www.hboehm.info/gc/|Boehm-Demers-Weiser garbage collector]] is employed by programs to allocate and free memory needed for its objects. +To avoid having to write a garbage collector, the +[[http://www.hboehm.info/gc/|Boehm-Demers-Weiser garbage collector]] is +employed by programs to allocate and free memory needed for its objects.