1.1 --- a/README.txt Tue Apr 11 18:29:30 2017 +0200 1.2 +++ b/README.txt Tue Apr 11 21:41:22 2017 +0200 1.3 @@ -107,3 +107,27 @@ 1.4 1.5 Copyright and licence information can be found in the docs directory - see 1.6 docs/COPYING.txt and docs/gpl-3.0.txt for more information. 1.7 + 1.8 +Generating the Wiki Pages 1.9 +========================= 1.10 + 1.11 +The docs/tools/make_pages.sh script generates a page package for MoinMoin. The 1.12 +following command will generate a page package called pages.zip using the 1.13 +pages directory for staging, with Lichen as the page prefix: 1.14 + 1.15 +docs/tools/make_pages.sh pages Lichen 1.16 + 1.17 +Make sure to include the page prefix where the pages are being deployed in a 1.18 +wiki with other content at the top level. 1.19 + 1.20 +Currently, the wiki pages require the following extensions: 1.21 + 1.22 +ImprovedTableParser https://moinmo.in/ParserMarket/ImprovedTableParser 1.23 + 1.24 +MoinSupport http://hgweb.boddie.org.uk/MoinSupport 1.25 + 1.26 +GraphvizParser https://moinmo.in/ParserMarket/graphviz 1.27 + 1.28 +The GraphvizParser requires diagram-tools for the notugly.xsl stylesheet, 1.29 +although a copy of the stylesheet is provided in the GraphvizParser 1.30 +distribution for convenience.
2.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 2.2 +++ b/docs/tools/make_pages.sh Tue Apr 11 21:41:22 2017 +0200 2.3 @@ -0,0 +1,92 @@ 2.4 +#!/bin/sh 2.5 + 2.6 +DIRNAME=`dirname $0` 2.7 +PROGNAME=`basename $0` 2.8 +OUTDIR=$1 2.9 + 2.10 +if [ ! "$OUTDIR" ] || [ "$1" = '--help' ] ; then 2.11 + cat 1>&2 <<EOF 2.12 +Usage: $PROGNAME <output directory> [ <page prefix> ] [ --releases [ --sign ] ] 2.13 +EOF 2.14 + exit 1 2.15 +fi 2.16 + 2.17 +if [ -e "$OUTDIR" ]; then 2.18 + echo "Please remove $OUTDIR before generating a new package." 1>&2 2.19 + exit 1 2.20 +fi 2.21 + 2.22 +if [ "$2" = '--releases' ]; then 2.23 + PREFIX= 2.24 + RELEASES=$2 2.25 + SIGN=$3 2.26 +else 2.27 + PREFIX=$2 2.28 + RELEASES=$3 2.29 + SIGN=$4 2.30 +fi 2.31 + 2.32 +# Generate release archives. These are held in a separate, semi-permanent 2.33 +# place so that archives and signatures are not regenerated unnecessarily. 2.34 + 2.35 +if [ "$RELEASES" ]; then 2.36 + "$DIRNAME/make_releases.sh" releases 2.37 +fi 2.38 + 2.39 +if [ "$SIGN" ]; then 2.40 + "$DIRNAME/sign_releases.sh" releases 2.41 +fi 2.42 + 2.43 +# Generate a manifest for the page package. 2.44 + 2.45 +MANIFEST="$OUTDIR/MOIN_PACKAGE" 2.46 + 2.47 +mkdir "$OUTDIR" 2.48 +cat > "$MANIFEST" <<EOF 2.49 +MoinMoinPackage|1 2.50 +EOF 2.51 + 2.52 +# Add the pages to the manifest. 2.53 + 2.54 +DOCS="$DIRNAME/../wiki" 2.55 + 2.56 +cp "$DOCS/"* "$OUTDIR" 2.57 + 2.58 +for FILENAME in "$DOCS/"* ; do 2.59 + BASENAME=`basename "$FILENAME"` 2.60 + PAGENAME=`echo "$BASENAME" | sed 's/--/\//g'` 2.61 + if [ "$PREFIX" ]; then 2.62 + if [ "$PAGENAME" = "FrontPage" ]; then 2.63 + PAGENAME="$PREFIX" 2.64 + else 2.65 + PAGENAME="$PREFIX/$PAGENAME" 2.66 + fi 2.67 + fi 2.68 + echo "AddRevision|$BASENAME|$PAGENAME" >> "$MANIFEST" 2.69 +done 2.70 + 2.71 +if [ ! -e "releases" ]; then 2.72 + echo "No releases to add to the page package!" 1>&2 2.73 +else 2.74 + # Combine the releases with the pages. 2.75 + 2.76 + ATTACHMENT="attachment_" 2.77 + 2.78 + for FILENAME in releases/* ; do 2.79 + BASENAME=`basename "$FILENAME"` 2.80 + cp "$FILENAME" "$OUTDIR/$ATTACHMENT$BASENAME" 2.81 + done 2.82 + 2.83 + # Add the releases to the manifest. 2.84 + 2.85 + for FILENAME in releases/* ; do 2.86 + BASENAME=`basename "$FILENAME"` 2.87 + PAGENAME="Downloads" 2.88 + if [ "$PREFIX" ]; then 2.89 + PAGENAME="$PREFIX/$PAGENAME" 2.90 + fi 2.91 + echo "AddAttachment|$ATTACHMENT$BASENAME|$BASENAME|$PAGENAME" >> "$MANIFEST" 2.92 + done 2.93 +fi 2.94 + 2.95 +zip -j "$OUTDIR" "$OUTDIR/"*
3.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 3.2 +++ b/docs/tools/make_releases.sh Tue Apr 11 21:41:22 2017 +0200 3.3 @@ -0,0 +1,39 @@ 3.4 +#!/bin/sh 3.5 + 3.6 +DIRNAME=`dirname "$0"` 3.7 +PROGNAME=`basename "$0"` 3.8 + 3.9 +ARCHIVE=Lichen 3.10 + 3.11 +if [ ! "$1" ] || [ "$1" = '--help' ] ; then 3.12 + cat 1>&2 <<EOF 3.13 +Usage: $PROGNAME <output directory> [ -f ] 3.14 + 3.15 +Make release archives from tags starting with "rel-" in the repository, 3.16 +storing the archives in the output directory. If an archive already exists for 3.17 +a release, it is not regenerated unless the -f (force) option is given. 3.18 + 3.19 +All newly-created archive filenames are emitted on standard output. 3.20 +EOF 3.21 + exit 1 3.22 +fi 3.23 + 3.24 +OUTDIR=$1 3.25 +FORCE=$2 3.26 + 3.27 +if [ "$FORCE" != '-f' ]; then 3.28 + FORCE= 3.29 +fi 3.30 + 3.31 +if [ ! -e "$OUTDIR" ]; then 3.32 + mkdir -p "$OUTDIR" 3.33 +fi 3.34 + 3.35 +for TAG in `hg tags | grep ^rel- | cut -f 1 -d ' '` ; do 3.36 + NUM=`echo "$TAG" | sed 's/rel-//;s/-/./g'` 3.37 + OUTFILE="$OUTDIR/$ARCHIVE-$NUM.tar.bz2" 3.38 + if [ ! -e "$OUTFILE" ] || [ "$FORCE" ]; then 3.39 + hg archive -t tbz2 -r "$TAG" "$OUTFILE" 3.40 + echo "$OUTFILE" 3.41 + fi 3.42 +done
4.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 4.2 +++ b/docs/tools/sign_releases.sh Tue Apr 11 21:41:22 2017 +0200 4.3 @@ -0,0 +1,39 @@ 4.4 +#!/bin/sh 4.5 + 4.6 +DIRNAME=`dirname "$0"` 4.7 +PROGNAME=`basename "$0"` 4.8 + 4.9 +if [ ! "$1" ] || [ "$1" = '--help' ] ; then 4.10 + cat 1>&2 <<EOF 4.11 +Usage: $PROGNAME <archive directory> [ -f ] 4.12 + 4.13 +Sign archives in the given archive directory, invoking GPG to produce a 4.14 +detached signature. If a signature already exists for an archive, it is not 4.15 +regenerated unless the -f (force) option is given. 4.16 + 4.17 +All newly-created signature filenames are emitted on standard output. 4.18 +EOF 4.19 + exit 1 4.20 +fi 4.21 + 4.22 +OUTDIR=$1 4.23 +FORCE=$2 4.24 + 4.25 +if [ "$FORCE" != '-f' ]; then 4.26 + FORCE= 4.27 +fi 4.28 + 4.29 +if [ ! -e "$OUTDIR" ]; then 4.30 + cat 1>&2 <<EOF 4.31 +No archive directory to process. 4.32 +EOF 4.33 + exit 1 4.34 +fi 4.35 + 4.36 +for FILENAME in "$OUTDIR/"*".tar.bz2" ; do 4.37 + OUTFILE="$FILENAME.asc" 4.38 + if [ ! -e "$OUTFILE" ] || [ "$FORCE" ]; then 4.39 + gpg --sign -a -b "$FILENAME" 4.40 + echo "$OUTFILE" 4.41 + fi 4.42 +done
5.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 5.2 +++ b/docs/wiki/APIs Tue Apr 11 21:41:22 2017 +0200 5.3 @@ -0,0 +1,25 @@ 5.4 += APIs = 5.5 + 5.6 +The principal application programming interfaces (APIs) within Lichen are described below. 5.7 + 5.8 +== Modules == 5.9 + 5.10 +When [[../Inspection|inspecting]] and [[../Translation|translating]] modules, common abstractions are used so that elements of the program are handled in similar ways. Various useful attributes and methods are provided by the `CommonModule` abstraction for use within `InspectedModule` and `TranslatedModule` methods: 5.11 + 5.12 +{{{#!table 5.13 +'''Attribute or Method''' || '''Purpose''' 5.14 +== 5.15 +`name` || 5.16 +An attribute providing the module name. 5.17 +== 5.18 +`get_global_path(name)` || 5.19 +A method returning the qualified name of the given global name within the module being processed. 5.20 +== 5.21 +`get_namespace_path()` || 5.22 +A method returning the qualified name of the namespace being processed. 5.23 + * For modules, this is the module name 5.24 + * For classes, functions and methods, the path incorporates the module name and namespaces leading to the current namespace itself 5.25 +== 5.26 +`get_object_path(name)` || 5.27 +A method returning the qualified name of the given local name within the namespace being processed. 5.28 +}}}
6.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 6.2 +++ b/docs/wiki/Builtins Tue Apr 11 21:41:22 2017 +0200 6.3 @@ -0,0 +1,44 @@ 6.4 += Built-Ins = 6.5 + 6.6 +The "built-ins" are a collection of special names that do not need to be explicitly imported. For example: 6.7 + 6.8 +{{{#!python numbers=disable 6.9 +biggest = max([23, 19, 27]) # max is a built-in function 6.10 +}}} 6.11 + 6.12 +Such names are always available as if they were defined in the current module. However, they are provided by a package hierarchy that seeks to divide them into isolated areas of functionality that can be included or excluded depending on the facilities employed by each program. For example, complex numbers are provided in the `__builtins__.complex` module, but for any program not employing complex numbers, this module will be excluded and its functionality not appear in the final program. (The exclusion of modules is achieved using the hidden module functionality provided by the [[../Imports|import mechanism]].) 6.13 + 6.14 +The principal, "top-level" module providing built-ins is the `__builtins__` module whose only role is to expose the actual built-in names. It does so by importing names directly from the different submodules, such as `__builtins__.complex`, so that attempts to import names from `__builtins__` may provide such attempts with the named objects. The `__builtins__` module looks like this in such cases: 6.15 + 6.16 +{{{#!python numbers=disable 6.17 +from __builtins__.complex import complex 6.18 +}}} 6.19 + 6.20 +Accesses to built-in names use the same technique of importing a name (`complex`) rather than the module providing it (`__builtins__`). It is as if the following code would appear before usage of a built-in name: 6.21 + 6.22 +{{{#!python numbers=disable 6.23 +from __builtins__ import complex 6.24 +}}} 6.25 + 6.26 +Since it is the specific name that is being referenced, not the module, the other contents of the module can be ignored and the reference to the named object followed to its actual definition location. Thus, usage of `complex` causes `__builtins__.complex` to be included in the program so that it may provide the `complex` type. 6.27 + 6.28 +Thus, it becomes possible to keep a module like `__builtins__` out of the program since its only role would be to hold references to other modules' objects, but such specific imports permit the module to be bypassed by just following the declared import relationships. However, consider the consequences of `__builtins__` being imported as follows: 6.29 + 6.30 +{{{#!python numbers=disable 6.31 +import __builtins__ 6.32 +}}} 6.33 + 6.34 +Its entire contents would then need to be exposed because it would then be possible to access any name provided by the module via the module. This was not the case with a specific name import because there was no way of referencing the module itself as a result of such an import. 6.35 + 6.36 +This would then cause all of the referenced modules to be imported because it would no longer be possible to readily identify the modules that would actually be needed by the program. For example: 6.37 + 6.38 +{{{#!python numbers=disable 6.39 +def get_things(obj): 6.40 + obj.complex 6.41 + obj.function 6.42 + obj.list 6.43 + 6.44 +get_things(__builtins__) 6.45 +}}} 6.46 + 6.47 +Of course, no module in a program should be referencing the `__builtins__` module by explicitly importing it, anyway. Given that all the names provided by that module are already available without any need to perform an import operation, such an import operation would have rather limited additional benefits.
7.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 7.2 +++ b/docs/wiki/Cache Tue Apr 11 21:41:22 2017 +0200 7.3 @@ -0,0 +1,205 @@ 7.4 += Inspection Cache Files = 7.5 + 7.6 +The results of inspection for each module are written out to cache files, and these files should be able to provide all the information that is gathered during inspection without having to inspect the source code again. One minor benefit of using cached data, instead of having to parse and inspect the source code for a given module, is that of a slightly reduced processing time for the inspection activity. However, an arguably greater benefit is that of being able to see the outcome of the activity as a summary of accumulated data. 7.7 + 7.8 +Each cache file has the following general format: 7.9 + 7.10 +{{{{#!table 7.11 +{{{ 7.12 +<filename> 7.13 + 7.14 +}}} 7.15 +|| The main program filename 7.16 +== 7.17 +{{{ 7.18 +imports: 7.19 +<required module-names> 7.20 +<possibly-required module-names> 7.21 + 7.22 +}}} 7.23 +|| 7.24 + * Comma-separated list of modules explicitly imported by this module 7.25 + * Comma-separated list of modules implicitly imported by this module 7.26 +== 7.27 +{{{ 7.28 +deferred: 7.29 +<deferred references> 7.30 + 7.31 +}}} 7.32 +|| Comma-separated list of references not identified within this module 7.33 +== 7.34 +{{{ 7.35 +special: 7.36 +<special-name> <reference> <qualified-names> 7.37 +... 7.38 + 7.39 +}}} 7.40 +|| Special name and corresponding reference plus comma-separated list of usage namespaces 7.41 +== 7.42 +{{{ 7.43 +members: 7.44 +<qualified-name> <reference> 7.45 +... 7.46 + 7.47 +}}} 7.48 +|| Reference for each member of the static namespace hierarchy of the module 7.49 +== 7.50 +{{{ 7.51 +class relationships: 7.52 +<qualified-class-name> <base-class-references> 7.53 +... 7.54 + 7.55 +}}} 7.56 +|| Comma-separated list of base classes for each class 7.57 +== 7.58 +{{{ 7.59 +instance attributes: 7.60 +<qualified-class-name> <instance-attribute-names> 7.61 +... 7.62 + 7.63 +}}} 7.64 +|| Comma-separated list of instance attributes for each class 7.65 +== 7.66 +{{{ 7.67 +instance attribute constants: 7.68 +<qualified-class-name> <attribute-name> <reference> 7.69 +... 7.70 + 7.71 +}}} 7.72 +|| Reference for the named constant instance attribute in the given class 7.73 +== 7.74 +{{{ 7.75 +names used: 7.76 +<qualified-class/function/module-name> <names> 7.77 +... 7.78 + 7.79 +}}} 7.80 +|| Comma-separated list of names in each namespace 7.81 +== 7.82 +{{{ 7.83 +name references: 7.84 +<qualified-name> <reference> 7.85 +... 7.86 + 7.87 +}}} 7.88 +|| Correspondence between name and resolved identity 7.89 +== 7.90 +{{{ 7.91 +initialised-names: 7.92 +<qualified-name> <definition-version> <reference> 7.93 +... 7.94 + 7.95 +}}} 7.96 +|| Identity of a given definition of a name 7.97 +== 7.98 +{{{ 7.99 +aliased-names: 7.100 +<qualified-name> <definition-version> <original-name> <attribute-names> <access-number> 7.101 +... 7.102 + 7.103 +}}} 7.104 +|| Name definition by access operation 7.105 +== 7.106 +{{{ 7.107 +function parameters: 7.108 +<qualified-function-name> <parameter-names> 7.109 +... 7.110 + 7.111 +}}} 7.112 +|| Comma-separated list of parameters for each function 7.113 +== 7.114 +{{{ 7.115 +function default parameters: 7.116 +<qualified-function-name> <parameter-names-with-defaults> 7.117 +... 7.118 + 7.119 +}}} 7.120 +|| 7.121 +Comma-separated parameter definitions for each function, with each definition being of the form... 7.122 +{{{ 7.123 +<name>=<references> 7.124 +}}} 7.125 +...and with the references being semicolon-separated 7.126 +== 7.127 +{{{ 7.128 +function locals: 7.129 +<qualified-function-name> <local-variable-name> <reference> 7.130 +... 7.131 + 7.132 +}}} 7.133 +|| Identity of the given local name in the given class 7.134 +== 7.135 +{{{ 7.136 +scope globals: 7.137 +<qualified-function-name> <global-variable-names> 7.138 +... 7.139 + 7.140 +}}} 7.141 +|| Comma-separated list of global names in each namespace 7.142 +== 7.143 +{{{ 7.144 +attribute usage: 7.145 +<qualified-name> <local/global/qualified-variable-name> <usages> 7.146 +... 7.147 + 7.148 +}}} 7.149 +|| 7.150 +Attribute usage details for the given name in the given namespace, with usages being a semicolon-separated list of usage alternatives, each being a comma-separated list of attribute names or {} (meaning no attribute names used), attribute names employing ! if invoked 7.151 +== 7.152 +{{{ 7.153 +attribute accesses: 7.154 +<qualified-name> <attribute-chains> 7.155 +... 7.156 + 7.157 +}}} 7.158 +|| Comma-separated list of attribute chains used on anonymous/unidentified objects 7.159 +== 7.160 +{{{ 7.161 +constant accesses: 7.162 +<qualified-function-name> <attribute-chain> <reference> <remaining attribute-chain> 7.163 +... 7.164 + 7.165 +}}} 7.166 +|| Identity of the given attribute chain in the given namespace, with any unresolved attribute chain provided 7.167 +== 7.168 +{{{ 7.169 +attribute access usage: 7.170 +<qualified-function-name> <local/global-variable-name> <attribute-name> <definition-versions> 7.171 +... 7.172 + 7.173 +}}} 7.174 +|| Indicates, for each access involving the given name and first attribute name in the given namespace, the definitions that may provide the name 7.175 +== 7.176 +{{{ 7.177 +attribute access-modifiers: 7.178 +<qualified-function-name> <local/global-variable-name> <attribute-name> <access-modifiers> 7.179 +... 7.180 + 7.181 +}}} 7.182 +|| Indicates, for accesses involving the given name and first attribute name in the given namespace, the modifiers applying to each access, where = indicates assignment, ! indicates invocation, and _ indicates access 7.183 +== 7.184 +{{{ 7.185 +constant literals: 7.186 +<qualified-name> <value-type> <encoding> <constant-literal> 7.187 +... 7.188 + 7.189 +}}} 7.190 +|| Describes a constant literal in the given namespace having the indicated type, encoding (if a string), and value 7.191 +== 7.192 +{{{ 7.193 +constant values: 7.194 +<qualified-name> <value-type> <encoding> <constant-literal> 7.195 +... 7.196 + 7.197 +}}} 7.198 +|| Describes a constant literal identified using a locally-qualified name 7.199 +== 7.200 +{{{ 7.201 +exception namespaces: 7.202 +<qualified-names> 7.203 +... 7.204 +}}} 7.205 +|| Comma-separated list of namespaces that need to handle exceptions 7.206 +}}}} 7.207 + 7.208 +A qualified name is a name prefixed with the namespace it appears in.
8.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 8.2 +++ b/docs/wiki/Deduction Tue Apr 11 21:41:22 2017 +0200 8.3 @@ -0,0 +1,458 @@ 8.4 += Deducing Program Behaviour = 8.5 + 8.6 +With information from [[../Inspection|inspection]] available, the isolated observations about operations in a program may now be combined with knowledge about the program's structure to produce deductions about the nature of each operation. 8.7 + 8.8 +<<TableOfContents(2,3)>> 8.9 + 8.10 +== The Deduction Process == 8.11 + 8.12 +The deduction process takes observations made during the [[../Inspection|inspection process]] and attempts to form deductions about the behaviour of the program primarily in terms of the nature of the attribute '''accesses''', with their corresponding '''accessors''', featuring in the program. Where attributes are used in conjunction with names, accessors are name versions; where attributes are used in conjunction with other expressions, accessors are '''anonymous'''. 8.13 + 8.14 +=== Indexes === 8.15 + 8.16 +For efficiency, indexes are defined to establish relationships between the following things: 8.17 + 8.18 +{{{#!table 8.19 +'''Indexes''' || '''Details''' 8.20 +== 8.21 +`access_index` || Which accessors (name versions) are involved with access operations 8.22 +== 8.23 +`location_index` || Which attribute usage patterns are supported by accessors (name versions) 8.24 +== 8.25 +`attr_class_types`<<BR>>`attr_instance_types`<<BR>>`attr_module_types` 8.26 +|| Which types support which attribute names 8.27 +== 8.28 +`assigned_attrs` 8.29 +|| Which usage patterns involve attribute assignment 8.30 +== 8.31 +`reference_assignments` 8.32 +|| Which accesses involve assignments 8.33 +== 8.34 +`reference_invocations` 8.35 +|| Which accesses involve invocations 8.36 +== 8.37 +`alias_index` 8.38 +|| Which names are aliases for other names, accesses or invocations 8.39 +}}} 8.40 + 8.41 +The objective of deduction is to combine these indexes to establish new relationships between the different participants of these basic index relationships. 8.42 + 8.43 +=== Building Indexes === 8.44 + 8.45 +The building of indexes from inspection data is approximately as follows: 8.46 + 8.47 +{{{#!graphviz 8.48 +//format=svg 8.49 +//transform=notugly 8.50 +digraph indexes { 8.51 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Indexes"]; 8.52 + edge [tooltip="Indexes"]; 8.53 + rankdir=LR; 8.54 + 8.55 + all_attr_accesses [label="all_attr_accesses\n(anonymous accesses)"]; 8.56 + attr_usage [label="attr_usage\n(attribute usage)"]; 8.57 + location_index [label="location_index\n(usage by accessor)"]; 8.58 + 8.59 + all_attrs [label="all_class_attrs | all_instance_attrs | all_module attrs | (all attributes)",shape=record]; 8.60 + attr_types [label="attr_class_types | attr_instance_types | attr_module_types | (types by attribute name)",shape=record]; 8.61 + 8.62 + attr_accessors [label="attr_accessors\n(accessors by access)"]; 8.63 + access_index [label="access_index\n(accessor locations by access location)"]; 8.64 + 8.65 + all_attr_access_modifiers [label="all_attr_access_modifiers\n(operations/modifiers by access)"]; 8.66 + reference_assignments [label="reference_assignments\n(assignment accesses)"]; 8.67 + reference_invocations [label="reference_invocations\n(invocation accesses)"]; 8.68 + assigned_attrs [label="assigned_attrs\n(assignment accesses by access)"]; 8.69 + 8.70 + all_aliased_names [label="all_aliased_names\n(accesses by alias)"]; 8.71 + alias_index [label="alias_index\n(access locations by accessor/alias location)"]; 8.72 + 8.73 + init_usage_index [label="init_usage_index",shape=ellipse]; 8.74 + init_attr_type_indexes [label="init_attr_type_indexes",shape=ellipse]; 8.75 + init_accessors [label="init_accessors",shape=ellipse]; 8.76 + init_accesses [label="init_accesses",shape=ellipse]; 8.77 + init_aliases [label="init_aliases",shape=ellipse]; 8.78 + 8.79 + all_attr_accesses -> init_usage_index; 8.80 + attr_usage -> init_usage_index; 8.81 + init_usage_index -> location_index; 8.82 + 8.83 + all_attrs -> init_attr_type_indexes -> attr_types; 8.84 + 8.85 + attr_accessors -> init_accessors -> access_index; 8.86 + 8.87 + all_attr_access_modifiers -> init_accesses; 8.88 + init_accesses -> reference_assignments; 8.89 + init_accesses -> reference_invocations; 8.90 + init_accesses -> assigned_attrs; 8.91 + 8.92 + all_aliased_names -> init_aliases -> alias_index; 8.93 +} 8.94 +}}} 8.95 + 8.96 +=== Deriving Types from Indexes and Accesses === 8.97 + 8.98 +The indexes are employed in deduction approximately as follows: 8.99 + 8.100 +{{{#!graphviz 8.101 +//format=svg 8.102 +//transform=notugly 8.103 +digraph deduction { 8.104 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Deduction"]; 8.105 + edge [tooltip="Deduction"]; 8.106 + rankdir=LR; 8.107 + 8.108 + all_attr_accesses [label="all_attr_accesses\n(anonymous accesses)"]; 8.109 + location_index [label="location_index\n(usage by accessor)"]; 8.110 + 8.111 + attr_types [label="attr_class_types | attr_instance_types | attr_module_types | (types by attribute name)",shape=record]; 8.112 + 8.113 + all_initialised_names [label="all_initialised_names\n(types by accessor)"]; 8.114 + 8.115 + access_index [label="access_index\n(accessor locations by access location)"]; 8.116 + 8.117 + alias_index [label="alias_index\n(access locations by accessor/alias location)"]; 8.118 + 8.119 + record_types_for_usage [label="record_types_for_usage",shape=ellipse]; 8.120 + record_types_for_attribute [label="record_types_for_attribute",shape=ellipse]; 8.121 + 8.122 + accessor_types [label="accessor_class_types | accessor_instance_types | accessor_module_types | (types by accessor)",shape=record]; 8.123 + provider_types [label="provider_class_types | provider_instance_types | provider_module_types | (provider types by accessor)",shape=record]; 8.124 + 8.125 + location_index -> record_types_for_usage; 8.126 + attr_types -> record_types_for_usage; 8.127 + all_initialised_names -> record_types_for_usage; 8.128 + access_index -> record_types_for_usage; 8.129 + alias_index -> record_types_for_usage; 8.130 + 8.131 + record_types_for_usage -> provider_types; 8.132 + record_types_for_usage -> accessor_types; 8.133 + 8.134 + attr_types -> record_types_for_attribute; 8.135 + all_attr_accesses -> record_types_for_attribute; 8.136 + 8.137 + record_types_for_attribute -> provider_types; 8.138 + record_types_for_attribute -> accessor_types; 8.139 +} 8.140 +}}} 8.141 + 8.142 +=== Converting Usage to Types === 8.143 + 8.144 +A particularly important operation in the deduction process is that of converting attribute usage information to a set of types supporting such usage. Taking the mapping of attribute names to types, each attribute name provided by a usage observation can be applied, progressively narrowing the set of types available to support the complete attribute usage collection. 8.145 + 8.146 +{{{#!graphviz 8.147 +//format=svg 8.148 +//transform=notugly 8.149 +digraph usage_to_types { 8.150 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Usage to types"]; 8.151 + edge [tooltip="Usage to types"]; 8.152 + rankdir=LR; 8.153 + 8.154 + usage [label="(a, b, c)",style=filled,fillcolor=darkorange]; 8.155 + 8.156 + subgraph { 8.157 + rank=same; 8.158 + attr_a [label="attribute a",style=filled,fillcolor=gold]; 8.159 + attr_b [label="attribute b",style=filled,fillcolor=gold]; 8.160 + attr_c [label="attribute c",style=filled,fillcolor=gold]; 8.161 + } 8.162 + 8.163 + index [label="types by attribute name",shape=ellipse]; 8.164 + 8.165 + subgraph { 8.166 + rank=same; 8.167 + type_P [label="type P"]; 8.168 + type_Q [label="type Q",style=filled,fillcolor=gold]; 8.169 + type_R [label="type R"]; 8.170 + type_S [label="type S"]; 8.171 + } 8.172 + 8.173 + deduction [label="(a, b, c) needs type Q",style=filled,fillcolor=darkorange]; 8.174 + 8.175 + usage -> attr_a; 8.176 + usage -> attr_b; 8.177 + usage -> attr_c; 8.178 + 8.179 + attr_a -> attr_b -> attr_c [style=dashed,dir=none]; 8.180 + attr_a -> index -> type_P [style=dashed,dir=none]; 8.181 + type_P -> type_Q -> type_R -> type_S [style=dashed,dir=none]; 8.182 + 8.183 + attr_a -> type_P; 8.184 + attr_a -> type_Q; 8.185 + attr_b -> type_Q; 8.186 + attr_b -> type_R; 8.187 + attr_c -> type_Q; 8.188 + attr_c -> type_S; 8.189 + 8.190 + type_Q -> deduction; 8.191 +} 8.192 +}}} 8.193 + 8.194 +The types supporting attribute usage are the attribute '''providers'''. Where providers are classes, the affected accessors in the program may also be instances, since instances also support access to attributes of the instantiated class (and its ancestors). 8.195 + 8.196 +Indexes mapping attributes to types must combine consideration of class and instance attributes in order to correctly identify instance providers. Consider the following example: 8.197 + 8.198 +{{{#!graphviz 8.199 +//format=svg 8.200 +//transform=notugly 8.201 +digraph instance_providers { 8.202 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Instance providers"]; 8.203 + edge [tooltip="Instance providers"]; 8.204 + rankdir=LR; 8.205 + 8.206 + usage [label="(a, b, c)",style=filled,fillcolor=darkorange]; 8.207 + 8.208 + subgraph { 8.209 + rank=same; 8.210 + combined [label="combined attributes",shape=ellipse]; 8.211 + attr_a [label="attribute a",style=filled,fillcolor=gold]; 8.212 + attr_c [label="attribute c",style=filled,fillcolor=gold]; 8.213 + attr_b [label="attribute b",style=filled,fillcolor=gold]; 8.214 + } 8.215 + 8.216 + subgraph { 8.217 + rank=same; 8.218 + class_C [label="class C"]; 8.219 + attr_Ca [label="attribute C.a",style=filled,fillcolor=gold]; 8.220 + attr_Cc [label="attribute C.c",style=filled,fillcolor=gold]; 8.221 + instance_C [label="instance of C"]; 8.222 + attr_Ib [label="attribute (C).b",style=filled,fillcolor=gold]; 8.223 + } 8.224 + 8.225 + type_C [label="type C",style=filled,fillcolor=darkorange]; 8.226 + 8.227 + combined -> attr_a -> attr_c -> attr_b [style=dashed,dir=none]; 8.228 + class_C -> attr_Ca -> attr_Cc [style=dashed,dir=none]; 8.229 + instance_C -> attr_Ib [style=dashed,dir=none]; 8.230 + 8.231 + usage -> attr_a -> attr_Ca; 8.232 + usage -> attr_b -> attr_Ib; 8.233 + usage -> attr_c -> attr_Cc; 8.234 + 8.235 + attr_Ca -> type_C; 8.236 + attr_Cc -> type_C; 8.237 + attr_Ib -> type_C; 8.238 +} 8.239 +}}} 8.240 + 8.241 +To recognise support by instance accessors for the usage pattern concerned, attribute details must be obtained from classes as well as instances. Note that the identified type cannot support such usage purely by providing class or instance attributes alone. 8.242 + 8.243 +=== Attribute Assignments === 8.244 + 8.245 +Since attribute access operations may occur as part of assignments, the nature of accesses is recorded during inspection. Combining the indexed information for assignment accesses allows the deducer to determine the most pessimistic effects on the program structure of such assignments. 8.246 + 8.247 +Taking each attribute usage set employed by accessors involved in assignments, the types are deduced for such accessors, and each individual attribute known to be used in such assignments is then applied to the deduced types, '''mutating''' them in such a way that deductions may no longer assume a fixed identity for such attributes when obtained from these types. 8.248 + 8.249 +{{{#!graphviz 8.250 +//format=svg 8.251 +//transform=notugly 8.252 +digraph assignments { 8.253 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Attribute assignments"]; 8.254 + edge [tooltip="Attribute assignments"]; 8.255 + rankdir=LR; 8.256 + 8.257 + subgraph { 8.258 + rank=same; 8.259 + usage [label="(a, b, c)",style=filled,fillcolor=darkorange]; 8.260 + assigned [label="b is assigned",style=filled,fillcolor=darkorange]; 8.261 + } 8.262 + 8.263 + deduction [label="(a, b, c) needs type Q",style=filled,fillcolor=gold]; 8.264 + 8.265 + subgraph { 8.266 + rank=same; 8.267 + type_Q [label="type Q",style=filled,fillcolor=gold]; 8.268 + attr_Qa [label="attribute Q.a"]; 8.269 + attr_Qb [label="attribute Q.b\n(mutated)",style=filled,fillcolor=darkorange]; 8.270 + attr_Qc [label="attribute Q.c"]; 8.271 + } 8.272 + 8.273 + type_Q -> attr_Qa -> attr_Qb -> attr_Qc [style=dashed,dir=none]; 8.274 + usage -> deduction -> type_Q; 8.275 + assigned -> attr_Qb; 8.276 +} 8.277 +}}} 8.278 + 8.279 +=== Refining Type Deductions === 8.280 + 8.281 +In the context of a specific access, the types may be resolved further: 8.282 + 8.283 + * Any name whose initialisation could be determined during inspection can be associated with its initialised type 8.284 + * Any name referring to a constant object can be associated with the type of that object 8.285 + * Usage of `self` in methods can result in only compatible class and instance types being retained from the types obtained from usage deductions 8.286 + 8.287 +=== Reference Identification === 8.288 + 8.289 +The basic information about every accessor and accessed attribute in a program can now be combined to produce specific '''references''' to identities consistent with the inspection observations. Several different kinds of accessors and access situations exist: 8.290 + 8.291 + * Name-based accesses involving attribute usage 8.292 + * Aliases to names, possibly accompanied by accesses 8.293 + * Anonymous accesses involving individual attributes 8.294 + * Constant or previously-identified names, possibly accompanied by accesses 8.295 + 8.296 +=== Aliases === 8.297 + 8.298 +Names initialised using other names or attribute accesses, or using the invocation of either of these things, are known as '''aliases'''. Information about aliases originates during inspection when such names are initialised with expression results within the program. During the resolution activity finalising the inspected details, this initialisation information is used to define relationships between aliases and other names and accesses. 8.299 + 8.300 +During deduction, attribute accesses are investigated and type information computed for both attribute accesses and accessors. The relationships defined between accesses and aliases can then be employed to propagate such deduced information to the aliases and to define appropriate type characteristics for them. 8.301 + 8.302 +Where aliases are used as the basis of attribute accesses, this propagation refines the previously deduced information about the aliases and may also refine the details of the accesses with which the alias is involved. 8.303 + 8.304 +=== Invocation Target Suitability === 8.305 + 8.306 +Having identified accessed attributes, invocation information can be applied in cases where such attributes immediately participate in an invocation, comparing the specified argument details against the parameter details defined for the identified attribute, which must be a callable object for this technique to work. Where arguments do not appear to be suitable - either the number of arguments is incorrect or any keyword argument refer to non-existent parameters - the attribute providing the parameter details can be considered unsuitable for the access. 8.307 + 8.308 +=== Classifying Accessors === 8.309 + 8.310 +Each accessor, being a particular version of a name, will have type information associated with it as a consequence of the deduction process. Such information takes the following forms: 8.311 + 8.312 + * Provider types, indicating which types may provide the attributes used by the accessor 8.313 + * Accessor types, indicating which types will actually appear as the accessor 8.314 + 8.315 +This information can be processed in a number of ways to produce the following: 8.316 + 8.317 + * All types (from all kinds of type) of providers able to provide attributes via the accessor 8.318 + * All types (from all kinds of type) of accessors compatible with the accessor 8.319 + * The most general types of accessors compatible with the accessor 8.320 + 8.321 +Where many types may be associated with an accessor, identifying the most general types involves eliminating those which are derived from others. Given that descendant types may not remove support for attributes provided by their ancestors, then where an ancestor type has been identified as a possible accessor, it should follow that all of its descendants may also have been identified as possible accessors. Since these descendants should be compatible, identifying them individually is unnecessary: merely specifying that the common ancestor or ''any'' descendant could provide an accessor is sufficient. 8.322 + 8.323 +==== Defining Guards ==== 8.324 + 8.325 +A '''guard''' is a constraint supported by a run-time test indicating the compliance of an accessor with a particular type. Thus, where a single accessor type has been identified, a guard may be established for an accessor that tests the type of the accessor against a specific type. Where a single ''general'' accessor type is identified, a guard is established that tests the accessor against a "common" type: that is, an ancestor type with which other descendant types may comply. 8.326 + 8.327 +=== Classifying Accesses === 8.328 + 8.329 +At the point of classifying accesses, information is available about the following: 8.330 + 8.331 + * The accessors potentially involved in each access 8.332 + * The types of accessors and the types providing attributes via those accessors 8.333 + * Any guards applying to the accessors 8.334 + * Whether an access is constrained by certain program characteristics and is thus guaranteed to be as deduced 8.335 + * The possible attributes referenced by the access 8.336 + 8.337 +This information can be processed in a number of ways to produce the following: 8.338 + 8.339 + * The types of accessors, both general and specific, applying to each access 8.340 + * The attributes that can be provided by each access, consolidating existing referenced attribute details 8.341 + * The general types providing the attributes 8.342 + 8.343 +Since more than one accessor may be involved, information from all accessors must be combined to determine whether guard information still applies to the access, or whether it is possible for an accessor to be used that has an uncertain type at run-time. 8.344 + 8.345 +==== Defining Tests ==== 8.346 + 8.347 +A '''test''' at the access level is a necessary check performed on an accessor before an access to determine whether it belongs to a certain type or group of types. 8.348 + 8.349 +Where guards applying to accessors apply by extension to an access, it may not be enough to assume that the the attributes exposed by the accessor are the same as those considered acceptable through deduction. Therefore, attributes are obtained for the access using the applicable guard types as accessors. If this set of attributes does not include potentially accessible attributes (perhaps because the guard types are broadly defined and do not support all attribute usage), a test must be generated. 8.350 + 8.351 +Where a single attribute provider can be identified, a test for a specific type is generated. Where a single general type can be identified as a provider, a test against a "common" type is generated. Tests involving the built-in `object` are not generated since it is the root of all classes and such tests would merely identify an accessor as a class. In all cases where a single, specific type cannot be tested for, the general attribute validation mechanism must be used instead. 8.352 + 8.353 +== Preparing Access Descriptions == 8.354 + 8.355 +The accumulated deduced knowledge is directed towards producing access descriptions or plans which characterise each access in terms of the following: 8.356 + 8.357 + * The initial accessor, being the starting object for attribute accesses, unless a static accessor has been identified 8.358 + * Details of any test required on the initial accessor 8.359 + * Details of any type employed by the test 8.360 + * Any identified static accessor (to be used as the initial accessor) 8.361 + * Attributes needing to be traversed from the accessor that yield unambiguous objects 8.362 + * Access modes for each of the unambiguously-traversed attributes 8.363 + * Remaining attributes needing to be tested and traversed (after having traversed the above attributes) 8.364 + * Details of the context 8.365 + * Any test to apply to the context (to ensure its validity) 8.366 + * The method of obtaining the first attribute 8.367 + * The method of obtaining the final attribute 8.368 + * Any identified static final attribute 8.369 + * The kinds of objects providing the final attribute 8.370 + 8.371 +=== Characterising the Accessor === 8.372 + 8.373 +For a given access location, the referenced attribute details established during deduction are used to determine... 8.374 + 8.375 + * Whether the initial accessor is static, originating from a constant access or involving an identifiable static object 8.376 + * Whether the initial accessor is dynamic but has a known, deduced identity 8.377 + 8.378 +Some useful information about the accessor and about the actual provider of the first accessed attribute can be defined: 8.379 + 8.380 +|| || '''Class''' || '''Instance''' || '''Module''' || 8.381 +|| '''Accessor''' || Class accessor || Instance accessor || Module accessor || 8.382 +|| '''Provider''' || Class provider || Instance provider || || 8.383 + 8.384 +Depending on which of these criteria are satisfied, some properties of the access operation can be established: 8.385 + 8.386 + * Object-relative accesses occur with class accessors or module accessors or when attributes are provided by instances 8.387 + * Class-relative accesses occur with instance accessors when attributes are provided by classes 8.388 + 8.389 +Object-relative accesses merely involve obtaining attributes directly from accessors. Class-relative accesses involve obtaining the class of each accessor and then obtaining an attribute from that class. 8.390 + 8.391 +=== Defining the First Access Method === 8.392 + 8.393 +For dynamic or unidentified accessors, the above access properties define the access method on the first attribute in the chain of attributes provided. However, any redefinition of the accessor to a static accessor may influence the first method. For static accessors, the first method is always object-relative since classes and modules do not offer transparent mechanisms to expose the attributes on their own classes. 8.394 + 8.395 +Static and identified, dynamic accessors should already be known to support the specified attributes. Other accessors require an access method to be used that also checks whether the accessor supports a given attribute. 8.396 + 8.397 +=== Redefining the Accessor === 8.398 + 8.399 +With knowledge about the initial accessor, the attributes involved in the access operation are then considered in the context of the accessor. For each attribute name in the chain described for an access, an attempt is made to obtain the details of the attribute of the given name from the accessor, determining whether these details can be used as an accessor to obtain subsequent attribute details. 8.400 + 8.401 +Where more than one attribute identity is obtained, traversal is terminated: all remaining attributes must be traversed at run-time. If an attribute identified during traversal is static, the first access method changes accordingly. 8.402 + 8.403 +=== Defining the Final Access Method === 8.404 + 8.405 +An access can also involve an assignment to the accessed attribute or the invocation of the accessed attribute. Such operations change the nature of the access method used on the final attribute in a chain of attribute names. 8.406 + 8.407 +=== Identifying Context Information === 8.408 + 8.409 +Final attribute accesses involving callables need to yield context information that can subsequently be used to invoke those callables. Where the nature of an accessed attribute is not known, a simplistic attempt can be made to look up all attributes stored using the attribute name in the program. 8.410 + 8.411 +Of particular interest are the following situations: 8.412 + 8.413 + * Where class attributes are being accessed via instances, whether the attributes are all methods that can be bound upon access 8.414 + * Where class attributes may be accessed via instances, whether any attributes could be methods 8.415 + 8.416 +Such considerations dictate whether the context information originates from the attribute or from the accessor and whether any run-time test is required to determine this. 8.417 + 8.418 +== Preparing Instruction Plans == 8.419 + 8.420 +Instruction plans are sequences of program operations, encoded in a higher-level form than actual program instructions, that describe the steps needed to access attributes. Such sequences are produced from the details provided by individual access plans. 8.421 + 8.422 +=== Original Accessor === 8.423 + 8.424 +The expression providing the object from which the first attribute is obtained (or the only attribute if only one is specified) is known as the original accessor. Where this accessor can be identified, the expression describing it in the program can potentially be replaced with a static reference, and subsequent mentions of the accessor can potentially be replaced with such references or names used as expressions. 8.425 + 8.426 +|| '''Access Plan Information''' || '''Original Accessor''' || 8.427 +|| Static accessor identified || Identified accessor || 8.428 +|| Named accessor access, not invocation || Indicated name || 8.429 +|| Named accessor invocation, accessor known to provide the attribute || Indicated name || 8.430 +|| Named accessor invocation, accessor not known to provide the attribute || Accessor expression || 8.431 +|| Other accessors || Accessor expression || 8.432 + 8.433 +By using names or static references, the need to store the result of evaluating an accessor expression is eliminated because such labels can be inserted in the instruction sequence as required. 8.434 + 8.435 +=== First Access Operation === 8.436 + 8.437 +Although it may be possible to convert accesses into single instructions that obtain attributes directly, many accesses will involve access operations that must consult an accessor object and obtain an attribute from that object, at least for the first attribute in a chain of attributes. This occurs when the access plan indicates the following situations: 8.438 + 8.439 + * Final method is an access (meaning that an attribute cannot be directly obtained) 8.440 + * Final method is an assignment (requiring the object whose attribute will be updated) 8.441 + * Attributes (identified or otherwise) need traversing 8.442 + 8.443 +=== Accessor Nature === 8.444 + 8.445 +Attribute assignments involve a single '''target accessor''' and potentially many other accessors, depending on how many distinct expressions are evaluated to yield the value to be set in the assignment. Such a target accessor will usually be derived from the evaluation of an expression, and in some cases the expression will be the result of an opaque operation such as the invocation of a function. In such cases, the target accessor is stored in a temporary variable for subsequent use in access operations. 8.446 + 8.447 +=== Context === 8.448 + 8.449 +=== Access Tests === 8.450 + 8.451 +=== Traversing Identified Attributes === 8.452 + 8.453 +=== Traversing Unidentified Attributes === 8.454 + 8.455 +=== Final Access Operation === 8.456 + 8.457 +=== Context Testing === 8.458 + 8.459 +== Deduction Products == 8.460 + 8.461 +The deduction process should produce a complete catalogue of accessor and access references that may then be consulted by the [[../Translation|translation]] process needing to know the nature of any operation within the program. Central to the translation process's understanding of references is the '''attribute access plan''' for each reference which characterises each access and provides the basis for the formulation of the '''instruction plan''' used to replicate it in the final program.
9.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 9.2 +++ b/docs/wiki/Design Tue Apr 11 21:41:22 2017 +0200 9.3 @@ -0,0 +1,469 @@ 9.4 += Design Decisions = 9.5 + 9.6 +The Lichen language design involves some different choices to those taken in Python's design. Many of these choices are motivated by the following criteria: 9.7 + 9.8 + * To simplify the language and to make what programs do easier to understand and to predict 9.9 + * To make analysis of programs easier, particularly [[../Deduction|deductions]] about the nature of the code 9.10 + * To simplify and otherwise reduce the [[../Representations|representations]] employed and the operations performed at run-time 9.11 + 9.12 +Lichen is in many ways a restricted form of Python. In particular, restrictions on the attribute names supported by each object help to clearly define the object types in a program, allowing us to identify those objects when they are used. Consequently, optimisations that can be employed in a Lichen program become possible in situations where they would have been difficult or demanding to employ in a Python program. 9.13 + 9.14 +Some design choices evoke memories of earlier forms of Python. Removing nested scopes simplifies the [[../Inspection|inspection]] of programs and run-time [[../Representations|representations]] and mechanisms. Other choices seek to remedy difficult or defective aspects of Python, notably the behaviour of Python's [[../Imports|import]] system. 9.15 + 9.16 +<<TableOfContents(2,3)>> 9.17 + 9.18 +== Attributes == 9.19 + 9.20 +{{{#!table 9.21 +'''Lichen''' || '''Python''' || '''Rationale''' 9.22 +== 9.23 +Objects have a fixed set of attribute names 9.24 +|| Objects can gain and lose attributes at run-time 9.25 +|| Having fixed sets of attributes helps identify object types 9.26 +== 9.27 +Instance attributes may not shadow class attributes 9.28 +|| Instance attributes may shadow class attributes 9.29 +|| Forbidding shadowing simplifies access operations 9.30 +== 9.31 +Attributes are simple members of object structures 9.32 +|| Dynamic handling and computation of attributes is supported 9.33 +|| Forbidding dynamic attributes simplifies access operations 9.34 +}}} 9.35 + 9.36 +=== Fixed Attribute Names === 9.37 + 9.38 +Attribute names are bound for classes through assignment in the class namespace, for modules in the module namespace, and for instances in methods through assignment to `self`. Class and instance attributes are propagated to descendant classes and instances of descendant classes respectively. Once bound, attributes can be modified, but new attributes cannot be bound by other means, such as the assignment of an attribute to an arbitrary object that would not already support such an attribute. 9.39 + 9.40 +{{{#!python numbers=disable 9.41 +class C: 9.42 + a = 123 9.43 + def __init__(self): 9.44 + self.x = 234 9.45 + 9.46 +C.b = 456 # not allowed (b not bound in C) 9.47 +C().y = 567 # not allowed (y not bound for C instances) 9.48 +}}} 9.49 + 9.50 +Permitting the addition of attributes to objects would then require that such addition attempts be associated with particular objects, leading to a potentially iterative process involving object type deduction and modification, also causing imprecise results. 9.51 + 9.52 +=== No Shadowing === 9.53 + 9.54 +Instances may not define attributes that are provided by classes. 9.55 + 9.56 +{{{#!python numbers=disable 9.57 +class C: 9.58 + a = 123 9.59 + def shadow(self): 9.60 + self.a = 234 # not allowed (attribute shadows class attribute) 9.61 +}}} 9.62 + 9.63 +Permitting this would oblige instances to support attributes that, when missing, are provided by consulting their classes but, when not missing, may also be provided directly by the instances themselves. 9.64 + 9.65 +=== No Dynamic Attributes === 9.66 + 9.67 +Instance attributes cannot be provided dynamically, such that any missing attribute would be supplied by a special method call to determine the attribute's presence and to retrieve its value. 9.68 + 9.69 +{{{#!python numbers=disable 9.70 +class C: 9.71 + def __getattr__(self, name): # not supported 9.72 + if name == "missing": 9.73 + return 123 9.74 +}}} 9.75 + 9.76 +Permitting this would require object types to potentially support any attribute, undermining attempts to use attributes to identify objects. 9.77 + 9.78 +== Naming == 9.79 + 9.80 +{{{#!table 9.81 +'''Lichen''' || '''Python''' || '''Rationale''' 9.82 +== 9.83 +Names may be local, global or built-in: nested namespaces must be initialised explicitly 9.84 +|| Names may also be non-local, permitting closures 9.85 +|| Limited name scoping simplifies program inspection and run-time mechanisms 9.86 +== 9.87 +`self` is a reserved name and is optional in method parameter lists 9.88 +|| `self` is a naming convention, but the first method parameter must always refer to the accessed object 9.89 +|| Reserving `self` assists deduction; making it optional is a consequence of the method binding behaviour 9.90 +}}} 9.91 + 9.92 +=== Traditional Local, Global and Built-In Scopes Only === 9.93 + 9.94 +Namespaces reside within a hierarchy within modules: classes containing classes or functions; functions containing other functions. Built-in names are exposed in all namespaces, global names are defined at the module level and are exposed in all namespaces within the module, locals are confined to the namespace in which they are defined. 9.95 + 9.96 +However, locals are not inherited by namespaces from surrounding or enclosing namespaces. 9.97 + 9.98 +{{{#!python numbers=disable 9.99 +def f(x): 9.100 + def g(y): 9.101 + return x + y # not permitted: x is not inherited from f in Lichen (it is in Python) 9.102 + return g 9.103 + 9.104 +def h(x): 9.105 + def i(y, x=x): # x is initialised but held in the namespace of i 9.106 + return x + y # succeeds: x is defined 9.107 + return i 9.108 +}}} 9.109 + 9.110 +Needing to access outer namespaces in order to access any referenced names complicates the way in which such dynamic namespaces would need to be managed. Although the default initialisation technique demonstrated above could be automated, explicit initialisation makes programs easier to follow and avoids mistakes involving globals having the same name. 9.111 + 9.112 +=== Reserved Self === 9.113 + 9.114 +The `self` name can be omitted in method signatures, but in methods it is always initialised to the instance on which the method is operating. 9.115 + 9.116 +{{{#!python numbers=disable 9.117 +class C: 9.118 + def f(y): # y is not the instance 9.119 + self.x = y # self is the instance 9.120 +}}} 9.121 + 9.122 +The assumption in methods is that `self` must always be referring to an instance of the containing class or of a descendant class. This means that `self` cannot be initialised to another kind of value, which Python permits through the explicit invocation of a method with the inclusion of the affected instance as the first argument. Consequently, `self` becomes optional in the signature because it is not assigned in the same way as the other parameters. 9.123 + 9.124 +== Inheritance and Binding == 9.125 + 9.126 +{{{#!table 9.127 +'''Lichen''' || '''Python''' || '''Rationale''' 9.128 +== 9.129 +Class attributes are propagated to class hierarchy members during initialisation: rebinding class attributes does not affect descendant class attributes 9.130 +|| Class attributes are propagated live to class hierarchy members and must be looked up by the run-time system if not provided by a given class 9.131 +|| Initialisation-time propagation simplifies access operations and attribute table storage 9.132 +== 9.133 +Unbound methods must be bound using a special function taking an instance 9.134 +|| Unbound methods may be called using an instance as first argument 9.135 +|| Forbidding instances as first arguments simplifies the invocation mechanism 9.136 +== 9.137 +Functions assigned to class attributes do not become unbound methods 9.138 +|| Functions assigned to class attributes become unbound methods 9.139 +|| Removing method assignment simplifies deduction: methods are always defined in place 9.140 +== 9.141 +Base classes must be well-defined 9.142 +|| Base classes may be expressions 9.143 +|| Well-defined base classes are required to establish a well-defined hierarchy of types 9.144 +== 9.145 +Classes may not be defined in functions 9.146 +|| Classes may be defined in any kind of namespace 9.147 +|| Forbidding classes in functions prevents the definition of countless class variants that are awkward to analyse 9.148 +}}} 9.149 + 9.150 +=== Inherited Class Attributes === 9.151 + 9.152 +Class attributes that are changed for a class do not change for that class's descendants. 9.153 + 9.154 +{{{#!python numbers=disable 9.155 +class C: 9.156 + a = 123 9.157 + 9.158 +class D(C): 9.159 + pass 9.160 + 9.161 +C.a = 456 9.162 +print D.a # remains 123 in Lichen, becomes 456 in Python 9.163 +}}} 9.164 + 9.165 +Permitting this requires indirection for all class attributes, requiring them to be treated differently from other kinds of attributes. Meanwhile, class attribute rebinding and the accessing of inherited attributes changed in this way is relatively rare. 9.166 + 9.167 +=== Unbound Methods === 9.168 + 9.169 +Methods are defined on classes but are only available via instances: they are instance methods. Consequently, acquiring a method directly from a class and then invoking it should fail because the method will be unbound: the "context" of the method is not an instance. Furthermore, the Python technique of supplying an instance as the first argument in an invocation to bind the method to an instance, thus setting the context of the method, is not supported. See [[#ReservedSelf|"Reserved Self"]] for more information. 9.170 + 9.171 +{{{#!python numbers=disable 9.172 +class C: 9.173 + def f(self, x): 9.174 + self.x = x 9.175 + def g(self): 9.176 + C.f(123) # not permitted: C is not an instance 9.177 + C.f(self, 123) # not permitted: self cannot be specified in the argument list 9.178 + get_using(C.f, self)(123) # binds C.f to self, then the result is called 9.179 +}}} 9.180 + 9.181 +Binding methods to instances occurs when acquiring methods via instances or explicitly using the `get_using` built-in. The built-in checks the compatibility of the supplied method and instance. If compatible, it provides the bound method as its result. 9.182 + 9.183 +Normal functions are callable without any further preparation, whereas unbound methods need the binding step to be performed and are not immediately callable. Were functions to become unbound methods upon assignment to a class attribute, they would need to be invalidated by having the preparation mechanism enabled on them. However, this invalidation would only be relevant to the specific case of assigning functions to classes and this would need to be tested for. Given the added complications, such functionality is arguably not worth supporting. 9.184 + 9.185 +=== Assigning Functions to Class Attributes === 9.186 + 9.187 +Functions can be assigned to class attributes but do not become unbound methods as a result. 9.188 + 9.189 +{{{#!python numbers=disable 9.190 +class C: 9.191 + def f(self): # will be replaced 9.192 + return 234 9.193 + 9.194 +def f(self): 9.195 + return self 9.196 + 9.197 +C.f = f # makes C.f a function, not a method 9.198 +C().f() # not permitted: f requires an explicit argument 9.199 +C().f(123) # permitted: f has merely been exposed via C.f 9.200 +}}} 9.201 + 9.202 +Methods are identified as such by their definition location, they contribute information about attributes to the class hierarchy, and they employ certain structure details at run-time to permit the binding of methods. Since functions can defined in arbitrary locations, no class hierarchy information is available, and a function could combine `self` with a range of attributes that are not compatible with any class to which the function might be assigned. 9.203 + 9.204 +=== Well-Defined Base Classes === 9.205 + 9.206 +Base classes must be clearly identifiable as well-defined classes. This facilitates the cataloguing of program objects and further analysis on them. 9.207 + 9.208 +{{{#!python numbers=disable 9.209 +class C: 9.210 + x = 123 9.211 + 9.212 +def f(): 9.213 + return C 9.214 + 9.215 +class D(f()): # not permitted: f could return anything 9.216 + pass 9.217 +}}} 9.218 + 9.219 +If base class identification could only be done reliably at run-time, class relationship information would be very limited without running the program or performing costly and potentially unreliable analysis. Indeed, programs employing such dynamic base classes are arguably resistant to analysis, which is contrary to the goals of a language like Lichen. 9.220 + 9.221 +=== Class Definitions and Functions === 9.222 + 9.223 +Classes may not be defined in functions because functions provide dynamic namespaces, but Lichen relies on a static namespace hierarchy in order to clearly identify the principal objects in a program. If classes could be defined in functions, despite seemingly providing the same class over and over again on every invocation, a family of classes would, in fact, be defined. 9.224 + 9.225 +{{{#!python numbers=disable 9.226 +def f(x): 9.227 + class C: # not permitted: this describes one of potentially many classes 9.228 + y = x 9.229 + return f 9.230 +}}} 9.231 + 9.232 +Moreover, issues of namespace nesting also arise, since the motivation for defining classes in functions would surely be to take advantage of local state to parameterise such classes. 9.233 + 9.234 +== Modules and Packages == 9.235 + 9.236 +{{{#!table 9.237 +'''Lichen''' || '''Python''' || '''Rationale''' 9.238 +== 9.239 +Modules are independent: package hierarchies are not traversed when importing 9.240 +|| Modules exist in hierarchical namespaces: package roots must be imported before importing specific submodules 9.241 +|| Eliminating module traversal permits more precise imports and reduces superfluous code 9.242 +== 9.243 +Only specific names can be imported from a module or package using the `from` statement 9.244 +|| Importing "all" from a package or module is permitted 9.245 +|| Eliminating "all" imports simplifies the task of determining where names in use have come from 9.246 +== 9.247 +Modules must be specified using absolute names 9.248 +|| Imports can be absolute or relative 9.249 +|| Using only absolute names simplifies the import mechanism 9.250 +== 9.251 +Modules are imported independently and their dependencies subsequently resolved 9.252 +|| Modules are imported as import statements are encountered 9.253 +|| Statically-initialised objects can be used declaratively, although an initialisation order may still need establishing 9.254 +}}} 9.255 + 9.256 +=== Independent Modules === 9.257 + 9.258 +The inclusion of modules in a program affects only explicitly-named modules: they do not have relationships implied by their naming that would cause such related modules to be included in a program. 9.259 + 9.260 +{{{#!python numbers=disable 9.261 +from compiler import consts # defines consts 9.262 +import compiler.ast # defines ast, not compiler 9.263 + 9.264 +ast # is defined 9.265 +compiler # is not defined 9.266 +consts # is defined 9.267 +}}} 9.268 + 9.269 +Where modules should have relationships, they should be explicitly defined using `from` and `import` statements which target the exact modules required. In the above example, `compiler` is not routinely imported because modules within the `compiler` package have been requested. 9.270 + 9.271 +=== Specific Name Imports Only === 9.272 + 9.273 +Lichen, unlike Python, also does not support the special `__all__` module attribute. 9.274 + 9.275 +{{{#!python numbers=disable 9.276 +from compiler import * # not permitted 9.277 +from compiler import ast, consts # permitted 9.278 + 9.279 +interpreter # undefined in compiler (yet it might be thought to reside there) and in this module 9.280 +}}} 9.281 + 9.282 +The `__all__` attribute supports `from ... import *` statements in Python, but without identifying the module or package involved and then consulting `__all__` in that module or package to discover which names might be involved (which might require the inspection of yet other modules or packages), the names imported cannot be known. Consequently, some names used elsewhere in the module performing the import might be assumed to be imported names when, in fact, they are unknown in both the importing and imported modules. Such uncertainty hinders the inspection of individual modules. 9.283 + 9.284 +=== Modules Imported Independently === 9.285 + 9.286 +When indicating an import using the `from` and `import` statements, the [[../Toolchain|toolchain]] does not attempt to immediately import other modules. Instead, the imports act as declarations of such other modules or names from other modules, resolved at a later stage. This permits mutual imports to a greater extent than in Python. 9.287 + 9.288 +{{{#!python numbers=disable 9.289 +# Module M 9.290 +from N import C # in Python: fails attempting to re-enter N 9.291 + 9.292 +class D(C): 9.293 + y = 456 9.294 + 9.295 +# Module N 9.296 +from M import D # in Python: causes M to be entered, fails when re-entered from N 9.297 + 9.298 +class C: 9.299 + x = 123 9.300 + 9.301 +class E(D): 9.302 + z = 789 9.303 + 9.304 +# Main program 9.305 +import N 9.306 +}}} 9.307 + 9.308 +Such flexibility is not usually needed, and circular importing usually indicates issues with program organisation. However, declarative imports can help to decouple modules and avoid combining import declaration and module initialisation order concerns. 9.309 + 9.310 +== Syntax and Control-Flow == 9.311 + 9.312 +{{{#!table 9.313 +'''Lichen''' || '''Python''' || '''Rationale''' 9.314 +== 9.315 +If expressions and comprehensions are not supported 9.316 +|| If expressions and comprehensions are supported 9.317 +|| Omitting such syntactic features simplifies program inspection and translation 9.318 +== 9.319 +The `with` statement is not supported 9.320 +|| The `with` statement offers a mechanism for resource allocation and deallocation using context managers 9.321 +|| This syntactic feature can be satisfactorily emulated using existing constructs 9.322 +== 9.323 +Generators are not supported 9.324 +|| Generators are supported 9.325 +|| Omitting generator support simplifies run-time mechanisms 9.326 +== 9.327 +Only positional and keyword arguments are supported 9.328 +|| Argument unpacking (using `*` and `**`) is supported 9.329 +|| Omitting unpacking simplifies generic invocation handling 9.330 +== 9.331 +All parameters must be specified 9.332 +|| Catch-all parameters (`*` and `**`) are supported 9.333 +|| Omitting catch-all parameter population simplifies generic invocation handling 9.334 +}}} 9.335 + 9.336 +=== No If Expressions or Comprehensions === 9.337 + 9.338 +In order to support the classic [[WikiPedia:?:|ternary operator]], a construct was [[https://www.python.org/dev/peps/pep-0308/|added]] to the Python syntax that needed to avoid problems with the existing grammar and notation. Unfortunately, it reorders the components from the traditional form: 9.339 + 9.340 +{{{#!python numbers=disable 9.341 +# Not valid in Lichen, only in Python. 9.342 + 9.343 +# In C: condition ? true_result : false_result 9.344 +true_result if condition else false_result 9.345 + 9.346 +# In C: (condition ? inner_true_result : inner_false_result) ? true_result : false_result 9.347 +true_result if (inner_true_result if condition else inner_false_result) else false_result 9.348 +}}} 9.349 + 9.350 +Since if expressions may participate within expressions, they cannot be rewritten as if statements. Nor can they be rewritten as logical operator chains in general. 9.351 + 9.352 +{{{#!python numbers=disable 9.353 +# Not valid in Lichen, only in Python. 9.354 + 9.355 +a = 0 if x else 1 # x being true yields 0 9.356 + 9.357 +# Here, x being true causes (x and 0) to complete, yielding 0. 9.358 +# But this causes ((x and 0) or 1) to complete, yielding 1. 9.359 + 9.360 +a = x and 0 or 1 # not valid 9.361 +}}} 9.362 + 9.363 +But in any case, it would be more of a motivation to support the functionality if a better syntax could be adopted instead. However, if expressions are not particularly important in Python, and despite enhancement requests over many years, everybody managed to live without them. 9.364 + 9.365 +List and generator comprehensions are more complicated but share some characteristics of if expressions: their syntax contradicts the typical conventions established by the rest of the Python language; they create implicit state that is perhaps most appropriately modelled by a separate function or similar object. Since Lichen does not support generators at all, it will obviously not support generator expressions. 9.366 + 9.367 +Meanwhile, list comprehensions quickly encourage barely-readable programs: 9.368 + 9.369 +{{{#!python numbers=disable 9.370 +# Not valid in Lichen, only in Python. 9.371 + 9.372 +x = [0, [1, 2, 0], 0, 0, [0, 3, 4]] 9.373 +a = [z for y in x if y for z in y if z] 9.374 +}}} 9.375 + 9.376 +Supporting the creation of temporary functions to produce list comprehensions, while also hiding temporary names from the enclosing scope, adds complexity to the toolchain for situations where programmers would arguably be better creating their own functions and thus writing more readable programs. 9.377 + 9.378 +=== No With Statement === 9.379 + 9.380 +The [[https://docs.python.org/2.7/reference/compound_stmts.html#the-with-statement|with statement]] introduced the concept of [[https://docs.python.org/2.7/reference/datamodel.html#context-managers|context managers]] in Python 2.5, with such objects supporting a [[https://docs.python.org/2.7/library/stdtypes.html#typecontextmanager|programming interface]] that aims to formalise certain conventions around resource management. For example: 9.381 + 9.382 +{{{#!python numbers=disable 9.383 +# Not valid in Lichen, only in Python. 9.384 + 9.385 +with connection = db.connect(connection_args): 9.386 + with cursor = connection.cursor(): 9.387 + cursor.execute(query, args) 9.388 +}}} 9.389 + 9.390 +Although this makes for readable code, it must be supported by objects which define the `__enter__` and `__exit__` special methods. Here, the `connect` method invoked in the first `with` statement must return such an object; similarly, the `cursor` method must also provide an object with such characteristics. 9.391 + 9.392 +However, the "pre-with" solution is as follows: 9.393 + 9.394 +{{{#!python numbers=disable 9.395 +connection = db.connect(connection_args) 9.396 +try: 9.397 + cursor = connection.cursor() 9.398 + try: 9.399 + cursor.execute(query, args) 9.400 + finally: 9.401 + cursor.close() 9.402 +finally: 9.403 + connection.close() 9.404 +}}} 9.405 + 9.406 +Although this seems less readable, its behaviour is more obvious because magic methods are not being called implicitly. Moreover, any parameterisation of the acts of resource deallocation or closure can be done in the `finally` clauses where such parameterisation would seem natural, rather than being specified through some kind of context manager initialisation arguments that must then be propagated to the magic methods so that they may take into consideration contextual information that is readily available in the place where the actual resource operations are being performed. 9.407 + 9.408 +=== No Generators === 9.409 + 9.410 +[[https://www.python.org/dev/peps/pep-0255/|Generators]] were [[https://docs.python.org/release/2.3/whatsnew/section-generators.html|added]] to Python in the 2.2 release and became fully part of the language in the 2.3 release. They offer a convenient way of writing iterator-like objects, capturing execution state instead of obliging the programmer to manage such state explicitly. 9.411 + 9.412 +{{{#!python numbers=disable 9.413 +# Not valid in Lichen, only in Python. 9.414 + 9.415 +def fib(): 9.416 + a, b = 0, 1 9.417 + while 1: 9.418 + yield b 9.419 + a, b = b, a+b 9.420 + 9.421 +# Alternative form valid in Lichen. 9.422 + 9.423 +class fib: 9.424 + def __init__(self): 9.425 + self.a, self.b = 0, 1 9.426 + 9.427 + def next(self): 9.428 + result = self.b 9.429 + self.a, self.b = self.b, self.a + self.b 9.430 + return result 9.431 + 9.432 +# Main program. 9.433 + 9.434 +seq = fib() 9.435 +i = 0 9.436 +while i < 10: 9.437 + print seq.next() 9.438 + i += 1 9.439 +}}} 9.440 + 9.441 +However, generators make additional demands on the mechanisms provided to support program execution. The encapsulation of the above example generator in a separate class illustrates the need for state that persists outside the execution of the routine providing the generator's results. Generators may look like functions, but they do not necessarily behave like them, leading to potential misunderstandings about their operation even if the code is superficially tidy and concise. 9.442 + 9.443 +=== Positional and Keyword Arguments Only === 9.444 + 9.445 +When invoking callables, only positional arguments and keyword arguments can be used. Python also supports `*` and `**` arguments which respectively unpack sequences and mappings into the argument list, filling the list with sequence items (using `*`) and keywords (using `**`). 9.446 + 9.447 +{{{#!python numbers=disable 9.448 +def f(a, b, c, d): 9.449 + return a + b + c + d 9.450 + 9.451 +l = range(0, 4) 9.452 +f(*l) # not permitted 9.453 + 9.454 +m = {"c" : 10, "d" : 20} 9.455 +f(2, 4, **m) # not permitted 9.456 +}}} 9.457 + 9.458 +While convenient, such "unpacking" arguments obscure the communication between callables and undermine the safety provided by function and method signatures. They also require run-time support for the unpacking operations. 9.459 + 9.460 +=== Positional Parameters Only === 9.461 + 9.462 +Similarly, signatures may only contain named parameters that correspond to arguments. Python supports `*` and `**` in parameter lists, too, which respectively accumulate superfluous positional and keyword arguments. 9.463 + 9.464 +{{{#!python numbers=disable 9.465 +def f(a, b, *args, **kw): # not permitted 9.466 + return a + b + sum(args) + kw.get("c", 0) + kw.get("d", 0) 9.467 + 9.468 +f(1, 2, 3, 4) 9.469 +f(1, 2, c=3, d=4) 9.470 +}}} 9.471 + 9.472 +Such accumulation parameters can be useful for collecting arbitrary data and applying some of it within a callable. However, they can easily proliferate throughout a system and allow erroneous data to propagate far from its origin because such parameters permit the deferral of validation until the data needs to be accessed. Again, run-time support is required to marshal arguments into the appropriate parameter of this nature, but programmers could just write functions and methods that employ general sequence and mapping parameters explicitly instead.
10.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 10.2 +++ b/docs/wiki/Downloads Tue Apr 11 21:41:22 2017 +0200 10.3 @@ -0,0 +1,24 @@ 10.4 += Downloads = 10.5 + 10.6 +Remember that certain [[../Prerequisites|prerequisites]] are needed for the 10.7 +software to work. 10.8 + 10.9 +== Copyright and Licence == 10.10 + 10.11 +The Lichen distribution is made available according to the terms of the 10.12 +[[http://www.gnu.org/copyleft/gpl.html|GNU GPL (version 3 or later)]]. See 10.13 +the `docs/COPYING.txt` and `docs/gpl-3.0.txt` files in the source code 10.14 +distribution for details. 10.15 + 10.16 +== Repository == 10.17 + 10.18 +The source code repository is located at the following location: 10.19 + 10.20 +http://hgweb.boddie.org.uk/Lichen 10.21 + 10.22 +The repository can be cloned using [[https://www.mercurial-scm.org/|Mercurial]] 10.23 +as follows: 10.24 + 10.25 +{{{ 10.26 +hg clone http://hgweb.boddie.org.uk/Lichen 10.27 +}}}
11.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 11.2 +++ b/docs/wiki/FrontPage Tue Apr 11 21:41:22 2017 +0200 11.3 @@ -0,0 +1,44 @@ 11.4 += Lichen = 11.5 + 11.6 +|| [[/Downloads|Downloads]] || [[#Language|Language]] || [[#Toolchain|Toolchain]] || [[#Rationale|Rationale]] || [[#Documents|Documents]] || 11.7 + 11.8 +Lichen is both a Python-like [[/Design|language]] and a [[/Toolchain|toolchain]] for that language. 11.9 + 11.10 +Some objectives: 11.11 + 11.12 + * Perform analysis on programs to better understand program structure and behaviour 11.13 + * Develop code generation capabilities 11.14 + * Provide a platform for experimentation independent of existing Python language and library implementations 11.15 + * Provide independence from Python language evolution 11.16 + * Learn things about writing compilers 11.17 + 11.18 +Despite building on a long [[/History|history]] of experimentation, Lichen still requires some [[/ToDo|work to be done]] for it to be more widely usable. 11.19 + 11.20 +<<Anchor(Language)>> 11.21 +== Language == 11.22 + 11.23 +The Lichen language [[/Design|foregoes]] various dynamic aspects of Python to provide a foundation upon which more predictable programs can be built, while preserving essential functionality to make the core of the language seem very much "like Python" (thus yielding the name "Lichen"). The general syntax is largely identical to Python, with only certain syntactic constructs being deliberately unsupported, largely because the associated features are not desired. 11.24 + 11.25 +<<Anchor(Toolchain)>> 11.26 +== Toolchain == 11.27 + 11.28 +The Lichen [[/Toolchain|toolchain]] employs existing tokeniser and parser software to obtain an abstract syntax tree which is then inspected to provide data to support deductions about the structure and nature of a given program. With the information obtained from these processes, a program is then constructed, consisting of a number of source files in the target compilation language (which is currently the C programming language). This generated program may be compiled and run, hopefully producing the results intended by the source program's authors. 11.29 + 11.30 +Lichen source files use the `.py` suffix since the language syntax is superficially compatible with Python, allowing text editors to provide highlighting and editing support for Lichen programs without the need to reconfigure such tools. However, an alternative, recommended suffix is likely to be introduced in future. 11.31 + 11.32 +<<Anchor(Library)>> 11.33 +== Library == 11.34 + 11.35 +Unlike other Python compilation efforts, Lichen programs employ a newly-written library that is distinct from the CPython standard library distribution and completely independent from the CPython extension and object libraries on which Python programs being run with CPython must depend. Thus, there is no dependency on any `libpython` for run-time functionality. Since the parts of the Python standard library that are written in Python tend to be rather variable in quality, there has been no real inclination to re-use modules from that particular source, noting that they would need modifying to be compatible with Lichen, anyway. However, rewriting such modules provides opportunities to "do things right": with some functionality being over twenty years old and in bad shape, this is arguably something that should have been done for Python, anyway. 11.36 + 11.37 +<<Anchor(Rationale)>> 11.38 +== Rationale == 11.39 + 11.40 +Python has proven to be a readable, productive, comfortable-to-use and popular programming language. However, as it has accumulated features, the precise behaviour of programs making use of many of these features has become more difficult to predict. Features added to provide even more convenience to the programmer have often incurred run-time costs, introduced layers of indirection, and have made programs even more inscrutable. Instead of development tools reaching a point of being able to infer information about programs, it has been suggested that programmers annotate their programs in order to help tools understand those programs instead. Beyond superficial code style analysis and providing tooltips in integrated development environments, Python code analysis is often portrayed as a lost cause. 11.41 + 11.42 +By employing a refined language [[/Design|design]], Lichen aims to let each program define its [[/Structure|structure]] conveniently, be readily [[/Inspection|inspected]], and thus support [[/Deduction|deductions]] about the use of the program's objects by the code. The result should be more predictable programs that can be [[/Translation|translated]] to other, more efficient, [[/Representations|representations]]. The Lichen toolchain should be able to tell the programmer useful things about their programs that it may also be able to make use of itself. It not only aims to report information about programs that might be of interest to the developer, but it seeks to produce functioning, translated programs that can actually be run. 11.43 + 11.44 +<<Anchor(Documents)>> 11.45 +== Document Index == 11.46 + 11.47 +<<FullSearch(title:Lichen/)>>
12.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 12.2 +++ b/docs/wiki/History Tue Apr 11 21:41:22 2017 +0200 12.3 @@ -0,0 +1,29 @@ 12.4 += History = 12.5 + 12.6 +Lichen is one of a number of experiments focused on the analysis of Python source code with potential reliability and performance improvements in mind. There have been a number of projects that can be regarded as its predecessors, the first of these being a project with the name "Analysis" that was inspired by the then-recent announcement of the [[https://dspace.mit.edu/handle/1721.1/16688|Starkiller]] tool, which attempted to compile Python to C++ (and was never publicly released). With the emergence of the [[http://pypy.org/|PyPy]] project, the notion of writing a Python implementation in a dialect of Python was also considered attractive. 12.7 + 12.8 +== Initial Experiments == 12.9 + 12.10 +After some experience gained, a successor to "Analysis" was attempted with the uninspired, but distinct, name of "analysis". This attempted to apply type inference and also generate C programs that could be compiled, albeit with a very simple run-time model that was not really amenable to further improvement. However, some useful concepts were explored, such as abstract syntax tree (AST) node annotations that could be used to direct code generation activities and to provide annotated source code and documentation summaries. 12.11 + 12.12 +Various experiences with specialising functions and methods, plus general experience working with the AST led to a successor called "simplify", which attempted to extend specialisation to classes, thus introducing the notion of multiple forms of a single class whose attributes have differing characteristics. It also introduced the idea of a simplified instruction set for program operations, in contrast to Python bytecode whose instructions can represent relatively complicated operations, along with a simplified program representation that was intended to make program analysis easier. Issues around "data polymorphism" led to the realisation that attempting to identify specific types, in order to facilitate insights into program behaviour or optimisations of programs, was costly and not necessarily effective. 12.13 + 12.14 +(Conversations with the author of [[http://shedskin.github.io/|Shed Skin]], a tool already available at this time, which compiles Python programs complying to certain restrictions to C++, were very useful in considering data polymorphism issues and others. Shed Skin must effectively be what Starkiller promised to be, and it has even been available in Debian for a number of years so that others can evaluate it for their own needs.) 12.15 + 12.16 +== Further Experiences == 12.17 + 12.18 +An alternative approach was taken in the next successor, "micropython" (not to be confused with any other, more popular, project of the same name), where the focus of analysis shifted to defining the attributes provided by program objects and then determining which objects might be used in different places in the program according to the attributes mentioned. It introduced a strategy for representing program objects at run-time, and continued the work on a simplified instruction set, defining its instructions in terms of the run-time structures. The ability to generate annotated source code summaries was enhanced, and the software could also generate programs in the simplified instruction set that could be executed using an interpreter. Although substantial progress was made, the increasing resistance experienced in adapting a growing system to further forms of optimisations, and the need to implement and test run-time functionality on a somewhat unique instruction set architecture meant that progress became slower over time. 12.19 + 12.20 +The dependence of "micropython" on costly whole-program analysis to identify optimisations led to some further realisations. The notion of "attribute usage" had allowed program namespaces to have their names characterised, and the notion of well-defined structures had allowed the identification of specific object types throughout programs, albeit with a degree of understandable inaccuracy. Meanwhile, the run-time data model had relied on restrictions on structures to permit relatively efficient structure access operations even when object types cannot be identified. It thus appeared that such techniques might be almost as potent on their own as the combination of more costly techniques explored in earlier projects. 12.21 + 12.22 +== Current Work == 12.23 + 12.24 +It was with such realisations that a new project was effectively born. Tentatively called "!PythonLight" but renamed to "Lichen" as the code matured, the objectives now involved a simpler processing framework that merely attempted to catalogue structure members, to determine the origins of such members, and to record data flow within namespaces in order to determine attribute usage on names. The name "Lichen" can be pronounced in a way that is a contraction of "like Python", indicating that although the language being analysed is similar to Python, restrictions apply that make it not identical to it. 12.25 + 12.26 +Unlike previous efforts, instead of annotating the AST built by the Python parser, separate catalogues would record details of program operations for more convenient consumption by potentially separate analysis tools. It is worth emphasising that much of the processing work done to characterise program behaviour is effectively database work: searching for information that matches particular criteria. Emitting tabular data for perusal and potential use by other tools is a recognition of this, and it would not be unreasonable to expect future versions of tools of this nature to actually employ more conventional database management systems to facilitate program analysis. 12.27 + 12.28 +Instead of targeting a project-specific virtual machine, the ultimate objective of processing was amended to generate a C program that could be compiled using existing tools. Although care must be taken to generate C source code, delegating the low-level work of actually producing an executable reduces the scope of the project and its associated demands. 12.29 + 12.30 +Despite reasonable progress being made, certain features retained from Python proved to be a burden, notably the behaviour of Python's import system and the way in which programs can quickly accumulate superfluous code through that system's operation. To focus on genuinely important functionality and on producing working code, the project was [[../Restarted|restarted]] with various additional simplifications made to the supported language. Fortunately, this helped to drive progress, resulting in the development of a toolchain that can generate C programs for compilation and execution. 12.31 + 12.32 +Lichen attempts to balance the goals of requiring relatively inexpensive analysis, offering meaningful insights into program behaviour, providing output in the form of working, translated, standalone programs, and encouraging further experimentation in the areas of language design, program analysis and optimisation. Much of the standard library is itself written in Lichen, even functionality that would not be written in Python for CPython, thus realising one of the original ambitions for this work and its predecessors.
13.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 13.2 +++ b/docs/wiki/Imports Tue Apr 11 21:41:22 2017 +0200 13.3 @@ -0,0 +1,187 @@ 13.4 += Imports = 13.5 + 13.6 +An '''import''' is a declaration of one or more names that are provided by another source file or '''module''': 13.7 + 13.8 + * `import` statements declare names that correspond to modules 13.9 + * `from` statements declare names provided by modules 13.10 + 13.11 +Imports occur either through explicit import operations initiated by the `from` and `import` statements, or through implicit import operations occurring to satisfy the requirements of another kind of operation. 13.12 + 13.13 +<<TableOfContents(2,3)>> 13.14 + 13.15 +== Packages and Submodules == 13.16 + 13.17 +A '''package''' is a collection of modules whose names are all prefixed by the package name. For example: 13.18 + 13.19 +{{{ 13.20 +compiler 13.21 +compiler.ast 13.22 +compiler.transformer 13.23 +}}} 13.24 + 13.25 +Here, the `compiler` package is said to contain the `compiler.ast` and `compiler.transformer` modules. 13.26 + 13.27 +=== Defining Packages === 13.28 + 13.29 +The package root or top-level module is defined in a file called `__init__.py` inside the directory bearing the package's name, and this file provides a namespace for the top-level module. However, a package does not expose its member modules ('''submodules''') as members of its top-level module. Instead, the hierarchical relationship between a package and its submodules exists purely in the naming of those modules, and where submodules are imported they must be done so using their full names. 13.30 + 13.31 +Thus, relationships between packages and modules must be explicitly defined in module namespaces. For example, in the `compiler` module, the following would define relationships to the submodules: 13.32 + 13.33 +{{{ 13.34 +from compiler.ast import ast 13.35 +from compiler.transformer import transformer 13.36 +}}} 13.37 + 13.38 +Without such import statements, no attempt will be made upon importing `compiler` to access the submodules and automatically populate the package. 13.39 + 13.40 +=== Accessing Submodules Directly === 13.41 + 13.42 +Importing of submodules from packages will not cause the package itself to be imported. For example: 13.43 + 13.44 +{{{ 13.45 +import compiler.ast 13.46 +}}} 13.47 + 13.48 +This initialises the name `ast` which refers to the `compiler.ast` module, but the `compiler` package and its top-level module will not be imported. Thus, submodules can be considered independent of their packages, although they may seek to import their package top-level module should they need to access objects provided by that module. 13.49 + 13.50 +== Implicit Imports == 13.51 + 13.52 +The following kinds of operations cause implicit imports: 13.53 + 13.54 +|| '''Operations''' ||<-2> '''Import names provided by...''' || 13.55 +|| Augmented assignments ||<|5> `operator` || `operator.augmented` || 13.56 +|| Binary operators || `operator.binary` || 13.57 +|| Comparison operators || `operator.comparison` || 13.58 +|| Slice operators || `operator.sequence`<<BR>>~-Subscript operators are converted to item method invocations-~ || 13.59 +|| Unary operators || `operator.unary` || 13.60 +|| Access to built-in name || `__builtins__` || (various modules in the [[../Builtins|built-ins]] package hierarchy) || 13.61 + 13.62 +Operator usage will cause a local name referring to an `operator` module function to be created, with the appropriate function being exposed by the `operator` module itself. However, the inspection process will seek to obtain a reference to the function in its actual definition location, ultimately referencing the function in one of the modules indicated above. 13.63 + 13.64 +== Import Sequencing == 13.65 + 13.66 +In order to populate modules, other modules may themselves be required to provide names to a given module, and in turn these other modules may rely on yet more modules, and so on. One logical consequence of this is that circular imports become possible, but the resulting mutual dependencies may not be easily untangled without careful attention to the state of each of the participating modules. Consider the following situation: 13.67 + 13.68 +{{{{#!table 13.69 +{{{#!graphviz 13.70 +//format=svg 13.71 +//transform=notugly 13.72 +digraph mutual { 13.73 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Mutually-dependent modules"]; 13.74 + edge [tooltip="Mutually-dependent modules"]; 13.75 + rankdir=LR; 13.76 + 13.77 + subgraph { 13.78 + rank=same; 13.79 + moduleA [label="module A",shape=ellipse]; 13.80 + fromB [label="from B import C",style=filled,fillcolor=gold]; 13.81 + D [label="class D(C)"]; 13.82 + } 13.83 + 13.84 + subgraph { 13.85 + rank=same; 13.86 + moduleB [label="module B",shape=ellipse]; 13.87 + fromA [label="from A import D",style=filled,fillcolor=gold]; 13.88 + C [label="class C"]; 13.89 + E [label="class E(D)"]; 13.90 + } 13.91 + 13.92 + moduleA -> fromB -> D [dir=none,style=dashed]; 13.93 + moduleB -> fromA -> C -> E [dir=none,style=dashed]; 13.94 + 13.95 + fromB -> fromA; 13.96 + fromA -> fromB; 13.97 +} 13.98 +}}} 13.99 +|| 13.100 +Module A: 13.101 + 13.102 +{{{ 13.103 +from B import C 13.104 + 13.105 +class D(C): 13.106 + ... 13.107 +}}} 13.108 + 13.109 +Module B: 13.110 + 13.111 +{{{ 13.112 +from A import D 13.113 + 13.114 +class C: 13.115 + ... 13.116 + 13.117 +class E(D): 13.118 + ... 13.119 +}}} 13.120 +}}}} 13.121 + 13.122 +If modules were loaded upon being encountered in import statements, module A would not be completely processed when attempting to import from module B, and thus the import within module B of module A would only yield some information about module A. Consequently, the details of class D might not be available, and this would then have an impact on whether module B could even be completely processed itself. 13.123 + 13.124 +The approach taken to generally deal with such situations is to defer resolution until all modules have been populated. Then, names are resolved with any names employing kinds of references specified as `<depends>` (instead of, for example, `<class>`) being resolved according to the recorded import dependencies. 13.125 + 13.126 +Since the classes in one module may depend on those in other modules, it is not always possible to finalise the details of classes in a module context. And since modules may depend on each other, it is not always possible to finalise the details of classes until the details of all classes in a program are known. 13.127 + 13.128 +=== Module Initialisation === 13.129 + 13.130 +Although static objects can be defined with interdependencies in a declarative fashion, the initialisation of objects in modules may require the availability of completely-initialised objects defined in other modules. Thus, an initialisation order needs to be established, with some modules being initialised before others, so that all modules do not encounter uninitialised names when they are expecting those names to provide valid objects. 13.131 + 13.132 +The most obvious example of a module requiring the initialisation of others before it is itself evaluated is, of course, the `__main__` module. Given that it may import instances defined as attribute on other modules, it clearly requires those modules to have been initialised and those instances to have been created. It would be absurd to consider running the body of the `__main__` module before such other modules. Similarly, such dependencies exist between other modules, and consequently, an appropriate initialisation ordering must be defined for them. In its entirety, then, a program must define a workable ordering for all of its modules, signalling a concrete error if no such ordering can be established. 13.133 + 13.134 +== Hidden Modules == 13.135 + 13.136 +Imports that do not obtain the imported module name itself, such as those initiated by the `from` statement and by implicit operations, keep the imported module '''hidden'''. Unless other operations expose hidden modules, they will remain hidden and may consequently be omitted from the final generated program: there would be no way of referencing such modules and they would therefore be unable to contribute their contents to the rest of the program. 13.137 + 13.138 +However, where an object provided by a module is referenced, a module cannot remain hidden, since the provided object may depend on other parts of the module in order to function correctly. And since a provided object might reference or return other objects in the module, the general module contents must also be exposed. 13.139 + 13.140 +Import dependencies are defined for namespaces indicating modules that are required by each namespace. By following dependency relationships, it is possible to determine the eventual target of an import and to potentially skip over modules that merely import and expose names. For example: 13.141 + 13.142 +{{{{#!table 13.143 +{{{#!graphviz 13.144 +//format=svg 13.145 +//transform=notugly 13.146 +digraph imports { 13.147 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Import dependencies"]; 13.148 + edge [tooltip="Import dependencies"]; 13.149 + rankdir=LR; 13.150 + 13.151 + importer [label="from A import C",style=filled,fillcolor=darkorange]; 13.152 + 13.153 + subgraph { 13.154 + rank=same; 13.155 + moduleA [label="module A",shape=ellipse]; 13.156 + fromB [label="from B import C",style=filled,fillcolor=gold]; 13.157 + } 13.158 + 13.159 + subgraph { 13.160 + rank=same; 13.161 + moduleB [label="module B",shape=ellipse]; 13.162 + C [label="class C",style=filled,fillcolor=darkorange]; 13.163 + } 13.164 + 13.165 + moduleA -> fromB [dir=none,style=dashed]; 13.166 + moduleB -> C [dir=none,style=dashed]; 13.167 + 13.168 + importer -> fromB -> C; 13.169 +} 13.170 +}}} 13.171 +|| 13.172 +{{{ 13.173 +from A import C 13.174 +}}} 13.175 + 13.176 +Module A: 13.177 + 13.178 +{{{ 13.179 +from B import C 13.180 +}}} 13.181 + 13.182 +Module B: 13.183 + 13.184 +{{{ 13.185 +class C: 13.186 + ... 13.187 +}}} 13.188 +}}}} 13.189 + 13.190 +Here, B is never explicitly referenced, nor does it provide any referenced objects other than an imported name. Consequently, B is hidden and ultimately excluded from the final program. Such techniques are employed in the [[../Builtins|built-ins]] package hierarchy to reduce the amount of functionality employed by (and bundled in) a generated program.
14.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 14.2 +++ b/docs/wiki/Inspection Tue Apr 11 21:41:22 2017 +0200 14.3 @@ -0,0 +1,418 @@ 14.4 += Inspecting Programs = 14.5 + 14.6 +A program's source code is inspected module by module. As [[../Imports|imports]] of modules are encountered, other modules are added to the program. The inspection process for each module involves top-to-bottom traversal of the code using depth-first processing of the abstract syntax tree, with a stack of namespaces being employed to record definitions and names within namespaces are they are visited. Inspection continues until all modules in the program have been imported and inspected. 14.7 + 14.8 +<<TableOfContents(2,3)>> 14.9 + 14.10 +== The Inspection Process == 14.11 + 14.12 +The inspection process is focused on two primary tasks: 14.13 + 14.14 + * Populating namespaces 14.15 + * Identifying the names defined by each namespace 14.16 + * Recording relationships between namespaces 14.17 + * Describing the operations in each namespace 14.18 + * Identifying the names in each namespace and the attributes used by those names 14.19 + * Recording details of assignments and invocations 14.20 + 14.21 +The results of inspection are written out to [[../Cache|cache]] files, one for each module in the program. 14.22 + 14.23 +=== Program Units for Inspection === 14.24 + 14.25 +The `inspector` module performs much of the inspection work, relying on the `common` module for certain tasks, with the `modules` module providing the relevant module abstractions including those writing to and reading from cache files, and the `resolving` module completing each module's inspection. The `importer` module coordinates inspection across the whole program. 14.26 + 14.27 +=== Name Identification === 14.28 + 14.29 +Names can be introduced to a namespace via several different mechanisms: 14.30 + 14.31 + * Assignments to local names 14.32 + * References to global names 14.33 + * References to built-in names (defined in the built-in modules) 14.34 + * Importing of modules 14.35 + * Importing of names from modules 14.36 + 14.37 +The origins of names are discovered by considering local, global and built-in namespaces. Where a name is '''local''', it is defined in the same namespace. Where a name is '''global''', it is defined in the same module at the module level. Where a name is '''built-in''', it is defined in a module in the `__builtins__` package and exposed via the `__builtins__` namespace. 14.38 + 14.39 +Initial tentative identification of names will sort names into two categories: locals and external names. Global variables employed in function (or method) namespaces may not be defined when a function body is inspected, either within a module or being imported from another module, and so it is not initially possible to more specifically determine the nature of a name. 14.40 + 14.41 +These categories are later refined to distinguish between globals and built-in names as external names. The built-in origin of a name is only suggested when no locals or globals can provide the name, and the final identification of such names, plus other external names introduced as locals or globals via imports, will occur at the end of the inspection activity. Names that are unrecognised by then may be treated like unrecognised built-ins. 14.42 + 14.43 +=== Name Restrictions and Resolution === 14.44 + 14.45 +'''Static''' objects, forming the fundamental [[../Structure|structure]] of the program, expose their names through the general structure hierarchy. Classes, which are defined as part of this structure, depend on well-defined names for any base classes they employ. For example: 14.46 + 14.47 +{{{#!python numbers=disable 14.48 +class C(A): # depends on A being a name that can be resolved (and being a class) 14.49 + ... 14.50 + 14.51 +class D(A.B.C): # depends on A.B.C being resolvable (and being a class) 14.52 + ... 14.53 +}}} 14.54 + 14.55 +Base classes must be identifiable and unambiguous. Since base classes may be imported, their identification may not necessarily occur immediately, but it must be possible once all information about a program is known and when all such dependencies are resolved. 14.56 + 14.57 +=== Attribute Identification === 14.58 + 14.59 +Attributes are introduced to namespaces as follows: 14.60 + 14.61 + * For classes and modules, the definition of names defines attributes in those namespaces 14.62 + * Functions do not generally support attributes, although [[../Representations#Special_Members|representation-specific attributes]] do exist on functions 14.63 + * For instances, assignments to attributes of the special `self` name, performed in methods, define attributes in all instances of the method's class 14.64 + 14.65 +Attributes are only supported on program objects if they have been defined or '''bound''' as described above. Any attempt to access or set an attribute on an object using a name that was not determined through the above process is considered an invalid operation. Thus, augmenting the attributes available on an object (so-called "monkeypatching") is not possible. 14.66 + 14.67 +When an attribute is accessed, the location of its use is recorded and ultimately associated with assignments of the name involved, just as is done for accesses of the plain name itself, but the attribute details are subsequently collected together for each assignment or '''version''' of the name. This is discussed below. 14.68 + 14.69 +=== Inherited Attributes === 14.70 + 14.71 +In order to support inheritance, a process of propagation makes class and instance attributes available to any given class from its ancestor classes according to a depth-first traversal of a class's base class hierarchy. Thus, for each class, given the definition of its base classes, a complete collection of class and instance attribute names can be determined. The actual mechanism of propagation occurs during the consolidation phase of inspection, principally because class bases are not generally immediately identifiable upon completing the inspection of any given class. 14.72 + 14.73 +=== Name Assignments and Accesses === 14.74 + 14.75 +Each assignment of a name defines a '''version''' of the name within a namespace. The location of this definition consists of the following: 14.76 + 14.77 + * The namespace in which the assignment appears 14.78 + * The name itself 14.79 + * The version or assignment instance number 14.80 + 14.81 +When a name is used, the location of its use is recorded and is ultimately associated with the assignments that may be providing the name at that location. This permits information about the type of the name to become available for each usage location. The location of each name usage consists of the following: 14.82 + 14.83 + * The namespace in which the name appears 14.84 + * The name itself 14.85 + * The access instance number 14.86 + 14.87 +Name accesses are described as special cases of attribute accesses: where attributes would be indicated, none are specified. 14.88 + 14.89 +=== Attribute Accesses === 14.90 + 14.91 +Attribute '''accesses''' are operations involving attributes where those attributes are obtained or set in conjunction with an '''accessor''' expression. They are recorded in terms of location tuples, each describing an access as follows: 14.92 + 14.93 + * The namespace in which the access occurs 14.94 + * Any named accessor or an anonymous accessor 14.95 + * The attribute (or chain of attributes) involved (omitted for name accesses) 14.96 + * The access instance number 14.97 + 14.98 +As with name accesses, the access instance number distinguishes between different accesses employing the same details. 14.99 + 14.100 +Consider the following example: 14.101 + 14.102 +{{{{#!table 14.103 +<style="vertical-align: top"> 14.104 +{{{#!python numbers=disable 14.105 + 14.106 +def f(): 14.107 + p = ... # p version 0 14.108 + p.a # p.a access 0 14.109 + if fn().a: # anonymous a access 0 14.110 + q = ... # q version 0 14.111 + q.a # q.a access 0 14.112 + p # p access 0 14.113 + else: 14.114 + q = ... # q version 1 14.115 + q.a # q.a access 1 14.116 + q.b # q.b access 0 14.117 + p # p access 1 14.118 +}}} 14.119 +|| 14.120 +{{{#!graphviz 14.121 +//format=svg 14.122 +//transform=notugly 14.123 +digraph accesses { 14.124 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Names and accesses"]; 14.125 + edge [tooltip="Names and accesses"]; 14.126 + rankdir=TB; 14.127 + 14.128 + p [label="p = ...",style=filled,fillcolor=gold]; 14.129 + 14.130 + subgraph { 14.131 + rank=same; 14.132 + pa0expr [label="p.a",style=filled,fillcolor=cyan]; 14.133 + pa0 [label="p.a #0",style=filled,fillcolor=red]; 14.134 + } 14.135 + 14.136 + subgraph { 14.137 + rank=same; 14.138 + if [label="if",shape=ellipse]; 14.139 + fnaexpr [label="fn().a",style=filled,fillcolor=cyan]; 14.140 + fna [label="{}.a #0",style=filled,fillcolor=red]; 14.141 + } 14.142 + 14.143 + q0 [label="q = ...",style=filled,fillcolor=gold]; 14.144 + 14.145 + subgraph { 14.146 + rank=same; 14.147 + qa0expr [label="q.a",style=filled,fillcolor=cyan]; 14.148 + qa0 [label="q.a #0",style=filled,fillcolor=red]; 14.149 + } 14.150 + 14.151 + subgraph { 14.152 + rank=same; 14.153 + p0expr [label="p",style=filled,fillcolor=cyan]; 14.154 + p0 [label="p.{} #0",style=filled,fillcolor=red]; 14.155 + } 14.156 + 14.157 + else [label="else",shape=ellipse]; 14.158 + q1 [label="q = ...",style=filled,fillcolor=gold]; 14.159 + 14.160 + subgraph { 14.161 + rank=same; 14.162 + qa1expr [label="q.a #1",style=filled,fillcolor=cyan]; 14.163 + qa1 [label="q.a #1",style=filled,fillcolor=red]; 14.164 + } 14.165 + 14.166 + subgraph { 14.167 + rank=same; 14.168 + qb0expr [label="q.b",style=filled,fillcolor=cyan]; 14.169 + qb0 [label="q.b #0",style=filled,fillcolor=red]; 14.170 + } 14.171 + 14.172 + subgraph { 14.173 + rank=same; 14.174 + p1expr [label="p",style=filled,fillcolor=cyan]; 14.175 + p1 [label="p.{} #1",style=filled,fillcolor=red]; 14.176 + } 14.177 + 14.178 + p -> pa0expr -> if -> fnaexpr -> q0 -> qa0expr -> p0expr -> qb0expr; 14.179 + if -> else -> q1 -> qa1expr -> qb0expr; 14.180 + qb0expr -> p1expr; 14.181 + 14.182 + fnaexpr -> fna [dir=none,style=dashed]; 14.183 + pa0expr -> pa0 [dir=none,style=dashed]; 14.184 + qa0expr -> qa0 [dir=none,style=dashed]; 14.185 + p0expr -> p0 [dir=none,style=dashed]; 14.186 + qa1expr -> qa1 [dir=none,style=dashed]; 14.187 + qb0expr -> qb0 [dir=none,style=dashed]; 14.188 + p1expr -> p1 [dir=none,style=dashed]; 14.189 + 14.190 + p -> pa0 [dir=none]; 14.191 + p -> p0 [dir=none]; 14.192 + p -> p1 [dir=none]; 14.193 + 14.194 + q0 -> qa0 [dir=none]; 14.195 + q0 -> qb0 [dir=none]; 14.196 + 14.197 + q1 -> qa1 [dir=none]; 14.198 + q1 -> qb0 [dir=none]; 14.199 +} 14.200 +}}} 14.201 +}}}} 14.202 + 14.203 +Since the names involved may be provided by different name versions, accesses are counted independently. Meanwhile, a non-name or '''anonymous''' accessor may be involved in an access. Such anonymous accessors are independent and do not accumulate attribute usage because they potentially involve completely different objects. 14.204 + 14.205 +|| '''Namespace''' || '''Name''' || '''Attribute''' || '''Access number''' || 14.206 +|| `module.f` || `p` || `a` || 0 || 14.207 +|| `module.f` || `p` || `{}` || 0 || 14.208 +|| `module.f` || `p` || `{}` || 1 || 14.209 +|| `module.f` || `q` || `a` || 0 || 14.210 +|| `module.f` || `q` || `a` || 1 || 14.211 +|| `module.f` || `q` || `b` || 0 || 14.212 +|| `module.f` || `{}` || `a` || 0 || 14.213 + 14.214 +Accessors may be recorded using a similar location scheme. For example: 14.215 + 14.216 +|| '''Namespace''' || '''Name''' || '''Attribute''' || '''Name version''' || 14.217 +|| `module.f` || `p` || `{}` || 0 || 14.218 +|| `module.f` || `q` || `{}` || 0 || 14.219 +|| `module.f` || `q` || `{}` || 1 || 14.220 + 14.221 +Here, the attribute field is left empty and indicates that name definitions are being described. Although the data is superficially similar to name accesses, it should be remembered that accesses employ an access number whereas accessors employ a name version, with such identifiers being different things. 14.222 + 14.223 +=== Names and Attribute Usage === 14.224 + 14.225 +Within each scope, the names employed by the program and the attributes used with those names are tracked. As the path of execution diverges and converges again through control-flow constructs, the tracking attempts to maintain the '''attribute usage''' associated with each name, or specifically each '''version''' of each name. 14.226 + 14.227 +{{{{#!table 14.228 +<style="vertical-align: top"> 14.229 +{{{#!python numbers=disable 14.230 +y = ... # y version 0 14.231 +while cond0: 14.232 + if cond1: 14.233 + y.a1 14.234 + elif cond2: 14.235 + y = ... # y version 1 14.236 + y.a2 14.237 + else: 14.238 + y.a3 14.239 + 14.240 +# y version 0 is used with a1 or a3 or neither 14.241 + 14.242 +# y version 1 is used with a2 14.243 +}}} 14.244 +|| 14.245 +{{{#!graphviz 14.246 +//format=svg 14.247 +//transform=notugly 14.248 +digraph usage { 14.249 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Name and attribute tracking"]; 14.250 + edge [tooltip="Name and attribute tracking"]; 14.251 + rankdir=TB; 14.252 + 14.253 + sety0 [label="y = ...",style=filled,fillcolor=gold]; 14.254 + while [label="while cond0",shape=ellipse]; 14.255 + 14.256 + subgraph { 14.257 + rank=same; 14.258 + if [label="if cond1",shape=ellipse]; 14.259 + ya1 [label="y.a1",style=filled,fillcolor=cyan]; 14.260 + } 14.261 + 14.262 + subgraph { 14.263 + rank=same; 14.264 + elif [label="elif cond2",shape=ellipse]; 14.265 + sety1 [label="y = ...",style=filled,fillcolor=gold]; 14.266 + ya2 [label="y.a2",style=filled,fillcolor=green]; 14.267 + } 14.268 + 14.269 + subgraph { 14.270 + rank=same; 14.271 + ifelse [label="else",shape=ellipse]; 14.272 + ya3 [label="y.a3",style=filled,fillcolor=cyan]; 14.273 + } 14.274 + 14.275 + endif [label="end if",shape=ellipse]; 14.276 + 14.277 + whileelse [label="(else)",shape=ellipse]; 14.278 + 14.279 + endwhile [label="end while",shape=ellipse]; 14.280 + 14.281 + sety0 -> while -> if -> elif -> ifelse -> endif; 14.282 + if -> ya1 -> endif; 14.283 + elif -> sety1 -> ya2 -> endif; 14.284 + ifelse -> ya3 -> endif; 14.285 + endif -> while [style=dashed]; 14.286 + while -> whileelse -> endwhile; 14.287 +} 14.288 +}}} 14.289 +}}}} 14.290 + 14.291 +The outcome of such tracking should be an indication of the attribute usage with each name based on the shortest routes that names can take through the control-flow structure. Such shortest-route usage defines the minimal selection of attributes that can be considered used with a name, and thus such usage defines the broadest selection of object types that can be identified as supporting such attributes. In the above example, the following minimal selections of attributes apply: 14.292 + 14.293 +|| '''Name Version''' || '''Minimal Set of Attributes''' || '''Types''' || 14.294 +|| `y` (version 0) || ''empty set'' || ''any object'' || 14.295 +|| `y` (version 1) || `a2` || ''only objects providing the single attribute'' || 14.296 + 14.297 +The assumption is made that any attribute used with a name is always provided by the object referenced by that name and that the correct execution of the code does not rely on an attribute being absent (and thus raising an exception). By defining usage for each name, the toolchain can determine whether any type can provide the attributes used with a name, producing a compile-time error if no type supports such usage. 14.298 + 14.299 +It is possible that certain routes taken by names might define attribute usage that cannot be supported by types that do support the shortest-route usage, yet it might not be appropriate to forbid usage of such types with such names: the program logic may intend that such types do not visit the regions of the code that employ the attributes that such types cannot support. However, as a consequence, run-time tests will be required to prevent attribute accesses that are inappropriate for such types from taking place. In the above example, the following maximal selections apply: 14.300 + 14.301 +|| '''Name Version''' || '''Maximal Set of Attributes''' || '''Types''' || 14.302 +|| `y` (version 0) || `a1`, `a3` || ''only objects providing both attributes'' || 14.303 +|| `y` (version 1) || `a1`, `a2`, `a3` || ''only objects providing all three attributes'' || 14.304 + 14.305 +Tracking occurs separately in function or method namespaces and at a level combining the static namespaces in a module. This latter combination of namespaces permits the flow of global name details through class namespaces. Tracking employs special '''tracking names''' with which usage is associated, with globals and class attributes employing complete object paths to describe their names, whereas locals merely employ the plain names defined and used in local namespaces. Some examples follow: 14.306 + 14.307 +|| '''Namespace''' || '''Name''' || '''Name Scope''' || '''Tracking Name''' || 14.308 +|| `__main__` (module) || `x` || global || `__main__.x` || 14.309 +|| `__main__.C` (class) || `x` || global || `__main__.x` || 14.310 +|| `__main__.C` (class) || `y` || local (class attribute) || `__main__.C.y` || 14.311 +|| `__main__.f` (function) || `y` || local || `y` || 14.312 + 14.313 +=== Name Initialisation and Aliases === 14.314 + 14.315 +Each name version may be associated with a value upon assignment provided by an expression known as an '''initialiser'''. Such values may be used to inform the interpretation of operations on names, restricting the types involved to the initialised values. Some examples are as follows: 14.316 + 14.317 +{{{#!python numbers=disable 14.318 +x = 123 # initialiser is a constant, indicating a known type 14.319 +x = classname() # initialiser refers to a class, indicating instantiation 14.320 +}}} 14.321 + 14.322 +Names may also be assigned the values of other names, and such a name becomes an '''alias''' for such other names. Aliases can therefore be used to combine attribute usage observations across names. Other forms of aliases involve assigning attributes accessed via other names to any given name. Some examples are as follows: 14.323 + 14.324 +{{{#!python numbers=disable 14.325 +y = x # y 14.326 +y.p # p can be assumed to apply to x 14.327 +z = x.q # z can be restricted to the types deduced for x.q 14.328 +}}} 14.329 + 14.330 +Initialising values are retained for later resolution because their identities are not generally known until certain name resolution activities have taken place. 14.331 + 14.332 +=== Invocations === 14.333 + 14.334 +During inspection only limited information is available about the nature of invocations. However, some information will already be available about global names and these may be defined by classes. Thus, invocations that cause the instantiation of classes may be determined even during the inspection phase. 14.335 + 14.336 +Information about class instantiation is most useful in combination with the initialisation of names. When an assignment occurs, any initialising expression that provides enough information can be evaluated to see if it describes instantiation. If so, the nature of the instantiation can be associated with the name and used in the deduction process to constrain any usage of that name and its attributes. Such restrictions on the types associated with names are applied in the deduction phase. 14.337 + 14.338 +=== Literals and Constants === 14.339 + 14.340 +Literal values or '''literals''' are specific values that appear in the program, typically representing numbers, character strings, mappings or collections. Literal numbers and strings are effectively '''constants''', meaning that their values are unambiguously defined and can eventually be referenced using a unique identifier that applies throughout the program and which can refer to an initialised object in any generated program. Initially, they are recorded for the namespace in which they appear. For example: 14.341 + 14.342 +|| '''Namespace''' || '''Identifier''' || '''Type''' || '''Encoding''' || '''Value''' || 14.343 +|| `__main__` || `__main__.$c0` || `__builtins__.str.string` || `iso-8859-15` || `'\xc6\xd8\xc5'` || 14.344 +|| `__main__` || `__main__.$c1` || `__builtins__.int.int` || || `123` || 14.345 + 14.346 +Since values may appear again in other namespaces, a mapping is generated from such local identifiers to common global identifiers for constants. Where such an identifier is derived from the value content, this can potentially occur immediately, although in practice (and in general) such a mapping will be generated after all modules have been inspected. 14.347 + 14.348 +Collections and mappings may also be initialised using literal syntax but may contain non-constant information such as names or expressions whose values are unknown before the program is run. Such values can be represented as instantiation operations of the appropriate type in any generated program, and the instances of the type concerned can be associated with any names to which such literals are assigned. Special names are used to refer to literal types for collections and mappings. For example: 14.349 + 14.350 +|| '''Identifier''' || '''Type''' || 14.351 +|| `$Ldict` || `__builtins__.dict.dict` || 14.352 +|| `$Llist` || `__builtins__.list.list` || 14.353 +|| `$Ltuple` || `__builtins__.tuple.tuple` || 14.354 + 14.355 +Such special names merely serve as convenient placeholders for the types of any literals mentioned in a module. 14.356 + 14.357 +== Consolidating Inspection Details == 14.358 + 14.359 +As briefly discussed above, certain activities occur after information has been collected from an input program. Within each module, names that are external to namespaces are recorded in a '''name reference''' collection. These references identify unrecognised names but do not generally define their origins. Examples of name references are as follows: 14.360 + 14.361 +|| '''Name''' || '''Identity''' || 14.362 +|| `__main__.repr` || `<deferred>:__builtins__.repr` || 14.363 +|| `__main__.sys` || `<module>:sys` || 14.364 + 14.365 +In order to obtain the final identities, deferred references may need to be resolved, yielding concrete references: 14.366 + 14.367 +|| '''Name''' || '''Identity''' || 14.368 +|| `__main__.repr` || `<function>:__builtins__.identity.repr` || 14.369 +|| `__main__.sys` || `<module>:sys` || 14.370 + 14.371 +This process of resolution cannot be performed immediately with only the knowledge available in a single module. Only with all program modules loaded can such resolution occur. 14.372 + 14.373 +=== Identifying Deferred References === 14.374 + 14.375 +After all modules are loaded, and with all objects present in the program at this point, each deferred reference defined within a module should ultimately yield an identity. Any failure to do so indicates usage of a name not defined in the program. The process of identifying them by resolving what each of them points to, is illustrated by the following example: 14.376 + 14.377 +|| '''Reference''' || '''Object Path to Search''' || 14.378 +|| `<depends>:__builtins__.repr` || `__builtins__.repr` || 14.379 +|| `<depends>:__builtins__.identity.repr` || `__builtins__.identity.repr` || 14.380 +|| `<function>:__builtins__.identity.repr` || ''identification has occurred'' || 14.381 + 14.382 +Initially, the origin information from the reference is used to find the details of the referenced object. However, in this case, the details yield another deferred reference that has not yet been resolved. Consequently, the origin information must be used to find the details of the object referenced in this case, finally yielding a reference with concrete object information. Deferred references are mutated to concrete references, thus changing the nature of such references throughout the accumulated data. 14.383 + 14.384 +=== Identifying Module Dependencies === 14.385 + 14.386 +During deferred reference resolution, relationships are catalogued between modules in which deferred references are found and those providing the corresponding referenced objects. In addition, module ordering dependencies are established on the basis that some kinds of objects must be initialised before being used in other parts of the program. As a result, the modules providing such objects must themselves be initialised before the modules referencing such objects. More information on this topic can be found in the [[../Imports#Import_Sequencing|import sequencing]] documentation. 14.387 + 14.388 +=== Resolving Module Details === 14.389 + 14.390 +With all dependencies resolved, further work can be done with the details within each module. The `resolving` module provides the functionality to each module instance to perform such work. 14.391 + 14.392 + 1. Class bases can be identified. 14.393 + 1. Special names (referring to built-in objects employed by certain operations) are resolved. 14.394 + 1. Each access involving an external name is investigated to see if it provides an effectively constant object (as described below). 14.395 + 1. Invocation references are converted to provide instantiation information, if appropriate. 14.396 + 1. Initialisers are investigated to see if they provide concrete object details and can thus indicate the nature of a name. 14.397 + 1. Constant value literals are associated with the appropriate types. 14.398 + 14.399 +Thus, attribute details for each module and its classes are finalised. Meanwhile, modules that are still hidden are removed from the program. 14.400 + 14.401 +==== Investigating Accesses ==== 14.402 + 14.403 +External names will not have been resolved during the inspection process, but with information about the whole program, it becomes possible to identify names and to determine whether attribute accesses involving those names can also be identified. Thus, name references from each namespace are investigated in turn as follows. 14.404 + 14.405 +Each name referencing a static object is considered a constant access of that object. However, given a number of accompanying attributes describing an access, each attribute is added in turn, and the resulting object path identified. If the path identifies a constant object, the next attribute in the access chain is added and the resulting object path identified, and so on. The maximal collection of attributes identifying a constant object is then recorded as a constant access, with any remaining attributes in the access chain retained for traversal in a later phase. 14.406 + 14.407 +Names yielding constant accesses do not need to have their identity deduced through the application of attribute usage. 14.408 + 14.409 +==== Investigating Initialisers ==== 14.410 + 14.411 +During inspection, initialisers will have been recorded in terms of expression results such as names, invocations, readily-identifiable instances, and attribute accesses. With details of the whole program plus constant accesses having been made available, such results may be definitively identified and associated with initialised names. 14.412 + 14.413 +Alias information may also be refined at this point by identifying attribute accesses that are used to initialise names. 14.414 + 14.415 +=== Finalising Program Details === 14.416 + 14.417 +With class contents and relationships identified, class attribute inheritance and instance attribute availability can be defined. Some classes may depend on external state for their initialisation, and so additional module dependencies may be defined at this stage. The `importer` module contains the functionality to conduct these whole-program activities. 14.418 + 14.419 +== Inspection Outcome == 14.420 + 14.421 +Once inspection is complete, the available knowledge concerns the whole program and not merely individual modules or parts of the program. However, only limited efforts have been made until this point, notably in the acquisition of statically-defined object details when referencing names or attributes, to integrate the different modules. The [[../Deduction|deduction]] process is concerned with such integration.
15.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 15.2 +++ b/docs/wiki/Optimisation Tue Apr 11 21:41:22 2017 +0200 15.3 @@ -0,0 +1,157 @@ 15.4 += Optimisation = 15.5 + 15.6 +In the process of generating an output representation of a Lichen program, the process of optimisation is concerned with positioning members within program structures. Attributes are positioned in object structures, and '''object tables''' are defined so that attributes can be located in such objects. Similarly, parameters, whose positions are determined by their appearance in function and other callable signatures, have positioning information defined in '''parameter tables''' that can be used to position arguments in parameter arrays when their names are used in argument lists. 15.7 + 15.8 +Also performed during optimisation is the consolidation of constant information, initially discussed [[../Inspection#Literals_and_Constants|in the context of inspection]]. 15.9 + 15.10 +<<TableOfContents(2,3)>> 15.11 + 15.12 +== Populating Objects == 15.13 + 15.14 +A program will have objects that are classes, modules and instances, and each such object will provide a number of attributes. Each object's attributes are stored in an array. For simplicity, each attribute having a given name will always be positioned at the same place in any array provided by an object featuring an attribute with that name. Even different object types will use the same position. 15.15 + 15.16 +Consider an attribute called `elephant` provided by a number of types: 15.17 + 15.18 +|| '''Type''' ||<-5> '''Attributes''' || 15.19 +|| class C || `__class__` || `ant` || `dog` ||<style="background-color: #ddd"> `elephant` || `cat` || 15.20 +|| class D || `__class__` || `ant` || `duck` ||<style="background-color: #ddd"> `elephant` || `horse` || 15.21 +|| class E || `__class__` || `ant` || `dog` || `giraffe` || `horse` || 15.22 +|| module M || `__class__` || `rhino` || `hippo` ||<style="background-color: #ddd"> `elephant` || `zebra` || 15.23 + 15.24 +Where `elephant` is provided as an attribute, it will always appear in the same position - in this case as the fourth attribute - in any object attribute array. 15.25 + 15.26 +Consequently, any operation involving an object of unknown identity that employs `elephant` can employ the same position information to determine whether the attribute is supported and to retrieve or modify its value. Where an object does not support `elephant`, it may use the same attribute position for another attribute. Determining whether objects support attributes is done using tables, described below. 15.27 + 15.28 +It should be noted that the positions of attributes in object structures are the same as the positions of attribute identifiers in object tables. With attribute positions allocated, table definition is trivial. 15.29 + 15.30 +=== Allocation Considerations === 15.31 + 15.32 +Such common positioning of attributes demands a degree of coordination between objects. If `elephant` is positioned in a given place in one object, then given the constraint of it only ever being positioned in a single location for all objects, it may not then be positioned in a different place in another object. Thus, where two attributes co-exist in an object, their positions must be different and will affect all objects supporting those attributes, regardless of whether they support them both. For example: 15.33 + 15.34 +|| '''Type''' ||<-5> '''Attributes''' || 15.35 +|| class C || `__class__` ||<style="background-color: #ddd"> `ant` || `dog` ||<style="background-color: #ddd"> `elephant` || `cat` || 15.36 +|| class D || `__class__` ||<style="background-color: #ddd"> `ant` || `duck` ||<style="background-color: #ddd"> `elephant` || `horse` || 15.37 +|| class E || `__class__` ||<style="background-color: #ddd"> `ant` || `dog` || `giraffe` || `horse` || 15.38 +|| module M || `__class__` || `rhino` || `hippo` ||<style="background-color: #ddd"> `elephant` || `zebra` || 15.39 +|| module N || `__class__` || || ||<style="background-color: #ddd"> `elephant` || `zebra` || 15.40 + 15.41 +Here, module N still positions `elephant` in the fourth position. If `elephant` were moved to the second position, it would conflict with `ant` or `rhino` in those objects supporting those attributes. Such coordination therefore has an impact on allocation strategies. Care has to be taken to allocate attributes in such a way that small objects (having few attributes) do not have their attributes positioned far from the start of their attribute arrays, because failing to do so effectively makes those objects large, inefficiently-represented objects. 15.42 + 15.43 +A reasonable solution is to consider the following when allocating attribute positions: 15.44 + 15.45 + * Attribute frequency (or ubiquity) 15.46 + * Object size 15.47 + * Object kind (whether the object is a class, module or instance) 15.48 + 15.49 +More frequently-occurring (or ubiquitous) attributes may need to appear in both large and small objects and should probably be allocated in lower positions (`__class__` being an extreme example of needing to be allocated for all objects). Attributes featured in small objects should also be given priority for lower positions due to the desirability of keeping such objects small. Meanwhile, classes and modules only appear once in a program, whereas there may be countless instances allocated during the lifetime of a program's execution. Therefore, attributes featured in instances should have priority over other attributes for lower positions in object structures. 15.50 + 15.51 +== Populating Signatures == 15.52 + 15.53 +The positions of parameters in callable signatures are determined by the definitions of those signatures in the source program. When callables are invoked using an argument list, arguments that are not specified using keywords are merely copied into the parameter array in the same position. However, keyword arguments will need to have their positions looked up in the appropriate parameter table for the callable involved. 15.54 + 15.55 +Consequently, no allocation needs to be performed on the parameters themselves: they have specific positions for each callable. However, just as attribute names must yield positions when accessing attributes on objects, a similar mechanism that takes parameter names and yields positions must be provided. It is instead the positions of parameter details that must be allocated in structures to be consulted as if parameter names were attribute names, with attributes providing parameter position details. 15.56 + 15.57 +Consider the following functions: 15.58 + 15.59 +{{{#!python numbers=disable 15.60 +def f(a, b, c, d): 15.61 + ... 15.62 + 15.63 +def g(p, q, r): 15.64 + ... 15.65 + 15.66 +def h(d, p, v): 15.67 + ... 15.68 +}}} 15.69 + 15.70 +In order to find the position of each parameter using its name, the following table could be provided: 15.71 + 15.72 +|| '''Table''' ||<-4> '''Parameters''' || 15.73 +|| function f || `a` at 1 || `b` at 2 || `c` at 3 || `d` at 4 || 15.74 +|| function g || `q` at 2 ||<style="background-color: #ddd"> `p` at 1 || `r` at 3 || || 15.75 +|| function h || `v` at 2 ||<style="background-color: #ddd"> `p` at 2 || || `d` at 1 || 15.76 + 15.77 +Such a table behaves like an object structure (or an object table) with parameters acting like attributes in such a structure. Here, the attributes yield the positions of parameters within parameter arrays. Note how `p` always resides in the same location but may yield different positions depending on the actual callable involved (as is also observable with `d`). 15.78 + 15.79 +=== Allocation Considerations === 15.80 + 15.81 +Parameter table allocation involves similar considerations to those influencing object table allocation. In order to keep parameter tables small, more frequently appearing parameters should be positioned earlier in arrays. A specific consideration of importance is that of the number of tables generated. There may be many callables with the same signature, and such callables therefore do not need their own parameter tables since they will merely be duplicates. An attempt is therefore made to identify distinct signatures and to generate parameter tables only for these signatures, instead of providing a one-to-one mapping between callables and tables. 15.82 + 15.83 +== Populating Tables == 15.84 + 15.85 +With names allocated to positions in each kind of table, the straightforward task of generating such tables proceeds as follows. 15.86 + 15.87 +=== Object Tables and Attribute Codes === 15.88 + 15.89 +Object tables consist of locations directly corresponding to structure member locations. Each location in a table will correspond to a specific name, but since potentially many names may have the same position, a table must provide identifying details for the name that is actually supported. 15.90 + 15.91 +In object tables, such identifying details take the form of '''attribute codes'''. Each attribute name is mapped to a distinct identifier, and upon consulting an object table, the lookup process must read the stored attribute code and compare it to the code for the attribute it intends to access. If these codes match, then it can be assumed that the object involved supports the named attribute. Otherwise, a different attribute (or even no attribute at all) resides at that location in the object structure, making the access inappropriate. 15.92 + 15.93 +A more comprehensive object table resembles the following: 15.94 + 15.95 +|| '''Type''' ||<-5> '''Attributes (Codes)''' || 15.96 +|| class C || `__class__` (1) || `ant` (2) || `dog` (4) || `elephant` (6) || `cat` (3) || 15.97 +|| class D || `__class__` (1) || `ant` (2) || `duck` (5) || `elephant` (6) || `horse` (9) || 15.98 +|| class E || `__class__` (1) || `ant` (2) || `dog` (4) || `giraffe` (7) || `horse` (9) || 15.99 +|| module M || `__class__` (1) || `rhino` (10) || `hippo` (8) || `elephant` (6) || `zebra` (11) || 15.100 +|| module N || `__class__` (1) || || || `elephant` (6) || `zebra` (11) || 15.101 + 15.102 +The following attribute codes would be employed: 15.103 + 15.104 +|| '''Attribute Name''' || '''Attribute Code''' || 15.105 +|| `__class__` || 1 || 15.106 +|| `ant` || 2 || 15.107 +|| `cat` || 3 || 15.108 +|| `dog` || 4 || 15.109 +|| `duck` || 5 || 15.110 +|| `elephant` || 6 || 15.111 +|| `giraffe` || 7 || 15.112 +|| `hippo` || 8 || 15.113 +|| `horse` || 9 || 15.114 +|| `rhino` || 10 || 15.115 +|| `zebra` || 11 || 15.116 + 15.117 +=== Parameter Tables and Parameter Codes === 15.118 + 15.119 +Parameter tables consist of locations yielding parameter position information. Each location in a table will correspond to a particular name, but since potentially many names may have the same position, a table must provide identifying details for the name that is actually supported. 15.120 + 15.121 +Just as with object tables, in parameter tables, such identifying details take the form of '''parameter codes'''. Each parameter name is mapped to a distinct identifier, and upon consulting a parameter table, the lookup process must read the stored parameter code and compare it to the code for the parameter it is attempting to position in a parameter list. If these codes match, then it can be assumed that the signature supports the named parameter. Otherwise, a different parameter (or even no parameter at all) resides at that location in the parameter table, making any attempt to set such a parameter inappropriate. 15.122 + 15.123 +Since parameter tables provide both identifying information and parameter positions, a more comprehensive parameter table resembles the following: 15.124 + 15.125 +|| '''Table''' ||<-4> '''Parameters (Codes)''' || 15.126 +|| function f || `a` at 1 (1) || `b` at 2 (2) || `c` at 3 (3) || `d` at 4 (4) || 15.127 +|| function g || `q` at 2 (6) ||<style="background-color: #ddd"> `p` at 1 (5) || `r` at 3 (7) || || 15.128 +|| function h || `v` at 2 (8) ||<style="background-color: #ddd"> `p` at 2 (5) || || `d` at 1 (4) || 15.129 + 15.130 +The following parameter codes would be employed: 15.131 + 15.132 +|| '''Parameter Name''' || '''Parameter Code''' || 15.133 +|| `a` || 1 || 15.134 +|| `b` || 2 || 15.135 +|| `c` || 3 || 15.136 +|| `d` || 4 || 15.137 +|| `p` || 5 || 15.138 +|| `q` || 6 || 15.139 +|| `r` || 7 || 15.140 +|| `v` || 8 || 15.141 + 15.142 +== Consolidating Constants == 15.143 + 15.144 +With knowledge about all constants used in a program, it becomes possible to prepare a catalogue of distinct constants and to assign each of them a unique name for convenient referencing in the generated program code. All constants are inspected in turn and a content digest prepared for each of them, establishing a mapping from values to such digests which act as global identifiers. Since local names are associated with constants, a mapping is also established from each local name to the corresponding global identifier. 15.145 + 15.146 +Considering the [[../Inspection#Literals_and_Constants|previous example]], the following mappings would be established, from values to global identifiers: 15.147 + 15.148 +|| '''Type''' || '''Encoding''' || '''Value''' || '''Global Identifier (Content Digest)''' || 15.149 +|| `__builtins__.str.string` || `iso-8859-15` || `'\xc6\xd8\xc5'` || `OeyoRfPI__XqwJcPgTDTItX9v__OM` || 15.150 +|| `__builtins__.int.int` || || `123` || `qPRjUheGUngBhhVaHVqYNbBu4m8` || 15.151 + 15.152 +And from local names or identifiers to global identifiers: 15.153 + 15.154 +|| '''Identifier''' || '''Global Identifier (Content Digest)''' || 15.155 +|| `__main__.$c0` || `OeyoRfPI__XqwJcPgTDTItX9v__OM` || 15.156 +|| `__main__.$c1` || `qPRjUheGUngBhhVaHVqYNbBu4m8` || 15.157 + 15.158 +== Optimisation Products == 15.159 + 15.160 +The optimisation process should produce catalogues of attribute and parameter codes plus positioning information for object tables, object structures and parameter tables. It should also produce a catalogue of distinct constant identifiers.
16.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 16.2 +++ b/docs/wiki/Prerequisites Tue Apr 11 21:41:22 2017 +0200 16.3 @@ -0,0 +1,7 @@ 16.4 += Prerequisites = 16.5 + 16.6 +The software currently requires Python to operate. All recent development has been done using Python 2.7 which should be available for most operating system distributions. In future, Lichen will hopefully be compiled using its own toolchain and not require Python. See the Debian `python-minimal` or `python` packages or their equivalents. 16.7 + 16.8 +To produce executable programs, a C compiler and GNU make are required. The GNU C compiler, GCC, has been used for development and testing, and it should be available in various forms for most operating system distributions. Indeed, cross-compiling form of GCC has been used to test the toolchain output. See the Debian `gcc` and `make` packages or their equivalents. 16.9 + 16.10 +As Free Software, priority has been given to supporting Lichen on Free Software platforms, with Debian GNU/Linux being the development platform. Proprietary platforms are neither supported nor recommended.
17.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 17.2 +++ b/docs/wiki/Representations Tue Apr 11 21:41:22 2017 +0200 17.3 @@ -0,0 +1,149 @@ 17.4 += Representing Program Objects = 17.5 + 17.6 +Certain representations have been chosen for program objects that attempt to support efficient access to those objects during the execution of a program. 17.7 + 17.8 +<<TableOfContents(2,2)>> 17.9 + 17.10 +== Attributes == 17.11 + 17.12 +The principal means of referring to an object in a program is by using an '''attribute''', having this name because it is the representation of an attribute in classes, instances and modules. Each attribute can hold a reference to an object, known as the '''value''', or other kinds of data: 17.13 + 17.14 +{{{#!graphviz 17.15 +//format=svg 17.16 +//transform=notugly 17.17 +digraph attributes { 17.18 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Attributes"]; 17.19 + edge [fontsize="13.0",fontname="Helvetica",tooltip="Attributes"]; 17.20 + rankdir=TB; 17.21 + 17.22 + attrA [label="attribute | { value |<value> reference to object }",shape=record]; 17.23 + obj1 [label="<main> object | { attr1 | value } | { attr2 | value } | ...",shape=record]; 17.24 + 17.25 + attrB [label="attribute | { intvalue |<intvalue> 12345 }",shape=record]; 17.26 + 17.27 + attrA:value -> obj1:main; 17.28 +} 17.29 +}}} 17.30 + 17.31 +Although a value indicates a specific object of interest for an attribute, if the object is callable then additional '''context''' information may be required to call the object. Such context information is not stored in an attribute record but is instead obtained from the object itself, if appropriate. 17.32 + 17.33 +=== Integer Representation === 17.34 + 17.35 +The `intvalue` member of the attribute structure is employed instead of the `value` member to store integer values. Since `value` normally holds a pointer, and since pointers are often aligned to certain address boundaries on many modern platforms (usually four-byte boundaries on 32-bit platforms, eight-byte boundaries on 64-bit platforms, two-byte boundaries on platforms with 16-bit addressing), the lowest significant bit (bit 0) will typically be zero for a valid pointer. Consequently, by setting bit 0 to 1, other data can be stored in the remaining bits and be distinguished from pointer information. Obviously, operations on attributes first need to test whether the `value` member or the `intvalue` member is in use by testing bit 0. 17.36 + 17.37 +Thus, integers are transformed and stored directly within attributes, and they are therefore passed around by value. When an attribute of an integer needs to be accessed, the operation usually providing the `value` member, thus obtaining an instance under normal circumstances, instead provides the address of a common integer instance. This instance may then provide instance attributes, whose values will be the same for all integers, or class attributes through a reference to the integer (`int`) class. 17.38 + 17.39 +Each method provided by `int`, when called, will be given the original attribute for which the method was obtained as its context. In such methods, operations on the context via `self` will either involve the propagation of the attribute to other functions or methods or attribute accesses on `self`, yielding common instance attributes or class attributes as already described. Only native functions will attempt to interpret the attribute in a different way, decoding the representation, performing arithmetic or logical operations, and encoding the result in a new attribute. 17.40 + 17.41 +== Contexts and Wrappers == 17.42 + 17.43 +The context of an object is of significance if that object is callable. For example, an instance may permit access to a method via an attribute. Since the method will be callable, and since the method is accessed via the instance, the context of that method will be the instance. In order to retain both context and value information, a '''wrapper''' may be created. 17.44 + 17.45 +{{{#!graphviz 17.46 +//format=svg 17.47 +//transform=notugly 17.48 +digraph wrappers { 17.49 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Wrappers"]; 17.50 + edge [fontsize="13.0",fontname="Helvetica",tooltip="Wrappers"]; 17.51 + rankdir=TB; 17.52 + 17.53 + inst [label="<main> instance | { attr1 |<attr1> reference to method } | { attr2 | value } | ...",shape=record]; 17.54 + method [label="<main> method | ...",shape=record]; 17.55 + wrapper [label="<main> wrapper | { __context__ |<context> reference to instance } | { __value__ |<value> reference to method } | ...",shape=record]; 17.56 + 17.57 + inst:attr1 -> method:main; 17.58 + wrapper:context -> inst:main; 17.59 + wrapper:value -> method:main; 17.60 +} 17.61 +}}} 17.62 + 17.63 +The context is not always the accessor of the object - in this case, the instance - because the object may already be a wrapper with its own context. 17.64 + 17.65 +== Objects == 17.66 + 17.67 +Classes, instances and modules are objects, and all of these kinds of objects provide metadata describing the type of each object, together with a collection of attributes forming the contents of such objects. 17.68 + 17.69 +{{{#!graphviz 17.70 +//format=svg 17.71 +//transform=notugly 17.72 +digraph objects { 17.73 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Objects"]; 17.74 + edge [fontsize="13.0",fontname="Helvetica",tooltip="Objects"]; 17.75 + rankdir=TB; 17.76 + 17.77 + instC [label="<main> instance of C | { 0 | reference to\ninstance table } | { __class__ |<cls> reference\nto C } | { a | value } | { b | value } | ...",shape=record]; 17.78 + clsC [label="<main> class C | { class identifier | reference to\nclass table } | { __class__ |<cls> reference\nto type } | { __fn__ | instantiator\nreference } | { __args__ | reference to\nparameter table } | { f | <f> reference\nto f } | { g | value } | ...",shape=record]; 17.79 + methodF [label="<main> function f | { 0 | reference to\ninstance table } | { __class__ |<cls> reference\nto function } | { __fn__ | method\nreference } | { __args__ | reference to\nparameter table } | ...",shape=record]; 17.80 + function [label="<main> class function | { class identifier | reference to\nclass table } | { __class__ |<cls> reference\nto type } | ...",shape=record]; 17.81 + type [label="<main> class type | { class identifier | reference to\nclass table } | { __class__ |<cls> reference\nto type } | ...",shape=record]; 17.82 + 17.83 + instC:cls -> clsC:main; 17.84 + clsC:cls -> type:main; 17.85 + clsC:f -> methodF:main; 17.86 + methodF:cls -> function:main; 17.87 +} 17.88 +}}} 17.89 + 17.90 +Classes are represented by structures whose members reference class attributes and class metadata (the class table), as well as incorporating invocation metadata (the `__args__` and `__fn__` special attributes). 17.91 + 17.92 +Instances are represented by structures whose members reference instance attributes (including `__class__` which indicates the class instantiated by a given instance) and instance metadata (the instance table), as well as incorporating invocation metadata (the `__args__` and `__fn__` special attributes). 17.93 + 17.94 +Functions are instances of a general function type that does not permit additional, general instance attributes. However, function instance structures may have extra members corresponding to default parameter values. Access to such extra members is performed using the minimum and maximum values provided via `__args__` and with knowledge of the number of declared instance attributes indicated by the instance table for the function type. 17.95 + 17.96 +Modules are represented by structures whose members reference module attributes and module metadata (the module table). 17.97 + 17.98 +== Special Members == 17.99 + 17.100 +All object representations support the following special members describing the invocation properties of each object: 17.101 + 17.102 +{{{#!table 17.103 +`__args__` || the minimum number of arguments supported in an invocation and a reference to the parameter table for the object 17.104 +== 17.105 +`__fn__` || a reference to a native function containing the actual code run when calling the object 17.106 +}}} 17.107 + 17.108 +Classes are invoked in order to create instances of each class ('''instantiation'''). Instances may support invocation by providing a `__call__` method. Functions are supported by instances of a general function class. Modules are generally not callable and will not actually provide these special members in practice. 17.109 + 17.110 +All object representations support information about their type: 17.111 + 17.112 +{{{#!table 17.113 +`__class__` 17.114 +|| the class of the object: instances refer to their classes, classes refer to the `type` class, functions are instances that refer to the `function` class, modules refer to the `module` class 17.115 +}}} 17.116 + 17.117 +Certain kinds of object support other descriptive attributes: 17.118 + 17.119 +{{{#!table 17.120 +`__name__` || the name of a class or a function 17.121 +== 17.122 +`__parent__` || the parent scope of a class or a function 17.123 +}}} 17.124 + 17.125 +Objects supported by native, system-level functionality require a means of retaining information in special attributes: 17.126 + 17.127 +{{{#!table 17.128 +`__data__` || private data manipulated at the native level 17.129 +}}} 17.130 + 17.131 +Strings support special annotation attributes that permit their use in dynamically resolving attributes: 17.132 + 17.133 +{{{#!table 17.134 +`__key__` || the code and position of the attribute whose name is represented by the string 17.135 +}}} 17.136 + 17.137 +Such "key" attributes provide information that can be used to inspect an object table and to test for the presence of an attribute. With such information, the `getattr` and `hasattr` functions can be supported. 17.138 + 17.139 +== Attribute Tables == 17.140 + 17.141 +In order to provide information about the attributes supported by each object, the structure of each object will reference a table containing entries describing these supported attributes. The size of this table is declared within the table structure, and for each position in the table an entry corresponding to the same position within an actual object structure describes the nature of the attribute at that position. 17.142 + 17.143 +== Parameter Tables == 17.144 + 17.145 +In order to support argument validation and keyword arguments in invocations, a structure is referenced by `__args__` that indicates... 17.146 + 17.147 + * The minimum number of parameters supported by a callable 17.148 + * The maximum number of parameters supported by a callable 17.149 + * The size of the table describing the parameters 17.150 + * A table of entries with each entry indicating the nature of a parameter (in effect, which name it uses, albeit as a generated code instead of a string) and the position of the parameter in any argument list prepared for an invocation 17.151 + 17.152 +Parameter tables only need to be consulted at run-time if the nature of a callable is undetermined. By supporting a uniform interface, the arguments used in an invocation can be tested against the description provided by `__args__` and the table.
18.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 18.2 +++ b/docs/wiki/Restarted Tue Apr 11 21:41:22 2017 +0200 18.3 @@ -0,0 +1,282 @@ 18.4 += Lichen Restarted = 18.5 + 18.6 +Originally, lots of work was being put in to support various Python features that are arguably superfluous. The realisation was had that a lot of effort was being made for little practical benefit by trying to support things that are, [[../Design|in the larger picture]], not that important. Consequently, Lichen was refocused on a smaller set of more useful Python features. 18.7 + 18.8 +This document is of historical interest only, with the [[../Design|design]] and other documents attempting to communicate the results of this restarting effort. Some obsolete information is therefore preserved below. For example, attributes hold context information in the diagrams, but context information is now held in wrappers or is maintained separately within programs. 18.9 + 18.10 +Objectives: 18.11 + 18.12 + * Individual module inspection 18.13 + * No importing triggered during module inspection 18.14 + * All unresolved external references are set to `<depends>` 18.15 + * Hierarchical module namespaces are not exposed in programs 18.16 + * Modules are independent: package hierarchies are not traversed when importing 18.17 + * Nested scopes will be dropped 18.18 + * If expressions and comprehensions will be dropped 18.19 + * `self` is a reserved name and is optional in method parameter lists 18.20 + * Unbound methods must be bound using a special function taking an instance 18.21 + * Functions assigned to classes do not become unbound methods 18.22 + 18.23 +== Names == 18.24 + 18.25 +Names are locals, globals or built-ins. (Special names exist internally to support certain operations.) 18.26 + 18.27 +Locals inside functions are dynamic; locals outside functions are static, as are module globals. Built-ins are defined statically in the `__builtins__` package. 18.28 + 18.29 +== Imports == 18.30 + 18.31 +Imports provide access to external references. The "leaf" module in a module path is the module returned by the statement. 18.32 + 18.33 +Indicate that module "compiler" is accessed via compiler... 18.34 +{{{#!python numbers=disable 18.35 +import compiler 18.36 +}}} 18.37 + 18.38 +Indicate that module "compiler" is accessed via comp... 18.39 +{{{#!python numbers=disable 18.40 +import compiler as comp 18.41 +}}} 18.42 + 18.43 +Indicate that module "compiler.ast" is accessed via ast; module "compiler.transformer" is accessed via tr... 18.44 +{{{#!python numbers=disable 18.45 +import compiler.ast as ast, compiler.transformer as tr 18.46 +}}} 18.47 + 18.48 +Import compiler.ast, access Function... 18.49 +{{{#!python numbers=disable 18.50 +from compiler.ast import Function 18.51 +}}} 18.52 + 18.53 +Import compiler.ast, access Function as F... 18.54 +{{{#!python numbers=disable 18.55 +from compiler.ast import Function as F 18.56 +}}} 18.57 + 18.58 +This causes some semantic differences with Python, with the most significant one being the following: 18.59 + 18.60 +{{{{#!table 18.61 +'''Python''' || '''Lichen''' 18.62 +== 18.63 +<style="vertical-align:top"> 18.64 + 18.65 +Import compiler, import compiler.ast, set ast on compiler... 18.66 +{{{#!python numbers=disable 18.67 +import compiler.ast 18.68 +}}} 18.69 +...returning compiler 18.70 + 18.71 +|| 18.72 +<style="vertical-align:top"> 18.73 + 18.74 +Import compiler.ast... 18.75 +{{{#!python numbers=disable 18.76 +import compiler.ast 18.77 +}}} 18.78 +...returning compiler.ast as ast 18.79 + 18.80 +}}}} 18.81 + 18.82 +Some statements can be rewritten to achieve the same effect: 18.83 + 18.84 +{{{{#!table 18.85 +'''Python''' || '''Lichen''' 18.86 +== 18.87 +<style="vertical-align:top"> 18.88 + 18.89 +Import compiler, access ast as submodule... 18.90 +{{{#!python numbers=disable 18.91 +from compiler import ast 18.92 +}}} 18.93 + 18.94 +|| 18.95 +<style="vertical-align:top"> 18.96 + 18.97 +Import compiler.ast... 18.98 +{{{#!python numbers=disable 18.99 +import compiler.ast 18.100 +}}} 18.101 +...returning compiler.ast as ast 18.102 + 18.103 +}}}} 18.104 + 18.105 +Other kinds of import are not directly possible with Lichen. For example: 18.106 + 18.107 +Import all names from compiler.ast... 18.108 +{{{#!python numbers=disable 18.109 +from compiler.ast import * 18.110 +}}} 18.111 + 18.112 +Some notes: 18.113 + 18.114 + * Names not defined in a module and not declared in import statements are unresolved 18.115 + * Modules are identified during inspection but are not loaded 18.116 + * Instead, modules are added to a list and are imported later 18.117 + * Names imported from modules are set to `<depends>` (since the contents of modules will not generally be known) 18.118 + * Names are resolved in a later activity 18.119 + 18.120 +== Self == 18.121 + 18.122 +In Python: 18.123 + 18.124 + * The `self` name provides access to the instance associated with a method 18.125 + * The instance is supplied by a "context", initialised when a method is obtained from an instance (or through other attribute accesses) 18.126 + * Upon invocation, any instance context must be assigned to the `self` parameter, provided the callable is a method 18.127 + * Meanwhile, any non-instance context is not assigned to the `self` parameter, which should be provided explicitly for class-accessed methods 18.128 + * Plain functions never expose `self` or have `self` initialised, even if they have been assigned to an instance 18.129 + 18.130 +Apart from tests for the nature of the context and the callable, the argument list is effectively variable. 18.131 + 18.132 +With `self` as a ubiquitous, hidden parameter: 18.133 + 18.134 + * The `self` name still provides access to the instance associated with a method 18.135 + * The instance is still supplied by a "context", initialised when a method is obtained from an instance (or through other attribute accesses) 18.136 + * Upon invocation, `self` is included in the argument list regardless of the nature of the callable 18.137 + * Class-accessed methods would have their class as context, following from the above, but `self` may not refer to a class: it must be an instance 18.138 + * To combine class-accessed methods with instance contexts, a special function is employed 18.139 + 18.140 +The argument list for each callable thus remains static, at a cost of allocating an extra argument that may not be used. (Various calling conventions for certain processor architectures employ potentially unused registers, anyway.) Note that a callable may support defaults, however, and thus any argument list may need extending to include default values for parameters without corresponding arguments. 18.141 + 18.142 +{{{{#!table 18.143 +'''Python''' || '''Without self''' 18.144 +== 18.145 +<style="vertical-align:top"> 18.146 + 18.147 +{{{#!python numbers=disable 18.148 +inst.method(x, y, z) 18.149 +# -> inst.method(inst, x, y, z) 18.150 + 18.151 +def method(self, a, b, c): 18.152 + # self = inst; a = x; b = y; c = z 18.153 +}}} 18.154 + 18.155 +|| 18.156 +<style="vertical-align:top"> 18.157 + 18.158 +{{{#!python numbers=disable 18.159 +inst.method(x, y, z) 18.160 +# -> inst.method(inst, x, y, z) 18.161 + 18.162 +def method(a, b, c): 18.163 + # self = inst; a = x; b = y; c = z 18.164 +}}} 18.165 + 18.166 +== 18.167 +<style="vertical-align:top"> 18.168 + 18.169 +{{{#!python numbers=disable 18.170 +cls.method(self, x, y, z) 18.171 + 18.172 +def method(self, a, b, c): 18.173 + # parameters = arguments 18.174 +}}} 18.175 + 18.176 +|| 18.177 +<style="vertical-align:top"> 18.178 + 18.179 +{{{#!python numbers=disable 18.180 +f = get_using(cls.method, self) 18.181 +# context of f = self; value of f = cls.method 18.182 +f(x, y, z) 18.183 +# -> f(context of f, x, y, z) 18.184 + 18.185 +def method(a, b, c): 18.186 + # self = context of f = self; a = x; b = y; c = z 18.187 +}}} 18.188 + 18.189 +}}}} 18.190 + 18.191 +To avoid usage of `self` in undefined ways, only methods are able to use `self` and are not allowed to redefine it. Consequently, when invoking a callable, the context is set (where the callable is unknown until run-time; it is not set if compile-time knowledge indicates that it is not needed), and in situations where `self` is not permitted, the context is therefore safely ignored. Meanwhile, methods are always supplied with a context compatible with `self`. 18.192 + 18.193 +|| '''Callable''' || '''self''' || '''Remarks''' || 18.194 +|| Class || context || context discarded and replaced by allocated instance || 18.195 +|| Function || null || `self` not permitted, context ignored || 18.196 +|| Function (stored on class) || class as context || `self` not permitted, context ignored || 18.197 +|| Function (stored on instance) || instance as context || `self` not permitted, context ignored || 18.198 +|| Instance || instance as context || `self` set to instance || 18.199 +|| Method (via class) || class as context || method not called (see "unbound methods") || 18.200 +|| Method (via instance) || instance as context || `self` set to instance || 18.201 + 18.202 +Note that the treatment of functions stored on classes differs from Python. In Python, such functions would become unbound methods (see below) and would employ their first parameter as an effective `self` parameter (regardless of name). 18.203 + 18.204 +== Unbound Methods == 18.205 + 18.206 +Since methods acquired directly from classes ("unbound methods" in Python) are meant to be combined with an instance as context (using the `get_using` function), they must be uncallable until combined with the appropriate context, yet the same methods when acquired via instances ("bound methods" in Python) must be immediately callable. 18.207 + 18.208 +To support the two different states of methods, the principal structure of a class has attributes referencing uncallable versions of its methods. Meanwhile, such uncallable methods reference callable versions and when instances are employed to access the class attributes, it is these callable versions that are retrieved. For example: 18.209 + 18.210 +{{{#!graphviz 18.211 +//format=svg 18.212 +//transform=notugly 18.213 +digraph structures { 18.214 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Instance and class structures"]; 18.215 + edge [fontsize="13.0",fontname="Helvetica",tooltip="Instance and class structures"]; 18.216 + rankdir=TB; 18.217 + 18.218 + instanceC [label="<main> instance of C |{ context of a | value of a }|{context of b | value of b }",shape=record]; 18.219 + classC [label="<main> class C |{ context of m | <m> value of m }|{ context of n | <n> value of n}",shape=record]; 18.220 + callables [label="<m> m\ncallable |<n> n\ncallable",shape=record]; 18.221 + uncallables [label="<m> m\nuncallable |<n> n\nuncallable",shape=record]; 18.222 + 18.223 + instanceC:main -> classC:main; 18.224 + classC:m -> uncallables:m [label="C.m",style=dashed]; 18.225 + classC:n -> uncallables:n [label="C.n",style=dashed]; 18.226 + uncallables:m -> callables:m [label="get_using(C.m, instance)",style=dashed]; 18.227 + uncallables:n -> callables:n [label="get_using(C.n, instance)",style=dashed]; 18.228 +} 18.229 +}}} 18.230 + 18.231 +The precise structure usage is as follows: 18.232 + 18.233 +{{{#!graphviz 18.234 +//format=svg 18.235 +//transform=notugly 18.236 +digraph methods { 18.237 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Method structures"]; 18.238 + edge [fontsize="13.0",fontname="Helvetica",tooltip="Method structures"]; 18.239 + rankdir=TB; 18.240 + 18.241 + classC [label="<main> class C | { context of m | <mvalue> uncallable for m } | ...",shape=record]; 18.242 + uncallableattr [label="attr | { <context> C | <value> uncallable for m }",shape=record]; 18.243 + callableattr [label="attr | { <context> instance | <value> callable for m }",shape=record]; 18.244 + uncallable [label="<main> uncallable for m |{ __fn__ | <b> bound method reference | <fn> unbound method routine }|{ __args__ | minimum #parameters | <ptable> parameter table reference }",shape=record]; 18.245 + callable [label="<main> callable for m |{ __fn__ | 0 | <fn> bound method routine }|{ __args__ | minimum #parameters | <ptable> parameter table reference }",shape=record]; 18.246 + ptable [label="<main> parameter table for m | ...",shape=record]; 18.247 + 18.248 + classC:mvalue -> uncallableattr [label="C.m",style=dashed]; 18.249 + classC:mvalue -> uncallable:main; 18.250 + uncallableattr:value -> uncallable:main; 18.251 + uncallableattr -> callableattr [label="get_using(C.m, instance)",style=dashed]; 18.252 + uncallable:b -> callable:main; 18.253 + callableattr:value -> callable:main; 18.254 + uncallable:ptable -> ptable:main; 18.255 + callable:ptable -> ptable:main; 18.256 +} 18.257 +}}} 18.258 + 18.259 +Callable methods provide a reference to a callable routine in its special callable member, just as functions and classes do. Uncallable methods populate the callable member with a reference to an error routine. Thus, any attempt to call an uncallable method would cause the error routine to be invoked. In addition, uncallable methods reference the corresponding callable methods so that the callable methods can be found and referenced. 18.260 + 18.261 +|| '''Accessor''' || '''Provider''' || '''Attribute''' || '''Context''' || '''Summary''' || 18.262 +||<|4> Instance ||<|4> Instance || Function || ''not used'' ||<|6> Preserve context || 18.263 +|| Bound method || Original instance || 18.264 +|| Unbound method || Providing class || 18.265 +|| Other || Same as value || 18.266 +||<|4> Instance ||<|4> Class || Function || ''not used'' || 18.267 +|| Bound method || Original instance || 18.268 +|| Unbound method || Accessing instance, if compatible || Test and replace context || 18.269 +|| Other || Same as value ||<|5> Preserve context || 18.270 +||<|4> Class ||<|4> Class || Function || ''not used'' || 18.271 +|| Bound method || Original instance || 18.272 +|| Unbound method || Providing class || 18.273 +|| Other || Same as value || 18.274 + 18.275 +When obtaining an unbound method from an instance attribute, the context of the method attribute is provided. Indeed, the context is always preserved when accessing instance attributes. 18.276 + 18.277 +When obtaining an unbound method from a class attribute via an instance, the context of the method attribute is tested against the accessing instance. If compatible, an attribute is copied containing the instance as context and a callable method reference as value. 18.278 + 18.279 +When obtaining an unbound method from a class attribute, the context of the method attribute is provided. Indeed, the context is always preserved when accessing class attributes directly. 18.280 + 18.281 +When combining an unbound method obtained from a class with an instance using `get_using`, the context of the method attribute is tested against the supplied instance. If compatible, an attribute is copied containing the instance as context and a callable method reference as value. 18.282 + 18.283 +=== Functions as Unbound Methods === 18.284 + 18.285 +Functions not defined within classes could be treated as unbound methods if they were to employ `self` (thus indicating that they are intended as methods). Such functions would then be recorded as uncallable in the module namespace, needing to be explicitly bound to a class using a special function. However, there appears to be limited utility in defining functions in this way, instead of defining them directly as methods, or instead of merely using such generic functions from existing methods.
19.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 19.2 +++ b/docs/wiki/Structure Tue Apr 11 21:41:22 2017 +0200 19.3 @@ -0,0 +1,135 @@ 19.4 += Program Structure = 19.5 + 19.6 +A program consists of a number of '''modules''' with each module providing its own namespace. Modules can be organised within '''packages''' which define a hierarchical relationship between them. However, the relationship between modules is not automatically exposed within the program: it is more appropriate to consider modules as independent entities that have a particular naming scheme. 19.7 + 19.8 +Within each module a hierarchy of namespaces is provided by '''classes''', with each class potentially containing other classes, and so on. Also appearing within modules and classes are '''functions''', with those appearing within classes being regarded as '''methods'''. 19.9 + 19.10 +Function namespaces are considered separately from module and class namespaces and are not considered part of the general namespace hierarchy, instead merely appearing as objects at the edges of that hierarchy. Functions may also be defined within functions, and such inner functions will be referenced within their parent functions, but no hierarchical relationship will exist between their namespaces. 19.11 + 19.12 +Thus, a program provides a static namespace hierarchy consisting of modules containing classes (containing other classes, and so on) plus functions. Objects residing within namespaces are accessed via '''attributes''' of those namespaces. 19.13 + 19.14 +{{{{#!table 19.15 +{{{#!graphviz 19.16 +//format=svg 19.17 +//transform=notugly 19.18 +digraph program { 19.19 + node [shape=box,fontsize="13.0",fontname="Helvetica",tooltip="Program structure"]; 19.20 + edge [tooltip="Program structure"]; 19.21 + rankdir=LR; 19.22 + 19.23 + program [label="program",shape=folder,style=filled,fillcolor=cyan]; 19.24 + 19.25 + subgraph { 19.26 + rank=same; 19.27 + moduleM [label="module M",style=filled,fillcolor=gold]; 19.28 + moduleN [label="module N\n(package)",style=filled,fillcolor=gold]; 19.29 + } 19.30 + 19.31 + subgraph { 19.32 + rank=same; 19.33 + classA [label="class A"]; 19.34 + moduleNP [label="module N.P",style=filled,fillcolor=gold]; 19.35 + } 19.36 + 19.37 + subgraph { 19.38 + rank=same; 19.39 + functionF [label="function f",shape=ellipse]; 19.40 + functionG [label="function g",shape=ellipse]; 19.41 + classB [label="class B"]; 19.42 + classC [label="class C"]; 19.43 + } 19.44 + 19.45 + subgraph { 19.46 + rank=same; 19.47 + functionJ [label="function j",shape=ellipse]; 19.48 + functionH [label="function h",shape=ellipse]; 19.49 + functionK [label="function k",shape=ellipse]; 19.50 + } 19.51 + 19.52 + program -> moduleM; 19.53 + program -> moduleN; 19.54 + 19.55 + moduleM -> classA; 19.56 + moduleN -> moduleNP; 19.57 + moduleNP -> classC; 19.58 + 19.59 + classA -> functionF; 19.60 + classA -> functionG; 19.61 + classA -> classB; 19.62 + classB -> functionH; 19.63 + classC -> functionK; 19.64 + 19.65 + functionG -> functionJ [style=dashed]; 19.66 +} 19.67 +}}} 19.68 +|| 19.69 +{{{#!python numbers=disable 19.70 +# module M 19.71 + 19.72 +class A: 19.73 + def f(...): pass 19.74 + 19.75 + def g(...): 19.76 + def j(...): pass 19.77 + 19.78 + class B: 19.79 + def h(...): pass 19.80 + 19.81 +# module N 19.82 + 19.83 +... 19.84 + 19.85 +# module N.P 19.86 + 19.87 +class C: 19.88 + def k(...): pass 19.89 +}}} 19.90 +}}}} 19.91 + 19.92 +== Referencing the Structure == 19.93 + 19.94 +Each part of the structure is catalogued using an '''object path''', indicating its location in the structure hierarchy, and a reference indicating its nature and origin. For example: 19.95 + 19.96 +|| '''Object Path''' || '''Reference''' || '''Explanation''' || 19.97 +|| `M.A` || `<class>:M.A` || The definition of class `A` in module `M` || 19.98 +|| `O.A` || `<class>:M.A` || A reference to `M.A`, a class || 19.99 +|| `N.P.C.k` || `<function>:N.P.C.k` || The definition of method `k` in class `C` of module `N.P` || 19.100 +|| `O.k` || `<function>:N.P.C.k` || A reference to `N.P.C.k`, a function || 19.101 +|| `O.values` || `<instance>:__builtins__.list.list` || An object identified as an instance of class `__builtins__.list.list` || 19.102 +|| `__main__.M` || `<module>:M` || A reference to module `M` || 19.103 +|| `__main__.counter` || `<var>` || An undetermined object called `counter` in the `__main__` module || 19.104 + 19.105 +The reference therefore expresses the '''kind''' of object (class, function, instance, module or undetermined variable), possibly accompanied by an object type, and also possibly accompanied by an alias indicating where the reference was obtained. 19.106 + 19.107 +== Classes == 19.108 + 19.109 +Classes are regarded as statically-defined objects, meaning that they are only evaluated once and must not be defined within conditional sections or within functions. Base classes must be expressed using readily-identifiable class names, since any potential ambiguity or uncertainty with the identity of base classes could result in a class inheritance hierarchy that is effectively dynamic. 19.110 + 19.111 +When '''instantiated''', classes provide '''instances''' that provide the attributes of each class's namespace plus any '''inherited''' attributes from base classes that would not be provided by the class itself. In addition, instances provide their own instance attributes. 19.112 + 19.113 +== Functions == 19.114 + 19.115 +Functions are regarded as statically-defined objects, but they may be defined either with names within other named functions or as '''lambdas''' (functions without names) within named functions or other lambdas. Thus, unlike classes, functions may be defined once but replicated many times, each time with different accompanying state information. Such state information needs to be provided via default parameters: it is not detected and propagated automatically. Closures are not supported: any state from enclosing scopes must be supplied at the point of definition of a function; it is not acquired from outer functions by name. 19.116 + 19.117 +{{{#!python numbers=disable 19.118 +def outer(a): 19.119 + b = 2 19.120 + def inner(c, a=a, b=b): 19.121 + # a initialised using default from outer scope 19.122 + # b also initialised using default from outer scope 19.123 + return a, b, c 19.124 + b = 4 19.125 + 19.126 +outer(1)(3) # returns (1, 2, 3) not (1, 4, 3) 19.127 +}}} 19.128 + 19.129 +=== Lambdas === 19.130 + 19.131 +Lambdas are given special names for the purposes of identification within the program structure, being named relative to the scope in which they are defined. For example, the first lambda appearing within module `N` would have an object path of `N.$l0`, and a subsequent lambda within the same scope would have an object path of `N.$l1`. Lambdas may appear within classes and functions, including lambdas themselves. For example: 19.132 + 19.133 +{{{#!python numbers=disable 19.134 +def f(x): 19.135 + return lambda y, x=x: lambda z, x=x, y=y: (x, y, z) 19.136 +}}} 19.137 + 19.138 +Here, within a module `M`, the outer lambda would have the object path `M.f.$l0` whereas the inner lambda would have the object path `M.f.$l0.$l0`.
20.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 20.2 +++ b/docs/wiki/ToDo Tue Apr 11 21:41:22 2017 +0200 20.3 @@ -0,0 +1,57 @@ 20.4 += To Do = 20.5 + 20.6 +As always with software, there are still many things that need to be done. Here is a list of a few of them. 20.7 + 20.8 +<<TableOfContents(2,3)>> 20.9 + 20.10 +== Finish the Core Standard Library == 20.11 + 20.12 +Lichen provides its own core standard library featuring the [[../Builtins|built-ins]] and other essential modules, with only a small amount of native code supporting implementations written in the Lichen language. 20.13 + 20.14 +=== Numeric Types === 20.15 + 20.16 +Support all the numeric types. Currently, only `int` is supported, but `float` merely requires some native code handling operations and testing for exception conditions. A representation for `long` needs to be determined, and `complex` probably just needs some methods implementing. 20.17 + 20.18 +Support promotion between some of the numeric types. Currently, `int` values that overflow raise `OverflowError`, as was done in Python before automatic promotion to `long`. 20.19 + 20.20 +=== String Types === 20.21 + 20.22 +The remaining methods need defining for byte and Unicode strings. Some methods are more complicated than others, particularly where some interpretation of the content is required: identification of case, whitespace, punctuation, and so on. 20.23 + 20.24 +Some Unicode operations could be implemented in the Lichen language, not in C. 20.25 + 20.26 +=== Sequence and Mapping Types === 20.27 + 20.28 +Various methods still need defining in the dictionary, list, tuple and set classes to provide parity with Python. 20.29 + 20.30 +=== Hashing === 20.31 + 20.32 +Unlike Python, the hashing approaches currently employed are not well-tuned. One important difference between Lichen and Python is that the former does not use hashtables as a general structure for objects, meaning that certain desirable criteria for Python hashtables (randomising certain aspects of their behaviour to prevent the exploitation of poorly performing cases) are less important for Lichen. 20.33 + 20.34 +== Provide Peripheral Library Support == 20.35 + 20.36 +The Python standard library offers support for things like networking which require a degree of system-level integration. The Lichen libraries could be expanded to offer coverage of many of the same APIs. 20.37 + 20.38 +=== Versatile Native Libraries === 20.39 + 20.40 +Currently, the native functionality already exposes things like file descriptors, and a relatively simple versatile interface might be sufficient for many future libraries. 20.41 + 20.42 +== Incremental Compilation == 20.43 + 20.44 +To some extent, the toolchain supports the caching of inspection results with selective regeneration upon changes to source files. However, deduction and subsequent activities are performed in their entirety every time. 20.45 + 20.46 +=== Preserving Structures === 20.47 + 20.48 +Where source files have changed, it may be possible that the structure details remain the same. Consequently, it is unnecessary to regenerate structures, and it is only necessary to determine the nature of modified program operations. 20.49 + 20.50 +Where structures have changed, certain existing operations may need to be updated because such operations may no longer support certain types of object or may support additional types. Meanwhile, changes to structures may be compatible with existing structure layouts and not require such layouts to be recomputed. 20.51 + 20.52 +== Exploit Deductions Further == 20.53 + 20.54 +So far, deductions have been used to inform the translation of certain operations and to identify situations where suitable operations cannot be generated. However, no attempt has been made to broaden the application of such information. 20.55 + 20.56 +=== Parameter Type Checking === 20.57 + 20.58 +Elementary knowledge about function parameters is currently available when inspecting a program, but the deducer does not attempt to propagate such information to likely callers of such functions. Such propagation in its simplest form would merely use the list of potential targets for each invocation, obtain the parameter types from each target, compare these types to any readily-comparable argument (such as a name), and then determine which targets may still be invoked successfully, potentially imposing restrictions on the suitable targets as a result. 20.59 + 20.60 +Such propagation activities can be time-consuming because they can lead to iterative whole-program propagation occurring. Consider the restriction of an invocation target: this may indicate a restriction on a name that provides such a target, thus affecting the function parameter providing the name; such a restriction could then be propagated to callers of that function, initiating further refinement of other invocations, and so on.
21.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 21.2 +++ b/docs/wiki/Toolchain Tue Apr 11 21:41:22 2017 +0200 21.3 @@ -0,0 +1,53 @@ 21.4 += Toolchain = 21.5 + 21.6 +The toolchain implements the process of analysing Lichen source files, compiling information about the structures and routines expressed in each program, and generating output for further processing that can produce an executable program. 21.7 + 21.8 +<<TableOfContents(2,3)>> 21.9 + 21.10 +== Compiling Programs == 21.11 + 21.12 +The principal interface to the toolchain is the `lplc` command, run on source files as in the following example: 21.13 + 21.14 +{{{ 21.15 +lplc tests/unicode.py 21.16 +}}} 21.17 + 21.18 +There is no need to specify all the files that might be required by the complete program. Instead, the toolchain identifies files in the program by searching its module search path. This can be configured using the `LICHENPATH` environment variable and the `-E` option. 21.19 + 21.20 +Various [[../Prerequisites|prerequisites]] are needed for the toolchain to work properly. By specifying the `-c` option, the specified program will be translated to a C programming language representation but not built, avoiding the need for some development tools to be installed if this is desirable. 21.21 + 21.22 +The default output file from a successful compilation is a file called `_main`, but this can be overridden using the `-o` option. For example: 21.23 + 21.24 +{{{ 21.25 +lplc -o unicode tests/unicode.py 21.26 +}}} 21.27 + 21.28 +The complete set of options can be viewed by specifying the `--help` option, and a manual page is also provided in the `docs` directory of the source distribution: 21.29 + 21.30 +{{{ 21.31 +man -l docs/lplc.1 21.32 +}}} 21.33 + 21.34 +This page may already be installed if the software was provided as a package as part of an operating system distribution: 21.35 + 21.36 +{{{ 21.37 +man lplc 21.38 +}}} 21.39 + 21.40 +== Toolchain Implementation == 21.41 + 21.42 +The toolchain itself is currently written in Python, but it is envisaged that it will eventually be written in the Lichen language, hopefully needing only minor modifications so that it may be able to accept its own source files as input and ultimately produce a representation of itself as an executable program. Since the Lichen language is based on Python, it is convenient to use existing Python implementations to access libraries that support the parsing of Python source files into useful representations. 21.43 + 21.44 +The Python standard library provides two particularly useful modules or packages of relevance: the `compiler` package and the `parser` module; `parser` is employed by `compiler` to decode source text, whereas `compiler` takes the concrete syntax tree representation from `parser` and produces an abstract syntax tree (AST) which is particularly helpful to software of the nature described here. (Contrary to impressions that [[http://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts/|some articles]] might give, the `ast` module available in Python 2.5 and later was not the first module to offer AST representations of Python programs in Python, nor was it even the first such module in the standard library.) 21.45 + 21.46 +However, it is not desirable to have a dependency on a Python implementation, which the `parser` module effectively is (as would the `ast` module also be if it were used here), with it typically being implemented as an extension module in a non-Python language (in C for CPython, in Java for Jython, and so on). Fortunately, the !PyPy project implemented their own parsing module, `pyparser`, that is intended to be used within the !PyPy environment together with their own `ast` equivalent, but it has been possible to rework `pyparser` to produce representations that are compatible with the `compiler` package, itself being modified in various ways to achieve compatibility (and also to provide various other conveniences). 21.47 + 21.48 +== Program Analysis == 21.49 + 21.50 +With the means of inspecting source files available through a `compiler` package producing a usable representation of each file, it becomes possible to identify the different elements in each file and to collect information that may be put to use later. But before any files are inspected, it must be determined ''which'' files are to be inspected, these comprising the complete program to be analysed. 21.51 + 21.52 +Both Lichen and Python support the notion of a main source file (sometimes called the "script" file or the main module or `__main__`) and of imported modules and packages. The complete set of modules employed in a program is defined as those imported by the main module, then those imported by those modules, and so on. Thus, the complete set is not known without inspecting part of the program, and this set must be built incrementally until no new modules are encountered. 21.53 + 21.54 +Where Lichen and Python differ is in the handling of [[../Imports|imports]] themselves. Python [[https://docs.python.org/3/reference/import.html|employs]] an intricate mechanism that searches for modules and packages, loading modules encountered when descending into packages to retrieve specific modules. In contrast, Lichen only imports the modules that are explicitly mentioned in programs. Thus, a Lichen program will not accumulate potentially large numbers of superfluous modules. 21.55 + 21.56 +With a given module identified as being part of a program, the module will then be [[../Inspection|inspected]] for the purposes of gathering useful information. Since the primary objectives are to characterise the structure of the objects in a program and to determine how such objects are used, certain kinds of program constructs will be inspected more closely than others. Note that this initial inspection activity is not concerned with the translation of program operations to other forms: such [[../Translation|translation]] will occur later; this initial inspection is purely concerned with obtaining enough information to inform such later activities, with the original program being revisited to provide the necessary detail required to translate it.
22.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 22.2 +++ b/docs/wiki/Translation Tue Apr 11 21:41:22 2017 +0200 22.3 @@ -0,0 +1,164 @@ 22.4 += Translating Programs = 22.5 + 22.6 +The Lichen toolchain currently targets the C programming language, generating programs that are then compiled using existing C compilers. The process of translation involves the optimisation and generation of program structures, and the translation of Lichen language constructs into equivalent C language constructs. 22.7 + 22.8 +<<TableOfContents(2,3)>> 22.9 + 22.10 +== Attribute Access Plans == 22.11 + 22.12 +During [[../Inspection|inspection]], each attribute access is associated with a unique location and its details stored. During [[../Deduction|deduction]], such accesses are resolved and characterised. During [[../Optimisation|optimisation]], such accesses are encoded as '''instruction plans''' indicating the operations or instructions required to achieve the access. During translation, these instruction plans are generated as part of the final program. 22.13 + 22.14 +== Structure Optimisation == 22.15 + 22.16 +All object structures need to support the attributes defined for them, and although it may be possible to determine the identity of attributes in advance of a program being run, there generally remains a need to provide mechanisms to determine the nature of an object at run-time and to access its attributes dynamically. To achieve this, an [[../Optimisation|optimisation]] process catalogues the presence of each attribute name on all program objects, then deciding upon a position for the structure member corresponding to that attribute name in all objects. 22.17 + 22.18 +== Structure Generation == 22.19 + 22.20 +Each program consists of a number of structures providing the program's objects and defining classes, functions, modules and any predefined instances. 22.21 + 22.22 +=== Attribute Representation === 22.23 + 22.24 +Attributes most typically provide references to objects within a certain optional context. Since attributes are employed to encode various other kinds of information, the data structure may be interpreted in other ways within suitably controlled environments. For example, the `__data__` attribute found on instances of certain built-in classes will encode references to other kinds of information that are used within native functions. 22.25 + 22.26 +=== Object Representation === 22.27 + 22.28 +Objects provide collections of attributes, and the object representation is meant to support efficient computed access to attributes whilst also allowing a reasonably efficient means of testing and accessing attributes at run-time. 22.29 + 22.30 +=== Structure Constants === 22.31 + 22.32 +For clarity, several classes of symbolic constant are defined in the translated program to help define structures or to refer to structure members. 22.33 + 22.34 +|| '''Constant Class''' || '''Purpose''' || '''Example''' || '''Application''' || 22.35 +|| `csize` || Indicates the number of attributes in a class || `__csize___builtins___list_list` ||<|3> Structure definition || 22.36 +|| `isize` || Indicates the number of attributes in an instance || `__isize___builtins___list_list` || 22.37 +|| `msize` || Indicates the number of attributes in a module || `__msize___builtins___list` || 22.38 +|| `pminsize` || Indicates the minimum number of parameters expected by a callable || `__pminsize___builtins___list_list_append` ||<|2> Invocation || 22.39 +|| `pmaxsize` || Indicates the maximum number of parameters expected by a callable || `__pmaxsize___builtins___list_list_append` || 22.40 +|| `pcode` || Assigns a code to a particular parameter name || `__pcode_value` ||<|2> Keyword argument positioning || 22.41 +|| `ppos` || Indicates the table position used by a particular parameter name || `__ppos_value` || 22.42 +|| `code` || Assigns a code to a particular attribute name || `__code_value` ||<|2> Attribute access || 22.43 +|| `pos` || Indicates the table position used by a particular attribute name || `__pos_value` || 22.44 + 22.45 +Such constants are encoded using helper functions, producing names resembling those above that are intended to be distinct and not to conflict with other defined names in the final program. 22.46 + 22.47 +=== Instance Structures === 22.48 + 22.49 +Like all program objects, instances are defined in C according to a generic structure, meaning that all instances have the same generic members from the perspective of a C program. However, specific structures are defined for each class in order to conveniently describe the dimensions of instances of that class. For example, for the built-in list class, a declaration resembling the following is issued: 22.50 + 22.51 +{{{ 22.52 +typedef struct { 22.53 + const __table * table; 22.54 + unsigned int pos; 22.55 + __attr attrs[__isize___builtins___list_list___init__]; 22.56 +} __obj___builtins___list_list___init__; 22.57 +}}} 22.58 + 22.59 +== Program Translation == 22.60 + 22.61 +=== Useful C Language Features === 22.62 + 22.63 +The translated code relies on the availability of certain useful features of C, some of them supported only in modern compilers: 22.64 + 22.65 + * Array and structure literals: these are used to define values within expressions that would otherwise require separate definition; invocation arguments are defined using inline expressions and the `__ARGS` macro (for unknown invocation targets and for keyword arguments) 22.66 + * [[https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros|Variadic macros]]: such macros permit variable numbers of arguments, thus making the `__ARGS` macro possible 22.67 + * [[WikiPedia:Comma operator]] sequences: these are used to construct instruction sequences within expressions, providing a mechanism for formulating attribute accesses 22.68 + 22.69 +=== Function Naming === 22.70 + 22.71 +A naming scheme is applied to functions defined in the final C program to support different kinds of callables present in the original program. 22.72 + 22.73 +|| '''Name Class''' || '''Purpose''' || '''Example''' || 22.74 +|| `fn` || Indicates a plain function || `__fn___builtins___str_str` || 22.75 +|| `main` || Indicates a module main program || `__main_sys` || 22.76 +|| `new` || Indicates a class instantiator || `__new___builtins___list_list` || 22.77 +|| `newliteral` || Provides an instantiator for literals mentioned in programs || `__newliteral___builtins___list_list` || 22.78 + 22.79 +=== Generic Program Functionality === 22.80 + 22.81 +Given the representation of the fundamental program objects, a suite of generic operations is provided to support the activities necessary to inspect and update program data. Such operations are largely concerned with generic object and attribute representations and are practically identical from program to program. 22.82 + 22.83 +For example, an attribute provided by an object can be accessed in several different ways depending on the information known about the object when the program is translated. Consequently, a suite of attribute access operations is provided to support each different scenario: 22.84 + 22.85 +||<-2|2> ||<-2> '''Attribute provider''' || 22.86 +|| '''Class''' || '''Instance''' || 22.87 +||<|3> '''Accessor''' || '''Class''' || `load_via_object` || || 22.88 +|| '''Instance''' || `load_via_class` || `load_via_object` || 22.89 +|| '''Class or instance''' || `get_class_and_load` || `check_and_load_via_any` || 22.90 + 22.91 +=== Specific Program Functionality === 22.92 + 22.93 +Since objects at the program level have their layouts [[../Optimisation|optimised]], and since these layouts may differ from program to program, a suite of specific operations is provided to support a range of activities within programs, such operations being configured with program-specific information in order to do the right thing within a given program. 22.94 + 22.95 +|| '''Function''' || '''Purpose''' || 22.96 +|| `__invoke` || A generic invocation function that populates the argument array and obtains the appropriate target || 22.97 +|| `__new` || Creates a new object, setting the `__class__` attribute to the indicated class identity || 22.98 +|| `__BOOL` || Tests the identity of an attribute value against the `True` constant's address || 22.99 +|| `__GETDEFAULT` || Obtains a function default from additional members of a function instance beyond the core members || 22.100 +|| `__SETDEFAULT` || Updates a function default from additional members of a function instance beyond the core members || 22.101 + 22.102 +=== Module Files === 22.103 + 22.104 +Each program module is defined in its own file in the translated program. Since C has no notion of general module-level code, a special main function is generated for each module to perform the operations defined at the module level in the original program. These main functions are called in turn by the principal `main` function of the final C program, and the order of invocation is defined by the module initialisation ordering established when resolving inter-module dependencies. 22.105 + 22.106 +=== Statement Translation === 22.107 + 22.108 +Statements in the original program are generally translated to statements in the final C program. During inspection and translation, certain constructs are transformed to produce the operations necessary to implement such constructs. For instance, operators need to be rewritten as function invocations, and thus the form of the original program changes before it is translated to C. 22.109 + 22.110 +When translating certain operations that may appear within a statement, a sequence of instructions is generated, and such instructions must be performed in turn. Fortunately, C supports a wide variety of operations - such as assignments - within expressions, and it also provides the comma operator which permits sequences of operations to be performed in order and to yield a result that can then be used in the context of an expression. Such sequences of operations are employed extensively to realise attribute access instruction plans. 22.111 + 22.112 +=== Name References === 22.113 + 22.114 +Names can be translated in different ways for the final C program. Function locals and parameters correspond directly to locals in C functions. Module-level globals and class-level locals are translated to structure accesses. However, temporary names that are used in sequence assignment are translated to C function locals, either in functions corresponding to their equivalents in the original program, or in module initialisation functions corresponding to the module scope in the original program. 22.115 + 22.116 +Certain special names that are formulated during [[../Inspection|inspection]] are converted to constant or static references. 22.117 + 22.118 +=== Attribute Accesses === 22.119 + 22.120 +Each attribute access is associated with a particular access location, corresponding to an operation occurring in the original program identified during inspection. This location can be used as a key to access a [[#Attribute_Access_Plans|plan]] that describes how the access may be realised in the final program. Since most of the work computing the nature of the access is done during the preceding deduction and optimisation activities, what remains is mostly the emission of the plan into the generated program text with some substitutions performed. 22.121 + 22.122 +=== Invocations === 22.123 + 22.124 +Each invocation involves an object whose identity may or may not be known at compile-time, plus a collection of arguments. The translation process is concerned with obtaining a target to be invoked in the generated program, populating an argument array appropriately for the target, and providing the means of invocation. 22.125 + 22.126 +Where the identity of the object is not known at all, the information provided is effectively passed on to the special `__invoke` function which is then required to do all the work at run-time. However, one of the goals of analysing and compiling the program is to avoid doing such work at run-time and to take advantage of those cases where the identity of the object can be determined. 22.127 + 22.128 +Thus, where the identity of the called object is known, details of the object's signature - its parameters and their positions - are used to populate the argument array and to determine whether this can be done successfully. Some callables require a context to be provided, and knowledge about the callable can be used to obtain or omit such a context from the arguments. 22.129 + 22.130 +Finally, the invocation target must be obtained. Where some knowledge of the identity of the object is available, it may be sufficient to access the special `__fn__` member directly (potentially testing for its presence if this is not certain) and to call the appropriate function with the assembled argument array. Where a definite identity is available, the actual function itself may be named and thus called with the arguments. Such operations are mere fragments of the total work performed by the `__invoke` function in the least optimal case. 22.131 + 22.132 +=== Assignments === 22.133 + 22.134 +Assignments of objects to names occur for the following kinds of statements: assignments, definition statements (`class` and `def`), import statements (`from` and `import`). 22.135 + 22.136 +=== Guards === 22.137 + 22.138 +Guards are identified during [[../Deduction|deduction]] as being necessary to uphold deductions about the nature of objects referenced via names that may be undermined at run-time by potentially erroneous program behaviour. 22.139 + 22.140 +=== Exceptions === 22.141 + 22.142 +The C programming language does not support exceptions in a form supported by more modern programming languages. Consequently, lower-level mechanisms need to be employed to reproduce the semantics of the `try` statement, its `except` and `finally` clauses, and the `raise` statement, as well as considering the effect of exceptions on the `return` statement. 22.143 + 22.144 +Fortunately, some consideration has been given to this topic before. The [[http://www.nicemice.net/cexcept/|cexcept]] library - really a collection of convenience macros - provides a way of expressing exception constructs in a natural way in C while translating those constructs to usage of the `setjmp` and `longjmp` functions. Although direct usage of such functions could be generated during program translation, for convenience and to focus on other implementation concerns, it was decided to employ the cexcept macros and to build any additional functionality in terms of their usage. 22.145 + 22.146 +A structure is defined to hold exception-related information: 22.147 + 22.148 +{{{ 22.149 +typedef struct 22.150 +{ 22.151 + __attr arg; 22.152 + int raising; 22.153 + int raising_else; 22.154 + int completing; 22.155 +} __exc; 22.156 +}}} 22.157 + 22.158 +Here, the `arg` member holds a raised exception object reference. Meanwhile, the remaining members interact with language constructs as follows: 22.159 + 22.160 +|| '''Member''' || '''Description''' || '''Output Program Operation''' || 22.161 +|| `raising` || Indicates an exception that has been raised and not yet handled || `__Raise` || 22.162 +|| `raising_else` || Indicates an exception that must be re-raised because it occurred in an `else` clause || `__RaiseElse` || 22.163 +|| `completing` || Indicates the completion of, or return from, a `try` clause requiring entry into a `finally` clause || `__Complete` and `__Return` || 22.164 + 22.165 +=== Memory Allocation and Garbage Collection === 22.166 + 22.167 +To avoid having to write a garbage collector, the [[http://www.hboehm.info/gc/|Boehm-Demers-Weiser garbage collector]] is employed by programs to allocate and free memory needed for its objects.