paulb@171 | 1 | Architecture Overview
|
paulb@171 | 2 | =====================
|
paulb@171 | 3 |
|
paulb@171 | 4 | The simplify modules take Python source code, obtain an abstract syntax tree
|
paulb@171 | 5 | (AST) using the standard library compiler module, produce a simplified node
|
paulb@171 | 6 | tree whose nodes correspond to primitive operations, and then perform a number
|
paulb@171 | 7 | of processes such as name resolution and type annotation on the simplified
|
paulb@171 | 8 | tree.
|
paulb@171 | 9 |
|
paulb@171 | 10 | The Simplified Nodes
|
paulb@171 | 11 | ====================
|
paulb@171 | 12 |
|
paulb@171 | 13 | Unlike the selection of AST nodes produced by the compiler module which
|
paulb@171 | 14 | reflect syntactic constructs in Python source code, the simplified nodes more
|
paulb@171 | 15 | closely reflect the underlying operations performed when such constructs are
|
paulb@171 | 16 | executed in a running program. Thus, in some respects the simplified nodes
|
paulb@171 | 17 | have a certain similarity with CPython interpreter bytecodes. However, unlike
|
paulb@171 | 18 | the instruction set encoded in the selection of bytecodes understood by the
|
paulb@171 | 19 | CPython interpreter, simplified nodes are more low-level and eliminate the
|
paulb@171 | 20 | hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a
|
paulb@171 | 21 | small Python program, producing a small AST, can produce a much larger
|
paulb@171 | 22 | simplified node tree.
|
paulb@171 | 23 |
|
paulb@171 | 24 | The simplify module is responsible for the production of a simplified node
|
paulb@171 | 25 | tree, using the full range of nodes defined in the simplified module. In some
|
paulb@171 | 26 | cases, the nodes created in this process may be discarded in favour of others
|
paulb@171 | 27 | after further analysis of the program.
|
paulb@171 | 28 |
|
paulb@171 | 29 | Name Resolution
|
paulb@171 | 30 | ===============
|
paulb@171 | 31 |
|
paulb@171 | 32 | The scope of names or identifiers in Python programs can be determined by
|
paulb@171 | 33 | relatively simple methods, and the process of name resolution is concerned
|
paulb@171 | 34 | with identifying the scope in all name accesses within a program, changing the
|
paulb@171 | 35 | simplified node involved where necessary. For example, before the name
|
paulb@171 | 36 | resolution process the mention of a name in a program, as represented by a
|
paulb@171 | 37 | LoadName simplified node, may be found to refer to a module global instead of
|
paulb@171 | 38 | a local name; consequently, the LoadName node would be changed to a LoadAttr
|
paulb@171 | 39 | node which references the module.
|
paulb@171 | 40 |
|
paulb@171 | 41 | The fixnames module is responsible for the transformation of the simplified
|
paulb@171 | 42 | node tree and the replacement of nodes in order to indicate particular
|
paulb@171 | 43 | namespace operations.
|
paulb@171 | 44 |
|
paulb@171 | 45 | Type Annotation
|
paulb@171 | 46 | ===============
|
paulb@171 | 47 |
|
paulb@171 | 48 | The principal motivation in developing this system is to discover the nature
|
paulb@171 | 49 | of data at each point in a program, and one important aspect of doing so is to
|
paulb@171 | 50 | examine the data types employed at certain points (principally the definition
|
paulb@171 | 51 | and instantiation of such types), and to propagate such information throughout
|
paulb@171 | 52 | the program, in a way simulating the execution of the program but without
|
paulb@171 | 53 | actually doing so with concrete values or objects associated with such types.
|
paulb@171 | 54 | In order to simulate the semantics of Python, the following concepts have been
|
paulb@171 | 55 | employed:
|
paulb@171 | 56 |
|
paulb@171 | 57 | * Attribute: a bundle containing a data type and its context.
|
paulb@171 | 58 | * Namespace: a mapping of names to attributes.
|
paulb@171 | 59 | * Accessor: a combination of an attribute and its origin - the owner of the
|
paulb@171 | 60 | namespace providing access to the attribute.
|
paulb@171 | 61 |
|
paulb@171 | 62 | Attributes
|
paulb@171 | 63 | ----------
|
paulb@171 | 64 |
|
paulb@171 | 65 | The main purpose of the notion of an attribute is to support references to
|
paulb@171 | 66 | methods. Contrary to initial expectations (perhaps fuelled by experience with
|
paulb@171 | 67 | other programming languages [1] or by simple example programs), Python does
|
paulb@171 | 68 | not insist that methods be invoked immediately upon dereferencing objects
|
paulb@171 | 69 | providing such methods; instead the reference to a method may be stored in a
|
paulb@171 | 70 | variable and used in an invocation on a subsequent occasion. In order to
|
paulb@171 | 71 | permit this behaviour, the context of a method (typically the object from
|
paulb@171 | 72 | which the method was obtained) must be recalled and employed in the
|
paulb@171 | 73 | invocation. Thus, it becomes necessary to treat type information as a bundle
|
paulb@171 | 74 | containing the type of an attribute and any context information that may
|
paulb@171 | 75 | subsequently be useful.
|
paulb@171 | 76 |
|
paulb@171 | 77 | [1] http://java.sun.com/docs/white/delegates.html
|
paulb@262 | 78 |
|
paulb@265 | 79 | Classes, Attributes and Instantiation Behaviours
|
paulb@265 | 80 | ------------------------------------------------
|
paulb@265 | 81 |
|
paulb@265 | 82 | A superficial view of the relationship between classes and instances in Python
|
paulb@265 | 83 | is that instances of a particular class can mostly be treated as equivalent -
|
paulb@265 | 84 | the class defines the behaviour of them all, and the kind of data stored (or
|
paulb@265 | 85 | rather referenced by each of them) is likely to be mostly identical. However,
|
paulb@265 | 86 | we know that some classes encourage a more heterogeneous usage of their
|
paulb@265 | 87 | instances: lists, for example, often contain a variety of different kinds of
|
paulb@265 | 88 | data, and it is often accurate to note that within a given program the
|
paulb@265 | 89 | contents of list objects may vary substantially. Consequently, whilst a simple
|
paulb@265 | 90 | view of instances and their equivalence may hold for some classes, others
|
paulb@265 | 91 | require a more sophisticated view of instances created from them.
|
paulb@265 | 92 |
|
paulb@265 | 93 | The simplest class behaviour states that only one kind of instance exists for
|
paulb@265 | 94 | each class; a more sophisticated behaviour states that potentially many kinds
|
paulb@265 | 95 | of instances exist for a class. In a running program, all instances are
|
paulb@265 | 96 | distinct and there may be no limit (other than that of physical resources and
|
paulb@265 | 97 | of time) to how many there may be in the lifetime of a program. However, when
|
paulb@265 | 98 | modelling the behaviour of a program, it is necessary to be more conservative
|
paulb@265 | 99 | when attempting to identify such instance types.
|
paulb@265 | 100 |
|
paulb@262 | 101 | Instance Filtering
|
paulb@262 | 102 | ==================
|
paulb@262 | 103 |
|
paulb@262 | 104 | Depending on the mechanism employed to identify distinct kinds of instances,
|
paulb@262 | 105 | the annotation process may produce a number of instance types which are
|
paulb@262 | 106 | effectively equivalent. A filtering process may be run in order to identify
|
paulb@262 | 107 | equivalent instance types and to reduce all instance types to a genuinely
|
paulb@262 | 108 | distinct set, modifying also annotations referring to such instance types so
|
paulb@262 | 109 | that they subsequently refer only to those in the resulting set.
|