paulb@171 | 1 | Architecture Overview
|
paulb@171 | 2 | =====================
|
paulb@171 | 3 |
|
paulb@171 | 4 | The simplify modules take Python source code, obtain an abstract syntax tree
|
paulb@171 | 5 | (AST) using the standard library compiler module, produce a simplified node
|
paulb@171 | 6 | tree whose nodes correspond to primitive operations, and then perform a number
|
paulb@171 | 7 | of processes such as name resolution and type annotation on the simplified
|
paulb@171 | 8 | tree.
|
paulb@171 | 9 |
|
paulb@171 | 10 | The Simplified Nodes
|
paulb@171 | 11 | ====================
|
paulb@171 | 12 |
|
paulb@171 | 13 | Unlike the selection of AST nodes produced by the compiler module which
|
paulb@171 | 14 | reflect syntactic constructs in Python source code, the simplified nodes more
|
paulb@171 | 15 | closely reflect the underlying operations performed when such constructs are
|
paulb@171 | 16 | executed in a running program. Thus, in some respects the simplified nodes
|
paulb@171 | 17 | have a certain similarity with CPython interpreter bytecodes. However, unlike
|
paulb@171 | 18 | the instruction set encoded in the selection of bytecodes understood by the
|
paulb@171 | 19 | CPython interpreter, simplified nodes are more low-level and eliminate the
|
paulb@171 | 20 | hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a
|
paulb@171 | 21 | small Python program, producing a small AST, can produce a much larger
|
paulb@171 | 22 | simplified node tree.
|
paulb@171 | 23 |
|
paulb@171 | 24 | The simplify module is responsible for the production of a simplified node
|
paulb@171 | 25 | tree, using the full range of nodes defined in the simplified module. In some
|
paulb@171 | 26 | cases, the nodes created in this process may be discarded in favour of others
|
paulb@171 | 27 | after further analysis of the program.
|
paulb@171 | 28 |
|
paulb@171 | 29 | Name Resolution
|
paulb@171 | 30 | ===============
|
paulb@171 | 31 |
|
paulb@171 | 32 | The scope of names or identifiers in Python programs can be determined by
|
paulb@171 | 33 | relatively simple methods, and the process of name resolution is concerned
|
paulb@171 | 34 | with identifying the scope in all name accesses within a program, changing the
|
paulb@171 | 35 | simplified node involved where necessary. For example, before the name
|
paulb@171 | 36 | resolution process the mention of a name in a program, as represented by a
|
paulb@171 | 37 | LoadName simplified node, may be found to refer to a module global instead of
|
paulb@171 | 38 | a local name; consequently, the LoadName node would be changed to a LoadAttr
|
paulb@171 | 39 | node which references the module.
|
paulb@171 | 40 |
|
paulb@171 | 41 | The fixnames module is responsible for the transformation of the simplified
|
paulb@171 | 42 | node tree and the replacement of nodes in order to indicate particular
|
paulb@171 | 43 | namespace operations.
|
paulb@171 | 44 |
|
paulb@171 | 45 | Type Annotation
|
paulb@171 | 46 | ===============
|
paulb@171 | 47 |
|
paulb@171 | 48 | The principal motivation in developing this system is to discover the nature
|
paulb@171 | 49 | of data at each point in a program, and one important aspect of doing so is to
|
paulb@171 | 50 | examine the data types employed at certain points (principally the definition
|
paulb@171 | 51 | and instantiation of such types), and to propagate such information throughout
|
paulb@171 | 52 | the program, in a way simulating the execution of the program but without
|
paulb@171 | 53 | actually doing so with concrete values or objects associated with such types.
|
paulb@171 | 54 | In order to simulate the semantics of Python, the following concepts have been
|
paulb@171 | 55 | employed:
|
paulb@171 | 56 |
|
paulb@171 | 57 | * Attribute: a bundle containing a data type and its context.
|
paulb@171 | 58 | * Namespace: a mapping of names to attributes.
|
paulb@171 | 59 | * Accessor: a combination of an attribute and its origin - the owner of the
|
paulb@171 | 60 | namespace providing access to the attribute.
|
paulb@171 | 61 |
|
paulb@171 | 62 | Attributes
|
paulb@171 | 63 | ----------
|
paulb@171 | 64 |
|
paulb@171 | 65 | The main purpose of the notion of an attribute is to support references to
|
paulb@171 | 66 | methods. Contrary to initial expectations (perhaps fuelled by experience with
|
paulb@171 | 67 | other programming languages [1] or by simple example programs), Python does
|
paulb@171 | 68 | not insist that methods be invoked immediately upon dereferencing objects
|
paulb@171 | 69 | providing such methods; instead the reference to a method may be stored in a
|
paulb@171 | 70 | variable and used in an invocation on a subsequent occasion. In order to
|
paulb@171 | 71 | permit this behaviour, the context of a method (typically the object from
|
paulb@171 | 72 | which the method was obtained) must be recalled and employed in the
|
paulb@171 | 73 | invocation. Thus, it becomes necessary to treat type information as a bundle
|
paulb@171 | 74 | containing the type of an attribute and any context information that may
|
paulb@171 | 75 | subsequently be useful.
|
paulb@171 | 76 |
|
paulb@171 | 77 | [1] http://java.sun.com/docs/white/delegates.html
|