1 Architecture Overview
2 =====================
3
4 The simplify modules take Python source code, obtain an abstract syntax tree
5 (AST) using the standard library compiler module, produce a simplified node
6 tree whose nodes correspond to primitive operations, and then perform a number
7 of processes such as name resolution and type annotation on the simplified
8 tree.
9
10 The Simplified Nodes
11 ====================
12
13 Unlike the selection of AST nodes produced by the compiler module which
14 reflect syntactic constructs in Python source code, the simplified nodes more
15 closely reflect the underlying operations performed when such constructs are
16 executed in a running program. Thus, in some respects the simplified nodes
17 have a certain similarity with CPython interpreter bytecodes. However, unlike
18 the instruction set encoded in the selection of bytecodes understood by the
19 CPython interpreter, simplified nodes are more low-level and eliminate the
20 hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a
21 small Python program, producing a small AST, can produce a much larger
22 simplified node tree.
23
24 The simplify module is responsible for the production of a simplified node
25 tree, using the full range of nodes defined in the simplified module. In some
26 cases, the nodes created in this process may be discarded in favour of others
27 after further analysis of the program.
28
29 Name Resolution
30 ===============
31
32 The scope of names or identifiers in Python programs can be determined by
33 relatively simple methods, and the process of name resolution is concerned
34 with identifying the scope in all name accesses within a program, changing the
35 simplified node involved where necessary. For example, before the name
36 resolution process the mention of a name in a program, as represented by a
37 LoadName simplified node, may be found to refer to a module global instead of
38 a local name; consequently, the LoadName node would be changed to a LoadAttr
39 node which references the module.
40
41 The fixnames module is responsible for the transformation of the simplified
42 node tree and the replacement of nodes in order to indicate particular
43 namespace operations.
44
45 Type Annotation
46 ===============
47
48 The principal motivation in developing this system is to discover the nature
49 of data at each point in a program, and one important aspect of doing so is to
50 examine the data types employed at certain points (principally the definition
51 and instantiation of such types), and to propagate such information throughout
52 the program, in a way simulating the execution of the program but without
53 actually doing so with concrete values or objects associated with such types.
54 In order to simulate the semantics of Python, the following concepts have been
55 employed:
56
57 * Attribute: a bundle containing a data type and its context.
58 * Namespace: a mapping of names to attributes.
59 * Accessor: a combination of an attribute and its origin - the owner of the
60 namespace providing access to the attribute.
61
62 Attributes
63 ----------
64
65 The main purpose of the notion of an attribute is to support references to
66 methods. Contrary to initial expectations (perhaps fuelled by experience with
67 other programming languages [1] or by simple example programs), Python does
68 not insist that methods be invoked immediately upon dereferencing objects
69 providing such methods; instead the reference to a method may be stored in a
70 variable and used in an invocation on a subsequent occasion. In order to
71 permit this behaviour, the context of a method (typically the object from
72 which the method was obtained) must be recalled and employed in the
73 invocation. Thus, it becomes necessary to treat type information as a bundle
74 containing the type of an attribute and any context information that may
75 subsequently be useful.
76
77 [1] http://java.sun.com/docs/white/delegates.html
78
79 Classes, Attributes and Instantiation Behaviours
80 ------------------------------------------------
81
82 A superficial view of the relationship between classes and instances in Python
83 is that instances of a particular class can mostly be treated as equivalent -
84 the class defines the behaviour of them all, and the kind of data stored (or
85 rather referenced by each of them) is likely to be mostly identical. However,
86 we know that some classes encourage a more heterogeneous usage of their
87 instances: lists, for example, often contain a variety of different kinds of
88 data, and it is often accurate to note that within a given program the
89 contents of list objects may vary substantially. Consequently, whilst a simple
90 view of instances and their equivalence may hold for some classes, others
91 require a more sophisticated view of instances created from them.
92
93 The simplest class behaviour states that only one kind of instance exists for
94 each class; a more sophisticated behaviour states that potentially many kinds
95 of instances exist for a class. In a running program, all instances are
96 distinct and there may be no limit (other than that of physical resources and
97 of time) to how many there may be in the lifetime of a program. However, when
98 modelling the behaviour of a program, it is necessary to be more conservative
99 when attempting to identify such instance types.
100
101 Instance Filtering
102 ==================
103
104 Depending on the mechanism employed to identify distinct kinds of instances,
105 the annotation process may produce a number of instance types which are
106 effectively equivalent. A filtering process may be run in order to identify
107 equivalent instance types and to reduce all instance types to a genuinely
108 distinct set, modifying also annotations referring to such instance types so
109 that they subsequently refer only to those in the resulting set.