simplify

docs/architecture.txt

285:6f583cfa39a5
2008-07-13 Paul Boddie Commit long uncommitted changes.
     1 Architecture Overview
     2 =====================
     3 
     4 The simplify modules take Python source code, obtain an abstract syntax tree
     5 (AST) using the standard library compiler module, produce a simplified node
     6 tree whose nodes correspond to primitive operations, and then perform a number
     7 of processes such as name resolution and type annotation on the simplified
     8 tree.
     9 
    10 The Simplified Nodes
    11 ====================
    12 
    13 Unlike the selection of AST nodes produced by the compiler module which
    14 reflect syntactic constructs in Python source code, the simplified nodes more
    15 closely reflect the underlying operations performed when such constructs are
    16 executed in a running program. Thus, in some respects the simplified nodes
    17 have a certain similarity with CPython interpreter bytecodes. However, unlike
    18 the instruction set encoded in the selection of bytecodes understood by the
    19 CPython interpreter, simplified nodes are more low-level and eliminate the
    20 hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a
    21 small Python program, producing a small AST, can produce a much larger
    22 simplified node tree.
    23 
    24 The simplify module is responsible for the production of a simplified node
    25 tree, using the full range of nodes defined in the simplified module. In some
    26 cases, the nodes created in this process may be discarded in favour of others
    27 after further analysis of the program.
    28 
    29 Name Resolution
    30 ===============
    31 
    32 The scope of names or identifiers in Python programs can be determined by
    33 relatively simple methods, and the process of name resolution is concerned
    34 with identifying the scope in all name accesses within a program, changing the
    35 simplified node involved where necessary. For example, before the name
    36 resolution process the mention of a name in a program, as represented by a
    37 LoadName simplified node, may be found to refer to a module global instead of
    38 a local name; consequently, the LoadName node would be changed to a LoadAttr
    39 node which references the module.
    40 
    41 The fixnames module is responsible for the transformation of the simplified
    42 node tree and the replacement of nodes in order to indicate particular
    43 namespace operations.
    44 
    45 Type Annotation
    46 ===============
    47 
    48 The principal motivation in developing this system is to discover the nature
    49 of data at each point in a program, and one important aspect of doing so is to
    50 examine the data types employed at certain points (principally the definition
    51 and instantiation of such types), and to propagate such information throughout
    52 the program, in a way simulating the execution of the program but without
    53 actually doing so with concrete values or objects associated with such types.
    54 In order to simulate the semantics of Python, the following concepts have been
    55 employed:
    56 
    57   * Attribute: a bundle containing a data type and its context.
    58   * Namespace: a mapping of names to attributes.
    59   * Accessor: a combination of an attribute and its origin - the owner of the
    60     namespace providing access to the attribute.
    61 
    62 Attributes
    63 ----------
    64 
    65 The main purpose of the notion of an attribute is to support references to
    66 methods. Contrary to initial expectations (perhaps fuelled by experience with
    67 other programming languages [1] or by simple example programs), Python does
    68 not insist that methods be invoked immediately upon dereferencing objects
    69 providing such methods; instead the reference to a method may be stored in a
    70 variable and used in an invocation on a subsequent occasion. In order to
    71 permit this behaviour, the context of a method (typically the object from
    72 which the method was obtained) must be recalled and employed in the
    73 invocation. Thus, it becomes necessary to treat type information as a bundle
    74 containing the type of an attribute and any context information that may
    75 subsequently be useful.
    76 
    77 [1] http://java.sun.com/docs/white/delegates.html
    78 
    79 Classes, Attributes and Instantiation Behaviours
    80 ------------------------------------------------
    81 
    82 A superficial view of the relationship between classes and instances in Python
    83 is that instances of a particular class can mostly be treated as equivalent -
    84 the class defines the behaviour of them all, and the kind of data stored (or
    85 rather referenced by each of them) is likely to be mostly identical. However,
    86 we know that some classes encourage a more heterogeneous usage of their
    87 instances: lists, for example, often contain a variety of different kinds of
    88 data, and it is often accurate to note that within a given program the
    89 contents of list objects may vary substantially. Consequently, whilst a simple
    90 view of instances and their equivalence may hold for some classes, others
    91 require a more sophisticated view of instances created from them.
    92 
    93 The simplest class behaviour states that only one kind of instance exists for
    94 each class; a more sophisticated behaviour states that potentially many kinds
    95 of instances exist for a class. In a running program, all instances are
    96 distinct and there may be no limit (other than that of physical resources and
    97 of time) to how many there may be in the lifetime of a program. However, when
    98 modelling the behaviour of a program, it is necessary to be more conservative
    99 when attempting to identify such instance types.
   100 
   101 Instance Filtering
   102 ==================
   103 
   104 Depending on the mechanism employed to identify distinct kinds of instances,
   105 the annotation process may produce a number of instance types which are
   106 effectively equivalent. A filtering process may be run in order to identify
   107 equivalent instance types and to reduce all instance types to a genuinely
   108 distinct set, modifying also annotations referring to such instance types so
   109 that they subsequently refer only to those in the resulting set.