simplify (annotate docs/architecture.txt in 88ccf0ca4bd6)

Architecture Overview

paulb@171

2

=====================

The simplify modules take Python source code, obtain an abstract syntax tree

paulb@171

5

(AST) using the standard library compiler module, produce a simplified node

paulb@171

6

tree whose nodes correspond to primitive operations, and then perform a number

paulb@171

7

of processes such as name resolution and type annotation on the simplified

paulb@171

8

tree.

The Simplified Nodes

paulb@171

11

====================

Unlike the selection of AST nodes produced by the compiler module which

paulb@171

14

reflect syntactic constructs in Python source code, the simplified nodes more

paulb@171

15

closely reflect the underlying operations performed when such constructs are

paulb@171

16

executed in a running program. Thus, in some respects the simplified nodes

paulb@171

17

have a certain similarity with CPython interpreter bytecodes. However, unlike

paulb@171

18

the instruction set encoded in the selection of bytecodes understood by the

paulb@171

19

CPython interpreter, simplified nodes are more low-level and eliminate the

paulb@171

20

hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a

paulb@171

21

small Python program, producing a small AST, can produce a much larger

paulb@171

22

simplified node tree.

The simplify module is responsible for the production of a simplified node

paulb@171

25

tree, using the full range of nodes defined in the simplified module. In some

paulb@171

26

cases, the nodes created in this process may be discarded in favour of others

paulb@171

27

after further analysis of the program.

Name Resolution

paulb@171

30

===============

The scope of names or identifiers in Python programs can be determined by

paulb@171

33

relatively simple methods, and the process of name resolution is concerned

paulb@171

34

with identifying the scope in all name accesses within a program, changing the

paulb@171

35

simplified node involved where necessary. For example, before the name

paulb@171

36

resolution process the mention of a name in a program, as represented by a

paulb@171

37

LoadName simplified node, may be found to refer to a module global instead of

paulb@171

38

a local name; consequently, the LoadName node would be changed to a LoadAttr

paulb@171

39

node which references the module.

The fixnames module is responsible for the transformation of the simplified

paulb@171

42

node tree and the replacement of nodes in order to indicate particular

paulb@171

43

namespace operations.

Type Annotation

paulb@171

46

===============

The principal motivation in developing this system is to discover the nature

paulb@171

49

of data at each point in a program, and one important aspect of doing so is to

paulb@171

50

examine the data types employed at certain points (principally the definition

paulb@171

51

and instantiation of such types), and to propagate such information throughout

paulb@171

52

the program, in a way simulating the execution of the program but without

paulb@171

53

actually doing so with concrete values or objects associated with such types.

paulb@171

54

In order to simulate the semantics of Python, the following concepts have been

paulb@171

55

employed:

  * Attribute: a bundle containing a data type and its context.

paulb@171

58

  * Namespace: a mapping of names to attributes.

paulb@171

59

  * Accessor: a combination of an attribute and its origin - the owner of the

paulb@171

60

    namespace providing access to the attribute.

Attributes

paulb@171

63

----------

The main purpose of the notion of an attribute is to support references to

paulb@171

66

methods. Contrary to initial expectations (perhaps fuelled by experience with

paulb@171

67

other programming languages [1] or by simple example programs), Python does

paulb@171

68

not insist that methods be invoked immediately upon dereferencing objects

paulb@171

69

providing such methods; instead the reference to a method may be stored in a

paulb@171

70

variable and used in an invocation on a subsequent occasion. In order to

paulb@171

71

permit this behaviour, the context of a method (typically the object from

paulb@171

72

which the method was obtained) must be recalled and employed in the

paulb@171

73

invocation. Thus, it becomes necessary to treat type information as a bundle

paulb@171

74

containing the type of an attribute and any context information that may

paulb@171

75

subsequently be useful.

[1] http://java.sun.com/docs/white/delegates.html

paulb@171	1	Architecture Overview
paulb@171	2	=====================
paulb@171	3
paulb@171	4	The simplify modules take Python source code, obtain an abstract syntax tree
paulb@171	5	(AST) using the standard library compiler module, produce a simplified node
paulb@171	6	tree whose nodes correspond to primitive operations, and then perform a number
paulb@171	7	of processes such as name resolution and type annotation on the simplified
paulb@171	8	tree.
paulb@171	9
paulb@171	10	The Simplified Nodes
paulb@171	11	====================
paulb@171	12
paulb@171	13	Unlike the selection of AST nodes produced by the compiler module which
paulb@171	14	reflect syntactic constructs in Python source code, the simplified nodes more
paulb@171	15	closely reflect the underlying operations performed when such constructs are
paulb@171	16	executed in a running program. Thus, in some respects the simplified nodes
paulb@171	17	have a certain similarity with CPython interpreter bytecodes. However, unlike
paulb@171	18	the instruction set encoded in the selection of bytecodes understood by the
paulb@171	19	CPython interpreter, simplified nodes are more low-level and eliminate the
paulb@171	20	hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a
paulb@171	21	small Python program, producing a small AST, can produce a much larger
paulb@171	22	simplified node tree.
paulb@171	23
paulb@171	24	The simplify module is responsible for the production of a simplified node
paulb@171	25	tree, using the full range of nodes defined in the simplified module. In some
paulb@171	26	cases, the nodes created in this process may be discarded in favour of others
paulb@171	27	after further analysis of the program.
paulb@171	28
paulb@171	29	Name Resolution
paulb@171	30	===============
paulb@171	31
paulb@171	32	The scope of names or identifiers in Python programs can be determined by
paulb@171	33	relatively simple methods, and the process of name resolution is concerned
paulb@171	34	with identifying the scope in all name accesses within a program, changing the
paulb@171	35	simplified node involved where necessary. For example, before the name
paulb@171	36	resolution process the mention of a name in a program, as represented by a
paulb@171	37	LoadName simplified node, may be found to refer to a module global instead of
paulb@171	38	a local name; consequently, the LoadName node would be changed to a LoadAttr
paulb@171	39	node which references the module.
paulb@171	40
paulb@171	41	The fixnames module is responsible for the transformation of the simplified
paulb@171	42	node tree and the replacement of nodes in order to indicate particular
paulb@171	43	namespace operations.
paulb@171	44
paulb@171	45	Type Annotation
paulb@171	46	===============
paulb@171	47
paulb@171	48	The principal motivation in developing this system is to discover the nature
paulb@171	49	of data at each point in a program, and one important aspect of doing so is to
paulb@171	50	examine the data types employed at certain points (principally the definition
paulb@171	51	and instantiation of such types), and to propagate such information throughout
paulb@171	52	the program, in a way simulating the execution of the program but without
paulb@171	53	actually doing so with concrete values or objects associated with such types.
paulb@171	54	In order to simulate the semantics of Python, the following concepts have been
paulb@171	55	employed:
paulb@171	56
paulb@171	57	* Attribute: a bundle containing a data type and its context.
paulb@171	58	* Namespace: a mapping of names to attributes.
paulb@171	59	* Accessor: a combination of an attribute and its origin - the owner of the
paulb@171	60	namespace providing access to the attribute.
paulb@171	61
paulb@171	62	Attributes
paulb@171	63	----------
paulb@171	64
paulb@171	65	The main purpose of the notion of an attribute is to support references to
paulb@171	66	methods. Contrary to initial expectations (perhaps fuelled by experience with
paulb@171	67	other programming languages [1] or by simple example programs), Python does
paulb@171	68	not insist that methods be invoked immediately upon dereferencing objects
paulb@171	69	providing such methods; instead the reference to a method may be stored in a
paulb@171	70	variable and used in an invocation on a subsequent occasion. In order to
paulb@171	71	permit this behaviour, the context of a method (typically the object from
paulb@171	72	which the method was obtained) must be recalled and employed in the
paulb@171	73	invocation. Thus, it becomes necessary to treat type information as a bundle
paulb@171	74	containing the type of an attribute and any context information that may
paulb@171	75	subsequently be useful.
paulb@171	76
paulb@171	77	[1] http://java.sun.com/docs/white/delegates.html

simplify

Annotated docs/architecture.txt