simplify (annotate docs/architecture.txt in 41d635a383c7)

Architecture Overview

paulb@171

2

=====================

The simplify modules take Python source code, obtain an abstract syntax tree

paulb@171

5

(AST) using the standard library compiler module, produce a simplified node

paulb@171

6

tree whose nodes correspond to primitive operations, and then perform a number

paulb@171

7

of processes such as name resolution and type annotation on the simplified

paulb@171

8

tree.

The Simplified Nodes

paulb@171

11

====================

Unlike the selection of AST nodes produced by the compiler module which

paulb@171

14

reflect syntactic constructs in Python source code, the simplified nodes more

paulb@171

15

closely reflect the underlying operations performed when such constructs are

paulb@171

16

executed in a running program. Thus, in some respects the simplified nodes

paulb@171

17

have a certain similarity with CPython interpreter bytecodes. However, unlike

paulb@171

18

the instruction set encoded in the selection of bytecodes understood by the

paulb@171

19

CPython interpreter, simplified nodes are more low-level and eliminate the

paulb@171

20

hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a

paulb@171

21

small Python program, producing a small AST, can produce a much larger

paulb@171

22

simplified node tree.

The simplify module is responsible for the production of a simplified node

paulb@171

25

tree, using the full range of nodes defined in the simplified module. In some

paulb@171

26

cases, the nodes created in this process may be discarded in favour of others

paulb@171

27

after further analysis of the program.

Name Resolution

paulb@171

30

===============

The scope of names or identifiers in Python programs can be determined by

paulb@171

33

relatively simple methods, and the process of name resolution is concerned

paulb@171

34

with identifying the scope in all name accesses within a program, changing the

paulb@171

35

simplified node involved where necessary. For example, before the name

paulb@171

36

resolution process the mention of a name in a program, as represented by a

paulb@171

37

LoadName simplified node, may be found to refer to a module global instead of

paulb@171

38

a local name; consequently, the LoadName node would be changed to a LoadAttr

paulb@171

39

node which references the module.

The fixnames module is responsible for the transformation of the simplified

paulb@171

42

node tree and the replacement of nodes in order to indicate particular

paulb@171

43

namespace operations.

Type Annotation

paulb@171

46

===============

The principal motivation in developing this system is to discover the nature

paulb@171

49

of data at each point in a program, and one important aspect of doing so is to

paulb@171

50

examine the data types employed at certain points (principally the definition

paulb@171

51

and instantiation of such types), and to propagate such information throughout

paulb@171

52

the program, in a way simulating the execution of the program but without

paulb@171

53

actually doing so with concrete values or objects associated with such types.

paulb@171

54

In order to simulate the semantics of Python, the following concepts have been

paulb@171

55

employed:

  * Attribute: a bundle containing a data type and its context.

paulb@171

58

  * Namespace: a mapping of names to attributes.

paulb@171

59

  * Accessor: a combination of an attribute and its origin - the owner of the

paulb@171

60

    namespace providing access to the attribute.

Attributes

paulb@171

63

----------

The main purpose of the notion of an attribute is to support references to

paulb@171

66

methods. Contrary to initial expectations (perhaps fuelled by experience with

paulb@171

67

other programming languages [1] or by simple example programs), Python does

paulb@171

68

not insist that methods be invoked immediately upon dereferencing objects

paulb@171

69

providing such methods; instead the reference to a method may be stored in a

paulb@171

70

variable and used in an invocation on a subsequent occasion. In order to

paulb@171

71

permit this behaviour, the context of a method (typically the object from

paulb@171

72

which the method was obtained) must be recalled and employed in the

paulb@171

73

invocation. Thus, it becomes necessary to treat type information as a bundle

paulb@171

74

containing the type of an attribute and any context information that may

paulb@171

75

subsequently be useful.

[1] http://java.sun.com/docs/white/delegates.html

Classes, Attributes and Instantiation Behaviours

paulb@265

80

------------------------------------------------

A superficial view of the relationship between classes and instances in Python

paulb@265

83

is that instances of a particular class can mostly be treated as equivalent -

paulb@265

84

the class defines the behaviour of them all, and the kind of data stored (or

paulb@265

85

rather referenced by each of them) is likely to be mostly identical. However,

paulb@265

86

we know that some classes encourage a more heterogeneous usage of their

paulb@265

87

instances: lists, for example, often contain a variety of different kinds of

paulb@265

88

data, and it is often accurate to note that within a given program the

paulb@265

89

contents of list objects may vary substantially. Consequently, whilst a simple

paulb@265

90

view of instances and their equivalence may hold for some classes, others

paulb@265

91

require a more sophisticated view of instances created from them.

The simplest class behaviour states that only one kind of instance exists for

paulb@265

94

each class; a more sophisticated behaviour states that potentially many kinds

paulb@265

95

of instances exist for a class. In a running program, all instances are

paulb@265

96

distinct and there may be no limit (other than that of physical resources and

paulb@265

97

of time) to how many there may be in the lifetime of a program. However, when

paulb@265

98

modelling the behaviour of a program, it is necessary to be more conservative

paulb@265

99

when attempting to identify such instance types.

Instance Filtering

paulb@262

102

==================

Depending on the mechanism employed to identify distinct kinds of instances,

paulb@262

105

the annotation process may produce a number of instance types which are

paulb@262

106

effectively equivalent. A filtering process may be run in order to identify

paulb@262

107

equivalent instance types and to reduce all instance types to a genuinely

paulb@262

108

distinct set, modifying also annotations referring to such instance types so

paulb@262

109

that they subsequently refer only to those in the resulting set.

paulb@171	1	Architecture Overview
paulb@171	2	=====================
paulb@171	3
paulb@171	4	The simplify modules take Python source code, obtain an abstract syntax tree
paulb@171	5	(AST) using the standard library compiler module, produce a simplified node
paulb@171	6	tree whose nodes correspond to primitive operations, and then perform a number
paulb@171	7	of processes such as name resolution and type annotation on the simplified
paulb@171	8	tree.
paulb@171	9
paulb@171	10	The Simplified Nodes
paulb@171	11	====================
paulb@171	12
paulb@171	13	Unlike the selection of AST nodes produced by the compiler module which
paulb@171	14	reflect syntactic constructs in Python source code, the simplified nodes more
paulb@171	15	closely reflect the underlying operations performed when such constructs are
paulb@171	16	executed in a running program. Thus, in some respects the simplified nodes
paulb@171	17	have a certain similarity with CPython interpreter bytecodes. However, unlike
paulb@171	18	the instruction set encoded in the selection of bytecodes understood by the
paulb@171	19	CPython interpreter, simplified nodes are more low-level and eliminate the
paulb@171	20	hidden complexity of certain bytecodes (eg. BINARY_ADD). Consequently, even a
paulb@171	21	small Python program, producing a small AST, can produce a much larger
paulb@171	22	simplified node tree.
paulb@171	23
paulb@171	24	The simplify module is responsible for the production of a simplified node
paulb@171	25	tree, using the full range of nodes defined in the simplified module. In some
paulb@171	26	cases, the nodes created in this process may be discarded in favour of others
paulb@171	27	after further analysis of the program.
paulb@171	28
paulb@171	29	Name Resolution
paulb@171	30	===============
paulb@171	31
paulb@171	32	The scope of names or identifiers in Python programs can be determined by
paulb@171	33	relatively simple methods, and the process of name resolution is concerned
paulb@171	34	with identifying the scope in all name accesses within a program, changing the
paulb@171	35	simplified node involved where necessary. For example, before the name
paulb@171	36	resolution process the mention of a name in a program, as represented by a
paulb@171	37	LoadName simplified node, may be found to refer to a module global instead of
paulb@171	38	a local name; consequently, the LoadName node would be changed to a LoadAttr
paulb@171	39	node which references the module.
paulb@171	40
paulb@171	41	The fixnames module is responsible for the transformation of the simplified
paulb@171	42	node tree and the replacement of nodes in order to indicate particular
paulb@171	43	namespace operations.
paulb@171	44
paulb@171	45	Type Annotation
paulb@171	46	===============
paulb@171	47
paulb@171	48	The principal motivation in developing this system is to discover the nature
paulb@171	49	of data at each point in a program, and one important aspect of doing so is to
paulb@171	50	examine the data types employed at certain points (principally the definition
paulb@171	51	and instantiation of such types), and to propagate such information throughout
paulb@171	52	the program, in a way simulating the execution of the program but without
paulb@171	53	actually doing so with concrete values or objects associated with such types.
paulb@171	54	In order to simulate the semantics of Python, the following concepts have been
paulb@171	55	employed:
paulb@171	56
paulb@171	57	* Attribute: a bundle containing a data type and its context.
paulb@171	58	* Namespace: a mapping of names to attributes.
paulb@171	59	* Accessor: a combination of an attribute and its origin - the owner of the
paulb@171	60	namespace providing access to the attribute.
paulb@171	61
paulb@171	62	Attributes
paulb@171	63	----------
paulb@171	64
paulb@171	65	The main purpose of the notion of an attribute is to support references to
paulb@171	66	methods. Contrary to initial expectations (perhaps fuelled by experience with
paulb@171	67	other programming languages [1] or by simple example programs), Python does
paulb@171	68	not insist that methods be invoked immediately upon dereferencing objects
paulb@171	69	providing such methods; instead the reference to a method may be stored in a
paulb@171	70	variable and used in an invocation on a subsequent occasion. In order to
paulb@171	71	permit this behaviour, the context of a method (typically the object from
paulb@171	72	which the method was obtained) must be recalled and employed in the
paulb@171	73	invocation. Thus, it becomes necessary to treat type information as a bundle
paulb@171	74	containing the type of an attribute and any context information that may
paulb@171	75	subsequently be useful.
paulb@171	76
paulb@171	77	[1] http://java.sun.com/docs/white/delegates.html
paulb@262	78
paulb@265	79	Classes, Attributes and Instantiation Behaviours
paulb@265	80	------------------------------------------------
paulb@265	81
paulb@265	82	A superficial view of the relationship between classes and instances in Python
paulb@265	83	is that instances of a particular class can mostly be treated as equivalent -
paulb@265	84	the class defines the behaviour of them all, and the kind of data stored (or
paulb@265	85	rather referenced by each of them) is likely to be mostly identical. However,
paulb@265	86	we know that some classes encourage a more heterogeneous usage of their
paulb@265	87	instances: lists, for example, often contain a variety of different kinds of
paulb@265	88	data, and it is often accurate to note that within a given program the
paulb@265	89	contents of list objects may vary substantially. Consequently, whilst a simple
paulb@265	90	view of instances and their equivalence may hold for some classes, others
paulb@265	91	require a more sophisticated view of instances created from them.
paulb@265	92
paulb@265	93	The simplest class behaviour states that only one kind of instance exists for
paulb@265	94	each class; a more sophisticated behaviour states that potentially many kinds
paulb@265	95	of instances exist for a class. In a running program, all instances are
paulb@265	96	distinct and there may be no limit (other than that of physical resources and
paulb@265	97	of time) to how many there may be in the lifetime of a program. However, when
paulb@265	98	modelling the behaviour of a program, it is necessary to be more conservative
paulb@265	99	when attempting to identify such instance types.
paulb@265	100
paulb@262	101	Instance Filtering
paulb@262	102	==================
paulb@262	103
paulb@262	104	Depending on the mechanism employed to identify distinct kinds of instances,
paulb@262	105	the annotation process may produce a number of instance types which are
paulb@262	106	effectively equivalent. A filtering process may be run in order to identify
paulb@262	107	equivalent instance types and to reduce all instance types to a genuinely
paulb@262	108	distinct set, modifying also annotations referring to such instance types so
paulb@262	109	that they subsequently refer only to those in the resulting set.

simplify

Annotated docs/architecture.txt