1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Contexts and values
8 * Tables, attributes and lookups
9 * Objects and structures
10 * Parameters and lookups
11 * Instantiation
12 * Register usage
13 * List and tuple representations
14
15 Namespaces and Attribute Definition
16 ===================================
17
18 Namespaces are any objects which can retain attributes.
19
20 * Module attributes are defined either at the module level or by global
21 statements.
22 * Class attributes are defined only within class statements.
23 * Instance attributes are defined only by assignments to attributes of self.
24
25 These restrictions apply because such attributes are thus explicitly declared,
26 permitting the use of tables (described below). Module and class attributes
27 can also be finalised in this way in order to permit certain optimisations.
28
29 An additional restriction required for the current implementation of tables
30 (as described below) applies to class definitions: each class must be defined
31 using a unique name; repeated definition of classes having the same name is
32 thus not permitted. This restriction arises from the use of the "full name" of
33 a class as a key to the object table, where the full name is a qualified path
34 via the module hierarchy ending with the name of the class.
35
36 See rejected.txt for complicating mechanisms which could be applied to
37 mitigate the effects of these restrictions on optimisations.
38
39 Contexts and Values
40 ===================
41
42 Values are used as the common reference representation in micropython: as
43 stored representations of attributes (of classes, instances, modules, and
44 other objects supporting attribute-like entities) as well as the stored values
45 associated with names in functions and methods.
46
47 Unlike other implementations, micropython does not create things like bound
48 method objects for individual instances. Instead, all objects are referenced
49 using a context, reference pair:
50
51 Value Layout
52 ------------
53
54 0 1
55 context object
56 reference reference
57
58 Specific implementations might reverse this ordering for optimisation
59 purposes.
60
61 Rationale
62 ---------
63
64 To reduce the number of created objects whilst retaining the ability to
65 support bound method invocations. The context indicates the context in which
66 an invocation is performed, typically the owner of the method.
67
68 Usage
69 -----
70
71 The context may be inserted as the first argument when a value is involved in
72 an invocation. This argument may then be omitted from the invocation if its
73 usage is not appropriate.
74
75 See invocation.txt for details.
76
77 Context Value Types
78 -------------------
79
80 The following types of context value exist:
81
82 Type Usage Transformations
83 ---- ----- ---------------
84
85 Replaceable With functions (not methods) May be replaced with an
86 instance or a class when a
87 value is stored on an
88 instance or class
89
90 Placeholder With classes May not be replaced
91
92 Instance With instances (and constants) May not be replaced
93 or functions as methods
94
95 Class With functions as methods May be replaced when a
96 value is loaded from a
97 class attribute via an
98 instance
99
100 Contexts in Acquired Values
101 ---------------------------
102
103 There are four classes of instructions which provide values:
104
105 Instruction Purpose Context Operations
106 ----------- ------- ------------------
107
108 1) LoadConst Load module, constant Use loaded object with itself
109 as context
110
111 2) LoadFunction Load function Combine replaceable context
112 with loaded object
113
114 3) LoadClass Load class Combine placeholder context
115 with loaded object
116
117 4) LoadAddress* Load attribute from Preserve or override stored
118 LoadAttr* class, module, context (as described in
119 instance assignment.txt)
120
121 In order to comply with traditional Python behaviour, contexts may or may not
122 represent the object from which an attribute has been acquired.
123
124 See assignment.txt for details.
125
126 Contexts in Stored Values
127 -------------------------
128
129 There are two classes of instruction for storing values:
130
131 Instruction Purpose Context Operations
132 ----------- ------- ------------------
133
134 1) StoreAddress Store attribute in a Preserve context; note that no
135 known object test for class attribute
136 assignment should be necessary
137 since this instruction should only
138 be generated for module globals
139
140 StoreAttr Store attribute in an Preserve context; note that no
141 instance test for class attribute
142 assignment should be necessary
143 since this instruction should only
144 be generated for self accesses
145
146 StoreAttrIndex Store attribute in an Preserve context; since the index
147 unknown object lookup could yield a class
148 attribute, a test of the nature of
149 the nature of the structure is
150 necessary in order to prevent
151 assignments to classes
152
153 2) StoreAddressContext Store attribute in a Override context if appropriate;
154 known object if the value has a replaceable
155 context, permit the target to
156 take ownership of the value
157
158 See assignment.txt for details.
159
160 Tables, Attributes and Lookups
161 ==============================
162
163 Attribute lookups, where the exact location of an object attribute is deduced,
164 are performed differently in micropython than in other implementations.
165 Instead of providing attribute dictionaries, in which attributes are found,
166 attributes are located at fixed places in object structures (described below)
167 and their locations are stored using a special representation known as a
168 table.
169
170 For a given program, a table can be considered as being like a matrix mapping
171 classes to attribute names. For example:
172
173 class A:
174 # instances have attributes x, y
175
176 class B(A):
177 # introduces attribute z for instances
178
179 class C:
180 # instances have attributes a, b, z
181
182 This would provide the following table, referred to as an object table in the
183 context of classes and instances:
184
185 Class/attr a b x y z
186
187 A 1 2
188 B 1 2 3
189 C 1 2 3
190
191 A limitation of this representation is that instance attributes may not shadow
192 class attributes: if an attribute with a given name is not defined on an
193 instance, an attribute with the same name cannot be provided by the class of
194 the instance or any superclass of the instance's class. This impacts the
195 provision of the __class__ attribute, as described below.
196
197 The table can be compacted using a representation known as a displacement
198 list (referred to as an object list in this context):
199
200 Classes with attribute offsets
201
202 classcode A
203 attrcode a b x y z
204
205 B
206 a b x y z
207
208 C
209 a b x y z
210
211 List . . 1 2 1 2 3 1 2 . . 3
212
213 Here, the classcode refers to the offset in the list at which a class's
214 attributes are defined, whereas the attrcode defines the offset within a
215 region of attributes corresponding to a single attribute of a given name.
216
217 Attribute Locations
218 -------------------
219
220 The locations stored in table/list elements are generally for instance
221 attributes relative to the location of the instance, whereas those for class
222 attributes and module attributes are generally absolute addresses. Thus, each
223 occupied table cell has the following structure:
224
225 attrcode, uses-absolute-address, address (or location)
226
227 This could be given instead as follows:
228
229 attrcode, is-class-or-module, location
230
231 Since uses-absolute-address corresponds to is-class-or-module, and since there
232 is a need to test for classes and modules to prevent assignment to attributes
233 of such objects, this particular information is always required.
234
235 The __class__ Attribute
236 -----------------------
237
238 In Python 2.x, at least with new-style classes, instances have __class__
239 attributes which indicate the class from which they have been instantiated,
240 whereas classes have __class__ attributes which reference the type class.
241 With the object table, it is not possible to provide absolute addresses which
242 can be used for both classes and instances, since this would result in classes
243 and instances having the same class, and thus the class of a class would be
244 the class itself.
245
246 One solution is to use object-relative values in the table so that referencing
247 the __class__ attribute of an instance produces a value which can be combined
248 with an instance's address to yield the address of the attribute, which itself
249 refers to the instance's class, whereas referencing the __class__ attribute of
250 a class produces a similar object-relative value that is combined with the
251 class's address to yield the address of the attribute, which itself refers to
252 the special type class.
253
254 Obviously, the above solution requires both classes and instances to retain an
255 attribute location specifically to hold the value appropriate for each object
256 type, whereas a scheme which omits the __class__ attribute on classes would be
257 able to employ an absolute address in the table and maintain only a single
258 address to refer to the class for all instances. The only problem with not
259 providing a sensible __class__ attribute entry for classes would be the need
260 for special treatment of __class__ to prevent inappropriate consultation of
261 the table for classes.
262
263 Comparing Tables as Matrices with Displacement Lists
264 ----------------------------------------------------
265
266 Although displacement lists can provide reasonable levels of compaction for
267 attribute data, the element size is larger than that required for a simple
268 matrix: the attribute code (attrcode) need not be stored since each element
269 unambiguously refers to the availability of an attribute for a particular
270 class or instance of that class, and so the data at a given element need not
271 be tested for relevance to a given attribute access operation.
272
273 Given a program with 20 object types and 100 attribute types, a matrix would
274 occupy the following amount of space:
275
276 number of object types * number of attribute types * element size
277 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
278 = 2000
279
280 In contrast, given a compaction to 40% of the matrix size (without considering
281 element size) in a displacement list, the amount of space would be as follows:
282
283 number of elements * element size
284 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
285 = 1600
286
287 Consequently, the principal overhead of using a displacement list is likely to
288 be in the need to check element relevance when retrieving values from such a
289 list.
290
291 Objects and Structures
292 ======================
293
294 As well as references, micropython needs to have actual objects to refer to.
295 Since classes, functions and instances are all objects, it is desirable that
296 certain common features and operations are supported in the same way for all
297 of these things. To permit this, a common data structure format is used.
298
299 Header.................................................... Attributes.................
300
301 Identifier Identifier Address Identifier Size Object ...
302
303 0 1 2 3 4 5 6
304 classcode attrcode/ invocation funccode size attribute ...
305 instance reference reference
306 status
307
308 Classcode
309 ---------
310
311 Used in attribute lookup.
312
313 Here, the classcode refers to the attribute lookup table for the object (as
314 described above). Classes and instances share the same classcode, and their
315 structures reflect this. Functions all belong to the same type and thus employ
316 the classcode for the function built-in type, whereas modules have distinct
317 types since they must support different sets of attributes.
318
319 Attrcode
320 --------
321
322 Used to test instances for membership of classes (or descendants of classes).
323
324 Since, in traditional Python, classes are only ever instances of some generic
325 built-in type, support for testing such a relationship directly has been
326 removed and the attrcode is not specified for classes: the presence of an
327 attrcode indicates that a given object is an instance. In addition, support
328 has also been removed for testing modules in the same way, meaning that the
329 attrcode is also not specified for modules.
330
331 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
332 for details of attrcodes.
333
334 Invocation Reference
335 --------------------
336
337 Used when an object is called.
338
339 This is the address of the code to be executed when an invocation is performed
340 on the object.
341
342 Funccode
343 --------
344
345 Used to look up argument positions by name.
346
347 The strategy with keyword arguments in micropython is to attempt to position
348 such arguments in the invocation frame as it is being constructed.
349
350 See the "Parameters and Lookups" section for more information.
351
352 Size
353 ----
354
355 Used to indicate the size of an object including attributes.
356
357 Attributes
358 ----------
359
360 For classes, modules and instances, the attributes in the structure correspond
361 to the attributes of each kind of object. For functions, however, the
362 attributes in the structure correspond to the default arguments for each
363 function, if any.
364
365 Structure Types
366 ---------------
367
368 Class C:
369
370 0 1 2 3 4 5 6
371 classcode (unused) __new__ funccode size attribute ...
372 for C reference for reference
373 instantiator
374
375 Instance of C:
376
377 0 1 2 3 4 5 6
378 classcode attrcode C.__call__ funccode size attribute ...
379 for C for C reference for reference
380 (if exists) C.__call__
381
382 Function f:
383
384 0 1 2 3 4 5 6
385 classcode attrcode code funccode size attribute ...
386 for for reference (default)
387 function function reference
388
389 Module m:
390
391 0 1 2 3 4 5 6
392 classcode attrcode (unused) (unused) (unused) attribute ...
393 for m for m (global)
394 reference
395
396 The __class__ Attribute
397 -----------------------
398
399 All objects should support the __class__ attribute, and in most cases this is
400 done using the object table, yielding a common address for all instances of a
401 given class.
402
403 Function: refers to the function class
404 Instance: refers to the class instantiated to make the object
405
406 The object table cannot support two definitions simultaneously for both
407 instances and their classes. Consequently, __class__ access on classes must be
408 tested for and a special result returned.
409
410 Class: refers to the type class (type.__class__ also refers to the type class)
411
412 For convenience, the first attribute of a class will be the common __class__
413 attribute for all its instances. As noted above, direct access to this
414 attribute will not be possible for classes, and a constant result will be
415 returned instead.
416
417 Lists and Tuples
418 ----------------
419
420 The built-in list and tuple sequences employ variable length structures using
421 the attribute locations to store their elements, where each element is a
422 reference to a separately stored object.
423
424 Testing Instance Compatibility with Classes (Attrcode)
425 ------------------------------------------------------
426
427 Although it would be possible to have a data structure mapping classes to
428 compatible classes, such as a matrix indicating the subclasses (or
429 superclasses) of each class, the need to retain the key to such a data
430 structure for each class might introduce a noticeable overhead.
431
432 Instead of having a separate structure, descendant classes of each class are
433 inserted as special attributes into the object table. This requires an extra
434 key to be retained, since each class must provide its own attribute code such
435 that upon an instance/class compatibility test, the code may be obtained and
436 used in the object table.
437
438 Invocation and Code References
439 ------------------------------
440
441 Modules: there is no meaningful invocation reference since modules cannot be
442 explicitly called.
443
444 Functions: a simple code reference is employed pointing to code implementing
445 the function. Note that the function locals are completely distinct from this
446 structure and are not comparable to attributes. Instead, attributes are
447 reserved for default parameter values, although they do not appear in the
448 object table described above, appearing instead in a separate parameter table
449 described below.
450
451 Classes: given that classes must be invoked in order to create instances, a
452 reference must be provided in class structures. However, this reference does
453 not point directly at the __init__ method of the class. Instead, the
454 referenced code belongs to a special initialiser function, __new__, consisting
455 of the following instructions:
456
457 create instance for C
458 call C.__init__(instance, ...)
459 return instance
460
461 Instances: each instance employs a reference to any __call__ method defined in
462 the class hierarchy for the instance, thus maintaining its callable nature.
463
464 Both classes and modules may contain code in their definitions - the former in
465 the "body" of the class, potentially defining attributes, and the latter as
466 the "top-level" code in the module, potentially defining attributes/globals -
467 but this code is not associated with any invocation target. It is thus
468 generated in order of appearance and is not referenced externally.
469
470 Invocation Operation
471 --------------------
472
473 Consequently, regardless of the object an invocation is always done as
474 follows:
475
476 get invocation reference from the header
477 jump to reference
478
479 Additional preparation is necessary before the above code: positional
480 arguments must be saved in the invocation frame, and keyword arguments must be
481 resolved and saved to the appropriate position in the invocation frame.
482
483 See invocation.txt for details.
484
485 Parameters and Lookups
486 ======================
487
488 Since Python supports keyword arguments when making invocations, it becomes
489 necessary to record the parameter names associated with each function or
490 method. Just as object tables record attributes positions on classes and
491 instances, parameter tables record parameter positions in function or method
492 parameter lists.
493
494 For a given program, a parameter table can be considered as being like a
495 matrix mapping functions/methods to parameter names. For example:
496
497 def f(x, y, z):
498 pass
499
500 def g(a, b, c):
501 pass
502
503 def h(a, x):
504 pass
505
506 This would provide the following table, referred to as a parameter table in
507 the context of functions and methods:
508
509 Function/param a b c x y z
510
511 f 1 2 3
512 g 1 2 3
513 h 1 2
514
515 Confusion can occur when functions are adopted as methods, since the context
516 then occupies the first slot in the invocation frame:
517
518 def f(x, y, z):
519 pass
520
521 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
522 -> f(1, 2, 3)
523
524 class C:
525 f = f
526
527 def g(x, y, z):
528 pass
529
530 c = C()
531
532 c.f(y=2, z=3) -> f(<context>, 2, 3)
533 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
534
535 Just as with parameter tables, a displacement list can be prepared from a
536 parameter table:
537
538 Functions with parameter (attribute) offsets
539
540 funccode f
541 attrcode a b c x y z
542
543 g
544 a b c x y z
545
546 h
547 a b c x y z
548
549 List . . . 1 2 3 1 2 3 1 . . 2 . .
550
551 Here, the funccode refers to the offset in the list at which a function's
552 parameters are defined, whereas the attrcode defines the offset within a
553 region of attributes corresponding to a single parameter of a given name.
554
555 Instantiation
556 =============
557
558 When instantiating classes, memory must be reserved for the header of the
559 resulting instance, along with locations for the attributes of the instance.
560 Since the instance header contains data common to all instances of a class, a
561 template header is copied to the start of the newly reserved memory region.
562
563 Register Usage
564 ==============
565
566 During code generation, much of the evaluation produces results which are
567 implicitly recorded in the "active value" or "working" register, and various
568 instructions will consume this active value. In addition, some instructions
569 will consume a separate "active source value" from a register, typically those
570 which are assigning the result of an expression to an assignment target.
571
572 Since values often need to be retained for later use, a set of temporary
573 storage locations are typically employed. However, optimisations may reduce
574 the need to use such temporary storage where instructions which provide the
575 "active value" can be re-executed and will produce the same result. Whether
576 re-use of instructions is possible or not, values still need to be loaded into
577 a "source" register to be accessed by an assignment instruction.
578
579 RSVP instructions generally have the notion of working, target and source
580 registers (see registers.txt). These register "roles" are independent from the
581 actual registers defined for the RSVP machine itself, even though the naming
582 is similar. Generally, instructions do regard the RSVP "working" registers as
583 the class of register in the "working" role, although some occurrences of
584 instruction usage may employ other registers (such as the "result" registers)
585 and thus take advantage of the generality of the RSVP implementation of such
586 instructions.
587
588 List and Tuple Representations
589 ==============================
590
591 Since tuples have a fixed size, the representation of a tuple instance is
592 merely a header describing the size of the entire object, together with a
593 sequence of references to the object "stored" at each position in the
594 structure. Such references consist of the usual context and reference pair.
595
596 Lists, however, have a variable size and must be accessible via an unchanging
597 location even as more memory is allocated elsewhere to accommodate the
598 contents of the list. Consequently, the representation must resemble the
599 following:
600
601 Structure header for list (size == header plus special attribute)
602 Special attribute referencing the underlying sequence
603
604 The underlying sequence has a fixed size, like a tuple, but may contain fewer
605 elements than the size of the sequence permits:
606
607 Special header indicating the current size and allocated size
608 Element
609 ... <-- current size
610 (Unused space)
611 ... <-- allocated size
612
613 This representation permits the allocation of a new sequence when space is
614 exhausted in an existing sequence, with the new sequence address stored in the
615 main list structure. Since access to the contents of the list must go through
616 the main list structure, underlying allocation activities may take place
617 without the users of a list having to be aware of such activities.