1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Contexts and values
8 * Tables, attributes and lookups
9 * Objects and structures
10 * Parameters and lookups
11 * Instantiation
12 * Register usage
13 * List and tuple representations
14
15 Namespaces and Attribute Definition
16 ===================================
17
18 Namespaces are any objects which can retain attributes.
19
20 * Module attributes are defined either at the module level or by global
21 statements.
22 * Class attributes are defined only within class statements.
23 * Instance attributes are defined only by assignments to attributes of self.
24
25 These restrictions apply because such attributes are thus explicitly declared,
26 permitting the use of tables (described below). Module and class attributes
27 can also be finalised in this way in order to permit certain optimisations.
28
29 An additional restriction required for the current implementation of tables
30 (as described below) applies to class definitions: each class must be defined
31 using a unique name; repeated definition of classes having the same name is
32 thus not permitted. This restriction arises from the use of the "full name" of
33 a class as a key to the object table, where the full name is a qualified path
34 via the module hierarchy ending with the name of the class.
35
36 Rebinding of attributes outside classes and modules can be allowed if
37 attribute usage observations are being used to detect such external
38 modifications to such objects. Without such observations, such rebinding
39 should be forbidden since apparently constant attributes might be modified in
40 a running program, but code may have been generated that provides specific
41 objects for those attributes under the assumption that they will not be
42 changed.
43
44 See rejected.txt for complicating mechanisms which could be applied to
45 mitigate the effects of these restrictions on optimisations.
46
47 Contexts and Values
48 ===================
49
50 Values are used as the common reference representation in micropython: as
51 stored representations of attributes (of classes, instances, modules, and
52 other objects supporting attribute-like entities) as well as the stored values
53 associated with names in functions and methods.
54
55 Unlike other implementations, micropython does not create things like bound
56 method objects for individual instances. Instead, all objects are referenced
57 using a context, reference pair:
58
59 Value Layout
60 ------------
61
62 0 1
63 context object
64 reference reference
65
66 Specific implementations might reverse this ordering for optimisation
67 purposes.
68
69 Rationale
70 ---------
71
72 To reduce the number of created objects whilst retaining the ability to
73 support bound method invocations. The context indicates the context in which
74 an invocation is performed, typically the owner of the method.
75
76 Usage
77 -----
78
79 The context may be inserted as the first argument when a value is involved in
80 an invocation. This argument may then be omitted from the invocation if its
81 usage is not appropriate.
82
83 See invocation.txt for details.
84
85 Context Value Types
86 -------------------
87
88 The following types of context value exist:
89
90 Type Usage Transformations
91 ---- ----- ---------------
92
93 Replaceable With functions (not methods) May be replaced with an
94 instance or a class when a
95 value is stored on an
96 instance or class
97
98 Placeholder With classes May not be replaced
99
100 Instance With instances (and constants) May not be replaced
101 or functions as methods
102
103 Class With functions as methods May be replaced when a
104 value is loaded from a
105 class attribute via an
106 instance
107
108 Contexts in Acquired Values
109 ---------------------------
110
111 There are four classes of instructions which provide values:
112
113 Instruction Purpose Context Operations
114 ----------- ------- ------------------
115
116 1) LoadConst Load module, constant Use loaded object with itself
117 as context
118
119 2) LoadFunction Load function Combine replaceable context
120 with loaded object
121
122 3) LoadClass Load class Combine placeholder context
123 with loaded object
124
125 4) LoadAddress* Load attribute from Preserve or override stored
126 LoadAttr* class, module, context (as described in
127 instance assignment.txt)
128
129 In order to comply with traditional Python behaviour, contexts may or may not
130 represent the object from which an attribute has been acquired.
131
132 See assignment.txt for details.
133
134 Contexts in Stored Values
135 -------------------------
136
137 There are two classes of instruction for storing values:
138
139 Instruction Purpose Context Operations
140 ----------- ------- ------------------
141
142 1) StoreAddress Store attribute in a Preserve context; note that no
143 known object test for class attribute
144 assignment should be necessary
145 since this instruction should only
146 be generated for module globals
147
148 StoreAttr Store attribute in an Preserve context; note that no
149 instance test for class attribute
150 assignment should be necessary
151 since this instruction should only
152 be generated for self accesses
153
154 StoreAttrIndex Store attribute in an Preserve context; since the index
155 unknown object lookup could yield a class
156 attribute, a test of the nature of
157 the nature of the structure is
158 necessary in order to prevent
159 assignments to classes
160
161 2) StoreAddressContext Store attribute in a Override context if appropriate;
162 known object if the value has a replaceable
163 context, permit the target to
164 take ownership of the value
165
166 See assignment.txt for details.
167
168 Tables, Attributes and Lookups
169 ==============================
170
171 Attribute lookups, where the exact location of an object attribute is deduced,
172 are performed differently in micropython than in other implementations.
173 Instead of providing attribute dictionaries, in which attributes are found,
174 attributes are located at fixed places in object structures (described below)
175 and their locations are stored using a special representation known as a
176 table.
177
178 For a given program, a table can be considered as being like a matrix mapping
179 classes to attribute names. For example:
180
181 class A:
182 # instances have attributes x, y
183
184 class B(A):
185 # introduces attribute z for instances
186
187 class C:
188 # instances have attributes a, b, z
189
190 This would provide the following table, referred to as an object table in the
191 context of classes and instances:
192
193 Class/attr a b x y z
194
195 A 1 2
196 B 1 2 3
197 C 1 2 3
198
199 A limitation of this representation is that instance attributes may not shadow
200 class attributes: if an attribute with a given name is not defined on an
201 instance, an attribute with the same name cannot be provided by the class of
202 the instance or any superclass of the instance's class. This impacts the
203 provision of the __class__ attribute, as described below.
204
205 The table can be compacted using a representation known as a displacement
206 list (referred to as an object list in this context):
207
208 Classes with attribute offsets
209
210 classcode A
211 attrcode a b x y z
212
213 B
214 a b x y z
215
216 C
217 a b x y z
218
219 List . . 1 2 1 2 3 1 2 . . 3
220
221 Here, the classcode refers to the offset in the list at which a class's
222 attributes are defined, whereas the attrcode defines the offset within a
223 region of attributes corresponding to a single attribute of a given name.
224
225 Attribute Locations
226 -------------------
227
228 The locations stored in table/list elements are generally for instance
229 attributes relative to the location of the instance, whereas those for class
230 attributes and module attributes are generally absolute addresses. Thus, each
231 occupied table cell has the following structure:
232
233 attrcode, uses-absolute-address, address (or location)
234
235 This could be given instead as follows:
236
237 attrcode, is-class-or-module, location
238
239 Since uses-absolute-address corresponds to is-class-or-module, and since there
240 is a need to test for classes and modules to prevent assignment to attributes
241 of such objects, this particular information is always required.
242
243 The __class__ Attribute
244 -----------------------
245
246 In Python 2.x, at least with new-style classes, instances have __class__
247 attributes which indicate the class from which they have been instantiated,
248 whereas classes have __class__ attributes which reference the type class.
249 With the object table, it is not possible to provide absolute addresses which
250 can be used for both classes and instances, since this would result in classes
251 and instances having the same class, and thus the class of a class would be
252 the class itself.
253
254 One solution is to use object-relative values in the table so that referencing
255 the __class__ attribute of an instance produces a value which can be combined
256 with an instance's address to yield the address of the attribute, which itself
257 refers to the instance's class, whereas referencing the __class__ attribute of
258 a class produces a similar object-relative value that is combined with the
259 class's address to yield the address of the attribute, which itself refers to
260 the special type class.
261
262 Obviously, the above solution requires both classes and instances to retain an
263 attribute location specifically to hold the value appropriate for each object
264 type, whereas a scheme which omits the __class__ attribute on classes would be
265 able to employ an absolute address in the table and maintain only a single
266 address to refer to the class for all instances. The only problem with not
267 providing a sensible __class__ attribute entry for classes would be the need
268 for special treatment of __class__ to prevent inappropriate consultation of
269 the table for classes.
270
271 Comparing Tables as Matrices with Displacement Lists
272 ----------------------------------------------------
273
274 Although displacement lists can provide reasonable levels of compaction for
275 attribute data, the element size is larger than that required for a simple
276 matrix: the attribute code (attrcode) need not be stored since each element
277 unambiguously refers to the availability of an attribute for a particular
278 class or instance of that class, and so the data at a given element need not
279 be tested for relevance to a given attribute access operation.
280
281 Given a program with 20 object types and 100 attribute types, a matrix would
282 occupy the following amount of space:
283
284 number of object types * number of attribute types * element size
285 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
286 = 2000
287
288 In contrast, given a compaction to 40% of the matrix size (without considering
289 element size) in a displacement list, the amount of space would be as follows:
290
291 number of elements * element size
292 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
293 = 1600
294
295 Consequently, the principal overhead of using a displacement list is likely to
296 be in the need to check element relevance when retrieving values from such a
297 list.
298
299 Objects and Structures
300 ======================
301
302 As well as references, micropython needs to have actual objects to refer to.
303 Since classes, functions and instances are all objects, it is desirable that
304 certain common features and operations are supported in the same way for all
305 of these things. To permit this, a common data structure format is used.
306
307 Header.................................................... Attributes.................
308
309 Identifier Identifier Address Identifier Size Object ...
310
311 0 1 2 3 4 5 6
312 classcode attrcode/ invocation funccode size attribute ...
313 instance reference reference
314 status
315
316 Classcode
317 ---------
318
319 Used in attribute lookup.
320
321 Here, the classcode refers to the attribute lookup table for the object (as
322 described above). Classes and instances share the same classcode, and their
323 structures reflect this. Functions all belong to the same type and thus employ
324 the classcode for the function built-in type, whereas modules have distinct
325 types since they must support different sets of attributes.
326
327 Attrcode
328 --------
329
330 Used to test instances for membership of classes (or descendants of classes).
331
332 Since, in traditional Python, classes are only ever instances of some generic
333 built-in type, support for testing such a relationship directly has been
334 removed and the attrcode is not specified for classes: the presence of an
335 attrcode indicates that a given object is an instance. In addition, support
336 has also been removed for testing modules in the same way, meaning that the
337 attrcode is also not specified for modules.
338
339 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
340 for details of attrcodes.
341
342 Invocation Reference
343 --------------------
344
345 Used when an object is called.
346
347 This is the address of the code to be executed when an invocation is performed
348 on the object.
349
350 Funccode
351 --------
352
353 Used to look up argument positions by name.
354
355 The strategy with keyword arguments in micropython is to attempt to position
356 such arguments in the invocation frame as it is being constructed.
357
358 See the "Parameters and Lookups" section for more information.
359
360 Size
361 ----
362
363 Used to indicate the size of an object including attributes.
364
365 Attributes
366 ----------
367
368 For classes, modules and instances, the attributes in the structure correspond
369 to the attributes of each kind of object. For functions, however, the
370 attributes in the structure correspond to the default arguments for each
371 function, if any.
372
373 Structure Types
374 ---------------
375
376 Class C:
377
378 0 1 2 3 4 5 6
379 classcode (unused) __new__ funccode size attribute ...
380 for C reference for reference
381 instantiator
382
383 Instance of C:
384
385 0 1 2 3 4 5 6
386 classcode attrcode C.__call__ funccode size attribute ...
387 for C for C reference for reference
388 (if exists) C.__call__
389
390 Function f:
391
392 0 1 2 3 4 5 6
393 classcode attrcode code funccode size attribute ...
394 for for reference (default)
395 function function reference
396
397 Module m:
398
399 0 1 2 3 4 5 6
400 classcode attrcode (unused) (unused) (unused) attribute ...
401 for m for m (global)
402 reference
403
404 The __class__ Attribute
405 -----------------------
406
407 All objects should support the __class__ attribute, and in most cases this is
408 done using the object table, yielding a common address for all instances of a
409 given class.
410
411 Function: refers to the function class
412 Instance: refers to the class instantiated to make the object
413
414 The object table cannot support two definitions simultaneously for both
415 instances and their classes. Consequently, __class__ access on classes must be
416 tested for and a special result returned.
417
418 Class: refers to the type class (type.__class__ also refers to the type class)
419
420 For convenience, the first attribute of a class will be the common __class__
421 attribute for all its instances. As noted above, direct access to this
422 attribute will not be possible for classes, and a constant result will be
423 returned instead.
424
425 Lists and Tuples
426 ----------------
427
428 The built-in list and tuple sequences employ variable length structures using
429 the attribute locations to store their elements, where each element is a
430 reference to a separately stored object.
431
432 Testing Instance Compatibility with Classes (Attrcode)
433 ------------------------------------------------------
434
435 Although it would be possible to have a data structure mapping classes to
436 compatible classes, such as a matrix indicating the subclasses (or
437 superclasses) of each class, the need to retain the key to such a data
438 structure for each class might introduce a noticeable overhead.
439
440 Instead of having a separate structure, descendant classes of each class are
441 inserted as special attributes into the object table. This requires an extra
442 key to be retained, since each class must provide its own attribute code such
443 that upon an instance/class compatibility test, the code may be obtained and
444 used in the object table.
445
446 Invocation and Code References
447 ------------------------------
448
449 Modules: there is no meaningful invocation reference since modules cannot be
450 explicitly called.
451
452 Functions: a simple code reference is employed pointing to code implementing
453 the function. Note that the function locals are completely distinct from this
454 structure and are not comparable to attributes. Instead, attributes are
455 reserved for default parameter values, although they do not appear in the
456 object table described above, appearing instead in a separate parameter table
457 described below.
458
459 Classes: given that classes must be invoked in order to create instances, a
460 reference must be provided in class structures. However, this reference does
461 not point directly at the __init__ method of the class. Instead, the
462 referenced code belongs to a special initialiser function, __new__, consisting
463 of the following instructions:
464
465 create instance for C
466 call C.__init__(instance, ...)
467 return instance
468
469 Instances: each instance employs a reference to any __call__ method defined in
470 the class hierarchy for the instance, thus maintaining its callable nature.
471
472 Both classes and modules may contain code in their definitions - the former in
473 the "body" of the class, potentially defining attributes, and the latter as
474 the "top-level" code in the module, potentially defining attributes/globals -
475 but this code is not associated with any invocation target. It is thus
476 generated in order of appearance and is not referenced externally.
477
478 Invocation Operation
479 --------------------
480
481 Consequently, regardless of the object an invocation is always done as
482 follows:
483
484 get invocation reference from the header
485 jump to reference
486
487 Additional preparation is necessary before the above code: positional
488 arguments must be saved in the invocation frame, and keyword arguments must be
489 resolved and saved to the appropriate position in the invocation frame.
490
491 See invocation.txt for details.
492
493 Parameters and Lookups
494 ======================
495
496 Since Python supports keyword arguments when making invocations, it becomes
497 necessary to record the parameter names associated with each function or
498 method. Just as object tables record attributes positions on classes and
499 instances, parameter tables record parameter positions in function or method
500 parameter lists.
501
502 For a given program, a parameter table can be considered as being like a
503 matrix mapping functions/methods to parameter names. For example:
504
505 def f(x, y, z):
506 pass
507
508 def g(a, b, c):
509 pass
510
511 def h(a, x):
512 pass
513
514 This would provide the following table, referred to as a parameter table in
515 the context of functions and methods:
516
517 Function/param a b c x y z
518
519 f 1 2 3
520 g 1 2 3
521 h 1 2
522
523 Confusion can occur when functions are adopted as methods, since the context
524 then occupies the first slot in the invocation frame:
525
526 def f(x, y, z):
527 pass
528
529 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
530 -> f(1, 2, 3)
531
532 class C:
533 f = f
534
535 def g(x, y, z):
536 pass
537
538 c = C()
539
540 c.f(y=2, z=3) -> f(<context>, 2, 3)
541 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
542
543 Just as with parameter tables, a displacement list can be prepared from a
544 parameter table:
545
546 Functions with parameter (attribute) offsets
547
548 funccode f
549 attrcode a b c x y z
550
551 g
552 a b c x y z
553
554 h
555 a b c x y z
556
557 List . . . 1 2 3 1 2 3 1 . . 2 . .
558
559 Here, the funccode refers to the offset in the list at which a function's
560 parameters are defined, whereas the attrcode defines the offset within a
561 region of attributes corresponding to a single parameter of a given name.
562
563 Instantiation
564 =============
565
566 When instantiating classes, memory must be reserved for the header of the
567 resulting instance, along with locations for the attributes of the instance.
568 Since the instance header contains data common to all instances of a class, a
569 template header is copied to the start of the newly reserved memory region.
570
571 Register Usage
572 ==============
573
574 During code generation, much of the evaluation produces results which are
575 implicitly recorded in the "active value" or "working" register, and various
576 instructions will consume this active value. In addition, some instructions
577 will consume a separate "active source value" from a register, typically those
578 which are assigning the result of an expression to an assignment target.
579
580 Since values often need to be retained for later use, a set of temporary
581 storage locations are typically employed. However, optimisations may reduce
582 the need to use such temporary storage where instructions which provide the
583 "active value" can be re-executed and will produce the same result. Whether
584 re-use of instructions is possible or not, values still need to be loaded into
585 a "source" register to be accessed by an assignment instruction.
586
587 RSVP instructions generally have the notion of working, target and source
588 registers (see registers.txt). These register "roles" are independent from the
589 actual registers defined for the RSVP machine itself, even though the naming
590 is similar. Generally, instructions do regard the RSVP "working" registers as
591 the class of register in the "working" role, although some occurrences of
592 instruction usage may employ other registers (such as the "result" registers)
593 and thus take advantage of the generality of the RSVP implementation of such
594 instructions.
595
596 List and Tuple Representations
597 ==============================
598
599 Since tuples have a fixed size, the representation of a tuple instance is
600 merely a header describing the size of the entire object, together with a
601 sequence of references to the object "stored" at each position in the
602 structure. Such references consist of the usual context and reference pair.
603
604 Lists, however, have a variable size and must be accessible via an unchanging
605 location even as more memory is allocated elsewhere to accommodate the
606 contents of the list. Consequently, the representation must resemble the
607 following:
608
609 Structure header for list (size == header plus special attribute)
610 Special attribute referencing the underlying sequence
611
612 The underlying sequence has a fixed size, like a tuple, but may contain fewer
613 elements than the size of the sequence permits:
614
615 Special header indicating the current size and allocated size
616 Element
617 ... <-- current size
618 (Unused space)
619 ... <-- allocated size
620
621 This representation permits the allocation of a new sequence when space is
622 exhausted in an existing sequence, with the new sequence address stored in the
623 main list structure. Since access to the contents of the list must go through
624 the main list structure, underlying allocation activities may take place
625 without the users of a list having to be aware of such activities.