1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Attribute usage observations
8 * Imports and circular import detection
9 * Contexts and values
10 * Tables, attributes and lookups
11 * Objects and structures
12 * Parameters and lookups
13 * Instantiation
14 * Register usage
15 * List and tuple representations
16
17 Namespaces and Attribute Definition
18 ===================================
19
20 Namespaces are any objects which can retain attributes.
21
22 * Module attributes are defined either at the module level or by global
23 statements.
24 * Class attributes are defined only within class statements.
25 * Instance attributes are defined only by assignments to attributes of self
26 or tentatively as references to attributes of self.
27
28 These restrictions apply because such attributes are thus explicitly declared,
29 permitting the use of tables (described below). Module and class attributes
30 can also be finalised in this way in order to permit certain optimisations.
31
32 An additional restriction required for the current implementation of tables
33 (as described below) applies to class definitions: each class must be defined
34 using a unique name; repeated definition of classes having the same name is
35 thus not permitted. This restriction arises from the use of the "full name" of
36 a class as a key to the object table, where the full name is a qualified path
37 via the module hierarchy ending with the name of the class.
38
39 Rebinding of attributes outside classes and modules can be allowed if
40 attribute usage observations are being used to detect such external
41 modifications to such objects. Without such observations, such rebinding
42 should be forbidden since apparently constant attributes might be modified in
43 a running program, but code may have been generated that provides specific
44 objects for those attributes under the assumption that they will not be
45 changed.
46
47 See rejected.txt for complicating mechanisms which could be applied to
48 mitigate the effects of these restrictions on optimisations.
49
50 Attribute Usage Observations
51 ============================
52
53 Within a scope, a name may be used in conjunction with attribute names in
54 order to access attributes on objects referenced by the name. However, such
55 observations can only be regarded as reliable if the object referenced is not
56 changed independently by some other mechanism or activity.
57
58 With conventional functions and methods, any locally defined names can be
59 considered private to that scope and thus immune to independent modification,
60 at least within reasonable features of the language. Although stack
61 modification may be possible, it seems appropriate to reject such features,
62 especially since they lend themselves to unmaintainable programs.
63
64 For names defined during the initialisation of a class, since the class itself
65 cannot be referenced by name until its declaration has been completely
66 evaluated, no independent modification can occur from outside the class scope.
67
68 For names defined during the initialisation of a module, global declarations
69 in functions permit the rebinding of global variables and since functions may
70 be invoked during module initialisation, independent modification can
71 potentially occur if any functions are called.
72
73 Module globals can be accessed from other modules that can refer to a module
74 by its name. Initially, an import is required to make a module available for
75 modification, but there is no restriction on whether a module has been
76 completely imported (and thus defined) before an import statement can make it
77 available to other modules. Consider the following package root definition:
78
79 # Module changed:
80 def f():
81 import changed.modifier
82 x = 123
83 f()
84
85 # Module changed.modifier:
86 import changed
87 changed.x = 456
88
89 Here, an import of changed will initially set x to 123, but then the function
90 f will be invoked and cause the changed.modifier module to be imported. Since
91 the changed module is already being imported, the import statement will not
92 try to perform the import operation again, but it will make the partially
93 defined module available for access. Thus, the changed.modifier module will
94 then set x to 456, and independent modification of the changed namespace will
95 have been performed.
96
97 In conclusion, module globals cannot therefore be regarded as immune to
98 operations that would disrupt usage observations. Consequently, only locals
99 and class definition "locals" can be reliably employed in attribute usage
100 observations.
101
102 Imports and Circular Import Detection
103 =====================================
104
105 The matter of whether any given module is potentially modified by another
106 module before it has been completely imported can be addressed by separating
107 the import process into distinct phases:
108
109 1. Discovery/loading
110 2. Parsing/structure processing
111 3. Completion/code processing
112
113 Upon discovering a module, a test is made to determine whether it is already
114 being imported; if not, it is then loaded and its structure inspected to
115 determine whether it may import other modules, which will then in turn be
116 discovered and loaded. Once no more modules can be loaded, they will then be
117 completed by undergoing the more thorough systematic processing of each
118 module's contents, defining program units and requesting the completion of
119 other modules when import statements are encountered.
120
121 The motivation for such a multi-phase approach is to detect circular imports
122 in the structure processing phase before modules are populated and deductions
123 made about their objects' behaviour. Thus, globals belonging to a module known
124 to be imported in a circular fashion will already be regarded as potentially
125 modifiable by other modules and attribute usage observations will not be
126 recorded.
127
128 Contexts and Values
129 ===================
130
131 Values are used as the common reference representation in micropython: as
132 stored representations of attributes (of classes, instances, modules, and
133 other objects supporting attribute-like entities) as well as the stored values
134 associated with names in functions and methods.
135
136 Unlike other implementations, micropython does not create things like bound
137 method objects for individual instances. Instead, all objects are referenced
138 using a context, reference pair:
139
140 Value Layout
141 ------------
142
143 0 1
144 context object
145 reference reference
146
147 Specific implementations might reverse this ordering for optimisation
148 purposes.
149
150 Rationale
151 ---------
152
153 To reduce the number of created objects whilst retaining the ability to
154 support bound method invocations. The context indicates the context in which
155 an invocation is performed, typically the owner of the method.
156
157 Usage
158 -----
159
160 The context may be inserted as the first argument when a value is involved in
161 an invocation. This argument may then be omitted from the invocation if its
162 usage is not appropriate.
163
164 See invocation.txt for details.
165
166 Context Value Types
167 -------------------
168
169 The following types of context value exist:
170
171 Type Usage Transformations
172 ---- ----- ---------------
173
174 Replaceable With functions (not methods) May be replaced with an
175 instance or a class when a
176 value is stored on an
177 instance or class
178
179 Placeholder With classes May not be replaced
180
181 Instance With instances (and constants) May not be replaced
182 or functions as methods
183
184 Class With functions as methods May be replaced when a
185 value is loaded from a
186 class attribute via an
187 instance
188
189 Contexts in Acquired Values
190 ---------------------------
191
192 There are four classes of instructions which provide values:
193
194 Instruction Purpose Context Operations
195 ----------- ------- ------------------
196
197 1) LoadConst Load module, constant Use loaded object with itself
198 as context
199
200 2) LoadFunction Load function Combine replaceable context
201 with loaded object
202
203 3) LoadClass Load class Combine placeholder context
204 with loaded object
205
206 4) LoadAddress* Load attribute from Preserve or override stored
207 LoadAttr* class, module, context (as described in
208 instance assignment.txt)
209
210 In order to comply with traditional Python behaviour, contexts may or may not
211 represent the object from which an attribute has been acquired.
212
213 See assignment.txt for details.
214
215 Contexts in Stored Values
216 -------------------------
217
218 There are two classes of instruction for storing values:
219
220 Instruction Purpose Context Operations
221 ----------- ------- ------------------
222
223 1) StoreAddress Store attribute in a Preserve context; note that no
224 known object test for class attribute
225 assignment should be necessary
226 since this instruction should only
227 be generated for module globals
228
229 StoreAttr Store attribute in an Preserve context; note that no
230 instance test for class attribute
231 assignment should be necessary
232 since this instruction should only
233 be generated for self accesses
234
235 StoreAttrIndex Store attribute in an Preserve context; since the index
236 unknown object lookup could yield a class
237 attribute, a test of the nature of
238 the nature of the structure is
239 necessary in order to prevent
240 assignments to classes
241
242 2) StoreAddressContext Store attribute in a Override context if appropriate;
243 known object if the value has a replaceable
244 context, permit the target to
245 take ownership of the value
246
247 See assignment.txt for details.
248
249 Tables, Attributes and Lookups
250 ==============================
251
252 Attribute lookups, where the exact location of an object attribute is deduced,
253 are performed differently in micropython than in other implementations.
254 Instead of providing attribute dictionaries, in which attributes are found,
255 attributes are located at fixed places in object structures (described below)
256 and their locations are stored using a special representation known as a
257 table.
258
259 For a given program, a table can be considered as being like a matrix mapping
260 classes to attribute names. For example:
261
262 class A:
263 # instances have attributes x, y
264
265 class B(A):
266 # introduces attribute z for instances
267
268 class C:
269 # instances have attributes a, b, z
270
271 This would provide the following table, referred to as an object table in the
272 context of classes and instances:
273
274 Class/attr a b x y z
275
276 A 1 2
277 B 1 2 3
278 C 1 2 3
279
280 A limitation of this representation is that instance attributes may not shadow
281 class attributes: if an attribute with a given name is not defined on an
282 instance, an attribute with the same name cannot be provided by the class of
283 the instance or any superclass of the instance's class. This impacts the
284 provision of the __class__ attribute, as described below.
285
286 The table can be compacted using a representation known as a displacement
287 list (referred to as an object list in this context):
288
289 Classes with attribute offsets
290
291 classcode A
292 attrcode a b x y z
293
294 B
295 a b x y z
296
297 C
298 a b x y z
299
300 List . . 1 2 1 2 3 1 2 . . 3
301
302 Here, the classcode refers to the offset in the list at which a class's
303 attributes are defined, whereas the attrcode defines the offset within a
304 region of attributes corresponding to a single attribute of a given name.
305
306 Attribute Locations
307 -------------------
308
309 The locations stored in table/list elements are generally for instance
310 attributes relative to the location of the instance, whereas those for class
311 attributes and module attributes are generally absolute addresses. Thus, each
312 occupied table cell has the following structure:
313
314 attrcode, uses-absolute-address, address (or location)
315
316 This could be given instead as follows:
317
318 attrcode, is-class-or-module, location
319
320 Since uses-absolute-address corresponds to is-class-or-module, and since there
321 is a need to test for classes and modules to prevent assignment to attributes
322 of such objects, this particular information is always required.
323
324 The __class__ Attribute
325 -----------------------
326
327 In Python 2.x, at least with new-style classes, instances have __class__
328 attributes which indicate the class from which they have been instantiated,
329 whereas classes have __class__ attributes which reference the type class.
330 With the object table, it is not possible to provide absolute addresses which
331 can be used for both classes and instances, since this would result in classes
332 and instances having the same class, and thus the class of a class would be
333 the class itself.
334
335 One solution is to use object-relative values in the table so that referencing
336 the __class__ attribute of an instance produces a value which can be combined
337 with an instance's address to yield the address of the attribute, which itself
338 refers to the instance's class, whereas referencing the __class__ attribute of
339 a class produces a similar object-relative value that is combined with the
340 class's address to yield the address of the attribute, which itself refers to
341 the special type class.
342
343 Obviously, the above solution requires both classes and instances to retain an
344 attribute location specifically to hold the value appropriate for each object
345 type, whereas a scheme which omits the __class__ attribute on classes would be
346 able to employ an absolute address in the table and maintain only a single
347 address to refer to the class for all instances. The only problem with not
348 providing a sensible __class__ attribute entry for classes would be the need
349 for special treatment of __class__ to prevent inappropriate consultation of
350 the table for classes.
351
352 Comparing Tables as Matrices with Displacement Lists
353 ----------------------------------------------------
354
355 Although displacement lists can provide reasonable levels of compaction for
356 attribute data, the element size is larger than that required for a simple
357 matrix: the attribute code (attrcode) need not be stored since each element
358 unambiguously refers to the availability of an attribute for a particular
359 class or instance of that class, and so the data at a given element need not
360 be tested for relevance to a given attribute access operation.
361
362 Given a program with 20 object types and 100 attribute types, a matrix would
363 occupy the following amount of space:
364
365 number of object types * number of attribute types * element size
366 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
367 = 2000
368
369 In contrast, given a compaction to 40% of the matrix size (without considering
370 element size) in a displacement list, the amount of space would be as follows:
371
372 number of elements * element size
373 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
374 = 1600
375
376 Consequently, the principal overhead of using a displacement list is likely to
377 be in the need to check element relevance when retrieving values from such a
378 list.
379
380 Objects and Structures
381 ======================
382
383 As well as references, micropython needs to have actual objects to refer to.
384 Since classes, functions and instances are all objects, it is desirable that
385 certain common features and operations are supported in the same way for all
386 of these things. To permit this, a common data structure format is used.
387
388 Header.................................................... Attributes.................
389
390 Identifier Identifier Address Identifier Size Object ...
391
392 0 1 2 3 4 5 6
393 classcode attrcode/ invocation funccode size attribute ...
394 instance reference reference
395 status
396
397 Classcode
398 ---------
399
400 Used in attribute lookup.
401
402 Here, the classcode refers to the attribute lookup table for the object (as
403 described above). Classes and instances share the same classcode, and their
404 structures reflect this. Functions all belong to the same type and thus employ
405 the classcode for the function built-in type, whereas modules have distinct
406 types since they must support different sets of attributes.
407
408 Attrcode
409 --------
410
411 Used to test instances for membership of classes (or descendants of classes).
412
413 Since, in traditional Python, classes are only ever instances of some generic
414 built-in type, support for testing such a relationship directly has been
415 removed and the attrcode is not specified for classes: the presence of an
416 attrcode indicates that a given object is an instance. In addition, support
417 has also been removed for testing modules in the same way, meaning that the
418 attrcode is also not specified for modules.
419
420 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
421 for details of attrcodes.
422
423 Invocation Reference
424 --------------------
425
426 Used when an object is called.
427
428 This is the address of the code to be executed when an invocation is performed
429 on the object.
430
431 Funccode
432 --------
433
434 Used to look up argument positions by name.
435
436 The strategy with keyword arguments in micropython is to attempt to position
437 such arguments in the invocation frame as it is being constructed.
438
439 See the "Parameters and Lookups" section for more information.
440
441 Size
442 ----
443
444 Used to indicate the size of an object including attributes.
445
446 Attributes
447 ----------
448
449 For classes, modules and instances, the attributes in the structure correspond
450 to the attributes of each kind of object. For functions, however, the
451 attributes in the structure correspond to the default arguments for each
452 function, if any.
453
454 Structure Types
455 ---------------
456
457 Class C:
458
459 0 1 2 3 4 5 6
460 classcode (unused) __new__ funccode size attribute ...
461 for C reference for reference
462 instantiator
463
464 Instance of C:
465
466 0 1 2 3 4 5 6
467 classcode attrcode C.__call__ funccode size attribute ...
468 for C for C reference for reference
469 (if exists) C.__call__
470
471 Function f:
472
473 0 1 2 3 4 5 6
474 classcode attrcode code funccode size attribute ...
475 for for reference (default)
476 function function reference
477
478 Module m:
479
480 0 1 2 3 4 5 6
481 classcode attrcode (unused) (unused) (unused) attribute ...
482 for m for m (global)
483 reference
484
485 The __class__ Attribute
486 -----------------------
487
488 All objects should support the __class__ attribute, and in most cases this is
489 done using the object table, yielding a common address for all instances of a
490 given class.
491
492 Function: refers to the function class
493 Instance: refers to the class instantiated to make the object
494
495 The object table cannot support two definitions simultaneously for both
496 instances and their classes. Consequently, __class__ access on classes must be
497 tested for and a special result returned.
498
499 Class: refers to the type class (type.__class__ also refers to the type class)
500
501 For convenience, the first attribute of a class will be the common __class__
502 attribute for all its instances. As noted above, direct access to this
503 attribute will not be possible for classes, and a constant result will be
504 returned instead.
505
506 Lists and Tuples
507 ----------------
508
509 The built-in list and tuple sequences employ variable length structures using
510 the attribute locations to store their elements, where each element is a
511 reference to a separately stored object.
512
513 Testing Instance Compatibility with Classes (Attrcode)
514 ------------------------------------------------------
515
516 Although it would be possible to have a data structure mapping classes to
517 compatible classes, such as a matrix indicating the subclasses (or
518 superclasses) of each class, the need to retain the key to such a data
519 structure for each class might introduce a noticeable overhead.
520
521 Instead of having a separate structure, descendant classes of each class are
522 inserted as special attributes into the object table. This requires an extra
523 key to be retained, since each class must provide its own attribute code such
524 that upon an instance/class compatibility test, the code may be obtained and
525 used in the object table.
526
527 Invocation and Code References
528 ------------------------------
529
530 Modules: there is no meaningful invocation reference since modules cannot be
531 explicitly called.
532
533 Functions: a simple code reference is employed pointing to code implementing
534 the function. Note that the function locals are completely distinct from this
535 structure and are not comparable to attributes. Instead, attributes are
536 reserved for default parameter values, although they do not appear in the
537 object table described above, appearing instead in a separate parameter table
538 described below.
539
540 Classes: given that classes must be invoked in order to create instances, a
541 reference must be provided in class structures. However, this reference does
542 not point directly at the __init__ method of the class. Instead, the
543 referenced code belongs to a special initialiser function, __new__, consisting
544 of the following instructions:
545
546 create instance for C
547 call C.__init__(instance, ...)
548 return instance
549
550 Instances: each instance employs a reference to any __call__ method defined in
551 the class hierarchy for the instance, thus maintaining its callable nature.
552
553 Both classes and modules may contain code in their definitions - the former in
554 the "body" of the class, potentially defining attributes, and the latter as
555 the "top-level" code in the module, potentially defining attributes/globals -
556 but this code is not associated with any invocation target. It is thus
557 generated in order of appearance and is not referenced externally.
558
559 Invocation Operation
560 --------------------
561
562 Consequently, regardless of the object an invocation is always done as
563 follows:
564
565 get invocation reference from the header
566 jump to reference
567
568 Additional preparation is necessary before the above code: positional
569 arguments must be saved in the invocation frame, and keyword arguments must be
570 resolved and saved to the appropriate position in the invocation frame.
571
572 See invocation.txt for details.
573
574 Parameters and Lookups
575 ======================
576
577 Since Python supports keyword arguments when making invocations, it becomes
578 necessary to record the parameter names associated with each function or
579 method. Just as object tables record attributes positions on classes and
580 instances, parameter tables record parameter positions in function or method
581 parameter lists.
582
583 For a given program, a parameter table can be considered as being like a
584 matrix mapping functions/methods to parameter names. For example:
585
586 def f(x, y, z):
587 pass
588
589 def g(a, b, c):
590 pass
591
592 def h(a, x):
593 pass
594
595 This would provide the following table, referred to as a parameter table in
596 the context of functions and methods:
597
598 Function/param a b c x y z
599
600 f 1 2 3
601 g 1 2 3
602 h 1 2
603
604 Confusion can occur when functions are adopted as methods, since the context
605 then occupies the first slot in the invocation frame:
606
607 def f(x, y, z):
608 pass
609
610 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
611 -> f(1, 2, 3)
612
613 class C:
614 f = f
615
616 def g(x, y, z):
617 pass
618
619 c = C()
620
621 c.f(y=2, z=3) -> f(<context>, 2, 3)
622 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
623
624 Just as with parameter tables, a displacement list can be prepared from a
625 parameter table:
626
627 Functions with parameter (attribute) offsets
628
629 funccode f
630 attrcode a b c x y z
631
632 g
633 a b c x y z
634
635 h
636 a b c x y z
637
638 List . . . 1 2 3 1 2 3 1 . . 2 . .
639
640 Here, the funccode refers to the offset in the list at which a function's
641 parameters are defined, whereas the attrcode defines the offset within a
642 region of attributes corresponding to a single parameter of a given name.
643
644 Instantiation
645 =============
646
647 When instantiating classes, memory must be reserved for the header of the
648 resulting instance, along with locations for the attributes of the instance.
649 Since the instance header contains data common to all instances of a class, a
650 template header is copied to the start of the newly reserved memory region.
651
652 Register Usage
653 ==============
654
655 During code generation, much of the evaluation produces results which are
656 implicitly recorded in the "active value" or "working" register, and various
657 instructions will consume this active value. In addition, some instructions
658 will consume a separate "active source value" from a register, typically those
659 which are assigning the result of an expression to an assignment target.
660
661 Since values often need to be retained for later use, a set of temporary
662 storage locations are typically employed. However, optimisations may reduce
663 the need to use such temporary storage where instructions which provide the
664 "active value" can be re-executed and will produce the same result. Whether
665 re-use of instructions is possible or not, values still need to be loaded into
666 a "source" register to be accessed by an assignment instruction.
667
668 RSVP instructions generally have the notion of working, target and source
669 registers (see registers.txt). These register "roles" are independent from the
670 actual registers defined for the RSVP machine itself, even though the naming
671 is similar. Generally, instructions do regard the RSVP "working" registers as
672 the class of register in the "working" role, although some occurrences of
673 instruction usage may employ other registers (such as the "result" registers)
674 and thus take advantage of the generality of the RSVP implementation of such
675 instructions.
676
677 List and Tuple Representations
678 ==============================
679
680 Since tuples have a fixed size, the representation of a tuple instance is
681 merely a header describing the size of the entire object, together with a
682 sequence of references to the object "stored" at each position in the
683 structure. Such references consist of the usual context and reference pair.
684
685 Lists, however, have a variable size and must be accessible via an unchanging
686 location even as more memory is allocated elsewhere to accommodate the
687 contents of the list. Consequently, the representation must resemble the
688 following:
689
690 Structure header for list (size == header plus special attribute)
691 Special attribute referencing the underlying sequence
692
693 The underlying sequence has a fixed size, like a tuple, but may contain fewer
694 elements than the size of the sequence permits:
695
696 Special header indicating the current size and allocated size
697 Element
698 ... <-- current size
699 (Unused space)
700 ... <-- allocated size
701
702 This representation permits the allocation of a new sequence when space is
703 exhausted in an existing sequence, with the new sequence address stored in the
704 main list structure. Since access to the contents of the list must go through
705 the main list structure, underlying allocation activities may take place
706 without the users of a list having to be aware of such activities.