1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Attribute usage observations
8 * Imports and circular import detection
9 * Contexts and values
10 * Tables, attributes and lookups
11 * Objects and structures
12 * Parameters and lookups
13 * Instantiation
14 * Register usage
15 * List and tuple representations
16
17 Namespaces and Attribute Definition
18 ===================================
19
20 Namespaces are any objects which can retain attributes.
21
22 * Module attributes are defined either at the module level or by global
23 statements.
24 * Class attributes are defined only within class statements.
25 * Instance attributes are defined only by assignments to attributes of self
26 or tentatively as references to attributes of self.
27
28 These restrictions apply because such attributes are thus explicitly declared,
29 permitting the use of tables (described below). Module and class attributes
30 can also be finalised in this way in order to permit certain optimisations.
31
32 Rebinding of attributes outside classes and modules can be allowed if
33 attribute usage observations are being used to detect such external
34 modifications to such objects. Without such observations, such rebinding
35 should be forbidden since apparently constant attributes might be modified in
36 a running program, but code may have been generated that provides specific
37 objects for those attributes under the assumption that they will not be
38 changed.
39
40 See rejected.txt for complicating mechanisms which could be applied to
41 mitigate the effects of these restrictions on optimisations.
42
43 Attribute Usage Observations
44 ============================
45
46 Within a scope, a name may be used in conjunction with attribute names in
47 order to access attributes on objects referenced by the name. However, such
48 observations can only be regarded as reliable if the object referenced is not
49 changed independently by some other mechanism or activity.
50
51 With conventional functions and methods, any locally defined names can be
52 considered private to that scope and thus immune to independent modification,
53 at least within reasonable features of the language. Although stack
54 modification may be possible, it seems appropriate to reject such features,
55 especially since they lend themselves to unmaintainable programs.
56
57 For names defined during the initialisation of a class, since the class itself
58 cannot be referenced by name until its declaration has been completely
59 evaluated, no independent modification can occur from outside the class scope.
60
61 For names defined during the initialisation of a module, global declarations
62 in functions permit the rebinding of global variables and since functions may
63 be invoked during module initialisation, independent modification can
64 potentially occur if any functions are called.
65
66 Module globals can be accessed from other modules that can refer to a module
67 by its name. Initially, an import is required to make a module available for
68 modification, but there is no restriction on whether a module has been
69 completely imported (and thus defined) before an import statement can make it
70 available to other modules. Consider the following package root definition:
71
72 # Module changed:
73 def f():
74 import changed.modifier
75 x = 123
76 f()
77
78 # Module changed.modifier:
79 import changed
80 changed.x = 456
81
82 Here, an import of changed will initially set x to 123, but then the function
83 f will be invoked and cause the changed.modifier module to be imported. Since
84 the changed module is already being imported, the import statement will not
85 try to perform the import operation again, but it will make the partially
86 defined module available for access. Thus, the changed.modifier module will
87 then set x to 456, and independent modification of the changed namespace will
88 have been performed.
89
90 In conclusion, module globals cannot therefore be regarded as immune to
91 operations that would disrupt usage observations. Consequently, only locals
92 and class definition "locals" can be reliably employed in attribute usage
93 observations.
94
95 Imports and Circular Import Detection
96 =====================================
97
98 The matter of whether any given module is potentially modified by another
99 module before it has been completely imported can be addressed by separating
100 the import process into distinct phases:
101
102 1. Discovery/loading
103 2. Parsing/structure processing
104 3. Completion/code processing
105
106 Upon discovering a module, a test is made to determine whether it is already
107 being imported; if not, it is then loaded and its structure inspected to
108 determine whether it may import other modules, which will then in turn be
109 discovered and loaded. Once no more modules can be loaded, they will then be
110 completed by undergoing the more thorough systematic processing of each
111 module's contents, defining program units and requesting the completion of
112 other modules when import statements are encountered.
113
114 The motivation for such a multi-phase approach is to detect circular imports
115 in the structure processing phase before modules are populated and deductions
116 made about their objects' behaviour. Thus, globals belonging to a module known
117 to be imported in a circular fashion will already be regarded as potentially
118 modifiable by other modules and attribute usage observations will not be
119 recorded.
120
121 Contexts and Values
122 ===================
123
124 Values are used as the common reference representation in micropython: as
125 stored representations of attributes (of classes, instances, modules, and
126 other objects supporting attribute-like entities) as well as the stored values
127 associated with names in functions and methods.
128
129 Unlike other implementations, micropython does not create things like bound
130 method objects for individual instances. Instead, all objects are referenced
131 using a context, reference pair:
132
133 Value Layout
134 ------------
135
136 0 1
137 context object
138 reference reference
139
140 Specific implementations might reverse this ordering for optimisation
141 purposes.
142
143 Rationale
144 ---------
145
146 To reduce the number of created objects whilst retaining the ability to
147 support bound method invocations. The context indicates the context in which
148 an invocation is performed, typically the owner of the method.
149
150 Usage
151 -----
152
153 The context may be inserted as the first argument when a value is involved in
154 an invocation. This argument may then be omitted from the invocation if its
155 usage is not appropriate.
156
157 See invocation.txt for details.
158
159 Context Value Types
160 -------------------
161
162 The following types of context value exist:
163
164 Type Usage Transformations
165 ---- ----- ---------------
166
167 Replaceable With functions (not methods) May be replaced with an
168 instance or a class when a
169 value is stored on an
170 instance or class
171
172 Placeholder With classes May not be replaced
173
174 Instance With instances (and constants) May not be replaced
175 or functions as methods
176
177 Class With functions as methods May be replaced when a
178 value is loaded from a
179 class attribute via an
180 instance
181
182 Contexts in Acquired Values
183 ---------------------------
184
185 There are four classes of instructions which provide values:
186
187 Instruction Purpose Context Operations
188 ----------- ------- ------------------
189
190 1) LoadConst Load module, constant Use loaded object with itself
191 as context
192
193 2) LoadFunction Load function Combine replaceable context
194 with loaded object
195
196 3) LoadClass Load class Combine placeholder context
197 with loaded object
198
199 4) LoadAddress* Load attribute from Preserve or override stored
200 LoadAttr* class, module, context (as described in
201 instance assignment.txt)
202
203 In order to comply with traditional Python behaviour, contexts may or may not
204 represent the object from which an attribute has been acquired.
205
206 See assignment.txt for details.
207
208 Contexts in Stored Values
209 -------------------------
210
211 There are two classes of instruction for storing values:
212
213 Instruction Purpose Context Operations
214 ----------- ------- ------------------
215
216 1) StoreAddress Store attribute in a Preserve context; note that no
217 known object test for class attribute
218 assignment should be necessary
219 since this instruction should only
220 be generated for module globals
221
222 StoreAttr Store attribute in an Preserve context; note that no
223 instance test for class attribute
224 assignment should be necessary
225 since this instruction should only
226 be generated for self accesses
227
228 StoreAttrIndex Store attribute in an Preserve context; since the index
229 unknown object lookup could yield a class
230 attribute, a test of the nature of
231 the nature of the structure is
232 necessary in order to prevent
233 assignments to classes
234
235 2) StoreAddressContext Store attribute in a Override context if appropriate;
236 known object if the value has a replaceable
237 context, permit the target to
238 take ownership of the value
239
240 See assignment.txt for details.
241
242 Tables, Attributes and Lookups
243 ==============================
244
245 Attribute lookups, where the exact location of an object attribute is deduced,
246 are performed differently in micropython than in other implementations.
247 Instead of providing attribute dictionaries, in which attributes are found,
248 attributes are located at fixed places in object structures (described below)
249 and their locations are stored using a special representation known as a
250 table.
251
252 For a given program, a table can be considered as being like a matrix mapping
253 classes to attribute names. For example:
254
255 class A:
256 # instances have attributes x, y
257
258 class B(A):
259 # introduces attribute z for instances
260
261 class C:
262 # instances have attributes a, b, z
263
264 This would provide the following table, referred to as an object table in the
265 context of classes and instances:
266
267 Class/attr a b x y z
268
269 A 1 2
270 B 1 2 3
271 C 1 2 3
272
273 A limitation of this representation is that instance attributes may not shadow
274 class attributes: if an attribute with a given name is not defined on an
275 instance, an attribute with the same name cannot be provided by the class of
276 the instance or any superclass of the instance's class. This impacts the
277 provision of the __class__ attribute, as described below.
278
279 The table can be compacted using a representation known as a displacement
280 list (referred to as an object list in this context):
281
282 Classes with attribute offsets
283
284 classcode A
285 attrcode a b x y z
286
287 B
288 a b x y z
289
290 C
291 a b x y z
292
293 List . . 1 2 1 2 3 1 2 . . 3
294
295 Here, the classcode refers to the offset in the list at which a class's
296 attributes are defined, whereas the attrcode defines the offset within a
297 region of attributes corresponding to a single attribute of a given name.
298
299 Attribute Locations
300 -------------------
301
302 The locations stored in table/list elements are generally for instance
303 attributes relative to the location of the instance, whereas those for class
304 attributes and module attributes are generally absolute addresses. Thus, each
305 occupied table cell has the following structure:
306
307 attrcode, uses-absolute-address, address (or location)
308
309 This could be given instead as follows:
310
311 attrcode, is-class-or-module, location
312
313 Since uses-absolute-address corresponds to is-class-or-module, and since there
314 is a need to test for classes and modules to prevent assignment to attributes
315 of such objects, this particular information is always required.
316
317 The __class__ Attribute
318 -----------------------
319
320 In Python 2.x, at least with new-style classes, instances have __class__
321 attributes which indicate the class from which they have been instantiated,
322 whereas classes have __class__ attributes which reference the type class.
323 With the object table, it is not possible to provide absolute addresses which
324 can be used for both classes and instances, since this would result in classes
325 and instances having the same class, and thus the class of a class would be
326 the class itself.
327
328 One solution is to use object-relative values in the table so that referencing
329 the __class__ attribute of an instance produces a value which can be combined
330 with an instance's address to yield the address of the attribute, which itself
331 refers to the instance's class, whereas referencing the __class__ attribute of
332 a class produces a similar object-relative value that is combined with the
333 class's address to yield the address of the attribute, which itself refers to
334 the special type class.
335
336 Obviously, the above solution requires both classes and instances to retain an
337 attribute location specifically to hold the value appropriate for each object
338 type, whereas a scheme which omits the __class__ attribute on classes would be
339 able to employ an absolute address in the table and maintain only a single
340 address to refer to the class for all instances. The only problem with not
341 providing a sensible __class__ attribute entry for classes would be the need
342 for special treatment of __class__ to prevent inappropriate consultation of
343 the table for classes.
344
345 Comparing Tables as Matrices with Displacement Lists
346 ----------------------------------------------------
347
348 Although displacement lists can provide reasonable levels of compaction for
349 attribute data, the element size is larger than that required for a simple
350 matrix: the attribute code (attrcode) need not be stored since each element
351 unambiguously refers to the availability of an attribute for a particular
352 class or instance of that class, and so the data at a given element need not
353 be tested for relevance to a given attribute access operation.
354
355 Given a program with 20 object types and 100 attribute types, a matrix would
356 occupy the following amount of space:
357
358 number of object types * number of attribute types * element size
359 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
360 = 2000
361
362 In contrast, given a compaction to 40% of the matrix size (without considering
363 element size) in a displacement list, the amount of space would be as follows:
364
365 number of elements * element size
366 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
367 = 1600
368
369 Consequently, the principal overhead of using a displacement list is likely to
370 be in the need to check element relevance when retrieving values from such a
371 list.
372
373 Objects and Structures
374 ======================
375
376 As well as references, micropython needs to have actual objects to refer to.
377 Since classes, functions and instances are all objects, it is desirable that
378 certain common features and operations are supported in the same way for all
379 of these things. To permit this, a common data structure format is used.
380
381 Header.................................................... Attributes.................
382
383 Identifier Identifier Address Identifier Size Object ...
384
385 0 1 2 3 4 5 6
386 classcode attrcode/ invocation funccode size attribute ...
387 instance reference reference
388 status
389
390 Classcode
391 ---------
392
393 Used in attribute lookup.
394
395 Here, the classcode refers to the attribute lookup table for the object (as
396 described above). Classes and instances share the same classcode, and their
397 structures reflect this. Functions all belong to the same type and thus employ
398 the classcode for the function built-in type, whereas modules have distinct
399 types since they must support different sets of attributes.
400
401 Attrcode
402 --------
403
404 Used to test instances for membership of classes (or descendants of classes).
405
406 Since, in traditional Python, classes are only ever instances of some generic
407 built-in type, support for testing such a relationship directly has been
408 removed and the attrcode is not specified for classes: the presence of an
409 attrcode indicates that a given object is an instance. In addition, support
410 has also been removed for testing modules in the same way, meaning that the
411 attrcode is also not specified for modules.
412
413 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
414 for details of attrcodes.
415
416 Invocation Reference
417 --------------------
418
419 Used when an object is called.
420
421 This is the address of the code to be executed when an invocation is performed
422 on the object.
423
424 Funccode
425 --------
426
427 Used to look up argument positions by name.
428
429 The strategy with keyword arguments in micropython is to attempt to position
430 such arguments in the invocation frame as it is being constructed.
431
432 See the "Parameters and Lookups" section for more information.
433
434 Size
435 ----
436
437 Used to indicate the size of an object including attributes.
438
439 Attributes
440 ----------
441
442 For classes, modules and instances, the attributes in the structure correspond
443 to the attributes of each kind of object. For functions, however, the
444 attributes in the structure correspond to the default arguments for each
445 function, if any.
446
447 Structure Types
448 ---------------
449
450 Class C:
451
452 0 1 2 3 4 5 6
453 classcode (unused) __new__ funccode size attribute ...
454 for C reference for reference
455 instantiator
456
457 Instance of C:
458
459 0 1 2 3 4 5 6
460 classcode attrcode C.__call__ funccode size attribute ...
461 for C for C reference for reference
462 (if exists) C.__call__
463
464 Function f:
465
466 0 1 2 3 4 5 6
467 classcode attrcode code funccode size attribute ...
468 for for reference (default)
469 function function reference
470
471 Module m:
472
473 0 1 2 3 4 5 6
474 classcode attrcode (unused) (unused) (unused) attribute ...
475 for m for m (global)
476 reference
477
478 The __class__ Attribute
479 -----------------------
480
481 All objects should support the __class__ attribute, and in most cases this is
482 done using the object table, yielding a common address for all instances of a
483 given class.
484
485 Function: refers to the function class
486 Instance: refers to the class instantiated to make the object
487
488 The object table cannot support two definitions simultaneously for both
489 instances and their classes. Consequently, __class__ access on classes must be
490 tested for and a special result returned.
491
492 Class: refers to the type class (type.__class__ also refers to the type class)
493
494 For convenience, the first attribute of a class will be the common __class__
495 attribute for all its instances. As noted above, direct access to this
496 attribute will not be possible for classes, and a constant result will be
497 returned instead.
498
499 Lists and Tuples
500 ----------------
501
502 The built-in list and tuple sequences employ variable length structures using
503 the attribute locations to store their elements, where each element is a
504 reference to a separately stored object.
505
506 Testing Instance Compatibility with Classes (Attrcode)
507 ------------------------------------------------------
508
509 Although it would be possible to have a data structure mapping classes to
510 compatible classes, such as a matrix indicating the subclasses (or
511 superclasses) of each class, the need to retain the key to such a data
512 structure for each class might introduce a noticeable overhead.
513
514 Instead of having a separate structure, descendant classes of each class are
515 inserted as special attributes into the object table. This requires an extra
516 key to be retained, since each class must provide its own attribute code such
517 that upon an instance/class compatibility test, the code may be obtained and
518 used in the object table.
519
520 Invocation and Code References
521 ------------------------------
522
523 Modules: there is no meaningful invocation reference since modules cannot be
524 explicitly called.
525
526 Functions: a simple code reference is employed pointing to code implementing
527 the function. Note that the function locals are completely distinct from this
528 structure and are not comparable to attributes. Instead, attributes are
529 reserved for default parameter values, although they do not appear in the
530 object table described above, appearing instead in a separate parameter table
531 described below.
532
533 Classes: given that classes must be invoked in order to create instances, a
534 reference must be provided in class structures. However, this reference does
535 not point directly at the __init__ method of the class. Instead, the
536 referenced code belongs to a special initialiser function, __new__, consisting
537 of the following instructions:
538
539 create instance for C
540 call C.__init__(instance, ...)
541 return instance
542
543 Instances: each instance employs a reference to any __call__ method defined in
544 the class hierarchy for the instance, thus maintaining its callable nature.
545
546 Both classes and modules may contain code in their definitions - the former in
547 the "body" of the class, potentially defining attributes, and the latter as
548 the "top-level" code in the module, potentially defining attributes/globals -
549 but this code is not associated with any invocation target. It is thus
550 generated in order of appearance and is not referenced externally.
551
552 Invocation Operation
553 --------------------
554
555 Consequently, regardless of the object an invocation is always done as
556 follows:
557
558 get invocation reference from the header
559 jump to reference
560
561 Additional preparation is necessary before the above code: positional
562 arguments must be saved in the invocation frame, and keyword arguments must be
563 resolved and saved to the appropriate position in the invocation frame.
564
565 See invocation.txt for details.
566
567 Parameters and Lookups
568 ======================
569
570 Since Python supports keyword arguments when making invocations, it becomes
571 necessary to record the parameter names associated with each function or
572 method. Just as object tables record attributes positions on classes and
573 instances, parameter tables record parameter positions in function or method
574 parameter lists.
575
576 For a given program, a parameter table can be considered as being like a
577 matrix mapping functions/methods to parameter names. For example:
578
579 def f(x, y, z):
580 pass
581
582 def g(a, b, c):
583 pass
584
585 def h(a, x):
586 pass
587
588 This would provide the following table, referred to as a parameter table in
589 the context of functions and methods:
590
591 Function/param a b c x y z
592
593 f 1 2 3
594 g 1 2 3
595 h 1 2
596
597 Confusion can occur when functions are adopted as methods, since the context
598 then occupies the first slot in the invocation frame:
599
600 def f(x, y, z):
601 pass
602
603 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
604 -> f(1, 2, 3)
605
606 class C:
607 f = f
608
609 def g(x, y, z):
610 pass
611
612 c = C()
613
614 c.f(y=2, z=3) -> f(<context>, 2, 3)
615 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
616
617 Just as with parameter tables, a displacement list can be prepared from a
618 parameter table:
619
620 Functions with parameter (attribute) offsets
621
622 funccode f
623 attrcode a b c x y z
624
625 g
626 a b c x y z
627
628 h
629 a b c x y z
630
631 List . . . 1 2 3 1 2 3 1 . . 2 . .
632
633 Here, the funccode refers to the offset in the list at which a function's
634 parameters are defined, whereas the attrcode defines the offset within a
635 region of attributes corresponding to a single parameter of a given name.
636
637 Instantiation
638 =============
639
640 When instantiating classes, memory must be reserved for the header of the
641 resulting instance, along with locations for the attributes of the instance.
642 Since the instance header contains data common to all instances of a class, a
643 template header is copied to the start of the newly reserved memory region.
644
645 Register Usage
646 ==============
647
648 During code generation, much of the evaluation produces results which are
649 implicitly recorded in the "active value" or "working" register, and various
650 instructions will consume this active value. In addition, some instructions
651 will consume a separate "active source value" from a register, typically those
652 which are assigning the result of an expression to an assignment target.
653
654 Since values often need to be retained for later use, a set of temporary
655 storage locations are typically employed. However, optimisations may reduce
656 the need to use such temporary storage where instructions which provide the
657 "active value" can be re-executed and will produce the same result. Whether
658 re-use of instructions is possible or not, values still need to be loaded into
659 a "source" register to be accessed by an assignment instruction.
660
661 RSVP instructions generally have the notion of working, target and source
662 registers (see registers.txt). These register "roles" are independent from the
663 actual registers defined for the RSVP machine itself, even though the naming
664 is similar. Generally, instructions do regard the RSVP "working" registers as
665 the class of register in the "working" role, although some occurrences of
666 instruction usage may employ other registers (such as the "result" registers)
667 and thus take advantage of the generality of the RSVP implementation of such
668 instructions.
669
670 List and Tuple Representations
671 ==============================
672
673 Since tuples have a fixed size, the representation of a tuple instance is
674 merely a header describing the size of the entire object, together with a
675 sequence of references to the object "stored" at each position in the
676 structure. Such references consist of the usual context and reference pair.
677
678 Lists, however, have a variable size and must be accessible via an unchanging
679 location even as more memory is allocated elsewhere to accommodate the
680 contents of the list. Consequently, the representation must resemble the
681 following:
682
683 Structure header for list (size == header plus special attribute)
684 Special attribute referencing the underlying sequence
685
686 The underlying sequence has a fixed size, like a tuple, but may contain fewer
687 elements than the size of the sequence permits:
688
689 Special header indicating the current size and allocated size
690 Element
691 ... <-- current size
692 (Unused space)
693 ... <-- allocated size
694
695 This representation permits the allocation of a new sequence when space is
696 exhausted in an existing sequence, with the new sequence address stored in the
697 main list structure. Since access to the contents of the list must go through
698 the main list structure, underlying allocation activities may take place
699 without the users of a list having to be aware of such activities.