1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Attribute usage observations
8 * Contexts and values
9 * Tables, attributes and lookups
10 * Objects and structures
11 * Parameters and lookups
12 * Instantiation
13 * Register usage
14 * List and tuple representations
15
16 Namespaces and Attribute Definition
17 ===================================
18
19 Namespaces are any objects which can retain attributes.
20
21 * Module attributes are defined either at the module level or by global
22 statements.
23 * Class attributes are defined only within class statements.
24 * Instance attributes are defined only by assignments to attributes of self
25 or tentatively as references to attributes of self.
26
27 These restrictions apply because such attributes are thus explicitly declared,
28 permitting the use of tables (described below). Module and class attributes
29 can also be finalised in this way in order to permit certain optimisations.
30
31 An additional restriction required for the current implementation of tables
32 (as described below) applies to class definitions: each class must be defined
33 using a unique name; repeated definition of classes having the same name is
34 thus not permitted. This restriction arises from the use of the "full name" of
35 a class as a key to the object table, where the full name is a qualified path
36 via the module hierarchy ending with the name of the class.
37
38 Rebinding of attributes outside classes and modules can be allowed if
39 attribute usage observations are being used to detect such external
40 modifications to such objects. Without such observations, such rebinding
41 should be forbidden since apparently constant attributes might be modified in
42 a running program, but code may have been generated that provides specific
43 objects for those attributes under the assumption that they will not be
44 changed.
45
46 See rejected.txt for complicating mechanisms which could be applied to
47 mitigate the effects of these restrictions on optimisations.
48
49 Attribute Usage Observations
50 ============================
51
52 Within a scope, a name may be used in conjunction with attribute names in
53 order to access attributes on objects referenced by the name. However, such
54 observations can only be regarded as reliable if the object referenced is not
55 changed independently by some other mechanism or activity.
56
57 With conventional functions and methods, any locally defined names can be
58 considered private to that scope and thus immune to independent modification,
59 at least within reasonable features of the language. Although stack
60 modification may be possible, it seems appropriate to reject such features,
61 especially since they lend themselves to unmaintainable programs.
62
63 For names defined during the initialisation of a class, since the class itself
64 cannot be referenced by name until its declaration has been completely
65 evaluated, no independent modification can occur from outside the class scope.
66
67 For names defined during the initialisation of a module, global declarations
68 in functions permit the rebinding of global variables and since functions may
69 be invoked during module initialisation, independent modification can
70 potentially occur if any functions are called.
71
72 Module globals can be accessed from other modules that can refer to a module
73 by its name. Initially, an import is required to make a module available for
74 modification, but there is no restriction on whether a module has been
75 completely imported (and thus defined) before an import statement can make it
76 available to other modules. Consider the following package root definition:
77
78 # Module changed:
79 def f():
80 import changed.modifier
81 x = 123
82 f()
83
84 # Module changed.modifier:
85 import changed
86 changed.x = 456
87
88 Here, an import of changed will initially set x to 123, but then the function
89 f will be invoked and cause the changed.modifier module to be imported. Since
90 the changed module is already being imported, the import statement will not
91 try to perform the import operation again, but it will make the partially
92 defined module available for access. Thus, the changed.modifier module will
93 then set x to 456, and independent modification of the changed namespace will
94 have been performed.
95
96 In conclusion, module globals cannot therefore be regarded as immune to
97 operations that would disrupt usage observations. Consequently, only locals
98 and class definition "locals" can be reliably employed in attribute usage
99 observations.
100
101 Contexts and Values
102 ===================
103
104 Values are used as the common reference representation in micropython: as
105 stored representations of attributes (of classes, instances, modules, and
106 other objects supporting attribute-like entities) as well as the stored values
107 associated with names in functions and methods.
108
109 Unlike other implementations, micropython does not create things like bound
110 method objects for individual instances. Instead, all objects are referenced
111 using a context, reference pair:
112
113 Value Layout
114 ------------
115
116 0 1
117 context object
118 reference reference
119
120 Specific implementations might reverse this ordering for optimisation
121 purposes.
122
123 Rationale
124 ---------
125
126 To reduce the number of created objects whilst retaining the ability to
127 support bound method invocations. The context indicates the context in which
128 an invocation is performed, typically the owner of the method.
129
130 Usage
131 -----
132
133 The context may be inserted as the first argument when a value is involved in
134 an invocation. This argument may then be omitted from the invocation if its
135 usage is not appropriate.
136
137 See invocation.txt for details.
138
139 Context Value Types
140 -------------------
141
142 The following types of context value exist:
143
144 Type Usage Transformations
145 ---- ----- ---------------
146
147 Replaceable With functions (not methods) May be replaced with an
148 instance or a class when a
149 value is stored on an
150 instance or class
151
152 Placeholder With classes May not be replaced
153
154 Instance With instances (and constants) May not be replaced
155 or functions as methods
156
157 Class With functions as methods May be replaced when a
158 value is loaded from a
159 class attribute via an
160 instance
161
162 Contexts in Acquired Values
163 ---------------------------
164
165 There are four classes of instructions which provide values:
166
167 Instruction Purpose Context Operations
168 ----------- ------- ------------------
169
170 1) LoadConst Load module, constant Use loaded object with itself
171 as context
172
173 2) LoadFunction Load function Combine replaceable context
174 with loaded object
175
176 3) LoadClass Load class Combine placeholder context
177 with loaded object
178
179 4) LoadAddress* Load attribute from Preserve or override stored
180 LoadAttr* class, module, context (as described in
181 instance assignment.txt)
182
183 In order to comply with traditional Python behaviour, contexts may or may not
184 represent the object from which an attribute has been acquired.
185
186 See assignment.txt for details.
187
188 Contexts in Stored Values
189 -------------------------
190
191 There are two classes of instruction for storing values:
192
193 Instruction Purpose Context Operations
194 ----------- ------- ------------------
195
196 1) StoreAddress Store attribute in a Preserve context; note that no
197 known object test for class attribute
198 assignment should be necessary
199 since this instruction should only
200 be generated for module globals
201
202 StoreAttr Store attribute in an Preserve context; note that no
203 instance test for class attribute
204 assignment should be necessary
205 since this instruction should only
206 be generated for self accesses
207
208 StoreAttrIndex Store attribute in an Preserve context; since the index
209 unknown object lookup could yield a class
210 attribute, a test of the nature of
211 the nature of the structure is
212 necessary in order to prevent
213 assignments to classes
214
215 2) StoreAddressContext Store attribute in a Override context if appropriate;
216 known object if the value has a replaceable
217 context, permit the target to
218 take ownership of the value
219
220 See assignment.txt for details.
221
222 Tables, Attributes and Lookups
223 ==============================
224
225 Attribute lookups, where the exact location of an object attribute is deduced,
226 are performed differently in micropython than in other implementations.
227 Instead of providing attribute dictionaries, in which attributes are found,
228 attributes are located at fixed places in object structures (described below)
229 and their locations are stored using a special representation known as a
230 table.
231
232 For a given program, a table can be considered as being like a matrix mapping
233 classes to attribute names. For example:
234
235 class A:
236 # instances have attributes x, y
237
238 class B(A):
239 # introduces attribute z for instances
240
241 class C:
242 # instances have attributes a, b, z
243
244 This would provide the following table, referred to as an object table in the
245 context of classes and instances:
246
247 Class/attr a b x y z
248
249 A 1 2
250 B 1 2 3
251 C 1 2 3
252
253 A limitation of this representation is that instance attributes may not shadow
254 class attributes: if an attribute with a given name is not defined on an
255 instance, an attribute with the same name cannot be provided by the class of
256 the instance or any superclass of the instance's class. This impacts the
257 provision of the __class__ attribute, as described below.
258
259 The table can be compacted using a representation known as a displacement
260 list (referred to as an object list in this context):
261
262 Classes with attribute offsets
263
264 classcode A
265 attrcode a b x y z
266
267 B
268 a b x y z
269
270 C
271 a b x y z
272
273 List . . 1 2 1 2 3 1 2 . . 3
274
275 Here, the classcode refers to the offset in the list at which a class's
276 attributes are defined, whereas the attrcode defines the offset within a
277 region of attributes corresponding to a single attribute of a given name.
278
279 Attribute Locations
280 -------------------
281
282 The locations stored in table/list elements are generally for instance
283 attributes relative to the location of the instance, whereas those for class
284 attributes and module attributes are generally absolute addresses. Thus, each
285 occupied table cell has the following structure:
286
287 attrcode, uses-absolute-address, address (or location)
288
289 This could be given instead as follows:
290
291 attrcode, is-class-or-module, location
292
293 Since uses-absolute-address corresponds to is-class-or-module, and since there
294 is a need to test for classes and modules to prevent assignment to attributes
295 of such objects, this particular information is always required.
296
297 The __class__ Attribute
298 -----------------------
299
300 In Python 2.x, at least with new-style classes, instances have __class__
301 attributes which indicate the class from which they have been instantiated,
302 whereas classes have __class__ attributes which reference the type class.
303 With the object table, it is not possible to provide absolute addresses which
304 can be used for both classes and instances, since this would result in classes
305 and instances having the same class, and thus the class of a class would be
306 the class itself.
307
308 One solution is to use object-relative values in the table so that referencing
309 the __class__ attribute of an instance produces a value which can be combined
310 with an instance's address to yield the address of the attribute, which itself
311 refers to the instance's class, whereas referencing the __class__ attribute of
312 a class produces a similar object-relative value that is combined with the
313 class's address to yield the address of the attribute, which itself refers to
314 the special type class.
315
316 Obviously, the above solution requires both classes and instances to retain an
317 attribute location specifically to hold the value appropriate for each object
318 type, whereas a scheme which omits the __class__ attribute on classes would be
319 able to employ an absolute address in the table and maintain only a single
320 address to refer to the class for all instances. The only problem with not
321 providing a sensible __class__ attribute entry for classes would be the need
322 for special treatment of __class__ to prevent inappropriate consultation of
323 the table for classes.
324
325 Comparing Tables as Matrices with Displacement Lists
326 ----------------------------------------------------
327
328 Although displacement lists can provide reasonable levels of compaction for
329 attribute data, the element size is larger than that required for a simple
330 matrix: the attribute code (attrcode) need not be stored since each element
331 unambiguously refers to the availability of an attribute for a particular
332 class or instance of that class, and so the data at a given element need not
333 be tested for relevance to a given attribute access operation.
334
335 Given a program with 20 object types and 100 attribute types, a matrix would
336 occupy the following amount of space:
337
338 number of object types * number of attribute types * element size
339 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
340 = 2000
341
342 In contrast, given a compaction to 40% of the matrix size (without considering
343 element size) in a displacement list, the amount of space would be as follows:
344
345 number of elements * element size
346 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
347 = 1600
348
349 Consequently, the principal overhead of using a displacement list is likely to
350 be in the need to check element relevance when retrieving values from such a
351 list.
352
353 Objects and Structures
354 ======================
355
356 As well as references, micropython needs to have actual objects to refer to.
357 Since classes, functions and instances are all objects, it is desirable that
358 certain common features and operations are supported in the same way for all
359 of these things. To permit this, a common data structure format is used.
360
361 Header.................................................... Attributes.................
362
363 Identifier Identifier Address Identifier Size Object ...
364
365 0 1 2 3 4 5 6
366 classcode attrcode/ invocation funccode size attribute ...
367 instance reference reference
368 status
369
370 Classcode
371 ---------
372
373 Used in attribute lookup.
374
375 Here, the classcode refers to the attribute lookup table for the object (as
376 described above). Classes and instances share the same classcode, and their
377 structures reflect this. Functions all belong to the same type and thus employ
378 the classcode for the function built-in type, whereas modules have distinct
379 types since they must support different sets of attributes.
380
381 Attrcode
382 --------
383
384 Used to test instances for membership of classes (or descendants of classes).
385
386 Since, in traditional Python, classes are only ever instances of some generic
387 built-in type, support for testing such a relationship directly has been
388 removed and the attrcode is not specified for classes: the presence of an
389 attrcode indicates that a given object is an instance. In addition, support
390 has also been removed for testing modules in the same way, meaning that the
391 attrcode is also not specified for modules.
392
393 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
394 for details of attrcodes.
395
396 Invocation Reference
397 --------------------
398
399 Used when an object is called.
400
401 This is the address of the code to be executed when an invocation is performed
402 on the object.
403
404 Funccode
405 --------
406
407 Used to look up argument positions by name.
408
409 The strategy with keyword arguments in micropython is to attempt to position
410 such arguments in the invocation frame as it is being constructed.
411
412 See the "Parameters and Lookups" section for more information.
413
414 Size
415 ----
416
417 Used to indicate the size of an object including attributes.
418
419 Attributes
420 ----------
421
422 For classes, modules and instances, the attributes in the structure correspond
423 to the attributes of each kind of object. For functions, however, the
424 attributes in the structure correspond to the default arguments for each
425 function, if any.
426
427 Structure Types
428 ---------------
429
430 Class C:
431
432 0 1 2 3 4 5 6
433 classcode (unused) __new__ funccode size attribute ...
434 for C reference for reference
435 instantiator
436
437 Instance of C:
438
439 0 1 2 3 4 5 6
440 classcode attrcode C.__call__ funccode size attribute ...
441 for C for C reference for reference
442 (if exists) C.__call__
443
444 Function f:
445
446 0 1 2 3 4 5 6
447 classcode attrcode code funccode size attribute ...
448 for for reference (default)
449 function function reference
450
451 Module m:
452
453 0 1 2 3 4 5 6
454 classcode attrcode (unused) (unused) (unused) attribute ...
455 for m for m (global)
456 reference
457
458 The __class__ Attribute
459 -----------------------
460
461 All objects should support the __class__ attribute, and in most cases this is
462 done using the object table, yielding a common address for all instances of a
463 given class.
464
465 Function: refers to the function class
466 Instance: refers to the class instantiated to make the object
467
468 The object table cannot support two definitions simultaneously for both
469 instances and their classes. Consequently, __class__ access on classes must be
470 tested for and a special result returned.
471
472 Class: refers to the type class (type.__class__ also refers to the type class)
473
474 For convenience, the first attribute of a class will be the common __class__
475 attribute for all its instances. As noted above, direct access to this
476 attribute will not be possible for classes, and a constant result will be
477 returned instead.
478
479 Lists and Tuples
480 ----------------
481
482 The built-in list and tuple sequences employ variable length structures using
483 the attribute locations to store their elements, where each element is a
484 reference to a separately stored object.
485
486 Testing Instance Compatibility with Classes (Attrcode)
487 ------------------------------------------------------
488
489 Although it would be possible to have a data structure mapping classes to
490 compatible classes, such as a matrix indicating the subclasses (or
491 superclasses) of each class, the need to retain the key to such a data
492 structure for each class might introduce a noticeable overhead.
493
494 Instead of having a separate structure, descendant classes of each class are
495 inserted as special attributes into the object table. This requires an extra
496 key to be retained, since each class must provide its own attribute code such
497 that upon an instance/class compatibility test, the code may be obtained and
498 used in the object table.
499
500 Invocation and Code References
501 ------------------------------
502
503 Modules: there is no meaningful invocation reference since modules cannot be
504 explicitly called.
505
506 Functions: a simple code reference is employed pointing to code implementing
507 the function. Note that the function locals are completely distinct from this
508 structure and are not comparable to attributes. Instead, attributes are
509 reserved for default parameter values, although they do not appear in the
510 object table described above, appearing instead in a separate parameter table
511 described below.
512
513 Classes: given that classes must be invoked in order to create instances, a
514 reference must be provided in class structures. However, this reference does
515 not point directly at the __init__ method of the class. Instead, the
516 referenced code belongs to a special initialiser function, __new__, consisting
517 of the following instructions:
518
519 create instance for C
520 call C.__init__(instance, ...)
521 return instance
522
523 Instances: each instance employs a reference to any __call__ method defined in
524 the class hierarchy for the instance, thus maintaining its callable nature.
525
526 Both classes and modules may contain code in their definitions - the former in
527 the "body" of the class, potentially defining attributes, and the latter as
528 the "top-level" code in the module, potentially defining attributes/globals -
529 but this code is not associated with any invocation target. It is thus
530 generated in order of appearance and is not referenced externally.
531
532 Invocation Operation
533 --------------------
534
535 Consequently, regardless of the object an invocation is always done as
536 follows:
537
538 get invocation reference from the header
539 jump to reference
540
541 Additional preparation is necessary before the above code: positional
542 arguments must be saved in the invocation frame, and keyword arguments must be
543 resolved and saved to the appropriate position in the invocation frame.
544
545 See invocation.txt for details.
546
547 Parameters and Lookups
548 ======================
549
550 Since Python supports keyword arguments when making invocations, it becomes
551 necessary to record the parameter names associated with each function or
552 method. Just as object tables record attributes positions on classes and
553 instances, parameter tables record parameter positions in function or method
554 parameter lists.
555
556 For a given program, a parameter table can be considered as being like a
557 matrix mapping functions/methods to parameter names. For example:
558
559 def f(x, y, z):
560 pass
561
562 def g(a, b, c):
563 pass
564
565 def h(a, x):
566 pass
567
568 This would provide the following table, referred to as a parameter table in
569 the context of functions and methods:
570
571 Function/param a b c x y z
572
573 f 1 2 3
574 g 1 2 3
575 h 1 2
576
577 Confusion can occur when functions are adopted as methods, since the context
578 then occupies the first slot in the invocation frame:
579
580 def f(x, y, z):
581 pass
582
583 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
584 -> f(1, 2, 3)
585
586 class C:
587 f = f
588
589 def g(x, y, z):
590 pass
591
592 c = C()
593
594 c.f(y=2, z=3) -> f(<context>, 2, 3)
595 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
596
597 Just as with parameter tables, a displacement list can be prepared from a
598 parameter table:
599
600 Functions with parameter (attribute) offsets
601
602 funccode f
603 attrcode a b c x y z
604
605 g
606 a b c x y z
607
608 h
609 a b c x y z
610
611 List . . . 1 2 3 1 2 3 1 . . 2 . .
612
613 Here, the funccode refers to the offset in the list at which a function's
614 parameters are defined, whereas the attrcode defines the offset within a
615 region of attributes corresponding to a single parameter of a given name.
616
617 Instantiation
618 =============
619
620 When instantiating classes, memory must be reserved for the header of the
621 resulting instance, along with locations for the attributes of the instance.
622 Since the instance header contains data common to all instances of a class, a
623 template header is copied to the start of the newly reserved memory region.
624
625 Register Usage
626 ==============
627
628 During code generation, much of the evaluation produces results which are
629 implicitly recorded in the "active value" or "working" register, and various
630 instructions will consume this active value. In addition, some instructions
631 will consume a separate "active source value" from a register, typically those
632 which are assigning the result of an expression to an assignment target.
633
634 Since values often need to be retained for later use, a set of temporary
635 storage locations are typically employed. However, optimisations may reduce
636 the need to use such temporary storage where instructions which provide the
637 "active value" can be re-executed and will produce the same result. Whether
638 re-use of instructions is possible or not, values still need to be loaded into
639 a "source" register to be accessed by an assignment instruction.
640
641 RSVP instructions generally have the notion of working, target and source
642 registers (see registers.txt). These register "roles" are independent from the
643 actual registers defined for the RSVP machine itself, even though the naming
644 is similar. Generally, instructions do regard the RSVP "working" registers as
645 the class of register in the "working" role, although some occurrences of
646 instruction usage may employ other registers (such as the "result" registers)
647 and thus take advantage of the generality of the RSVP implementation of such
648 instructions.
649
650 List and Tuple Representations
651 ==============================
652
653 Since tuples have a fixed size, the representation of a tuple instance is
654 merely a header describing the size of the entire object, together with a
655 sequence of references to the object "stored" at each position in the
656 structure. Such references consist of the usual context and reference pair.
657
658 Lists, however, have a variable size and must be accessible via an unchanging
659 location even as more memory is allocated elsewhere to accommodate the
660 contents of the list. Consequently, the representation must resemble the
661 following:
662
663 Structure header for list (size == header plus special attribute)
664 Special attribute referencing the underlying sequence
665
666 The underlying sequence has a fixed size, like a tuple, but may contain fewer
667 elements than the size of the sequence permits:
668
669 Special header indicating the current size and allocated size
670 Element
671 ... <-- current size
672 (Unused space)
673 ... <-- allocated size
674
675 This representation permits the allocation of a new sequence when space is
676 exhausted in an existing sequence, with the new sequence address stored in the
677 main list structure. Since access to the contents of the list must go through
678 the main list structure, underlying allocation activities may take place
679 without the users of a list having to be aware of such activities.