1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Contexts and values
8 * Tables, attributes and lookups
9 * Objects and structures
10 * Parameters and lookups
11 * Instantiation
12 * Register usage
13 * List and tuple representations
14
15 Namespaces and Attribute Definition
16 ===================================
17
18 Namespaces are any objects which can retain attributes.
19
20 * Module attributes are defined either at the module level or by global
21 statements.
22 * Class attributes are defined only within class statements.
23 * Instance attributes are defined only by assignments to attributes of self
24 or tentatively as references to attributes of self.
25
26 These restrictions apply because such attributes are thus explicitly declared,
27 permitting the use of tables (described below). Module and class attributes
28 can also be finalised in this way in order to permit certain optimisations.
29
30 An additional restriction required for the current implementation of tables
31 (as described below) applies to class definitions: each class must be defined
32 using a unique name; repeated definition of classes having the same name is
33 thus not permitted. This restriction arises from the use of the "full name" of
34 a class as a key to the object table, where the full name is a qualified path
35 via the module hierarchy ending with the name of the class.
36
37 Rebinding of attributes outside classes and modules can be allowed if
38 attribute usage observations are being used to detect such external
39 modifications to such objects. Without such observations, such rebinding
40 should be forbidden since apparently constant attributes might be modified in
41 a running program, but code may have been generated that provides specific
42 objects for those attributes under the assumption that they will not be
43 changed.
44
45 See rejected.txt for complicating mechanisms which could be applied to
46 mitigate the effects of these restrictions on optimisations.
47
48 Attribute Usage Observations
49 ============================
50
51 Within a scope, a name may be used in conjunction with attribute names in
52 order to access attributes on objects referenced by the name. However, such
53 observations can only be regarded as reliable if the object referenced is not
54 changed independently by some other mechanism or activity.
55
56 With conventional functions and methods, any locally defined names can be
57 considered private to that scope and thus immune to independent modification,
58 at least within reasonable features of the language. Although stack
59 modification may be possible, it seems appropriate to reject such features,
60 especially since they lend themselves to unmaintainable programs.
61
62 For names defined during the initialisation of a class, since the class itself
63 cannot be referenced by name until its declaration has been completely
64 evaluated, no independent modification can occur from outside the class scope.
65
66 For names defined during the initialisation of a module, global declarations
67 in functions permit the rebinding of global variables and since functions may
68 be invoked during module initialisation, independent modification can
69 potentially occur if any functions are called.
70
71 Module globals can be accessed from other modules that can refer to a module
72 by its name. Initially, an import is required to make a module available for
73 modification, but there is no restriction on whether a module has been
74 completely imported (and thus defined) before an import statement can make it
75 available to other modules. Consider the following package root definition:
76
77 # Module test:
78 def f():
79 import test.modifier
80 x = 123
81 f()
82
83 # Module test.modifier:
84 import test
85 test.x = 456
86
87 Here, an import of test will initially set x to 123, but then the function f
88 will be invoked and cause the test.modifier module to be imported. Since the
89 test module is already being imported, the import statement will not try to
90 perform the import operation again, but it will make the partially defined
91 module available for access. Thus, the test.modifier module will then set x to
92 456, and independent modification of the test namespace will have been
93 performed.
94
95 In conclusion, module globals cannot therefore be regarded as immune to
96 operations that would disrupt usage observations. Consequently, only locals
97 and class definition "locals" can be reliably employed in attribute usage
98 observations.
99
100 Contexts and Values
101 ===================
102
103 Values are used as the common reference representation in micropython: as
104 stored representations of attributes (of classes, instances, modules, and
105 other objects supporting attribute-like entities) as well as the stored values
106 associated with names in functions and methods.
107
108 Unlike other implementations, micropython does not create things like bound
109 method objects for individual instances. Instead, all objects are referenced
110 using a context, reference pair:
111
112 Value Layout
113 ------------
114
115 0 1
116 context object
117 reference reference
118
119 Specific implementations might reverse this ordering for optimisation
120 purposes.
121
122 Rationale
123 ---------
124
125 To reduce the number of created objects whilst retaining the ability to
126 support bound method invocations. The context indicates the context in which
127 an invocation is performed, typically the owner of the method.
128
129 Usage
130 -----
131
132 The context may be inserted as the first argument when a value is involved in
133 an invocation. This argument may then be omitted from the invocation if its
134 usage is not appropriate.
135
136 See invocation.txt for details.
137
138 Context Value Types
139 -------------------
140
141 The following types of context value exist:
142
143 Type Usage Transformations
144 ---- ----- ---------------
145
146 Replaceable With functions (not methods) May be replaced with an
147 instance or a class when a
148 value is stored on an
149 instance or class
150
151 Placeholder With classes May not be replaced
152
153 Instance With instances (and constants) May not be replaced
154 or functions as methods
155
156 Class With functions as methods May be replaced when a
157 value is loaded from a
158 class attribute via an
159 instance
160
161 Contexts in Acquired Values
162 ---------------------------
163
164 There are four classes of instructions which provide values:
165
166 Instruction Purpose Context Operations
167 ----------- ------- ------------------
168
169 1) LoadConst Load module, constant Use loaded object with itself
170 as context
171
172 2) LoadFunction Load function Combine replaceable context
173 with loaded object
174
175 3) LoadClass Load class Combine placeholder context
176 with loaded object
177
178 4) LoadAddress* Load attribute from Preserve or override stored
179 LoadAttr* class, module, context (as described in
180 instance assignment.txt)
181
182 In order to comply with traditional Python behaviour, contexts may or may not
183 represent the object from which an attribute has been acquired.
184
185 See assignment.txt for details.
186
187 Contexts in Stored Values
188 -------------------------
189
190 There are two classes of instruction for storing values:
191
192 Instruction Purpose Context Operations
193 ----------- ------- ------------------
194
195 1) StoreAddress Store attribute in a Preserve context; note that no
196 known object test for class attribute
197 assignment should be necessary
198 since this instruction should only
199 be generated for module globals
200
201 StoreAttr Store attribute in an Preserve context; note that no
202 instance test for class attribute
203 assignment should be necessary
204 since this instruction should only
205 be generated for self accesses
206
207 StoreAttrIndex Store attribute in an Preserve context; since the index
208 unknown object lookup could yield a class
209 attribute, a test of the nature of
210 the nature of the structure is
211 necessary in order to prevent
212 assignments to classes
213
214 2) StoreAddressContext Store attribute in a Override context if appropriate;
215 known object if the value has a replaceable
216 context, permit the target to
217 take ownership of the value
218
219 See assignment.txt for details.
220
221 Tables, Attributes and Lookups
222 ==============================
223
224 Attribute lookups, where the exact location of an object attribute is deduced,
225 are performed differently in micropython than in other implementations.
226 Instead of providing attribute dictionaries, in which attributes are found,
227 attributes are located at fixed places in object structures (described below)
228 and their locations are stored using a special representation known as a
229 table.
230
231 For a given program, a table can be considered as being like a matrix mapping
232 classes to attribute names. For example:
233
234 class A:
235 # instances have attributes x, y
236
237 class B(A):
238 # introduces attribute z for instances
239
240 class C:
241 # instances have attributes a, b, z
242
243 This would provide the following table, referred to as an object table in the
244 context of classes and instances:
245
246 Class/attr a b x y z
247
248 A 1 2
249 B 1 2 3
250 C 1 2 3
251
252 A limitation of this representation is that instance attributes may not shadow
253 class attributes: if an attribute with a given name is not defined on an
254 instance, an attribute with the same name cannot be provided by the class of
255 the instance or any superclass of the instance's class. This impacts the
256 provision of the __class__ attribute, as described below.
257
258 The table can be compacted using a representation known as a displacement
259 list (referred to as an object list in this context):
260
261 Classes with attribute offsets
262
263 classcode A
264 attrcode a b x y z
265
266 B
267 a b x y z
268
269 C
270 a b x y z
271
272 List . . 1 2 1 2 3 1 2 . . 3
273
274 Here, the classcode refers to the offset in the list at which a class's
275 attributes are defined, whereas the attrcode defines the offset within a
276 region of attributes corresponding to a single attribute of a given name.
277
278 Attribute Locations
279 -------------------
280
281 The locations stored in table/list elements are generally for instance
282 attributes relative to the location of the instance, whereas those for class
283 attributes and module attributes are generally absolute addresses. Thus, each
284 occupied table cell has the following structure:
285
286 attrcode, uses-absolute-address, address (or location)
287
288 This could be given instead as follows:
289
290 attrcode, is-class-or-module, location
291
292 Since uses-absolute-address corresponds to is-class-or-module, and since there
293 is a need to test for classes and modules to prevent assignment to attributes
294 of such objects, this particular information is always required.
295
296 The __class__ Attribute
297 -----------------------
298
299 In Python 2.x, at least with new-style classes, instances have __class__
300 attributes which indicate the class from which they have been instantiated,
301 whereas classes have __class__ attributes which reference the type class.
302 With the object table, it is not possible to provide absolute addresses which
303 can be used for both classes and instances, since this would result in classes
304 and instances having the same class, and thus the class of a class would be
305 the class itself.
306
307 One solution is to use object-relative values in the table so that referencing
308 the __class__ attribute of an instance produces a value which can be combined
309 with an instance's address to yield the address of the attribute, which itself
310 refers to the instance's class, whereas referencing the __class__ attribute of
311 a class produces a similar object-relative value that is combined with the
312 class's address to yield the address of the attribute, which itself refers to
313 the special type class.
314
315 Obviously, the above solution requires both classes and instances to retain an
316 attribute location specifically to hold the value appropriate for each object
317 type, whereas a scheme which omits the __class__ attribute on classes would be
318 able to employ an absolute address in the table and maintain only a single
319 address to refer to the class for all instances. The only problem with not
320 providing a sensible __class__ attribute entry for classes would be the need
321 for special treatment of __class__ to prevent inappropriate consultation of
322 the table for classes.
323
324 Comparing Tables as Matrices with Displacement Lists
325 ----------------------------------------------------
326
327 Although displacement lists can provide reasonable levels of compaction for
328 attribute data, the element size is larger than that required for a simple
329 matrix: the attribute code (attrcode) need not be stored since each element
330 unambiguously refers to the availability of an attribute for a particular
331 class or instance of that class, and so the data at a given element need not
332 be tested for relevance to a given attribute access operation.
333
334 Given a program with 20 object types and 100 attribute types, a matrix would
335 occupy the following amount of space:
336
337 number of object types * number of attribute types * element size
338 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
339 = 2000
340
341 In contrast, given a compaction to 40% of the matrix size (without considering
342 element size) in a displacement list, the amount of space would be as follows:
343
344 number of elements * element size
345 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
346 = 1600
347
348 Consequently, the principal overhead of using a displacement list is likely to
349 be in the need to check element relevance when retrieving values from such a
350 list.
351
352 Objects and Structures
353 ======================
354
355 As well as references, micropython needs to have actual objects to refer to.
356 Since classes, functions and instances are all objects, it is desirable that
357 certain common features and operations are supported in the same way for all
358 of these things. To permit this, a common data structure format is used.
359
360 Header.................................................... Attributes.................
361
362 Identifier Identifier Address Identifier Size Object ...
363
364 0 1 2 3 4 5 6
365 classcode attrcode/ invocation funccode size attribute ...
366 instance reference reference
367 status
368
369 Classcode
370 ---------
371
372 Used in attribute lookup.
373
374 Here, the classcode refers to the attribute lookup table for the object (as
375 described above). Classes and instances share the same classcode, and their
376 structures reflect this. Functions all belong to the same type and thus employ
377 the classcode for the function built-in type, whereas modules have distinct
378 types since they must support different sets of attributes.
379
380 Attrcode
381 --------
382
383 Used to test instances for membership of classes (or descendants of classes).
384
385 Since, in traditional Python, classes are only ever instances of some generic
386 built-in type, support for testing such a relationship directly has been
387 removed and the attrcode is not specified for classes: the presence of an
388 attrcode indicates that a given object is an instance. In addition, support
389 has also been removed for testing modules in the same way, meaning that the
390 attrcode is also not specified for modules.
391
392 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
393 for details of attrcodes.
394
395 Invocation Reference
396 --------------------
397
398 Used when an object is called.
399
400 This is the address of the code to be executed when an invocation is performed
401 on the object.
402
403 Funccode
404 --------
405
406 Used to look up argument positions by name.
407
408 The strategy with keyword arguments in micropython is to attempt to position
409 such arguments in the invocation frame as it is being constructed.
410
411 See the "Parameters and Lookups" section for more information.
412
413 Size
414 ----
415
416 Used to indicate the size of an object including attributes.
417
418 Attributes
419 ----------
420
421 For classes, modules and instances, the attributes in the structure correspond
422 to the attributes of each kind of object. For functions, however, the
423 attributes in the structure correspond to the default arguments for each
424 function, if any.
425
426 Structure Types
427 ---------------
428
429 Class C:
430
431 0 1 2 3 4 5 6
432 classcode (unused) __new__ funccode size attribute ...
433 for C reference for reference
434 instantiator
435
436 Instance of C:
437
438 0 1 2 3 4 5 6
439 classcode attrcode C.__call__ funccode size attribute ...
440 for C for C reference for reference
441 (if exists) C.__call__
442
443 Function f:
444
445 0 1 2 3 4 5 6
446 classcode attrcode code funccode size attribute ...
447 for for reference (default)
448 function function reference
449
450 Module m:
451
452 0 1 2 3 4 5 6
453 classcode attrcode (unused) (unused) (unused) attribute ...
454 for m for m (global)
455 reference
456
457 The __class__ Attribute
458 -----------------------
459
460 All objects should support the __class__ attribute, and in most cases this is
461 done using the object table, yielding a common address for all instances of a
462 given class.
463
464 Function: refers to the function class
465 Instance: refers to the class instantiated to make the object
466
467 The object table cannot support two definitions simultaneously for both
468 instances and their classes. Consequently, __class__ access on classes must be
469 tested for and a special result returned.
470
471 Class: refers to the type class (type.__class__ also refers to the type class)
472
473 For convenience, the first attribute of a class will be the common __class__
474 attribute for all its instances. As noted above, direct access to this
475 attribute will not be possible for classes, and a constant result will be
476 returned instead.
477
478 Lists and Tuples
479 ----------------
480
481 The built-in list and tuple sequences employ variable length structures using
482 the attribute locations to store their elements, where each element is a
483 reference to a separately stored object.
484
485 Testing Instance Compatibility with Classes (Attrcode)
486 ------------------------------------------------------
487
488 Although it would be possible to have a data structure mapping classes to
489 compatible classes, such as a matrix indicating the subclasses (or
490 superclasses) of each class, the need to retain the key to such a data
491 structure for each class might introduce a noticeable overhead.
492
493 Instead of having a separate structure, descendant classes of each class are
494 inserted as special attributes into the object table. This requires an extra
495 key to be retained, since each class must provide its own attribute code such
496 that upon an instance/class compatibility test, the code may be obtained and
497 used in the object table.
498
499 Invocation and Code References
500 ------------------------------
501
502 Modules: there is no meaningful invocation reference since modules cannot be
503 explicitly called.
504
505 Functions: a simple code reference is employed pointing to code implementing
506 the function. Note that the function locals are completely distinct from this
507 structure and are not comparable to attributes. Instead, attributes are
508 reserved for default parameter values, although they do not appear in the
509 object table described above, appearing instead in a separate parameter table
510 described below.
511
512 Classes: given that classes must be invoked in order to create instances, a
513 reference must be provided in class structures. However, this reference does
514 not point directly at the __init__ method of the class. Instead, the
515 referenced code belongs to a special initialiser function, __new__, consisting
516 of the following instructions:
517
518 create instance for C
519 call C.__init__(instance, ...)
520 return instance
521
522 Instances: each instance employs a reference to any __call__ method defined in
523 the class hierarchy for the instance, thus maintaining its callable nature.
524
525 Both classes and modules may contain code in their definitions - the former in
526 the "body" of the class, potentially defining attributes, and the latter as
527 the "top-level" code in the module, potentially defining attributes/globals -
528 but this code is not associated with any invocation target. It is thus
529 generated in order of appearance and is not referenced externally.
530
531 Invocation Operation
532 --------------------
533
534 Consequently, regardless of the object an invocation is always done as
535 follows:
536
537 get invocation reference from the header
538 jump to reference
539
540 Additional preparation is necessary before the above code: positional
541 arguments must be saved in the invocation frame, and keyword arguments must be
542 resolved and saved to the appropriate position in the invocation frame.
543
544 See invocation.txt for details.
545
546 Parameters and Lookups
547 ======================
548
549 Since Python supports keyword arguments when making invocations, it becomes
550 necessary to record the parameter names associated with each function or
551 method. Just as object tables record attributes positions on classes and
552 instances, parameter tables record parameter positions in function or method
553 parameter lists.
554
555 For a given program, a parameter table can be considered as being like a
556 matrix mapping functions/methods to parameter names. For example:
557
558 def f(x, y, z):
559 pass
560
561 def g(a, b, c):
562 pass
563
564 def h(a, x):
565 pass
566
567 This would provide the following table, referred to as a parameter table in
568 the context of functions and methods:
569
570 Function/param a b c x y z
571
572 f 1 2 3
573 g 1 2 3
574 h 1 2
575
576 Confusion can occur when functions are adopted as methods, since the context
577 then occupies the first slot in the invocation frame:
578
579 def f(x, y, z):
580 pass
581
582 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
583 -> f(1, 2, 3)
584
585 class C:
586 f = f
587
588 def g(x, y, z):
589 pass
590
591 c = C()
592
593 c.f(y=2, z=3) -> f(<context>, 2, 3)
594 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
595
596 Just as with parameter tables, a displacement list can be prepared from a
597 parameter table:
598
599 Functions with parameter (attribute) offsets
600
601 funccode f
602 attrcode a b c x y z
603
604 g
605 a b c x y z
606
607 h
608 a b c x y z
609
610 List . . . 1 2 3 1 2 3 1 . . 2 . .
611
612 Here, the funccode refers to the offset in the list at which a function's
613 parameters are defined, whereas the attrcode defines the offset within a
614 region of attributes corresponding to a single parameter of a given name.
615
616 Instantiation
617 =============
618
619 When instantiating classes, memory must be reserved for the header of the
620 resulting instance, along with locations for the attributes of the instance.
621 Since the instance header contains data common to all instances of a class, a
622 template header is copied to the start of the newly reserved memory region.
623
624 Register Usage
625 ==============
626
627 During code generation, much of the evaluation produces results which are
628 implicitly recorded in the "active value" or "working" register, and various
629 instructions will consume this active value. In addition, some instructions
630 will consume a separate "active source value" from a register, typically those
631 which are assigning the result of an expression to an assignment target.
632
633 Since values often need to be retained for later use, a set of temporary
634 storage locations are typically employed. However, optimisations may reduce
635 the need to use such temporary storage where instructions which provide the
636 "active value" can be re-executed and will produce the same result. Whether
637 re-use of instructions is possible or not, values still need to be loaded into
638 a "source" register to be accessed by an assignment instruction.
639
640 RSVP instructions generally have the notion of working, target and source
641 registers (see registers.txt). These register "roles" are independent from the
642 actual registers defined for the RSVP machine itself, even though the naming
643 is similar. Generally, instructions do regard the RSVP "working" registers as
644 the class of register in the "working" role, although some occurrences of
645 instruction usage may employ other registers (such as the "result" registers)
646 and thus take advantage of the generality of the RSVP implementation of such
647 instructions.
648
649 List and Tuple Representations
650 ==============================
651
652 Since tuples have a fixed size, the representation of a tuple instance is
653 merely a header describing the size of the entire object, together with a
654 sequence of references to the object "stored" at each position in the
655 structure. Such references consist of the usual context and reference pair.
656
657 Lists, however, have a variable size and must be accessible via an unchanging
658 location even as more memory is allocated elsewhere to accommodate the
659 contents of the list. Consequently, the representation must resemble the
660 following:
661
662 Structure header for list (size == header plus special attribute)
663 Special attribute referencing the underlying sequence
664
665 The underlying sequence has a fixed size, like a tuple, but may contain fewer
666 elements than the size of the sequence permits:
667
668 Special header indicating the current size and allocated size
669 Element
670 ... <-- current size
671 (Unused space)
672 ... <-- allocated size
673
674 This representation permits the allocation of a new sequence when space is
675 exhausted in an existing sequence, with the new sequence address stored in the
676 main list structure. Since access to the contents of the list must go through
677 the main list structure, underlying allocation activities may take place
678 without the users of a list having to be aware of such activities.