1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Contexts and values
8 * Tables, attributes and lookups
9 * Objects and structures
10 * Parameters and lookups
11 * Instantiation
12 * Register usage
13 * List and tuple representations
14
15 Namespaces and Attribute Definition
16 ===================================
17
18 Namespaces are any objects which can retain attributes.
19
20 * Module attributes are defined either at the module level or by global
21 statements.
22 * Class attributes are defined only within class statements.
23 * Instance attributes are defined only by assignments to attributes of self
24 within __init__ methods.
25
26 These restrictions apply because such attributes are thus explicitly declared,
27 permitting the use of tables (described below). Module and class attributes
28 can also be finalised in this way in order to permit certain optimisations.
29
30 An additional restriction required for the current implementation of tables
31 (as described below) applies to class definitions: each class must be defined
32 using a unique name; repeated definition of classes having the same name is
33 thus not permitted. This restriction arises from the use of the "full name" of
34 a class as a key to the object table, where the full name is a qualified path
35 via the module hierarchy ending with the name of the class.
36
37 See rejected.txt for complicating mechanisms which could be applied to
38 mitigate the effects of these restrictions on optimisations.
39
40 Contexts and Values
41 ===================
42
43 Values are used as the common reference representation in micropython: as
44 stored representations of attributes (of classes, instances, modules, and
45 other objects supporting attribute-like entities) as well as the stored values
46 associated with names in functions and methods.
47
48 Unlike other implementations, micropython does not create things like bound
49 method objects for individual instances. Instead, all objects are referenced
50 using a context, reference pair:
51
52 Value Layout
53 ------------
54
55 0 1
56 context object
57 reference reference
58
59 Specific implementations might reverse this ordering for optimisation
60 purposes.
61
62 Rationale
63 ---------
64
65 To reduce the number of created objects whilst retaining the ability to
66 support bound method invocations. The context indicates the context in which
67 an invocation is performed, typically the owner of the method.
68
69 Usage
70 -----
71
72 The context may be inserted as the first argument when a value is involved in
73 an invocation. This argument may then be omitted from the invocation if its
74 usage is not appropriate.
75
76 See invocation.txt for details.
77
78 Context Value Types
79 -------------------
80
81 The following types of context value exist:
82
83 Type Usage Transformations
84 ---- ----- ---------------
85
86 Replaceable With functions (not methods) May be replaced with an
87 instance or a class when a
88 value is stored on an
89 instance or class
90
91 Placeholder With classes May not be replaced
92
93 Instance With instances (and constants) May not be replaced
94 or functions as methods
95
96 Class With functions as methods May be replaced when a
97 value is loaded from a
98 class attribute via an
99 instance
100
101 Contexts in Acquired Values
102 ---------------------------
103
104 There are four classes of instructions which provide values:
105
106 Instruction Purpose Context Operations
107 ----------- ------- ------------------
108
109 1) LoadConst Load module, constant Use loaded object with itself
110 as context
111
112 2) LoadFunction Load function Combine replaceable context
113 with loaded object
114
115 3) LoadClass Load class Combine placeholder context
116 with loaded object
117
118 4) LoadAddress* Load attribute from Preserve or override stored
119 LoadAttr* class, module, context (as described in
120 instance assignment.txt)
121
122 In order to comply with traditional Python behaviour, contexts may or may not
123 represent the object from which an attribute has been acquired.
124
125 See assignment.txt for details.
126
127 Contexts in Stored Values
128 -------------------------
129
130 There are two classes of instruction for storing values:
131
132 Instruction Purpose Context Operations
133 ----------- ------- ------------------
134
135 1) StoreAddress Store attribute in a Preserve context; note that no
136 known object test for class attribute
137 assignment should be necessary
138 since this instruction should only
139 be generated for module globals
140
141 StoreAttr Store attribute in an Preserve context; note that no
142 instance test for class attribute
143 assignment should be necessary
144 since this instruction should only
145 be generated for self accesses
146
147 StoreAttrIndex Store attribute in an Preserve context; since the index
148 unknown object lookup could yield a class
149 attribute, a test of the nature of
150 the nature of the structure is
151 necessary in order to prevent
152 assignments to classes
153
154 2) StoreAddressContext Store attribute in a Override context if appropriate;
155 known object if the value has a replaceable
156 context, permit the target to
157 take ownership of the value
158
159 See assignment.txt for details.
160
161 Tables, Attributes and Lookups
162 ==============================
163
164 Attribute lookups, where the exact location of an object attribute is deduced,
165 are performed differently in micropython than in other implementations.
166 Instead of providing attribute dictionaries, in which attributes are found,
167 attributes are located at fixed places in object structures (described below)
168 and their locations are stored using a special representation known as a
169 table.
170
171 For a given program, a table can be considered as being like a matrix mapping
172 classes to attribute names. For example:
173
174 class A:
175 # instances have attributes x, y
176
177 class B(A):
178 # introduces attribute z for instances
179
180 class C:
181 # instances have attributes a, b, z
182
183 This would provide the following table, referred to as an object table in the
184 context of classes and instances:
185
186 Class/attr a b x y z
187
188 A 1 2
189 B 1 2 3
190 C 1 2 3
191
192 A limitation of this representation is that instance attributes may not shadow
193 class attributes: if an attribute with a given name is not defined on an
194 instance, an attribute with the same name cannot be provided by the class of
195 the instance or any superclass of the instance's class. This impacts the
196 provision of the __class__ attribute, as described below.
197
198 The table can be compacted using a representation known as a displacement
199 list (referred to as an object list in this context):
200
201 Classes with attribute offsets
202
203 classcode A
204 attrcode a b x y z
205
206 B
207 a b x y z
208
209 C
210 a b x y z
211
212 List . . 1 2 1 2 3 1 2 . . 3
213
214 Here, the classcode refers to the offset in the list at which a class's
215 attributes are defined, whereas the attrcode defines the offset within a
216 region of attributes corresponding to a single attribute of a given name.
217
218 Attribute Locations
219 -------------------
220
221 The locations stored in table/list elements are generally for instance
222 attributes relative to the location of the instance, whereas those for class
223 attributes and module attributes are generally absolute addresses. Thus, each
224 occupied table cell has the following structure:
225
226 attrcode, uses-absolute-address, address (or location)
227
228 This could be given instead as follows:
229
230 attrcode, is-class-or-module, location
231
232 Since uses-absolute-address corresponds to is-class-or-module, and since there
233 is a need to test for classes and modules to prevent assignment to attributes
234 of such objects, this particular information is always required.
235
236 The __class__ Attribute
237 -----------------------
238
239 In Python 2.x, at least with new-style classes, instances have __class__
240 attributes which indicate the class from which they have been instantiated,
241 whereas classes have __class__ attributes which reference the type class.
242 With the object table, it is not possible to provide absolute addresses which
243 can be used for both classes and instances, since this would result in classes
244 and instances having the same class, and thus the class of a class would be
245 the class itself.
246
247 One solution is to use object-relative values in the table so that referencing
248 the __class__ attribute of an instance produces a value which can be combined
249 with an instance's address to yield the address of the attribute, which itself
250 refers to the instance's class, whereas referencing the __class__ attribute of
251 a class produces a similar object-relative value that is combined with the
252 class's address to yield the address of the attribute, which itself refers to
253 the special type class.
254
255 Obviously, the above solution requires both classes and instances to retain an
256 attribute location specifically to hold the value appropriate for each object
257 type, whereas a scheme which omits the __class__ attribute on classes would be
258 able to employ an absolute address in the table and maintain only a single
259 address to refer to the class for all instances. The only problem with not
260 providing a sensible __class__ attribute entry for classes would be the need
261 for special treatment of __class__ to prevent inappropriate consultation of
262 the table for classes.
263
264 Comparing Tables as Matrices with Displacement Lists
265 ----------------------------------------------------
266
267 Although displacement lists can provide reasonable levels of compaction for
268 attribute data, the element size is larger than that required for a simple
269 matrix: the attribute code (attrcode) need not be stored since each element
270 unambiguously refers to the availability of an attribute for a particular
271 class or instance of that class, and so the data at a given element need not
272 be tested for relevance to a given attribute access operation.
273
274 Given a program with 20 object types and 100 attribute types, a matrix would
275 occupy the following amount of space:
276
277 number of object types * number of attribute types * element size
278 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
279 = 2000
280
281 In contrast, given a compaction to 40% of the matrix size (without considering
282 element size) in a displacement list, the amount of space would be as follows:
283
284 number of elements * element size
285 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
286 = 1600
287
288 Consequently, the principal overhead of using a displacement list is likely to
289 be in the need to check element relevance when retrieving values from such a
290 list.
291
292 Objects and Structures
293 ======================
294
295 As well as references, micropython needs to have actual objects to refer to.
296 Since classes, functions and instances are all objects, it is desirable that
297 certain common features and operations are supported in the same way for all
298 of these things. To permit this, a common data structure format is used.
299
300 Header.................................................... Attributes.................
301
302 Identifier Identifier Address Identifier Size Object ...
303
304 0 1 2 3 4 5 6
305 classcode attrcode/ invocation funccode size attribute ...
306 instance reference reference
307 status
308
309 Classcode
310 ---------
311
312 Used in attribute lookup.
313
314 Here, the classcode refers to the attribute lookup table for the object (as
315 described above). Classes and instances share the same classcode, and their
316 structures reflect this. Functions all belong to the same type and thus employ
317 the classcode for the function built-in type, whereas modules have distinct
318 types since they must support different sets of attributes.
319
320 Attrcode
321 --------
322
323 Used to test instances for membership of classes (or descendants of classes).
324
325 Since, in traditional Python, classes are only ever instances of some generic
326 built-in type, support for testing such a relationship directly has been
327 removed and the attrcode is not specified for classes: the presence of an
328 attrcode indicates that a given object is an instance. In addition, support
329 has also been removed for testing modules in the same way, meaning that the
330 attrcode is also not specified for modules.
331
332 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
333 for details of attrcodes.
334
335 Invocation Reference
336 --------------------
337
338 Used when an object is called.
339
340 This is the address of the code to be executed when an invocation is performed
341 on the object.
342
343 Funccode
344 --------
345
346 Used to look up argument positions by name.
347
348 The strategy with keyword arguments in micropython is to attempt to position
349 such arguments in the invocation frame as it is being constructed.
350
351 See the "Parameters and Lookups" section for more information.
352
353 Size
354 ----
355
356 Used to indicate the size of an object including attributes.
357
358 Attributes
359 ----------
360
361 For classes, modules and instances, the attributes in the structure correspond
362 to the attributes of each kind of object. For functions, however, the
363 attributes in the structure correspond to the default arguments for each
364 function, if any.
365
366 Structure Types
367 ---------------
368
369 Class C:
370
371 0 1 2 3 4 5 6
372 classcode (unused) __new__ funccode size attribute ...
373 for C reference for reference
374 instantiator
375
376 Instance of C:
377
378 0 1 2 3 4 5 6
379 classcode attrcode C.__call__ funccode size attribute ...
380 for C for C reference for reference
381 (if exists) C.__call__
382
383 Function f:
384
385 0 1 2 3 4 5 6
386 classcode attrcode code funccode size attribute ...
387 for for reference (default)
388 function function reference
389
390 Module m:
391
392 0 1 2 3 4 5 6
393 classcode attrcode (unused) (unused) (unused) attribute ...
394 for m for m (global)
395 reference
396
397 The __class__ Attribute
398 -----------------------
399
400 All objects should support the __class__ attribute, and in most cases this is
401 done using the object table, yielding a common address for all instances of a
402 given class.
403
404 Function: refers to the function class
405 Instance: refers to the class instantiated to make the object
406
407 The object table cannot support two definitions simultaneously for both
408 instances and their classes. Consequently, __class__ access on classes must be
409 tested for and a special result returned.
410
411 Class: refers to the type class (type.__class__ also refers to the type class)
412
413 For convenience, the first attribute of a class will be the common __class__
414 attribute for all its instances. As noted above, direct access to this
415 attribute will not be possible for classes, and a constant result will be
416 returned instead.
417
418 Lists and Tuples
419 ----------------
420
421 The built-in list and tuple sequences employ variable length structures using
422 the attribute locations to store their elements, where each element is a
423 reference to a separately stored object.
424
425 Testing Instance Compatibility with Classes (Attrcode)
426 ------------------------------------------------------
427
428 Although it would be possible to have a data structure mapping classes to
429 compatible classes, such as a matrix indicating the subclasses (or
430 superclasses) of each class, the need to retain the key to such a data
431 structure for each class might introduce a noticeable overhead.
432
433 Instead of having a separate structure, descendant classes of each class are
434 inserted as special attributes into the object table. This requires an extra
435 key to be retained, since each class must provide its own attribute code such
436 that upon an instance/class compatibility test, the code may be obtained and
437 used in the object table.
438
439 Invocation and Code References
440 ------------------------------
441
442 Modules: there is no meaningful invocation reference since modules cannot be
443 explicitly called.
444
445 Functions: a simple code reference is employed pointing to code implementing
446 the function. Note that the function locals are completely distinct from this
447 structure and are not comparable to attributes. Instead, attributes are
448 reserved for default parameter values, although they do not appear in the
449 object table described above, appearing instead in a separate parameter table
450 described below.
451
452 Classes: given that classes must be invoked in order to create instances, a
453 reference must be provided in class structures. However, this reference does
454 not point directly at the __init__ method of the class. Instead, the
455 referenced code belongs to a special initialiser function, __new__, consisting
456 of the following instructions:
457
458 create instance for C
459 call C.__init__(instance, ...)
460 return instance
461
462 Instances: each instance employs a reference to any __call__ method defined in
463 the class hierarchy for the instance, thus maintaining its callable nature.
464
465 Both classes and modules may contain code in their definitions - the former in
466 the "body" of the class, potentially defining attributes, and the latter as
467 the "top-level" code in the module, potentially defining attributes/globals -
468 but this code is not associated with any invocation target. It is thus
469 generated in order of appearance and is not referenced externally.
470
471 Invocation Operation
472 --------------------
473
474 Consequently, regardless of the object an invocation is always done as
475 follows:
476
477 get invocation reference from the header
478 jump to reference
479
480 Additional preparation is necessary before the above code: positional
481 arguments must be saved in the invocation frame, and keyword arguments must be
482 resolved and saved to the appropriate position in the invocation frame.
483
484 See invocation.txt for details.
485
486 Parameters and Lookups
487 ======================
488
489 Since Python supports keyword arguments when making invocations, it becomes
490 necessary to record the parameter names associated with each function or
491 method. Just as object tables record attributes positions on classes and
492 instances, parameter tables record parameter positions in function or method
493 parameter lists.
494
495 For a given program, a parameter table can be considered as being like a
496 matrix mapping functions/methods to parameter names. For example:
497
498 def f(x, y, z):
499 pass
500
501 def g(a, b, c):
502 pass
503
504 def h(a, x):
505 pass
506
507 This would provide the following table, referred to as a parameter table in
508 the context of functions and methods:
509
510 Function/param a b c x y z
511
512 f 1 2 3
513 g 1 2 3
514 h 1 2
515
516 Confusion can occur when functions are adopted as methods, since the context
517 then occupies the first slot in the invocation frame:
518
519 def f(x, y, z):
520 pass
521
522 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
523 -> f(1, 2, 3)
524
525 class C:
526 f = f
527
528 def g(x, y, z):
529 pass
530
531 c = C()
532
533 c.f(y=2, z=3) -> f(<context>, 2, 3)
534 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
535
536 Just as with parameter tables, a displacement list can be prepared from a
537 parameter table:
538
539 Functions with parameter (attribute) offsets
540
541 funccode f
542 attrcode a b c x y z
543
544 g
545 a b c x y z
546
547 h
548 a b c x y z
549
550 List . . . 1 2 3 1 2 3 1 . . 2 . .
551
552 Here, the funccode refers to the offset in the list at which a function's
553 parameters are defined, whereas the attrcode defines the offset within a
554 region of attributes corresponding to a single parameter of a given name.
555
556 Instantiation
557 =============
558
559 When instantiating classes, memory must be reserved for the header of the
560 resulting instance, along with locations for the attributes of the instance.
561 Since the instance header contains data common to all instances of a class, a
562 template header is copied to the start of the newly reserved memory region.
563
564 Register Usage
565 ==============
566
567 During code generation, much of the evaluation produces results which are
568 implicitly recorded in the "active value" or "working" register, and various
569 instructions will consume this active value. In addition, some instructions
570 will consume a separate "active source value" from a register, typically those
571 which are assigning the result of an expression to an assignment target.
572
573 Since values often need to be retained for later use, a set of temporary
574 storage locations are typically employed. However, optimisations may reduce
575 the need to use such temporary storage where instructions which provide the
576 "active value" can be re-executed and will produce the same result. Whether
577 re-use of instructions is possible or not, values still need to be loaded into
578 a "source" register to be accessed by an assignment instruction.
579
580 RSVP instructions generally have the notion of working, target and source
581 registers (see registers.txt). These register "roles" are independent from the
582 actual registers defined for the RSVP machine itself, even though the naming
583 is similar. Generally, instructions do regard the RSVP "working" registers as
584 the class of register in the "working" role, although some occurrences of
585 instruction usage may employ other registers (such as the "result" registers)
586 and thus take advantage of the generality of the RSVP implementation of such
587 instructions.
588
589 List and Tuple Representations
590 ==============================
591
592 Since tuples have a fixed size, the representation of a tuple instance is
593 merely a header describing the size of the entire object, together with a
594 sequence of references to the object "stored" at each position in the
595 structure. Such references consist of the usual context and reference pair.
596
597 Lists, however, have a variable size and must be accessible via an unchanging
598 location even as more memory is allocated elsewhere to accommodate the
599 contents of the list. Consequently, the representation must resemble the
600 following:
601
602 Structure header for list (size == header plus special attribute)
603 Special attribute referencing the underlying sequence
604
605 The underlying sequence has a fixed size, like a tuple, but may contain fewer
606 elements than the size of the sequence permits:
607
608 Special header indicating the current size and allocated size
609 Element
610 ... <-- current size
611 (Unused space)
612 ... <-- allocated size
613
614 This representation permits the allocation of a new sequence when space is
615 exhausted in an existing sequence, with the new sequence address stored in the
616 main list structure. Since access to the contents of the list must go through
617 the main list structure, underlying allocation activities may take place
618 without the users of a list having to be aware of such activities.