1 Concepts
2 ========
3
4 This document describes the underlying concepts employed in micropython.
5
6 * Namespaces and attribute definition
7 * Contexts and values
8 * Tables, attributes and lookups
9 * Objects and structures
10 * Parameters and lookups
11 * Instantiation
12 * Register usage
13 * List and tuple representations
14
15 Namespaces and Attribute Definition
16 ===================================
17
18 Namespaces are any objects which can retain attributes.
19
20 * Module attributes are defined either at the module level or by global
21 statements.
22 * Class attributes are defined only within class statements.
23 * Instance attributes are defined only by assignments to attributes of self
24 or tentatively as references to attributes of self.
25
26 These restrictions apply because such attributes are thus explicitly declared,
27 permitting the use of tables (described below). Module and class attributes
28 can also be finalised in this way in order to permit certain optimisations.
29
30 An additional restriction required for the current implementation of tables
31 (as described below) applies to class definitions: each class must be defined
32 using a unique name; repeated definition of classes having the same name is
33 thus not permitted. This restriction arises from the use of the "full name" of
34 a class as a key to the object table, where the full name is a qualified path
35 via the module hierarchy ending with the name of the class.
36
37 Rebinding of attributes outside classes and modules can be allowed if
38 attribute usage observations are being used to detect such external
39 modifications to such objects. Without such observations, such rebinding
40 should be forbidden since apparently constant attributes might be modified in
41 a running program, but code may have been generated that provides specific
42 objects for those attributes under the assumption that they will not be
43 changed.
44
45 See rejected.txt for complicating mechanisms which could be applied to
46 mitigate the effects of these restrictions on optimisations.
47
48 Contexts and Values
49 ===================
50
51 Values are used as the common reference representation in micropython: as
52 stored representations of attributes (of classes, instances, modules, and
53 other objects supporting attribute-like entities) as well as the stored values
54 associated with names in functions and methods.
55
56 Unlike other implementations, micropython does not create things like bound
57 method objects for individual instances. Instead, all objects are referenced
58 using a context, reference pair:
59
60 Value Layout
61 ------------
62
63 0 1
64 context object
65 reference reference
66
67 Specific implementations might reverse this ordering for optimisation
68 purposes.
69
70 Rationale
71 ---------
72
73 To reduce the number of created objects whilst retaining the ability to
74 support bound method invocations. The context indicates the context in which
75 an invocation is performed, typically the owner of the method.
76
77 Usage
78 -----
79
80 The context may be inserted as the first argument when a value is involved in
81 an invocation. This argument may then be omitted from the invocation if its
82 usage is not appropriate.
83
84 See invocation.txt for details.
85
86 Context Value Types
87 -------------------
88
89 The following types of context value exist:
90
91 Type Usage Transformations
92 ---- ----- ---------------
93
94 Replaceable With functions (not methods) May be replaced with an
95 instance or a class when a
96 value is stored on an
97 instance or class
98
99 Placeholder With classes May not be replaced
100
101 Instance With instances (and constants) May not be replaced
102 or functions as methods
103
104 Class With functions as methods May be replaced when a
105 value is loaded from a
106 class attribute via an
107 instance
108
109 Contexts in Acquired Values
110 ---------------------------
111
112 There are four classes of instructions which provide values:
113
114 Instruction Purpose Context Operations
115 ----------- ------- ------------------
116
117 1) LoadConst Load module, constant Use loaded object with itself
118 as context
119
120 2) LoadFunction Load function Combine replaceable context
121 with loaded object
122
123 3) LoadClass Load class Combine placeholder context
124 with loaded object
125
126 4) LoadAddress* Load attribute from Preserve or override stored
127 LoadAttr* class, module, context (as described in
128 instance assignment.txt)
129
130 In order to comply with traditional Python behaviour, contexts may or may not
131 represent the object from which an attribute has been acquired.
132
133 See assignment.txt for details.
134
135 Contexts in Stored Values
136 -------------------------
137
138 There are two classes of instruction for storing values:
139
140 Instruction Purpose Context Operations
141 ----------- ------- ------------------
142
143 1) StoreAddress Store attribute in a Preserve context; note that no
144 known object test for class attribute
145 assignment should be necessary
146 since this instruction should only
147 be generated for module globals
148
149 StoreAttr Store attribute in an Preserve context; note that no
150 instance test for class attribute
151 assignment should be necessary
152 since this instruction should only
153 be generated for self accesses
154
155 StoreAttrIndex Store attribute in an Preserve context; since the index
156 unknown object lookup could yield a class
157 attribute, a test of the nature of
158 the nature of the structure is
159 necessary in order to prevent
160 assignments to classes
161
162 2) StoreAddressContext Store attribute in a Override context if appropriate;
163 known object if the value has a replaceable
164 context, permit the target to
165 take ownership of the value
166
167 See assignment.txt for details.
168
169 Tables, Attributes and Lookups
170 ==============================
171
172 Attribute lookups, where the exact location of an object attribute is deduced,
173 are performed differently in micropython than in other implementations.
174 Instead of providing attribute dictionaries, in which attributes are found,
175 attributes are located at fixed places in object structures (described below)
176 and their locations are stored using a special representation known as a
177 table.
178
179 For a given program, a table can be considered as being like a matrix mapping
180 classes to attribute names. For example:
181
182 class A:
183 # instances have attributes x, y
184
185 class B(A):
186 # introduces attribute z for instances
187
188 class C:
189 # instances have attributes a, b, z
190
191 This would provide the following table, referred to as an object table in the
192 context of classes and instances:
193
194 Class/attr a b x y z
195
196 A 1 2
197 B 1 2 3
198 C 1 2 3
199
200 A limitation of this representation is that instance attributes may not shadow
201 class attributes: if an attribute with a given name is not defined on an
202 instance, an attribute with the same name cannot be provided by the class of
203 the instance or any superclass of the instance's class. This impacts the
204 provision of the __class__ attribute, as described below.
205
206 The table can be compacted using a representation known as a displacement
207 list (referred to as an object list in this context):
208
209 Classes with attribute offsets
210
211 classcode A
212 attrcode a b x y z
213
214 B
215 a b x y z
216
217 C
218 a b x y z
219
220 List . . 1 2 1 2 3 1 2 . . 3
221
222 Here, the classcode refers to the offset in the list at which a class's
223 attributes are defined, whereas the attrcode defines the offset within a
224 region of attributes corresponding to a single attribute of a given name.
225
226 Attribute Locations
227 -------------------
228
229 The locations stored in table/list elements are generally for instance
230 attributes relative to the location of the instance, whereas those for class
231 attributes and module attributes are generally absolute addresses. Thus, each
232 occupied table cell has the following structure:
233
234 attrcode, uses-absolute-address, address (or location)
235
236 This could be given instead as follows:
237
238 attrcode, is-class-or-module, location
239
240 Since uses-absolute-address corresponds to is-class-or-module, and since there
241 is a need to test for classes and modules to prevent assignment to attributes
242 of such objects, this particular information is always required.
243
244 The __class__ Attribute
245 -----------------------
246
247 In Python 2.x, at least with new-style classes, instances have __class__
248 attributes which indicate the class from which they have been instantiated,
249 whereas classes have __class__ attributes which reference the type class.
250 With the object table, it is not possible to provide absolute addresses which
251 can be used for both classes and instances, since this would result in classes
252 and instances having the same class, and thus the class of a class would be
253 the class itself.
254
255 One solution is to use object-relative values in the table so that referencing
256 the __class__ attribute of an instance produces a value which can be combined
257 with an instance's address to yield the address of the attribute, which itself
258 refers to the instance's class, whereas referencing the __class__ attribute of
259 a class produces a similar object-relative value that is combined with the
260 class's address to yield the address of the attribute, which itself refers to
261 the special type class.
262
263 Obviously, the above solution requires both classes and instances to retain an
264 attribute location specifically to hold the value appropriate for each object
265 type, whereas a scheme which omits the __class__ attribute on classes would be
266 able to employ an absolute address in the table and maintain only a single
267 address to refer to the class for all instances. The only problem with not
268 providing a sensible __class__ attribute entry for classes would be the need
269 for special treatment of __class__ to prevent inappropriate consultation of
270 the table for classes.
271
272 Comparing Tables as Matrices with Displacement Lists
273 ----------------------------------------------------
274
275 Although displacement lists can provide reasonable levels of compaction for
276 attribute data, the element size is larger than that required for a simple
277 matrix: the attribute code (attrcode) need not be stored since each element
278 unambiguously refers to the availability of an attribute for a particular
279 class or instance of that class, and so the data at a given element need not
280 be tested for relevance to a given attribute access operation.
281
282 Given a program with 20 object types and 100 attribute types, a matrix would
283 occupy the following amount of space:
284
285 number of object types * number of attribute types * element size
286 = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
287 = 2000
288
289 In contrast, given a compaction to 40% of the matrix size (without considering
290 element size) in a displacement list, the amount of space would be as follows:
291
292 number of elements * element size
293 = 40% * (20 * 100) * 2 (assuming that one additional location is required)
294 = 1600
295
296 Consequently, the principal overhead of using a displacement list is likely to
297 be in the need to check element relevance when retrieving values from such a
298 list.
299
300 Objects and Structures
301 ======================
302
303 As well as references, micropython needs to have actual objects to refer to.
304 Since classes, functions and instances are all objects, it is desirable that
305 certain common features and operations are supported in the same way for all
306 of these things. To permit this, a common data structure format is used.
307
308 Header.................................................... Attributes.................
309
310 Identifier Identifier Address Identifier Size Object ...
311
312 0 1 2 3 4 5 6
313 classcode attrcode/ invocation funccode size attribute ...
314 instance reference reference
315 status
316
317 Classcode
318 ---------
319
320 Used in attribute lookup.
321
322 Here, the classcode refers to the attribute lookup table for the object (as
323 described above). Classes and instances share the same classcode, and their
324 structures reflect this. Functions all belong to the same type and thus employ
325 the classcode for the function built-in type, whereas modules have distinct
326 types since they must support different sets of attributes.
327
328 Attrcode
329 --------
330
331 Used to test instances for membership of classes (or descendants of classes).
332
333 Since, in traditional Python, classes are only ever instances of some generic
334 built-in type, support for testing such a relationship directly has been
335 removed and the attrcode is not specified for classes: the presence of an
336 attrcode indicates that a given object is an instance. In addition, support
337 has also been removed for testing modules in the same way, meaning that the
338 attrcode is also not specified for modules.
339
340 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
341 for details of attrcodes.
342
343 Invocation Reference
344 --------------------
345
346 Used when an object is called.
347
348 This is the address of the code to be executed when an invocation is performed
349 on the object.
350
351 Funccode
352 --------
353
354 Used to look up argument positions by name.
355
356 The strategy with keyword arguments in micropython is to attempt to position
357 such arguments in the invocation frame as it is being constructed.
358
359 See the "Parameters and Lookups" section for more information.
360
361 Size
362 ----
363
364 Used to indicate the size of an object including attributes.
365
366 Attributes
367 ----------
368
369 For classes, modules and instances, the attributes in the structure correspond
370 to the attributes of each kind of object. For functions, however, the
371 attributes in the structure correspond to the default arguments for each
372 function, if any.
373
374 Structure Types
375 ---------------
376
377 Class C:
378
379 0 1 2 3 4 5 6
380 classcode (unused) __new__ funccode size attribute ...
381 for C reference for reference
382 instantiator
383
384 Instance of C:
385
386 0 1 2 3 4 5 6
387 classcode attrcode C.__call__ funccode size attribute ...
388 for C for C reference for reference
389 (if exists) C.__call__
390
391 Function f:
392
393 0 1 2 3 4 5 6
394 classcode attrcode code funccode size attribute ...
395 for for reference (default)
396 function function reference
397
398 Module m:
399
400 0 1 2 3 4 5 6
401 classcode attrcode (unused) (unused) (unused) attribute ...
402 for m for m (global)
403 reference
404
405 The __class__ Attribute
406 -----------------------
407
408 All objects should support the __class__ attribute, and in most cases this is
409 done using the object table, yielding a common address for all instances of a
410 given class.
411
412 Function: refers to the function class
413 Instance: refers to the class instantiated to make the object
414
415 The object table cannot support two definitions simultaneously for both
416 instances and their classes. Consequently, __class__ access on classes must be
417 tested for and a special result returned.
418
419 Class: refers to the type class (type.__class__ also refers to the type class)
420
421 For convenience, the first attribute of a class will be the common __class__
422 attribute for all its instances. As noted above, direct access to this
423 attribute will not be possible for classes, and a constant result will be
424 returned instead.
425
426 Lists and Tuples
427 ----------------
428
429 The built-in list and tuple sequences employ variable length structures using
430 the attribute locations to store their elements, where each element is a
431 reference to a separately stored object.
432
433 Testing Instance Compatibility with Classes (Attrcode)
434 ------------------------------------------------------
435
436 Although it would be possible to have a data structure mapping classes to
437 compatible classes, such as a matrix indicating the subclasses (or
438 superclasses) of each class, the need to retain the key to such a data
439 structure for each class might introduce a noticeable overhead.
440
441 Instead of having a separate structure, descendant classes of each class are
442 inserted as special attributes into the object table. This requires an extra
443 key to be retained, since each class must provide its own attribute code such
444 that upon an instance/class compatibility test, the code may be obtained and
445 used in the object table.
446
447 Invocation and Code References
448 ------------------------------
449
450 Modules: there is no meaningful invocation reference since modules cannot be
451 explicitly called.
452
453 Functions: a simple code reference is employed pointing to code implementing
454 the function. Note that the function locals are completely distinct from this
455 structure and are not comparable to attributes. Instead, attributes are
456 reserved for default parameter values, although they do not appear in the
457 object table described above, appearing instead in a separate parameter table
458 described below.
459
460 Classes: given that classes must be invoked in order to create instances, a
461 reference must be provided in class structures. However, this reference does
462 not point directly at the __init__ method of the class. Instead, the
463 referenced code belongs to a special initialiser function, __new__, consisting
464 of the following instructions:
465
466 create instance for C
467 call C.__init__(instance, ...)
468 return instance
469
470 Instances: each instance employs a reference to any __call__ method defined in
471 the class hierarchy for the instance, thus maintaining its callable nature.
472
473 Both classes and modules may contain code in their definitions - the former in
474 the "body" of the class, potentially defining attributes, and the latter as
475 the "top-level" code in the module, potentially defining attributes/globals -
476 but this code is not associated with any invocation target. It is thus
477 generated in order of appearance and is not referenced externally.
478
479 Invocation Operation
480 --------------------
481
482 Consequently, regardless of the object an invocation is always done as
483 follows:
484
485 get invocation reference from the header
486 jump to reference
487
488 Additional preparation is necessary before the above code: positional
489 arguments must be saved in the invocation frame, and keyword arguments must be
490 resolved and saved to the appropriate position in the invocation frame.
491
492 See invocation.txt for details.
493
494 Parameters and Lookups
495 ======================
496
497 Since Python supports keyword arguments when making invocations, it becomes
498 necessary to record the parameter names associated with each function or
499 method. Just as object tables record attributes positions on classes and
500 instances, parameter tables record parameter positions in function or method
501 parameter lists.
502
503 For a given program, a parameter table can be considered as being like a
504 matrix mapping functions/methods to parameter names. For example:
505
506 def f(x, y, z):
507 pass
508
509 def g(a, b, c):
510 pass
511
512 def h(a, x):
513 pass
514
515 This would provide the following table, referred to as a parameter table in
516 the context of functions and methods:
517
518 Function/param a b c x y z
519
520 f 1 2 3
521 g 1 2 3
522 h 1 2
523
524 Confusion can occur when functions are adopted as methods, since the context
525 then occupies the first slot in the invocation frame:
526
527 def f(x, y, z):
528 pass
529
530 f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
531 -> f(1, 2, 3)
532
533 class C:
534 f = f
535
536 def g(x, y, z):
537 pass
538
539 c = C()
540
541 c.f(y=2, z=3) -> f(<context>, 2, 3)
542 c.g(y=2, z=3) -> C.g(<context>, 2, 3)
543
544 Just as with parameter tables, a displacement list can be prepared from a
545 parameter table:
546
547 Functions with parameter (attribute) offsets
548
549 funccode f
550 attrcode a b c x y z
551
552 g
553 a b c x y z
554
555 h
556 a b c x y z
557
558 List . . . 1 2 3 1 2 3 1 . . 2 . .
559
560 Here, the funccode refers to the offset in the list at which a function's
561 parameters are defined, whereas the attrcode defines the offset within a
562 region of attributes corresponding to a single parameter of a given name.
563
564 Instantiation
565 =============
566
567 When instantiating classes, memory must be reserved for the header of the
568 resulting instance, along with locations for the attributes of the instance.
569 Since the instance header contains data common to all instances of a class, a
570 template header is copied to the start of the newly reserved memory region.
571
572 Register Usage
573 ==============
574
575 During code generation, much of the evaluation produces results which are
576 implicitly recorded in the "active value" or "working" register, and various
577 instructions will consume this active value. In addition, some instructions
578 will consume a separate "active source value" from a register, typically those
579 which are assigning the result of an expression to an assignment target.
580
581 Since values often need to be retained for later use, a set of temporary
582 storage locations are typically employed. However, optimisations may reduce
583 the need to use such temporary storage where instructions which provide the
584 "active value" can be re-executed and will produce the same result. Whether
585 re-use of instructions is possible or not, values still need to be loaded into
586 a "source" register to be accessed by an assignment instruction.
587
588 RSVP instructions generally have the notion of working, target and source
589 registers (see registers.txt). These register "roles" are independent from the
590 actual registers defined for the RSVP machine itself, even though the naming
591 is similar. Generally, instructions do regard the RSVP "working" registers as
592 the class of register in the "working" role, although some occurrences of
593 instruction usage may employ other registers (such as the "result" registers)
594 and thus take advantage of the generality of the RSVP implementation of such
595 instructions.
596
597 List and Tuple Representations
598 ==============================
599
600 Since tuples have a fixed size, the representation of a tuple instance is
601 merely a header describing the size of the entire object, together with a
602 sequence of references to the object "stored" at each position in the
603 structure. Such references consist of the usual context and reference pair.
604
605 Lists, however, have a variable size and must be accessible via an unchanging
606 location even as more memory is allocated elsewhere to accommodate the
607 contents of the list. Consequently, the representation must resemble the
608 following:
609
610 Structure header for list (size == header plus special attribute)
611 Special attribute referencing the underlying sequence
612
613 The underlying sequence has a fixed size, like a tuple, but may contain fewer
614 elements than the size of the sequence permits:
615
616 Special header indicating the current size and allocated size
617 Element
618 ... <-- current size
619 (Unused space)
620 ... <-- allocated size
621
622 This representation permits the allocation of a new sequence when space is
623 exhausted in an existing sequence, with the new sequence address stored in the
624 main list structure. Since access to the contents of the list must go through
625 the main list structure, underlying allocation activities may take place
626 without the users of a list having to be aware of such activities.