micropython

docs/concepts.txt

450:5382a26943ca
2011-08-21 Paul Boddie Changed the nature of RSVP instructions so that they have explicit working, target and source register inputs and no longer consume simple instructions as direct inputs. This changes the nature of the generated code and of certain optimisations, as well as making some previous optimisations redundant and necessitating additional optimisations.
     1 Concepts
     2 ========
     3 
     4 This document describes the underlying concepts employed in micropython.
     5 
     6   * Namespaces and attribute definition
     7   * Contexts and values
     8   * Tables, attributes and lookups
     9   * Objects and structures
    10   * Parameters and lookups
    11   * Instantiation
    12   * Register usage
    13   * List and tuple representations
    14 
    15 Namespaces and Attribute Definition
    16 ===================================
    17 
    18 Namespaces are any objects which can retain attributes.
    19 
    20   * Module attributes are defined either at the module level or by global
    21     statements.
    22   * Class attributes are defined only within class statements.
    23   * Instance attributes are defined only by assignments to attributes of self
    24     within __init__ methods.
    25 
    26 These restrictions apply because such attributes are thus explicitly declared,
    27 permitting the use of tables (described below). Module and class attributes
    28 can also be finalised in this way in order to permit certain optimisations.
    29 
    30 An additional restriction required for the current implementation of tables
    31 (as described below) applies to class definitions: each class must be defined
    32 using a unique name; repeated definition of classes having the same name is
    33 thus not permitted. This restriction arises from the use of the "full name" of
    34 a class as a key to the object table, where the full name is a qualified path
    35 via the module hierarchy ending with the name of the class.
    36 
    37 See rejected.txt for complicating mechanisms which could be applied to
    38 mitigate the effects of these restrictions on optimisations.
    39 
    40 Contexts and Values
    41 ===================
    42 
    43 Values are used as the common reference representation in micropython: as
    44 stored representations of attributes (of classes, instances, modules, and
    45 other objects supporting attribute-like entities) as well as the stored values
    46 associated with names in functions and methods.
    47 
    48 Unlike other implementations, micropython does not create things like bound
    49 method objects for individual instances. Instead, all objects are referenced
    50 using a context, reference pair:
    51 
    52 Value Layout
    53 ------------
    54 
    55     0           1
    56     context     object
    57     reference   reference
    58 
    59 Specific implementations might reverse this ordering for optimisation
    60 purposes.
    61 
    62 Rationale
    63 ---------
    64 
    65 To reduce the number of created objects whilst retaining the ability to
    66 support bound method invocations. The context indicates the context in which
    67 an invocation is performed, typically the owner of the method.
    68 
    69 Usage
    70 -----
    71 
    72 The context may be inserted as the first argument when a value is involved in
    73 an invocation. This argument may then be omitted from the invocation if its
    74 usage is not appropriate.
    75 
    76 See invocation.txt for details.
    77 
    78 Context Value Types
    79 -------------------
    80 
    81 The following types of context value exist:
    82 
    83     Type            Usage                           Transformations
    84     ----            -----                           ---------------
    85 
    86     Replaceable     With functions (not methods)    May be replaced with an
    87                                                     instance or a class when a
    88                                                     value is stored on an
    89                                                     instance or class
    90 
    91     Placeholder     With classes                    May not be replaced
    92 
    93     Instance        With instances (and constants)  May not be replaced
    94                     or functions as methods
    95 
    96     Class           With functions as methods       May be replaced when a
    97                                                     value is loaded from a
    98                                                     class attribute via an
    99                                                     instance
   100 
   101 Contexts in Acquired Values
   102 ---------------------------
   103 
   104 There are four classes of instructions which provide values:
   105 
   106     Instruction         Purpose                 Context Operations
   107     -----------         -------                 ------------------
   108 
   109 1)  LoadConst           Load module, constant   Use loaded object with itself
   110                                                 as context
   111 
   112 2)  LoadFunction        Load function           Combine replaceable context
   113                                                 with loaded object
   114 
   115 3)  LoadClass           Load class              Combine placeholder context
   116                                                 with loaded object
   117 
   118 4)  LoadAddress*        Load attribute from     Preserve or override stored
   119     LoadAttr*           class, module,          context (as described in
   120                         instance                assignment.txt)
   121 
   122 In order to comply with traditional Python behaviour, contexts may or may not
   123 represent the object from which an attribute has been acquired.
   124 
   125 See assignment.txt for details.
   126 
   127 Contexts in Stored Values
   128 -------------------------
   129 
   130 There are two classes of instruction for storing values:
   131 
   132     Instruction         Purpose                 Context Operations
   133     -----------         -------                 ------------------
   134 
   135 1)  StoreAddress        Store attribute in a    Preserve context; note that no
   136                         known object            test for class attribute
   137                                                 assignment should be necessary
   138                                                 since this instruction should only
   139                                                 be generated for module globals
   140 
   141     StoreAttr           Store attribute in an   Preserve context; note that no
   142                         instance                test for class attribute
   143                                                 assignment should be necessary
   144                                                 since this instruction should only
   145                                                 be generated for self accesses
   146 
   147     StoreAttrIndex      Store attribute in an   Preserve context; since the index
   148                         unknown object          lookup could yield a class
   149                                                 attribute, a test of the nature of
   150                                                 the nature of the structure is
   151                                                 necessary in order to prevent
   152                                                 assignments to classes
   153 
   154 2)  StoreAddressContext Store attribute in a    Override context if appropriate;
   155                         known object            if the value has a replaceable
   156                                                 context, permit the target to
   157                                                 take ownership of the value
   158 
   159 See assignment.txt for details.
   160 
   161 Tables, Attributes and Lookups
   162 ==============================
   163 
   164 Attribute lookups, where the exact location of an object attribute is deduced,
   165 are performed differently in micropython than in other implementations.
   166 Instead of providing attribute dictionaries, in which attributes are found,
   167 attributes are located at fixed places in object structures (described below)
   168 and their locations are stored using a special representation known as a
   169 table.
   170 
   171 For a given program, a table can be considered as being like a matrix mapping
   172 classes to attribute names. For example:
   173 
   174     class A:
   175         # instances have attributes x, y
   176 
   177     class B(A):
   178         # introduces attribute z for instances
   179 
   180     class C:
   181         # instances have attributes a, b, z
   182 
   183 This would provide the following table, referred to as an object table in the
   184 context of classes and instances:
   185 
   186     Class/attr      a   b   x   y   z
   187 
   188     A                       1   2
   189     B                       1   2   3
   190     C               1   2           3
   191 
   192 A limitation of this representation is that instance attributes may not shadow
   193 class attributes: if an attribute with a given name is not defined on an
   194 instance, an attribute with the same name cannot be provided by the class of
   195 the instance or any superclass of the instance's class. This impacts the
   196 provision of the __class__ attribute, as described below.
   197 
   198 The table can be compacted using a representation known as a displacement
   199 list (referred to as an object list in this context):
   200 
   201                 Classes with attribute offsets
   202 
   203     classcode   A
   204     attrcode    a   b   x   y   z
   205 
   206                         B
   207                         a   b   x   y   z
   208 
   209                                             C
   210                                             a   b   x   y   z
   211 
   212     List        .   .   1   2   1   2   3   1   2   .   .   3
   213 
   214 Here, the classcode refers to the offset in the list at which a class's
   215 attributes are defined, whereas the attrcode defines the offset within a
   216 region of attributes corresponding to a single attribute of a given name.
   217 
   218 Attribute Locations
   219 -------------------
   220 
   221 The locations stored in table/list elements are generally for instance
   222 attributes relative to the location of the instance, whereas those for class
   223 attributes and module attributes are generally absolute addresses. Thus, each
   224 occupied table cell has the following structure:
   225 
   226     attrcode, uses-absolute-address, address (or location)
   227 
   228 This could be given instead as follows:
   229 
   230     attrcode, is-class-or-module, location
   231 
   232 Since uses-absolute-address corresponds to is-class-or-module, and since there
   233 is a need to test for classes and modules to prevent assignment to attributes
   234 of such objects, this particular information is always required.
   235 
   236 The __class__ Attribute
   237 -----------------------
   238 
   239 In Python 2.x, at least with new-style classes, instances have __class__
   240 attributes which indicate the class from which they have been instantiated,
   241 whereas classes have __class__ attributes which reference the type class.
   242 With the object table, it is not possible to provide absolute addresses which
   243 can be used for both classes and instances, since this would result in classes
   244 and instances having the same class, and thus the class of a class would be
   245 the class itself.
   246 
   247 One solution is to use object-relative values in the table so that referencing
   248 the __class__ attribute of an instance produces a value which can be combined
   249 with an instance's address to yield the address of the attribute, which itself
   250 refers to the instance's class, whereas referencing the __class__ attribute of
   251 a class produces a similar object-relative value that is combined with the
   252 class's address to yield the address of the attribute, which itself refers to
   253 the special type class.
   254 
   255 Obviously, the above solution requires both classes and instances to retain an
   256 attribute location specifically to hold the value appropriate for each object
   257 type, whereas a scheme which omits the __class__ attribute on classes would be
   258 able to employ an absolute address in the table and maintain only a single
   259 address to refer to the class for all instances. The only problem with not
   260 providing a sensible __class__ attribute entry for classes would be the need
   261 for special treatment of __class__ to prevent inappropriate consultation of
   262 the table for classes.
   263 
   264 Comparing Tables as Matrices with Displacement Lists
   265 ----------------------------------------------------
   266 
   267 Although displacement lists can provide reasonable levels of compaction for
   268 attribute data, the element size is larger than that required for a simple
   269 matrix: the attribute code (attrcode) need not be stored since each element
   270 unambiguously refers to the availability of an attribute for a particular
   271 class or instance of that class, and so the data at a given element need not
   272 be tested for relevance to a given attribute access operation.
   273 
   274 Given a program with 20 object types and 100 attribute types, a matrix would
   275 occupy the following amount of space:
   276 
   277     number of object types * number of attribute types * element size
   278   = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
   279   = 2000
   280 
   281 In contrast, given a compaction to 40% of the matrix size (without considering
   282 element size) in a displacement list, the amount of space would be as follows:
   283 
   284     number of elements * element size
   285   = 40% * (20 * 100) * 2 (assuming that one additional location is required)
   286   = 1600
   287 
   288 Consequently, the principal overhead of using a displacement list is likely to
   289 be in the need to check element relevance when retrieving values from such a
   290 list.
   291 
   292 Objects and Structures 
   293 ======================
   294 
   295 As well as references, micropython needs to have actual objects to refer to.
   296 Since classes, functions and instances are all objects, it is desirable that
   297 certain common features and operations are supported in the same way for all
   298 of these things. To permit this, a common data structure format is used.
   299 
   300     Header....................................................  Attributes.................
   301 
   302     Identifier  Identifier  Address     Identifier  Size        Object      ...
   303 
   304     0           1           2           3           4           5           6
   305     classcode   attrcode/   invocation  funccode    size        attribute   ...
   306                 instance    reference                           reference
   307                 status
   308 
   309 Classcode
   310 ---------
   311 
   312 Used in attribute lookup.
   313 
   314 Here, the classcode refers to the attribute lookup table for the object (as
   315 described above). Classes and instances share the same classcode, and their
   316 structures reflect this. Functions all belong to the same type and thus employ
   317 the classcode for the function built-in type, whereas modules have distinct
   318 types since they must support different sets of attributes.
   319 
   320 Attrcode
   321 --------
   322 
   323 Used to test instances for membership of classes (or descendants of classes).
   324 
   325 Since, in traditional Python, classes are only ever instances of some generic
   326 built-in type, support for testing such a relationship directly has been
   327 removed and the attrcode is not specified for classes: the presence of an
   328 attrcode indicates that a given object is an instance. In addition, support
   329 has also been removed for testing modules in the same way, meaning that the
   330 attrcode is also not specified for modules.
   331 
   332 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
   333 for details of attrcodes.
   334 
   335 Invocation Reference
   336 --------------------
   337 
   338 Used when an object is called.
   339 
   340 This is the address of the code to be executed when an invocation is performed
   341 on the object.
   342 
   343 Funccode
   344 --------
   345 
   346 Used to look up argument positions by name.
   347 
   348 The strategy with keyword arguments in micropython is to attempt to position
   349 such arguments in the invocation frame as it is being constructed.
   350 
   351 See the "Parameters and Lookups" section for more information.
   352 
   353 Size
   354 ----
   355 
   356 Used to indicate the size of an object including attributes.
   357 
   358 Attributes
   359 ----------
   360 
   361 For classes, modules and instances, the attributes in the structure correspond
   362 to the attributes of each kind of object. For functions, however, the
   363 attributes in the structure correspond to the default arguments for each
   364 function, if any.
   365 
   366 Structure Types
   367 ---------------
   368 
   369 Class C:
   370 
   371     0           1           2           3           4           5           6
   372     classcode   (unused)    __new__     funccode    size        attribute   ...
   373     for C                   reference   for                     reference
   374                                         instantiator
   375 
   376 Instance of C:
   377 
   378     0           1           2           3           4           5           6
   379     classcode   attrcode    C.__call__  funccode    size        attribute   ...
   380     for C       for C       reference   for                     reference
   381                             (if exists) C.__call__
   382 
   383 Function f:
   384 
   385     0           1           2           3           4           5           6
   386     classcode   attrcode    code        funccode    size        attribute   ...
   387     for         for         reference                           (default)
   388     function    function                                        reference
   389 
   390 Module m:
   391 
   392     0           1           2           3           4           5           6
   393     classcode   attrcode    (unused)    (unused)    (unused)    attribute   ...
   394     for m       for m                                           (global)
   395                                                                 reference
   396 
   397 The __class__ Attribute
   398 -----------------------
   399 
   400 All objects should support the __class__ attribute, and in most cases this is
   401 done using the object table, yielding a common address for all instances of a
   402 given class.
   403 
   404 Function: refers to the function class
   405 Instance: refers to the class instantiated to make the object
   406 
   407 The object table cannot support two definitions simultaneously for both
   408 instances and their classes. Consequently, __class__ access on classes must be
   409 tested for and a special result returned.
   410 
   411 Class: refers to the type class (type.__class__ also refers to the type class)
   412 
   413 For convenience, the first attribute of a class will be the common __class__
   414 attribute for all its instances. As noted above, direct access to this
   415 attribute will not be possible for classes, and a constant result will be
   416 returned instead.
   417 
   418 Lists and Tuples
   419 ----------------
   420 
   421 The built-in list and tuple sequences employ variable length structures using
   422 the attribute locations to store their elements, where each element is a
   423 reference to a separately stored object.
   424 
   425 Testing Instance Compatibility with Classes (Attrcode)
   426 ------------------------------------------------------
   427 
   428 Although it would be possible to have a data structure mapping classes to
   429 compatible classes, such as a matrix indicating the subclasses (or
   430 superclasses) of each class, the need to retain the key to such a data
   431 structure for each class might introduce a noticeable overhead.
   432 
   433 Instead of having a separate structure, descendant classes of each class are
   434 inserted as special attributes into the object table. This requires an extra
   435 key to be retained, since each class must provide its own attribute code such
   436 that upon an instance/class compatibility test, the code may be obtained and
   437 used in the object table.
   438 
   439 Invocation and Code References
   440 ------------------------------
   441 
   442 Modules: there is no meaningful invocation reference since modules cannot be
   443 explicitly called.
   444 
   445 Functions: a simple code reference is employed pointing to code implementing
   446 the function. Note that the function locals are completely distinct from this
   447 structure and are not comparable to attributes. Instead, attributes are
   448 reserved for default parameter values, although they do not appear in the
   449 object table described above, appearing instead in a separate parameter table
   450 described below.
   451 
   452 Classes: given that classes must be invoked in order to create instances, a
   453 reference must be provided in class structures. However, this reference does
   454 not point directly at the __init__ method of the class. Instead, the
   455 referenced code belongs to a special initialiser function, __new__, consisting
   456 of the following instructions:
   457 
   458     create instance for C
   459     call C.__init__(instance, ...)
   460     return instance
   461 
   462 Instances: each instance employs a reference to any __call__ method defined in
   463 the class hierarchy for the instance, thus maintaining its callable nature.
   464 
   465 Both classes and modules may contain code in their definitions - the former in
   466 the "body" of the class, potentially defining attributes, and the latter as
   467 the "top-level" code in the module, potentially defining attributes/globals -
   468 but this code is not associated with any invocation target. It is thus
   469 generated in order of appearance and is not referenced externally.
   470 
   471 Invocation Operation
   472 --------------------
   473 
   474 Consequently, regardless of the object an invocation is always done as
   475 follows:
   476 
   477     get invocation reference from the header
   478     jump to reference
   479 
   480 Additional preparation is necessary before the above code: positional
   481 arguments must be saved in the invocation frame, and keyword arguments must be
   482 resolved and saved to the appropriate position in the invocation frame.
   483 
   484 See invocation.txt for details.
   485 
   486 Parameters and Lookups 
   487 ======================
   488 
   489 Since Python supports keyword arguments when making invocations, it becomes
   490 necessary to record the parameter names associated with each function or
   491 method. Just as object tables record attributes positions on classes and
   492 instances, parameter tables record parameter positions in function or method
   493 parameter lists.
   494 
   495 For a given program, a parameter table can be considered as being like a
   496 matrix mapping functions/methods to parameter names. For example:
   497 
   498     def f(x, y, z):
   499         pass
   500 
   501     def g(a, b, c):
   502         pass
   503 
   504     def h(a, x):
   505         pass
   506 
   507 This would provide the following table, referred to as a parameter table in
   508 the context of functions and methods:
   509 
   510     Function/param  a   b   c   x   y   z
   511 
   512     f                           1   2   3
   513     g               1   2   3
   514     h               1           2
   515 
   516 Confusion can occur when functions are adopted as methods, since the context
   517 then occupies the first slot in the invocation frame:
   518 
   519     def f(x, y, z):
   520         pass
   521 
   522     f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
   523                      -> f(1, 2, 3)
   524 
   525     class C:
   526         f = f
   527 
   528         def g(x, y, z):
   529             pass
   530 
   531     c = C()
   532 
   533     c.f(y=2, z=3) -> f(<context>, 2, 3)
   534     c.g(y=2, z=3) -> C.g(<context>, 2, 3)
   535 
   536 Just as with parameter tables, a displacement list can be prepared from a
   537 parameter table:
   538 
   539                 Functions with parameter (attribute) offsets
   540 
   541     funccode    f
   542     attrcode    a   b   c   x   y   z
   543 
   544                                         g
   545                                         a   b   c   x   y   z
   546 
   547                                                     h
   548                                                     a   b   c   x   y   z
   549 
   550     List        .   .   .   1   2   3   1   2   3   1   .   .   2   .   .
   551 
   552 Here, the funccode refers to the offset in the list at which a function's
   553 parameters are defined, whereas the attrcode defines the offset within a
   554 region of attributes corresponding to a single parameter of a given name.
   555 
   556 Instantiation
   557 =============
   558 
   559 When instantiating classes, memory must be reserved for the header of the
   560 resulting instance, along with locations for the attributes of the instance.
   561 Since the instance header contains data common to all instances of a class, a
   562 template header is copied to the start of the newly reserved memory region.
   563 
   564 Register Usage
   565 ==============
   566 
   567 During code generation, much of the evaluation produces results which are
   568 implicitly recorded in the "active value" or "working" register, and various
   569 instructions will consume this active value. In addition, some instructions
   570 will consume a separate "active source value" from a register, typically those
   571 which are assigning the result of an expression to an assignment target.
   572 
   573 Since values often need to be retained for later use, a set of temporary
   574 storage locations are typically employed. However, optimisations may reduce
   575 the need to use such temporary storage where instructions which provide the
   576 "active value" can be re-executed and will produce the same result. Whether
   577 re-use of instructions is possible or not, values still need to be loaded into
   578 a "source" register to be accessed by an assignment instruction.
   579 
   580 RSVP instructions generally have the notion of working, target and source
   581 registers (see registers.txt). These register "roles" are independent from the
   582 actual registers defined for the RSVP machine itself, even though the naming
   583 is similar. Generally, instructions do regard the RSVP "working" registers as
   584 the class of register in the "working" role, although some occurrences of
   585 instruction usage may employ other registers (such as the "result" registers)
   586 and thus take advantage of the generality of the RSVP implementation of such
   587 instructions.
   588 
   589 List and Tuple Representations
   590 ==============================
   591 
   592 Since tuples have a fixed size, the representation of a tuple instance is
   593 merely a header describing the size of the entire object, together with a
   594 sequence of references to the object "stored" at each position in the
   595 structure. Such references consist of the usual context and reference pair.
   596 
   597 Lists, however, have a variable size and must be accessible via an unchanging
   598 location even as more memory is allocated elsewhere to accommodate the
   599 contents of the list. Consequently, the representation must resemble the
   600 following:
   601 
   602     Structure header for list (size == header plus special attribute)
   603     Special attribute referencing the underlying sequence
   604 
   605 The underlying sequence has a fixed size, like a tuple, but may contain fewer
   606 elements than the size of the sequence permits:
   607 
   608     Special header indicating the current size and allocated size
   609     Element
   610     ...             <-- current size
   611     (Unused space)
   612     ...             <-- allocated size
   613 
   614 This representation permits the allocation of a new sequence when space is
   615 exhausted in an existing sequence, with the new sequence address stored in the
   616 main list structure. Since access to the contents of the list must go through
   617 the main list structure, underlying allocation activities may take place
   618 without the users of a list having to be aware of such activities.