micropython

docs/concepts.txt

530:0f9aa4c4d92c
2012-06-09 Paul Boddie Added an __iter__ method for strings.
     1 Concepts
     2 ========
     3 
     4 This document describes the underlying concepts employed in micropython.
     5 
     6   * Namespaces and attribute definition
     7   * Contexts and values
     8   * Tables, attributes and lookups
     9   * Objects and structures
    10   * Parameters and lookups
    11   * Instantiation
    12   * Register usage
    13   * List and tuple representations
    14 
    15 Namespaces and Attribute Definition
    16 ===================================
    17 
    18 Namespaces are any objects which can retain attributes.
    19 
    20   * Module attributes are defined either at the module level or by global
    21     statements.
    22   * Class attributes are defined only within class statements.
    23   * Instance attributes are defined only by assignments to attributes of self.
    24 
    25 These restrictions apply because such attributes are thus explicitly declared,
    26 permitting the use of tables (described below). Module and class attributes
    27 can also be finalised in this way in order to permit certain optimisations.
    28 
    29 An additional restriction required for the current implementation of tables
    30 (as described below) applies to class definitions: each class must be defined
    31 using a unique name; repeated definition of classes having the same name is
    32 thus not permitted. This restriction arises from the use of the "full name" of
    33 a class as a key to the object table, where the full name is a qualified path
    34 via the module hierarchy ending with the name of the class.
    35 
    36 Rebinding of attributes outside classes and modules can be allowed if
    37 attribute usage observations are being used to detect such external
    38 modifications to such objects. Without such observations, such rebinding
    39 should be forbidden since apparently constant attributes might be modified in
    40 a running program, but code may have been generated that provides specific
    41 objects for those attributes under the assumption that they will not be
    42 changed.
    43 
    44 See rejected.txt for complicating mechanisms which could be applied to
    45 mitigate the effects of these restrictions on optimisations.
    46 
    47 Contexts and Values
    48 ===================
    49 
    50 Values are used as the common reference representation in micropython: as
    51 stored representations of attributes (of classes, instances, modules, and
    52 other objects supporting attribute-like entities) as well as the stored values
    53 associated with names in functions and methods.
    54 
    55 Unlike other implementations, micropython does not create things like bound
    56 method objects for individual instances. Instead, all objects are referenced
    57 using a context, reference pair:
    58 
    59 Value Layout
    60 ------------
    61 
    62     0           1
    63     context     object
    64     reference   reference
    65 
    66 Specific implementations might reverse this ordering for optimisation
    67 purposes.
    68 
    69 Rationale
    70 ---------
    71 
    72 To reduce the number of created objects whilst retaining the ability to
    73 support bound method invocations. The context indicates the context in which
    74 an invocation is performed, typically the owner of the method.
    75 
    76 Usage
    77 -----
    78 
    79 The context may be inserted as the first argument when a value is involved in
    80 an invocation. This argument may then be omitted from the invocation if its
    81 usage is not appropriate.
    82 
    83 See invocation.txt for details.
    84 
    85 Context Value Types
    86 -------------------
    87 
    88 The following types of context value exist:
    89 
    90     Type            Usage                           Transformations
    91     ----            -----                           ---------------
    92 
    93     Replaceable     With functions (not methods)    May be replaced with an
    94                                                     instance or a class when a
    95                                                     value is stored on an
    96                                                     instance or class
    97 
    98     Placeholder     With classes                    May not be replaced
    99 
   100     Instance        With instances (and constants)  May not be replaced
   101                     or functions as methods
   102 
   103     Class           With functions as methods       May be replaced when a
   104                                                     value is loaded from a
   105                                                     class attribute via an
   106                                                     instance
   107 
   108 Contexts in Acquired Values
   109 ---------------------------
   110 
   111 There are four classes of instructions which provide values:
   112 
   113     Instruction         Purpose                 Context Operations
   114     -----------         -------                 ------------------
   115 
   116 1)  LoadConst           Load module, constant   Use loaded object with itself
   117                                                 as context
   118 
   119 2)  LoadFunction        Load function           Combine replaceable context
   120                                                 with loaded object
   121 
   122 3)  LoadClass           Load class              Combine placeholder context
   123                                                 with loaded object
   124 
   125 4)  LoadAddress*        Load attribute from     Preserve or override stored
   126     LoadAttr*           class, module,          context (as described in
   127                         instance                assignment.txt)
   128 
   129 In order to comply with traditional Python behaviour, contexts may or may not
   130 represent the object from which an attribute has been acquired.
   131 
   132 See assignment.txt for details.
   133 
   134 Contexts in Stored Values
   135 -------------------------
   136 
   137 There are two classes of instruction for storing values:
   138 
   139     Instruction         Purpose                 Context Operations
   140     -----------         -------                 ------------------
   141 
   142 1)  StoreAddress        Store attribute in a    Preserve context; note that no
   143                         known object            test for class attribute
   144                                                 assignment should be necessary
   145                                                 since this instruction should only
   146                                                 be generated for module globals
   147 
   148     StoreAttr           Store attribute in an   Preserve context; note that no
   149                         instance                test for class attribute
   150                                                 assignment should be necessary
   151                                                 since this instruction should only
   152                                                 be generated for self accesses
   153 
   154     StoreAttrIndex      Store attribute in an   Preserve context; since the index
   155                         unknown object          lookup could yield a class
   156                                                 attribute, a test of the nature of
   157                                                 the nature of the structure is
   158                                                 necessary in order to prevent
   159                                                 assignments to classes
   160 
   161 2)  StoreAddressContext Store attribute in a    Override context if appropriate;
   162                         known object            if the value has a replaceable
   163                                                 context, permit the target to
   164                                                 take ownership of the value
   165 
   166 See assignment.txt for details.
   167 
   168 Tables, Attributes and Lookups
   169 ==============================
   170 
   171 Attribute lookups, where the exact location of an object attribute is deduced,
   172 are performed differently in micropython than in other implementations.
   173 Instead of providing attribute dictionaries, in which attributes are found,
   174 attributes are located at fixed places in object structures (described below)
   175 and their locations are stored using a special representation known as a
   176 table.
   177 
   178 For a given program, a table can be considered as being like a matrix mapping
   179 classes to attribute names. For example:
   180 
   181     class A:
   182         # instances have attributes x, y
   183 
   184     class B(A):
   185         # introduces attribute z for instances
   186 
   187     class C:
   188         # instances have attributes a, b, z
   189 
   190 This would provide the following table, referred to as an object table in the
   191 context of classes and instances:
   192 
   193     Class/attr      a   b   x   y   z
   194 
   195     A                       1   2
   196     B                       1   2   3
   197     C               1   2           3
   198 
   199 A limitation of this representation is that instance attributes may not shadow
   200 class attributes: if an attribute with a given name is not defined on an
   201 instance, an attribute with the same name cannot be provided by the class of
   202 the instance or any superclass of the instance's class. This impacts the
   203 provision of the __class__ attribute, as described below.
   204 
   205 The table can be compacted using a representation known as a displacement
   206 list (referred to as an object list in this context):
   207 
   208                 Classes with attribute offsets
   209 
   210     classcode   A
   211     attrcode    a   b   x   y   z
   212 
   213                         B
   214                         a   b   x   y   z
   215 
   216                                             C
   217                                             a   b   x   y   z
   218 
   219     List        .   .   1   2   1   2   3   1   2   .   .   3
   220 
   221 Here, the classcode refers to the offset in the list at which a class's
   222 attributes are defined, whereas the attrcode defines the offset within a
   223 region of attributes corresponding to a single attribute of a given name.
   224 
   225 Attribute Locations
   226 -------------------
   227 
   228 The locations stored in table/list elements are generally for instance
   229 attributes relative to the location of the instance, whereas those for class
   230 attributes and module attributes are generally absolute addresses. Thus, each
   231 occupied table cell has the following structure:
   232 
   233     attrcode, uses-absolute-address, address (or location)
   234 
   235 This could be given instead as follows:
   236 
   237     attrcode, is-class-or-module, location
   238 
   239 Since uses-absolute-address corresponds to is-class-or-module, and since there
   240 is a need to test for classes and modules to prevent assignment to attributes
   241 of such objects, this particular information is always required.
   242 
   243 The __class__ Attribute
   244 -----------------------
   245 
   246 In Python 2.x, at least with new-style classes, instances have __class__
   247 attributes which indicate the class from which they have been instantiated,
   248 whereas classes have __class__ attributes which reference the type class.
   249 With the object table, it is not possible to provide absolute addresses which
   250 can be used for both classes and instances, since this would result in classes
   251 and instances having the same class, and thus the class of a class would be
   252 the class itself.
   253 
   254 One solution is to use object-relative values in the table so that referencing
   255 the __class__ attribute of an instance produces a value which can be combined
   256 with an instance's address to yield the address of the attribute, which itself
   257 refers to the instance's class, whereas referencing the __class__ attribute of
   258 a class produces a similar object-relative value that is combined with the
   259 class's address to yield the address of the attribute, which itself refers to
   260 the special type class.
   261 
   262 Obviously, the above solution requires both classes and instances to retain an
   263 attribute location specifically to hold the value appropriate for each object
   264 type, whereas a scheme which omits the __class__ attribute on classes would be
   265 able to employ an absolute address in the table and maintain only a single
   266 address to refer to the class for all instances. The only problem with not
   267 providing a sensible __class__ attribute entry for classes would be the need
   268 for special treatment of __class__ to prevent inappropriate consultation of
   269 the table for classes.
   270 
   271 Comparing Tables as Matrices with Displacement Lists
   272 ----------------------------------------------------
   273 
   274 Although displacement lists can provide reasonable levels of compaction for
   275 attribute data, the element size is larger than that required for a simple
   276 matrix: the attribute code (attrcode) need not be stored since each element
   277 unambiguously refers to the availability of an attribute for a particular
   278 class or instance of that class, and so the data at a given element need not
   279 be tested for relevance to a given attribute access operation.
   280 
   281 Given a program with 20 object types and 100 attribute types, a matrix would
   282 occupy the following amount of space:
   283 
   284     number of object types * number of attribute types * element size
   285   = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
   286   = 2000
   287 
   288 In contrast, given a compaction to 40% of the matrix size (without considering
   289 element size) in a displacement list, the amount of space would be as follows:
   290 
   291     number of elements * element size
   292   = 40% * (20 * 100) * 2 (assuming that one additional location is required)
   293   = 1600
   294 
   295 Consequently, the principal overhead of using a displacement list is likely to
   296 be in the need to check element relevance when retrieving values from such a
   297 list.
   298 
   299 Objects and Structures 
   300 ======================
   301 
   302 As well as references, micropython needs to have actual objects to refer to.
   303 Since classes, functions and instances are all objects, it is desirable that
   304 certain common features and operations are supported in the same way for all
   305 of these things. To permit this, a common data structure format is used.
   306 
   307     Header....................................................  Attributes.................
   308 
   309     Identifier  Identifier  Address     Identifier  Size        Object      ...
   310 
   311     0           1           2           3           4           5           6
   312     classcode   attrcode/   invocation  funccode    size        attribute   ...
   313                 instance    reference                           reference
   314                 status
   315 
   316 Classcode
   317 ---------
   318 
   319 Used in attribute lookup.
   320 
   321 Here, the classcode refers to the attribute lookup table for the object (as
   322 described above). Classes and instances share the same classcode, and their
   323 structures reflect this. Functions all belong to the same type and thus employ
   324 the classcode for the function built-in type, whereas modules have distinct
   325 types since they must support different sets of attributes.
   326 
   327 Attrcode
   328 --------
   329 
   330 Used to test instances for membership of classes (or descendants of classes).
   331 
   332 Since, in traditional Python, classes are only ever instances of some generic
   333 built-in type, support for testing such a relationship directly has been
   334 removed and the attrcode is not specified for classes: the presence of an
   335 attrcode indicates that a given object is an instance. In addition, support
   336 has also been removed for testing modules in the same way, meaning that the
   337 attrcode is also not specified for modules.
   338 
   339 See the "Testing Instance Compatibility with Classes (Attrcode)" section below
   340 for details of attrcodes.
   341 
   342 Invocation Reference
   343 --------------------
   344 
   345 Used when an object is called.
   346 
   347 This is the address of the code to be executed when an invocation is performed
   348 on the object.
   349 
   350 Funccode
   351 --------
   352 
   353 Used to look up argument positions by name.
   354 
   355 The strategy with keyword arguments in micropython is to attempt to position
   356 such arguments in the invocation frame as it is being constructed.
   357 
   358 See the "Parameters and Lookups" section for more information.
   359 
   360 Size
   361 ----
   362 
   363 Used to indicate the size of an object including attributes.
   364 
   365 Attributes
   366 ----------
   367 
   368 For classes, modules and instances, the attributes in the structure correspond
   369 to the attributes of each kind of object. For functions, however, the
   370 attributes in the structure correspond to the default arguments for each
   371 function, if any.
   372 
   373 Structure Types
   374 ---------------
   375 
   376 Class C:
   377 
   378     0           1           2           3           4           5           6
   379     classcode   (unused)    __new__     funccode    size        attribute   ...
   380     for C                   reference   for                     reference
   381                                         instantiator
   382 
   383 Instance of C:
   384 
   385     0           1           2           3           4           5           6
   386     classcode   attrcode    C.__call__  funccode    size        attribute   ...
   387     for C       for C       reference   for                     reference
   388                             (if exists) C.__call__
   389 
   390 Function f:
   391 
   392     0           1           2           3           4           5           6
   393     classcode   attrcode    code        funccode    size        attribute   ...
   394     for         for         reference                           (default)
   395     function    function                                        reference
   396 
   397 Module m:
   398 
   399     0           1           2           3           4           5           6
   400     classcode   attrcode    (unused)    (unused)    (unused)    attribute   ...
   401     for m       for m                                           (global)
   402                                                                 reference
   403 
   404 The __class__ Attribute
   405 -----------------------
   406 
   407 All objects should support the __class__ attribute, and in most cases this is
   408 done using the object table, yielding a common address for all instances of a
   409 given class.
   410 
   411 Function: refers to the function class
   412 Instance: refers to the class instantiated to make the object
   413 
   414 The object table cannot support two definitions simultaneously for both
   415 instances and their classes. Consequently, __class__ access on classes must be
   416 tested for and a special result returned.
   417 
   418 Class: refers to the type class (type.__class__ also refers to the type class)
   419 
   420 For convenience, the first attribute of a class will be the common __class__
   421 attribute for all its instances. As noted above, direct access to this
   422 attribute will not be possible for classes, and a constant result will be
   423 returned instead.
   424 
   425 Lists and Tuples
   426 ----------------
   427 
   428 The built-in list and tuple sequences employ variable length structures using
   429 the attribute locations to store their elements, where each element is a
   430 reference to a separately stored object.
   431 
   432 Testing Instance Compatibility with Classes (Attrcode)
   433 ------------------------------------------------------
   434 
   435 Although it would be possible to have a data structure mapping classes to
   436 compatible classes, such as a matrix indicating the subclasses (or
   437 superclasses) of each class, the need to retain the key to such a data
   438 structure for each class might introduce a noticeable overhead.
   439 
   440 Instead of having a separate structure, descendant classes of each class are
   441 inserted as special attributes into the object table. This requires an extra
   442 key to be retained, since each class must provide its own attribute code such
   443 that upon an instance/class compatibility test, the code may be obtained and
   444 used in the object table.
   445 
   446 Invocation and Code References
   447 ------------------------------
   448 
   449 Modules: there is no meaningful invocation reference since modules cannot be
   450 explicitly called.
   451 
   452 Functions: a simple code reference is employed pointing to code implementing
   453 the function. Note that the function locals are completely distinct from this
   454 structure and are not comparable to attributes. Instead, attributes are
   455 reserved for default parameter values, although they do not appear in the
   456 object table described above, appearing instead in a separate parameter table
   457 described below.
   458 
   459 Classes: given that classes must be invoked in order to create instances, a
   460 reference must be provided in class structures. However, this reference does
   461 not point directly at the __init__ method of the class. Instead, the
   462 referenced code belongs to a special initialiser function, __new__, consisting
   463 of the following instructions:
   464 
   465     create instance for C
   466     call C.__init__(instance, ...)
   467     return instance
   468 
   469 Instances: each instance employs a reference to any __call__ method defined in
   470 the class hierarchy for the instance, thus maintaining its callable nature.
   471 
   472 Both classes and modules may contain code in their definitions - the former in
   473 the "body" of the class, potentially defining attributes, and the latter as
   474 the "top-level" code in the module, potentially defining attributes/globals -
   475 but this code is not associated with any invocation target. It is thus
   476 generated in order of appearance and is not referenced externally.
   477 
   478 Invocation Operation
   479 --------------------
   480 
   481 Consequently, regardless of the object an invocation is always done as
   482 follows:
   483 
   484     get invocation reference from the header
   485     jump to reference
   486 
   487 Additional preparation is necessary before the above code: positional
   488 arguments must be saved in the invocation frame, and keyword arguments must be
   489 resolved and saved to the appropriate position in the invocation frame.
   490 
   491 See invocation.txt for details.
   492 
   493 Parameters and Lookups 
   494 ======================
   495 
   496 Since Python supports keyword arguments when making invocations, it becomes
   497 necessary to record the parameter names associated with each function or
   498 method. Just as object tables record attributes positions on classes and
   499 instances, parameter tables record parameter positions in function or method
   500 parameter lists.
   501 
   502 For a given program, a parameter table can be considered as being like a
   503 matrix mapping functions/methods to parameter names. For example:
   504 
   505     def f(x, y, z):
   506         pass
   507 
   508     def g(a, b, c):
   509         pass
   510 
   511     def h(a, x):
   512         pass
   513 
   514 This would provide the following table, referred to as a parameter table in
   515 the context of functions and methods:
   516 
   517     Function/param  a   b   c   x   y   z
   518 
   519     f                           1   2   3
   520     g               1   2   3
   521     h               1           2
   522 
   523 Confusion can occur when functions are adopted as methods, since the context
   524 then occupies the first slot in the invocation frame:
   525 
   526     def f(x, y, z):
   527         pass
   528 
   529     f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
   530                      -> f(1, 2, 3)
   531 
   532     class C:
   533         f = f
   534 
   535         def g(x, y, z):
   536             pass
   537 
   538     c = C()
   539 
   540     c.f(y=2, z=3) -> f(<context>, 2, 3)
   541     c.g(y=2, z=3) -> C.g(<context>, 2, 3)
   542 
   543 Just as with parameter tables, a displacement list can be prepared from a
   544 parameter table:
   545 
   546                 Functions with parameter (attribute) offsets
   547 
   548     funccode    f
   549     attrcode    a   b   c   x   y   z
   550 
   551                                         g
   552                                         a   b   c   x   y   z
   553 
   554                                                     h
   555                                                     a   b   c   x   y   z
   556 
   557     List        .   .   .   1   2   3   1   2   3   1   .   .   2   .   .
   558 
   559 Here, the funccode refers to the offset in the list at which a function's
   560 parameters are defined, whereas the attrcode defines the offset within a
   561 region of attributes corresponding to a single parameter of a given name.
   562 
   563 Instantiation
   564 =============
   565 
   566 When instantiating classes, memory must be reserved for the header of the
   567 resulting instance, along with locations for the attributes of the instance.
   568 Since the instance header contains data common to all instances of a class, a
   569 template header is copied to the start of the newly reserved memory region.
   570 
   571 Register Usage
   572 ==============
   573 
   574 During code generation, much of the evaluation produces results which are
   575 implicitly recorded in the "active value" or "working" register, and various
   576 instructions will consume this active value. In addition, some instructions
   577 will consume a separate "active source value" from a register, typically those
   578 which are assigning the result of an expression to an assignment target.
   579 
   580 Since values often need to be retained for later use, a set of temporary
   581 storage locations are typically employed. However, optimisations may reduce
   582 the need to use such temporary storage where instructions which provide the
   583 "active value" can be re-executed and will produce the same result. Whether
   584 re-use of instructions is possible or not, values still need to be loaded into
   585 a "source" register to be accessed by an assignment instruction.
   586 
   587 RSVP instructions generally have the notion of working, target and source
   588 registers (see registers.txt). These register "roles" are independent from the
   589 actual registers defined for the RSVP machine itself, even though the naming
   590 is similar. Generally, instructions do regard the RSVP "working" registers as
   591 the class of register in the "working" role, although some occurrences of
   592 instruction usage may employ other registers (such as the "result" registers)
   593 and thus take advantage of the generality of the RSVP implementation of such
   594 instructions.
   595 
   596 List and Tuple Representations
   597 ==============================
   598 
   599 Since tuples have a fixed size, the representation of a tuple instance is
   600 merely a header describing the size of the entire object, together with a
   601 sequence of references to the object "stored" at each position in the
   602 structure. Such references consist of the usual context and reference pair.
   603 
   604 Lists, however, have a variable size and must be accessible via an unchanging
   605 location even as more memory is allocated elsewhere to accommodate the
   606 contents of the list. Consequently, the representation must resemble the
   607 following:
   608 
   609     Structure header for list (size == header plus special attribute)
   610     Special attribute referencing the underlying sequence
   611 
   612 The underlying sequence has a fixed size, like a tuple, but may contain fewer
   613 elements than the size of the sequence permits:
   614 
   615     Special header indicating the current size and allocated size
   616     Element
   617     ...             <-- current size
   618     (Unused space)
   619     ...             <-- allocated size
   620 
   621 This representation permits the allocation of a new sequence when space is
   622 exhausted in an existing sequence, with the new sequence address stored in the
   623 main list structure. Since access to the contents of the list must go through
   624 the main list structure, underlying allocation activities may take place
   625 without the users of a list having to be aware of such activities.