micropython (annotate docs/concepts.txt in 18c62608bfa6)

Concepts
========

This document describes the underlying concepts employed in micropython.

  * Namespaces and attribute definition
  * Contexts and values
  * Tables, attributes and lookups
  * Objects and structures
  * Parameters and lookups
  * Instantiation
  * Register usage
  * List and tuple representations

Namespaces and Attribute Definition
===================================

Namespaces are any objects which can retain attributes.

  * Module attributes are defined either at the module level or by global
    statements.
  * Class attributes are defined only within class statements.
  * Instance attributes are defined only by assignments to attributes of self
    within __init__ methods.

These restrictions ensure that such attributes are explicitly declared,
permitting the use of tables (described below). Module and class attributes
can also be finalised in this way in order to permit certain optimisations.
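The three permitted forms of attribute definition can be illustrated with a
small module. This is only a sketch: the names counter, reset and Point are
hypothetical and do not appear in micropython itself.

```python
# Hypothetical module showing where each kind of attribute may be defined.

counter = 0                 # module attribute, defined at the module level

def reset():
    global counter          # module attribute, defined via a global statement
    counter = 0

class Point:
    dimensions = 2          # class attribute, defined within the class statement

    def __init__(self, x, y):
        self.x = x          # instance attributes, defined by assignments to
        self.y = y          # attributes of self within __init__

# Under the restrictions above, an assignment such as "p.colour = 1" made
# outside __init__ would not define a new instance attribute.
```

Because every attribute name is visible in the source text, the complete set
of names for each namespace can be collected before the program runs.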

An additional restriction required for the current implementation of tables
(as described below) applies to class definitions: each class must be defined
using a unique name; repeated definition of classes having the same name is
thus not permitted. This restriction arises from the use of the "full name" of
a class as a key to the object table, where the full name is a qualified path
via the module hierarchy ending with the name of the class.

See rejected.txt for complicating mechanisms which could be applied to
mitigate the effects of these restrictions on optimisations.

Contexts and Values
===================

Values are used as the common reference representation in micropython: as
stored representations of attributes (of classes, instances, modules, and
other objects supporting attribute-like entities) as well as the stored values
associated with names in functions and methods.

Unlike other implementations, micropython does not create separate objects,
such as bound method objects, for individual instances. Instead, all objects
are referenced using a (context, reference) pair:

Value Layout
------------

    0           1
    context     object
    reference   reference

Specific implementations might reverse this ordering for optimisation
purposes.
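The two-slot layout can be modelled as a simple pair. The following is a
minimal sketch, assuming a hypothetical Value tuple type; it is not the
representation used by any particular micropython back-end.

```python
from collections import namedtuple

# Hypothetical stand-in for the two-slot value layout: slot 0 holds the
# context reference, slot 1 the object reference.
Value = namedtuple("Value", ["context", "ref"])

class C:
    def m(self):
        return self

c1, c2 = C(), C()

# One function object is shared by two values; only the contexts differ, so
# no per-instance bound method object needs to be created.
v1 = Value(context=c1, ref=C.__dict__["m"])
v2 = Value(context=c2, ref=C.__dict__["m"])
```

Here v1.ref and v2.ref are the same function, and calling v1.ref(v1.context)
behaves like the bound method call c1.m().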

Rationale
---------

The aim is to reduce the number of created objects whilst retaining the
ability to support bound method invocations. The context indicates the context
in which an invocation is performed, typically the owner of the method.

Usage
-----

The context may be inserted as the first argument when a value is involved in
an invocation. This argument may then be omitted from the invocation if its
usage is not appropriate.
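As a rough illustration of this usage, the sketch below invokes a
(context, reference) pair. The NO_CONTEXT marker and the invoke helper are
assumptions made for the example; the actual mechanism is described in
invocation.txt.

```python
# Hypothetical marker for a context that should not be passed to the callee.
NO_CONTEXT = object()

def invoke(value, *args):
    """Invoke a (context, reference) pair, inserting the context first."""
    context, ref = value
    if context is NO_CONTEXT:
        return ref(*args)           # context omitted: plain function call
    return ref(context, *args)      # context supplied as the first argument

def add(a, b):
    return a + b

class Counter:
    def __init__(self):
        self.n = 0

    def bump(self, by):
        self.n += by
        return self.n
```

With these definitions, invoke((NO_CONTEXT, add), 1, 2) calls add(1, 2),
whereas invoke((c, Counter.bump), 5) calls Counter.bump(c, 5) for some
Counter instance c.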

See invocation.txt for details.

Context Value Types
-------------------

The following types of context value exist:

    Type            Usage                           Transformations
    ----            -----                           ---------------

    Replaceable     With functions (not methods)    May be replaced with an
                                                    instance or a class when a
                                                    value is stored on an
                                                    instance or class

    Placeholder     With classes                    May not be replaced

    Instance        With instances (and constants)  May not be replaced
                    or functions as methods

    Class           With functions as methods       May be replaced when a
                                                    value is loaded from a
                                                    class attribute via an
                                                    instance

Contexts in Acquired Values
---------------------------

There are four classes of instructions which provide values:

    Instruction         Purpose                 Context Operations
    -----------         -------                 ------------------

1)  LoadConst           Load module, constant   Use loaded object with itself
                                                as context

2)  LoadFunction        Load function           Combine replaceable context
                                                with loaded object

3)  LoadClass           Load class              Combine placeholder context
                                                with loaded object

4)  LoadAddress*        Load attribute from     Preserve or override stored
    LoadAttr*           class, module,          context (as described in
                        instance                assignment.txt)

In order to comply with traditional Python behaviour, contexts may or may not
represent the object from which an attribute has been acquired.
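The "preserve or override" behaviour for attribute loads might be sketched as
follows. This is a deliberately simplified model: the Value type and the
load_via_instance helper are hypothetical, and the rule shown (a class context
is replaced by the accessing instance, anything else is preserved) ignores
details covered in assignment.txt.

```python
from collections import namedtuple

Value = namedtuple("Value", ["context", "ref"])   # hypothetical value pair

def load_via_instance(instance, stored):
    """Load a class attribute through an instance (simplified rule)."""
    if isinstance(stored.context, type):
        # A class context is overridden by the accessing instance.
        return Value(instance, stored.ref)
    # Any other context is preserved exactly as stored.
    return stored

class C:
    pass

def f(self):
    return self

c = C()
method_value = Value(C, f)       # function stored on C as a method
constant_value = Value(42, 42)   # constant: the object is its own context
```

Loading method_value via c yields a value whose context is c, which mirrors
traditional bound method behaviour, while constant_value is returned
unchanged.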

See assignment.txt for details.

Contexts in Stored Values
-------------------------

There are two classes of instruction for storing values:

    Instruction         Purpose                 Context Operations
    -----------         -------                 ------------------

1)  StoreAddress        Store attribute in a    Preserve context; note that no
                        known object            test for class attribute
                                                assignment should be necessary
                                                since this instruction should only
                                                be generated for module globals

    StoreAttr           Store attribute in an   Preserve context; note that no
                        instance                test for class attribute
                                                assignment should be necessary
                                                since this instruction should only
                                                be generated for self accesses

    StoreAttrIndex      Store attribute in an   Preserve context; since the index
                        unknown object          lookup could yield a class
                                                attribute, a test of the nature of
                                                the structure is necessary in
                                                order to prevent assignments to
                                                classes

2)  StoreAddressContext Store attribute in a    Override context if appropriate;
                        known object            if the value has a replaceable
                                                context, permit the target to
                                                take ownership of the value
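The difference between preserving and overriding a context on storage can be
sketched as below. The REPLACEABLE marker, the Value type and the use of a
dictionary as the target's storage are all assumptions made for illustration;
they do not describe the real instruction implementations.

```python
from collections import namedtuple

Value = namedtuple("Value", ["context", "ref"])   # hypothetical value pair
REPLACEABLE = object()    # hypothetical marker for a replaceable context

def store_address_context(storage, owner, name, value):
    """Store value under name, letting owner adopt a replaceable context."""
    if value.context is REPLACEABLE:
        value = Value(owner, value.ref)   # the target takes ownership
    storage[name] = value                 # otherwise the context is preserved
    return value

class C:
    pass

def f(self):
    return self

storage = {}
stored_function = store_address_context(storage, C, "f", Value(REPLACEABLE, f))
stored_constant = store_address_context(storage, C, "n", Value(42, 42))
```

The plain function acquires C as its context when stored, whereas the
constant keeps the context it already had.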

See assignment.txt for details.

Tables, Attributes and Lookups
==============================

Attribute lookups, where the exact location of an object attribute is deduced,
are performed differently in micropython than in other implementations.
Instead of providing attribute dictionaries, in which attributes are found,
attributes are located at fixed places in object structures (described below)
and their locations are stored using a special representation known as a
table.

For a given program, a table can be considered as being like a matrix mapping
classes to attribute names. For example:

    class A:
        # instances have attributes x, y
        pass

    class B(A):
        # introduces attribute z for instances
        pass

    class C:
        # instances have attributes a, b, z
        pass

This would provide the following table, referred to as an object table in the
context of classes and instances:

    Class/attr      a   b   x   y   z

    A                       1   2
    B                       1   2   3
    C               1   2           3

A limitation of this representation is that instance attributes may not shadow
class attributes: if an attribute with a given name is not defined on an
instance, an attribute with the same name cannot be provided by the class of
the instance or any superclass of the instance's class. This impacts the
provision of the __class__ attribute, as described below.

The table can be compacted using a representation known as a displacement
list (referred to as an object list in this context):

                Classes with attribute offsets

    classcode   A
    attrcode    a   b   x   y   z

    classcode           B
    attrcode            a   b   x   y   z

    classcode                               C
    attrcode                                a   b   x   y   z

    List        .   .   1   2   1   2   3   1   2   .   .   3

Here, the classcode refers to the offset in the list at which a class's
attributes are defined, whereas the attrcode defines the offset within a
region of attributes corresponding to a single attribute of a given name.
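One way of producing such a list is a first-fit packing of the table rows. The
sketch below is an assumption about how the compaction could be done, not a
description of micropython's own algorithm, although for this example it
happens to yield the same classcodes (A at 0, B at 2, C at 7). Each occupied
element records its attrcode so that lookups can check relevance.

```python
# Object table from the example: class -> {attribute name: position}.
table = {
    "A": {"x": 1, "y": 2},
    "B": {"x": 1, "y": 2, "z": 3},
    "C": {"a": 1, "b": 2, "z": 3},
}
attrnames = ["a", "b", "x", "y", "z"]
attrcode = {name: code for code, name in enumerate(attrnames)}

def fits(cells, base, offsets):
    """True if no wanted position, counted from base, is already occupied."""
    return all(base + o >= len(cells) or cells[base + o] is None
               for o in offsets)

def build_displacement_list(table):
    """Pack rows with a first-fit strategy; return (cells, classcodes)."""
    cells, classcodes = [], {}
    for cls, attrs in table.items():
        offsets = [attrcode[name] for name in attrs]
        base = 0
        while not fits(cells, base, offsets):
            base += 1
        # Make room for the full region of attribute codes for this class.
        cells.extend([None] * (base + len(attrnames) - len(cells)))
        for name, position in attrs.items():
            cells[base + attrcode[name]] = (attrcode[name], position)
        classcodes[cls] = base
    return cells, classcodes

def lookup(cells, classcodes, cls, name):
    """Return the attribute position, or None if cls lacks the attribute."""
    index = classcodes[cls] + attrcode[name]
    if index < len(cells) and cells[index] is not None:
        code, position = cells[index]
        if code == attrcode[name]:      # relevance check
            return position
    return None

cells, classcodes = build_displacement_list(table)
```

Looking up z for class A lands on the element holding x for class B; the
stored attrcode differs from that of z, so the lookup correctly fails.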

Attribute Locations
-------------------

The locations stored in table/list elements are generally for instance
attributes relative to the location of the instance, whereas those for class
attributes and module attributes are generally absolute addresses. Thus, each
occupied table cell has the following structure:

    attrcode, uses-absolute-address, address (or location)

This could be given instead as follows:

    attrcode, is-class-or-module, location

Since uses-absolute-address corresponds to is-class-or-module, and since there
is a need to test for classes and modules to prevent assignment to attributes
of such objects, this particular information is always required.
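The role of the is-class-or-module flag can be shown with a short sketch. The
cell tuples, addresses and helper names here are hypothetical; only the
three-field cell structure is taken from the description above.

```python
def resolve(cell, object_address):
    """Return the address of the attribute described by a table cell."""
    attrcode, is_class_or_module, location = cell
    if is_class_or_module:
        return location                     # absolute address
    return object_address + location        # relative to the instance

def store_permitted(cell):
    """Assignment is refused for class and module attributes."""
    attrcode, is_class_or_module, location = cell
    return not is_class_or_module

instance_cell = (2, False, 1)      # e.g. attribute at offset 1 in an instance
class_cell = (4, True, 5000)       # e.g. class attribute at address 5000
```

The same flag therefore serves two purposes: choosing the addressing mode on
loads, and guarding against assignments on stores.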

The __class__ Attribute
-----------------------

In Python 2.x, at least with new-style classes, instances have __class__
attributes which indicate the class from which they have been instantiated,
whereas classes have __class__ attributes which reference the type class.
With the object table, it is not possible to provide absolute addresses which
can be used for both classes and instances, since this would result in classes
and instances having the same class, and thus the class of a class would be
the class itself.

One solution is to use object-relative values in the table so that referencing
the __class__ attribute of an instance produces a value which can be combined
with an instance's address to yield the address of the attribute, which itself
refers to the instance's class, whereas referencing the __class__ attribute of
a class produces a similar object-relative value that is combined with the
class's address to yield the address of the attribute, which itself refers to
the special type class.

Obviously, the above solution requires both classes and instances to retain an
attribute location specifically to hold the value appropriate for each object
type, whereas a scheme which omits the __class__ attribute on classes would be
able to employ an absolute address in the table and maintain only a single
address to refer to the class for all instances. The only problem with not
providing a sensible __class__ attribute entry for classes would be the need
for special treatment of __class__ to prevent inappropriate consultation of
the table for classes.

Comparing Tables as Matrices with Displacement Lists
----------------------------------------------------

Although displacement lists can provide reasonable levels of compaction for
attribute data, the element size is larger than that required for a simple
matrix. In a matrix, the attribute code (attrcode) need not be stored since
each element unambiguously refers to the availability of an attribute for a
particular class or instance of that class, and so the data at a given element
need not be tested for relevance to a given attribute access operation.

Given a program with 20 object types and 100 attribute types, a matrix would
occupy the following amount of space:

    number of object types * number of attribute types * element size
  = 20 * 100 * 1 (assuming that a single location is sufficient for an element)
  = 2000

In contrast, given a compaction to 40% of the matrix size (without considering
element size) in a displacement list, the amount of space would be as follows:

    number of elements * element size
  = 40% * (20 * 100) * 2 (assuming that one additional location is required)
  = 1600

Since the displacement list remains smaller in this example (1600 locations as
opposed to 2000), the principal overhead of using a displacement list is
likely to be in the need to check element relevance when retrieving values
from such a list.
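The arithmetic above can be reproduced directly. The variable names are
chosen for this sketch only, and integer arithmetic is used to avoid rounding
concerns.

```python
object_types = 20
attribute_types = 100
compaction_percent = 40          # list length as a percentage of the matrix

# Matrix: one location per element.
matrix_size = object_types * attribute_types * 1

# Displacement list: fewer elements, but each needs one extra location
# (for the attrcode used in the relevance check).
list_elements = object_types * attribute_types * compaction_percent // 100
list_size = list_elements * 2

print(matrix_size, list_size)
```

With two locations per element, the displacement list only saves space when
compaction brings the number of elements below 50% of the matrix size.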

Objects and Structures
======================

As well as references, micropython needs to have actual objects to refer to.
Since classes, functions and instances are all objects, it is desirable that
certain common features and operations are supported in the same way for all
of these things. To permit this, a common data structure format is used.

    Header....................................................  Attributes.................

    Identifier  Identifier  Address     Identifier  Size        Object      ...

    0           1           2           3           4           5           6
    classcode   attrcode/   invocation  funccode    size        attribute   ...
                instance    reference                           reference
                status

Classcode
---------

Used in attribute lookup.

Here, the classcode refers to the attribute lookup table for the object (as
described above). Classes and instances share the same classcode, and their
structures reflect this. Functions all belong to the same type and thus employ
the classcode for the function built-in type, whereas modules have distinct
types since they must support different sets of attributes.

Attrcode
--------

Used to test instances for membership of classes (or descendants of classes).

Since, in traditional Python, classes are only ever instances of some generic
built-in type, support for testing such a relationship directly has been
removed and the attrcode is not specified for classes: the presence of an
attrcode indicates that a given object is an instance. In addition, support
has also been removed for testing modules in the same way, meaning that the
attrcode is also not specified for modules.

See the "Testing Instance Compatibility with Classes (Attrcode)" section below
for details of attrcodes.

Invocation Reference
--------------------

Used when an object is called.

This is the address of the code to be executed when an invocation is performed
on the object.

Funccode
--------

Used to look up argument positions by name.

The strategy with keyword arguments in micropython is to attempt to position
such arguments in the invocation frame as it is being constructed.

See the "Parameters and Lookups" section for more information.

Size
----

Used to indicate the size of an object including attributes.

Attributes
----------

For classes, modules and instances, the attributes in the structure correspond
to the attributes of each kind of object. For functions, however, the
attributes in the structure correspond to the default arguments for each
function, if any.

Structure Types
---------------

Class C:

    0           1           2           3           4           5           6
    classcode   (unused)    __new__     funccode    size        attribute   ...
    for C                   reference   for                     reference
                                        instantiator

Instance of C:

    0           1           2           3           4           5           6
    classcode   attrcode    C.__call__  funccode    size        attribute   ...
    for C       for C       reference   for                     reference
                            (if exists) C.__call__

Function f:

    0           1           2           3           4           5           6
    classcode   attrcode    code        funccode    size        attribute   ...
    for         for         reference                           (default)
    function    function                                        reference

Module m:

    0           1           2           3           4           5           6
    classcode   attrcode    (unused)    (unused)    (unused)    attribute   ...
    for m       for m                                           (global)
                                                                reference

The __class__ Attribute
-----------------------

All objects should support the __class__ attribute, and in most cases this is
done using the object table, yielding a common address for all instances of a
given class.

Function: refers to the function class
Instance: refers to the class instantiated to make the object

The object table cannot support two definitions simultaneously for both
instances and their classes. Consequently, __class__ access on classes must be
tested for and a special result returned.

Class: refers to the type class (type.__class__ also refers to the type class)

For convenience, the first attribute of a class will be the common __class__
attribute for all its instances. As noted above, direct access to this
attribute will not be possible for classes, and a constant result will be
returned instead.

Lists and Tuples
----------------

The built-in list and tuple sequences employ variable length structures using
the attribute locations to store their elements, where each element is a
reference to a separately stored object.

Testing Instance Compatibility with Classes (Attrcode)
------------------------------------------------------

Although it would be possible to have a data structure mapping classes to
compatible classes, such as a matrix indicating the subclasses (or
superclasses) of each class, the need to retain the key to such a data
structure for each class might introduce a noticeable overhead.

Instead of having a separate structure, descendant classes of each class are
inserted as special attributes into the object table. This requires an extra
key to be retained, since each class must provide its own attribute code such
that upon an instance/class compatibility test, the code may be obtained and
used in the object table.
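The idea can be sketched by giving every class a special attribute name and
recording that name in the table row of the class and of each descendant. The
"#" naming convention, the bases mapping and the helper functions are all
hypothetical; the real table stores attribute codes rather than strings.

```python
# Hypothetical class hierarchy: class name -> list of direct base classes.
bases = {"A": [], "B": ["A"], "C": []}

def lineage(cls):
    """Return cls together with all of its ancestors."""
    result = [cls]
    for base in bases[cls]:
        result.extend(lineage(base))
    return result

# Each row of the (simplified) object table gains one special entry for the
# class itself and one for every ancestor, so the column for a class is
# occupied exactly by that class and its descendants.
object_table = {cls: {"#" + name for name in lineage(cls)} for cls in bases}

def is_compatible(instance_class, cls):
    """Test whether instances of instance_class are instances of cls."""
    return ("#" + cls) in object_table[instance_class]
```

A compatibility test thus becomes an ordinary attribute lookup using the
attrcode retained by the class being tested against.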

Invocation and Code References
------------------------------

Modules: there is no meaningful invocation reference since modules cannot be
explicitly called.

Functions: a simple code reference is employed pointing to code implementing
the function. Note that the function locals are completely distinct from this
structure and are not comparable to attributes. Instead, attributes are
reserved for default parameter values, although they do not appear in the
object table described above, appearing instead in a separate parameter table
described below.

Classes: given that classes must be invoked in order to create instances, a
reference must be provided in class structures. However, this reference does
not point directly at the __init__ method of the class. Instead, the
referenced code belongs to a special initialiser function, __new__, consisting
of the following instructions:

    create instance for C
    call C.__init__(instance, ...)
    return instance
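In ordinary Python terms, the three instructions above correspond roughly to
the following sketch. The make_instantiator helper is hypothetical and uses
object.__new__ merely to stand in for the allocation step.

```python
def make_instantiator(cls):
    """Return a function playing the role of the special __new__ code."""
    def instantiator(*args, **kw):
        instance = object.__new__(cls)         # create instance for C
        cls.__init__(instance, *args, **kw)    # call C.__init__(instance, ...)
        return instance                        # return instance
    return instantiator

class C:
    def __init__(self, x):
        self.x = x

new_C = make_instantiator(C)
c = new_C(3)
```

The class structure would hold a reference to such an instantiator, so that
invoking the class is handled like any other invocation.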

Instances: each instance employs a reference to any __call__ method defined in
the class hierarchy for the instance, thus maintaining its callable nature.

Both classes and modules may contain code in their definitions - the former in
the "body" of the class, potentially defining attributes, and the latter as
the "top-level" code in the module, potentially defining attributes/globals -
but this code is not associated with any invocation target. It is thus
generated in order of appearance and is not referenced externally.

Invocation Operation
--------------------

Consequently, regardless of the kind of object, an invocation is always
performed as follows:

    get invocation reference from the header
    jump to reference

Additional preparation is necessary before the above code: positional
arguments must be saved in the invocation frame, and keyword arguments must be
resolved and saved to the appropriate position in the invocation frame.
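The uniform invocation operation can be sketched with a structure type whose
fields follow the common header described earlier. The Structure class, the
use of Python callables in place of code addresses, and the example values
are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Structure:
    """Hypothetical model of the common header plus attributes."""
    classcode: int
    attrcode: object        # None where unspecified (classes)
    invocation: object      # a callable standing in for a code address
    funccode: object
    size: int
    attributes: list = field(default_factory=list)

def invoke(obj, frame):
    """Invoke any structure in the same way, given a prepared frame."""
    target = obj.invocation     # get invocation reference from the header
    return target(*frame)       # "jump" to the reference

# A function structure: the invocation reference is the function's code.
add = Structure(classcode=1, attrcode=1, invocation=lambda a, b: a + b,
                funccode=0, size=5)

# A class structure: the invocation reference is an instantiator.
make_pair = Structure(classcode=2, attrcode=None,
                      invocation=lambda a, b: (a, b), funccode=1, size=5)
```

Neither call site needs to know whether the target is a function, a class or
a callable instance; only the frame preparation differs.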

See invocation.txt for details.

Parameters and Lookups
======================

Since Python supports keyword arguments when making invocations, it becomes
necessary to record the parameter names associated with each function or
method. Just as object tables record attribute positions on classes and
instances, parameter tables record parameter positions in function or method
parameter lists.

For a given program, a parameter table can be considered as being like a
matrix mapping functions/methods to parameter names. For example:

    def f(x, y, z):
        pass

    def g(a, b, c):
        pass

    def h(a, x):
        pass

This would provide the following table, referred to as a parameter table in
the context of functions and methods:

    Function/param  a   b   c   x   y   z

    f                           1   2   3
    g               1   2   3
    h               1           2

Confusion can occur when functions are adopted as methods, since the context
then occupies the first slot in the invocation frame:

    def f(x, y, z):
        pass

    f(x=1, y=2, z=3) -> f(<context>, 1, 2, 3)
                     -> f(1, 2, 3)

    class C:
        f = f

        def g(x, y, z):
            pass

    c = C()

    c.f(y=2, z=3) -> f(<context>, 2, 3)
    c.g(y=2, z=3) -> C.g(<context>, 2, 3)

Just as with object tables, a displacement list can be prepared from a
parameter table:

                Functions with parameter (attribute) offsets

    funccode    f
    attrcode    a   b   c   x   y   z

    funccode                            g
    attrcode                            a   b   c   x   y   z

    funccode                                        h
    attrcode                                        a   b   c   x   y   z

    List        .   .   .   1   2   3   1   2   3   1   .   .   2   .   .

Here, the funccode refers to the offset in the list at which a function's
parameters are defined, whereas the attrcode defines the offset within a
region of attributes corresponding to a single parameter of a given name.
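Using the list above, keyword arguments can be positioned in an invocation
frame as sketched below. The funccodes (f at 0, g at 6, h at 9) are read off
the layout shown; the build_frame helper, the storage of a parameter code in
each element and the treatment of positions as 1-based are assumptions made
for this example, which also ignores any context slot.

```python
paramnames = ["a", "b", "c", "x", "y", "z"]
paramcode = {name: code for code, name in enumerate(paramnames)}
funccodes = {"f": 0, "g": 6, "h": 9}

# (paramcode, position) elements; None marks an unused element.
param_list = [
    None, None, None, (3, 1), (4, 2), (5, 3),      # f: x, y, z
    (0, 1), (1, 2), (2, 3),                        # g: a, b, c
    (0, 1), None, None, (3, 2), None, None,        # h: a, x
]

def build_frame(func, nparams, args, kwargs):
    """Place positional, then keyword, arguments into a new frame."""
    if len(args) > nparams:
        raise TypeError("%s() given too many arguments" % func)
    frame = [None] * nparams
    frame[:len(args)] = args
    for name, value in kwargs.items():
        index = funccodes[func] + paramcode[name]
        cell = param_list[index] if index < len(param_list) else None
        if cell is None or cell[0] != paramcode[name]:
            raise TypeError("%s() has no parameter %r" % (func, name))
        frame[cell[1] - 1] = value      # table positions are 1-based here
    return frame
```

For example, build_frame("f", 3, [1], {"z": 3, "y": 2}) places the keyword
arguments after the positional one, whereas asking g for a parameter x lands
on an element belonging to h and is rejected by the relevance check.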

Instantiation
=============

When instantiating classes, memory must be reserved for the header of the
resulting instance, along with locations for the attributes of the instance.
Since the instance header contains data common to all instances of a class, a
template header is copied to the start of the newly reserved memory region.
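A minimal sketch of this allocation step follows, modelling memory as a Python
list of locations. The instantiate helper and the contents of the template
header are hypothetical placeholders.

```python
def instantiate(memory, template_header, nattrs):
    """Reserve space for a new instance and return its address."""
    address = len(memory)
    memory.extend(template_header)      # copy the template header
    memory.extend([None] * nattrs)      # reserve the attribute locations
    return address

# Hypothetical five-location header for instances of a class C having two
# attributes: classcode, attrcode, invocation reference, funccode, size.
template_for_C = ["classcode C", "attrcode C", None, None, 7]

memory = []
first = instantiate(memory, template_for_C, 2)
second = instantiate(memory, template_for_C, 2)
```

Each instance begins with an identical copy of the header, followed by its
own, initially empty, attribute locations.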

Register Usage
==============

During code generation, much of the evaluation produces results which are
implicitly recorded in the "active value" register, and various instructions
will consume the active value. In addition, some instructions will consume a
separate "active source value" from a register, typically those which are
assigning the result of an expression to an assignment target.

Since values often need to be retained for later use, a set of temporary
storage locations is typically employed. However, optimisations may reduce
the need to use such temporary storage where instructions which provide the
"active value" can be re-executed and will produce the same result.

List and Tuple Representations
==============================

Since tuples have a fixed size, the representation of a tuple instance is
merely a header describing the size of the entire object, together with a
sequence of references to the objects "stored" at each position in the
structure. Such references consist of the usual context and reference pair.

Lists, however, have a variable size and must be accessible via an unchanging
location even as more memory is allocated elsewhere to accommodate the
contents of the list. Consequently, the representation must resemble the
following:

    Structure header for list (size == header plus special attribute)
    Special attribute referencing the underlying sequence

The underlying sequence has a fixed size, like a tuple, but may contain fewer
elements than the size of the sequence permits:

    Special header indicating the current size and allocated size
    Element
    ...             <-- current size
    (Unused space)
    ...             <-- allocated size

This representation permits the allocation of a new sequence when space is
exhausted in an existing sequence, with the new sequence address stored in the
main list structure. Since access to the contents of the list must go through
the main list structure, underlying allocation activities may take place
without the users of a list having to be aware of such activities.
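The indirection can be sketched as follows. The Sequence and List classes are
hypothetical models, and the initial allocation of two elements and the
doubling growth policy are assumptions made for this example rather than
details taken from micropython.

```python
class Sequence:
    """Fixed-size underlying storage with a current and an allocated size."""
    def __init__(self, allocated):
        self.size = 0                       # current size
        self.slots = [None] * allocated     # allocated size == len(slots)

class List:
    """Main list structure: stays in place while the sequence is replaced."""
    def __init__(self):
        self.sequence = Sequence(2)         # the special attribute

    def append(self, item):
        seq = self.sequence
        if seq.size == len(seq.slots):
            # Space exhausted: allocate a new sequence and copy the elements.
            new = Sequence(2 * len(seq.slots))
            new.slots[:seq.size] = seq.slots
            new.size = seq.size
            self.sequence = seq = new       # store the new sequence address
        seq.slots[seq.size] = item
        seq.size += 1

    def __getitem__(self, index):
        if not 0 <= index < self.sequence.size:
            raise IndexError(index)
        return self.sequence.slots[index]
```

Users holding a reference to the List object are unaffected when append
replaces the underlying sequence, since every access goes through the main
structure.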
