Lichen

docs/wiki/Design

932:c07b0dd14f85
2021-06-28 Paul Boddie Moved integer instantiation support to library functions.
= Design Decisions =

The Lichen language design involves some different choices to those taken in
Python's design. Many of these choices are motivated by the following
criteria:

 * To simplify the language and to make what programs do easier to understand
   and to predict
 * To make analysis of programs easier, particularly
   [[../Deduction|deductions]] about the nature of the code
 * To simplify and otherwise reduce the [[../Representations|representations]]
   employed and the operations performed at run-time

Lichen is in many ways a restricted form of Python. In particular,
restrictions on the attribute names supported by each object help to clearly
define the object types in a program, allowing us to identify those objects
when they are used. Consequently, optimisations that can be employed in a
Lichen program become possible in situations where they would have been
difficult or demanding to employ in a Python program.

Some design choices evoke memories of earlier forms of Python. Removing nested
scopes simplifies the [[../Inspection|inspection]] of programs and run-time
[[../Representations|representations]] and mechanisms. Other choices seek to
remedy difficult or defective aspects of Python, notably the behaviour of
Python's [[../Imports|import]] system.

<<TableOfContents(2,3)>>

== Attributes ==

{{{#!table
'''Lichen''' || '''Python''' || '''Rationale'''
==
Objects have a fixed set of attribute names
|| Objects can gain and lose attributes at run-time
|| Having fixed sets of attributes helps identify object types
==
Instance attributes may not shadow class attributes
|| Instance attributes may shadow class attributes
|| Forbidding shadowing simplifies access operations
==
Attributes are simple members of object structures
|| Dynamic handling and computation of attributes is supported
|| Forbidding dynamic attributes simplifies access operations
}}}

=== Fixed Attribute Names ===

Attribute names are bound for classes through assignment in the class
namespace, for modules in the module namespace, and for instances in methods
through assignment to `self`. Class and instance attributes are propagated to
descendant classes and instances of descendant classes respectively. Once
bound, attributes can be modified, but new attributes cannot be bound by other
means, such as the assignment of an attribute to an arbitrary object that
would not already support such an attribute.

{{{#!python numbers=disable
class C:
    a = 123
    def __init__(self):
        self.x = 234

C.b = 456 # not allowed (b not bound in C)
C().y = 567 # not allowed (y not bound for C instances)
}}}

Permitting the addition of attributes to objects would then require that such
addition attempts be associated with particular objects, leading to a
potentially iterative process involving object type deduction and
modification, also causing imprecise results.

=== No Shadowing ===

Instances may not define attributes that are provided by classes.

{{{#!python numbers=disable
class C:
    a = 123
    def shadow(self):
        self.a = 234 # not allowed (attribute shadows class attribute)
}}}

Permitting this would oblige instances to support attributes that, when
missing, are provided by consulting their classes but, when not missing, may
also be provided directly by the instances themselves.

=== No Dynamic Attributes ===

Instance attributes cannot be provided dynamically, such that any missing
attribute would be supplied by a special method call to determine the
attribute's presence and to retrieve its value.

{{{#!python numbers=disable
class C:
    def __getattr__(self, name): # not supported
        if name == "missing":
            return 123
}}}

Permitting this would require object types to potentially support any
attribute, undermining attempts to use attributes to identify objects.

== Naming ==

{{{#!table
'''Lichen''' || '''Python''' || '''Rationale'''
==
Names may be local, global or built-in: nested namespaces must be initialised
explicitly
|| Names may also be non-local, permitting closures
|| Limited name scoping simplifies program inspection and run-time mechanisms
==
`self` is a reserved name and is optional in method parameter lists
|| `self` is a naming convention, but the first method parameter must always
.. refer to the accessed object
|| Reserving `self` assists deduction; making it optional is a consequence of
.. the method binding behaviour
==
Instance attributes can be initialised using `.name` parameter notation
|| [[https://stackoverflow.com/questions/1389180/automatically-initialize-instance-variables|Workarounds]]
.. involving decorators and introspection are required for similar brevity
|| Initialiser notation eliminates duplication in program code and is convenient
}}}

=== Traditional Local, Global and Built-In Scopes Only ===

Namespaces reside within a hierarchy within modules: classes containing
classes or functions; functions containing other functions. Built-in names are
exposed in all namespaces; global names are defined at the module level and
are exposed in all namespaces within the module; locals are confined to the
namespace in which they are defined.

However, locals are not inherited by namespaces from surrounding or enclosing
namespaces.

{{{#!python numbers=disable
def f(x):
    def g(y):
        return x + y # not permitted: x is not inherited from f in Lichen (it is in Python)
    return g

def h(x):
    def i(y, x=x): # x is initialised but held in the namespace of i
        return x + y # succeeds: x is defined
    return i
}}}

Needing to access outer namespaces in order to access any referenced names
complicates the way in which such dynamic namespaces would need to be managed.
Although the default initialisation technique demonstrated above could be
automated, explicit initialisation makes programs easier to follow and avoids
mistakes involving globals having the same name.

=== Reserved Self ===

The `self` name can be omitted in method signatures, but in methods it is
always initialised to the instance on which the method is operating.

{{{#!python numbers=disable
class C:
    def f(y): # y is not the instance
        self.x = y # self is the instance
}}}

The assumption in methods is that `self` must always be referring to an
instance of the containing class or of a descendant class.
This means that
`self` cannot be initialised to another kind of value, which Python permits
through the explicit invocation of a method with the inclusion of the affected
instance as the first argument. Consequently, `self` becomes optional in the
signature because it is not assigned in the same way as the other parameters.

=== Instance Attribute Initialisers ===

In parameter lists, a special notation can be used to indicate that the given
name is an instance attribute that will be assigned the argument value
corresponding to the parameter concerned.

{{{#!python numbers=disable
class C:
    def f(self, .a, .b, c): # .a and .b indicate instance attributes
        self.c = c # a traditional assignment using a parameter
}}}

To use the notation, such dot-qualified parameters must appear only in the
parameter lists of methods, not plain functions. The qualified parameters are
represented as locals having the same name, and assignments to the
corresponding instance attributes are inserted into the generated code.

{{{#!python numbers=disable
class C:
    def f1(self, .a, .b): # equivalent to f2, below
        pass

    def f2(self, a, b):
        self.a = a
        self.b = b

    def g(self, .a, .b, a): # not permitted: a appears twice
        pass
}}}

Naturally, `self`, being a reserved name in methods, can also be omitted from
such parameter lists. Moreover, such initialising parameters can have default
values.

{{{#!python numbers=disable
class C:
    def __init__(.a=1, .b=2):
        pass

c1 = C()
c2 = C(3, 4)
print c1.a, c1.b # 1 2
print c2.a, c2.b # 3 4
}}}

== Inheritance and Binding ==

{{{#!table
'''Lichen''' || '''Python''' || '''Rationale'''
==
Class attributes are propagated to class hierarchy members during
initialisation: rebinding class attributes does not affect descendant class
attributes
|| Class attributes are propagated live to class hierarchy members and must be
.. looked up by the run-time system if not provided by a given class
|| Initialisation-time propagation simplifies access operations and attribute
.. table storage
==
Unbound methods must be bound using a special function taking an instance
|| Unbound methods may be called using an instance as first argument
|| Forbidding instances as first arguments simplifies the invocation mechanism
==
Functions assigned to class attributes do not become unbound methods
|| Functions assigned to class attributes become unbound methods
|| Removing method assignment simplifies deduction: methods are always defined
.. in place
==
Base classes must be well-defined
|| Base classes may be expressions
|| Well-defined base classes are required to establish a well-defined
.. hierarchy of types
==
Classes may not be defined in functions
|| Classes may be defined in any kind of namespace
|| Forbidding classes in functions prevents the definition of countless class
.. variants that are awkward to analyse
}}}

=== Inherited Class Attributes ===

Class attributes that are changed for a class do not change for that class's
descendants.

{{{#!python numbers=disable
class C:
    a = 123

class D(C):
    pass

C.a = 456
print D.a # remains 123 in Lichen, becomes 456 in Python
}}}

Permitting such changes to propagate to descendants would require indirection
for all class attributes, obliging them to be treated differently from other
kinds of attributes. Meanwhile, class attribute rebinding and the accessing of
inherited attributes changed in this way are relatively rare.

=== Unbound Methods ===

Methods are defined on classes but are only available via instances: they are
instance methods. Consequently, acquiring a method directly from a class and
then invoking it should fail because the method will be unbound: the "context"
of the method is not an instance. Furthermore, the Python technique of
supplying an instance as the first argument in an invocation to bind the
method to an instance, thus setting the context of the method, is not
supported. See [[#Reserved Self|"Reserved Self"]] for more information.

{{{#!python numbers=disable
class C:
    def f(self, x):
        self.x = x
    def g(self):
        C.f(123) # not permitted: C is not an instance
        C.f(self, 123) # not permitted: self cannot be specified in the argument list
        get_using(C.f, self)(123) # binds C.f to self, then the result is called
}}}

Binding methods to instances occurs when acquiring methods via instances or
explicitly using the `get_using` built-in. The built-in checks the
compatibility of the supplied method and instance. If compatible, it provides
the bound method as its result.

Normal functions are callable without any further preparation, whereas unbound
methods need the binding step to be performed and are not immediately
callable.
Were functions to become unbound methods upon assignment to a class
attribute, they would need to be invalidated by having the preparation
mechanism enabled on them. However, this invalidation would only be relevant
to the specific case of assigning functions to classes and this would need to
be tested for. Given the added complications, such functionality is arguably
not worth supporting.

=== Assigning Functions to Class Attributes ===

Functions can be assigned to class attributes but do not become unbound
methods as a result.

{{{#!python numbers=disable
class C:
    def f(self): # will be replaced
        return 234

def f(self):
    return self

C.f = f # makes C.f a function, not a method
C().f() # not permitted: f requires an explicit argument
C().f(123) # permitted: f has merely been exposed via C.f
}}}

Methods are identified as such by their definition location, they contribute
information about attributes to the class hierarchy, and they employ certain
structure details at run-time to permit the binding of methods. Since
functions can be defined in arbitrary locations, no class hierarchy information
is available, and a function could combine `self` with a range of attributes
that are not compatible with any class to which the function might be
assigned.

=== Well-Defined Base Classes ===

Base classes must be clearly identifiable as well-defined classes. This
facilitates the cataloguing of program objects and further analysis on them.

{{{#!python numbers=disable
class C:
    x = 123

def f():
    return C

class D(f()): # not permitted: f could return anything
    pass
}}}

If base class identification could only be done reliably at run-time, class
relationship information would be very limited without running the program or
performing costly and potentially unreliable analysis. Indeed, programs
employing such dynamic base classes are arguably resistant to analysis, which
is contrary to the goals of a language like Lichen.

=== Class Definitions and Functions ===

Classes may not be defined in functions because functions provide dynamic
namespaces, but Lichen relies on a static namespace hierarchy in order to
clearly identify the principal objects in a program. If classes could be
defined in functions, despite seemingly providing the same class over and over
again on every invocation, a family of classes would, in fact, be defined.

{{{#!python numbers=disable
def f(x):
    class C: # not permitted: this describes one of potentially many classes
        y = x
    return C
}}}

Moreover, issues of namespace nesting also arise, since the motivation for
defining classes in functions would surely be to take advantage of local state
to parameterise such classes.

== Modules and Packages ==

{{{#!table
'''Lichen''' || '''Python''' || '''Rationale'''
==
Modules are independent: package hierarchies are not traversed when importing
|| Modules exist in hierarchical namespaces: package roots must be imported
.. before importing specific submodules
|| Eliminating module traversal permits more precise imports and reduces
.. superfluous code
==
Only specific names can be imported from a module or package using the `from`
statement
|| Importing "all" from a package or module is permitted
|| Eliminating "all" imports simplifies the task of determining where names in
.. use have come from
==
Modules must be specified using absolute names
|| Imports can be absolute or relative
|| Using only absolute names simplifies the import mechanism
==
Modules are imported independently and their dependencies subsequently
resolved
|| Modules are imported as import statements are encountered
|| Statically-initialised objects can be used declaratively, although an
.. initialisation order may still need establishing
}}}

=== Independent Modules ===

The inclusion of modules in a program affects only explicitly-named modules:
they do not have relationships implied by their naming that would cause such
related modules to be included in a program.

{{{#!python numbers=disable
from compiler import consts # defines consts
import compiler.ast # defines ast, not compiler

ast # is defined
compiler # is not defined
consts # is defined
}}}

Where modules should have relationships, they should be explicitly defined
using `from` and `import` statements which target the exact modules required.
In the above example, `compiler` is not imported merely because modules
within the `compiler` package have been requested.

=== Specific Name Imports Only ===

Lichen, unlike Python, also does not support the special `__all__` module
attribute.

{{{#!python numbers=disable
from compiler import * # not permitted
from compiler import ast, consts # permitted

interpreter # undefined in compiler (yet it might be thought to reside there) and in this module
}}}

The `__all__` attribute supports `from ... import *` statements in Python, but
without identifying the module or package involved and then consulting
`__all__` in that module or package to discover which names might be involved
(which might require the inspection of yet other modules or packages), the
names imported cannot be known. Consequently, some names used elsewhere in the
module performing the import might be assumed to be imported names when, in
fact, they are unknown in both the importing and imported modules. Such
uncertainty hinders the inspection of individual modules.

=== Modules Imported Independently ===

When indicating an import using the `from` and `import` statements, the
[[../Toolchain|toolchain]] does not attempt to immediately import other
modules. Instead, the imports act as declarations of such other modules or
names from other modules, resolved at a later stage. This permits mutual
imports to a greater extent than in Python.

{{{#!python numbers=disable
# Module M
from N import C # in Python: fails attempting to re-enter N

class D(C):
    y = 456

# Module N
from M import D # in Python: causes M to be entered, fails when re-entered from N

class C:
    x = 123

class E(D):
    z = 789

# Main program
import N
}}}

Such flexibility is not usually needed, and circular importing usually
indicates issues with program organisation.
However, declarative imports can
help to decouple modules and avoid combining import declaration and module
initialisation order concerns.

== Syntax and Control-Flow ==

{{{#!table
'''Lichen''' || '''Python''' || '''Rationale'''
==
If expressions and comprehensions are not supported
|| If expressions and comprehensions are supported
|| Omitting such syntactic features simplifies program inspection and
.. translation
==
The `with` statement is not supported
|| The `with` statement offers a mechanism for resource allocation and
.. deallocation using context managers
|| This syntactic feature can be satisfactorily emulated using existing
.. constructs
==
Generators are not supported
|| Generators are supported
|| Omitting generator support simplifies run-time mechanisms
==
Only positional and keyword arguments are supported
|| Argument unpacking (using `*` and `**`) is supported
|| Omitting unpacking simplifies generic invocation handling
==
All parameters must be specified
|| Catch-all parameters (`*` and `**`) are supported
|| Omitting catch-all parameter population simplifies generic invocation
.. handling
}}}

=== No If Expressions or Comprehensions ===

In order to support the classic [[WikiPedia:?:|ternary operator]], a construct
was [[https://www.python.org/dev/peps/pep-0308/|added]] to the Python syntax
that needed to avoid problems with the existing grammar and notation.
Unfortunately, it reorders the components from the traditional form:

{{{#!python numbers=disable
# Not valid in Lichen, only in Python.

# In C: condition ? true_result : false_result
true_result if condition else false_result

# In C: (condition ? inner_true_result : inner_false_result) ? true_result : false_result
true_result if (inner_true_result if condition else inner_false_result) else false_result
}}}

Since if expressions may participate within expressions, they cannot be
rewritten as if statements. Nor can they be rewritten as logical operator
chains in general.

{{{#!python numbers=disable
# Not valid in Lichen, only in Python.

a = 0 if x else 1 # x being true yields 0

# Here, x being true causes (x and 0) to complete, yielding 0.
# But this causes ((x and 0) or 1) to complete, yielding 1.

a = x and 0 or 1 # not equivalent: yields 1 whether or not x is true
}}}

But in any case, it would be more of a motivation to support the functionality
if a better syntax could be adopted instead. However, if expressions are not
particularly important in Python, and despite enhancement requests over many
years, everybody managed to live without them.

List comprehensions and generator expressions are more complicated but share
some characteristics of if expressions: their syntax contradicts the typical
conventions established by the rest of the Python language; they create
implicit state that is perhaps most appropriately modelled by a separate
function or similar object. Since Lichen does not support generators at all,
it will obviously not support generator expressions.

Meanwhile, list comprehensions quickly encourage barely-readable programs:

{{{#!python numbers=disable
# Not valid in Lichen, only in Python.

x = [0, [1, 2, 0], 0, 0, [0, 3, 4]]
a = [z for y in x if y for z in y if z]
}}}

Supporting the creation of temporary functions to produce list comprehensions,
while also hiding temporary names from the enclosing scope, adds complexity to
the toolchain for situations where programmers would arguably be better
creating their own functions and thus writing more readable programs.

=== No With Statement ===

The
[[https://docs.python.org/2.7/reference/compound_stmts.html#the-with-statement|with
statement]] introduced the concept of
[[https://docs.python.org/2.7/reference/datamodel.html#context-managers|context
managers]] in Python 2.5, with such objects supporting a
[[https://docs.python.org/2.7/library/stdtypes.html#typecontextmanager|programming
interface]] that aims to formalise certain conventions around resource
management. For example:

{{{#!python numbers=disable
# Not valid in Lichen, only in Python.

with db.connect(connection_args) as connection:
    with connection.cursor() as cursor:
        cursor.execute(query, args)
}}}

Although this makes for readable code, it must be supported by objects which
define the `__enter__` and `__exit__` special methods. Here, the `connect`
method invoked in the first `with` statement must return such an object;
similarly, the `cursor` method must also provide an object with such
characteristics.
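
As a rough illustration of what this obliges a library to provide, a minimal
context manager might look like the following in Python. The `Resource` class
and the `log` list are hypothetical, introduced only for this sketch, and none
of this is valid Lichen, since the `with` statement is not supported.

```python
# Not valid in Lichen, only in Python.
# Resource and log are hypothetical names used purely for illustration.

class Resource:
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        # Called implicitly when the with block is entered.
        self.log.append("open " + self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called implicitly when the with block is left, even on error.
        self.log.append("close " + self.name)
        return False # do not suppress exceptions

log = []
with Resource("connection", log) as connection:
    with Resource("cursor", log) as cursor:
        log.append("execute")
```

The log records the cursor being closed before the connection, which is the
ordering that nested `try` and `finally` clauses would spell out explicitly.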

However, the "pre-with" solution is as follows:

{{{#!python numbers=disable
connection = db.connect(connection_args)
try:
    cursor = connection.cursor()
    try:
        cursor.execute(query, args)
    finally:
        cursor.close()
finally:
    connection.close()
}}}

Although this seems less readable, its behaviour is more obvious because magic
methods are not being called implicitly. Moreover, any parameterisation of the
acts of resource deallocation or closure can be done in the `finally` clauses
where such parameterisation would seem natural, rather than being specified
through some kind of context manager initialisation arguments that must then
be propagated to the magic methods so that they may take into consideration
contextual information that is readily available in the place where the actual
resource operations are being performed.

=== No Generators ===

[[https://www.python.org/dev/peps/pep-0255/|Generators]] were
[[https://docs.python.org/release/2.3/whatsnew/section-generators.html|added]]
to Python in the 2.2 release and became fully part of the language in the 2.3
release. They offer a convenient way of writing iterator-like objects,
capturing execution state instead of obliging the programmer to manage such
state explicitly.

{{{#!python numbers=disable
# Not valid in Lichen, only in Python.

def fib():
    a, b = 0, 1
    while 1:
        yield b
        a, b = b, a+b

# Alternative form valid in Lichen.

class fib:
    def __init__(self):
        self.a, self.b = 0, 1

    def next(self):
        result = self.b
        self.a, self.b = self.b, self.a + self.b
        return result

# Main program.

seq = fib()
i = 0
while i < 10:
    print seq.next()
    i += 1
}}}

However, generators make additional demands on the mechanisms provided to
support program execution. The encapsulation of the above example generator in
a separate class illustrates the need for state that persists outside the
execution of the routine providing the generator's results. Generators may
look like functions, but they do not necessarily behave like them, leading to
potential misunderstandings about their operation even if the code is
superficially tidy and concise.

=== Positional and Keyword Arguments Only ===

When invoking callables, only positional arguments and keyword arguments can
be used. Python also supports `*` and `**` arguments which respectively unpack
sequences and mappings into the argument list, filling the list with sequence
items (using `*`) and keywords (using `**`).

{{{#!python numbers=disable
def f(a, b, c, d):
    return a + b + c + d

l = range(0, 4)
f(*l) # not permitted

m = {"c" : 10, "d" : 20}
f(2, 4, **m) # not permitted
}}}

While convenient, such "unpacking" arguments obscure the communication between
callables and undermine the safety provided by function and method signatures.
They also require run-time support for the unpacking operations.

=== Positional Parameters Only ===

Similarly, signatures may only contain named parameters that correspond to
arguments. Python supports `*` and `**` in parameter lists, too, which
respectively accumulate superfluous positional and keyword arguments.

{{{#!python numbers=disable
def f(a, b, *args, **kw): # not permitted
    return a + b + sum(args) + kw.get("c", 0) + kw.get("d", 0)

f(1, 2, 3, 4)
f(1, 2, c=3, d=4)
}}}

Such accumulation parameters can be useful for collecting arbitrary data and
applying some of it within a callable. However, they can easily proliferate
throughout a system and allow erroneous data to propagate far from its origin
because such parameters permit the deferral of validation until the data needs
to be accessed. Again, run-time support is required to marshal arguments into
the appropriate parameter of this nature, but programmers could just write
functions and methods that employ general sequence and mapping parameters
explicitly instead.
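
A minimal sketch of that alternative, assuming plain Python semantics and a
hypothetical function `f` mirroring the example above: the additional
positional values arrive in an ordinary sequence parameter and the optional
named values in an ordinary mapping parameter, both appearing in the signature
like any other parameter. Whether every operation used here is available in
Lichen's library is not something this sketch establishes.

```python
# A hypothetical rewrite of f using explicit sequence and mapping parameters.

def f(a, b, args, kw):
    # args is an ordinary sequence; kw is an ordinary mapping.
    total = a + b
    for value in args:
        total += value
    for name in ("c", "d"):
        if name in kw:
            total += kw[name]
    return total

# Callers construct the sequence and mapping explicitly.
first = f(1, 2, [3, 4], {})
second = f(1, 2, [], {"c": 3, "d": 4})
```

Because callers must build the sequence and mapping themselves, the signature
states exactly what the function accepts, and no special run-time marshalling
of arguments is involved.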