1 = Inspecting Programs = 2 3 A program's source code is inspected module by module. As 4 [[../Imports|imports]] of modules are encountered, other modules are added to 5 the program. The inspection process for each module involves top-to-bottom 6 traversal of the code using depth-first processing of the abstract syntax 7 tree, with a stack of namespaces being employed to record definitions and 8 names within namespaces are they are visited. Inspection continues until all 9 modules in the program have been imported and inspected. 10 11 <<TableOfContents(2,3)>> 12 13 == The Inspection Process == 14 15 The inspection process is focused on two primary tasks: 16 17 * Populating namespaces 18 * Identifying the names defined by each namespace 19 * Recording relationships between namespaces 20 * Describing the operations in each namespace 21 * Identifying the names in each namespace and the attributes used by those 22 names 23 * Recording details of assignments and invocations 24 25 The results of inspection are written out to [[../Cache|cache]] files, one for 26 each module in the program. 27 28 === Program Units for Inspection === 29 30 The `inspector` module performs much of the inspection work, relying on the 31 `common` module for certain tasks, with the `modules` module providing the 32 relevant module abstractions including those writing to and reading from cache 33 files, and the `resolving` module completing each module's inspection. The 34 `importer` module coordinates inspection across the whole program. 35 36 === Name Identification === 37 38 Names can be introduced to a namespace via several different mechanisms: 39 40 * Assignments to local names 41 * References to global names 42 * References to built-in names (defined in the built-in modules) 43 * Importing of modules 44 * Importing of names from modules 45 46 The origins of names are discovered by considering local, global and built-in 47 namespaces. Where a name is '''local''', it is defined in the same namespace. 48 Where a name is '''global''', it is defined in the same module at the module 49 level. Where a name is '''built-in''', it is defined in a module in the 50 `__builtins__` package and exposed via the `__builtins__` namespace. 51 52 Initial tentative identification of names will sort names into two categories: 53 locals and external names. Global variables employed in function (or method) 54 namespaces may not be defined when a function body is inspected, either within 55 a module or being imported from another module, and so it is not initially 56 possible to more specifically determine the nature of a name. 57 58 These categories are later refined to distinguish between globals and built-in 59 names as external names. The built-in origin of a name is only suggested when 60 no locals or globals can provide the name, and the final identification of 61 such names, plus other external names introduced as locals or globals via 62 imports, will occur at the end of the inspection activity. Names that are 63 unrecognised by then may be treated like unrecognised built-ins. 64 65 === Name Restrictions and Resolution === 66 67 '''Static''' objects, forming the fundamental [[../Structure|structure]] of 68 the program, expose their names through the general structure hierarchy. 69 Classes, which are defined as part of this structure, depend on well-defined 70 names for any base classes they employ. For example: 71 72 {{{#!python numbers=disable 73 class C(A): # depends on A being a name that can be resolved (and being a class) 74 ... 75 76 class D(A.B.C): # depends on A.B.C being resolvable (and being a class) 77 ... 78 }}} 79 80 Base classes must be identifiable and unambiguous. Since base classes may be 81 imported, their identification may not necessarily occur immediately, but it 82 must be possible once all information about a program is known and when all 83 such dependencies are resolved. 84 85 === Attribute Identification === 86 87 Attributes are introduced to namespaces as follows: 88 89 * For classes and modules, the definition of names defines attributes in 90 those namespaces 91 * Functions do not generally support attributes, although 92 [[../Representations#Special_Members|representation-specific attributes]] 93 do exist on functions 94 * For instances, assignments to attributes of the special `self` name, 95 performed in methods, define attributes in all instances of the method's 96 class 97 98 Attributes are only supported on program objects if they have been defined or 99 '''bound''' as described above. Any attempt to access or set an attribute on 100 an object using a name that was not determined through the above process is 101 considered an invalid operation. Thus, augmenting the attributes available on 102 an object (so-called "monkeypatching") is not possible. 103 104 When an attribute is accessed, the location of its use is recorded and 105 ultimately associated with assignments of the name involved, just as is done 106 for accesses of the plain name itself, but the attribute details are 107 subsequently collected together for each assignment or '''version''' of the 108 name. This is discussed below. 109 110 === Inherited Attributes === 111 112 In order to support inheritance, a process of propagation makes class and 113 instance attributes available to any given class from its ancestor classes 114 according to a depth-first traversal of a class's base class hierarchy. Thus, 115 for each class, given the definition of its base classes, a complete 116 collection of class and instance attribute names can be determined. The actual 117 mechanism of propagation occurs during the consolidation phase of inspection, 118 principally because class bases are not generally immediately identifiable 119 upon completing the inspection of any given class. 120 121 === Name Assignments and Accesses === 122 123 Each assignment of a name defines a '''version''' of the name within a 124 namespace. The location of this definition consists of the following: 125 126 * The namespace in which the assignment appears 127 * The name itself 128 * The version or assignment instance number 129 130 When a name is used, the location of its use is recorded and is ultimately 131 associated with the assignments that may be providing the name at that 132 location. This permits information about the type of the name to become 133 available for each usage location. The location of each name usage consists of 134 the following: 135 136 * The namespace in which the name appears 137 * The name itself 138 * The access instance number 139 140 Name accesses are described as special cases of attribute accesses: where 141 attributes would be indicated, none are specified. 142 143 === Attribute Accesses === 144 145 Attribute '''accesses''' are operations involving attributes where those 146 attributes are obtained or set in conjunction with an '''accessor''' 147 expression. They are recorded in terms of location tuples, each describing an 148 access as follows: 149 150 * The namespace in which the access occurs 151 * Any named accessor or an anonymous accessor 152 * The attribute (or chain of attributes) involved (omitted for name accesses) 153 * The access instance number 154 155 As with name accesses, the access instance number distinguishes between 156 different accesses employing the same details. 157 158 Consider the following example: 159 160 {{{{#!table 161 <style="vertical-align: top"> 162 {{{#!python numbers=disable 163 164 def f(): 165 p = ... # p version 0 166 p.a # p.a access 0 167 if fn().a: # anonymous a access 0 168 q = ... # q version 0 169 q.a # q.a access 0 170 p # p access 0 171 else: 172 q = ... # q version 1 173 q.a # q.a access 1 174 q.b # q.b access 0 175 p # p access 1 176 }}} 177 || 178 {{{#!graphviz 179 //format=svg 180 //transform=notugly 181 digraph accesses { 182 node [shape=box,fontsize="12.0",fontname="sans-serif",tooltip="Names and accesses"]; 183 edge [tooltip="Names and accesses"]; 184 rankdir=TB; 185 186 p [label="p = ...",style=filled,fillcolor=gold]; 187 188 subgraph { 189 rank=same; 190 pa0expr [label="p.a",style=filled,fillcolor=cyan]; 191 pa0 [label="p.a #0",style=filled,fillcolor=red]; 192 } 193 194 subgraph { 195 rank=same; 196 if [label="if",shape=ellipse]; 197 fnaexpr [label="fn().a",style=filled,fillcolor=cyan]; 198 fna [label="{}.a #0",style=filled,fillcolor=red]; 199 } 200 201 q0 [label="q = ...",style=filled,fillcolor=gold]; 202 203 subgraph { 204 rank=same; 205 qa0expr [label="q.a",style=filled,fillcolor=cyan]; 206 qa0 [label="q.a #0",style=filled,fillcolor=red]; 207 } 208 209 subgraph { 210 rank=same; 211 p0expr [label="p",style=filled,fillcolor=cyan]; 212 p0 [label="p.{} #0",style=filled,fillcolor=red]; 213 } 214 215 else [label="else",shape=ellipse]; 216 q1 [label="q = ...",style=filled,fillcolor=gold]; 217 218 subgraph { 219 rank=same; 220 qa1expr [label="q.a #1",style=filled,fillcolor=cyan]; 221 qa1 [label="q.a #1",style=filled,fillcolor=red]; 222 } 223 224 subgraph { 225 rank=same; 226 qb0expr [label="q.b",style=filled,fillcolor=cyan]; 227 qb0 [label="q.b #0",style=filled,fillcolor=red]; 228 } 229 230 subgraph { 231 rank=same; 232 p1expr [label="p",style=filled,fillcolor=cyan]; 233 p1 [label="p.{} #1",style=filled,fillcolor=red]; 234 } 235 236 p -> pa0expr -> if -> fnaexpr -> q0 -> qa0expr -> p0expr -> qb0expr; 237 if -> else -> q1 -> qa1expr -> qb0expr; 238 qb0expr -> p1expr; 239 240 fnaexpr -> fna [dir=none,style=dashed]; 241 pa0expr -> pa0 [dir=none,style=dashed]; 242 qa0expr -> qa0 [dir=none,style=dashed]; 243 p0expr -> p0 [dir=none,style=dashed]; 244 qa1expr -> qa1 [dir=none,style=dashed]; 245 qb0expr -> qb0 [dir=none,style=dashed]; 246 p1expr -> p1 [dir=none,style=dashed]; 247 248 p -> pa0 [dir=none]; 249 p -> p0 [dir=none]; 250 p -> p1 [dir=none]; 251 252 q0 -> qa0 [dir=none]; 253 q0 -> qb0 [dir=none]; 254 255 q1 -> qa1 [dir=none]; 256 q1 -> qb0 [dir=none]; 257 } 258 }}} 259 }}}} 260 261 Since the names involved may be provided by different name versions, accesses 262 are counted independently. Meanwhile, a non-name or '''anonymous''' accessor 263 may be involved in an access. Such anonymous accessors are independent and do 264 not accumulate attribute usage because they potentially involve completely 265 different objects. 266 267 || '''Namespace''' || '''Name''' || '''Attribute''' || '''Access number''' || 268 || `module.f` || `p` || `a` || 0 || 269 || `module.f` || `p` || `{}` || 0 || 270 || `module.f` || `p` || `{}` || 1 || 271 || `module.f` || `q` || `a` || 0 || 272 || `module.f` || `q` || `a` || 1 || 273 || `module.f` || `q` || `b` || 0 || 274 || `module.f` || `{}` || `a` || 0 || 275 276 Accessors may be recorded using a similar location scheme. For example: 277 278 || '''Namespace''' || '''Name''' || '''Attribute''' || '''Name version''' || 279 || `module.f` || `p` || `{}` || 0 || 280 || `module.f` || `q` || `{}` || 0 || 281 || `module.f` || `q` || `{}` || 1 || 282 283 Here, the attribute field is left empty and indicates that name definitions 284 are being described. Although the data is superficially similar to name 285 accesses, it should be remembered that accesses employ an access number 286 whereas accessors employ a name version, with such identifiers being different 287 things. 288 289 === Names and Attribute Usage === 290 291 Within each scope, the names employed by the program and the attributes used 292 with those names are tracked. As the path of execution diverges and converges 293 again through control-flow constructs, the tracking attempts to maintain the 294 '''attribute usage''' associated with each name, or specifically each 295 '''version''' of each name. 296 297 {{{{#!table 298 <style="vertical-align: top"> 299 {{{#!python numbers=disable 300 y = ... # y version 0 301 while cond0: 302 if cond1: 303 y.a1 304 elif cond2: 305 y = ... # y version 1 306 y.a2 307 else: 308 y.a3 309 310 # y version 0 is used with a1 or a3 or neither 311 312 # y version 1 is used with a2 313 }}} 314 || 315 {{{#!graphviz 316 //format=svg 317 //transform=notugly 318 digraph usage { 319 node [shape=box,fontsize="12.0",fontname="sans-serif",tooltip="Name and attribute tracking"]; 320 edge [tooltip="Name and attribute tracking"]; 321 rankdir=TB; 322 323 sety0 [label="y = ...",style=filled,fillcolor=gold]; 324 while [label="while cond0",shape=ellipse]; 325 326 subgraph { 327 rank=same; 328 if [label="if cond1",shape=ellipse]; 329 ya1 [label="y.a1",style=filled,fillcolor=cyan]; 330 } 331 332 subgraph { 333 rank=same; 334 elif [label="elif cond2",shape=ellipse]; 335 sety1 [label="y = ...",style=filled,fillcolor=gold]; 336 ya2 [label="y.a2",style=filled,fillcolor=green]; 337 } 338 339 subgraph { 340 rank=same; 341 ifelse [label="else",shape=ellipse]; 342 ya3 [label="y.a3",style=filled,fillcolor=cyan]; 343 } 344 345 endif [label="end if",shape=ellipse]; 346 347 whileelse [label="(else)",shape=ellipse]; 348 349 endwhile [label="end while",shape=ellipse]; 350 351 sety0 -> while -> if -> elif -> ifelse -> endif; 352 if -> ya1 -> endif; 353 elif -> sety1 -> ya2 -> endif; 354 ifelse -> ya3 -> endif; 355 endif -> while [style=dashed]; 356 while -> whileelse -> endwhile; 357 } 358 }}} 359 }}}} 360 361 The outcome of such tracking should be an indication of the attribute usage 362 with each name based on the shortest routes that names can take through the 363 control-flow structure. Such shortest-route usage defines the minimal 364 selection of attributes that can be considered used with a name, and thus such 365 usage defines the broadest selection of object types that can be identified as 366 supporting such attributes. In the above example, the following minimal 367 selections of attributes apply: 368 369 || '''Name Version''' || '''Minimal Set of Attributes''' || '''Types''' || 370 || `y` (version 0) || ''empty set'' || ''any object'' || 371 || `y` (version 1) || `a2` || ''only objects providing the single attribute'' || 372 373 The assumption is made that any attribute used with a name is always provided 374 by the object referenced by that name and that the correct execution of the 375 code does not rely on an attribute being absent (and thus raising an 376 exception). By defining usage for each name, the toolchain can determine 377 whether any type can provide the attributes used with a name, producing a 378 compile-time error if no type supports such usage. 379 380 It is possible that certain routes taken by names might define attribute usage 381 that cannot be supported by types that do support the shortest-route usage, 382 yet it might not be appropriate to forbid usage of such types with such names: 383 the program logic may intend that such types do not visit the regions of the 384 code that employ the attributes that such types cannot support. However, as a 385 consequence, run-time tests will be required to prevent attribute accesses 386 that are inappropriate for such types from taking place. In the above example, 387 the following maximal selections apply: 388 389 || '''Name Version''' || '''Maximal Set of Attributes''' || '''Types''' || 390 || `y` (version 0) || `a1`, `a3` || ''only objects providing both attributes'' || 391 || `y` (version 1) || `a1`, `a2`, `a3` || ''only objects providing all three attributes'' || 392 393 Tracking occurs separately in function or method namespaces and at a level 394 combining the static namespaces in a module. This latter combination of 395 namespaces permits the flow of global name details through class namespaces. 396 Tracking employs special '''tracking names''' with which usage is associated, 397 with globals and class attributes employing complete object paths to describe 398 their names, whereas locals merely employ the plain names defined and used in 399 local namespaces. Some examples follow: 400 401 || '''Namespace''' || '''Name''' || '''Name Scope''' || '''Tracking Name''' || 402 || `__main__` (module) || `x` || global || `__main__.x` || 403 || `__main__.C` (class) || `x` || global || `__main__.x` || 404 || `__main__.C` (class) || `y` || local (class attribute) || `__main__.C.y` || 405 || `__main__.f` (function) || `y` || local || `y` || 406 407 === Name Initialisation and Aliases === 408 409 Each name version may be associated with a value upon assignment provided by 410 an expression known as an '''initialiser'''. Such values may be used to inform 411 the interpretation of operations on names, restricting the types involved to 412 the initialised values. Some examples are as follows: 413 414 {{{#!python numbers=disable 415 x = 123 # initialiser is a constant, indicating a known type 416 x = classname() # initialiser refers to a class, indicating instantiation 417 }}} 418 419 Names may also be assigned the values of other names, and such a name becomes 420 an '''alias''' for such other names. Aliases can therefore be used to combine 421 attribute usage observations across names. Other forms of aliases involve 422 assigning attributes accessed via other names to any given name. Some examples 423 are as follows: 424 425 {{{#!python numbers=disable 426 y = x # y 427 y.p # p can be assumed to apply to x 428 z = x.q # z can be restricted to the types deduced for x.q 429 }}} 430 431 Initialising values are retained for later resolution because their identities 432 are not generally known until certain name resolution activities have taken 433 place. 434 435 === Invocations === 436 437 During inspection only limited information is available about the nature of 438 invocations. However, some information will already be available about global 439 names and these may be defined by classes. Thus, invocations that cause the 440 instantiation of classes may be determined even during the inspection phase. 441 442 Information about class instantiation is most useful in combination with the 443 initialisation of names. When an assignment occurs, any initialising 444 expression that provides enough information can be evaluated to see if it 445 describes instantiation. If so, the nature of the instantiation can be 446 associated with the name and used in the deduction process to constrain any 447 usage of that name and its attributes. Such restrictions on the types 448 associated with names are applied in the deduction phase. 449 450 === Literals and Constants === 451 452 Literal values or '''literals''' are specific values that appear in the 453 program, typically representing numbers, character strings, mappings or 454 collections. Literal numbers and strings are effectively '''constants''', 455 meaning that their values are unambiguously defined and can eventually be 456 referenced using a unique identifier that applies throughout the program and 457 which can refer to an initialised object in any generated program. Initially, 458 they are recorded for the namespace in which they appear. For example: 459 460 || '''Namespace''' || '''Identifier''' || '''Type''' || '''Encoding''' || '''Value''' || 461 || `__main__` || `__main__.$c0` || `__builtins__.str.string` || `iso-8859-15` || `'\xc6\xd8\xc5'` || 462 || `__main__` || `__main__.$c1` || `__builtins__.int.int` || || `123` || 463 464 Since values may appear again in other namespaces, a mapping is generated from 465 such local identifiers to common global identifiers for constants. Where such 466 an identifier is derived from the value content, this can potentially occur 467 immediately, although in practice (and in general) such a mapping will be 468 generated after all modules have been inspected. 469 470 Collections and mappings may also be initialised using literal syntax but may 471 contain non-constant information such as names or expressions whose values are 472 unknown before the program is run. Such values can be represented as 473 instantiation operations of the appropriate type in any generated program, and 474 the instances of the type concerned can be associated with any names to which 475 such literals are assigned. Special names are used to refer to literal types 476 for collections and mappings. For example: 477 478 || '''Identifier''' || '''Type''' || 479 || `$Ldict` || `__builtins__.dict.dict` || 480 || `$Llist` || `__builtins__.list.list` || 481 || `$Ltuple` || `__builtins__.tuple.tuple` || 482 483 Such special names merely serve as convenient placeholders for the types of 484 any literals mentioned in a module. 485 486 == Consolidating Inspection Details == 487 488 As briefly discussed above, certain activities occur after information has 489 been collected from an input program. Within each module, names that are 490 external to namespaces are recorded in a '''name reference''' collection. 491 These references identify unrecognised names but do not generally define their 492 origins. Examples of name references are as follows: 493 494 || '''Name''' || '''Identity''' || 495 || `__main__.repr` || `<deferred>:__builtins__.repr` || 496 || `__main__.sys` || `<module>:sys` || 497 498 In order to obtain the final identities, deferred references may need to be 499 resolved, yielding concrete references: 500 501 || '''Name''' || '''Identity''' || 502 || `__main__.repr` || `<function>:__builtins__.identity.repr` || 503 || `__main__.sys` || `<module>:sys` || 504 505 This process of resolution cannot be performed immediately with only the 506 knowledge available in a single module. Only with all program modules loaded 507 can such resolution occur. 508 509 === Identifying Deferred References === 510 511 After all modules are loaded, and with all objects present in the program at 512 this point, each deferred reference defined within a module should ultimately 513 yield an identity. Any failure to do so indicates usage of a name not defined 514 in the program. The process of identifying them by resolving what each of them 515 points to, is illustrated by the following example: 516 517 || '''Reference''' || '''Object Path to Search''' || 518 || `<depends>:__builtins__.repr` || `__builtins__.repr` || 519 || `<depends>:__builtins__.identity.repr` || `__builtins__.identity.repr` || 520 || `<function>:__builtins__.identity.repr` || ''identification has occurred'' || 521 522 Initially, the origin information from the reference is used to find the 523 details of the referenced object. However, in this case, the details yield 524 another deferred reference that has not yet been resolved. Consequently, the 525 origin information must be used to find the details of the object referenced 526 in this case, finally yielding a reference with concrete object information. 527 Deferred references are mutated to concrete references, thus changing the 528 nature of such references throughout the accumulated data. 529 530 === Identifying Module Dependencies === 531 532 During deferred reference resolution, relationships are catalogued between 533 modules in which deferred references are found and those providing the 534 corresponding referenced objects. In addition, module ordering dependencies 535 are established on the basis that some kinds of objects must be initialised 536 before being used in other parts of the program. As a result, the modules 537 providing such objects must themselves be initialised before the modules 538 referencing such objects. More information on this topic can be found in the 539 [[../Imports#Import_Sequencing|import sequencing]] documentation. 540 541 === Resolving Module Details === 542 543 With all dependencies resolved, further work can be done with the details 544 within each module. The `resolving` module provides the functionality to each 545 module instance to perform such work. 546 547 1. Class bases can be identified. 548 1. Special names (referring to built-in objects employed by certain 549 operations) are resolved. 550 1. Each access involving an external name is investigated to see if it 551 provides an effectively constant object (as described below). 552 1. Invocation references are converted to provide instantiation information, 553 if appropriate. 554 1. Initialisers are investigated to see if they provide concrete object 555 details and can thus indicate the nature of a name. 556 1. Constant value literals are associated with the appropriate types. 557 558 Thus, attribute details for each module and its classes are finalised. 559 Meanwhile, modules that are still hidden are removed from the program. 560 561 ==== Investigating Accesses ==== 562 563 External names will not have been resolved during the inspection process, but 564 with information about the whole program, it becomes possible to identify 565 names and to determine whether attribute accesses involving those names can 566 also be identified. Thus, name references from each namespace are investigated 567 in turn as follows. 568 569 Each name referencing a static object is considered a constant access of that 570 object. However, given a number of accompanying attributes describing an 571 access, each attribute is added in turn, and the resulting object path 572 identified. If the path identifies a constant object, the next attribute in 573 the access chain is added and the resulting object path identified, and so on. 574 The maximal collection of attributes identifying a constant object is then 575 recorded as a constant access, with any remaining attributes in the access 576 chain retained for traversal in a later phase. 577 578 Names yielding constant accesses do not need to have their identity deduced 579 through the application of attribute usage. 580 581 ==== Investigating Initialisers ==== 582 583 During inspection, initialisers will have been recorded in terms of expression 584 results such as names, invocations, readily-identifiable instances, and 585 attribute accesses. With details of the whole program plus constant accesses 586 having been made available, such results may be definitively identified and 587 associated with initialised names. 588 589 Alias information may also be refined at this point by identifying attribute 590 accesses that are used to initialise names. 591 592 === Finalising Program Details === 593 594 With class contents and relationships identified, class attribute inheritance 595 and instance attribute availability can be defined. Some classes may depend on 596 external state for their initialisation, and so additional module dependencies 597 may be defined at this stage. The `importer` module contains the functionality 598 to conduct these whole-program activities. 599 600 == Inspection Outcome == 601 602 Once inspection is complete, the available knowledge concerns the whole 603 program and not merely individual modules or parts of the program. However, 604 only limited efforts have been made until this point, notably in the 605 acquisition of statically-defined object details when referencing names or 606 attributes, to integrate the different modules. The [[../Deduction|deduction]] 607 process is concerned with such integration.