ULA.txt

145:52f02b762d61
12 months ago Paul Boddie Added more remarks about CPU reading and writing operations involving RAM.
     1 The Acorn Electron ULA
     2 ======================
     3 
     4 Principal Design and Feature Constraints
     5 ----------------------------------------
     6 
     7 The features of the ULA are limited in sophistication by the amount of time
     8 and resources that can be allocated to each activity supporting the
     9 fundamental features and obligations of the unit. Maintaining a screen display
    10 based on the contents of RAM itself requires the ULA to have exclusive access
    11 to various hardware resources for a significant period of time.
    12 
    13 Whilst other elements of the ULA can in principle run in parallel with the
    14 display refresh activity, they cannot also access the RAM at the same time.
    15 Consequently, other features that might use the RAM must accept a reduced
    16 allocation of that resource in comparison to a hypothetical architecture where
    17 concurrent RAM access is possible at all times.
    18 
    19 Thus, the principal constraint for many features is bandwidth. The duration of
    20 access to hardware resources is one aspect of this; the rate at which such
    21 resources can be accessed is another. For example, the RAM is not fast enough
    22 to support access more frequently than one byte per 2MHz cycle, and for screen
    23 modes involving 80 bytes of screen data per scanline, there are no free cycles
    24 for anything other than the production of pixel output during the active
    25 scanline periods.
    26 
    27 Another constraint is imposed by the method of RAM access provided by the ULA.
    28 The ULA accesses RAM by fetching 4 bits at a time, performing two such fetches
    29 to transfer 8 bits within a single 2MHz cycle, this being sufficient to
    30 provide display data for the most demanding screen modes. However, this
    31 mechanism's timing requirements are beyond the capabilities of the CPU when
    32 running at 2MHz.
    33 
    34 Consequently, the CPU will only ever be able to access RAM via the ULA at
    35 1MHz, even when the ULA is not accessing the RAM. Fortunately, when needing to
    36 refresh the display, the ULA is still able to make use of the idle part of
    37 each 1MHz cycle (or, rather, the idle 2MHz cycle unused by the CPU) to itself
    38 access the RAM at a rate of 1 byte per 1MHz cycle (or 1 byte every other 2MHz
    39 cycle), thus supporting the less demanding screen modes.
    40 
    41 Timing
    42 ------
    43 
    44 According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
    45 of which are used to generate pixel data. At 50Hz, this means that 128 cycles
    46 are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
    47 312 ~= 128 cycles). This is consistent with the observation that each scanline
    48 requires at most 80 bytes of data, and that the ULA is apparently busy for 40
    49 out of 64 microseconds in each scanline.
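
The scanline arithmetic above can be reproduced with a short sketch. The
constant names are illustrative only; the figures are those quoted in the text.

```python
# Figures quoted in the text above; the names are illustrative.
CLOCK_HZ = 2_000_000      # 2MHz RAM access clock
FIELD_RATE_HZ = 50        # fields per second
LINES_PER_FIELD = 312     # scanlines per field
BYTES_PER_LINE = 80       # most demanding screen modes

cycles_per_field = CLOCK_HZ // FIELD_RATE_HZ                # 2MHz cycles per field
exact_cycles_per_line = cycles_per_field / LINES_PER_FIELD  # slightly over 128
cycles_per_line = round(exact_cycles_per_line)

# Durations in microseconds at one byte per 2MHz cycle.
line_period_us = cycles_per_line * 1_000_000 / CLOCK_HZ
busy_period_us = BYTES_PER_LINE * 1_000_000 / CLOCK_HZ

print(cycles_per_field, cycles_per_line, line_period_us, busy_period_us)
```

The rounding to 128 cycles is what yields the familiar 64 microsecond line
period, of which 40 microseconds are needed to fetch 80 bytes.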
    50 
    51 (Since the ULA is seeking to provide an image for an interlaced 625-line
    52 display, there are in fact two "fields" involved, one providing 312
    53 scanlines and one providing 313 scanlines. See below for a description of the
    54 video system.)
    55 
    56 Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
    57 each providing two bits of each byte) using two cycles within the 500ns period
    58 of the 2MHz clock to complete each access operation. Since the CPU and ULA
    59 have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
    60 effectively run at 1MHz (since every other 500ns period involves the ULA
    61 accessing RAM) during transfers of screen data.
    62 
    63 The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
    64 by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle lasts
    65 62.5ns. To access the memory, the following patterns corresponding to 16MHz
    66 cycles are required:
    67 
    68      Time (ns):  0-------------- 500------------- ...
    69    2 MHz cycle:  0               1                ...
    70   16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
    71                  /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
    72           ~RAS:  /---\___________/---\___________ ...
    73           ~CAS:  /-----\___/-\___/-----\___/-\___ ...
    74 Address events:      A B     C       A B     C    ...
    75    Data events:        ...F  ...S      ...F  ...S ...
    76            ~WE:        R               R          ...
    77 
    78       ~RAS ops:  1   0           1   0            ...
    79       ~CAS ops:  1     0   1 0   1     0   1 0    ...
    80 
    81    Address ops:     a.b.    c.      a.b.    c.    ...
    82       Data ops:  s         f     s         f      ...
    83 
    84        PHI OUT:  ----\_______/-------\_______/--- ...
    85      CPU (ROM):  D   .....L  ....D   .....L  .... ...
    86            RnW:      .....R          .....R       ...
    87 
    88 ~RAS must be high for 100ns, ~CAS must be high for 50ns.
    89 ~RAS must be low for 150ns, ~CAS must be low for 90ns.
    90 Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.
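
As a rough check, the signal durations implied by the diagram can be compared
with these minimums. The transition points below are an interpretation of the
"~RAS ops" and "~CAS ops" rows (two characters per 16MHz cycle), not figures
taken from a datasheet, and the function names are hypothetical.

```python
CYCLE_NS = 62.5  # one 16MHz cycle

# (cycle, new level) pairs within one 2MHz period, as read from the diagram.
# This reading of the diagram is an assumption.
RAS_EDGES = [(0, 1), (2, 0)]
CAS_EDGES = [(0, 1), (3, 0), (5, 1), (6, 0)]
PERIOD_CYCLES = 8

def level_durations(edges, period=PERIOD_CYCLES):
    """Return (level, duration in ns) for each segment of one period."""
    ends = [cycle for cycle, _ in edges[1:]] + [period]
    return [(level, (end - start) * CYCLE_NS)
            for (start, level), end in zip(edges, ends)]

def meets_minimums(edges, min_high_ns, min_low_ns):
    """Check every high and low segment against its minimum duration."""
    return all(ns >= (min_high_ns if level else min_low_ns)
               for level, ns in level_durations(edges))

print(level_durations(RAS_EDGES))
print(level_durations(CAS_EDGES))
print(meets_minimums(RAS_EDGES, 100, 150), meets_minimums(CAS_EDGES, 50, 90))
```

Under this reading, the shortest segment is the 62.5ns high period of ~CAS
between the two column accesses, which still exceeds the 50ns minimum.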
    91 
    92 Here, "A" and "B" respectively indicate the row and first column addresses
    93 being latched into the RAM (on a negative edge for ~RAS and ~CAS
    94 respectively), and "C" indicates the second column address being latched into
    95 the RAM. Presumably, the first and second half-bytes can be read at "F" and
    96 "S" respectively, and the row and column addresses must be made available at
    97 "a" and "b" (and "c") respectively at the latest. The TM4164EC4 datasheet
    98 suggests that the addresses can be made available as the ~RAS and ~CAS levels
    99 are brought low. Data can be read at "f" and "s" for the first and second
   100 half-bytes respectively.
   101 
   102 The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
   103 address access time of 90ns (maximum), which appears to mean that ~RAS must be
   104 held low for at least 150ns and that ~CAS must be held low for at least 90ns
   105 before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
   106 cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
   107 is 1.5 cycles.
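
The conversion from access times to 16MHz cycles can be sketched as follows,
assuming that signal changes can only be scheduled on half-cycle boundaries
(which is what the 2.5 and 1.5 cycle figures suggest). The function name is
hypothetical.

```python
import math

CYCLE_NS = 62.5   # one 16MHz cycle

def cycles_needed(access_ns, granularity=0.5):
    """Return (exact cycles, cycles rounded up to the given granularity)."""
    exact = access_ns / CYCLE_NS
    return exact, math.ceil(exact / granularity) * granularity

print(cycles_needed(150))  # row address access time
print(cycles_needed(90))   # column address access time
```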
   108 
   109 Note that the Service Manual refers to the negative edge of RAS and CAS, but
   110 the datasheet for the similar TM4164EC4 product shows latching on the negative
   111 edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
   112 communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
   113 "page mode" provides the appropriate behaviour for that particular product.
   114 
   115 The CPU, when accessing the RAM alone, apparently does not make use of the
   116 vacated "slot" that the ULA would otherwise use (when interleaving accesses in
   117 MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
   118 accessing ROM (and potentially sideways RAM). The principal limitation is the
   119 amount of time needed between issuing an address and receiving an entire byte
   120 from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
   121 4 cycles that would be required for 2MHz operation.
   122 
   123 Write operations expose some uncertainty about the relationship between the
   124 ULA's RAM access schedule and the PHI OUT clock. The Service Manual shows PHI
   125 IN (which should be the ULA's PHI OUT signal) as being synchronised with ~RAS.
   126 Since the CPU makes its address available potentially as late as 140ns after
   127 its PHI2 clock goes low (this clock being broadly similar to PHI OUT), it
   128 would make no sense to expect the ULA to be able to perform a memory access
   129 immediately. What seems more likely is that the CPU makes data available, and
   130 this is written during the next 2MHz cycle.
   131 
   132 For CPU write operations, "L" indicates the point at which an address is taken
   133 from the CPU address bus, following a negative edge of PHI OUT, with "D" being
   134 the point at which data may be asserted for writing, following a positive edge
   135 of PHI OUT.  Here, PHI OUT is driven at 1MHz.
   136 
   137      Time (ns):  0-------------- 500------------ 1000------------ ...
   138    1 MHz cycle:  0                               1
   139    2 MHz cycle:  0               1               2                ...
   140   16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
   141                  /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
   142           ~RAS:                  /---\___________/                ...
   143           ~CAS:                  /-----\___/-\___/                ...
   144        PHI OUT:  ----\_______/-----------------------\_______/--- ...
   145      CPU (RAM):      .....L  ....D                                ...
   146            RnW:      .....W                                       ...
   147 
   148 Here, the concurrent RAM accesses performed by the ULA to obtain any screen
   149 data have been omitted to avoid confusion.
   150 
   151 Given that ~WE needs to be driven low for writing or high for reading, and
   152 thus propagates RnW from the CPU, this would need to be done before data would
   153 be retrieved and, according to the TM4164EC4 datasheet, even as late as the
   154 column address is presented and ~CAS brought low.
   155 
   156 For CPU read operations, the positive edge of PHI OUT is not critical.
   157 Instead, the data presented to the CPU must be available for a minimum setup
   158 time before the next negative edge of PHI OUT. In the diagram below, "D" is
   159 the point at which data can be made available. The data must be stable
   160 approximately 50ns before the start of the next PHI OUT cycle, indicated by
   161 "*" below.
   162 
   163      Time (ns):  0-------------- 500------------ 1000------------ ...
   164    1 MHz cycle:  0                               1
   165    2 MHz cycle:  0               1               2                ...
   166   16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
   167                  /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
   168           ~RAS:                  /---\___________/                ...
   169           ~CAS:                  /-----\___/-\___/                ...
   170        PHI OUT:  ----\_______/-----------------------\_______/--- ...
   171      CPU (RAM):      .....L.........D..............*              ...
   172            RnW:      .....R                                       ...
   173 
   174 Here, the concurrent RAM accesses performed by the ULA to obtain any screen
   175 data have been omitted to avoid confusion.
   176 
   177 It must be concluded that where accesses are interleaved between the CPU and
   178 ULA, the CPU access begins concurrently with the ULA access, with the CPU
   179 address and data retained by the ULA, and after the ULA access, the rest of
   180 the CPU transaction occurs in the following 2MHz cycle.
   181 
   182 See: Acorn Electron Advanced User Guide
   183 See: Acorn Electron Service Manual
   184      http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
   185 See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
   186 See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
   187 See: One of the Most Popular 65,536-Bit (64K) Dynamic RAMs The TMS 4164
   188      http://smithsonianchips.si.edu/augarten/p64.htm
   189 See: https://www.mups.co.uk/project/hardware/acorn_electron/
   190 See: Rockwell R650X and R651X Microprocessors (CPU)
   191 See: http://wilsonminesco.com/6502primer/
   192 
   193 A Note on 8-Bit Wide RAM Access
   194 -------------------------------
   195 
   196 It is worth considering the timing when 8 bits of data can be obtained at once
   197 from the RAM chips:
   198 
   199      Time (ns):  0-------------- 500------------- ...
   200    2 MHz cycle:  0               1                ...
   201    8 MHz cycle:  0   1   2   3   0   1   2   3    ...
   202                  /-\_/-\_/-\_/-\_/-\_/-\_/-\_/-\_ ...
   203           ~RAS:  /---\___________/---\___________ ...
   204           ~CAS:  /-------\_______/-------\_______ ...
   205 Address events:      A   B           A   B        ...
   206    Data events:          ...E            ...E     ...
   207            ~WE:          R               R        ...
   208 
   209       ~RAS ops:  1   0           1   0            ...
   210       ~CAS ops:  1       0       1       0        ...
   211 
   212    Address ops:     a.  b.          a.  b.        ...
   213       Data ops:            f     s         f      ...
   214 
   215        PHI OUT:  ----\_______/-------\_______/--- ...
   216            CPU:  D   .....L  ....D   .....L  .... ...
   217            RnW:      .....W          .....W        ...
   218 
   219 Here, "E" indicates the availability of an entire byte.
   220 
   221 Since only one fetch is required per 2MHz cycle, instead of two fetches for
   222 the 4-bit wide RAM arrangement, it seems likely that longer 8MHz cycles could
   223 be used to coordinate the necessary signalling.
   224 
   225 Another conceivable simplification from using an 8-bit wide RAM access channel
   226 with a single access within each 2MHz cycle is the possibility of allowing the
   227 CPU to signal directly to the RAM instead of having the ULA perform the access
   228 signalling on the CPU's behalf. Note that it is this more leisurely signalling
   229 that would allow the CPU to conduct accesses at 2MHz: the "compressed"
   230 signalling being beyond the capabilities of the CPU.
   231 
   232 Note that 16MHz cycles would still be needed for the pixel clock in MODE 0,
   233 which needs to output eight pixels per 2MHz cycle, producing 640 monochrome
   234 pixels per 80-byte line.
   235 
   236 An obvious consideration with regard to 8-bit wide access is whether the ULA
   237 could still conduct the "compressed" signalling for its own RAM accesses:
   238 
   239      Time (ns):  0-------------- 500------------- ...
   240    2 MHz cycle:  0               1                ...
   241   16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
   242                  /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
   243           ~RAS:  /---\___________/---\___________ ...
   244           ~CAS:  /-----\___/-\___/-----\___/-\___ ...
   245 Address events:      A B     C       A B     C    ...
   246    Data events:        ...1  ...2      ...1  ...2 ...
   247            ~WE:        R               R          ...
   248 
   249       ~RAS ops:  1   0           1   0            ...
   250       ~CAS ops:  1     0   1 0   1     0   1 0    ...
   251 
   252    Address ops:     a.b.    c       a.b.    c     ...
   253       Data ops:  s         f     s         f      ...
   254 
   255        PHI OUT:  ----\_______/-------\_______/--- ...
   256            CPU:  D   .....L  ....D   .....L  .... ...
   257            RnW:      .....W          .....W        ...
   258 
   259 Here, "1" and "2" in the data events correspond to whole byte accesses,
   260 effectively upgrading the half-byte "F" and "S" events in the existing ULA
   261 arrangement.
   262 
   263 Although the provision of access for the CPU would adhere to the relevant
   264 timing constraints, providing only one byte per 2MHz cycle, the ULA could
   265 obtain two bytes per cycle. This would then free up bandwidth for the CPU in
   266 screen modes where the ULA would normally be dominant (MODE 0 to 3), albeit at
   267 the cost of extra buffering. Such buffering could also be done for modes where
   268 the bandwidth is shared (MODE 4 to 6), consolidating pairs of ULA accesses into
   269 single cycles and freeing up an extra cycle for CPU accesses.
   270 
   271 A further consideration is whether the CPU and ULA could access the memory on
   272 interleaved 4MHz cycles, thus replicating the arrangement used by the CPU and
   273 Video ULA on the BBC Micro. One potential obstacle is that the apparent 4MHz
   274 access rate employed by the ULA does not involve the complete process for
   275 accessing the RAM: upon setting up the address and issuing the ~RAS signal,
   276 the ULA is able to make a pair of column accesses on the same "row" of memory,
   277 effectively achieving an average access rate of 4MHz in an 8-bit
   278 configuration.
   279 
   280 However, if arbitrary pairs of column accesses were to be attempted, as would
   281 be required by CPU and ULA interleaving, the ~RAS signal would need to be
   282 re-issued with different addresses being set up. This would expand the time to
   283 access a memory location to beyond the period of a 4MHz cycle, making it
   284 impossible to employ interleaved accesses at such a rate.
   285 
   286 In conclusion, a strict interleaving strategy is not possible, but by using
   287 pixel data buffering and employing two ULA accesses per 2MHz cycle to obtain
   288 two bytes in that cycle, each adjacent 2MHz cycle can be given to the CPU,
   289 thus achieving an effective throughput during display update periods of 3
   290 bytes for every pair of cycles (2 bytes for the ULA, 1 byte for the CPU), and
   291 thus 1.5 bytes per cycle, giving an illusion of 3MHz access to RAM.
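
The throughput figure can be illustrated with a small sketch comparing the
existing shared arrangement with the buffered one described above. The function
name and its parameters are hypothetical.

```python
from fractions import Fraction

def effective_rate_mhz(ula_bytes, cpu_bytes, cycles, clock_mhz=2):
    """Return (bytes per 2MHz cycle, equivalent access rate in MHz)."""
    bytes_per_cycle = Fraction(ula_bytes + cpu_bytes, cycles)
    return bytes_per_cycle, bytes_per_cycle * clock_mhz

# Existing shared arrangement: one ULA byte and one CPU byte per cycle pair.
print(effective_rate_mhz(1, 1, 2))
# Buffered arrangement: two ULA bytes in one cycle, one CPU byte in the next.
print(effective_rate_mhz(2, 1, 2))
```

Exact fractions are used so that the 3 bytes per 2 cycles figure is not
obscured by floating-point rounding.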
   292 
   293 Some other considerations apply to introducing 8-bit wide access. The ULA
   294 employs four pins for data transfer to and from the memory devices (RAM0..3),
   295 and obviously another four pins would be needed in an 8-bit wide scheme.
   296 However, there may have been a physical limitation on the number of pins
   297 permissible on a ULA package or the device's socket. This would necessitate
   298 the reassignment of pins, although few are readily available for such
   299 reassignment.
   300 
   301 One approach might involve connecting the RAM devices to the CPU data bus,
   302 with each line connecting to a different RAM chip. The signalling of the RAM
   303 would remain under the control of the ULA, thus preventing the RAM devices
   304 from interfering with other memory transfer operations, with the ROM
   305 signalling also remaining under the ULA's control. One potential disadvantage
   306 of this scheme would involve the elimination of the separate data paths
   307 between the CPU and ROM and between the ULA and RAM.
   308 
   309 Another approach might involve reclaiming the keyboard input pins (KBD0..3) as
   310 data pins for ULA access to RAM. This would necessitate the reorganisation of
   311 the keyboard interface, perhaps integrating the keyboard matrix more directly
   312 as a kind of ROM device. A bus transceiver could be used to isolate the
   313 keyboard inputs, with a pin being used to control the transceiver, since the
   314 keyboard data lines are pulled high. In effect, the transceiver would act as a
   315 kind of output enable for the keyboard.
   316 
   317 To make the matrix appear within the sideways ROM region of the memory map,
   318 A15 would need to be set to a high value and A14 to a low value. Signals A13
   319 to A0 would then be brought low to select the appropriate column, with the
   320 individual key states being made available via data lines, perhaps D3 to D0.
   321 This mostly retains the existing addressing arrangement and scanning
   322 mechanism. Internally, the ULA would continue to enable access to the keyboard
   323 through the ROM paging mechanism, but instead of integrating separate data
   324 pins into the CPU's data path, it would integrate the keyboard inputs using
   325 the transceiver.
   326 
   327 Enhancement: Keyboard Matrix Scanning
   328 -------------------------------------
   329 
   330 The keyboard scanning mechanism is presumably designed to be as inexpensive as
   331 possible, being driven by software and avoiding extra logic, but at the
   332 expense of occupying large regions of the memory map when paged in. A more
   333 efficient mapping of the keyboard columns could possibly be done using
   334 decoders such as the 74xx138 part which permits the decoding of three inputs
   335 to select one of eight outputs. Using two of these parts, six address lines
   336 would be dedicated to the keyboard columns as follows:
   337 
   338   A5...A3 select up to eight columns via one decoder
   339   A2...A0 select up to eight columns via another decoder
   340 
   341 In this arrangement, only one of the two ranges of pins would be used at any
   342 given time. If the ULA were to require a certain combination of the remaining
   343 address bits, a region as small as 64 bytes could be dedicated to the
   344 keyboard.
   345 
   346 A more efficient arrangement could be used by introducing logic that allows
   347 the decoders to work together to address the keyboard:
   348 
   349   A2...A0 select up to eight columns via both decoders
   350   A3 would enable one decoder if low and the other decoder if high
   351 
   352 With ULA constraints on the remaining address bits, a 16-byte region could be
   353 used to represent the keyboard.
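
A minimal sketch of this combined decoding follows, assuming two hypothetical
3-to-8 decoders with A3 acting as the enable selector. It models only the
address-to-column mapping and not the electrical behaviour of the parts.

```python
def select_column(address):
    """Map the low four address bits to one of up to sixteen column lines.

    A3 chooses which of the two hypothetical 3-to-8 decoders is enabled and
    A2..A0 choose one of its eight outputs. Higher address bits are ignored
    here; the ULA would be expected to constrain them.
    """
    decoder = (address >> 3) & 1      # A3: 0 = first decoder, 1 = second
    output = address & 0b111          # A2..A0: output within that decoder
    return decoder, output, decoder * 8 + output

for offset in (0x0, 0x7, 0x8, 0xD):
    print(hex(offset), select_column(offset))
```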
   354 
   355 A further refinement might involve combining the existing columns into groups
   356 of eight keys. This would reduce the number of columns to seven, requiring
   357 only three address lines, with all eight data lines being used to read the
   358 matrix.
   359 
   360 On the BBC Micro, the system 6522 VIA is used to monitor and read from the
   361 keyboard. The memory locations involved with this chip are located in the
   362 region from &FE40 to &FE5F inclusive, although the memory is allocated in a
   363 way that is appropriate to operate that chip, as opposed to merely exposing
   364 the keyboard matrix.
   365 
   366 Enhancement: Hardware Device Selection
   367 --------------------------------------
   368 
   369 An alternative to the existing, rather cumbersome, sideways ROM mapping of the
   370 keyboard might involve making it accessible via a hardware-related memory page
   371 like page FE. With ULA addresses confined to FE0x, and with the ULA itself
   372 having to trap accesses to page FE, the page selection signal might be brought
   373 out of the ULA instead of any dedicated signal for the keyboard. Various
   374 address lines corresponding to A7 through A4, or a subset of these, could be
   375 fed into a decoder to permit the selection of other devices, with the keyboard
   376 being one of these.
   377 
   378 Meanwhile, a more efficient keyboard mapping using the above matrix
   379 enhancement would permit the different keyboard columns to appear as a group
   380 of sixteen or eight bytes. Thus:
   381 
   382   A15...A8 select page FE
   383    A7...A4 select a device or peripheral
   384    A3...A0 select a register or keyboard column
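
The address split above might be modelled as follows. The function names are
hypothetical, and the device number given to the keyboard in the example is an
arbitrary assumption made purely for illustration.

```python
def decode_io_address(address):
    """Split a 16-bit address into (page, device, register) fields."""
    page = (address >> 8) & 0xFF       # A15..A8
    device = (address >> 4) & 0xF      # A7..A4
    register = address & 0xF           # A3..A0
    return page, device, register

def is_device_access(address, page=0xFE):
    """True when the address falls within the hardware page."""
    return decode_io_address(address)[0] == page

# &FE07 lies in the FE0x range used by the ULA itself (device 0); &FE2D would
# be column 13 of a keyboard assumed, hypothetically, to be device 2.
print(decode_io_address(0xFE07))
print(decode_io_address(0xFE2D))
print(is_device_access(0x8000))
```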
   385 
   386 Conceivably, devices such as sound generators could be mapped to device
   387 regions.
   388 
   389 CPU Clock Notes
   390 ---------------
   391 
   392 "The 6502 receives an external square-wave clock input signal on pin 37, which
   393 is usually labeled PHI0. [...] This clock input is processed within the 6502
   394 to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
   395 is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
   396 through two inverters and a push-pull amplifier. The same network of
   397 transistors within the 6502 which generates PHI2 is also tied to PHI1, and
   398 generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
   399 available to external devices is so that they know when they can access the
   400 CPU. When PHI1 is high, this means that external devices can read from the
   401 address bus or data bus; when PHI2 is high, this means that external devices
   402 can write to the data bus."
   403 
   404 See: http://lateblt.livejournal.com/88105.html
   405 
   406 "The 6502 has a synchronous memory bus where the master clock is divided into
   407 two phases (Phase 1 and Phase 2). The address is always generated during Phase
   408 1 and all memory accesses take place during Phase 2."
   409 
   410 See: http://www.jmargolin.com/vgens/vgens.htm
   411 
   412 Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
   413 Phase 1" means when PHI1 is high (and PHI0, and thus PHI2, is low) and "during
   414 Phase 2" means when PHI2 - essentially PHI0 - is high.
   415 
   416 Bandwidth Figures
   417 -----------------
   418 
   419 Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
   420 total lines, with 80 cycles occurring in the active periods of display
   421 scanlines, the following bandwidth calculations can be performed:
   422 
   423 Total theoretical maximum:
   424        128 cycles * 312 lines
   425      = 39936 bytes
   426 
   427 MODE 0, 1, 2:
   428 ULA:    80 cycles * 256 lines
   429      = 20480 bytes
   430 CPU:    48 cycles / 2 * 256 lines
   431      + 128 cycles / 2 * (312 - 256) lines
   432      = 9728 bytes
   433 
   434 MODE 3:
   435 ULA:    80 cycles * 24 rows * 8 lines
   436      = 15360 bytes
   437 CPU:    48 cycles / 2 * 24 rows * 8 lines
   438      + 128 cycles / 2 * (312 - (24 rows * 8 lines))
   439      = 12288 bytes
   440 
   441 MODE 4, 5:
   442 ULA:    40 cycles * 256 lines
   443      = 10240 bytes
   444 CPU:   (40 cycles + 48 cycles / 2) * 256 lines
   445      + 128 cycles / 2 * (312 - 256) lines
   446      = 19968 bytes
   447 
   448 MODE 6:
   449 ULA:    40 cycles * 24 rows * 8 lines
   450      = 7680 bytes
   451 CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
   452      + 128 cycles / 2 * (312 - (24 rows * 8 lines))
   453      = 19968 bytes
   454 
   455 Here, the division by 2 for CPU accesses is performed to indicate that the CPU
   456 only uses every other access opportunity even in uncontended periods. See the
   457 2MHz RAM Access enhancement below for bandwidth calculations that consider
   458 this limitation removed.
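
The calculations above can be reproduced with a short sketch. The function and
its parameter names are hypothetical; the cycle and line counts are those used
in the calculations.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
TOTAL_LINES = 312

def bandwidth(ula_cycles, cpu_active_cycles, active_lines):
    """Return (ULA bytes, CPU bytes) per field.

    ula_cycles: cycles per active line used by the ULA
    cpu_active_cycles: CPU accesses available per active line
    active_lines: scanlines on which the ULA fetches screen data
    """
    ula = ula_cycles * active_lines
    cpu = (cpu_active_cycles * active_lines
           + CYCLES_PER_LINE // 2 * (TOTAL_LINES - active_lines))
    return ula, cpu

total = CYCLES_PER_LINE * TOTAL_LINES
modes = {
    "MODE 0, 1, 2": bandwidth(80, 48 // 2, 256),
    "MODE 3":       bandwidth(80, 48 // 2, 24 * 8),
    "MODE 4, 5":    bandwidth(40, 40 + 48 // 2, 256),
    "MODE 6":       bandwidth(40, 40 + 48 // 2, 24 * 8),
}
for name, (ula, cpu) in modes.items():
    print(f"{name:14} ULA {ula:5}  CPU {cpu:5}  "
          f"{100 * cpu / total:3.0f}%  slowdown {total / cpu:.2f}")
```

The slowdown figure is simply the theoretical maximum divided by the CPU's
share, so it assumes a program whose running time is dominated by RAM access.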
   459 
   460 A summary of the bandwidth figures is as follows (with extra timing details
   461 described below):
   462 
   463                 Standard ULA    % Total   Slowdown  BBC-10s BBC-34s
   464 MODE 0, 1, 2    9728 bytes      24%       4.11      43s     105s
   465 MODE 3          12288 bytes     31%       3.25      34s
   466 MODE 4, 5       19968 bytes     50%       2         20s
   467 MODE 6          19968 bytes     50%       2         20s     50s
   468 
   469 The review of the Electron in Practical Computing (October 1983) provides a
   470 concise overview of the RAM access limitations and gives timing comparisons
   471 between modes and BBC Micro performance. In the above, "BBC-10s" is the
   472 measured or stated time given for a program taking 10 seconds on the BBC
   473 Micro, whereas "BBC-34s" is the apparently measured time given for the
   474 "Persian" program taking 34 seconds to complete on the BBC Micro, with a
   475 "quick" mode presumably switching to MODE 6 using the ULA directly in order to
   476 reduce display bandwidth usage while the program draws to the screen.
   477 Evidently, the measured slowdown is slightly lower than the theoretical
   478 slowdown, most likely due to the running time not being entirely dominated by
   479 RAM access performance characteristics.
   480 
   481 Video Timing
   482 ------------
   483 
   484 According to 8.7 in the Service Manual, and the PAL Wikipedia page,
   485 approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
   486 (including the "colour burst"), and 1.65µs for the "front porch", totalling
   487 12.05µs and thus leaving 51.95µs for the active video signal for each
   488 scanline. As the Service Manual suggests in the oscilloscope traces, the
   489 display information is transmitted more or less centred within the active
   490 video period since the ULA will only be providing pixel data for 40µs in each
   491 scanline.
   492 
   493 Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
   494 each scanline can be divided into 1024 cycles, although only 640 at most are
   495 actively used to provide pixel data. Pixel data production should only occur
   496 within a certain period on each scanline, approximately 262 cycles after the
   497 start of hsync:
   498 
   499   active video period = 51.95µs
   500   pixel data period = 40µs
   501   total silent period = 51.95µs - 40µs = 11.95µs
   502   silent periods (before and after) = 11.95µs / 2 = 5.975µs
   503   hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
   504   time before pixel data period = 10.4µs + 5.975µs = 16.375µs
   505   pixel data period start cycle = 16.375µs / 62.5ns = 262
   506 
   507 By choosing a number divisible by 8, the RAM access mechanism can be
   508 synchronised with the pixel production. Thus, 256 is a more appropriate start
   509 cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
   510 pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
   511 document) occurs at cycle 0.
   512 
   513 To summarise:
   514 
   515   HS signal starts at cycle 0 on each horizontal scanline
   516   HS signal ends approximately 4µs later at cycle 64
   517   Pixel data starts approximately 12µs later at cycle 256
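
The horizontal timing arithmetic can be sketched as follows. Rounding the start
cycle down to a multiple of 8 is an assumption that happens to give the value
of 256 chosen above; the names are illustrative.

```python
CYCLE_NS = 62.5                      # one 16MHz cycle

def us_to_cycles(microseconds):
    """Convert a duration in microseconds to 16MHz cycles."""
    return microseconds * 1000 / CYCLE_NS

LINE_US = 64.0
SYNC_US, BACK_PORCH_US, FRONT_PORCH_US = 4.7, 5.7, 1.65
PIXEL_US = 40.0

active_us = LINE_US - (SYNC_US + BACK_PORCH_US + FRONT_PORCH_US)
margin_us = (active_us - PIXEL_US) / 2          # silent period on each side
start_us = SYNC_US + BACK_PORCH_US + margin_us  # measured from start of hsync
start_cycle = us_to_cycles(start_us)

# Align the start to a multiple of 8 cycles (one 2MHz RAM access period).
# Rounding down is an assumption, not something stated by the sources.
aligned_start = int(start_cycle) // 8 * 8

print(round(active_us, 3), round(start_us, 3), round(start_cycle), aligned_start)
print(us_to_cycles(4), us_to_cycles(LINE_US))
```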
   518 
   519 "Re: Electron Memory Contention" provides measurements that appear consistent
   520 with these calculations.
   521 
   522 The "vertical blanking period", meaning the period before picture information
   523 in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
   524 this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
   525 lines. Thus, the first visible scanline on the first field of a frame occurs
   526 half way through the 23rd scanline period measured from the start of vsync
   527 (indicated by "V" in the diagrams below):
   528 
   529                                         10                  20    23
   530   Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
   531     Line from 1:       0                                          22 3
   532  Line on screen: .:::::VVVVV:::::                                   12233445566
   533                   |_________________________________________________|
   534                            25 line vertical blanking period
   535 
   536 In the second field of a frame, the first visible scanline coincides with the
   537 24th scanline period measured from the start of line 313 in the frame:
   538 
   539                310                                                 336
   540   Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
   541   Line from 313:       0                                            23 4
   542  Line on screen: 88:::::VVVVV::::                                    11223344
   543                288 |                                                 |
   544                    |_________________________________________________|
   545                             25 line vertical blanking period
   546 
   547 In order to consider only full lines, we might consider the start of each
   548 frame to occur 23 lines after the start of vsync.
   549 
   550 Again, it is likely that pixel data production should only occur on scanlines
   551 within a certain period on each frame. The "625/50" document indicates that
   552 only a certain region is "safe" to use, suggesting a vertically centred region
   553 with approximately 15 blank lines above and below the picture. However, the
   554 "PAL TV timing and voltages" document suggests 28 blank lines above and below
   555 the picture. This would centre the 256 lines within the 312 lines of each
   556 field and thus provide a start of picture approximately 5.5 or 5 lines after
   557 the end of the blanking period or 28 or 27.5 lines after the start of vsync.
   558 
   559 To summarise:
   560 
   561   CSYNC signal starts at cycle 0
   562   CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
   563   Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672
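
The cycle numbers in this summary follow from counting 16MHz cycles over 64µs
scanlines, giving 1024 cycles per line. The following is only a sketch of that
arithmetic; the constant and function names are illustrative and do not come
from any existing code.

```python
CYCLES_PER_US = 16                          # implied by 160µs = 2560 cycles
LINE_US = 64                                # one scanline period in µs
CYCLES_PER_LINE = CYCLES_PER_US * LINE_US   # 1024 cycles per scanline

def lines_to_cycles(lines):
    """Convert a duration in scanlines to a number of 16MHz cycles."""
    return int(lines * CYCLES_PER_LINE)

vsync_end = lines_to_cycles(2.5)      # end of the sync pulse
picture_start = lines_to_cycles(28)   # 28 lines after the start of vsync

print(vsync_end, picture_start)
```

The interval between the two events is 25.5 lines, which is 1632µs at 64µs
per line.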
   564 
   565 See: http://en.wikipedia.org/wiki/PAL
   566 See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
   567 See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
   568      http://lipas.uwasa.fi/~f76998/video/modes/
   569 See: PAL TV timing and voltages
   570      http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
   571 See: Line Standards
   572      http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
   573 See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
   574      http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
   575 See: Re: Electron Memory Contention
   576      http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109
   577 
   578 RAM Integrated Circuits
   579 -----------------------
   580 
   581 Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
   582 CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
   583 available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
   584 have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
   585 ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.
   586 
   587 The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
   588 the Samsung-produced KM4164 series is apparently equivalent to the Texas
   589 Instruments 4164 chips presumably used in the Electron.
   590 
   591 The TM4164EC4 series combines 4 64K x 1b units into a single package and
   592 appears similar to the TM4164EA4 featured on the Electron's circuit diagram
   593 (in the Advanced User Guide but not the Service Manual), and it also has 22
   594 pins providing 3 additional inputs and 3 additional outputs over the 16 pins
   595 of the individual 4164-15 modules, presumably allowing concurrent access to
   596 the packaged memory units.
   597 
   598 As far as currently available replacements are concerned, the NTE4164 is a
   599 potential candidate: according to the Vetco Electronics entry, it is
   600 supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
   601 parts include the NTE2164 and the NTE6664, both of which appear to have
   602 largely the same performance and connection characteristics. Meanwhile, the
   603 NTE21256 appears to be a 16-pin replacement with four times the capacity that
   604 maintains the single data input and output pins. Using the NTE21256 as a
   605 replacement for all ICs combined would be difficult because of the single bit
   606 output.
   607 
   608 Another device equivalent to the 4164-15 appears to be available under the
   609 code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
   610 site lists data sheets for other devices on the same page, but these are
   611 different and actually appear to be provided under the 41574 product code (but
   612 are listed under 41464-10) and appear to be replacements for the TM4164EC4:
   613 the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
   614 employing 4 pins for both input and output.
   615 
   616             Pins    I/O pins    Row access  Column access
   617             ----    --------    ----------  -------------
   618 TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
   619 KM41464AP   18      4           150ns (15)  75ns (15)
   620 NTE21256    16      1 + 1       150ns       75ns
   621 HYB 4164-2  16      1 + 1       150ns       100ns
   622 µPD41464    18      4           120ns (12)  60ns (12)
   623 
   624 See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
   625      https://www.rocelec.com/part/REITM4164EC4-15L
   626 See: Dynamic RAMS
   627      http://www.unicornelectronics.com/IC/DYNAMIC.html
   628 See: New old stock 8x 4164 chips
   629      http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
   630 See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
   631      http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
   632 See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
   633      http://www.vetco.net/catalog/product_info.php?products_id=2806
   634 See: NTE4164 - IC-NMOS 64K DRAM 150NS
   635      http://www.vetco.net/catalog/product_info.php?products_id=3680
   636 See: NTE21256 - IC-256K DRAM 150NS
   637      http://www.vetco.net/catalog/product_info.php?products_id=2799
   638 See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
   639      http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
   640 See: NTE6664 - IC-MOS 64K DRAM 150NS
   641      http://www.vetco.net/catalog/product_info.php?products_id=5213
   642 See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
   643      http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
   644 See: 4164-150: MAJOR BRANDS
   645      http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
   646 See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
   647      http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
   648 See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
   649      http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
   650 See: NEC µPD41464 65,536 x 4-Bit Dynamic NMOS RAM
   651      http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
   652 See: 41464-10: MAJOR BRANDS
   653      http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1
   654 
   655 Interrupts
   656 ----------
   657 
   658 The ULA generates IRQs (maskable interrupts) according to certain conditions
   659 and these conditions are controlled by location &FE00:
   660 
   661   * Vertical sync (bottom of displayed screen)
   662   * 50Hz real time clock
   663   * Transmit data empty
   664   * Receive data full
   665   * High tone detect
   666 
   667 The ULA is also used to clear interrupt conditions through location &FE05. Of
   668 particular significance is bit 7, which must be set after an NMI (non-maskable
   669 interrupt) has occurred and has thus suspended ULA access to memory, in order
   670 to restore the normal function of the ULA.
   671 
   672 ROM Paging
   673 ----------
   674 
   675 Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
   676 mappings exist:
   677 
   678    8    keyboard
   679    9    keyboard (duplicate)
   680   10    BASIC ROM
   681   11    BASIC ROM (duplicate)
   682 
   683 Paging in a ROM involves the following procedure:
   684 
   685  1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
   686     2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
   687     selected.
   688  2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
   689     whilst writing the desired ROM number n in bits 0 to 2.
   690 
   691 See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686
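
The paging procedure above can be sketched as a function producing the
sequence of values to write to &FE05. This is a hedged illustration only: the
function name is hypothetical, the choice of ROM 12 as the intermediate
selection is arbitrary, the single write for ROMs 8 to 15 is an assumption,
and bits 4 to 7 (used for clearing interrupts) are simply left as zero.

```python
ROM_PAGE_ENABLE = 0x08   # bit 3 of &FE05

def rom_paging_writes(rom):
    """Return the values to write to &FE05, in order, to page in a ROM."""
    if not 0 <= rom <= 15:
        raise ValueError("ROM number must be in the range 0 to 15")
    if rom >= 8:
        # Assumption: ROMs 8 to 15 can be selected with a single write.
        return [ROM_PAGE_ENABLE | (rom & 0x07)]
    # Step 1: select one of ROMs 12 to 15 (12 is an arbitrary choice).
    # Step 2: write the desired ROM number with bit 3 clear.
    return [ROM_PAGE_ENABLE | 0x04, rom]

print([hex(value) for value in rom_paging_writes(3)])
```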
   692 
   693 Keyboard Access
   694 ---------------
   695 
   696 The keyboard pages appear to be accessed at 1MHz just like the RAM.
   697 
   698 See: https://stardot.org.uk/forums/viewtopic.php?p=254155#p254155
   699 
   700 Shadow/Expanded Memory
   701 ----------------------
   702 
   703 The Electron exposes all sixteen address lines and all eight data lines
   704 through the expansion bus. Using such lines, it is possible to provide
   705 additional memory - typically sideways ROM and RAM - on expansion cards and
   706 through cartridges, although the official cartridge specification provides
   707 fewer address lines and only seeks to provide access to memory in 16K units.
   708 
   709 Various modifications and upgrades were developed to offer "turbo"
   710 capabilities to the Electron, permitting the CPU to access a separate 8K of
   711 RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
   712 the ULA through additional logic. However, an enhanced ULA might support
   713 independent CPU access to memory over the expansion bus by allowing itself to
   714 be discharged from providing access to memory, potentially for a range of
   715 addresses, and for the CPU to communicate with external memory uninterrupted.
   716 
   717 Sideways RAM/ROM and Upper Memory Access
   718 ----------------------------------------
   719 
   720 Although the ULA controls the CPU clock, effectively slowing or stopping the
   721 CPU when the ULA needs to access screen memory, it is apparently able to allow
   722 the CPU to access addresses of &8000 and above - the upper region of memory -
   723 at 2MHz independently of any access to RAM that the ULA might be performing,
   724 only blocking the CPU if it attempts to access addresses of &7FFF and below
   725 during any ULA memory access - the lower region of memory - by stopping or
   726 stalling its clock.
   727 
   728 Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
   729 CPU clock if the line goes low, when the CPU is attempting to access the lower
   730 region of memory.
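
The decision described here amounts to a test of A15 combined with the ULA's
own activity. A minimal sketch, assuming a hypothetical function name and a
boolean flag indicating that the ULA is currently accessing RAM:

```python
def cpu_clock_inhibited(address, ula_accessing_ram):
    """Return True if the CPU clock would be stopped or stalled."""
    a15 = (address >> 15) & 1
    # Only accesses to the lower region (A15 low) compete with the ULA.
    return ula_accessing_ram and a15 == 0

print(cpu_clock_inhibited(0x7FFF, True), cpu_clock_inhibited(0x8000, True))
```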
   731 
   732 Hardware Scrolling (and Enhancement)
   733 ------------------------------------
   734 
   735 On the standard ULA, &FE02 and &FE03 map to an address with 9 significant
   736 bits, the least significant 6 bits being zero, thus limiting the scrolling
   737 resolution to 64 bytes. An enhanced ULA could support a resolution of 2 bytes
   738 using the same layout of these addresses.
   739 
   740 |--&FE03--------------| |--&FE02--------------|
   741 XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX
   742 
   743    XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX
   744 
   745 Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
   746 memory to pixel locations is character oriented. A change in 8 bytes would
   747 permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
   748 MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
   749 observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
   750 Guide).
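
The pixel figures for an 8 byte resolution follow from the colour depth of
each mode: one character cell occupies 8 bytes, and each byte holds 8, 4 or 2
pixels. A small sketch of this calculation, with hypothetical names:

```python
# Bits per pixel for each screen mode, as in the pixel layouts table.
BITS_PER_PIXEL = {0: 1, 1: 2, 2: 4, 3: 1, 4: 1, 5: 2, 6: 1}

def scroll_step_pixels(mode, step_bytes=8):
    """Horizontal pixels moved when the start address changes by step_bytes."""
    cells = step_bytes // 8                    # whole character cells moved
    pixels_per_byte = 8 // BITS_PER_PIXEL[mode]
    return cells * pixels_per_byte

for mode in sorted(BITS_PER_PIXEL):
    print(mode, scroll_step_pixels(mode), scroll_step_pixels(mode, 64))
```

The second figure printed for each mode is the step imposed by the standard
64 byte resolution.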
   751 
   752 One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
   753 of changing the screen address by 2 bytes is the change in the number of lines
   754 from the initial and final character rows that need reading by the ULA, which
   755 would need to maintain this state information (although this is a relatively
   756 trivial change). Another pitfall is the complication that might be introduced
   757 to software writing bitmaps of character height to the screen.
   758 
   759 See: http://pastraiser.com/computers/acornelectron/acornelectron.html
   760 
   761 Enhancement: Mode Layouts
   762 -------------------------
   763 
   764 Merely changing the screen memory mappings in order to have Archimedes-style
   765 row-oriented screen addresses (instead of character-oriented addresses) could
   766 be done for the existing modes, but this might not be sufficiently beneficial,
   767 especially since accessing regions of the screen would involve incrementing
   768 pointers by amounts that are inconvenient on an 8-bit CPU.
   769 
   770 However, instead of using an Archimedes-style mapping, column-oriented screen
   771 addresses could be more feasibly employed: incrementing the address would
   772 reference the vertical screen location below the currently-referenced location
   773 (just as occurs within characters using the existing ULA); instead of
   774 returning to the top of the character row and referencing the next horizontal
   775 location after eight bytes, the address would reference the next character row
   776 and continue to reference locations downwards over the height of the screen
   777 until reaching the bottom; at the bottom, the next location would be the next
   778 horizontal location at the top of the screen.
   779 
   780 In other words, the memory layout for the screen would resemble the following
   781 (for MODE 2):
   782 
   783   &3000 &3100       ... &7F00
   784   &3001 &3101
   785   ...   ...
   786   &3007
   787   &3008
   788   ...
   789   ...                   ...
   790   &30FF             ... &7FFF
   791 
   792 Since there are 256 pixel rows, each column of locations would be addressable
   793 using the low byte of the address. Meanwhile, the high byte would be
   794 incremented to address different columns. Thus, addressing screen locations
   795 would become a lot more convenient and potentially much more efficient for
   796 certain kinds of graphical output.
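
As a sketch of how such addressing might look for the MODE 2 layout shown
above, the byte column selects the high byte and the pixel row the low byte.
The function name is hypothetical, and the limit of 80 byte columns is derived
from the &3000 to &7FFF range.

```python
SCREEN_BASE = 0x3000   # start of screen memory in the layout above

def column_address(byte_column, pixel_row):
    """Return the address of a screen byte in the column-oriented layout."""
    if not 0 <= byte_column < 80:
        raise ValueError("byte column must be in the range 0 to 79")
    if not 0 <= pixel_row < 256:
        raise ValueError("pixel row must be in the range 0 to 255")
    # High byte selects the column, low byte selects the pixel row.
    return SCREEN_BASE + (byte_column << 8) + pixel_row

print(hex(column_address(1, 0)), hex(column_address(79, 255)))
```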
   797 
   798 One potential complication with this simplified addressing scheme arises with
   799 hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
   800 with the existing ULA) would be achieved by incrementing or decrementing the
   801 screen start address; by one character row, it would involve adding or
   802 subtracting 8. However, the ULA only supports multiples of 64 when changing the
   803 screen start address. Thus, if such a scheme were to be adopted, three
   804 additional bits would need to be supported in the screen start register (see
   805 "Hardware Scrolling (and Enhancement)" for more details). However, horizontal
   806 scrolling would be much improved even under the severe constraints of the
   807 existing ULA: only adjustments of 256 to the screen start address would be
   808 required to produce single-location scrolling of as few as two pixels in MODE 2
   809 (four pixels in MODEs 1 and 5, eight pixels otherwise).
   810 
   811 More disruptive is the effect of this alternative layout on software.
   812 Presumably, compatibility with the BBC Micro was the primary goal of the
   813 Electron's hardware design. With the character-oriented screen layout in
   814 place, system software (and application software accessing the screen
   815 directly) would be relying on this layout to run on the Electron with little
   816 or no modification. Although it might have been possible to change the system
   817 software to use this column-oriented layout instead, this would have incurred
   818 a development cost and caused additional work porting things like games to the
   819 Electron. Moreover, a separate branch of the software from that supporting the
   820 BBC Micro and closer derivatives would then have needed maintaining.
   821 
   822 The decision to use the character-oriented layout in the BBC Micro may have
   823 been related to the choice of circuitry and to facilitate a convenient
   824 hardware implementation, and by the time the Electron was planned, it was too
   825 late to do anything about this somewhat unfortunate choice.
   826 
   827 Pixel Layouts
   828 -------------
   829 
   830 The pixel layouts are as follows:
   831 
   832   Modes         Depth (bpp)     Pixels (from bits)
   833   -----         -----------     ------------------
   834   0, 3, 4, 6    1               7 6 5 4 3 2 1 0
   835   1, 5          2               73 62 51 40
   836   2             4               7531 6420
   837 
   838 Since the ULA reads a half-byte at a time, one might expect it to attempt to
   839 produce pixels for every half-byte, as opposed to handling entire bytes.
   840 However, the pixel layout is not conducive to producing pixels as soon as a
   841 half-byte has been read for a given full-byte location: in 1bpp modes the
   842 first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
   843 data is spread across the entire byte in different ways.
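
To illustrate how the bits of one pixel are spread across a byte, the
following sketch decodes a screen byte according to the table above. The
function name is hypothetical, and it assumes that the first bit listed for
each pixel (bit 7 for the leftmost pixel) is the most significant bit of the
colour value.

```python
def decode_pixels(byte, bpp):
    """Return the colour values held in a screen byte, leftmost pixel first."""
    if bpp not in (1, 2, 4):
        raise ValueError("depth must be 1, 2 or 4 bits per pixel")
    pixels = 8 // bpp
    result = []
    for p in range(pixels):
        value = 0
        # The bits of one pixel are spaced 'pixels' positions apart.
        for plane in range(bpp):
            bit = 7 - p - plane * pixels
            value = (value << 1) | ((byte >> bit) & 1)
        result.append(value)
    return result

print(decode_pixels(0x88, 2), decode_pixels(0xAA, 4))
```

In the 2bpp and 4bpp cases, every pixel draws on both half-bytes, which is
why no pixel can be completed from the first half-byte alone.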
   844 
   845 An alternative arrangement might be as follows:
   846 
   847   Modes         Depth (bpp)     Pixels (from bits)
   848   -----         -----------     ------------------
   849   0, 3, 4, 6    1               7 6 5 4 3 2 1 0
   850   1, 5          2               76 54 32 10
   851   2             4               7654 3210
   852 
   853 Just as the mode layouts were presumably decided by compatibility with the BBC
   854 Micro, the pixel layouts will have been maintained for similar reasons.
   855 Unfortunately, the existing layout prevents any optimisation of the ULA for
   856 handling half-byte pixel data generally.
   857 
   858 Enhancement: The Missing MODE 4
   859 -------------------------------
   860 
   861 The Electron inherits its screen mode selection from the BBC Micro, where MODE
   862 3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
   863 Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
   864 however, and they are merely implemented by skipping two scanlines in every
   865 ten after the eight required to produce a character line. Thus, such modes
   866 provide a 24-row display.
   867 
   868 In principle, nothing prevents this "text mode" effect being applied to other
   869 modes. The 20-column modes are not well-suited to displaying text, which
   870 leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
   871 2. Although the need for a non-monochrome 40-column text mode is addressed by
   872 MODE 7 on the BBC Micro, the Electron lacks such a mode.
   873 
   874 If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it
   875 would occupy MODE 4 instead of the current MODE 4:
   876 
   877   Screen mode  Size (kilobytes)  Colours  Rows  Resolution
   878   -----------  ----------------  -------  ----  ----------
   879   0            20                2        32    640x256
   880   1            20                4        32    320x256
   881   2            20                16       32    160x256
   882   3            16                2        24    640x256
   883   4 (new)      16                4        24    320x256
   884   4 (old)      10                2        32    320x256
   885   5            10                4        32    160x256
   886   6            8                 2        24    320x256
   887 
   888 Thus, for increasing mode numbers, the size of each mode would be the same as
   889 or less than that of the preceding mode.
   890 
   891 Enhancement: Display Mode Property Control
   892 ------------------------------------------
   893 
   894 It is rather curious that the ULA supports the mode numbers directly in bits 3
   895 to 5 of &FE07 since these would presumably need to be decoded in order to set
   896 the fundamental properties of the display mode. These properties are as
   897 follows:
   898 
   899  * Screen data retrieval rate: number of fetches per pair of 2MHz cycles
   900  * Pixel colour depth
   901  * Text mode vertical spacing
   902 
   903 From these, the following properties emerge:
   904 
   905   Property                        Influences
   906   --------                        ----------
   907   Character row size (bytes)      Retrieval rate
   908 
   909   Number of character rows        Text mode setting
   910 
   911   Display size (bytes)            Retrieval rate (character row size)
   912                                   Text mode setting (number of rows)
   913 
   914   Pixel frequency                 Retrieval rate
   915   Horizontal resolution (pixels)  Colour depth
   916 
   917 One can imagine a register bitfield arrangement as follows:
   918 
   919   Field             Values                  Formula
   920   -----             ------                  -------
   921   Pixel depth       00: 1 bit per pixel     log2(depth)
   922                     01: 2 bits per pixel
   923                     10: 4 bits per pixel
   924 
   925   Retrieval rate     0: twice               2 - fetches per cycle pair
   926                      1: once
   927 
   928   Text mode enable   0: disable/off         text mode enabled
   929                      1: enable/on
   930 
   931 This arrangement would require four bits. However, one bit in &FE07 is
   932 seemingly inactive and might possibly be reallocated.
   933 
   934 The resulting combination of properties would permit all of the existing modes
   935 plus some additional ones, including the missing MODE 4 mentioned above. With
   936 the bitfields above ordered from the most significant bits to the least
   937 significant bits providing the low-level "mode" values, the following table
   938 can be produced:
   939 
   940   Screen mode  Depth Rate   Text  Size (K)  Colours  Rows  Resolution
   941   -----------  ----- ----   ----  --------  -------  ----  ----------
   942   0  (0000)    1     twice  off   20        2        32    640x256    (MODE 0)
   943   1  (0001)    1     twice  on    16        2        24    640x256    (MODE 3)
   944   2  (0010)    1     once   off   10        2        32    320x256    (MODE 4)
   945   3  (0011)    1     once   on    8         2        24    320x256    (MODE 6)
   946   4  (0100)    2     twice  off   20        4        32    320x256    (MODE 1)
   947   5  (0101)    2     twice  on    16        4        24    320x256
   948   6  (0110)    2     once   off   10        4        32    160x256    (MODE 5)
   949   7  (0111)    2     once   on    8         4        24    160x256
   950   8  (1000)    4     twice  off   20        16       32    160x256    (MODE 2)
   951   9  (1001)    4     twice  on    16        16       24    160x256
   952   10 (1010)    4     once   off   10        16       32    80x256
   953   11 (1011)    4     once   on    8         16       24    80x256
   954 
   955 The existing modes would be covered in a way that is incompatible with the
   956 existing numbering, thus requiring a table in software, but additional text
   957 modes would be provided for MODE 1, MODE 5 and MODE 2. An additional two lower
   958 resolution modes would also be conceivable within this scheme, requiring the
   959 stretching of 16MHz pixels by a factor of eight to yield 80 pixels per
   960 scanline. The utility of such modes is questionable and such modes might not
   961 be supported.
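
The table above can be derived mechanically from the proposed bitfields. The
following sketch decodes a 4-bit value into the resulting properties and
includes the software table mapping values to the existing mode numbers. All
names are hypothetical, and the figures of 80 or 40 bytes per scanline and 32
or 24 rows are taken from the tables in this document.

```python
# Low-level value -> existing MODE number, as listed in the table above.
EXISTING_MODES = {0: 0, 1: 3, 2: 4, 3: 6, 4: 1, 6: 5, 8: 2}

def decode_mode(value):
    """Decode a 4-bit value: depth (2 bits), rate (1 bit), text (1 bit)."""
    depth_field = (value >> 2) & 0b11
    rate_field = (value >> 1) & 1          # 0: twice, 1: once
    text = bool(value & 1)
    if depth_field == 0b11:
        raise ValueError("pixel depth field 11 is not defined")
    depth = 1 << depth_field               # 1, 2 or 4 bits per pixel
    fetches = 2 - rate_field               # fetches per pair of 2MHz cycles
    bytes_per_line = 40 * fetches
    return {
        "depth": depth,
        "fetches": fetches,
        "text": text,
        "colours": 1 << depth,
        "rows": 24 if text else 32,
        "width": bytes_per_line * 8 // depth,
        "mode": EXISTING_MODES.get(value),  # None for the new combinations
    }

print(decode_mode(0b0101))
```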
   962 
   963 Enhancement: 2MHz RAM Access
   964 ----------------------------
   965 
   966 The CPU and ULA both access RAM at 2MHz, but the CPU, even when not competing
   967 with the ULA, only accesses RAM every other 2MHz cycle (as if the ULA still
   968 needed to access the RAM). One useful enhancement would therefore be a
   969 mechanism to let the CPU take over the ULA cycles outside the ULA's period of
   970 activity, comparable to the way the ULA takes over the CPU cycles in MODE 0 to
   971 3.
   972 
   973 Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
   974 
   975   Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
   976   On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
   977 
   978 In MODE 4 to 6:
   979  
   980   Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
   981   On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
   982 
   983 This would improve CPU bandwidth as follows:
   984 
   985                 Standard ULA    Enhanced ULA    % Total Bandwidth   Speedup
   986 MODE 0, 1, 2    9728 bytes      19456 bytes     24% -> 49%          2
   987 MODE 3          12288 bytes     24576 bytes     31% -> 62%          2
   988 MODE 4, 5       19968 bytes     29696 bytes     50% -> 74%          1.5
   989 MODE 6          19968 bytes     32256 bytes     50% -> 81%          1.6
   990 
   991 (Here, the uncontended total 2MHz bandwidth for a display period would be
   992 39936 bytes, being 128 cycles per line over 312 lines.)
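
The figures in the table can be reproduced with a short calculation. This is
a sketch under stated assumptions rather than a description of the hardware:
the names are hypothetical, and the figure of 192 display lines for MODE 3 and
6 is inferred from the table's own values.

```python
TOTAL_CYCLES = 128 * 312   # 2MHz cycles per display period: 39936

# Table rows: (display lines, screen bytes fetched per display line).
MODES = {
    "0, 1, 2": (256, 80),
    "3": (192, 80),
    "4, 5": (256, 40),
    "6": (192, 40),
}

def standard_cpu_bytes(display_lines, ula_bytes):
    # The CPU owns alternate cycles; fetches beyond 40 bytes per line
    # take over CPU cycles as well.
    return TOTAL_CYCLES // 2 - display_lines * max(0, ula_bytes - 40)

def enhanced_cpu_bytes(display_lines, ula_bytes):
    # The CPU may use every cycle that the ULA does not need.
    return TOTAL_CYCLES - display_lines * ula_bytes

for name, (lines, ula_bytes) in MODES.items():
    std = standard_cpu_bytes(lines, ula_bytes)
    enh = enhanced_cpu_bytes(lines, ula_bytes)
    print(f"MODE {name}: {std} -> {enh} bytes, speedup {enh / std:.2f}")
```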
   993 
   994 With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
   995 because all access opportunities to RAM are doubled. Meanwhile, in the other
   996 modes, some CPU accesses occur alongside ULA accesses and thus cannot be
   997 doubled, but the CPU bandwidth increase is still significant.
   998 
   999 Unfortunately, the mechanism for accessing the RAM is too slow to provide data
  1000 within the time constraints of 2MHz operation. There is no time remaining in a
  1001 2MHz cycle for the CPU to receive and process any retrieved data once the
  1002 necessary signalling has been performed.
  1003 
  1004 The only way for the CPU to be able to access the RAM quickly enough would be
  1005 to do away with the double 4-bit access mechanism and to have a single 8-bit
  1006 channel to the memory. This would require twice as many 1-bit RAM chips or a
  1007 different kind of RAM chip, but it would also potentially simplify the ULA.
  1008 
  1009 The section on 8-bit wide RAM access discusses the possibilities around
  1010 changing the memory architecture, also describing the possibility of ULA
  1011 accesses achieving two bytes per 2MHz cycle due to the doubling of the memory
  1012 channel, leaving every other access free for the CPU during the display period
  1013 in MODE 0 to 3...
  1014 
  1015   Standard display period: UUUUUUUU
  1016   Modified display period: UCUCUCUC
  1017 
  1018 ...and consolidating accesses in MODE 4 to 6:
  1019 
  1020   Standard display period: UCUCUCUC
  1021   Modified display period: UCCCUCCC
  1022 
  1023 Together with the enhancements for non-display periods, such an "Enhanced+ ULA"
  1024 would perform as follows:
  1025 
  1026                 Standard ULA    Enhanced+ ULA   % Total Bandwidth   Speedup
  1027 MODE 0, 1, 2    9728 bytes      29696 bytes     24% -> 74%          3.1
  1028 MODE 3          12288 bytes     32256 bytes     31% -> 81%          2.6
  1029 MODE 4, 5       19968 bytes     34816 bytes     50% -> 87%          1.7
  1030 MODE 6          19968 bytes     36096 bytes     50% -> 90%          1.8
  1031 
  1032 Of course, the principal enhancement would be the wider memory channel, with
  1033 more buffering in the ULA being its contribution to this arrangement.
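
The "Enhanced+" figures can be checked in a similar way, assuming that each
ULA access over the wider channel delivers two bytes and thus needs only half
as many cycles. The names below are hypothetical, the standard ULA figures are
copied from the table above, and the 192 display lines for MODE 3 and 6 are
inferred from those figures.

```python
TOTAL_CYCLES = 128 * 312   # 39936

# (display lines, screen bytes per display line, standard ULA CPU bytes)
MODES = {
    "0, 1, 2": (256, 80, 9728),
    "3": (192, 80, 12288),
    "4, 5": (256, 40, 19968),
    "6": (192, 40, 19968),
}

def enhanced_plus_cpu_bytes(display_lines, ula_bytes):
    # Assumption: two bytes per ULA access, halving the ULA's cycle usage.
    ula_cycles = display_lines * ula_bytes // 2
    return TOTAL_CYCLES - ula_cycles

for name, (lines, ula_bytes, standard) in MODES.items():
    plus = enhanced_plus_cpu_bytes(lines, ula_bytes)
    share = 100 * plus / TOTAL_CYCLES
    print(f"MODE {name}: {plus} bytes ({share:.0f}%), x{plus / standard:.2f}")
```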
  1034 
  1035 Enhancement: Region Blanking
  1036 ----------------------------
  1037 
  1038 The problem of permitting character-oriented blitting in programs whilst
  1039 scrolling the screen by sub-character amounts could be mitigated by permitting
  1040 a region of the display to be blank, such as the final lines of the display.
  1041 Consider the following vertical scrolling by 2 bytes that would cause an
  1042 initial character row of 6 lines and a final character row of 2 lines:
  1043 
  1044     6 lines - initial, partial character row
  1045   248 lines - 31 complete rows
  1046     2 lines - final, partial character row
  1047 
  1048 If a routine were in use that wrote 8 line bitmaps to the partial character
  1049 row now split in two, it would be advisable to hide one of the regions in
  1050 order to prevent content appearing in the wrong place on screen (such as
  1051 content meant to appear at the top "leaking" onto the bottom). Blanking 6
  1052 lines would be sufficient, as can be seen from the following cases.
  1053 
  1054 Scrolling up by 2 lines:
  1055 
  1056     6 lines - initial, partial character row
  1057   240 lines - 30 complete rows
  1058     4 lines - part of 1 complete row
  1059   -----------------------------------------------------------------
  1060     4 lines - part of 1 complete row (hidden to maintain 250 lines)
  1061     2 lines - final, partial character row (hidden)
  1062 
  1063 Scrolling down by 2 lines:
  1064 
  1065     2 lines - initial, partial character row
  1066   248 lines - 31 complete rows
  1067   ----------------------------------------------------------
  1068     6 lines - final, partial character row (hidden)
  1069 
  1070 Thus, in this case, region blanking would impose a 250 line display with the
  1071 bottom 6 lines blank.
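
The division of the display into partial and complete character rows can be
sketched as follows. The function name is hypothetical, and the offset is
expressed as the number of lines by which the screen start has been advanced
within a character row.

```python
def split_rows(offset_lines, total_lines=256, row_height=8):
    """Return (initial partial lines, complete rows, final partial lines)."""
    offset = offset_lines % row_height
    if offset == 0:
        return 0, total_lines // row_height, 0
    initial = row_height - offset   # remainder of the first character row
    final = offset                  # start of the row wrapping to the bottom
    complete = (total_lines - initial - final) // row_height
    return initial, complete, final

BLANKED = 6   # lines hidden at the bottom of the display

for offset in (2, 6):
    initial, complete, final = split_rows(offset)
    print(offset, initial, complete, final, 256 - BLANKED)
```

An offset of 2 corresponds to the first case above and an offset of 6 to the
second, with 250 lines remaining visible in both.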
  1072 
  1073 See the description of the display suspend enhancement for a more efficient
  1074 way of blanking lines than merely blanking the palette whilst allowing the CPU
  1075 to perform useful work during the blanking period.
  1076 
  1077 To control the blanking or suspending of lines at the top and bottom of the
  1078 display, a memory location could be dedicated to the task: the upper 4 bits
  1079 could define a blanking region of up to 15 lines at the top of the screen,
  1080 whereas the lower 4 bits could define such a region at the bottom of the
  1081 screen. If more lines were required, two locations could be employed, allowing
  1082 the top and bottom regions to occupy the entire screen.
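
A possible encoding of such a location is sketched below. The register is
purely hypothetical, as are the function names, and the sketch assumes that
each 4-bit field directly holds a number of lines from 0 to 15.

```python
def pack_blanking(top_lines, bottom_lines):
    """Combine top and bottom blanking line counts into a single byte."""
    if not (0 <= top_lines <= 15 and 0 <= bottom_lines <= 15):
        raise ValueError("each region must be in the range 0 to 15 lines")
    return (top_lines << 4) | bottom_lines

def unpack_blanking(value):
    """Return (top lines, bottom lines) from a blanking byte."""
    return (value >> 4) & 0x0F, value & 0x0F

print(hex(pack_blanking(0, 6)), unpack_blanking(0x26))
```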
  1083 
  1084 Enhancement: Screen Height Adjustment
  1085 -------------------------------------
  1086 
  1087 The height of the screen could be configurable in order to reduce screen
  1088 memory consumption. This is not quite done in MODE 3 and 6 since the start of
  1089 the screen appears to be rounded down to the nearest page, but by reducing the
  1090 height by amounts more than a page, savings would be possible. For example:
  1091 
  1092   Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
  1093   ------------  -----  ------  --------------  ---------------  -------------
  1094   640           1      252     80              320              &3140 -> &3100
  1095   640           1      248     80              640              &3280 -> &3200
  1096   320           1      240     40              640              &5A80 -> &5A00
  1097   320           2      240     80              1280             &3500
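
The start addresses and savings in this table can be derived as shown in the
following sketch, which assumes that screen memory always ends at &7FFF and
that the start address is rounded down to a page boundary. The function names
are hypothetical.

```python
SCREEN_TOP = 0x8000   # assumption: screen memory ends at &7FFF

def screen_start(height_lines, bytes_per_line):
    """Return (exact start address, start rounded down to a page)."""
    exact = SCREEN_TOP - height_lines * bytes_per_line
    return exact, exact & 0xFF00

def saving(height_lines, bytes_per_line, full_height=256):
    """Bytes saved relative to a full-height screen."""
    return (full_height - height_lines) * bytes_per_line

for height, width_bytes in ((252, 80), (248, 80), (240, 40), (240, 80)):
    exact, rounded = screen_start(height, width_bytes)
    print(height, width_bytes, saving(height, width_bytes),
          f"&{exact:04X} -> &{rounded:04X}")
```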
  1098 
  1099 Screen Mode Selection
  1100 ---------------------
  1101 
  1102 Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
  1103 range of modes, the other bits of &FE*7 (related to sound, cassette
  1104 input/output and the Caps Lock LED) would need to be reassigned, with bit 0
  1105 potentially being made available for use.
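
For reference, extracting and replacing the mode field in a value destined
for &FE*7 might be sketched as follows, with hypothetical helper names and the
other bits of the value left untouched:

```python
MODE_SHIFT = 3
MODE_MASK = 0b111 << MODE_SHIFT   # bits 3, 4 and 5

def get_mode(value):
    """Extract the screen mode from a value written to &FE*7."""
    return (value & MODE_MASK) >> MODE_SHIFT

def set_mode(value, mode):
    """Return the value with its mode field replaced, other bits preserved."""
    if not 0 <= mode <= 7:
        raise ValueError("mode field holds values from 0 to 7")
    return (value & ~MODE_MASK & 0xFF) | (mode << MODE_SHIFT)

print(get_mode(0b00101000), hex(set_mode(0xFF, 0)))
```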
  1106 
  1107 Enhancement: Palette Definition
  1108 -------------------------------
  1109 
  1110 Since all memory accesses go via the ULA, an enhanced ULA could employ more
  1111 specific addresses than &FE*X to perform enhanced functions. For example, the
  1112 palette control is done using &FE*8-F and merely involves selecting predefined
  1113 colours, whereas an enhanced ULA could support the redefinition of all 16
  1114 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
  1115 (colours 8 to 15), where a single byte might provide 8 bits per pixel colour
  1116 specifications similar to those used on the Archimedes.
  1117 
  1118 The principal limitation here is actually the hardware: the Electron has only
  1119 a single output line for each of the red, green and blue channels, and if
  1120 those outputs are strictly digital and can only be set to a "high" or "low"
  1121 value, then only the existing eight colours are possible. If a modern ULA were
  1122 able to output analogue values (or values at well-defined points between the
  1123 high and low values, such as the half-on value supported by the Amstrad CPC
  1124 series), it would still need to be assessed whether the circuitry could
  1125 successfully handle and propagate such values. Various sources indicate that
  1126 only "TTL levels" are supported by the RGB output circuit, and since there are
  1127 74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
  1128 is likely that the ULA is expected to provide only "high" or "low" values.
  1129 
  1130 Short of adding extra outputs from the ULA (either additional red, green and
  1131 blue outputs or a combined intensity output), another approach might involve
  1132 some kind of modulation where an output value might be encoded in multiple
  1133 pulses at a higher frequency than the pixel frequency. However, this would
  1134 demand additional circuitry outside the ULA, and component RGB monitors would
  1135 probably not be able to take advantage of this feature; only UHF and composite
  1136 video devices (the latter with the composite video colour support enabled on
  1137 the Electron's circuit board) would potentially benefit.
  1138 
  1139 Flashing Colours
  1140 ----------------
  1141 
  1142 According to the Advanced User Guide, "The cursor and flashing colours are
  1143 entirely generated in software: This means that all of the logical to physical
  1144 colour map must be changed to cause colours to flash." This appears to suggest
  1145 that the palette registers must be updated upon the flash counter - read and
  1146 written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
  1147 colour pairs to be any combination of colours might be possible, instead of
  1148 having colour complements as pairs.
  1149 
  1150 It is conceivable that the interrupt code responsible does the simple thing
  1151 and merely inverts the current values for any logical colours (LC) for which
  1152 the associated physical colour (as supplied as the second parameter to the VDU
  1153 19 call) has the top bit of its four bit value set. These top bits are not
  1154 recorded in the palette registers but are presumably recorded separately and
  1155 used to build bitmaps as follows:
  1156 
  1157   LC  2 colour  4 colour  16 colour  4-bit value for inversion
  1158   --  --------  --------  ---------  -------------------------
  1159    0  00010001  00010001  00010001   1, 1, 1
  1160    1  01000100  00100010  00010001   4, 2, 1
  1161    2            01000100  00100010      4, 2
  1162    3            10001000  00100010      8, 2
  1163    4                      00010001         1
  1164    5                      00010001         1
  1165    6                      00100010         2
  1166    7                      00100010         2
  1167    8                      01000100         4
  1168    9                      01000100         4
  1169   10                      10001000         8
  1170   11                      10001000         8
  1171   12                      01000100         4
  1172   13                      01000100         4
  1173   14                      10001000         8
  1174   15                      10001000         8
  1175 
  1176   Inversion value calculation:
  1177 
  1178    2 colour formula: 1 << (colour * 2)
  1179    4 colour formula: 1 << colour
  1180   16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
  1181 
  1182 For example, where logical colour 0 has been mapped to a physical colour in
  1183 the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
  1184 the inversion operation. (The lower three bits of the physical colour would be
  1185 used to set the underlying colour information affected by the inversion
  1186 operation.)
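
The inversion value calculation can be sketched in Python as follows. This
is only an illustration of the table above, not code from the operating
system: the function names are invented here, and the 16 colour shift is
written so that it agrees with the table's 16 colour column.

```python
def inversion_value(colour, colours_in_mode):
    """Return the 4-bit inversion value for a logical colour."""
    if colours_in_mode == 2:
        shift = colour * 2
    elif colours_in_mode == 4:
        shift = colour
    elif colours_in_mode == 16:
        # Bit 1 of the colour gives shift bit 0; bit 3 gives shift bit 1.
        shift = ((colour & 2) >> 1) + ((colour & 8) >> 2)
    else:
        raise ValueError("unsupported number of colours")
    return 1 << shift


def inversion_bitmap(colour, colours_in_mode):
    """Return the 8-bit bitmap: the 4-bit value repeated in both nibbles."""
    value = inversion_value(colour, colours_in_mode)
    return (value << 4) | value


if __name__ == "__main__":
    for colour in range(16):
        print(colour, format(inversion_bitmap(colour, 16), "08b"))
```

Printing the 2 and 4 colour cases in the same way should reproduce the other
columns of the table.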
  1187 
  1188 An operation in the interrupt code would then combine the bitmaps for all
  1189 logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
  1190 combined for groups of logical colours as follows:
  1191 
  1192    Logical colours
  1193    ---------------
  1194    0,  2,  8, 10
  1195    4,  6, 12, 14
  1196    5,  7, 13, 15
  1197    1,  3,  9, 11
  1198 
  1199 These combined bitmaps would be EORed with the existing palette register
  1200 values in order to perform the value inversion necessary to produce the
  1201 flashing effect.
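
A minimal sketch of the combining and EOR steps for 16 colour modes follows.
It assumes that each group of logical colours has one combined mask and one
associated register value, held here as plain lists in the group order shown
above; the mapping of groups to actual palette register addresses is not
stated in this text and is deliberately left out. All names are hypothetical.

```python
# Logical colour groups for 16 colour modes, in the order listed above.
GROUPS = [(0, 2, 8, 10), (4, 6, 12, 14), (5, 7, 13, 15), (1, 3, 9, 11)]


def bitmap16(colour):
    """8-bit inversion bitmap for a logical colour in a 16 colour mode."""
    value = 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    return (value << 4) | value


def group_masks(flashing):
    """Combine the bitmaps of all flashing logical colours, group by group."""
    masks = []
    for group in GROUPS:
        mask = 0
        for colour in group:
            if colour in flashing:
                mask |= bitmap16(colour)
        masks.append(mask)
    return masks


def flash(register_values, masks):
    """EOR each (shadow) register value with its group's combined mask."""
    return [value ^ mask for value, mask in zip(register_values, masks)]
```

Applying flash twice with the same masks restores the original values, which
is the alternation needed each time the flash counter expires.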
  1202 
  1203 Thus, in the VDU 19 operation, the appropriate inversion value would be
  1204 calculated for the logical colour, and this value would then be combined with
  1205 other inversion values in a dedicated memory location corresponding to the
  1206 colour's group as indicated above. Meanwhile, the palette channel values would
  1207 be derived from the lower three bits of the specified physical colour and
  1208 combined with other palette data in dedicated memory locations corresponding
  1209 to the palette registers.
  1210 
  1211 Interestingly, although flashing colours on the BBC Micro are controlled by
  1212 toggling bit 0 of the &FE20 control register location for the Video ULA, the
  1213 actual colour inversion is done in hardware.
  1214 
  1215 Enhancement: Palette Definition Lists
  1216 -------------------------------------
  1217 
  1218 It can be useful to redefine the palette in order to change the colours
  1219 available for a particular region of the screen, particularly in modes where
  1220 the choice of colours is constrained, and if an increased colour depth were
  1221 available, palette redefinition would be useful to give the illusion of more
  1222 than 16 colours in MODE 2. Traditionally, palette redefinition has been done
  1223 by using interrupt-driven timers, but a more efficient approach would involve
  1224 presenting lists of palette definitions to the ULA so that it can change the
  1225 palette at a particular display line.
  1226 
  1227 One might define a palette redefinition list in a region of memory and then
  1228 communicate its contents to the ULA by writing the address and length of the
  1229 list, along with the display line at which the palette is to be changed, to
  1230 ULA registers such that the ULA buffers the list and performs the redefinition
  1231 at the appropriate time. Throughput/bandwidth considerations might impose
  1232 restrictions on the practical length of such a list, however.
  1233 
  1234 A simple form of palette definition might be useful in text modes. Within the
  1235 blank region between lines, the foreground palette could be changed to apply
  1236 to the next line. Palette values could be read from a table in RAM, perhaps
  1237 preceding the screen data, with 24 2-byte entries providing palette
  1238 redefinition support in 2- and 4-colour modes.
  1239 
  1240 Enhancement: Display Synchronisation Interrupts
  1241 -----------------------------------------------
  1242 
  1243 When completing each scanline of the display, the ULA could trigger an
  1244 interrupt. Since this might impact system performance substantially, the
  1245 feature would probably need to be configurable, and it might be sufficient to
  1246 have an interrupt only after a certain number of display lines instead.
  1247 Permitting the CPU to take action after eight lines would allow palette
  1248 switching and other effects to occur on a character row basis.
  1249 
  1250 The ULA provides an interrupt at the end of the display period, presumably so
  1251 that software can schedule updates to the screen, avoid flickering or tearing,
  1252 and so on. However, some applications might benefit from an interrupt at, or
  1253 just before, the start of the display period so that palette modifications or
  1254 similar effects could be scheduled.
  1255 
  1256 Enhancement: Palette-Free Modes
  1257 -------------------------------
  1258 
  1259 Palette-free modes might be defined where bit values directly correspond to
  1260 the red, green and blue channels, although this would mostly make sense only
  1261 for modes with depths greater than the standard 4 bits per pixel, and such
  1262 modes would require more memory than MODE 2 if they were to have an acceptable
  1263 resolution.
  1264 
  1265 Enhancement: Display Suspend
  1266 ----------------------------
  1267 
  1268 Especially when writing to the screen memory, it could be beneficial to be
  1269 able to suspend the ULA's access to the memory, instead producing blank values
  1270 for all screen pixels until a program is ready to reveal the screen. This is
  1271 different from palette blanking since with a blank palette, the ULA is still
  1272 reading screen memory and translating its contents into pixel values that end
  1273 up being blank.
  1274 
  1275 This function is reminiscent of a capability of the ZX81, albeit necessary on
  1276 that hardware to reduce the load on the system CPU which was responsible for
  1277 producing the video output. By allowing display suspend on the Electron, the
  1278 performance benefit would be derived from giving the CPU full access to the
  1279 memory bandwidth.
  1280 
  1281 Note that since the CPU is only able to access RAM at 1MHz, there is no
  1282 possibility to improve performance beyond that achieved in MODE 4, 5 or 6
  1283 normally. However, if faster RAM access were to be made possible (see the
  1284 discussion of 8-bit wide RAM access), the CPU could benefit from freeing up
  1285 the ULA's access slots entirely.
  1286 
  1287 The region blanking feature mentioned above could be implemented using this
  1288 enhancement instead of employing palette blanking for the affected lines of
  1289 the display.
  1290 
  1291 Enhancement: Memory Filling
  1292 ---------------------------
  1293 
  1294 A capability that could be given to an enhanced ULA is that of permitting the
  1295 ULA to write to screen memory as well as read from it. Although
  1296 such a capability would probably not be useful in conjunction with the
  1297 existing read operations when producing a screen display, and insufficient
  1298 bandwidth would exist to do so in high-bandwidth screen modes anyway, the
  1299 capability could be offered during a display suspend period (as described
  1300 above), permitting a more efficient mechanism to rapidly fill memory with a
  1301 predetermined value.
  1302 
  1303 This capability could also support block filling, where the limits of the
  1304 filled memory would be defined by the position and size of a screen area,
  1305 although this would demand the provision of additional registers in the ULA to
  1306 retain the details of such areas and additional logic to control the fill
  1307 operation.
  1308 
  1309 Enhancement: Region Filling
  1310 ---------------------------
  1311 
  1312 An alternative to memory writing might involve indicating regions using
  1313 additional registers or memory where the ULA fills regions of the screen with
  1314 content instead of reading from memory. Unlike hardware sprites which should
  1315 realistically provide varied content, region filling could employ single
  1316 colours or patterns, and one advantage of doing so would be that the ULA need
  1317 not access memory at all within a particular region.
  1318 
  1319 Regions would be defined on a row-by-row basis. Instead of reading memory and
  1320 blitting a direct representation to the screen, the ULA would read region
  1321 definitions containing a start column, region width and colour details. There
  1322 might be a certain number of definitions allowed per row, or the ULA might
  1323 just traverse an ordered list of such definitions with each one indicating the
  1324 row, start column, region width and colour details.
  1325 
  1326 One could even compress this information further by requiring only the row,
  1327 start column and colour details with each subsequent definition terminating
  1328 the effect of the previous one. However, one would also need to consider the
  1329 convenience of preparing such definitions and whether efficient access to
  1330 definitions for a particular row might be desirable. It might also be
  1331 desirable to avoid having to prepare definitions for "empty" areas of the
  1332 screen, effectively making the definition of the screen contents employ
  1333 run-length encoding and employ only colour plus length information.
  1334 
  1335 One application of region filling is that of simple 2D and 3D shape rendering.
  1336 Although it is entirely possible to plot such shapes to the screen and have
  1337 the ULA blit the memory contents to the screen, such operations consume
  1338 bandwidth both in the initial plotting and in the final transfer to the
  1339 screen. Region filling would reduce such bandwidth usage substantially.
  1340 
  1341 This way of representing screen images would make certain kinds of images
  1342 unfeasible to represent - consider alternating single pixel values which could
  1343 easily occur in some character bitmaps - even if an internal queue of regions
  1344 were to be supported such that the ULA could read ahead and buffer such
  1345 "bandwidth intensive" areas. Thus, the ULA might be better served providing
  1346 this feature for certain areas of the display only as some kind of special
  1347 graphics window.
  1348 
  1349 Enhancement: Hardware Sprites
  1350 -----------------------------
  1351 
  1352 An enhanced ULA might provide hardware sprites, but this would be done in a
  1353 way that is incompatible with the standard ULA, since no &FE*X locations are
  1354 available for allocation. To keep the facility simple, hardware sprites would
  1355 have a standard byte width and height.
  1356 
  1357 The specification of sprites could involve the reservation of 16 locations
  1358 (for example, &FE20-F) specifying a fixed number of eight sprites, with each
  1359 location pair referring to the sprite data. By limiting the ULA to dealing
  1360 with a fixed number of sprites, the work required inside the ULA would be
  1361 reduced since it would avoid having to deal with arbitrary numbers of sprites.
  1362 
  1363 The principal limitation on providing hardware sprites is that of having to
  1364 obtain sprite data, given that the ULA is usually required to retrieve screen
  1365 data, and given the lack of memory bandwidth available to retrieve sprite data
  1366 (particularly from multiple sprites supposedly at the same position) and
  1367 screen data simultaneously. Although the ULA could potentially read sprite
  1368 data and screen data in alternate memory accesses in screen modes where the
  1369 bandwidth is not already fully utilised, this would result in a degradation of
  1370 performance.
  1371 
  1372 Enhancement: Additional Screen Mode Configurations
  1373 --------------------------------------------------
  1374 
  1375 Alternative screen mode configurations could be supported. The ULA has to
  1376 produce 640 pixel values across the screen, with pixel doubling or quadrupling
  1377 employed to fill the screen width:
  1378 
  1379   Screen width      Columns     Scaling     Depth       Bytes
  1380   ------------      -------     -------     -----       -----
  1381   640               80          x1          1           80
  1382   320               40          x2          1, 2        40, 80
  1383   160               20          x4          2, 4        40, 80
  1384 
  1385 It must also use at most 80 byte-sized memory accesses to provide the
  1386 information for the display. Given that characters must occupy an 8x8 pixel
  1387 array, if a configuration featuring anything other than 20, 40 or 80 character
  1388 columns is to be supported, compromises must be made such as the introduction
  1389 of blank pixels either between characters (such as occurs between rows in MODE
  1390 3 and 6) or at the end of a scanline (such as occurs at the end of the frame
  1391 in MODE 3 and 6). Consider the following configuration:
  1392 
  1393   Screen width      Columns     Scaling     Depth       Bytes       Blank
  1394   ------------      -------     -------     -----       ------      -----
  1395   208               26          x3          1, 2        26, 52      16
  1396 
  1397 Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
  1398 colours could be provided, with 16 blank pixel values (out of a total of 640)
  1399 generated either at the start or end (or split between the start and end) of
  1400 each scanline.
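
The arithmetic behind these configurations can be checked with a short
sketch. The function name and parameters are invented for illustration; the
only facts assumed are those stated above (640 pixel values per scanline,
8 pixel wide characters).

```python
def mode_config(columns, scaling, depth, total_pixels=640, char_width=8):
    """Return (screen width, bytes per scanline, blank pixel values)."""
    width = columns * char_width
    blank = total_pixels - width * scaling
    if blank < 0:
        raise ValueError("configuration needs more than the available pixels")
    bytes_per_line = width * depth // 8
    return width, bytes_per_line, blank


if __name__ == "__main__":
    # 26 columns, tripled pixels, 1 and 2 bits per pixel.
    print(mode_config(26, 3, 1))
    print(mode_config(26, 3, 2))
```

A configuration is only usable under the constraints above if the byte count
does not exceed 80 and the blank count is not negative.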
  1401 
  1402 Enhancement: Character Attributes
  1403 ---------------------------------
  1404 
  1405 The BBC Micro MODE 7 employs something resembling character attributes to
  1406 support teletext displays, but depends on circuitry providing a character
  1407 generator. The ZX Spectrum, on the other hand, provides character attributes
  1408 as a means of colouring bitmapped graphics. Although such a feature is very
  1409 limiting as the sole means of providing multicolour graphics, in situations
  1410 where the choice is between low resolution multicolour graphics or high
  1411 resolution monochrome graphics, character attributes provide a potentially
  1412 useful compromise.
  1413 
  1414 For each byte read, the ULA must deliver 8 pixel values (out of a total of
  1415 640) to the video output, doing so by either emptying its pixel buffer on a
  1416 pixel per cycle basis, or by multiplying pixels and thus holding them for more
  1417 than one cycle. For example, for a screen mode having 640 pixels in width:
  1418 
  1419   Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  1420   Reads:    B                               B
  1421   Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
  1422 
  1423 And for a screen mode having 320 pixels in width:
  1424 
  1425   Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  1426   Reads:    B
  1427   Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
  1428 
  1429 However, in modes where fewer than 80 bytes are required to generate the pixel
  1430 values, an enhanced ULA might be able to read additional bytes between those
  1431 providing the bitmapped graphics data:
  1432 
  1433   Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  1434   Reads:    B                               A
  1435   Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
  1436 
  1437 These additional bytes could provide colour information for the bitmapped data
  1438 in the following character column (of 8 pixels). Since it would be desirable
  1439 to apply attribute data to the first column, the initial 8 cycles might be
  1440 configured to not produce pixel values.
  1441 
  1442 For an entire character, attribute data need only be read for the first row of
  1443 pixels for a character. The subsequent rows would have attribute information
  1444 applied to them, although this would require the attribute data to be stored
  1445 in some kind of buffer. Thus, the following access pattern would be observed:
  1446 
  1447   Reads:    A B _ B _ B _ B _ B _ B _ B _ B ...
  1448 
  1449 In modes 3 and 6, the blank display lines could be used to retrieve attribute
  1450 data:
  1451 
  1452   Reads (blank):     A _ A _ A _ A _ A _ A _ A _ A _ ...
  1453   Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  1454   Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  1455                      ...
  1456 
  1457 See below for a discussion of using this for character data as well.
  1458 
  1459 A whole byte used for colour information for a whole character would result in
  1460 a choice of 256 colours, and this might be somewhat excessive. By only reading
  1461 attribute bytes at every other opportunity, a choice of 16 colours could be
  1462 applied individually to two characters.
  1463 
  1464   Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  1465   Reads:    B               A               B               -
  1466   Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
  1467 
  1468 Further reductions in attribute data access, offering 4 colours for every
  1469 character in a four character block, for example, might also be worth
  1470 considering.
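
As a sketch of how one attribute byte might be shared between characters,
the following splits a byte into equal bit fields. The ordering (leftmost
character in the most significant bits) is an assumption made here purely
for illustration; the text above does not specify a layout.

```python
def split_attribute(byte, characters):
    """Split an attribute byte into one field per character.

    characters=1 gives 256 colours for one character, characters=2 gives
    16 colours each, characters=4 gives 4 colours each.
    """
    bits = 8 // characters
    mask = (1 << bits) - 1
    return [(byte >> (8 - bits * (i + 1))) & mask for i in range(characters)]
```

For instance, split_attribute(0xA5, 2) yields the two 4-bit values &A and &5.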
  1471 
  1472 Consider the following configurations for screen modes with a colour depth of
  1473 1 bit per pixel for bitmap information:
  1474 
  1475   Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  1476   ------------  -------  -------  ---------  ---------  -------  ------------
  1477   320           40       x2       40         40         256      &5300
  1478   320           40       x2       40         20         16       &5580 -> &5500
  1479   320           40       x2       40         10         4        &56C0 -> &5600
  1480   208           26       x3       26         26         256      &62C0 -> &6200
  1481   208           26       x3       26         13         16       &6460 -> &6400
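
The screen start figures can be reproduced with the sketch below. It assumes
256 scanlines of bitmap data and 32 rows of attribute data below &8000, and
rounding down to a 256-byte page boundary; these assumptions are inferred
from the figures in the table rather than stated in the text.

```python
def screen_start(bitmap_bytes, attribute_bytes,
                 scanlines=256, rows=32, top=0x8000):
    """Return (exact start, start rounded down to a page boundary)."""
    size = bitmap_bytes * scanlines + attribute_bytes * rows
    exact = top - size
    return exact, exact & 0xFF00


if __name__ == "__main__":
    for b, a in [(40, 40), (40, 20), (40, 10), (26, 26), (26, 13)]:
        exact, rounded = screen_start(b, a)
        print(b, a, "&%04X -> &%04X" % (exact, rounded))
```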
  1482 
  1483 Enhancement: Text-Only Modes using Character and Attribute Data
  1484 ---------------------------------------------------------------
  1485 
  1486 In modes 3 and 6, the blank display lines could be used to retrieve character
  1487 and attribute data instead of trying to insert it between bitmap data accesses,
  1488 but this data would then need to be retained:
  1489 
  1490   Reads:    A C A C A C A C A C A C A C A C ...
  1491   Reads:    B _ B _ B _ B _ B _ B _ B _ B _ ...
  1492 
  1493 Only attribute (A) and character (C) reads would require screen memory
  1494 storage. Bitmap data reads (B) would involve either accesses to memory to
  1495 obtain character definition details or could, at the cost of special storage
  1496 in the ULA, involve accesses within the ULA that would then free up the RAM.
  1497 However, the CPU would not benefit from having any extra access slots due to
  1498 the limitations of the RAM access mechanism.
  1499 
  1500 A scheme without caching might be possible. The same line of memory addresses
  1501 might be visited over and over again for eight display lines, with an index
  1502 into the bitmap data being incremented from zero to seven. The access patterns
  1503 would look like this:
  1504 
  1505   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 0)
  1506   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 1)
  1507   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 2)
  1508   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 3)
  1509   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 4)
  1510   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 5)
  1511   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 6)
  1512   Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 7)
  1513 
  1514 The bandwidth requirements would be the sum of the accesses to read the
  1515 character values (repeatedly) and those to read the bitmap data to reproduce
  1516 the characters on screen.
  1517 
  1518 Enhancement: 40-Column Text Modes by Interleaving Screen and Bitmap Accesses
  1519 ----------------------------------------------------------------------------
  1520 
  1521 This is a simplified form of the above interleaved character/bitmap reading
  1522 scheme. It was also suggested in a discussion here:
  1523 
  1524 https://stardot.org.uk/forums/viewtopic.php?p=393243#p393243
  1525 
  1526 The ULA could be run in high-bandwidth mode to fetch character codes from
  1527 screen memory in one cycle and then to use the character code to look up a
  1528 pixel row of a character bitmap, reading that bitmap slice in the following
  1529 cycle. The bitmap would be converted to pixel values that would then be
  1530 emitted over the subsequent two cycles concurrently with the preparation of
  1531 the next character's pixels.
  1532 
  1533   2MHz cycle: 0 1 2 3 4 5 ...
  1534   Reads:      C B C B C B ...
  1535   Pixels:         a   b   ...
  1536 
  1537 The memory access to bitmap data would be computed as follows, assuming the
  1538 normal eight pixel height and single-byte encoding of character bitmaps:
  1539 
  1540   bitmap address = bitmap table base + (character code * 8) + index
  1541 
  1542 Each successive pixel row on the screen would expose the appropriate row in
  1543 the character bitmap, with this index looping from 0 to 7 repeatedly as shown
  1544 previously. Spacing between character lines could be introduced as already
  1545 done in MODE 6.
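
The address computation and the alternating C/B access pattern can be
modelled as below. This is a software sketch of what the ULA would do in
hardware; the function names, the use of a bytearray for RAM and the example
addresses are all illustrative assumptions.

```python
def bitmap_address(table_base, character_code, index):
    """Address of one pixel row (index 0 to 7) of a character bitmap."""
    return table_base + character_code * 8 + index


def scanline_reads(row_address, columns, table_base, index, memory):
    """List the (kind, address) reads made for one scanline of text."""
    reads = []
    for column in range(columns):
        c_address = row_address + column
        code = memory[c_address]                  # C: fetch character code
        reads.append(("C", c_address))
        reads.append(("B", bitmap_address(table_base, code, index)))
    return reads


if __name__ == "__main__":
    ram = bytearray(0x8000)
    ram[0x7B00] = 65                              # "A" in the first column
    for kind, address in scanline_reads(0x7B00, 2, 0x7700, 3, ram):
        print(kind, "&%04X" % address)
```

Each character column costs two reads per scanline, which is why a 40 column
mode of this kind uses all 80 accesses available in the high-bandwidth case.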
  1546 
  1547 Character bitmap data would be stored in RAM, since this is the only possible
  1548 source of data for the ULA as delivered. The use of ROM would require changes
  1549 to the broader system architecture. Thus, the total memory requirements of
  1550 such a mode would be the locations for character positions plus the storage
  1551 requirements of the bitmaps to be supported.
  1552 
  1553   Columns  Rows  Screen size  Bitmaps  Bitmaps size  Total size
  1554   -------  ----  -----------  -------  ------------  ----------
  1555   40       25    1000         256      2048          3048
  1556   40       25    1000         128      1024          2024
  1557   40       25    1000         96       768           1768
  1558   40       32    1280         256      2048          3328
  1559   40       32    1280         128      1024          2304
  1560   40       32    1280         96       768           2048
  1561 
  1562 The simplest arrangement would involve bitmap definitions for all 256 possible
  1563 character codes, demanding a total of around 3K of RAM. Reducing the number of
  1564 supported bitmaps to 96 (codes 32 to 127 inclusive) would bring this total to
  1565 a maximum of 2K, but this would incur additional complexity in the ULA itself
  1566 if the codes not corresponding to bitmaps were to be specially mapped to, say,
  1567 the bitmap for the space character or to a null character.
  1568 
  1569 With the screen start address controllable, it is conceivable that with a
  1570 256-entry bitmap table, the screen memory could be made to overlap the bitmap
  1571 table for bitmaps not likely to be used. For example, the bitmap table might
  1572 be situated at &7700, with this leaving enough space for 128 entries (&400 or
  1573 1024 bytes) and a 40x32 text screen (&500 or 1280 bytes):
  1574 
  1575   &8000 +---------------+---------------+
  1576   &7F00 +---------------+               |
  1577         |               |    Display    |
  1578         | Bitmaps (128) |    (40x32)    |
  1579         |               |               |
  1580   &7B00 +---------------+---------------+
  1581         |               |
  1582         | Bitmaps (128) |
  1583         |               |
  1584   &7700 +---------------+
  1585 
  1586 Care would then need to be taken to avoid the use of codes from 128 to 255 in
  1587 the screen memory as these would replicate character data as bitmap data.
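
Which character codes are affected by such an overlap can be worked out with
a small sketch. The function below is hypothetical and simply compares the
address range of each 8-byte bitmap with the screen memory range.

```python
def overlapping_codes(table_base, screen_start, screen_size, entries=256):
    """Return the character codes whose bitmaps overlap screen memory."""
    screen_end = screen_start + screen_size       # one past the last byte
    codes = []
    for code in range(entries):
        first = table_base + code * 8
        last = first + 7
        if first < screen_end and last >= screen_start:
            codes.append(code)
    return codes


if __name__ == "__main__":
    codes = overlapping_codes(0x7700, 0x7B00, 40 * 32)
    print(codes[0], codes[-1])
```

With the table at &7700 and a 40x32 display at &7B00, the overlapping codes
are exactly 128 to 255, matching the diagram.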
  1588 
  1589 Enhancement: MODE 7 Emulation using Character Attributes
  1590 --------------------------------------------------------
  1591 
  1592 If the scheme of applying attributes to character regions were employed to
  1593 emulate MODE 7, in conjunction with the MODE 6 display technique, the
  1594 following configuration would be required:
  1595 
  1596   Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  1597   ------------  -------  ----  ---------  ---------  -------  ------------
  1598   320           40       25    40         20         16       &5ECC -> &5E00
  1599   320           40       25    40         10         4        &5FC6 -> &5F00
  1600 
  1601 Although this requires much more memory than MODE 7 (8500 bytes versus MODE
  1602 7's 1000 bytes), it does not need much more memory than MODE 6, and it would
  1603 at least make a limited 40-column multicolour mode available as a substitute
  1604 for MODE 7.
  1605 
  1606 Using the text-only enhancement with caching of data or with repeated reads of
  1607 the same character data line for eight display lines, the storage requirements
  1608 would be diminished substantially:
  1609 
  1610   Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
  1611   ------------  -------  ----  ---------  ---------  -------  ------------
  1612   320           40       25    40         20         16       &7A24 -> &7A00
  1613   320           40       25    40         10         4        &7B1E -> &7B00
  1614   320           40       25    40         5          2        &7B9B -> &7B00
  1615   320           40       25    40         0          (2)      &7C18 -> &7C00
  1616   640           80       25    80         40         16       &7448 -> &7400
  1617   640           80       25    80         20         4        &763C -> &7600
  1618   640           80       25    80         10         2        &7736 -> &7700
  1619   640           80       25    80         0          (2)      &7830 -> &7800
  1620 
  1621 Note that the colours describe the locally defined attributes for each
  1622 character. When no attribute information is provided, the colours are defined
  1623 globally.
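
The storage figures for both kinds of configuration follow from the same
calculation, sketched here under the assumption that the display ends at
&8000 and that the start address is rounded down to a page boundary. The
lines_per_row parameter is an invention of this sketch: 8 for bitmapped
character rows, 1 where only character codes are stored.

```python
def mode_storage(rows, data_bytes, attribute_bytes, lines_per_row,
                 top=0x8000):
    """Return (size in bytes, exact start, page-aligned start)."""
    size = rows * (data_bytes * lines_per_row + attribute_bytes)
    exact = top - size
    return size, exact, exact & 0xFF00


if __name__ == "__main__":
    # Bitmapped, 40 columns, 20 attribute bytes per row: 8500 bytes, &5ECC.
    print(mode_storage(25, 40, 20, 8))
    # Text-only, 40 columns, 10 attribute bytes per row: 1250 bytes, &7B1E.
    print(mode_storage(25, 40, 10, 1))
```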
  1624 
  1625 Enhancement: Character Generator Support and Vertical Scaling
  1626 -------------------------------------------------------------
  1627 
  1628 When generating a picture, the ULA traverses screen memory, obtaining 40 or 80
  1629 bytes of pixel data for each scanline. It then proceeds to the next row of
  1630 pixel data for each successive scanline, with the exception of the text modes
  1631 where scanlines may be blank (for which the row address does not advance).
  1632 This arrangement provides a conventional bitmapped graphics display.
  1633 
  1634 However, the ULA could instead facilitate the use of character generators. The
  1635 principles involved can be demonstrated by the Jafa Mode 7 Mark 2 Display Unit
  1636 expansion for the Electron which feeds the pixel data from a MODE 4 screen to
  1637 a SAA5050 character generator to create a MODE 7 display. The solution adopted
  1638 involves the replication of 40 bytes of character data across as many pixel
  1639 rows as is necessary for the character generator to receive the appropriate
  1640 character data for all scanlines in any given character row. If only a single
  1641 40-byte row of character data were to be present for the first scanline of a
  1642 character row, the character generator would only produce the first scanline
  1643 (or the uppermost pixels of the characters) correctly, with the rest of the
  1644 character shapes being ill-defined.
  1645 
  1646 Here, the ULA could facilitate the use of memory-efficient character mode
  1647 representations (such as MODE 7) by holding the row address for a number of
  1648 scanlines, thus providing the same row of screen data for those scanlines,
  1649 then advancing to the next row. Visualised in terms of pixel data, it would be
  1650 like providing a display with a very low vertical resolution. Indeed, being
  1651 able to reduce the vertical resolution of a display mode by a factor of eight
  1652 or ten would be equivalent to the above character generation technique in
  1653 terms of the ULA's screen reading activities.
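
Holding the row address in this way amounts to an integer division of the
scanline number, as in the following sketch. The names and the example
screen start address are assumptions for illustration only.

```python
def row_address(screen_start, bytes_per_row, scanlines_per_row, scanline):
    """Address of the row of data presented on a given scanline."""
    return screen_start + (scanline // scanlines_per_row) * bytes_per_row


if __name__ == "__main__":
    # 40 bytes per character row, each row held for ten scanlines.
    for scanline in (0, 9, 10, 19, 20):
        print(scanline, "&%04X" % row_address(0x7C00, 40, 10, scanline))
```

With scanlines_per_row set to 1, this reduces to the conventional case of
one new row of data per scanline.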
  1654 
  1655 By combining this vertical scaling or scanline replication with a circuit
  1656 switchable between bitmapped graphics output and character graphics output,
  1657 MODE 7 support could be made available, potentially as a hardware option
  1658 separate from the ULA.
  1659 
  1660 Enhancement: Compressed Character Data
  1661 --------------------------------------
  1662 
  1663 Another observation about text-only modes is that they only need to store a
  1664 restricted set of bitmapped data values. Encoding this set of values in a
  1665 smaller unit of storage than a byte could possibly help to reduce the amount
  1666 of storage and bandwidth required to reproduce the characters on the display.
  1667 
  1668 Enhancement: High Resolution Graphics and Larger Colour Depths
  1669 --------------------------------------------------------------
  1670 
  1671 Screen modes with higher resolutions and larger colour depths might be
  1672 possible, but this would in most cases involve the allocation of more screen
  1673 memory, and the ULA would probably then be obliged to page in such memory for
  1674 the CPU to be able to sensibly access it all. Higher resolutions would also
  1675 involve a faster pixel clock.
  1676 
  1677 However, we may consider supporting a doubled colour depth, and the higher
  1678 bandwidth it needs, using a ULA with an 8-bit data bus to access the RAM and
  1679 two "page mode" transfers per 2MHz cycle. If such transfers were to
  1680 access consecutive bytes in the same memory region (for example, bytes &3000
  1681 and &3001) this would require a change to the arrangement of screen memory,
  1682 also incurring changes to the memory map for larger modes:
  1683 
  1684  (&3000 &3001) (&3010 &3011) ...
  1685  (&3002 &3003) (&3012 &3013)
  1686  ...           ...
  1687  (&300E &300F) (&301E &301F)
  1688 
  1689 If such transfers were to access two adjacent columns of bytes (for example,
  1690 bytes &3000 and &3008), this would still require a change in the step size
  1691 across the screen memory and incur memory map changes for larger modes, and
  1692 the method for programs to update the screen would be more complicated:
  1693 
  1694  (&3000 &3008) (&3010 &3018) ...
  1695  (&3001 &3009) (&3011 &3019)
  1696  ...           ...
  1697  (&3007 &300F) (&3017 &301F)
  1698 
  1699 However, such transfers could instead map the device address bit that is
  1700 toggled between transfers to the most significant system memory address bit.
  1701 Thus, bits in adjacent locations within each RAM device would actually reside
  1702 in different memory regions:
  1703 
  1704  (&3000 &B000) (&3008 &B008) ...
  1705  (&3001 &B001) (&3009 &B009)
  1706  ...           ...
  1707  (&3007 &B007) (&300F &B00F)
  1708 
  1709 Since &B000 is simply &3000 combined with &8000, the latter value asserting
  1710 the uppermost address bit, address &B000 can be considered as &3000 in an
  1711 upper memory bank.
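
The three arrangements shown above can be compared with a short sketch that
computes the pair of addresses read in one 2MHz cycle. The function names are
hypothetical; "column" counts pairs of transfers across the screen and "row"
is the scanline (0 to 7) within a character row.

```python
def consecutive_pair(base, column, row):
    """Pairs of consecutive bytes: 2 bytes per scanline, 16 per column."""
    first = base + column * 16 + row * 2
    return first, first + 1


def adjacent_column_pair(base, column, row):
    """Pairs drawn from two adjacent 8-byte columns."""
    first = base + column * 16 + row
    return first, first + 8


def banked_pair(base, column, row):
    """Second byte taken from the upper bank by asserting address bit 15."""
    first = base + column * 8 + row
    return first, first | 0x8000


for layout in (consecutive_pair, adjacent_column_pair, banked_pair):
    pairs = [layout(0x3000, column, 7) for column in (0, 1)]
    print(layout.__name__, [(hex(a), hex(b)) for a, b in pairs])
```

Only the banked arrangement keeps the familiar step of eight bytes between
columns in the lower bank, which is why programs written for the existing
layout would be least affected by it.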
  1712 
  1713 Other mechanisms might be employed to allow programs to access the uppermost
  1714 bank, but the ULA would be able to access it trivially and unconditionally.
  1715 
  1716 Enhancement: Assembling a Display from Separate Display Planes
  1717 --------------------------------------------------------------
  1718 
  1719 Continuing from the use of separate memory regions for higher bandwidth modes,
  1720 one can consider a memory layout where modes 1 and 2 would employ two regions
  1721 that individually resemble modes 4 and 5 respectively. Programs would be able
  1722 to populate two copies of the screen memory for a low-bandwidth mode in order
  1723 to produce a single screen memory region for the corresponding high-bandwidth
  1724 mode. This would allow a seamless transition between displays with different
  1725 numbers of colours without needing to redraw the display.
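
As a rough illustration of how two planes resembling MODE 4 might be combined
into a four-colour display, the sketch below merges one byte from each plane
into eight two-bit colour numbers. The bit ordering (leftmost pixel in bit 7)
and the choice of the upper plane as the more significant bit are assumptions
made for this example, as is the function name.

```python
def combine_planes(low_byte, high_byte):
    """Merge two 1 bit-per-pixel bytes into eight 2-bit colour numbers.

    Assumes the leftmost pixel is held in bit 7 of each byte and that
    high_byte supplies the more significant colour bit.
    """
    pixels = []
    for bit in range(7, -1, -1):
        low = (low_byte >> bit) & 1
        high = (high_byte >> bit) & 1
        pixels.append((high << 1) | low)
    return pixels


print(combine_planes(0b10100000, 0b11000000))
```

With the upper plane left at zero, the result only uses colours 0 and 1, so
the two-colour image is preserved when the extra plane is switched in.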
  1726 
  1727 Enhancement: Genlock Support
  1728 ----------------------------
  1729 
  1730 The ULA generates a video signal in conjunction with circuitry producing the
  1731 output features necessary for the correct display of the screen image.
  1732 However, it appears that the ULA drives the video synchronisation mechanism
  1733 instead of reacting to an existing signal. Genlock support might be possible
  1734 if the ULA were made to be responsive to such external signals, resetting its
  1735 address generators upon receiving synchronisation events.
  1736 
  1737 Enhancement: Improved Sound
  1738 ---------------------------
  1739 
  1740 The standard ULA reserves &FE*6 for sound generation and cassette input/output
  1741 (with bits 1 and 2 of &FE*7 being used to select either sound generation or
  1742 cassette I/O), thus making it impossible to support multiple channels within
  1743 the given framework. The BBC Micro controls its sound chip through the system
  1744 VIA at &FE40-&FE4F, and an enhanced ULA could adopt this interface.
  1745 
  1746 The BBC Micro uses the SN76489 chip to produce sound, and the entire
  1747 functionality of this chip could be emulated for enhanced sound, with a subset
  1748 of the functionality exposed via the &FE*6 interface.
  1749 
  1750 See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
  1751 See: http://www.smspower.org/Development/SN76489
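
For reference when considering emulation, the SN76489 tone generators produce
a square wave whose frequency is the input clock divided by 32 and then by a
10-bit period value. The sketch below assumes the 4MHz clock supplied to the
chip in the BBC Micro; the function name is hypothetical.

```python
def sn76489_tone_hz(period, clock_hz=4_000_000):
    """Return the tone frequency for a 10-bit SN76489 period value.

    clock_hz defaults to the assumed 4MHz BBC Micro sound chip clock.
    """
    if not 1 <= period <= 0x3FF:
        raise ValueError("period must be in the range 1 to 1023")
    return clock_hz / (32 * period)


print(sn76489_tone_hz(1000))
```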
  1752 
  1753 Enhancement: Waveform Upload
  1754 ----------------------------
  1755 
  1756 As with a hardware sprite function, waveforms could be uploaded or referenced
  1757 by treating ULA locations as registers that refer to regions of memory.
  1758 
  1759 Enhancement: Sound Input/Output
  1760 -------------------------------
  1761 
  1762 Since the ULA already controls audio input/output for cassette-based data, it
  1763 would have been interesting to entertain the idea of sampling and output of
  1764 sounds through the cassette interface. However, a significant amount of
  1765 circuitry is employed to process the input signal for use by the ULA and to
  1766 process the output signal for recording.
  1767 
  1768 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11
  1769 
  1770 Enhancement: BBC ULA Compatibility
  1771 ----------------------------------
  1772 
  1773 Although some new ULA functions could be defined in a way that is also
  1774 compatible with the BBC Micro, the BBC ULA is itself incompatible with the
  1775 Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
  1776 map, but controls various functions specific to the 6845 video controller;
  1777 &FE08-F is reserved for the serial controller. It therefore becomes possible
  1778 to disregard compatibility where compatibility is already disregarded for a
  1779 particular area of functionality.
  1780 
  1781 &FE20-F maps to video ULA functionality on the BBC Micro which provides
  1782 control over the palette (using address &FE21, compared to &FE08-F on the
  1783 Electron) and other system-specific functions. Since the location usage is
  1784 generally incompatible, this region could be reused for other purposes.
  1785 
  1786 Enhancement: Increased RAM, ULA and CPU Performance
  1787 ---------------------------------------------------
  1788 
  1789 More modern implementations of the hardware might feature faster RAM coupled
  1790 with an increased ULA clock frequency in order to increase the bandwidth
  1791 available to the ULA and to the CPU in situations where the ULA is not needed
  1792 to perform work. A ULA employing a 32MHz clock would be able to complete the
  1793 retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
  1794 to access the RAM for the following 250ns even in display modes requiring the
  1795 retrieval of a byte for the display every 500ns. The CPU could, subject to
  1796 timing issues, run at 2MHz even in MODE 0, 1 and 2.
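
The timing figures above follow from simple arithmetic, sketched here. The
figure of eight ULA clock periods per byte is an assumption inferred from the
existing 16MHz clock and the 500ns needed to fetch a byte; the function names
are hypothetical.

```python
def byte_fetch_time_ns(ula_clock_hz, clocks_per_byte=8):
    """Time for the ULA to fetch one byte (two 4-bit transfers)."""
    return clocks_per_byte * 1_000_000_000 // ula_clock_hz


def cpu_window_ns(ula_clock_hz, display_period_ns=500):
    """Time left for the CPU in each display byte period."""
    return display_period_ns - byte_fetch_time_ns(ula_clock_hz)


for clock_hz in (16_000_000, 32_000_000):
    print(clock_hz, byte_fetch_time_ns(clock_hz), cpu_window_ns(clock_hz))
```

At 16MHz the CPU is left with no time at all in the 80-byte modes, whereas at
32MHz it gains a 250ns window in every 500ns period.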
  1797 
  1798 A scheme such as that described above would have a similar effect to the
  1799 scheme employed in the BBC Micro, although the latter made use of RAM with a
  1800 wider bandwidth in order to complete memory transfers within 250ns and thus
  1801 permit the CPU to run continuously at 2MHz.
  1802 
  1803 Higher bandwidth could potentially be used to implement exotic features such
  1804 as RAM-resident hardware sprites or indeed any feature demanding RAM access
  1805 concurrent with the production of the display image.
  1806 
  1807 Enhancement: Multiple CPU Stacks and Zero Pages
  1808 -----------------------------------------------
  1809 
  1810 The 6502 maintains a stack for subroutine calls and register storage in page
  1811 &01. Although the stack pointer can be manipulated using the TSX and TXS
  1812 instructions, thereby permitting the maintenance of multiple stack regions and
  1813 thus the potential coexistence of multiple programs each using a separate
  1814 region, only programs that make little use of the stack (perhaps avoiding
  1815 deeply-nested subroutine invocations and significant register storage) would
  1816 be able to coexist without overwriting each other's stacks.
  1817 
  1818 One way that this issue could be alleviated would involve the provision of a
  1819 facility to redirect accesses to page &01 to other areas of memory. The ULA
  1820 would provide a register that defines a physical page for the use of the CPU's
  1821 "logical" page &01, and upon any access to page &01 by the CPU, the ULA would
  1822 change the asserted address lines to redirect the access to the appropriate
  1823 physical region.
  1824 
  1825 By providing an 8-bit register, mapping to the most significant byte (MSB) of
  1826 a 16-bit address, the ULA could then replace any MSB equal to &01 with the
  1827 register value before the access is made. Where multiple programs coexist,
  1828 upon switching programs, the register would be updated to point the ULA to the
  1829 appropriate stack location, thus providing a simple memory management unit
  1830 (MMU) capability.
  1831 
  1832 In a similar fashion, zero page accesses could also be redirected so that code
  1833 could run from sideways RAM and have zero page operations redirected to "upper
  1834 memory" - for example, to page &BE (with stack accesses redirected to page
  1835 &BF, perhaps) - thereby permitting most CPU operations to occur without
  1836 inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
  1837 CPU as it contends with the ULA for memory access.
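
The redirection of stack and zero page accesses might be modelled as below.
This is a minimal sketch: the function and parameter names are hypothetical,
and the two 8-bit registers are simply passed in as arguments rather than
being held at any particular ULA location.

```python
def redirect(address, stack_page=0x01, zero_page=0x00):
    """Return the physical address for a 16-bit CPU address.

    Accesses to page &01 are sent to stack_page and accesses to page
    &00 to zero_page; all other addresses pass through unchanged. The
    defaults leave the memory map as it is on an unmodified machine.
    """
    msb = (address >> 8) & 0xFF
    lsb = address & 0xFF
    if msb == 0x01:
        msb = stack_page
    elif msb == 0x00:
        msb = zero_page
    return (msb << 8) | lsb


# A program given page &BF for its stack and page &BE for its zero page:
print(hex(redirect(0x01FF, stack_page=0xBF, zero_page=0xBE)))
print(hex(redirect(0x0070, stack_page=0xBF, zero_page=0xBE)))
```

Switching between programs would then amount to writing new values to the two
registers, with no copying of stack or zero page contents.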
  1838 
  1839 Such facilities could also be provided by a separate circuit between the CPU
  1840 and ULA in a fashion similar to that employed by a "turbo" board, but unlike
  1841 such boards, no additional RAM would be provided: all memory accesses would
  1842 occur as normal through the ULA, albeit redirected when configured
  1843 appropriately.
  1844 
  1845 ULA Pin Functions
  1846 -----------------
  1847 
  1848 The functions of the ULA pins are described in the Electron Service Manual. Of
  1849 interest to video processing are the following:
  1850 
  1851   CSYNC (low during horizontal or vertical synchronisation periods, high
  1852          otherwise)
  1853 
  1854   HS (low during horizontal synchronisation periods, high otherwise)
  1855 
  1856   RED, GREEN, BLUE (pixel colour outputs)
  1857 
  1858   CLOCK IN (a 16MHz clock input, 4V peak to peak)
  1859 
  1860   PHI OUT (a clock signal for the CPU, running at 1MHz or 2MHz, or stopped)
  1861 
  1862 More general memory access pins:
  1863 
  1864   RAM0...RAM3 (data lines to/from the RAM)
  1865 
  1866   RA0...RA7 (address lines for sending both row and column addresses to the RAM)
  1867 
  1868   RAS (row address strobe setting the row address on a negative edge - see the
  1869        timing notes)
  1870 
  1871   CAS (column address strobe setting the column address on a negative edge -
  1872        see the timing notes)
  1873 
  1874   WE (sets write enable with logic 0, read with logic 1)
  1875 
  1876   ROM (select data access from ROM)
  1877 
  1878 CPU-oriented memory access pins:
  1879 
  1880   A0...A15 (CPU address lines)
  1881 
  1882   PD0...PD7 (CPU data lines)
  1883 
  1884   R/W (indicates CPU write with logic 0, CPU read with logic 1)
  1885 
  1886 Interrupt-related pins:
  1887 
  1888   NMI (CPU request for uninterrupted 1MHz access to memory)
  1889 
  1890   IRQ (signal event to CPU)
  1891 
  1892   POR (power-on reset, resetting the ULA on a positive edge and asserting the
  1893        CPU's RST pin)
  1894 
  1895   RST (master reset for the CPU signalled on power-up and by the Break key)
  1896 
  1897 Keyboard-related pins:
  1898 
  1899   KBD0...KBD3 (keyboard inputs)
  1900 
  1901   CAPS LOCK (control status LED)
  1902 
  1903 Sound-related pins:
  1904 
  1905   SOUND O/P (sound output using internal oscillator)
  1906 
  1907 Cassette-related pins:
  1908 
  1909   CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)
  1910 
  1911   CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)
  1912 
  1913   CAS RC (detect high tone)
  1914 
  1915   CAS MO (motor relay output)
  1916 
  1917   ÷13 IN (~1200 baud clock input)
  1918 
  1919 ULA Socket
  1920 ----------
  1921 
  1922 The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.
  1923 
  1924 References
  1925 ----------
  1926 
  1927 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm
  1928 
  1929 About this Document
  1930 -------------------
  1931 
  1932 The most recent version of this document and accompanying distribution should
  1933 be available from the following location:
  1934 
  1935 http://hgweb.boddie.org.uk/ULA
  1936 
  1937 Copyright and licence information can be found in the docs directory of this
  1938 distribution - see docs/COPYING.txt for more information.