112:e0f6fdfe1e6d
2017-04-11 Paul Boddie Added notes about text-only modes, plus character and attribute value retrieval and caching.

The Acorn Electron ULA
======================

Principal Design and Feature Constraints
----------------------------------------

The features of the ULA are limited by the amount of time and resources that
can be allocated to each activity necessary to support such features given the
fundamental obligations of the unit. Maintaining a screen display based on the
contents of RAM itself requires the ULA to have exclusive access to such
hardware resources for a significant period of time. Whilst other elements of
the ULA can in principle run in parallel with this activity, they cannot also
access the RAM. Consequently, other features that might use the RAM must
accept a reduced allocation of that resource in comparison to a hypothetical
architecture where concurrent RAM access is possible.

Thus, the principal constraint for many features is bandwidth. The duration of
access to hardware resources is one aspect of this; the rate at which such
resources can be accessed is another. For example, the RAM is not fast enough
to support access more frequently than one byte per 2MHz cycle, and for screen
modes involving 80 bytes of screen data per scanline, there are no free cycles
for anything other than the production of pixel output during the active
scanline periods.

Timing
------

According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
of which are used to generate pixel data. At 50Hz, this means that 128 cycles
are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
312 ~= 128 cycles). This is consistent with the observation that each scanline
requires at most 80 bytes of data, and that the ULA is apparently busy for 40
out of 64 microseconds in each scanline.
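
The arithmetic above can be reproduced directly. This Python sketch only restates the figures quoted from the Advanced User Guide (a 2MHz clock, 50 fields per second, 312 scanlines per field); the variable names are illustrative.

```python
# Restate the per-scanline cycle budget quoted from the Advanced User Guide.
CPU_CLOCK_HZ = 2_000_000    # 2MHz clock, one RAM access opportunity per cycle
FIELD_RATE_HZ = 50          # fields per second
SCANLINES_PER_FIELD = 312   # total scanlines per field
CYCLE_US = 0.5              # one 2MHz cycle lasts 500ns

cycles_per_field = CPU_CLOCK_HZ // FIELD_RATE_HZ               # 40000
cycles_per_scanline = cycles_per_field / SCANLINES_PER_FIELD   # about 128.2

# 128 cycles of 500ns give the 64µs scanline period.
scanline_us = 128 * CYCLE_US

# Fetching 80 bytes at one byte per cycle keeps the ULA busy for 40µs.
ula_busy_us = 80 * CYCLE_US

print(cycles_per_field, round(cycles_per_scanline), scanline_us, ula_busy_us)
```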

(In fact, since the ULA is seeking to provide an image for an interlaced
625-line display, there are two "fields" involved, one providing 312
scanlines and one providing 313 scanlines. See below for a description of the
video system.)

Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
each providing two bits of each byte) using two accesses, each yielding a
half-byte, within the 500ns period of the 2MHz clock. Since the CPU and ULA
have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
effectively run at 1MHz (since every other 500ns period involves the ULA
accessing RAM). The CPU is driven by an external clock (IC8) whose 16MHz
frequency is divided by the ULA (IC1) depending on the screen mode in use.

Each 16MHz cycle is approximately 62.5ns. To access the memory, the following
patterns corresponding to 16MHz cycles are required:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:           F     S         F     S ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a b     c       a b     c     ...
      Data ops:  s         f     s         f      ...

           ~WE:  ......W                          ...
       PHI OUT:  \_______________/--------------- ...
     CPU (RAM):  L               D                ...
           RnW:  R                                ...

       PHI OUT:  \_______/-------\_______/------- ...
     CPU (ROM):  L       D       L       D        ...
           RnW:          R               R        ...

~RAS must be high for 100ns, ~CAS must be high for 50ns.
~RAS must be low for 150ns, ~CAS must be low for 90ns.
Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.

Here, "A" and "B" respectively indicate the row and first column addresses
being latched into the RAM (on a negative edge for ~RAS and ~CAS
respectively), and "C" indicates the second column address being latched into
the RAM. Presumably, the first and second half-bytes can be read at "F" and
"S" respectively, and the row and column addresses must be made available at
"a" and "b" (and "c") respectively at the latest. Data can be read at "f" and
"s" for the first and second half-bytes respectively.

For the CPU, "L" indicates the point at which an address is taken from the CPU
address bus, on a negative edge of PHI OUT, with "D" being the point at which
data may either be read or be asserted for writing, on a positive edge of PHI
OUT. Here, PHI OUT is driven at 1MHz. Since ~WE needs to be driven low for
writing or high for reading, and thus propagates RnW from the CPU, it would
need to be set before data is retrieved; according to the TM4164EC4 datasheet,
this can be done as late as the point at which the column address is presented
and ~CAS brought low.

The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that ~RAS must be
held low for at least 150ns and that ~CAS must be held low for at least 90ns
before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, and "C" to
"S" is 1.5 cycles, each just exceeding the corresponding access time.
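
As a cross-check, the sketch below converts the TM4164EC4-15 maximum access times into 16MHz cycles and compares them with the intervals read off the diagram. It assumes, as the text does, that an interval at least as long as the quoted access time is sufficient; the names are illustrative rather than taken from any datasheet.

```python
CYCLE_NS = 1000 / 16  # one 16MHz cycle = 62.5ns

# Maximum access times for the TM4164EC4-15, as quoted above.
ROW_ACCESS_NS = 150
COLUMN_ACCESS_NS = 90

def to_cycles(ns: float) -> float:
    """Convert a duration in nanoseconds to 16MHz cycles."""
    return ns / CYCLE_NS

# Intervals read off the timing diagram (in 16MHz cycles) and the access time
# each one must cover.
intervals = {
    "A->F": (2.5, ROW_ACCESS_NS),     # ~RAS falls to first half-byte
    "B->F": (1.5, COLUMN_ACCESS_NS),  # first ~CAS falls to first half-byte
    "C->S": (1.5, COLUMN_ACCESS_NS),  # second ~CAS falls to second half-byte
}

for name, (available, required_ns) in intervals.items():
    required = to_cycles(required_ns)
    print(f"{name}: {available} cycles available, {required:.2f} needed,"
          f" ok={available >= required}")
```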

Note that the Service Manual refers to the negative edge of RAS and CAS, but
the datasheet for the similar TM4164EC4 product shows latching on the negative
edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
"page mode" provides the appropriate behaviour for that particular product.

The CPU, when accessing the RAM alone, apparently does not make use of the
vacated "slot" that the ULA would otherwise use (when interleaving accesses in
MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
accessing ROM (and potentially sideways RAM). The principal limitation is the
amount of time needed between issuing an address and receiving an entire byte
from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
4 cycles that would be required for 2MHz operation.

See: Acorn Electron Advanced User Guide
See: Acorn Electron Service Manual
     http://acorn.chriswhy.co.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438

CPU Clock Notes
---------------

"The 6502 receives an external square-wave clock input signal on pin 37, which
is usually labeled PHI0. [...] This clock input is processed within the 6502
to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
through two inverters and a push-pull amplifier. The same network of
transistors within the 6502 which generates PHI2 is also tied to PHI1, and
generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
available to external devices is so that they know when they can access the
CPU. When PHI1 is high, this means that external devices can read from the
address bus or data bus; when PHI2 is high, this means that external devices
can write to the data bus."

See: http://lateblt.livejournal.com/88105.html

"The 6502 has a synchronous memory bus where the master clock is divided into
two phases (Phase 1 and Phase 2). The address is always generated during Phase
1 and all memory accesses take place during Phase 2."

See: http://www.jmargolin.com/vgens/vgens.htm

Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
Phase 1" means when PHI0 - really PHI2 - is high and "during Phase 2" means
when PHI1 is high.

Bandwidth Figures
-----------------

Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
total lines, with 80 cycles occurring in the active periods of display
scanlines, the following bandwidth calculations (in bytes per field) can be
performed:

Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:    80 cycles * 256 lines
     = 20480 bytes
CPU:    48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:    80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:    48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:    40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:    40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes

Here, the division by 2 for CPU accesses indicates that the CPU only uses
every other access opportunity even in uncontended periods. See the 2MHz RAM
Access enhancement below for bandwidth calculations that consider this
limitation removed.
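
The figures above can be regenerated with a short Python sketch. It follows the same model (128 cycles per scanline, 80 of them in the active period, 312 lines per field) and so inherits the same assumptions. The cpu_every_cycle flag is a hypothetical addition that simply drops the division by 2; the numbers it produces are only an indication and are not taken from the text.

```python
# Regenerate the bandwidth figures above, in bytes per field (one byte per
# 2MHz cycle). cpu_every_cycle=True is a hypothetical variant that drops the
# division by 2, i.e. lets the CPU use every cycle the ULA does not claim.
CYCLES_PER_LINE = 128    # 2MHz cycles per scanline
ACTIVE_CYCLES = 80       # cycles in the active part of a display scanline
TOTAL_LINES = 312        # scanlines per field

def bandwidth(ula_cycles: int, display_lines: int,
              cpu_every_cycle: bool = False) -> tuple[int, int]:
    """Return (ULA bytes, CPU bytes) per field for one screen mode."""
    divisor = 1 if cpu_every_cycle else 2
    ula = ula_cycles * display_lines
    # Active-period cycles left by the ULA (40 in MODE 4 to 6, none in MODE 0
    # to 3) go to the CPU in full, as in the figures above.
    shared = ACTIVE_CYCLES - ula_cycles
    idle = CYCLES_PER_LINE - ACTIVE_CYCLES    # 48 cycles outside the active period
    cpu = (shared + idle // divisor) * display_lines
    cpu += CYCLES_PER_LINE // divisor * (TOTAL_LINES - display_lines)
    return ula, cpu

MODES = {
    "MODE 0, 1, 2": (80, 256),
    "MODE 3": (80, 24 * 8),
    "MODE 4, 5": (40, 256),
    "MODE 6": (40, 24 * 8),
}

print("Total:", CYCLES_PER_LINE * TOTAL_LINES)
for name, (ula_cycles, lines) in MODES.items():
    print(name, bandwidth(ula_cycles, lines), bandwidth(ula_cycles, lines, True))
```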

Video Timing
------------

According to 8.7 in the Service Manual, and the PAL Wikipedia page,
approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
(including the "colour burst"), and 1.65µs for the "front porch", totalling
12.05µs and thus leaving 51.95µs for the active video signal for each
scanline. As the Service Manual suggests in the oscilloscope traces, the
display information is transmitted more or less centred within the active
video period since the ULA will only be providing pixel data for 40µs in each
scanline.

Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
each scanline can be divided into 1024 cycles, although only 640 at most are
actively used to provide pixel data. Pixel data production should only occur
within a certain period on each scanline, approximately 262 cycles after the
start of hsync:

  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262

By choosing a number divisible by 8, the RAM access mechanism can be
synchronised with the pixel production. Thus, 256 is a more appropriate start
cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
document) occurs at cycle 0.

To summarise:

  HS signal starts at cycle 0 on each horizontal scanline
  HS signal ends approximately 4µs later at cycle 64
  Pixel data starts approximately 12µs later at cycle 256

"Re: Electron Memory Contention" provides measurements that appear consistent
with these calculations.
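
The horizontal derivation can be expressed in a few lines of Python. This sketch uses only the durations quoted above and assumes that "divisible by 8" means rounding the ideal start cycle down to the nearest multiple of 8; the end-of-pixel-data cycle is derived from the 640-cycle figure rather than stated in the text.

```python
# Horizontal timing in 16MHz cycles of 62.5ns, using the durations quoted above.
CYCLE_US = 0.0625
LINE_US = 64.0
SYNC_US, BACK_PORCH_US, FRONT_PORCH_US = 4.7, 5.7, 1.65
PIXEL_DATA_US = 40.0

cycles_per_line = round(LINE_US / CYCLE_US)                       # 1024
active_us = LINE_US - (SYNC_US + BACK_PORCH_US + FRONT_PORCH_US)  # 51.95µs
margin_us = (active_us - PIXEL_DATA_US) / 2                       # 5.975µs each side
start_us = SYNC_US + BACK_PORCH_US + margin_us                    # 16.375µs

ideal_start = round(start_us / CYCLE_US)                     # 262
aligned_start = ideal_start // 8 * 8                         # 256, a multiple of 8
hs_end = round(4.0 / CYCLE_US)                               # 64, end of the 4µs HS pulse
pixel_end = aligned_start + round(PIXEL_DATA_US / CYCLE_US)  # 256 + 640

print(cycles_per_line, ideal_start, aligned_start, hs_end, pixel_end)
```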

The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync), which also lasts for 2.5
lines. Thus, the first visible scanline on the first field of a frame occurs
half way through the 23rd scanline period measured from the start of vsync
(indicated by "V" in the diagrams below):

                                        10                  20    23
  Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    Line from 1:       0                                          22 3
 Line on screen: .:::::VVVVV:::::                                   12233445566
                  |_________________________________________________|
                           25 line vertical blanking period

In the second field of a frame, the first visible scanline coincides with the
24th scanline period measured from the start of line 313 in the frame:

               310                                                 336
  Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
  Line from 313:       0                                            23 4
 Line on screen: 88:::::VVVVV::::                                    11223344
               288 |                                                 |
                   |_________________________________________________|
                            25 line vertical blanking period

In order to consider only full lines, we might consider the start of each
frame to occur 23 lines after the start of vsync.

Again, it is likely that pixel data production should only occur on scanlines
within a certain period on each frame. The "625/50" document indicates that
only a certain region is "safe" to use, suggesting a vertically centred region
with approximately 15 blank lines above and below the picture. However, the
"PAL TV timing and voltages" document suggests 28 blank lines above and below
the picture. This would centre the 256 lines within the 312 lines of each
field and thus provide a start of picture approximately 5.5 or 5 lines after
the end of the blanking period or 28 or 27.5 lines after the start of vsync.

To summarise:

  CSYNC signal starts at cycle 0
  CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
  Start of picture occurs approximately 1632µs (25.5 lines) later at cycle 28672
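
The vertical summary can be checked in the same way. The sketch below assumes 1024 cycles of 62.5ns per scanline, as derived for the horizontal timing, and the 28-line offset from the start of vsync used in the summary; it shows that the interval between the end of CSYNC and the start of the picture is 25.5 lines.

```python
# Vertical timing in 16MHz cycles, counted from the start of vsync (CSYNC).
CYCLES_PER_LINE = 1024     # 64µs scanline / 62.5ns
CYCLE_US = 0.0625
LINE_US = 64

blanking_ms = 25 * LINE_US / 1000          # 25-line blanking period: 1.6ms
blank_lines = (312 - 256) // 2             # 28 lines, centring 256 in 312

csync_end = int(2.5 * CYCLES_PER_LINE)          # 2560
picture_start = blank_lines * CYCLES_PER_LINE   # 28672

gap_cycles = picture_start - csync_end
print(blanking_ms, csync_end, csync_end * CYCLE_US,
      picture_start, gap_cycles * CYCLE_US, gap_cycles / CYCLES_PER_LINE)
```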

See: http://en.wikipedia.org/wiki/PAL
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
     http://lipas.uwasa.fi/~f76998/video/modes/
See: PAL TV timing and voltages
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
See: Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
See: Re: Electron Memory Contention
     http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109

RAM Integrated Circuits
-----------------------

Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.

The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM41464 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.

The TM4164EC4 series combines 4 64K x 1b units into a single package and
appears similar to the TM4164EA4 featured on the Electron's circuit diagram
(in the Advanced User Guide but not the Service Manual), and it also has 22
pins providing 3 additional inputs and 3 additional outputs over the 16 pins
of the individual 4164-15 modules, presumably allowing concurrent access to
the packaged memory units.

As far as currently available replacements are concerned, the NTE4164 is a
potential candidate: according to the Vetco Electronics entry, it is
supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
parts include the NTE2164 and the NTE6664, both of which appear to have
largely the same performance and connection characteristics. Meanwhile, the
NTE21256 appears to be a 16-pin replacement with four times the capacity that
maintains the single data input and output pins. Using the NTE21256 as a
replacement for all ICs combined would be difficult because of the single bit
output.

Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these are
different and actually appear to be provided under the 41574 product code (but
are listed under 41464-10) and appear to be replacements for the TM4164EC4:
the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
employing 4 pins for both input and output.

            Pins    I/O pins    Row access  Column access
            ----    --------    ----------  -------------
TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
KM41464AP   18      4           150ns (15)  75ns (15)
NTE21256    16      1 + 1       150ns       75ns
HYB 4164-2  16      1 + 1       150ns       100ns
µPD41464    18      4           120ns (12)  60ns (12)

See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
     http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
See: Dynamic RAMS
     http://www.unicornelectronics.com/IC/DYNAMIC.html
See: New old stock 8x 4164 chips
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
     http://www.vetco.net/catalog/product_info.php?products_id=2806
See: NTE4164 - IC-NMOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=3680
See: NTE21256 - IC-256K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=2799
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
See: NTE6664 - IC-MOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=5213
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
See: 4164-150: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
See: 41464-10: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1

Interrupts
----------

The ULA generates IRQs (maskable interrupts) according to certain conditions
and these conditions are controlled by location &FE00:

  * Vertical sync (bottom of displayed screen)
  * 50Hz real time clock
  * Transmit data empty
  * Receive data full
  * High tone detect

The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory, restoring
the normal function of the ULA.

ROM Paging
----------

Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
mappings exist:

   8    keyboard
   9    keyboard (duplicate)
  10    BASIC ROM
  11    BASIC ROM (duplicate)

Paging in a ROM involves the following procedure:

 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
    2, corresponding to ROM number 8+n, choosing n from 4 to 7 such that one
    of ROMs 12 to 15 is selected.
 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
    whilst writing the desired ROM number n in bits 0 to 2.
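
The procedure can be sketched as a function returning the values written to &FE05 for a given ROM. This is a hypothetical helper, not an OS routine: it models only bits 0 to 3, leaves bits 4 to 7 at zero, and arbitrarily picks ROM 12 for the first step when selecting ROMs 0 to 7. ROMs 8 to 11 are rejected because the procedure above does not describe them.

```python
def rom_select_writes(rom: int) -> list[int]:
    """Hypothetical helper: values written to &FE05 to page in a ROM.

    Only bits 0 to 3 are modelled; bits 4 to 7 are left at zero.
    """
    if 12 <= rom <= 15:
        # Step 1 alone: page enable (bit 3) plus n = rom - 8 in bits 0 to 2.
        return [0x08 | (rom - 8)]
    if 0 <= rom <= 7:
        # Step 1 selecting ROM 12 (any of 12 to 15 would do), then step 2
        # with bit 3 clear and the ROM number in bits 0 to 2.
        return [0x08 | 4, rom]
    raise ValueError("ROMs 8 to 11 are not covered by the procedure above")

for rom in (15, 12, 3):
    print(rom, ["&%02X" % value for value in rom_select_writes(rom)])
```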

See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686

Shadow/Expanded Memory
----------------------

The Electron exposes all sixteen address lines and all eight data lines
through the expansion bus. Using such lines, it is possible to provide
additional memory - typically sideways ROM and RAM - on expansion cards and
through cartridges, although the official cartridge specification provides
fewer address lines and only seeks to provide access to memory in 16K units.

Various modifications and upgrades were developed to offer "turbo"
capabilities to the Electron, permitting the CPU to access a separate 8K of
RAM at 2MHz, presumably using additional logic to prevent access to the low
8K of RAM normally accessed via the ULA. However, an enhanced ULA might
support independent CPU access to memory over the expansion bus by allowing
itself to be relieved of providing access to memory, potentially for a range
of addresses, so that the CPU could communicate with external memory
uninterrupted.

Sideways RAM/ROM and Upper Memory Access
----------------------------------------

Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above - the upper region of memory -
at 2MHz independently of any access to RAM that the ULA might be performing,
only blocking the CPU if it attempts to access addresses of &7FFF and below
during any ULA memory access - the lower region of memory - by stopping or
stalling its clock.

Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
CPU clock if the line goes low, when the CPU is attempting to access the lower
region of memory.
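
A minimal model of this clock gating might look like the following. It assumes only the A15 test described above and ignores the finer timing of when the ULA releases the RAM; cpu_stalled is a hypothetical name.

```python
def cpu_stalled(address: int, ula_fetching: bool) -> bool:
    """Hypothetical model of the ULA's clock gating.

    The CPU clock is only held when the ULA is fetching screen data and the
    CPU addresses the lower region (A15 low: &0000 to &7FFF). Accesses to
    &8000 and above (ROM, sideways RAM) are never held.
    """
    a15 = (address >> 15) & 1
    return ula_fetching and a15 == 0

for address in (0x3000, 0x7FFF, 0x8000, 0xC000):
    print("&%04X" % address, cpu_stalled(address, ula_fetching=True))
```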

Hardware Scrolling (and Enhancement)
------------------------------------

On the standard ULA, &FE02 and &FE03 map to an address with 9 significant
bits (bits 6 to 14), the least significant 6 bits being zero, thus limiting
the scrolling resolution to 64 bytes. An enhanced ULA could support a
resolution of 2 bytes using the same layout of these addresses.

|--&FE02--------------| |--&FE03--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

   XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX

Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change in 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).
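
The effect of the register layout on the usable screen start address can be modelled by masking off the unsupported bits. In this sketch the masks follow the bit numbers in the diagram (bits 6 to 14 for the standard ULA, bits 1 to 14 for the enhanced one); it does not attempt to model how those bits are packed into &FE02 and &FE03. It also derives the horizontal step of an 8-byte change from the number of bits per pixel in each mode.

```python
STANDARD_MASK = 0x7FC0   # address bits 6 to 14 (64-byte resolution)
ENHANCED_MASK = 0x7FFE   # address bits 1 to 14 (2-byte resolution)

def effective_start(address: int, mask: int) -> int:
    """Screen start address once the bits the ULA cannot hold are dropped."""
    return address & mask

# Bits per pixel for each mode; an 8-byte step moves one byte (one character
# column) horizontally, so it shifts the display by 8 / bpp pixels.
BITS_PER_PIXEL = {0: 1, 1: 2, 2: 4, 3: 1, 4: 1, 5: 2, 6: 1}

def pixels_per_8_byte_step(mode: int) -> int:
    return 8 // BITS_PER_PIXEL[mode]

print(hex(effective_start(0x3008, STANDARD_MASK)),
      hex(effective_start(0x3008, ENHANCED_MASK)))
print({mode: pixels_per_8_byte_step(mode) for mode in range(7)})
```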

One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
of changing the screen address by 2 bytes is the change in the number of lines
from the initial and final character rows that need reading by the ULA, which
would need to maintain this state information (although this is a relatively
trivial change). Another pitfall is the complication that might be introduced
to software writing bitmaps of character height to the screen.

See: http://pastraiser.com/computers/acornelectron/acornelectron.html

Enhancement: Mode Layouts
-------------------------

Merely changing the screen memory mappings in order to have Archimedes-style
row-oriented screen addresses (instead of character-oriented addresses) could
be done for the existing modes, but this might not be sufficiently beneficial,
especially since accessing regions of the screen would involve incrementing
pointers by amounts that are inconvenient on an 8-bit CPU.

However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
reference the vertical screen location below the currently-referenced location
(just as occurs within characters using the existing ULA); instead of
returning to the top of the character row and referencing the next horizontal
location after eight bytes, the address would reference the next character row
and continue to reference locations downwards over the height of the screen
until reaching the bottom; at the bottom, the next location would be the next
horizontal location at the top of the screen.

In other words, the memory layout for the screen would resemble the following
(for MODE 2):

  &3000 &3100       ... &7F00
  &3001 &3101
  ...   ...
  &3007
  &3008
  ...
  ...                   ...
  &30FF             ... &7FFF

Since there are 256 pixel rows, each column of locations would be addressable
using the low byte of the address. Meanwhile, the high byte would be
incremented to address different columns. Thus, addressing screen locations
would become a lot more convenient and potentially much more efficient for
certain kinds of graphical output.
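
To illustrate the difference, the following sketch computes the address of a given byte column and pixel row in MODE 2 under the existing character-oriented layout (640 bytes per character row, 8 bytes per character cell) and under the proposed column-oriented layout. The function names are hypothetical.

```python
SCREEN_BASE = 0x3000             # MODE 2 screen start
BYTES_PER_LINE = 80              # MODE 2: 160 pixels at 2 pixels per byte
ROW_BYTES = BYTES_PER_LINE * 8   # one character row of 8 scanlines = 640 bytes

def char_oriented(column: int, line: int) -> int:
    """Existing layout: 8 consecutive bytes run down each character cell."""
    return SCREEN_BASE + (line // 8) * ROW_BYTES + column * 8 + line % 8

def column_oriented(column: int, line: int) -> int:
    """Proposed layout: high byte selects the column, low byte the pixel row."""
    return SCREEN_BASE + (column << 8) + line

for column, line in ((0, 0), (0, 8), (1, 0), (79, 255)):
    print(column, line, "&%04X" % char_oriented(column, line),
          "&%04X" % column_oriented(column, line))
```

Moving down one pixel row in the column-oriented layout is always a low-byte increment, whereas the existing layout needs a jump of 640 - 7 bytes at each character row boundary.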

One potential complication with this simplified addressing scheme arises with
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
with the existing ULA) would be achieved by incrementing or decrementing the
screen start address; by one character row, it would involve adding or
subtracting 8. However, the ULA only supports multiples of 64 when changing the
screen start address. Thus, if such a scheme were to be adopted, three
additional bits would need to be supported in the screen start register (see
"Hardware Scrolling (and Enhancement)" for more details). However, horizontal
scrolling would be much improved even under the severe constraints of the
existing ULA: only adjustments of 256 to the screen start address would be
required to produce single-location scrolling of as few as two pixels in MODE 2
(four pixels in MODEs 1 and 5, eight pixels otherwise).

More disruptive is the effect of this alternative layout on software.
Presumably, compatibility with the BBC Micro was the primary goal of the
Electron's hardware design. With the character-oriented screen layout in
place, system software (and application software accessing the screen
directly) would be relying on this layout to run on the Electron with little
or no modification. Although it might have been possible to change the system
software to use this column-oriented layout instead, this would have incurred
a development cost and caused additional work porting things like games to the
Electron. Moreover, a separate branch of the software from that supporting the
BBC Micro and closer derivatives would then have needed maintaining.

The decision to use the character-oriented layout in the BBC Micro may have
been related to the choice of circuitry and to facilitate a convenient
hardware implementation, and by the time the Electron was planned, it was too
late to do anything about this somewhat unfortunate choice.

Pixel Layouts
-------------

The pixel layouts are as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               73 62 51 40
  2             4               7531 6420

Since the ULA reads a half-byte at a time, one might expect it to attempt to
produce pixels for every half-byte, as opposed to handling entire bytes.
However, the pixel layout is not conducive to producing pixels as soon as a
half-byte has been read for a given full-byte location: in 1bpp modes the
first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
data is spread across the entire byte in different ways.

An alternative arrangement might be as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               76 54 32 10
  2             4               7654 3210

Just as the mode layouts were presumably decided by compatibility with the BBC
Micro, the pixel layouts will have been maintained for similar reasons.
Unfortunately, this layout prevents any optimisation of the ULA for handling
half-byte pixel data generally.
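
The two layouts can be compared with a small decoder. This sketch assumes that the first bit listed for each pixel is its most significant bit, and that the first half-byte read holds bits 7 to 4, as the reasoning above implies; it counts how many pixels could be produced from that first half-byte alone.

```python
# Bit positions for each pixel, left to right, most significant bit first
# (assumed), following the "Pixels (from bits)" columns of the two tables.
EXISTING = {
    1: [[7], [6], [5], [4], [3], [2], [1], [0]],
    2: [[7, 3], [6, 2], [5, 1], [4, 0]],
    4: [[7, 5, 3, 1], [6, 4, 2, 0]],
}
ALTERNATIVE = {
    1: [[7], [6], [5], [4], [3], [2], [1], [0]],
    2: [[7, 6], [5, 4], [3, 2], [1, 0]],
    4: [[7, 6, 5, 4], [3, 2, 1, 0]],
}

def decode(byte: int, bpp: int, layout: dict) -> list[int]:
    """Return the colour values of the pixels in one screen byte."""
    pixels = []
    for bits in layout[bpp]:
        value = 0
        for bit in bits:
            value = (value << 1) | ((byte >> bit) & 1)
        pixels.append(value)
    return pixels

def first_nibble_pixels(bpp: int, layout: dict) -> int:
    """Count pixels whose bits all lie in the first half-byte (bits 7 to 4)."""
    return sum(all(bit >= 4 for bit in bits) for bits in layout[bpp])

print(decode(0x88, 2, EXISTING), decode(0xAA, 4, EXISTING))
print([first_nibble_pixels(bpp, EXISTING) for bpp in (1, 2, 4)])
print([first_nibble_pixels(bpp, ALTERNATIVE) for bpp in (1, 2, 4)])
```

With the existing layout, only the 1bpp modes yield any complete pixels from the first half-byte, whereas the alternative layout yields half of each byte's pixels at every depth.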
   563 
   564 Enhancement: The Missing MODE 4
   565 -------------------------------
   566 
The Electron inherits its screen mode selection from the BBC Micro, where MODE
3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
however: they are merely implemented by leaving two blank scanlines after the
eight required to produce each character line, so that every character row
occupies ten scanlines. Thus, such modes provide a 25-row display.

In principle, nothing prevents this "text mode" effect being applied to other
modes. The 20-column modes are not well-suited to displaying text, which
leaves MODE 1, a mode that, unlike MODEs 3 and 6, can display 4 colours rather
than 2. Although the need for a non-monochrome 40-column text mode is
addressed by MODE 7 on the BBC Micro, the Electron lacks such a mode.

If a 4-colour, 25-row variant of MODE 1 were to be provided, it would
logically take the place of the current MODE 4:

  Screen mode  Size (kilobytes)  Colours  Rows  Resolution
  -----------  ----------------  -------  ----  ----------
  0            20                2        32    640x256
  1            20                4        32    320x256
  2            20                16       32    160x256
  3            16                2        25    640x256
  4 (new)      16                4        25    320x256
  4 (old)      10                2        32    320x256
  5            10                4        32    160x256
  6            8                 2        25    320x256

Thus, for increasing mode numbers, the size of each mode would be the same as
or less than that of the preceding mode.
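The size ordering can be checked with a small calculation. This sketch
assumes that every mode uses eight scanlines of screen data per character row
and that the text modes have 25 character rows, as on the BBC Micro; sizes are
rounded up to whole kilobytes to match the allocations quoted in the table.

```python
import math

# (mode, bytes per scanline, character rows of 8 scanlines of screen data)
MODES = [
    ("0", 80, 32), ("1", 80, 32), ("2", 80, 32), ("3", 80, 25),
    ("4 (new)", 80, 25), ("4 (old)", 40, 32), ("5", 40, 32), ("6", 40, 25),
]

def screen_bytes(bytes_per_line, rows, lines_per_row=8):
    """Bytes of screen data needed for the whole display."""
    return bytes_per_line * rows * lines_per_row

def size_kilobytes(bytes_per_line, rows):
    """Round up to whole kilobytes, as in the table of mode sizes."""
    return math.ceil(screen_bytes(bytes_per_line, rows) / 1024)

sizes = [size_kilobytes(width, rows) for _, width, rows in MODES]
print(sizes)
# Each mode needs no more memory than the mode listed before it.
assert all(a >= b for a, b in zip(sizes, sizes[1:]))
```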
   596 
   597 Enhancement: 2MHz RAM Access
   598 ----------------------------
   599 
Although both the CPU and the ULA access RAM in 2MHz cycles, the CPU, when not
competing with the ULA, only accesses RAM every other 2MHz cycle, as if the ULA
still needed to access the RAM. One useful enhancement would therefore be a
mechanism to let the CPU take over the ULA's cycles outside the ULA's period of
activity, comparable to the way the ULA takes over the CPU's cycles in MODEs 0
to 3.

Thus, the RAM access cycles would resemble the following in MODEs 0 to 3:
   608 
   609   Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
   610   On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
   611 
In MODEs 4 to 6:
   613  
   614   Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
   615   On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
   616 
This would improve CPU bandwidth, measured as the number of bytes the CPU can
access per frame, as follows:

  Modes         Standard ULA    Enhanced ULA
  -----         ------------    ------------
  0, 1, 2       9728 bytes      19456 bytes
  3             11968 bytes     23936 bytes
  4, 5          19968 bytes     29696 bytes
  6             19968 bytes     31936 bytes
   624 
With such an enhancement, MODEs 0 to 3 would experience a doubling of CPU
bandwidth because every RAM access opportunity available to the CPU would be
doubled. Meanwhile, in the other modes, the CPU accesses interleaved with ULA
accesses during the display period cannot be doubled, but the increase in CPU
bandwidth is still significant.
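The bandwidth figures follow from a simple per-frame count of the access
patterns shown above. This Python sketch rests on stated assumptions: each
frame has 312 scanlines of 128 2MHz cycles, the ULA fetches during 80 of those
cycles (40 microseconds) on each active scanline, graphics modes have 256
active scanlines and text modes 200 (25 character rows of eight scanlines),
and each CPU access transfers one byte.

```python
LINES = 312        # scanlines per frame
LINE_CYCLES = 128  # 2MHz cycles per scanline
ULA_CYCLES = 80    # 2MHz cycles of ULA fetching on each active scanline

def cpu_bytes_per_frame(active_lines, ula_exclusive, enhanced):
    """Count the RAM accesses available to the CPU during one frame."""
    idle = LINE_CYCLES - ULA_CYCLES
    # MODEs 0 to 3 (ula_exclusive): the ULA takes every cycle (UUUU...).
    # MODEs 4 to 6: the CPU and ULA alternate (CUCU...).
    during = 0 if ula_exclusive else ULA_CYCLES // 2
    if enhanced:
        per_active = during + idle       # ...CCCC after the ULA finishes
        per_other = LINE_CYCLES          # CCCCCCCC on non-display lines
    else:
        per_active = during + idle // 2  # ...C_C_ after the ULA finishes
        per_other = LINE_CYCLES // 2     # C_C_C_C_ on non-display lines
    return active_lines * per_active + (LINES - active_lines) * per_other

for label, active, exclusive in [("0, 1, 2", 256, True), ("3", 200, True),
                                 ("4, 5", 256, False), ("6", 200, False)]:
    print(label, cpu_bytes_per_frame(active, exclusive, False),
          cpu_bytes_per_frame(active, exclusive, True))
```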
   629 
   630 Unfortunately, the mechanism for accessing the RAM is too slow to provide data
   631 within the time constraints of 2MHz operation. There is no time remaining in a
   632 2MHz cycle for the CPU to receive and process any retrieved data.
   633 
   634 Enhancement: Region Blanking
   635 ----------------------------
   636 
   637 The problem of permitting character-oriented blitting in programs whilst
   638 scrolling the screen by sub-character amounts could be mitigated by permitting
   639 a region of the display to be blank, such as the final lines of the display.
Consider the following vertical scrolling by 2 bytes (and thus 2 scanlines,
since consecutive bytes within a character cell hold consecutive scanlines)
that would cause an initial character row of 6 lines and a final character
row of 2 lines:
   642 
   643     6 lines - initial, partial character row
   644   248 lines - 31 complete rows
   645     2 lines - final, partial character row
   646 
If a routine were in use that wrote 8-line bitmaps to the character row now
split in two, it would be advisable to hide one of the regions in order to
prevent content appearing in the wrong place on screen (such as content meant
to appear at the top "leaking" onto the bottom). Blanking 6 lines would be
sufficient, as can be seen from the following cases.
   652 
   653 Scrolling up by 2 lines:
   654 
   655     6 lines - initial, partial character row
   656   240 lines - 30 complete rows
   657     4 lines - part of 1 complete row
   658   -----------------------------------------------------------------
   659     4 lines - part of 1 complete row (hidden to maintain 250 lines)
   660     2 lines - final, partial character row (hidden)
   661 
   662 Scrolling down by 2 lines:
   663 
   664     2 lines - initial, partial character row
   665   248 lines - 31 complete rows
   666   ----------------------------------------------------------
   667     6 lines - final, partial character row (hidden)
   668 
   669 Thus, in this case, region blanking would impose a 250 line display with the
   670 bottom 6 lines blank.
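The way a scrolled display splits into partial and complete character rows
can be modelled as follows. This is only a sketch: `offset` is a hypothetical
parameter giving the number of scanlines by which the display starts into its
first character row, and the display is assumed to be 256 scanlines high with
8-scanline character rows.

```python
DISPLAY_LINES = 256  # scanlines in the display
ROW_LINES = 8        # scanlines per character row

def split_display(offset):
    """Return (initial partial lines, complete rows, final partial lines) when
    the display starts `offset` scanlines into its first character row."""
    offset %= ROW_LINES
    if offset == 0:
        return (0, DISPLAY_LINES // ROW_LINES, 0)
    initial = ROW_LINES - offset
    complete = (DISPLAY_LINES - ROW_LINES) // ROW_LINES
    # The final partial row shows the rest of the character row split at the top.
    return (initial, complete, offset)

def bottom_blanking_needed(offsets):
    """Lines to blank at the bottom so that the final partial row is hidden."""
    return max(split_display(offset)[2] for offset in offsets)

print(split_display(2), split_display(6), bottom_blanking_needed([2, 6]))
```

For the two cases above, offsets of 2 and 6 scanlines, the larger final
partial row has 6 lines, giving the 250-line display. Under the same model,
supporting every offset from 1 to 7 would require 7 blank lines.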
   671 
See the description of the display suspend enhancement for a way of blanking
lines that is more efficient than merely blanking the palette, and that allows
the CPU to perform useful work during the blanking period.
   675 
To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 15 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.
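Packing both heights into a single location could look like the following
sketch. The register is hypothetical, and the functions merely show the bit
arithmetic implied by using the upper and lower 4 bits.

```python
def encode_blanking(top, bottom):
    """Pack top and bottom blanking heights into one hypothetical register byte."""
    if not (0 <= top <= 15 and 0 <= bottom <= 15):
        raise ValueError("each region must be between 0 and 15 lines")
    return (top << 4) | bottom

def decode_blanking(value):
    """Return (top lines, bottom lines) from the register byte."""
    return (value >> 4) & 0x0F, value & 0x0F

def displayed_lines(value, total=256):
    """Scanlines left unblanked for a given register value."""
    top, bottom = decode_blanking(value)
    return range(top, total - bottom)
```

For the scrolling example above, a value of &06 would blank the bottom 6 lines
and leave 250 displayed.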
   682 
   683 Enhancement: Screen Height Adjustment
   684 -------------------------------------
   685 
The height of the screen could be configurable in order to reduce screen
memory consumption. This is not quite achieved by MODEs 3 and 6, since the
start of the screen appears to be rounded down to the nearest page, but by
reducing the height by amounts of more than a page, savings would be possible.
For example:
   690 
   691   Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
   692   ------------  -----  ------  --------------  ---------------  -------------
   693   640           1      252     80              320              &3140 -> &3100
   694   640           1      248     80              640              &3280 -> &3200
   695   320           1      240     40              640              &5A80 -> &5A00
   696   320           2      240     80              1280             &3500
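The start addresses in the table follow from subtracting the screen size from
&8000, the top of the Electron's 32K of RAM, and rounding down to a page
boundary. A minimal sketch of that arithmetic, assuming a full-height screen
of 256 scanlines when computing savings:

```python
TOP_OF_SCREEN = 0x8000  # screen memory ends at the top of the 32K of RAM
FULL_HEIGHT = 256       # scanlines in a full-height display

def screen_start(bytes_per_line, height):
    """Return (exact start, start rounded down to a page) for a screen."""
    exact = TOP_OF_SCREEN - bytes_per_line * height
    return exact, exact & ~0xFF

def saving(bytes_per_line, height):
    """Bytes saved relative to a full-height screen."""
    return bytes_per_line * (FULL_HEIGHT - height)

for bpl, height in [(80, 252), (80, 248), (40, 240), (80, 240)]:
    exact, page = screen_start(bpl, height)
    print(f"{bpl} x {height}: saves {saving(bpl, height)}, "
          f"&{exact:04X} -> &{page:04X}")
```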
   697 
   698 Screen Mode Selection
   699 ---------------------
   700 
Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially also being made available for use.
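Selecting a mode involves rewriting a 3-bit field while leaving the other bits
of &FE*7 intact. The sketch below assumes that the field holds the mode number
directly and that the previously written register value is available to the
caller; the function names are hypothetical.

```python
MODE_SHIFT = 3
MODE_MASK = 0b111 << MODE_SHIFT  # bits 3, 4 and 5

def with_mode(register_value, mode):
    """Return the &FE*7 value selecting `mode`, preserving the other bits."""
    if not 0 <= mode <= 7:
        raise ValueError("only three bits are available for the mode")
    return (register_value & ~MODE_MASK & 0xFF) | (mode << MODE_SHIFT)

def mode_of(register_value):
    """Extract the 3-bit mode field from an &FE*7 value."""
    return (register_value & MODE_MASK) >> MODE_SHIFT
```

With only three bits, no more than eight modes can be distinguished, which is
why a wider range of modes would need other bits of the register.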
   705 
   706 Enhancement: Palette Definition
   707 -------------------------------
   708 
   709 Since all memory accesses go via the ULA, an enhanced ULA could employ more
   710 specific addresses than &FE*X to perform enhanced functions. For example, the
   711 palette control is done using &FE*8-F and merely involves selecting predefined
   712 colours, whereas an enhanced ULA could support the redefinition of all 16
   713 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
   714 (colours 8 to 15), where a single byte might provide 8 bits per pixel colour
   715 specifications similar to those used on the Archimedes.
   716 
   717 The principal limitation here is actually the hardware: the Electron has only
   718 a single output line for each of the red, green and blue channels, and if
   719 those outputs are strictly digital and can only be set to a "high" and "low"
   720 value, then only the existing eight colours are possible. If a modern ULA were
   721 able to output analogue values (or values at well-defined points between the
   722 high and low values, such as the half-on value supported by the Amstrad CPC
   723 series), it would still need to be assessed whether the circuitry could
   724 successfully handle and propagate such values. Various sources indicate that
   725 only "TTL levels" are supported by the RGB output circuit, and since there are
   726 74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
   727 is likely that the ULA is expected to provide only "high" or "low" values.
   728 
   729 Short of adding extra outputs from the ULA (either additional red, green and
   730 blue outputs or a combined intensity output), another approach might involve
   731 some kind of modulation where an output value might be encoded in multiple
   732 pulses at a higher frequency than the pixel frequency. However, this would
   733 demand additional circuitry outside the ULA, and component RGB monitors would
   734 probably not be able to take advantage of this feature; only UHF and composite
   735 video devices (the latter with the composite video colour support enabled on
   736 the Electron's circuit board) would potentially benefit.
   737 
   738 Flashing Colours
   739 ----------------
   740 
   741 According to the Advanced User Guide, "The cursor and flashing colours are
   742 entirely generated in software: This means that all of the logical to physical
   743 colour map must be changed to cause colours to flash." This appears to suggest
   744 that the palette registers must be updated upon the flash counter - read and
   745 written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
   746 colour pairs to be any combination of colours might be possible, instead of
   747 having colour complements as pairs.
   748 
   749 It is conceivable that the interrupt code responsible does the simple thing
   750 and merely inverts the current values for any logical colours (LC) for which
   751 the associated physical colour (as supplied as the second parameter to the VDU
   752 19 call) has the top bit of its four bit value set. These top bits are not
   753 recorded in the palette registers but are presumably recorded separately and
   754 used to build bitmaps as follows:
   755 
  LC  2 colour  4 colour  16 colour  Inversion value
                                     2 col  4 col  16 col
  --  --------  --------  ---------  -----  -----  ------
   0  00010001  00010001  00010001   1      1      1
   1  01000100  00100010  00010001   4      2      1
   2            01000100  00100010          4      2
   3            10001000  00100010          8      2
   4                      00010001                 1
   5                      00010001                 1
   6                      00100010                 2
   7                      00100010                 2
   8                      01000100                 4
   9                      01000100                 4
  10                      10001000                 8
  11                      10001000                 8
  12                      01000100                 4
  13                      01000100                 4
  14                      10001000                 8
  15                      10001000                 8
   774 
   775   Inversion value calculation:
   776 
   777    2 colour formula: 1 << (colour * 2)
   778    4 colour formula: 1 << colour
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
   780 
   781 For example, where logical colour 0 has been mapped to a physical colour in
   782 the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
   783 the inversion operation. (The lower three bits of the physical colour would be
   784 used to set the underlying colour information affected by the inversion
   785 operation.)
   786 
   787 An operation in the interrupt code would then combine the bitmaps for all
   788 logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
   789 combined for groups of logical colours as follows:
   790 
   791    Logical colours
   792    ---------------
   793    0,  2,  8, 10
   794    4,  6, 12, 14
   795    5,  7, 13, 15
   796    1,  3,  9, 11
   797 
   798 These combined bitmaps would be EORed with the existing palette register
   799 values in order to perform the value inversion necessary to produce the
   800 flashing effect.
   801 
   802 Thus, in the VDU 19 operation, the appropriate inversion value would be
   803 calculated for the logical colour, and this value would then be combined with
   804 other inversion values in a dedicated memory location corresponding to the
   805 colour's group as indicated above. Meanwhile, the palette channel values would
   806 be derived from the lower three bits of the specified physical colour and
   807 combined with other palette data in dedicated memory locations corresponding
   808 to the palette registers.
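The hypothesised flashing scheme can be expressed as a short Python sketch. It
is not the MOS's actual code: the function names and the `physical` mapping
(logical colour to the physical colour given to VDU 19) are illustrative, the
16-colour inversion value is computed so as to reproduce the values in the
table above, and in 2 and 4 colour modes a single combined mask is produced.

```python
# Palette register groups for 16-colour modes, in the order listed above.
GROUPS = [(0, 2, 8, 10), (4, 6, 12, 14), (5, 7, 13, 15), (1, 3, 9, 11)]

def inversion_value(colour, colours):
    """4-bit inversion value for a logical colour in a 2, 4 or 16 colour mode."""
    if colours == 2:
        return 1 << (colour * 2)
    if colours == 4:
        return 1 << colour
    return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))

def inversion_bitmap(colour, colours):
    """Repeat the 4-bit value in both halves of the byte, e.g. 1 -> 00010001."""
    return inversion_value(colour, colours) * 0x11

def flash_masks(physical, colours):
    """Combine the bitmaps of logical colours mapped to physical colours 8-15."""
    groups = GROUPS if colours == 16 else [tuple(range(colours))]
    masks = []
    for group in groups:
        mask = 0
        for logical in group:
            if physical.get(logical, 0) & 8:  # top bit set: colour flashes
                mask |= inversion_bitmap(logical, colours)
        masks.append(mask)
    return masks

def flash(palette, masks):
    """EOR each palette register value with its mask to flip flashing colours."""
    return [value ^ mask for value, mask in zip(palette, masks)]
```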
   809 
   810 Interestingly, although flashing colours on the BBC Micro are controlled by
   811 toggling bit 0 of the &FE20 control register location for the Video ULA, the
   812 actual colour inversion is done in hardware.
   813 
   814 Enhancement: Palette Definition Lists
   815 -------------------------------------
   816 
   817 It can be useful to redefine the palette in order to change the colours
   818 available for a particular region of the screen, particularly in modes where
   819 the choice of colours is constrained, and if an increased colour depth were
   820 available, palette redefinition would be useful to give the illusion of more
   821 than 16 colours in MODE 2. Traditionally, palette redefinition has been done
   822 by using interrupt-driven timers, but a more efficient approach would involve
   823 presenting lists of palette definitions to the ULA so that it can change the
   824 palette at a particular display line.
   825 
   826 One might define a palette redefinition list in a region of memory and then
   827 communicate its contents to the ULA by writing the address and length of the
   828 list, along with the display line at which the palette is to be changed, to
   829 ULA registers such that the ULA buffers the list and performs the redefinition
   830 at the appropriate time. Throughput/bandwidth considerations might impose
   831 restrictions on the practical length of such a list, however.
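The effect of a buffered palette definition list can be simulated to show which
palette applies on each display line. The representation used here, a list of
(line, register, value) entries, is an assumption made for illustration rather
than a defined ULA interface.

```python
def palette_per_line(initial, changes, lines=256):
    """Return the palette (as a tuple) in force on each display line.

    `initial` holds the starting register values; `changes` holds
    (line, register, value) entries of the kind a ULA might buffer.
    """
    pending = sorted(changes, key=lambda change: change[0])
    palette = list(initial)
    result = []
    index = 0
    for line in range(lines):
        # Apply every change due at or before this line before displaying it.
        while index < len(pending) and pending[index][0] <= line:
            _, register, value = pending[index]
            palette[register] = value
            index += 1
        result.append(tuple(palette))
    return result
```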
   832 
   833 Enhancement: Display Synchronisation Interrupts
   834 -----------------------------------------------
   835 
   836 When completing each scanline of the display, the ULA could trigger an
   837 interrupt. Since this might impact system performance substantially, the
   838 feature would probably need to be configurable, and it might be sufficient to
   839 have an interrupt only after a certain number of display lines instead.
   840 Permitting the CPU to take action after eight lines would allow palette
   841 switching and other effects to occur on a character row basis.
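The cost of such interrupts can be estimated from the display timing. This
sketch assumes 256 display lines per frame at 50Hz and 64 microseconds per
scanline (128 cycles at 2MHz), and ignores the existing end-of-display
interrupt.

```python
FRAME_RATE = 50         # frames per second
DISPLAY_LINES = 256     # scanlines producing pixel data in each frame
LINE_MICROSECONDS = 64  # 128 cycles at 2MHz

def interrupt_load(interval):
    """Return (interrupts per second, microseconds between interrupts) when
    an interrupt is raised after every `interval` display lines."""
    per_frame = DISPLAY_LINES // interval
    return per_frame * FRAME_RATE, interval * LINE_MICROSECONDS

for interval in (1, 8):
    print(interval, interrupt_load(interval))
```

An interrupt on every display line would leave only 64 microseconds for each
handler and everything else the CPU has to do, whereas one every eight lines
leaves 512 microseconds.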
   842 
   843 The ULA provides an interrupt at the end of the display period, presumably so
   844 that software can schedule updates to the screen, avoid flickering or tearing,
   845 and so on. However, some applications might benefit from an interrupt at, or
   846 just before, the start of the display period so that palette modifications or
   847 similar effects could be scheduled.
   848 
   849 Enhancement: Palette-Free Modes
   850 -------------------------------
   851 
   852 Palette-free modes might be defined where bit values directly correspond to
   853 the red, green and blue channels, although this would mostly make sense only
   854 for modes with depths greater than the standard 4 bits per pixel, and such
   855 modes would require more memory than MODE 2 if they were to have an acceptable
   856 resolution.
   857 
   858 Enhancement: Display Suspend
   859 ----------------------------
   860 
   861 Especially when writing to the screen memory, it could be beneficial to be
   862 able to suspend the ULA's access to the memory, instead producing blank values
   863 for all screen pixels until a program is ready to reveal the screen. This is
   864 different from palette blanking since with a blank palette, the ULA is still
   865 reading screen memory and translating its contents into pixel values that end
   866 up being blank.
   867 
   868 This function is reminiscent of a capability of the ZX81, albeit necessary on
   869 that hardware to reduce the load on the system CPU which was responsible for
producing the video output. On the Electron, the performance benefit would
instead come from giving the CPU uncontended access to the RAM.
   873 
   874 The region blanking feature mentioned above could be implemented using this
   875 enhancement instead of employing palette blanking for the affected lines of
   876 the display.
   877 
   878 Enhancement: Memory Filling
   879 ---------------------------
   880 
A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
   883 such a capability would probably not be useful in conjunction with the
   884 existing read operations when producing a screen display, and insufficient
   885 bandwidth would exist to do so in high-bandwidth screen modes anyway, the
   886 capability could be offered during a display suspend period (as described
   887 above), permitting a more efficient mechanism to rapidly fill memory with a
   888 predetermined value.
   889 
   890 This capability could also support block filling, where the limits of the
   891 filled memory would be defined by the position and size of a screen area,
   892 although this would demand the provision of additional registers in the ULA to
   893 retain the details of such areas and additional logic to control the fill
   894 operation.
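A block fill over a screen area has to follow the character-oriented layout of
screen memory, in which each character row stores the eight scanlines of each
character cell as consecutive bytes. The sketch below assumes that layout, a
MODE 4 style screen of 40 bytes per scanline starting at &5800, and a `memory`
bytearray standing in for RAM; hardware address wrap-around is ignored, and
the function names are hypothetical.

```python
def cell_address(start, bytes_per_line, column, line):
    """Address of the byte in byte column `column` on scanline `line`."""
    row, line_in_row = divmod(line, 8)
    # Each character row occupies bytes_per_line * 8 bytes, made up of
    # 8-byte cells holding consecutive scanlines of one character.
    return start + row * bytes_per_line * 8 + column * 8 + line_in_row

def block_fill(memory, start, bytes_per_line, column, line, width, height, value):
    """Fill a `width`-byte by `height`-line area, as a filling ULA might."""
    for y in range(line, line + height):
        for x in range(column, column + width):
            memory[cell_address(start, bytes_per_line, x, y)] = value

memory = bytearray(0x8000)
block_fill(memory, 0x5800, 40, 1, 6, 2, 4, 0xFF)  # straddles two character rows
```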
   895 
   896 Enhancement: Region Filling
   897 ---------------------------
   898 
   899 An alternative to memory writing might involve indicating regions using
   900 additional registers or memory where the ULA fills regions of the screen with
   901 content instead of reading from memory. Unlike hardware sprites which should
   902 realistically provide varied content, region filling could employ single
   903 colours or patterns, and one advantage of doing so would be that the ULA need
   904 not access memory at all within a particular region.
   905 
   906 Regions would be defined on a row-by-row basis. Instead of reading memory and
   907 blitting a direct representation to the screen, the ULA would read region
   908 definitions containing a start column, region width and colour details. There
   909 might be a certain number of definitions allowed per row, or the ULA might
   910 just traverse an ordered list of such definitions with each one indicating the
   911 row, start column, region width and colour details.
   912 
   913 One could even compress this information further by requiring only the row,
   914 start column and colour details with each subsequent definition terminating
   915 the effect of the previous one. However, one would also need to consider the
   916 convenience of preparing such definitions and whether efficient access to
   917 definitions for a particular row might be desirable. It might also be
   918 desirable to avoid having to prepare definitions for "empty" areas of the
screen, effectively making the definition of the screen contents a run-length
encoding employing only colour plus length information.
   921 
   922 One application of region filling is that of simple 2D and 3D shape rendering.
   923 Although it is entirely possible to plot such shapes to the screen and have
   924 the ULA blit the memory contents to the screen, such operations consume
   925 bandwidth both in the initial plotting and in the final transfer to the
   926 screen. Region filling would reduce such bandwidth usage substantially.
   927 
   928 This way of representing screen images would make certain kinds of images
   929 unfeasible to represent - consider alternating single pixel values which could
   930 easily occur in some character bitmaps - even if an internal queue of regions
   931 were to be supported such that the ULA could read ahead and buffer such
"bandwidth intensive" areas. Thus, the ULA might be better served by providing
this feature for certain areas of the display only, as some kind of special
graphics window.
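The compressed form of region definitions, in which each (start column,
colour) pair takes effect until the next one, can be sketched as follows. The
functions are illustrative; the encoder also shows why alternating single
pixels are a problem, since every pixel then needs a definition of its own.

```python
def expand_row(definitions, width=640, background=0):
    """Expand (start column, colour) definitions for one row into pixels.
    Each definition lasts until the next; earlier pixels use `background`."""
    pixels = []
    colour = background
    ordered = sorted(definitions)
    index = 0
    for column in range(width):
        while index < len(ordered) and ordered[index][0] <= column:
            colour = ordered[index][1]
            index += 1
        pixels.append(colour)
    return pixels

def encode_row(pixels):
    """Produce the minimal list of (start column, colour) definitions."""
    definitions = []
    for column, colour in enumerate(pixels):
        if not definitions or definitions[-1][1] != colour:
            definitions.append((column, colour))
    return definitions

row = expand_row([(0, 0), (100, 3), (200, 0)], width=320)
print(len(encode_row(row)), len(encode_row([0, 1] * 320)))
```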
   935 
   936 Enhancement: Hardware Sprites
   937 -----------------------------
   938 
An enhanced ULA might provide hardware sprites, but this would be done in a way
that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a fixed width (in bytes) and height.
   943 
   944 The specification of sprites could involve the reservation of 16 locations
   945 (for example, &FE20-F) specifying a fixed number of eight sprites, with each
   946 location pair referring to the sprite data. By limiting the ULA to dealing
   947 with a fixed number of sprites, the work required inside the ULA would be
   948 reduced since it would avoid having to deal with arbitrary numbers of sprites.
   949 
   950 The principal limitation on providing hardware sprites is that of having to
   951 obtain sprite data, given that the ULA is usually required to retrieve screen
   952 data, and given the lack of memory bandwidth available to retrieve sprite data
   953 (particularly from multiple sprites supposedly at the same position) and
   954 screen data simultaneously. Although the ULA could potentially read sprite
   955 data and screen data in alternate memory accesses in screen modes where the
   956 bandwidth is not already fully utilised, this would result in a degradation of
   957 performance.
   958 
   959 Enhancement: Additional Screen Mode Configurations
   960 --------------------------------------------------
   961 
   962 Alternative screen mode configurations could be supported. The ULA has to
   963 produce 640 pixel values across the screen, with pixel doubling or quadrupling
   964 employed to fill the screen width:
   965 
   966   Screen width      Columns     Scaling     Depth       Bytes
   967   ------------      -------     -------     -----       -----
   968   640               80          x1          1           80
   969   320               40          x2          1, 2        40, 80
   970   160               20          x4          2, 4        40, 80
   971 
   972 It must also use at most 80 byte-sized memory accesses to provide the
   973 information for the display. Given that characters must occupy an 8x8 pixel
   974 array, if a configuration featuring anything other than 20, 40 or 80 character
   975 columns is to be supported, compromises must be made such as the introduction
   976 of blank pixels either between characters (such as occurs between rows in MODE
   977 3 and 6) or at the end of a scanline (such as occurs at the end of the frame
   978 in MODE 3 and 6). Consider the following configuration:
   979 
   980   Screen width      Columns     Scaling     Depth       Bytes       Blank
   981   ------------      -------     -------     -----       ------      -----
   982   208               26          x3          1, 2        26, 52      16
   983 
   984 Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
   985 colours could be provided, with 16 blank pixel values (out of a total of 640)
   986 generated either at the start or end (or split between the start and end) of
   987 each scanline.
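For a given pixel scaling factor, the widest character-based layout and its
costs follow directly from the 640 pixel values and 80 memory accesses
available per scanline. A sketch of that calculation, assuming 8-pixel-wide
characters:

```python
SCREEN_PIXELS = 640  # pixel values produced across each scanline
MAX_BYTES = 80       # byte-sized memory accesses available per scanline

def configuration(scaling, depth):
    """Widest character-based layout for a pixel scaling factor and depth."""
    columns = SCREEN_PIXELS // (8 * scaling)
    width = columns * 8
    bytes_per_line = width * depth // 8
    blank = SCREEN_PIXELS - width * scaling
    return {"width": width, "columns": columns, "bytes": bytes_per_line,
            "blank": blank, "feasible": bytes_per_line <= MAX_BYTES}

print(configuration(3, 1), configuration(3, 2))
```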
   988 
   989 Enhancement: Character Attributes
   990 ---------------------------------
   991 
   992 The BBC Micro MODE 7 employs something resembling character attributes to
   993 support teletext displays, but depends on circuitry providing a character
   994 generator. The ZX Spectrum, on the other hand, provides character attributes
   995 as a means of colouring bitmapped graphics. Although such a feature is very
   996 limiting as the sole means of providing multicolour graphics, in situations
   997 where the choice is between low resolution multicolour graphics or high
   998 resolution monochrome graphics, character attributes provide a potentially
   999 useful compromise.
  1000 
  1001 For each byte read, the ULA must deliver 8 pixel values (out of a total of
  1002 640) to the video output, doing so by either emptying its pixel buffer on a
  1003 pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:
  1005 
  1006   Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  1007   Reads:    B                               B
  1008   Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
  1009 
  1010 And for a screen mode having 320 pixels in width:
  1011 
  1012   Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  1013   Reads:    B
  1014   Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
  1015 
However, in modes where fewer than 80 bytes are required to generate the pixel
  1017 values, an enhanced ULA might be able to read additional bytes between those
  1018 providing the bitmapped graphics data:
  1019 
  1020   Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  1021   Reads:    B                               A
  1022   Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
  1023 
  1024 These additional bytes could provide colour information for the bitmapped data
  1025 in the following character column (of 8 pixels). Since it would be desirable
  1026 to apply attribute data to the first column, the initial 8 cycles might be
  1027 configured to not produce pixel values.
  1028 
Attribute data need only be read for the first row of pixels of each
character. The same attribute information would then be applied to the
subsequent rows, although this would require the attribute data to be stored
in some kind of buffer. Thus, the following access pattern would be observed:
  1033 
  1034   Reads:    A B _ B _ B _ B _ B _ B _ B _ B ...
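The saving from buffering attributes can be illustrated by listing the reads
for one character row. In this sketch, which is one possible reading of the
scheme rather than a defined timing, attribute bytes are fetched alongside the
bitmap bytes on the first scanline only and are then reused from a buffer for
the remaining seven.

```python
def character_row_reads(columns):
    """Return the reads ('A' for attribute, 'B' for bitmap) made on each of
    the 8 scanlines of one character row, plus the buffered attributes."""
    buffer = []
    lines = []
    for line in range(8):
        reads = []
        for column in range(columns):
            if line == 0:
                reads.append("A")
                buffer.append(column)  # placeholder for the attribute byte read
            reads.append("B")
        lines.append(reads)
    return lines, buffer

lines, buffer = character_row_reads(40)
attribute_reads = sum(line.count("A") for line in lines)
print(attribute_reads, "attribute reads instead of", 8 * 40)
```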
  1035 
  1036 In modes 3 and 6, the blank display lines could be used to retrieve attribute
  1037 data:
  1038 
  1039   Reads (blank):     A _ A _ A _ A _ A _ A _ A _ A _ ...
  1040   Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  1041   Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  1042                      ...
  1043 
  1044 See below for a discussion of using this for character data as well.
  1045 
  1046 A whole byte used for colour information for a whole character would result in
  1047 a choice of 256 colours, and this might be somewhat excessive. By only reading
  1048 attribute bytes at every other opportunity, a choice of 16 colours could be
  1049 applied individually to two characters.
  1050 
  1051   Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  1052   Reads:    B               A               B               -
  1053   Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
  1054 
  1055 Further reductions in attribute data access, offering 4 colours for every
  1056 character in a four character block, for example, might also be worth
  1057 considering.
  1058 
  1059 Consider the following configurations for screen modes with a colour depth of
  1060 1 bit per pixel for bitmap information:
  1061 
  1062   Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  1063   ------------  -------  -------  ---------  ---------  -------  ------------
  1064   320           40       x2       40         40         256      &5300
  1065   320           40       x2       40         20         16       &5580 -> &5500
  1066   320           40       x2       40         10         4        &56C0 -> &5600
  1067   208           26       x3       26         26         256      &62C0 -> &6200
  1068   208           26       x3       26         13         16       &6460 -> &6400
  1069 
  1070 Enhancement: Text-Only Modes using Cached Character and Attribute Data
  1071 ----------------------------------------------------------------------
  1072 
  1073 In modes 3 and 6, the blank display lines could be used to retrieve character
  1074 and attribute data instead of trying to insert it between bitmap data accesses,
  1075 but this data would then need to be retained:
  1076 
  1077   Reads:    A C A C A C A C A C A C A C A C ...
  1078   Reads:    B _ B _ B _ B _ B _ B _ B _ B _ ...
  1079 
  1080 Only attribute (A) and character (C) reads would require screen memory
storage. Bitmap data reads (B) would either involve accesses to memory to
obtain character definition details or could, at the cost of special storage
in the ULA, be satisfied from within the ULA, thereby freeing up the RAM.
  1084 However, the CPU would not benefit from having any extra access slots due to
  1085 the limitations of the RAM access mechanism.
  1086 
  1087 Enhancement: MODE 7 Emulation using Character Attributes
  1088 --------------------------------------------------------
  1089 
  1090 If the scheme of applying attributes to character regions were employed to
  1091 emulate MODE 7, in conjunction with the MODE 6 display technique, the
  1092 following configuration would be required:
  1093 
  1094   Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  1095   ------------  -------  ----  ---------  ---------  -------  ------------
  1096   320           40       25    40         20         16       &5ECC -> &5E00
  1097   320           40       25    40         10         4        &5FC6 -> &5F00
  1098 
  1099 Although this requires much more memory than MODE 7 (8500 bytes versus MODE
  1100 7's 1000 bytes), it does not need much more memory than MODE 6, and it would
  1101 at least make a limited 40-column multicolour mode available as a substitute
  1102 for MODE 7.
  1103 
  1104 Using the text-only enhancement with caching of data, the storage requirements
  1105 would be diminished substantially:
  1106 
  1107   Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
  1108   ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &7A24 -> &7A00
  1110   320           40       25    40         10         4        &7B1E -> &7B00
  1111   320           40       25    40         5          2        &7B9B -> &7B00
  1112   320           40       25    40         0          (2)      &7C18 -> &7C00
  1113   640           80       25    80         40         16       &7448 -> &7400
  1114   640           80       25    80         20         4        &763C -> &7600
  1115   640           80       25    80         10         2        &7736 -> &7700
  1116   640           80       25    80         0          (2)      &7830 -> &7800
  1117 
  1118 Note that the colours describe the locally defined attributes for each
  1119 character. When no attribute information is provided, the colours are defined
  1120 globally.
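The start addresses follow from subtracting the storage needed for 25
character rows from &8000 and rounding down to a page boundary. A sketch,
assuming eight scanlines of bitmap data per row in the first scheme and only
character and attribute bytes per row in the cached scheme:

```python
TOP_OF_SCREEN = 0x8000
ROWS = 25

def start_address(bytes_per_row):
    """Return (exact start, start rounded down to a page) for 25 rows."""
    exact = TOP_OF_SCREEN - bytes_per_row * ROWS
    return exact, exact & ~0xFF

def bitmap_scheme(bitmap_bytes, attribute_bytes):
    """Bitmap bytes for each of 8 scanlines per row, plus one set of attributes."""
    return start_address(bitmap_bytes * 8 + attribute_bytes)

def cached_scheme(character_bytes, attribute_bytes):
    """Character codes and attributes only, with bitmaps generated separately."""
    return start_address(character_bytes + attribute_bytes)

for columns, attributes in [(40, 20), (40, 10), (40, 5), (40, 0),
                            (80, 40), (80, 20), (80, 10), (80, 0)]:
    exact, page = cached_scheme(columns, attributes)
    print(f"{columns} x {attributes}: &{exact:04X} -> &{page:04X}")
```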

Enhancement: Compressed Character Data
--------------------------------------

Another observation about text-only modes is that they only need to store a
restricted set of bitmapped data values. Encoding this set of values in a
smaller unit of storage than a byte could help to reduce the amount of storage
and bandwidth required to reproduce the characters on the display.
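
One way of realising this, offered here as an assumption rather than a
worked-out design, is a codebook: collect the distinct bitmap bytes used by
the character set, store each character row as an index into that list, and
use only as many bits per index as the number of distinct values requires. The
three glyph bitmaps below are made up for illustration; a complete character
set would typically contain more distinct values, reducing the saving.

```python
# Sketch: codebook (dictionary) compression of character bitmap rows.
# The glyph data is hypothetical and only illustrates the technique.

GLYPHS = {
    "A": [0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x00],
    "H": [0x66, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x66, 0x00],
    "I": [0x3C, 0x18, 0x18, 0x18, 0x18, 0x18, 0x3C, 0x00],
}


def build_codebook(glyphs):
    """Return the sorted distinct row values and a value -> index lookup."""
    values = sorted({row for rows in glyphs.values() for row in rows})
    return values, {value: index for index, value in enumerate(values)}


def index_bits(count):
    """Bits needed to index `count` distinct values (at least one bit)."""
    return max(1, (count - 1).bit_length())


def encode(rows, lookup):
    """Replace each bitmap byte with its codebook index."""
    return [lookup[row] for row in rows]


def decode(indices, values):
    """Recover the bitmap bytes from codebook indices."""
    return [values[index] for index in indices]


if __name__ == "__main__":
    values, lookup = build_codebook(GLYPHS)
    bits = index_bits(len(values))
    print(f"{len(values)} distinct rows, {bits} bits per row instead of 8")
    for rows in GLYPHS.values():
        assert decode(encode(rows, lookup), values) == rows
```

Reproducing a character would then involve an extra lookup from index to
bitmap byte, which the ULA might perform using a small internal table.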

Enhancement: High Resolution Graphics
-------------------------------------

Screen modes with higher resolutions and larger colour depths might be
possible, but this would in most cases involve the allocation of more screen
memory, and the ULA would then probably need to page in such memory so that
the CPU could access all of it.

Enhancement: Genlock Support
----------------------------

The ULA generates a video signal in conjunction with circuitry producing the
output signals necessary for the correct display of the screen image.
However, it appears that the ULA drives the video synchronisation mechanism
instead of reacting to an existing signal. Genlock support might be possible
if the ULA were made responsive to such external signals, resetting its
address generators upon receiving synchronisation events.
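
To illustrate the difference, the following hypothetical model contrasts the
free-running counters of the existing ULA with genlocked counters that are
reset by external synchronisation events. The constants come from the Timing
section (128 cycles per scanline, 312 scanlines per frame); the class and
method names are invented for this sketch, and a real implementation would
also need to cope with missing or noisy sync pulses.

```python
# Sketch: free-running versus genlocked scanline counters.
# Hypothetical model; constants are taken from the Timing section.


class ScanCounters:
    CYCLES_PER_LINE = 128  # 2MHz cycles per scanline
    LINES_PER_FRAME = 312  # scanlines per frame

    def __init__(self, genlock=False):
        self.genlock = genlock
        self.line = 0
        self.cycle = 0

    def tick(self):
        """Advance by one 2MHz cycle."""
        self.cycle += 1
        # Free-running (standard ULA): the counters wrap by themselves, and the
        # ULA's own sync outputs would be derived from these positions.
        if not self.genlock and self.cycle == self.CYCLES_PER_LINE:
            self.cycle = 0
            self.line = (self.line + 1) % self.LINES_PER_FRAME

    def hsync(self):
        """External horizontal sync: start the next scanline (genlock only)."""
        if self.genlock:
            self.cycle = 0
            self.line = (self.line + 1) % self.LINES_PER_FRAME

    def vsync(self):
        """External vertical sync: restart the frame (genlock only)."""
        if self.genlock:
            self.cycle = 0
            self.line = 0
```

In both cases the display address would be derived from line and cycle in the
same way; only the events that reset these counters differ.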

Enhancement: Improved Sound
---------------------------

The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), making it impossible to support multiple channels within the
given framework. The BBC Micro controls its sound chip through the system VIA
at &FE40-&FE4F, and an enhanced ULA could adopt this interface.
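
The &FE*6 notation denotes any address of the form &FEx6. The sketch below
assumes that the standard ULA ignores the middle hex digit entirely, so that
its sixteen registers repeat throughout page &FE. Under that assumption, an
enhanced ULA adopting the &FE40-&FE4F interface mentioned above would need to
stop mirroring its registers in that part of the page. Both functions are
hypothetical.

```python
# Sketch: decoding of ULA register addresses in page &FE.
# Assumption: the standard ULA ignores the middle hex digit, so &FE06, &FE16,
# ..., &FEF6 all select register 6 (the "&FE*6" notation).


def ula_register(address):
    """Return the ULA register number (0-15) selected by `address`, or None."""
    if (address & 0xFF00) != 0xFE00:
        return None  # not in page &FE
    return address & 0x0F


def enhanced_register(address):
    """Hypothetical decoding for an enhanced ULA that reserves &FE40-&FE4F
    for a BBC Micro-style sound interface and mirrors elsewhere as before."""
    if (address & 0xFFF0) == 0xFE40:
        return ("sound", address & 0x0F)
    register = ula_register(address)
    return None if register is None else ("ula", register)
```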

The BBC Micro uses the SN76489 chip to produce sound, and the entire
functionality of this chip could be emulated for enhanced sound, with a subset
of the functionality exposed via the &FE*6 interface.

See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
See: http://www.smspower.org/Development/SN76489
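
Each SN76489 tone channel divides the chip's input clock by 32 and then by a
10-bit value N held in that channel's tone register, giving an output
frequency of clock / (32 × N). An emulation would need the same arithmetic.
The sketch below assumes the 4MHz clock used in the BBC Micro and the latch
and data byte format described in the SMS Power documentation linked above;
the function names are hypothetical.

```python
# Sketch: SN76489 tone register values and the bytes that set them.
# Assumes a 4MHz input clock, as on the BBC Micro.

CLOCK_HZ = 4_000_000


def tone_register(frequency_hz, clock_hz=CLOCK_HZ):
    """Nearest 10-bit tone value N for frequency = clock / (32 * N)."""
    n = round(clock_hz / (32 * frequency_hz))
    return min(max(n, 1), 0x3FF)  # clamp to the 10-bit register range


def tone_frequency(n, clock_hz=CLOCK_HZ):
    """Output frequency produced by tone value N."""
    return clock_hz / (32 * n)


def tone_bytes(channel, n):
    """Latch byte (1 cc 0 dddd: low 4 bits), then data byte (0 x dddddd: high 6 bits)."""
    if channel not in (0, 1, 2):
        raise ValueError("tone channels are 0, 1 and 2")
    latch = 0x80 | (channel << 5) | (n & 0x0F)
    data = (n >> 4) & 0x3F
    return latch, data
```

For example, a 440Hz tone on channel 0 needs N = 284 (about 440.14Hz), which
is written as the bytes &8C and &11.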

Enhancement: Waveform Upload
----------------------------

As with a hardware sprite function, waveforms could be uploaded or referenced
using I/O locations as registers that point to regions of memory.
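
As a rough illustration of how such registers might be used, the sketch below
assumes three hypothetical registers giving the base address and length of a
waveform in memory and a fixed-point step size. It plays the waveform back
using a phase accumulator, a common wavetable technique. None of these
registers exist in the standard ULA.

```python
# Sketch: wavetable playback driven by hypothetical waveform registers.
# base/length locate the waveform in memory; step is an 8.8 fixed-point
# increment, so 0x100 advances one sample per output and 0x80 half a sample.


def play_waveform(memory, base, length, step, count):
    """Return `count` output samples read from memory[base:base + length]."""
    phase = 0
    samples = []
    for _ in range(count):
        samples.append(memory[base + (phase >> 8)])
        phase = (phase + step) % (length << 8)  # wrap around the waveform
    return samples
```

Changing only the step register would change the pitch without requiring the
waveform itself to be uploaded again.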

Enhancement: Sound Input/Output
-------------------------------

Since the ULA already controls audio input/output for cassette-based data, it
would have been interesting to entertain the idea of sampling and output of
sounds through the cassette interface. However, a significant amount of
circuitry is employed to process the input signal for use by the ULA and to
process the output signal for recording.

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11

Enhancement: BBC ULA Compatibility
----------------------------------

Although some new ULA functions could be defined in a way that is also
compatible with the BBC Micro, the BBC Micro's I/O arrangement is itself
incompatible with that of the Electron ULA: in the BBC memory map, &FE00-7 is
reserved for the 6845 video controller, whose registers bear no relation to
the Electron ULA functions at those addresses, and &FE08-F is reserved for the
serial controller. Compatibility can therefore be disregarded in those areas
of functionality where it is already disregarded.

&FE20-F maps to video ULA functionality on the BBC Micro, providing control
over the palette (using address &FE21, compared to &FE08-F on the Electron)
and other system-specific functions. Since the location usage is generally
incompatible, this region could be reused for other purposes.

Enhancement: Increased RAM, ULA and CPU Performance
---------------------------------------------------

More modern implementations of the hardware might feature faster RAM coupled
with an increased ULA clock frequency in order to increase the bandwidth
available to the ULA, and to the CPU in situations where the ULA is not needed
to perform work. A ULA employing a 32MHz clock would be able to complete the
retrieval of a byte from RAM in only 250ns and thus allow the CPU to access
the RAM for the following 250ns, even in display modes requiring the
retrieval of a byte for the display every 500ns. The CPU could, subject to
timing issues, run at 2MHz even in MODEs 0, 1 and 2.
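
The arithmetic behind this proposal can be checked with a short calculation.
It assumes, as the figures above imply, that a byte retrieval always takes the
same number of ULA clock cycles (eight, since 250ns at 32MHz is eight cycles,
matching 500ns at the standard 16MHz), and that the most demanding display
modes need one byte every 500ns. The function names are hypothetical.

```python
# Sketch: time left for the CPU in each display byte slot.
# Assumes a byte retrieval takes a fixed 8 ULA clock cycles and that the
# 80-byte-per-scanline modes need one byte every 500ns.

NS_PER_SECOND = 1_000_000_000


def fetch_time_ns(clock_hz, cycles_per_fetch=8):
    """Duration of one byte retrieval at the given ULA clock frequency."""
    return cycles_per_fetch * NS_PER_SECOND // clock_hz


def cpu_window_ns(clock_hz, slot_ns=500, cycles_per_fetch=8):
    """Nanoseconds of each display slot left free for CPU access."""
    return max(0, slot_ns - fetch_time_ns(clock_hz, cycles_per_fetch))


if __name__ == "__main__":
    for clock_hz in (16_000_000, 32_000_000):
        print(f"{clock_hz // 1_000_000}MHz: fetch {fetch_time_ns(clock_hz)}ns, "
              f"CPU window {cpu_window_ns(clock_hz)}ns")
```

At 16MHz the window is zero, which agrees with the observation that MODEs 0,
1 and 2 leave no free cycles during the active scanline periods.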

A scheme such as that described above would have a similar effect to the
scheme employed in the BBC Micro, although the latter made use of RAM with a
wider data path (eight bits rather than the four bits handled by the Electron
ULA's RAM0...RAM3 lines) in order to complete memory transfers within 250ns
and thus permit the CPU to run continuously at 2MHz.

Higher bandwidth could potentially be used to implement exotic features such
as RAM-resident hardware sprites or indeed any feature demanding RAM access
concurrent with the production of the display image.

Enhancement: Multiple CPU Stacks and Zero Pages
-----------------------------------------------

The 6502 maintains a stack for subroutine calls and register storage in page
&01. Although the stack pointer can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would
be able to coexist without overwriting each other's stacks.

This issue could be alleviated by providing a facility to redirect accesses to
page &01 to other areas of memory. The ULA would provide a register that
defines a physical page for the use of the CPU's "logical" page &01, and upon
any access to page &01 by the CPU, the ULA would change the asserted address
lines to redirect the access to the appropriate physical region.

By providing an 8-bit register, mapping to the most significant byte (MSB) of
a 16-bit address, the ULA could then replace any MSB equal to &01 with the
register value before the access is made. When switching between coexisting
programs, the register would be updated to point the ULA to the appropriate
stack location, thus providing a simple memory management unit (MMU)
capability.

In a similar fashion, zero page accesses could also be redirected so that code
could run from sideways RAM and have zero page operations redirected to "upper
memory" (for example, to page &BE, with stack accesses redirected to page &BF,
perhaps), thereby permitting most CPU operations to occur without inadvertent
accesses to "lower memory" (the RAM) which would risk stalling the CPU as it
contends with the ULA for memory access.
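
The redirection described above amounts to substituting the most significant
byte of an address whenever it selects page &00 or &01. The sketch below
models this with two hypothetical 8-bit registers; the example values &BE and
&BF follow the suggestion above, and leaving both registers at their reset
values makes the translation a no-op.

```python
# Sketch: MSB substitution for redirecting zero page and stack accesses.
# The two registers are hypothetical; reset values leave addresses unchanged.


class PageRedirector:
    def __init__(self, zero_page=0x00, stack_page=0x01):
        self.zero_page = zero_page    # physical page used for logical page &00
        self.stack_page = stack_page  # physical page used for logical page &01

    def translate(self, address):
        """Map a 16-bit CPU address to the address presented to memory."""
        msb, lsb = address >> 8, address & 0xFF
        if msb == 0x00:
            msb = self.zero_page
        elif msb == 0x01:
            msb = self.stack_page
        return (msb << 8) | lsb


if __name__ == "__main__":
    mmu = PageRedirector(zero_page=0xBE, stack_page=0xBF)
    print(hex(mmu.translate(0x01FF)))  # stack top   -> 0xbfff
    print(hex(mmu.translate(0x0070)))  # zero page   -> 0xbe70
    print(hex(mmu.translate(0x3000)))  # other pages -> 0x3000
```

Switching between coexisting programs would then amount to rewriting these two
registers, giving each program its own pair of physical pages.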

Such facilities could also be provided by a separate circuit between the CPU
and ULA in a fashion similar to that employed by a "turbo" board, but unlike
such boards, no additional RAM would be provided: all memory accesses would
occur as normal through the ULA, albeit redirected when configured
appropriately.

ULA Pin Functions
-----------------

The functions of the ULA pins are described in the Electron Service Manual. Of
interest to video processing are the following:

  CSYNC (low during horizontal or vertical synchronisation periods, high
         otherwise)

  HS (low during horizontal synchronisation periods, high otherwise)

  RED, GREEN, BLUE (pixel colour outputs)

  CLOCK IN (a 16MHz clock input, 4V peak to peak)

  PHI OUT (the CPU clock signal, running at 1MHz or 2MHz, or stopped)

More general memory access pins:

  RAM0...RAM3 (data lines to/from the RAM)

  RA0...RA7 (address lines for sending both row and column addresses to the RAM)

  RAS (row address strobe setting the row address on a negative edge - see the
       timing notes)

  CAS (column address strobe setting the column address on a negative edge -
       see the timing notes)

  WE (sets write enable with logic 0, read with logic 1)

  ROM (select data access from ROM)

CPU-oriented memory access pins:

  A0...A15 (CPU address lines)

  PD0...PD7 (CPU data lines)

  R/W (indicates CPU write with logic 0, CPU read with logic 1)

Interrupt-related pins:

  NMI (CPU request for uninterrupted 1MHz access to memory)

  IRQ (signals an event to the CPU)

  POR (power-on reset, resetting the ULA on a positive edge and asserting the
       CPU's RST pin)

  RST (master reset for the CPU signalled on power-up and by the Break key)

Keyboard-related pins:

  KBD0...KBD3 (keyboard inputs)

  CAPS LOCK (Caps Lock status LED control)

Sound-related pins:

  SOUND O/P (sound output using internal oscillator)

Cassette-related pins:

  CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)

  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)

  CAS RC (detect high tone)

  CAS MO (motor relay output)

  ÷13 IN (~1200 baud clock input)

ULA Socket
----------

The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.

References
----------

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm

About this Document
-------------------

The most recent version of this document and accompanying distribution should
be available from the following location:

http://hgweb.boddie.org.uk/ULA

Copyright and licence information can be found in the docs directory of this
distribution - see docs/COPYING.txt for more information.