ULA.txt

76:cabf7421f461
2015-09-06 Paul Boddie Added RAM access corrections related to CPU activity plus bandwidth figures. Consequently, an inevitable enhancement is proposed to remedy the situation.
The Acorn Electron ULA
======================

Principal Design and Feature Constraints
----------------------------------------

The features of the ULA are limited by the amount of time and resources that
can be allocated to each activity necessary to support such features given the
fundamental obligations of the unit. Maintaining a screen display based on the
contents of RAM itself requires the ULA to have exclusive access to such
hardware resources for a significant period of time. Whilst other elements of
the ULA can in principle run in parallel with this activity, they cannot also
access the RAM. Consequently, other features that might use the RAM must
accept a reduced allocation of that resource in comparison to a hypothetical
architecture where concurrent RAM access is possible.

Thus, the principal constraint for many features is bandwidth. The duration of
access to hardware resources is one aspect of this; the rate at which such
resources can be accessed is another. For example, the RAM is not fast enough
to support access more frequently than one byte per 2MHz cycle, and for screen
modes involving 80 bytes of screen data per scanline, there are no free cycles
for anything other than the production of pixel output during the active
scanline periods.

Timing
------

According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
of which are used to generate pixel data. At 50Hz, this means that 128 cycles
are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
312 ~= 128 cycles). This is consistent with the observation that each scanline
requires at most 80 bytes of data, and that the ULA is apparently busy for 40
out of 64 microseconds in each scanline.
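
The scanline arithmetic above can be checked with a short Python sketch. The
constant and variable names are illustrative only; the figures are those
quoted in the text.

```python
# Scanline timing figures for the Electron ULA, as quoted in the text.
CLOCK_HZ = 2_000_000      # 2MHz RAM access clock
FIELD_RATE_HZ = 50        # 50Hz display refresh
SCANLINES = 312           # scanlines per field
BYTES_PER_LINE = 80       # maximum screen bytes fetched per scanline

cycles_per_field = CLOCK_HZ // FIELD_RATE_HZ
cycles_per_scanline = round(cycles_per_field / SCANLINES)

# Durations in microseconds, using integer arithmetic to avoid rounding noise.
scanline_us = cycles_per_scanline * 1_000_000 // CLOCK_HZ
ula_busy_us = BYTES_PER_LINE * 1_000_000 // CLOCK_HZ

print(cycles_per_field, cycles_per_scanline, scanline_us, ula_busy_us)
```

Rounding 40000 / 312 to a whole number of cycles is an assumption made here
to match the 128 cycle figure used throughout the rest of the document.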

Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
each providing two bits of each byte) using two cycles within the 500ns period
of the 2MHz clock to complete each access operation. Since the CPU and ULA
have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
effectively run at 1MHz (since every other 500ns period involves the ULA
accessing RAM). The CPU is driven by an external clock (IC8) whose 16MHz
frequency is divided by the ULA (IC1) depending on the screen mode in use.

Each 16MHz cycle is approximately 62.5ns. To access the memory, the following
patterns corresponding to 16MHz cycles are required:

     Time (ns):  0-------------- 500------------ ...
   2 MHz cycle:  0               1               ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
          ~RAS:    0           1   0           1 ...
          ~CAS:      0   1 0   1     0   1 0   1 ...
                   A B     C       A B     C     ...
                       F     S         F     S   ...
                 a b     c       a b     c       ...

Here, "A" and "B" respectively indicate the row and first column addresses
being latched into the RAM (on a negative edge for ~RAS and ~CAS
respectively), and "C" indicates the second column address being latched into
the RAM. Presumably, the first and second half-bytes can be read at "F" and
"S" respectively, and the row and column addresses must be made available at
"a" and "b" (and "c") respectively at the latest.

The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that
approximately two 16MHz cycles after the row address is latched, and one and a
half cycles after the column address is latched, the data becomes available.
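
As a rough check on these estimates, the following Python sketch converts the
quoted maximum access times into 16MHz cycles. The names used are
illustrative, and the result is only as good as the datasheet figures given
above.

```python
# Convert the quoted DRAM access times into 16MHz clock cycles.
CYCLE_NS = 1_000_000_000 / 16_000_000   # 62.5ns per 16MHz cycle

ROW_ACCESS_NS = 150      # TM4164EC4-15 maximum row address access time
COLUMN_ACCESS_NS = 90    # TM4164EC4-15 maximum column address access time

def cycles_needed(access_ns):
    """Return an access time expressed in (fractional) 16MHz cycles."""
    return access_ns / CYCLE_NS

row_cycles = cycles_needed(ROW_ACCESS_NS)        # about 2.4 cycles
column_cycles = cycles_needed(COLUMN_ACCESS_NS)  # about 1.44 cycles

print(row_cycles, column_cycles)
```

The worst-case row access therefore takes somewhat more than two cycles, so
the "approximately two" in the text should be read as an approximation.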

Note that the Service Manual refers to the negative edge of RAS and CAS, but
the datasheet for the similar TM4164EC4 product shows latching on the negative
edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
"page mode" provides the appropriate behaviour for that particular product.

The CPU, when accessing the RAM alone, apparently does not make use of the
vacated "slot" that the ULA would otherwise use (when interleaving accesses in
MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
accessing ROM (and potentially sideways RAM).

See: Acorn Electron Advanced User Guide
See: Acorn Electron Service Manual
     http://acorn.chriswhy.co.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438

Bandwidth Figures
-----------------

Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
total lines, with 80 cycles occurring in the active periods of display
scanlines, the following bandwidth calculations can be performed:

Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:    80 cycles * 256 lines
     = 20480 bytes
CPU:    48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:    80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:    48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:    40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:    40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes

Here, the division by 2 for CPU accesses is performed to indicate that the CPU
only uses every other access opportunity even in uncontended periods. See the
2MHz RAM Access enhancement below for bandwidth calculations that consider
this limitation removed.

Video Timing
------------

According to 8.7 in the Service Manual, and the PAL Wikipedia page,
approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
(including the "colour burst"), and 1.65µs for the "front porch", totalling
12.05µs and thus leaving 51.95µs for the active video signal for each
scanline. As the Service Manual suggests in the oscilloscope traces, the
display information is transmitted more or less centred within the active
video period since the ULA will only be providing pixel data for 40µs in each
scanline.

Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
each scanline can be divided into 1024 cycles, although only 640 at most are
actively used to provide pixel data.
Pixel data production should only occur
within a certain period on each scanline, approximately 262 cycles after the
start of hsync:

  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262

By choosing a number divisible by 8, the RAM access mechanism can be
synchronised with the pixel production. Thus, 264 is a more appropriate start
cycle.

The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (strictly 312.5) and thus lasts for
1.6ms. Of this, 2.5 lines occur before the vsync (field sync) which also lasts
for 2.5 lines. Thus, the first visible scanline on the first field of a frame
occurs half way through the 23rd scanline period measured from the start of
vsync:

                                        10                  20    23
  Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    Line from 1:       0                                          22 3
 Line on screen: .:::::VVVVV:::::                                   12233445566
                  |_________________________________________________|
                           25 line vertical blanking period

In the second field of a frame, the first visible scanline coincides with the
24th scanline period measured from the start of line 313 in the frame:

               310                                                 336
  Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
  Line from 313:       0                                            23
 Line on screen: 88:::::VVVVV::::                                    11223344
               288 |                                                 |
                   |_________________________________________________|
                            25 line vertical blanking period

In order to consider only full lines, we might consider the start of each
frame to occur 23 lines after the start of vsync.

Again, it is likely that pixel data production should only occur on scanlines
within a certain period on each frame. The "625/50" document indicates that
only a certain region is "safe" to use, suggesting a vertically centred region
with approximately 15 blank lines above and below the picture. Thus, the start
of the picture could be chosen as 38 lines after the start of vsync.

See: http://en.wikipedia.org/wiki/PAL
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
     http://lipas.uwasa.fi/~f76998/video/modes/
See: PAL TV timing and voltages
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
See: Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html

RAM Integrated Circuits
-----------------------

Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.

The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM41464 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.

The TM4164EC4 series combines 4 64K x 1b units into a single package and
appears similar to the TM4164EA4 featured on the Electron's circuit diagram
(in the Advanced User Guide but not the Service Manual), and it also has 22
pins providing 3 additional inputs and 3 additional outputs over the 16 pins
of the individual 4164-15 modules, presumably allowing concurrent access to
the packaged memory units.

As far as currently available replacements are concerned, the NTE4164 is a
potential candidate: according to the Vetco Electronics entry, it is
supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
parts include the NTE2164 and the NTE6664, both of which appear to have
largely the same performance and connection characteristics. Meanwhile, the
NTE21256 appears to be a 16-pin replacement with four times the capacity that
maintains the single data input and output pins. Using the NTE21256 as a
replacement for all ICs combined would be difficult because of the single bit
output.

Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these are
different and actually appear to be provided under the 41574 product code (but
are listed under 41464-10) and appear to be replacements for the TM4164EC4:
the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
employing 4 pins for both input and output.

            Pins    I/O pins    Row access  Column access
            ----    --------    ----------  -------------
TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
KM41464AP   18      4           150ns (15)  75ns (15)
NTE21256    16      1 + 1       150ns       75ns
HYB 4164-2  16      1 + 1       150ns       100ns
µPD41464    18      4           120ns (12)  60ns (12)

See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
     http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
See: Dynamic RAMS
     http://www.unicornelectronics.com/IC/DYNAMIC.html
See: New old stock 8x 4164 chips
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
     http://www.vetco.net/catalog/product_info.php?products_id=2806
See: NTE4164 - IC-NMOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=3680
See: NTE21256 - IC-256K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=2799
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
See: NTE6664 - IC-MOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=5213
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
See: 4164-150: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
See: 41464-10: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1

Interrupts
----------

The ULA generates IRQs (maskable interrupts) according to certain conditions
and these conditions are controlled by location &FE00:

  * Vertical sync (bottom of displayed screen)
  * 50Hz real time clock
  * Transmit data empty
  * Receive data full
  * High tone detect

The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory, in order
to restore the normal function of the ULA.

ROM Paging
----------

Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
mappings exist:

   8    keyboard
   9    keyboard (duplicate)
  10    BASIC ROM
  11    BASIC ROM (duplicate)

Paging in a ROM involves the following procedure:

 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
    2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
    selected.
 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
    whilst writing the desired ROM number n in bits 0 to 2.

Shadow/Expanded Memory
----------------------

The Electron exposes all sixteen address lines and all eight data lines
through the expansion bus.
Using such lines, it is possible to provide
additional memory - typically sideways ROM and RAM - on expansion cards and
through cartridges, although the official cartridge specification provides
fewer address lines and only seeks to provide access to memory in 16K units.

Various modifications and upgrades were developed to offer "turbo"
capabilities to the Electron, permitting the CPU to access a separate 8K of
RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
the ULA through additional logic. However, an enhanced ULA might support
independent CPU access to memory over the expansion bus by allowing itself to
be discharged from providing access to memory, potentially for a range of
addresses, and for the CPU to communicate with external memory uninterrupted.

Sideways RAM/ROM and Upper Memory Access
----------------------------------------

The ULA controls the CPU clock, effectively slowing or stopping the CPU when
the ULA needs to access screen memory. Even so, it is apparently able to allow
the CPU to access addresses of &8000 and above - the upper region of memory -
at 2MHz independently of any access to RAM that the ULA might be performing.
It only blocks the CPU, by stopping or stalling its clock, if the CPU attempts
to access addresses of &7FFF and below - the lower region of memory - during
any ULA memory access.

Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
CPU clock if the line goes low, when the CPU is attempting to access the lower
region of memory.

Hardware Scrolling
------------------

On the standard ULA, &FE02 and &FE03 map to the 9 significant bits of the
screen start address (bits 6 to 14), with the least significant 5 bits of the
register pair being unused and the low 6 bits of the address being zero, thus
limiting the scrolling resolution to 64 bytes.
An enhanced ULA could support a resolution of 2 bytes
using the same layout of these addresses.

|--&FE02--------------| |--&FE03--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

   XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX

Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change in 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).

One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
of changing the screen address by 2 bytes is the change in the number of lines
from the initial and final character rows that need reading by the ULA, which
would need to maintain this state information (although this is a relatively
trivial change). Another pitfall is the complication that might be introduced
to software writing bitmaps of character height to the screen.

Enhancement: 2MHz RAM Access
----------------------------

The CPU and ULA both access RAM at 2MHz, but the CPU, when not competing with
the ULA, only accesses RAM every other 2MHz cycle (as if the ULA still needed
to access the RAM). One useful enhancement would therefore be a mechanism to
let the CPU take over the ULA cycles outside the ULA's period of activity,
comparable to the way the ULA takes over the CPU cycles in MODE 0 to 3.
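
The effect on CPU bandwidth can be estimated by reworking the calculations
from the Bandwidth Figures section. The Python sketch below is only an
illustration under the same assumptions (128 cycles per scanline, 312 lines,
80 cycles in the active part of a display line); the function and parameter
names are hypothetical.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
TOTAL_LINES = 312       # scanlines per field
DISPLAY_CYCLES = 80     # cycles in the active part of a display line

def cpu_bytes(display_lines, ula_cycles, enhanced):
    """Estimate CPU RAM accesses (bytes) per field.

    display_lines: lines on which the ULA fetches screen data
    ula_cycles:    ULA fetches per display line (80 or 40)
    enhanced:      True if the CPU may use every free 2MHz cycle
    """
    # The standard CPU only uses every other free access opportunity.
    step = 1 if enhanced else 2
    # Active-period cycles left over by the ULA (MODE 4 to 6 interleaving).
    shared = DISPLAY_CYCLES - ula_cycles
    # Cycles outside the active period of a display line.
    border = (CYCLES_PER_LINE - DISPLAY_CYCLES) // step
    blank_lines = TOTAL_LINES - display_lines
    return ((shared + border) * display_lines
            + (CYCLES_PER_LINE // step) * blank_lines)

for name, lines, ula in [("MODE 0, 1, 2", 256, 80), ("MODE 3", 192, 80),
                         ("MODE 4, 5", 256, 40), ("MODE 6", 192, 40)]:
    print(name, cpu_bytes(lines, ula, False), cpu_bytes(lines, ula, True))
```

Cycles interleaved with ULA fetches in MODE 4 to 6 are counted once in both
cases, since the ULA still occupies the alternate slots during the active
period.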

Thus, the RAM access cycles would resemble the following in MODE 0 to 3:

  Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

In MODE 4 to 6:

  Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

This would improve CPU bandwidth as follows:

                Standard ULA    Enhanced ULA
MODE 0, 1, 2    9728 bytes      19456 bytes
MODE 3          12288 bytes     24576 bytes
MODE 4, 5       19968 bytes     29696 bytes
MODE 6          19968 bytes     32256 bytes

With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
because all access opportunities to RAM are doubled. Meanwhile, in the other
modes, some CPU accesses occur alongside ULA accesses and thus cannot be
doubled, but the CPU bandwidth increase is still significant.

Enhancement: Region Blanking
----------------------------

The problem of permitting character-oriented blitting in programs whilst
scrolling the screen by sub-character amounts could be mitigated by permitting
a region of the display to be blank, such as the final lines of the display.
Consider the following vertical scrolling by 2 bytes that would cause an
initial character row of 6 lines and a final character row of 2 lines:

    6 lines - initial, partial character row
  248 lines - 31 complete rows
    2 lines - final, partial character row

If a routine were in use that wrote 8 line bitmaps to the partial character
row now split in two, it would be advisable to hide one of the regions in
order to prevent content appearing in the wrong place on screen (such as
content meant to appear at the top "leaking" onto the bottom). Blanking 6
lines would be sufficient, as can be seen from the following cases.

Scrolling up by 2 lines:

    6 lines - initial, partial character row
  240 lines - 30 complete rows
    4 lines - part of 1 complete row
  -----------------------------------------------------------------
    4 lines - part of 1 complete row (hidden to maintain 250 lines)
    2 lines - final, partial character row (hidden)

Scrolling down by 2 lines:

    2 lines - initial, partial character row
  248 lines - 31 complete rows
  ----------------------------------------------------------
    6 lines - final, partial character row (hidden)

Thus, in this case, region blanking would impose a 250 line display with the
bottom 6 lines blank.

See the description of the display suspend enhancement for a more efficient
way of blanking lines than merely blanking the palette whilst allowing the CPU
to perform useful work during the blanking period.
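
The way a scrolled display divides into partial and complete character rows
can be sketched in Python as follows. This is an illustration only: the
function name and its parameters are hypothetical, and it assumes a 256 line
display with 8 line character rows as in the cases above.

```python
def split_rows(offset_lines, display_lines=256, row_height=8):
    """Return (initial, complete_rows, final) for a scrolled display.

    offset_lines is how far, in lines, the screen start has moved into a
    character row. The result gives the lines in the initial partial row,
    the number of complete rows, and the lines in the final partial row.
    """
    initial = (row_height - offset_lines) % row_height
    complete_rows = (display_lines - initial) // row_height
    final = display_lines - initial - complete_rows * row_height
    return initial, complete_rows, final

print(split_rows(2))   # start address moved 2 lines into a row
print(split_rows(6))   # start address moved 6 lines into a row
```

An offset of 2 lines gives the 6 + 248 + 2 line division shown first, and an
offset of 6 lines gives the 2 + 248 + 6 line division of the "scrolling down"
case.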

To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 16 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.

Enhancement: Screen Height Adjustment
-------------------------------------

The height of the screen could be configurable in order to reduce screen
memory consumption. This is not quite done in MODE 3 and 6 since the start of
the screen appears to be rounded down to the nearest page, but by reducing the
height by amounts more than a page, savings would be possible. For example:

  Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
  ------------  -----  ------  --------------  ---------------  -------------
  640           1      252     80              320              &3140 -> &3100
  640           1      248     80              640              &3280 -> &3200
  320           1      240     40              640              &5A80 -> &5A00
  320           2      240     80              1280             &3500

Screen Mode Selection
---------------------

Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially being made available for use.
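
As an illustration of the bit layout just described, the following Python
sketch extracts and replaces the three mode bits of a value written to &FE*7.
It assumes that the three bits hold the mode number directly; the helper
names are hypothetical.

```python
MODE_SHIFT = 3
MODE_MASK = 0b111 << MODE_SHIFT   # bits 3, 4 and 5

def get_mode_bits(value):
    """Return the 3-bit screen mode field from a byte written to &FE*7."""
    return (value & MODE_MASK) >> MODE_SHIFT

def set_mode_bits(value, mode):
    """Return value with the screen mode field replaced, other bits kept."""
    if not 0 <= mode <= 7:
        raise ValueError("mode must fit in three bits")
    return (value & ~MODE_MASK & 0xFF) | (mode << MODE_SHIFT)

print(get_mode_bits(0b00110000))
print(bin(set_mode_bits(0b10000111, 2)))
```

Only three bits are available, which is why a wider range of modes would
require other bits of the register to be reassigned.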

Enhancement: Palette Definition
-------------------------------

Since all memory accesses go via the ULA, an enhanced ULA could employ more
specific addresses than &FE*X to perform enhanced functions. For example, the
palette control is done using &FE*8-F and merely involves selecting predefined
colours, whereas an enhanced ULA could support the redefinition of all 16
colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
(colours 8 to 15), where a single byte might provide 8 bits per pixel colour
specifications similar to those used on the Archimedes.

The principal limitation here is actually the hardware: the Electron has only
a single output line for each of the red, green and blue channels, and if
those outputs are strictly digital and can only be set to a "high" and "low"
value, then only the existing eight colours are possible. If a modern ULA were
able to output analogue values, it would still need to be assessed whether the
circuitry could successfully handle and propagate such values. Various sources
indicate that only "TTL levels" are supported by the RGB output circuit, and
since there are 74LS08 AND logic gates involved in the RGB component outputs
from the ULA, it is likely that the ULA is expected to provide only "high" or
"low" values.

Short of adding extra outputs from the ULA (either additional red, green and
blue outputs or a combined intensity output, the former employed on the
Amstrad CPC series), another approach might involve some kind of modulation
where an output value might be encoded in multiple pulses at a higher
frequency than the pixel frequency.
However, this would demand additional
circuitry outside the ULA, and component RGB monitors would probably not be
able to take advantage of this feature; only UHF and composite video devices
(the latter with the composite video colour support enabled on the Electron's
circuit board) would potentially benefit.

Flashing Colours
----------------

According to the Advanced User Guide, "The cursor and flashing colours are
entirely generated in software: This means that all of the logical to physical
colour map must be changed to cause colours to flash." This appears to suggest
that the palette registers must be updated upon the flash counter - read and
written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
colour pairs to be any combination of colours might be possible, instead of
having colour complements as pairs.

It is conceivable that the interrupt code responsible does the simple thing
and merely inverts the current values for any logical colours (LC) for which
the associated physical colour (as supplied as the second parameter to the VDU
19 call) has the top bit of its four bit value set.
These top bits are not
recorded in the palette registers but are presumably recorded separately and
used to build bitmaps as follows:

  LC  2 colour  4 colour  16 colour  4-bit value for inversion
  --  --------  --------  ---------  -------------------------
   0  00010001  00010001  00010001   1, 1, 1
   1  01000100  00100010  00010001   4, 2, 1
   2            01000100  00100010      4, 2
   3            10001000  00100010      8, 2
   4                      00010001         1
   5                      00010001         1
   6                      00100010         2
   7                      00100010         2
   8                      01000100         4
   9                      01000100         4
  10                      10001000         8
  11                      10001000         8
  12                      01000100         4
  13                      01000100         4
  14                      10001000         8
  15                      10001000         8

  Inversion value calculation:

   2 colour formula: 1 << (colour * 2)
   4 colour formula: 1 << colour
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))

For example, where logical colour 0 has been mapped to a physical colour in
the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
the inversion operation. (The lower three bits of the physical colour would be
used to set the underlying colour information affected by the inversion
operation.)
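
The inversion values and bitmaps in the table can be reproduced with a short
Python sketch. The function names are hypothetical, and the 16 colour case is
written so that bits 1 and 3 of the logical colour each contribute a single
bit to the shift, which is the reading that matches the table.

```python
def inversion_value(colour, depth):
    """Return the 4-bit inversion value for a logical colour.

    depth is the number of colours in the mode: 2, 4 or 16.
    """
    if depth == 2:
        return 1 << (colour * 2)
    if depth == 4:
        return 1 << colour
    if depth == 16:
        # Bit 1 of the colour selects shift bit 0; bit 3 selects shift bit 1.
        return 1 << (((colour >> 1) & 1) + (((colour >> 3) & 1) << 1))
    raise ValueError("depth must be 2, 4 or 16")

def inversion_bitmap(colour, depth):
    """Return the 8-bit bitmap: the 4-bit value repeated in both nibbles."""
    value = inversion_value(colour, depth)
    return (value << 4) | value

for colour in range(16):
    print(colour, format(inversion_bitmap(colour, 16), "08b"))
```

The repetition of the value in both nibbles reflects the bitmaps shown in the
table; why the palette registers need both copies is not covered here.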

An operation in the interrupt code would then combine the bitmaps for all
logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
combined for groups of logical colours as follows:

   Logical colours
   ---------------
   0,  2,  8, 10
   4,  6, 12, 14
   5,  7, 13, 15
   1,  3,  9, 11

These combined bitmaps would be EORed with the existing palette register
values in order to perform the value inversion necessary to produce the
flashing effect.

Thus, in the VDU 19 operation, the appropriate inversion value would be
calculated for the logical colour, and this value would then be combined with
other inversion values in a dedicated memory location corresponding to the
colour's group as indicated above. Meanwhile, the palette channel values would
be derived from the lower three bits of the specified physical colour and
combined with other palette data in dedicated memory locations corresponding
to the palette registers.

Interestingly, although flashing colours on the BBC Micro are controlled by
toggling bit 0 of the &FE20 control register location for the Video ULA, the
actual colour inversion is done in hardware.

Enhancement: Palette Definition Lists
-------------------------------------

It can be useful to redefine the palette in order to change the colours
available for a particular region of the screen, particularly in modes where
the choice of colours is constrained, and if an increased colour depth were
available, palette redefinition would be useful to give the illusion of more
than 16 colours in MODE 2.
Traditionally, palette redefinition has been done
by using interrupt-driven timers, but a more efficient approach would involve
presenting lists of palette definitions to the ULA so that it can change the
palette at a particular display line.

One might define a palette redefinition list in a region of memory and then
communicate its contents to the ULA by writing the address and length of the
list, along with the display line at which the palette is to be changed, to
ULA registers such that the ULA buffers the list and performs the redefinition
at the appropriate time. Throughput/bandwidth considerations might impose
restrictions on the practical length of such a list, however.

Enhancement: Palette-Free Modes
-------------------------------

Palette-free modes might be defined where bit values directly correspond to
the red, green and blue channels, although this would mostly make sense only
for modes with depths greater than the standard 4 bits per pixel, and such
modes would require more memory than MODE 2 if they were to have an acceptable
resolution.

Enhancement: Display Suspend
----------------------------

Especially when writing to the screen memory, it could be beneficial to be
able to suspend the ULA's access to the memory, instead producing blank values
for all screen pixels until a program is ready to reveal the screen. This is
different from palette blanking since with a blank palette, the ULA is still
reading screen memory and translating its contents into pixel values that end
up being blank.

This function is reminiscent of a capability of the ZX81, albeit necessary on
that hardware to reduce the load on the system CPU which was responsible for
producing the video output.
By allowing display suspend on the Electron, the
performance benefit would be derived from giving the CPU full access to the
memory bandwidth.

The region blanking feature mentioned above could be implemented using this
enhancement instead of employing palette blanking for the affected lines of
the display.

Enhancement: Memory Filling
---------------------------

A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the
existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.

This capability could also support block filling, where the limits of the
filled memory would be defined by the position and size of a screen area,
although this would demand the provision of additional registers in the ULA to
retain the details of such areas and additional logic to control the fill
operation.

Enhancement: Region Filling
---------------------------

An alternative to memory writing might involve indicating regions using
additional registers or memory where the ULA fills regions of the screen with
content instead of reading from memory. Unlike hardware sprites which should
realistically provide varied content, region filling could employ single
colours or patterns, and one advantage of doing so would be that the ULA need
not access memory at all within a particular region.

Regions would be defined on a row-by-row basis.
Instead of reading memory and
blitting a direct representation to the screen, the ULA would read region
definitions containing a start column, region width and colour details. There
might be a certain number of definitions allowed per row, or the ULA might
just traverse an ordered list of such definitions with each one indicating the
row, start column, region width and colour details.

One could even compress this information further by requiring only the row,
start column and colour details with each subsequent definition terminating
the effect of the previous one. However, one would also need to consider the
convenience of preparing such definitions and whether efficient access to
definitions for a particular row might be desirable. It might also be
desirable to avoid having to prepare definitions for "empty" areas of the
screen, effectively making the definition of the screen contents employ
run-length encoding and employ only colour plus length information.

One application of region filling is that of simple 2D and 3D shape rendering.
Although it is entirely possible to plot such shapes to the screen and have
the ULA blit the memory contents to the screen, such operations consume
bandwidth both in the initial plotting and in the final transfer to the
screen. Region filling would reduce such bandwidth usage substantially.

This way of representing screen images would make certain kinds of images
unfeasible to represent - consider alternating single pixel values which could
easily occur in some character bitmaps - even if an internal queue of regions
were to be supported such that the ULA could read ahead and buffer such
"bandwidth intensive" areas. Thus, the ULA might be better served providing
this feature for certain areas of the display only as some kind of special
graphics window.
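
The run-length form of a row, and the reason alternating pixels defeat it, can
be illustrated with a minimal sketch. The function names and the use of
(colour, length) tuples are assumptions made for illustration and do not
describe any actual ULA data format:

```python
def encode_row(pixels):
    """Encode a row of colour values as a list of (colour, length) runs."""
    runs = []
    for colour in pixels:
        if runs and runs[-1][0] == colour:
            # Extend the current run when the colour is unchanged.
            runs[-1] = (colour, runs[-1][1] + 1)
        else:
            # Start a new run when the colour changes.
            runs.append((colour, 1))
    return runs


def decode_row(runs):
    """Expand (colour, length) runs back into a row of colour values."""
    pixels = []
    for colour, length in runs:
        pixels.extend([colour] * length)
    return pixels


# A 640 pixel row crossing one filled shape needs only three definitions.
shape_row = [0] * 200 + [3] * 240 + [0] * 200
print(len(encode_row(shape_row)))

# Alternating single pixels need one definition per pixel.
dither_row = [0, 1] * 320
print(len(encode_row(dither_row)))
```

In this sketch the shape row costs three definitions where a bitmapped row
would cost up to 80 bytes, whereas the alternating row costs 640 definitions,
which is the "bandwidth intensive" case described above.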

Enhancement: Hardware Sprites
-----------------------------

An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.

The specification of sprites could involve the reservation of 16 locations
(for example, &FE20-F) specifying a fixed number of eight sprites, with each
location pair referring to the sprite data. By limiting the ULA to dealing
with a fixed number of sprites, the work required inside the ULA would be
reduced since it would avoid having to deal with arbitrary numbers of sprites.

The principal limitation on providing hardware sprites is that of having to
obtain sprite data, given that the ULA is usually required to retrieve screen
data, and given the lack of memory bandwidth available to retrieve sprite data
(particularly from multiple sprites supposedly at the same position) and
screen data simultaneously. Although the ULA could potentially read sprite
data and screen data in alternate memory accesses in screen modes where the
bandwidth is not already fully utilised, this would result in a degradation of
performance.

Enhancement: Additional Screen Mode Configurations
--------------------------------------------------

Alternative screen mode configurations could be supported.
The ULA has to
produce 640 pixel values across the screen, with pixel doubling or quadrupling
employed to fill the screen width:

  Screen width      Columns     Scaling     Depth       Bytes
  ------------      -------     -------     -----       -----
  640               80          x1          1           80
  320               40          x2          1, 2        40, 80
  160               20          x4          2, 4        40, 80

It must also use at most 80 byte-sized memory accesses to provide the
information for the display. Given that characters must occupy an 8x8 pixel
array, if a configuration featuring anything other than 20, 40 or 80 character
columns is to be supported, compromises must be made such as the introduction
of blank pixels either between characters (such as occurs between rows in MODE
3 and 6) or at the end of a scanline (such as occurs at the end of the frame
in MODE 3 and 6). Consider the following configuration:

  Screen width      Columns     Scaling     Depth       Bytes       Blank
  ------------      -------     -------     -----       ------      -----
  208               26          x3          1, 2        26, 52      16

Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
colours could be provided, with 16 blank pixel values (out of a total of 640)
generated either at the start or end (or split between the start and end) of
each scanline.

Enhancement: Character Attributes
---------------------------------

The BBC Micro MODE 7 employs something resembling character attributes to
support teletext displays, but depends on circuitry providing a character
generator. The ZX Spectrum, on the other hand, provides character attributes
as a means of colouring bitmapped graphics.
Although such a feature is very
limiting as the sole means of providing multicolour graphics, in situations
where the choice is between low resolution multicolour graphics or high
resolution monochrome graphics, character attributes provide a potentially
useful compromise.

For each byte read, the ULA must deliver 8 pixel values (out of a total of
640) to the video output, doing so by either emptying its pixel buffer on a
pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               B
  Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7

And for a screen mode having 320 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               A
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

These additional bytes could provide colour information for the bitmapped data
in the following character column (of 8 pixels). Since it would be desirable
to apply attribute data to the first column, the initial 8 cycles might be
configured to not produce pixel values.

For an entire character, attribute data need only be read for the first row of
pixels for a character.
The subsequent rows would have attribute information
applied to them, although this would require the attribute data to be stored
in some kind of buffer. Thus, the following access pattern would be observed:

  Cycle:    A B ... _ B ... _ B ... _ B ... _ B ... _ B ... _ B ... _ B ...

A whole byte used for colour information for a whole character would result in
a choice of 256 colours, and this might be somewhat excessive. By only reading
attribute bytes at every other opportunity, a choice of 16 colours could be
applied individually to two characters.

  Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  Reads:    B               A               B               -
  Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7

Further reductions in attribute data access, offering 4 colours for every
character in a four character block, for example, might also be worth
considering.
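
The relationship between attribute sharing, the number of colours and the read
pattern in the diagrams above can be sketched as follows. This is only an
illustration for a 320 pixel wide mode on a scanline where attributes are
read; the function names and the (cycle, kind) representation are assumptions:

```python
CYCLES_PER_CHARACTER = 16  # 8 pixels, each held for two cycles (x2 scaling)


def colours_per_character(sharing):
    """Colours selectable per character when one attribute byte is shared
    equally between 'sharing' characters."""
    return 2 ** (8 // sharing)


def read_schedule(characters, sharing=1):
    """Return (cycle, kind) pairs for one scanline.

    'B' is a bitmap read, 'A' an attribute read and '-' an unused
    opportunity between two bitmap reads.
    """
    schedule = []
    for index in range(characters):
        start = index * CYCLES_PER_CHARACTER
        schedule.append((start, "B"))
        kind = "A" if index % sharing == 0 else "-"
        schedule.append((start + CYCLES_PER_CHARACTER // 2, kind))
    return schedule


# Two characters sharing one attribute byte, as in the last diagram.
print(read_schedule(2, sharing=2))
print(colours_per_character(2))
```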

Consider the following configurations for screen modes with a colour depth of
1 bit per pixel for bitmap information:

  Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  -------  ---------  ---------  -------  ------------
  320           40       x2       40         40         256      &5300
  320           40       x2       40         20         16       &5580 -> &5500
  320           40       x2       40         10         4        &56C0 -> &5600
  208           26       x3       26         26         256      &62C0 -> &6200
  208           26       x3       26         13         16       &6460 -> &6400

Enhancement: MODE 7 Emulation using Character Attributes
--------------------------------------------------------

If the scheme of applying attributes to character regions were employed to
emulate MODE 7, in conjunction with the MODE 6 display technique, the
following configuration would be required:

  Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &5ECC -> &5E00
  320           40       25    40         10         4        &5FC6 -> &5F00

Although this requires much more memory than MODE 7 (8500 bytes versus MODE
7's 1000 bytes), it does not need much more memory than MODE 6, and it would
at least make a limited 40-column multicolour mode available as a substitute
for MODE 7.
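
The screen start figures in the two tables above can be reproduced with a
short calculation. The sketch assumes that screen memory ends at &8000, that
attribute bytes are stored once per character row, and that the second
address in each "->" pair is the start rounded down to a 256-byte page; the
function name is hypothetical:

```python
RAM_TOP = 0x8000  # assumption: screen memory ends at the top of the 32K RAM


def screen_start(columns, char_rows, attr_bytes_per_row, lines_per_char=8):
    """Return (exact, page_aligned) screen start addresses.

    Bitmap data needs 'columns' bytes for every pixel line of every character
    row; attribute data needs 'attr_bytes_per_row' bytes per character row.
    """
    bitmap = columns * lines_per_char * char_rows
    attributes = attr_bytes_per_row * char_rows
    exact = RAM_TOP - (bitmap + attributes)
    return exact, exact & 0xFF00


# 40 columns, 32 character rows, one attribute byte per two characters.
exact, aligned = screen_start(40, 32, 20)
print("&%04X -> &%04X" % (exact, aligned))

# MODE 7 emulation: 40 columns, 25 character rows, 16 colours.
exact, aligned = screen_start(40, 25, 20)
print("&%04X -> &%04X" % (exact, aligned))
```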

Enhancement: High Resolution Graphics and Mode Layouts
------------------------------------------------------

Screen modes with different screen memory mappings, higher resolutions and
larger colour depths might be possible, but this would in most cases involve
the allocation of more screen memory, and the ULA would probably then be
obliged to page in such memory for the CPU to be able to sensibly access it
all. Merely changing the memory mappings in order to have Archimedes-style
row-oriented screen addresses (instead of character-oriented addresses) could
be done for the existing modes, but this might not be sufficiently beneficial,
especially since accessing regions of the screen would involve incrementing
pointers by amounts that are inconvenient on an 8-bit CPU.

Enhancement: Genlock Support
----------------------------

The ULA generates a video signal in conjunction with circuitry producing the
output features necessary for the correct display of the screen image.
However, it appears that the ULA drives the video synchronisation mechanism
instead of reacting to an existing signal. Genlock support might be possible
if the ULA were made to be responsive to such external signals, resetting its
address generators upon receiving synchronisation events.

Enhancement: Improved Sound
---------------------------

The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro ULA employs &FE40-&FE4F for sound control,
and an enhanced ULA could adopt this interface.

The BBC Micro uses the SN76489 chip to produce sound, and the entire
functionality of this chip could be emulated for enhanced sound, with a subset
of the functionality exposed via the &FE*6 interface.

See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489

Enhancement: Waveform Upload
----------------------------

As with a hardware sprite function, waveforms could be uploaded or referenced
using locations as registers referencing memory regions.

Enhancement: Sound Input/Output
-------------------------------

Since the ULA already controls audio input/output for cassette-based data, it
would have been interesting to entertain the idea of sampling and output of
sounds through the cassette interface. However, a significant amount of
circuitry is employed to process the input signal for use by the ULA and to
process the output signal for recording.

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11

Enhancement: BBC ULA Compatibility
----------------------------------

Although some new ULA functions could be defined in a way that is also
compatible with the BBC Micro, the BBC ULA is itself incompatible with the
Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
map, but controls various functions specific to the 6845 video controller;
&FE08-F is reserved for the serial controller. It therefore becomes possible
to disregard compatibility where compatibility is already disregarded for a
particular area of functionality.

&FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE07-F on the
Electron) and other system-specific functions.
Since the location usage is
generally incompatible, this region could be reused for other purposes.

Enhancement: Increased RAM, ULA and CPU Performance
---------------------------------------------------

More modern implementations of the hardware might feature faster RAM coupled
with an increased ULA clock frequency in order to increase the bandwidth
available to the ULA and to the CPU in situations where the ULA is not needed
to perform work. A ULA employing a 32MHz clock would be able to complete the
retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
to access the RAM for the following 250ns even in display modes requiring the
retrieval of a byte for the display every 500ns. The CPU could, subject to
timing issues, run at 2MHz even in MODE 0, 1 and 2.

A scheme such as that described above would have a similar effect to the
scheme employed in the BBC Micro, although the latter made use of RAM with a
wider bandwidth in order to complete memory transfers within 250ns and thus
permit the CPU to run continuously at 2MHz.

Higher bandwidth could potentially be used to implement exotic features such
as RAM-resident hardware sprites or indeed any feature demanding RAM access
concurrent with the production of the display image.

Enhancement: Multiple CPU Stacks
--------------------------------

The 6502 maintains a stack for subroutine calls and register storage in page
&01.
Although the stack register can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would
be able to coexist without overwriting each other's stacks.

One way that this issue could be alleviated would involve the provision of a
facility to redirect accesses to page &01 to other areas of memory. The ULA
would provide a register that defines a physical page for the use of the CPU's
"logical" page &01, and upon any access to page &01 by the CPU, the ULA would
change the asserted address lines to redirect the access to the appropriate
physical region.

By providing an 8-bit register, mapping to the most significant byte (MSB) of
a 16-bit address, the ULA could then replace any MSB equal to &01 with the
register value before the access is made. Where multiple programs coexist,
upon switching programs, the register would be updated to point the ULA to the
appropriate stack location, thus providing a simple memory management unit
(MMU) capability.

ULA Pin Functions
-----------------

The functions of the ULA pins are described in the Electron Service Manual.
Of interest to video processing are the following:

  CSYNC (low during horizontal or vertical synchronisation periods, high
         otherwise)

  HS (low during horizontal synchronisation periods, high otherwise)

  RED, GREEN, BLUE (pixel colour outputs)

  CLOCK IN (a 16MHz clock input, 4V peak to peak)

  PHI OUT (a 1MHz, 2MHz and stopped clock signal for the CPU)

More general memory access pins:

  RAM0...RAM3 (data lines to/from the RAM)

  RA0...RA7 (address lines for sending both row and column addresses to the RAM)

  RAS (row address strobe setting the row address on a negative edge - see the
       timing notes)

  CAS (column address strobe setting the column address on a negative edge -
       see the timing notes)

  WE (sets write enable with logic 0, read with logic 1)

  ROM (select data access from ROM)

CPU-oriented memory access pins:

  A0...A15 (CPU address lines)

  PD0...PD7 (CPU data lines)

  R/W (indicates CPU write with logic 0, CPU read with logic 1)

Interrupt-related pins:

  NMI (CPU request for uninterrupted 1MHz access to memory)

  IRQ (signal event to CPU)

  POR (power-on reset, resetting the ULA on a positive edge and asserting the
       CPU's RST pin)

  RST (master reset for the CPU signalled on power-up and by the Break key)

Keyboard-related pins:

  KBD0...KBD3 (keyboard inputs)

  CAPS LOCK (control status LED)

Sound-related pins:

  SOUND O/P (sound output using internal oscillator)

Cassette-related pins:

  CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)

  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)

  CAS RC (detect high tone)

  CAS MO (motor relay output)

  ÷13 IN (~1200 baud clock input)

ULA Socket
----------

The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.

References
----------

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm

About this Document
-------------------

The most recent version of this document and accompanying distribution should
be available from the following location:

http://hgweb.boddie.org.uk/ULA

Copyright and licence information can be found in the docs directory of this
distribution - see docs/COPYING.txt for more information.