125:0fa3ae6ff36b
2 months ago Paul Boddie Added a note about the CPU performance limitations of suspending ULA RAM access.
The Acorn Electron ULA
======================

Principal Design and Feature Constraints
----------------------------------------

The features of the ULA are limited in sophistication by the amount of time
and resources that can be allocated to each activity supporting the
fundamental features and obligations of the unit. Maintaining a screen display
based on the contents of RAM itself requires the ULA to have exclusive access
to various hardware resources for a significant period of time.

Whilst other elements of the ULA can in principle run in parallel with the
display refresh activity, they cannot also access the RAM at the same time.
Consequently, other features that might use the RAM must accept a reduced
allocation of that resource in comparison to a hypothetical architecture where
concurrent RAM access is possible at all times.

Thus, the principal constraint for many features is bandwidth. The duration of
access to hardware resources is one aspect of this; the rate at which such
resources can be accessed is another. For example, the RAM is not fast enough
to support access more frequently than one byte per 2MHz cycle, and for screen
modes involving 80 bytes of screen data per scanline, there are no free cycles
for anything other than the production of pixel output during the active
scanline periods.

Another constraint is imposed by the method of RAM access provided by the ULA.
The ULA is able to access RAM by fetching 4 bits at a time and thus managing
to transfer 8 bits within a single 2MHz cycle, this being sufficient to
provide display data for the most demanding screen modes. However, this
mechanism's timing requirements are beyond the capabilities of the CPU when
running at 2MHz.

Consequently, the CPU will only ever be able to access RAM via the ULA at
1MHz, even when the ULA is not accessing the RAM. Fortunately, when needing to
refresh the display, the ULA is still able to make use of the idle part of
each 1MHz cycle (or, rather, the idle 2MHz cycle unused by the CPU) to itself
access the RAM at a rate of 1 byte per 1MHz cycle (or 1 byte every other 2MHz
cycle), thus supporting the less demanding screen modes.

Timing
------

According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
of which are used to generate pixel data. At 50Hz, this means that 128 cycles
are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
312 ~= 128 cycles). This is consistent with the observation that each scanline
requires at most 80 bytes of data, and that the ULA is apparently busy for 40
out of 64 microseconds in each scanline.

(Since the ULA is seeking to provide an image for an interlaced 625-line
display, there are in fact two "fields" involved, one providing 312
scanlines and one providing 313 scanlines. See below for a description of the
video system.)

Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
each providing two bits of each byte) using two cycles within the 500ns period
of the 2MHz clock to complete each access operation. Since the CPU and ULA
have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
effectively run at 1MHz (since every other 500ns period involves the ULA
accessing RAM) during transfers of screen data.

The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
by the ULA (IC1) depending on the screen mode in use.  Each 16MHz cycle lasts
62.5ns.
To access the memory, the following patterns
corresponding to 16MHz cycles are required:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:           F     S         F     S ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a b     c       a b     c     ...
      Data ops:  s         f     s         f      ...

           ~WE:  ......W                          ...
       PHI OUT:  \_______________/--------------- ...
     CPU (RAM):  L               D                ...
           RnW:  R                                ...

       PHI OUT:  \_______/-------\_______/------- ...
     CPU (ROM):  L       D       L       D        ...
           RnW:          R               R        ...

~RAS must be high for 100ns, ~CAS must be high for 50ns.
~RAS must be low for 150ns, ~CAS must be low for 90ns.
Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.

Here, "A" and "B" respectively indicate the row and first column addresses
being latched into the RAM (on a negative edge for ~RAS and ~CAS
respectively), and "C" indicates the second column address being latched into
the RAM. Presumably, the first and second half-bytes can be read at "F" and
"S" respectively, and the row and column addresses must be made available at
"a" and "b" (and "c") respectively at the latest.
Data can be read at "f" and
"s" for the first and second half-bytes respectively.

For the CPU, "L" indicates the point at which an address is taken from the CPU
address bus, on a negative edge of PHI OUT, with "D" being the point at which
data may either be read or be asserted for writing, on a positive edge of PHI
OUT. Here, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven low
for writing or high for reading, and thus propagates RnW from the CPU, this
would need to be done before data would be retrieved and, according to the
TM4164EC4 datasheet, even as late as the column address is presented and ~CAS
brought low.

The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that ~RAS must be
held low for at least 150ns and that ~CAS must be held low for at least 90ns
before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
is 1.5 cycles.

Note that the Service Manual refers to the negative edge of RAS and CAS, but
the datasheet for the similar TM4164EC4 product shows latching on the negative
edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
"page mode" provides the appropriate behaviour for that particular product.

The CPU, when accessing the RAM alone, apparently does not make use of the
vacated "slot" that the ULA would otherwise use (when interleaving accesses in
MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
accessing ROM (and potentially sideways RAM).
The principal limitation is the
amount of time needed between issuing an address and receiving an entire byte
from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
4 cycles that would be required for 2MHz operation.

See: Acorn Electron Advanced User Guide
See: Acorn Electron Service Manual
     http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
See: One of the Most Popular 65,536-Bit (64K) Dynamic RAMs The TMS 4164
     http://smithsonianchips.si.edu/augarten/p64.htm

A Note on 8-Bit Wide RAM Access
-------------------------------

It is worth considering the timing when 8 bits of data can be obtained at once
from the RAM chips:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
   8 MHz cycle:  0   1   2   3   0   1   2   3    ...
                 /-\_/-\_/-\_/-\_/-\_/-\_/-\_/-\_ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-------\_______/-------\_______ ...
Address events:      A   B           A   B        ...
   Data events:             E               E     ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1       0       1       0        ...

   Address ops:     a   b           a   b         ...
      Data ops:            f     s         f      ...

           ~WE:  ........W                        ...
       PHI OUT:  \_______/-------\_______/------- ...
           CPU:  L       D       L       D        ...
           RnW:          R               R        ...

Here, "E" indicates the availability of an entire byte.
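
As a rough aid when comparing the datasheet access times against the two
candidate clocks, the following sketch expresses the 150ns and 90ns figures
quoted earlier in cycles of the 16MHz and 8MHz clocks. The helper names are
hypothetical, and rounding up to the next half cycle is an assumption about
the granularity at which the signals can change, not a documented property of
the ULA.

```python
from fractions import Fraction
from math import ceil


def in_cycles(duration_ns, clock_mhz):
    """Express a duration in nanoseconds as a number of clock cycles."""
    return Fraction(duration_ns * clock_mhz, 1000)


def round_up_to_half_cycle(cycles):
    """Round up to the next half cycle (assumed signalling granularity)."""
    return Fraction(ceil(cycles * 2), 2)


ROW_ACCESS_NS = 150     # ~RAS low to data available (TM4164EC4-15 maximum)
COLUMN_ACCESS_NS = 90   # ~CAS low to data available (TM4164EC4-15 maximum)

for clock_mhz in (16, 8):
    row = in_cycles(ROW_ACCESS_NS, clock_mhz)
    column = in_cycles(COLUMN_ACCESS_NS, clock_mhz)
    print(clock_mhz, "MHz:",
          "row", float(row), "->", float(round_up_to_half_cycle(row)),
          "column", float(column), "->", float(round_up_to_half_cycle(column)))
```

At 16MHz this reproduces the 2.4 and 1.44 cycle figures and their rounded
values of 2.5 and 1.5 cycles given in the Timing section.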

Since only one fetch is required per 2MHz cycle, instead of two fetches for
the 4-bit wide RAM arrangement, it seems likely that longer 8MHz cycles could
be used to coordinate the necessary signalling.

Another conceivable simplification from using an 8-bit wide RAM access channel
with a single access within each 2MHz cycle is the possibility of allowing the
CPU to signal directly to the RAM instead of having the ULA perform the access
signalling on the CPU's behalf. Note that it is this more leisurely signalling
that would allow the CPU to conduct accesses at 2MHz: the "compressed"
signalling being beyond the capabilities of the CPU.

Note that 16MHz cycles would still be needed for the pixel clock in MODE 0,
which needs to output eight pixels per 2MHz cycle, producing 640 monochrome
pixels per 80-byte line.

An obvious consideration with regard to 8-bit wide access is whether the ULA
could still conduct the "compressed" signalling for its own RAM accesses:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:           1     2         1     2 ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a b     c       a b     c     ...
      Data ops:  s         f     s         f      ...

           ~WE:  ......W                          ...
       PHI OUT:  \_______/-------\_______/------- ...
           CPU:  L       D       L       D        ...
           RnW:          R               R        ...

Here, "1" and "2" in the data events correspond to whole byte accesses,
effectively upgrading the half-byte "F" and "S" events in the existing ULA
arrangement.

Although the provision of access for the CPU would adhere to the relevant
timing constraints, providing only one byte per 2MHz cycle, the ULA could
obtain two bytes per cycle. This would then free up bandwidth for the CPU in
screen modes where the ULA would normally be dominant (MODE 0 to 3), albeit at
the cost of extra buffering. Such buffering could also be done for modes where
the bandwidth is shared (MODE 4 to 6), consolidating pairs of ULA accesses into
single cycles and freeing up an extra cycle for CPU accesses.

CPU Clock Notes
---------------

"The 6502 receives an external square-wave clock input signal on pin 37, which
is usually labeled PHI0. [...] This clock input is processed within the 6502
to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
through two inverters and a push-pull amplifier. The same network of
transistors within the 6502 which generates PHI2 is also tied to PHI1, and
generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
available to external devices is so that they know when they can access the
CPU. When PHI1 is high, this means that external devices can read from the
address bus or data bus; when PHI2 is high, this means that external devices
can write to the data bus."

See: http://lateblt.livejournal.com/88105.html

"The 6502 has a synchronous memory bus where the master clock is divided into
two phases (Phase 1 and Phase 2).
The address is always generated during Phase
1 and all memory accesses take place during Phase 2."

See: http://www.jmargolin.com/vgens/vgens.htm

Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
Phase 1" means when PHI0 - really PHI2 - is high and "during Phase 2" means
when PHI1 is high.

Bandwidth Figures
-----------------

Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
total lines, with 80 cycles occurring in the active periods of display
scanlines, the following bandwidth calculations can be performed:

Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:    80 cycles * 256 lines
     = 20480 bytes
CPU:    48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:    80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:    48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:    40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:    40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes

Here, the division by 2 for CPU accesses is performed to indicate that the CPU
only uses every other access opportunity even in uncontended periods. See the
2MHz RAM Access enhancement below for bandwidth calculations that consider
this limitation removed.
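
The calculations above can be reproduced with a short sketch. The function
and constant names are illustrative only; the figures themselves (128 cycles
per scanline, 312 lines, 80 active cycles, and the CPU using every other
cycle in uncontended periods) are those stated in this section.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
TOTAL_LINES = 312       # scanlines per field
ACTIVE_CYCLES = 80      # 2MHz cycles in the active part of a display line

TOTAL = CYCLES_PER_LINE * TOTAL_LINES   # theoretical maximum in bytes


def bandwidth(display_lines, ula_cycles):
    """Return (ula_bytes, cpu_bytes) per field.

    display_lines: scanlines on which the ULA fetches screen data
    ula_cycles:    2MHz cycles per display line used by the ULA (80 or 40)
    """
    ula = ula_cycles * display_lines
    # Active cycles left over by the ULA (interleaved accesses in MODE 4 to 6).
    shared = ACTIVE_CYCLES - ula_cycles
    # Elsewhere the CPU only uses every other access opportunity.
    border = (CYCLES_PER_LINE - ACTIVE_CYCLES) // 2
    blank = CYCLES_PER_LINE // 2
    cpu = ((shared + border) * display_lines
           + blank * (TOTAL_LINES - display_lines))
    return ula, cpu


MODES = {
    "MODE 0, 1, 2": (256, 80),
    "MODE 3": (24 * 8, 80),
    "MODE 4, 5": (256, 40),
    "MODE 6": (24 * 8, 40),
}

for name, (lines, cycles) in MODES.items():
    ula, cpu = bandwidth(lines, cycles)
    print(name, ula, cpu, round(100 * cpu / TOTAL), round(TOTAL / cpu, 2))
```

The last two printed values are the CPU share of the total as a percentage
and the slowdown relative to uncontended 2MHz access.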

A summary of the bandwidth figures is as follows (with extra timing details
described below):

                Standard ULA    % Total   Slowdown  BBC-10s BBC-34s
MODE 0, 1, 2    9728 bytes      24%       4.11      43s     105s
MODE 3          12288 bytes     31%       3.25      34s
MODE 4, 5       19968 bytes     50%       2         20s
MODE 6          19968 bytes     50%       2         20s     50s

The review of the Electron in Practical Computing (October 1983) provides a
concise overview of the RAM access limitations and gives timing comparisons
between modes and BBC Micro performance. In the above, "BBC-10s" is the
measured or stated time given for a program taking 10 seconds on the BBC
Micro, whereas "BBC-34s" is the apparently measured time given for the
"Persian" program taking 34 seconds to complete on the BBC Micro, with a
"quick" mode presumably switching to MODE 6 using the ULA directly in order to
reduce display bandwidth usage while the program draws to the screen.
Evidently, the measured slowdown is slightly lower than the theoretical
slowdown, most likely due to the running time not being entirely dominated by
RAM access performance characteristics.

Video Timing
------------

According to 8.7 in the Service Manual, and the PAL Wikipedia page,
approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
(including the "colour burst"), and 1.65µs for the "front porch", totalling
12.05µs and thus leaving 51.95µs for the active video signal for each
scanline. As the Service Manual suggests in the oscilloscope traces, the
display information is transmitted more or less centred within the active
video period since the ULA will only be providing pixel data for 40µs in each
scanline.
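
The arithmetic behind these figures can be sketched as follows. The variable
names are illustrative, and the sketch assumes that the 40µs of pixel data is
exactly centred within the active video period, which the oscilloscope traces
only suggest approximately.

```python
from fractions import Fraction

# Durations in microseconds, as quoted above.
LINE_US = Fraction(64)
SYNC_US = Fraction("4.7")
BACK_PORCH_US = Fraction("5.7")
FRONT_PORCH_US = Fraction("1.65")
PIXEL_US = Fraction(40)
CYCLE_US = Fraction("0.0625")   # one 16MHz cycle (62.5ns)

blanking = SYNC_US + BACK_PORCH_US + FRONT_PORCH_US
active = LINE_US - blanking

# Assumption: equal silent periods before and after the pixel data.
margin = (active - PIXEL_US) / 2

# Time from the start of hsync to the first pixel, and the same in cycles.
pixel_start = SYNC_US + BACK_PORCH_US + margin
start_cycle = pixel_start / CYCLE_US

print(float(blanking), float(active), float(margin))
print(float(pixel_start), float(start_cycle))
```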

Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
each scanline can be divided into 1024 cycles, although only 640 at most are
actively used to provide pixel data. Pixel data production should only occur
within a certain period on each scanline, approximately 262 cycles after the
start of hsync:

  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262

By choosing a number divisible by 8, the RAM access mechanism can be
synchronised with the pixel production. Thus, 256 is a more appropriate start
cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
document) occurs at cycle 0.

To summarise:

  HS signal starts at cycle 0 on each horizontal scanline
  HS signal ends approximately 4µs later at cycle 64
  Pixel data starts approximately 12µs later at cycle 256

"Re: Electron Memory Contention" provides measurements that appear consistent
with these calculations.

The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
lines.
Thus, the first visible scanline on the first field of a frame occurs
half way through the 23rd scanline period measured from the start of vsync
(indicated by "V" in the diagrams below):

                                        10                  20    23
  Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    Line from 1:       0                                          22 3
 Line on screen: .:::::VVVVV:::::                                   12233445566
                  |_________________________________________________|
                           25 line vertical blanking period

In the second field of a frame, the first visible scanline coincides with the
24th scanline period measured from the start of line 313 in the frame:

               310                                                 336
  Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
  Line from 313:       0                                            23 4
 Line on screen: 88:::::VVVVV::::                                    11223344
               288 |                                                 |
                   |_________________________________________________|
                            25 line vertical blanking period

In order to consider only full lines, we might consider the start of each
frame to occur 23 lines after the start of vsync.

Again, it is likely that pixel data production should only occur on scanlines
within a certain period on each frame. The "625/50" document indicates that
only a certain region is "safe" to use, suggesting a vertically centred region
with approximately 15 blank lines above and below the picture. However, the
"PAL TV timing and voltages" document suggests 28 blank lines above and below
the picture.
This would centre the 256 lines within the 312 lines of each
field and thus provide a start of picture approximately 5.5 or 5 lines after
the end of the blanking period or 28 or 27.5 lines after the start of vsync.

To summarise:

  CSYNC signal starts at cycle 0
  CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
  Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672

See: http://en.wikipedia.org/wiki/PAL
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
     http://lipas.uwasa.fi/~f76998/video/modes/
See: PAL TV timing and voltages
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
See: Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
See: Re: Electron Memory Contention
     http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109

RAM Integrated Circuits
-----------------------

Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.

The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM41464 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.

The TM4164EC4 series combines 4 64K x 1b units into a single package and
appears similar to the TM4164EA4 featured on the Electron's circuit diagram
(in the Advanced User Guide but not the Service Manual), and it also has 22
pins providing 3 additional inputs and 3 additional outputs over the 16 pins
of the individual 4164-15 modules, presumably allowing concurrent access to
the packaged memory units.

As far as currently available replacements are concerned, the NTE4164 is a
potential candidate: according to the Vetco Electronics entry, it is
supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
parts include the NTE2164 and the NTE6664, both of which appear to have
largely the same performance and connection characteristics. Meanwhile, the
NTE21256 appears to be a 16-pin replacement with four times the capacity that
maintains the single data input and output pins. Using the NTE21256 as a
replacement for all ICs combined would be difficult because of the single bit
output.

Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these are
different and actually appear to be provided under the 41574 product code (but
are listed under 41464-10) and appear to be replacements for the TM4164EC4:
the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
employing 4 pins for both input and output.

            Pins    I/O pins    Row access  Column access
            ----    --------    ----------  -------------
TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
KM41464AP   18      4           150ns (15)  75ns (15)
NTE21256    16      1 + 1       150ns       75ns
HYB 4164-2  16      1 + 1       150ns       100ns
µPD41464    18      4           120ns (12)  60ns (12)

See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
     http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
See: Dynamic RAMS
     http://www.unicornelectronics.com/IC/DYNAMIC.html
See: New old stock 8x 4164 chips
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
     http://www.vetco.net/catalog/product_info.php?products_id=2806
See: NTE4164 - IC-NMOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=3680
See: NTE21256 - IC-256K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=2799
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
See: NTE6664 - IC-MOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=5213
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
See: 4164-150: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
See: 41464-10: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1

Interrupts
----------

The ULA generates IRQs (maskable interrupts) according to certain conditions
and these conditions are controlled by location &FE00:

  * Vertical sync (bottom of displayed screen)
  * 50Hz real time clock
  * Transmit data empty
  * Receive data full
  * High tone detect

The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory, restoring
the normal function of the ULA.

ROM Paging
----------

Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
mappings exist:

   8    keyboard
   9    keyboard (duplicate)
  10    BASIC ROM
  11    BASIC ROM (duplicate)

Paging in a ROM involves the following procedure:

 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
    2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
    selected.
 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
    whilst writing the desired ROM number n in bits 0 to 2.

See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686

Keyboard Access
---------------

The keyboard pages appear to be accessed at 1MHz just like the RAM.

See: https://stardot.org.uk/forums/viewtopic.php?p=254155#p254155

Shadow/Expanded Memory
----------------------

The Electron exposes all sixteen address lines and all eight data lines
through the expansion bus. Using such lines, it is possible to provide
additional memory - typically sideways ROM and RAM - on expansion cards and
through cartridges, although the official cartridge specification provides
fewer address lines and only seeks to provide access to memory in 16K units.

Various modifications and upgrades were developed to offer "turbo"
capabilities to the Electron, permitting the CPU to access a separate 8K of
RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
the ULA through additional logic. However, an enhanced ULA might support
independent CPU access to memory over the expansion bus by allowing itself to
be discharged from providing access to memory, potentially for a range of
addresses, and for the CPU to communicate with external memory uninterrupted.

Sideways RAM/ROM and Upper Memory Access
----------------------------------------

Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above - the upper region of memory -
at 2MHz independently of any access to RAM that the ULA might be performing,
only blocking the CPU if it attempts to access addresses of &7FFF and below
during any ULA memory access - the lower region of memory - by stopping or
stalling its clock.

Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
CPU clock if the line goes low, when the CPU is attempting to access the lower
region of memory.
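
The decision described above can be modelled with a small sketch. This is a
hypothetical simplification rather than a description of the ULA's actual
logic: the function name and return convention are invented here, and the
model ignores the keyboard pages, which appear to be accessed at 1MHz despite
residing in the upper region.

```python
def cpu_clock_mhz(address, ula_using_ram):
    """Return the effective CPU clock rate for one access (hypothetical model).

    2 means an uncontended 2MHz access, 1 means a 1MHz RAM access via the
    ULA, and 0 means the CPU clock is stalled while the ULA uses the RAM.
    """
    a15 = (address >> 15) & 1
    if a15:
        # Upper region (&8000 and above): independent of ULA RAM activity.
        return 2
    if ula_using_ram:
        # Lower region while the ULA holds the RAM: the clock is inhibited.
        return 0
    # Lower region, RAM free: the CPU still only accesses RAM at 1MHz.
    return 1


print(cpu_clock_mhz(0x8000, True))    # upper region, ULA busy
print(cpu_clock_mhz(0x7FFF, True))    # lower region, ULA busy
print(cpu_clock_mhz(0x3000, False))   # lower region, ULA idle
```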

Hardware Scrolling (and Enhancement)
------------------------------------

On the standard ULA, &FE02 and &FE03 map to an address with 9 significant
bits, with the least significant 5 bits being zero, thus limiting the scrolling
resolution to 64 bytes. An enhanced ULA could support a resolution of 2 bytes
using the same layout of these addresses.

|--&FE02--------------| |--&FE03--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

   XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX

Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change in 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).

One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
of changing the screen address by 2 bytes is the change in the number of lines
from the initial and final character rows that need reading by the ULA, which
would need to maintain this state information (although this is a relatively
trivial change). Another pitfall is the complication that might be introduced
to software writing bitmaps of character height to the screen.
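
The effect of the two register layouts on scrolling resolution can be
illustrated with a sketch that treats the register pair as a single 16-bit
value in the left-to-right bit order of the diagram above. The masks are a
reading of that diagram (address bit 14 in register bit 13, and so on down),
and the function names are hypothetical.

```python
STANDARD_MASK = 0x3FE0   # address bits 14..6 in register bits 13..5
ENHANCED_MASK = 0x3FFF   # address bits 14..1 in register bits 13..0


def register_value(address, mask=STANDARD_MASK):
    """Return the 16-bit register pair value for a screen start address."""
    return (address >> 1) & mask


def effective_address(value, mask=STANDARD_MASK):
    """Return the screen start address that a register value selects."""
    return (value & mask) << 1


# Standard layout: addresses are rounded down to a multiple of 64 bytes.
print(hex(effective_address(register_value(0x3000 + 63))))
print(hex(effective_address(register_value(0x3000 + 64))))

# Enhanced layout: addresses are rounded down to a multiple of 2 bytes.
print(hex(effective_address(register_value(0x3003, ENHANCED_MASK),
                            ENHANCED_MASK)))
```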

See: http://pastraiser.com/computers/acornelectron/acornelectron.html

Enhancement: Mode Layouts
-------------------------

Merely changing the screen memory mappings in order to have Archimedes-style
row-oriented screen addresses (instead of character-oriented addresses) could
be done for the existing modes, but this might not be sufficiently beneficial,
especially since accessing regions of the screen would involve incrementing
pointers by amounts that are inconvenient on an 8-bit CPU.

However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
reference the vertical screen location below the currently-referenced location
(just as occurs within characters using the existing ULA); instead of
returning to the top of the character row and referencing the next horizontal
location after eight bytes, the address would reference the next character row
and continue to reference locations downwards over the height of the screen
until reaching the bottom; at the bottom, the next location would be the next
horizontal location at the top of the screen.

In other words, the memory layout for the screen would resemble the following
(for MODE 2):

  &3000 &3100       ... &7F00
  &3001 &3101
  ...   ...
  &3007
  &3008
  ...
  ...                   ...
  &30FF             ... &7FFF

Since there are 256 pixel rows, each column of locations would be addressable
using the low byte of the address. Meanwhile, the high byte would be
incremented to address different columns. Thus, addressing screen locations
would become a lot more convenient and potentially much more efficient for
certain kinds of graphical output.
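
The difference between the two layouts can be sketched in Python as follows,
using MODE 2 with its screen starting at &3000 and 80 byte columns as in the
diagram above. The function names are illustrative only, and the
character-oriented calculation is included purely for comparison.

```python
SCREEN_BASE = 0x3000  # MODE 2 screen start, as in the layout above
ROWS = 256            # pixel rows, selected by the low byte of the address
COLUMNS = 80          # byte columns across the screen in MODE 2


def column_oriented_address(column, row):
    """Proposed layout: the high byte selects the column, the low byte the row."""
    if not (0 <= column < COLUMNS and 0 <= row < ROWS):
        raise ValueError("location off screen")
    return SCREEN_BASE + (column << 8) + row


def character_oriented_address(column, row):
    """Existing layout: 8 byte character cells, 80 cells per character row."""
    if not (0 <= column < COLUMNS and 0 <= row < ROWS):
        raise ValueError("location off screen")
    return SCREEN_BASE + (row // 8) * (COLUMNS * 8) + column * 8 + (row % 8)


if __name__ == "__main__":
    print(hex(column_oriented_address(1, 7)))     # one column across, 7 rows down
    print(hex(character_oriented_address(1, 7)))
```

In the column-oriented case the calculation needs no multiplication or
division, which is the convenience referred to above.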

One potential complication with this simplified addressing scheme arises with
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
with the existing ULA) would be achieved by incrementing or decrementing the
screen start address; by one character row, it would involve adding or
subtracting 8. However, the ULA only supports multiples of 64 when changing the
screen start address. Thus, if such a scheme were to be adopted, three
additional bits would need to be supported in the screen start register to
scroll by character rows, and six to scroll by single pixel rows (see
"Hardware Scrolling (and Enhancement)" for more details). However, horizontal
scrolling would be much improved even under the severe constraints of the
existing ULA: only adjustments of 256 to the screen start address would be
required to produce single-location scrolling of as few as two pixels in MODE 2
(four pixels in MODEs 1 and 5, eight pixels otherwise).

More disruptive is the effect of this alternative layout on software.
Presumably, compatibility with the BBC Micro was the primary goal of the
Electron's hardware design. With the character-oriented screen layout in
place, system software (and application software accessing the screen
directly) would be relying on this layout to run on the Electron with little
or no modification. Although it might have been possible to change the system
software to use this column-oriented layout instead, this would have incurred
a development cost and caused additional work porting things like games to the
Electron. Moreover, a separate branch of the software from that supporting the
BBC Micro and closer derivatives would then have needed maintaining.

The decision to use the character-oriented layout in the BBC Micro may have
been related to the choice of circuitry and intended to facilitate a
convenient hardware implementation, and by the time the Electron was planned,
it was too late to do anything about this somewhat unfortunate choice.

Pixel Layouts
-------------

The pixel layouts are as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               73 62 51 40
  2             4               7531 6420

Since the ULA reads a half-byte at a time, one might expect it to attempt to
produce pixels for every half-byte, as opposed to handling entire bytes.
However, the pixel layout is not conducive to producing pixels as soon as a
half-byte has been read for a given full-byte location: in 1bpp modes the
first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
data is spread across the entire byte in different ways.

An alternative arrangement might be as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               76 54 32 10
  2             4               7654 3210

Just as the mode layouts were presumably decided by compatibility with the BBC
Micro, the pixel layouts will have been maintained for similar reasons.
Unfortunately, the existing layout prevents any optimisation of the ULA for
handling half-byte pixel data generally.
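
To make the two arrangements concrete, the following Python sketch decodes a
screen byte into pixel values using either set of bit assignments, and counts
how many pixels would be complete once only the upper half-byte is known. The
table and function names are invented for this example, and treating the upper
half-byte as the one that arrives first is an assumption.

```python
# Bit positions (most significant first) making up each pixel, left to right.
EXISTING_LAYOUTS = {
    1: [(7,), (6,), (5,), (4,), (3,), (2,), (1,), (0,)],
    2: [(7, 3), (6, 2), (5, 1), (4, 0)],
    4: [(7, 5, 3, 1), (6, 4, 2, 0)],
}

ALTERNATIVE_LAYOUTS = {
    1: [(7,), (6,), (5,), (4,), (3,), (2,), (1,), (0,)],
    2: [(7, 6), (5, 4), (3, 2), (1, 0)],
    4: [(7, 6, 5, 4), (3, 2, 1, 0)],
}


def decode_pixels(byte, layout):
    """Return the logical colour of each pixel encoded in a screen byte."""
    pixels = []
    for bits in layout:
        value = 0
        for bit in bits:
            value = (value << 1) | ((byte >> bit) & 1)
        pixels.append(value)
    return pixels


def pixels_ready_after_upper_half(layout):
    """Count pixels whose bits all lie in the upper half-byte (bits 7 to 4)."""
    return sum(1 for bits in layout if min(bits) >= 4)


if __name__ == "__main__":
    for depth in (1, 2, 4):
        print(depth,
              pixels_ready_after_upper_half(EXISTING_LAYOUTS[depth]),
              pixels_ready_after_upper_half(ALTERNATIVE_LAYOUTS[depth]))
```

Under these assumptions, the existing 2bpp and 4bpp layouts yield no complete
pixels from a single half-byte, whereas the alternative arrangement yields two
and one respectively.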

Enhancement: The Missing MODE 4
-------------------------------

The Electron inherits its screen mode selection from the BBC Micro, where MODE
3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
however, and they are merely implemented by skipping two scanlines in every
ten after the eight required to produce a character line. Thus, such modes
provide a 24-row display.

In principle, nothing prevents this "text mode" effect being applied to other
modes. The 20-column modes are not well-suited to displaying text, which
leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
2. Although the need for a non-monochrome 40-column text mode is addressed by
MODE 7 on the BBC Micro, the Electron lacks such a mode.

If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it
would occupy MODE 4 instead of the current MODE 4:

  Screen mode  Size (kilobytes)  Colours  Rows  Resolution
  -----------  ----------------  -------  ----  ----------
  0            20                2        32    640x256
  1            20                4        32    320x256
  2            20                16       32    160x256
  3            16                2        24    640x256
  4 (new)      16                4        24    320x256
  4 (old)      10                2        32    320x256
  5            10                4        32    160x256
  6            8                 2        24    320x256

Thus, for increasing mode numbers, the size of each mode would be the same or
less than the preceding mode.

Enhancement: 2MHz RAM Access
----------------------------

Although the CPU and ULA both access RAM at 2MHz, the CPU, when not competing
with the ULA, only accesses RAM every other 2MHz cycle (as if the ULA still
needed to access the RAM). One useful enhancement would therefore be a
mechanism to let the CPU take over the ULA cycles outside the ULA's period of
activity, comparable to the way the ULA takes over the CPU cycles in MODE 0 to
3.

Thus, the RAM access cycles would resemble the following in MODE 0 to 3:

  Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

In MODE 4 to 6:

  Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

This would improve CPU bandwidth as follows:

                Standard ULA    Enhanced ULA    % Total Bandwidth   Speedup
MODE 0, 1, 2    9728 bytes      19456 bytes     24% -> 49%          2
MODE 3          12288 bytes     24576 bytes     31% -> 62%          2
MODE 4, 5       19968 bytes     29696 bytes     50% -> 74%          1.5
MODE 6          19968 bytes     32256 bytes     50% -> 81%          1.6

(Here, the uncontended total 2MHz bandwidth for a display period would be
39936 bytes, being 128 cycles per line over 312 lines.)

With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
because all access opportunities to RAM are doubled. Meanwhile, in the other
modes, some CPU accesses occur alongside ULA accesses and thus cannot be
doubled, but the CPU bandwidth increase is still significant.
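
The figures in the table can be reproduced with a short Python sketch. It
assumes 80 display cycles within each 128 cycle line, 256 display lines in
MODE 0, 1, 2, 4 and 5, and 192 display lines in MODE 3 and 6; the last of
these is inferred from the table itself rather than stated in the text. All
names are illustrative.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
LINES_PER_FRAME = 312   # scanlines per display period
DISPLAY_CYCLES = 80     # cycles per line during which pixel data is fetched


def cpu_accesses(display_lines, ula_uses_every_cycle, enhanced):
    """CPU RAM accesses per frame for a standard or enhanced ULA.

    ula_uses_every_cycle: True for MODE 0 to 3 (UUUU), False for MODE 4 to 6
    (CUCU), describing the display portion of each display line.
    enhanced: True if the CPU may use every cycle outside ULA activity,
    False if it only uses every other cycle.
    """
    in_display = 0 if ula_uses_every_cycle else DISPLAY_CYCLES // 2
    stride = 1 if enhanced else 2
    per_display_line = in_display + (CYCLES_PER_LINE - DISPLAY_CYCLES) // stride
    per_other_line = CYCLES_PER_LINE // stride
    other_lines = LINES_PER_FRAME - display_lines
    return display_lines * per_display_line + other_lines * per_other_line


if __name__ == "__main__":
    total = CYCLES_PER_LINE * LINES_PER_FRAME
    modes = [("MODE 0, 1, 2", 256, True), ("MODE 3", 192, True),
             ("MODE 4, 5", 256, False), ("MODE 6", 192, False)]
    for name, lines, contended in modes:
        before = cpu_accesses(lines, contended, False)
        after = cpu_accesses(lines, contended, True)
        print(name, before, after,
              round(100 * before / total), round(100 * after / total),
              round(after / before, 1))
```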

Unfortunately, the mechanism for accessing the RAM is too slow to provide data
within the time constraints of 2MHz operation. There is no time remaining in a
2MHz cycle for the CPU to receive and process any retrieved data once the
necessary signalling has been performed.

The only way for the CPU to be able to access the RAM quickly enough would be
to do away with the double 4-bit access mechanism and to have a single 8-bit
channel to the memory. This would require twice as many 1-bit RAM chips or a
different kind of RAM chip, but it would also potentially simplify the ULA.

The section on 8-bit wide RAM access discusses the possibilities around
changing the memory architecture, also describing the possibility of ULA
accesses achieving two bytes per 2MHz cycle due to the doubling of the memory
channel, leaving every other access free for the CPU during the display period
in MODE 0 to 3...

  Standard display period: UUUUUUUU
  Modified display period: UCUCUCUC

...and consolidating accesses in MODE 4 to 6:

  Standard display period: UCUCUCUC
  Modified display period: UCCCUCCC

Together with the enhancements for non-display periods, such an "Enhanced+ ULA"
would perform as follows:

                Standard ULA    Enhanced+ ULA   % Total Bandwidth   Speedup
MODE 0, 1, 2    9728 bytes      29696 bytes     24% -> 74%          3.1
MODE 3          12288 bytes     32256 bytes     31% -> 81%          2.6
MODE 4, 5       19968 bytes     34816 bytes     50% -> 87%          1.7
MODE 6          19968 bytes     36096 bytes     50% -> 90%          1.8

Of course, the principal enhancement would be the wider memory channel, with
more buffering in the ULA being its contribution to this arrangement.
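
The "Enhanced+ ULA" column can be checked in the same spirit with the Python
sketch below. It assumes 80 display cycles per 128 cycle line, with the ULA
taking one access in every two display cycles in MODE 0 to 3 (UCUC) and one in
every four in MODE 4 to 6 (UCCC), and it assumes 192 display lines for MODE 3
and 6, a figure inferred from the table rather than stated in the text. The
names are illustrative.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
LINES_PER_FRAME = 312   # scanlines per display period
DISPLAY_CYCLES = 80     # cycles per line occupied by the display period


def enhanced_plus_cpu_accesses(display_lines, ula_interval):
    """CPU RAM accesses per frame with the wider memory channel.

    ula_interval is the spacing of ULA accesses within the display period:
    2 for the UCUC pattern (MODE 0 to 3), 4 for UCCC (MODE 4 to 6). Outside
    the display period the CPU is assumed to use every cycle.
    """
    ula_per_line = DISPLAY_CYCLES // ula_interval
    per_display_line = CYCLES_PER_LINE - ula_per_line
    other_lines = LINES_PER_FRAME - display_lines
    return display_lines * per_display_line + other_lines * CYCLES_PER_LINE


if __name__ == "__main__":
    total = CYCLES_PER_LINE * LINES_PER_FRAME
    for name, lines, interval in [("MODE 0, 1, 2", 256, 2), ("MODE 3", 192, 2),
                                  ("MODE 4, 5", 256, 4), ("MODE 6", 192, 4)]:
        accesses = enhanced_plus_cpu_accesses(lines, interval)
        print(name, accesses, round(100 * accesses / total))
```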

Enhancement: Region Blanking
----------------------------

The problem of permitting character-oriented blitting in programs whilst
scrolling the screen by sub-character amounts could be mitigated by permitting
a region of the display to be blank, such as the final lines of the display.
Consider the following vertical scrolling by 2 bytes that would cause an
initial character row of 6 lines and a final character row of 2 lines:

    6 lines - initial, partial character row
  248 lines - 31 complete rows
    2 lines - final, partial character row

If a routine were in use that wrote 8 line bitmaps to the partial character
row now split in two, it would be advisable to hide one of the regions in
order to prevent content appearing in the wrong place on screen (such as
content meant to appear at the top "leaking" onto the bottom). Blanking 6
lines would be sufficient, as can be seen from the following cases.

Scrolling up by 2 lines:

    6 lines - initial, partial character row
  240 lines - 30 complete rows
    4 lines - part of 1 complete row
  -----------------------------------------------------------------
    4 lines - part of 1 complete row (hidden to maintain 250 lines)
    2 lines - final, partial character row (hidden)

Scrolling down by 2 lines:

    2 lines - initial, partial character row
  248 lines - 31 complete rows
  ----------------------------------------------------------
    6 lines - final, partial character row (hidden)

Thus, in this case, region blanking would impose a 250 line display with the
bottom 6 lines blank.
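
The way a sub-character scroll offset splits the display into partial and
complete character rows can be sketched in Python as below. The function is
hypothetical and assumes a 256 line display with 8 line character rows, with
the offset given as the number of lines by which the start address lies inside
a character row.

```python
def split_rows(offset_lines, total_lines=256, row_height=8):
    """Return (initial, complete_rows, final) for a scrolled display.

    initial and final are the line counts of the partial character rows at
    the top and bottom of the display; complete_rows is the number of whole
    character rows in between.
    """
    offset = offset_lines % row_height
    if offset == 0:
        return (0, total_lines // row_height, 0)
    initial = row_height - offset
    final = offset
    complete_rows = (total_lines - initial - final) // row_height
    return (initial, complete_rows, final)


if __name__ == "__main__":
    print(split_rows(2))   # start address 2 lines into a character row
    print(split_rows(6))   # start address 6 lines into a character row
```

An offset of 2 lines gives the 6 + 248 + 2 arrangement shown first above, and
an offset of 6 lines gives the 2 + 248 + 6 arrangement of the final case.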

See the description of the display suspend enhancement for a way of blanking
lines that is more efficient than merely blanking the palette and that allows
the CPU to perform useful work during the blanking period.

To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 16 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.

Enhancement: Screen Height Adjustment
-------------------------------------

The height of the screen could be configurable in order to reduce screen
memory consumption. This is not quite done in MODE 3 and 6 since the start of
the screen appears to be rounded down to the nearest page, but by reducing the
height by amounts more than a page, savings would be possible. For example:

  Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
  ------------  -----  ------  --------------  ---------------  -------------
  640           1      252     80              320              &3140 -> &3100
  640           1      248     80              640              &3280 -> &3200
  320           1      240     40              640              &5A80 -> &5A00
  320           2      240     80              1280             &3500

Screen Mode Selection
---------------------

Bits 3, 4 and 5 of address &FE*7 control the selected screen mode.
For a wider range of modes, the other bits of &FE*7 (related to sound,
cassette input/output and the Caps Lock LED) would need to be reassigned, with
bit 0 potentially being made available for use.

Enhancement: Palette Definition
-------------------------------

Since all memory accesses go via the ULA, an enhanced ULA could employ more
specific addresses than &FE*X to perform enhanced functions. For example, the
palette control is done using &FE*8-F and merely involves selecting predefined
colours, whereas an enhanced ULA could support the redefinition of all 16
colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
(colours 8 to 15), where a single byte might provide 8 bits per pixel colour
specifications similar to those used on the Archimedes.

The principal limitation here is actually the hardware: the Electron has only
a single output line for each of the red, green and blue channels, and if
those outputs are strictly digital and can only be set to a "high" or "low"
value, then only the existing eight colours are possible. If a modern ULA were
able to output analogue values (or values at well-defined points between the
high and low values, such as the half-on value supported by the Amstrad CPC
series), it would still need to be assessed whether the circuitry could
successfully handle and propagate such values. Various sources indicate that
only "TTL levels" are supported by the RGB output circuit, and since there are
74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
is likely that the ULA is expected to provide only "high" or "low" values.

Short of adding extra outputs from the ULA (either additional red, green and
blue outputs or a combined intensity output), another approach might involve
some kind of modulation where an output value might be encoded in multiple
pulses at a higher frequency than the pixel frequency. However, this would
demand additional circuitry outside the ULA, and component RGB monitors would
probably not be able to take advantage of this feature; only UHF and composite
video devices (the latter with the composite video colour support enabled on
the Electron's circuit board) would potentially benefit.

Flashing Colours
----------------

According to the Advanced User Guide, "The cursor and flashing colours are
entirely generated in software: This means that all of the logical to physical
colour map must be changed to cause colours to flash." This appears to suggest
that the palette registers must be updated upon the flash counter - read and
written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
colour pairs to be any combination of colours might be possible, instead of
having colour complements as pairs.

It is conceivable that the interrupt code responsible does the simple thing
and merely inverts the current values for any logical colours (LC) for which
the associated physical colour (as supplied as the second parameter to the VDU
19 call) has the top bit of its four bit value set.
These top bits are not recorded in the palette registers but are presumably
recorded separately and used to build bitmaps as follows:

  LC  2 colour  4 colour  16 colour  4-bit value for inversion
  --  --------  --------  ---------  -------------------------
   0  00010001  00010001  00010001   1, 1, 1
   1  01000100  00100010  00010001   4, 2, 1
   2            01000100  00100010      4, 2
   3            10001000  00100010      8, 2
   4                      00010001         1
   5                      00010001         1
   6                      00100010         2
   7                      00100010         2
   8                      01000100         4
   9                      01000100         4
  10                      10001000         8
  11                      10001000         8
  12                      01000100         4
  13                      01000100         4
  14                      10001000         8
  15                      10001000         8

  Inversion value calculation:

   2 colour formula: 1 << (colour * 2)
   4 colour formula: 1 << colour
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))

For example, where logical colour 0 has been mapped to a physical colour in
the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
the inversion operation. (The lower three bits of the physical colour would be
used to set the underlying colour information affected by the inversion
operation.)
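
The table of bitmaps can be regenerated with the following Python sketch. The
function names are invented for this example, the 16 colour expression is
written so as to reproduce the rows of the table, and the replication of the
4-bit value into both halves of the byte is read off the bitmaps shown above.

```python
def inversion_value(colour, colours_in_mode):
    """Return the 4-bit inversion value for a logical colour."""
    if colours_in_mode == 2:
        return 1 << (colour * 2)
    if colours_in_mode == 4:
        return 1 << colour
    if colours_in_mode == 16:
        # Bit 1 of the colour selects values 1/2 or 4/8; bit 3 selects the pair.
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("unsupported number of colours")


def inversion_bitmap(colour, colours_in_mode):
    """Return the 8-bit bitmap with the 4-bit value in both half-bytes."""
    value = inversion_value(colour, colours_in_mode)
    return value | (value << 4)


if __name__ == "__main__":
    for colours in (2, 4, 16):
        for colour in range(colours):
            print(colours, colour,
                  format(inversion_bitmap(colour, colours), "08b"))
```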

An operation in the interrupt code would then combine the bitmaps for all
logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
combined for groups of logical colours as follows:

   Logical colours
   ---------------
   0,  2,  8, 10
   4,  6, 12, 14
   5,  7, 13, 15
   1,  3,  9, 11

These combined bitmaps would be EORed with the existing palette register
values in order to perform the value inversion necessary to produce the
flashing effect.

Thus, in the VDU 19 operation, the appropriate inversion value would be
calculated for the logical colour, and this value would then be combined with
other inversion values in a dedicated memory location corresponding to the
colour's group as indicated above. Meanwhile, the palette channel values would
be derived from the lower three bits of the specified physical colour and
combined with other palette data in dedicated memory locations corresponding
to the palette registers.

Interestingly, although flashing colours on the BBC Micro are controlled by
toggling bit 0 of the &FE20 control register location for the Video ULA, the
actual colour inversion is done in hardware.

Enhancement: Palette Definition Lists
-------------------------------------

It can be useful to redefine the palette in order to change the colours
available for a particular region of the screen, particularly in modes where
the choice of colours is constrained, and if an increased colour depth were
available, palette redefinition would be useful to give the illusion of more
than 16 colours in MODE 2.
Traditionally, palette redefinition has been done by using interrupt-driven
timers, but a more efficient approach would involve presenting lists of
palette definitions to the ULA so that it can change the palette at a
particular display line.

One might define a palette redefinition list in a region of memory and then
communicate its contents to the ULA by writing the address and length of the
list, along with the display line at which the palette is to be changed, to
ULA registers such that the ULA buffers the list and performs the redefinition
at the appropriate time. Throughput/bandwidth considerations might impose
restrictions on the practical length of such a list, however.

Enhancement: Display Synchronisation Interrupts
-----------------------------------------------

When completing each scanline of the display, the ULA could trigger an
interrupt. Since this might impact system performance substantially, the
feature would probably need to be configurable, and it might be sufficient to
have an interrupt only after a certain number of display lines instead.
Permitting the CPU to take action after eight lines would allow palette
switching and other effects to occur on a character row basis.

The ULA provides an interrupt at the end of the display period, presumably so
that software can schedule updates to the screen, avoid flickering or tearing,
and so on. However, some applications might benefit from an interrupt at, or
just before, the start of the display period so that palette modifications or
similar effects could be scheduled.

Enhancement: Palette-Free Modes
-------------------------------

Palette-free modes might be defined where bit values directly correspond to
the red, green and blue channels, although this would mostly make sense only
for modes with depths greater than the standard 4 bits per pixel, and such
modes would require more memory than MODE 2 if they were to have an acceptable
resolution.

Enhancement: Display Suspend
----------------------------

Especially when writing to the screen memory, it could be beneficial to be
able to suspend the ULA's access to the memory, instead producing blank values
for all screen pixels until a program is ready to reveal the screen. This is
different from palette blanking since with a blank palette, the ULA is still
reading screen memory and translating its contents into pixel values that end
up being blank.

This function is reminiscent of a capability of the ZX81, albeit necessary on
that hardware to reduce the load on the system CPU which was responsible for
producing the video output. By allowing display suspend on the Electron, the
performance benefit would be derived from giving the CPU full access to the
memory bandwidth.

Note that since the CPU is only able to access RAM at 1MHz, there is no
possibility to improve performance beyond that achieved in MODE 4, 5 or 6
normally. However, if faster RAM access were to be made possible (see the
discussion of 8-bit wide RAM access), the CPU could benefit from freeing up
the ULA's access slots entirely.

The region blanking feature mentioned above could be implemented using this
enhancement instead of employing palette blanking for the affected lines of
the display.

Enhancement: Memory Filling
---------------------------

A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the
existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.

This capability could also support block filling, where the limits of the
filled memory would be defined by the position and size of a screen area,
although this would demand the provision of additional registers in the ULA to
retain the details of such areas and additional logic to control the fill
operation.

Enhancement: Region Filling
---------------------------

An alternative to memory writing might involve indicating regions using
additional registers or memory where the ULA fills regions of the screen with
content instead of reading from memory. Unlike hardware sprites which should
realistically provide varied content, region filling could employ single
colours or patterns, and one advantage of doing so would be that the ULA need
not access memory at all within a particular region.

Regions would be defined on a row-by-row basis. Instead of reading memory and
blitting a direct representation to the screen, the ULA would read region
definitions containing a start column, region width and colour details.
There might be a certain number of definitions allowed per row, or the ULA
might just traverse an ordered list of such definitions with each one
indicating the row, start column, region width and colour details.

One could even compress this information further by requiring only the row,
start column and colour details with each subsequent definition terminating
the effect of the previous one. However, one would also need to consider the
convenience of preparing such definitions and whether efficient access to
definitions for a particular row might be desirable. It might also be
desirable to avoid having to prepare definitions for "empty" areas of the
screen, effectively making the definition of the screen contents employ
run-length encoding and employ only colour plus length information.

One application of region filling is that of simple 2D and 3D shape rendering.
Although it is entirely possible to plot such shapes to the screen and have
the ULA blit the memory contents to the screen, such operations consume
bandwidth both in the initial plotting and in the final transfer to the
screen. Region filling would reduce such bandwidth usage substantially.

This way of representing screen images would make certain kinds of images
unfeasible to represent - consider alternating single pixel values which could
easily occur in some character bitmaps - even if an internal queue of regions
were to be supported such that the ULA could read ahead and buffer such
"bandwidth intensive" areas. Thus, the ULA might be better served providing
this feature for certain areas of the display only as some kind of special
graphics window.
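
A minimal Python sketch of the per-row region definitions might look as
follows, with each definition being a hypothetical (start column, width,
colour) tuple. It is intended only to show how compactly uniform areas are
described and how poorly alternating pixel values fare; it does not model any
actual ULA behaviour.

```python
def encode_row(pixels):
    """Encode one row of colour values as (start_column, width, colour) regions."""
    regions = []
    for column, colour in enumerate(pixels):
        if regions and regions[-1][2] == colour:
            start, width, _ = regions[-1]
            regions[-1] = (start, width + 1, colour)   # extend the current run
        else:
            regions.append((column, 1, colour))        # start a new region
    return regions


def decode_row(regions, width, background=0):
    """Expand region definitions back into a row of colour values."""
    row = [background] * width
    for start, length, colour in regions:
        row[start:start + length] = [colour] * length
    return row


if __name__ == "__main__":
    filled = [0] * 40 + [3] * 80 + [0] * 40       # a single filled span
    checker = [0, 1] * 80                         # alternating single pixels
    print(len(encode_row(filled)), len(encode_row(checker)))
```

The filled span needs only three definitions for 160 pixels, whereas the
alternating pattern needs one definition per pixel, illustrating the
"bandwidth intensive" case described above.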

Enhancement: Hardware Sprites
-----------------------------

An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.

The specification of sprites could involve the reservation of 16 locations
(for example, &FE20-F) specifying a fixed number of eight sprites, with each
location pair referring to the sprite data. By limiting the ULA to dealing
with a fixed number of sprites, the work required inside the ULA would be
reduced since it would avoid having to deal with arbitrary numbers of sprites.

The principal limitation on providing hardware sprites is that of having to
obtain sprite data, given that the ULA is usually required to retrieve screen
data, and given the lack of memory bandwidth available to retrieve sprite data
(particularly from multiple sprites supposedly at the same position) and
screen data simultaneously. Although the ULA could potentially read sprite
data and screen data in alternate memory accesses in screen modes where the
bandwidth is not already fully utilised, this would result in a degradation of
performance.

Enhancement: Additional Screen Mode Configurations
--------------------------------------------------

Alternative screen mode configurations could be supported.
The ULA has to produce 640 pixel values across the screen, with pixel doubling
or quadrupling employed to fill the screen width:

  Screen width      Columns     Scaling     Depth       Bytes
  ------------      -------     -------     -----       -----
  640               80          x1          1           80
  320               40          x2          1, 2        40, 80
  160               20          x4          2, 4        40, 80

It must also use at most 80 byte-sized memory accesses to provide the
information for the display. Given that characters must occupy an 8x8 pixel
array, if a configuration featuring anything other than 20, 40 or 80 character
columns is to be supported, compromises must be made such as the introduction
of blank pixels either between characters (such as occurs between rows in MODE
3 and 6) or at the end of a scanline (such as occurs at the end of the frame
in MODE 3 and 6). Consider the following configuration:

  Screen width      Columns     Scaling     Depth       Bytes       Blank
  ------------      -------     -------     -----       ------      -----
  208               26          x3          1, 2        26, 52      16

Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
colours could be provided, with 16 blank pixel values (out of a total of 640)
generated either at the start or end (or split between the start and end) of
each scanline.

Enhancement: Character Attributes
---------------------------------

The BBC Micro MODE 7 employs something resembling character attributes to
support teletext displays, but depends on circuitry providing a character
generator. The ZX Spectrum, on the other hand, provides character attributes
as a means of colouring bitmapped graphics.
Although such a feature is very
limiting as the sole means of providing multicolour graphics, in situations
where the choice is between low resolution multicolour graphics or high
resolution monochrome graphics, character attributes provide a potentially
useful compromise.

For each byte read, the ULA must deliver 8 pixel values (out of a total of
640) to the video output, doing so by either emptying its pixel buffer on a
pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               B
  Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7

And for a screen mode having 320 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               A
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

These additional bytes could provide colour information for the bitmapped data
in the following character column (of 8 pixels). Since it would be desirable
to apply attribute data to the first column, the initial 8 cycles might be
configured to not produce pixel values.

For an entire character, attribute data need only be read for the first row of
pixels for a character.
The subsequent rows would have attribute information
applied to them, although this would require the attribute data to be stored
in some kind of buffer. Thus, the following access pattern would be observed:

  Reads:    A B _ B _ B _ B _ B _ B _ B _ B ...

In modes 3 and 6, the blank display lines could be used to retrieve attribute
data:

  Reads (blank):     A _ A _ A _ A _ A _ A _ A _ A _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
                     ...

See below for a discussion of using this for character data as well.

A whole byte used for colour information for a whole character would result in
a choice of 256 colours, and this might be somewhat excessive. By only reading
attribute bytes at every other opportunity, a choice of 16 colours could be
applied individually to two characters.

  Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  Reads:    B               A               B               -
  Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7

Further reductions in attribute data access, offering 4 colours for every
character in a four character block, for example, might also be worth
considering.
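
The trade-off between attribute reads and colour choice described above can be
sketched as follows. This is only an illustration in Python, assuming that each
8-bit attribute byte is divided evenly between the characters sharing it; the
function name is hypothetical and not part of any existing design.

```python
def colours_per_character(chars_per_attribute_byte):
    """Return the colour choice available to each character when one
    8-bit attribute byte is shared evenly between several characters."""
    if chars_per_attribute_byte not in (1, 2, 4, 8):
        raise ValueError("an 8-bit attribute must divide evenly")
    bits_per_character = 8 // chars_per_attribute_byte
    return 2 ** bits_per_character


for shared in (1, 2, 4, 8):
    print(shared, "character(s) per attribute byte:",
          colours_per_character(shared), "colours each")
```

Sharing a byte between one, two or four characters gives the 256, 16 and 4
colour choices mentioned in the text.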

Consider the following configurations for screen modes with a colour depth of
1 bit per pixel for bitmap information:

  Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  -------  ---------  ---------  -------  ------------
  320           40       x2       40         40         256      &5300
  320           40       x2       40         20         16       &5580 -> &5500
  320           40       x2       40         10         4        &56C0 -> &5600
  208           26       x3       26         26         256      &62C0 -> &6200
  208           26       x3       26         13         16       &6460 -> &6400

Enhancement: Text-Only Modes using Character and Attribute Data
---------------------------------------------------------------

In modes 3 and 6, the blank display lines could be used to retrieve character
and attribute data instead of trying to insert it between bitmap data accesses,
but this data would then need to be retained:

  Reads:    A C A C A C A C A C A C A C A C ...
  Reads:    B _ B _ B _ B _ B _ B _ B _ B _ ...

Only attribute (A) and character (C) reads would require screen memory
storage. Bitmap data reads (B) would involve either accesses to memory to
obtain character definition details or could, at the cost of special storage
in the ULA, involve accesses within the ULA that would then free up the RAM.
However, the CPU would not benefit from having any extra access slots due to
the limitations of the RAM access mechanism.

A scheme without caching might be possible. The same line of memory addresses
might be visited over and over again for eight display lines, with an index
into the bitmap data being incremented from zero to seven.
The access patterns
would look like this:

  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 0)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 1)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 2)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 3)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 4)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 5)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 6)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 7)

The bandwidth requirements would be the sum of the accesses to read the
character values (repeatedly) and those to read the bitmap data to reproduce
the characters on screen.

Enhancement: MODE 7 Emulation using Character Attributes
--------------------------------------------------------

If the scheme of applying attributes to character regions were employed to
emulate MODE 7, in conjunction with the MODE 6 display technique, the
following configuration would be required:

  Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &5ECC -> &5E00
  320           40       25    40         10         4        &5FC6 -> &5F00

Although this requires much more memory than MODE 7 (8500 bytes versus MODE
7's 1000 bytes), it does not need much more memory than MODE 6, and it would
at least make a limited 40-column multicolour mode available as a substitute
for MODE 7.
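
The screen start figures in the table above can be reproduced with a short
calculation. The following Python sketch assumes that screen memory ends at
&8000 (the top of RAM), that bitmap data is stored for all eight display lines
of each character row, and that attribute data is stored once per row; the
names used are illustrative only.

```python
RAM_TOP = 0x8000      # assumption: screen memory ends at the top of 32K RAM
LINES_PER_ROW = 8     # each character row spans eight display lines


def screen_start(rows, bitmap_bytes, attribute_bytes):
    """Return (exact, page_aligned) start addresses for a mode storing
    bitmap data for every display line plus attribute data once per row."""
    bitmap = rows * LINES_PER_ROW * bitmap_bytes
    attributes = rows * attribute_bytes
    exact = RAM_TOP - (bitmap + attributes)
    return exact, exact & 0xFF00   # round down to a page boundary


for attribute_bytes in (20, 10):
    exact, aligned = screen_start(25, 40, attribute_bytes)
    print("A=%d: &%04X -> &%04X" % (attribute_bytes, exact, aligned))
```

With 25 rows, 40 bitmap bytes and 20 attribute bytes this gives 8500 bytes of
storage and a start address of &5ECC, rounded down to &5E00. Assuming 32
character rows, the same calculation appears to reproduce the figures given
for the 256-line configurations, such as &5300.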

Using the text-only enhancement with caching of data or with repeated reads of
the same character data line for eight display lines, the storage requirements
would be diminished substantially:

  Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &7A24 -> &7A00
  320           40       25    40         10         4        &7B1E -> &7B00
  320           40       25    40         5          2        &7B9B -> &7B00
  320           40       25    40         0          (2)      &7C18 -> &7C00
  640           80       25    80         40         16       &7448 -> &7400
  640           80       25    80         20         4        &763C -> &7600
  640           80       25    80         10         2        &7736 -> &7700
  640           80       25    80         0          (2)      &7830 -> &7800

Note that the colours describe the locally defined attributes for each
character. When no attribute information is provided, the colours are defined
globally.

Enhancement: Compressed Character Data
--------------------------------------

Another observation about text-only modes is that they only need to store a
restricted set of bitmapped data values. Encoding this set of values in a
smaller unit of storage than a byte could possibly help to reduce the amount
of storage and bandwidth required to reproduce the characters on the display.
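
One possible encoding, sketched here in Python purely as an illustration,
packs character codes into fewer than eight bits each. The 6-bit code width
(permitting a 64-character set) and the function name pack_codes are
assumptions made for the example, not part of any existing design.

```python
def pack_codes(codes, bits=6):
    """Pack small integer character codes into bytes, most significant
    bit first, padding the final byte with zero bits if necessary."""
    acc = 0        # bits waiting to be written out
    nbits = 0      # number of valid bits held in acc
    out = bytearray()
    for code in codes:
        if not 0 <= code < (1 << bits):
            raise ValueError("code does not fit in %d bits" % bits)
        acc = (acc << bits) | code
        nbits += bits
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1    # keep only the unwritten bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


row = pack_codes([0] * 40)
print(len(row), "bytes for a 40-column row")
```

Under this assumption a 40-column row would occupy 30 bytes rather than 40,
at the cost of the ULA having to unpack codes that straddle byte boundaries.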

Enhancement: High Resolution Graphics
-------------------------------------

Screen modes with higher resolutions and larger colour depths might be
possible, but this would in most cases involve the allocation of more screen
memory, and the ULA would probably then be obliged to page in such memory for
the CPU to be able to sensibly access it all.

Enhancement: Genlock Support
----------------------------

The ULA generates a video signal in conjunction with circuitry producing the
output features necessary for the correct display of the screen image.
However, it appears that the ULA drives the video synchronisation mechanism
instead of reacting to an existing signal. Genlock support might be possible
if the ULA were made to be responsive to such external signals, resetting its
address generators upon receiving synchronisation events.

Enhancement: Improved Sound
---------------------------

The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro ULA employs &FE40-&FE4F for sound control,
and an enhanced ULA could adopt this interface.

The BBC Micro uses the SN76489 chip to produce sound, and the entire
functionality of this chip could be emulated for enhanced sound, with a subset
of the functionality exposed via the &FE*6 interface.

See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
See: http://www.smspower.org/Development/SN76489

Enhancement: Waveform Upload
----------------------------

As with a hardware sprite function, waveforms could be uploaded or referenced
using locations as registers referencing memory regions.

Enhancement: Sound Input/Output
-------------------------------

Since the ULA already controls audio input/output for cassette-based data, it
would have been interesting to entertain the idea of sampling and output of
sounds through the cassette interface. However, a significant amount of
circuitry is employed to process the input signal for use by the ULA and to
process the output signal for recording.

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11

Enhancement: BBC ULA Compatibility
----------------------------------

Although some new ULA functions could be defined in a way that is also
compatible with the BBC Micro, the BBC ULA is itself incompatible with the
Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
map, but controls various functions specific to the 6845 video controller;
&FE08-F is reserved for the serial controller. It therefore becomes possible
to disregard compatibility where compatibility is already disregarded for a
particular area of functionality.

&FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE07-F on the
Electron) and other system-specific functions. Since the location usage is
generally incompatible, this region could be reused for other purposes.

Enhancement: Increased RAM, ULA and CPU Performance
---------------------------------------------------

More modern implementations of the hardware might feature faster RAM coupled
with an increased ULA clock frequency in order to increase the bandwidth
available to the ULA and to the CPU in situations where the ULA is not needed
to perform work. A ULA employing a 32MHz clock would be able to complete the
retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
to access the RAM for the following 250ns even in display modes requiring the
retrieval of a byte for the display every 500ns. The CPU could, subject to
timing issues, run at 2MHz even in MODE 0, 1 and 2.

A scheme such as that described above would have a similar effect to the
scheme employed in the BBC Micro, although the latter made use of RAM with a
wider bandwidth in order to complete memory transfers within 250ns and thus
permit the CPU to run continuously at 2MHz.

Higher bandwidth could potentially be used to implement exotic features such
as RAM-resident hardware sprites or indeed any feature demanding RAM access
concurrent with the production of the display image.

Enhancement: Multiple CPU Stacks and Zero Pages
-----------------------------------------------

The 6502 maintains a stack for subroutine calls and register storage in page
&01. Although the stack register can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would
be able to coexist without overwriting each other's stacks.
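
To illustrate how little room each program would have when page &01 is shared
in this way, the following Python sketch divides the 256-byte stack page
evenly between a number of programs. The even division and the function name
are assumptions made for the example; the 6502 stack grows downwards, so each
region is identified by the stack pointer value at its top.

```python
def stack_regions(programs):
    """Divide page &01 evenly, returning (initial_sp, size) per program.
    The 6502 stack grows downwards, so each region starts at its top."""
    if programs < 1 or 256 % programs:
        raise ValueError("choose a count that divides 256 evenly")
    size = 256 // programs
    return [(0xFF - i * size, size) for i in range(programs)]


for sp, size in stack_regions(4):
    # Each JSR pushes a two-byte return address.
    print("SP=&%02X  bytes=%d  max nested JSRs=%d" % (sp, size, size // 2))
```

With four programs, each would have 64 bytes of stack, enough for at most 32
nested subroutine calls if nothing else were pushed.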

One way that this issue could be alleviated would involve the provision of a
facility to redirect accesses to page &01 to other areas of memory. The ULA
would provide a register that defines a physical page for the use of the CPU's
"logical" page &01, and upon any access to page &01 by the CPU, the ULA would
change the asserted address lines to redirect the access to the appropriate
physical region.

By providing an 8-bit register, mapping to the most significant byte (MSB) of
a 16-bit address, the ULA could then replace any MSB equal to &01 with the
register value before the access is made. Where multiple programs coexist,
upon switching programs, the register would be updated to point the ULA to the
appropriate stack location, thus providing a simple memory management unit
(MMU) capability.

In a similar fashion, zero page accesses could also be redirected so that code
could run from sideways RAM and have zero page operations redirected to "upper
memory" - for example, to page &BE (with stack accesses redirected to page
&BF, perhaps) - thereby permitting most CPU operations to occur without
inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
CPU as it contends with the ULA for memory access.

Such facilities could also be provided by a separate circuit between the CPU
and ULA in a fashion similar to that employed by a "turbo" board, but unlike
such boards, no additional RAM would be provided: all memory accesses would
occur as normal through the ULA, albeit redirected when configured
appropriately.

ULA Pin Functions
-----------------

The functions of the ULA pins are described in the Electron Service Manual.
Of
interest to video processing are the following:

  CSYNC (low during horizontal or vertical synchronisation periods, high
         otherwise)

  HS (low during horizontal synchronisation periods, high otherwise)

  RED, GREEN, BLUE (pixel colour outputs)

  CLOCK IN (a 16MHz clock input, 4V peak to peak)

  PHI OUT (a 1MHz, 2MHz and stopped clock signal for the CPU)

More general memory access pins:

  RAM0...RAM3 (data lines to/from the RAM)

  RA0...RA7 (address lines for sending both row and column addresses to the RAM)

  RAS (row address strobe setting the row address on a negative edge - see the
       timing notes)

  CAS (column address strobe setting the column address on a negative edge -
       see the timing notes)

  WE (sets write enable with logic 0, read with logic 1)

  ROM (select data access from ROM)

CPU-oriented memory access pins:

  A0...A15 (CPU address lines)

  PD0...PD7 (CPU data lines)

  R/W (indicates CPU write with logic 0, CPU read with logic 1)

Interrupt-related pins:

  NMI (CPU request for uninterrupted 1MHz access to memory)

  IRQ (signal event to CPU)

  POR (power-on reset, resetting the ULA on a positive edge and asserting the
       CPU's RST pin)

  RST (master reset for the CPU signalled on power-up and by the Break key)

Keyboard-related pins:

  KBD0...KBD3 (keyboard inputs)

  CAPS LOCK (control status LED)

Sound-related pins:

  SOUND O/P (sound output using internal oscillator)

Cassette-related pins:

  CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)

  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)

  CAS RC (detect high tone)

  CAS MO (motor relay output)

  ÷13 IN (~1200 baud clock input)

ULA Socket
----------

The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.

References
----------

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm

About this Document
-------------------

The most recent version of this document and accompanying distribution should
be available from the following location:

http://hgweb.boddie.org.uk/ULA

Copyright and licence information can be found in the docs directory of this
distribution - see docs/COPYING.txt for more information.