3 weeks ago Paul Boddie Made clarification about 1MHz RAM access, added 2MHz bandwidth figure. Fixed service manual link.
The Acorn Electron ULA
======================

Principal Design and Feature Constraints
----------------------------------------

The features of the ULA are limited by the amount of time and resources that
can be allocated to each activity necessary to support such features given the
fundamental obligations of the unit. Maintaining a screen display based on the
contents of RAM itself requires the ULA to have exclusive access to such
hardware resources for a significant period of time. Whilst other elements of
the ULA can in principle run in parallel with this activity, they cannot also
access the RAM. Consequently, other features that might use the RAM must
accept a reduced allocation of that resource in comparison to a hypothetical
architecture where concurrent RAM access is possible.

Thus, the principal constraint for many features is bandwidth. The duration of
access to hardware resources is one aspect of this; the rate at which such
resources can be accessed is another. For example, the RAM is not fast enough
to support access more frequently than one byte per 2MHz cycle, and for screen
modes involving 80 bytes of screen data per scanline, there are no free cycles
for anything other than the production of pixel output during the active
scanline periods.

Timing
------

According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
of which are used to generate pixel data. At 50Hz, this means that 128 cycles
are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
312 ~= 128 cycles). This is consistent with the observation that each scanline
requires at most 80 bytes of data, and that the ULA is apparently busy for 40
out of 64 microseconds in each scanline.
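
The arithmetic in the paragraph above can be checked with a short Python
sketch. The constant names are illustrative only and do not come from any
Acorn documentation; the figures are those quoted in the text.

```python
# Figures quoted in the text; the constant names are illustrative only.
RAM_CLOCK_HZ = 2_000_000    # one RAM access opportunity per 2MHz cycle
FIELD_RATE_HZ = 50          # fields per second
LINES_PER_FIELD = 312       # scanlines per field
MAX_BYTES_PER_LINE = 80     # screen bytes fetched per scanline in the widest modes

cycles_per_field = RAM_CLOCK_HZ // FIELD_RATE_HZ         # 40000
cycles_per_line = cycles_per_field / LINES_PER_FIELD     # about 128.2
cycle_period_us = 1_000_000 / RAM_CLOCK_HZ               # 0.5

# Time spent by the ULA fetching screen data on an 80-byte scanline.
ula_busy_us = MAX_BYTES_PER_LINE * cycle_period_us       # 40.0
# Nominal scanline duration implied by 128 cycles per line.
line_period_us = round(cycles_per_line) * cycle_period_us

print(cycles_per_field, round(cycles_per_line), ula_busy_us, line_period_us)
```

Rounding to 128 cycles per line gives the nominal 64 microsecond scanline, of
which 40 microseconds are needed to fetch 80 bytes.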

(Since the ULA is seeking to provide an image for an interlaced 625-line
display, there are in fact two "fields" involved, one providing 312
scanlines and one providing 313 scanlines. See below for a description of the
video system.)

Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
each providing two bits of each byte) using two cycles within the 500ns period
of the 2MHz clock to complete each access operation. Since the CPU and ULA
have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
effectively run at 1MHz (since every other 500ns period involves the ULA
accessing RAM) during transfers of screen data.

The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle lasts
62.5ns. To access the memory, the following patterns corresponding to 16MHz
cycles are required:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:           F     S         F     S ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a b     c       a b     c     ...
      Data ops:  s         f     s         f      ...

           ~WE:  ......W                          ...
       PHI OUT:  \_______________/--------------- ...
     CPU (RAM):  L               D                ...
           RnW:  R                                ...

       PHI OUT:  \_______/-------\_______/------- ...
     CPU (ROM):  L       D       L       D        ...
           RnW:          R               R        ...

~RAS must be high for 100ns, ~CAS must be high for 50ns.
~RAS must be low for 150ns, ~CAS must be low for 90ns.
Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.

Here, "A" and "B" respectively indicate the row and first column addresses
being latched into the RAM (on a negative edge for ~RAS and ~CAS
respectively), and "C" indicates the second column address being latched into
the RAM. Presumably, the first and second half-bytes can be read at "F" and
"S" respectively, and the row and column addresses must be made available at
"a" and "b" (and "c") respectively at the latest. Data can be read at "f" and
"s" for the first and second half-bytes respectively.

For the CPU, "L" indicates the point at which an address is taken from the CPU
address bus, on a negative edge of PHI OUT, with "D" being the point at which
data may either be read or be asserted for writing, on a positive edge of PHI
OUT. Here, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven low
for writing or high for reading, and thus propagates RnW from the CPU, this
would need to be done before data would be retrieved and, according to the
TM4164EC4 datasheet, even as late as the column address is presented and ~CAS
brought low.

The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that ~RAS must be
held low for at least 150ns and that ~CAS must be held low for at least 90ns
before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
cycles.
Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
is 1.5 cycles.

Note that the Service Manual refers to the negative edge of RAS and CAS, but
the datasheet for the similar TM4164EC4 product shows latching on the negative
edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
"page mode" provides the appropriate behaviour for that particular product.

The CPU, when accessing the RAM alone, apparently does not make use of the
vacated "slot" that the ULA would otherwise use (when interleaving accesses in
MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
accessing ROM (and potentially sideways RAM). The principal limitation is the
amount of time needed between issuing an address and receiving an entire byte
from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
4 cycles that would be required for 2MHz operation.

See: Acorn Electron Advanced User Guide
See: Acorn Electron Service Manual
     http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438

CPU Clock Notes
---------------

"The 6502 receives an external square-wave clock input signal on pin 37, which
is usually labeled PHI0. [...] This clock input is processed within the 6502
to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
through two inverters and a push-pull amplifier. The same network of
transistors within the 6502 which generates PHI2 is also tied to PHI1, and
generates PHI1 as the inverse of PHI0.
The reason why PHI1 and PHI2 are made
available to external devices is so that they know when they can access the
CPU. When PHI1 is high, this means that external devices can read from the
address bus or data bus; when PHI2 is high, this means that external devices
can write to the data bus."

See: http://lateblt.livejournal.com/88105.html

"The 6502 has a synchronous memory bus where the master clock is divided into
two phases (Phase 1 and Phase 2). The address is always generated during Phase
1 and all memory accesses take place during Phase 2."

See: http://www.jmargolin.com/vgens/vgens.htm

Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
Phase 1" means when PHI0 - really PHI2 - is high and "during Phase 2" means
when PHI1 is high.

Bandwidth Figures
-----------------

Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
total lines, with 80 cycles occurring in the active periods of display
scanlines, the following bandwidth calculations can be performed:

Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:    80 cycles * 256 lines
     = 20480 bytes
CPU:    48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:    80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:    48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:    40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:    40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes

Here, the division of 2 for CPU accesses is performed to indicate that the CPU
only uses every other access opportunity even in uncontended periods. See the
2MHz RAM Access enhancement below for bandwidth calculations that consider
this limitation removed.

Video Timing
------------

According to 8.7 in the Service Manual, and the PAL Wikipedia page,
approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
(including the "colour burst"), and 1.65µs for the "front porch", totalling
12.05µs and thus leaving 51.95µs for the active video signal for each
scanline. As the Service Manual suggests in the oscilloscope traces, the
display information is transmitted more or less centred within the active
video period since the ULA will only be providing pixel data for 40µs in each
scanline.

Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
each scanline can be divided into 1024 cycles, although only 640 at most are
actively used to provide pixel data. Pixel data production should only occur
within a certain period on each scanline, approximately 262 cycles after the
start of hsync:

  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262

By choosing a number divisible by 8, the RAM access mechanism can be
synchronised with the pixel production.
Thus, 256 is a more appropriate start
cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
document) occurs at cycle 0.

To summarise:

  HS signal starts at cycle 0 on each horizontal scanline
  HS signal ends approximately 4µs later at cycle 64
  Pixel data starts approximately 12µs later at cycle 256

"Re: Electron Memory Contention" provides measurements that appear consistent
with these calculations.

The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
lines. Thus, the first visible scanline on the first field of a frame occurs
half way through the 23rd scanline period measured from the start of vsync
(indicated by "V" in the diagrams below):

                                        10                  20    23
  Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    Line from 1:       0                                          22 3
 Line on screen: .:::::VVVVV:::::                                   12233445566
                  |_________________________________________________|
                           25 line vertical blanking period

In the second field of a frame, the first visible scanline coincides with the
24th scanline period measured from the start of line 313 in the frame:

               310                                                 336
  Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
  Line from 313:       0                                            23 4
 Line on screen: 88:::::VVVVV::::                                    11223344
               288 |                                                 |
                   |_________________________________________________|
                            25 line vertical blanking period

In order to consider only full lines, we might consider the start of each
frame to occur 23 lines after the start of vsync.

Again, it is likely that pixel data production should only occur on scanlines
within a certain period on each frame. The "625/50" document indicates that
only a certain region is "safe" to use, suggesting a vertically centred region
with approximately 15 blank lines above and below the picture. However, the
"PAL TV timing and voltages" document suggests 28 blank lines above and below
the picture. This would centre the 256 lines within the 312 lines of each
field and thus provide a start of picture approximately 5.5 or 5 lines after
the end of the blanking period or 28 or 27.5 lines after the start of vsync.
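
The centring arithmetic can be sketched in Python as follows. This is only an
illustration: it assumes 1024 16MHz cycles per 64µs scanline (as derived for
the horizontal timing) and a picture starting a whole 28 lines after the start
of vsync, and the names are invented for the example.

```python
LINES_PER_FIELD = 312
ACTIVE_LINES = 256
CYCLES_PER_LINE = 1024      # 16MHz cycles in one 64us scanline
CYCLES_PER_US = 16          # 16MHz cycles per microsecond
VSYNC_LINES = 2.5           # duration of the field sync pulse in lines

# Blank lines above (and below) a vertically centred picture.
blank_lines = (LINES_PER_FIELD - ACTIVE_LINES) // 2     # 28

# Positions measured from the start of vsync, in 16MHz cycles.
vsync_end_cycle = int(VSYNC_LINES * CYCLES_PER_LINE)    # 2560
picture_start_cycle = blank_lines * CYCLES_PER_LINE     # 28672

# Interval between the end of vsync and the first picture line.
gap_us = (picture_start_cycle - vsync_end_cycle) / CYCLES_PER_US

print(blank_lines, vsync_end_cycle, picture_start_cycle, gap_us)
```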

To summarise:

  CSYNC signal starts at cycle 0
  CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
  Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672

See: http://en.wikipedia.org/wiki/PAL
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
     http://lipas.uwasa.fi/~f76998/video/modes/
See: PAL TV timing and voltages
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
See: Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
See: Re: Electron Memory Contention
     http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109

RAM Integrated Circuits
-----------------------

Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.

The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM41464 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.

The TM4164EC4 series combines 4 64K x 1b units into a single package and
appears similar to the TM4164EA4 featured on the Electron's circuit diagram
(in the Advanced User Guide but not the Service Manual), and it also has 22
pins providing 3 additional inputs and 3 additional outputs over the 16 pins
of the individual 4164-15 modules, presumably allowing concurrent access to
the packaged memory units.

As far as currently available replacements are concerned, the NTE4164 is a
potential candidate: according to the Vetco Electronics entry, it is
supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
parts include the NTE2164 and the NTE6664, both of which appear to have
largely the same performance and connection characteristics. Meanwhile, the
NTE21256 appears to be a 16-pin replacement with four times the capacity that
maintains the single data input and output pins. Using the NTE21256 as a
replacement for all ICs combined would be difficult because of the single bit
output.

Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these are
different and actually appear to be provided under the 41574 product code (but
are listed under 41464-10) and appear to be replacements for the TM4164EC4:
the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
employing 4 pins for both input and output.

            Pins    I/O pins    Row access  Column access
            ----    --------    ----------  -------------
TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
KM41464AP   18      4           150ns (15)  75ns (15)
NTE21256    16      1 + 1       150ns       75ns
HYB 4164-2  16      1 + 1       150ns       100ns
µPD41464    18      4           120ns (12)  60ns (12)

See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
     http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
See: Dynamic RAMS
     http://www.unicornelectronics.com/IC/DYNAMIC.html
See: New old stock 8x 4164 chips
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
     http://www.vetco.net/catalog/product_info.php?products_id=2806
See: NTE4164 - IC-NMOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=3680
See: NTE21256 - IC-256K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=2799
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
See: NTE6664 - IC-MOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=5213
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
See: 4164-150: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
See: 41464-10: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1

Interrupts
----------

The ULA generates IRQs (maskable interrupts) according to certain conditions
and these conditions are controlled by location &FE00:

  * Vertical sync (bottom of displayed screen)
  * 50Hz real time clock
  * Transmit data empty
  * Receive data full
  * High tone detect

The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory, restoring
the normal function of the ULA.

ROM Paging
----------

Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
mappings exist:

   8    keyboard
   9    keyboard (duplicate)
  10    BASIC ROM
  11    BASIC ROM (duplicate)

Paging in a ROM involves the following procedure:

 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
    2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
    selected.
 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
    whilst writing the desired ROM number n in bits 0 to 2.

See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686

Shadow/Expanded Memory
----------------------

The Electron exposes all sixteen address lines and all eight data lines
through the expansion bus.
Using such lines, it is possible to provide
additional memory - typically sideways ROM and RAM - on expansion cards and
through cartridges, although the official cartridge specification provides
fewer address lines and only seeks to provide access to memory in 16K units.

Various modifications and upgrades were developed to offer "turbo"
capabilities to the Electron, permitting the CPU to access a separate 8K of
RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
the ULA through additional logic. However, an enhanced ULA might support
independent CPU access to memory over the expansion bus by allowing itself to
be discharged from providing access to memory, potentially for a range of
addresses, and for the CPU to communicate with external memory uninterrupted.

Sideways RAM/ROM and Upper Memory Access
----------------------------------------

Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above - the upper region of memory -
at 2MHz independently of any access to RAM that the ULA might be performing,
only blocking the CPU if it attempts to access addresses of &7FFF and below
during any ULA memory access - the lower region of memory - by stopping or
stalling its clock.

Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
CPU clock if the line goes low, when the CPU is attempting to access the lower
region of memory.

Hardware Scrolling (and Enhancement)
------------------------------------

On the standard ULA, &FE02 and &FE03 map to an address of 9 significant bits,
with the least significant 5 bits being zero, thus limiting the scrolling
resolution to 64 bytes.
An enhanced ULA could support a resolution of 2 bytes
using the same layout of these addresses.

|--&FE02--------------| |--&FE03--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

   XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX

Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change of 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).

One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
of changing the screen address by 2 bytes is the change in the number of lines
from the initial and final character rows that need reading by the ULA, which
would need to maintain this state information (although this is a relatively
trivial change). Another pitfall is the complication that might be introduced
to software writing bitmaps of character height to the screen.

See: http://pastraiser.com/computers/acornelectron/acornelectron.html

Enhancement: Mode Layouts
-------------------------

Merely changing the screen memory mappings in order to have Archimedes-style
row-oriented screen addresses (instead of character-oriented addresses) could
be done for the existing modes, but this might not be sufficiently beneficial,
especially since accessing regions of the screen would involve incrementing
pointers by amounts that are inconvenient on an 8-bit CPU.

However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
reference the vertical screen location below the currently-referenced location
(just as occurs within characters using the existing ULA); instead of
returning to the top of the character row and referencing the next horizontal
location after eight bytes, the address would reference the next character row
and continue to reference locations downwards over the height of the screen
until reaching the bottom; at the bottom, the next location would be the next
horizontal location at the top of the screen.

In other words, the memory layout for the screen would resemble the following
(for MODE 2):

  &3000 &3100       ... &7F00
  &3001 &3101
  ...   ...
  &3007
  &3008
  ...
  ...                   ...
  &30FF             ... &7FFF

Since there are 256 pixel rows, each column of locations would be addressable
using the low byte of the address. Meanwhile, the high byte would be
incremented to address different columns. Thus, addressing screen locations
would become a lot more convenient and potentially much more efficient for
certain kinds of graphical output.

One potential complication with this simplified addressing scheme arises with
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
with the existing ULA) would be achieved by incrementing or decrementing the
screen start address; by one character row, it would involve adding or
subtracting 8. However, the ULA only supports multiples of 64 when changing the
screen start address.
Thus, if such a scheme were to be adopted, three
additional bits would need to be supported in the screen start register (see
"Hardware Scrolling (and Enhancement)" for more details). However, horizontal
scrolling would be much improved even under the severe constraints of the
existing ULA: only adjustments of 256 to the screen start address would be
required to produce single-location scrolling of as few as two pixels in MODE 2
(four pixels in MODEs 1 and 5, eight pixels otherwise).

More disruptive is the effect of this alternative layout on software.
Presumably, compatibility with the BBC Micro was the primary goal of the
Electron's hardware design. With the character-oriented screen layout in
place, system software (and application software accessing the screen
directly) would be relying on this layout to run on the Electron with little
or no modification. Although it might have been possible to change the system
software to use this column-oriented layout instead, this would have incurred
a development cost and caused additional work porting things like games to the
Electron. Moreover, a separate branch of the software from that supporting the
BBC Micro and closer derivatives would then have needed maintaining.

The decision to use the character-oriented layout in the BBC Micro may have
been related to the choice of circuitry and to facilitate a convenient
hardware implementation, and by the time the Electron was planned, it was too
late to do anything about this somewhat unfortunate choice.
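
The difference between the two layouts discussed in this section can be
illustrated with a small Python sketch for MODE 2. The function names are
hypothetical, and the character-oriented formula assumes the usual arrangement
of 80 byte columns of 8 lines each (640 bytes) per character row starting at
&3000; the column-oriented formula follows the layout shown above.

```python
MODE2_BASE = 0x3000         # start of MODE 2 screen memory
BYTES_PER_CHAR_ROW = 640    # 80 byte columns * 8 lines per character row

def char_oriented_address(x, y):
    """Existing layout: x is the byte column (0-79), y the pixel row (0-255)."""
    return MODE2_BASE + (y // 8) * BYTES_PER_CHAR_ROW + x * 8 + (y % 8)

def column_oriented_address(x, y):
    """Proposed layout: the high byte selects the column, the low byte the row."""
    return MODE2_BASE + (x << 8) + y

# Moving down one pixel row from the bottom of a character cell:
print(hex(char_oriented_address(0, 7)), hex(char_oriented_address(0, 8)))
print(hex(column_oriented_address(0, 7)), hex(column_oriented_address(0, 8)))
```

In the column-oriented case, stepping down the screen is always an increment
of the low byte, whereas the existing layout requires a jump of 633 bytes at
each character row boundary.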

Pixel Layouts
-------------

The pixel layouts are as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               73 62 51 40
  2             4               7531 6420

Since the ULA reads a half-byte at a time, one might expect it to attempt to
produce pixels for every half-byte, as opposed to handling entire bytes.
However, the pixel layout is not conducive to producing pixels as soon as a
half-byte has been read for a given full-byte location: in 1bpp modes the
first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
data is spread across the entire byte in different ways.

An alternative arrangement might be as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               76 54 32 10
  2             4               7654 3210

Just as the mode layouts were presumably decided by compatibility with the BBC
Micro, the pixel layouts will have been maintained for similar reasons.
Unfortunately, this layout prevents any optimisation of the ULA for handling
half-byte pixel data generally.

Enhancement: The Missing MODE 4
-------------------------------

The Electron inherits its screen mode selection from the BBC Micro, where MODE
3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
however, and they are merely implemented by skipping two scanlines in every
ten after the eight required to produce a character line.
Thus, such modes
provide a 24-row display.

In principle, nothing prevents this "text mode" effect being applied to other
modes. The 20-column modes are not well-suited to displaying text, which
leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
2. Although the need for a non-monochrome 40-column text mode is addressed by
MODE 7 on the BBC Micro, the Electron lacks such a mode.

If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it
would occupy MODE 4 instead of the current MODE 4:

  Screen mode  Size (kilobytes)  Colours  Rows  Resolution
  -----------  ----------------  -------  ----  ----------
  0            20                2        32    640x256
  1            20                4        32    320x256
  2            20                16       32    160x256
  3            16                2        24    640x256
  4 (new)      16                4        24    320x256
  4 (old)      10                2        32    320x256
  5            10                4        32    160x256
  6            8                 2        24    320x256

Thus, for increasing mode numbers, the size of each mode would be the same as
or less than that of the preceding mode.

Enhancement: 2MHz RAM Access
----------------------------

Given that the CPU and ULA both access RAM at 2MHz, but given that the CPU
when not competing with the ULA only accesses RAM every other 2MHz cycle (as
if the ULA still needed to access the RAM), one useful enhancement would be a
mechanism to let the CPU take over the ULA cycles outside the ULA's period of
activity comparable to the way the ULA takes over the CPU cycles in MODE 0 to
3.

Thus, the RAM access cycles would resemble the following in MODE 0 to 3:

  Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

In MODE 4 to 6:

  Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

This would improve CPU bandwidth as follows:

  Modes           Standard ULA    Enhanced ULA
  -----           ------------    ------------
  0, 1, 2         9728 bytes      19456 bytes
  3               12288 bytes     24576 bytes
  4, 5            19968 bytes     29696 bytes
  6               19968 bytes     32256 bytes

(Here, the uncontended 2MHz bandwidth for a display period would be 39936
bytes, being 128 cycles per line over 312 lines.)

With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
because all access opportunities to RAM are doubled. Meanwhile, in the other
modes, some CPU accesses occur alongside ULA accesses and thus cannot be
doubled, but the CPU bandwidth increase is still significant.

Unfortunately, the mechanism for accessing the RAM is too slow to provide data
within the time constraints of 2MHz operation. There is no time remaining in a
2MHz cycle for the CPU to receive and process any retrieved data.

Enhancement: Region Blanking
----------------------------

The problem of permitting character-oriented blitting in programs whilst
scrolling the screen by sub-character amounts could be mitigated by permitting
a region of the display to be blank, such as the final lines of the display.
Consider the following vertical scrolling by 2 bytes that would cause an
initial character row of 6 lines and a final character row of 2 lines:

    6 lines - initial, partial character row
  248 lines - 31 complete rows
    2 lines - final, partial character row

If a routine were in use that wrote 8-line bitmaps to the partial character
row now split in two, it would be advisable to hide one of the regions in
order to prevent content appearing in the wrong place on screen (such as
content meant to appear at the top "leaking" onto the bottom). Blanking 6
lines would be sufficient, as can be seen from the following cases.

Scrolling up by 2 lines:

    6 lines - initial, partial character row
  240 lines - 30 complete rows
    4 lines - part of 1 complete row
  -----------------------------------------------------------------
    4 lines - part of 1 complete row (hidden to maintain 250 lines)
    2 lines - final, partial character row (hidden)

Scrolling down by 2 lines:

    2 lines - initial, partial character row
  248 lines - 31 complete rows
  ----------------------------------------------------------
    6 lines - final, partial character row (hidden)

Thus, in this case, region blanking would impose a 250 line display with the
bottom 6 lines blank.

See the description of the display suspend enhancement for a more efficient
way of blanking lines than merely blanking the palette whilst allowing the CPU
to perform useful work during the blanking period.
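
The row arithmetic behind the cases above can be sketched as follows. The
function name and its offset parameter (the number of lines by which the
start of the display falls inside a character row) are illustrative
assumptions, not anything defined by the ULA.

```python
ROW_LINES = 8        # lines in one character row
DISPLAY_LINES = 256  # lines in a full-height display


def split_rows(offset):
    """Return (initial, complete_rows, final) line counts for a display
    that starts `offset` lines into a character row."""
    offset %= ROW_LINES
    if offset == 0:
        return (0, DISPLAY_LINES // ROW_LINES, 0)
    initial = ROW_LINES - offset   # remainder of the first, partial row
    final = offset                 # beginning of the last, partial row
    complete = (DISPLAY_LINES - initial - final) // ROW_LINES
    return (initial, complete, final)


print(split_rows(2))   # the initial example: 6 lines, 31 rows, 2 lines
print(split_rows(6))   # the "scrolling down" case: 2 lines, 31 rows, 6 lines
```

In both cases the two partial regions together make up one character row, so
hiding the bottom 6 lines leaves 250 visible lines regardless of which of the
two layouts is in effect.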

To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 16 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.

Enhancement: Screen Height Adjustment
-------------------------------------

The height of the screen could be configurable in order to reduce screen
memory consumption. This is not quite done in MODE 3 and 6 since the start of
the screen appears to be rounded down to the nearest page, but by reducing the
height by amounts more than a page, savings would be possible. For example:

  Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
  ------------  -----  ------  --------------  ---------------  -------------
  640           1      252     80              320              &3140 -> &3100
  640           1      248     80              640              &3280 -> &3200
  320           1      240     40              640              &5A80 -> &5A00
  320           2      240     80              1280             &3500

Screen Mode Selection
---------------------

Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially being made available for use.
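
As a minimal sketch of the three-bit mode field described above, the
following helpers extract and replace bits 3, 4 and 5 of a control byte
value. The helper names are hypothetical, and the sketch deals only with the
bit arithmetic, not with how the register is actually written.

```python
MODE_SHIFT = 3
MODE_MASK = 0b111 << MODE_SHIFT   # bits 3, 4 and 5


def get_mode(control):
    """Extract the screen mode number from a control byte value."""
    return (control & MODE_MASK) >> MODE_SHIFT


def set_mode(control, mode):
    """Return the control byte with the mode field replaced and all other
    bits left unchanged."""
    if not 0 <= mode <= 7:
        raise ValueError("mode must fit in three bits")
    return (control & ~MODE_MASK & 0xFF) | (mode << MODE_SHIFT)


value = set_mode(0b10000111, 5)
print(format(value, "08b"), get_mode(value))
```

Since the field holds only eight distinct values, any additional modes would
need further bits, which is why other bits of the location would have to be
reassigned.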

Enhancement: Palette Definition
-------------------------------

Since all memory accesses go via the ULA, an enhanced ULA could employ more
specific addresses than &FE*X to perform enhanced functions. For example, the
palette control is done using &FE*8-F and merely involves selecting predefined
colours, whereas an enhanced ULA could support the redefinition of all 16
colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
(colours 8 to 15), where a single byte might provide 8 bits per pixel colour
specifications similar to those used on the Archimedes.

The principal limitation here is actually the hardware: the Electron has only
a single output line for each of the red, green and blue channels, and if
those outputs are strictly digital and can only be set to a "high" and "low"
value, then only the existing eight colours are possible. If a modern ULA were
able to output analogue values (or values at well-defined points between the
high and low values, such as the half-on value supported by the Amstrad CPC
series), it would still need to be assessed whether the circuitry could
successfully handle and propagate such values. Various sources indicate that
only "TTL levels" are supported by the RGB output circuit, and since there are
74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
is likely that the ULA is expected to provide only "high" or "low" values.

Short of adding extra outputs from the ULA (either additional red, green and
blue outputs or a combined intensity output), another approach might involve
some kind of modulation where an output value might be encoded in multiple
pulses at a higher frequency than the pixel frequency.
However, this would demand additional circuitry outside the ULA, and
component RGB monitors would probably not be able to take advantage of this
feature; only UHF and composite video devices (the latter with the composite
video colour support enabled on the Electron's circuit board) would
potentially benefit.

Flashing Colours
----------------

According to the Advanced User Guide, "The cursor and flashing colours are
entirely generated in software: This means that all of the logical to physical
colour map must be changed to cause colours to flash." This appears to suggest
that the palette registers must be updated upon the flash counter - read and
written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
colour pairs to be any combination of colours might be possible, instead of
having colour complements as pairs.

It is conceivable that the interrupt code responsible does the simple thing
and merely inverts the current values for any logical colours (LC) for which
the associated physical colour (as supplied as the second parameter to the VDU
19 call) has the top bit of its four bit value set.
These top bits are not recorded in the palette registers but are presumably
recorded separately and used to build bitmaps as follows:

  LC  2 colour  4 colour  16 colour  4-bit value for inversion
  --  --------  --------  ---------  -------------------------
   0  00010001  00010001  00010001   1, 1, 1
   1  01000100  00100010  00010001   4, 2, 1
   2            01000100  00100010      4, 2
   3            10001000  00100010      8, 2
   4                      00010001         1
   5                      00010001         1
   6                      00100010         2
   7                      00100010         2
   8                      01000100         4
   9                      01000100         4
  10                      10001000         8
  11                      10001000         8
  12                      01000100         4
  13                      01000100         4
  14                      10001000         8
  15                      10001000         8

  Inversion value calculation:

   2 colour formula: 1 << (colour * 2)
   4 colour formula: 1 << colour
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))

For example, where logical colour 0 has been mapped to a physical colour in
the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
the inversion operation. (The lower three bits of the physical colour would be
used to set the underlying colour information affected by the inversion
operation.)
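
The inversion value calculation can be checked against the table with a short
sketch. The function names are illustrative, and the 16 colour case is
written here with right shifts so that the results reproduce the 4-bit values
listed in the table; this is a reconstruction of the speculated scheme, not
code taken from the operating system.

```python
def inversion_value(colour, colours_in_mode):
    """Return the 4-bit inversion value for a logical colour."""
    if colours_in_mode == 2:
        return 1 << (colour * 2)
    if colours_in_mode == 4:
        return 1 << colour
    if colours_in_mode == 16:
        # Bit 1 of the colour selects between 1 and 2 (or 4 and 8);
        # bit 3 of the colour selects the upper pair of values.
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("unsupported number of colours")


def inversion_bitmap(colour, colours_in_mode):
    """Repeat the 4-bit value in both halves of a byte, as in the table."""
    value = inversion_value(colour, colours_in_mode)
    return value | (value << 4)


for colour in range(16):
    print(colour, format(inversion_bitmap(colour, 16), "08b"))
```

Each 16 colour value depends only on bits 1 and 3 of the logical colour, which
is why pairs of adjacent colours share the same bitmap in the table.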

An operation in the interrupt code would then combine the bitmaps for all
logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
combined for groups of logical colours as follows:

   Logical colours
   ---------------
   0,  2,  8, 10
   4,  6, 12, 14
   5,  7, 13, 15
   1,  3,  9, 11

These combined bitmaps would be EORed with the existing palette register
values in order to perform the value inversion necessary to produce the
flashing effect.

Thus, in the VDU 19 operation, the appropriate inversion value would be
calculated for the logical colour, and this value would then be combined with
other inversion values in a dedicated memory location corresponding to the
colour's group as indicated above. Meanwhile, the palette channel values would
be derived from the lower three bits of the specified physical colour and
combined with other palette data in dedicated memory locations corresponding
to the palette registers.

Interestingly, although flashing colours on the BBC Micro are controlled by
toggling bit 0 of the &FE20 control register location for the Video ULA, the
actual colour inversion is done in hardware.

Enhancement: Palette Definition Lists
-------------------------------------

It can be useful to redefine the palette in order to change the colours
available for a particular region of the screen, particularly in modes where
the choice of colours is constrained, and if an increased colour depth were
available, palette redefinition would be useful to give the illusion of more
than 16 colours in MODE 2.
Traditionally, palette redefinition has been done by using interrupt-driven
timers, but a more efficient approach would involve presenting lists of
palette definitions to the ULA so that it can change the palette at a
particular display line.

One might define a palette redefinition list in a region of memory and then
communicate its contents to the ULA by writing the address and length of the
list, along with the display line at which the palette is to be changed, to
ULA registers such that the ULA buffers the list and performs the redefinition
at the appropriate time. Throughput/bandwidth considerations might impose
restrictions on the practical length of such a list, however.

Enhancement: Display Synchronisation Interrupts
-----------------------------------------------

When completing each scanline of the display, the ULA could trigger an
interrupt. Since this might impact system performance substantially, the
feature would probably need to be configurable, and it might be sufficient to
have an interrupt only after a certain number of display lines instead.
Permitting the CPU to take action after eight lines would allow palette
switching and other effects to occur on a character row basis.

The ULA provides an interrupt at the end of the display period, presumably so
that software can schedule updates to the screen, avoid flickering or tearing,
and so on. However, some applications might benefit from an interrupt at, or
just before, the start of the display period so that palette modifications or
similar effects could be scheduled.

Enhancement: Palette-Free Modes
-------------------------------

Palette-free modes might be defined where bit values directly correspond to
the red, green and blue channels, although this would mostly make sense only
for modes with depths greater than the standard 4 bits per pixel, and such
modes would require more memory than MODE 2 if they were to have an acceptable
resolution.

Enhancement: Display Suspend
----------------------------

Especially when writing to the screen memory, it could be beneficial to be
able to suspend the ULA's access to the memory, instead producing blank values
for all screen pixels until a program is ready to reveal the screen. This is
different from palette blanking since with a blank palette, the ULA is still
reading screen memory and translating its contents into pixel values that end
up being blank.

This function is reminiscent of a capability of the ZX81, albeit necessary on
that hardware to reduce the load on the system CPU which was responsible for
producing the video output. By allowing display suspend on the Electron, the
performance benefit would be derived from giving the CPU full access to the
memory bandwidth.

The region blanking feature mentioned above could be implemented using this
enhancement instead of employing palette blanking for the affected lines of
the display.

Enhancement: Memory Filling
---------------------------

A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it.
Although such a capability would probably not be useful in conjunction with
the existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.

This capability could also support block filling, where the limits of the
filled memory would be defined by the position and size of a screen area,
although this would demand the provision of additional registers in the ULA to
retain the details of such areas and additional logic to control the fill
operation.

Enhancement: Region Filling
---------------------------

An alternative to memory writing might involve indicating regions using
additional registers or memory where the ULA fills regions of the screen with
content instead of reading from memory. Unlike hardware sprites which should
realistically provide varied content, region filling could employ single
colours or patterns, and one advantage of doing so would be that the ULA need
not access memory at all within a particular region.

Regions would be defined on a row-by-row basis. Instead of reading memory and
blitting a direct representation to the screen, the ULA would read region
definitions containing a start column, region width and colour details. There
might be a certain number of definitions allowed per row, or the ULA might
just traverse an ordered list of such definitions with each one indicating the
row, start column, region width and colour details.
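
The ordered-list form of region definition might be modelled in software as
follows. This is only a sketch of the idea: the Region type, its field names,
the use of abstract "columns" as the unit, and the rule that later definitions
override earlier ones where they overlap are all assumptions made for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    row: int      # display row on which the region appears
    start: int    # first column covered by the region
    width: int    # number of columns covered
    colour: int   # colour used to fill the region


def render_row(regions, row, columns, background=0):
    """Produce the colour of each column in one row by traversing an
    ordered list of region definitions, without reading screen memory."""
    line = [background] * columns
    for region in regions:
        if region.row != row:
            continue
        end = min(region.start + region.width, columns)
        for column in range(max(region.start, 0), end):
            line[column] = region.colour
    return line


regions = [Region(0, 2, 3, 1), Region(0, 4, 2, 2), Region(1, 0, 1, 3)]
print(render_row(regions, 0, 8))
print(render_row(regions, 1, 4))
```

A hardware implementation would presumably not scan the whole list for every
row, which is one reason why a fixed number of definitions per row, or a list
ordered by row, might be preferred.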

One could even compress this information further by requiring only the row,
start column and colour details with each subsequent definition terminating
the effect of the previous one. However, one would also need to consider the
convenience of preparing such definitions and whether efficient access to
definitions for a particular row might be desirable. It might also be
desirable to avoid having to prepare definitions for "empty" areas of the
screen, effectively making the definition of the screen contents employ
run-length encoding and employ only colour plus length information.

One application of region filling is that of simple 2D and 3D shape rendering.
Although it is entirely possible to plot such shapes to the screen and have
the ULA blit the memory contents to the screen, such operations consume
bandwidth both in the initial plotting and in the final transfer to the
screen. Region filling would reduce such bandwidth usage substantially.

This way of representing screen images would make certain kinds of images
unfeasible to represent - consider alternating single pixel values which could
easily occur in some character bitmaps - even if an internal queue of regions
were to be supported such that the ULA could read ahead and buffer such
"bandwidth intensive" areas. Thus, the ULA might be better served providing
this feature for certain areas of the display only as some kind of special
graphics window.

Enhancement: Hardware Sprites
-----------------------------

An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.

The specification of sprites could involve the reservation of 16 locations
(for example, &FE20-F) specifying a fixed number of eight sprites, with each
location pair referring to the sprite data. By limiting the ULA to dealing
with a fixed number of sprites, the work required inside the ULA would be
reduced since it would avoid having to deal with arbitrary numbers of sprites.

The principal limitation on providing hardware sprites is that of having to
obtain sprite data, given that the ULA is usually required to retrieve screen
data, and given the lack of memory bandwidth available to retrieve sprite data
(particularly from multiple sprites supposedly at the same position) and
screen data simultaneously. Although the ULA could potentially read sprite
data and screen data in alternate memory accesses in screen modes where the
bandwidth is not already fully utilised, this would result in a degradation of
performance.

Enhancement: Additional Screen Mode Configurations
--------------------------------------------------

Alternative screen mode configurations could be supported. The ULA has to
produce 640 pixel values across the screen, with pixel doubling or quadrupling
employed to fill the screen width:

  Screen width      Columns     Scaling     Depth       Bytes
  ------------      -------     -------     -----       -----
  640               80          x1          1           80
  320               40          x2          1, 2        40, 80
  160               20          x4          2, 4        40, 80

It must also use at most 80 byte-sized memory accesses to provide the
information for the display.
Given that characters must occupy an 8x8 pixel array, if a configuration
featuring anything other than 20, 40 or 80 character columns is to be
supported, compromises must be made such as the introduction of blank pixels
either between characters (such as occurs between rows in MODE 3 and 6) or at
the end of a scanline (such as occurs at the end of the frame in MODE 3 and
6). Consider the following configuration:

  Screen width      Columns     Scaling     Depth       Bytes       Blank
  ------------      -------     -------     -----       ------      -----
  208               26          x3          1, 2        26, 52      16

Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
colours could be provided, with 16 blank pixel values (out of a total of 640)
generated either at the start or end (or split between the start and end) of
each scanline.

Enhancement: Character Attributes
---------------------------------

The BBC Micro MODE 7 employs something resembling character attributes to
support teletext displays, but depends on circuitry providing a character
generator. The ZX Spectrum, on the other hand, provides character attributes
as a means of colouring bitmapped graphics. Although such a feature is very
limiting as the sole means of providing multicolour graphics, in situations
where the choice is between low resolution multicolour graphics or high
resolution monochrome graphics, character attributes provide a potentially
useful compromise.

For each byte read, the ULA must deliver 8 pixel values (out of a total of
640) to the video output, doing so by either emptying its pixel buffer on a
pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle.
For example, for a screen mode having 640 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               B
  Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7

And for a screen mode having 320 pixels in width:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:

  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  Reads:    B                               A
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7

These additional bytes could provide colour information for the bitmapped data
in the following character column (of 8 pixels). Since it would be desirable
to apply attribute data to the first column, the initial 8 cycles might be
configured to not produce pixel values.

For an entire character, attribute data need only be read for the first row of
pixels for a character. The subsequent rows would have attribute information
applied to them, although this would require the attribute data to be stored
in some kind of buffer. Thus, the following access pattern would be observed:

  Reads:    A B _ B _ B _ B _ B _ B _ B _ B ...

In modes 3 and 6, the blank display lines could be used to retrieve attribute
data:

  Reads (blank):     A _ A _ A _ A _ A _ A _ A _ A _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
                     ...

See below for a discussion of using this for character data as well.

A whole byte used for colour information for a whole character would result in
a choice of 256 colours, and this might be somewhat excessive. By only reading
attribute bytes at every other opportunity, a choice of 16 colours could be
applied individually to two characters.

  Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  Reads:    B               A               B               -
  Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7

Further reductions in attribute data access, offering 4 colours for every
character in a four character block, for example, might also be worth
considering.

Consider the following configurations for screen modes with a colour depth of
1 bit per pixel for bitmap information:

  Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  -------  ---------  ---------  -------  ------------
  320           40       x2       40         40         256      &5300
  320           40       x2       40         20         16       &5580 -> &5500
  320           40       x2       40         10         4        &56C0 -> &5600
  208           26       x3       26         26         256      &62C0 -> &6200
  208           26       x3       26         13         16       &6460 -> &6400

Enhancement: Text-Only Modes using Character and Attribute Data
---------------------------------------------------------------

In modes 3 and 6, the blank display lines could be used to retrieve character
and attribute data instead of trying to insert it between bitmap data accesses,
but this data would then need to be retained:

  Reads:    A C A C A C A C A C A C A C A C ...
  Reads:    B _ B _ B _ B _ B _ B _ B _ B _ ...

Only attribute (A) and character (C) reads would require screen memory
storage. Bitmap data reads (B) would involve either accesses to memory to
obtain character definition details or could, at the cost of special storage
in the ULA, involve accesses within the ULA that would then free up the RAM.
However, the CPU would not benefit from having any extra access slots due to
the limitations of the RAM access mechanism.

A scheme without caching might be possible. The same line of memory addresses
might be visited over and over again for eight display lines, with an index
into the bitmap data being incremented from zero to seven. The access patterns
would look like this:

  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 0)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 1)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 2)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 3)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 4)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 5)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 6)
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 7)

The bandwidth requirements would be the sum of the accesses to read the
character values (repeatedly) and those to read the bitmap data to reproduce
the characters on screen.
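
The scheme without caching can be sketched as a lookup performed for every
display line. The font structure used here (a mapping from character code to
eight bitmap bytes) and the sample bitmap for code 65 are hypothetical, chosen
only to make the access counting concrete.

```python
CHAR_LINES = 8   # bitmap lines per character


def line_reads(screen_row, font, index):
    """Return the bitmap bytes for one display line of a row of text,
    together with the number of memory reads needed to obtain them.

    screen_row: character codes for one row of text (the C reads)
    font:       mapping from character code to its eight bitmap bytes
    index:      which of the eight character lines is being displayed
    """
    output = []
    reads = 0
    for code in screen_row:                 # C read: fetch the character code
        output.append(font[code][index])    # B read: fetch its bitmap byte
        reads += 2
    return output, reads


# Illustrative bitmap only; not taken from any real character set.
font = {65: [0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x00]}
row = [65] * 40

total = 0
for index in range(CHAR_LINES):
    pixels, reads = line_reads(row, font, index)
    total += reads
print(total)
```

For a 40-column row, each display line costs 40 character reads plus 40 bitmap
reads, giving 80 reads per line and 640 reads per character row under these
assumptions.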

Enhancement: MODE 7 Emulation using Character Attributes
--------------------------------------------------------

If the scheme of applying attributes to character regions were employed to
emulate MODE 7, in conjunction with the MODE 6 display technique, the
following configuration would be required:

  Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &5ECC -> &5E00
  320           40       25    40         10         4        &5FC6 -> &5F00

Although this requires much more memory than MODE 7 (8500 bytes versus MODE
7's 1000 bytes), it does not need much more memory than MODE 6, and it would
at least make a limited 40-column multicolour mode available as a substitute
for MODE 7.

Using the text-only enhancement with caching of data or with repeated reads of
the same character data line for eight display lines, the storage requirements
would be diminished substantially:

  Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  ------------
  320           40       25    40         20         16       &7A24 -> &7A00
  320           40       25    40         10         4        &7B1E -> &7B00
  320           40       25    40         5          2        &7B9B -> &7B00
  320           40       25    40         0          (2)      &7C18 -> &7C00
  640           80       25    80         40         16       &7448 -> &7400
  640           80       25    80         20         4        &763C -> &7600
  640           80       25    80         10         2        &7736 -> &7700
  640           80       25    80         0          (2)      &7830 -> &7800

Note that the colours describe the locally defined attributes for each
character. When no attribute information is provided, the colours are defined
globally.

Enhancement: Compressed Character Data
--------------------------------------

Another observation about text-only modes is that they only need to store a
restricted set of bitmapped data values. Encoding this set of values in a
smaller unit of storage than a byte could possibly help to reduce the amount
of storage and bandwidth required to reproduce the characters on the display.

Enhancement: High Resolution Graphics
-------------------------------------

Screen modes with higher resolutions and larger colour depths might be
possible, but this would in most cases involve the allocation of more screen
memory, and the ULA would probably then be obliged to page in such memory for
the CPU to be able to sensibly access it all.

Enhancement: Genlock Support
----------------------------

The ULA generates a video signal in conjunction with circuitry producing the
output features necessary for the correct display of the screen image.
However, it appears that the ULA drives the video synchronisation mechanism
instead of reacting to an existing signal. Genlock support might be possible
if the ULA were made to be responsive to such external signals, resetting its
address generators upon receiving synchronisation events.

Enhancement: Improved Sound
---------------------------

The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework.
The BBC Micro employs &FE40-&FE4F (the system VIA) for sound control,
and an enhanced ULA could adopt this interface.

The BBC Micro uses the SN76489 chip to produce sound, and the entire
functionality of this chip could be emulated for enhanced sound, with a subset
of the functionality exposed via the &FE*6 interface.

See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
See: http://www.smspower.org/Development/SN76489

Enhancement: Waveform Upload
----------------------------

As with a hardware sprite function, waveforms could be uploaded or referenced
using memory-mapped locations as registers that reference memory regions.

Enhancement: Sound Input/Output
-------------------------------

Since the ULA already controls audio input/output for cassette-based data, it
would have been interesting to entertain the idea of sampling and output of
sounds through the cassette interface. However, a significant amount of
circuitry is employed to process the input signal for use by the ULA and to
process the output signal for recording.

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11

Enhancement: BBC ULA Compatibility
----------------------------------

Although some new ULA functions could be defined in a way that is also
compatible with the BBC Micro, the BBC ULA is itself incompatible with the
Electron ULA: in the BBC memory map, &FE00-7 is reserved for the 6845 video
controller and controls functions specific to that device, while &FE08-F is
reserved for the serial controller. It therefore becomes possible to
disregard compatibility where compatibility is already disregarded for a
particular area of functionality.

&FE20-F maps to video ULA functionality on the BBC Micro, which provides
control over the palette (using address &FE21, compared to &FE08-F on the
Electron) and other system-specific functions. Since the location usage is
generally incompatible, this region could be reused for other purposes.

Enhancement: Increased RAM, ULA and CPU Performance
---------------------------------------------------

More modern implementations of the hardware might feature faster RAM coupled
with an increased ULA clock frequency in order to increase the bandwidth
available to the ULA and to the CPU in situations where the ULA is not needed
to perform work. A ULA employing a 32MHz clock would be able to complete the
retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
to access the RAM for the following 250ns even in display modes requiring the
retrieval of a byte for the display every 500ns. The CPU could, subject to
timing issues, run at 2MHz even in MODE 0, 1 and 2.

A scheme such as that described above would have a similar effect to the
scheme employed in the BBC Micro, although the latter made use of RAM with a
higher bandwidth in order to complete memory transfers within 250ns and thus
permit the CPU to run continuously at 2MHz.

Higher bandwidth could potentially be used to implement exotic features such
as RAM-resident hardware sprites or indeed any feature demanding RAM access
concurrent with the production of the display image.

Enhancement: Multiple CPU Stacks and Zero Pages
-----------------------------------------------

The 6502 maintains a stack for subroutine calls and register storage in
page &01.
Although the stack register can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would
be able to coexist without overwriting each other's stacks.

One way that this issue could be alleviated would involve the provision of a
facility to redirect accesses to page &01 to other areas of memory. The ULA
would provide a register that defines a physical page for the use of the CPU's
"logical" page &01, and upon any access to page &01 by the CPU, the ULA would
change the asserted address lines to redirect the access to the appropriate
physical region.

By providing an 8-bit register, mapping to the most significant byte (MSB) of
a 16-bit address, the ULA could then replace any MSB equal to &01 with the
register value before the access is made. Where multiple programs coexist,
upon switching programs, the register would be updated to point the ULA to the
appropriate stack location, thus providing a simple memory management unit
(MMU) capability.

In a similar fashion, zero page accesses could also be redirected so that code
could run from sideways RAM and have zero page operations redirected to "upper
memory" - for example, to page &BE (with stack accesses redirected to page
&BF, perhaps) - thereby permitting most CPU operations to occur without
inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
CPU as it contends with the ULA for memory access.
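
The redirection of page &00 and page &01 accesses described above can be
sketched in Python as follows. This is only an illustrative model of the
address substitution, not a description of any existing hardware: the class
name PageRedirector and the attributes zero_page and stack_page are
hypothetical, standing in for the 8-bit registers that the ULA would provide.

```python
class PageRedirector:
    """Hypothetical model of ULA redirection for logical pages &00 and &01."""

    def __init__(self):
        # Assumed defaults: each logical page maps to itself, so that no
        # redirection occurs until a register is written.
        self.zero_page = 0x00    # physical page used for logical page &00
        self.stack_page = 0x01   # physical page used for logical page &01

    def translate(self, address):
        """Return the physical address for a 16-bit CPU address."""
        msb = (address >> 8) & 0xFF
        lsb = address & 0xFF
        # Only the most significant byte is replaced; the offset within the
        # page is passed through unchanged.
        if msb == 0x00:
            msb = self.zero_page
        elif msb == 0x01:
            msb = self.stack_page
        return (msb << 8) | lsb


# Example configuration taken from the text: zero page redirected to page
# &BE and the stack redirected to page &BF.
redirector = PageRedirector()
redirector.zero_page = 0xBE
redirector.stack_page = 0xBF
zero_page_access = redirector.translate(0x0070)  # redirected into page &BE
stack_access = redirector.translate(0x01FF)      # redirected into page &BF
screen_access = redirector.translate(0x3000)     # other pages are unaffected
```

With both registers left at the defaults chosen in this sketch, every address
passes through unchanged. Switching between coexisting programs would then
amount to writing a different value to stack_page (and perhaps zero_page)
before resuming each program.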

Such facilities could also be provided by a separate circuit between the CPU
and ULA in a fashion similar to that employed by a "turbo" board, but unlike
such boards, no additional RAM would be provided: all memory accesses would
occur as normal through the ULA, albeit redirected when configured
appropriately.

ULA Pin Functions
-----------------

The functions of the ULA pins are described in the Electron Service Manual. Of
interest to video processing are the following:

  CSYNC (low during horizontal or vertical synchronisation periods, high
         otherwise)

  HS (low during horizontal synchronisation periods, high otherwise)

  RED, GREEN, BLUE (pixel colour outputs)

  CLOCK IN (a 16MHz clock input, 4V peak to peak)

  PHI OUT (a clock signal for the CPU, running at 1MHz or 2MHz, or stopped)

More general memory access pins:

  RAM0...RAM3 (data lines to/from the RAM)

  RA0...RA7 (address lines for sending both row and column addresses to the RAM)

  RAS (row address strobe setting the row address on a negative edge - see the
       timing notes)

  CAS (column address strobe setting the column address on a negative edge -
       see the timing notes)

  WE (sets write enable with logic 0, read with logic 1)

  ROM (select data access from ROM)

CPU-oriented memory access pins:

  A0...A15 (CPU address lines)

  PD0...PD7 (CPU data lines)

  R/W (indicates CPU write with logic 0, CPU read with logic 1)

Interrupt-related pins:

  NMI (CPU request for uninterrupted 1MHz access to memory)

  IRQ (signal event to CPU)

  POR (power-on reset, resetting the ULA on a positive edge and asserting the
       CPU's RST pin)

  RST (master reset for the CPU signalled on power-up and by the Break key)

Keyboard-related pins:

  KBD0...KBD3 (keyboard inputs)

  CAPS LOCK (controls the status LED)

Sound-related pins:

  SOUND O/P (sound output using internal oscillator)

Cassette-related pins:

  CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)

  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)

  CAS RC (detect high tone)

  CAS MO (motor relay output)

  ÷13 IN (~1200 baud clock input)

ULA Socket
----------

The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.

References
----------

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm

About this Document
-------------------

The most recent version of this document and accompanying distribution should
be available from the following location:

http://hgweb.boddie.org.uk/ULA

Copyright and licence information can be found in the docs directory of this
distribution - see docs/COPYING.txt for more information.