The Acorn Electron ULA
======================

Principal Design and Feature Constraints
----------------------------------------

The features of the ULA are limited by the amount of time and resources that can be allocated to each activity necessary to support such features given the fundamental obligations of the unit. Maintaining a screen display based on the contents of RAM itself requires the ULA to have exclusive access to such hardware resources for a significant period of time. Whilst other elements of the ULA can in principle run in parallel with this activity, they cannot also access the RAM. Consequently, other features that might use the RAM must accept a reduced allocation of that resource in comparison to a hypothetical architecture where concurrent RAM access is possible.

Thus, the principal constraint for many features is bandwidth. The duration of access to hardware resources is one aspect of this; the rate at which such resources can be accessed is another. For example, the RAM is not fast enough to support access more frequently than one byte per 2MHz cycle, and for screen modes involving 80 bytes of screen data per scanline, there are no free cycles for anything other than the production of pixel output during the active scanline periods.

Timing
------

According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256 of which are used to generate pixel data. At 50Hz, this means that 128 cycles are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles / 312 ~= 128 cycles). This is consistent with the observation that each scanline requires at most 80 bytes of data, and that the ULA is apparently busy for 40 out of 64 microseconds in each scanline.

(In fact, since the ULA is seeking to provide an image for an interlaced 625-line display, there are two "fields" involved, one providing 312 scanlines and one providing 313 scanlines. See below for a description of the video system.)
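The scanline arithmetic above can be checked with a short calculation. This is only a sketch in Python: the constant names are illustrative, and the figures are simply those quoted from the Advanced User Guide.

```python
# Figures quoted in the text; the names are illustrative only.
RAM_CLOCK_HZ = 2_000_000      # 2MHz RAM access rate
FIELD_RATE_HZ = 50            # fields per second
LINES_PER_FIELD = 312         # scanlines in one field
BYTES_PER_ACTIVE_LINE = 80    # screen bytes fetched per line in MODE 0 to 3

cycles_per_us = RAM_CLOCK_HZ / 1_000_000

cycles_per_field = RAM_CLOCK_HZ // FIELD_RATE_HZ
cycles_per_line = cycles_per_field / LINES_PER_FIELD

# One byte per 2MHz cycle, so 128 cycles and 80 bytes convert directly to time.
line_period_us = 128 / cycles_per_us
active_period_us = BYTES_PER_ACTIVE_LINE / cycles_per_us

print(cycles_per_field, round(cycles_per_line, 1), line_period_us, active_period_us)
```

The rounded figure of 128 cycles per line gives exactly 64 microseconds, of which the 80 active cycles account for 40 microseconds.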
Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7, each providing two bits of each byte) using two cycles within the 500ns period of the 2MHz clock to complete each access operation. Since the CPU and ULA have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must effectively run at 1MHz (since every other 500ns period involves the ULA accessing RAM) during transfers of screen data.

The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle lasts 62.5ns. To access the memory, the following patterns corresponding to 16MHz cycles are required:

Time (ns):      0-------------- 500------------- ...
2 MHz cycle:    0               1                ...
16 MHz cycle:   0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
~RAS:           /---\___________/---\___________ ...
~CAS:           /-----\___/-\___/-----\___/-\___ ...
Address events:     A B     C       A B     C    ...
Data events:             F     S         F     S ...

~RAS ops:       1   0           1   0            ...
~CAS ops:       1     0   1 0   1     0   1 0    ...

Address ops:       a b     c       a b     c     ...
Data ops:       s         f     s         f      ...

~WE:            ......W                          ...
PHI OUT:        \_______________/--------------- ...
CPU (RAM):      L               D                ...
RnW:                            R                ...

PHI OUT:        \_______/-------\_______/------- ...
CPU (ROM):      L       D       L       D        ...
RnW:                    R               R        ...

~RAS must be high for 100ns, ~CAS must be high for 50ns.
~RAS must be low for 150ns, ~CAS must be low for 90ns.
Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.

Here, "A" and "B" respectively indicate the row and first column addresses being latched into the RAM (on a negative edge for ~RAS and ~CAS respectively), and "C" indicates the second column address being latched into the RAM. Presumably, the first and second half-bytes can be read at "F" and "S" respectively, and the row and column addresses must be made available at "a" and "b" (and "c") respectively at the latest. Data can be read at "f" and "s" for the first and second half-bytes respectively.
For the CPU, "L" indicates the point at which an address is taken from the CPU address bus, on a negative edge of PHI OUT, with "D" being the point at which data may either be read or be asserted for writing, on a positive edge of PHI OUT. Here, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven low for writing or high for reading, and thus propagates RnW from the CPU, this would need to be done before data would be retrieved and, according to the TM4164EC4 datasheet, even as late as the column address is presented and ~CAS brought low.

The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column address access time of 90ns (maximum), which appears to mean that ~RAS must be held low for at least 150ns and that ~CAS must be held low for at least 90ns before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44 cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S" is 1.5 cycles.

Note that the Service Manual refers to the negative edge of RAS and CAS, but the datasheet for the similar TM4164EC4 product shows latching on the negative edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that "page mode" provides the appropriate behaviour for that particular product.

The CPU, when accessing the RAM alone, apparently does not make use of the vacated "slot" that the ULA would otherwise use (when interleaving accesses in MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when accessing ROM (and potentially sideways RAM). The principal limitation is the amount of time needed between issuing an address and receiving an entire byte from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the 4 cycles that would be required for 2MHz operation.
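The conversion of the datasheet access times into 16MHz cycles can be sketched as follows. The rounding up to the next half cycle is an assumption made here (that events can only be placed on clock edges); it happens to reproduce the 2.5 and 1.5 cycle figures given above. The function names are illustrative.

```python
import math

CYCLE_NS = 1e9 / 16_000_000   # 62.5ns per 16MHz cycle


def cycles(ns):
    """Convert a duration in nanoseconds to 16MHz cycles."""
    return ns / CYCLE_NS


def half_cycle_slots(ns):
    """Round a duration up to the next half cycle (assumed: one clock edge)."""
    return math.ceil(ns / (CYCLE_NS / 2)) / 2


for name, ns in (("row access", 150), ("column access", 90)):
    print(name, cycles(ns), half_cycle_slots(ns))
```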
See: Acorn Electron Advanced User Guide
See: Acorn Electron Service Manual
     http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438

CPU Clock Notes
---------------

"The 6502 receives an external square-wave clock input signal on pin 37, which is usually labeled PHI0. [...] This clock input is processed within the 6502 to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2 is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been through two inverters and a push-pull amplifier. The same network of transistors within the 6502 which generates PHI2 is also tied to PHI1, and generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made available to external devices is so that they know when they can access the CPU. When PHI1 is high, this means that external devices can read from the address bus or data bus; when PHI2 is high, this means that external devices can write to the data bus."

See: http://lateblt.livejournal.com/88105.html

"The 6502 has a synchronous memory bus where the master clock is divided into two phases (Phase 1 and Phase 2). The address is always generated during Phase 1 and all memory accesses take place during Phase 2."

See: http://www.jmargolin.com/vgens/vgens.htm

Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During Phase 1" means when PHI1 is high (that is, when PHI0, and thus PHI2, is low) and "during Phase 2" means when PHI2 - effectively PHI0 - is high.
Bandwidth Figures
-----------------

Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312 total lines, with 80 cycles occurring in the active periods of display scanlines, the following bandwidth calculations can be performed:

Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:   80 cycles * 256 lines
     = 20480 bytes
CPU:   48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:   80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:   48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:   40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:   40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes

Here, the division by 2 for CPU accesses is performed to indicate that the CPU only uses every other access opportunity even in uncontended periods. See the 2MHz RAM Access enhancement below for bandwidth calculations that consider this limitation removed.

Video Timing
------------

According to 8.7 in the Service Manual, and the PAL Wikipedia page, approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch" (including the "colour burst"), and 1.65µs for the "front porch", totalling 12.05µs and thus leaving 51.95µs for the active video signal for each scanline. As the Service Manual suggests in the oscilloscope traces, the display information is transmitted more or less centred within the active video period since the ULA will only be providing pixel data for 40µs in each scanline.

Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that each scanline can be divided into 1024 cycles, although only 640 at most are actively used to provide pixel data.
Pixel data production should only occur within a certain period on each scanline, approximately 262 cycles after the start of hsync:

  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262

By choosing a number divisible by 8, the RAM access mechanism can be synchronised with the pixel production. Thus, 256 is a more appropriate start cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages" document) occurs at cycle 0.

To summarise:

  HS signal starts at cycle 0 on each horizontal scanline
  HS signal ends approximately 4µs later at cycle 64
  Pixel data starts approximately 12µs later at cycle 256

"Re: Electron Memory Contention" provides measurements that appear consistent with these calculations.

The "vertical blanking period", meaning the period before picture information in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5 lines.
Thus, the first visible scanline on the first field of a frame occurs half way through the 23rd scanline period measured from the start of vsync (indicated by "V" in the diagrams below):

                                       10                  20    23
Line in frame:        1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
Line from 1:          0                                          22 3
Line on screen: .:::::VVVVV:::::                                   12233445566
                 |_________________________________________________|
                          25 line vertical blanking period

In the second field of a frame, the first visible scanline coincides with the 24th scanline period measured from the start of line 313 in the frame:

              310                                                 336
Line in frame:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
Line from 313:        0                                            23 4
Line on screen: 88:::::VVVVV::::                                    11223344
              288 |                                                 |
                  |_________________________________________________|
                           25 line vertical blanking period

In order to consider only full lines, we might consider the start of each frame to occur 23 lines after the start of vsync.

Again, it is likely that pixel data production should only occur on scanlines within a certain period on each frame. The "625/50" document indicates that only a certain region is "safe" to use, suggesting a vertically centred region with approximately 15 blank lines above and below the picture. However, the "PAL TV timing and voltages" document suggests 28 blank lines above and below the picture. This would centre the 256 lines within the 312 lines of each field and thus provide a start of picture approximately 5.5 or 5 lines after the end of the blanking period or 28 or 27.5 lines after the start of vsync.
To summarise:

  CSYNC signal starts at cycle 0
  CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
  Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672

See: http://en.wikipedia.org/wiki/PAL
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
     http://lipas.uwasa.fi/~f76998/video/modes/
See: PAL TV timing and voltages
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
See: Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
See: Re: Electron Memory Contention
     http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109

RAM Integrated Circuits
-----------------------

Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants, have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly, ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.

The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and the Samsung-produced KM41464 series is apparently equivalent to the Texas Instruments 4164 chips presumably used in the Electron.

The TM4164EC4 series combines 4 64K x 1b units into a single package and appears similar to the TM4164EA4 featured on the Electron's circuit diagram (in the Advanced User Guide but not the Service Manual), and it also has 22 pins providing 3 additional inputs and 3 additional outputs over the 16 pins of the individual 4164-15 modules, presumably allowing concurrent access to the packaged memory units.
As far as currently available replacements are concerned, the NTE4164 is a potential candidate: according to the Vetco Electronics entry, it is supposedly a replacement for the TMS4164-15 amongst many other parts. Similar parts include the NTE2164 and the NTE6664, both of which appear to have largely the same performance and connection characteristics. Meanwhile, the NTE21256 appears to be a 16-pin replacement with four times the capacity that maintains the single data input and output pins. Using the NTE21256 as a replacement for all ICs combined would be difficult because of the single bit output.

Another device equivalent to the 4164-15 appears to be available under the code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web site lists data sheets for other devices on the same page, but these are different and actually appear to be provided under the 41574 product code (but are listed under 41464-10) and appear to be replacements for the TM4164EC4: the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by employing 4 pins for both input and output.
             Pins    I/O pins    Row access    Column access
             ----    --------    ----------    -------------
TM4164EC4    22      4 + 4       150ns (15)    90ns (15)
KM41464AP    18      4           150ns (15)    75ns (15)
NTE21256     16      1 + 1       150ns         75ns
HYB 4164-2   16      1 + 1       150ns         100ns
µPD41464     18      4           120ns (12)    60ns (12)

See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
     http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
See: Dynamic RAMS
     http://www.unicornelectronics.com/IC/DYNAMIC.html
See: New old stock 8x 4164 chips
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
     http://www.vetco.net/catalog/product_info.php?products_id=2806
See: NTE4164 - IC-NMOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=3680
See: NTE21256 - IC-256K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=2799
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
See: NTE6664 - IC-MOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=5213
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
See: 4164-150: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
See: 41464-10: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1

Interrupts
----------

The ULA generates IRQs (maskable interrupts) according to
certain conditions and these conditions are controlled by location &FE00:

  * Vertical sync (bottom of displayed screen)
  * 50Hz real time clock
  * Transmit data empty
  * Receive data full
  * High tone detect

The ULA is also used to clear interrupt conditions through location &FE05. Of particular significance is bit 7, which must be set if an NMI (non-maskable interrupt) has occurred and has thus suspended ULA access to memory, restoring the normal function of the ULA.

ROM Paging
----------

Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM mappings exist:

   8    keyboard
   9    keyboard (duplicate)
  10    BASIC ROM
  11    BASIC ROM (duplicate)

Paging in a ROM involves the following procedure:

 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to 2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is selected.

 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero whilst writing the desired ROM number n in bits 0 to 2.

See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686

Shadow/Expanded Memory
----------------------

The Electron exposes all sixteen address lines and all eight data lines through the expansion bus. Using such lines, it is possible to provide additional memory - typically sideways ROM and RAM - on expansion cards and through cartridges, although the official cartridge specification provides fewer address lines and only seeks to provide access to memory in 16K units.

Various modifications and upgrades were developed to offer "turbo" capabilities to the Electron, permitting the CPU to access a separate 8K of RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via the ULA through additional logic. However, an enhanced ULA might support independent CPU access to memory over the expansion bus by allowing itself to be discharged from providing access to memory, potentially for a range of addresses, and for the CPU to communicate with external memory uninterrupted.
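Returning to the procedure given under "ROM Paging", the sequence of values written to &FE05 might be sketched as follows. This is a sketch under stated assumptions: the function and parameter names are hypothetical, ROMs 8 to 15 are assumed to need only a single write with bit 3 set, and the upper four bits of the written value (which the text associates with interrupt clearing) are simply left at zero.

```python
ROM_LATCH = 0xFE05   # address named in the text; bits 0 to 3 select the ROM


def rom_paging_writes(rom, unlock_rom=12):
    """Return the byte values to write to ROM_LATCH, in order, to page in a ROM.

    unlock_rom is a hypothetical parameter naming which of ROMs 12 to 15
    is selected in step 1 of the procedure.
    """
    if not 0 <= rom <= 15:
        raise ValueError("ROM number must be in the range 0 to 15")
    if not 12 <= unlock_rom <= 15:
        raise ValueError("step 1 must select one of ROMs 12 to 15")
    if rom >= 8:
        # Assumption: a single write with bit 3 set is sufficient here.
        return [rom]
    # Step 1: bit 3 set, selecting one of ROMs 12 to 15.
    # Step 2: bit 3 clear, with the desired ROM number in bits 0 to 2.
    return [unlock_rom, rom]


print(rom_paging_writes(5))
print(rom_paging_writes(13))
```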
Sideways RAM/ROM and Upper Memory Access
----------------------------------------

Although the ULA controls the CPU clock, effectively slowing or stopping the CPU when the ULA needs to access screen memory, it is apparently able to allow the CPU to access addresses of &8000 and above - the upper region of memory - at 2MHz independently of any access to RAM that the ULA might be performing, only blocking the CPU if it attempts to access addresses of &7FFF and below during any ULA memory access - the lower region of memory - by stopping or stalling its clock.

Thus, the ULA remains aware of the level of the A15 line, only inhibiting the CPU clock if the line goes low, when the CPU is attempting to access the lower region of memory.

Hardware Scrolling (and Enhancement)
------------------------------------

On the standard ULA, &FE02 and &FE03 map to an address of 9 significant bits with the least significant 6 bits being zero, thus limiting the scrolling resolution to 64 bytes. An enhanced ULA could support a resolution of 2 bytes using the same layout of these addresses.

|--&FE02--------------| |--&FE03--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX

Arguably, a resolution of 8 bytes is more useful, since the mapping of screen memory to pixel locations is character oriented. A change in 8 bytes would permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User Guide).

One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall of changing the screen address by 2 bytes is the change in the number of lines from the initial and final character rows that need reading by the ULA, which would need to maintain this state information (although this is a relatively trivial change).
Another pitfall is the complication that might be introduced to software writing bitmaps of character height to the screen.

See: http://pastraiser.com/computers/acornelectron/acornelectron.html

Enhancement: Mode Layouts
-------------------------

Merely changing the screen memory mappings in order to have Archimedes-style row-oriented screen addresses (instead of character-oriented addresses) could be done for the existing modes, but this might not be sufficiently beneficial, especially since accessing regions of the screen would involve incrementing pointers by amounts that are inconvenient on an 8-bit CPU.

However, instead of using an Archimedes-style mapping, column-oriented screen addresses could be more feasibly employed: incrementing the address would reference the vertical screen location below the currently-referenced location (just as occurs within characters using the existing ULA); instead of returning to the top of the character row and referencing the next horizontal location after eight bytes, the address would reference the next character row and continue to reference locations downwards over the height of the screen until reaching the bottom; at the bottom, the next location would be the next horizontal location at the top of the screen.

In other words, the memory layout for the screen would resemble the following (for MODE 2):

  &3000 &3100 ... &7F00
  &3001 &3101 ...
  ...
  &3007
  &3008
  ...   ...   ...
  &30FF       ... &7FFF

Since there are 256 pixel rows, each column of locations would be addressable using the low byte of the address. Meanwhile, the high byte would be incremented to address different columns. Thus, addressing screen locations would become a lot more convenient and potentially much more efficient for certain kinds of graphical output.

One potential complication with this simplified addressing scheme arises with hardware scrolling.
Vertical hardware scrolling by one pixel row (not supported with the existing ULA) would be achieved by incrementing or decrementing the screen start address; by one character row, it would involve adding or subtracting 8. However, the ULA only supports multiples of 64 when changing the screen start address. Thus, if such a scheme were to be adopted, three additional bits would need to be supported in the screen start register (see "Hardware Scrolling (and Enhancement)" for more details). However, horizontal scrolling would be much improved even under the severe constraints of the existing ULA: only adjustments of 256 to the screen start address would be required to produce single-location scrolling of as few as two pixels in MODE 2 (four pixels in MODEs 1 and 5, eight pixels otherwise).

More disruptive is the effect of this alternative layout on software. Presumably, compatibility with the BBC Micro was the primary goal of the Electron's hardware design. With the character-oriented screen layout in place, system software (and application software accessing the screen directly) would be relying on this layout to run on the Electron with little or no modification. Although it might have been possible to change the system software to use this column-oriented layout instead, this would have incurred a development cost and caused additional work porting things like games to the Electron. Moreover, a separate branch of the software from that supporting the BBC Micro and closer derivatives would then have needed maintaining.

The decision to use the character-oriented layout in the BBC Micro may have been related to the choice of circuitry and to facilitate a convenient hardware implementation, and by the time the Electron was planned, it was too late to do anything about this somewhat unfortunate choice.
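The difference between the existing character-oriented layout and the column-oriented layout proposed in this section can be illustrated for MODE 2. The following is a sketch only: the function names are illustrative, "column" means a byte column from 0 to 79 (two pixels wide in MODE 2), and "line" means a pixel row from 0 to 255.

```python
SCREEN_START = 0x3000    # MODE 2 screen start
BYTES_PER_ROW = 640      # 80 byte columns * 8 lines per character row


def character_oriented(column, line):
    """Existing layout: 8 consecutive bytes per character cell."""
    return SCREEN_START + (line // 8) * BYTES_PER_ROW + column * 8 + (line % 8)


def column_oriented(column, line):
    """Proposed layout: high byte selects the column, low byte the line."""
    return SCREEN_START + (column << 8) + line


for column, line in ((0, 0), (1, 0), (0, 8), (79, 255)):
    print(column, line,
          hex(character_oriented(column, line)),
          hex(column_oriented(column, line)))
```

With the column-oriented layout, moving down one pixel row is always an increment of the low byte, whereas the existing layout requires a jump of 633 bytes (640 - 7) when crossing from one character row to the next.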
Pixel Layouts
-------------

The pixel layouts are as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               73 62 51 40
  2             4               7531 6420

Since the ULA reads a half-byte at a time, one might expect it to attempt to produce pixels for every half-byte, as opposed to handling entire bytes. However, the pixel layout is not conducive to producing pixels as soon as a half-byte has been read for a given full-byte location: in 1bpp modes the first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel data is spread across the entire byte in different ways.

An alternative arrangement might be as follows:

  Modes         Depth (bpp)     Pixels (from bits)
  -----         -----------     ------------------
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
  1, 5          2               76 54 32 10
  2             4               7654 3210

Just as the mode layouts were presumably decided by compatibility with the BBC Micro, the pixel layouts will have been maintained for similar reasons. Unfortunately, the existing layout prevents any optimisation of the ULA for handling half-byte pixel data generally.

Enhancement: The Missing MODE 4
-------------------------------

The Electron inherits its screen mode selection from the BBC Micro, where MODE 3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4. Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7, however, and they are merely implemented by skipping two scanlines in every ten after the eight required to produce a character line. Thus, such modes provide a 24-row display.

In principle, nothing prevents this "text mode" effect being applied to other modes. The 20-column modes are not well-suited to displaying text, which leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than 2. Although the need for a non-monochrome 40-column text mode is addressed by MODE 7 on the BBC Micro, the Electron lacks such a mode.
If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it would occupy MODE 4 instead of the current MODE 4:

  Screen mode  Size (kilobytes)  Colours  Rows  Resolution
  -----------  ----------------  -------  ----  ----------
  0            20                2        32    640x256
  1            20                4        32    320x256
  2            20                16       32    160x256
  3            16                2        24    640x256
  4 (new)      16                4        24    320x256
  4 (old)      10                2        32    320x256
  5            10                4        32    160x256
  6            8                 2        24    320x256

Thus, for increasing mode numbers, the size of each mode would be the same or less than the preceding mode.

Enhancement: 2MHz RAM Access
----------------------------

Given that the CPU and ULA both access RAM at 2MHz, but given that the CPU when not competing with the ULA only accesses RAM every other 2MHz cycle (as if the ULA still needed to access the RAM), one useful enhancement would be a mechanism to let the CPU take over the ULA cycles outside the ULA's period of activity comparable to the way the ULA takes over the CPU cycles in MODE 0 to 3.

Thus, the RAM access cycles would resemble the following in MODE 0 to 3:

  Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

In MODE 4 to 6:

  Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

This would improve CPU bandwidth as follows:

                  Standard ULA    Enhanced ULA
  MODE 0, 1, 2    9728 bytes      19456 bytes
  MODE 3          12288 bytes     24576 bytes
  MODE 4, 5       19968 bytes     29696 bytes
  MODE 6          19968 bytes     32256 bytes

(Here, the uncontended 2MHz bandwidth for a display period would be 39936 bytes, being 128 cycles per line over 312 lines.)

With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth because all access opportunities to RAM are doubled. Meanwhile, in the other modes, some CPU accesses occur alongside ULA accesses and thus cannot be doubled, but the CPU bandwidth increase is still significant.
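The standard and enhanced CPU bandwidth figures tabulated above can be reproduced with a short calculation. This is a sketch using the document's own assumptions (128 cycles per line, 312 lines, 80 active cycles on display lines, and 24 rows of 8 lines for MODE 3 and 6); the function and parameter names are illustrative.

```python
CYCLES_PER_LINE = 128    # 2MHz cycles in each scanline
TOTAL_LINES = 312        # scanlines in each field
ACTIVE_CYCLES = 80       # cycles in the active part of a display line


def cpu_bytes(display_lines, interleaved, enhanced):
    """CPU RAM accesses per field.

    interleaved: CPU and ULA alternate in the active period (MODE 4 to 6).
    enhanced:    CPU may use every free cycle instead of every other one.
    """
    step = 1 if enhanced else 2
    active = ACTIVE_CYCLES // 2 if interleaved else 0
    border = (CYCLES_PER_LINE - ACTIVE_CYCLES) // step
    blank = CYCLES_PER_LINE // step
    return (active + border) * display_lines + blank * (TOTAL_LINES - display_lines)


MODES = {
    "MODE 0, 1, 2": (256, False),
    "MODE 3": (24 * 8, False),
    "MODE 4, 5": (256, True),
    "MODE 6": (24 * 8, True),
}

for name, (lines, interleaved) in MODES.items():
    print(name, cpu_bytes(lines, interleaved, False), cpu_bytes(lines, interleaved, True))
```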
Unfortunately, the mechanism for accessing the RAM is too slow to provide data within the time constraints of 2MHz operation. There is no time remaining in a 2MHz cycle for the CPU to receive and process any retrieved data.

Enhancement: Region Blanking
----------------------------

The problem of permitting character-oriented blitting in programs whilst scrolling the screen by sub-character amounts could be mitigated by permitting a region of the display to be blank, such as the final lines of the display. Consider the following vertical scrolling by 2 bytes that would cause an initial character row of 6 lines and a final character row of 2 lines:

    6 lines - initial, partial character row
  248 lines - 31 complete rows
    2 lines - final, partial character row

If a routine were in use that wrote 8 line bitmaps to the partial character row now split in two, it would be advisable to hide one of the regions in order to prevent content appearing in the wrong place on screen (such as content meant to appear at the top "leaking" onto the bottom). Blanking 6 lines would be sufficient, as can be seen from the following cases.

Scrolling up by 2 lines:

    6 lines - initial, partial character row
  240 lines - 30 complete rows
    4 lines - part of 1 complete row
  -----------------------------------------------------------------
    4 lines - part of 1 complete row (hidden to maintain 250 lines)
    2 lines - final, partial character row (hidden)

Scrolling down by 2 lines:

    2 lines - initial, partial character row
  248 lines - 31 complete rows
  ----------------------------------------------------------
    6 lines - final, partial character row (hidden)

Thus, in this case, region blanking would impose a 250 line display with the bottom 6 lines blank.

See the description of the display suspend enhancement for a more efficient way of blanking lines than merely blanking the palette whilst allowing the CPU to perform useful work during the blanking period.
To control the blanking or suspending of lines at the top and bottom of the display, a memory location could be dedicated to the task: the upper 4 bits could define a blanking region of up to 16 lines at the top of the screen, whereas the lower 4 bits could define such a region at the bottom of the screen. If more lines were required, two locations could be employed, allowing the top and bottom regions to occupy the entire screen.

Enhancement: Screen Height Adjustment
-------------------------------------

The height of the screen could be configurable in order to reduce screen memory consumption. This is not quite done in MODE 3 and 6 since the start of the screen appears to be rounded down to the nearest page, but by reducing the height by amounts more than a page, savings would be possible. For example:

  Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
  ------------  -----  ------  --------------  ---------------  -------------
  640           1      252     80              320              &3140 -> &3100
  640           1      248     80              640              &3280 -> &3200
  320           1      240     40              640              &5A80 -> &5A00
  320           2      240     80              1280             &3500

Screen Mode Selection
---------------------

Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider range of modes, the other bits of &FE*7 (related to sound, cassette input/output and the Caps Lock LED) would need to be reassigned, with bit 0 potentially being made available for use.

Enhancement: Palette Definition
-------------------------------

Since all memory accesses go via the ULA, an enhanced ULA could employ more specific addresses than &FE*X to perform enhanced functions. For example, the palette control is done using &FE*8-F and merely involves selecting predefined colours, whereas an enhanced ULA could support the redefinition of all 16 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F (colours 8 to 15), where a single byte might provide 8 bits per pixel colour specifications similar to those used on the Archimedes.
The principal limitation here is actually the hardware: the Electron has only a
single output line for each of the red, green and blue channels, and if those
outputs are strictly digital and can only be set to a "high" or a "low" value,
then only the existing eight colours are possible. If a modern ULA were able to
output analogue values (or values at well-defined points between the high and
low values, such as the half-on value supported by the Amstrad CPC series), it
would still need to be assessed whether the circuitry could successfully handle
and propagate such values. Various sources indicate that only "TTL levels" are
supported by the RGB output circuit, and since there are 74LS08 AND logic gates
involved in the RGB component outputs from the ULA, it is likely that the ULA
is expected to provide only "high" or "low" values.

Short of adding extra outputs from the ULA (either additional red, green and
blue outputs or a combined intensity output), another approach might involve
some kind of modulation where an output value might be encoded in multiple
pulses at a higher frequency than the pixel frequency. However, this would
demand additional circuitry outside the ULA, and component RGB monitors would
probably not be able to take advantage of this feature; only UHF and composite
video devices (the latter with the composite video colour support enabled on
the Electron's circuit board) would potentially benefit.

Flashing Colours
----------------

According to the Advanced User Guide, "The cursor and flashing colours are
entirely generated in software: This means that all of the logical to physical
colour map must be changed to cause colours to flash." This appears to suggest
that the palette registers must be updated upon the flash counter - read and
written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
colour pairs to be any combination of colours might be possible, instead of
having colour complements as pairs.
It is conceivable that the interrupt code responsible does the simple thing and
merely inverts the current values for any logical colours (LC) for which the
associated physical colour (as supplied as the second parameter to the VDU 19
call) has the top bit of its four bit value set. These top bits are not
recorded in the palette registers but are presumably recorded separately and
used to build bitmaps as follows:

  LC  2 colour  4 colour  16 colour  4-bit value for inversion
  --  --------  --------  ---------  -------------------------
   0  00010001  00010001  00010001   1, 1, 1
   1  01000100  00100010  00010001   4, 2, 1
   2            01000100  00100010   4, 2
   3            10001000  00100010   8, 2
   4                      00010001   1
   5                      00010001   1
   6                      00100010   2
   7                      00100010   2
   8                      01000100   4
   9                      01000100   4
  10                      10001000   8
  11                      10001000   8
  12                      01000100   4
  13                      01000100   4
  14                      10001000   8
  15                      10001000   8

Inversion value calculation:

   2 colour formula: 1 << (colour * 2)
   4 colour formula: 1 << colour
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))

Each bitmap is the 4-bit inversion value repeated in both halves of the byte.

For example, where logical colour 0 has been mapped to a physical colour in the
range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to the
inversion operation. (The lower three bits of the physical colour would be used
to set the underlying colour information affected by the inversion operation.)

An operation in the interrupt code would then combine the bitmaps for all
logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
combined for groups of logical colours as follows:

  Logical colours
  ---------------
  0, 2, 8, 10
  4, 6, 12, 14
  5, 7, 13, 15
  1, 3, 9, 11

These combined bitmaps would be EORed with the existing palette register values
in order to perform the value inversion necessary to produce the flashing
effect.

Thus, in the VDU 19 operation, the appropriate inversion value would be
calculated for the logical colour, and this value would then be combined with
other inversion values in a dedicated memory location corresponding to the
colour's group as indicated above.
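
The inversion values and bitmaps described above can be checked with a small
Python sketch. This is a model of the conjectured scheme only, not of any known
operating system routine, and the function names are invented for the example.
The 16 colour shift is written so that it reproduces the values listed in the
table (bit 1 of the logical colour contributes 1 to the shift, bit 3
contributes 2).

```python
def inversion_value(colours, logical_colour):
    """Return the 4-bit inversion value for a logical colour.

    colours is the number of colours in the screen mode: 2, 4 or 16.
    """
    if not 0 <= logical_colour < colours:
        raise ValueError("logical colour out of range for this mode")
    if colours == 2:
        shift = logical_colour * 2
    elif colours == 4:
        shift = logical_colour
    elif colours == 16:
        shift = ((logical_colour & 2) >> 1) + ((logical_colour & 8) >> 2)
    else:
        raise ValueError("colours must be 2, 4 or 16")
    return 1 << shift


def inversion_bitmap(colours, logical_colour):
    """Repeat the 4-bit value in both nibbles of a byte, as in the table."""
    return inversion_value(colours, logical_colour) * 0x11


def combined_bitmap(colours, flashing_colours):
    """OR together the bitmaps of the flashing colours in one group."""
    result = 0
    for logical_colour in flashing_colours:
        result |= inversion_bitmap(colours, logical_colour)
    return result


# Print the 16 colour column of the table.
for lc in range(16):
    print(lc, format(inversion_bitmap(16, lc), "08b"), inversion_value(16, lc))
```

Under this model, if every colour in the group 0, 2, 8, 10 were flashing, the
combined bitmap would have all eight bits set, and EORing it with the stored
palette data would invert every bit belonging to that group.
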
Meanwhile, the palette channel values would be derived from the lower three
bits of the specified physical colour and combined with other palette data in
dedicated memory locations corresponding to the palette registers.

Interestingly, although flashing colours on the BBC Micro are controlled by
toggling bit 0 of the &FE20 control register location for the Video ULA, the
actual colour inversion is done in hardware.

Enhancement: Palette Definition Lists
-------------------------------------

It can be useful to redefine the palette in order to change the colours
available for a particular region of the screen, particularly in modes where
the choice of colours is constrained, and if an increased colour depth were
available, palette redefinition would be useful to give the illusion of more
than 16 colours in MODE 2. Traditionally, palette redefinition has been done by
using interrupt-driven timers, but a more efficient approach would involve
presenting lists of palette definitions to the ULA so that it can change the
palette at a particular display line.

One might define a palette redefinition list in a region of memory and then
communicate its contents to the ULA by writing the address and length of the
list, along with the display line at which the palette is to be changed, to ULA
registers such that the ULA buffers the list and performs the redefinition at
the appropriate time. Throughput/bandwidth considerations might impose
restrictions on the practical length of such a list, however.

Enhancement: Display Synchronisation Interrupts
-----------------------------------------------

When completing each scanline of the display, the ULA could trigger an
interrupt. Since this might impact system performance substantially, the
feature would probably need to be configurable, and it might be sufficient to
have an interrupt only after a certain number of display lines instead.
Permitting the CPU to take action after eight lines would allow palette
switching and other effects to occur on a character row basis.

The ULA provides an interrupt at the end of the display period, presumably so
that software can schedule updates to the screen, avoid flickering or tearing,
and so on. However, some applications might benefit from an interrupt at, or
just before, the start of the display period so that palette modifications or
similar effects could be scheduled.

Enhancement: Palette-Free Modes
-------------------------------

Palette-free modes might be defined where bit values directly correspond to the
red, green and blue channels, although this would mostly make sense only for
modes with depths greater than the standard 4 bits per pixel, and such modes
would require more memory than MODE 2 if they were to have an acceptable
resolution.

Enhancement: Display Suspend
----------------------------

Especially when writing to the screen memory, it could be beneficial to be able
to suspend the ULA's access to the memory, instead producing blank values for
all screen pixels until a program is ready to reveal the screen. This is
different from palette blanking since with a blank palette, the ULA is still
reading screen memory and translating its contents into pixel values that end
up being blank.

This function is reminiscent of a capability of the ZX81, albeit necessary on
that hardware to reduce the load on the system CPU which was responsible for
producing the video output. By allowing display suspend on the Electron, the
performance benefit would be derived from giving the CPU full access to the
memory bandwidth.

The region blanking feature mentioned above could be implemented using this
enhancement instead of employing palette blanking for the affected lines of the
display.

Enhancement: Memory Filling
---------------------------

A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the existing
read operations when producing a screen display, and insufficient bandwidth
would exist to do so in high-bandwidth screen modes anyway, the capability
could be offered during a display suspend period (as described above),
permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.

This capability could also support block filling, where the limits of the
filled memory would be defined by the position and size of a screen area,
although this would demand the provision of additional registers in the ULA to
retain the details of such areas and additional logic to control the fill
operation.

Enhancement: Region Filling
---------------------------

An alternative to memory writing might involve indicating regions using
additional registers or memory where the ULA fills regions of the screen with
content instead of reading from memory. Unlike hardware sprites which should
realistically provide varied content, region filling could employ single
colours or patterns, and one advantage of doing so would be that the ULA need
not access memory at all within a particular region.

Regions would be defined on a row-by-row basis. Instead of reading memory and
blitting a direct representation to the screen, the ULA would read region
definitions containing a start column, region width and colour details. There
might be a certain number of definitions allowed per row, or the ULA might just
traverse an ordered list of such definitions with each one indicating the row,
start column, region width and colour details.
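
As an illustration of the ordered list form of region definitions, the
following Python sketch models a definition as a row, start column, width and
colour, and expands the definitions for one row into per-column colour values.
The `Region` type, the `render_row` function and the choice of 40 columns are
all assumptions made for this example; an actual ULA would generate pixel
output directly instead of building a list.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical region definition: one filled span on one row."""
    row: int
    start_column: int
    width: int
    colour: int


def render_row(regions, row, columns=40, background=0):
    """Expand the definitions applying to a row into per-column colours."""
    line = [background] * columns
    for region in regions:
        if region.row != row:
            continue
        # Clip the span to the visible columns.
        first = max(region.start_column, 0)
        last = min(region.start_column + region.width, columns)
        for column in range(first, last):
            line[column] = region.colour
    return line


regions = [Region(0, 2, 3, 1), Region(0, 6, 10, 2), Region(1, 0, 4, 3)]
print(render_row(regions, 0, columns=8))
print(render_row(regions, 1, columns=8))
```

Columns not covered by any definition take the background value here, which
corresponds to leaving "empty" areas of the screen undefined.
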
One could even compress this information further by requiring only the row,
start column and colour details with each subsequent definition terminating the
effect of the previous one. However, one would also need to consider the
convenience of preparing such definitions and whether efficient access to
definitions for a particular row might be desirable. It might also be desirable
to avoid having to prepare definitions for "empty" areas of the screen,
effectively making the definition of the screen contents employ run-length
encoding and employ only colour plus length information.

One application of region filling is that of simple 2D and 3D shape rendering.
Although it is entirely possible to plot such shapes to the screen and have the
ULA blit the memory contents to the screen, such operations consume bandwidth
both in the initial plotting and in the final transfer to the screen. Region
filling would reduce such bandwidth usage substantially.

This way of representing screen images would make certain kinds of images
unfeasible to represent - consider alternating single pixel values which could
easily occur in some character bitmaps - even if an internal queue of regions
were to be supported such that the ULA could read ahead and buffer such
"bandwidth intensive" areas. Thus, the ULA might be better served providing
this feature for certain areas of the display only as some kind of special
graphics window.

Enhancement: Hardware Sprites
-----------------------------

An enhanced ULA might provide hardware sprites, but this would be done in a way
that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.

The specification of sprites could involve the reservation of 16 locations (for
example, &FE20-F) specifying a fixed number of eight sprites, with each
location pair referring to the sprite data.
By limiting the ULA to dealing with a fixed number of sprites, the work
required inside the ULA would be reduced since it would avoid having to deal
with arbitrary numbers of sprites.

The principal limitation on providing hardware sprites is that of having to
obtain sprite data, given that the ULA is usually required to retrieve screen
data, and given the lack of memory bandwidth available to retrieve sprite data
(particularly from multiple sprites supposedly at the same position) and screen
data simultaneously. Although the ULA could potentially read sprite data and
screen data in alternate memory accesses in screen modes where the bandwidth is
not already fully utilised, this would result in a degradation of performance.

Enhancement: Additional Screen Mode Configurations
--------------------------------------------------

Alternative screen mode configurations could be supported. The ULA has to
produce 640 pixel values across the screen, with pixel doubling or quadrupling
employed to fill the screen width:

  Screen width  Columns  Scaling  Depth  Bytes
  ------------  -------  -------  -----  ------
  640           80       x1       1      80
  320           40       x2       1, 2   40, 80
  160           20       x4       2, 4   40, 80

It must also use at most 80 byte-sized memory accesses to provide the
information for the display. Given that characters must occupy an 8x8 pixel
array, if a configuration featuring anything other than 20, 40 or 80 character
columns is to be supported, compromises must be made such as the introduction
of blank pixels either between characters (such as occurs between rows in MODE
3 and 6) or at the end of a scanline (such as occurs at the end of the frame in
MODE 3 and 6).
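
The constraints above (640 pixel values per scanline, at most 80 bytes fetched,
8 pixel wide characters) can be expressed as a small check. The following
Python sketch is an illustration only; the function name and the returned
tuple are assumptions made for the example. Given a column count, a pixel
scaling factor and a colour depth in bits per pixel, it reports the screen
width, the bytes needed per scanline and the number of the 640 output pixel
values that would have to be left blank.

```python
def mode_configuration(columns, scaling, depth,
                       output_pixels=640, char_width=8, max_bytes=80):
    """Return (screen_width, bytes_per_line, blank_pixels) for a mode."""
    width = columns * char_width
    used = width * scaling
    if used > output_pixels:
        raise ValueError("scaled width exceeds the available pixel values")
    bytes_per_line = width * depth // 8
    if bytes_per_line > max_bytes:
        raise ValueError("too many memory accesses per scanline")
    # Output pixel values not covered by the scaled display are left blank.
    return width, bytes_per_line, output_pixels - used


for columns, scaling, depth in [(80, 1, 1), (40, 2, 1), (40, 2, 2),
                                (20, 4, 2), (20, 4, 4)]:
    print(columns, scaling, depth, mode_configuration(columns, scaling, depth))
```

For the standard configurations the blank count is zero; a column count that
does not divide evenly into 640 output pixels yields a non-zero blank count,
which is the compromise described in the text.
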
Consider the following configuration:

  Screen width  Columns  Scaling  Depth  Bytes   Blank
  ------------  -------  -------  -----  ------  -----
  208           26       x3       1, 2   26, 52  16

Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4 colours
could be provided, with 16 blank pixel values (out of a total of 640) generated
either at the start or end (or split between the start and end) of each
scanline.

Enhancement: Character Attributes
---------------------------------

The BBC Micro MODE 7 employs something resembling character attributes to
support teletext displays, but depends on circuitry providing a character
generator. The ZX Spectrum, on the other hand, provides character attributes as
a means of colouring bitmapped graphics. Although such a feature is very
limiting as the sole means of providing multicolour graphics, in situations
where the choice is between low resolution multicolour graphics or high
resolution monochrome graphics, character attributes provide a potentially
useful compromise.

For each byte read, the ULA must deliver 8 pixel values (out of a total of 640)
to the video output, doing so by either emptying its pixel buffer on a pixel
per cycle basis, or by multiplying pixels and thus holding them for more than
one cycle. For example, for a screen mode having 640 pixels in width:

  Cycle:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
  Reads:  B                       B
  Pixels: 0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7

And for a screen mode having 320 pixels in width:

  Cycle:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
  Reads:  B
  Pixels: 0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7

However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:

  Cycle:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
  Reads:  B                       A
  Pixels: 0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7

These additional bytes could provide colour information for the bitmapped data
in the following character column (of 8 pixels).
Since it would be desirable to apply attribute data to the first column, the
initial 8 cycles might be configured to not produce pixel values.

For an entire character, attribute data need only be read for the first row of
pixels for a character. The subsequent rows would have attribute information
applied to them, although this would require the attribute data to be stored in
some kind of buffer. Thus, the following access pattern would be observed:

  Reads: A B _ B _ B _ B _ B _ B _ B _ B ...

In modes 3 and 6, the blank display lines could be used to retrieve attribute
data:

  Reads (blank):  A _ A _ A _ A _ A _ A _ A _ A _ ...
  Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
  Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
  ...

See below for a discussion of using this for character data as well.

A whole byte used for colour information for a whole character would result in
a choice of 256 colours, and this might be somewhat excessive. By only reading
attribute bytes at every other opportunity, a choice of 16 colours could be
applied individually to two characters.

  Cycle:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  Reads:  B               A               B               -
  Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7

Further reductions in attribute data access, offering 4 colours for every
character in a four character block, for example, might also be worth
considering.
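
The memory cost of these attribute schemes follows from simple arithmetic. The
Python sketch below is an estimate under stated assumptions, not a description
of any real hardware: it assumes 1 bit per pixel bitmap data, 256 display
lines, one set of attribute bytes per 8 line character row, screen memory
ending just below &8000, and a start address rounded down to a page boundary.
The function names are invented for the example.

```python
def attribute_bytes(columns, colours):
    """Attribute bytes per character row for a given choice of colours."""
    bits_per_character = (colours - 1).bit_length()
    # Round up to a whole number of bytes.
    return (columns * bits_per_character + 7) // 8


def screen_start(columns, colours, lines=256, row_height=8, top=0x8000):
    """Return (exact_start, page_aligned_start) of screen memory."""
    bitmap = columns * lines  # one byte per column per line at 1 bit per pixel
    rows = lines // row_height
    attributes = attribute_bytes(columns, colours) * rows
    start = top - bitmap - attributes
    return start, start & 0xFF00


for columns, colours in [(40, 256), (40, 16), (40, 4), (26, 256), (26, 16)]:
    exact, aligned = screen_start(columns, colours)
    print(columns, colours, "&%04X -> &%04X" % (exact, aligned))
```

For 40 columns with a full attribute byte per character, this gives 10240 bytes
of bitmap data plus 1280 bytes of attributes, placing the start of the screen
at &5300.
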
Consider the following configurations for screen modes with a colour depth of 1
bit per pixel for bitmap information:

  Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  -------  ---------  ---------  -------  --------------
  320           40       x2       40         40         256      &5300
  320           40       x2       40         20         16       &5580 -> &5500
  320           40       x2       40         10         4        &56C0 -> &5600
  208           26       x3       26         26         256      &62C0 -> &6200
  208           26       x3       26         13         16       &6460 -> &6400

Enhancement: Text-Only Modes using Character and Attribute Data
---------------------------------------------------------------

In modes 3 and 6, the blank display lines could be used to retrieve character
and attribute data instead of trying to insert it between bitmap data accesses,
but this data would then need to be retained:

  Reads: A C A C A C A C A C A C A C A C ...
  Reads: B _ B _ B _ B _ B _ B _ B _ B _ ...

Only attribute (A) and character (C) reads would require screen memory storage.
Bitmap data reads (B) would involve either accesses to memory to obtain
character definition details or could, at the cost of special storage in the
ULA, involve accesses within the ULA that would then free up the RAM. However,
the CPU would not benefit from having any extra access slots due to the
limitations of the RAM access mechanism.

A scheme without caching might be possible. The same line of memory addresses
might be visited over and over again for eight display lines, with an index
into the bitmap data being incremented from zero to seven. The access patterns
would look like this:

  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 0)
  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 1)
  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 2)
  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 3)
  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 4)
  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 5)
  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 6)
  Reads: C B C B C B C B C B C B C B C B ... (generate data from index 7)

The bandwidth requirements would be the sum of the accesses to read the
character values (repeatedly) and those to read the bitmap data to reproduce
the characters on screen.

Enhancement: MODE 7 Emulation using Character Attributes
--------------------------------------------------------

If the scheme of applying attributes to character regions were employed to
emulate MODE 7, in conjunction with the MODE 6 display technique, the following
configuration would be required:

  Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  --------------
  320           40       25    40         20         16       &5ECC -> &5E00
  320           40       25    40         10         4        &5FC6 -> &5F00

Although this requires much more memory than MODE 7 (8500 bytes versus MODE 7's
1000 bytes), it does not need much more memory than MODE 6, and it would at
least make a limited 40-column multicolour mode available as a substitute for
MODE 7.

Using the text-only enhancement with caching of data or with repeated reads of
the same character data line for eight display lines, the storage requirements
would be diminished substantially:

  Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
  ------------  -------  ----  ---------  ---------  -------  --------------
  320           40       25    40         20         16       &7A24 -> &7A00
  320           40       25    40         10         4        &7B1E -> &7B00
  320           40       25    40         5          2        &7B9B -> &7B00
  320           40       25    40         0          (2)      &7C18 -> &7C00
  640           80       25    80         40         16       &7448 -> &7400
  640           80       25    80         20         4        &763C -> &7600
  640           80       25    80         10         2        &7736 -> &7700
  640           80       25    80         0          (2)      &7830 -> &7800

Note that the colours describe the locally defined attributes for each
character. When no attribute information is provided, the colours are defined
globally.

Enhancement: Compressed Character Data
--------------------------------------

Another observation about text-only modes is that they only need to store a
restricted set of bitmapped data values.
Encoding this set of values in a smaller unit of storage than a byte could
possibly help to reduce the amount of storage and bandwidth required to
reproduce the characters on the display.

Enhancement: High Resolution Graphics
-------------------------------------

Screen modes with higher resolutions and larger colour depths might be
possible, but this would in most cases involve the allocation of more screen
memory, and the ULA would probably then be obliged to page in such memory for
the CPU to be able to sensibly access it all.

Enhancement: Genlock Support
----------------------------

The ULA generates a video signal in conjunction with circuitry producing the
output features necessary for the correct display of the screen image. However,
it appears that the ULA drives the video synchronisation mechanism instead of
reacting to an existing signal. Genlock support might be possible if the ULA
were made to be responsive to such external signals, resetting its address
generators upon receiving synchronisation events.

Enhancement: Improved Sound
---------------------------

The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro employs &FE40-&FE4F (the system VIA) for
sound control, and an enhanced ULA could adopt this interface.

The BBC Micro uses the SN76489 chip to produce sound, and the entire
functionality of this chip could be emulated for enhanced sound, with a subset
of the functionality exposed via the &FE*6 interface.

See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
See: http://www.smspower.org/Development/SN76489

Enhancement: Waveform Upload
----------------------------

As with a hardware sprite function, waveforms could be uploaded or referenced
using locations as registers referencing memory regions.

Enhancement: Sound Input/Output
-------------------------------

Since the ULA already controls audio input/output for cassette-based data, it
would have been interesting to entertain the idea of sampling and output of
sounds through the cassette interface. However, a significant amount of
circuitry is employed to process the input signal for use by the ULA and to
process the output signal for recording.

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11

Enhancement: BBC ULA Compatibility
----------------------------------

Although some new ULA functions could be defined in a way that is also
compatible with the BBC Micro, the BBC ULA is itself incompatible with the
Electron ULA: in the BBC memory map, &FE00-7 is reserved for the video
controller and controls various functions specific to the 6845 video
controller; &FE08-F is reserved for the serial controller. It therefore becomes
possible to disregard compatibility where compatibility is already disregarded
for a particular area of functionality.

&FE20-F maps to video ULA functionality on the BBC Micro which provides control
over the palette (using address &FE21, compared to &FE08-F on the Electron) and
other system-specific functions. Since the location usage is generally
incompatible, this region could be reused for other purposes.

Enhancement: Increased RAM, ULA and CPU Performance
---------------------------------------------------

More modern implementations of the hardware might feature faster RAM coupled
with an increased ULA clock frequency in order to increase the bandwidth
available to the ULA and to the CPU in situations where the ULA is not needed
to perform work. A ULA employing a 32MHz clock would be able to complete the
retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
to access the RAM for the following 250ns even in display modes requiring the
retrieval of a byte for the display every 500ns.
The CPU could, subject to timing issues, run at 2MHz even in MODE 0, 1 and 2.

A scheme such as that described above would have a similar effect to the scheme
employed in the BBC Micro, although the latter made use of RAM with a wider
bandwidth in order to complete memory transfers within 250ns and thus permit
the CPU to run continuously at 2MHz.

Higher bandwidth could potentially be used to implement exotic features such as
RAM-resident hardware sprites or indeed any feature demanding RAM access
concurrent with the production of the display image.

Enhancement: Multiple CPU Stacks and Zero Pages
-----------------------------------------------

The 6502 maintains a stack for subroutine calls and register storage in page
&01. Although the stack pointer can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would be
able to coexist without overwriting each other's stacks.

One way that this issue could be alleviated would involve the provision of a
facility to redirect accesses to page &01 to other areas of memory. The ULA
would provide a register that defines a physical page for the use of the CPU's
"logical" page &01, and upon any access to page &01 by the CPU, the ULA would
change the asserted address lines to redirect the access to the appropriate
physical region.

By providing an 8-bit register, mapping to the most significant byte (MSB) of a
16-bit address, the ULA could then replace any MSB equal to &01 with the
register value before the access is made. Where multiple programs coexist, upon
switching programs, the register would be updated to point the ULA to the
appropriate stack location, thus providing a simple memory management unit
(MMU) capability.
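
The address substitution described above amounts to replacing the most
significant byte of any page &01 address. A minimal Python sketch of that
translation follows; the function name and the register value used in the
example are hypothetical, and the sketch models only the address arithmetic,
not the timing of the access.

```python
STACK_PAGE = 0x01  # the 6502's fixed stack page


def redirect(address, stack_page_register):
    """Return the physical address for a CPU address.

    Accesses to page &01 have their most significant byte replaced by the
    8-bit register value; all other addresses pass through unchanged.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must be a 16-bit value")
    if not 0 <= stack_page_register <= 0xFF:
        raise ValueError("register must be an 8-bit value")
    if (address >> 8) == STACK_PAGE:
        return (stack_page_register << 8) | (address & 0xFF)
    return address


# With the register set to &2F, a push to &01FF lands at &2FFF.
print("&%04X" % redirect(0x01FF, 0x2F))
# Addresses outside page &01 are unaffected.
print("&%04X" % redirect(0x1234, 0x2F))
```

Setting the register to &01 leaves the stack where the CPU expects it, so the
standard behaviour would be the default configuration in such a scheme.
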
In a similar fashion, zero page accesses could also be redirected so that code
could run from sideways RAM and have zero page operations redirected to "upper
memory" - for example, to page &BE (with stack accesses redirected to page &BF,
perhaps) - thereby permitting most CPU operations to occur without inadvertent
accesses to "lower memory" (the RAM) which would risk stalling the CPU as it
contends with the ULA for memory access.

Such facilities could also be provided by a separate circuit between the CPU
and ULA in a fashion similar to that employed by a "turbo" board, but unlike
such boards, no additional RAM would be provided: all memory accesses would
occur as normal through the ULA, albeit redirected when configured
appropriately.

ULA Pin Functions
-----------------

The functions of the ULA pins are described in the Electron Service Manual. Of
interest to video processing are the following:

  CSYNC (low during horizontal or vertical synchronisation periods, high
         otherwise)

  HS (low during horizontal synchronisation periods, high otherwise)

  RED, GREEN, BLUE (pixel colour outputs)

  CLOCK IN (a 16MHz clock input, 4V peak to peak)

  PHI OUT (a 1MHz, 2MHz and stopped clock signal for the CPU)

More general memory access pins:

  RAM0...RAM3 (data lines to/from the RAM)

  RA0...RA7 (address lines for sending both row and column addresses to the
             RAM)

  RAS (row address strobe setting the row address on a negative edge - see the
       timing notes)

  CAS (column address strobe setting the column address on a negative edge -
       see the timing notes)

  WE (sets write enable with logic 0, read with logic 1)

  ROM (select data access from ROM)

CPU-oriented memory access pins:

  A0...A15 (CPU address lines)

  PD0...PD7 (CPU data lines)

  R/W (indicates CPU write with logic 0, CPU read with logic 1)

Interrupt-related pins:

  NMI (CPU request for uninterrupted 1MHz access to memory)

  IRQ (signal event to CPU)

  POR (power-on reset, resetting the ULA on a positive edge and asserting the
       CPU's RST pin)

  RST (master reset for the CPU signalled on power-up and by the Break key)

Keyboard-related pins:

  KBD0...KBD3 (keyboard inputs)

  CAPS LOCK (control status LED)

Sound-related pins:

  SOUND O/P (sound output using internal oscillator)

Cassette-related pins:

  CAS IN (cassette circuit input, between 0.5V to 2V peak to peak)

  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)

  CAS RC (detect high tone)

  CAS MO (motor relay output)

  ÷13 IN (~1200 baud clock input)

ULA Socket
----------

The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.

References
----------

See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm

About this Document
-------------------

The most recent version of this document and accompanying distribution should
be available from the following location:

http://hgweb.boddie.org.uk/ULA

Copyright and licence information can be found in the docs directory of this
distribution - see docs/COPYING.txt for more information.