1 The Acorn Electron ULA
2 ======================
3
4 Principal Design and Feature Constraints
5 ----------------------------------------
6
7 The features of the ULA are limited by the amount of time and resources that
8 can be allocated to each activity necessary to support such features given the
9 fundamental obligations of the unit. Maintaining a screen display based on the
10 contents of RAM itself requires the ULA to have exclusive access to such
11 hardware resources for a significant period of time. Whilst other elements of
12 the ULA can in principle run in parallel with this activity, they cannot also
13 access the RAM. Consequently, other features that might use the RAM must
14 accept a reduced allocation of that resource in comparison to a hypothetical
15 architecture where concurrent RAM access is possible.
16
17 Thus, the principal constraint for many features is bandwidth. The duration of
18 access to hardware resources is one aspect of this; the rate at which such
19 resources can be accessed is another. For example, the RAM is not fast enough
20 to support access more frequently than one byte per 2MHz cycle, and for screen
21 modes involving 80 bytes of screen data per scanline, there are no free cycles
22 for anything other than the production of pixel output during the active
23 scanline periods.
24
25 Timing
26 ------
27
28 According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
29 of which are used to generate pixel data. At 50Hz, this means that 128 cycles
30 are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
31 312 ~= 128 cycles). This is consistent with the observation that each scanline
32 requires at most 80 bytes of data, and that the ULA is apparently busy for 40
33 out of 64 microseconds in each scanline.
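
The arithmetic above can be checked with a short Python sketch. The constant
names are illustrative only; the figures are those quoted in the text.

```python
# Figures quoted above: a 2MHz RAM access rate, a 50Hz field rate and
# 312 scanlines per field.
CLOCK_HZ = 2_000_000
FIELD_RATE_HZ = 50
SCANLINES = 312

cycles_per_field = CLOCK_HZ // FIELD_RATE_HZ        # 40000
cycles_per_scanline = cycles_per_field / SCANLINES  # about 128.2

# Taking 128 cycles per scanline gives a 64 microsecond scanline.
scanline_us = 128 * 1_000_000 / CLOCK_HZ

# 80 bytes fetched at one byte per 2MHz cycle occupy 40 microseconds.
busy_us = 80 * 1_000_000 / CLOCK_HZ

print(cycles_per_field, round(cycles_per_scanline, 1), scanline_us, busy_us)
```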
34
(Since the ULA is seeking to provide an image for an interlaced 625-line
display, there are in fact two "fields" involved, one providing 312
scanlines and one providing 313 scanlines. See below for a description of the
video system.)
39
40 Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
41 each providing two bits of each byte) using two cycles within the 500ns period
42 of the 2MHz clock to complete each access operation. Since the CPU and ULA
43 have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
44 effectively run at 1MHz (since every other 500ns period involves the ULA
45 accessing RAM). The CPU is driven by an external clock (IC8) whose 16MHz
46 frequency is divided by the ULA (IC1) depending on the screen mode in use.
47
Each 16MHz cycle lasts 62.5ns. To access the memory, the following
patterns corresponding to 16MHz cycles are required:
50
51 Time (ns): 0-------------- 500------------ ...
52 2 MHz cycle: 0 1 ...
53 16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
54 ~RAS: 0 1 0 1 ...
55 ~CAS: 0 1 0 1 0 1 0 1 ...
56 A B C A B C ...
57 F S F S ...
58 a b c a b c ...
59
60 Here, "A" and "B" respectively indicate the row and first column addresses
61 being latched into the RAM (on a negative edge for ~RAS and ~CAS
62 respectively), and "C" indicates the second column address being latched into
63 the RAM. Presumably, the first and second half-bytes can be read at "F" and
64 "S" respectively, and the row and column addresses must be made available at
65 "a" and "b" (and "c") respectively at the latest.
66
The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that the data
becomes available within about 2.4 16MHz cycles (150ns / 62.5ns) of the row
address being latched, and within about one and a half cycles (90ns / 62.5ns =
1.44) of the column address being latched.
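
As a rough check, the quoted access times can be converted into 16MHz cycles
with a small Python sketch. The helper name is hypothetical, and the figures
are the maximum access times given above.

```python
CYCLE_NS = 1e9 / 16_000_000   # 62.5ns per 16MHz cycle


def in_cycles(nanoseconds):
    """Express a duration in units of 16MHz cycles."""
    return nanoseconds / CYCLE_NS


row_access = in_cycles(150)     # row address access time (maximum)
column_access = in_cycles(90)   # column address access time (maximum)

print(round(row_access, 2), round(column_access, 2))
```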
71
72 Note that the Service Manual refers to the negative edge of RAS and CAS, but
73 the datasheet for the similar TM4164EC4 product shows latching on the negative
74 edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
75 communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
76 "page mode" provides the appropriate behaviour for that particular product.
77
78 The CPU, when accessing the RAM alone, apparently does not make use of the
79 vacated "slot" that the ULA would otherwise use (when interleaving accesses in
80 MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
81 accessing ROM (and potentially sideways RAM).
82
83 See: Acorn Electron Advanced User Guide
84 See: Acorn Electron Service Manual
85 http://acorn.chriswhy.co.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
86 See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
87 See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
88
89 Bandwidth Figures
90 -----------------
91
92 Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
93 total lines, with 80 cycles occurring in the active periods of display
94 scanlines, the following bandwidth calculations can be performed:
95
Total theoretical maximum:
    128 cycles * 312 lines
    = 39936 bytes

MODE 0, 1, 2:
    ULA: 80 cycles * 256 lines
         = 20480 bytes
    CPU: 48 cycles / 2 * 256 lines
         + 128 cycles / 2 * (312 - 256) lines
         = 9728 bytes

MODE 3:
    ULA: 80 cycles * 24 rows * 8 lines
         = 15360 bytes
    CPU: 48 cycles / 2 * 24 rows * 8 lines
         + 128 cycles / 2 * (312 - (24 rows * 8 lines))
         = 12288 bytes

MODE 4, 5:
    ULA: 40 cycles * 256 lines
         = 10240 bytes
    CPU: (40 cycles + 48 cycles / 2) * 256 lines
         + 128 cycles / 2 * (312 - 256) lines
         = 19968 bytes

MODE 6:
    ULA: 40 cycles * 24 rows * 8 lines
         = 7680 bytes
    CPU: (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
         + 128 cycles / 2 * (312 - (24 rows * 8 lines))
         = 19968 bytes
127
Here, the division by 2 for CPU accesses is performed to indicate that the CPU
only uses every other access opportunity even in uncontended periods. See the
2MHz RAM Access enhancement below for bandwidth calculations that consider
this limitation removed.
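
The calculations above can be reproduced with the following Python sketch. The
function and constant names are illustrative, and the line counts for MODE 3
and MODE 6 are taken as stated above (24 rows of 8 lines).

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
TOTAL_LINES = 312
ACTIVE_CYCLES = 80      # cycles in the active period of a display line


def bandwidth(ula_cycles, display_lines):
    """Return (ULA bytes, CPU bytes) per field for the standard ULA."""
    ula = ula_cycles * display_lines
    # Active-period slots not used by the ULA (40 in MODE 4 to 6, else 0).
    shared = ACTIVE_CYCLES - ula_cycles
    # Outside the active period the CPU uses only every other slot.
    idle = (CYCLES_PER_LINE - ACTIVE_CYCLES) // 2
    cpu = (shared + idle) * display_lines
    cpu += CYCLES_PER_LINE // 2 * (TOTAL_LINES - display_lines)
    return ula, cpu


figures = {
    "MODE 0, 1, 2": bandwidth(80, 256),
    "MODE 3": bandwidth(80, 24 * 8),
    "MODE 4, 5": bandwidth(40, 256),
    "MODE 6": bandwidth(40, 24 * 8),
}

for name, (ula, cpu) in figures.items():
    print(f"{name:14} ULA {ula:6} bytes  CPU {cpu:6} bytes")
```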
132
133 Video Timing
134 ------------
135
136 According to 8.7 in the Service Manual, and the PAL Wikipedia page,
137 approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
138 (including the "colour burst"), and 1.65µs for the "front porch", totalling
139 12.05µs and thus leaving 51.95µs for the active video signal for each
140 scanline. As the Service Manual suggests in the oscilloscope traces, the
141 display information is transmitted more or less centred within the active
142 video period since the ULA will only be providing pixel data for 40µs in each
143 scanline.
144
145 Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
146 each scanline can be divided into 1024 cycles, although only 640 at most are
147 actively used to provide pixel data. Pixel data production should only occur
148 within a certain period on each scanline, approximately 262 cycles after the
149 start of hsync:
150
active video period               = 51.95µs
pixel data period                 = 40µs
total silent period               = 51.95µs - 40µs = 11.95µs
silent periods (before and after) = 11.95µs / 2 = 5.975µs
hsync and back porch period       = 4.7µs + 5.7µs = 10.4µs
time before pixel data period     = 10.4µs + 5.975µs = 16.375µs
pixel data period start cycle     = 16.375µs / 62.5ns = 262
158
159 By choosing a number divisible by 8, the RAM access mechanism can be
160 synchronised with the pixel production. Thus, 264 is a more appropriate start
161 cycle.
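
The start cycle derivation can be sketched in Python as follows. The variable
names are illustrative; the timings are those quoted above, and rounding up to
the next multiple of 8 is one way of arriving at the suggested value.

```python
SYNC_US, BACK_PORCH_US, FRONT_PORCH_US = 4.7, 5.7, 1.65
LINE_US, PIXEL_US = 64.0, 40.0
CYCLE_US = LINE_US / 1024   # one 16MHz cycle, 62.5ns

active_us = LINE_US - (SYNC_US + BACK_PORCH_US + FRONT_PORCH_US)
margin_us = (active_us - PIXEL_US) / 2
start_us = SYNC_US + BACK_PORCH_US + margin_us
start_cycle = round(start_us / CYCLE_US)

# Round up to the next multiple of 8 so that pixel production lines up
# with the eight 16MHz cycles of each RAM access period.
aligned_start = -(-start_cycle // 8) * 8

print(start_cycle, aligned_start)
```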
162
The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync), which also lasts for 2.5
lines. Thus, the first visible scanline on the first field of a frame occurs
half way through the 23rd scanline period measured from the start of vsync:
168
169 10 20 23
170 Line in frame: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
171 Line from 1: 0 22 3
172 Line on screen: .:::::VVVVV::::: 12233445566
173 |_________________________________________________|
174 25 line vertical blanking period
175
176 In the second field of a frame, the first visible scanline coincides with the
177 24th scanline period measured from the start of line 313 in the frame:
178
179 310 336
180 Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
181 Line from 313: 0 23 4
182 Line on screen: 88:::::VVVVV:::: 11223344
183 288 | |
184 |_________________________________________________|
185 25 line vertical blanking period
186
187 In order to consider only full lines, we might consider the start of each
188 frame to occur 23 lines after the start of vsync.
189
190 Again, it is likely that pixel data production should only occur on scanlines
191 within a certain period on each frame. The "625/50" document indicates that
192 only a certain region is "safe" to use, suggesting a vertically centred region
193 with approximately 15 blank lines above and below the picture. Thus, the start
194 of the picture could be chosen as 38 lines after the start of vsync.
195
196 See: http://en.wikipedia.org/wiki/PAL
197 See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
198 See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
199 http://lipas.uwasa.fi/~f76998/video/modes/
200 See: PAL TV timing and voltages
201 http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
202 See: Line Standards
203 http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
204
205 RAM Integrated Circuits
206 -----------------------
207
208 Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
209 CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
210 available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
211 have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
212 ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.
213
The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM4164 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.
217
218 The TM4164EC4 series combines 4 64K x 1b units into a single package and
219 appears similar to the TM4164EA4 featured on the Electron's circuit diagram
220 (in the Advanced User Guide but not the Service Manual), and it also has 22
221 pins providing 3 additional inputs and 3 additional outputs over the 16 pins
222 of the individual 4164-15 modules, presumably allowing concurrent access to
223 the packaged memory units.
224
225 As far as currently available replacements are concerned, the NTE4164 is a
226 potential candidate: according to the Vetco Electronics entry, it is
227 supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
228 parts include the NTE2164 and the NTE6664, both of which appear to have
229 largely the same performance and connection characteristics. Meanwhile, the
230 NTE21256 appears to be a 16-pin replacement with four times the capacity that
231 maintains the single data input and output pins. Using the NTE21256 as a
232 replacement for all ICs combined would be difficult because of the single bit
233 output.
234
235 Another device equivalent to the 4164-15 appears to be available under the
236 code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
237 site lists data sheets for other devices on the same page, but these are
238 different and actually appear to be provided under the 41574 product code (but
239 are listed under 41464-10) and appear to be replacements for the TM4164EC4:
240 the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
241 employing 4 pins for both input and output.
242
            Pins  I/O pins  Row access  Column access
            ----  --------  ----------  -------------
TM4164EC4   22    4 + 4     150ns (15)  90ns (15)
KM41464AP   18    4         150ns (15)  75ns (15)
NTE21256    16    1 + 1     150ns       75ns
HYB 4164-2  16    1 + 1     150ns       100ns
µPD41464    18    4         120ns (12)  60ns (12)
250
251 See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
252 http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
253 See: Dynamic RAMS
254 http://www.unicornelectronics.com/IC/DYNAMIC.html
255 See: New old stock 8x 4164 chips
256 http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
257 See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
258 http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
259 See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
260 http://www.vetco.net/catalog/product_info.php?products_id=2806
261 See: NTE4164 - IC-NMOS 64K DRAM 150NS
262 http://www.vetco.net/catalog/product_info.php?products_id=3680
263 See: NTE21256 - IC-256K DRAM 150NS
264 http://www.vetco.net/catalog/product_info.php?products_id=2799
265 See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
266 http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
267 See: NTE6664 - IC-MOS 64K DRAM 150NS
268 http://www.vetco.net/catalog/product_info.php?products_id=5213
269 See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
270 http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
271 See: 4164-150: MAJOR BRANDS
272 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
273 See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
274 http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
275 See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
276 http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
277 See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
278 http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
279 See: 41464-10: MAJOR BRANDS
280 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1
281
282 Interrupts
283 ----------
284
285 The ULA generates IRQs (maskable interrupts) according to certain conditions
286 and these conditions are controlled by location &FE00:
287
288 * Vertical sync (bottom of displayed screen)
* 50Hz real time clock
290 * Transmit data empty
291 * Receive data full
292 * High tone detect
293
294 The ULA is also used to clear interrupt conditions through location &FE05. Of
295 particular significance is bit 7, which must be set if an NMI (non-maskable
296 interrupt) has occurred and has thus suspended ULA access to memory, restoring
297 the normal function of the ULA.
298
299 ROM Paging
300 ----------
301
302 Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
303 mappings exist:
304
305 8 keyboard
306 9 keyboard (duplicate)
307 10 BASIC ROM
308 11 BASIC ROM (duplicate)
309
310 Paging in a ROM involves the following procedure:
311
312 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
313 2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
314 selected.
315 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
316 whilst writing the desired ROM number n in bits 0 to 2.
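
The procedure above might be sketched in Python as follows. The function name
is hypothetical, only the low four bits of the value written to &FE05 are
modelled (the interrupt clearing bits are ignored), and the choice of ROM 12
as the intermediate selection is an arbitrary assumption. ROMs 8 to 11 are
left out because the procedure above does not describe them.

```python
ROM_PAGE_ENABLE = 0x08   # bit 3 of the value written to &FE05


def paging_writes(rom):
    """Return the low-nibble values to write to &FE05, in order."""
    if 12 <= rom <= 15:
        # Step 1 alone: page enable plus n, selecting ROM 8+n.
        return [ROM_PAGE_ENABLE | (rom - 8)]
    if 0 <= rom <= 7:
        # Step 1 (here selecting ROM 12), then step 2 with bit 3 clear.
        return [ROM_PAGE_ENABLE | (12 - 8), rom]
    raise ValueError("ROM number not covered by this sketch: %d" % rom)


print([hex(value) for value in paging_writes(13)])
print([hex(value) for value in paging_writes(2)])
```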
317
318 Shadow/Expanded Memory
319 ----------------------
320
321 The Electron exposes all sixteen address lines and all eight data lines
322 through the expansion bus. Using such lines, it is possible to provide
323 additional memory - typically sideways ROM and RAM - on expansion cards and
324 through cartridges, although the official cartridge specification provides
325 fewer address lines and only seeks to provide access to memory in 16K units.
326
327 Various modifications and upgrades were developed to offer "turbo"
328 capabilities to the Electron, permitting the CPU to access a separate 8K of
329 RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
330 the ULA through additional logic. However, an enhanced ULA might support
331 independent CPU access to memory over the expansion bus by allowing itself to
332 be discharged from providing access to memory, potentially for a range of
333 addresses, and for the CPU to communicate with external memory uninterrupted.
334
335 Sideways RAM/ROM and Upper Memory Access
336 ----------------------------------------
337
Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above (the upper region of memory)
at 2MHz, independently of any access to RAM that the ULA might be performing.
It only blocks the CPU, by stopping or stalling its clock, if the CPU attempts
to access addresses of &7FFF and below (the lower region of memory) during a
ULA memory access.
345
346 Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
347 CPU clock if the line goes low, when the CPU is attempting to access the lower
348 region of memory.
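
The decision described above amounts to a simple predicate, sketched here in
Python. The function name and its arguments are hypothetical and say nothing
about how the real ULA implements the logic.

```python
def cpu_clock_inhibited(address, ula_accessing_ram):
    """Return True if the CPU clock would be held for this access."""
    a15 = (address >> 15) & 1
    # Only accesses with A15 low (the lower region of memory) contend
    # with the ULA.
    return ula_accessing_ram and a15 == 0


print(cpu_clock_inhibited(0x3000, True), cpu_clock_inhibited(0x8000, True))
```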
349
350 Hardware Scrolling
351 ------------------
352
On the standard ULA, &FE02 and &FE03 map to an address with 9 significant bits
(bits 14 to 6), the remaining low-order bits being zero, thus limiting the
scrolling resolution to 64 bytes. An enhanced ULA could support a resolution
of 2 bytes using the same layout of these addresses.
357
358 |--&FE02--------------| |--&FE03--------------|
359 XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX
360
361 XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX
362
Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change in 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).
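
The pixel figures follow from the number of pixels held in each byte, as this
Python sketch illustrates. The bits-per-pixel table reflects the standard mode
definitions and is an assumption not stated above; the function name is
hypothetical.

```python
# Bits per pixel for each standard screen mode (assumed, not from the text).
BITS_PER_PIXEL = {0: 1, 1: 2, 2: 4, 3: 1, 4: 1, 5: 2, 6: 1}


def horizontal_step_pixels(mode):
    """Pixels moved horizontally by an 8 byte change of screen address.

    Eight consecutive bytes cover the eight lines of one character cell
    column, so the step is the number of pixels held in a single byte.
    """
    return 8 // BITS_PER_PIXEL[mode]


for mode in sorted(BITS_PER_PIXEL):
    print("MODE", mode, horizontal_step_pixels(mode), "pixels")
```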
369
370 One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
371 of changing the screen address by 2 bytes is the change in the number of lines
372 from the initial and final character rows that need reading by the ULA, which
373 would need to maintain this state information (although this is a relatively
374 trivial change). Another pitfall is the complication that might be introduced
375 to software writing bitmaps of character height to the screen.
376
377 Enhancement: 2MHz RAM Access
378 ----------------------------
379
380 Given that the CPU and ULA both access RAM at 2MHz, but given that the CPU
381 when not competing with the ULA only accesses RAM every other 2MHz cycle (as
382 if the ULA still needed to access the RAM), one useful enhancement would be a
383 mechanism to let the CPU take over the ULA cycles outside the ULA's period of
384 activity comparable to the way the ULA takes over the CPU cycles in MODE 0 to
385 3.
386
387 Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
388
389 Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
390 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
391
392 In MODE 4 to 6:
393
394 Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
395 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
396
397 This would improve CPU bandwidth as follows:
398
              Standard ULA  Enhanced ULA
MODE 0, 1, 2   9728 bytes   19456 bytes
MODE 3        12288 bytes   24576 bytes
MODE 4, 5     19968 bytes   29696 bytes
MODE 6        19968 bytes   32256 bytes
404
405 With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
406 because all access opportunities to RAM are doubled. Meanwhile, in the other
407 modes, some CPU accesses occur alongside ULA accesses and thus cannot be
408 doubled, but the CPU bandwidth increase is still significant.
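
The figures for both columns can be reproduced with the following Python
sketch, which assumes 128 cycles per scanline, 312 lines and an 80 cycle
active period as in the bandwidth figures given earlier. The function name and
the "enhanced" flag are illustrative only.

```python
CYCLES_PER_LINE = 128
TOTAL_LINES = 312
ACTIVE_CYCLES = 80


def cpu_bandwidth(ula_cycles, display_lines, enhanced):
    """CPU bytes per field with or without the 2MHz RAM access enhancement."""
    # The standard CPU uses every other uncontended slot; the enhanced
    # arrangement lets it use all of them.
    divisor = 1 if enhanced else 2
    shared = ACTIVE_CYCLES - ula_cycles   # interleaved slots, MODE 4 to 6
    idle = (CYCLES_PER_LINE - ACTIVE_CYCLES) // divisor
    cpu = (shared + idle) * display_lines
    cpu += CYCLES_PER_LINE // divisor * (TOTAL_LINES - display_lines)
    return cpu


modes = {
    "MODE 0, 1, 2": (80, 256),
    "MODE 3": (80, 24 * 8),
    "MODE 4, 5": (40, 256),
    "MODE 6": (40, 24 * 8),
}

for name, (ula_cycles, lines) in modes.items():
    standard = cpu_bandwidth(ula_cycles, lines, enhanced=False)
    improved = cpu_bandwidth(ula_cycles, lines, enhanced=True)
    print(f"{name:14} {standard:6} bytes {improved:6} bytes")
```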
409
410 Enhancement: Region Blanking
411 ----------------------------
412
413 The problem of permitting character-oriented blitting in programs whilst
414 scrolling the screen by sub-character amounts could be mitigated by permitting
415 a region of the display to be blank, such as the final lines of the display.
416 Consider the following vertical scrolling by 2 bytes that would cause an
417 initial character row of 6 lines and a final character row of 2 lines:
418
419 6 lines - initial, partial character row
420 248 lines - 31 complete rows
421 2 lines - final, partial character row
422
423 If a routine were in use that wrote 8 line bitmaps to the partial character
424 row now split in two, it would be advisable to hide one of the regions in
425 order to prevent content appearing in the wrong place on screen (such as
426 content meant to appear at the top "leaking" onto the bottom). Blanking 6
427 lines would be sufficient, as can be seen from the following cases.
428
429 Scrolling up by 2 lines:
430
431 6 lines - initial, partial character row
432 240 lines - 30 complete rows
433 4 lines - part of 1 complete row
434 -----------------------------------------------------------------
435 4 lines - part of 1 complete row (hidden to maintain 250 lines)
436 2 lines - final, partial character row (hidden)
437
438 Scrolling down by 2 lines:
439
440 2 lines - initial, partial character row
441 248 lines - 31 complete rows
442 ----------------------------------------------------------
443 6 lines - final, partial character row (hidden)
444
445 Thus, in this case, region blanking would impose a 250 line display with the
446 bottom 6 lines blank.
447
448 See the description of the display suspend enhancement for a more efficient
449 way of blanking lines than merely blanking the palette whilst allowing the CPU
450 to perform useful work during the blanking period.
451
To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 15 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.
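
A single-location encoding of this kind might be sketched in Python as
follows, with each 4-bit field holding a line count from 0 to 15. The function
names are hypothetical, and no such register exists on the standard ULA.

```python
def pack_blanking(top_lines, bottom_lines):
    """Combine top and bottom blanking line counts into one byte."""
    if not (0 <= top_lines <= 15 and 0 <= bottom_lines <= 15):
        raise ValueError("each region is limited to 15 lines by its 4 bits")
    return (top_lines << 4) | bottom_lines


def unpack_blanking(value):
    """Return (top_lines, bottom_lines) from a packed byte."""
    return (value >> 4) & 0x0F, value & 0x0F


# The 250 line display described above: no lines blanked at the top and
# six blanked at the bottom.
print(hex(pack_blanking(0, 6)), unpack_blanking(0x06))
```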
458
459 Enhancement: Screen Height Adjustment
460 -------------------------------------
461
462 The height of the screen could be configurable in order to reduce screen
463 memory consumption. This is not quite done in MODE 3 and 6 since the start of
464 the screen appears to be rounded down to the nearest page, but by reducing the
465 height by amounts more than a page, savings would be possible. For example:
466
Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
------------  -----  ------  --------------  ---------------  --------------
640           1      252     80              320              &3140 -> &3100
640           1      248     80              640              &3280 -> &3200
320           1      240     40              640              &5A80 -> &5A00
320           2      240     80              1280             &3500
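
The table entries can be derived with a short Python sketch. The function name
is hypothetical, and the unreduced start addresses (&3000 for the 20K modes
and &5800 for the 10K modes) are assumptions based on the standard screen
layout rather than figures given above.

```python
def height_saving(start, bytes_per_line, height, full_height=256):
    """Return (saving, new start, new start rounded down to a page)."""
    saving = (full_height - height) * bytes_per_line
    new_start = start + saving
    page_start = new_start & ~0xFF   # pages are 256 bytes
    return saving, new_start, page_start


for start, bytes_per_line, height in [
    (0x3000, 80, 252),
    (0x3000, 80, 248),
    (0x5800, 40, 240),
    (0x3000, 80, 240),
]:
    saving, new_start, page_start = height_saving(start, bytes_per_line, height)
    print(height, saving, "&%04X" % new_start, "&%04X" % page_start)
```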
473
474 Screen Mode Selection
475 ---------------------
476
Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially being made available for use.
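
Extracting and updating the mode field can be sketched in Python as follows.
The helper names are hypothetical, and the meaning of the other bits is left
untouched.

```python
MODE_SHIFT = 3
MODE_MASK = 0b111   # bits 3, 4 and 5 once shifted down


def mode_from_register(value):
    """Return the screen mode field held in a &FE*7 register value."""
    return (value >> MODE_SHIFT) & MODE_MASK


def with_mode(value, mode):
    """Return the register value with only the mode field replaced."""
    if not 0 <= mode <= MODE_MASK:
        raise ValueError("the mode field is only three bits wide")
    cleared = value & ~(MODE_MASK << MODE_SHIFT) & 0xFF
    return cleared | (mode << MODE_SHIFT)


print(mode_from_register(0b00101000), bin(with_mode(0xFF, 2)))
```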
481
482 Enhancement: Palette Definition
483 -------------------------------
484
485 Since all memory accesses go via the ULA, an enhanced ULA could employ more
486 specific addresses than &FE*X to perform enhanced functions. For example, the
487 palette control is done using &FE*8-F and merely involves selecting predefined
488 colours, whereas an enhanced ULA could support the redefinition of all 16
489 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
490 (colours 8 to 15), where a single byte might provide 8 bits per pixel colour
491 specifications similar to those used on the Archimedes.
492
493 The principal limitation here is actually the hardware: the Electron has only
494 a single output line for each of the red, green and blue channels, and if
495 those outputs are strictly digital and can only be set to a "high" and "low"
496 value, then only the existing eight colours are possible. If a modern ULA were
497 able to output analogue values, it would still need to be assessed whether the
498 circuitry could successfully handle and propagate such values. Various sources
499 indicate that only "TTL levels" are supported by the RGB output circuit, and
500 since there are 74LS08 AND logic gates involved in the RGB component outputs
501 from the ULA, it is likely that the ULA is expected to provide only "high" or
502 "low" values.
503
504 Short of adding extra outputs from the ULA (either additional red, green and
505 blue outputs or a combined intensity output, the former employed on the
506 Amstrad CPC series), another approach might involve some kind of modulation
507 where an output value might be encoded in multiple pulses at a higher
508 frequency than the pixel frequency. However, this would demand additional
509 circuitry outside the ULA, and component RGB monitors would probably not be
510 able to take advantage of this feature; only UHF and composite video devices
511 (the latter with the composite video colour support enabled on the Electron's
512 circuit board) would potentially benefit.
513
514 Flashing Colours
515 ----------------
516
517 According to the Advanced User Guide, "The cursor and flashing colours are
518 entirely generated in software: This means that all of the logical to physical
519 colour map must be changed to cause colours to flash." This appears to suggest
520 that the palette registers must be updated upon the flash counter - read and
521 written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
522 colour pairs to be any combination of colours might be possible, instead of
523 having colour complements as pairs.
524
525 It is conceivable that the interrupt code responsible does the simple thing
526 and merely inverts the current values for any logical colours (LC) for which
527 the associated physical colour (as supplied as the second parameter to the VDU
528 19 call) has the top bit of its four bit value set. These top bits are not
529 recorded in the palette registers but are presumably recorded separately and
530 used to build bitmaps as follows:
531
LC  2 colour  4 colour  16 colour  4-bit value for inversion
--  --------  --------  ---------  -------------------------
 0  00010001  00010001  00010001   1, 1, 1
 1  01000100  00100010  00010001   4, 2, 1
 2            01000100  00100010   4, 2
 3            10001000  00100010   8, 2
 4                      00010001   1
 5                      00010001   1
 6                      00100010   2
 7                      00100010   2
 8                      01000100   4
 9                      01000100   4
10                      10001000   8
11                      10001000   8
12                      01000100   4
13                      01000100   4
14                      10001000   8
15                      10001000   8
550
551 Inversion value calculation:
552
2 colour formula:   1 << (colour * 2)
4 colour formula:   1 << colour
16 colour formula:  1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
556
557 For example, where logical colour 0 has been mapped to a physical colour in
558 the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
559 the inversion operation. (The lower three bits of the physical colour would be
560 used to set the underlying colour information affected by the inversion
561 operation.)
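
The inversion values and bitmaps in the table can be generated with the
following Python sketch. The function names are hypothetical, and the 16
colour expression is written so as to reproduce the tabulated values; the
bitmap is simply the 4-bit value repeated in both halves of the byte, as the
table suggests.

```python
def inversion_value(colour, colours_in_mode):
    """Return the 4-bit inversion value for a logical colour."""
    if colours_in_mode == 2:
        return 1 << (colour * 2)
    if colours_in_mode == 4:
        return 1 << colour
    if colours_in_mode == 16:
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("unsupported number of colours")


def inversion_bitmap(colour, colours_in_mode):
    """Return the 8-bit bitmap with the value in both nibbles."""
    value = inversion_value(colour, colours_in_mode)
    return value | (value << 4)


for colour in range(16):
    print(colour, format(inversion_bitmap(colour, 16), "08b"))
```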
562
563 An operation in the interrupt code would then combine the bitmaps for all
564 logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
565 combined for groups of logical colours as follows:
566
567 Logical colours
568 ---------------
569 0, 2, 8, 10
570 4, 6, 12, 14
571 5, 7, 13, 15
572 1, 3, 9, 11
573
574 These combined bitmaps would be EORed with the existing palette register
575 values in order to perform the value inversion necessary to produce the
576 flashing effect.
577
578 Thus, in the VDU 19 operation, the appropriate inversion value would be
579 calculated for the logical colour, and this value would then be combined with
580 other inversion values in a dedicated memory location corresponding to the
581 colour's group as indicated above. Meanwhile, the palette channel values would
582 be derived from the lower three bits of the specified physical colour and
583 combined with other palette data in dedicated memory locations corresponding
584 to the palette registers.
585
586 Interestingly, although flashing colours on the BBC Micro are controlled by
587 toggling bit 0 of the &FE20 control register location for the Video ULA, the
588 actual colour inversion is done in hardware.
589
590 Enhancement: Palette Definition Lists
591 -------------------------------------
592
593 It can be useful to redefine the palette in order to change the colours
594 available for a particular region of the screen, particularly in modes where
595 the choice of colours is constrained, and if an increased colour depth were
596 available, palette redefinition would be useful to give the illusion of more
597 than 16 colours in MODE 2. Traditionally, palette redefinition has been done
598 by using interrupt-driven timers, but a more efficient approach would involve
599 presenting lists of palette definitions to the ULA so that it can change the
600 palette at a particular display line.
601
602 One might define a palette redefinition list in a region of memory and then
603 communicate its contents to the ULA by writing the address and length of the
604 list, along with the display line at which the palette is to be changed, to
605 ULA registers such that the ULA buffers the list and performs the redefinition
606 at the appropriate time. Throughput/bandwidth considerations might impose
607 restrictions on the practical length of such a list, however.
608
609 Enhancement: Palette-Free Modes
610 -------------------------------
611
612 Palette-free modes might be defined where bit values directly correspond to
613 the red, green and blue channels, although this would mostly make sense only
614 for modes with depths greater than the standard 4 bits per pixel, and such
615 modes would require more memory than MODE 2 if they were to have an acceptable
616 resolution.
617
618 Enhancement: Display Suspend
619 ----------------------------
620
621 Especially when writing to the screen memory, it could be beneficial to be
622 able to suspend the ULA's access to the memory, instead producing blank values
623 for all screen pixels until a program is ready to reveal the screen. This is
624 different from palette blanking since with a blank palette, the ULA is still
625 reading screen memory and translating its contents into pixel values that end
626 up being blank.
627
628 This function is reminiscent of a capability of the ZX81, albeit necessary on
629 that hardware to reduce the load on the system CPU which was responsible for
630 producing the video output. By allowing display suspend on the Electron, the
631 performance benefit would be derived from giving the CPU full access to the
632 memory bandwidth.
633
634 The region blanking feature mentioned above could be implemented using this
635 enhancement instead of employing palette blanking for the affected lines of
636 the display.
637
638 Enhancement: Memory Filling
639 ---------------------------
640
A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
643 such a capability would probably not be useful in conjunction with the
644 existing read operations when producing a screen display, and insufficient
645 bandwidth would exist to do so in high-bandwidth screen modes anyway, the
646 capability could be offered during a display suspend period (as described
647 above), permitting a more efficient mechanism to rapidly fill memory with a
648 predetermined value.
649
650 This capability could also support block filling, where the limits of the
651 filled memory would be defined by the position and size of a screen area,
652 although this would demand the provision of additional registers in the ULA to
653 retain the details of such areas and additional logic to control the fill
654 operation.
655
656 Enhancement: Region Filling
657 ---------------------------
658
659 An alternative to memory writing might involve indicating regions using
660 additional registers or memory where the ULA fills regions of the screen with
661 content instead of reading from memory. Unlike hardware sprites which should
662 realistically provide varied content, region filling could employ single
663 colours or patterns, and one advantage of doing so would be that the ULA need
664 not access memory at all within a particular region.
665
666 Regions would be defined on a row-by-row basis. Instead of reading memory and
667 blitting a direct representation to the screen, the ULA would read region
668 definitions containing a start column, region width and colour details. There
669 might be a certain number of definitions allowed per row, or the ULA might
670 just traverse an ordered list of such definitions with each one indicating the
671 row, start column, region width and colour details.
672
673 One could even compress this information further by requiring only the row,
674 start column and colour details with each subsequent definition terminating
675 the effect of the previous one. However, one would also need to consider the
676 convenience of preparing such definitions and whether efficient access to
677 definitions for a particular row might be desirable. It might also be
678 desirable to avoid having to prepare definitions for "empty" areas of the
679 screen, effectively making the definition of the screen contents employ
680 run-length encoding and employ only colour plus length information.
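
The run-length form described above might be sketched in Python as follows.
The (colour, length) tuple layout, the 640-pixel row width and the function
name are assumptions made purely for illustration; no register or memory
format is defined here.

```python
def expand_row(runs, width=640, background=0):
    """Expand (colour, length) runs into a list of pixel colours for one row.

    Runs are consumed from left to right. Anything extending beyond `width`
    is clipped, and any unfilled remainder is padded with `background`.
    """
    pixels = []
    for colour, length in runs:
        remaining = width - len(pixels)
        if remaining <= 0:
            break
        pixels.extend([colour] * min(length, remaining))
    pixels.extend([background] * (width - len(pixels)))
    return pixels

# Three hypothetical definitions describing a complete 640-pixel row.
row = expand_row([(0, 100), (3, 200), (1, 340)])
```

In this hypothetical encoding, a row made of a few solid spans needs only a
handful of definitions, whereas a row of alternating single pixels would need
one definition per pixel.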
681
682 One application of region filling is that of simple 2D and 3D shape rendering.
683 Although it is entirely possible to plot such shapes to the screen and have
684 the ULA blit the memory contents to the screen, such operations consume
685 bandwidth both in the initial plotting and in the final transfer to the
686 screen. Region filling would reduce such bandwidth usage substantially.
687
688 This way of representing screen images would make certain kinds of images
689 unfeasible to represent - consider alternating single pixel values which could
690 easily occur in some character bitmaps - even if an internal queue of regions
691 were to be supported such that the ULA could read ahead and buffer such
692 "bandwidth intensive" areas. Thus, the ULA might be better served providing
693 this feature for certain areas of the display only as some kind of special
694 graphics window.
695
696 Enhancement: Hardware Sprites
697 -----------------------------
698
An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
701 available for allocation. To keep the facility simple, hardware sprites would
702 have a standard byte width and height.
703
704 The specification of sprites could involve the reservation of 16 locations
705 (for example, &FE20-F) specifying a fixed number of eight sprites, with each
706 location pair referring to the sprite data. By limiting the ULA to dealing
707 with a fixed number of sprites, the work required inside the ULA would be
708 reduced since it would avoid having to deal with arbitrary numbers of sprites.
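
As a rough sketch of how such a register block might be interpreted, the
following Python function decodes 16 register bytes into eight sprite data
addresses. The low-byte-first ordering of each pair is an assumption (it is
merely conventional on the 6502), as is the function name.

```python
def sprite_addresses(registers):
    """Decode 16 register bytes into eight 16-bit sprite data addresses.

    Each consecutive pair of bytes is assumed to hold an address with the
    low byte first.
    """
    if len(registers) != 16:
        raise ValueError("expected 16 register bytes")
    return [registers[i] | (registers[i + 1] << 8) for i in range(0, 16, 2)]

# Hypothetical register contents: two sprites defined, six left at zero.
addresses = sprite_addresses([0x00, 0x30, 0x40, 0x30] + [0] * 12)
```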
709
710 The principal limitation on providing hardware sprites is that of having to
711 obtain sprite data, given that the ULA is usually required to retrieve screen
712 data, and given the lack of memory bandwidth available to retrieve sprite data
713 (particularly from multiple sprites supposedly at the same position) and
714 screen data simultaneously. Although the ULA could potentially read sprite
715 data and screen data in alternate memory accesses in screen modes where the
716 bandwidth is not already fully utilised, this would result in a degradation of
717 performance.
718
719 Enhancement: Additional Screen Mode Configurations
720 --------------------------------------------------
721
722 Alternative screen mode configurations could be supported. The ULA has to
723 produce 640 pixel values across the screen, with pixel doubling or quadrupling
724 employed to fill the screen width:
725
Screen width  Columns  Scaling  Depth  Bytes
------------  -------  -------  -----  ------
640           80       x1       1      80
320           40       x2       1, 2   40, 80
160           20       x4       2, 4   40, 80
731
732 It must also use at most 80 byte-sized memory accesses to provide the
733 information for the display. Given that characters must occupy an 8x8 pixel
734 array, if a configuration featuring anything other than 20, 40 or 80 character
735 columns is to be supported, compromises must be made such as the introduction
736 of blank pixels either between characters (such as occurs between rows in MODE
737 3 and 6) or at the end of a scanline (such as occurs at the end of the frame
738 in MODE 3 and 6). Consider the following configuration:
739
Screen width  Columns  Scaling  Depth  Bytes   Blank
------------  -------  -------  -----  ------  -----
208           26       x3       1, 2   26, 52  16
743
744 Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
745 colours could be provided, with 16 blank pixel values (out of a total of 640)
746 generated either at the start or end (or split between the start and end) of
747 each scanline.
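
The figures in these tables follow from simple arithmetic, sketched here in
Python. The function and parameter names are illustrative only; the constants
(640 pixel values per scanline, 8-pixel-wide characters) are those stated
above.

```python
def mode_geometry(scaling, depth, line_pixels=640, char_width=8):
    """Return (screen width, columns, bytes per scanline, blank pixels).

    `scaling` is the pixel multiplication factor and `depth` the number of
    bits per pixel.
    """
    columns = line_pixels // (scaling * char_width)
    width = columns * char_width
    bytes_per_line = columns * depth      # 8 pixels at `depth` bits each
    blank = line_pixels - width * scaling
    return width, columns, bytes_per_line, blank

# The 26-column configuration with pixel tripling, at 1 and 2 bits per pixel.
two_colour = mode_geometry(3, 1)
four_colour = mode_geometry(3, 2)
```

Both results stay within the limit of 80 byte-sized memory accesses per
scanline.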
748
749 Enhancement: Character Attributes
750 ---------------------------------
751
752 The BBC Micro MODE 7 employs something resembling character attributes to
753 support teletext displays, but depends on circuitry providing a character
754 generator. The ZX Spectrum, on the other hand, provides character attributes
755 as a means of colouring bitmapped graphics. Although such a feature is very
756 limiting as the sole means of providing multicolour graphics, in situations
757 where the choice is between low resolution multicolour graphics or high
758 resolution monochrome graphics, character attributes provide a potentially
759 useful compromise.
760
761 For each byte read, the ULA must deliver 8 pixel values (out of a total of
762 640) to the video output, doing so by either emptying its pixel buffer on a
763 pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:
765
Cycle:   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Reads:   B                               B
Pixels:  0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
769
770 And for a screen mode having 320 pixels in width:
771
Cycle:   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Reads:   B
Pixels:  0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
775
However, in modes where fewer than 80 bytes are required to generate the pixel
777 values, an enhanced ULA might be able to read additional bytes between those
778 providing the bitmapped graphics data:
779
Cycle:   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Reads:   B                               A
Pixels:  0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
783
784 These additional bytes could provide colour information for the bitmapped data
785 in the following character column (of 8 pixels). Since it would be desirable
786 to apply attribute data to the first column, the initial 8 cycles might be
787 configured to not produce pixel values.
788
789 For an entire character, attribute data need only be read for the first row of
790 pixels for a character. The subsequent rows would have attribute information
791 applied to them, although this would require the attribute data to be stored
792 in some kind of buffer. Thus, the following access pattern would be observed:
793
794 Cycle: A B ... _ B ... _ B ... _ B ... _ B ... _ B ... _ B ... _ B ...
795
796 A whole byte used for colour information for a whole character would result in
797 a choice of 256 colours, and this might be somewhat excessive. By only reading
798 attribute bytes at every other opportunity, a choice of 16 colours could be
799 applied individually to two characters.
800
Cycle:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Reads:  B               A               B               -
Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
804
805 Further reductions in attribute data access, offering 4 colours for every
806 character in a four character block, for example, might also be worth
807 considering.
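
The sharing of one attribute byte between several characters might be sketched
as follows. The packing order (first character in the most significant bits)
and the function name are assumptions; only the division of 8 bits into equal
fields is implied by the discussion above.

```python
def split_attribute(attribute, characters=2):
    """Split an attribute byte into equal bit fields, one per character.

    The first character is assumed to take the most significant field.
    """
    if characters not in (1, 2, 4, 8):
        raise ValueError("characters must divide 8 bits evenly")
    bits = 8 // characters
    mask = (1 << bits) - 1
    return [(attribute >> (8 - bits * (i + 1))) & mask
            for i in range(characters)]

pair = split_attribute(0xA5, 2)   # two 4-bit fields: 16 colours each
quad = split_attribute(0xA5, 4)   # four 2-bit fields: 4 colours each
```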
808
809 Consider the following configurations for screen modes with a colour depth of
810 1 bit per pixel for bitmap information:
811
Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  -------  ---------  ---------  -------  --------------
320           40       x2       40         40         256      &5300
320           40       x2       40         20         16       &5580 -> &5500
320           40       x2       40         10         4        &56C0 -> &5600
208           26       x3       26         26         256      &62C0 -> &6200
208           26       x3       26         13         16       &6460 -> &6400
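
The screen start addresses above can be reproduced with the following Python
sketch. It assumes 32 character rows (256 pixel lines), attribute bytes read
once per character row, screen memory ending at &8000, and that the arrow in
the table denotes rounding down to a 256-byte page boundary. These assumptions
are inferred from the table values rather than stated elsewhere.

```python
def screen_start(columns, attr_bytes_per_row, rows=32, ram_top=0x8000):
    """Return (exact start, page-aligned start) for a 1 bit per pixel mode."""
    bitmap = columns * rows * 8             # 8 bitmap bytes per character
    attributes = attr_bytes_per_row * rows  # read once per character row
    start = ram_top - (bitmap + attributes)
    return start, start & 0xFF00

starts = [screen_start(40, 40), screen_start(40, 20), screen_start(40, 10),
          screen_start(26, 26), screen_start(26, 13)]
```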
819
820 Enhancement: MODE 7 Emulation using Character Attributes
821 --------------------------------------------------------
822
823 If the scheme of applying attributes to character regions were employed to
824 emulate MODE 7, in conjunction with the MODE 6 display technique, the
825 following configuration would be required:
826
Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  ----  ---------  ---------  -------  --------------
320           40       25    40         20         16       &5ECC -> &5E00
320           40       25    40         10         4        &5FC6 -> &5F00
831
832 Although this requires much more memory than MODE 7 (8500 bytes versus MODE
833 7's 1000 bytes), it does not need much more memory than MODE 6, and it would
834 at least make a limited 40-column multicolour mode available as a substitute
835 for MODE 7.
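
The memory comparison can be checked with a few lines of Python. The names are
illustrative, and the MODE 6 figure counts only the bitmap data for 25 rows of
40 characters.

```python
ROWS, COLUMNS = 25, 40

mode7_bytes = ROWS * COLUMNS        # one byte per character
mode6_bytes = ROWS * COLUMNS * 8    # eight bitmap bytes per character

def emulation_bytes(attr_bytes_per_row):
    """Bitmap data plus attribute bytes read once per character row."""
    return mode6_bytes + attr_bytes_per_row * ROWS

sixteen_colour = emulation_bytes(20)
four_colour = emulation_bytes(10)
```

The 16-colour configuration thus adds 500 bytes to the MODE 6 bitmap, and the
4-colour configuration adds 250 bytes.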
836
837 Enhancement: High Resolution Graphics and Mode Layouts
838 ------------------------------------------------------
839
840 Screen modes with different screen memory mappings, higher resolutions and
841 larger colour depths might be possible, but this would in most cases involve
842 the allocation of more screen memory, and the ULA would probably then be
843 obliged to page in such memory for the CPU to be able to sensibly access it
844 all. Merely changing the memory mappings in order to have Archimedes-style
845 row-oriented screen addresses (instead of character-oriented addresses) could
846 be done for the existing modes, but this might not be sufficiently beneficial,
847 especially since accessing regions of the screen would involve incrementing
848 pointers by amounts that are inconvenient on an 8-bit CPU.
849
850 Enhancement: Genlock Support
851 ----------------------------
852
853 The ULA generates a video signal in conjunction with circuitry producing the
854 output features necessary for the correct display of the screen image.
855 However, it appears that the ULA drives the video synchronisation mechanism
856 instead of reacting to an existing signal. Genlock support might be possible
857 if the ULA were made to be responsive to such external signals, resetting its
858 address generators upon receiving synchronisation events.
859
860 Enhancement: Improved Sound
861 ---------------------------
862
863 The standard ULA reserves &FE*6 for sound generation and cassette input/output
864 (with bits 1 and 2 of &FE*7 being used to select either sound generation or
865 cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro employs &FE40-&FE4F (occupied by its system
VIA rather than by a ULA) for sound control, and an enhanced ULA could adopt
this interface.
868
869 The BBC Micro uses the SN76489 chip to produce sound, and the entire
870 functionality of this chip could be emulated for enhanced sound, with a subset
871 of the functionality exposed via the &FE*6 interface.
872
873 See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
874
875 Enhancement: Waveform Upload
876 ----------------------------
877
As with a hardware sprite function, waveforms could be uploaded to the ULA, or
referenced using locations that act as registers pointing to the memory
regions holding the waveform data.
880
881 Enhancement: Sound Input/Output
882 -------------------------------
883
884 Since the ULA already controls audio input/output for cassette-based data, it
885 would have been interesting to entertain the idea of sampling and output of
886 sounds through the cassette interface. However, a significant amount of
887 circuitry is employed to process the input signal for use by the ULA and to
888 process the output signal for recording.
889
890 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11
891
892 Enhancement: BBC ULA Compatibility
893 ----------------------------------
894
895 Although some new ULA functions could be defined in a way that is also
896 compatible with the BBC Micro, the BBC ULA is itself incompatible with the
897 Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
898 map, but controls various functions specific to the 6845 video controller;
899 &FE08-F is reserved for the serial controller. It therefore becomes possible
900 to disregard compatibility where compatibility is already disregarded for a
901 particular area of functionality.
902
903 &FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE08-F on the
905 Electron) and other system-specific functions. Since the location usage is
906 generally incompatible, this region could be reused for other purposes.
907
908 Enhancement: Increased RAM, ULA and CPU Performance
909 ---------------------------------------------------
910
911 More modern implementations of the hardware might feature faster RAM coupled
912 with an increased ULA clock frequency in order to increase the bandwidth
913 available to the ULA and to the CPU in situations where the ULA is not needed
914 to perform work. A ULA employing a 32MHz clock would be able to complete the
915 retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
916 to access the RAM for the following 250ns even in display modes requiring the
917 retrieval of a byte for the display every 500ns. The CPU could, subject to
918 timing issues, run at 2MHz even in MODE 0, 1 and 2.
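
The timing figures can be illustrated with a small Python calculation. The
count of 8 ULA clock cycles per byte fetch is an assumption chosen to match
the 250ns figure at 32MHz; it is not a documented property of the ULA.

```python
def cycles_to_ns(cycles, clock_hz):
    """Return the duration of `cycles` clock periods in nanoseconds."""
    return cycles * 1_000_000_000 / clock_hz

DISPLAY_SLOT_NS = cycles_to_ns(1, 2_000_000)  # one display byte per 2MHz cycle
FETCH_CYCLES = 8                              # assumed ULA clocks per byte fetch

fetch_16mhz = cycles_to_ns(FETCH_CYCLES, 16_000_000)
fetch_32mhz = cycles_to_ns(FETCH_CYCLES, 32_000_000)
cpu_window_ns = DISPLAY_SLOT_NS - fetch_32mhz  # time left over for the CPU
```

Under this assumption, a fetch at 16MHz occupies the whole 500ns display slot,
whereas a fetch at 32MHz leaves half of each slot free.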
919
920 A scheme such as that described above would have a similar effect to the
921 scheme employed in the BBC Micro, although the latter made use of RAM with a
922 wider bandwidth in order to complete memory transfers within 250ns and thus
923 permit the CPU to run continuously at 2MHz.
924
925 Higher bandwidth could potentially be used to implement exotic features such
926 as RAM-resident hardware sprites or indeed any feature demanding RAM access
927 concurrent with the production of the display image.
928
929 Enhancement: Multiple CPU Stacks
930 --------------------------------
931
932 The 6502 maintains a stack for subroutine calls and register storage in page
933 &01. Although the stack register can be manipulated using the TSX and TXS
934 instructions, thereby permitting the maintenance of multiple stack regions and
935 thus the potential coexistence of multiple programs each using a separate
936 region, only programs that make little use of the stack (perhaps avoiding
937 deeply-nested subroutine invocations and significant register storage) would
938 be able to coexist without overwriting each other's stacks.
939
940 One way that this issue could be alleviated would involve the provision of a
941 facility to redirect accesses to page &01 to other areas of memory. The ULA
942 would provide a register that defines a physical page for the use of the CPU's
943 "logical" page &01, and upon any access to page &01 by the CPU, the ULA would
944 change the asserted address lines to redirect the access to the appropriate
945 physical region.
946
947 By providing an 8-bit register, mapping to the most significant byte (MSB) of
948 a 16-bit address, the ULA could then replace any MSB equal to &01 with the
949 register value before the access is made. Where multiple programs coexist,
950 upon switching programs, the register would be updated to point the ULA to the
951 appropriate stack location, thus providing a simple memory management unit
952 (MMU) capability.
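
The address substitution described above might be modelled as follows. This is
only a sketch of the translation rule; the function name and the example page
value are hypothetical.

```python
def translate(address, stack_page):
    """Redirect any access to logical page &01 to the given physical page.

    `stack_page` models the proposed 8-bit ULA register holding the most
    significant byte to substitute. Other addresses pass through unchanged.
    """
    if (address >> 8) == 0x01:
        return (stack_page << 8) | (address & 0xFF)
    return address

# With the register set to &30, a push to &01FF would land at &30FF.
redirected = translate(0x01FF, 0x30)
```

Setting the register to &01 leaves the standard behaviour unchanged, so a
program unaware of the facility would be unaffected.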
953
954 ULA Pin Functions
955 -----------------
956
957 The functions of the ULA pins are described in the Electron Service Manual. Of
958 interest to video processing are the following:
959
960 CSYNC (low during horizontal or vertical synchronisation periods, high
961 otherwise)
962
963 HS (low during horizontal synchronisation periods, high otherwise)
964
965 RED, GREEN, BLUE (pixel colour outputs)
966
967 CLOCK IN (a 16MHz clock input, 4V peak to peak)
968
PHI OUT (a 1MHz, 2MHz or stopped clock signal for the CPU)
970
971 More general memory access pins:
972
973 RAM0...RAM3 (data lines to/from the RAM)
974
975 RA0...RA7 (address lines for sending both row and column addresses to the RAM)
976
977 RAS (row address strobe setting the row address on a negative edge - see the
978 timing notes)
979
980 CAS (column address strobe setting the column address on a negative edge -
981 see the timing notes)
982
983 WE (sets write enable with logic 0, read with logic 1)
984
985 ROM (select data access from ROM)
986
987 CPU-oriented memory access pins:
988
989 A0...A15 (CPU address lines)
990
991 PD0...PD7 (CPU data lines)
992
993 R/W (indicates CPU write with logic 0, CPU read with logic 1)
994
995 Interrupt-related pins:
996
997 NMI (CPU request for uninterrupted 1MHz access to memory)
998
999 IRQ (signal event to CPU)
1000
1001 POR (power-on reset, resetting the ULA on a positive edge and asserting the
1002 CPU's RST pin)
1003
1004 RST (master reset for the CPU signalled on power-up and by the Break key)
1005
1006 Keyboard-related pins:
1007
1008 KBD0...KBD3 (keyboard inputs)
1009
1010 CAPS LOCK (control status LED)
1011
1012 Sound-related pins:
1013
1014 SOUND O/P (sound output using internal oscillator)
1015
1016 Cassette-related pins:
1017
CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)
1019
1020 CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)
1021
1022 CAS RC (detect high tone)
1023
1024 CAS MO (motor relay output)
1025
1026 ÷13 IN (~1200 baud clock input)
1027
1028 ULA Socket
1029 ----------
1030
1031 The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.
1032
1033 References
1034 ----------
1035
1036 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm
1037
1038 About this Document
1039 -------------------
1040
1041 The most recent version of this document and accompanying distribution should
1042 be available from the following location:
1043
1044 http://hgweb.boddie.org.uk/ULA
1045
1046 Copyright and licence information can be found in the docs directory of this
1047 distribution - see docs/COPYING.txt for more information.