1 The Acorn Electron ULA
2 ======================
3
4 Principal Design and Feature Constraints
5 ----------------------------------------
6
7 The features of the ULA are limited in sophistication by the amount of time
8 and resources that can be allocated to each activity supporting the
9 fundamental features and obligations of the unit. Maintaining a screen display
10 based on the contents of RAM itself requires the ULA to have exclusive access
11 to various hardware resources for a significant period of time.
12
13 Whilst other elements of the ULA can in principle run in parallel with the
14 display refresh activity, they cannot also access the RAM at the same time.
15 Consequently, other features that might use the RAM must accept a reduced
16 allocation of that resource in comparison to a hypothetical architecture where
17 concurrent RAM access is possible at all times.
18
19 Thus, the principal constraint for many features is bandwidth. The duration of
20 access to hardware resources is one aspect of this; the rate at which such
21 resources can be accessed is another. For example, the RAM is not fast enough
22 to support access more frequently than one byte per 2MHz cycle, and for screen
23 modes involving 80 bytes of screen data per scanline, there are no free cycles
24 for anything other than the production of pixel output during the active
25 scanline periods.
26
27 Another constraint is imposed by the method of RAM access provided by the ULA.
28 The ULA is able to access RAM by fetching 4 bits at a time and thus managing
29 to transfer 8 bits within a single 2MHz cycle, this being sufficient to
30 provide display data for the most demanding screen modes. However, this
31 mechanism's timing requirements are beyond the capabilities of the CPU when
32 running at 2MHz.
33
34 Consequently, the CPU will only ever be able to access RAM via the ULA at
35 1MHz, even when the ULA is not accessing the RAM. Fortunately, when needing to
36 refresh the display, the ULA is still able to make use of the idle part of
37 each 1MHz cycle (or, rather, the idle 2MHz cycle unused by the CPU) to itself
38 access the RAM at a rate of 1 byte per 1MHz cycle (or 1 byte every other 2MHz
39 cycle), thus supporting the less demanding screen modes.
40
41 Timing
42 ------
43
44 According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
45 of which are used to generate pixel data. At 50Hz, this means that 128 cycles
46 are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
47 312 ~= 128 cycles). This is consistent with the observation that each scanline
48 requires at most 80 bytes of data, and that the ULA is apparently busy for 40
49 out of 64 microseconds in each scanline.
50
51 (In fact, since the ULA is seeking to provide an image for an interlaced
52 625-line display, there are in fact two "fields" involved, one providing 312
53 scanlines and one providing 313 scanlines. See below for a description of the
54 video system.)
55
56 Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
57 each providing two bits of each byte) using two cycles within the 500ns period
58 of the 2MHz clock to complete each access operation. Since the CPU and ULA
59 have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
60 effectively run at 1MHz (since every other 500ns period involves the ULA
61 accessing RAM) during transfers of screen data.
62
63 The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
64 by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle is
65 approximately 62.5ns. To access the memory, the following patterns
66 corresponding to 16MHz cycles are required:
67
68 Time (ns): 0-------------- 500------------- ...
69 2 MHz cycle: 0 1 ...
70 16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
71 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
72 ~RAS: /---\___________/---\___________ ...
73 ~CAS: /-----\___/-\___/-----\___/-\___ ...
74 Address events: A B C A B C ...
75 Data events: F S F S ...
76
77 ~RAS ops: 1 0 1 0 ...
78 ~CAS ops: 1 0 1 0 1 0 1 0 ...
79
80 Address ops: a b c a b c ...
81 Data ops: s f s f ...
82
83 ~WE: ......W ...
84 PHI OUT: \_______________/--------------- ...
85 CPU (RAM): L D ...
86 RnW: R ...
87
88 PHI OUT: \_______/-------\_______/------- ...
89 CPU (ROM): L D L D ...
90 RnW: R R ...
91
92 ~RAS must be high for 100ns, ~CAS must be high for 50ns.
93 ~RAS must be low for 150ns, ~CAS must be low for 90ns.
94 Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.
95
96 Here, "A" and "B" respectively indicate the row and first column addresses
97 being latched into the RAM (on a negative edge for ~RAS and ~CAS
98 respectively), and "C" indicates the second column address being latched into
99 the RAM. Presumably, the first and second half-bytes can be read at "F" and
100 "S" respectively, and the row and column addresses must be made available at
101 "a" and "b" (and "c") respectively at the latest. Data can be read at "f" and
102 "s" for the first and second half-bytes respectively.
103
104 For the CPU, "L" indicates the point at which an address is taken from the CPU
105 address bus, on a negative edge of PHI OUT, with "D" being the point at which
106 data may either be read or be asserted for writing, on a positive edge of PHI
107 OUT. Here, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven low
108 for writing or high for reading, and thus propagates RnW from the CPU, this
109 would need to be done before data would be retrieved and, according to the
110 TM4164EC4 datasheet, even as late as the column address is presented and ~CAS
111 brought low.
112
113 The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
114 address access time of 90ns (maximum), which appears to mean that ~RAS must be
115 held low for at least 150ns and that ~CAS must be held low for at least 90ns
116 before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
117 cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
118 is 1.5 cycles.
119
120 Note that the Service Manual refers to the negative edge of RAS and CAS, but
121 the datasheet for the similar TM4164EC4 product shows latching on the negative
122 edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
123 communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
124 "page mode" provides the appropriate behaviour for that particular product.
125
126 The CPU, when accessing the RAM alone, apparently does not make use of the
127 vacated "slot" that the ULA would otherwise use (when interleaving accesses in
128 MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
129 accessing ROM (and potentially sideways RAM). The principal limitation is the
130 amount of time needed between issuing an address and receiving an entire byte
131 from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
132 4 cycles that would be required for 2MHz operation.
133
134 See: Acorn Electron Advanced User Guide
135 See: Acorn Electron Service Manual
136 http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
137 See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
138 See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
139
140 A Note on 8-Bit Wide RAM Access
141 -------------------------------
142
143 It is worth considering the timing when 8 bits of data can be obtained at once
144 from the RAM chips:
145
146 Time (ns): 0-------------- 500------------- ...
147 2 MHz cycle: 0 1 ...
148 8 MHz cycle: 0 1 2 3 0 1 2 3 ...
149 /-\_/-\_/-\_/-\_/-\_/-\_/-\_/-\_ ...
150 ~RAS: /---\___________/---\___________ ...
151 ~CAS: /-------\_______/-------\_______ ...
152 Address events: A B A B ...
153 Data events: E E ...
154
155 ~RAS ops: 1 0 1 0 ...
156 ~CAS ops: 1 0 1 0 ...
157
158 Address ops: a b a b ...
159 Data ops: f s f ...
160
161 ~WE: ........W ...
162 PHI OUT: \_______/-------\_______/------- ...
163 CPU: L D L D ...
164 RnW: R R ...
165
166 Here, "E" indicates the availability of an entire byte.
167
168 Since only one fetch is required per 2MHz cycle, instead of two fetches for
169 the 4-bit wide RAM arrangement, it seems likely that longer 8MHz cycles could
170 be used to coordinate the necessary signalling.
171
172 Another conceivable simplification from using an 8-bit wide RAM access channel
173 with a single access within each 2MHz cycle is the possibility of allowing the
174 CPU to signal directly to the RAM instead of having the ULA perform the access
175 signalling on the CPU's behalf.
176
177 CPU Clock Notes
178 ---------------
179
180 "The 6502 receives an external square-wave clock input signal on pin 37, which
181 is usually labeled PHI0. [...] This clock input is processed within the 6502
182 to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
183 is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
184 through two inverters and a push-pull amplifier. The same network of
185 transistors within the 6502 which generates PHI2 is also tied to PHI1, and
186 generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
187 available to external devices is so that they know when they can access the
188 CPU. When PHI1 is high, this means that external devices can read from the
189 address bus or data bus; when PHI2 is high, this means that external devices
190 can write to the data bus."
191
192 See: http://lateblt.livejournal.com/88105.html
193
194 "The 6502 has a synchronous memory bus where the master clock is divided into
195 two phases (Phase 1 and Phase 2). The address is always generated during Phase
196 1 and all memory accesses take place during Phase 2."
197
198 See: http://www.jmargolin.com/vgens/vgens.htm
199
200 Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
201 Phase 1" means when PHI0 - really PHI2 - is high and "during Phase 2" means
202 when PHI1 is high.
203
204 Bandwidth Figures
205 -----------------
206
207 Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
208 total lines, with 80 cycles occurring in the active periods of display
209 scanlines, the following bandwidth calculations can be performed:
210
211 Total theoretical maximum:
212 128 cycles * 312 lines
213 = 39936 bytes
214
215 MODE 0, 1, 2:
216 ULA: 80 cycles * 256 lines
217 = 20480 bytes
218 CPU: 48 cycles / 2 * 256 lines
219 + 128 cycles / 2 * (312 - 256) lines
220 = 9728 bytes
221
222 MODE 3:
223 ULA: 80 cycles * 24 rows * 8 lines
224 = 15360 bytes
225 CPU: 48 cycles / 2 * 24 rows * 8 lines
226 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
227 = 12288 bytes
228
229 MODE 4, 5:
230 ULA: 40 cycles * 256 lines
231 = 10240 bytes
232 CPU: (40 cycles + 48 cycles / 2) * 256 lines
233 + 128 cycles / 2 * (312 - 256) lines
234 = 19968 bytes
235
236 MODE 6:
237 ULA: 40 cycles * 24 rows * 8 lines
238 = 7680 bytes
239 CPU: (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
240 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
241 = 19968 bytes
242
243 Here, the division of 2 for CPU accesses is performed to indicate that the CPU
244 only uses every other access opportunity even in uncontended periods. See the
245 2MHz RAM Access enhancement below for bandwidth calculations that consider
246 this limitation removed.
247
248 Video Timing
249 ------------
250
251 According to 8.7 in the Service Manual, and the PAL Wikipedia page,
252 approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
253 (including the "colour burst"), and 1.65µs for the "front porch", totalling
254 12.05µs and thus leaving 51.95µs for the active video signal for each
255 scanline. As the Service Manual suggests in the oscilloscope traces, the
256 display information is transmitted more or less centred within the active
257 video period since the ULA will only be providing pixel data for 40µs in each
258 scanline.
259
260 Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
261 each scanline can be divided into 1024 cycles, although only 640 at most are
262 actively used to provide pixel data. Pixel data production should only occur
263 within a certain period on each scanline, approximately 262 cycles after the
264 start of hsync:
265
266 active video period = 51.95µs
267 pixel data period = 40µs
268 total silent period = 51.95µs - 40µs = 11.95µs
269 silent periods (before and after) = 11.95µs / 2 = 5.975µs
270 hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
271 time before pixel data period = 10.4µs + 5.975µs = 16.375µs
272 pixel data period start cycle = 16.375µs / 62.5ns = 262
273
274 By choosing a number divisible by 8, the RAM access mechanism can be
275 synchronised with the pixel production. Thus, 256 is a more appropriate start
276 cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
277 pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
278 document) occurs at cycle 0.
279
280 To summarise:
281
282 HS signal starts at cycle 0 on each horizontal scanline
283 HS signal ends approximately 4µs later at cycle 64
284 Pixel data starts approximately 12µs later at cycle 256
285
286 "Re: Electron Memory Contention" provides measurements that appear consistent
287 with these calculations.
288
289 The "vertical blanking period", meaning the period before picture information
290 in each field is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
291 this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
292 lines. Thus, the first visible scanline on the first field of a frame occurs
293 half way through the 23rd scanline period measured from the start of vsync
294 (indicated by "V" in the diagrams below):
295
296 10 20 23
297 Line in frame: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
298 Line from 1: 0 22 3
299 Line on screen: .:::::VVVVV::::: 12233445566
300 |_________________________________________________|
301 25 line vertical blanking period
302
303 In the second field of a frame, the first visible scanline coincides with the
304 24th scanline period measured from the start of line 313 in the frame:
305
306 310 336
307 Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
308 Line from 313: 0 23 4
309 Line on screen: 88:::::VVVVV:::: 11223344
310 288 | |
311 |_________________________________________________|
312 25 line vertical blanking period
313
314 In order to consider only full lines, we might consider the start of each
315 frame to occur 23 lines after the start of vsync.
316
317 Again, it is likely that pixel data production should only occur on scanlines
318 within a certain period on each frame. The "625/50" document indicates that
319 only a certain region is "safe" to use, suggesting a vertically centred region
320 with approximately 15 blank lines above and below the picture. However, the
321 "PAL TV timing and voltages" document suggests 28 blank lines above and below
322 the picture. This would centre the 256 lines within the 312 lines of each
323 field and thus provide a start of picture approximately 5.5 or 5 lines after
324 the end of the blanking period or 28 or 27.5 lines after the start of vsync.
325
326 To summarise:
327
328 CSYNC signal starts at cycle 0
329 CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
330 Start of line occurs approximately 1632µs (5.5 lines) later at cycle 28672
331
332 See: http://en.wikipedia.org/wiki/PAL
333 See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
334 See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
335 http://lipas.uwasa.fi/~f76998/video/modes/
336 See: PAL TV timing and voltages
337 http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
338 See: Line Standards
339 http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
340 See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
341 http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
342 See: Re: Electron Memory Contention
343 http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109
344
345 RAM Integrated Circuits
346 -----------------------
347
348 Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
349 CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
350 available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
351 have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
352 ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.
353
354 The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
355 the Samsung-produced KM41464 series is apparently equivalent to the Texas
356 Instruments 4164 chips presumably used in the Electron.
357
358 The TM4164EC4 series combines 4 64K x 1b units into a single package and
359 appears similar to the TM4164EA4 featured on the Electron's circuit diagram
360 (in the Advanced User Guide but not the Service Manual), and it also has 22
361 pins providing 3 additional inputs and 3 additional outputs over the 16 pins
362 of the individual 4164-15 modules, presumably allowing concurrent access to
363 the packaged memory units.
364
365 As far as currently available replacements are concerned, the NTE4164 is a
366 potential candidate: according to the Vetco Electronics entry, it is
367 supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
368 parts include the NTE2164 and the NTE6664, both of which appear to have
369 largely the same performance and connection characteristics. Meanwhile, the
370 NTE21256 appears to be a 16-pin replacement with four times the capacity that
371 maintains the single data input and output pins. Using the NTE21256 as a
372 replacement for all ICs combined would be difficult because of the single bit
373 output.
374
375 Another device equivalent to the 4164-15 appears to be available under the
376 code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
377 site lists data sheets for other devices on the same page, but these are
378 different and actually appear to be provided under the 41574 product code (but
379 are listed under 41464-10) and appear to be replacements for the TM4164EC4:
380 the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
381 employing 4 pins for both input and output.
382
383 Pins I/O pins Row access Column access
384 ---- -------- ---------- -------------
385 TM4164EC4 22 4 + 4 150ns (15) 90ns (15)
386 KM41464AP 18 4 150ns (15) 75ns (15)
387 NTE21256 16 1 + 1 150ns 75ns
388 HYB 4164-2 16 1 + 1 150ns 100ns
389 µPD41464 18 4 120ns (12) 60ns (12)
390
391 See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
392 http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
393 See: Dynamic RAMS
394 http://www.unicornelectronics.com/IC/DYNAMIC.html
395 See: New old stock 8x 4164 chips
396 http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
397 See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
398 http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
399 See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
400 http://www.vetco.net/catalog/product_info.php?products_id=2806
401 See: NTE4164 - IC-NMOS 64K DRAM 150NS
402 http://www.vetco.net/catalog/product_info.php?products_id=3680
403 See: NTE21256 - IC-256K DRAM 150NS
404 http://www.vetco.net/catalog/product_info.php?products_id=2799
405 See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
406 http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
407 See: NTE6664 - IC-MOS 64K DRAM 150NS
408 http://www.vetco.net/catalog/product_info.php?products_id=5213
409 See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
410 http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
411 See: 4164-150: MAJOR BRANDS
412 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
413 See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
414 http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
415 See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
416 http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
417 See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
418 http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
419 See: 41464-10: MAJOR BRANDS
420 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1
421
422 Interrupts
423 ----------
424
425 The ULA generates IRQs (maskable interrupts) according to certain conditions
426 and these conditions are controlled by location &FE00:
427
428 * Vertical sync (bottom of displayed screen)
429 * 50MHz real time clock
430 * Transmit data empty
431 * Receive data full
432 * High tone detect
433
434 The ULA is also used to clear interrupt conditions through location &FE05. Of
435 particular significance is bit 7, which must be set if an NMI (non-maskable
436 interrupt) has occurred and has thus suspended ULA access to memory, restoring
437 the normal function of the ULA.
438
439 ROM Paging
440 ----------
441
442 Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
443 mappings exist:
444
445 8 keyboard
446 9 keyboard (duplicate)
447 10 BASIC ROM
448 11 BASIC ROM (duplicate)
449
450 Paging in a ROM involves the following procedure:
451
452 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
453 2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
454 selected.
455 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
456 whilst writing the desired ROM number n in bits 0 to 2.
457
458 See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686
459
460 Keyboard Access
461 ---------------
462
463 The keyboard pages appear to be accessed at 1MHz just like the RAM.
464
465 See: https://stardot.org.uk/forums/viewtopic.php?p=254155#p254155
466
467 Shadow/Expanded Memory
468 ----------------------
469
470 The Electron exposes all sixteen address lines and all eight data lines
471 through the expansion bus. Using such lines, it is possible to provide
472 additional memory - typically sideways ROM and RAM - on expansion cards and
473 through cartridges, although the official cartridge specification provides
474 fewer address lines and only seeks to provide access to memory in 16K units.
475
476 Various modifications and upgrades were developed to offer "turbo"
477 capabilities to the Electron, permitting the CPU to access a separate 8K of
478 RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
479 the ULA through additional logic. However, an enhanced ULA might support
480 independent CPU access to memory over the expansion bus by allowing itself to
481 be discharged from providing access to memory, potentially for a range of
482 addresses, and for the CPU to communicate with external memory uninterrupted.
483
484 Sideways RAM/ROM and Upper Memory Access
485 ----------------------------------------
486
487 Although the ULA controls the CPU clock, effectively slowing or stopping the
488 CPU when the ULA needs to access screen memory, it is apparently able to allow
489 the CPU to access addresses of &8000 and above - the upper region of memory -
490 at 2MHz independently of any access to RAM that the ULA might be performing,
491 only blocking the CPU if it attempts to access addresses of &7FFF and below
492 during any ULA memory access - the lower region of memory - by stopping or
493 stalling its clock.
494
495 Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
496 CPU clock if the line goes low, when the CPU is attempting to access the lower
497 region of memory.
498
499 Hardware Scrolling (and Enhancement)
500 ------------------------------------
501
502 On the standard ULA, &FE02 and &FE03 map to a 9 significant bits address with
503 the least significant 5 bits being zero, thus limiting the scrolling
504 resolution to 64 bytes. An enhanced ULA could support a resolution of 2 bytes
505 using the same layout of these addresses.
506
507 |--&FE02--------------| |--&FE03--------------|
508 XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX
509
510 XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX
511
512 Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
513 memory to pixel locations is character oriented. A change in 8 bytes would
514 permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
515 MODE 1 and 5, and 8 pixels in MODE 0, 3 and 6. This resolution is actually
516 observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
517 Guide).
518
519 One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
520 of changing the screen address by 2 bytes is the change in the number of lines
521 from the initial and final character rows that need reading by the ULA, which
522 would need to maintain this state information (although this is a relatively
523 trivial change). Another pitfall is the complication that might be introduced
524 to software writing bitmaps of character height to the screen.
525
526 See: http://pastraiser.com/computers/acornelectron/acornelectron.html
527
528 Enhancement: Mode Layouts
529 -------------------------
530
531 Merely changing the screen memory mappings in order to have Archimedes-style
532 row-oriented screen addresses (instead of character-oriented addresses) could
533 be done for the existing modes, but this might not be sufficiently beneficial,
534 especially since accessing regions of the screen would involve incrementing
535 pointers by amounts that are inconvenient on an 8-bit CPU.
536
537 However, instead of using a Archimedes-style mapping, column-oriented screen
538 addresses could be more feasibly employed: incrementing the address would
539 reference the vertical screen location below the currently-referenced location
540 (just as occurs within characters using the existing ULA); instead of
541 returning to the top of the character row and referencing the next horizontal
542 location after eight bytes, the address would reference the next character row
543 and continue to reference locations downwards over the height of the screen
544 until reaching the bottom; at the bottom, the next location would be the next
545 horizontal location at the top of the screen.
546
547 In other words, the memory layout for the screen would resemble the following
548 (for MODE 2):
549
550 &3000 &3100 ... &7F00
551 &3001 &3101
552 ... ...
553 &3007
554 &3008
555 ...
556 ... ...
557 &30FF ... &7FFF
558
559 Since there are 256 pixel rows, each column of locations would be addressable
560 using the low byte of the address. Meanwhile, the high byte would be
561 incremented to address different columns. Thus, addressing screen locations
562 would become a lot more convenient and potentially much more efficient for
563 certain kinds of graphical output.
564
565 One potential complication with this simplified addressing scheme arises with
566 hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
567 with the existing ULA) would be achieved by incrementing or decrementing the
568 screen start address; by one character row, it would involve adding or
569 subtracting 8. However, the ULA only supports multiples of 64 when changing the
570 screen start address. Thus, if such a scheme were to be adopted, three
571 additional bits would need to be supported in the screen start register (see
572 "Hardware Scrolling (and Enhancement)" for more details). However, horizontal
573 scrolling would be much improved even under the severe constraints of the
574 existing ULA: only adjustments of 256 to the screen start address would be
575 required to produce single-location scrolling of as few as two pixels in MODE 2
576 (four pixels in MODEs 1 and 5, eight pixels otherwise).
577
578 More disruptive is the effect of this alternative layout on software.
579 Presumably, compatibility with the BBC Micro was the primary goal of the
580 Electron's hardware design. With the character-oriented screen layout in
581 place, system software (and application software accessing the screen
582 directly) would be relying on this layout to run on the Electron with little
583 or no modification. Although it might have been possible to change the system
584 software to use this column-oriented layout instead, this would have incurred
585 a development cost and caused additional work porting things like games to the
586 Electron. Moreover, a separate branch of the software from that supporting the
587 BBC Micro and closer derivatives would then have needed maintaining.
588
589 The decision to use the character-oriented layout in the BBC Micro may have
590 been related to the choice of circuitry and to facilitate a convenient
591 hardware implementation, and by the time the Electron was planned, it was too
592 late to do anything about this somewhat unfortunate choice.
593
594 Pixel Layouts
595 -------------
596
597 The pixel layouts are as follows:
598
599 Modes Depth (bpp) Pixels (from bits)
600 ----- ----------- ------------------
601 0, 3, 4, 6 1 7 6 5 4 3 2 1 0
602 1, 5 2 73 62 51 40
603 2 4 7531 6420
604
605 Since the ULA reads a half-byte at a time, one might expect it to attempt to
606 produce pixels for every half-byte, as opposed to handling entire bytes.
607 However, the pixel layout is not conducive to producing pixels as soon as a
608 half-byte has been read for a given full-byte location: in 1bpp modes the
609 first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
610 data is spread across the entire byte in different ways.
611
612 An alternative arrangement might be as follows:
613
614 Modes Depth (bpp) Pixels (from bits)
615 ----- ----------- ------------------
616 0, 3, 4, 6 1 7 6 5 4 3 2 1 0
617 1, 5 2 76 54 32 10
618 2 4 7654 3210
619
620 Just as the mode layouts were presumably decided by compatibility with the BBC
621 Micro, the pixel layouts will have been maintained for similar reasons.
622 Unfortunately, this layout prevents any optimisation of the ULA for handling
623 half-byte pixel data generally.
624
625 Enhancement: The Missing MODE 4
626 -------------------------------
627
628 The Electron inherits its screen mode selection from the BBC Micro, where MODE
629 3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
630 Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
631 however, and they are merely implemented by skipping two scanlines in every
632 ten after the eight required to produce a character line. Thus, such modes
633 provide a 24-row display.
634
635 In principle, nothing prevents this "text mode" effect being applied to other
636 modes. The 20-column modes are not well-suited to displaying text, which
637 leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
638 2. Although the need for a non-monochrome 40-column text mode is addressed by
639 MODE 7 on the BBC Micro, the Electron lacks such a mode.
640
641 If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it
642 would occupy MODE 4 instead of the current MODE 4:
643
644 Screen mode Size (kilobytes) Colours Rows Resolution
645 ----------- ---------------- ------- ---- ----------
646 0 20 2 32 640x256
647 1 20 4 32 320x256
648 2 20 16 32 160x256
649 3 16 2 24 640x256
650 4 (new) 16 4 24 320x256
651 4 (old) 10 2 32 320x256
652 5 10 4 32 160x256
653 6 8 2 24 320x256
654
655 Thus, for increasing mode numbers, the size of each mode would be the same or
656 less than the preceding mode.
657
658 Enhancement: 2MHz RAM Access
659 ----------------------------
660
661 Given that the CPU and ULA both access RAM at 2MHz, but given that the CPU
662 when not competing with the ULA only accesses RAM every other 2MHz cycle (as
663 if the ULA still needed to access the RAM), one useful enhancement would be a
664 mechanism to let the CPU take over the ULA cycles outside the ULA's period of
665 activity comparable to the way the ULA takes over the CPU cycles in MODE 0 to
666 3.
667
668 Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
669
670 Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
671 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
672
673 In MODE 4 to 6:
674
675 Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
676 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
677
678 This would improve CPU bandwidth as follows:
679
680 Standard ULA Enhanced ULA % Total Bandwidth Speedup
681 MODE 0, 1, 2 9728 bytes 19456 bytes 24% -> 49% 2
682 MODE 3 12288 bytes 24576 bytes 31% -> 62% 2
683 MODE 4, 5 19968 bytes 29696 bytes 50% -> 74% 1.5
684 MODE 6 19968 bytes 32256 bytes 50% -> 81% 1.6
685
686 (Here, the uncontended total 2MHz bandwidth for a display period would be
687 39936 bytes, being 128 cycles per line over 312 lines.)
688
689 With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
690 because all access opportunities to RAM are doubled. Meanwhile, in the other
691 modes, some CPU accesses occur alongside ULA accesses and thus cannot be
692 doubled, but the CPU bandwidth increase is still significant.
693
694 Unfortunately, the mechanism for accessing the RAM is too slow to provide data
695 within the time constraints of 2MHz operation. There is no time remaining in a
696 2MHz cycle for the CPU to receive and process any retrieved data once the
697 necessary signalling has been performed. The only way for the CPU to be able
698 to access the RAM quickly enough would be to do away with the double 4-bit
699 access mechanism and to have a single 8-bit channel to the memory. This would
700 require twice as many 1-bit RAM chips or a different kind of RAM chip, but it
701 would also potentially simplify the ULA.
702
703 Enhancement: Region Blanking
704 ----------------------------
705
706 The problem of permitting character-oriented blitting in programs whilst
707 scrolling the screen by sub-character amounts could be mitigated by permitting
708 a region of the display to be blank, such as the final lines of the display.
709 Consider the following vertical scrolling by 2 bytes that would cause an
710 initial character row of 6 lines and a final character row of 2 lines:
711
712 6 lines - initial, partial character row
713 248 lines - 31 complete rows
714 2 lines - final, partial character row
715
716 If a routine were in use that wrote 8 line bitmaps to the partial character
717 row now split in two, it would be advisable to hide one of the regions in
718 order to prevent content appearing in the wrong place on screen (such as
719 content meant to appear at the top "leaking" onto the bottom). Blanking 6
720 lines would be sufficient, as can be seen from the following cases.
721
722 Scrolling up by 2 lines:
723
724 6 lines - initial, partial character row
725 240 lines - 30 complete rows
726 4 lines - part of 1 complete row
727 -----------------------------------------------------------------
728 4 lines - part of 1 complete row (hidden to maintain 250 lines)
729 2 lines - final, partial character row (hidden)
730
731 Scrolling down by 2 lines:
732
733 2 lines - initial, partial character row
734 248 lines - 31 complete rows
735 ----------------------------------------------------------
736 6 lines - final, partial character row (hidden)
737
738 Thus, in this case, region blanking would impose a 250 line display with the
739 bottom 6 lines blank.
740
741 See the description of the display suspend enhancement for a more efficient
742 way of blanking lines than merely blanking the palette whilst allowing the CPU
743 to perform useful work during the blanking period.
744
745 To control the blanking or suspending of lines at the top and bottom of the
746 display, a memory location could be dedicated to the task: the upper 4 bits
747 could define a blanking region of up to 16 lines at the top of the screen,
748 whereas the lower 4 bits could define such a region at the bottom of the
749 screen. If more lines were required, two locations could be employed, allowing
750 the top and bottom regions to occupy the entire screen.
751
752 Enhancement: Screen Height Adjustment
753 -------------------------------------
754
755 The height of the screen could be configurable in order to reduce screen
756 memory consumption. This is not quite done in MODE 3 and 6 since the start of
757 the screen appears to be rounded down to the nearest page, but by reducing the
758 height by amounts more than a page, savings would be possible. For example:
759
760 Screen width Depth Height Bytes per line Saving in bytes Start address
761 ------------ ----- ------ -------------- --------------- -------------
762 640 1 252 80 320 &3140 -> &3100
763 640 1 248 80 640 &3280 -> &3200
764 320 1 240 40 640 &5A80 -> &5A00
765 320 2 240 80 1280 &3500
766
767 Screen Mode Selection
768 ---------------------
769
770 Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
771 range of modes, the other bits of &FE*7 (related to sound, cassette
772 input/output and the Caps Lock LED) would need to be reassigned and bit 0
773 potentially being made available for use.
774
775 Enhancement: Palette Definition
776 -------------------------------
777
778 Since all memory accesses go via the ULA, an enhanced ULA could employ more
779 specific addresses than &FE*X to perform enhanced functions. For example, the
780 palette control is done using &FE*8-F and merely involves selecting predefined
781 colours, whereas an enhanced ULA could support the redefinition of all 16
782 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
783 (colours 8 to 15), where a single byte might provide 8 bits per pixel colour
784 specifications similar to those used on the Archimedes.
785
786 The principal limitation here is actually the hardware: the Electron has only
787 a single output line for each of the red, green and blue channels, and if
788 those outputs are strictly digital and can only be set to a "high" and "low"
789 value, then only the existing eight colours are possible. If a modern ULA were
790 able to output analogue values (or values at well-defined points between the
791 high and low values, such as the half-on value supported by the Amstrad CPC
792 series), it would still need to be assessed whether the circuitry could
793 successfully handle and propagate such values. Various sources indicate that
794 only "TTL levels" are supported by the RGB output circuit, and since there are
795 74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
796 is likely that the ULA is expected to provide only "high" or "low" values.
797
798 Short of adding extra outputs from the ULA (either additional red, green and
799 blue outputs or a combined intensity output), another approach might involve
800 some kind of modulation where an output value might be encoded in multiple
801 pulses at a higher frequency than the pixel frequency. However, this would
802 demand additional circuitry outside the ULA, and component RGB monitors would
803 probably not be able to take advantage of this feature; only UHF and composite
804 video devices (the latter with the composite video colour support enabled on
805 the Electron's circuit board) would potentially benefit.
806
807 Flashing Colours
808 ----------------
809
810 According to the Advanced User Guide, "The cursor and flashing colours are
811 entirely generated in software: This means that all of the logical to physical
812 colour map must be changed to cause colours to flash." This appears to suggest
813 that the palette registers must be updated upon the flash counter - read and
814 written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
815 colour pairs to be any combination of colours might be possible, instead of
816 having colour complements as pairs.
817
818 It is conceivable that the interrupt code responsible does the simple thing
819 and merely inverts the current values for any logical colours (LC) for which
820 the associated physical colour (as supplied as the second parameter to the VDU
821 19 call) has the top bit of its four bit value set. These top bits are not
822 recorded in the palette registers but are presumably recorded separately and
823 used to build bitmaps as follows:
824
825 LC 2 colour 4 colour 16 colour 4-bit value for inversion
826 -- -------- -------- --------- -------------------------
827 0 00010001 00010001 00010001 1, 1, 1
828 1 01000100 00100010 00010001 4, 2, 1
829 2 01000100 00100010 4, 2
830 3 10001000 00100010 8, 2
831 4 00010001 1
832 5 00010001 1
833 6 00100010 2
834 7 00100010 2
835 8 01000100 4
836 9 01000100 4
837 10 10001000 8
838 11 10001000 8
839 12 01000100 4
840 13 01000100 4
841 14 10001000 8
842 15 10001000 8
843
844 Inversion value calculation:
845
846 2 colour formula: 1 << (colour * 2)
847 4 colour formula: 1 << colour
848 16 colour formula: 1 << ((colour & 2) + ((colour & 8) * 2))
849
850 For example, where logical colour 0 has been mapped to a physical colour in
851 the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
852 the inversion operation. (The lower three bits of the physical colour would be
853 used to set the underlying colour information affected by the inversion
854 operation.)
855
856 An operation in the interrupt code would then combine the bitmaps for all
857 logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
858 combined for groups of logical colours as follows:
859
860 Logical colours
861 ---------------
862 0, 2, 8, 10
863 4, 6, 12, 14
864 5, 7, 13, 15
865 1, 3, 9, 11
866
867 These combined bitmaps would be EORed with the existing palette register
868 values in order to perform the value inversion necessary to produce the
869 flashing effect.
870
871 Thus, in the VDU 19 operation, the appropriate inversion value would be
872 calculated for the logical colour, and this value would then be combined with
873 other inversion values in a dedicated memory location corresponding to the
874 colour's group as indicated above. Meanwhile, the palette channel values would
875 be derived from the lower three bits of the specified physical colour and
876 combined with other palette data in dedicated memory locations corresponding
877 to the palette registers.
878
879 Interestingly, although flashing colours on the BBC Micro are controlled by
880 toggling bit 0 of the &FE20 control register location for the Video ULA, the
881 actual colour inversion is done in hardware.
882
883 Enhancement: Palette Definition Lists
884 -------------------------------------
885
886 It can be useful to redefine the palette in order to change the colours
887 available for a particular region of the screen, particularly in modes where
888 the choice of colours is constrained, and if an increased colour depth were
889 available, palette redefinition would be useful to give the illusion of more
890 than 16 colours in MODE 2. Traditionally, palette redefinition has been done
891 by using interrupt-driven timers, but a more efficient approach would involve
892 presenting lists of palette definitions to the ULA so that it can change the
893 palette at a particular display line.
894
895 One might define a palette redefinition list in a region of memory and then
896 communicate its contents to the ULA by writing the address and length of the
897 list, along with the display line at which the palette is to be changed, to
898 ULA registers such that the ULA buffers the list and performs the redefinition
899 at the appropriate time. Throughput/bandwidth considerations might impose
900 restrictions on the practical length of such a list, however.
901
902 Enhancement: Display Synchronisation Interrupts
903 -----------------------------------------------
904
905 When completing each scanline of the display, the ULA could trigger an
906 interrupt. Since this might impact system performance substantially, the
907 feature would probably need to be configurable, and it might be sufficient to
908 have an interrupt only after a certain number of display lines instead.
909 Permitting the CPU to take action after eight lines would allow palette
910 switching and other effects to occur on a character row basis.
911
912 The ULA provides an interrupt at the end of the display period, presumably so
913 that software can schedule updates to the screen, avoid flickering or tearing,
914 and so on. However, some applications might benefit from an interrupt at, or
915 just before, the start of the display period so that palette modifications or
916 similar effects could be scheduled.
917
918 Enhancement: Palette-Free Modes
919 -------------------------------
920
921 Palette-free modes might be defined where bit values directly correspond to
922 the red, green and blue channels, although this would mostly make sense only
923 for modes with depths greater than the standard 4 bits per pixel, and such
924 modes would require more memory than MODE 2 if they were to have an acceptable
925 resolution.
926
927 Enhancement: Display Suspend
928 ----------------------------
929
930 Especially when writing to the screen memory, it could be beneficial to be
931 able to suspend the ULA's access to the memory, instead producing blank values
932 for all screen pixels until a program is ready to reveal the screen. This is
933 different from palette blanking since with a blank palette, the ULA is still
934 reading screen memory and translating its contents into pixel values that end
935 up being blank.
936
937 This function is reminiscent of a capability of the ZX81, albeit necessary on
938 that hardware to reduce the load on the system CPU which was responsible for
939 producing the video output. By allowing display suspend on the Electron, the
940 performance benefit would be derived from giving the CPU full access to the
941 memory bandwidth.
942
943 The region blanking feature mentioned above could be implemented using this
944 enhancement instead of employing palette blanking for the affected lines of
945 the display.
946
947 Enhancement: Memory Filling
948 ---------------------------
949
950 A capability that could be given to an enhanced ULA is that of permitting the
951 ULA to write to screen memory as well being able to read from it. Although
952 such a capability would probably not be useful in conjunction with the
953 existing read operations when producing a screen display, and insufficient
954 bandwidth would exist to do so in high-bandwidth screen modes anyway, the
955 capability could be offered during a display suspend period (as described
956 above), permitting a more efficient mechanism to rapidly fill memory with a
957 predetermined value.
958
959 This capability could also support block filling, where the limits of the
960 filled memory would be defined by the position and size of a screen area,
961 although this would demand the provision of additional registers in the ULA to
962 retain the details of such areas and additional logic to control the fill
963 operation.
964
965 Enhancement: Region Filling
966 ---------------------------
967
968 An alternative to memory writing might involve indicating regions using
969 additional registers or memory where the ULA fills regions of the screen with
970 content instead of reading from memory. Unlike hardware sprites which should
971 realistically provide varied content, region filling could employ single
972 colours or patterns, and one advantage of doing so would be that the ULA need
973 not access memory at all within a particular region.
974
975 Regions would be defined on a row-by-row basis. Instead of reading memory and
976 blitting a direct representation to the screen, the ULA would read region
977 definitions containing a start column, region width and colour details. There
978 might be a certain number of definitions allowed per row, or the ULA might
979 just traverse an ordered list of such definitions with each one indicating the
980 row, start column, region width and colour details.
981
982 One could even compress this information further by requiring only the row,
983 start column and colour details with each subsequent definition terminating
984 the effect of the previous one. However, one would also need to consider the
985 convenience of preparing such definitions and whether efficient access to
986 definitions for a particular row might be desirable. It might also be
987 desirable to avoid having to prepare definitions for "empty" areas of the
988 screen, effectively making the definition of the screen contents employ
989 run-length encoding and employ only colour plus length information.
990
991 One application of region filling is that of simple 2D and 3D shape rendering.
992 Although it is entirely possible to plot such shapes to the screen and have
993 the ULA blit the memory contents to the screen, such operations consume
994 bandwidth both in the initial plotting and in the final transfer to the
995 screen. Region filling would reduce such bandwidth usage substantially.
996
997 This way of representing screen images would make certain kinds of images
998 unfeasible to represent - consider alternating single pixel values which could
999 easily occur in some character bitmaps - even if an internal queue of regions
1000 were to be supported such that the ULA could read ahead and buffer such
1001 "bandwidth intensive" areas. Thus, the ULA might be better served providing
1002 this feature for certain areas of the display only as some kind of special
1003 graphics window.
1004
1005 Enhancement: Hardware Sprites
1006 -----------------------------
1007
1008 An enhanced ULA might provide hardware sprites, but this would be done in an
1009 way that is incompatible with the standard ULA, since no &FE*X locations are
1010 available for allocation. To keep the facility simple, hardware sprites would
1011 have a standard byte width and height.
1012
1013 The specification of sprites could involve the reservation of 16 locations
1014 (for example, &FE20-F) specifying a fixed number of eight sprites, with each
1015 location pair referring to the sprite data. By limiting the ULA to dealing
1016 with a fixed number of sprites, the work required inside the ULA would be
1017 reduced since it would avoid having to deal with arbitrary numbers of sprites.
1018
1019 The principal limitation on providing hardware sprites is that of having to
1020 obtain sprite data, given that the ULA is usually required to retrieve screen
1021 data, and given the lack of memory bandwidth available to retrieve sprite data
1022 (particularly from multiple sprites supposedly at the same position) and
1023 screen data simultaneously. Although the ULA could potentially read sprite
1024 data and screen data in alternate memory accesses in screen modes where the
1025 bandwidth is not already fully utilised, this would result in a degradation of
1026 performance.
1027
1028 Enhancement: Additional Screen Mode Configurations
1029 --------------------------------------------------
1030
1031 Alternative screen mode configurations could be supported. The ULA has to
1032 produce 640 pixel values across the screen, with pixel doubling or quadrupling
1033 employed to fill the screen width:
1034
1035 Screen width Columns Scaling Depth Bytes
1036 ------------ ------- ------- ----- -----
1037 640 80 x1 1 80
1038 320 40 x2 1, 2 40, 80
1039 160 20 x4 2, 4 40, 80
1040
1041 It must also use at most 80 byte-sized memory accesses to provide the
1042 information for the display. Given that characters must occupy an 8x8 pixel
1043 array, if a configuration featuring anything other than 20, 40 or 80 character
1044 columns is to be supported, compromises must be made such as the introduction
1045 of blank pixels either between characters (such as occurs between rows in MODE
1046 3 and 6) or at the end of a scanline (such as occurs at the end of the frame
1047 in MODE 3 and 6). Consider the following configuration:
1048
1049 Screen width Columns Scaling Depth Bytes Blank
1050 ------------ ------- ------- ----- ------ -----
1051 208 26 x3 1, 2 26, 52 16
1052
1053 Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
1054 colours could be provided, with 16 blank pixel values (out of a total of 640)
1055 generated either at the start or end (or split between the start and end) of
1056 each scanline.
1057
1058 Enhancement: Character Attributes
1059 ---------------------------------
1060
1061 The BBC Micro MODE 7 employs something resembling character attributes to
1062 support teletext displays, but depends on circuitry providing a character
1063 generator. The ZX Spectrum, on the other hand, provides character attributes
1064 as a means of colouring bitmapped graphics. Although such a feature is very
1065 limiting as the sole means of providing multicolour graphics, in situations
1066 where the choice is between low resolution multicolour graphics or high
1067 resolution monochrome graphics, character attributes provide a potentially
1068 useful compromise.
1069
1070 For each byte read, the ULA must deliver 8 pixel values (out of a total of
1071 640) to the video output, doing so by either emptying its pixel buffer on a
1072 pixel per cycle basis, or by multiplying pixels and thus holding them for more
1073 than one cycle. For example for a screen mode having 640 pixels in width:
1074
1075 Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1076 Reads: B B
1077 Pixels: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
1078
1079 And for a screen mode having 320 pixels in width:
1080
1081 Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1082 Reads: B
1083 Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
1084
1085 However, in modes where less than 80 bytes are required to generate the pixel
1086 values, an enhanced ULA might be able to read additional bytes between those
1087 providing the bitmapped graphics data:
1088
1089 Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1090 Reads: B A
1091 Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
1092
1093 These additional bytes could provide colour information for the bitmapped data
1094 in the following character column (of 8 pixels). Since it would be desirable
1095 to apply attribute data to the first column, the initial 8 cycles might be
1096 configured to not produce pixel values.
1097
1098 For an entire character, attribute data need only be read for the first row of
1099 pixels for a character. The subsequent rows would have attribute information
1100 applied to them, although this would require the attribute data to be stored
1101 in some kind of buffer. Thus, the following access pattern would be observed:
1102
1103 Reads: A B _ B _ B _ B _ B _ B _ B _ B ...
1104
1105 In modes 3 and 6, the blank display lines could be used to retrieve attribute
1106 data:
1107
1108 Reads (blank): A _ A _ A _ A _ A _ A _ A _ A _ ...
1109 Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
1110 Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
1111 ...
1112
1113 See below for a discussion of using this for character data as well.
1114
1115 A whole byte used for colour information for a whole character would result in
1116 a choice of 256 colours, and this might be somewhat excessive. By only reading
1117 attribute bytes at every other opportunity, a choice of 16 colours could be
1118 applied individually to two characters.
1119
1120 Cycle: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1121 Reads: B A B -
1122 Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
1123
1124 Further reductions in attribute data access, offering 4 colours for every
1125 character in a four character block, for example, might also be worth
1126 considering.
1127
1128 Consider the following configurations for screen modes with a colour depth of
1129 1 bit per pixel for bitmap information:
1130
1131 Screen width Columns Scaling Bytes (B) Bytes (A) Colours Screen start
1132 ------------ ------- ------- --------- --------- ------- ------------
1133 320 40 x2 40 40 256 &5300
1134 320 40 x2 40 20 16 &5580 -> &5500
1135 320 40 x2 40 10 4 &56C0 -> &5600
1136 208 26 x3 26 26 256 &62C0 -> &6200
1137 208 26 x3 26 13 16 &6460 -> &6400
1138
1139 Enhancement: Text-Only Modes using Character and Attribute Data
1140 ---------------------------------------------------------------
1141
1142 In modes 3 and 6, the blank display lines could be used to retrieve character
1143 and attribute data instead of trying to insert it between bitmap data accesses,
1144 but this data would then need to be retained:
1145
1146 Reads: A C A C A C A C A C A C A C A C ...
1147 Reads: B _ B _ B _ B _ B _ B _ B _ B _ ...
1148
1149 Only attribute (A) and character (C) reads would require screen memory
1150 storage. Bitmap data reads (B) would involve either accesses to memory to
1151 obtain character definition details or could, at the cost of special storage
1152 in the ULA, involve accesses within the ULA that would then free up the RAM.
1153 However, the CPU would not benefit from having any extra access slots due to
1154 the limitations of the RAM access mechanism.
1155
1156 A scheme without caching might be possible. The same line of memory addresses
1157 might be visited over and over again for eight display lines, with an index
1158 into the bitmap data being incremented from zero to seven. The access patterns
1159 would look like this:
1160
1161 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 0)
1162 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 1)
1163 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 2)
1164 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 3)
1165 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 4)
1166 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 5)
1167 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 6)
1168 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 7)
1169
1170 The bandwidth requirements would be the sum of the accesses to read the
1171 character values (repeatedly) and those to read the bitmap data to reproduce
1172 the characters on screen.
1173
1174 Enhancement: MODE 7 Emulation using Character Attributes
1175 --------------------------------------------------------
1176
1177 If the scheme of applying attributes to character regions were employed to
1178 emulate MODE 7, in conjunction with the MODE 6 display technique, the
1179 following configuration would be required:
1180
1181 Screen width Columns Rows Bytes (B) Bytes (A) Colours Screen start
1182 ------------ ------- ---- --------- --------- ------- ------------
1183 320 40 25 40 20 16 &5ECC -> &5E00
1184 320 40 25 40 10 4 &5FC6 -> &5F00
1185
1186 Although this requires much more memory than MODE 7 (8500 bytes versus MODE
1187 7's 1000 bytes), it does not need much more memory than MODE 6, and it would
1188 at least make a limited 40-column multicolour mode available as a substitute
1189 for MODE 7.
1190
1191 Using the text-only enhancement with caching of data or with repeated reads of
1192 the same character data line for eight display lines, the storage requirements
1193 would be diminished substantially:
1194
1195 Screen width Columns Rows Bytes (C) Bytes (A) Colours Screen start
1196 ------------ ------- ---- --------- --------- ------- ------------
1197 320 40 25 40 20 16 &7A94 -> &7A00
1198 320 40 25 40 10 4 &7B1E -> &7B00
1199 320 40 25 40 5 2 &7B9B -> &7B00
1200 320 40 25 40 0 (2) &7C18 -> &7C00
1201 640 80 25 80 40 16 &7448 -> &7400
1202 640 80 25 80 20 4 &763C -> &7600
1203 640 80 25 80 10 2 &7736 -> &7700
1204 640 80 25 80 0 (2) &7830 -> &7800
1205
1206 Note that the colours describe the locally defined attributes for each
1207 character. When no attribute information is provided, the colours are defined
1208 globally.
1209
1210 Enhancement: Compressed Character Data
1211 --------------------------------------
1212
1213 Another observation about text-only modes is that they only need to store a
1214 restricted set of bitmapped data values. Encoding this set of values in a
1215 smaller unit of storage than a byte could possibly help to reduce the amount
1216 of storage and bandwidth required to reproduce the characters on the display.
1217
1218 Enhancement: High Resolution Graphics
1219 -------------------------------------
1220
1221 Screen modes with higher resolutions and larger colour depths might be
1222 possible, but this would in most cases involve the allocation of more screen
1223 memory, and the ULA would probably then be obliged to page in such memory for
1224 the CPU to be able to sensibly access it all.
1225
1226 Enhancement: Genlock Support
1227 ----------------------------
1228
1229 The ULA generates a video signal in conjunction with circuitry producing the
1230 output features necessary for the correct display of the screen image.
1231 However, it appears that the ULA drives the video synchronisation mechanism
1232 instead of reacting to an existing signal. Genlock support might be possible
1233 if the ULA were made to be responsive to such external signals, resetting its
1234 address generators upon receiving synchronisation events.
1235
1236 Enhancement: Improved Sound
1237 ---------------------------
1238
1239 The standard ULA reserves &FE*6 for sound generation and cassette input/output
1240 (with bits 1 and 2 of &FE*7 being used to select either sound generation or
1241 cassette I/O), thus making it impossible to support multiple channels within
1242 the given framework. The BBC Micro ULA employs &FE40-&FE4F for sound control,
1243 and an enhanced ULA could adopt this interface.
1244
1245 The BBC Micro uses the SN76489 chip to produce sound, and the entire
1246 functionality of this chip could be emulated for enhanced sound, with a subset
1247 of the functionality exposed via the &FE*6 interface.
1248
1249 See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
1250 See: http://www.smspower.org/Development/SN76489
1251
1252 Enhancement: Waveform Upload
1253 ----------------------------
1254
1255 As with a hardware sprite function, waveforms could be uploaded or referenced
1256 using locations as registers referencing memory regions.
1257
1258 Enhancement: Sound Input/Output
1259 -------------------------------
1260
1261 Since the ULA already controls audio input/output for cassette-based data, it
1262 would have been interesting to entertain the idea of sampling and output of
1263 sounds through the cassette interface. However, a significant amount of
1264 circuitry is employed to process the input signal for use by the ULA and to
1265 process the output signal for recording.
1266
1267 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11
1268
1269 Enhancement: BBC ULA Compatibility
1270 ----------------------------------
1271
1272 Although some new ULA functions could be defined in a way that is also
1273 compatible with the BBC Micro, the BBC ULA is itself incompatible with the
1274 Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
1275 map, but controls various functions specific to the 6845 video controller;
1276 &FE08-F is reserved for the serial controller. It therefore becomes possible
1277 to disregard compatibility where compatibility is already disregarded for a
1278 particular area of functionality.
1279
1280 &FE20-F maps to video ULA functionality on the BBC Micro which provides
1281 control over the palette (using address &FE21, compared to &FE07-F on the
1282 Electron) and other system-specific functions. Since the location usage is
1283 generally incompatible, this region could be reused for other purposes.
1284
1285 Enhancement: Increased RAM, ULA and CPU Performance
1286 ---------------------------------------------------
1287
1288 More modern implementations of the hardware might feature faster RAM coupled
1289 with an increased ULA clock frequency in order to increase the bandwidth
1290 available to the ULA and to the CPU in situations where the ULA is not needed
1291 to perform work. A ULA employing a 32MHz clock would be able to complete the
1292 retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
1293 to access the RAM for the following 250ns even in display modes requiring the
1294 retrieval of a byte for the display every 500ns. The CPU could, subject to
1295 timing issues, run at 2MHz even in MODE 0, 1 and 2.
1296
1297 A scheme such as that described above would have a similar effect to the
1298 scheme employed in the BBC Micro, although the latter made use of RAM with a
1299 wider bandwidth in order to complete memory transfers within 250ns and thus
1300 permit the CPU to run continuously at 2MHz.
1301
1302 Higher bandwidth could potentially be used to implement exotic features such
1303 as RAM-resident hardware sprites or indeed any feature demanding RAM access
1304 concurrent with the production of the display image.
1305
1306 Enhancement: Multiple CPU Stacks and Zero Pages
1307 -----------------------------------------------
1308
1309 The 6502 maintains a stack for subroutine calls and register storage in page
1310 &01. Although the stack register can be manipulated using the TSX and TXS
1311 instructions, thereby permitting the maintenance of multiple stack regions and
1312 thus the potential coexistence of multiple programs each using a separate
1313 region, only programs that make little use of the stack (perhaps avoiding
1314 deeply-nested subroutine invocations and significant register storage) would
1315 be able to coexist without overwriting each other's stacks.
1316
1317 One way that this issue could be alleviated would involve the provision of a
1318 facility to redirect accesses to page &01 to other areas of memory. The ULA
1319 would provide a register that defines a physical page for the use of the CPU's
1320 "logical" page &01, and upon any access to page &01 by the CPU, the ULA would
1321 change the asserted address lines to redirect the access to the appropriate
1322 physical region.
1323
1324 By providing an 8-bit register, mapping to the most significant byte (MSB) of
1325 a 16-bit address, the ULA could then replace any MSB equal to &01 with the
1326 register value before the access is made. Where multiple programs coexist,
1327 upon switching programs, the register would be updated to point the ULA to the
1328 appropriate stack location, thus providing a simple memory management unit
1329 (MMU) capability.
1330
1331 In a similar fashion, zero page accesses could also be redirected so that code
1332 could run from sideways RAM and have zero page operations redirected to "upper
1333 memory" - for example, to page &BE (with stack accesses redirected to page
1334 &BF, perhaps) - thereby permitting most CPU operations to occur without
1335 inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
1336 CPU as it contends with the ULA for memory access.
1337
1338 Such facilities could also be provided by a separate circuit between the CPU
1339 and ULA in a fashion similar to that employed by a "turbo" board, but unlike
1340 such boards, no additional RAM would be provided: all memory accesses would
1341 occur as normal through the ULA, albeit redirected when configured
1342 appropriately.
1343
1344 ULA Pin Functions
1345 -----------------
1346
1347 The functions of the ULA pins are described in the Electron Service Manual. Of
1348 interest to video processing are the following:
1349
1350 CSYNC (low during horizontal or vertical synchronisation periods, high
1351 otherwise)
1352
1353 HS (low during horizontal synchronisation periods, high otherwise)
1354
1355 RED, GREEN, BLUE (pixel colour outputs)
1356
1357 CLOCK IN (a 16MHz clock input, 4V peak to peak)
1358
1359 PHI OUT (a 1MHz, 2MHz and stopped clock signal for the CPU)
1360
1361 More general memory access pins:
1362
1363 RAM0...RAM3 (data lines to/from the RAM)
1364
1365 RA0...RA7 (address lines for sending both row and column addresses to the RAM)
1366
1367 RAS (row address strobe setting the row address on a negative edge - see the
1368 timing notes)
1369
1370 CAS (column address strobe setting the column address on a negative edge -
1371 see the timing notes)
1372
1373 WE (sets write enable with logic 0, read with logic 1)
1374
1375 ROM (select data access from ROM)
1376
1377 CPU-oriented memory access pins:
1378
1379 A0...A15 (CPU address lines)
1380
1381 PD0...PD7 (CPU data lines)
1382
1383 R/W (indicates CPU write with logic 0, CPU read with logic 1)
1384
1385 Interrupt-related pins:
1386
1387 NMI (CPU request for uninterrupted 1MHz access to memory)
1388
1389 IRQ (signal event to CPU)
1390
1391 POR (power-on reset, resetting the ULA on a positive edge and asserting the
1392 CPU's RST pin)
1393
1394 RST (master reset for the CPU signalled on power-up and by the Break key)
1395
1396 Keyboard-related pins:
1397
1398 KBD0...KBD3 (keyboard inputs)
1399
1400 CAPS LOCK (control status LED)
1401
1402 Sound-related pins:
1403
1404 SOUND O/P (sound output using internal oscillator)
1405
1406 Cassette-related pins:
1407
1408 CAS IN (cassette circuit input, between 0.5V to 2V peak to peak)
1409
1410 CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)
1411
1412 CAS RC (detect high tone)
1413
1414 CAS MO (motor relay output)
1415
1416 ÷13 IN (~1200 baud clock input)
1417
1418 ULA Socket
1419 ----------
1420
1421 The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.
1422
1423 References
1424 ----------
1425
1426 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm
1427
1428 About this Document
1429 -------------------
1430
1431 The most recent version of this document and accompanying distribution should
1432 be available from the following location:
1433
1434 http://hgweb.boddie.org.uk/ULA
1435
1436 Copyright and licence information can be found in the docs directory of this
1437 distribution - see docs/COPYING.txt for more information.