1 The Acorn Electron ULA
2 ======================
3
4 Principal Design and Feature Constraints
5 ----------------------------------------
6
7 The features of the ULA are limited in sophistication by the amount of time
8 and resources that can be allocated to each activity supporting the
9 fundamental features and obligations of the unit. Maintaining a screen display
10 based on the contents of RAM itself requires the ULA to have exclusive access
11 to various hardware resources for a significant period of time.
12
13 Whilst other elements of the ULA can in principle run in parallel with the
14 display refresh activity, they cannot also access the RAM at the same time.
15 Consequently, other features that might use the RAM must accept a reduced
16 allocation of that resource in comparison to a hypothetical architecture where
17 concurrent RAM access is possible at all times.
18
19 Thus, the principal constraint for many features is bandwidth. The duration of
20 access to hardware resources is one aspect of this; the rate at which such
21 resources can be accessed is another. For example, the RAM is not fast enough
22 to support access more frequently than one byte per 2MHz cycle, and for screen
23 modes involving 80 bytes of screen data per scanline, there are no free cycles
24 for anything other than the production of pixel output during the active
25 scanline periods.
26
27 Another constraint is imposed by the method of RAM access provided by the ULA.
28 The ULA is able to access RAM by fetching 4 bits at a time and thus managing
29 to transfer 8 bits within a single 2MHz cycle, this being sufficient to
30 provide display data for the most demanding screen modes. However, this
31 mechanism's timing requirements are beyond the capabilities of the CPU when
32 running at 2MHz.
33
34 Consequently, the CPU will only ever be able to access RAM via the ULA at
35 1MHz, even when the ULA is not accessing the RAM. Fortunately, when needing to
36 refresh the display, the ULA is still able to make use of the idle part of
37 each 1MHz cycle (or, rather, the idle 2MHz cycle unused by the CPU) to itself
38 access the RAM at a rate of 1 byte per 1MHz cycle (or 1 byte every other 2MHz
39 cycle), thus supporting the less demanding screen modes.
40
41 Timing
42 ------
43
44 According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
45 of which are used to generate pixel data. At 50Hz, this means that 128 cycles
46 are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
47 312 ~= 128 cycles). This is consistent with the observation that each scanline
48 requires at most 80 bytes of data, and that the ULA is apparently busy for 40
49 out of 64 microseconds in each scanline.
50
(Since the ULA is seeking to provide an image for an interlaced 625-line
display, there are in fact two "fields" involved, one providing 312 scanlines
and one providing 313 scanlines. See below for a description of the video
system.)
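The scanline arithmetic above can be checked with a few lines of Python. This
is only a sketch using the figures quoted in the text; the constant names are
illustrative and do not come from any Acorn documentation.

```python
# Figures taken from the text above; names are illustrative only.
CLOCK_HZ = 2_000_000       # RAM access clock (2MHz)
FIELD_RATE_HZ = 50         # fields per second
LINES_PER_FIELD = 312      # scanlines per field
BYTES_PER_LINE_MAX = 80    # most demanding screen modes

cycles_per_field = CLOCK_HZ // FIELD_RATE_HZ            # 40000
cycles_per_line = cycles_per_field / LINES_PER_FIELD    # roughly 128.2

# Assuming exactly 128 cycles per line, one line lasts 64 microseconds.
line_period_us = 128 * 1_000_000 / CLOCK_HZ
# At one byte per cycle, 80 bytes occupy the RAM for 40 microseconds.
busy_us = BYTES_PER_LINE_MAX * 1_000_000 / CLOCK_HZ

print(cycles_per_field, round(cycles_per_line), line_period_us, busy_us)
```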
55
56 Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
57 each providing two bits of each byte) using two cycles within the 500ns period
58 of the 2MHz clock to complete each access operation. Since the CPU and ULA
59 have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
60 effectively run at 1MHz (since every other 500ns period involves the ULA
61 accessing RAM) during transfers of screen data.
62
The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle lasts
62.5ns. To access the memory, the following signalling patterns, shown against
16MHz cycles, are required:
67
       Time (ns): 0-------------- 500------------- ...
     2 MHz cycle: 0               1                ...
    16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                  /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
            ~RAS: /---\___________/---\___________ ...
            ~CAS: /-----\___/-\___/-----\___/-\___ ...
  Address events:     A B     C       A B     C    ...
     Data events:          F     S         F     S ...

        ~RAS ops: 1   0           1   0            ...
        ~CAS ops: 1     0   1 0   1     0   1 0    ...

     Address ops:    a.b.    c.      a.b.    c.    ...
        Data ops: s         f     s         f      ...

             ~WE: ......W                          ...
         PHI OUT: \_______________/--------------- ...
       CPU (RAM): L               D                ...
             RnW: R                                ...

         PHI OUT: \_______/-------\_______/------- ...
       CPU (ROM): L       D       L       D        ...
             RnW: R               R                ...
91
92 ~RAS must be high for 100ns, ~CAS must be high for 50ns.
93 ~RAS must be low for 150ns, ~CAS must be low for 90ns.
94 Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.
95
96 Here, "A" and "B" respectively indicate the row and first column addresses
97 being latched into the RAM (on a negative edge for ~RAS and ~CAS
98 respectively), and "C" indicates the second column address being latched into
99 the RAM. Presumably, the first and second half-bytes can be read at "F" and
100 "S" respectively, and the row and column addresses must be made available at
101 "a" and "b" (and "c") respectively at the latest. The TM4164EC4 datasheet
102 suggests that the addresses can be made available as the ~RAS and ~CAS levels
103 are brought low. Data can be read at "f" and "s" for the first and second
104 half-bytes respectively.
105
For the CPU, "L" indicates the point at which an address is taken from the CPU
address bus, on a negative edge of PHI OUT, with "D" being the point at which
data may either be read or be asserted for writing, on a positive edge of PHI
OUT. Here, PHI OUT is driven at 1MHz. Since ~WE needs to be driven low for
writing or high for reading, it effectively propagates RnW from the CPU. This
level must be established before data is retrieved but, according to the
TM4164EC4 datasheet, it may be established as late as the point at which the
column address is presented and ~CAS is brought low.
114
115 The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
116 address access time of 90ns (maximum), which appears to mean that ~RAS must be
117 held low for at least 150ns and that ~CAS must be held low for at least 90ns
118 before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
119 cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
120 is 1.5 cycles.
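The conversion from datasheet timings to 16MHz cycles can be sketched as
follows. Rounding up to the next half cycle is an assumption made here to
reproduce the 2.5 and 1.5 cycle figures above; the helper name is
hypothetical.

```python
import math

CYCLE_NS = 62.5   # one 16MHz cycle

def half_cycles_needed(duration_ns: float) -> float:
    """Round a duration up to the next half cycle of the 16MHz clock."""
    # Assumption: events may be placed on either edge of the 16MHz clock.
    return math.ceil(duration_ns / (CYCLE_NS / 2)) / 2

ROW_ACCESS_NS = 150      # TM4164EC4-15 row address access time (maximum)
COLUMN_ACCESS_NS = 90    # TM4164EC4-15 column address access time (maximum)

print(ROW_ACCESS_NS / CYCLE_NS, half_cycles_needed(ROW_ACCESS_NS))
print(COLUMN_ACCESS_NS / CYCLE_NS, half_cycles_needed(COLUMN_ACCESS_NS))
```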
121
122 Note that the Service Manual refers to the negative edge of RAS and CAS, but
123 the datasheet for the similar TM4164EC4 product shows latching on the negative
124 edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
125 communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
126 "page mode" provides the appropriate behaviour for that particular product.
127
128 The CPU, when accessing the RAM alone, apparently does not make use of the
129 vacated "slot" that the ULA would otherwise use (when interleaving accesses in
130 MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
131 accessing ROM (and potentially sideways RAM). The principal limitation is the
132 amount of time needed between issuing an address and receiving an entire byte
133 from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
134 4 cycles that would be required for 2MHz operation.
135
136 See: Acorn Electron Advanced User Guide
137 See: Acorn Electron Service Manual
138 http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
139 See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
140 See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
141 See: One of the Most Popular 65,536-Bit (64K) Dynamic RAMs The TMS 4164
142 http://smithsonianchips.si.edu/augarten/p64.htm
143
144 A Note on 8-Bit Wide RAM Access
145 -------------------------------
146
147 It is worth considering the timing when 8 bits of data can be obtained at once
148 from the RAM chips:
149
150 Time (ns): 0-------------- 500------------- ...
151 2 MHz cycle: 0 1 ...
152 8 MHz cycle: 0 1 2 3 0 1 2 3 ...
153 /-\_/-\_/-\_/-\_/-\_/-\_/-\_/-\_ ...
154 ~RAS: /---\___________/---\___________ ...
155 ~CAS: /-------\_______/-------\_______ ...
156 Address events: A B A B ...
157 Data events: E E ...
158
159 ~RAS ops: 1 0 1 0 ...
160 ~CAS ops: 1 0 1 0 ...
161
162 Address ops: a b a b ...
163 Data ops: f s f ...
164
165 ~WE: ........W ...
166 PHI OUT: \_______/-------\_______/------- ...
167 CPU: L D L D ...
168 RnW: R R ...
169
170 Here, "E" indicates the availability of an entire byte.
171
172 Since only one fetch is required per 2MHz cycle, instead of two fetches for
173 the 4-bit wide RAM arrangement, it seems likely that longer 8MHz cycles could
174 be used to coordinate the necessary signalling.
175
176 Another conceivable simplification from using an 8-bit wide RAM access channel
177 with a single access within each 2MHz cycle is the possibility of allowing the
178 CPU to signal directly to the RAM instead of having the ULA perform the access
179 signalling on the CPU's behalf. Note that it is this more leisurely signalling
180 that would allow the CPU to conduct accesses at 2MHz: the "compressed"
181 signalling being beyond the capabilities of the CPU.
182
183 Note that 16MHz cycles would still be needed for the pixel clock in MODE 0,
184 which needs to output eight pixels per 2MHz cycle, producing 640 monochrome
185 pixels per 80-byte line.
186
187 An obvious consideration with regard to 8-bit wide access is whether the ULA
188 could still conduct the "compressed" signalling for its own RAM accesses:
189
       Time (ns): 0-------------- 500------------- ...
     2 MHz cycle: 0               1                ...
    16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                  /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
            ~RAS: /---\___________/---\___________ ...
            ~CAS: /-----\___/-\___/-----\___/-\___ ...
  Address events:     A B     C       A B     C    ...
     Data events:          1     2         1     2 ...

        ~RAS ops: 1   0           1   0            ...
        ~CAS ops: 1     0   1 0   1     0   1 0    ...

     Address ops:    a b     c       a b     c     ...
        Data ops: s         f     s         f      ...

             ~WE: ......W                          ...
         PHI OUT: \_______/-------\_______/------- ...
             CPU: L       D       L       D        ...
             RnW: R               R                ...
209
210 Here, "1" and "2" in the data events correspond to whole byte accesses,
211 effectively upgrading the half-byte "F" and "S" events in the existing ULA
212 arrangement.
213
214 Although the provision of access for the CPU would adhere to the relevant
215 timing constraints, providing only one byte per 2MHz cycle, the ULA could
216 obtain two bytes per cycle. This would then free up bandwidth for the CPU in
217 screen modes where the ULA would normally be dominant (MODE 0 to 3), albeit at
218 the cost of extra buffering. Such buffering could also be done for modes where
219 the bandwidth is shared (MODE 4 to 6), consolidating pairs of ULA accesses into
220 single cycles and freeing up an extra cycle for CPU accesses.
221
222 A further consideration is whether the CPU and ULA could access the memory on
223 interleaved 4MHz cycles, thus replicating the arrangement used by the CPU and
224 Video ULA on the BBC Micro. One potential obstacle is that the apparent 4MHz
225 access rate employed by the ULA does not involve the complete process for
226 accessing the RAM: upon setting up the address and issuing the ~RAS signal,
227 the ULA is able to make a pair of column accesses on the same "row" of memory,
228 effectively achieving an average access rate of 4MHz in an 8-bit
229 configuration.
230
231 However, if arbitrary pairs of column accesses were to be attempted, as would
232 be required by CPU and ULA interleaving, the ~RAS signal would need to be
233 re-issued with different addresses being set up. This would expand the time to
234 access a memory location to beyond the period of a 4MHz cycle, making it
235 impossible to employ interleaved accesses at such a rate.
236
237 In conclusion, a strict interleaving strategy is not possible, but by using
238 pixel data buffering and employing two ULA accesses per 2MHz cycle to obtain
239 two bytes in that cycle, each adjacent 2MHz cycle can be given to the CPU,
240 thus achieving an effective throughput during display update periods of 3
241 bytes for every pair of cycles (2 bytes for the ULA, 1 byte for the CPU), and
242 thus 1.5 bytes per cycle, giving an illusion of 3MHz access to RAM.
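The throughput figure above follows from simple averaging, as this small
sketch shows. The function name and its parameters are illustrative only.

```python
from fractions import Fraction

CYCLE_RATE_MHZ = 2   # RAM access cycles per microsecond

def effective_rate_mhz(ula_bytes: int, cpu_bytes: int, cycles: int) -> Fraction:
    """Average bytes transferred per microsecond over a group of 2MHz cycles."""
    return Fraction(ula_bytes + cpu_bytes, cycles) * CYCLE_RATE_MHZ

# Proposed scheme: two ULA bytes in one cycle, one CPU byte in the next.
print(effective_rate_mhz(ula_bytes=2, cpu_bytes=1, cycles=2))
# Existing MODE 4 to 6 interleaving: one ULA byte, then one CPU byte.
print(effective_rate_mhz(ula_bytes=1, cpu_bytes=1, cycles=2))
```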
243
244 Some other considerations apply to introducing 8-bit wide access. The ULA
245 employs four pins for data transfer to and from the memory devices (RAM0..3),
246 and obviously another four pins would be needed in an 8-bit wide scheme.
247 However, there may have been a physical limitation on the number of pins
248 permissible on a ULA package or the device's socket. This would necessitate
249 the reassignment of pins, although few are readily available for such
250 reassignment.
251
252 One approach might involve connecting the RAM devices to the CPU data bus,
253 with each line connecting to a different RAM chip. The signalling of the RAM
254 would remain under the control of the ULA, thus preventing the RAM devices
255 from interfering with other memory transfer operations, with the ROM
256 signalling also remaining under the ULA's control. One potential disadvantage
257 of this scheme would involve the elimination of the separate data paths
258 between the CPU and ROM and between the ULA and RAM.
259
260 Another approach might involve reclaiming the keyboard input pins (KBD0..3) as
261 data pins for ULA access to RAM. This would necessitate the reorganisation of
262 the keyboard interface, perhaps integrating the keyboard matrix more directly
263 as a kind of ROM device. A bus transceiver could be used to isolate the
264 keyboard inputs, with a pin being used to control the transceiver, since the
265 keyboard data lines are pulled high. In effect, the transceiver would act as a
266 kind of output enable for the keyboard.
267
268 To make the matrix appear within the sideways ROM region of the memory map,
269 A15 would need to be set to a high value and A14 to a low value. Signals A13
270 to A0 would then be brought low to select the appropriate column, with the
271 individual key states being made available via data lines, perhaps D3 to D0.
272 This mostly retains the existing addressing arrangement and scanning
273 mechanism. Internally, the ULA would continue to enable access to the keyboard
274 through the ROM paging mechanism, but instead of integrating separate data
275 pins into the CPU's data path, it would integrate the keyboard inputs using
276 the transceiver.
277
278 Enhancement: Keyboard Matrix Scanning
279 -------------------------------------
280
281 The keyboard scanning mechanism is presumably designed to be as inexpensive as
282 possible, being driven by software and avoiding extra logic, but at the
283 expense of occupying large regions of the memory map when paged in. A more
284 efficient mapping of the keyboard columns could possibly be done using
285 decoders such as the 74xx138 part which permits the decoding of three inputs
286 to select one of eight outputs. Using two of these parts, six address lines
287 would be dedicated to the keyboard columns as follows:
288
289 A5...A3 select up to eight columns via one decoder
290 A2...A0 select up to eight columns via another decoder
291
292 In this arrangement, only one of the two ranges of pins would be used at any
293 given time. If the ULA were to require a certain combination of the remaining
294 address bits, a region as small as 64 bytes could be dedicated to the
295 keyboard.
296
297 A more efficient arrangement could be used by introducing logic that allows
298 the decoders to work together to address the keyboard:
299
300 A2...A0 select up to eight columns via both decoders
301 A3 would enable one decoder if low and the other decoder if high
302
303 With ULA constraints on the remaining address bits, a 16-byte region could be
304 used to represent the keyboard.
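The combined decoder arrangement might be modelled as below. This is a
behavioural sketch only: the function names are hypothetical, and the enable
logic of a real 74xx138 (which has three enable inputs) is reduced to a single
flag.

```python
def decoder_138(select: int, enabled: bool) -> list[int]:
    """Model a 74xx138: eight active-low outputs, one driven low if enabled."""
    return [0 if enabled and i == select else 1 for i in range(8)]

def keyboard_column_lines(address: int) -> list[int]:
    """Return sixteen active-low column select lines for a 16-byte region."""
    select = address & 0b0111      # A2...A0 feed both decoders
    a3 = (address >> 3) & 1        # A3 chooses which decoder is enabled
    low = decoder_138(select, enabled=(a3 == 0))
    high = decoder_138(select, enabled=(a3 == 1))
    return low + high

# A3 high with A2...A0 = 3 selects the fourth output of the second decoder.
lines = keyboard_column_lines(0x0B)
print(lines.index(0))
```

Since the Electron keyboard has fewer than sixteen columns, some of the
sixteen select lines in such a scheme would simply be left unconnected.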
305
306 A further refinement might involve combining the existing columns into groups
307 of eight keys. This would reduce the number of columns to seven, requiring
308 only three address lines, with all eight data lines being used to read the
309 matrix.
310
On the BBC Micro, the system 6522 VIA is used to monitor and read from the
keyboard. The memory locations involved with this chip are located in the
region from &FE40 to &FE5F inclusive, although the memory is allocated in a
way that is appropriate to operate that chip, as opposed to merely exposing
the keyboard matrix.
316
317 Enhancement: Hardware Device Selection
318 --------------------------------------
319
320 An alternative to the existing, rather cumbersome, sideways ROM mapping of the
321 keyboard might involve making it accessible via a hardware-related memory page
322 like page FE. With ULA addresses confined to FE0x, and with the ULA itself
323 having to trap accesses to page FE, the page selection signal might be brought
324 out of the ULA instead of any dedicated signal for the keyboard. Various
325 address lines corresponding to A7 through A4, or a subset of these, could be
326 fed into a decoder to permit the selection of other devices, with the keyboard
327 being one of these.
328
329 Meanwhile, a more efficient keyboard mapping using the above matrix
330 enhancement would permit the different keyboard columns to appear as a group
331 of sixteen or eight bytes. Thus:
332
333 A15...A8 select page FE
334 A7...A4 select a device or peripheral
335 A3...A0 select a register or keyboard column
336
337 Conceivably, devices such as sound generators could be mapped to device
338 regions.
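The proposed address layout can be expressed as a small decoding function.
This is a sketch of the suggested scheme, not of any existing hardware: the
names are hypothetical, and only device 0 (the ULA at FE0x) is fixed by the
text above.

```python
from typing import NamedTuple, Optional

class DeviceAccess(NamedTuple):
    device: int      # A7...A4: device or peripheral
    register: int    # A3...A0: register or keyboard column

def decode_page_fe(address: int) -> Optional[DeviceAccess]:
    """Split an address in page FE into device and register numbers."""
    if (address >> 8) & 0xFF != 0xFE:
        return None  # A15...A8 do not select page FE
    return DeviceAccess(device=(address >> 4) & 0xF, register=address & 0xF)

print(decode_page_fe(0xFE05))   # the ULA's own region, register 5
print(decode_page_fe(0xFE23))   # a hypothetical device 2, register 3
print(decode_page_fe(0x8000))   # outside page FE
```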
339
340 CPU Clock Notes
341 ---------------
342
343 "The 6502 receives an external square-wave clock input signal on pin 37, which
344 is usually labeled PHI0. [...] This clock input is processed within the 6502
345 to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
346 is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
347 through two inverters and a push-pull amplifier. The same network of
348 transistors within the 6502 which generates PHI2 is also tied to PHI1, and
349 generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
350 available to external devices is so that they know when they can access the
351 CPU. When PHI1 is high, this means that external devices can read from the
352 address bus or data bus; when PHI2 is high, this means that external devices
353 can write to the data bus."
354
355 See: http://lateblt.livejournal.com/88105.html
356
357 "The 6502 has a synchronous memory bus where the master clock is divided into
358 two phases (Phase 1 and Phase 2). The address is always generated during Phase
359 1 and all memory accesses take place during Phase 2."
360
361 See: http://www.jmargolin.com/vgens/vgens.htm
362
Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
Phase 1" means when PHI1 is high (and thus when PHI0 - really PHI2 - is low)
and "during Phase 2" means when PHI2 is high.
366
367 Bandwidth Figures
368 -----------------
369
370 Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
371 total lines, with 80 cycles occurring in the active periods of display
372 scanlines, the following bandwidth calculations can be performed:
373
374 Total theoretical maximum:
375 128 cycles * 312 lines
376 = 39936 bytes
377
378 MODE 0, 1, 2:
379 ULA: 80 cycles * 256 lines
380 = 20480 bytes
381 CPU: 48 cycles / 2 * 256 lines
382 + 128 cycles / 2 * (312 - 256) lines
383 = 9728 bytes
384
385 MODE 3:
386 ULA: 80 cycles * 24 rows * 8 lines
387 = 15360 bytes
388 CPU: 48 cycles / 2 * 24 rows * 8 lines
389 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
390 = 12288 bytes
391
392 MODE 4, 5:
393 ULA: 40 cycles * 256 lines
394 = 10240 bytes
395 CPU: (40 cycles + 48 cycles / 2) * 256 lines
396 + 128 cycles / 2 * (312 - 256) lines
397 = 19968 bytes
398
399 MODE 6:
400 ULA: 40 cycles * 24 rows * 8 lines
401 = 7680 bytes
402 CPU: (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
403 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
404 = 19968 bytes
405
Here, the division by 2 for CPU accesses indicates that the CPU only uses
every other access opportunity, even in uncontended periods. See the 2MHz RAM
Access enhancement below for bandwidth calculations that consider this
limitation removed.
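The per-mode calculations above can be reproduced with a short script. It is
a sketch that restates the same formulas; the function and table names are
illustrative.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
TOTAL_LINES = 312       # scanlines per field
ACTIVE_CYCLES = 80      # cycles in the active part of a display scanline

def bandwidth(ula_cycles_per_line: int, display_lines: int) -> tuple[int, int]:
    """Return (ULA bytes, CPU bytes) per field for the standard ULA."""
    ula = ula_cycles_per_line * display_lines
    # In the active period the CPU gets whatever the ULA leaves unused;
    # elsewhere it only uses every other 2MHz cycle.
    cpu_display = (ACTIVE_CYCLES - ula_cycles_per_line) \
        + (CYCLES_PER_LINE - ACTIVE_CYCLES) // 2
    cpu_blank = CYCLES_PER_LINE // 2
    cpu = cpu_display * display_lines \
        + cpu_blank * (TOTAL_LINES - display_lines)
    return ula, cpu

# (ULA cycles per display line, number of display lines)
MODES = {
    "MODE 0, 1, 2": (80, 256),
    "MODE 3": (80, 24 * 8),
    "MODE 4, 5": (40, 256),
    "MODE 6": (40, 24 * 8),
}

total = CYCLES_PER_LINE * TOTAL_LINES   # theoretical maximum bytes per field
for name, (ula_cycles, lines) in MODES.items():
    ula, cpu = bandwidth(ula_cycles, lines)
    print(f"{name:<14}{ula:>6}{cpu:>6}{cpu / total:>5.0%}{total / cpu:>6.2f}")
```

The last two columns printed are the CPU's share of the theoretical maximum
and the corresponding slowdown factor.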
410
411 A summary of the bandwidth figures is as follows (with extra timing details
412 described below):
413
                  Standard ULA    % Total   Slowdown   BBC-10s   BBC-34s
  MODE 0, 1, 2     9728 bytes     24%       4.11       43s       105s
  MODE 3          12288 bytes     31%       3.25       34s
  MODE 4, 5       19968 bytes     50%       2          20s
  MODE 6          19968 bytes     50%       2          20s       50s
419
420 The review of the Electron in Practical Computing (October 1983) provides a
421 concise overview of the RAM access limitations and gives timing comparisons
422 between modes and BBC Micro performance. In the above, "BBC-10s" is the
423 measured or stated time given for a program taking 10 seconds on the BBC
424 Micro, whereas "BBC-34s" is the apparently measured time given for the
425 "Persian" program taking 34 seconds to complete on the BBC Micro, with a
426 "quick" mode presumably switching to MODE 6 using the ULA directly in order to
427 reduce display bandwidth usage while the program draws to the screen.
428 Evidently, the measured slowdown is slightly lower than the theoretical
429 slowdown, most likely due to the running time not being entirely dominated by
430 RAM access performance characteristics.
431
432 Video Timing
433 ------------
434
435 According to 8.7 in the Service Manual, and the PAL Wikipedia page,
436 approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
437 (including the "colour burst"), and 1.65µs for the "front porch", totalling
438 12.05µs and thus leaving 51.95µs for the active video signal for each
439 scanline. As the Service Manual suggests in the oscilloscope traces, the
440 display information is transmitted more or less centred within the active
441 video period since the ULA will only be providing pixel data for 40µs in each
442 scanline.
443
444 Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
445 each scanline can be divided into 1024 cycles, although only 640 at most are
446 actively used to provide pixel data. Pixel data production should only occur
447 within a certain period on each scanline, approximately 262 cycles after the
448 start of hsync:
449
450 active video period = 51.95µs
451 pixel data period = 40µs
452 total silent period = 51.95µs - 40µs = 11.95µs
453 silent periods (before and after) = 11.95µs / 2 = 5.975µs
454 hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
455 time before pixel data period = 10.4µs + 5.975µs = 16.375µs
456 pixel data period start cycle = 16.375µs / 62.5ns = 262
457
458 By choosing a number divisible by 8, the RAM access mechanism can be
459 synchronised with the pixel production. Thus, 256 is a more appropriate start
460 cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
461 pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
462 document) occurs at cycle 0.
463
464 To summarise:
465
466 HS signal starts at cycle 0 on each horizontal scanline
467 HS signal ends approximately 4µs later at cycle 64
468 Pixel data starts approximately 12µs later at cycle 256
469
470 "Re: Electron Memory Contention" provides measurements that appear consistent
471 with these calculations.
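The horizontal timing calculation can be restated in code. This is a sketch
using the figures quoted above, working in nanoseconds to avoid rounding
problems; rounding down to a multiple of 8 is an assumption made to reproduce
the choice of cycle 256.

```python
CYCLE_NS = 62.5          # one 16MHz cycle

# Durations in nanoseconds, from the figures quoted above.
SYNC_NS = 4700
BACK_PORCH_NS = 5700
FRONT_PORCH_NS = 1650
LINE_NS = 64000
PIXEL_DATA_NS = 40000

active_ns = LINE_NS - (SYNC_NS + BACK_PORCH_NS + FRONT_PORCH_NS)
margin_ns = (active_ns - PIXEL_DATA_NS) / 2      # silent period on each side
start_ns = SYNC_NS + BACK_PORCH_NS + margin_ns   # time before pixel data
start_cycle = start_ns / CYCLE_NS

# Assumption: round down to a multiple of 8 to align with RAM accesses.
aligned_cycle = int(start_cycle) // 8 * 8

print(active_ns, start_cycle, aligned_cycle, aligned_cycle * CYCLE_NS / 1000)
```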
472
The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync), which also lasts for 2.5
lines. Thus, the first visible scanline on the first field of a frame occurs
half way through the 23rd scanline period measured from the start of vsync
(indicated by "V" in the diagrams below):
479
480 10 20 23
481 Line in frame: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
482 Line from 1: 0 22 3
483 Line on screen: .:::::VVVVV::::: 12233445566
484 |_________________________________________________|
485 25 line vertical blanking period
486
487 In the second field of a frame, the first visible scanline coincides with the
488 24th scanline period measured from the start of line 313 in the frame:
489
490 310 336
491 Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
492 Line from 313: 0 23 4
493 Line on screen: 88:::::VVVVV:::: 11223344
494 288 | |
495 |_________________________________________________|
496 25 line vertical blanking period
497
498 In order to consider only full lines, we might consider the start of each
499 frame to occur 23 lines after the start of vsync.
500
501 Again, it is likely that pixel data production should only occur on scanlines
502 within a certain period on each frame. The "625/50" document indicates that
503 only a certain region is "safe" to use, suggesting a vertically centred region
504 with approximately 15 blank lines above and below the picture. However, the
505 "PAL TV timing and voltages" document suggests 28 blank lines above and below
506 the picture. This would centre the 256 lines within the 312 lines of each
507 field and thus provide a start of picture approximately 5.5 or 5 lines after
508 the end of the blanking period or 28 or 27.5 lines after the start of vsync.
509
510 To summarise:
511
CSYNC signal starts at cycle 0
CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
Start of picture occurs approximately 1632µs (25.5 lines) later at cycle 28672
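The vertical timing figures can be checked by converting between lines, 16MHz
cycles and microseconds. This sketch assumes 1024 cycles per 64µs scanline, a
2.5 line vsync and a picture starting 28 lines after the start of vsync; the
helper names are illustrative.

```python
CYCLES_PER_LINE = 1024   # 16MHz cycles per 64 microsecond scanline
CYCLE_NS = 62.5          # one 16MHz cycle

def lines_to_cycles(lines: float) -> int:
    """Convert a number of scanlines (possibly fractional) to 16MHz cycles."""
    return int(lines * CYCLES_PER_LINE)

def cycles_to_us(cycles: int) -> float:
    """Convert 16MHz cycles to microseconds."""
    return cycles * CYCLE_NS / 1000

vsync_end = lines_to_cycles(2.5)      # end of the vsync pulse
picture_start = lines_to_cycles(28)   # first line of pixel data
gap = picture_start - vsync_end       # cycles between the two events

print(vsync_end, cycles_to_us(vsync_end))
print(picture_start, cycles_to_us(gap), gap / CYCLES_PER_LINE)
```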
515
516 See: http://en.wikipedia.org/wiki/PAL
517 See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
518 See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
519 http://lipas.uwasa.fi/~f76998/video/modes/
520 See: PAL TV timing and voltages
521 http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
522 See: Line Standards
523 http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
524 See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
525 http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
526 See: Re: Electron Memory Contention
527 http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109
528
529 RAM Integrated Circuits
530 -----------------------
531
532 Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
533 CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
534 available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
535 have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
536 ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.
537
The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM4164 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.
541
542 The TM4164EC4 series combines 4 64K x 1b units into a single package and
543 appears similar to the TM4164EA4 featured on the Electron's circuit diagram
544 (in the Advanced User Guide but not the Service Manual), and it also has 22
545 pins providing 3 additional inputs and 3 additional outputs over the 16 pins
546 of the individual 4164-15 modules, presumably allowing concurrent access to
547 the packaged memory units.
548
549 As far as currently available replacements are concerned, the NTE4164 is a
550 potential candidate: according to the Vetco Electronics entry, it is
551 supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
552 parts include the NTE2164 and the NTE6664, both of which appear to have
553 largely the same performance and connection characteristics. Meanwhile, the
554 NTE21256 appears to be a 16-pin replacement with four times the capacity that
555 maintains the single data input and output pins. Using the NTE21256 as a
556 replacement for all ICs combined would be difficult because of the single bit
557 output.
558
559 Another device equivalent to the 4164-15 appears to be available under the
560 code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
561 site lists data sheets for other devices on the same page, but these are
562 different and actually appear to be provided under the 41574 product code (but
563 are listed under 41464-10) and appear to be replacements for the TM4164EC4:
564 the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
565 employing 4 pins for both input and output.
566
              Pins  I/O pins  Row access  Column access
              ----  --------  ----------  -------------
  TM4164EC4   22    4 + 4     150ns (15)  90ns (15)
  KM41464AP   18    4         150ns (15)  75ns (15)
  NTE21256    16    1 + 1     150ns       75ns
  HYB 4164-2  16    1 + 1     150ns       100ns
  µPD41464    18    4         120ns (12)  60ns (12)
574
575 See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
576 https://www.rocelec.com/part/REITM4164EC4-15L
577 See: Dynamic RAMS
578 http://www.unicornelectronics.com/IC/DYNAMIC.html
579 See: New old stock 8x 4164 chips
580 http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
581 See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
582 http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
583 See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
584 http://www.vetco.net/catalog/product_info.php?products_id=2806
585 See: NTE4164 - IC-NMOS 64K DRAM 150NS
586 http://www.vetco.net/catalog/product_info.php?products_id=3680
587 See: NTE21256 - IC-256K DRAM 150NS
588 http://www.vetco.net/catalog/product_info.php?products_id=2799
589 See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
590 http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
591 See: NTE6664 - IC-MOS 64K DRAM 150NS
592 http://www.vetco.net/catalog/product_info.php?products_id=5213
593 See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
594 http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
595 See: 4164-150: MAJOR BRANDS
596 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
597 See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
598 http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
599 See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
600 http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
601 See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
602 http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
603 See: 41464-10: MAJOR BRANDS
604 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1
605
606 Interrupts
607 ----------
608
609 The ULA generates IRQs (maskable interrupts) according to certain conditions
610 and these conditions are controlled by location &FE00:
611
* Vertical sync (bottom of displayed screen)
* 50Hz real time clock
* Transmit data empty
* Receive data full
* High tone detect
617
The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory: setting
this bit restores the normal function of the ULA.
622
623 ROM Paging
624 ----------
625
626 Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
627 mappings exist:
628
629 8 keyboard
630 9 keyboard (duplicate)
631 10 BASIC ROM
632 11 BASIC ROM (duplicate)
633
634 Paging in a ROM involves the following procedure:
635
636 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
637 2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
638 selected.
639 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
640 whilst writing the desired ROM number n in bits 0 to 2.
641
642 See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686
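The paging procedure can be sketched as a function producing the sequence of
values to write to &FE05. Several details here are assumptions rather than
documented behaviour: ROM 12 is chosen arbitrarily for the first step, ROMs 8
to 15 are taken to need only a single write, and the upper four bits of the
register (which serve other purposes) are written as zero. The function name
is hypothetical.

```python
ROM_LATCH = 0xFE05   # paging (and interrupt clear) register

def paging_writes(rom: int) -> list[int]:
    """Return the values to write to &FE05, in order, to page in a ROM."""
    if not 0 <= rom <= 15:
        raise ValueError("ROM number must be in the range 0 to 15")
    if rom >= 8:
        # Bit 3 set (page enable) with n in bits 0 to 2 selects ROM 8+n.
        # Assumption: a single write is enough for ROMs 8 to 15.
        return [rom]
    # ROMs 0 to 7: first select one of ROMs 12 to 15 (12 chosen arbitrarily),
    # then write the ROM number with bit 3 clear.
    return [0x0C, rom]

for rom in (3, 10, 13):
    values = ", ".join(f"&{value:02X}" for value in paging_writes(rom))
    print(f"ROM {rom}: write {values} to &{ROM_LATCH:04X}")
```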
643
644 Keyboard Access
645 ---------------
646
647 The keyboard pages appear to be accessed at 1MHz just like the RAM.
648
649 See: https://stardot.org.uk/forums/viewtopic.php?p=254155#p254155
650
651 Shadow/Expanded Memory
652 ----------------------
653
654 The Electron exposes all sixteen address lines and all eight data lines
655 through the expansion bus. Using such lines, it is possible to provide
656 additional memory - typically sideways ROM and RAM - on expansion cards and
657 through cartridges, although the official cartridge specification provides
658 fewer address lines and only seeks to provide access to memory in 16K units.
659
660 Various modifications and upgrades were developed to offer "turbo"
661 capabilities to the Electron, permitting the CPU to access a separate 8K of
662 RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
663 the ULA through additional logic. However, an enhanced ULA might support
664 independent CPU access to memory over the expansion bus by allowing itself to
665 be discharged from providing access to memory, potentially for a range of
666 addresses, and for the CPU to communicate with external memory uninterrupted.
667
668 Sideways RAM/ROM and Upper Memory Access
669 ----------------------------------------
670
The ULA controls the CPU clock, effectively slowing or stopping the CPU when
the ULA needs to access screen memory. Nevertheless, it is apparently able to
allow the CPU to access addresses of &8000 and above - the upper region of
memory - at 2MHz independently of any access to RAM that the ULA might be
performing. It only blocks the CPU, by stopping or stalling its clock, if the
CPU attempts to access addresses of &7FFF and below - the lower region of
memory - during any ULA memory access.
678
679 Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
680 CPU clock if the line goes low, when the CPU is attempting to access the lower
681 region of memory.
682
683 Hardware Scrolling (and Enhancement)
684 ------------------------------------
685
On the standard ULA, &FE02 and &FE03 together hold a 9-bit value supplying
bits 14 to 6 of the screen start address, with &FE03 holding the upper six of
these bits and &FE02 the lower three. The six least significant bits of the
address are always zero, thus limiting the scrolling resolution to 64 bytes.
An enhanced ULA could support a resolution of 2 bytes using the same pair of
locations.

    |--&FE03--------------| |--&FE02--------------|
    XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

    XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX
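
The standard layout shown in the first row of the diagram can be illustrated
by splitting a screen start address into the two bytes written to the
registers. This is a sketch only; the function name is hypothetical and the
bytes are identified purely by the address bits they carry.

```python
def screen_start_bytes(address):
    """Split a screen start address into (upper, lower) register bytes.

    upper carries address bits 14 to 9 in its bits 5 to 0;
    lower carries address bits 8 to 6 in its bits 7 to 5.
    """
    if not 0 <= address < 0x8000:
        raise ValueError("address must lie within the lower 32K")
    if address & 0x3F:
        raise ValueError("address must be a multiple of 64 bytes")
    upper = (address >> 9) & 0x3F
    lower = (address >> 1) & 0xE0
    return upper, lower
```

An address of &5840 would thus be split into &2C and &20, whereas any address
that is not a multiple of 64 is rejected.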
695
Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change of 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).
702
703 One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
704 of changing the screen address by 2 bytes is the change in the number of lines
705 from the initial and final character rows that need reading by the ULA, which
706 would need to maintain this state information (although this is a relatively
707 trivial change). Another pitfall is the complication that might be introduced
708 to software writing bitmaps of character height to the screen.
709
710 See: http://pastraiser.com/computers/acornelectron/acornelectron.html
711
712 Enhancement: Mode Layouts
713 -------------------------
714
715 Merely changing the screen memory mappings in order to have Archimedes-style
716 row-oriented screen addresses (instead of character-oriented addresses) could
717 be done for the existing modes, but this might not be sufficiently beneficial,
718 especially since accessing regions of the screen would involve incrementing
719 pointers by amounts that are inconvenient on an 8-bit CPU.
720
However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
reference the vertical screen location below the currently-referenced location
(just as occurs within characters using the existing ULA); instead of
returning to the top of the character row and referencing the next horizontal
location after eight bytes, the address would reference the next character row
and continue to reference locations downwards over the height of the screen
until reaching the bottom; at the bottom, the next location would be the next
horizontal location at the top of the screen.
730
731 In other words, the memory layout for the screen would resemble the following
732 (for MODE 2):
733
    &3000  &3100  ...  &7F00
    &3001  &3101
    ...    ...
    &3007
    &3008
    ...
    ...    ...
    &30FF  ...         &7FFF
742
743 Since there are 256 pixel rows, each column of locations would be addressable
744 using the low byte of the address. Meanwhile, the high byte would be
745 incremented to address different columns. Thus, addressing screen locations
746 would become a lot more convenient and potentially much more efficient for
747 certain kinds of graphical output.
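
To illustrate the difference in addressing effort, the following sketch
compares the existing character-oriented calculation with the proposed
column-oriented one for MODE 2. The function names are hypothetical, the
screen is assumed to start at &3000 with 80 byte columns, and any wrapping
caused by hardware scrolling is ignored.

```python
SCREEN_BASE = 0x3000   # start of MODE 2 screen memory
BYTE_COLUMNS = 80      # byte-wide columns across the screen in MODE 2

def character_oriented_address(column, line):
    """Existing layout: eight consecutive bytes per character cell."""
    char_row, line_in_row = divmod(line, 8)
    return SCREEN_BASE + char_row * BYTE_COLUMNS * 8 + column * 8 + line_in_row

def column_oriented_address(column, line):
    """Proposed layout: column in the high byte, pixel row in the low byte."""
    return SCREEN_BASE + (column << 8) + line
```

Both functions map column 79, line 255 to &7FFF, but the column-oriented form
needs no multiplication or division, only the placing of the column number in
the high byte of the address.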
748
One potential complication with this simplified addressing scheme arises with
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
with the existing ULA) would be achieved by incrementing or decrementing the
screen start address; by one character row, it would involve adding or
subtracting 8. However, the ULA only supports multiples of 64 when changing the
screen start address. Thus, if such a scheme were to be adopted, three
additional bits would need to be supported in the screen start register to
permit scrolling by character rows, and six additional bits to permit scrolling
by single pixel rows (see "Hardware Scrolling (and Enhancement)" for more
details). However, horizontal scrolling would be much improved even under the
severe constraints of the existing ULA: only adjustments of 256 to the screen
start address would be required to produce single-location scrolling of as few
as two pixels in MODE 2 (four pixels in MODEs 1 and 5, eight pixels otherwise).
761
762 More disruptive is the effect of this alternative layout on software.
763 Presumably, compatibility with the BBC Micro was the primary goal of the
764 Electron's hardware design. With the character-oriented screen layout in
765 place, system software (and application software accessing the screen
766 directly) would be relying on this layout to run on the Electron with little
767 or no modification. Although it might have been possible to change the system
768 software to use this column-oriented layout instead, this would have incurred
769 a development cost and caused additional work porting things like games to the
770 Electron. Moreover, a separate branch of the software from that supporting the
771 BBC Micro and closer derivatives would then have needed maintaining.
772
773 The decision to use the character-oriented layout in the BBC Micro may have
774 been related to the choice of circuitry and to facilitate a convenient
775 hardware implementation, and by the time the Electron was planned, it was too
776 late to do anything about this somewhat unfortunate choice.
777
778 Pixel Layouts
779 -------------
780
781 The pixel layouts are as follows:
782
    Modes         Depth (bpp)    Pixels (from bits)
    -----         -----------    ------------------
    0, 3, 4, 6    1              7 6 5 4 3 2 1 0
    1, 5          2              73 62 51 40
    2             4              7531 6420
788
789 Since the ULA reads a half-byte at a time, one might expect it to attempt to
790 produce pixels for every half-byte, as opposed to handling entire bytes.
791 However, the pixel layout is not conducive to producing pixels as soon as a
792 half-byte has been read for a given full-byte location: in 1bpp modes the
793 first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
794 data is spread across the entire byte in different ways.
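
The table above can be expressed as a short decoding routine. This is a sketch
rather than a description of the ULA's internal logic: the function name is
hypothetical, and the first bit listed for each pixel is assumed to be the
most significant bit of the resulting logical colour.

```python
# Bit numbers contributing to each pixel, leftmost pixel first.
PIXEL_BITS = {
    1: [[7], [6], [5], [4], [3], [2], [1], [0]],
    2: [[7, 3], [6, 2], [5, 1], [4, 0]],
    4: [[7, 5, 3, 1], [6, 4, 2, 0]],
}

def pixels_from_byte(value, bpp):
    """Return the logical colours encoded in one byte of screen memory."""
    if bpp not in PIXEL_BITS:
        raise ValueError("depth must be 1, 2 or 4 bits per pixel")
    pixels = []
    for bits in PIXEL_BITS[bpp]:
        colour = 0
        for bit in bits:
            colour = (colour << 1) | ((value >> bit) & 1)
        pixels.append(colour)
    return pixels
```

In the 2bpp and 4bpp cases, every pixel draws on bits from both halves of the
byte, which is why no pixel can be completed from the first half-byte alone.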
795
796 An alternative arrangement might be as follows:
797
    Modes         Depth (bpp)    Pixels (from bits)
    -----         -----------    ------------------
    0, 3, 4, 6    1              7 6 5 4 3 2 1 0
    1, 5          2              76 54 32 10
    2             4              7654 3210
803
Just as the mode layouts were presumably dictated by compatibility with the
BBC Micro, the pixel layouts will have been maintained for similar reasons.
Unfortunately, the existing layout prevents any optimisation of the ULA for
handling half-byte pixel data generally.
808
809 Enhancement: The Missing MODE 4
810 -------------------------------
811
The Electron inherits its screen mode selection from the BBC Micro, where MODE
3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
however, and they are merely implemented by skipping two scanlines in every
ten after the eight required to produce a character line. Thus, such modes
provide a 25-row display.
818
819 In principle, nothing prevents this "text mode" effect being applied to other
820 modes. The 20-column modes are not well-suited to displaying text, which
821 leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
822 2. Although the need for a non-monochrome 40-column text mode is addressed by
823 MODE 7 on the BBC Micro, the Electron lacks such a mode.
824
If the 4-colour, 25-row variant of MODE 1 were to be provided, logically it
would occupy MODE 4 instead of the current MODE 4:
827
    Screen mode  Size (kilobytes)  Colours  Rows  Resolution
    -----------  ----------------  -------  ----  ----------
    0            20                2        32    640x256
    1            20                4        32    320x256
    2            20                16       32    160x256
    3            16                2        25    640x256
    4 (new)      16                4        25    320x256
    4 (old)      10                2        32    320x256
    5            10                4        32    160x256
    6            8                 2        25    320x256
838
839 Thus, for increasing mode numbers, the size of each mode would be the same or
840 less than the preceding mode.
841
842 Enhancement: Display Mode Property Control
843 ------------------------------------------
844
845 It is rather curious that the ULA supports the mode numbers directly in bits 3
846 to 5 of &FE07 since these would presumably need to be decoded in order to set
847 the fundamental properties of the display mode. These properties are as
848 follows:
849
850 * Screen data retrieval rate: number of fetches per pair of 2MHz cycles
851 * Pixel colour depth
852 * Text mode vertical spacing
853
854 From these, the following properties emerge:
855
    Property                          Influences
    --------                          ----------
    Character row size (bytes)        Retrieval rate

    Number of character rows          Text mode setting

    Display size (bytes)              Retrieval rate (character row size)
                                      Text mode setting (number of rows)

    Pixel frequency                   Retrieval rate
    Horizontal resolution (pixels)    Colour depth
867
868 One can imagine a register bitfield arrangement as follows:
869
    Field               Values                  Formula
    -----               ------                  -------
    Pixel depth         00: 1 bit per pixel     log2(depth)
                        01: 2 bits per pixel
                        10: 4 bits per pixel

    Retrieval rate      0: twice                2 - fetches per cycle pair
                        1: once

    Text mode enable    0: disable/off          text mode enabled
                        1: enable/on
881
This arrangement would require four bits, one more than the three bits (3 to
5) currently used to select the mode in &FE07. However, one bit in &FE07 is
seemingly inactive and might possibly be reallocated.
884
885 The resulting combination of properties would permit all of the existing modes
886 plus some additional ones, including the missing MODE 4 mentioned above. With
887 the bitfields above ordered from the most significant bits to the least
888 significant bits providing the low-level "mode" values, the following table
889 can be produced:
890
    Screen mode  Depth  Rate   Text  Size (K)  Colours  Rows  Resolution
    -----------  -----  ----   ----  --------  -------  ----  ----------
     0 (0000)    1      twice  off   20        2        32    640x256  (MODE 0)
     1 (0001)    1      twice  on    16        2        25    640x256  (MODE 3)
     2 (0010)    1      once   off   10        2        32    320x256  (MODE 4)
     3 (0011)    1      once   on    8         2        25    320x256  (MODE 6)
     4 (0100)    2      twice  off   20        4        32    320x256  (MODE 1)
     5 (0101)    2      twice  on    16        4        25    320x256
     6 (0110)    2      once   off   10        4        32    160x256  (MODE 5)
     7 (0111)    2      once   on    8         4        25    160x256
     8 (1000)    4      twice  off   20        16       32    160x256  (MODE 2)
     9 (1001)    4      twice  on    16        16       25    160x256
    10 (1010)    4      once   off   10        16       32    80x256
    11 (1011)    4      once   on    8         16       25    80x256
905
906 The existing modes would be covered in a way that is incompatible with the
907 existing numbering, thus requiring a table in software, but additional text
908 modes would be provided for MODE 1, MODE 5 and MODE 2. An additional two lower
909 resolution modes would also be conceivable within this scheme, requiring the
910 stretching of 16MHz pixels by a factor of eight to yield 80 pixels per
911 scanline. The utility of such modes is questionable and such modes might not
912 be supported.
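
A software table would not be the only way of interpreting such values, since
the properties follow directly from the bitfields. The following sketch
decodes a four-bit value laid out as described above (pixel depth in the two
most significant bits, then retrieval rate, then text mode enable). The
function name and the dictionary keys are hypothetical, and the figure of 40
bytes per scanline per fetch is inferred from the 80 bytes per scanline of the
most demanding modes.

```python
def decode_mode(value):
    """Derive display properties from a 4-bit mode property value."""
    if not 0 <= value <= 15:
        raise ValueError("value must fit in four bits")
    depth_code = (value >> 2) & 3
    if depth_code == 3:
        raise ValueError("pixel depth field value 11 is not defined")
    depth = 1 << depth_code                  # formula: field = log2(depth)
    fetches = 2 - ((value >> 1) & 1)         # formula: field = 2 - fetches
    bytes_per_line = 40 * fetches
    return {
        "depth": depth,
        "colours": 1 << depth,
        "bytes_per_line": bytes_per_line,
        "width": bytes_per_line * 8 // depth,
        "text": bool(value & 1),
    }
```

A value of 0100 thus describes the properties of MODE 1 (2 bits per pixel, 80
bytes per scanline, 320 pixels across), whereas 1011 describes one of the
80-pixel modes of questionable utility.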
913
914 Enhancement: 2MHz RAM Access
915 ----------------------------
916
Given that RAM accesses are organised around 2MHz cycles, but that the CPU,
even when not competing with the ULA, only accesses RAM every other 2MHz cycle
(as if the ULA still needed to access the RAM), one useful enhancement would
be a mechanism to let the CPU take over the ULA cycles outside the ULA's
period of activity, comparable to the way the ULA takes over the CPU cycles in
MODE 0 to 3.
923
924 Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
925
    Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
    On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)

In MODE 4 to 6:

    Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
    On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
933
934 This would improve CPU bandwidth as follows:
935
                    Standard ULA    Enhanced ULA    % Total Bandwidth    Speedup
    MODE 0, 1, 2    9728 bytes      19456 bytes     24% -> 49%           2
    MODE 3          12288 bytes     24576 bytes     31% -> 62%           2
    MODE 4, 5       19968 bytes     29696 bytes     50% -> 74%           1.5
    MODE 6          19968 bytes     32256 bytes     50% -> 81%           1.6
941
(Here, the uncontended total 2MHz bandwidth for one complete frame would be
39936 bytes, being 128 cycles per line over 312 lines.)
944
945 With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
946 because all access opportunities to RAM are doubled. Meanwhile, in the other
947 modes, some CPU accesses occur alongside ULA accesses and thus cannot be
948 doubled, but the CPU bandwidth increase is still significant.
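
The figures in the table can be reproduced with a small calculation. This is a
sketch under stated assumptions rather than a timing model of the ULA: the
names are hypothetical, the ULA is taken to be active for 80 of the 128 cycles
on each line where it fetches screen data, and the MODE 3 and MODE 6 figures
are obtained by taking 192 lines per frame as involving such fetches, a line
count inferred from the table rather than stated in the text.

```python
CYCLES_PER_LINE = 128    # 2MHz cycles per scanline
LINES_PER_FRAME = 312
ACTIVE_CYCLES = 80       # cycles per line in which pixel data is produced

def cpu_bytes_per_frame(fetch_lines, shared, enhanced):
    """Return the number of CPU RAM accesses available in one frame.

    fetch_lines: scanlines on which the ULA fetches screen data
    shared:      True where CPU and ULA alternate (MODE 4 to 6),
                 False where the ULA takes every cycle (MODE 0 to 3)
    enhanced:    True if the CPU may use every cycle when the ULA is idle
    """
    idle_divisor = 1 if enhanced else 2
    active_cpu = ACTIVE_CYCLES // 2 if shared else 0
    border_cpu = (CYCLES_PER_LINE - ACTIVE_CYCLES) // idle_divisor
    idle_line_cpu = CYCLES_PER_LINE // idle_divisor
    return (fetch_lines * (active_cpu + border_cpu)
            + (LINES_PER_FRAME - fetch_lines) * idle_line_cpu)
```

For example, cpu_bytes_per_frame(256, False, False) gives the standard figure
for MODE 0, 1 and 2, and passing True as the final argument gives the enhanced
figure for the same modes.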
949
950 Unfortunately, the mechanism for accessing the RAM is too slow to provide data
951 within the time constraints of 2MHz operation. There is no time remaining in a
952 2MHz cycle for the CPU to receive and process any retrieved data once the
953 necessary signalling has been performed.
954
955 The only way for the CPU to be able to access the RAM quickly enough would be
956 to do away with the double 4-bit access mechanism and to have a single 8-bit
957 channel to the memory. This would require twice as many 1-bit RAM chips or a
958 different kind of RAM chip, but it would also potentially simplify the ULA.
959
960 The section on 8-bit wide RAM access discusses the possibilities around
961 changing the memory architecture, also describing the possibility of ULA
962 accesses achieving two bytes per 2MHz cycle due to the doubling of the memory
963 channel, leaving every other access free for the CPU during the display period
964 in MODE 0 to 3...
965
    Standard display period: UUUUUUUU
    Modified display period: UCUCUCUC

...and consolidating accesses in MODE 4 to 6:

    Standard display period: UCUCUCUC
    Modified display period: UCCCUCCC
973
974 Together with the enhancements for non-display periods, such an "Enhanced+ ULA"
975 would perform as follows:
976
                    Standard ULA    Enhanced+ ULA    % Total Bandwidth    Speedup
    MODE 0, 1, 2    9728 bytes      29696 bytes      24% -> 74%           3.1
    MODE 3          12288 bytes     32256 bytes      31% -> 81%           2.6
    MODE 4, 5       19968 bytes     34816 bytes      50% -> 87%           1.7
    MODE 6          19968 bytes     36096 bytes      50% -> 90%           1.8
982
983 Of course, the principal enhancement would be the wider memory channel, with
984 more buffering in the ULA being its contribution to this arrangement.
985
986 Enhancement: Region Blanking
987 ----------------------------
988
989 The problem of permitting character-oriented blitting in programs whilst
990 scrolling the screen by sub-character amounts could be mitigated by permitting
991 a region of the display to be blank, such as the final lines of the display.
992 Consider the following vertical scrolling by 2 bytes that would cause an
993 initial character row of 6 lines and a final character row of 2 lines:
994
      6 lines - initial, partial character row
    248 lines - 31 complete rows
      2 lines - final, partial character row
998
999 If a routine were in use that wrote 8 line bitmaps to the partial character
1000 row now split in two, it would be advisable to hide one of the regions in
1001 order to prevent content appearing in the wrong place on screen (such as
1002 content meant to appear at the top "leaking" onto the bottom). Blanking 6
1003 lines would be sufficient, as can be seen from the following cases.
1004
1005 Scrolling up by 2 lines:
1006
      6 lines - initial, partial character row
    240 lines - 30 complete rows
      4 lines - part of 1 complete row
    -----------------------------------------------------------------
      4 lines - part of 1 complete row (hidden to maintain 250 lines)
      2 lines - final, partial character row (hidden)
1013
1014 Scrolling down by 2 lines:
1015
      2 lines - initial, partial character row
    248 lines - 31 complete rows
    ----------------------------------------------------------
      6 lines - final, partial character row (hidden)
1020
1021 Thus, in this case, region blanking would impose a 250 line display with the
1022 bottom 6 lines blank.
1023
See the description of the display suspend enhancement for a way of blanking
lines that is more efficient than merely blanking the palette, since it allows
the CPU to perform useful work during the blanking period.
1027
To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 15 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.
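
The single-location arrangement can be sketched as a pair of helper functions.
No such register exists on the standard ULA, so the names and the encoding
here are purely illustrative.

```python
def pack_blanking(top, bottom):
    """Combine top and bottom blanking line counts into one byte."""
    if not (0 <= top <= 15 and 0 <= bottom <= 15):
        raise ValueError("each region is limited to 15 lines by its 4 bits")
    return (top << 4) | bottom

def unpack_blanking(value):
    """Recover the (top, bottom) line counts from a register value."""
    return (value >> 4) & 0x0F, value & 0x0F
```

Hiding the bottom 6 lines, as in the scrolling example above, would involve
writing the value produced by pack_blanking(0, 6).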
1034
1035 Enhancement: Screen Height Adjustment
1036 -------------------------------------
1037
1038 The height of the screen could be configurable in order to reduce screen
1039 memory consumption. This is not quite done in MODE 3 and 6 since the start of
1040 the screen appears to be rounded down to the nearest page, but by reducing the
1041 height by amounts more than a page, savings would be possible. For example:
1042
    Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
    ------------  -----  ------  --------------  ---------------  -------------
    640           1      252     80              320              &3140 -> &3100
    640           1      248     80              640              &3280 -> &3200
    320           1      240     40              640              &5A80 -> &5A00
    320           2      240     80              1280             &3500
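
The entries in this table follow from a simple calculation, sketched below.
The function name is hypothetical, the full height is taken to be 256 lines,
and the standard start addresses (&3000 for the 20K modes, &5800 for the 10K
modes) are supplied by the caller.

```python
def reduced_height_screen(width, depth, height, standard_start):
    """Return (saving, start, page_start) for a screen of reduced height.

    saving:     bytes no longer needed for screen memory
    start:      exact start address after the reduction
    page_start: start address rounded down to a 256-byte page
    """
    bytes_per_line = width * depth // 8
    saving = (256 - height) * bytes_per_line
    start = standard_start + saving
    return saving, start, start & 0xFF00
```

For a 640 pixel wide, 1 bit per pixel screen of 252 lines starting from &3000,
this gives a saving of 320 bytes and a start address of &3140, rounded down to
&3100.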
1049
1050 Screen Mode Selection
1051 ---------------------
1052
Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially being made available for use.
1057
1058 Enhancement: Palette Definition
1059 -------------------------------
1060
1061 Since all memory accesses go via the ULA, an enhanced ULA could employ more
1062 specific addresses than &FE*X to perform enhanced functions. For example, the
1063 palette control is done using &FE*8-F and merely involves selecting predefined
1064 colours, whereas an enhanced ULA could support the redefinition of all 16
1065 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
1066 (colours 8 to 15), where a single byte might provide 8 bits per pixel colour
1067 specifications similar to those used on the Archimedes.
1068
1069 The principal limitation here is actually the hardware: the Electron has only
1070 a single output line for each of the red, green and blue channels, and if
1071 those outputs are strictly digital and can only be set to a "high" and "low"
1072 value, then only the existing eight colours are possible. If a modern ULA were
1073 able to output analogue values (or values at well-defined points between the
1074 high and low values, such as the half-on value supported by the Amstrad CPC
1075 series), it would still need to be assessed whether the circuitry could
1076 successfully handle and propagate such values. Various sources indicate that
1077 only "TTL levels" are supported by the RGB output circuit, and since there are
1078 74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
1079 is likely that the ULA is expected to provide only "high" or "low" values.
1080
1081 Short of adding extra outputs from the ULA (either additional red, green and
1082 blue outputs or a combined intensity output), another approach might involve
1083 some kind of modulation where an output value might be encoded in multiple
1084 pulses at a higher frequency than the pixel frequency. However, this would
1085 demand additional circuitry outside the ULA, and component RGB monitors would
1086 probably not be able to take advantage of this feature; only UHF and composite
1087 video devices (the latter with the composite video colour support enabled on
1088 the Electron's circuit board) would potentially benefit.
1089
1090 Flashing Colours
1091 ----------------
1092
1093 According to the Advanced User Guide, "The cursor and flashing colours are
1094 entirely generated in software: This means that all of the logical to physical
1095 colour map must be changed to cause colours to flash." This appears to suggest
1096 that the palette registers must be updated upon the flash counter - read and
1097 written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
1098 colour pairs to be any combination of colours might be possible, instead of
1099 having colour complements as pairs.
1100
1101 It is conceivable that the interrupt code responsible does the simple thing
1102 and merely inverts the current values for any logical colours (LC) for which
1103 the associated physical colour (as supplied as the second parameter to the VDU
1104 19 call) has the top bit of its four bit value set. These top bits are not
1105 recorded in the palette registers but are presumably recorded separately and
1106 used to build bitmaps as follows:
1107
    LC  2 colour  4 colour  16 colour  4-bit value for inversion
    --  --------  --------  ---------  -------------------------
    0   00010001  00010001  00010001   1, 1, 1
    1   01000100  00100010  00010001   4, 2, 1
    2             01000100  00100010   4, 2
    3             10001000  00100010   8, 2
    4                       00010001   1
    5                       00010001   1
    6                       00100010   2
    7                       00100010   2
    8                       01000100   4
    9                       01000100   4
    10                      10001000   8
    11                      10001000   8
    12                      01000100   4
    13                      01000100   4
    14                      10001000   8
    15                      10001000   8
1126
1127 Inversion value calculation:
1128
    2 colour formula:  1 << (colour * 2)
    4 colour formula:  1 << colour
    16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
1132
1133 For example, where logical colour 0 has been mapped to a physical colour in
1134 the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
1135 the inversion operation. (The lower three bits of the physical colour would be
1136 used to set the underlying colour information affected by the inversion
1137 operation.)
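
The inversion values and bitmaps can be generated as follows. This sketch
describes the conjectured scheme and not the actual system software: the
function names are hypothetical, and the 16 colour expression is written so as
to reproduce the entries in the table above.

```python
def inversion_value(colours, logical):
    """Return the 4-bit inversion value for a logical colour."""
    if not 0 <= logical < colours:
        raise ValueError("logical colour out of range for this mode")
    if colours == 2:
        shift = logical * 2
    elif colours == 4:
        shift = logical
    elif colours == 16:
        # Bit 1 of the colour selects between values 1 and 2 (or 4 and 8);
        # bit 3 selects between the lower and upper pair.
        shift = ((logical & 2) >> 1) + ((logical & 8) >> 2)
    else:
        raise ValueError("colours must be 2, 4 or 16")
    return 1 << shift

def inversion_bitmap(colours, logical):
    """Return the 8-bit bitmap, repeating the value in both half-bytes."""
    value = inversion_value(colours, logical)
    return (value << 4) | value
```

Bitmaps for the logical colours mapped to flashing physical colours would then
be combined and EORed with the stored palette values.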
1138
1139 An operation in the interrupt code would then combine the bitmaps for all
1140 logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
1141 combined for groups of logical colours as follows:
1142
1143 Logical colours
1144 ---------------
1145 0, 2, 8, 10
1146 4, 6, 12, 14
1147 5, 7, 13, 15
1148 1, 3, 9, 11
1149
1150 These combined bitmaps would be EORed with the existing palette register
1151 values in order to perform the value inversion necessary to produce the
1152 flashing effect.
1153
1154 Thus, in the VDU 19 operation, the appropriate inversion value would be
1155 calculated for the logical colour, and this value would then be combined with
1156 other inversion values in a dedicated memory location corresponding to the
1157 colour's group as indicated above. Meanwhile, the palette channel values would
1158 be derived from the lower three bits of the specified physical colour and
1159 combined with other palette data in dedicated memory locations corresponding
1160 to the palette registers.
1161
1162 Interestingly, although flashing colours on the BBC Micro are controlled by
1163 toggling bit 0 of the &FE20 control register location for the Video ULA, the
1164 actual colour inversion is done in hardware.
1165
1166 Enhancement: Palette Definition Lists
1167 -------------------------------------
1168
1169 It can be useful to redefine the palette in order to change the colours
1170 available for a particular region of the screen, particularly in modes where
1171 the choice of colours is constrained, and if an increased colour depth were
1172 available, palette redefinition would be useful to give the illusion of more
1173 than 16 colours in MODE 2. Traditionally, palette redefinition has been done
1174 by using interrupt-driven timers, but a more efficient approach would involve
1175 presenting lists of palette definitions to the ULA so that it can change the
1176 palette at a particular display line.
1177
1178 One might define a palette redefinition list in a region of memory and then
1179 communicate its contents to the ULA by writing the address and length of the
1180 list, along with the display line at which the palette is to be changed, to
1181 ULA registers such that the ULA buffers the list and performs the redefinition
1182 at the appropriate time. Throughput/bandwidth considerations might impose
1183 restrictions on the practical length of such a list, however.
1184
1185 A simple form of palette definition might be useful in text modes. Within the
1186 blank region between lines, the foreground palette could be changed to apply
1187 to the next line. Palette values could be read from a table in RAM, perhaps
1188 preceding the screen data, with 24 2-byte entries providing palette
1189 redefinition support in 2- and 4-colour modes.
1190
1191 Enhancement: Display Synchronisation Interrupts
1192 -----------------------------------------------
1193
1194 When completing each scanline of the display, the ULA could trigger an
1195 interrupt. Since this might impact system performance substantially, the
1196 feature would probably need to be configurable, and it might be sufficient to
1197 have an interrupt only after a certain number of display lines instead.
1198 Permitting the CPU to take action after eight lines would allow palette
1199 switching and other effects to occur on a character row basis.
1200
1201 The ULA provides an interrupt at the end of the display period, presumably so
1202 that software can schedule updates to the screen, avoid flickering or tearing,
1203 and so on. However, some applications might benefit from an interrupt at, or
1204 just before, the start of the display period so that palette modifications or
1205 similar effects could be scheduled.
1206
1207 Enhancement: Palette-Free Modes
1208 -------------------------------
1209
1210 Palette-free modes might be defined where bit values directly correspond to
1211 the red, green and blue channels, although this would mostly make sense only
1212 for modes with depths greater than the standard 4 bits per pixel, and such
1213 modes would require more memory than MODE 2 if they were to have an acceptable
1214 resolution.
1215
1216 Enhancement: Display Suspend
1217 ----------------------------
1218
1219 Especially when writing to the screen memory, it could be beneficial to be
1220 able to suspend the ULA's access to the memory, instead producing blank values
1221 for all screen pixels until a program is ready to reveal the screen. This is
1222 different from palette blanking since with a blank palette, the ULA is still
1223 reading screen memory and translating its contents into pixel values that end
1224 up being blank.
1225
1226 This function is reminiscent of a capability of the ZX81, albeit necessary on
1227 that hardware to reduce the load on the system CPU which was responsible for
1228 producing the video output. By allowing display suspend on the Electron, the
1229 performance benefit would be derived from giving the CPU full access to the
1230 memory bandwidth.
1231
1232 Note that since the CPU is only able to access RAM at 1MHz, there is no
1233 possibility to improve performance beyond that achieved in MODE 4, 5 or 6
1234 normally. However, if faster RAM access were to be made possible (see the
1235 discussion of 8-bit wide RAM access), the CPU could benefit from freeing up
1236 the ULA's access slots entirely.
1237
1238 The region blanking feature mentioned above could be implemented using this
1239 enhancement instead of employing palette blanking for the affected lines of
1240 the display.
1241
1242 Enhancement: Memory Filling
1243 ---------------------------
1244
A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the
existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.
1253
1254 This capability could also support block filling, where the limits of the
1255 filled memory would be defined by the position and size of a screen area,
1256 although this would demand the provision of additional registers in the ULA to
1257 retain the details of such areas and additional logic to control the fill
1258 operation.
1259
1260 Enhancement: Region Filling
1261 ---------------------------
1262
1263 An alternative to memory writing might involve indicating regions using
1264 additional registers or memory where the ULA fills regions of the screen with
1265 content instead of reading from memory. Unlike hardware sprites which should
1266 realistically provide varied content, region filling could employ single
1267 colours or patterns, and one advantage of doing so would be that the ULA need
1268 not access memory at all within a particular region.
1269
1270 Regions would be defined on a row-by-row basis. Instead of reading memory and
1271 blitting a direct representation to the screen, the ULA would read region
1272 definitions containing a start column, region width and colour details. There
1273 might be a certain number of definitions allowed per row, or the ULA might
1274 just traverse an ordered list of such definitions with each one indicating the
1275 row, start column, region width and colour details.
1276
1277 One could even compress this information further by requiring only the row,
1278 start column and colour details with each subsequent definition terminating
1279 the effect of the previous one. However, one would also need to consider the
1280 convenience of preparing such definitions and whether efficient access to
1281 definitions for a particular row might be desirable. It might also be
1282 desirable to avoid having to prepare definitions for "empty" areas of the
1283 screen, effectively making the definition of the screen contents employ
1284 run-length encoding and employ only colour plus length information.
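
The run-length form of region definitions can be illustrated by expanding the
definitions for a single row into pixel colours. This is only a sketch of the
data representation: the function name, the (colour, length) tuple format and
the background parameter are all hypothetical.

```python
def expand_row(runs, width, background=0):
    """Expand (colour, length) runs into a list of pixel colours.

    Any part of the row not covered by the runs is given the background
    colour, so that "empty" areas at the end need no definition.
    """
    pixels = []
    for colour, length in runs:
        if length < 0:
            raise ValueError("run lengths must not be negative")
        pixels.extend([colour] * length)
    if len(pixels) > width:
        raise ValueError("runs exceed the width of the row")
    pixels.extend([background] * (width - len(pixels)))
    return pixels
```

A row containing a single filled span needs only two or three runs, whereas a
row of alternating single pixels needs one run per pixel, which illustrates
why some images would be unsuitable for this representation.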
1285
1286 One application of region filling is that of simple 2D and 3D shape rendering.
1287 Although it is entirely possible to plot such shapes to the screen and have
1288 the ULA blit the memory contents to the screen, such operations consume
1289 bandwidth both in the initial plotting and in the final transfer to the
1290 screen. Region filling would reduce such bandwidth usage substantially.
1291
1292 This way of representing screen images would make certain kinds of images
1293 unfeasible to represent - consider alternating single pixel values which could
1294 easily occur in some character bitmaps - even if an internal queue of regions
1295 were to be supported such that the ULA could read ahead and buffer such
1296 "bandwidth intensive" areas. Thus, the ULA might be better served providing
1297 this feature for certain areas of the display only as some kind of special
1298 graphics window.
1299
1300 Enhancement: Hardware Sprites
1301 -----------------------------
1302
An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
1305 available for allocation. To keep the facility simple, hardware sprites would
1306 have a standard byte width and height.
1307
1308 The specification of sprites could involve the reservation of 16 locations
1309 (for example, &FE20-F) specifying a fixed number of eight sprites, with each
1310 location pair referring to the sprite data. By limiting the ULA to dealing
1311 with a fixed number of sprites, the work required inside the ULA would be
1312 reduced since it would avoid having to deal with arbitrary numbers of sprites.
1313
1314 The principal limitation on providing hardware sprites is that of having to
1315 obtain sprite data, given that the ULA is usually required to retrieve screen
1316 data, and given the lack of memory bandwidth available to retrieve sprite data
1317 (particularly from multiple sprites supposedly at the same position) and
1318 screen data simultaneously. Although the ULA could potentially read sprite
1319 data and screen data in alternate memory accesses in screen modes where the
1320 bandwidth is not already fully utilised, this would result in a degradation of
1321 performance.
1322
1323 Enhancement: Additional Screen Mode Configurations
1324 --------------------------------------------------
1325
1326 Alternative screen mode configurations could be supported. The ULA has to
1327 produce 640 pixel values across the screen, with pixel doubling or quadrupling
1328 employed to fill the screen width:
1329
Screen width  Columns  Scaling  Depth  Bytes
------------  -------  -------  -----  ------
640           80       x1       1      80
320           40       x2       1, 2   40, 80
160           20       x4       2, 4   40, 80
1335
1336 It must also use at most 80 byte-sized memory accesses to provide the
1337 information for the display. Given that characters must occupy an 8x8 pixel
1338 array, if a configuration featuring anything other than 20, 40 or 80 character
1339 columns is to be supported, compromises must be made such as the introduction
1340 of blank pixels either between characters (such as occurs between rows in MODE
1341 3 and 6) or at the end of a scanline (such as occurs at the end of the frame
1342 in MODE 3 and 6). Consider the following configuration:
1343
Screen width  Columns  Scaling  Depth  Bytes   Blank
------------  -------  -------  -----  ------  -----
208           26       x3       1, 2   26, 52  16
1347
1348 Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
1349 colours could be provided, with 16 blank pixel values (out of a total of 640)
1350 generated either at the start or end (or split between the start and end) of
1351 each scanline.
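
The arithmetic behind these configurations can be checked with a short Python
sketch. The function names are illustrative only; the figures of 640 pixel
values per scanline and 8 pixels per character column are taken from the
discussion above.

```python
TOTAL_PIXELS = 640   # pixel values the ULA must produce per scanline
CHAR_WIDTH = 8       # pixels per character column

def mode_geometry(columns, scaling):
    """Return (screen width, blank pixels) for a column count and pixel multiplier."""
    width = columns * CHAR_WIDTH
    used = width * scaling
    if used > TOTAL_PIXELS:
        raise ValueError("configuration needs more than 640 pixel values")
    return width, TOTAL_PIXELS - used

def bytes_per_line(columns, depth):
    """Bytes fetched per scanline at 'depth' bits per pixel."""
    return columns * depth

for columns, scaling in [(80, 1), (40, 2), (20, 4), (26, 3)]:
    print(columns, scaling, mode_geometry(columns, scaling))
```

Only the 26 column configuration leaves pixel values unused: 26 columns of 8
pixels tripled gives 624 pixel values, leaving 16 blank.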
1352
1353 Enhancement: Character Attributes
1354 ---------------------------------
1355
1356 The BBC Micro MODE 7 employs something resembling character attributes to
1357 support teletext displays, but depends on circuitry providing a character
1358 generator. The ZX Spectrum, on the other hand, provides character attributes
1359 as a means of colouring bitmapped graphics. Although such a feature is very
1360 limiting as the sole means of providing multicolour graphics, in situations
1361 where the choice is between low resolution multicolour graphics or high
1362 resolution monochrome graphics, character attributes provide a potentially
1363 useful compromise.
1364
1365 For each byte read, the ULA must deliver 8 pixel values (out of a total of
1366 640) to the video output, doing so by either emptying its pixel buffer on a
1367 pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:
1369
Cycle:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Reads:   B                       B
Pixels:  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
1373
1374 And for a screen mode having 320 pixels in width:
1375
Cycle:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Reads:   B
Pixels:  0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7
1379
However, in modes where fewer than 80 bytes are required to generate the pixel
1381 values, an enhanced ULA might be able to read additional bytes between those
1382 providing the bitmapped graphics data:
1383
Cycle:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Reads:   B                       A
Pixels:  0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7
1387
1388 These additional bytes could provide colour information for the bitmapped data
1389 in the following character column (of 8 pixels). Since it would be desirable
1390 to apply attribute data to the first column, the initial 8 cycles might be
1391 configured to not produce pixel values.
1392
1393 For an entire character, attribute data need only be read for the first row of
1394 pixels for a character. The subsequent rows would have attribute information
1395 applied to them, although this would require the attribute data to be stored
1396 in some kind of buffer. Thus, the following access pattern would be observed:
1397
1398 Reads: A B _ B _ B _ B _ B _ B _ B _ B ...
1399
1400 In modes 3 and 6, the blank display lines could be used to retrieve attribute
1401 data:
1402
1403 Reads (blank): A _ A _ A _ A _ A _ A _ A _ A _ ...
1404 Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
1405 Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
1406 ...
1407
1408 See below for a discussion of using this for character data as well.
1409
1410 A whole byte used for colour information for a whole character would result in
1411 a choice of 256 colours, and this might be somewhat excessive. By only reading
1412 attribute bytes at every other opportunity, a choice of 16 colours could be
1413 applied individually to two characters.
1414
Cycle:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Reads:  B               A               B               -
Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
1418
1419 Further reductions in attribute data access, offering 4 colours for every
1420 character in a four character block, for example, might also be worth
1421 considering.
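
One way of sharing an attribute byte between characters is sketched below in
Python. The function name and the choice of giving the leftmost character the
most significant bits are assumptions; the text above does not fix a bit
layout.

```python
def split_attribute(byte, characters):
    """Split one attribute byte into equal fields, one per character.

    Assumption: the leftmost character takes the most significant bits.
    Each field of n bits selects one of 2**n colours.
    """
    if characters not in (1, 2, 4, 8):
        raise ValueError("characters per attribute byte must be 1, 2, 4 or 8")
    bits = 8 // characters
    mask = (1 << bits) - 1
    return [(byte >> (bits * (characters - 1 - i))) & mask
            for i in range(characters)]

# One byte shared by two characters gives each a 4-bit (16 colour) value.
left, right = split_attribute(0xA5, 2)
```

With two characters per byte each field is 4 bits wide (16 colours), and with
four characters per byte each field is 2 bits wide (4 colours), matching the
reductions described above.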
1422
1423 Consider the following configurations for screen modes with a colour depth of
1424 1 bit per pixel for bitmap information:
1425
Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  -------  ---------  ---------  -------  ------------
320           40       x2       40         40         256      &5300
320           40       x2       40         20         16       &5580 -> &5500
320           40       x2       40         10         4        &56C0 -> &5600
208           26       x3       26         26         256      &62C0 -> &6200
208           26       x3       26         13         16       &6460 -> &6400
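
The screen start addresses in this table appear consistent with 256 scanlines
of bitmap data plus one set of attribute bytes per character row of 8
scanlines, placed immediately below &8000 and then rounded down to a page
boundary. The following Python sketch reproduces them under those
assumptions; the names are illustrative.

```python
RAM_TOP = 0x8000      # first address above the 32K of RAM
SCANLINES = 256       # display lines of bitmap data
CHAR_ROWS = 32        # character rows (256 scanlines / 8)

def screen_start(bitmap_bytes, attribute_bytes):
    """Return (exact start, start rounded down to a page boundary)."""
    size = bitmap_bytes * SCANLINES + attribute_bytes * CHAR_ROWS
    start = RAM_TOP - size
    return start, start & 0xFF00

for b, a in [(40, 40), (40, 20), (40, 10), (26, 26), (26, 13)]:
    exact, aligned = screen_start(b, a)
    print("B=%d A=%d start=&%04X aligned=&%04X" % (b, a, exact, aligned))
```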
1433
1434 Enhancement: Text-Only Modes using Character and Attribute Data
1435 ---------------------------------------------------------------
1436
1437 In modes 3 and 6, the blank display lines could be used to retrieve character
1438 and attribute data instead of trying to insert it between bitmap data accesses,
1439 but this data would then need to be retained:
1440
1441 Reads: A C A C A C A C A C A C A C A C ...
1442 Reads: B _ B _ B _ B _ B _ B _ B _ B _ ...
1443
1444 Only attribute (A) and character (C) reads would require screen memory
1445 storage. Bitmap data reads (B) would involve either accesses to memory to
1446 obtain character definition details or could, at the cost of special storage
1447 in the ULA, involve accesses within the ULA that would then free up the RAM.
1448 However, the CPU would not benefit from having any extra access slots due to
1449 the limitations of the RAM access mechanism.
1450
1451 A scheme without caching might be possible. The same line of memory addresses
1452 might be visited over and over again for eight display lines, with an index
1453 into the bitmap data being incremented from zero to seven. The access patterns
1454 would look like this:
1455
1456 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 0)
1457 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 1)
1458 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 2)
1459 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 3)
1460 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 4)
1461 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 5)
1462 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 6)
1463 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 7)
1464
1465 The bandwidth requirements would be the sum of the accesses to read the
1466 character values (repeatedly) and those to read the bitmap data to reproduce
1467 the characters on screen.
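
The access pattern without caching can be sketched in Python as follows. The
font table, holding 8 bytes per character definition, and the function name
are hypothetical; the sketch only counts reads and does not model timing.

```python
def render_character_row(codes, font):
    """Return eight scanlines of bitmap bytes for one row of character codes.

    'font' is a hypothetical table holding 8 bytes per character definition.
    Each scanline revisits the same character codes (C reads) and fetches
    one byte of each definition (B reads) selected by the scanline index.
    """
    scanlines = []
    reads = 0
    for index in range(8):
        line = []
        for code in codes:
            reads += 1                           # C: read the character code
            line.append(font[code * 8 + index])
            reads += 1                           # B: read the bitmap byte
        scanlines.append(line)
    return scanlines, reads

# A toy two-character font: character 0 is blank, character 1 is a solid block.
font = [0x00] * 8 + [0xFF] * 8
lines, reads = render_character_row([1, 0, 1], font)
```

For a 40-column row this amounts to 80 reads per scanline, which is the limit
of 80 byte-sized accesses noted earlier for the display.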
1468
1469 Enhancement: MODE 7 Emulation using Character Attributes
1470 --------------------------------------------------------
1471
1472 If the scheme of applying attributes to character regions were employed to
1473 emulate MODE 7, in conjunction with the MODE 6 display technique, the
1474 following configuration would be required:
1475
Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  ----  ---------  ---------  -------  ------------
320           40       25    40         20         16       &5ECC -> &5E00
320           40       25    40         10         4        &5FC6 -> &5F00
1480
1481 Although this requires much more memory than MODE 7 (8500 bytes versus MODE
1482 7's 1000 bytes), it does not need much more memory than MODE 6, and it would
1483 at least make a limited 40-column multicolour mode available as a substitute
1484 for MODE 7.
1485
1486 Using the text-only enhancement with caching of data or with repeated reads of
1487 the same character data line for eight display lines, the storage requirements
1488 would be diminished substantially:
1489
Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
------------  -------  ----  ---------  ---------  -------  ------------
320           40       25    40         20         16       &7A24 -> &7A00
320           40       25    40         10         4        &7B1E -> &7B00
320           40       25    40         5          2        &7B9B -> &7B00
320           40       25    40         0          (2)      &7C18 -> &7C00
640           80       25    80         40         16       &7448 -> &7400
640           80       25    80         20         4        &763C -> &7600
640           80       25    80         10         2        &7736 -> &7700
640           80       25    80         0          (2)      &7830 -> &7800
1500
1501 Note that the colours describe the locally defined attributes for each
1502 character. When no attribute information is provided, the colours are defined
1503 globally.
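
The storage figures in this section can be reproduced with the following
Python sketch, which assumes that the display data is placed immediately
below &8000. The function names are illustrative only.

```python
RAM_TOP = 0x8000  # first address above the 32K of RAM

def bitmap_mode_start(columns, rows, attribute_bytes):
    """MODE 6 style: 8 scanlines of bitmap data plus attributes per character row."""
    size = rows * (columns * 8 + attribute_bytes)
    return RAM_TOP - size

def text_mode_start(columns, rows, attribute_bytes):
    """Text-only style: one character code per column plus attributes per row."""
    size = rows * (columns + attribute_bytes)
    return RAM_TOP - size

def page_aligned(address):
    """Round an address down to the start of its 256-byte page."""
    return address & 0xFF00

print("&%04X" % bitmap_mode_start(40, 25, 20))   # bitmap data, 16 colours
print("&%04X" % text_mode_start(40, 25, 10))     # character data, 4 colours
```

With 40 columns, 25 rows and 20 attribute bytes per row, the bitmap-based
scheme needs 25 * (320 + 20) = 8500 bytes, giving a start address of &5ECC,
whilst the text-only scheme with 10 attribute bytes per row needs only
25 * (40 + 10) = 1250 bytes, giving &7B1E.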
1504
1505 Enhancement: Character Generator Support and Vertical Scaling
1506 -------------------------------------------------------------
1507
1508 When generating a picture, the ULA traverses screen memory, obtaining 40 or 80
1509 bytes of pixel data for each scanline. It then proceeds to the next row of
1510 pixel data for each successive scanline, with the exception of the text modes
1511 where scanlines may be blank (for which the row address does not advance).
1512 This arrangement provides a conventional bitmapped graphics display.
1513
1514 However, the ULA could instead facilitate the use of character generators. The
1515 principles involved can be demonstrated by the Jafa Mode 7 Mark 2 Display Unit
1516 expansion for the Electron which feeds the pixel data from a MODE 4 screen to
1517 a SAA5050 character generator to create a MODE 7 display. The solution adopted
1518 involves the replication of 40 bytes of character data across as many pixel
1519 rows as is necessary for the character generator to receive the appropriate
1520 character data for all scanlines in any given character row. If only a single
1521 40-byte row of character data were to be present for the first scanline of a
1522 character row, the character generator would only produce the first scanline
1523 (or the uppermost pixels of the characters) correctly, with the rest of the
1524 character shapes being ill-defined.
1525
1526 Here, the ULA could facilitate the use of memory-efficient character mode
1527 representations (such as MODE 7) by holding the row address for a number of
1528 scanlines, thus providing the same row of screen data for those scanlines,
1529 then advancing to the next row. Visualised in terms of pixel data, it would be
1530 like providing a display with a very low vertical resolution. Indeed, being
1531 able to reduce the vertical resolution of a display mode by a factor of eight
1532 or ten would be equivalent to the above character generation technique in
1533 terms of the ULA's screen reading activities.
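
The effect of holding the row address can be sketched in Python as follows.
The linear layout of 40 bytes per character row, the start address of &7C00
(as used by MODE 7 on the BBC Micro) and the function name are assumptions
chosen for illustration; they do not describe the standard ULA's address
generation.

```python
def row_addresses(start, bytes_per_row, rows, hold):
    """Yield the row start address used for each successive scanline.

    'hold' is the number of scanlines for which the row address is kept
    before advancing (1 gives a conventional bitmapped display).
    """
    for row in range(rows):
        address = start + row * bytes_per_row
        for _ in range(hold):
            yield address

# 25 character rows, each presented to the character generator for 10 scanlines.
addresses = list(row_addresses(0x7C00, 40, 25, 10))
```

The same 40 bytes are thus read for ten consecutive scanlines before the
address advances, covering 250 scanlines with only 1000 bytes of screen data.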
1534
1535 By combining this vertical scaling or scanline replication with a circuit
1536 switchable between bitmapped graphics output and character graphics output,
1537 MODE 7 support could be made available, potentially as a hardware option
1538 separate from the ULA.
1539
1540 Enhancement: Compressed Character Data
1541 --------------------------------------
1542
1543 Another observation about text-only modes is that they only need to store a
1544 restricted set of bitmapped data values. Encoding this set of values in a
1545 smaller unit of storage than a byte could possibly help to reduce the amount
1546 of storage and bandwidth required to reproduce the characters on the display.
1547
1548 Enhancement: High Resolution Graphics and Larger Colour Depths
1549 --------------------------------------------------------------
1550
1551 Screen modes with higher resolutions and larger colour depths might be
1552 possible, but this would in most cases involve the allocation of more screen
1553 memory, and the ULA would probably then be obliged to page in such memory for
1554 the CPU to be able to sensibly access it all. Higher resolutions would also
1555 involve a faster pixel clock.
1556
1557 However, we may consider a doubled colour depth and the need for higher
1558 bandwidth transfers by a ULA having an 8-bit data bus to access the RAM,
1559 utilising two "page mode" transfers per 2MHz cycle. If such transfers were to
1560 access consecutive bytes in the same memory region (for example, bytes &3000
1561 and &3001) this would require a change to the arrangement of screen memory,
1562 also incurring changes to the memory map for larger modes:
1563
(&3000 &3001)  (&3010 &3011)  ...
(&3002 &3003)  (&3012 &3013)
...            ...
(&300E &300F)  (&301E &301F)
1568
If such transfers were to access two adjacent columns of bytes (for example,
bytes &3000 and &3008), this would still require a change in the step size
across the screen memory and would also incur memory map changes for larger
modes, and the method for programs to update the screen would be more
complicated:
1573
(&3000 &3008)  (&3010 &3018)  ...
(&3001 &3009)  (&3011 &3019)
...            ...
(&3007 &300F)  (&3017 &301F)
1578
1579 However, such transfers could instead map the device address bit that is
1580 toggled between transfers to the most significant system memory address bit.
1581 Thus, bits in adjacent locations within each RAM device would actually reside
1582 in different memory regions:
1583
(&3000 &B000)  (&3008 &B008)  ...
(&3001 &B001)  (&3009 &B009)
...            ...
(&3007 &B007)  (&300F &B00F)
1588
Since &B000 is &3000 combined with &8000, the latter asserting the uppermost
address bit, address &B000 can be considered as &3000 in an upper memory
bank.
1592
1593 Other mechanisms might be employed to allow programs to access the uppermost
1594 bank, but the ULA would be able to access it trivially and unconditionally.
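
The three pairing schemes described in this section can be compared with a
small Python sketch. The scheme names and the function are hypothetical and
serve only to show which two system addresses would be fetched together.

```python
def paired_addresses(address, scheme):
    """Return the two system addresses fetched in one paired transfer.

    scheme "consecutive": address and address + 1
    scheme "columns":     address and address + 8
    scheme "bank":        address and the same address with bit 15 set
    """
    if scheme == "consecutive":
        return address, address + 1
    if scheme == "columns":
        return address, address + 8
    if scheme == "bank":
        return address, address | 0x8000
    raise ValueError("unknown scheme: %r" % scheme)

for scheme in ("consecutive", "columns", "bank"):
    first, second = paired_addresses(0x3000, scheme)
    print("%-12s &%04X &%04X" % (scheme, first, second))
```

Only the "bank" scheme leaves the layout of the lower 32K unchanged, since
the second byte of each pair lies at the same offset in the upper bank.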
1595
1596 Enhancement: Genlock Support
1597 ----------------------------
1598
1599 The ULA generates a video signal in conjunction with circuitry producing the
1600 output features necessary for the correct display of the screen image.
1601 However, it appears that the ULA drives the video synchronisation mechanism
1602 instead of reacting to an existing signal. Genlock support might be possible
1603 if the ULA were made to be responsive to such external signals, resetting its
1604 address generators upon receiving synchronisation events.
1605
1606 Enhancement: Improved Sound
1607 ---------------------------
1608
1609 The standard ULA reserves &FE*6 for sound generation and cassette input/output
1610 (with bits 1 and 2 of &FE*7 being used to select either sound generation or
1611 cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro employs the system VIA at &FE40-&FE4F for
sound control, and an enhanced ULA could adopt this interface.
1614
1615 The BBC Micro uses the SN76489 chip to produce sound, and the entire
1616 functionality of this chip could be emulated for enhanced sound, with a subset
1617 of the functionality exposed via the &FE*6 interface.
1618
1619 See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
1620 See: http://www.smspower.org/Development/SN76489
1621
1622 Enhancement: Waveform Upload
1623 ----------------------------
1624
1625 As with a hardware sprite function, waveforms could be uploaded or referenced
1626 using locations as registers referencing memory regions.
1627
1628 Enhancement: Sound Input/Output
1629 -------------------------------
1630
1631 Since the ULA already controls audio input/output for cassette-based data, it
1632 would have been interesting to entertain the idea of sampling and output of
1633 sounds through the cassette interface. However, a significant amount of
1634 circuitry is employed to process the input signal for use by the ULA and to
1635 process the output signal for recording.
1636
1637 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11
1638
1639 Enhancement: BBC ULA Compatibility
1640 ----------------------------------
1641
1642 Although some new ULA functions could be defined in a way that is also
1643 compatible with the BBC Micro, the BBC ULA is itself incompatible with the
1644 Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
1645 map, but controls various functions specific to the 6845 video controller;
1646 &FE08-F is reserved for the serial controller. It therefore becomes possible
1647 to disregard compatibility where compatibility is already disregarded for a
1648 particular area of functionality.
1649
&FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE08-F on the
Electron) and other system-specific functions. Since the location usage is
generally incompatible, this region could be reused for other purposes.
1654
1655 Enhancement: Increased RAM, ULA and CPU Performance
1656 ---------------------------------------------------
1657
1658 More modern implementations of the hardware might feature faster RAM coupled
1659 with an increased ULA clock frequency in order to increase the bandwidth
1660 available to the ULA and to the CPU in situations where the ULA is not needed
1661 to perform work. A ULA employing a 32MHz clock would be able to complete the
1662 retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
1663 to access the RAM for the following 250ns even in display modes requiring the
1664 retrieval of a byte for the display every 500ns. The CPU could, subject to
1665 timing issues, run at 2MHz even in MODE 0, 1 and 2.
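
The timing figures can be checked with the following Python sketch. It
assumes that a byte fetch takes a fixed 8 ULA clock cycles, this being the
number of 16MHz cycles in one 500ns (2MHz) slot; whether a faster ULA and RAM
could really keep to that cycle count is not established here.

```python
SLOT_NS = 500.0  # duration of one 2MHz cycle in nanoseconds

def access_time_ns(clock_hz, cycles_per_byte=8):
    """Time to fetch one byte, assuming a fixed number of ULA clock cycles.

    The default of 8 cycles is an assumption: it is the number of 16MHz
    cycles in the 500ns slot within which the ULA transfers 8 bits.
    """
    return cycles_per_byte * 1e9 / clock_hz

for clock in (16_000_000, 32_000_000):
    fetch = access_time_ns(clock)
    print("%dMHz: fetch %.0fns, remainder %.0fns"
          % (clock // 1_000_000, fetch, SLOT_NS - fetch))
```

At 16MHz the fetch occupies the whole slot, whereas at 32MHz it leaves 250ns
of each slot for the CPU.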
1666
1667 A scheme such as that described above would have a similar effect to the
1668 scheme employed in the BBC Micro, although the latter made use of RAM with a
1669 wider bandwidth in order to complete memory transfers within 250ns and thus
1670 permit the CPU to run continuously at 2MHz.
1671
1672 Higher bandwidth could potentially be used to implement exotic features such
1673 as RAM-resident hardware sprites or indeed any feature demanding RAM access
1674 concurrent with the production of the display image.
1675
1676 Enhancement: Multiple CPU Stacks and Zero Pages
1677 -----------------------------------------------
1678
1679 The 6502 maintains a stack for subroutine calls and register storage in page
1680 &01. Although the stack register can be manipulated using the TSX and TXS
1681 instructions, thereby permitting the maintenance of multiple stack regions and
1682 thus the potential coexistence of multiple programs each using a separate
1683 region, only programs that make little use of the stack (perhaps avoiding
1684 deeply-nested subroutine invocations and significant register storage) would
1685 be able to coexist without overwriting each other's stacks.
1686
1687 One way that this issue could be alleviated would involve the provision of a
1688 facility to redirect accesses to page &01 to other areas of memory. The ULA
1689 would provide a register that defines a physical page for the use of the CPU's
1690 "logical" page &01, and upon any access to page &01 by the CPU, the ULA would
1691 change the asserted address lines to redirect the access to the appropriate
1692 physical region.
1693
1694 By providing an 8-bit register, mapping to the most significant byte (MSB) of
1695 a 16-bit address, the ULA could then replace any MSB equal to &01 with the
1696 register value before the access is made. Where multiple programs coexist,
1697 upon switching programs, the register would be updated to point the ULA to the
1698 appropriate stack location, thus providing a simple memory management unit
1699 (MMU) capability.
1700
1701 In a similar fashion, zero page accesses could also be redirected so that code
1702 could run from sideways RAM and have zero page operations redirected to "upper
1703 memory" - for example, to page &BE (with stack accesses redirected to page
1704 &BF, perhaps) - thereby permitting most CPU operations to occur without
1705 inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
1706 CPU as it contends with the ULA for memory access.
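
The redirection of stack and zero page accesses can be sketched in Python as
follows. The class and its two attributes stand for hypothetical 8-bit ULA
registers; the page numbers &BE and &BF are those suggested above.

```python
class PageRemapper:
    """Sketch of redirecting CPU accesses to pages &00 and &01.

    'zero_page' and 'stack_page' stand for hypothetical 8-bit registers
    holding the physical page substituted for each logical page.
    """

    def __init__(self, zero_page=0x00, stack_page=0x01):
        self.zero_page = zero_page
        self.stack_page = stack_page

    def translate(self, address):
        """Return the physical address for a 16-bit CPU address."""
        msb, lsb = address >> 8, address & 0xFF
        if msb == 0x00:
            msb = self.zero_page
        elif msb == 0x01:
            msb = self.stack_page
        return (msb << 8) | lsb

# Redirect zero page to &BE00 and the stack to &BF00.
mmu = PageRemapper(zero_page=0xBE, stack_page=0xBF)
```

Switching between coexisting programs would then involve little more than
writing new page numbers to the two registers; all other addresses pass
through unchanged.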
1707
1708 Such facilities could also be provided by a separate circuit between the CPU
1709 and ULA in a fashion similar to that employed by a "turbo" board, but unlike
1710 such boards, no additional RAM would be provided: all memory accesses would
1711 occur as normal through the ULA, albeit redirected when configured
1712 appropriately.
1713
1714 ULA Pin Functions
1715 -----------------
1716
1717 The functions of the ULA pins are described in the Electron Service Manual. Of
1718 interest to video processing are the following:
1719
1720 CSYNC (low during horizontal or vertical synchronisation periods, high
1721 otherwise)
1722
1723 HS (low during horizontal synchronisation periods, high otherwise)
1724
1725 RED, GREEN, BLUE (pixel colour outputs)
1726
1727 CLOCK IN (a 16MHz clock input, 4V peak to peak)
1728
1729 PHI OUT (a 1MHz, 2MHz and stopped clock signal for the CPU)
1730
1731 More general memory access pins:
1732
1733 RAM0...RAM3 (data lines to/from the RAM)
1734
1735 RA0...RA7 (address lines for sending both row and column addresses to the RAM)
1736
1737 RAS (row address strobe setting the row address on a negative edge - see the
1738 timing notes)
1739
1740 CAS (column address strobe setting the column address on a negative edge -
1741 see the timing notes)
1742
1743 WE (sets write enable with logic 0, read with logic 1)
1744
1745 ROM (select data access from ROM)
1746
1747 CPU-oriented memory access pins:
1748
1749 A0...A15 (CPU address lines)
1750
1751 PD0...PD7 (CPU data lines)
1752
1753 R/W (indicates CPU write with logic 0, CPU read with logic 1)
1754
1755 Interrupt-related pins:
1756
1757 NMI (CPU request for uninterrupted 1MHz access to memory)
1758
1759 IRQ (signal event to CPU)
1760
1761 POR (power-on reset, resetting the ULA on a positive edge and asserting the
1762 CPU's RST pin)
1763
1764 RST (master reset for the CPU signalled on power-up and by the Break key)
1765
1766 Keyboard-related pins:
1767
1768 KBD0...KBD3 (keyboard inputs)
1769
1770 CAPS LOCK (control status LED)
1771
1772 Sound-related pins:
1773
1774 SOUND O/P (sound output using internal oscillator)
1775
1776 Cassette-related pins:
1777
CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)
1779
1780 CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)
1781
1782 CAS RC (detect high tone)
1783
1784 CAS MO (motor relay output)
1785
1786 ÷13 IN (~1200 baud clock input)
1787
1788 ULA Socket
1789 ----------
1790
1791 The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.
1792
1793 References
1794 ----------
1795
1796 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm
1797
1798 About this Document
1799 -------------------
1800
1801 The most recent version of this document and accompanying distribution should
1802 be available from the following location:
1803
1804 http://hgweb.boddie.org.uk/ULA
1805
1806 Copyright and licence information can be found in the docs directory of this
1807 distribution - see docs/COPYING.txt for more information.