1 The Acorn Electron ULA
2 ======================
3
4 Principal Design and Feature Constraints
5 ----------------------------------------
6
7 The features of the ULA are limited in sophistication by the amount of time
8 and resources that can be allocated to each activity supporting the
9 fundamental features and obligations of the unit. Maintaining a screen display
10 based on the contents of RAM itself requires the ULA to have exclusive access
11 to various hardware resources for a significant period of time.
12
13 Whilst other elements of the ULA can in principle run in parallel with the
14 display refresh activity, they cannot also access the RAM at the same time.
15 Consequently, other features that might use the RAM must accept a reduced
16 allocation of that resource in comparison to a hypothetical architecture where
17 concurrent RAM access is possible at all times.
18
19 Thus, the principal constraint for many features is bandwidth. The duration of
20 access to hardware resources is one aspect of this; the rate at which such
21 resources can be accessed is another. For example, the RAM is not fast enough
22 to support access more frequently than one byte per 2MHz cycle, and for screen
23 modes involving 80 bytes of screen data per scanline, there are no free cycles
24 for anything other than the production of pixel output during the active
25 scanline periods.
26
27 Another constraint is imposed by the method of RAM access provided by the ULA.
28 The ULA is able to access RAM by fetching 4 bits at a time and thus managing
29 to transfer 8 bits within a single 2MHz cycle, this being sufficient to
30 provide display data for the most demanding screen modes. However, this
31 mechanism's timing requirements are beyond the capabilities of the CPU when
32 running at 2MHz.
33
34 Consequently, the CPU will only ever be able to access RAM via the ULA at
35 1MHz, even when the ULA is not accessing the RAM. Fortunately, when needing to
36 refresh the display, the ULA is still able to make use of the idle part of
37 each 1MHz cycle (or, rather, the idle 2MHz cycle unused by the CPU) to itself
38 access the RAM at a rate of 1 byte per 1MHz cycle (or 1 byte every other 2MHz
39 cycle), thus supporting the less demanding screen modes.
40
41 Timing
42 ------
43
44 According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
45 of which are used to generate pixel data. At 50Hz, this means that 128 cycles
46 are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
47 312 ~= 128 cycles). This is consistent with the observation that each scanline
48 requires at most 80 bytes of data, and that the ULA is apparently busy for 40
49 out of 64 microseconds in each scanline.
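
The arithmetic above can be checked with a short sketch. The figures are those
quoted in the text; the constant and function names are merely illustrative
and do not come from any Acorn documentation.

```python
# Figures quoted in the text; all names are illustrative only.
CLOCK_HZ = 2_000_000      # 2MHz RAM access clock
FIELD_RATE_HZ = 50        # fields per second
LINES_PER_FIELD = 312     # scanlines per field (ignoring the 313-line field)
BYTES_PER_LINE_MAX = 80   # most demanding screen modes


def cycles_per_scanline():
    """Return the whole number of 2MHz cycles available per scanline."""
    cycles_per_field = CLOCK_HZ // FIELD_RATE_HZ       # 40000
    return round(cycles_per_field / LINES_PER_FIELD)   # 128.2 rounds to 128


def busy_microseconds():
    """Return how long the ULA fetches data on an 80-byte scanline."""
    cycle_us = 1_000_000 / CLOCK_HZ                    # 0.5us per 2MHz cycle
    return BYTES_PER_LINE_MAX * cycle_us


def scanline_microseconds():
    """Return the scanline duration implied by 128 cycles at 2MHz."""
    return cycles_per_scanline() * 1_000_000 / CLOCK_HZ
```

With these figures, 80 of the 128 cycles (40 of the 64 microseconds) are
occupied by screen data fetches in the most demanding modes.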
50
(Since the ULA is seeking to provide an image for an interlaced 625-line
display, there are in fact two "fields" involved, one providing 312 scanlines
and one providing 313 scanlines. See below for a description of the video
system.)
55
56 Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
57 each providing two bits of each byte) using two cycles within the 500ns period
58 of the 2MHz clock to complete each access operation. Since the CPU and ULA
59 have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
60 effectively run at 1MHz (since every other 500ns period involves the ULA
61 accessing RAM) during transfers of screen data.
62
The CPU is driven by an external clock (IC8) whose 16MHz frequency is divided
by the ULA (IC1) depending on the screen mode in use. Each 16MHz cycle lasts
62.5ns. To access the memory, the following patterns corresponding to 16MHz
cycles are required:
67
68 Time (ns): 0-------------- 500------------- ...
69 2 MHz cycle: 0 1 ...
70 16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
71 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
72 ~RAS: /---\___________/---\___________ ...
73 ~CAS: /-----\___/-\___/-----\___/-\___ ...
74 Address events: A B C A B C ...
75 Data events: ...F ...S ...F ...S ...
76 ~WE: R R ...
77
78 ~RAS ops: 1 0 1 0 ...
79 ~CAS ops: 1 0 1 0 1 0 1 0 ...
80
81 Address ops: a.b. c. a.b. c. ...
82 Data ops: s f s f ...
83
84 PHI OUT: ----\_______/-------\_______/--- ...
85 CPU (ROM): D .....L ....D .....L .... ...
86 RnW: .....R .....R ...
87
88 ~RAS must be high for 100ns, ~CAS must be high for 50ns.
89 ~RAS must be low for 150ns, ~CAS must be low for 90ns.
90 Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.
91
92 Here, "A" and "B" respectively indicate the row and first column addresses
93 being latched into the RAM (on a negative edge for ~RAS and ~CAS
94 respectively), and "C" indicates the second column address being latched into
95 the RAM. Presumably, the first and second half-bytes can be read at "F" and
96 "S" respectively, and the row and column addresses must be made available at
97 "a" and "b" (and "c") respectively at the latest. The TM4164EC4 datasheet
98 suggests that the addresses can be made available as the ~RAS and ~CAS levels
99 are brought low. Data can be read at "f" and "s" for the first and second
100 half-bytes respectively.
101
102 The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
103 address access time of 90ns (maximum), which appears to mean that ~RAS must be
104 held low for at least 150ns and that ~CAS must be held low for at least 90ns
105 before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
106 cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
107 is 1.5 cycles.
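
The conversion from access times to 16MHz cycles can be sketched as follows.
The rounding up to the next half cycle is an assumption made here to reproduce
the 2.5 and 1.5 cycle figures (on the basis that signals may change on either
clock edge); the function names are illustrative.

```python
import math

CYCLE_NS = 62.5  # duration of one 16MHz cycle


def cycles_needed(access_ns):
    """Return an access time expressed in 16MHz cycles."""
    return access_ns / CYCLE_NS


def round_up_to_half_cycle(cycles):
    """Round up to the next half cycle (assumed to be the next clock edge)."""
    return math.ceil(cycles * 2) / 2


row_cycles = cycles_needed(150)      # row address access time
column_cycles = cycles_needed(90)    # column address access time
```

Here, round_up_to_half_cycle(row_cycles) gives the 2.5 cycles from "A" to "F",
and round_up_to_half_cycle(column_cycles) gives the 1.5 cycles from "B" to "F"
and from "C" to "S".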
108
109 Note that the Service Manual refers to the negative edge of RAS and CAS, but
110 the datasheet for the similar TM4164EC4 product shows latching on the negative
111 edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
112 communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
113 "page mode" provides the appropriate behaviour for that particular product.
114
115 The CPU, when accessing the RAM alone, apparently does not make use of the
116 vacated "slot" that the ULA would otherwise use (when interleaving accesses in
117 MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
118 accessing ROM (and potentially sideways RAM). The principal limitation is the
119 amount of time needed between issuing an address and receiving an entire byte
120 from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
121 4 cycles that would be required for 2MHz operation.
122
Write operations expose some uncertainty about the relationship between the
ULA's RAM access schedule and the PHI OUT clock. The Service Manual shows PHI
IN (which should be the ULA's PHI OUT signal) as being synchronised with ~RAS.
Since the CPU makes its address available potentially as late as 140ns after
its PHI2 clock goes low (this clock being broadly similar to PHI OUT), it
would make no sense to expect the ULA to be able to perform a memory access
immediately. What seems more likely is that the CPU makes data available, and
this is written during the next 2MHz cycle.
131
132 For CPU write operations, "L" indicates the point at which an address is taken
133 from the CPU address bus, following a negative edge of PHI OUT, with "D" being
134 the point at which data may be asserted for writing, following a positive edge
135 of PHI OUT. Here, PHI OUT is driven at 1MHz.
136
137 Time (ns): 0-------------- 500------------ 1000------------ ...
138 1 MHz cycle: 0 1
139 2 MHz cycle: 0 1 2 ...
140 16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
141 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
142 ~RAS: /---\___________/ ...
143 ~CAS: /-----\___/-\___/ ...
144 PHI OUT: ----\_______/-----------------------\_______/--- ...
145 CPU (RAM): .....L ....D ...
146 RnW: .....W ...
147
148 Here, the concurrent RAM accesses performed by the ULA to obtain any screen
149 data have been omitted to avoid confusion.
150
151 Given that ~WE needs to be driven low for writing or high for reading, and
152 thus propagates RnW from the CPU, this would need to be done before data would
153 be retrieved and, according to the TM4164EC4 datasheet, even as late as the
154 column address is presented and ~CAS brought low.
155
156 For CPU read operations, the positive edge of PHI OUT is not critical.
157 Instead, the data presented to the CPU must be available for a minimum setup
158 time before the next negative edge of PHI OUT. In the diagram below, "D" is
159 the point at which data can be made available. The data must be stable
160 approximately 50ns before the start of the next PHI OUT cycle, indicated by
161 "*" below.
162
163 Time (ns): 0-------------- 500------------ 1000------------ ...
164 1 MHz cycle: 0 1
165 2 MHz cycle: 0 1 2 ...
166 16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
167 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
168 ~RAS: /---\___________/ ...
169 ~CAS: /-----\___/-\___/ ...
170 PHI OUT: ----\_______/-----------------------\_______/--- ...
171 CPU (RAM): .....L.........D..............* ...
172 RnW: .....R ...
173
174 Here, the concurrent RAM accesses performed by the ULA to obtain any screen
175 data have been omitted to avoid confusion.
176
177 It must be concluded that where accesses are interleaved between the CPU and
178 ULA, the CPU access begins concurrently with the ULA access, with the CPU
179 address and data retained by the ULA, and after the ULA access, the rest of
180 the CPU transaction occurs in the following 2MHz cycle.
181
182 See: Acorn Electron Advanced User Guide
183 See: Acorn Electron Service Manual
184 http://chrisacorns.computinghistory.org.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
185 See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
186 See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
187 See: One of the Most Popular 65,536-Bit (64K) Dynamic RAMs The TMS 4164
188 http://smithsonianchips.si.edu/augarten/p64.htm
189 See: https://www.mups.co.uk/project/hardware/acorn_electron/
190 See: Rockwell R650X and R651X Microprocessors (CPU)
191 See: http://wilsonminesco.com/6502primer/
192
193 A Note on 8-Bit Wide RAM Access
194 -------------------------------
195
196 It is worth considering the timing when 8 bits of data can be obtained at once
197 from the RAM chips:
198
199 Time (ns): 0-------------- 500------------- ...
200 2 MHz cycle: 0 1 ...
201 8 MHz cycle: 0 1 2 3 0 1 2 3 ...
202 /-\_/-\_/-\_/-\_/-\_/-\_/-\_/-\_ ...
203 ~RAS: /---\___________/---\___________ ...
204 ~CAS: /-------\_______/-------\_______ ...
205 Address events: A B A B ...
206 Data events: ...E ...E ...
207 ~WE: R R ...
208
209 ~RAS ops: 1 0 1 0 ...
210 ~CAS ops: 1 0 1 0 ...
211
212 Address ops: a. b. a. b. ...
213 Data ops: f s f ...
214
215 PHI OUT: ----\_______/-------\_______/--- ...
216 CPU: D .....L ....D .....L .... ...
217 RnW: .....W .....W ...
218
219 Here, "E" indicates the availability of an entire byte.
220
221 Since only one fetch is required per 2MHz cycle, instead of two fetches for
222 the 4-bit wide RAM arrangement, it seems likely that longer 8MHz cycles could
223 be used to coordinate the necessary signalling.
224
225 Another conceivable simplification from using an 8-bit wide RAM access channel
226 with a single access within each 2MHz cycle is the possibility of allowing the
227 CPU to signal directly to the RAM instead of having the ULA perform the access
228 signalling on the CPU's behalf. Note that it is this more leisurely signalling
229 that would allow the CPU to conduct accesses at 2MHz: the "compressed"
230 signalling being beyond the capabilities of the CPU.
231
232 Note that 16MHz cycles would still be needed for the pixel clock in MODE 0,
233 which needs to output eight pixels per 2MHz cycle, producing 640 monochrome
234 pixels per 80-byte line.
235
236 An obvious consideration with regard to 8-bit wide access is whether the ULA
237 could still conduct the "compressed" signalling for its own RAM accesses:
238
239 Time (ns): 0-------------- 500------------- ...
240 2 MHz cycle: 0 1 ...
241 16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
242 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
243 ~RAS: /---\___________/---\___________ ...
244 ~CAS: /-----\___/-\___/-----\___/-\___ ...
245 Address events: A B C A B C ...
246 Data events: ...1 ...2 ...1 ...2 ...
247 ~WE: R R ...
248
249 ~RAS ops: 1 0 1 0 ...
250 ~CAS ops: 1 0 1 0 1 0 1 0 ...
251
252 Address ops: a.b. c a.b. c ...
253 Data ops: s f s f ...
254
255 PHI OUT: ----\_______/-------\_______/--- ...
256 CPU: D .....L ....D .....L .... ...
257 RnW: .....W .....W ...
258
259 Here, "1" and "2" in the data events correspond to whole byte accesses,
260 effectively upgrading the half-byte "F" and "S" events in the existing ULA
261 arrangement.
262
263 Although the provision of access for the CPU would adhere to the relevant
264 timing constraints, providing only one byte per 2MHz cycle, the ULA could
265 obtain two bytes per cycle. This would then free up bandwidth for the CPU in
266 screen modes where the ULA would normally be dominant (MODE 0 to 3), albeit at
267 the cost of extra buffering. Such buffering could also be done for modes where
268 the bandwidth is shared (MODE 4 to 6), consolidating pairs of ULA accesses into
269 single cycles and freeing up an extra cycle for CPU accesses.
270
271 A further consideration is whether the CPU and ULA could access the memory on
272 interleaved 4MHz cycles, thus replicating the arrangement used by the CPU and
273 Video ULA on the BBC Micro. One potential obstacle is that the apparent 4MHz
274 access rate employed by the ULA does not involve the complete process for
275 accessing the RAM: upon setting up the address and issuing the ~RAS signal,
276 the ULA is able to make a pair of column accesses on the same "row" of memory,
277 effectively achieving an average access rate of 4MHz in an 8-bit
278 configuration.
279
280 However, if arbitrary pairs of column accesses were to be attempted, as would
281 be required by CPU and ULA interleaving, the ~RAS signal would need to be
282 re-issued with different addresses being set up. This would expand the time to
283 access a memory location to beyond the period of a 4MHz cycle, making it
284 impossible to employ interleaved accesses at such a rate.
285
286 In conclusion, a strict interleaving strategy is not possible, but by using
287 pixel data buffering and employing two ULA accesses per 2MHz cycle to obtain
288 two bytes in that cycle, each adjacent 2MHz cycle can be given to the CPU,
289 thus achieving an effective throughput during display update periods of 3
290 bytes for every pair of cycles (2 bytes for the ULA, 1 byte for the CPU), and
291 thus 1.5 bytes per cycle, giving an illusion of 3MHz access to RAM.
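
The throughput figure can be reproduced with a small calculation. This is only
a sketch of the arithmetic in the preceding paragraph; the function and
parameter names are hypothetical.

```python
from fractions import Fraction


def effective_throughput(ula_bytes=2, cpu_bytes=1, cycles=2, clock_mhz=2):
    """Return (bytes per 2MHz cycle, apparent access rate in MHz).

    The defaults describe one pair of 2MHz cycles: the ULA fetches two
    bytes in the first cycle and the CPU transfers one byte in the second.
    """
    bytes_per_cycle = Fraction(ula_bytes + cpu_bytes, cycles)
    return bytes_per_cycle, bytes_per_cycle * clock_mhz
```

With the defaults, the result is 3/2 bytes per cycle and an apparent rate of
3MHz, although no single party ever accesses the RAM at that rate.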
292
293 Some other considerations apply to introducing 8-bit wide access. The ULA
294 employs four pins for data transfer to and from the memory devices (RAM0..3),
295 and obviously another four pins would be needed in an 8-bit wide scheme.
296 However, there may have been a physical limitation on the number of pins
297 permissible on a ULA package or the device's socket. This would necessitate
298 the reassignment of pins, although few are readily available for such
299 reassignment.
300
301 One approach might involve connecting the RAM devices to the CPU data bus,
302 with each line connecting to a different RAM chip. The signalling of the RAM
303 would remain under the control of the ULA, thus preventing the RAM devices
304 from interfering with other memory transfer operations, with the ROM
305 signalling also remaining under the ULA's control. One potential disadvantage
306 of this scheme would involve the elimination of the separate data paths
307 between the CPU and ROM and between the ULA and RAM.
308
309 Another approach might involve reclaiming the keyboard input pins (KBD0..3) as
310 data pins for ULA access to RAM. This would necessitate the reorganisation of
311 the keyboard interface, perhaps integrating the keyboard matrix more directly
312 as a kind of ROM device. A bus transceiver could be used to isolate the
313 keyboard inputs, with a pin being used to control the transceiver, since the
314 keyboard data lines are pulled high. In effect, the transceiver would act as a
315 kind of output enable for the keyboard.
316
317 To make the matrix appear within the sideways ROM region of the memory map,
318 A15 would need to be set to a high value and A14 to a low value. Signals A13
319 to A0 would then be brought low to select the appropriate column, with the
320 individual key states being made available via data lines, perhaps D3 to D0.
321 This mostly retains the existing addressing arrangement and scanning
322 mechanism. Internally, the ULA would continue to enable access to the keyboard
323 through the ROM paging mechanism, but instead of integrating separate data
324 pins into the CPU's data path, it would integrate the keyboard inputs using
325 the transceiver.
326
327 Enhancement: Keyboard Matrix Scanning
328 -------------------------------------
329
330 The keyboard scanning mechanism is presumably designed to be as inexpensive as
331 possible, being driven by software and avoiding extra logic, but at the
332 expense of occupying large regions of the memory map when paged in. A more
333 efficient mapping of the keyboard columns could possibly be done using
334 decoders such as the 74xx138 part which permits the decoding of three inputs
335 to select one of eight outputs. Using two of these parts, six address lines
336 would be dedicated to the keyboard columns as follows:
337
338 A5...A3 select up to eight columns via one decoder
339 A2...A0 select up to eight columns via another decoder
340
341 In this arrangement, only one of the two ranges of pins would be used at any
342 given time. If the ULA were to require a certain combination of the remaining
343 address bits, a region as small as 64 bytes could be dedicated to the
344 keyboard.
345
346 A more efficient arrangement could be used by introducing logic that allows
347 the decoders to work together to address the keyboard:
348
349 A2...A0 select up to eight columns via both decoders
350 A3 would enable one decoder if low and the other decoder if high
351
352 With ULA constraints on the remaining address bits, a 16-byte region could be
353 used to represent the keyboard.
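
The combined arrangement can be sketched as a small address decoding model.
This is only an illustration: the function names are hypothetical, the
active-low outputs and enable inputs of the decoders are not modelled, and
nothing here is taken from an actual circuit.

```python
def select_column(address):
    """Return the column (0 to 15) selected by A3...A0.

    A3 chooses between the two decoders and A2...A0 choose one of the
    eight outputs of the chosen decoder.
    """
    decoder = (address >> 3) & 1    # A3: 0 for one decoder, 1 for the other
    output = address & 0b111        # A2...A0: one of eight outputs
    return decoder * 8 + output


def region_size(address_lines):
    """Return the number of bytes spanned by the given number of lines."""
    return 1 << address_lines
```

Here, region_size(6) gives the 64 bytes of the first arrangement and
region_size(4) the 16 bytes of the second. With only fourteen columns to
select (A13 to A0 in the existing scheme), two of the sixteen outputs would
presumably remain unused.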
354
355 A further refinement might involve combining the existing columns into groups
356 of eight keys. This would reduce the number of columns to seven, requiring
357 only three address lines, with all eight data lines being used to read the
358 matrix.
359
On the BBC Micro, the system 6522 VIA is used to monitor and read from the
keyboard. The memory locations involved with this chip are located in the
region from &FE40 to &FE5F inclusive (the user VIA occupying &FE60 to &FE7F),
although the memory is allocated in a way that is appropriate to operate that
chip, as opposed to merely exposing the keyboard matrix.
365
366 Enhancement: Hardware Device Selection
367 --------------------------------------
368
369 An alternative to the existing, rather cumbersome, sideways ROM mapping of the
370 keyboard might involve making it accessible via a hardware-related memory page
371 like page FE. With ULA addresses confined to FE0x, and with the ULA itself
372 having to trap accesses to page FE, the page selection signal might be brought
373 out of the ULA instead of any dedicated signal for the keyboard. Various
374 address lines corresponding to A7 through A4, or a subset of these, could be
375 fed into a decoder to permit the selection of other devices, with the keyboard
376 being one of these.
377
378 Meanwhile, a more efficient keyboard mapping using the above matrix
379 enhancement would permit the different keyboard columns to appear as a group
380 of sixteen or eight bytes. Thus:
381
382 A15...A8 select page FE
383 A7...A4 select a device or peripheral
384 A3...A0 select a register or keyboard column
385
386 Conceivably, devices such as sound generators could be mapped to device
387 regions.
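
The proposed address layout can be sketched as follows. The field boundaries
are those listed above; the function names are hypothetical and no particular
assignment of device numbers is implied.

```python
def decode_address(address):
    """Split a 16-bit address into (page, device, register) fields."""
    page = (address >> 8) & 0xFF      # A15...A8
    device = (address >> 4) & 0x0F    # A7...A4
    register = address & 0x0F         # A3...A0
    return page, device, register


def is_hardware_page(address):
    """Return whether the address falls within page FE."""
    return decode_address(address)[0] == 0xFE
```

For example, decode_address(0xFE05) yields device 0 and register 5, which is
consistent with the ULA's own registers being confined to FE0x.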
388
389 CPU Clock Notes
390 ---------------
391
392 "The 6502 receives an external square-wave clock input signal on pin 37, which
393 is usually labeled PHI0. [...] This clock input is processed within the 6502
394 to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
395 is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
396 through two inverters and a push-pull amplifier. The same network of
397 transistors within the 6502 which generates PHI2 is also tied to PHI1, and
398 generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
399 available to external devices is so that they know when they can access the
400 CPU. When PHI1 is high, this means that external devices can read from the
401 address bus or data bus; when PHI2 is high, this means that external devices
402 can write to the data bus."
403
404 See: http://lateblt.livejournal.com/88105.html
405
406 "The 6502 has a synchronous memory bus where the master clock is divided into
407 two phases (Phase 1 and Phase 2). The address is always generated during Phase
408 1 and all memory accesses take place during Phase 2."
409
410 See: http://www.jmargolin.com/vgens/vgens.htm
411
Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
Phase 1" means when PHI1 is high (and PHI0, and thus PHI2, is low) and "during
Phase 2" means when PHI2 is high.
415
416 Bandwidth Figures
417 -----------------
418
419 Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
420 total lines, with 80 cycles occurring in the active periods of display
421 scanlines, the following bandwidth calculations can be performed:
422
423 Total theoretical maximum:
424 128 cycles * 312 lines
425 = 39936 bytes
426
427 MODE 0, 1, 2:
428 ULA: 80 cycles * 256 lines
429 = 20480 bytes
430 CPU: 48 cycles / 2 * 256 lines
431 + 128 cycles / 2 * (312 - 256) lines
432 = 9728 bytes
433
434 MODE 3:
435 ULA: 80 cycles * 24 rows * 8 lines
436 = 15360 bytes
437 CPU: 48 cycles / 2 * 24 rows * 8 lines
438 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
439 = 12288 bytes
440
441 MODE 4, 5:
442 ULA: 40 cycles * 256 lines
443 = 10240 bytes
444 CPU: (40 cycles + 48 cycles / 2) * 256 lines
445 + 128 cycles / 2 * (312 - 256) lines
446 = 19968 bytes
447
448 MODE 6:
449 ULA: 40 cycles * 24 rows * 8 lines
450 = 7680 bytes
451 CPU: (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
452 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
453 = 19968 bytes
454
Here, the division by 2 for CPU accesses is performed to indicate that the CPU
only uses every other access opportunity even in uncontended periods. See the
2MHz RAM Access enhancement below for bandwidth calculations that consider
this limitation removed.
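
The calculations above follow a common pattern that can be sketched in code.
The names used here are illustrative; the generalisation that the CPU obtains
whatever part of the 80 active cycles the ULA does not use is an
interpretation of the figures above rather than a documented rule.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
TOTAL_LINES = 312       # scanlines per field
ACTIVE_CYCLES = 80      # cycles in the active period of a display line

# Mode group -> (bytes fetched by the ULA per display line, display lines)
MODES = {
    "MODE 0, 1, 2": (80, 256),
    "MODE 3": (80, 24 * 8),
    "MODE 4, 5": (40, 256),
    "MODE 6": (40, 24 * 8),
}


def bandwidth(ula_bytes_per_line, display_lines):
    """Return (ULA bytes, CPU bytes) available per field."""
    ula = ula_bytes_per_line * display_lines
    interleaved = ACTIVE_CYCLES - ula_bytes_per_line   # slots left by the ULA
    border = (CYCLES_PER_LINE - ACTIVE_CYCLES) // 2    # every other cycle
    blank = CYCLES_PER_LINE // 2                       # every other cycle
    cpu = ((interleaved + border) * display_lines
           + blank * (TOTAL_LINES - display_lines))
    return ula, cpu


def summary(cpu_bytes):
    """Return (percentage of theoretical maximum, slowdown factor)."""
    total = CYCLES_PER_LINE * TOTAL_LINES
    return round(100 * cpu_bytes / total), round(total / cpu_bytes, 2)
```

Applying bandwidth() to each entry in MODES reproduces the byte counts given
above, and summary() gives the percentage and slowdown relative to the
theoretical maximum of 39936 bytes.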
459
460 A summary of the bandwidth figures is as follows (with extra timing details
461 described below):
462
463 Standard ULA % Total Slowdown BBC-10s BBC-34s
464 MODE 0, 1, 2 9728 bytes 24% 4.11 43s 105s
465 MODE 3 12288 bytes 31% 3.25 34s
466 MODE 4, 5 19968 bytes 50% 2 20s
467 MODE 6 19968 bytes 50% 2 20s 50s
468
469 The review of the Electron in Practical Computing (October 1983) provides a
470 concise overview of the RAM access limitations and gives timing comparisons
471 between modes and BBC Micro performance. In the above, "BBC-10s" is the
472 measured or stated time given for a program taking 10 seconds on the BBC
473 Micro, whereas "BBC-34s" is the apparently measured time given for the
474 "Persian" program taking 34 seconds to complete on the BBC Micro, with a
475 "quick" mode presumably switching to MODE 6 using the ULA directly in order to
476 reduce display bandwidth usage while the program draws to the screen.
477 Evidently, the measured slowdown is slightly lower than the theoretical
478 slowdown, most likely due to the running time not being entirely dominated by
479 RAM access performance characteristics.
480
481 Video Timing
482 ------------
483
484 According to 8.7 in the Service Manual, and the PAL Wikipedia page,
485 approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
486 (including the "colour burst"), and 1.65µs for the "front porch", totalling
487 12.05µs and thus leaving 51.95µs for the active video signal for each
488 scanline. As the Service Manual suggests in the oscilloscope traces, the
489 display information is transmitted more or less centred within the active
490 video period since the ULA will only be providing pixel data for 40µs in each
491 scanline.
492
493 Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
494 each scanline can be divided into 1024 cycles, although only 640 at most are
495 actively used to provide pixel data. Pixel data production should only occur
496 within a certain period on each scanline, approximately 262 cycles after the
497 start of hsync:
498
499 active video period = 51.95µs
500 pixel data period = 40µs
501 total silent period = 51.95µs - 40µs = 11.95µs
502 silent periods (before and after) = 11.95µs / 2 = 5.975µs
503 hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
504 time before pixel data period = 10.4µs + 5.975µs = 16.375µs
505 pixel data period start cycle = 16.375µs / 62.5ns = 262
506
507 By choosing a number divisible by 8, the RAM access mechanism can be
508 synchronised with the pixel production. Thus, 256 is a more appropriate start
509 cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
510 pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
511 document) occurs at cycle 0.
512
513 To summarise:
514
515 HS signal starts at cycle 0 on each horizontal scanline
516 HS signal ends approximately 4µs later at cycle 64
517 Pixel data starts approximately 12µs later at cycle 256
518
519 "Re: Electron Memory Contention" provides measurements that appear consistent
520 with these calculations.
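
The horizontal timing calculation can be sketched as follows, using the
figures quoted above. The names are illustrative, and rounding down to a
multiple of 8 is merely one way of arriving at cycle 256: the choice of that
value over 264 is an assumption here.

```python
CYCLE_NS = 62.5   # one 16MHz cycle


def us_to_cycles(microseconds):
    """Convert a duration in microseconds to 16MHz cycles."""
    return microseconds * 1000 / CYCLE_NS


def pixel_start_cycle(sync_us=4.7, back_porch_us=5.7, front_porch_us=1.65,
                      line_us=64.0, pixel_us=40.0):
    """Return the cycle at which centred pixel data would start."""
    active_us = line_us - (sync_us + back_porch_us + front_porch_us)
    margin_us = (active_us - pixel_us) / 2    # silent period on each side
    return us_to_cycles(sync_us + back_porch_us + margin_us)


def align_down(cycle, multiple=8):
    """Round down to a multiple of 8 cycles (one 2MHz RAM access period)."""
    return int(cycle) // multiple * multiple
```

The centred start works out at cycle 262 or thereabouts, which aligns down to
cycle 256. Similarly, us_to_cycles(4) gives the 64 cycles of the HS signal.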
521
The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync), which also lasts for 2.5
lines. Thus, the first visible scanline on the first field of a frame occurs
half way through the 23rd scanline period measured from the start of vsync
(indicated by "V" in the diagrams below):
528
529 10 20 23
530 Line in frame: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
531 Line from 1: 0 22 3
532 Line on screen: .:::::VVVVV::::: 12233445566
533 |_________________________________________________|
534 25 line vertical blanking period
535
536 In the second field of a frame, the first visible scanline coincides with the
537 24th scanline period measured from the start of line 313 in the frame:
538
539 310 336
540 Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
541 Line from 313: 0 23 4
542 Line on screen: 88:::::VVVVV:::: 11223344
543 288 | |
544 |_________________________________________________|
545 25 line vertical blanking period
546
547 In order to consider only full lines, we might consider the start of each
548 frame to occur 23 lines after the start of vsync.
549
550 Again, it is likely that pixel data production should only occur on scanlines
551 within a certain period on each frame. The "625/50" document indicates that
552 only a certain region is "safe" to use, suggesting a vertically centred region
553 with approximately 15 blank lines above and below the picture. However, the
554 "PAL TV timing and voltages" document suggests 28 blank lines above and below
555 the picture. This would centre the 256 lines within the 312 lines of each
556 field and thus provide a start of picture approximately 5.5 or 5 lines after
557 the end of the blanking period or 28 or 27.5 lines after the start of vsync.
558
559 To summarise:
560
CSYNC signal starts at cycle 0
CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672
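
These cycle counts follow from there being 1024 16MHz cycles in each 64µs
scanline. A brief sketch of the conversion, with illustrative names and the
28-line offset from the start of vsync taken from the discussion above:

```python
CYCLES_PER_LINE = 1024   # 16MHz cycles in one 64us scanline
LINE_US = 64             # duration of one scanline in microseconds


def lines_to_cycles(lines):
    """Convert a number of scanlines to 16MHz cycles."""
    return int(lines * CYCLES_PER_LINE)


def lines_to_us(lines):
    """Convert a number of scanlines to microseconds."""
    return lines * LINE_US


def picture_start(sync_lines=2.5, lines_from_vsync=28):
    """Return (microseconds after the sync pulse ends, absolute cycle)."""
    gap_lines = lines_from_vsync - sync_lines
    return lines_to_us(gap_lines), lines_to_cycles(lines_from_vsync)
```

The 2.5 line sync pulse thus ends at cycle 2560, and a picture starting 28
lines after the start of vsync begins 25.5 lines (1632µs) after the end of
the pulse, at cycle 28672.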
564
565 See: http://en.wikipedia.org/wiki/PAL
566 See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
567 See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
568 http://lipas.uwasa.fi/~f76998/video/modes/
569 See: PAL TV timing and voltages
570 http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
571 See: Line Standards
572 http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
573 See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
574 http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
575 See: Re: Electron Memory Contention
576 http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109
577
578 RAM Integrated Circuits
579 -----------------------
580
581 Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
582 CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
583 available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
584 have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
585 ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.
586
The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM4164 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.
590
591 The TM4164EC4 series combines 4 64K x 1b units into a single package and
592 appears similar to the TM4164EA4 featured on the Electron's circuit diagram
593 (in the Advanced User Guide but not the Service Manual), and it also has 22
594 pins providing 3 additional inputs and 3 additional outputs over the 16 pins
595 of the individual 4164-15 modules, presumably allowing concurrent access to
596 the packaged memory units.
597
598 As far as currently available replacements are concerned, the NTE4164 is a
599 potential candidate: according to the Vetco Electronics entry, it is
600 supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
601 parts include the NTE2164 and the NTE6664, both of which appear to have
602 largely the same performance and connection characteristics. Meanwhile, the
603 NTE21256 appears to be a 16-pin replacement with four times the capacity that
604 maintains the single data input and output pins. Using the NTE21256 as a
605 replacement for all ICs combined would be difficult because of the single bit
606 output.
607
Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these describe
different parts: they appear to be provided under the 41574 product code
(listed as 41464-10) and to be replacements for the TM4164EC4. Of these, the
Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
employing 4 pins for both input and output.
615
                Pins    I/O pins    Row access  Column access
                ----    --------    ----------  -------------
TM4164EC4       22      4 + 4       150ns (15)  90ns (15)
KM41464AP       18      4           150ns (15)  75ns (15)
NTE21256        16      1 + 1       150ns       75ns
HYB 4164-2      16      1 + 1       150ns       100ns
µPD41464        18      4           120ns (12)  60ns (12)
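
The access times in the table can be compared against the 150ns and 90ns
figures of the TM4164EC4-15 with a short sketch. Treating those figures as a
budget is an assumption made purely for illustration, the names are
hypothetical, and the table mixes speed grades, so only the listed figures are
being compared.

```python
# Part -> (row access ns, column access ns), as listed in the table
PARTS = {
    "TM4164EC4": (150, 90),
    "KM41464AP": (150, 75),
    "NTE21256": (150, 75),
    "HYB 4164-2": (150, 100),
    "µPD41464": (120, 60),
}


def within_budget(row_ns, column_ns, max_row_ns=150, max_column_ns=90):
    """Return whether both access times fit the assumed budget."""
    return row_ns <= max_row_ns and column_ns <= max_column_ns


def slower_parts(parts=PARTS):
    """Return the names of parts exceeding the assumed budget."""
    return sorted(name for name, times in parts.items()
                  if not within_budget(*times))
```

On these figures, only the HYB 4164-2 exceeds the budget, by virtue of its
100ns column access time. Whether this matters in practice is not something
the table alone can settle.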
623
624 See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
625 https://www.rocelec.com/part/REITM4164EC4-15L
626 See: Dynamic RAMS
627 http://www.unicornelectronics.com/IC/DYNAMIC.html
628 See: New old stock 8x 4164 chips
629 http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
630 See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
631 http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
632 See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
633 http://www.vetco.net/catalog/product_info.php?products_id=2806
634 See: NTE4164 - IC-NMOS 64K DRAM 150NS
635 http://www.vetco.net/catalog/product_info.php?products_id=3680
636 See: NTE21256 - IC-256K DRAM 150NS
637 http://www.vetco.net/catalog/product_info.php?products_id=2799
638 See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
639 http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
640 See: NTE6664 - IC-MOS 64K DRAM 150NS
641 http://www.vetco.net/catalog/product_info.php?products_id=5213
642 See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
643 http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
644 See: 4164-150: MAJOR BRANDS
645 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
646 See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
647 http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
648 See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
649 http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
650 See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
651 http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
652 See: 41464-10: MAJOR BRANDS
653 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1
654
655 Interrupts
656 ----------
657
658 The ULA generates IRQs (maskable interrupts) according to certain conditions
659 and these conditions are controlled by location &FE00:
660
* Vertical sync (bottom of displayed screen)
* 50Hz real time clock
* Transmit data empty
* Receive data full
* High tone detect
666
The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory: setting
the bit restores the normal function of the ULA.
671
672 ROM Paging
673 ----------
674
675 Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
676 mappings exist:
677
678 8 keyboard
679 9 keyboard (duplicate)
680 10 BASIC ROM
681 11 BASIC ROM (duplicate)
682
683 Paging in a ROM involves the following procedure:
684
685 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
686 2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
687 selected.
688 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
689 whilst writing the desired ROM number n in bits 0 to 2.
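
As a minimal sketch of the procedure above, the following Python function
lists the values that might be written to &FE05 in order. The function name is
hypothetical, the choice of ROM 12 for the first step is arbitrary, and the
upper bits of the written value (which serve other purposes at this location)
are simply left at zero here as an assumption.

```python
def rom_paging_writes(rom):
    """Return the values to write, in order, to &FE05 to page in a ROM."""
    if not 0 <= rom <= 15:
        raise ValueError("ROM number must be in the range 0 to 15")
    if rom >= 8:
        # Bit 3 (ROM page enable) is already set in any number from 8 to 15,
        # so a single write of the ROM number is assumed to be enough.
        return [rom]
    # Step 1: select one of ROMs 12 to 15 first (12 is an arbitrary choice).
    # Step 2: write the desired ROM number with bit 3 clear.
    return [0x08 | 0x04, rom]
```

For example, paging in ROM 3 would produce the writes 12 and then 3, whereas
paging in the BASIC ROM at 10 would produce the single write 10.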
690
691 See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686
692
693 Keyboard Access
694 ---------------
695
696 The keyboard pages appear to be accessed at 1MHz just like the RAM.
697
698 See: https://stardot.org.uk/forums/viewtopic.php?p=254155#p254155
699
700 Shadow/Expanded Memory
701 ----------------------
702
703 The Electron exposes all sixteen address lines and all eight data lines
704 through the expansion bus. Using such lines, it is possible to provide
705 additional memory - typically sideways ROM and RAM - on expansion cards and
706 through cartridges, although the official cartridge specification provides
707 fewer address lines and only seeks to provide access to memory in 16K units.
708
709 Various modifications and upgrades were developed to offer "turbo"
710 capabilities to the Electron, permitting the CPU to access a separate 8K of
711 RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
712 the ULA through additional logic. However, an enhanced ULA might support
713 independent CPU access to memory over the expansion bus by allowing itself to
714 be discharged from providing access to memory, potentially for a range of
715 addresses, and for the CPU to communicate with external memory uninterrupted.
716
717 Sideways RAM/ROM and Upper Memory Access
718 ----------------------------------------
719
Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above (the upper region of memory) at
2MHz, independently of any access to RAM that the ULA might be performing. It
only blocks the CPU, by stopping or stalling its clock, if the CPU attempts to
access addresses of &7FFF and below (the lower region of memory) during a ULA
memory access.
727
728 Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
729 CPU clock if the line goes low, when the CPU is attempting to access the lower
730 region of memory.
731
732 Hardware Scrolling (and Enhancement)
733 ------------------------------------
734
On the standard ULA, &FE02 and &FE03 map to an address with 9 significant
bits, the least significant 6 bits being zero, thus limiting the scrolling
resolution to 64 bytes. An enhanced ULA could support a resolution of 2 bytes
using the same layout of these addresses.
739
|--&FE03--------------| |--&FE02--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX
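
The two layouts drawn above can be sketched as bit operations on a screen
start address. This is only an illustration: the function names are
hypothetical, and the register pair is treated as a single 16-bit quantity
with the high byte drawn on the left, as in the diagram.

```python
def standard_start_register(address):
    """Pack address bits 14..6 into the 16-bit pair: (high byte, low byte)."""
    # Address bit 14 lands in bit 13 of the pair and bit 6 lands in bit 5,
    # so the retained bits are simply shifted right by one place.
    pair = (address & 0x7FC0) >> 1
    return pair >> 8, pair & 0xFF

def enhanced_start_register(address):
    """Pack address bits 14..1 into the 16-bit pair: (high byte, low byte)."""
    # In the enhanced layout the retained bits stay in their own positions.
    pair = address & 0x7FFE
    return pair >> 8, pair & 0xFF
```

With the standard layout, a start address of &3000 gives a high byte of &18
and a low byte of &00, and the next representable address is &3040. With the
enhanced layout, &3002 would be representable directly.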
744
Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change in 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).
751
752 One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
753 of changing the screen address by 2 bytes is the change in the number of lines
754 from the initial and final character rows that need reading by the ULA, which
755 would need to maintain this state information (although this is a relatively
756 trivial change). Another pitfall is the complication that might be introduced
757 to software writing bitmaps of character height to the screen.
758
759 See: http://pastraiser.com/computers/acornelectron/acornelectron.html
760
761 Enhancement: Mode Layouts
762 -------------------------
763
764 Merely changing the screen memory mappings in order to have Archimedes-style
765 row-oriented screen addresses (instead of character-oriented addresses) could
766 be done for the existing modes, but this might not be sufficiently beneficial,
767 especially since accessing regions of the screen would involve incrementing
768 pointers by amounts that are inconvenient on an 8-bit CPU.
769
However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
reference the vertical screen location below the currently-referenced location
(just as occurs within characters using the existing ULA); instead of
returning to the top of the character row and referencing the next horizontal
location after eight bytes, the address would reference the next character row
and continue to reference locations downwards over the height of the screen
until reaching the bottom; at the bottom, the next location would be the next
horizontal location at the top of the screen.
779
780 In other words, the memory layout for the screen would resemble the following
781 (for MODE 2):
782
&3000 &3100 ... &7F00
&3001 &3101
...   ...
&3007
&3008
...
...   ...
&30FF       ... &7FFF
791
792 Since there are 256 pixel rows, each column of locations would be addressable
793 using the low byte of the address. Meanwhile, the high byte would be
794 incremented to address different columns. Thus, addressing screen locations
795 would become a lot more convenient and potentially much more efficient for
796 certain kinds of graphical output.
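
To illustrate the difference, the following Python sketch computes the address
of a byte in MODE 2 under both layouts. The function names are hypothetical;
the base address of &3000 and the 640 bytes per character row follow from the
MODE 2 figures used in this document.

```python
SCREEN_BASE = 0x3000    # start of MODE 2 screen memory
BYTES_PER_ROW = 640     # 80 byte columns, each 8 lines high

def character_oriented(column, y):
    """Address of the byte at byte column 0..79 and pixel row 0..255."""
    return SCREEN_BASE + (y // 8) * BYTES_PER_ROW + column * 8 + (y % 8)

def column_oriented(column, y):
    """Same location in the proposed layout: high byte selects the column."""
    return SCREEN_BASE + (column << 8) + y
```

In the column-oriented case, the calculation reduces to placing the column
number in the high byte and the pixel row in the low byte, which is the kind
of operation an 8-bit CPU can perform without any multiplication.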
797
One potential complication with this simplified addressing scheme arises with
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
with the existing ULA) would be achieved by incrementing or decrementing the
screen start address; by one character row, it would involve adding or
subtracting 8. However, the ULA only supports multiples of 64 when changing the
screen start address. Thus, if such a scheme were to be adopted, at least three
additional bits would need to be supported in the screen start register to
permit adjustments of 8, with six additional bits being needed to permit
adjustments of 1 (see "Hardware Scrolling (and Enhancement)" for more details).
However, horizontal scrolling would be much improved even under the severe
constraints of the existing ULA: only adjustments of 256 to the screen start
address would be required to produce single-location scrolling of as few as two
pixels in MODE 2 (four pixels in MODEs 1 and 5, eight pixels otherwise).
810
811 More disruptive is the effect of this alternative layout on software.
812 Presumably, compatibility with the BBC Micro was the primary goal of the
813 Electron's hardware design. With the character-oriented screen layout in
814 place, system software (and application software accessing the screen
815 directly) would be relying on this layout to run on the Electron with little
816 or no modification. Although it might have been possible to change the system
817 software to use this column-oriented layout instead, this would have incurred
818 a development cost and caused additional work porting things like games to the
819 Electron. Moreover, a separate branch of the software from that supporting the
820 BBC Micro and closer derivatives would then have needed maintaining.
821
822 The decision to use the character-oriented layout in the BBC Micro may have
823 been related to the choice of circuitry and to facilitate a convenient
824 hardware implementation, and by the time the Electron was planned, it was too
825 late to do anything about this somewhat unfortunate choice.
826
827 Pixel Layouts
828 -------------
829
830 The pixel layouts are as follows:
831
Modes       Depth (bpp) Pixels (from bits)
-----       ----------- ------------------
0, 3, 4, 6  1           7 6 5 4 3 2 1 0
1, 5        2           73 62 51 40
2           4           7531 6420
837
838 Since the ULA reads a half-byte at a time, one might expect it to attempt to
839 produce pixels for every half-byte, as opposed to handling entire bytes.
840 However, the pixel layout is not conducive to producing pixels as soon as a
841 half-byte has been read for a given full-byte location: in 1bpp modes the
842 first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
843 data is spread across the entire byte in different ways.
844
845 An alternative arrangement might be as follows:
846
Modes       Depth (bpp) Pixels (from bits)
-----       ----------- ------------------
0, 3, 4, 6  1           7 6 5 4 3 2 1 0
1, 5        2           76 54 32 10
2           4           7654 3210
852
Just as the mode layouts were presumably decided by compatibility with the BBC
Micro, the pixel layouts will have been maintained for similar reasons.
Unfortunately, the existing layout prevents any optimisation of the ULA for
handling half-byte pixel data generally.
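
The two arrangements tabulated in this section can be compared with a short
Python sketch that extracts the logical colour of each pixel from a byte. The
function names are hypothetical, and pixels are returned from left to right.

```python
def existing_pixels(byte, bpp):
    """Decode a byte using the existing interleaved layout."""
    count = 8 // bpp                  # pixels held in one byte
    pixels = []
    for i in range(count):
        value = 0
        for k in range(bpp):
            # Pixel i takes bit 7 - i, then every count-th bit below it.
            value = (value << 1) | ((byte >> (7 - i - k * count)) & 1)
        pixels.append(value)
    return pixels

def alternative_pixels(byte, bpp):
    """Decode a byte using the alternative layout with contiguous bits."""
    count = 8 // bpp
    mask = (1 << bpp) - 1
    return [(byte >> (8 - bpp * (i + 1))) & mask for i in range(count)]
```

In the alternative layout, the upper half-byte alone determines the first
pixel in the 4bpp case and the first two pixels in the 2bpp case, whereas the
existing layout needs bits from both half-bytes for every pixel in those
depths.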
857
858 Enhancement: The Missing MODE 4
859 -------------------------------
860
861 The Electron inherits its screen mode selection from the BBC Micro, where MODE
862 3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
863 Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
864 however, and they are merely implemented by skipping two scanlines in every
865 ten after the eight required to produce a character line. Thus, such modes
866 provide a 24-row display.
867
868 In principle, nothing prevents this "text mode" effect being applied to other
869 modes. The 20-column modes are not well-suited to displaying text, which
870 leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
871 2. Although the need for a non-monochrome 40-column text mode is addressed by
872 MODE 7 on the BBC Micro, the Electron lacks such a mode.
873
874 If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it
875 would occupy MODE 4 instead of the current MODE 4:
876
Screen mode  Size (kilobytes)  Colours  Rows  Resolution
-----------  ----------------  -------  ----  ----------
0            20                2        32    640x256
1            20                4        32    320x256
2            20                16       32    160x256
3            16                2        24    640x256
4 (new)      16                4        24    320x256
4 (old)      10                2        32    320x256
5            10                4        32    160x256
6            8                 2        24    320x256
887
888 Thus, for increasing mode numbers, the size of each mode would be the same or
889 less than the preceding mode.
890
891 Enhancement: Display Mode Property Control
892 ------------------------------------------
893
894 It is rather curious that the ULA supports the mode numbers directly in bits 3
895 to 5 of &FE07 since these would presumably need to be decoded in order to set
896 the fundamental properties of the display mode. These properties are as
897 follows:
898
899 * Screen data retrieval rate: number of fetches per pair of 2MHz cycles
900 * Pixel colour depth
901 * Text mode vertical spacing
902
903 From these, the following properties emerge:
904
Property                        Influences
--------                        ----------
Character row size (bytes)      Retrieval rate

Number of character rows        Text mode setting

Display size (bytes)            Retrieval rate (character row size)
                                Text mode setting (number of rows)

Pixel frequency                 Retrieval rate
Horizontal resolution (pixels)  Colour depth
916
917 One can imagine a register bitfield arrangement as follows:
918
Field               Values                  Formula
-----               ------                  -------
Pixel depth         00: 1 bit per pixel     log2(depth)
                    01: 2 bits per pixel
                    10: 4 bits per pixel

Retrieval rate      0: twice                2 - fetches per cycle pair
                    1: once

Text mode enable    0: disable/off          text mode enabled
                    1: enable/on
930
931 This arrangement would require four bits. However, one bit in &FE07 is
932 seemingly inactive and might possibly be reallocated.
933
934 The resulting combination of properties would permit all of the existing modes
935 plus some additional ones, including the missing MODE 4 mentioned above. With
936 the bitfields above ordered from the most significant bits to the least
937 significant bits providing the low-level "mode" values, the following table
938 can be produced:
939
Screen mode  Depth  Rate   Text  Size (K)  Colours  Rows  Resolution
-----------  -----  ----   ----  --------  -------  ----  ----------
0  (0000)    1      twice  off   20        2        32    640x256  (MODE 0)
1  (0001)    1      twice  on    16        2        24    640x256  (MODE 3)
2  (0010)    1      once   off   10        2        32    320x256  (MODE 4)
3  (0011)    1      once   on    8         2        24    320x256  (MODE 6)
4  (0100)    2      twice  off   20        4        32    320x256  (MODE 1)
5  (0101)    2      twice  on    16        4        24    320x256
6  (0110)    2      once   off   10        4        32    160x256  (MODE 5)
7  (0111)    2      once   on    8         4        24    160x256
8  (1000)    4      twice  off   20        16       32    160x256  (MODE 2)
9  (1001)    4      twice  on    16        16       24    160x256
10 (1010)    4      once   off   10        16       32    80x256
11 (1011)    4      once   on    8         16       24    80x256
954
955 The existing modes would be covered in a way that is incompatible with the
956 existing numbering, thus requiring a table in software, but additional text
957 modes would be provided for MODE 1, MODE 5 and MODE 2. An additional two lower
958 resolution modes would also be conceivable within this scheme, requiring the
959 stretching of 16MHz pixels by a factor of eight to yield 80 pixels per
960 scanline. The utility of such modes is questionable and such modes might not
961 be supported.
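
The decoding of such a low-level mode value might be sketched in Python as
follows. This is an illustration only: the function and dictionary names are
hypothetical, the field order follows the table above, and the sizes are
copied from the table rather than calculated.

```python
# Sizes in kilobytes as given in the table, keyed by (fetches, text enabled).
SIZES_K = {(2, False): 20, (2, True): 16, (1, False): 10, (1, True): 8}

def decode_mode(value):
    """Decode a 4-bit value: depth (2 bits), rate (1 bit), text (1 bit)."""
    depth_field = (value >> 2) & 0b11
    if depth_field == 0b11:
        raise ValueError("depth field 11 is not defined in this scheme")
    depth = 1 << depth_field            # the field holds log2(depth)
    fetches = 2 - ((value >> 1) & 1)    # 0: twice, 1: once per cycle pair
    text = bool(value & 1)
    return {
        "depth": depth,
        "fetches": fetches,
        "text": text,
        "size_k": SIZES_K[(fetches, text)],
        "colours": 1 << depth,
        "rows": 24 if text else 32,
        "width": 320 * fetches // depth,
    }
```

Decoding the value 0101, for instance, yields the 4-colour, 24-row, 320 pixel
wide mode that corresponds to the missing MODE 4 discussed earlier.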
962
963 Enhancement: 2MHz RAM Access
964 ----------------------------
965
Given that RAM accesses by the CPU and ULA are both scheduled in 2MHz cycles,
but given that the CPU when not competing with the ULA only accesses RAM every
other 2MHz cycle (as if the ULA still needed to access the RAM), one useful
enhancement would be a mechanism to let the CPU take over the ULA cycles
outside the ULA's period of activity, comparable to the way the ULA takes over
the CPU cycles in MODE 0 to 3.
972
973 Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
974
975 Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
976 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
977
978 In MODE 4 to 6:
979
980 Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
981 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
982
983 This would improve CPU bandwidth as follows:
984
              Standard ULA  Enhanced ULA  % Total Bandwidth  Speedup
MODE 0, 1, 2  9728 bytes    19456 bytes   24% -> 49%         2
MODE 3        12288 bytes   24576 bytes   31% -> 62%         2
MODE 4, 5     19968 bytes   29696 bytes   50% -> 74%         1.5
MODE 6        19968 bytes   32256 bytes   50% -> 81%         1.6
990
(Here, the uncontended total 2MHz bandwidth for a complete frame would be
39936 bytes, being 128 cycles per line over 312 lines.)
993
994 With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
995 because all access opportunities to RAM are doubled. Meanwhile, in the other
996 modes, some CPU accesses occur alongside ULA accesses and thus cannot be
997 doubled, but the CPU bandwidth increase is still significant.
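
The figures in the table can be reproduced with a short Python sketch. It
rests on assumptions that the table implies but does not state: that 80 of the
128 cycles in each line form the display part, that the ULA uses 80 of those
cycles in MODE 0 to 3 and 40 in MODE 4 to 6, and that screen data is fetched
on 256 lines per frame, or on 192 lines in MODE 3 and 6. The names used are
hypothetical.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles in one scanline
LINES_PER_FRAME = 312
DISPLAY_CYCLES = 80     # cycles in the display part of a line (assumed)

def cpu_accesses(data_lines, ula_cycles, enhanced):
    """CPU RAM accesses per frame for a mode on a standard or enhanced ULA."""
    # Within the display part, the CPU only gets the slots the ULA leaves.
    in_display = DISPLAY_CYCLES - ula_cycles
    # Outside it, the standard ULA offers every other cycle to the CPU.
    border = CYCLES_PER_LINE - DISPLAY_CYCLES
    data_line = in_display + (border if enhanced else border // 2)
    blank_line = CYCLES_PER_LINE if enhanced else CYCLES_PER_LINE // 2
    return data_lines * data_line + (LINES_PER_FRAME - data_lines) * blank_line
```

For MODE 0, 1 and 2, calling cpu_accesses(256, 80, False) and then
cpu_accesses(256, 80, True) gives the 9728 and 19456 bytes shown in the first
row of the table.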
998
999 Unfortunately, the mechanism for accessing the RAM is too slow to provide data
1000 within the time constraints of 2MHz operation. There is no time remaining in a
1001 2MHz cycle for the CPU to receive and process any retrieved data once the
1002 necessary signalling has been performed.
1003
1004 The only way for the CPU to be able to access the RAM quickly enough would be
1005 to do away with the double 4-bit access mechanism and to have a single 8-bit
1006 channel to the memory. This would require twice as many 1-bit RAM chips or a
1007 different kind of RAM chip, but it would also potentially simplify the ULA.
1008
1009 The section on 8-bit wide RAM access discusses the possibilities around
1010 changing the memory architecture, also describing the possibility of ULA
1011 accesses achieving two bytes per 2MHz cycle due to the doubling of the memory
1012 channel, leaving every other access free for the CPU during the display period
1013 in MODE 0 to 3...
1014
1015 Standard display period: UUUUUUUU
1016 Modified display period: UCUCUCUC
1017
1018 ...and consolidating accesses in MODE 4 to 6:
1019
1020 Standard display period: UCUCUCUC
1021 Modified display period: UCCCUCCC
1022
1023 Together with the enhancements for non-display periods, such an "Enhanced+ ULA"
1024 would perform as follows:
1025
              Standard ULA  Enhanced+ ULA  % Total Bandwidth  Speedup
MODE 0, 1, 2  9728 bytes    29696 bytes    24% -> 74%         3.1
MODE 3        12288 bytes   32256 bytes    31% -> 81%         2.6
MODE 4, 5     19968 bytes   34816 bytes    50% -> 87%         1.7
MODE 6        19968 bytes   36096 bytes    50% -> 90%         1.8
1031
1032 Of course, the principal enhancement would be the wider memory channel, with
1033 more buffering in the ULA being its contribution to this arrangement.
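
The "Enhanced+" figures can be checked by counting CPU slots in the access
patterns shown above. The following Python sketch assumes, as the figures
imply, that the display part of a line lasts 80 of its 128 cycles and that
screen data is fetched on 256 lines per frame, or on 192 lines in MODE 3 and
6. The function names are hypothetical.

```python
def cpu_slots(pattern, cycles):
    """Count CPU ('C') slots when a pattern repeats over a number of cycles."""
    whole, part = divmod(cycles, len(pattern))
    return whole * pattern.count("C") + pattern[:part].count("C")

def frame_total(display_pattern, data_lines):
    """CPU RAM accesses per frame with full CPU access outside the display."""
    data_line = cpu_slots(display_pattern, 80) + cpu_slots("CCCCCCCC", 48)
    blank_line = cpu_slots("CCCCCCCC", 128)
    return data_lines * data_line + (312 - data_lines) * blank_line
```

Here, frame_total("UCUCUCUC", 256) corresponds to MODE 0, 1 and 2, and
frame_total("UCCCUCCC", 256) corresponds to MODE 4 and 5.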
1034
1035 Enhancement: Region Blanking
1036 ----------------------------
1037
1038 The problem of permitting character-oriented blitting in programs whilst
1039 scrolling the screen by sub-character amounts could be mitigated by permitting
1040 a region of the display to be blank, such as the final lines of the display.
1041 Consider the following vertical scrolling by 2 bytes that would cause an
1042 initial character row of 6 lines and a final character row of 2 lines:
1043
1044 6 lines - initial, partial character row
1045 248 lines - 31 complete rows
1046 2 lines - final, partial character row
1047
1048 If a routine were in use that wrote 8 line bitmaps to the partial character
1049 row now split in two, it would be advisable to hide one of the regions in
1050 order to prevent content appearing in the wrong place on screen (such as
1051 content meant to appear at the top "leaking" onto the bottom). Blanking 6
1052 lines would be sufficient, as can be seen from the following cases.
1053
1054 Scrolling up by 2 lines:
1055
1056 6 lines - initial, partial character row
1057 240 lines - 30 complete rows
1058 4 lines - part of 1 complete row
1059 -----------------------------------------------------------------
1060 4 lines - part of 1 complete row (hidden to maintain 250 lines)
1061 2 lines - final, partial character row (hidden)
1062
1063 Scrolling down by 2 lines:
1064
1065 2 lines - initial, partial character row
1066 248 lines - 31 complete rows
1067 ----------------------------------------------------------
1068 6 lines - final, partial character row (hidden)
1069
1070 Thus, in this case, region blanking would impose a 250 line display with the
1071 bottom 6 lines blank.
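
The two cases above can be generalised with a small Python sketch that divides
the 256 lines of screen data into character rows and then splits them at the
250 line limit. The function name and its parameters are hypothetical.

```python
def split_rows(initial, visible=250, total=256, row=8):
    """Return (shown, hidden) lists of line counts for each row fragment."""
    complete, final = divmod(total - initial, row)
    heights = [initial] if initial else []
    heights += [row] * complete
    if final:
        heights.append(final)
    shown, hidden, used = [], [], 0
    for height in heights:
        # Lines of this fragment that still fit above the visible limit.
        take = max(0, min(height, visible - used))
        if take:
            shown.append(take)
        if height - take:
            hidden.append(height - take)
        used += height
    return shown, hidden
```

With an initial partial row of 6 lines, the hidden fragments are 4 lines and 2
lines; with an initial partial row of 2 lines, the single hidden fragment is
the final partial row of 6 lines.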
1072
1073 See the description of the display suspend enhancement for a more efficient
1074 way of blanking lines than merely blanking the palette whilst allowing the CPU
1075 to perform useful work during the blanking period.
1076
To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 15 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.
1083
1084 Enhancement: Screen Height Adjustment
1085 -------------------------------------
1086
1087 The height of the screen could be configurable in order to reduce screen
1088 memory consumption. This is not quite done in MODE 3 and 6 since the start of
1089 the screen appears to be rounded down to the nearest page, but by reducing the
1090 height by amounts more than a page, savings would be possible. For example:
1091
Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
------------  -----  ------  --------------  ---------------  -------------
640           1      252     80              320              &3140 -> &3100
640           1      248     80              640              &3280 -> &3200
320           1      240     40              640              &5A80 -> &5A00
320           2      240     80              1280             &3500
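
The arithmetic behind the table can be sketched in Python as follows. The
function name is hypothetical, and the base addresses of &3000 and &5800 are
assumptions inferred from the start addresses in the table.

```python
def reduced_height(base, bytes_per_line, height, full_height=256):
    """Return (saving, exact start, start rounded down to a page)."""
    saving = (full_height - height) * bytes_per_line
    start = base + saving
    return saving, start, start & 0xFF00
```

For the first row of the table, reduced_height(0x3000, 80, 252) gives a saving
of 320 bytes and a start address of &3140, which rounds down to &3100.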
1098
1099 Screen Mode Selection
1100 ---------------------
1101
Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially being made available for use.
1106
1107 Enhancement: Palette Definition
1108 -------------------------------
1109
1110 Since all memory accesses go via the ULA, an enhanced ULA could employ more
1111 specific addresses than &FE*X to perform enhanced functions. For example, the
1112 palette control is done using &FE*8-F and merely involves selecting predefined
1113 colours, whereas an enhanced ULA could support the redefinition of all 16
1114 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
1115 (colours 8 to 15), where a single byte might provide 8 bits per pixel colour
1116 specifications similar to those used on the Archimedes.
1117
1118 The principal limitation here is actually the hardware: the Electron has only
1119 a single output line for each of the red, green and blue channels, and if
1120 those outputs are strictly digital and can only be set to a "high" and "low"
1121 value, then only the existing eight colours are possible. If a modern ULA were
1122 able to output analogue values (or values at well-defined points between the
1123 high and low values, such as the half-on value supported by the Amstrad CPC
1124 series), it would still need to be assessed whether the circuitry could
1125 successfully handle and propagate such values. Various sources indicate that
1126 only "TTL levels" are supported by the RGB output circuit, and since there are
1127 74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
1128 is likely that the ULA is expected to provide only "high" or "low" values.
1129
1130 Short of adding extra outputs from the ULA (either additional red, green and
1131 blue outputs or a combined intensity output), another approach might involve
1132 some kind of modulation where an output value might be encoded in multiple
1133 pulses at a higher frequency than the pixel frequency. However, this would
1134 demand additional circuitry outside the ULA, and component RGB monitors would
1135 probably not be able to take advantage of this feature; only UHF and composite
1136 video devices (the latter with the composite video colour support enabled on
1137 the Electron's circuit board) would potentially benefit.
1138
1139 Flashing Colours
1140 ----------------
1141
1142 According to the Advanced User Guide, "The cursor and flashing colours are
1143 entirely generated in software: This means that all of the logical to physical
1144 colour map must be changed to cause colours to flash." This appears to suggest
1145 that the palette registers must be updated upon the flash counter - read and
1146 written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
1147 colour pairs to be any combination of colours might be possible, instead of
1148 having colour complements as pairs.
1149
1150 It is conceivable that the interrupt code responsible does the simple thing
1151 and merely inverts the current values for any logical colours (LC) for which
1152 the associated physical colour (as supplied as the second parameter to the VDU
1153 19 call) has the top bit of its four bit value set. These top bits are not
1154 recorded in the palette registers but are presumably recorded separately and
1155 used to build bitmaps as follows:
1156
LC  2 colour  4 colour  16 colour  4-bit value for inversion
--  --------  --------  ---------  -------------------------
0   00010001  00010001  00010001   1, 1, 1
1   01000100  00100010  00010001   4, 2, 1
2             01000100  00100010   4, 2
3             10001000  00100010   8, 2
4                       00010001   1
5                       00010001   1
6                       00100010   2
7                       00100010   2
8                       01000100   4
9                       01000100   4
10                      10001000   8
11                      10001000   8
12                      01000100   4
13                      01000100   4
14                      10001000   8
15                      10001000   8
1175
Inversion value calculation:

2 colour formula: 1 << (colour * 2)
4 colour formula: 1 << colour
16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
1181
1182 For example, where logical colour 0 has been mapped to a physical colour in
1183 the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
1184 the inversion operation. (The lower three bits of the physical colour would be
1185 used to set the underlying colour information affected by the inversion
1186 operation.)
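
The inversion values and bitmaps tabulated above can be generated with a short
Python sketch. The function names are hypothetical, and the 16 colour
expression is written so as to reproduce the values in the table.

```python
def inversion_value(colour, colours):
    """Return the 4-bit inversion value for a logical colour."""
    if colours == 2:
        return 1 << (colour * 2)
    if colours == 4:
        return 1 << colour
    if colours == 16:
        # Bits 1 and 3 of the logical colour select one of four positions.
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("colours must be 2, 4 or 16")

def inversion_bitmap(colour, colours):
    """Return the 8-bit bitmap: the 4-bit value repeated in both half-bytes."""
    return inversion_value(colour, colours) * 0x11
```

For example, logical colour 1 in a 2 colour mode produces the value 4 and the
bitmap 01000100, matching the second row of the table.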
1187
1188 An operation in the interrupt code would then combine the bitmaps for all
1189 logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
1190 combined for groups of logical colours as follows:
1191
1192 Logical colours
1193 ---------------
1194 0, 2, 8, 10
1195 4, 6, 12, 14
1196 5, 7, 13, 15
1197 1, 3, 9, 11
1198
1199 These combined bitmaps would be EORed with the existing palette register
1200 values in order to perform the value inversion necessary to produce the
1201 flashing effect.
1202
1203 Thus, in the VDU 19 operation, the appropriate inversion value would be
1204 calculated for the logical colour, and this value would then be combined with
1205 other inversion values in a dedicated memory location corresponding to the
1206 colour's group as indicated above. Meanwhile, the palette channel values would
1207 be derived from the lower three bits of the specified physical colour and
1208 combined with other palette data in dedicated memory locations corresponding
1209 to the palette registers.
1210
1211 Interestingly, although flashing colours on the BBC Micro are controlled by
1212 toggling bit 0 of the &FE20 control register location for the Video ULA, the
1213 actual colour inversion is done in hardware.
1214
1215 Enhancement: Palette Definition Lists
1216 -------------------------------------
1217
1218 It can be useful to redefine the palette in order to change the colours
1219 available for a particular region of the screen, particularly in modes where
1220 the choice of colours is constrained, and if an increased colour depth were
1221 available, palette redefinition would be useful to give the illusion of more
1222 than 16 colours in MODE 2. Traditionally, palette redefinition has been done
1223 by using interrupt-driven timers, but a more efficient approach would involve
1224 presenting lists of palette definitions to the ULA so that it can change the
1225 palette at a particular display line.
1226
1227 One might define a palette redefinition list in a region of memory and then
1228 communicate its contents to the ULA by writing the address and length of the
1229 list, along with the display line at which the palette is to be changed, to
1230 ULA registers such that the ULA buffers the list and performs the redefinition
1231 at the appropriate time. Throughput/bandwidth considerations might impose
1232 restrictions on the practical length of such a list, however.
1233
1234 A simple form of palette definition might be useful in text modes. Within the
1235 blank region between lines, the foreground palette could be changed to apply
1236 to the next line. Palette values could be read from a table in RAM, perhaps
1237 preceding the screen data, with 24 2-byte entries providing palette
1238 redefinition support in 2- and 4-colour modes.
1239
1240 Enhancement: Display Synchronisation Interrupts
1241 -----------------------------------------------
1242
1243 When completing each scanline of the display, the ULA could trigger an
1244 interrupt. Since this might impact system performance substantially, the
1245 feature would probably need to be configurable, and it might be sufficient to
1246 have an interrupt only after a certain number of display lines instead.
1247 Permitting the CPU to take action after eight lines would allow palette
1248 switching and other effects to occur on a character row basis.
1249
1250 The ULA provides an interrupt at the end of the display period, presumably so
1251 that software can schedule updates to the screen, avoid flickering or tearing,
1252 and so on. However, some applications might benefit from an interrupt at, or
1253 just before, the start of the display period so that palette modifications or
1254 similar effects could be scheduled.
1255
1256 Enhancement: Palette-Free Modes
1257 -------------------------------
1258
1259 Palette-free modes might be defined where bit values directly correspond to
1260 the red, green and blue channels, although this would mostly make sense only
1261 for modes with depths greater than the standard 4 bits per pixel, and such
1262 modes would require more memory than MODE 2 if they were to have an acceptable
1263 resolution.
1264
1265 Enhancement: Display Suspend
1266 ----------------------------
1267
1268 Especially when writing to the screen memory, it could be beneficial to be
1269 able to suspend the ULA's access to the memory, instead producing blank values
1270 for all screen pixels until a program is ready to reveal the screen. This is
1271 different from palette blanking since with a blank palette, the ULA is still
1272 reading screen memory and translating its contents into pixel values that end
1273 up being blank.
1274
1275 This function is reminiscent of a capability of the ZX81, albeit necessary on
1276 that hardware to reduce the load on the system CPU which was responsible for
1277 producing the video output. By allowing display suspend on the Electron, the
1278 performance benefit would be derived from giving the CPU full access to the
1279 memory bandwidth.
1280
1281 Note that since the CPU is only able to access RAM at 1MHz, there is no
1282 possibility to improve performance beyond that achieved in MODE 4, 5 or 6
1283 normally. However, if faster RAM access were to be made possible (see the
1284 discussion of 8-bit wide RAM access), the CPU could benefit from freeing up
1285 the ULA's access slots entirely.
1286
1287 The region blanking feature mentioned above could be implemented using this
1288 enhancement instead of employing palette blanking for the affected lines of
1289 the display.
1290
1291 Enhancement: Memory Filling
1292 ---------------------------
1293
A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the
existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), providing a more efficient mechanism for rapidly filling memory with a
predetermined value.
1302
1303 This capability could also support block filling, where the limits of the
1304 filled memory would be defined by the position and size of a screen area,
1305 although this would demand the provision of additional registers in the ULA to
1306 retain the details of such areas and additional logic to control the fill
1307 operation.
1308
1309 Enhancement: Region Filling
1310 ---------------------------
1311
1312 An alternative to memory writing might involve indicating regions using
1313 additional registers or memory where the ULA fills regions of the screen with
1314 content instead of reading from memory. Unlike hardware sprites which should
1315 realistically provide varied content, region filling could employ single
1316 colours or patterns, and one advantage of doing so would be that the ULA need
1317 not access memory at all within a particular region.
1318
1319 Regions would be defined on a row-by-row basis. Instead of reading memory and
1320 blitting a direct representation to the screen, the ULA would read region
1321 definitions containing a start column, region width and colour details. There
1322 might be a certain number of definitions allowed per row, or the ULA might
1323 just traverse an ordered list of such definitions with each one indicating the
1324 row, start column, region width and colour details.
1325
1326 One could even compress this information further by requiring only the row,
1327 start column and colour details with each subsequent definition terminating
1328 the effect of the previous one. However, one would also need to consider the
1329 convenience of preparing such definitions and whether efficient access to
1330 definitions for a particular row might be desirable. It might also be
1331 desirable to avoid having to prepare definitions for "empty" areas of the
1332 screen, effectively making the definition of the screen contents employ
1333 run-length encoding and employ only colour plus length information.
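
The colour-plus-length style of definition mentioned above can be illustrated
with a short Python sketch. The function name, the (colour, length) tuple
format and the default width of 640 pixel values are assumptions made purely
for illustration; no concrete encoding is defined for the ULA here.

```python
def expand_runs(runs, width=640):
    """Expand (colour, length) runs into one scanline of pixel values.

    The tuple format and the default width of 640 pixel values are
    illustrative assumptions, not a defined ULA data format.
    """
    pixels = []
    for colour, length in runs:
        pixels.extend([colour] * length)
    if len(pixels) != width:
        raise ValueError("runs cover %d pixels, expected %d" % (len(pixels), width))
    return pixels


# A 200-pixel span of colour 1 centred on a background of colour 0.
scanline = expand_runs([(0, 220), (1, 200), (0, 220)])
```

Here, three definitions describe a scanline that could otherwise occupy up to
80 bytes of screen memory, which is the kind of saving that makes the approach
attractive for simple filled shapes.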
1334
1335 One application of region filling is that of simple 2D and 3D shape rendering.
1336 Although it is entirely possible to plot such shapes to the screen and have
1337 the ULA blit the memory contents to the screen, such operations consume
1338 bandwidth both in the initial plotting and in the final transfer to the
1339 screen. Region filling would reduce such bandwidth usage substantially.
1340
1341 This way of representing screen images would make certain kinds of images
1342 unfeasible to represent - consider alternating single pixel values which could
1343 easily occur in some character bitmaps - even if an internal queue of regions
1344 were to be supported such that the ULA could read ahead and buffer such
1345 "bandwidth intensive" areas. Thus, the ULA might be better served providing
1346 this feature for certain areas of the display only as some kind of special
1347 graphics window.
1348
1349 Enhancement: Hardware Sprites
1350 -----------------------------
1351
An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.
1356
1357 The specification of sprites could involve the reservation of 16 locations
1358 (for example, &FE20-F) specifying a fixed number of eight sprites, with each
1359 location pair referring to the sprite data. By limiting the ULA to dealing
1360 with a fixed number of sprites, the work required inside the ULA would be
1361 reduced since it would avoid having to deal with arbitrary numbers of sprites.
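
As a rough sketch of how such a register block might be interpreted, the
following Python function combines 16 register bytes into eight sprite data
addresses. The function name and the byte order within each pair (low byte
first, as is conventional for 6502 addresses) are assumptions; the text above
only states that each location pair refers to the sprite data.

```python
def sprite_addresses(registers):
    """Combine 16 register bytes into eight 16-bit sprite data addresses.

    Low byte first within each pair is an assumption for illustration.
    """
    if len(registers) != 16:
        raise ValueError("expected 16 register values")
    return [registers[i] | (registers[i + 1] << 8) for i in range(0, 16, 2)]
```

A fixed-size list of eight addresses reflects the point made above: the ULA
never has to discover how many sprites exist.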
1362
1363 The principal limitation on providing hardware sprites is that of having to
1364 obtain sprite data, given that the ULA is usually required to retrieve screen
1365 data, and given the lack of memory bandwidth available to retrieve sprite data
1366 (particularly from multiple sprites supposedly at the same position) and
1367 screen data simultaneously. Although the ULA could potentially read sprite
1368 data and screen data in alternate memory accesses in screen modes where the
1369 bandwidth is not already fully utilised, this would result in a degradation of
1370 performance.
1371
1372 Enhancement: Additional Screen Mode Configurations
1373 --------------------------------------------------
1374
1375 Alternative screen mode configurations could be supported. The ULA has to
1376 produce 640 pixel values across the screen, with pixel doubling or quadrupling
1377 employed to fill the screen width:
1378
Screen width  Columns  Scaling  Depth  Bytes
------------  -------  -------  -----  ------
640           80       x1       1      80
320           40       x2       1, 2   40, 80
160           20       x4       2, 4   40, 80
1384
1385 It must also use at most 80 byte-sized memory accesses to provide the
1386 information for the display. Given that characters must occupy an 8x8 pixel
1387 array, if a configuration featuring anything other than 20, 40 or 80 character
1388 columns is to be supported, compromises must be made such as the introduction
1389 of blank pixels either between characters (such as occurs between rows in MODE
1390 3 and 6) or at the end of a scanline (such as occurs at the end of the frame
1391 in MODE 3 and 6). Consider the following configuration:
1392
Screen width  Columns  Scaling  Depth  Bytes   Blank
------------  -------  -------  -----  ------  -----
208           26       x3       1, 2   26, 52  16
1396
1397 Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
1398 colours could be provided, with 16 blank pixel values (out of a total of 640)
1399 generated either at the start or end (or split between the start and end) of
1400 each scanline.
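
The arithmetic behind the blank pixel count can be sketched as follows. The
function name and parameter names are hypothetical; the figures of 640 pixel
values per scanline and 8 pixels per character come from the text above.

```python
def blank_pixels(columns, scaling, total=640, char_width=8):
    """Return the number of unused pixel values on a scanline."""
    used = columns * char_width * scaling
    if used > total:
        raise ValueError("configuration exceeds the available pixel values")
    return total - used
```

For the 26 column configuration with tripled pixels, `blank_pixels(26, 3)`
gives 16, whereas the standard 20, 40 and 80 column configurations leave no
blank pixels at all.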
1401
1402 Enhancement: Character Attributes
1403 ---------------------------------
1404
1405 The BBC Micro MODE 7 employs something resembling character attributes to
1406 support teletext displays, but depends on circuitry providing a character
1407 generator. The ZX Spectrum, on the other hand, provides character attributes
1408 as a means of colouring bitmapped graphics. Although such a feature is very
1409 limiting as the sole means of providing multicolour graphics, in situations
1410 where the choice is between low resolution multicolour graphics or high
1411 resolution monochrome graphics, character attributes provide a potentially
1412 useful compromise.
1413
For each byte read, the ULA must deliver 8 pixel values (out of a total of
640) to the video output, doing so by either emptying its pixel buffer on a
pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:
1418
Cycle:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Reads:  B                       B
Pixels: 0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
1422
1423 And for a screen mode having 320 pixels in width:
1424
Cycle:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Reads:  B
Pixels: 0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7
1428
However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:
1432
Cycle:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Reads:  B                       A
Pixels: 0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7
1436
1437 These additional bytes could provide colour information for the bitmapped data
1438 in the following character column (of 8 pixels). Since it would be desirable
1439 to apply attribute data to the first column, the initial 8 cycles might be
1440 configured to not produce pixel values.
1441
1442 For an entire character, attribute data need only be read for the first row of
1443 pixels for a character. The subsequent rows would have attribute information
1444 applied to them, although this would require the attribute data to be stored
1445 in some kind of buffer. Thus, the following access pattern would be observed:
1446
1447 Reads: A B _ B _ B _ B _ B _ B _ B _ B ...
1448
1449 In modes 3 and 6, the blank display lines could be used to retrieve attribute
1450 data:
1451
1452 Reads (blank): A _ A _ A _ A _ A _ A _ A _ A _ ...
1453 Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
1454 Reads (active): B _ B _ B _ B _ B _ B _ B _ B _ ...
1455 ...
1456
1457 See below for a discussion of using this for character data as well.
1458
1459 A whole byte used for colour information for a whole character would result in
1460 a choice of 256 colours, and this might be somewhat excessive. By only reading
1461 attribute bytes at every other opportunity, a choice of 16 colours could be
1462 applied individually to two characters.
1463
Cycle:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Reads:  B               A               B               -
Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
1467
1468 Further reductions in attribute data access, offering 4 colours for every
1469 character in a four character block, for example, might also be worth
1470 considering.
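
Sharing one attribute byte between several characters amounts to splitting it
into equal bit fields. The following Python sketch shows one way this might
work; the function name is hypothetical, and placing the leftmost character in
the most significant bits is an assumption rather than anything defined for
the ULA.

```python
def attribute_field(attribute, chars_per_byte, index):
    """Extract the colour field for one character from a shared attribute byte.

    chars_per_byte: 1 (256 colours), 2 (16 colours), 4 (4 colours) or
    8 (2 colours).
    index: position of the character within its group, 0 being leftmost.
    The leftmost character occupying the most significant bits is an
    assumption made for illustration.
    """
    if chars_per_byte not in (1, 2, 4, 8):
        raise ValueError("unsupported number of characters per byte")
    if not 0 <= index < chars_per_byte:
        raise ValueError("index outside the character group")
    bits = 8 // chars_per_byte
    shift = 8 - bits * (index + 1)
    return (attribute >> shift) & ((1 << bits) - 1)
```

With two characters per byte, an attribute value of &A5 would yield colour 10
for the first character and colour 5 for the second.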
1471
1472 Consider the following configurations for screen modes with a colour depth of
1473 1 bit per pixel for bitmap information:
1474
Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  -------  ---------  ---------  -------  --------------
320           40       x2       40         40         256      &5300
320           40       x2       40         20         16       &5580 -> &5500
320           40       x2       40         10         4        &56C0 -> &5600
208           26       x3       26         26         256      &62C0 -> &6200
208           26       x3       26         13         16       &6460 -> &6400
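
The screen start values in the table above can be reproduced with a short
calculation. This sketch assumes 32 character rows of 8 scanlines each (256
scanlines, as in MODE 4), screen memory ending at &8000, and that the arrows
indicate rounding down to a 256-byte page boundary; these assumptions are
inferred from the figures rather than stated in the text, and the function
name is hypothetical.

```python
def screen_start(bitmap_bytes, attribute_bytes, char_rows=32, top=0x8000):
    """Return (exact, page_aligned) start addresses for a bitmap mode
    with attribute data.

    bitmap_bytes: bitmap bytes read per scanline (8 scanlines per row).
    attribute_bytes: attribute bytes stored per character row.
    """
    size = char_rows * (8 * bitmap_bytes + attribute_bytes)
    exact = top - size
    return exact, exact & 0xFF00
```

For example, `screen_start(40, 20)` gives (&5580, &5500), matching the second
row of the table.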
1482
1483 Enhancement: Text-Only Modes using Character and Attribute Data
1484 ---------------------------------------------------------------
1485
1486 In modes 3 and 6, the blank display lines could be used to retrieve character
1487 and attribute data instead of trying to insert it between bitmap data accesses,
1488 but this data would then need to be retained:
1489
1490 Reads: A C A C A C A C A C A C A C A C ...
1491 Reads: B _ B _ B _ B _ B _ B _ B _ B _ ...
1492
1493 Only attribute (A) and character (C) reads would require screen memory
1494 storage. Bitmap data reads (B) would involve either accesses to memory to
1495 obtain character definition details or could, at the cost of special storage
1496 in the ULA, involve accesses within the ULA that would then free up the RAM.
1497 However, the CPU would not benefit from having any extra access slots due to
1498 the limitations of the RAM access mechanism.
1499
1500 A scheme without caching might be possible. The same line of memory addresses
1501 might be visited over and over again for eight display lines, with an index
1502 into the bitmap data being incremented from zero to seven. The access patterns
1503 would look like this:
1504
1505 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 0)
1506 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 1)
1507 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 2)
1508 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 3)
1509 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 4)
1510 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 5)
1511 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 6)
1512 Reads: C B C B C B C B C B C B C B C B ... (generate data from index 7)
1513
1514 The bandwidth requirements would be the sum of the accesses to read the
1515 character values (repeatedly) and those to read the bitmap data to reproduce
1516 the characters on screen.
1517
1518 Enhancement: 40-Column Text Modes by Interleaving Screen and Bitmap Accesses
1519 ----------------------------------------------------------------------------
1520
This is a simplified form of the above interleaved character/bitmap reading
scheme. It was also suggested in a discussion here:
1523
1524 https://stardot.org.uk/forums/viewtopic.php?p=393243#p393243
1525
1526 The ULA could be run in high-bandwidth mode to fetch character codes from
1527 screen memory in one cycle and then to use the character code to look up a
1528 pixel row of a character bitmap, reading that bitmap slice in the following
1529 cycle. The bitmap would be converted to pixel values that would then be
1530 emitted over the subsequent two cycles concurrently with the preparation of
1531 the next character's pixels.
1532
2MHz cycle: 0 1 2 3 4 5 ...
Reads:      C B C B C B ...
Pixels:         a   b   ...
1536
1537 The memory access to bitmap data would be computed as follows, assuming the
1538 normal eight pixel height and single-byte encoding of character bitmaps:
1539
1540 bitmap address = bitmap table base + (character code * 8) + index
1541
1542 Each successive pixel row on the screen would expose the appropriate row in
1543 the character bitmap, with this index looping from 0 to 7 repeatedly as shown
1544 previously. Spacing between character lines could be introduced as already
1545 done in MODE 6.
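
The address computation and the alternating reads can be sketched in Python as
follows. The function names are hypothetical and the memory is modelled as a
simple byte array; the formula itself is the one given above.

```python
def bitmap_address(table_base, code, index):
    """Address of one pixel row (index 0 to 7) of a character's bitmap."""
    if not 0 <= code <= 255 or not 0 <= index <= 7:
        raise ValueError("code or index out of range")
    return table_base + code * 8 + index


def render_scanline(memory, row_address, table_base, index, columns=40):
    """Return the bitmap bytes for one scanline of a text row.

    Each character costs two reads: the character code (C) and then the
    corresponding bitmap slice (B).
    """
    slices = []
    for column in range(columns):
        code = memory[row_address + column]                              # C read
        slices.append(memory[bitmap_address(table_base, code, index)])   # B read
    return slices
```

Calling `render_scanline` eight times for the same `row_address`, with `index`
running from 0 to 7, corresponds to the eight display lines of one character
row.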
1546
1547 Character bitmap data would be stored in RAM, since this is the only possible
1548 source of data for the ULA as delivered. The use of ROM would require changes
1549 to the broader system architecture. Thus, the total memory requirements of
1550 such a mode would be the locations for character positions plus the storage
1551 requirements of the bitmaps to be supported.
1552
Columns  Rows  Screen size  Bitmaps  Bitmaps size  Total size
-------  ----  -----------  -------  ------------  ----------
40       25    1000         256      2048          3048
40       25    1000         128      1024          2024
40       25    1000         96       768           1768
40       32    1280         256      2048          3328
40       32    1280         128      1024          2304
40       32    1280         96       768           2048
1561
1562 The simplest arrangement would involve bitmap definitions for all 256 possible
1563 character codes, demanding a total of around 3K of RAM. Reducing the number of
1564 supported bitmaps to 96 (codes 32 to 127 inclusive) would bring this total to
1565 a maximum of 2K, but this would incur additional complexity in the ULA itself
1566 if the codes not corresponding to bitmaps were to be specially mapped to, say,
1567 the bitmap for the space character or to a null character.
1568
1569 With the screen start address controllable, it is conceivable that with a
1570 256-entry bitmap table, the screen memory could be made to overlap the bitmap
1571 table for bitmaps not likely to be used. For example, the bitmap table might
1572 be situated at &7700, with this leaving enough space for 128 entries (&400 or
1573 1024 bytes) and a 40x32 text screen (&500 or 1280 bytes):
1574
&8000 +---------------+---------------+
&7F00 +---------------+               |
      |               | Display       |
      | Bitmaps (128) | (40x32)       |
      |               |               |
&7B00 +---------------+---------------+
      |               |
      | Bitmaps (128) |
      |               |
&7700 +---------------+
1585
1586 Care would then need to be taken to avoid the use of codes from 128 to 255 in
1587 the screen memory as these would replicate character data as bitmap data.
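
Which character codes are affected by such an overlap can be checked with a
small calculation. The function below is a hypothetical helper, with the
display end address treated as exclusive; the addresses used in the example
are those from the layout above.

```python
def overlapping_codes(table_base, display_start, display_end, entries=256):
    """Return the character codes whose 8-byte bitmaps overlap display memory.

    display_end is exclusive, so &8000 denotes a display ending at &7FFF.
    """
    codes = []
    for code in range(entries):
        first = table_base + code * 8
        last = first + 7
        if first < display_end and last >= display_start:
            codes.append(code)
    return codes
```

With the table at &7700 and the display occupying &7B00 to &7FFF, the result
is exactly the codes 128 to 255, confirming that the lower 128 bitmaps remain
untouched by screen updates.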
1588
1589 Enhancement: MODE 7 Emulation using Character Attributes
1590 --------------------------------------------------------
1591
1592 If the scheme of applying attributes to character regions were employed to
1593 emulate MODE 7, in conjunction with the MODE 6 display technique, the
1594 following configuration would be required:
1595
Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  ----  ---------  ---------  -------  --------------
320           40       25    40         20         16       &5ECC -> &5E00
320           40       25    40         10         4        &5FC6 -> &5F00
1600
1601 Although this requires much more memory than MODE 7 (8500 bytes versus MODE
1602 7's 1000 bytes), it does not need much more memory than MODE 6, and it would
1603 at least make a limited 40-column multicolour mode available as a substitute
1604 for MODE 7.
1605
1606 Using the text-only enhancement with caching of data or with repeated reads of
1607 the same character data line for eight display lines, the storage requirements
1608 would be diminished substantially:
1609
Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
------------  -------  ----  ---------  ---------  -------  --------------
320           40       25    40         20         16       &7A24 -> &7A00
320           40       25    40         10         4        &7B1E -> &7B00
320           40       25    40         5          2        &7B9B -> &7B00
320           40       25    40         0          (2)      &7C18 -> &7C00
640           80       25    80         40         16       &7448 -> &7400
640           80       25    80         20         4        &763C -> &7600
640           80       25    80         10         2        &7736 -> &7700
640           80       25    80         0          (2)      &7830 -> &7800
1620
1621 Note that the colours describe the locally defined attributes for each
1622 character. When no attribute information is provided, the colours are defined
1623 globally.
1624
1625 Enhancement: Character Generator Support and Vertical Scaling
1626 -------------------------------------------------------------
1627
1628 When generating a picture, the ULA traverses screen memory, obtaining 40 or 80
1629 bytes of pixel data for each scanline. It then proceeds to the next row of
1630 pixel data for each successive scanline, with the exception of the text modes
1631 where scanlines may be blank (for which the row address does not advance).
1632 This arrangement provides a conventional bitmapped graphics display.
1633
1634 However, the ULA could instead facilitate the use of character generators. The
1635 principles involved can be demonstrated by the Jafa Mode 7 Mark 2 Display Unit
1636 expansion for the Electron which feeds the pixel data from a MODE 4 screen to
1637 a SAA5050 character generator to create a MODE 7 display. The solution adopted
1638 involves the replication of 40 bytes of character data across as many pixel
1639 rows as is necessary for the character generator to receive the appropriate
1640 character data for all scanlines in any given character row. If only a single
1641 40-byte row of character data were to be present for the first scanline of a
1642 character row, the character generator would only produce the first scanline
1643 (or the uppermost pixels of the characters) correctly, with the rest of the
1644 character shapes being ill-defined.
1645
1646 Here, the ULA could facilitate the use of memory-efficient character mode
1647 representations (such as MODE 7) by holding the row address for a number of
1648 scanlines, thus providing the same row of screen data for those scanlines,
1649 then advancing to the next row. Visualised in terms of pixel data, it would be
1650 like providing a display with a very low vertical resolution. Indeed, being
1651 able to reduce the vertical resolution of a display mode by a factor of eight
1652 or ten would be equivalent to the above character generation technique in
1653 terms of the ULA's screen reading activities.
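
The effect of holding the row address can be expressed as a simple integer
division. The following sketch assumes a linear layout of 40 bytes per
character row and a scaling factor of ten scanlines per row; both are
illustrative defaults, and the function name is hypothetical.

```python
def row_address(scanline, screen_start, bytes_per_row=40, scale=10):
    """Address of the data row supplied for a given display scanline.

    With scale=1 every scanline receives its own row of data; with a
    larger scale, the same row is supplied for that many scanlines.
    A linear row layout is assumed.
    """
    return screen_start + (scanline // scale) * bytes_per_row
```

Scanlines 0 to 9 would all be served from the same 40 bytes, with scanline 10
being the first to use the next row.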
1654
1655 By combining this vertical scaling or scanline replication with a circuit
1656 switchable between bitmapped graphics output and character graphics output,
1657 MODE 7 support could be made available, potentially as a hardware option
1658 separate from the ULA.
1659
1660 Enhancement: Compressed Character Data
1661 --------------------------------------
1662
1663 Another observation about text-only modes is that they only need to store a
1664 restricted set of bitmapped data values. Encoding this set of values in a
1665 smaller unit of storage than a byte could possibly help to reduce the amount
1666 of storage and bandwidth required to reproduce the characters on the display.
1667
1668 Enhancement: High Resolution Graphics and Larger Colour Depths
1669 --------------------------------------------------------------
1670
1671 Screen modes with higher resolutions and larger colour depths might be
1672 possible, but this would in most cases involve the allocation of more screen
1673 memory, and the ULA would probably then be obliged to page in such memory for
1674 the CPU to be able to sensibly access it all. Higher resolutions would also
1675 involve a faster pixel clock.
1676
1677 However, we may consider a doubled colour depth and the need for higher
1678 bandwidth transfers by a ULA having an 8-bit data bus to access the RAM,
1679 utilising two "page mode" transfers per 2MHz cycle. If such transfers were to
1680 access consecutive bytes in the same memory region (for example, bytes &3000
1681 and &3001) this would require a change to the arrangement of screen memory,
1682 also incurring changes to the memory map for larger modes:
1683
(&3000 &3001) (&3010 &3011) ...
(&3002 &3003) (&3012 &3013)
     ...           ...
(&300E &300F) (&301E &301F)
1688
If such transfers were to access two adjacent columns of bytes (for example,
bytes &3000 and &3008), this would still require a change in the step size
across the screen memory and would also incur memory map changes for larger
modes, while the method for programs to update the screen would be more
complicated:
1693
(&3000 &3008) (&3010 &3018) ...
(&3001 &3009) (&3011 &3019)
     ...           ...
(&3007 &300F) (&3017 &301F)
1698
1699 However, such transfers could instead map the device address bit that is
1700 toggled between transfers to the most significant system memory address bit.
1701 Thus, bits in adjacent locations within each RAM device would actually reside
1702 in different memory regions:
1703
(&3000 &B000) (&3008 &B008) ...
(&3001 &B001) (&3009 &B009)
     ...           ...
(&3007 &B007) (&300F &B00F)
1708
Since &B000 is simply &3000 combined with &8000, the latter being the asserted
uppermost address bit, address &B000 can be considered as &3000 in an upper
memory bank.
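
The pairing of addresses in this arrangement is a matter of setting the
uppermost address bit, as the following sketch shows. The names used are
hypothetical.

```python
UPPER_BANK = 0x8000  # the most significant system memory address bit


def page_mode_pair(address):
    """Return the two system addresses fetched by one paired transfer."""
    if address & UPPER_BANK:
        raise ValueError("expected an address in the lower bank")
    return address, address | UPPER_BANK
```

Thus `page_mode_pair(0x3000)` gives &3000 and &B000, corresponding to the
first pair in the layout above.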
1712
1713 Other mechanisms might be employed to allow programs to access the uppermost
1714 bank, but the ULA would be able to access it trivially and unconditionally.
1715
1716 Enhancement: Assembling a Display from Separate Display Planes
1717 --------------------------------------------------------------
1718
1719 Continuing from the use of separate memory regions for higher bandwidth modes,
1720 one can consider a memory layout where modes 1 and 2 would employ two regions
1721 that individually resemble modes 4 and 5 respectively. Programs would be able
1722 to populate two copies of the screen memory for a low-bandwidth mode in order
1723 to produce a single screen memory region for the corresponding high-bandwidth
1724 mode. This would allow a seamless transition between displays with different
1725 numbers of colours without needing to redraw the display.
1726
1727 Enhancement: Genlock Support
1728 ----------------------------
1729
1730 The ULA generates a video signal in conjunction with circuitry producing the
1731 output features necessary for the correct display of the screen image.
1732 However, it appears that the ULA drives the video synchronisation mechanism
1733 instead of reacting to an existing signal. Genlock support might be possible
1734 if the ULA were made to be responsive to such external signals, resetting its
1735 address generators upon receiving synchronisation events.
1736
1737 Enhancement: Improved Sound
1738 ---------------------------
1739
The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro employs &FE40-&FE4F (the system VIA) for
sound control, and an enhanced ULA could adopt this interface.
1745
1746 The BBC Micro uses the SN76489 chip to produce sound, and the entire
1747 functionality of this chip could be emulated for enhanced sound, with a subset
1748 of the functionality exposed via the &FE*6 interface.
1749
1750 See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
1751 See: http://www.smspower.org/Development/SN76489
1752
1753 Enhancement: Waveform Upload
1754 ----------------------------
1755
1756 As with a hardware sprite function, waveforms could be uploaded or referenced
1757 using locations as registers referencing memory regions.
1758
1759 Enhancement: Sound Input/Output
1760 -------------------------------
1761
1762 Since the ULA already controls audio input/output for cassette-based data, it
1763 would have been interesting to entertain the idea of sampling and output of
1764 sounds through the cassette interface. However, a significant amount of
1765 circuitry is employed to process the input signal for use by the ULA and to
1766 process the output signal for recording.
1767
1768 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11
1769
1770 Enhancement: BBC ULA Compatibility
1771 ----------------------------------
1772
1773 Although some new ULA functions could be defined in a way that is also
1774 compatible with the BBC Micro, the BBC ULA is itself incompatible with the
1775 Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
1776 map, but controls various functions specific to the 6845 video controller;
1777 &FE08-F is reserved for the serial controller. It therefore becomes possible
1778 to disregard compatibility where compatibility is already disregarded for a
1779 particular area of functionality.
1780
&FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE08-F on the
Electron) and other system-specific functions. Since the location usage is
generally incompatible, this region could be reused for other purposes.
1785
1786 Enhancement: Increased RAM, ULA and CPU Performance
1787 ---------------------------------------------------
1788
1789 More modern implementations of the hardware might feature faster RAM coupled
1790 with an increased ULA clock frequency in order to increase the bandwidth
1791 available to the ULA and to the CPU in situations where the ULA is not needed
1792 to perform work. A ULA employing a 32MHz clock would be able to complete the
1793 retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
1794 to access the RAM for the following 250ns even in display modes requiring the
1795 retrieval of a byte for the display every 500ns. The CPU could, subject to
1796 timing issues, run at 2MHz even in MODE 0, 1 and 2.
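
The timing figures above follow from simple arithmetic, sketched below. The
figure of 8 clock ticks per byte is an assumption derived from the standard
ULA transferring one byte per 2MHz cycle using its 16MHz clock; a faster
implementation might use a different number of ticks. The function names are
hypothetical.

```python
def byte_fetch_ns(clock_hz, ticks_per_byte=8):
    """Time in nanoseconds to fetch one byte, assuming a fixed tick count."""
    return ticks_per_byte * 1_000_000_000 // clock_hz


def cpu_window_ns(clock_hz, slot_ns=500):
    """Time left for the CPU in each 500ns slot after the ULA's fetch."""
    return max(0, slot_ns - byte_fetch_ns(clock_hz))
```

At 16MHz the fetch occupies the whole 500ns slot, leaving nothing for the CPU,
whereas at 32MHz the fetch takes 250ns and leaves a 250ns window.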
1797
1798 A scheme such as that described above would have a similar effect to the
1799 scheme employed in the BBC Micro, although the latter made use of RAM with a
1800 wider bandwidth in order to complete memory transfers within 250ns and thus
1801 permit the CPU to run continuously at 2MHz.
1802
1803 Higher bandwidth could potentially be used to implement exotic features such
1804 as RAM-resident hardware sprites or indeed any feature demanding RAM access
1805 concurrent with the production of the display image.
1806
1807 Enhancement: Multiple CPU Stacks and Zero Pages
1808 -----------------------------------------------
1809
1810 The 6502 maintains a stack for subroutine calls and register storage in page
1811 &01. Although the stack register can be manipulated using the TSX and TXS
1812 instructions, thereby permitting the maintenance of multiple stack regions and
1813 thus the potential coexistence of multiple programs each using a separate
1814 region, only programs that make little use of the stack (perhaps avoiding
1815 deeply-nested subroutine invocations and significant register storage) would
1816 be able to coexist without overwriting each other's stacks.
1817
1818 One way that this issue could be alleviated would involve the provision of a
1819 facility to redirect accesses to page &01 to other areas of memory. The ULA
1820 would provide a register that defines a physical page for the use of the CPU's
1821 "logical" page &01, and upon any access to page &01 by the CPU, the ULA would
1822 change the asserted address lines to redirect the access to the appropriate
1823 physical region.
1824
1825 By providing an 8-bit register, mapping to the most significant byte (MSB) of
1826 a 16-bit address, the ULA could then replace any MSB equal to &01 with the
1827 register value before the access is made. Where multiple programs coexist,
1828 upon switching programs, the register would be updated to point the ULA to the
1829 appropriate stack location, thus providing a simple memory management unit
1830 (MMU) capability.
1831
1832 In a similar fashion, zero page accesses could also be redirected so that code
1833 could run from sideways RAM and have zero page operations redirected to "upper
1834 memory" - for example, to page &BE (with stack accesses redirected to page
1835 &BF, perhaps) - thereby permitting most CPU operations to occur without
1836 inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
1837 CPU as it contends with the ULA for memory access.
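
The redirection of stack and zero page accesses can be sketched as a simple
address translation. The function and parameter names are hypothetical, with
the two parameters standing in for the 8-bit registers described above.

```python
def translate(address, stack_page=0x01, zero_page=0x00):
    """Redirect page &01 and page &00 accesses to configured physical pages.

    The default register values leave all addresses unchanged.
    """
    page, offset = address >> 8, address & 0xFF
    if page == 0x01:
        page = stack_page
    elif page == 0x00:
        page = zero_page
    return (page << 8) | offset
```

With the registers set to &BF and &BE respectively, a stack access to &01FF
would be redirected to &BFFF and a zero page access to &0070 would be
redirected to &BE70, while all other addresses would pass through unchanged.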
1838
1839 Such facilities could also be provided by a separate circuit between the CPU
1840 and ULA in a fashion similar to that employed by a "turbo" board, but unlike
1841 such boards, no additional RAM would be provided: all memory accesses would
1842 occur as normal through the ULA, albeit redirected when configured
1843 appropriately.
1844
1845 ULA Pin Functions
1846 -----------------
1847
1848 The functions of the ULA pins are described in the Electron Service Manual. Of
1849 interest to video processing are the following:
1850
1851 CSYNC (low during horizontal or vertical synchronisation periods, high
1852 otherwise)
1853
1854 HS (low during horizontal synchronisation periods, high otherwise)
1855
1856 RED, GREEN, BLUE (pixel colour outputs)
1857
1858 CLOCK IN (a 16MHz clock input, 4V peak to peak)
1859
PHI OUT (a clock signal for the CPU, running at 1MHz or 2MHz, or stopped)
1861
1862 More general memory access pins:
1863
1864 RAM0...RAM3 (data lines to/from the RAM)
1865
1866 RA0...RA7 (address lines for sending both row and column addresses to the RAM)
1867
1868 RAS (row address strobe setting the row address on a negative edge - see the
1869 timing notes)
1870
1871 CAS (column address strobe setting the column address on a negative edge -
1872 see the timing notes)
1873
1874 WE (sets write enable with logic 0, read with logic 1)
1875
1876 ROM (select data access from ROM)
1877
1878 CPU-oriented memory access pins:
1879
1880 A0...A15 (CPU address lines)
1881
1882 PD0...PD7 (CPU data lines)
1883
1884 R/W (indicates CPU write with logic 0, CPU read with logic 1)
1885
1886 Interrupt-related pins:
1887
1888 NMI (CPU request for uninterrupted 1MHz access to memory)
1889
1890 IRQ (signal event to CPU)
1891
1892 POR (power-on reset, resetting the ULA on a positive edge and asserting the
1893 CPU's RST pin)
1894
1895 RST (master reset for the CPU signalled on power-up and by the Break key)
1896
1897 Keyboard-related pins:
1898
1899 KBD0...KBD3 (keyboard inputs)
1900
1901 CAPS LOCK (control status LED)
1902
1903 Sound-related pins:
1904
1905 SOUND O/P (sound output using internal oscillator)
1906
1907 Cassette-related pins:
1908
CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)
1910
1911 CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)
1912
1913 CAS RC (detect high tone)
1914
1915 CAS MO (motor relay output)
1916
1917 ÷13 IN (~1200 baud clock input)
1918
1919 ULA Socket
1920 ----------
1921
1922 The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.
1923
1924 References
1925 ----------
1926
1927 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm
1928
1929 About this Document
1930 -------------------
1931
1932 The most recent version of this document and accompanying distribution should
1933 be available from the following location:
1934
1935 http://hgweb.boddie.org.uk/ULA
1936
1937 Copyright and licence information can be found in the docs directory of this
1938 distribution - see docs/COPYING.txt for more information.