1 The Acorn Electron ULA
2 ======================
3
4 Principal Design and Feature Constraints
5 ----------------------------------------
6
7 The features of the ULA are limited by the amount of time and resources that
8 can be allocated to each activity necessary to support such features given the
9 fundamental obligations of the unit. Maintaining a screen display based on the
10 contents of RAM itself requires the ULA to have exclusive access to such
11 hardware resources for a significant period of time. Whilst other elements of
12 the ULA can in principle run in parallel with this activity, they cannot also
13 access the RAM. Consequently, other features that might use the RAM must
14 accept a reduced allocation of that resource in comparison to a hypothetical
15 architecture where concurrent RAM access is possible.
16
17 Thus, the principal constraint for many features is bandwidth. The duration of
18 access to hardware resources is one aspect of this; the rate at which such
19 resources can be accessed is another. For example, the RAM is not fast enough
20 to support access more frequently than one byte per 2MHz cycle, and for screen
21 modes involving 80 bytes of screen data per scanline, there are no free cycles
22 for anything other than the production of pixel output during the active
23 scanline periods.
24
25 Timing
26 ------
27
According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
of which are used to generate pixel data. At 50Hz, this means that 128 2MHz
cycles are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000
cycles / 312 ~= 128 cycles). This is consistent with the observation that each
scanline requires at most 80 bytes of data, and that the ULA is apparently
busy for 40 out of 64 microseconds in each scanline.
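
The scanline arithmetic above can be checked with a minimal sketch. The
constant and function names are illustrative only and do not come from any
Acorn documentation.

```python
# Figures taken from the text: a 2MHz RAM access clock, 50 fields per second
# and 312 scanlines per field.
CLOCK_HZ = 2_000_000
FIELD_RATE_HZ = 50
SCANLINES_PER_FIELD = 312


def cycles_per_scanline():
    """Return the (fractional) number of 2MHz cycles in one scanline."""
    cycles_per_field = CLOCK_HZ / FIELD_RATE_HZ
    return cycles_per_field / SCANLINES_PER_FIELD


def duration_us(cycles):
    """Return the duration in microseconds of a number of 2MHz cycles."""
    return cycles * 1_000_000 / CLOCK_HZ


if __name__ == "__main__":
    print(round(cycles_per_scanline()))  # cycles per scanline, rounded
    print(duration_us(128))              # length of a 128-cycle scanline
    print(duration_us(80))               # time spent fetching 80 bytes
```

Rounding to 128 cycles gives a 64 microsecond scanline, of which 80 byte
fetches occupy 40 microseconds.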
34
(Since the ULA is seeking to provide an image for an interlaced 625-line
display, there are in fact two "fields" involved, one providing 312 scanlines
and one providing 313 scanlines. See below for a description of the video
system.)
39
40 Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
41 each providing two bits of each byte) using two cycles within the 500ns period
42 of the 2MHz clock to complete each access operation. Since the CPU and ULA
43 have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
44 effectively run at 1MHz (since every other 500ns period involves the ULA
45 accessing RAM). The CPU is driven by an external clock (IC8) whose 16MHz
46 frequency is divided by the ULA (IC1) depending on the screen mode in use.
47
Each 16MHz cycle lasts 62.5ns. To access the memory, the following patterns
corresponding to 16MHz cycles are required:
50
51 Time (ns): 0-------------- 500------------- ...
52 2 MHz cycle: 0 1 ...
53 16 MHz cycle: 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 ...
54 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
55 ~RAS: /---\___________/---\___________ ...
56 ~CAS: /-----\___/-\___/-----\___/-\___ ...
57 Address events: A B C A B C ...
58 Data events: F S F S ...
59
60 ~RAS ops: 1 0 1 0 ...
61 ~CAS ops: 1 0 1 0 1 0 1 0 ...
62
63 Address ops: a b c a b c ...
64 Data ops: s f s f ...
65
66 ~WE: ......W ...
67 PHI OUT: \_______________/--------------- ...
68 CPU (RAM): L D ...
69 RnW: R ...
70
71 PHI OUT: \_______/-------\_______/------- ...
72 CPU (ROM): L D L D ...
73 RnW: R R ...
74
75 ~RAS must be high for 100ns, ~CAS must be high for 50ns.
76 ~RAS must be low for 150ns, ~CAS must be low for 90ns.
77 Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.
78
79 Here, "A" and "B" respectively indicate the row and first column addresses
80 being latched into the RAM (on a negative edge for ~RAS and ~CAS
81 respectively), and "C" indicates the second column address being latched into
82 the RAM. Presumably, the first and second half-bytes can be read at "F" and
83 "S" respectively, and the row and column addresses must be made available at
84 "a" and "b" (and "c") respectively at the latest. Data can be read at "f" and
85 "s" for the first and second half-bytes respectively.
86
87 For the CPU, "L" indicates the point at which an address is taken from the CPU
88 address bus, on a negative edge of PHI OUT, with "D" being the point at which
89 data may either be read or be asserted for writing, on a positive edge of PHI
OUT. Here, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven low
for writing or high for reading, and thus propagates RnW from the CPU, this
would need to be done before data is retrieved although, according to the
TM4164EC4 datasheet, it may be done as late as the point at which the column
address is presented and ~CAS is brought low.
95
The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that ~RAS must be
held low for at least 150ns and that ~CAS must be held low for at least 90ns
before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
cycles. Rounding these up to the next half cycle, "A" to "F" is 2.5 cycles,
"B" to "F" is 1.5 cycles, and "C" to "S" is 1.5 cycles.
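
As a rough check of these figures, the following sketch converts the
datasheet access times into 16MHz cycles. Rounding up to the next half cycle
is an assumption made here to match the 2.5 and 1.5 cycle figures; the
function names are illustrative.

```python
import math

CYCLE_NS = 62.5  # duration of one 16MHz cycle


def to_cycles(time_ns):
    """Convert a time in nanoseconds to (fractional) 16MHz cycles."""
    return time_ns / CYCLE_NS


def to_half_cycles_rounded_up(time_ns):
    """Round a time up to the next half cycle (assumed: events may occur on
    either clock edge), returning the result in cycles."""
    return math.ceil(time_ns / (CYCLE_NS / 2)) / 2


if __name__ == "__main__":
    for label, time_ns in (("row access", 150), ("column access", 90)):
        print(label, to_cycles(time_ns), to_half_cycles_rounded_up(time_ns))
```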
102
103 Note that the Service Manual refers to the negative edge of RAS and CAS, but
104 the datasheet for the similar TM4164EC4 product shows latching on the negative
105 edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
106 communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
107 "page mode" provides the appropriate behaviour for that particular product.
108
109 The CPU, when accessing the RAM alone, apparently does not make use of the
110 vacated "slot" that the ULA would otherwise use (when interleaving accesses in
111 MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
112 accessing ROM (and potentially sideways RAM). The principal limitation is the
113 amount of time needed between issuing an address and receiving an entire byte
114 from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
115 4 cycles that would be required for 2MHz operation.
116
117 See: Acorn Electron Advanced User Guide
118 See: Acorn Electron Service Manual
119 http://acorn.chriswhy.co.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
120 See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
121 See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438
122
123 CPU Clock Notes
124 ---------------
125
126 "The 6502 has a synchronous memory bus where the master clock is divided into
127 two phases (Phase 1 and Phase 2). The address is always generated during Phase
128 1 and all memory accesses take place during Phase 2."
129
130 Thus, the inverse of PHI OUT provides the other phase of the clock.
131
132 See: http://www.jmargolin.com/vgens/vgens.htm
133
134 Bandwidth Figures
135 -----------------
136
137 Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
138 total lines, with 80 cycles occurring in the active periods of display
139 scanlines, the following bandwidth calculations can be performed:
140
141 Total theoretical maximum:
142 128 cycles * 312 lines
143 = 39936 bytes
144
145 MODE 0, 1, 2:
146 ULA: 80 cycles * 256 lines
147 = 20480 bytes
148 CPU: 48 cycles / 2 * 256 lines
149 + 128 cycles / 2 * (312 - 256) lines
150 = 9728 bytes
151
152 MODE 3:
153 ULA: 80 cycles * 24 rows * 8 lines
154 = 15360 bytes
155 CPU: 48 cycles / 2 * 24 rows * 8 lines
156 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
157 = 12288 bytes
158
159 MODE 4, 5:
160 ULA: 40 cycles * 256 lines
161 = 10240 bytes
162 CPU: (40 cycles + 48 cycles / 2) * 256 lines
163 + 128 cycles / 2 * (312 - 256) lines
164 = 19968 bytes
165
166 MODE 6:
167 ULA: 40 cycles * 24 rows * 8 lines
168 = 7680 bytes
169 CPU: (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
170 + 128 cycles / 2 * (312 - (24 rows * 8 lines))
171 = 19968 bytes
172
173 Here, the division of 2 for CPU accesses is performed to indicate that the CPU
174 only uses every other access opportunity even in uncontended periods. See the
175 2MHz RAM Access enhancement below for bandwidth calculations that consider
176 this limitation removed.
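
The calculations above can be reproduced with a short sketch. The function
and parameter names are illustrative; the figures (128 cycles per line, 312
lines, an 80 cycle active period) are those given in the text.

```python
CYCLES_PER_LINE = 128
TOTAL_LINES = 312
ACTIVE_CYCLES = 80  # cycles in the active period of a display scanline


def bandwidth(ula_cycles, display_lines, interleaved):
    """Return (ULA bytes, CPU bytes) transferred per field.

    ula_cycles    -- cycles used by the ULA on each display line (80 or 40)
    display_lines -- number of lines on which pixel data is produced
    interleaved   -- True where the CPU takes turns with the ULA during the
                     active period (MODE 4, 5 and 6)
    """
    ula = ula_cycles * display_lines
    # Outside the active period the CPU only uses every other cycle.
    idle = (CYCLES_PER_LINE - ACTIVE_CYCLES) // 2
    shared = ula_cycles if interleaved else 0
    cpu = (shared + idle) * display_lines
    cpu += (CYCLES_PER_LINE // 2) * (TOTAL_LINES - display_lines)
    return ula, cpu


if __name__ == "__main__":
    print("MODE 0, 1, 2:", bandwidth(80, 256, False))
    print("MODE 3:      ", bandwidth(80, 24 * 8, False))
    print("MODE 4, 5:   ", bandwidth(40, 256, True))
    print("MODE 6:      ", bandwidth(40, 24 * 8, True))
```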
177
178 Video Timing
179 ------------
180
181 According to 8.7 in the Service Manual, and the PAL Wikipedia page,
182 approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
183 (including the "colour burst"), and 1.65µs for the "front porch", totalling
184 12.05µs and thus leaving 51.95µs for the active video signal for each
185 scanline. As the Service Manual suggests in the oscilloscope traces, the
186 display information is transmitted more or less centred within the active
187 video period since the ULA will only be providing pixel data for 40µs in each
188 scanline.
189
190 Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
191 each scanline can be divided into 1024 cycles, although only 640 at most are
192 actively used to provide pixel data. Pixel data production should only occur
193 within a certain period on each scanline, approximately 262 cycles after the
194 start of hsync:
195
196 active video period = 51.95µs
197 pixel data period = 40µs
198 total silent period = 51.95µs - 40µs = 11.95µs
199 silent periods (before and after) = 11.95µs / 2 = 5.975µs
200 hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
201 time before pixel data period = 10.4µs + 5.975µs = 16.375µs
202 pixel data period start cycle = 16.375µs / 62.5ns = 262
203
204 By choosing a number divisible by 8, the RAM access mechanism can be
205 synchronised with the pixel production. Thus, 256 is a more appropriate start
206 cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
207 pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
208 document) occurs at cycle 0.
209
210 To summarise:
211
212 HS signal starts at cycle 0 on each horizontal scanline
213 HS signal ends approximately 4µs later at cycle 64
214 Pixel data starts approximately 12µs later at cycle 256
215
216 "Re: Electron Memory Contention" provides measurements that appear consistent
217 with these calculations.
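
The horizontal timing calculation can be sketched as follows. The durations
are those quoted above; rounding down to a multiple of 8 is merely one way
of arriving at the chosen start cycle of 256, and the names are illustrative.

```python
CYCLE_US = 0.0625      # one 16MHz cycle (62.5ns) in microseconds
LINE_US = 64.0
SYNC_US = 4.7
BACK_PORCH_US = 5.7
FRONT_PORCH_US = 1.65
PIXEL_US = 40.0        # period in which the ULA provides pixel data


def pixel_start_cycle():
    """Return the 16MHz cycle (from the start of hsync) at which centred
    pixel data would begin."""
    active = LINE_US - (SYNC_US + BACK_PORCH_US + FRONT_PORCH_US)
    margin = (active - PIXEL_US) / 2
    return (SYNC_US + BACK_PORCH_US + margin) / CYCLE_US


def align_down(cycle, multiple=8):
    """Round a cycle number down to a multiple of 8 cycles, so that pixel
    production stays in step with the 8-cycle RAM access pattern."""
    return int(cycle) // multiple * multiple


if __name__ == "__main__":
    start = pixel_start_cycle()
    print(round(start), align_down(start))
```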
218
The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
221 this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
222 lines. Thus, the first visible scanline on the first field of a frame occurs
223 half way through the 23rd scanline period measured from the start of vsync
224 (indicated by "V" in the diagrams below):
225
226 10 20 23
227 Line in frame: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
228 Line from 1: 0 22 3
229 Line on screen: .:::::VVVVV::::: 12233445566
230 |_________________________________________________|
231 25 line vertical blanking period
232
233 In the second field of a frame, the first visible scanline coincides with the
234 24th scanline period measured from the start of line 313 in the frame:
235
236 310 336
237 Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
238 Line from 313: 0 23 4
239 Line on screen: 88:::::VVVVV:::: 11223344
240 288 | |
241 |_________________________________________________|
242 25 line vertical blanking period
243
244 In order to consider only full lines, we might consider the start of each
245 frame to occur 23 lines after the start of vsync.
246
247 Again, it is likely that pixel data production should only occur on scanlines
248 within a certain period on each frame. The "625/50" document indicates that
249 only a certain region is "safe" to use, suggesting a vertically centred region
250 with approximately 15 blank lines above and below the picture. However, the
251 "PAL TV timing and voltages" document suggests 28 blank lines above and below
252 the picture. This would centre the 256 lines within the 312 lines of each
253 field and thus provide a start of picture approximately 5.5 or 5 lines after
254 the end of the blanking period or 28 or 27.5 lines after the start of vsync.
255
256 To summarise:
257
CSYNC signal starts at cycle 0
CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672
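
These cycle numbers follow from the 1024 cycles in each 64µs scanline, as
the following sketch illustrates. A picture start 28 lines after the start
of vsync is assumed, and the names are illustrative.

```python
CYCLES_PER_LINE = 1024  # 16MHz cycles in one 64 microsecond scanline
CYCLE_US = 0.0625       # one 16MHz cycle in microseconds


def lines_to_cycles(lines):
    """Convert a (possibly fractional) number of scanlines to cycles."""
    return round(lines * CYCLES_PER_LINE)


def cycles_to_us(cycles):
    """Convert a number of 16MHz cycles to microseconds."""
    return cycles * CYCLE_US


if __name__ == "__main__":
    csync_end = lines_to_cycles(2.5)
    picture_start = lines_to_cycles(28)
    print(csync_end, cycles_to_us(csync_end))
    print(picture_start, cycles_to_us(picture_start - csync_end))
```

The interval between the end of CSYNC and the start of the picture is thus
26112 cycles, or 25.5 lines.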
261
262 See: http://en.wikipedia.org/wiki/PAL
263 See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
264 See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
265 http://lipas.uwasa.fi/~f76998/video/modes/
266 See: PAL TV timing and voltages
267 http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
268 See: Line Standards
269 http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
270 See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
271 http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
272 See: Re: Electron Memory Contention
273 http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109
274
275 RAM Integrated Circuits
276 -----------------------
277
278 Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
279 CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
280 available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
281 have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
282 ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.
283
The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM4164 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.
287
288 The TM4164EC4 series combines 4 64K x 1b units into a single package and
289 appears similar to the TM4164EA4 featured on the Electron's circuit diagram
290 (in the Advanced User Guide but not the Service Manual), and it also has 22
291 pins providing 3 additional inputs and 3 additional outputs over the 16 pins
292 of the individual 4164-15 modules, presumably allowing concurrent access to
293 the packaged memory units.
294
295 As far as currently available replacements are concerned, the NTE4164 is a
296 potential candidate: according to the Vetco Electronics entry, it is
297 supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
298 parts include the NTE2164 and the NTE6664, both of which appear to have
299 largely the same performance and connection characteristics. Meanwhile, the
300 NTE21256 appears to be a 16-pin replacement with four times the capacity that
301 maintains the single data input and output pins. Using the NTE21256 as a
302 replacement for all ICs combined would be difficult because of the single bit
303 output.
304
Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these describe
different parts that actually appear to be sold under the 41574 product code
(listed as 41464-10) and that appear to be replacements for the TM4164EC4: the
Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
using the same 4 pins for both input and output.
312
            Pins  I/O pins  Row access  Column access
            ----  --------  ----------  -------------
TM4164EC4   22    4 + 4     150ns (15)  90ns (15)
KM41464AP   18    4         150ns (15)  75ns (15)
NTE21256    16    1 + 1     150ns       75ns
HYB 4164-2  16    1 + 1     150ns       100ns
µPD41464    18    4         120ns (12)  60ns (12)
320
321 See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
322 http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
323 See: Dynamic RAMS
324 http://www.unicornelectronics.com/IC/DYNAMIC.html
325 See: New old stock 8x 4164 chips
326 http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
327 See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
328 http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
329 See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
330 http://www.vetco.net/catalog/product_info.php?products_id=2806
331 See: NTE4164 - IC-NMOS 64K DRAM 150NS
332 http://www.vetco.net/catalog/product_info.php?products_id=3680
333 See: NTE21256 - IC-256K DRAM 150NS
334 http://www.vetco.net/catalog/product_info.php?products_id=2799
335 See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
336 http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
337 See: NTE6664 - IC-MOS 64K DRAM 150NS
338 http://www.vetco.net/catalog/product_info.php?products_id=5213
339 See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
340 http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
341 See: 4164-150: MAJOR BRANDS
342 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
343 See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
344 http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
345 See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
346 http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
347 See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
348 http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
349 See: 41464-10: MAJOR BRANDS
350 http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1
351
352 Interrupts
353 ----------
354
The ULA generates IRQs (maskable interrupts) according to certain conditions
and these conditions are controlled by location &FE00:

* Vertical sync (bottom of displayed screen)
* 50Hz real time clock
* Transmit data empty
* Receive data full
* High tone detect
363
The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory: setting
the bit restores the normal function of the ULA.
368
369 ROM Paging
370 ----------
371
372 Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
373 mappings exist:
374
375 8 keyboard
376 9 keyboard (duplicate)
377 10 BASIC ROM
378 11 BASIC ROM (duplicate)
379
380 Paging in a ROM involves the following procedure:
381
382 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
383 2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
384 selected.
385 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
386 whilst writing the desired ROM number n in bits 0 to 2.
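
The procedure above can be sketched as a function producing the values to be
written to &FE05. This is only an illustration: the function name is
hypothetical, the use of ROM 12 as the intermediate selection is an arbitrary
choice, and ROMs 8 to 11 are deliberately left out because the procedure
above does not describe them.

```python
ROM_PAGE_ENABLE = 0x08  # bit 3 of &FE05


def paging_writes(rom):
    """Return the sequence of values written to bits 0 to 3 of &FE05 in
    order to page in the given ROM number."""
    if 12 <= rom <= 15:
        # Step 1 alone: bit 3 set, n = rom - 8 in bits 0 to 2.
        return [ROM_PAGE_ENABLE | (rom - 8)]
    if 0 <= rom <= 7:
        # Step 1 (selecting ROM 12, chosen arbitrarily), then step 2 with
        # bit 3 cleared and the ROM number in bits 0 to 2.
        return [ROM_PAGE_ENABLE | (12 - 8), rom]
    raise ValueError("ROM %r is not covered by this sketch" % rom)


if __name__ == "__main__":
    print(paging_writes(13))
    print(paging_writes(5))
```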
387
388 See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686
389
390 Shadow/Expanded Memory
391 ----------------------
392
393 The Electron exposes all sixteen address lines and all eight data lines
394 through the expansion bus. Using such lines, it is possible to provide
395 additional memory - typically sideways ROM and RAM - on expansion cards and
396 through cartridges, although the official cartridge specification provides
397 fewer address lines and only seeks to provide access to memory in 16K units.
398
399 Various modifications and upgrades were developed to offer "turbo"
400 capabilities to the Electron, permitting the CPU to access a separate 8K of
401 RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
402 the ULA through additional logic. However, an enhanced ULA might support
403 independent CPU access to memory over the expansion bus by allowing itself to
404 be discharged from providing access to memory, potentially for a range of
405 addresses, and for the CPU to communicate with external memory uninterrupted.
406
407 Sideways RAM/ROM and Upper Memory Access
408 ----------------------------------------
409
410 Although the ULA controls the CPU clock, effectively slowing or stopping the
411 CPU when the ULA needs to access screen memory, it is apparently able to allow
412 the CPU to access addresses of &8000 and above - the upper region of memory -
413 at 2MHz independently of any access to RAM that the ULA might be performing,
414 only blocking the CPU if it attempts to access addresses of &7FFF and below
415 during any ULA memory access - the lower region of memory - by stopping or
416 stalling its clock.
417
418 Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
419 CPU clock if the line goes low, when the CPU is attempting to access the lower
420 region of memory.
421
422 Hardware Scrolling (and Enhancement)
423 ------------------------------------
424
On the standard ULA, &FE02 and &FE03 map to an address with 9 significant
bits (bits 14 to 6), the least significant 6 bits being zero, thus limiting
the scrolling resolution to 64 bytes. An enhanced ULA could support a
resolution of 2 bytes using the same layout of these addresses.
429
430 |--&FE02--------------| |--&FE03--------------|
431 XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX
432
433 XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX
434
Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change in 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3, 4 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).
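
The pixel figures follow from the colour depth of each mode, since an 8 byte
step moves the display by one byte column. A minimal sketch, using
illustrative names and the depths given under "Pixel Layouts":

```python
# Bits per pixel for each screen mode.
BITS_PER_PIXEL = {0: 1, 1: 2, 2: 4, 3: 1, 4: 1, 5: 2, 6: 1}


def scroll_step_pixels(mode, step_bytes=8):
    """Return the horizontal movement in pixels caused by changing the
    screen start address by step_bytes (a multiple of 8).

    Each character cell occupies 8 consecutive bytes, one per pixel row, so
    every 8 bytes correspond to one byte column of 8 bits.
    """
    if step_bytes % 8:
        raise ValueError("step must be a whole number of character cells")
    byte_columns = step_bytes // 8
    return byte_columns * 8 // BITS_PER_PIXEL[mode]


if __name__ == "__main__":
    for mode in sorted(BITS_PER_PIXEL):
        print("MODE", mode, scroll_step_pixels(mode))
```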
441
442 One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
443 of changing the screen address by 2 bytes is the change in the number of lines
444 from the initial and final character rows that need reading by the ULA, which
445 would need to maintain this state information (although this is a relatively
446 trivial change). Another pitfall is the complication that might be introduced
447 to software writing bitmaps of character height to the screen.
448
449 See: http://pastraiser.com/computers/acornelectron/acornelectron.html
450
451 Enhancement: Mode Layouts
452 -------------------------
453
454 Merely changing the screen memory mappings in order to have Archimedes-style
455 row-oriented screen addresses (instead of character-oriented addresses) could
456 be done for the existing modes, but this might not be sufficiently beneficial,
457 especially since accessing regions of the screen would involve incrementing
458 pointers by amounts that are inconvenient on an 8-bit CPU.
459
However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
462 reference the vertical screen location below the currently-referenced location
463 (just as occurs within characters using the existing ULA); instead of
464 returning to the top of the character row and referencing the next horizontal
465 location after eight bytes, the address would reference the next character row
466 and continue to reference locations downwards over the height of the screen
467 until reaching the bottom; at the bottom, the next location would be the next
468 horizontal location at the top of the screen.
469
470 In other words, the memory layout for the screen would resemble the following
471 (for MODE 2):
472
&3000  &3100  ...  &7F00
&3001  &3101
...    ...
&3007
&3008
...
...    ...
&30FF         ...  &7FFF
481
482 Since there are 256 pixel rows, each column of locations would be addressable
483 using the low byte of the address. Meanwhile, the high byte would be
484 incremented to address different columns. Thus, addressing screen locations
485 would become a lot more convenient and potentially much more efficient for
486 certain kinds of graphical output.
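
To illustrate the difference, the following sketch computes the address of
a byte from its byte column and pixel row in MODE 2 under both layouts. The
function names are illustrative, and the character-oriented calculation
reflects the existing layout of 8 bytes per character cell.

```python
SCREEN_BASE = 0x3000   # start of MODE 2 screen memory
BYTE_COLUMNS = 80      # bytes per scanline in MODE 2


def character_oriented(column, line):
    """Address of a byte in the existing character-oriented layout."""
    character_row = line // 8
    return (SCREEN_BASE
            + character_row * BYTE_COLUMNS * 8  # 640 bytes per character row
            + column * 8                        # 8 bytes per character cell
            + line % 8)                         # pixel row within the cell


def column_oriented(column, line):
    """Address of a byte in the proposed column-oriented layout: the high
    byte selects the column, the low byte selects the pixel row."""
    return SCREEN_BASE + (column << 8) + line


if __name__ == "__main__":
    print(hex(character_oriented(1, 9)), hex(column_oriented(1, 9)))
```

In the column-oriented case, moving down one pixel row is always a single
increment of the low byte, with no special handling at character row
boundaries.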
487
488 One potential complication with this simplified addressing scheme arises with
489 hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
490 with the existing ULA) would be achieved by incrementing or decrementing the
491 screen start address; by one character row, it would involve adding or
492 subtracting 8. However, the ULA only supports multiples of 64 when changing the
493 screen start address. Thus, if such a scheme were to be adopted, three
494 additional bits would need to be supported in the screen start register (see
495 "Hardware Scrolling (and Enhancement)" for more details). However, horizontal
496 scrolling would be much improved even under the severe constraints of the
497 existing ULA: only adjustments of 256 to the screen start address would be
498 required to produce single-location scrolling of as few as two pixels in MODE 2
499 (four pixels in MODEs 1 and 5, eight pixels otherwise).
500
501 More disruptive is the effect of this alternative layout on software.
502 Presumably, compatibility with the BBC Micro was the primary goal of the
503 Electron's hardware design. With the character-oriented screen layout in
504 place, system software (and application software accessing the screen
505 directly) would be relying on this layout to run on the Electron with little
506 or no modification. Although it might have been possible to change the system
507 software to use this column-oriented layout instead, this would have incurred
508 a development cost and caused additional work porting things like games to the
509 Electron. Moreover, a separate branch of the software from that supporting the
510 BBC Micro and closer derivatives would then have needed maintaining.
511
512 The decision to use the character-oriented layout in the BBC Micro may have
513 been related to the choice of circuitry and to facilitate a convenient
514 hardware implementation, and by the time the Electron was planned, it was too
515 late to do anything about this somewhat unfortunate choice.
516
517 Pixel Layouts
518 -------------
519
520 The pixel layouts are as follows:
521
Modes       Depth (bpp)  Pixels (from bits)
-----       -----------  ------------------
0, 3, 4, 6  1            7 6 5 4 3 2 1 0
1, 5        2            73 62 51 40
2           4            7531 6420
527
528 Since the ULA reads a half-byte at a time, one might expect it to attempt to
529 produce pixels for every half-byte, as opposed to handling entire bytes.
530 However, the pixel layout is not conducive to producing pixels as soon as a
531 half-byte has been read for a given full-byte location: in 1bpp modes the
532 first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
533 data is spread across the entire byte in different ways.
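
The existing layouts can be expressed as a small decoding routine. This is a
sketch with an illustrative function name; it assumes that the highest
numbered bit of each pixel is the most significant bit of its colour value.

```python
def decode_pixels(byte, bpp):
    """Return the colour values of the pixels in a screen byte, leftmost
    pixel first, for a depth of 1, 2 or 4 bits per pixel."""
    count = 8 // bpp  # pixels per byte
    pixels = []
    for pixel in range(count):
        value = 0
        for plane in range(bpp):
            # Successive bits of a pixel are spaced "count" positions apart:
            # 7 and 3 for the first 2bpp pixel; 7, 5, 3 and 1 for the first
            # 4bpp pixel.
            bit = 7 - pixel - plane * count
            value = (value << 1) | ((byte >> bit) & 1)
        pixels.append(value)
    return pixels


if __name__ == "__main__":
    print(decode_pixels(0b10001000, 2))
    print(decode_pixels(0b10101010, 4))
```

Only in the 1bpp case does the upper half-byte alone determine complete
pixels; at the other depths, every pixel needs bits from both half-bytes.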
534
535 An alternative arrangement might be as follows:
536
Modes       Depth (bpp)  Pixels (from bits)
-----       -----------  ------------------
0, 3, 4, 6  1            7 6 5 4 3 2 1 0
1, 5        2            76 54 32 10
2           4            7654 3210
542
Just as the mode layouts were presumably decided by compatibility with the BBC
Micro, the pixel layouts will have been maintained for similar reasons.
Unfortunately, the existing layout prevents any optimisation of the ULA for
handling half-byte pixel data generally.
547
548 Enhancement: The Missing MODE 4
549 -------------------------------
550
551 The Electron inherits its screen mode selection from the BBC Micro, where MODE
552 3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
553 Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
554 however, and they are merely implemented by skipping two scanlines in every
555 ten after the eight required to produce a character line. Thus, such modes
556 provide a 24-row display.
557
558 In principle, nothing prevents this "text mode" effect being applied to other
559 modes. The 20-column modes are not well-suited to displaying text, which
560 leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
561 2. Although the need for a non-monochrome 40-column text mode is addressed by
562 MODE 7 on the BBC Micro, the Electron lacks such a mode.
563
564 If the 4-colour, 24-row variant of MODE 1 were to be provided, logically it
565 would occupy MODE 4 instead of the current MODE 4:
566
Screen mode  Size (kilobytes)  Colours  Rows  Resolution
-----------  ----------------  -------  ----  ----------
0            20                2        32    640x256
1            20                4        32    320x256
2            20                16       32    160x256
3            16                2        24    640x256
4 (new)      16                4        24    320x256
4 (old)      10                2        32    320x256
5            10                4        32    160x256
6            8                 2        24    320x256
577
578 Thus, for increasing mode numbers, the size of each mode would be the same or
579 less than the preceding mode.
580
581 Enhancement: 2MHz RAM Access
582 ----------------------------
583
The CPU and ULA both access RAM at 2MHz, but the CPU, even when not competing
with the ULA, only accesses RAM every other 2MHz cycle (as if the ULA still
needed to access the RAM). One useful enhancement would therefore be a
mechanism to let the CPU take over the ULA cycles outside the ULA's period of
activity, comparable to the way the ULA takes over the CPU cycles in MODE 0 to
3.
590
591 Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
592
593 Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
594 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
595
596 In MODE 4 to 6:
597
598 Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
599 On a non-display line: CCCCCCCC (instead of C_C_C_C_)
600
601 This would improve CPU bandwidth as follows:
602
              Standard ULA  Enhanced ULA
MODE 0, 1, 2  9728 bytes    19456 bytes
MODE 3        12288 bytes   24576 bytes
MODE 4, 5     19968 bytes   29696 bytes
MODE 6        19968 bytes   32256 bytes
608
609 With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
610 because all access opportunities to RAM are doubled. Meanwhile, in the other
611 modes, some CPU accesses occur alongside ULA accesses and thus cannot be
612 doubled, but the CPU bandwidth increase is still significant.
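
The figures for both the standard and enhanced cases can be derived with a
sketch such as the following, where the names are illustrative and the only
difference between the two cases is whether the CPU uses every uncontended
cycle or every other one.

```python
CYCLES_PER_LINE = 128
TOTAL_LINES = 312
ACTIVE_CYCLES = 80


def cpu_bandwidth(ula_cycles, display_lines, interleaved, enhanced):
    """Return the CPU bandwidth in bytes per field.

    interleaved -- True for MODE 4, 5 and 6, where CPU accesses alternate
                   with ULA accesses during the active period
    enhanced    -- True if the CPU may use every uncontended 2MHz cycle
    """
    divisor = 1 if enhanced else 2
    idle = (CYCLES_PER_LINE - ACTIVE_CYCLES) // divisor
    shared = ula_cycles if interleaved else 0
    display = (shared + idle) * display_lines
    blank = (CYCLES_PER_LINE // divisor) * (TOTAL_LINES - display_lines)
    return display + blank


if __name__ == "__main__":
    for name, args in (("MODE 0, 1, 2", (80, 256, False)),
                       ("MODE 3", (80, 192, False)),
                       ("MODE 4, 5", (40, 256, True)),
                       ("MODE 6", (40, 192, True))):
        print(name, cpu_bandwidth(*args, enhanced=False),
              cpu_bandwidth(*args, enhanced=True))
```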
613
614 Unfortunately, the mechanism for accessing the RAM is too slow to provide data
615 within the time constraints of 2MHz operation. There is no time remaining in a
616 2MHz cycle for the CPU to receive and process any retrieved data.
617
618 Enhancement: Region Blanking
619 ----------------------------
620
621 The problem of permitting character-oriented blitting in programs whilst
622 scrolling the screen by sub-character amounts could be mitigated by permitting
623 a region of the display to be blank, such as the final lines of the display.
624 Consider the following vertical scrolling by 2 bytes that would cause an
625 initial character row of 6 lines and a final character row of 2 lines:
626
627 6 lines - initial, partial character row
628 248 lines - 31 complete rows
629 2 lines - final, partial character row
630
631 If a routine were in use that wrote 8 line bitmaps to the partial character
632 row now split in two, it would be advisable to hide one of the regions in
633 order to prevent content appearing in the wrong place on screen (such as
634 content meant to appear at the top "leaking" onto the bottom). Blanking 6
635 lines would be sufficient, as can be seen from the following cases.
636
637 Scrolling up by 2 lines:
638
639 6 lines - initial, partial character row
640 240 lines - 30 complete rows
641 4 lines - part of 1 complete row
642 -----------------------------------------------------------------
643 4 lines - part of 1 complete row (hidden to maintain 250 lines)
644 2 lines - final, partial character row (hidden)
645
646 Scrolling down by 2 lines:
647
648 2 lines - initial, partial character row
649 248 lines - 31 complete rows
650 ----------------------------------------------------------
651 6 lines - final, partial character row (hidden)
652
653 Thus, in this case, region blanking would impose a 250 line display with the
654 bottom 6 lines blank.
655
656 See the description of the display suspend enhancement for a more efficient
657 way of blanking lines than merely blanking the palette whilst allowing the CPU
658 to perform useful work during the blanking period.
659
To control the blanking or suspending of lines at the top and bottom of the
display, a memory location could be dedicated to the task: the upper 4 bits
could define a blanking region of up to 15 lines at the top of the screen,
whereas the lower 4 bits could define such a region at the bottom of the
screen. If more lines were required, two locations could be employed, allowing
the top and bottom regions to occupy the entire screen.
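
The single-location scheme can be sketched in Python as follows. This is only
an illustration: no such register exists in the standard ULA, the function
names are hypothetical, and the only firm constraint is that a 4-bit field can
hold values from 0 to 15.

```python
def pack_blanking(top_lines, bottom_lines):
    """Pack two 4-bit line counts into one hypothetical register byte."""
    if not (0 <= top_lines <= 15 and 0 <= bottom_lines <= 15):
        raise ValueError("each region is limited to 15 lines by its 4-bit field")
    # Upper 4 bits: region at the top; lower 4 bits: region at the bottom.
    return (top_lines << 4) | bottom_lines


def unpack_blanking(value):
    """Return (top_lines, bottom_lines) from the hypothetical register byte."""
    return (value >> 4) & 0x0F, value & 0x0F


# The case discussed above: nothing hidden at the top, 6 lines at the bottom.
register = pack_blanking(0, 6)
print("&%02X" % register, unpack_blanking(register))
```

With this layout, the 250 line display described above would be selected by
writing the value 6 to the register.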
666
667 Enhancement: Screen Height Adjustment
668 -------------------------------------
669
The height of the screen could be configurable in order to reduce screen
memory consumption. MODEs 3 and 6 do not quite achieve this, since the start
of the screen appears to be rounded down to the nearest page, but by reducing
the height by amounts more than a page, savings would be possible. For
example:
674
Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
------------  -----  ------  --------------  ---------------  --------------
640           1      252     80              320              &3140 -> &3100
640           1      248     80              640              &3280 -> &3200
320           1      240     40              640              &5A80 -> &5A00
320           2      240     80              1280             &3500
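
The figures in the table can be reproduced with a short calculation. The
sketch below assumes that screen memory ends at &8000 and that the full
display height is 256 lines; the function name is illustrative only.

```python
SCREEN_END = 0x8000   # assumption: screen memory ends at the top of RAM
FULL_HEIGHT = 256     # display lines in a full-height bitmapped mode


def screen_start(height, bytes_per_line):
    """Return (saving in bytes, exact start, start rounded down to a page)."""
    saving = (FULL_HEIGHT - height) * bytes_per_line
    start = SCREEN_END - height * bytes_per_line
    return saving, start, start & 0xFF00


for height, bytes_per_line in [(252, 80), (248, 80), (240, 40), (240, 80)]:
    saving, start, aligned = screen_start(height, bytes_per_line)
    print(height, bytes_per_line, saving, "&%04X" % start, "&%04X" % aligned)
```

Rounding the start address down to a page boundary gives back part of the
saving, which is why a reduction of only a few lines yields little benefit.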
681
682 Screen Mode Selection
683 ---------------------
684
Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
range of modes, the other bits of &FE*7 (related to sound, cassette
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
potentially being made available for use.
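
The bit layout described above can be illustrated with a short Python sketch.
The helper names are hypothetical, and the functions operate on a plain byte
value standing in for the contents of &FE*7; nothing here accesses hardware.

```python
MODE_SHIFT = 3
MODE_MASK = 0b111 << MODE_SHIFT   # bits 3, 4 and 5


def get_mode(value):
    """Extract the 3-bit mode field from a byte destined for &FE*7."""
    return (value & MODE_MASK) >> MODE_SHIFT


def set_mode(value, mode):
    """Return the byte with its mode field replaced, other bits untouched."""
    if not 0 <= mode <= 7:
        raise ValueError("only 3 bits are available for the mode")
    return (value & ~MODE_MASK & 0xFF) | (mode << MODE_SHIFT)


print(get_mode(set_mode(0x00, 6)))
```

Since the field is only three bits wide, at most eight distinct values can be
selected, which is the reason that a wider range of modes would require other
bits to be reassigned.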
689
690 Enhancement: Palette Definition
691 -------------------------------
692
693 Since all memory accesses go via the ULA, an enhanced ULA could employ more
694 specific addresses than &FE*X to perform enhanced functions. For example, the
695 palette control is done using &FE*8-F and merely involves selecting predefined
696 colours, whereas an enhanced ULA could support the redefinition of all 16
697 colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
698 (colours 8 to 15), where a single byte might provide 8 bits per pixel colour
699 specifications similar to those used on the Archimedes.
700
The principal limitation here is actually the hardware: the Electron has only
a single output line for each of the red, green and blue channels, and if
those outputs are strictly digital and can only be set to either a "high" or a
"low" value, then only the existing eight colours are possible. If a modern
ULA were able to output analogue values (or values at well-defined points
between the high and low values, such as the half-on value supported by the
Amstrad CPC series), it would still need to be assessed whether the circuitry
could successfully handle and propagate such values. Various sources indicate
that only "TTL levels" are supported by the RGB output circuit, and since
there are 74LS08 AND logic gates involved in the RGB component outputs from
the ULA, it is likely that the ULA is expected to provide only "high" or "low"
values.
712
713 Short of adding extra outputs from the ULA (either additional red, green and
714 blue outputs or a combined intensity output), another approach might involve
715 some kind of modulation where an output value might be encoded in multiple
716 pulses at a higher frequency than the pixel frequency. However, this would
717 demand additional circuitry outside the ULA, and component RGB monitors would
718 probably not be able to take advantage of this feature; only UHF and composite
719 video devices (the latter with the composite video colour support enabled on
720 the Electron's circuit board) would potentially benefit.
721
722 Flashing Colours
723 ----------------
724
According to the Advanced User Guide, "The cursor and flashing colours are
entirely generated in software: This means that all of the logical to physical
colour map must be changed to cause colours to flash." This appears to suggest
that the palette registers must be updated whenever the flash counter (read
and written by OSBYTE &C1 (193)) reaches zero. It also suggests that some way
of changing the colour pairs to be any combination of colours might be
possible, instead of having colour complements as pairs.
732
733 It is conceivable that the interrupt code responsible does the simple thing
734 and merely inverts the current values for any logical colours (LC) for which
735 the associated physical colour (as supplied as the second parameter to the VDU
736 19 call) has the top bit of its four bit value set. These top bits are not
737 recorded in the palette registers but are presumably recorded separately and
738 used to build bitmaps as follows:
739
LC  2 colour  4 colour  16 colour  4-bit value for inversion
--  --------  --------  ---------  -------------------------
 0  00010001  00010001  00010001   1, 1, 1
 1  01000100  00100010  00010001   4, 2, 1
 2            01000100  00100010   4, 2
 3            10001000  00100010   8, 2
 4                      00010001   1
 5                      00010001   1
 6                      00100010   2
 7                      00100010   2
 8                      01000100   4
 9                      01000100   4
10                      10001000   8
11                      10001000   8
12                      01000100   4
13                      01000100   4
14                      10001000   8
15                      10001000   8
758
759 Inversion value calculation:
760
 2 colour formula: 1 << (colour * 2)
 4 colour formula: 1 << colour
16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
764
765 For example, where logical colour 0 has been mapped to a physical colour in
766 the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
767 the inversion operation. (The lower three bits of the physical colour would be
768 used to set the underlying colour information affected by the inversion
769 operation.)
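
The inversion values and bitmaps in the table can be generated as in the
following Python sketch. The function names are illustrative, and the 16
colour expression is written so as to reproduce the values shown in the table;
whether the operating system computes them in this way is not known.

```python
def inversion_value(colour, colours_in_mode):
    """Return the 4-bit inversion value for a logical colour."""
    if colours_in_mode == 2:
        return 1 << (colour * 2)
    if colours_in_mode == 4:
        return 1 << colour
    if colours_in_mode == 16:
        # Bit 1 of the colour selects between 1 and 2 (or 4 and 8);
        # bit 3 selects between the lower and upper pair of values.
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("unsupported number of colours")


def inversion_bitmap(colour, colours_in_mode):
    """Repeat the 4-bit value in both nibbles, as in the table's bitmaps."""
    return inversion_value(colour, colours_in_mode) * 0x11


for colour in range(16):
    print("%2d" % colour, format(inversion_bitmap(colour, 16), "08b"))
```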
770
771 An operation in the interrupt code would then combine the bitmaps for all
772 logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
773 combined for groups of logical colours as follows:
774
775 Logical colours
776 ---------------
777 0, 2, 8, 10
778 4, 6, 12, 14
779 5, 7, 13, 15
780 1, 3, 9, 11
781
782 These combined bitmaps would be EORed with the existing palette register
783 values in order to perform the value inversion necessary to produce the
784 flashing effect.
785
786 Thus, in the VDU 19 operation, the appropriate inversion value would be
787 calculated for the logical colour, and this value would then be combined with
788 other inversion values in a dedicated memory location corresponding to the
789 colour's group as indicated above. Meanwhile, the palette channel values would
790 be derived from the lower three bits of the specified physical colour and
791 combined with other palette data in dedicated memory locations corresponding
792 to the palette registers.
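
The bookkeeping described above might look like the following Python sketch
for a 16 colour mode. It is speculative in the same way as the description
itself: the class and method names are hypothetical, and each group is
simplified to a single stored value, whereas the real palette data for a
group may be spread over more than one register.

```python
# Groups of logical colours whose inversion bitmaps are combined (see above).
GROUPS = [(0, 2, 8, 10), (4, 6, 12, 14), (5, 7, 13, 15), (1, 3, 9, 11)]


def inversion_bitmap(logical):
    """16 colour inversion bitmap, chosen to match the table above."""
    value = 1 << (((logical & 2) >> 1) + ((logical & 8) >> 2))
    return value * 0x11


class FlashState:
    """Hypothetical bookkeeping for flashing colours in a 16 colour mode."""

    def __init__(self):
        self.masks = [0, 0, 0, 0]   # one combined inversion mask per group

    def vdu19(self, logical, physical):
        """Record whether a logical colour flashes (physical colour 8 to 15)."""
        group = next(i for i, members in enumerate(GROUPS) if logical in members)
        bitmap = inversion_bitmap(logical)
        if physical & 8:
            self.masks[group] |= bitmap            # flashing colour
        else:
            self.masks[group] &= ~bitmap & 0xFF    # steady colour

    def flash(self, values):
        """EOR each group's stored value with its combined mask."""
        return [value ^ mask for value, mask in zip(values, self.masks)]


state = FlashState()
state.vdu19(0, 9)     # logical colour 0 flashes
state.vdu19(10, 12)   # logical colour 10 flashes
print([format(mask, "08b") for mask in state.masks])
```

Applying flash twice restores the original values, which is the property
needed for colours to alternate between two states.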
793
794 Interestingly, although flashing colours on the BBC Micro are controlled by
795 toggling bit 0 of the &FE20 control register location for the Video ULA, the
796 actual colour inversion is done in hardware.
797
798 Enhancement: Palette Definition Lists
799 -------------------------------------
800
801 It can be useful to redefine the palette in order to change the colours
802 available for a particular region of the screen, particularly in modes where
803 the choice of colours is constrained, and if an increased colour depth were
804 available, palette redefinition would be useful to give the illusion of more
805 than 16 colours in MODE 2. Traditionally, palette redefinition has been done
806 by using interrupt-driven timers, but a more efficient approach would involve
807 presenting lists of palette definitions to the ULA so that it can change the
808 palette at a particular display line.
809
810 One might define a palette redefinition list in a region of memory and then
811 communicate its contents to the ULA by writing the address and length of the
812 list, along with the display line at which the palette is to be changed, to
813 ULA registers such that the ULA buffers the list and performs the redefinition
814 at the appropriate time. Throughput/bandwidth considerations might impose
815 restrictions on the practical length of such a list, however.
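
The intended effect of such a list can be modelled in software. The Python
sketch below is not a description of any ULA interface: the list format (a
display line number paired with the palette entries to change) and the
function name are assumptions made purely for illustration.

```python
def palette_per_line(initial, changes, lines=256):
    """Return the palette in effect on each display line.

    initial: mapping of logical colour to physical colour
    changes: list of (display line, {logical: physical}) entries
    """
    palette = dict(initial)
    pending = sorted(changes, key=lambda change: change[0])
    result = []
    for line in range(lines):
        # Apply every redefinition scheduled for this line or earlier.
        while pending and pending[0][0] <= line:
            palette.update(pending.pop(0)[1])
        result.append(dict(palette))
    return result


# Logical colour 1 is white for the top half of the screen and red below.
frames = palette_per_line({0: 0, 1: 7}, [(128, {1: 1})])
print(frames[127][1], frames[128][1])
```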
816
817 Enhancement: Display Synchronisation Interrupts
818 -----------------------------------------------
819
820 When completing each scanline of the display, the ULA could trigger an
821 interrupt. Since this might impact system performance substantially, the
822 feature would probably need to be configurable, and it might be sufficient to
823 have an interrupt only after a certain number of display lines instead.
824 Permitting the CPU to take action after eight lines would allow palette
825 switching and other effects to occur on a character row basis.
826
827 The ULA provides an interrupt at the end of the display period, presumably so
828 that software can schedule updates to the screen, avoid flickering or tearing,
829 and so on. However, some applications might benefit from an interrupt at, or
830 just before, the start of the display period so that palette modifications or
831 similar effects could be scheduled.
832
833 Enhancement: Palette-Free Modes
834 -------------------------------
835
836 Palette-free modes might be defined where bit values directly correspond to
837 the red, green and blue channels, although this would mostly make sense only
838 for modes with depths greater than the standard 4 bits per pixel, and such
839 modes would require more memory than MODE 2 if they were to have an acceptable
840 resolution.
841
842 Enhancement: Display Suspend
843 ----------------------------
844
845 Especially when writing to the screen memory, it could be beneficial to be
846 able to suspend the ULA's access to the memory, instead producing blank values
847 for all screen pixels until a program is ready to reveal the screen. This is
848 different from palette blanking since with a blank palette, the ULA is still
849 reading screen memory and translating its contents into pixel values that end
850 up being blank.
851
852 This function is reminiscent of a capability of the ZX81, albeit necessary on
853 that hardware to reduce the load on the system CPU which was responsible for
854 producing the video output. By allowing display suspend on the Electron, the
855 performance benefit would be derived from giving the CPU full access to the
856 memory bandwidth.
857
858 The region blanking feature mentioned above could be implemented using this
859 enhancement instead of employing palette blanking for the affected lines of
860 the display.
861
862 Enhancement: Memory Filling
863 ---------------------------
864
A capability that could be given to an enhanced ULA is that of permitting the
ULA to write to screen memory as well as being able to read from it. Although
such a capability would probably not be useful in conjunction with the
existing read operations when producing a screen display, and insufficient
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
capability could be offered during a display suspend period (as described
above), permitting a more efficient mechanism to rapidly fill memory with a
predetermined value.
873
874 This capability could also support block filling, where the limits of the
875 filled memory would be defined by the position and size of a screen area,
876 although this would demand the provision of additional registers in the ULA to
877 retain the details of such areas and additional logic to control the fill
878 operation.
879
880 Enhancement: Region Filling
881 ---------------------------
882
883 An alternative to memory writing might involve indicating regions using
884 additional registers or memory where the ULA fills regions of the screen with
885 content instead of reading from memory. Unlike hardware sprites which should
886 realistically provide varied content, region filling could employ single
887 colours or patterns, and one advantage of doing so would be that the ULA need
888 not access memory at all within a particular region.
889
890 Regions would be defined on a row-by-row basis. Instead of reading memory and
891 blitting a direct representation to the screen, the ULA would read region
892 definitions containing a start column, region width and colour details. There
893 might be a certain number of definitions allowed per row, or the ULA might
894 just traverse an ordered list of such definitions with each one indicating the
895 row, start column, region width and colour details.
896
897 One could even compress this information further by requiring only the row,
898 start column and colour details with each subsequent definition terminating
899 the effect of the previous one. However, one would also need to consider the
900 convenience of preparing such definitions and whether efficient access to
901 definitions for a particular row might be desirable. It might also be
902 desirable to avoid having to prepare definitions for "empty" areas of the
903 screen, effectively making the definition of the screen contents employ
904 run-length encoding and employ only colour plus length information.
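
A representation using only colour and length information can be sketched in
Python as follows. The function names and the use of (colour, length) pairs
are illustrative choices, with a row assumed to be 640 pixel values wide.

```python
def expand_row(runs, width=640):
    """Expand (colour, length) runs into one row of pixel colour values."""
    pixels = []
    for colour, length in runs:
        pixels.extend([colour] * length)
    if len(pixels) != width:
        raise ValueError("runs cover %d pixels, expected %d" % (len(pixels), width))
    return pixels


def encode_row(pixels):
    """Produce (colour, length) runs from a row of pixel colour values."""
    runs = []
    for colour in pixels:
        if runs and runs[-1][0] == colour:
            runs[-1] = (colour, runs[-1][1] + 1)
        else:
            runs.append((colour, 1))
    return runs


# A band of colour 3 with a margin of colour 0 on either side.
row = expand_row([(0, 100), (3, 440), (0, 100)])
print(encode_row(row))
```

Simple shapes need only a few definitions per row, whereas a row of
alternating single pixels would need one definition per pixel, making this
representation far less compact than a bitmap for such content.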
905
906 One application of region filling is that of simple 2D and 3D shape rendering.
907 Although it is entirely possible to plot such shapes to the screen and have
908 the ULA blit the memory contents to the screen, such operations consume
909 bandwidth both in the initial plotting and in the final transfer to the
910 screen. Region filling would reduce such bandwidth usage substantially.
911
912 This way of representing screen images would make certain kinds of images
913 unfeasible to represent - consider alternating single pixel values which could
914 easily occur in some character bitmaps - even if an internal queue of regions
915 were to be supported such that the ULA could read ahead and buffer such
916 "bandwidth intensive" areas. Thus, the ULA might be better served providing
917 this feature for certain areas of the display only as some kind of special
918 graphics window.
919
920 Enhancement: Hardware Sprites
921 -----------------------------
922
An enhanced ULA might provide hardware sprites, but this would be done in a
way that is incompatible with the standard ULA, since no &FE*X locations are
available for allocation. To keep the facility simple, hardware sprites would
have a standard byte width and height.
927
928 The specification of sprites could involve the reservation of 16 locations
929 (for example, &FE20-F) specifying a fixed number of eight sprites, with each
930 location pair referring to the sprite data. By limiting the ULA to dealing
931 with a fixed number of sprites, the work required inside the ULA would be
932 reduced since it would avoid having to deal with arbitrary numbers of sprites.
933
934 The principal limitation on providing hardware sprites is that of having to
935 obtain sprite data, given that the ULA is usually required to retrieve screen
936 data, and given the lack of memory bandwidth available to retrieve sprite data
937 (particularly from multiple sprites supposedly at the same position) and
938 screen data simultaneously. Although the ULA could potentially read sprite
939 data and screen data in alternate memory accesses in screen modes where the
940 bandwidth is not already fully utilised, this would result in a degradation of
941 performance.
942
943 Enhancement: Additional Screen Mode Configurations
944 --------------------------------------------------
945
946 Alternative screen mode configurations could be supported. The ULA has to
947 produce 640 pixel values across the screen, with pixel doubling or quadrupling
948 employed to fill the screen width:
949
Screen width  Columns  Scaling  Depth  Bytes
------------  -------  -------  -----  ------
640           80       x1       1      80
320           40       x2       1, 2   40, 80
160           20       x4       2, 4   40, 80
955
956 It must also use at most 80 byte-sized memory accesses to provide the
957 information for the display. Given that characters must occupy an 8x8 pixel
958 array, if a configuration featuring anything other than 20, 40 or 80 character
959 columns is to be supported, compromises must be made such as the introduction
960 of blank pixels either between characters (such as occurs between rows in MODE
961 3 and 6) or at the end of a scanline (such as occurs at the end of the frame
962 in MODE 3 and 6). Consider the following configuration:
963
Screen width  Columns  Scaling  Depth  Bytes   Blank
------------  -------  -------  -----  ------  -----
208           26       x3       1, 2   26, 52  16
967
968 Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
969 colours could be provided, with 16 blank pixel values (out of a total of 640)
970 generated either at the start or end (or split between the start and end) of
971 each scanline.
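
The constraints on such configurations (640 pixel values per scanline, at
most 80 bytes read, 8 pixels per character column) can be checked with a
short Python sketch. The function name and the order of its results are
illustrative only.

```python
TOTAL_PIXELS = 640   # pixel values produced across the screen
MAX_BYTES = 80       # memory accesses available per scanline


def configuration(columns, scaling, depth):
    """Return (screen width, bytes per scanline, blank pixel values)."""
    width = columns * 8                  # characters are 8 pixels wide
    blank = TOTAL_PIXELS - width * scaling
    bytes_per_line = columns * depth     # one byte per column per bit of depth
    if blank < 0:
        raise ValueError("scaled width exceeds %d pixels" % TOTAL_PIXELS)
    if bytes_per_line > MAX_BYTES:
        raise ValueError("more than %d bytes needed per scanline" % MAX_BYTES)
    return width, bytes_per_line, blank


print(configuration(26, 3, 1))
print(configuration(26, 3, 2))
```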
972
973 Enhancement: Character Attributes
974 ---------------------------------
975
976 The BBC Micro MODE 7 employs something resembling character attributes to
977 support teletext displays, but depends on circuitry providing a character
978 generator. The ZX Spectrum, on the other hand, provides character attributes
979 as a means of colouring bitmapped graphics. Although such a feature is very
980 limiting as the sole means of providing multicolour graphics, in situations
981 where the choice is between low resolution multicolour graphics or high
982 resolution monochrome graphics, character attributes provide a potentially
983 useful compromise.
984
For each byte read, the ULA must deliver 8 pixel values (out of a total of
640) to the video output, doing so by either emptying its pixel buffer on a
pixel per cycle basis, or by multiplying pixels and thus holding them for more
than one cycle. For example, for a screen mode having 640 pixels in width:
989
Cycle:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Reads:   B                       B
Pixels:  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
993
994 And for a screen mode having 320 pixels in width:
995
Cycle:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Reads:   B
Pixels:  0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7
999
However, in modes where fewer than 80 bytes are required to generate the pixel
values, an enhanced ULA might be able to read additional bytes between those
providing the bitmapped graphics data:
1003
Cycle:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Reads:   B                       A
Pixels:  0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7
1007
1008 These additional bytes could provide colour information for the bitmapped data
1009 in the following character column (of 8 pixels). Since it would be desirable
1010 to apply attribute data to the first column, the initial 8 cycles might be
1011 configured to not produce pixel values.
1012
Attribute data need only be read for the first row of pixels of each
character. The subsequent rows would have the same attribute information
applied to them, although this would require the attribute data to be stored
in some kind of buffer. Thus, the following access pattern would be observed:
1017
1018 Cycle: A B ... _ B ... _ B ... _ B ... _ B ... _ B ... _ B ... _ B ...
1019
1020 A whole byte used for colour information for a whole character would result in
1021 a choice of 256 colours, and this might be somewhat excessive. By only reading
1022 attribute bytes at every other opportunity, a choice of 16 colours could be
1023 applied individually to two characters.
1024
Cycle:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Reads:  B               A               B               -
Pixels: 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
1028
1029 Further reductions in attribute data access, offering 4 colours for every
1030 character in a four character block, for example, might also be worth
1031 considering.
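
Sharing one attribute byte between several characters amounts to dividing it
into equal bit fields. The following Python sketch shows one way of doing so;
the function name is hypothetical, the ordering of the fields (most
significant first) is an arbitrary assumption, and the interpretation of each
field as colour information is left open.

```python
def split_attribute(byte, characters):
    """Split one attribute byte into equal bit fields, one per character."""
    if characters not in (1, 2, 4):
        raise ValueError("a byte divides evenly between 1, 2 or 4 characters")
    bits = 8 // characters
    mask = (1 << bits) - 1
    # Assumption: the first character uses the most significant field.
    return [(byte >> (bits * (characters - 1 - index))) & mask
            for index in range(characters)]


print(split_attribute(0xA5, 1))   # one field of 8 bits: 256 choices
print(split_attribute(0xA5, 2))   # two fields of 4 bits: 16 choices each
print(split_attribute(0xA5, 4))   # four fields of 2 bits: 4 choices each
```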
1032
1033 Consider the following configurations for screen modes with a colour depth of
1034 1 bit per pixel for bitmap information:
1035
Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  -------  ---------  ---------  -------  --------------
320           40       x2       40         40         256      &5300
320           40       x2       40         20         16       &5580 -> &5500
320           40       x2       40         10         4        &56C0 -> &5600
208           26       x3       26         26         256      &62C0 -> &6200
208           26       x3       26         13         16       &6460 -> &6400
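
The screen start addresses in the table follow from the access pattern
described earlier: bitmap bytes are needed for every pixel line, whereas
attribute bytes are needed only once per character row. The Python sketch
below reproduces them, assuming 256 pixel lines, 32 character rows and screen
memory ending at &8000; the function name is illustrative.

```python
SCREEN_END = 0x8000   # assumption: screen memory ends at the top of RAM


def attribute_mode_start(bitmap_bytes, attribute_bytes,
                         pixel_lines=256, character_rows=32):
    """Return (size in bytes, exact start, start rounded down to a page)."""
    size = bitmap_bytes * pixel_lines + attribute_bytes * character_rows
    start = SCREEN_END - size
    return size, start, start & 0xFF00


for b, a in [(40, 40), (40, 20), (40, 10), (26, 26), (26, 13)]:
    size, start, aligned = attribute_mode_start(b, a)
    print(b, a, size, "&%04X" % start, "&%04X" % aligned)
```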
1043
1044 Enhancement: MODE 7 Emulation using Character Attributes
1045 --------------------------------------------------------
1046
1047 If the scheme of applying attributes to character regions were employed to
1048 emulate MODE 7, in conjunction with the MODE 6 display technique, the
1049 following configuration would be required:
1050
Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
------------  -------  ----  ---------  ---------  -------  --------------
320           40       25    40         20         16       &5ECC -> &5E00
320           40       25    40         10         4        &5FC6 -> &5F00
1055
1056 Although this requires much more memory than MODE 7 (8500 bytes versus MODE
1057 7's 1000 bytes), it does not need much more memory than MODE 6, and it would
1058 at least make a limited 40-column multicolour mode available as a substitute
1059 for MODE 7.
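
The memory figures quoted for this configuration can be derived as follows.
The Python sketch assumes 8 stored pixel lines per character row (as with the
MODE 6 display technique) and screen memory ending at &8000; the names are
illustrative.

```python
COLUMNS = 40          # character columns
ROWS = 25             # character rows, as in MODE 6 and MODE 7
LINES_PER_ROW = 8     # pixel lines stored for each character row
SCREEN_END = 0x8000   # assumption: screen memory ends at the top of RAM


def emulation_layout(attribute_bytes_per_row):
    """Return (size in bytes, exact start, start rounded down to a page)."""
    bitmap = COLUMNS * LINES_PER_ROW * ROWS
    size = bitmap + attribute_bytes_per_row * ROWS
    start = SCREEN_END - size
    return size, start, start & 0xFF00


mode7_size = COLUMNS * ROWS   # MODE 7 stores one byte per character

print(emulation_layout(20), emulation_layout(10), mode7_size)
```

The bitmap alone accounts for 8000 bytes, so the attribute data adds only 500
or 250 bytes to what a MODE 6 style display already requires.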
1060
1061 Enhancement: High Resolution Graphics
1062 -------------------------------------
1063
1064 Screen modes with higher resolutions and larger colour depths might be
1065 possible, but this would in most cases involve the allocation of more screen
1066 memory, and the ULA would probably then be obliged to page in such memory for
1067 the CPU to be able to sensibly access it all.
1068
1069 Enhancement: Genlock Support
1070 ----------------------------
1071
1072 The ULA generates a video signal in conjunction with circuitry producing the
1073 output features necessary for the correct display of the screen image.
1074 However, it appears that the ULA drives the video synchronisation mechanism
1075 instead of reacting to an existing signal. Genlock support might be possible
1076 if the ULA were made to be responsive to such external signals, resetting its
1077 address generators upon receiving synchronisation events.
1078
1079 Enhancement: Improved Sound
1080 ---------------------------
1081
The standard ULA reserves &FE*6 for sound generation and cassette input/output
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
cassette I/O), thus making it impossible to support multiple channels within
the given framework. The BBC Micro employs &FE40-&FE4F (the system VIA) for
sound control, and an enhanced ULA could adopt this interface.
1087
1088 The BBC Micro uses the SN76489 chip to produce sound, and the entire
1089 functionality of this chip could be emulated for enhanced sound, with a subset
1090 of the functionality exposed via the &FE*6 interface.
1091
1092 See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
1093 See: http://www.smspower.org/Development/SN76489
1094
1095 Enhancement: Waveform Upload
1096 ----------------------------
1097
1098 As with a hardware sprite function, waveforms could be uploaded or referenced
1099 using locations as registers referencing memory regions.
1100
1101 Enhancement: Sound Input/Output
1102 -------------------------------
1103
1104 Since the ULA already controls audio input/output for cassette-based data, it
1105 would have been interesting to entertain the idea of sampling and output of
1106 sounds through the cassette interface. However, a significant amount of
1107 circuitry is employed to process the input signal for use by the ULA and to
1108 process the output signal for recording.
1109
1110 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11
1111
1112 Enhancement: BBC ULA Compatibility
1113 ----------------------------------
1114
1115 Although some new ULA functions could be defined in a way that is also
1116 compatible with the BBC Micro, the BBC ULA is itself incompatible with the
1117 Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
1118 map, but controls various functions specific to the 6845 video controller;
1119 &FE08-F is reserved for the serial controller. It therefore becomes possible
1120 to disregard compatibility where compatibility is already disregarded for a
1121 particular area of functionality.
1122
&FE20-F maps to video ULA functionality on the BBC Micro which provides
control over the palette (using address &FE21, compared to &FE08-F on the
Electron) and other system-specific functions. Since the location usage is
generally incompatible, this region could be reused for other purposes.
1127
1128 Enhancement: Increased RAM, ULA and CPU Performance
1129 ---------------------------------------------------
1130
1131 More modern implementations of the hardware might feature faster RAM coupled
1132 with an increased ULA clock frequency in order to increase the bandwidth
1133 available to the ULA and to the CPU in situations where the ULA is not needed
1134 to perform work. A ULA employing a 32MHz clock would be able to complete the
1135 retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
1136 to access the RAM for the following 250ns even in display modes requiring the
1137 retrieval of a byte for the display every 500ns. The CPU could, subject to
1138 timing issues, run at 2MHz even in MODE 0, 1 and 2.
1139
1140 A scheme such as that described above would have a similar effect to the
1141 scheme employed in the BBC Micro, although the latter made use of RAM with a
1142 wider bandwidth in order to complete memory transfers within 250ns and thus
1143 permit the CPU to run continuously at 2MHz.
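
The timing argument can be expressed as a small calculation. The Python
sketch below assumes that retrieving a byte occupies 8 clock cycles, a figure
inferred from the 500ns needed at the existing 16MHz clock rather than taken
from any specification; the function names are illustrative.

```python
NS_PER_SECOND = 1_000_000_000
DISPLAY_SLOT_NS = 500   # a display byte is needed every 500ns in MODE 0, 1, 2
CYCLES_PER_BYTE = 8     # assumption: 500ns at 16MHz corresponds to 8 cycles


def byte_fetch_ns(clock_hz):
    """Time taken by the ULA to retrieve one byte from RAM."""
    return CYCLES_PER_BYTE * NS_PER_SECOND // clock_hz


def cpu_window_ns(clock_hz):
    """Time left for the CPU within each 500ns display slot."""
    return DISPLAY_SLOT_NS - byte_fetch_ns(clock_hz)


for clock_hz in (16_000_000, 32_000_000):
    print(clock_hz, byte_fetch_ns(clock_hz), cpu_window_ns(clock_hz))
```

At 16MHz nothing is left over for the CPU in these modes, whereas at 32MHz
half of each slot becomes available.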
1144
1145 Higher bandwidth could potentially be used to implement exotic features such
1146 as RAM-resident hardware sprites or indeed any feature demanding RAM access
1147 concurrent with the production of the display image.
1148
1149 Enhancement: Multiple CPU Stacks and Zero Pages
1150 -----------------------------------------------
1151
The 6502 maintains a stack for subroutine calls and register storage in page
&01. Although the stack pointer can be manipulated using the TSX and TXS
instructions, thereby permitting the maintenance of multiple stack regions and
thus the potential coexistence of multiple programs each using a separate
region, only programs that make little use of the stack (perhaps avoiding
deeply-nested subroutine invocations and significant register storage) would
be able to coexist without overwriting each other's stacks.
1159
1160 One way that this issue could be alleviated would involve the provision of a
1161 facility to redirect accesses to page &01 to other areas of memory. The ULA
1162 would provide a register that defines a physical page for the use of the CPU's
1163 "logical" page &01, and upon any access to page &01 by the CPU, the ULA would
1164 change the asserted address lines to redirect the access to the appropriate
1165 physical region.
1166
1167 By providing an 8-bit register, mapping to the most significant byte (MSB) of
1168 a 16-bit address, the ULA could then replace any MSB equal to &01 with the
1169 register value before the access is made. Where multiple programs coexist,
1170 upon switching programs, the register would be updated to point the ULA to the
1171 appropriate stack location, thus providing a simple memory management unit
1172 (MMU) capability.
1173
1174 In a similar fashion, zero page accesses could also be redirected so that code
1175 could run from sideways RAM and have zero page operations redirected to "upper
1176 memory" - for example, to page &BE (with stack accesses redirected to page
1177 &BF, perhaps) - thereby permitting most CPU operations to occur without
1178 inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
1179 CPU as it contends with the ULA for memory access.
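
The address translation involved is simple enough to model in a few lines of
Python. The class below is a sketch of the idea only: the register names are
hypothetical, and the default values leave all accesses unchanged so that
existing software would behave as before.

```python
class PageRedirector:
    """Model of a ULA redirecting zero page and stack page accesses."""

    def __init__(self, zero_page=0x00, stack_page=0x01):
        self.zero_page = zero_page     # physical page for logical page &00
        self.stack_page = stack_page   # physical page for logical page &01

    def translate(self, address):
        """Return the physical address for a 16-bit CPU address."""
        msb, lsb = address >> 8, address & 0xFF
        if msb == 0x00:
            msb = self.zero_page
        elif msb == 0x01:
            msb = self.stack_page
        return (msb << 8) | lsb


# Redirect zero page to page &BE and the stack to page &BF, as suggested above.
redirector = PageRedirector(zero_page=0xBE, stack_page=0xBF)
print("&%04X" % redirector.translate(0x0070))
print("&%04X" % redirector.translate(0x01FF))
```

Switching between programs would then involve little more than updating the
two register values.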
1180
1181 Such facilities could also be provided by a separate circuit between the CPU
1182 and ULA in a fashion similar to that employed by a "turbo" board, but unlike
1183 such boards, no additional RAM would be provided: all memory accesses would
1184 occur as normal through the ULA, albeit redirected when configured
1185 appropriately.
1186
1187 ULA Pin Functions
1188 -----------------
1189
1190 The functions of the ULA pins are described in the Electron Service Manual. Of
1191 interest to video processing are the following:
1192
1193 CSYNC (low during horizontal or vertical synchronisation periods, high
1194 otherwise)
1195
1196 HS (low during horizontal synchronisation periods, high otherwise)
1197
1198 RED, GREEN, BLUE (pixel colour outputs)
1199
1200 CLOCK IN (a 16MHz clock input, 4V peak to peak)
1201
PHI OUT (a 1MHz, 2MHz or stopped clock signal for the CPU)
1203
1204 More general memory access pins:
1205
1206 RAM0...RAM3 (data lines to/from the RAM)
1207
1208 RA0...RA7 (address lines for sending both row and column addresses to the RAM)
1209
1210 RAS (row address strobe setting the row address on a negative edge - see the
1211 timing notes)
1212
1213 CAS (column address strobe setting the column address on a negative edge -
1214 see the timing notes)
1215
1216 WE (sets write enable with logic 0, read with logic 1)
1217
1218 ROM (select data access from ROM)
1219
1220 CPU-oriented memory access pins:
1221
1222 A0...A15 (CPU address lines)
1223
1224 PD0...PD7 (CPU data lines)
1225
1226 R/W (indicates CPU write with logic 0, CPU read with logic 1)
1227
1228 Interrupt-related pins:
1229
1230 NMI (CPU request for uninterrupted 1MHz access to memory)
1231
1232 IRQ (signal event to CPU)
1233
1234 POR (power-on reset, resetting the ULA on a positive edge and asserting the
1235 CPU's RST pin)
1236
1237 RST (master reset for the CPU signalled on power-up and by the Break key)
1238
1239 Keyboard-related pins:
1240
1241 KBD0...KBD3 (keyboard inputs)
1242
1243 CAPS LOCK (control status LED)
1244
1245 Sound-related pins:
1246
1247 SOUND O/P (sound output using internal oscillator)
1248
1249 Cassette-related pins:
1250
CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)
1252
1253 CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)
1254
1255 CAS RC (detect high tone)
1256
1257 CAS MO (motor relay output)
1258
1259 ÷13 IN (~1200 baud clock input)
1260
1261 ULA Socket
1262 ----------
1263
1264 The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.
1265
1266 References
1267 ----------
1268
1269 See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm
1270
1271 About this Document
1272 -------------------
1273
1274 The most recent version of this document and accompanying distribution should
1275 be available from the following location:
1276
1277 http://hgweb.boddie.org.uk/ULA
1278
1279 Copyright and licence information can be found in the docs directory of this
1280 distribution - see docs/COPYING.txt for more information.