113:d4fc92a0628e
2017-04-12 Paul Boddie Noted that the ULA could re-read the character values and just increment an index into the character bitmap data, moving to the next set of values when changing character lines, rather than having to cache the character values.
The Acorn Electron ULA
======================

Principal Design and Feature Constraints
----------------------------------------

The features of the ULA are limited by the amount of time and resources that
can be allocated to each activity necessary to support such features given the
fundamental obligations of the unit. Maintaining a screen display based on the
contents of RAM itself requires the ULA to have exclusive access to such
hardware resources for a significant period of time. Whilst other elements of
the ULA can in principle run in parallel with this activity, they cannot also
access the RAM. Consequently, other features that might use the RAM must
accept a reduced allocation of that resource in comparison to a hypothetical
architecture where concurrent RAM access is possible.

Thus, the principal constraint for many features is bandwidth. The duration of
access to hardware resources is one aspect of this; the rate at which such
resources can be accessed is another. For example, the RAM is not fast enough
to support access more frequently than one byte per 2MHz cycle, and for screen
modes involving 80 bytes of screen data per scanline, there are no free cycles
for anything other than the production of pixel output during the active
scanline periods.

Timing
------

According to 15.3.2 in the Advanced User Guide, there are 312 scanlines, 256
of which are used to generate pixel data. At 50Hz, this means that 128 cycles
are spent on each scanline (2000000 cycles / 50 = 40000 cycles; 40000 cycles /
312 ~= 128 cycles). This is consistent with the observation that each scanline
requires at most 80 bytes of data, and that the ULA is apparently busy for 40
out of 64 microseconds in each scanline.

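The arithmetic above can be checked with a short sketch. The constant and
function names here are illustrative only and do not come from any Acorn
documentation; the figures are those quoted in the paragraph above.

```python
# Figures quoted in the text; names are hypothetical and for illustration only.
CLOCK_HZ = 2_000_000     # 2MHz RAM access clock
FIELD_RATE_HZ = 50       # fields per second
LINES_PER_FIELD = 312    # scanlines per field


def cycles_per_scanline():
    """Return the (fractional) number of 2MHz cycles in one scanline."""
    cycles_per_field = CLOCK_HZ // FIELD_RATE_HZ   # 40000
    return cycles_per_field / LINES_PER_FIELD      # slightly over 128


def duration_us(cycles):
    """Return the duration in microseconds of a number of 2MHz cycles."""
    return cycles * 1_000_000 / CLOCK_HZ


if __name__ == "__main__":
    print(round(cycles_per_scanline()))   # whole cycles per scanline
    print(duration_us(128))               # length of a scanline in microseconds
    print(duration_us(80))                # time spent fetching 80 bytes
```

Rounding to 128 cycles gives the 64 microsecond scanline, and 80 cycles of
fetching account for the 40 microseconds in which the ULA is busy.
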
(In fact, since the ULA is seeking to provide an image for an interlaced
625-line display, there are two "fields" involved, one providing 312
scanlines and one providing 313 scanlines. See below for a description of the
video system.)

Access to RAM involves accessing four 64Kb dynamic RAM devices (IC4 to IC7,
each providing two bits of each byte) using two cycles within the 500ns period
of the 2MHz clock to complete each access operation. Since the CPU and ULA
have to take turns in accessing the RAM in MODE 4, 5 and 6, the CPU must
effectively run at 1MHz (since every other 500ns period involves the ULA
accessing RAM). The CPU is driven by an external clock (IC8) whose 16MHz
frequency is divided by the ULA (IC1) depending on the screen mode in use.

Each 16MHz cycle lasts 62.5ns. To access the memory, the following
patterns corresponding to 16MHz cycles are required:

     Time (ns):  0-------------- 500------------- ...
   2 MHz cycle:  0               1                ...
  16 MHz cycle:  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7  ...
                 /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ ...
          ~RAS:  /---\___________/---\___________ ...
          ~CAS:  /-----\___/-\___/-----\___/-\___ ...
Address events:      A B     C       A B     C    ...
   Data events:           F     S         F     S ...

      ~RAS ops:  1   0           1   0            ...
      ~CAS ops:  1     0   1 0   1     0   1 0    ...

   Address ops:     a b     c       a b     c     ...
      Data ops:  s         f     s         f      ...

           ~WE:  ......W                          ...
       PHI OUT:  \_______________/--------------- ...
     CPU (RAM):  L               D                ...
           RnW:  R                                ...

       PHI OUT:  \_______/-------\_______/------- ...
     CPU (ROM):  L       D       L       D        ...
           RnW:          R               R        ...

~RAS must be high for 100ns, ~CAS must be high for 50ns.
~RAS must be low for 150ns, ~CAS must be low for 90ns.
Data is available 150ns after ~RAS goes low, 90ns after ~CAS goes low.

Here, "A" and "B" respectively indicate the row and first column addresses
being latched into the RAM (on a negative edge for ~RAS and ~CAS
respectively), and "C" indicates the second column address being latched into
the RAM. Presumably, the first and second half-bytes can be read at "F" and
"S" respectively, and the row and column addresses must be made available at
"a" and "b" (and "c") respectively at the latest. Data can be read at "f" and
"s" for the first and second half-bytes respectively.

For the CPU, "L" indicates the point at which an address is taken from the CPU
address bus, on a negative edge of PHI OUT, with "D" being the point at which
data may either be read or be asserted for writing, on a positive edge of PHI
OUT. Here, PHI OUT is driven at 1MHz. Given that ~WE needs to be driven low
for writing or high for reading, and thus propagates RnW from the CPU, this
would need to be done before data would be retrieved and, according to the
TM4164EC4 datasheet, even as late as the column address is presented and ~CAS
brought low.

The TM4164EC4-15 has a row address access time of 150ns (maximum) and a column
address access time of 90ns (maximum), which appears to mean that ~RAS must be
held low for at least 150ns and that ~CAS must be held low for at least 90ns
before data becomes available. 150ns is 2.4 cycles (at 16MHz) and 90ns is 1.44
cycles. Thus, "A" to "F" is 2.5 cycles, "B" to "F" is 1.5 cycles, "C" to "S"
is 1.5 cycles.

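The conversion from datasheet access times to 16MHz cycles can be sketched as
follows. The rounding up to the next half cycle (the next clock edge) is an
assumption made here to reproduce the 2.5 and 1.5 cycle figures above; the
function name is hypothetical.

```python
import math

CYCLE_NS = 62.5   # duration of one 16MHz cycle


def cycles_needed(access_ns, step=0.5):
    """Return an access time in 16MHz cycles, rounded up to a multiple of
    step. A step of 0.5 corresponds to the next clock edge (an assumption)."""
    exact = access_ns / CYCLE_NS          # 150ns is 2.4 cycles, 90ns is 1.44
    return math.ceil(exact / step) * step


if __name__ == "__main__":
    print(cycles_needed(150))   # row access: "A" to "F"
    print(cycles_needed(90))    # column access: "B" to "F" and "C" to "S"
```
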
Note that the Service Manual refers to the negative edge of RAS and CAS, but
the datasheet for the similar TM4164EC4 product shows latching on the negative
edge of ~RAS and ~CAS. It is possible that the Service Manual also intended to
communicate the latter behaviour. In the TM4164EC4 datasheet, it appears that
"page mode" provides the appropriate behaviour for that particular product.

The CPU, when accessing the RAM alone, apparently does not make use of the
vacated "slot" that the ULA would otherwise use (when interleaving accesses in
MODE 4, 5 and 6). It only employs a full 2MHz access frequency to memory when
accessing ROM (and potentially sideways RAM). The principal limitation is the
amount of time needed between issuing an address and receiving an entire byte
from the RAM, which is approximately 7 cycles (at 16MHz): much longer than the
4 cycles that would be required for 2MHz operation.

See: Acorn Electron Advanced User Guide
See: Acorn Electron Service Manual
     http://acorn.chriswhy.co.uk/docs/Acorn/Manuals/Acorn_ElectronSM.pdf
See: http://mdfs.net/Docs/Comp/Electron/Techinfo.htm
See: http://stardot.org.uk/forums/viewtopic.php?p=120438#p120438

CPU Clock Notes
---------------

"The 6502 receives an external square-wave clock input signal on pin 37, which
is usually labeled PHI0. [...] This clock input is processed within the 6502
to form two clock outputs: PHI1 and PHI2 (pins 3 and 39, respectively). PHI2
is essentially a copy of PHI0; more specifically, PHI2 is PHI0 after it's been
through two inverters and a push-pull amplifier. The same network of
transistors within the 6502 which generates PHI2 is also tied to PHI1, and
generates PHI1 as the inverse of PHI0. The reason why PHI1 and PHI2 are made
available to external devices is so that they know when they can access the
CPU. When PHI1 is high, this means that external devices can read from the
address bus or data bus; when PHI2 is high, this means that external devices
can write to the data bus."

See: http://lateblt.livejournal.com/88105.html

"The 6502 has a synchronous memory bus where the master clock is divided into
two phases (Phase 1 and Phase 2). The address is always generated during Phase
1 and all memory accesses take place during Phase 2."

See: http://www.jmargolin.com/vgens/vgens.htm

Thus, the inverse of PHI OUT provides the "other phase" of the clock. "During
Phase 1" means when PHI0 - really PHI2 - is high and "during Phase 2" means
when PHI1 is high.

Bandwidth Figures
-----------------

Using an observation of 128 2MHz cycles per scanline, 256 active lines and 312
total lines, with 80 cycles occurring in the active periods of display
scanlines, the following bandwidth calculations can be performed:

Total theoretical maximum:
       128 cycles * 312 lines
     = 39936 bytes

MODE 0, 1, 2:
ULA:    80 cycles * 256 lines
     = 20480 bytes
CPU:    48 cycles / 2 * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 9728 bytes

MODE 3:
ULA:    80 cycles * 24 rows * 8 lines
     = 15360 bytes
CPU:    48 cycles / 2 * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 12288 bytes

MODE 4, 5:
ULA:    40 cycles * 256 lines
     = 10240 bytes
CPU:   (40 cycles + 48 cycles / 2) * 256 lines
     + 128 cycles / 2 * (312 - 256) lines
     = 19968 bytes

MODE 6:
ULA:    40 cycles * 24 rows * 8 lines
     = 7680 bytes
CPU:   (40 cycles + 48 cycles / 2) * 24 rows * 8 lines
     + 128 cycles / 2 * (312 - (24 rows * 8 lines))
     = 19968 bytes

Here, the division by 2 for CPU accesses is performed to indicate that the CPU
only uses every other access opportunity even in uncontended periods. See the
2MHz RAM Access enhancement below for bandwidth calculations that consider
this limitation removed.

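The figures above follow one pattern, which can be captured in a small sketch.
The function and constant names are illustrative; the only inputs are the
number of cycles per display line taken by the ULA (80 or 40) and the number
of display lines (256, or 192 for the text modes with 24 rows of 8 lines).

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
TOTAL_LINES = 312       # scanlines per field
ACTIVE_CYCLES = 80      # cycles in the active part of a display scanline


def bandwidth(ula_cycles, display_lines):
    """Return (ula_bytes, cpu_bytes) per field for a screen mode."""
    ula = ula_cycles * display_lines
    # Active cycles not used by the ULA: 40 in MODE 4, 5 and 6, where the
    # CPU and ULA alternate, and none in the 80 byte modes.
    interleaved = ACTIVE_CYCLES - ula_cycles
    # Outside the active part, the CPU uses only every other cycle.
    idle = (CYCLES_PER_LINE - ACTIVE_CYCLES) // 2
    cpu = (interleaved + idle) * display_lines
    cpu += CYCLES_PER_LINE // 2 * (TOTAL_LINES - display_lines)
    return ula, cpu


if __name__ == "__main__":
    for name, cycles, lines in (("MODE 0, 1, 2", 80, 256),
                                ("MODE 3", 80, 24 * 8),
                                ("MODE 4, 5", 40, 256),
                                ("MODE 6", 40, 24 * 8)):
        print(name, bandwidth(cycles, lines))
```
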
Video Timing
------------

According to 8.7 in the Service Manual, and the PAL Wikipedia page,
approximately 4.7µs is used for the sync pulse, 5.7µs for the "back porch"
(including the "colour burst"), and 1.65µs for the "front porch", totalling
12.05µs and thus leaving 51.95µs for the active video signal for each
scanline. As the Service Manual suggests in the oscilloscope traces, the
display information is transmitted more or less centred within the active
video period since the ULA will only be providing pixel data for 40µs in each
scanline.

Each 62.5ns cycle happens to correspond to 64µs divided by 1024, meaning that
each scanline can be divided into 1024 cycles, although only 640 at most are
actively used to provide pixel data. Pixel data production should only occur
within a certain period on each scanline, approximately 262 cycles after the
start of hsync:

  active video period = 51.95µs
  pixel data period = 40µs
  total silent period = 51.95µs - 40µs = 11.95µs
  silent periods (before and after) = 11.95µs / 2 = 5.975µs
  hsync and back porch period = 4.7µs + 5.7µs = 10.4µs
  time before pixel data period = 10.4µs + 5.975µs = 16.375µs
  pixel data period start cycle = 16.375µs / 62.5ns = 262

By choosing a number divisible by 8, the RAM access mechanism can be
synchronised with the pixel production. Thus, 256 is a more appropriate start
cycle, where the HS (horizontal sync) signal corresponding to the 4µs sync
pulse (or "normal sync" pulse as described by the "PAL TV timing and voltages"
document) occurs at cycle 0.

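The calculation of the centred start cycle, and its adjustment to a multiple
of 8, might be sketched as follows. Times are held in nanoseconds to keep the
arithmetic exact; all names are illustrative, and rounding downwards (rather
than to the nearest multiple) is an assumption consistent with the choice of
256 above.

```python
CYCLE_NS = 62.5          # one 16MHz cycle
LINE_NS = 64_000         # one scanline
SYNC_NS = 4_700          # sync pulse
BACK_PORCH_NS = 5_700    # back porch, including colour burst
FRONT_PORCH_NS = 1_650   # front porch
PIXEL_NS = 40_000        # period in which pixel data is produced


def centred_start_cycle():
    """Return the 16MHz cycle at which centred pixel data would start."""
    active = LINE_NS - (SYNC_NS + BACK_PORCH_NS + FRONT_PORCH_NS)
    margin = (active - PIXEL_NS) / 2
    return (SYNC_NS + BACK_PORCH_NS + margin) / CYCLE_NS


def align_down(cycle, unit=8):
    """Round a cycle number down to a multiple of unit (8 cycles at 16MHz
    being one 2MHz RAM access period)."""
    return int(cycle) // unit * unit


if __name__ == "__main__":
    start = centred_start_cycle()
    print(start, align_down(start))
```
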
To summarise:

  HS signal starts at cycle 0 on each horizontal scanline
  HS signal ends approximately 4µs later at cycle 64
  Pixel data starts approximately 12µs later at cycle 256

"Re: Electron Memory Contention" provides measurements that appear consistent
with these calculations.

The "vertical blanking period", meaning the period before picture information
in each field, is 25 lines out of 312 (or 313) and thus lasts for 1.6ms. Of
this, 2.5 lines occur before the vsync (field sync) which also lasts for 2.5
lines. Thus, the first visible scanline on the first field of a frame occurs
half way through the 23rd scanline period measured from the start of vsync
(indicated by "V" in the diagrams below):

                                        10                  20    23
  Line in frame:       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    Line from 1:       0                                          22 3
 Line on screen: .:::::VVVVV:::::                                   12233445566
                  |_________________________________________________|
                           25 line vertical blanking period

In the second field of a frame, the first visible scanline coincides with the
24th scanline period measured from the start of line 313 in the frame:

               310                                                 336
  Line in frame: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
  Line from 313:       0                                            23 4
 Line on screen: 88:::::VVVVV::::                                    11223344
               288 |                                                 |
                   |_________________________________________________|
                            25 line vertical blanking period

In order to consider only full lines, we might consider the start of each
frame to occur 23 lines after the start of vsync.

Again, it is likely that pixel data production should only occur on scanlines
within a certain period on each frame. The "625/50" document indicates that
only a certain region is "safe" to use, suggesting a vertically centred region
with approximately 15 blank lines above and below the picture. However, the
"PAL TV timing and voltages" document suggests 28 blank lines above and below
the picture. This would centre the 256 lines within the 312 lines of each
field and thus provide a start of picture approximately 5.5 or 5 lines after
the end of the blanking period or 28 or 27.5 lines after the start of vsync.

To summarise:

  CSYNC signal starts at cycle 0
  CSYNC signal ends approximately 160µs (2.5 lines) later at cycle 2560
  Start of line occurs approximately 1632µs (25.5 lines) later at cycle 28672

See: http://en.wikipedia.org/wiki/PAL
See: http://en.wikipedia.org/wiki/Analog_television#Structure_of_a_video_signal
See: The 625/50 PAL Video Signal and TV Compatible Graphics Modes
     http://lipas.uwasa.fi/~f76998/video/modes/
See: PAL TV timing and voltages
     http://www.retroleum.co.uk/electronics-articles/pal-tv-timing-and-voltages/
See: Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/Line-Standards.html
See: Horizontal Blanking Interval of 405-, 525-, 625- and 819-Line Standards
     http://www.pembers.freeserve.co.uk/World-TV-Standards/HBI.pdf
See: Re: Electron Memory Contention
     http://www.stardot.org.uk/forums/viewtopic.php?p=134109#p134109

RAM Integrated Circuits
-----------------------

Unicorn Electronics appears to offer 4164 RAM chips (as well as 6502 series
CPUs such as the 6502, 6502A, 6502B and 65C02). These 4164 devices are
available in 100ns (4164-100), 120ns (4164-120) and 150ns (4164-150) variants,
have 16 pins and address 65536 bits through a 1-bit wide channel. Similarly,
ByteDelight.com sell 4164 devices primarily for the ZX Spectrum.

The documentation for the Electron mentions 4164-15 RAM chips for IC4-7, and
the Samsung-produced KM41464 series is apparently equivalent to the Texas
Instruments 4164 chips presumably used in the Electron.

The TM4164EC4 series combines 4 64K x 1b units into a single package and
appears similar to the TM4164EA4 featured on the Electron's circuit diagram
(in the Advanced User Guide but not the Service Manual), and it also has 22
pins providing 3 additional inputs and 3 additional outputs over the 16 pins
of the individual 4164-15 modules, presumably allowing concurrent access to
the packaged memory units.

As far as currently available replacements are concerned, the NTE4164 is a
potential candidate: according to the Vetco Electronics entry, it is
supposedly a replacement for the TMS4164-15 amongst many other parts. Similar
parts include the NTE2164 and the NTE6664, both of which appear to have
largely the same performance and connection characteristics. Meanwhile, the
NTE21256 appears to be a 16-pin replacement with four times the capacity that
maintains the single data input and output pins. Using the NTE21256 as a
replacement for all ICs combined would be difficult because of the single bit
output.

Another device equivalent to the 4164-15 appears to be available under the
code 41662 from Jameco Electronics as the Siemens HYB 4164-2. The Jameco Web
site lists data sheets for other devices on the same page, but these are
different and actually appear to be provided under the 41574 product code (but
are listed under 41464-10) and appear to be replacements for the TM4164EC4:
the Samsung KM41464A-15 and NEC µPD41464 employ 18 pins, eliminating 4 pins by
employing 4 pins for both input and output.

            Pins    I/O pins    Row access  Column access
            ----    --------    ----------  -------------
TM4164EC4   22      4 + 4       150ns (15)  90ns (15)
KM41464AP   18      4           150ns (15)  75ns (15)
NTE21256    16      1 + 1       150ns       75ns
HYB 4164-2  16      1 + 1       150ns       100ns
µPD41464    18      4           120ns (12)  60ns (12)

See: TM4164EC4 65,536 by 4-Bit Dynamic RAM Module
     http://www.datasheetarchive.com/dl/Datasheets-112/DSAP0051030.pdf
See: Dynamic RAMS
     http://www.unicornelectronics.com/IC/DYNAMIC.html
See: New old stock 8x 4164 chips
     http://www.bytedelight.com/?product=8x-4164-chips-new-old-stock
See: KM4164B 64K x 1 Bit Dynamic RAM with Page Mode
     http://images.ihscontent.net/vipimages/VipMasterIC/IC/SAMS/SAMSD020/SAMSD020-45.pdf
See: NTE2164 Integrated Circuit 65,536 X 1 Bit Dynamic Random Access Memory
     http://www.vetco.net/catalog/product_info.php?products_id=2806
See: NTE4164 - IC-NMOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=3680
See: NTE21256 - IC-256K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=2799
See: NTE21256 262,144-Bit Dynamic Random Access Memory (DRAM)
     http://www.nteinc.com/specs/21000to21999/pdf/nte21256.pdf
See: NTE6664 - IC-MOS 64K DRAM 150NS
     http://www.vetco.net/catalog/product_info.php?products_id=5213
See: NTE6664 Integrated Circuit 64K-Bit Dynamic RAM
     http://www.nteinc.com/specs/6600to6699/pdf/nte6664.pdf
See: 4164-150: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41662_-1
See: HYB 4164-1, HYB 4164-2, HYB 4164-3 65,536-Bit Dynamic Random Access Memory (RAM)
     http://www.jameco.com/Jameco/Products/ProdDS/41662SIEMENS.pdf
See: KM41464A NMOS DRAM 64K x 4 Bit Dynamic RAM with Page Mode
     http://www.jameco.com/Jameco/Products/ProdDS/41662SAM.pdf
See: NEC µ41464 65,536 x 4-Bit Dynamic NMOS RAM
     http://www.jameco.com/Jameco/Products/ProdDS/41662NEC.pdf
See: 41464-10: MAJOR BRANDS
     http://www.jameco.com/webapp/wcs/stores/servlet/Product_10001_10001_41574_-1

Interrupts
----------

The ULA generates IRQs (maskable interrupts) according to certain conditions
and these conditions are controlled by location &FE00:

  * Vertical sync (bottom of displayed screen)
  * 50Hz real time clock
  * Transmit data empty
  * Receive data full
  * High tone detect

The ULA is also used to clear interrupt conditions through location &FE05. Of
particular significance is bit 7, which must be set if an NMI (non-maskable
interrupt) has occurred and has thus suspended ULA access to memory, restoring
the normal function of the ULA.

ROM Paging
----------

Accessing different ROMs involves bits 0 to 3 of &FE05. Some special ROM
mappings exist:

   8    keyboard
   9    keyboard (duplicate)
  10    BASIC ROM
  11    BASIC ROM (duplicate)

Paging in a ROM involves the following procedure:

 1. Assert ROM page enable (bit 3) together with a ROM number n in bits 0 to
    2, corresponding to ROM number 8+n, such that one of ROMs 12 to 15 is
    selected.
 2. Where a ROM numbered from 0 to 7 is to be selected, set bit 3 to zero
    whilst writing the desired ROM number n in bits 0 to 2.

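The procedure above can be sketched as a function producing the sequence of
values to be written to &FE05. This is only a model of the two steps: the
function name is hypothetical, the choice of ROM 12 as the intermediate
selection is arbitrary, the treatment of ROMs 8 to 11 as a single write is an
assumption not spelled out above, and the interrupt clearing bits (4 to 7) of
the same location are simply left at zero.

```python
ROM_PAGE_ENABLE = 0x08   # bit 3 of &FE05


def paging_writes(rom, intermediate=12):
    """Return the values to write to &FE05, in order, to page in a ROM."""
    if not 0 <= rom <= 15:
        raise ValueError("ROM number must be in the range 0 to 15")
    if not 12 <= intermediate <= 15:
        raise ValueError("intermediate ROM must be in the range 12 to 15")
    if rom >= 8:
        # Step 1 only: bit 3 set, with n = rom - 8 in bits 0 to 2.
        # (Assumed to apply to the special mappings 8 to 11 as well.)
        return [ROM_PAGE_ENABLE | (rom - 8)]
    # Step 1 selecting one of ROMs 12 to 15, then step 2 with bit 3 clear.
    return [ROM_PAGE_ENABLE | (intermediate - 8), rom]


if __name__ == "__main__":
    print([hex(value) for value in paging_writes(13)])
    print([hex(value) for value in paging_writes(5)])
```
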
See: http://stardot.org.uk/forums/viewtopic.php?p=136686#p136686

Shadow/Expanded Memory
----------------------

The Electron exposes all sixteen address lines and all eight data lines
through the expansion bus. Using such lines, it is possible to provide
additional memory - typically sideways ROM and RAM - on expansion cards and
through cartridges, although the official cartridge specification provides
fewer address lines and only seeks to provide access to memory in 16K units.

Various modifications and upgrades were developed to offer "turbo"
capabilities to the Electron, permitting the CPU to access a separate 8K of
RAM at 2MHz, presumably preventing access to the low 8K of RAM accessible via
the ULA through additional logic. However, an enhanced ULA might support
independent CPU access to memory over the expansion bus by allowing itself to
be discharged from providing access to memory, potentially for a range of
addresses, and for the CPU to communicate with external memory uninterrupted.

Sideways RAM/ROM and Upper Memory Access
----------------------------------------

Although the ULA controls the CPU clock, effectively slowing or stopping the
CPU when the ULA needs to access screen memory, it is apparently able to allow
the CPU to access addresses of &8000 and above - the upper region of memory -
at 2MHz independently of any access to RAM that the ULA might be performing,
only blocking the CPU if it attempts to access addresses of &7FFF and below
during any ULA memory access - the lower region of memory - by stopping or
stalling its clock.

Thus, the ULA remains aware of the level of the A15 line, only inhibiting the
CPU clock if the line goes low, when the CPU is attempting to access the lower
region of memory.

Hardware Scrolling (and Enhancement)
------------------------------------

On the standard ULA, &FE02 and &FE03 map to an address with 9 significant
bits (bits 14 to 6), with the least significant 5 bits of the register pair
being zero, thus limiting the scrolling resolution to 64 bytes. An enhanced
ULA could support a resolution of 2 bytes using the same layout of these
addresses.

|--&FE02--------------| |--&FE03--------------|
XX XX 14 13 12 11 10 09 08 07 06 XX XX XX XX XX

   XX 14 13 12 11 10 09 08 07 06 05 04 03 02 01 XX

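Reading the diagram as a single 16-bit field, address bit n occupies position
n - 1 in both the standard and the enhanced case, so the field is simply the
masked address shifted right by one place. The following sketch makes this
concrete; the function names are hypothetical, and the split into a "left" and
a "right" byte follows the order in which the bytes are drawn above.

```python
STANDARD_MASK = 0x7FC0   # address bits 14 to 6: 64 byte resolution
ENHANCED_MASK = 0x7FFE   # address bits 14 to 1: 2 byte resolution


def screen_start_field(address, enhanced=False):
    """Return the 16-bit field as drawn in the diagram for a start address."""
    mask = ENHANCED_MASK if enhanced else STANDARD_MASK
    return (address & mask) >> 1


def diagram_bytes(field):
    """Split the field into the left and right bytes of the diagram."""
    return field >> 8, field & 0xFF


if __name__ == "__main__":
    print([hex(b) for b in diagram_bytes(screen_start_field(0x3040))])
    print([hex(b) for b in diagram_bytes(screen_start_field(0x3042, True))])
```

With the standard mask, any offset smaller than 64 bytes is discarded, which
is the scrolling limitation described above.
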
Arguably, a resolution of 8 bytes is more useful, since the mapping of screen
memory to pixel locations is character oriented. A change in 8 bytes would
permit a horizontal scrolling resolution of 2 pixels in MODE 2, 4 pixels in
MODE 1 and 5, and 8 pixels in MODE 0, 3 and 6. This resolution is actually
observed on the BBC Micro (see 18.11.2 in the BBC Microcomputer Advanced User
Guide).

One argument for a 2 byte resolution is smooth vertical scrolling. A pitfall
of changing the screen address by 2 bytes is the change in the number of lines
from the initial and final character rows that need reading by the ULA, which
would need to maintain this state information (although this is a relatively
trivial change). Another pitfall is the complication that might be introduced
to software writing bitmaps of character height to the screen.

See: http://pastraiser.com/computers/acornelectron/acornelectron.html

Enhancement: Mode Layouts
-------------------------

Merely changing the screen memory mappings in order to have Archimedes-style
row-oriented screen addresses (instead of character-oriented addresses) could
be done for the existing modes, but this might not be sufficiently beneficial,
especially since accessing regions of the screen would involve incrementing
pointers by amounts that are inconvenient on an 8-bit CPU.

However, instead of using an Archimedes-style mapping, column-oriented screen
addresses could be more feasibly employed: incrementing the address would
reference the vertical screen location below the currently-referenced location
(just as occurs within characters using the existing ULA); instead of
returning to the top of the character row and referencing the next horizontal
location after eight bytes, the address would reference the next character row
and continue to reference locations downwards over the height of the screen
until reaching the bottom; at the bottom, the next location would be the next
horizontal location at the top of the screen.

In other words, the memory layout for the screen would resemble the following
(for MODE 2):

  &3000 &3100       ... &7F00
  &3001 &3101
  ...   ...
  &3007
  &3008
  ...
  ...                   ...
  &30FF             ... &7FFF

Since there are 256 pixel rows, each column of locations would be addressable
using the low byte of the address. Meanwhile, the high byte would be
incremented to address different columns. Thus, addressing screen locations
would become a lot more convenient and potentially much more efficient for
certain kinds of graphical output.

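A sketch of the address calculation for this column-oriented layout in MODE 2
is given below, alongside the calculation for the existing character-oriented
layout for comparison. The function names are illustrative; the figure of 80
byte columns follows from the &3000 to &7FFF range shown above, and the
existing layout is assumed to use 640 bytes per character row (80 locations
of 8 bytes each).

```python
SCREEN_BASE = 0x3000   # MODE 2 screen start
COLUMNS = 80           # byte columns in MODE 2
ROWS = 256             # pixel rows


def check(column, row):
    if not 0 <= column < COLUMNS or not 0 <= row < ROWS:
        raise ValueError("location is outside the screen")


def column_oriented_address(column, row):
    """Proposed layout: the high byte selects the column, the low byte the
    pixel row."""
    check(column, row)
    return SCREEN_BASE + (column << 8) + row


def character_oriented_address(column, row):
    """Existing layout: 8 bytes per character cell, 640 bytes per character
    row."""
    check(column, row)
    return SCREEN_BASE + (row // 8) * 640 + column * 8 + row % 8


if __name__ == "__main__":
    print(hex(column_oriented_address(1, 1)))
    print(hex(character_oriented_address(1, 1)))
```

In the proposed layout, moving down one pixel row is always an increment of
the low byte, whereas the existing layout needs a division and a multiply (or
their equivalents) whenever a character row boundary is crossed.
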
paul@82 503
paul@82 504
One potential complication with this simplified addressing scheme arises with
paul@82 505
hardware scrolling. Vertical hardware scrolling by one pixel row (not supported
paul@82 506
with the existing ULA) would be achieved by incrementing or decrementing the
paul@82 507
screen start address; by one character row, it would involve adding or
paul@82 508
subtracting 8. However, the ULA only supports multiples of 64 when changing the
paul@82 509
screen start address. Thus, if such a scheme were to be adopted, three
paul@82 510
additional bits would need to be supported in the screen start register (see
paul@82 511
"Hardware Scrolling (and Enhancement)" for more details). However, horizontal
paul@82 512
scrolling would be much improved even under the severe constraints of the
paul@82 513
existing ULA: only adjustments of 256 to the screen start address would be
paul@82 514
required to produce single-location scrolling of as few as two pixels in MODE 2
paul@82 515
(four pixels in MODEs 1 and 5, eight pixels otherwise).
paul@82 516
paul@82 517
More disruptive is the effect of this alternative layout on software.
paul@82 518
Presumably, compatibility with the BBC Micro was the primary goal of the
paul@82 519
Electron's hardware design. With the character-oriented screen layout in
paul@82 520
place, system software (and application software accessing the screen
paul@82 521
directly) would be relying on this layout to run on the Electron with little
paul@82 522
or no modification. Although it might have been possible to change the system
paul@82 523
software to use this column-oriented layout instead, this would have incurred
paul@82 524
a development cost and caused additional work porting things like games to the
paul@82 525
Electron. Moreover, a separate branch of the software from that supporting the
paul@82 526
BBC Micro and closer derivatives would then have needed maintaining.
paul@82 527
paul@82 528
The decision to use the character-oriented layout in the BBC Micro may have
paul@82 529
been related to the choice of circuitry and intended to facilitate a convenient
paul@82 530
hardware implementation, and by the time the Electron was planned, it was too
paul@82 531
late to do anything about this somewhat unfortunate choice.
paul@82 532
paul@89 533
Pixel Layouts
paul@89 534
-------------
paul@89 535
paul@89 536
The pixel layouts are as follows:
paul@89 537
paul@89 538
  Modes         Depth (bpp)     Pixels (from bits)
paul@89 539
  -----         -----------     ------------------
paul@89 540
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
paul@89 541
  1, 5          2               73 62 51 40
paul@89 542
  2             4               7531 6420
paul@89 543
paul@89 544
Since the ULA reads a half-byte at a time, one might expect it to attempt to
paul@89 545
produce pixels for every half-byte, as opposed to handling entire bytes.
paul@89 546
However, the pixel layout is not conducive to producing pixels as soon as a
paul@89 547
half-byte has been read for a given full-byte location: in 1bpp modes the
paul@89 548
first four pixels can indeed be produced, but in 2bpp and 4bpp modes the pixel
paul@89 549
data is spread across the entire byte in different ways.
paul@89 550
paul@89 551
An alternative arrangement might be as follows:
paul@89 552
paul@89 553
  Modes         Depth (bpp)     Pixels (from bits)
paul@89 554
  -----         -----------     ------------------
paul@89 555
  0, 3, 4, 6    1               7 6 5 4 3 2 1 0
paul@89 556
  1, 5          2               76 54 32 10
paul@89 557
  2             4               7654 3210
paul@89 558
paul@89 559
Just as the mode layouts were presumably decided by compatibility with the BBC
paul@89 560
Micro, the pixel layouts will have been maintained for similar reasons.
paul@89 561
Unfortunately, the existing layout prevents any optimisation of the ULA for handling
paul@89 562
half-byte pixel data generally.
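The effect of the two arrangements can be explored with a small Python sketch.
It transcribes both tables as lists of bit numbers, taking the first-listed
bit of each pixel as the most significant, and assumes purely for illustration
that the upper half-byte (bits 7 to 4) is the one read first. The names used
are hypothetical.

```python
# Bit numbers contributing to each pixel, most significant bit first.
EXISTING = {
    1: [[7], [6], [5], [4], [3], [2], [1], [0]],
    2: [[7, 3], [6, 2], [5, 1], [4, 0]],
    4: [[7, 5, 3, 1], [6, 4, 2, 0]],
}
ALTERNATIVE = {
    1: [[7], [6], [5], [4], [3], [2], [1], [0]],
    2: [[7, 6], [5, 4], [3, 2], [1, 0]],
    4: [[7, 6, 5, 4], [3, 2, 1, 0]],
}


def decode(byte, layout):
    """Return the pixel values encoded in a byte, leftmost pixel first."""
    pixels = []
    for bits in layout:
        value = 0
        for bit in bits:
            value = (value << 1) | ((byte >> bit) & 1)
        pixels.append(value)
    return pixels


def pixels_from_first_half(layout):
    """Count the pixels that depend only on the upper half-byte."""
    return sum(1 for bits in layout if min(bits) >= 4)
```

For the existing layouts, pixels_from_first_half gives 4, 0 and 0 pixels for
depths of 1, 2 and 4 bits per pixel; for the alternative layouts it gives 4, 2
and 1.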
paul@89 563
paul@79 564
Enhancement: The Missing MODE 4
paul@79 565
-------------------------------
paul@79 566
paul@79 567
The Electron inherits its screen mode selection from the BBC Micro, where MODE
paul@79 568
3 is a text version of MODE 0, and where MODE 6 is a text version of MODE 4.
paul@79 569
Neither MODE 3 nor MODE 6 is a genuine character-based text mode like MODE 7,
paul@79 570
however, and they are merely implemented by skipping two scanlines in every
paul@79 571
ten after the eight required to produce a character line. Thus, such modes
paul@79 572
provide a 25-row display.
paul@79 573
paul@79 574
In principle, nothing prevents this "text mode" effect being applied to other
paul@79 575
modes. The 20-column modes are not well-suited to displaying text, which
paul@79 576
leaves MODE 1 which, unlike MODEs 3 and 6, can display 4 colours rather than
paul@79 577
2. Although the need for a non-monochrome 40-column text mode is addressed by
paul@79 578
MODE 7 on the BBC Micro, the Electron lacks such a mode.
paul@79 579
paul@79 580
If the 4-colour, 25-row variant of MODE 1 were to be provided, logically it
paul@79 581
would occupy MODE 4 instead of the current MODE 4:
paul@79 582
paul@79 583
  Screen mode  Size (kilobytes)  Colours  Rows  Resolution
paul@79 584
  -----------  ----------------  -------  ----  ----------
paul@79 585
  0            20                2        32    640x256
paul@79 586
  1            20                4        32    320x256
paul@79 587
  2            20                16       32    160x256
paul@79 588
  3            16                2        25    640x256
paul@79 589
  4 (new)      16                4        25    320x256
paul@79 590
  4 (old)      10                2        32    320x256
paul@79 591
  5            10                4        32    160x256
paul@79 592
  6            8                 2        25    320x256
paul@79 593
paul@79 594
Thus, for increasing mode numbers, the size of each mode would be the same as
paul@79 595
or less than that of the preceding mode.
paul@79 596
paul@76 597
Enhancement: 2MHz RAM Access
paul@76 598
----------------------------
paul@76 599
paul@76 600
Given that the CPU and ULA both access RAM at 2MHz, but given that the CPU
paul@76 601
when not competing with the ULA only accesses RAM every other 2MHz cycle (as
paul@76 602
if the ULA still needed to access the RAM), one useful enhancement would be a
paul@76 603
mechanism to let the CPU take over the ULA cycles outside the ULA's period of
paul@76 604
activity, comparable to the way the ULA takes over the CPU cycles in MODE 0 to
paul@76 605
3.
paul@76 606
paul@76 607
Thus, the RAM access cycles would resemble the following in MODE 0 to 3:
paul@76 608
paul@76 609
  Upon a transition from display cycles: UUUUCCCC (instead of UUUUC_C_)
paul@76 610
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
paul@76 611
paul@76 612
In MODE 4 to 6:
paul@76 613
 
paul@76 614
  Upon a transition from display cycles: CUCUCCCC (instead of CUCUC_C_)
paul@76 615
  On a non-display line:                 CCCCCCCC (instead of C_C_C_C_)
paul@76 616
paul@76 617
This would improve CPU bandwidth (in bytes accessible per frame) as follows:
paul@76 618
paul@76 619
                Standard ULA    Enhanced ULA
paul@76 620
MODE 0, 1, 2    9728 bytes      19456 bytes
paul@76 621
MODE 3          12288 bytes     24576 bytes
paul@76 622
MODE 4, 5       19968 bytes     29696 bytes
paul@76 623
MODE 6          19968 bytes     32256 bytes
paul@76 624
paul@76 625
With such an enhancement, MODE 0 to 3 experience a doubling of CPU bandwidth
paul@76 626
because all access opportunities to RAM are doubled. Meanwhile, in the other
paul@76 627
modes, some CPU accesses occur alongside ULA accesses and thus cannot be
paul@76 628
doubled, but the CPU bandwidth increase is still significant.
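The tabulated figures can be reproduced with a short calculation. The
following Python sketch assumes 312 scanlines of 128 2MHz cycles each and an
active period of 80 cycles on each line where the ULA fetches screen data; the
function name and parameters are illustrative, and the results are taken to be
per frame.

```python
CYCLES_PER_LINE = 128   # 2MHz cycles per scanline
LINES_PER_FRAME = 312
DISPLAY_CYCLES = 80     # 2MHz cycles in the active part of a display line


def cpu_bytes_per_frame(ula_bytes_per_line, active_lines=256, enhanced=False):
    """CPU RAM accesses per frame under the access patterns shown above."""
    idle_step = 1 if enhanced else 2       # CCCC versus C_C_
    if ula_bytes_per_line == 80:
        active_cpu = 0                     # UUUU: the ULA takes every cycle
    else:
        active_cpu = DISPLAY_CYCLES // 2   # CUCU: CPU and ULA alternate
    idle_cycles = CYCLES_PER_LINE - DISPLAY_CYCLES
    per_active_line = active_cpu + idle_cycles // idle_step
    per_other_line = CYCLES_PER_LINE // idle_step
    return (active_lines * per_active_line
            + (LINES_PER_FRAME - active_lines) * per_other_line)
```

Here, ula_bytes_per_line is 80 for MODE 0 to 3 and 40 for MODE 4 to 6, and
active_lines is the number of scanlines on which the ULA fetches screen data
(256 in MODE 0, 1, 2, 4 and 5, fewer in MODE 3 and 6).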
paul@76 629
paul@103 630
Unfortunately, the mechanism for accessing the RAM is too slow to provide data
paul@109 631
within the time constraints of 2MHz operation. There is no time remaining in a
paul@109 632
2MHz cycle for the CPU to receive and process any retrieved data.
paul@103 633
paul@55 634
Enhancement: Region Blanking
paul@55 635
----------------------------
paul@4 636
paul@4 637
The problem of permitting character-oriented blitting in programs whilst
paul@4 638
scrolling the screen by sub-character amounts could be mitigated by permitting
paul@4 639
a region of the display to be blank, such as the final lines of the display.
paul@4 640
Consider the following vertical scrolling by 2 bytes that would cause an
paul@4 641
initial character row of 6 lines and a final character row of 2 lines:
paul@4 642
paul@4 643
    6 lines - initial, partial character row
paul@4 644
  248 lines - 31 complete rows
paul@4 645
    2 lines - final, partial character row
paul@4 646
paul@4 647
If a routine were in use that wrote 8-line bitmaps to the partial character
paul@4 648
row now split in two, it would be advisable to hide one of the regions in
paul@4 649
order to prevent content appearing in the wrong place on screen (such as
paul@4 650
content meant to appear at the top "leaking" onto the bottom). Blanking 6
paul@4 651
lines would be sufficient, as can be seen from the following cases.
paul@4 652
paul@4 653
Scrolling up by 2 lines:
paul@4 654
paul@4 655
    6 lines - initial, partial character row
paul@4 656
  240 lines - 30 complete rows
paul@4 657
    4 lines - part of 1 complete row
paul@4 658
  -----------------------------------------------------------------
paul@4 659
    4 lines - part of 1 complete row (hidden to maintain 250 lines)
paul@4 660
    2 lines - final, partial character row (hidden)
paul@4 661
paul@4 662
Scrolling down by 2 lines:
paul@4 663
paul@4 664
    2 lines - initial, partial character row
paul@4 665
  248 lines - 31 complete rows
paul@4 666
  ----------------------------------------------------------
paul@4 667
    6 lines - final, partial character row (hidden)
paul@4 668
paul@24 669
Thus, in this case, region blanking would impose a 250 line display with the
paul@24 670
bottom 6 lines blank.
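The two cases above follow from simple arithmetic, sketched here in Python.
The function names are hypothetical, and initial_lines stands for the number
of lines shown from the first, partial character row.

```python
def visible_layout(initial_lines, total_lines=256, blanked=6):
    """Split the visible lines into (initial, complete rows, trailing lines)."""
    visible = total_lines - blanked
    complete_rows, trailing = divmod(visible - initial_lines, 8)
    return initial_lines, complete_rows, trailing


def final_partial_lines(initial_lines):
    """Lines belonging to the final partial character row, to be hidden."""
    return (8 - initial_lines) % 8
```

With scrolling in steps of 2 lines, initial_lines is 2, 4 or 6 whenever the
first row is partial, and the final partial row never exceeds the 6 blanked
lines.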
paul@24 671
paul@55 672
See the description of the display suspend enhancement for a more efficient
paul@74 673
way of blanking lines than merely blanking the palette, one that also allows
paul@74 674
the CPU to perform useful work during the blanking period.
paul@74 675
paul@74 676
To control the blanking or suspending of lines at the top and bottom of the
paul@74 677
display, a memory location could be dedicated to the task: the upper 4 bits
paul@74 678
could define a blanking region of up to 15 lines at the top of the screen,
paul@74 679
whereas the lower 4 bits could define such a region at the bottom of the
paul@74 680
screen. If more lines were required, two locations could be employed, allowing
paul@74 681
the top and bottom regions to occupy the entire screen.
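A minimal Python sketch of how such a location might be packed and unpacked
follows. The register itself is hypothetical, as are the function names, and
each 4-bit field is taken to hold a line count from 0 to 15.

```python
def pack_blanking(top_lines, bottom_lines):
    """Combine top and bottom blanking counts into a single byte."""
    for count in (top_lines, bottom_lines):
        if not 0 <= count <= 0x0F:
            raise ValueError("each region is limited to a 4-bit line count")
    return (top_lines << 4) | bottom_lines


def unpack_blanking(value):
    """Return (top_lines, bottom_lines) from the packed byte."""
    return (value >> 4) & 0x0F, value & 0x0F
```

The 6-line bottom region from the scrolling example would then be written as
pack_blanking(0, 6).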
paul@55 682
paul@55 683
Enhancement: Screen Height Adjustment
paul@55 684
-------------------------------------
paul@24 685
paul@24 686
The height of the screen could be configurable in order to reduce screen
paul@24 687
memory consumption. This is not quite done in MODE 3 and 6 since the start of
paul@24 688
the screen appears to be rounded down to the nearest page, but by reducing the
paul@24 689
height by amounts more than a page, savings would be possible. For example:
paul@24 690
paul@24 691
  Screen width  Depth  Height  Bytes per line  Saving in bytes  Start address
paul@24 692
  ------------  -----  ------  --------------  ---------------  -------------
paul@24 693
  640           1      252     80              320              &3140 -> &3100
paul@24 694
  640           1      248     80              640              &3280 -> &3200
paul@24 695
  320           1      240     40              640              &5A80 -> &5A00
paul@24 696
  320           2      240     80              1280             &3500
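The savings and start addresses in the table can be checked with the following
Python sketch. It assumes that screen memory ends just below &8000 and that
the start address is rounded down to a 256-byte page, as the table suggests;
the names are illustrative.

```python
SCREEN_TOP = 0x8000   # screen memory is assumed to end just below &8000


def reduced_screen(bytes_per_line, height, full_height=256):
    """Return (saving, exact start, page-aligned start) for a reduced height."""
    saving = bytes_per_line * (full_height - height)
    exact_start = SCREEN_TOP - bytes_per_line * height
    aligned_start = exact_start & 0xFF00   # round down to a 256-byte page
    return saving, exact_start, aligned_start
```

For instance, reduced_screen(80, 252) yields a saving of 320 bytes with the
start moving from &3140 down to &3100.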
paul@0 697
paul@55 698
Screen Mode Selection
paul@55 699
---------------------
paul@55 700
paul@55 701
Bits 3, 4 and 5 of address &FE*7 control the selected screen mode. For a wider
paul@55 702
range of modes, the other bits of &FE*7 (related to sound, cassette
paul@55 703
input/output and the Caps Lock LED) would need to be reassigned, with bit 0
paul@55 704
potentially being made available for use.
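The mode field can be illustrated with a short Python sketch operating on a
byte value destined for &FE*7. The helper names are hypothetical, and the
remaining bits are simply preserved.

```python
MODE_MASK = 0b00111000   # bits 3, 4 and 5


def get_mode(value):
    """Extract the screen mode number from a register value."""
    return (value & MODE_MASK) >> 3


def set_mode(value, mode):
    """Return the register value with only the mode bits replaced."""
    if not 0 <= mode <= 7:
        raise ValueError("only three bits are available for the mode")
    return (value & ~MODE_MASK & 0xFF) | (mode << 3)
```

Three bits allow only eight distinct values, which is why a wider range of
modes would require further bits.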
paul@55 705
paul@58 706
Enhancement: Palette Definition
paul@58 707
-------------------------------
paul@0 708
paul@0 709
Since all memory accesses go via the ULA, an enhanced ULA could employ more
paul@0 710
specific addresses than &FE*X to perform enhanced functions. For example, the
paul@0 711
palette control is done using &FE*8-F and merely involves selecting predefined
paul@0 712
colours, whereas an enhanced ULA could support the redefinition of all 16
paul@0 713
colours using specific ranges such as &FE18-F (colours 0 to 7) and &FE28-F
paul@0 714
(colours 8 to 15), where a single byte might provide 8 bits per pixel colour
paul@0 715
specifications similar to those used on the Archimedes.
paul@0 716
paul@4 717
The principal limitation here is actually the hardware: the Electron has only
paul@4 718
a single output line for each of the red, green and blue channels, and if
paul@4 719
those outputs are strictly digital and can only be set to a "high" or "low"
paul@4 720
value, then only the existing eight colours are possible. If a modern ULA were
paul@81 721
able to output analogue values (or values at well-defined points between the
paul@81 722
high and low values, such as the half-on value supported by the Amstrad CPC
paul@81 723
series), it would still need to be assessed whether the circuitry could
paul@81 724
successfully handle and propagate such values. Various sources indicate that
paul@81 725
only "TTL levels" are supported by the RGB output circuit, and since there are
paul@81 726
74LS08 AND logic gates involved in the RGB component outputs from the ULA, it
paul@81 727
is likely that the ULA is expected to provide only "high" or "low" values.
paul@4 728
paul@58 729
Short of adding extra outputs from the ULA (either additional red, green and
paul@81 730
blue outputs or a combined intensity output), another approach might involve
paul@81 731
some kind of modulation where an output value might be encoded in multiple
paul@81 732
pulses at a higher frequency than the pixel frequency. However, this would
paul@81 733
demand additional circuitry outside the ULA, and component RGB monitors would
paul@81 734
probably not be able to take advantage of this feature; only UHF and composite
paul@81 735
video devices (the latter with the composite video colour support enabled on
paul@81 736
the Electron's circuit board) would potentially benefit.
paul@58 737
paul@51 738
Flashing Colours
paul@51 739
----------------
paul@51 740
paul@51 741
According to the Advanced User Guide, "The cursor and flashing colours are
paul@51 742
entirely generated in software: This means that all of the logical to physical
paul@51 743
colour map must be changed to cause colours to flash." This appears to suggest
paul@51 744
that the palette registers must be updated upon the flash counter - read and
paul@51 745
written by OSBYTE &C1 (193) - reaching zero and that some way of changing the
paul@51 746
colour pairs to be any combination of colours might be possible, instead of
paul@52 747
having colour complements as pairs.
paul@52 748
paul@52 749
It is conceivable that the interrupt code responsible does the simple thing
paul@54 750
and merely inverts the current values for any logical colours (LC) for which
paul@54 751
the associated physical colour (as supplied as the second parameter to the VDU
paul@54 752
19 call) has the top bit of its four bit value set. These top bits are not
paul@52 753
recorded in the palette registers but are presumably recorded separately and
paul@52 754
used to build bitmaps as follows:
paul@52 755
paul@54 756
  LC  2 colour  4 colour  16 colour  4-bit value for inversion
paul@54 757
  --  --------  --------  ---------  -------------------------
paul@54 758
   0  00010001  00010001  00010001   1, 1, 1
paul@54 759
   1  01000100  00100010  00010001   4, 2, 1
paul@54 760
   2            01000100  00100010      4, 2
paul@54 761
   3            10001000  00100010      8, 2
paul@54 762
   4                      00010001         1
paul@54 763
   5                      00010001         1
paul@54 764
   6                      00100010         2
paul@54 765
   7                      00100010         2
paul@54 766
   8                      01000100         4
paul@54 767
   9                      01000100         4
paul@54 768
  10                      10001000         8
paul@54 769
  11                      10001000         8
paul@54 770
  12                      01000100         4
paul@54 771
  13                      01000100         4
paul@54 772
  14                      10001000         8
paul@54 773
  15                      10001000         8
paul@54 774
paul@54 775
  Inversion value calculation:
paul@54 776
paul@54 777
   2 colour formula: 1 << (colour * 2)
paul@54 778
   4 colour formula: 1 << colour
paul@54 779
  16 colour formula: 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
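The inversion values and bitmaps in the table can be generated as in the
following Python sketch. The function names are hypothetical, and the 16
colour shift is written so as to reproduce the values listed in the table
(bit 1 of the colour contributes a shift of 1, bit 3 a shift of 2).

```python
def inversion_value(colour, colours):
    """Return the 4-bit inversion value for a logical colour."""
    if colours == 2:
        return 1 << (colour * 2)
    if colours == 4:
        return 1 << colour
    if colours == 16:
        return 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    raise ValueError("expected 2, 4 or 16 colours")


def inversion_bitmap(colour, colours):
    """Repeat the 4-bit value in both halves of the byte, as in the table."""
    value = inversion_value(colour, colours)
    return value | (value << 4)
```

For example, logical colour 3 in a 4 colour mode has the value 8 and the
bitmap 10001000.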
paul@52 780
paul@53 781
For example, where logical colour 0 has been mapped to a physical colour in
paul@53 782
the range 8 to 15, a bitmap of 00010001 would be chosen as its contribution to
paul@53 783
the inversion operation. (The lower three bits of the physical colour would be
paul@53 784
used to set the underlying colour information affected by the inversion
paul@53 785
operation.)
paul@53 786
paul@52 787
An operation in the interrupt code would then combine the bitmaps for all
paul@52 788
logical colours in 2 and 4 colour modes, with the 16 colour bitmaps being
paul@52 789
combined for groups of logical colours as follows:
paul@52 790
paul@54 791
   Logical colours
paul@54 792
   ---------------
paul@52 793
   0,  2,  8, 10
paul@52 794
   4,  6, 12, 14
paul@52 795
   5,  7, 13, 15
paul@52 796
   1,  3,  9, 11
paul@52 797
paul@52 798
These combined bitmaps would be EORed with the existing palette register
paul@52 799
values in order to perform the value inversion necessary to produce the
paul@52 800
flashing effect.
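A sketch of this combination step for 16 colour modes, in Python, might look
as follows. The grouping is taken from the list above; the function names are
hypothetical, and register_value stands for whichever palette register byte
belongs to the group concerned.

```python
GROUPS = ((0, 2, 8, 10), (4, 6, 12, 14), (5, 7, 13, 15), (1, 3, 9, 11))


def inversion_bitmap16(colour):
    """Bitmap for a logical colour, chosen to match the table's values."""
    value = 1 << (((colour & 2) >> 1) + ((colour & 8) >> 2))
    return value | (value << 4)


def group_masks(flashing):
    """Combine the bitmaps of all flashing logical colours, group by group."""
    masks = []
    for group in GROUPS:
        mask = 0
        for colour in group:
            if colour in flashing:
                mask |= inversion_bitmap16(colour)
        masks.append(mask)
    return masks


def flash(register_value, mask):
    """EOR the combined bitmap with a palette register value."""
    return register_value ^ mask
```

Applying flash twice with the same mask restores the original register value,
which is what makes the effect alternate between two states.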
paul@51 801
paul@54 802
Thus, in the VDU 19 operation, the appropriate inversion value would be
paul@54 803
calculated for the logical colour, and this value would then be combined with
paul@54 804
other inversion values in a dedicated memory location corresponding to the
paul@54 805
colour's group as indicated above. Meanwhile, the palette channel values would
paul@54 806
be derived from the lower three bits of the specified physical colour and
paul@54 807
combined with other palette data in dedicated memory locations corresponding
paul@54 808
to the palette registers.
paul@54 809
paul@72 810
Interestingly, although flashing colours on the BBC Micro are controlled by
paul@72 811
toggling bit 0 of the &FE20 control register location for the Video ULA, the
paul@72 812
actual colour inversion is done in hardware.
paul@72 813
paul@55 814
Enhancement: Palette Definition Lists
paul@55 815
-------------------------------------
paul@4 816
paul@4 817
It can be useful to redefine the palette in order to change the colours
paul@4 818
available for a particular region of the screen, particularly in modes where
paul@4 819
the choice of colours is constrained, and if an increased colour depth were
paul@4 820
available, palette redefinition would be useful to give the illusion of more
paul@4 821
than 16 colours in MODE 2. Traditionally, palette redefinition has been done
paul@4 822
by using interrupt-driven timers, but a more efficient approach would involve
paul@4 823
presenting lists of palette definitions to the ULA so that it can change the
paul@4 824
palette at a particular display line.
paul@4 825
paul@4 826
One might define a palette redefinition list in a region of memory and then
paul@4 827
communicate its contents to the ULA by writing the address and length of the
paul@4 828
list, along with the display line at which the palette is to be changed, to
paul@4 829
ULA registers such that the ULA buffers the list and performs the redefinition
paul@4 830
at the appropriate time. Throughput/bandwidth considerations might impose
paul@4 831
restrictions on the practical length of such a list, however.
paul@4 832
paul@79 833
Enhancement: Display Synchronisation Interrupts
paul@79 834
-----------------------------------------------
paul@79 835
paul@79 836
When completing each scanline of the display, the ULA could trigger an
paul@79 837
interrupt. Since this might impact system performance substantially, the
paul@79 838
feature would probably need to be configurable, and it might be sufficient to
paul@79 839
have an interrupt only after a certain number of display lines instead.
paul@79 840
Permitting the CPU to take action after eight lines would allow palette
paul@79 841
switching and other effects to occur on a character row basis.
paul@79 842
paul@79 843
The ULA provides an interrupt at the end of the display period, presumably so
paul@79 844
that software can schedule updates to the screen, avoid flickering or tearing,
paul@79 845
and so on. However, some applications might benefit from an interrupt at, or
paul@79 846
just before, the start of the display period so that palette modifications or
paul@79 847
similar effects could be scheduled.
paul@79 848
paul@55 849
Enhancement: Palette-Free Modes
paul@55 850
-------------------------------
paul@4 851
paul@4 852
Palette-free modes might be defined where bit values directly correspond to
paul@4 853
the red, green and blue channels, although this would mostly make sense only
paul@4 854
for modes with depths greater than the standard 4 bits per pixel, and such
paul@4 855
modes would require more memory than MODE 2 if they were to have an acceptable
paul@4 856
resolution.
paul@4 857
paul@55 858
Enhancement: Display Suspend
paul@55 859
----------------------------
paul@4 860
paul@4 861
Especially when writing to the screen memory, it could be beneficial to be
paul@4 862
able to suspend the ULA's access to the memory, instead producing blank values
paul@4 863
for all screen pixels until a program is ready to reveal the screen. This is
paul@4 864
different from palette blanking since with a blank palette, the ULA is still
paul@4 865
reading screen memory and translating its contents into pixel values that end
paul@4 866
up being blank.
paul@4 867
paul@4 868
This function is reminiscent of a capability of the ZX81, albeit necessary on
paul@4 869
that hardware to reduce the load on the system CPU which was responsible for
paul@62 870
producing the video output. By allowing display suspend on the Electron, the
paul@62 871
performance benefit would be derived from giving the CPU full access to the
paul@62 872
memory bandwidth.
paul@4 873
paul@74 874
The region blanking feature mentioned above could be implemented using this
paul@74 875
enhancement instead of employing palette blanking for the affected lines of
paul@74 876
the display.
paul@74 877
paul@63 878
Enhancement: Memory Filling
paul@63 879
---------------------------
paul@63 880
paul@63 881
A capability that could be given to an enhanced ULA is that of permitting the
paul@63 882
ULA to write to screen memory as well as being able to read from it. Although
paul@63 883
such a capability would probably not be useful in conjunction with the
paul@63 884
existing read operations when producing a screen display, and insufficient
paul@63 885
bandwidth would exist to do so in high-bandwidth screen modes anyway, the
paul@63 886
capability could be offered during a display suspend period (as described
paul@63 887
above), permitting a more efficient mechanism to rapidly fill memory with a
paul@63 888
predetermined value.
paul@63 889
paul@63 890
This capability could also support block filling, where the limits of the
paul@63 891
filled memory would be defined by the position and size of a screen area,
paul@63 892
although this would demand the provision of additional registers in the ULA to
paul@63 893
retain the details of such areas and additional logic to control the fill
paul@63 894
operation.
paul@63 895
paul@69 896
Enhancement: Region Filling
paul@69 897
---------------------------
paul@69 898
paul@69 899
An alternative to memory writing might involve indicating regions using
paul@69 900
additional registers or memory where the ULA fills regions of the screen with
paul@69 901
content instead of reading from memory. Unlike hardware sprites which should
paul@69 902
realistically provide varied content, region filling could employ single
paul@69 903
colours or patterns, and one advantage of doing so would be that the ULA need
paul@69 904
not access memory at all within a particular region.
paul@69 905
paul@69 906
Regions would be defined on a row-by-row basis. Instead of reading memory and
paul@69 907
blitting a direct representation to the screen, the ULA would read region
paul@69 908
definitions containing a start column, region width and colour details. There
paul@69 909
might be a certain number of definitions allowed per row, or the ULA might
paul@69 910
just traverse an ordered list of such definitions with each one indicating the
paul@71 911
row, start column, region width and colour details.
paul@71 912
paul@71 913
One could even compress this information further by requiring only the row,
paul@71 914
start column and colour details with each subsequent definition terminating
paul@71 915
the effect of the previous one. However, one would also need to consider the
paul@71 916
convenience of preparing such definitions and whether efficient access to
paul@71 917
definitions for a particular row might be desirable. It might also be
paul@71 918
desirable to avoid having to prepare definitions for "empty" areas of the
paul@71 919
screen, effectively making the definition of the screen contents a form of
paul@71 920
run-length encoding that employs only colour plus length information.
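The run-length form can be sketched in Python as follows, treating one row of
the display as a list of colour values. This is only an illustration of the
encoding, not of any particular register or memory format, and the function
names are hypothetical.

```python
from itertools import groupby


def encode_row(pixels):
    """Encode a row of colour values as (colour, length) regions."""
    return [(colour, len(list(run))) for colour, run in groupby(pixels)]


def decode_row(regions):
    """Expand (colour, length) regions back into a row of colour values."""
    return [colour for colour, length in regions for _ in range(length)]
```

A long run of a single colour collapses to one definition, whereas a row of
alternating pixel values needs one definition per pixel.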
paul@69 921
paul@69 922
One application of region filling is that of simple 2D and 3D shape rendering.
paul@69 923
Although it is entirely possible to plot such shapes to the screen and have
paul@69 924
the ULA blit the memory contents to the screen, such operations consume
paul@69 925
bandwidth both in the initial plotting and in the final transfer to the
paul@69 926
screen. Region filling would reduce such bandwidth usage substantially.
paul@69 927
paul@71 928
This way of representing screen images would make certain kinds of images
paul@71 929
unfeasible to represent - consider alternating single pixel values which could
paul@71 930
easily occur in some character bitmaps - even if an internal queue of regions
paul@71 931
were to be supported such that the ULA could read ahead and buffer such
paul@71 932
"bandwidth intensive" areas. Thus, the ULA might be better served providing
paul@71 933
this feature for certain areas of the display only as some kind of special
paul@71 934
graphics window.
paul@71 935
paul@55 936
Enhancement: Hardware Sprites
paul@55 937
-----------------------------
paul@0 938
paul@0 939
An enhanced ULA might provide hardware sprites, but this would be done in a
paul@0 940
way that is incompatible with the standard ULA, since no &FE*X locations are
paul@34 941
available for allocation. To keep the facility simple, hardware sprites would
paul@34 942
have a standard byte width and height.
paul@34 943
paul@34 944
The specification of sprites could involve the reservation of 16 locations
paul@34 945
(for example, &FE20-F) specifying a fixed number of eight sprites, with each
paul@34 946
location pair referring to the sprite data. By limiting the ULA to dealing
paul@34 947
with a fixed number of sprites, the work required inside the ULA would be
paul@35 948
reduced since it would avoid having to deal with arbitrary numbers of sprites.
paul@0 949
paul@35 950
The principal limitation on providing hardware sprites is that of having to
paul@35 951
obtain sprite data, given that the ULA is usually required to retrieve screen
paul@35 952
data, and given the lack of memory bandwidth available to retrieve sprite data
paul@35 953
(particularly from multiple sprites supposedly at the same position) and
paul@35 954
screen data simultaneously. Although the ULA could potentially read sprite
paul@35 955
data and screen data in alternate memory accesses in screen modes where the
paul@35 956
bandwidth is not already fully utilised, this would result in a degradation of
paul@35 957
performance.
paul@34 958
paul@55 959
Enhancement: Additional Screen Mode Configurations
paul@55 960
--------------------------------------------------
paul@24 961
paul@24 962
Alternative screen mode configurations could be supported. The ULA has to
paul@24 963
produce 640 pixel values across the screen, with pixel doubling or quadrupling
paul@24 964
employed to fill the screen width:
paul@24 965
paul@24 966
  Screen width      Columns     Scaling     Depth       Bytes
paul@24 967
  ------------      -------     -------     -----       -----
paul@24 968
  640               80          x1          1           80
paul@24 969
  320               40          x2          1, 2        40, 80
paul@24 970
  160               20          x4          2, 4        40, 80
paul@24 971
paul@24 972
It must also use at most 80 byte-sized memory accesses to provide the
paul@24 973
information for the display. Given that characters must occupy an 8x8 pixel
paul@24 974
array, if a configuration featuring anything other than 20, 40 or 80 character
paul@24 975
columns is to be supported, compromises must be made such as the introduction
paul@24 976
of blank pixels either between characters (such as occurs between rows in MODE
paul@24 977
3 and 6) or at the end of a scanline (such as occurs at the end of the frame
paul@55 978
in MODE 3 and 6). Consider the following configuration:
paul@24 979
paul@24 980
  Screen width      Columns     Scaling     Depth       Bytes       Blank
paul@24 981
  ------------      -------     -------     -----       ------      -----
paul@24 982
  208               26          x3          1, 2        26, 52      16
paul@24 983
paul@24 984
Here, if the ULA can triple pixels, a 26 column mode with either 2 or 4
paul@24 985
colours could be provided, with 16 blank pixel values (out of a total of 640)
paul@24 986
generated either at the start or end (or split between the start and end) of
paul@24 987
each scanline.
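The column and blank pixel counts for a given scaling factor follow directly
from the 640 pixel values available on each scanline, as this Python sketch
shows. The function name and its parameters are illustrative.

```python
def columns_for_scaling(scale, total_pixels=640, char_width=8):
    """Return (character columns, blank pixel values) for a scaling factor."""
    columns = total_pixels // (char_width * scale)
    blank = total_pixels - columns * char_width * scale
    return columns, blank
```

Scaling factors of 1, 2 and 4 fill the scanline exactly, whereas a factor of 3
gives 26 columns and leaves 16 pixel values unused.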
paul@24 988
paul@55 989
Enhancement: Character Attributes
paul@55 990
---------------------------------
paul@24 991
paul@24 992
The BBC Micro MODE 7 employs something resembling character attributes to
paul@24 993
support teletext displays, but depends on circuitry providing a character
paul@24 994
generator. The ZX Spectrum, on the other hand, provides character attributes
paul@24 995
as a means of colouring bitmapped graphics. Although such a feature is very
paul@24 996
limiting as the sole means of providing multicolour graphics, in situations
paul@24 997
where the choice is between low resolution multicolour graphics or high
paul@24 998
resolution monochrome graphics, character attributes provide a potentially
paul@24 999
useful compromise.
paul@24 1000
paul@24 1001
For each byte read, the ULA must deliver 8 pixel values (out of a total of
paul@24 1002
640) to the video output, doing so by either emptying its pixel buffer on a
paul@24 1003
pixel per cycle basis, or by multiplying pixels and thus holding them for more
paul@24 1004
than one cycle. For example, for a screen mode having 640 pixels in width:
paul@24 1005
paul@24 1006
  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
paul@24 1007
  Reads:    B                               B
paul@24 1008
  Pixels:   0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
paul@24 1009
paul@24 1010
And for a screen mode having 320 pixels in width:
paul@24 1011
paul@24 1012
  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
paul@24 1013
  Reads:    B
paul@24 1014
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
paul@24 1015
paul@24 1016
However, in modes where fewer than 80 bytes are required to generate the pixel
paul@24 1017
values, an enhanced ULA might be able to read additional bytes between those
paul@24 1018
providing the bitmapped graphics data:
paul@24 1019
paul@24 1020
  Cycle:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
paul@24 1021
  Reads:    B                               A
paul@24 1022
  Pixels:   0   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7
paul@24 1023
paul@24 1024
These additional bytes could provide colour information for the bitmapped data
paul@24 1025
in the following character column (of 8 pixels). Since it would be desirable
paul@24 1026
to apply attribute data to the first column, the initial 8 cycles might be
paul@24 1027
configured to not produce pixel values.
paul@24 1028
paul@35 1029
For an entire character, attribute data need only be read for its first row of
paul@35 1030
pixels. The subsequent rows would have the same attribute information
paul@35 1031
applied to them, although this would require the attribute data to be stored
paul@35 1032
in some kind of buffer. Thus, the following access pattern would be observed:
paul@35 1033
paul@112 1034
  Reads:    A B _ B _ B _ B _ B _ B _ B _ B ...
paul@112 1035
paul@112 1036
In modes 3 and 6, the blank display lines could be used to retrieve attribute
paul@112 1037
data:
paul@112 1038
paul@112 1039
  Reads (blank):     A _ A _ A _ A _ A _ A _ A _ A _ ...
paul@112 1040
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
paul@112 1041
  Reads (active):    B _ B _ B _ B _ B _ B _ B _ B _ ...
paul@112 1042
                     ...
paul@112 1043
paul@112 1044
See below for a discussion of using this for character data as well.
paul@35 1045
paul@24 1046
A whole byte used for colour information for a whole character would result in
paul@35 1047
a choice of 256 colours, and this might be somewhat excessive. By only reading
paul@35 1048
attribute bytes at every other opportunity and giving each character 4 bits, a
paul@35 1049
choice of 16 colours could be applied individually to two characters.
paul@24 1050
paul@24 1051
  Cycle:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
paul@24 1052
  Reads:    B               A               B               -
paul@24 1053
  Pixels:   0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
paul@24 1054
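As a minimal sketch of the shared attribute byte described above, the following
Python splits one attribute byte into two 4-bit values, one per character. The
function name and the choice of the high nibble for the first character of the
pair are assumptions made purely for illustration; how the 4-bit value would be
interpreted as colours is left open.

```python
def split_attribute(attr):
    """Split one attribute byte into two 4-bit attribute values.

    Assumption: the high nibble applies to the first character of the
    pair and the low nibble to the second.
    """
    if not 0 <= attr <= 0xFF:
        raise ValueError("attribute must fit in one byte")
    return (attr >> 4) & 0x0F, attr & 0x0F


first, second = split_attribute(0x3A)
print(first, second)
```

The same approach would extend to four 2-bit values per byte for the 4-colour
case.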
paul@35 1055
Further reductions in attribute data access, offering 4 colours for every
paul@35 1056
character in a four character block, for example, might also be worth
paul@34 1057
considering.
paul@34 1058
paul@24 1059
Consider the following configurations for screen modes with a colour depth of
paul@24 1060
1 bit per pixel for bitmap information:
paul@24 1061
paul@35 1062
  Screen width  Columns  Scaling  Bytes (B)  Bytes (A)  Colours  Screen start
paul@35 1063
  ------------  -------  -------  ---------  ---------  -------  ------------
paul@35 1064
  320           40       x2       40         40         256      &5300
paul@35 1065
  320           40       x2       40         20         16       &5580 -> &5500
paul@35 1066
  320           40       x2       40         10         4        &56C0 -> &5600
paul@35 1067
  208           26       x3       26         26         256      &62C0 -> &6200
paul@35 1068
  208           26       x3       26         13         16       &6460 -> &6400
paul@34 1069
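The screen start addresses in the table can be derived from the storage
requirements. The following sketch assumes 256 display lines, 32 character rows
of attribute data, and screen memory ending at &8000; the function and
parameter names are hypothetical. The second value returned is the start
address rounded down to a page boundary, as shown after the arrows in the
table.

```python
def screen_start(bitmap_per_line, attr_per_row, lines=256, rows=32, top=0x8000):
    """Return (exact, page_aligned) screen start addresses.

    Assumptions: bitmap data is stored for every display line, attribute
    data once per character row, and the screen ends just below `top`.
    """
    size = bitmap_per_line * lines + attr_per_row * rows
    exact = top - size
    return exact, exact & 0xFF00


for b, a in [(40, 40), (40, 20), (40, 10), (26, 26), (26, 13)]:
    exact, aligned = screen_start(b, a)
    print(f"{b:3} {a:3}  &{exact:04X} -> &{aligned:04X}")
```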
paul@113 1070
Enhancement: Text-Only Modes using Character and Attribute Data
paul@113 1071
---------------------------------------------------------------
paul@112 1072
paul@112 1073
In modes 3 and 6, the blank display lines could be used to retrieve character
paul@112 1074
and attribute data instead of trying to insert it between bitmap data accesses,
paul@112 1075
but this data would then need to be retained:
paul@112 1076
paul@112 1077
  Reads:    A C A C A C A C A C A C A C A C ...
paul@112 1078
  Reads:    B _ B _ B _ B _ B _ B _ B _ B _ ...
paul@112 1079
paul@112 1080
Only attribute (A) and character (C) reads would require screen memory
paul@112 1081
storage. Bitmap data reads (B) would involve either accesses to memory to
paul@112 1082
obtain character definition details or could, at the cost of special storage
paul@112 1083
in the ULA, involve accesses within the ULA that would then free up the RAM.
paul@112 1084
However, the CPU would not benefit from having any extra access slots due to
paul@112 1085
the limitations of the RAM access mechanism.
paul@112 1086
paul@113 1087
A scheme without caching might be possible. The same line of memory addresses
paul@113 1088
might be visited over and over again for eight display lines, with an index
paul@113 1089
into the bitmap data being incremented from zero to seven. The access patterns
paul@113 1090
would look like this:
paul@113 1091
paul@113 1092
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 0)
paul@113 1093
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 1)
paul@113 1094
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 2)
paul@113 1095
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 3)
paul@113 1096
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 4)
paul@113 1097
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 5)
paul@113 1098
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 6)
paul@113 1099
  Reads:    C B C B C B C B C B C B C B C B ... (generate data from index 7)
paul@113 1100
paul@113 1101
The bandwidth requirements would be the sum of the accesses to read the
paul@113 1102
character values (repeatedly) and those to read the bitmap data to reproduce
paul@113 1103
the characters on screen.
paul@113 1104
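A rough model of the scheme without caching is sketched below. It assumes that
character definitions occupy 8 consecutive bytes each, that display lines are
counted without any blank lines between character rows, and that each C and B
read costs one RAM access; the names used are hypothetical.

```python
def render_scanline(screen, font, line, columns=40):
    """Return (bitmap bytes, RAM accesses) for one display line.

    screen: character codes, `columns` per character row.
    font:   8 bytes of bitmap data per character code (assumed layout).
    """
    row, index = divmod(line, 8)      # the same row is revisited for 8 lines
    out = []
    accesses = 0
    for column in range(columns):
        code = screen[row * columns + column]   # C read
        out.append(font[code * 8 + index])      # B read using the index
        accesses += 2
    return out, accesses


font = bytes(range(16))        # two made-up 8-byte character definitions
screen = bytes([1, 0] * 20)    # one 40-column row alternating codes 1 and 0
data, accesses = render_scanline(screen, font, line=3)
print(data[:2], accesses)
```

Under these assumptions, a 40-column display needs 80 accesses per display line
and an 80-column display needs 160.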
paul@55 1105
Enhancement: MODE 7 Emulation using Character Attributes
paul@55 1106
--------------------------------------------------------
paul@24 1107
paul@24 1108
If the scheme of applying attributes to character regions were employed to
paul@24 1109
emulate MODE 7, in conjunction with the MODE 6 display technique, the
paul@24 1110
following configuration would be required:
paul@24 1111
paul@24 1112
  Screen width  Columns  Rows  Bytes (B)  Bytes (A)  Colours  Screen start
paul@24 1113
  ------------  -------  ----  ---------  ---------  -------  ------------
paul@35 1114
  320           40       25    40         20         16       &5ECC -> &5E00
paul@35 1115
  320           40       25    40         10         4        &5FC6 -> &5F00
paul@24 1116
paul@35 1117
Although this requires much more memory than MODE 7 (8500 bytes with 16 colours
paul@35 1118
versus MODE 7's 1000 bytes), it does not need much more memory than MODE 6's
paul@35 1119
8000 bytes, and it would at least make a limited 40-column multicolour mode
paul@35 1120
available as a substitute for MODE 7.
paul@24 1121
paul@113 1122
Using the text-only enhancement with caching of data or with repeated reads of
paul@113 1123
the same character data line for eight display lines, the storage requirements
paul@112 1124
would be diminished substantially:
paul@112 1125
paul@112 1126
  Screen width  Columns  Rows  Bytes (C)  Bytes (A)  Colours  Screen start
paul@112 1127
  ------------  -------  ----  ---------  ---------  -------  ------------
paul@112 1128
  320           40       25    40         20         16       &7A24 -> &7A00
paul@112 1129
  320           40       25    40         10         4        &7B1E -> &7B00
paul@112 1130
  320           40       25    40         5          2        &7B9B -> &7B00
paul@112 1131
  320           40       25    40         0          (2)      &7C18 -> &7C00
paul@112 1132
  640           80       25    80         40         16       &7448 -> &7400
paul@112 1133
  640           80       25    80         20         4        &763C -> &7600
paul@112 1134
  640           80       25    80         10         2        &7736 -> &7700
paul@112 1135
  640           80       25    80         0          (2)      &7830 -> &7800
paul@112 1136
paul@112 1137
Note that the colours describe the locally defined attributes for each
paul@112 1138
character. When no attribute information is provided, the colours are defined
paul@112 1139
globally.
paul@112 1140
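The storage needed for a text-only screen is simply the number of rows
multiplied by the character and attribute bytes per row, and the number of
locally defined colours follows from the attribute bits available to each
character. The sketch below assumes screen memory ending at &8000 and uses
hypothetical function names.

```python
def text_screen_start(char_bytes, attr_bytes, rows=25, top=0x8000):
    """Return (exact, page_aligned) start addresses for a text-only screen."""
    size = rows * (char_bytes + attr_bytes)
    exact = top - size
    return exact, exact & 0xFF00


def local_colours(columns, attr_bytes):
    """Colours selectable per character, or None when defined globally."""
    if attr_bytes == 0:
        return None
    bits = 8 * attr_bytes // columns
    return 2 ** bits


exact, aligned = text_screen_start(40, 10)
print(f"&{exact:04X} -> &{aligned:04X}", local_colours(40, 10))
```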
paul@112 1141
Enhancement: Compressed Character Data
paul@112 1142
--------------------------------------
paul@112 1143
paul@112 1144
Another observation about text-only modes is that they only need to store a
paul@112 1145
restricted set of bitmapped data values. Encoding this set of values in a
paul@112 1146
smaller unit of storage than a byte could possibly help to reduce the amount
paul@112 1147
of storage and bandwidth required to reproduce the characters on the display.
paul@112 1148
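One possible encoding, offered only as an illustration, would restrict the
character set to 64 values and pack four 6-bit codes into three bytes. The
6-bit width, the big-endian packing order and the function names are all
assumptions; other unit sizes would work in a similar way.

```python
def pack6(codes):
    """Pack 6-bit codes (0-63) into bytes, four codes per three bytes."""
    if len(codes) % 4:
        raise ValueError("number of codes must be a multiple of four")
    out = bytearray()
    for i in range(0, len(codes), 4):
        word = 0
        for code in codes[i:i + 4]:
            if not 0 <= code < 64:
                raise ValueError("code does not fit in 6 bits")
            word = (word << 6) | code
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack6(data):
    """Reverse pack6, returning a list of 6-bit codes."""
    if len(data) % 3:
        raise ValueError("data length must be a multiple of three")
    codes = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        codes.extend((word >> shift) & 0x3F for shift in (18, 12, 6, 0))
    return codes


row = list(range(40))          # one 40-column row of character codes
packed = pack6(row)
print(len(row), "codes ->", len(packed), "bytes")
```

With this packing, a 40-column row would occupy 30 bytes instead of 40, at the
cost of extra decoding work in the ULA.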
paul@82 1149
Enhancement: High Resolution Graphics
paul@82 1150
-------------------------------------
paul@0 1151
paul@82 1152
Screen modes with higher resolutions and larger colour depths might be
paul@82 1153
possible, but this would in most cases involve the allocation of more screen
paul@82 1154
memory, and the ULA would probably then be obliged to page in such memory for
paul@82 1155
the CPU to be able to sensibly access it all.
paul@0 1156
paul@55 1157
Enhancement: Genlock Support
paul@55 1158
----------------------------
paul@46 1159
paul@46 1160
The ULA generates a video signal in conjunction with circuitry producing the
paul@46 1161
output features necessary for the correct display of the screen image.
paul@46 1162
However, it appears that the ULA drives the video synchronisation mechanism
paul@46 1163
instead of reacting to an existing signal. Genlock support might be possible
paul@46 1164
if the ULA were made to be responsive to such external signals, resetting its
paul@46 1165
address generators upon receiving synchronisation events.
paul@46 1166
paul@55 1167
Enhancement: Improved Sound
paul@55 1168
---------------------------
paul@0 1169
paul@55 1170
The standard ULA reserves &FE*6 for sound generation and cassette input/output
paul@55 1171
(with bits 1 and 2 of &FE*7 being used to select either sound generation or
paul@55 1172
cassette I/O), thus making it impossible to support multiple channels within
paul@0 1173
the given framework. The BBC Micro employs &FE40-&FE4F (the system VIA) for
paul@0 1174
sound control, and an enhanced ULA could adopt this interface.
paul@0 1175
paul@9 1176
The BBC Micro uses the SN76489 chip to produce sound, and the entire
paul@9 1177
functionality of this chip could be emulated for enhanced sound, with a subset
paul@9 1178
of the functionality exposed via the &FE*6 interface.
paul@9 1179
paul@9 1180
See: http://en.wikipedia.org/wiki/Texas_Instruments_SN76489
paul@81 1181
See: http://www.smspower.org/Development/SN76489
paul@9 1182
paul@55 1183
Enhancement: Waveform Upload
paul@55 1184
----------------------------
paul@0 1185
paul@0 1186
As with a hardware sprite function, waveforms could be uploaded or referenced
paul@0 1187
using locations as registers referencing memory regions.
paul@0 1188
paul@55 1189
Enhancement: Sound Input/Output
paul@55 1190
-------------------------------
paul@46 1191
paul@46 1192
Since the ULA already controls audio input/output for cassette-based data, it
paul@46 1193
would have been interesting to entertain the idea of sampling and output of
paul@46 1194
sounds through the cassette interface. However, a significant amount of
paul@46 1195
circuitry is employed to process the input signal for use by the ULA and to
paul@46 1196
process the output signal for recording.
paul@46 1197
paul@46 1198
See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw_03.htm#3.11
paul@46 1199
paul@55 1200
Enhancement: BBC ULA Compatibility
paul@55 1201
----------------------------------
paul@0 1202
paul@0 1203
Although some new ULA functions could be defined in a way that is also
paul@0 1204
compatible with the BBC Micro, the BBC ULA is itself incompatible with the
paul@0 1205
Electron ULA: &FE00-7 is reserved for the video controller in the BBC memory
paul@0 1206
map, but controls various functions specific to the 6845 video controller;
paul@0 1207
&FE08-F is reserved for the serial controller. It therefore becomes possible
paul@0 1208
to disregard compatibility where compatibility is already disregarded for a
paul@0 1209
particular area of functionality.
paul@0 1210
paul@0 1211
&FE20-F maps to video ULA functionality on the BBC Micro which provides
paul@0 1212
control over the palette (using address &FE21, compared to &FE08-F on the
paul@0 1213
Electron) and other system-specific functions. Since the location usage is
paul@0 1214
generally incompatible, this region could be reused for other purposes.
paul@31 1215
paul@55 1216
Enhancement: Increased RAM, ULA and CPU Performance
paul@55 1217
---------------------------------------------------
paul@49 1218
paul@49 1219
More modern implementations of the hardware might feature faster RAM coupled
paul@49 1220
with an increased ULA clock frequency in order to increase the bandwidth
paul@49 1221
available to the ULA and to the CPU in situations where the ULA is not needed
paul@49 1222
to perform work. A ULA employing a 32MHz clock would be able to complete the
paul@49 1223
retrieval of a byte from RAM in only 250ns and thus be able to enable the CPU
paul@49 1224
to access the RAM for the following 250ns even in display modes requiring the
paul@49 1225
retrieval of a byte for the display every 500ns. The CPU could, subject to
paul@49 1226
timing issues, run at 2MHz even in MODEs 0, 1 and 2.
paul@49 1227
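The timing figures above can be checked with a little arithmetic. The sketch
below assumes that fetching one byte always takes 8 ULA clock periods, which
is consistent with a 500ns fetch at 16MHz and a 250ns fetch at 32MHz; that
figure and the function names are assumptions for illustration.

```python
def fetch_time_ns(clock_hz, clocks_per_byte=8):
    """Time to fetch one byte, assuming a fixed number of clocks per byte."""
    return clocks_per_byte * 1e9 / clock_hz


def cpu_share(clock_hz, display_period_ns=500.0, clocks_per_byte=8):
    """Fraction of each display period left free for CPU access to RAM."""
    fetch = fetch_time_ns(clock_hz, clocks_per_byte)
    return max(0.0, (display_period_ns - fetch) / display_period_ns)


for clock in (16e6, 32e6):
    print(clock / 1e6, "MHz:", fetch_time_ns(clock), "ns per byte,",
          cpu_share(clock), "of each 500ns period free")
```

At 16MHz no time remains within a 500ns display period, whereas at 32MHz half
of each period would be available to the CPU.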
paul@49 1228
A scheme such as that described above would have a similar effect to the
paul@49 1229
scheme employed in the BBC Micro, although the latter made use of RAM with a
paul@49 1230
wider bandwidth in order to complete memory transfers within 250ns and thus
paul@49 1231
permit the CPU to run continuously at 2MHz.
paul@49 1232
paul@49 1233
Higher bandwidth could potentially be used to implement exotic features such
paul@49 1234
as RAM-resident hardware sprites or indeed any feature demanding RAM access
paul@49 1235
concurrent with the production of the display image.
paul@49 1236
paul@80 1237
Enhancement: Multiple CPU Stacks and Zero Pages
paul@80 1238
-----------------------------------------------
paul@75 1239
paul@75 1240
The 6502 maintains a stack for subroutine calls and register storage in page
paul@75 1241
&01. Although the stack pointer can be manipulated using the TSX and TXS
paul@75 1242
instructions, thereby permitting the maintenance of multiple stack regions and
paul@75 1243
thus the potential coexistence of multiple programs each using a separate
paul@75 1244
region, only programs that make little use of the stack (perhaps avoiding
paul@75 1245
deeply-nested subroutine invocations and significant register storage) would
paul@75 1246
be able to coexist without overwriting each other's stacks.
paul@75 1247
paul@75 1248
One way that this issue could be alleviated would involve the provision of a
paul@75 1249
facility to redirect accesses to page &01 to other areas of memory. The ULA
paul@75 1250
would provide a register that defines a physical page for the use of the CPU's
paul@75 1251
"logical" page &01, and upon any access to page &01 by the CPU, the ULA would
paul@75 1252
change the asserted address lines to redirect the access to the appropriate
paul@75 1253
physical region.
paul@75 1254
paul@75 1255
By providing an 8-bit register, mapping to the most significant byte (MSB) of
paul@75 1256
a 16-bit address, the ULA could then replace any MSB equal to &01 with the
paul@75 1257
register value before the access is made. Where multiple programs coexist,
paul@75 1258
upon switching programs, the register would be updated to point the ULA to the
paul@75 1259
appropriate stack location, thus providing a simple memory management unit
paul@75 1260
(MMU) capability.
paul@75 1261
paul@80 1262
In a similar fashion, zero page accesses could also be redirected so that code
paul@80 1263
could run from sideways RAM and have zero page operations redirected to "upper
paul@80 1264
memory" - for example, to page &BE (with stack accesses redirected to page
paul@80 1265
&BF, perhaps) - thereby permitting most CPU operations to occur without
paul@80 1266
inadvertent accesses to "lower memory" (the RAM) which would risk stalling the
paul@80 1267
CPU as it contends with the ULA for memory access.
paul@80 1268
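The redirection of stack and zero page accesses might be modelled as follows.
This is only a sketch of the address translation: the class, its register
names and the default values are hypothetical, and no timing or bus behaviour
is represented.

```python
class PageRedirector:
    """Model of hypothetical ULA registers remapping pages &00 and &01."""

    def __init__(self, zero_page=0x00, stack_page=0x01):
        # Defaults leave both pages where the CPU expects them.
        self.zero_page = zero_page
        self.stack_page = stack_page

    def translate(self, address):
        """Return the physical address for a 16-bit CPU address."""
        msb, lsb = address >> 8, address & 0xFF
        if msb == 0x00:
            msb = self.zero_page      # zero page redirected
        elif msb == 0x01:
            msb = self.stack_page     # stack page redirected
        return (msb << 8) | lsb


mmu = PageRedirector(zero_page=0xBE, stack_page=0xBF)
print(f"&{mmu.translate(0x01FF):04X} &{mmu.translate(0x0070):04X}")
```

Switching between programs would then amount to writing new values to the two
registers, with all other addresses passing through unchanged.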
paul@80 1269
Such facilities could also be provided by a separate circuit between the CPU
paul@80 1270
and ULA in a fashion similar to that employed by a "turbo" board, but unlike
paul@80 1271
such boards, no additional RAM would be provided: all memory accesses would
paul@80 1272
occur as normal through the ULA, albeit redirected when configured
paul@80 1273
appropriately.
paul@80 1274
paul@31 1275
ULA Pin Functions
paul@31 1276
-----------------
paul@31 1277
paul@31 1278
The functions of the ULA pins are described in the Electron Service Manual. Of
paul@31 1279
interest to video processing are the following:
paul@31 1280
paul@31 1281
  CSYNC (low during horizontal or vertical synchronisation periods, high
paul@31 1282
         otherwise)
paul@31 1283
paul@31 1284
  HS (low during horizontal synchronisation periods, high otherwise)
paul@31 1285
paul@31 1286
  RED, GREEN, BLUE (pixel colour outputs)
paul@31 1287
paul@31 1288
  CLOCK IN (a 16MHz clock input, 4V peak to peak)
paul@31 1289
paul@31 1290
  PHI OUT (a 1MHz, 2MHz or stopped clock signal for the CPU)
paul@31 1291
paul@31 1292
More general memory access pins:
paul@31 1293
paul@31 1294
  RAM0...RAM3 (data lines to/from the RAM)
paul@31 1295
paul@31 1296
  RA0...RA7 (address lines for sending both row and column addresses to the RAM)
paul@31 1297
paul@38 1298
  RAS (row address strobe setting the row address on a negative edge - see the
paul@38 1299
       timing notes)
paul@31 1300
paul@38 1301
  CAS (column address strobe setting the column address on a negative edge -
paul@38 1302
       see the timing notes)
paul@31 1303
paul@31 1304
  WE (sets write enable with logic 0, read with logic 1)
paul@31 1305
paul@31 1306
  ROM (select data access from ROM)
paul@31 1307
paul@31 1308
CPU-oriented memory access pins:
paul@31 1309
paul@31 1310
  A0...A15 (CPU address lines)
paul@31 1311
paul@31 1312
  PD0...PD7 (CPU data lines)
paul@31 1313
paul@31 1314
  R/W (indicates CPU write with logic 0, CPU read with logic 1)
paul@31 1315
paul@31 1316
Interrupt-related pins:
paul@31 1317
paul@31 1318
  NMI (CPU request for uninterrupted 1MHz access to memory)
paul@31 1319
paul@31 1320
  IRQ (signal event to CPU)
paul@31 1321
paul@31 1322
  POR (power-on reset, resetting the ULA on a positive edge and asserting the
paul@31 1323
       CPU's RST pin)
paul@31 1324
paul@31 1325
  RST (master reset for the CPU signalled on power-up and by the Break key)
paul@31 1326
paul@31 1327
Keyboard-related pins:
paul@31 1328
paul@31 1329
  KBD0...KBD3 (keyboard inputs)
paul@31 1330
paul@31 1331
  CAPS LOCK (control status LED)
paul@31 1332
paul@31 1333
Sound-related pins:
paul@31 1334
paul@31 1335
  SOUND O/P (sound output using internal oscillator)
paul@31 1336
paul@31 1337
Cassette-related pins:
paul@31 1338
paul@31 1339
  CAS IN (cassette circuit input, between 0.5V and 2V peak to peak)
paul@31 1340
paul@31 1341
  CAS OUT (pseudo-sinusoidal output, 1.8V peak to peak)
paul@31 1342
paul@31 1343
  CAS RC (detect high tone)
paul@31 1344
paul@31 1345
  CAS MO (motor relay output)
paul@31 1346
paul@31 1347
  ÷13 IN (~1200 baud clock input)
paul@46 1348
paul@72 1349
ULA Socket
paul@72 1350
----------
paul@72 1351
paul@72 1352
The socket used for the ULA is a 3M/TexTool 268-5400 68-pin socket.
paul@72 1353
paul@46 1354
References
paul@46 1355
----------
paul@46 1356
paul@46 1357
See: http://bbc.nvg.org/doc/A%20Hardware%20Guide%20for%20the%20BBC%20Microcomputer/bbc_hw.htm
paul@71 1358
paul@71 1359
About this Document
paul@71 1360
-------------------
paul@71 1361
paul@71 1362
The most recent version of this document and accompanying distribution should
paul@71 1363
be available from the following location:
paul@71 1364
paul@71 1365
http://hgweb.boddie.org.uk/ULA
paul@71 1366
paul@71 1367
Copyright and licence information can be found in the docs directory of this
paul@71 1368
distribution - see docs/COPYING.txt for more information.