# UXN Audio Proposal

## Problems

Currently the UXN audio device doesn't work very well for playing
complex music. There are a few reasons for this:

* Note duration is conflated with envelope shape
* Envelope resolution (about 67ms) limits tempos/subdivisions
* Microtonal music is not possible (according to the spec)
* Using the audio callback requires scheduling pauses/silence

## Proposal outline

One way to improve the situation is to disentangle the envelope
specification from the note duration, and more generally make it
easier to specify things that a composer will frequently need to
change (pitch, articulation, duration) without having to change the
underlying voice (waveform/envelope settings).

This proposal does four things:

1. Add a two-byte `duration` port that configures a note's duration
   in milliseconds. The longest possible note is 0xffff ms, about 65.5
   seconds.

2. Double the size of the `adsr` port. This means replacing the
   existing two-byte port with four one-byte ports for `attack`,
   `decay`, `sustain`, and `release`. Since we have 4 extra bits per
   stage, we will refine the resolution of each stage from about 67ms
   to 10ms (so 0x01 means 10ms). The longest envelope stage is now
   2.55s (255 × 10ms), up from 1s previously.

3. Add a one-byte `mode` port, which declares what kind of note or
   sound is being played. This provides an easy way to specify
   different behaviors such as:

   * staccato, legato, slurred, or standard playing styles
   * different sample rates (44.1, 22.05, 11.025, or 5.5125 kHz)
   * looping or non-looping playback

4. Move the `volume` port to `0x35` and add a one-byte `detune` port.
   A zero value (`0x00`) indicates a "normal" semitone pitch, and
   non-zero values indicate a fractional amount to add. The
   calculation is that the pitch is raised by `detune/256` of a
   semitone (i.e. `detune × 100/256` cents). For example, a value of
   `0x80` will raise the pitch by a quarter-tone (50 cents).
   The port is placed just before `pitch` so that microtonal music
   can write a "micro-pitch" using one `DEO2` instruction.

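The detune arithmetic can be checked with a minimal Python sketch that turns a `detune`/`pitch` pair into a frequency. The function name `micro_pitch_hz` is hypothetical, and the A4 = 440 Hz reference (MIDI note 69) is an assumption; the proposal only says that `pitch` is a MIDI note.

```python
def micro_pitch_hz(detune: int, pitch: int) -> float:
    """Frequency (Hz) of MIDI note `pitch` raised by detune/256 of a semitone.

    Assumption: standard MIDI tuning, where note 69 (A4) is 440 Hz.
    """
    if not 0 <= detune <= 0xFF:
        raise ValueError("detune must fit in one byte")
    if not 0 <= pitch <= 0x7F:
        raise ValueError("pitch must be a 7-bit MIDI note")
    if pitch == 0:
        return 0.0  # per the spec, pitch 0x00 means silence
    note = pitch + detune / 256  # fractional MIDI note number
    return 440.0 * 2 ** ((note - 69) / 12)


print(f"{micro_pitch_hz(0x00, 0x3C):.2f}")  # middle C
print(f"{micro_pitch_hz(0x80, 0x3C):.2f}")  # a quarter-tone above middle C
```

Because detune is just a fractional extension of the MIDI note number, `0x80` always lands exactly halfway between two semitones, whatever the tuning reference.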
## Microtonal music

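Here's how to encode the 17-tone equal temperament scale (17ET) as
`detune/pitch` pairs starting from middle C (`0x3c`). Each pair is a
short whose high byte goes to `detune` (`0x3e`) and low byte to `pitch`
(`0x3f`). Since each step of the scale is 1200/17 ≈ 70.588 cents, we
can get accurate pitches and detunes by adding 70.588 cents for each
step, then dividing by 100: the quotient is the number of semitones
above middle C, and the remainder (in cents) is multiplied by 256/100
and rounded to give the `detune` byte:
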
```
pitch 1:  #003c  (0 semitones + 0.00 cents)
pitch 2:  #b53c  (0 semitones + 70.59 cents)
pitch 3:  #693d  (1 semitone + 41.18 cents)
pitch 4:  #1e3e  (2 semitones + 11.76 cents)
pitch 5:  #d33e  (2 semitones + 82.35 cents)
pitch 6:  #883f  (3 semitones + 52.94 cents)
pitch 7:  #3c40  (4 semitones + 23.53 cents)
pitch 8:  #f140  (4 semitones + 94.12 cents)
pitch 9:  #a641  (5 semitones + 64.71 cents)
pitch 10: #5a42  (6 semitones + 35.29 cents)
pitch 11: #0f43  (7 semitones + 5.88 cents)
pitch 12: #c443  (7 semitones + 76.47 cents)
pitch 13: #7844  (8 semitones + 47.06 cents)
pitch 14: #2d45  (9 semitones + 17.65 cents)
pitch 15: #e245  (9 semitones + 88.24 cents)
pitch 16: #9746  (10 semitones + 58.82 cents)
pitch 17: #4b47  (11 semitones + 29.41 cents)
pitch 18: #0048  (12 semitones + 0.00 cents)
```

While it's somewhat cumbersome to calculate these detune values in
advance, it only has to be done for one octave (other octaves just add
or subtract 12 from the `pitch` byte), and the resulting microtonal
pitches can be compactly stored and used.

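Because the calculation is mechanical, it is easy to script. Here is a minimal Python sketch, using a hypothetical `edo_micro_pitches` helper, that reproduces the 17ET listing for any number of equal steps per octave. It rounds the whole interval in 1/256-semitone units with integer arithmetic, so a remainder that rounds up to 256 carries into `pitch` instead of overflowing `detune`.

```python
def edo_micro_pitches(steps: int, base_pitch: int = 0x3C) -> list[int]:
    """One octave of `steps`-tone equal temperament as detune/pitch shorts.

    Each value is a 16-bit short whose high byte is `detune` (port 0x3e)
    and whose low byte is `pitch` (port 0x3f). The octave is included
    as the final entry.
    """
    shorts = []
    for step in range(steps + 1):
        # Interval above base_pitch in 1/256-semitone units,
        # i.e. step * 12 * 256 / steps, rounded half up.
        units = (2 * step * 12 * 256 + steps) // (2 * steps)
        semitones, detune = divmod(units, 256)
        pitch = base_pitch + semitones
        if pitch > 0x7F:
            raise ValueError("pitch exceeds the 7-bit MIDI range")
        shorts.append((detune << 8) | pitch)
    return shorts


for i, short in enumerate(edo_micro_pitches(17), start=1):
    print(f"pitch {i}: #{short:04x}")
```

Passing `steps=12` is a quick sanity check: it yields ordinary semitones with every detune byte equal to zero.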
## Note duration and tempo

The `duration` and `vector` ports precisely specify the audio device
behavior. The given note should be played for the number of
milliseconds specified by `duration`, at which point the `vector`
should be called to play the next note (or next silence). If the ADSR
stages add up to less than `duration`, the *mode* defines how to
extend the note (using the *note type* bits). If they add up to more,
then the ADSR will be shortened to fit, starting with S/R but also
truncating D and A if necessary.

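The fitting rules can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function and type names are hypothetical, all lengths are in milliseconds, and the exact truncation order (sustain, then release, then decay, then attack) is an assumption based on the "starting with S/R" wording.

```python
from typing import NamedTuple

# Note types from the mode port's N bits.
STANDARD, STACCATO, LEGATO, SLURRED = 0b00, 0b01, 0b10, 0b11


class Stages(NamedTuple):
    attack: int
    decay: int
    sustain: int
    release: int
    silence: int  # trailing silence before `vector` fires


def fit_envelope(duration: int, attack: int, decay: int, sustain: int,
                 release: int, note_type: int) -> Stages:
    """Fit ADSR stage lengths (ms) into exactly `duration` ms."""
    if note_type == STACCATO:      # uses ADR, ignores S
        sustain = 0
    elif note_type == SLURRED:     # uses S, ignores ADR
        attack = decay = release = 0
    stages = {"attack": attack, "decay": decay,
              "sustain": sustain, "release": release}

    excess = sum(stages.values()) - duration
    # Envelope too long: shorten S, then R, then D, then A (assumed order).
    for name in ("sustain", "release", "decay", "attack"):
        cut = min(stages[name], max(excess, 0))
        stages[name] -= cut
        excess -= cut

    silence = 0
    if excess < 0:                 # envelope too short: extend by note type
        if note_type in (LEGATO, SLURRED):
            stages["sustain"] -= excess   # excess < 0, so sustain grows
        else:
            silence = -excess             # STANDARD and STACCATO pad with silence
    return Stages(silence=silence, **stages)
```

For example, 50/100/200/100ms stages played as a 300ms standard note keep A, D, and R and cut sustain to 50ms, while the same stages as a 500ms legato note stretch sustain to 250ms.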
Composers can choose a duration for the smallest subdivision needed
(e.g. 125ms per 16th note to achieve 120 bpm) and then compute precise
durations for 8th notes, quarter notes, dotted-8th notes, whole notes,
and so on. Similarly, composers can use the same envelope with
staccato and legato notes to easily achieve different articulations
for different passages.

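The tempo arithmetic above can be sketched in Python. The helper name `note_durations_ms` is hypothetical, and it assumes the beat is a quarter note; each length is computed from the exact 16th-note duration and then rounded, so rounding errors do not accumulate.

```python
def note_durations_ms(bpm: float) -> dict[str, int]:
    """`duration` port values (ms) for common note lengths at `bpm`.

    Assumes the beat is a quarter note.
    """
    sixteenth_ms = 60_000 / bpm / 4  # exact, possibly fractional
    sixteenths = {"16th": 1, "8th": 2, "dotted 8th": 3, "quarter": 4,
                  "dotted quarter": 6, "half": 8, "whole": 16}
    durations = {}
    for name, count in sixteenths.items():
        ms = round(sixteenth_ms * count)
        if ms > 0xFFFF:
            raise ValueError(f"a {name} note at {bpm} bpm exceeds the 2-byte port")
        durations[name] = ms
    return durations


print(note_durations_ms(120))
```

At 140 bpm, for instance, a 16th note is 107ms but a quarter note is 429ms rather than 4 × 107 = 428ms.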
## More flexible envelope and waveform settings

The new envelope stage range (10ms to 2.55s) allows more complex
envelopes to be specified, from slow builds and fades to very fast
attacks and releases. Similarly, allowing waveforms to be specified at
lower sampling rates potentially allows more interesting percussion
instruments to be specified without using too many bytes of the ROM.

For comparison, the NES uses variable frequency samples to allow basic
voices/sounds without using too much space. For NTSC devices the
supported range is 4182-33144 Hz.

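To make the 10ms stage resolution concrete, here is a minimal Python sketch (the helper name `envelope_bytes` is hypothetical) that converts stage lengths in milliseconds into the four one-byte `attack`, `decay`, `sustain`, and `release` values, rounding to the nearest 10ms and rejecting anything outside 0-2.55s.

```python
def envelope_bytes(attack_ms: float, decay_ms: float,
                   sustain_ms: float, release_ms: float) -> bytes:
    """Encode ADSR stage lengths (ms) as four bytes in 10ms units."""
    encoded = []
    for name, ms in (("attack", attack_ms), ("decay", decay_ms),
                     ("sustain", sustain_ms), ("release", release_ms)):
        # One unit is 10ms; Python's round() sends exact halves to even.
        units = round(ms / 10)
        if not 0 <= units <= 0xFF:
            raise ValueError(f"{name} must be between 0 and 2550ms")
        encoded.append(units)
    return bytes(encoded)


print(envelope_bytes(10, 250, 2550, 800).hex())
```

The four bytes map, in order, onto the proposed ports `0x36` through `0x39`.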
## Appendix A: proposed specification

| ADDR | SIZE    | NAME     | DESCRIPTION |
|------|---------|----------|-------------|
| 0x30 | 2 bytes | vector   | callback address to use when note finishes playing |
| 0x32 | 2 bytes | duration | (new) duration to play sound in milliseconds (1ms resolution) |
| 0x34 | 1 byte  | mode     | (new) configures how to interpret addr/adsr/pitch (see below) |
| 0x35 | 1 byte  | volume   | (moved) 4-bit volumes for left/right channels (6.7% resolution) |
| 0x36 | 1 byte  | attack   | (new) envelope: attack duration (vol 0-100%, 10ms resolution) |
| 0x37 | 1 byte  | decay    | (new) envelope: decay duration (vol 100-50%, 10ms resolution) |
| 0x38 | 1 byte  | sustain  | (new) envelope: sustain duration (vol 50%, 10ms resolution) |
| 0x39 | 1 byte  | release  | (new) envelope: release duration (vol 50-0%, 10ms resolution) |
| 0x3a | 2 bytes | length   | length of waveform data to read (in bytes) |
| 0x3c | 2 bytes | addr     | address to read waveform data from |
| 0x3e | 1 byte  | detune   | (new) fraction of semitone to raise (0x80 gives a quarter tone) |
| 0x3f | 1 byte  | pitch    | 1-bit loop and 7-bit MIDI note (0x00 gives silence) |

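The volume levels in the envelope rows (0-100%, 100-50%, 50%, 50-0%) imply a simple piecewise envelope. The Python sketch below, with a hypothetical `envelope_gain` helper, evaluates it at a given time; linear ramps between the levels are an assumption, since the table gives only the start and end level of each stage.

```python
def envelope_gain(t_ms: float, attack: int, decay: int,
                  sustain: int, release: int) -> float:
    """Envelope level (0.0-1.0) `t_ms` milliseconds after the note starts.

    Stage arguments are raw port bytes in 10ms units. Linear ramps between
    the listed levels are an assumption.
    """
    # (length in ms, level at start, level at end) for each stage, in order
    stages = [
        (attack * 10, 0.0, 1.0),
        (decay * 10, 1.0, 0.5),
        (sustain * 10, 0.5, 0.5),
        (release * 10, 0.5, 0.0),
    ]
    if t_ms < 0:
        return 0.0
    for length, start, end in stages:
        if t_ms < length:
            return start + (end - start) * t_ms / length
        t_ms -= length  # move on to the next stage
    return 0.0  # after the release stage the note is silent
```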
MODES

Mode consists of the bits `Lxxx WWNN` (the `x` bits are currently unused).

The `N` bits correspond to note type:

```
0x00 (xxxx xx00) standard note (uses ADSR, extends with silence)
0x01 (xxxx xx01) staccato note (uses ADR, ignores S, extends with silence)
0x02 (xxxx xx10) legato note (uses ADSR, extends S as needed)
0x03 (xxxx xx11) slurred note (uses S, ignores ADR, extends S as needed)
```

Since we are no longer computing note duration from the ADSR
durations, the note type specifies what to do when the duration is
different from the envelope. For shorter durations, the sustain and/or
release are truncated; for longer durations it varies by type.

The `W` bits correspond to waveform type:

```
0x00 (xxxx 00xx) waveform sampled at 44100 Hz (44.1 kHz)
0x04 (xxxx 01xx) waveform sampled at 22050 Hz
0x08 (xxxx 10xx) waveform sampled at 11025 Hz
0x0c (xxxx 11xx) waveform sampled at 5512.5 Hz
```

Upsampling will be performed by repeating sample values as many times
as needed (2x, 4x, or 8x). The underlying sound engine is still
expected to play sounds at 44.1 kHz.

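Sample repetition is straightforward. A minimal Python sketch (the `upsample_by_repetition` name is hypothetical, and one sample per byte of waveform data is an assumption) reads the `W` bits from a mode byte laid out as `Lxxx WWNN` and repeats each sample 1, 2, 4, or 8 times:

```python
def upsample_by_repetition(samples: bytes, mode: int) -> bytes:
    """Expand waveform data to 44.1 kHz by repeating each sample.

    The W bits (bits 2-3 of `mode`) select 44.1, 22.05, 11.025, or
    5.5125 kHz, i.e. a repeat factor of 1, 2, 4, or 8.
    """
    factor = 1 << ((mode >> 2) & 0b11)
    return bytes(s for s in samples for _ in range(factor))
```

Repetition (a zero-order hold) is the cheapest possible upsampler; it adds some high-frequency grit, which may even suit lo-fi percussion.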
The `L` bit corresponds to whether to loop or not:

```
0x00 (0xxx xxxx) play once (do not loop)
0x80 (1xxx xxxx) repeat note indefinitely
```

Looping will continue until a new `pitch` is written (at which point
that note's looping behavior will be used).

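Putting the three fields together, here is a minimal Python sketch of building and decoding a mode byte. `pack_mode`, `unpack_mode`, and the lookup tables are hypothetical names; the unused `x` bits are left at zero when packing and ignored when unpacking.

```python
NOTE_TYPES = {"standard": 0b00, "staccato": 0b01, "legato": 0b10, "slurred": 0b11}
SAMPLE_RATES_HZ = {44100: 0b00, 22050: 0b01, 11025: 0b10, 5512.5: 0b11}


def pack_mode(note_type: str, sample_rate: float = 44100, loop: bool = False) -> int:
    """Build a `mode` byte laid out as Lxxx WWNN."""
    n_bits = NOTE_TYPES[note_type]          # bits 0-1
    w_bits = SAMPLE_RATES_HZ[sample_rate]   # bits 2-3
    l_bit = 1 if loop else 0                # bit 7
    return (l_bit << 7) | (w_bits << 2) | n_bits


def unpack_mode(mode: int) -> tuple[str, float, bool]:
    """Inverse of pack_mode; the unused x bits are ignored."""
    note_type = next(k for k, v in NOTE_TYPES.items() if v == mode & 0b11)
    rate = next(k for k, v in SAMPLE_RATES_HZ.items() if v == (mode >> 2) & 0b11)
    return note_type, rate, bool(mode & 0x80)


print(hex(pack_mode("legato", 11025, loop=True)))
```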
## Appendix B: not currently supported

There are some features which would be nice to add but which are not
strictly necessary and would require more significant changes. They
could potentially be supported in the future using additional bits
from the `mode` port, by new devices, or by a larger change.

* Vibrato/Tremolo
* Portamento/Glissando/Glide
* Effects (reverb, equalization, overdrive, etc.)
* Frequency generators/software synths

## Appendix C: existing specification

| ADDR | SIZE    | NAME     | DESCRIPTION |
|------|---------|----------|-------------|
| 0x30 | 2 bytes | vector   | callback address to use when note finishes playing |
| 0x32 | 2 bytes | position | read current position in sample |
| 0x34 | 1 byte  | output   | read envelope loudness at this moment (0x000 to 0x888) |
| 0x35 |         |          | (unused) |
| 0x36 |         |          | (unused) |
| 0x37 |         |          | (unused) |
| 0x38 | 2 bytes | adsr     | four 4-bit envelope values (attack/decay/sustain/release) |
| 0x3a | 2 bytes | length   | length of waveform data to read in bytes |
| 0x3c | 2 bytes | addr     | address to read waveform data from |
| 0x3e | 1 byte  | volume   | 4-bit volumes for left/right channels |
| 0x3f | 1 byte  | pitch    | 1-bit loop and 7-bit MIDI note |