UXN Audio Proposal

Problems

Currently the UXN audio device doesn't work very well for playing complex music. There are a few reasons for this:

  • Note duration is conflated with envelope shape
  • Envelope resolution (67ms) limits tempos/subdivisions
  • Microtonal music is not possible (according to the spec)
  • Using audio callback requires scheduling pauses/silence

Proposal outline

One way to improve the situation is to disentangle the envelope specification from the note duration, and more generally make it easier to specify things that a composer will frequently need to change (pitch, articulation, duration) without having to change the underlying voice (waveform/envelope settings).

This proposal does four things:

  1. Add a two-byte duration port that configures a note's duration in milliseconds. The longest possible note is about 65.5 seconds (0xffff ms).

  2. Double the size of the adsr port. This means replacing the existing two-byte port with four one-byte ports for attack, decay, sustain, and release. Since we have 4 extra bits per stage, we can make the step size of each stage finer, going from about 67ms to 10ms (so 0x01 means 10ms), while still extending the range: the longest envelope stage is now about 2.6s (up from 1s previously).

  3. Add a one-byte mode port, which declares what kind of note or sound is being played. This provides an easy way to specify different behaviors such as:

    • articulation (e.g. staccato, legato, etc.)
    • different sample rates (44.1k, 22.05k, 11.025k, 5512)

  4. Move the volume port to 0x35 and add a one-byte detune port. A zero value (0x00) indicates a "normal" semitone pitch, and non-zero values indicate a fractional amount to add. The pitch is raised by detune/256 of a semitone (that is, detune × 100/256 cents). For example, a value of 0x80 will raise the pitch by a quarter-tone (50 cents). The port is placed just before pitch so that microtonal music can write a "micro-pitch" using one DEO2 instruction.
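The detune arithmetic can be sketched as follows. This is only an illustration, not part of the proposal: the function name `micro_pitch_hz` is hypothetical, and it assumes standard MIDI tuning (note 69 = A4 = 440 Hz) and that the 7-bit note occupies the low bits of the pitch byte.

```python
A4_MIDI = 69     # assumption: standard MIDI tuning, note 69 is A4
A4_HZ = 440.0    # assumption: A4 = 440 Hz


def micro_pitch_hz(detune: int, pitch: int) -> float:
    """Frequency in Hz for a detune/pitch byte pair (hypothetical helper)."""
    # Assumption: the 7-bit MIDI note sits in the low bits of the pitch byte.
    note = pitch & 0x7F
    # detune adds a fraction of a semitone: detune/256.
    semitones = note + (detune & 0xFF) / 256.0
    return A4_HZ * 2.0 ** ((semitones - A4_MIDI) / 12.0)


if __name__ == "__main__":
    # Middle C, then middle C raised by a quarter-tone (detune 0x80).
    print(round(micro_pitch_hz(0x00, 0x3C), 2))
    print(round(micro_pitch_hz(0x80, 0x3C), 2))
```

Because detune and pitch are adjacent bytes, the pair can be treated as one 16-bit fixed-point semitone value, which is what makes the single DEO2 write possible.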

Microtonal music

Here's how to encode the 17-tone equal temperament scale (17ET) as detune/pitch pairs starting from middle C (0x3c, i.e. #3c). Each step of the scale is 1200/17 ≈ 70.588 cents, so we can get accurate pitches and detunes by adding 70.588 for each step, then dividing by 100: the quotient is the number of semitones to add to the pitch, and the remainder (in cents) is scaled by 256/100 and rounded to give the detune byte:

  pitch  1: #003c  (0 semitones +  0.00 cents) -- root note is C (#3c)
  pitch  2: #b53c  (0 semitones + 70.59 cents)
  pitch  3: #693d  (1 semitones + 41.18 cents)
  pitch  4: #1e3e  (2 semitones + 11.76 cents)
  pitch  5: #d33e  (2 semitones + 82.35 cents)
  pitch  6: #883f  (3 semitones + 52.94 cents)
  pitch  7: #3c40  (4 semitones + 23.53 cents)
  pitch  8: #f140  (4 semitones + 94.12 cents)
  pitch  9: #a641  (5 semitones + 64.70 cents)
  pitch 10: #5a42  (6 semitones + 35.29 cents)
  pitch 11: #0f43  (7 semitones +  5.88 cents) -- almost perfect 5th (#43)
  pitch 12: #c443  (7 semitones + 76.47 cents)
  pitch 13: #7844  (8 semitones + 47.06 cents)
  pitch 14: #2d45  (9 semitones + 17.64 cents)
  pitch 15: #e245  (9 semitones + 88.23 cents)
  pitch 16: #9746 (10 semitones + 58.82 cents)
  pitch 17: #4b47 (11 semitones + 29.41 cents)
  pitch 18: #0048 (12 semitones +  0.00 cents) -- octave is C (#48)

While it's somewhat cumbersome to calculate these detune values in advance, it only has to be done for one octave and the resulting microtonal pitches can be compactly stored and used.
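The table above can be generated with a short script. This is a sketch under stated assumptions: the function name `edo_micro_pitches` is hypothetical, and it works in units of 1/256 semitone with integer rounding rather than in cents, which gives the same bytes as the table.

```python
def edo_micro_pitches(steps_per_octave: int, root: int = 0x3C) -> list[int]:
    """Return detune/pitch shorts for one octave of an equal temperament.

    Each short is (detune << 8) | pitch, matching the proposed port order
    (detune at 0x3e, pitch at 0x3f).
    """
    units_per_octave = 12 * 256  # one octave in 1/256-semitone units
    shorts = []
    for step in range(steps_per_octave + 1):
        # Round step * units_per_octave / steps_per_octave to the nearest
        # integer using integer arithmetic only.
        units = (2 * step * units_per_octave + steps_per_octave) // (2 * steps_per_octave)
        pitch = root + units // 256   # whole semitones above the root
        detune = units % 256          # leftover fraction of a semitone
        shorts.append((detune << 8) | pitch)
    return shorts


if __name__ == "__main__":
    for i, short in enumerate(edo_micro_pitches(17), start=1):
        print(f"pitch {i:2d}: #{short:04x}")
```

The same function works for other equal divisions of the octave by changing `steps_per_octave`.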

Note duration and tempo

The duration and vector ports together determine the timing of the audio device. A note is played for the number of milliseconds given by duration, at which point the vector is called to play the next note (or the next silence). If the ADSR stages add up to less than duration, the mode port's articulation bits determine how the remaining time is filled (with silence or by extending the sustain). If the ADSR stages add up to more than duration, the envelope is shortened to fit, truncating from the end (R, then S) and cutting into D and A if necessary. If duration is zero, the duration is calculated dynamically from the ADSR values, as it is now.

Composers can choose a duration for the smallest subdivision needed (e.g. 125ms per 16th note to achieve 120 bpm) and then compute precise durations for 8th notes, quarter notes, dotted-8th notes, whole notes, and so on. Similarly, composers can use the same envelope with staccato and legato notes to easily achieve different articulations for different passages.
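As a worked example of the tempo arithmetic, the sketch below derives duration values from a tempo. The function name `note_durations_ms` and the set of note names are illustrative choices, and it assumes the beat is a quarter note.

```python
def note_durations_ms(bpm: float) -> dict[str, int]:
    """Durations in milliseconds for common note values at a given tempo."""
    # Assumption: one beat is a quarter note, so a 16th note is a quarter
    # of a beat. 60000 ms per minute / bpm gives ms per beat.
    sixteenth = round(60_000 / bpm / 4)
    return {
        "16th": sixteenth,
        "8th": 2 * sixteenth,
        "dotted 8th": 3 * sixteenth,
        "quarter": 4 * sixteenth,
        "half": 8 * sixteenth,
        "whole": 16 * sixteenth,
    }


if __name__ == "__main__":
    for name, ms in note_durations_ms(120).items():
        # Each value must fit in the two-byte duration port.
        assert ms <= 0xFFFF
        print(f"{name:>10}: {ms} ms (#{ms:04x})")
```

Deriving every duration from the smallest subdivision keeps longer notes exact multiples of it, so rounding error does not accumulate across a bar.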

More flexible envelope and waveform settings

The new envelope duration range (10ms to 2.6s) allows more complex envelopes to be specified, from slower builds and fades to very fast attacks and releases. Similarly, allowing waveforms to be specified at lower sampling rates potentially allows more interesting percussion instruments to be specified without using too many bytes of the ROM.

For comparison, the NES uses variable frequency samples to allow basic voices/sounds without using too much space. For NTSC devices the supported range is 4182-33144 Hz.

Appendix A: proposed specification

ADDR  SIZE     NAME            DESCRIPTION
0x30  2 bytes  vector          callback address to use when note finishes playing
0x32  2 bytes  duration (new)  duration to play sound in fractional seconds (1ms resolution)
0x34  1 byte   mode (new)      configures how to interpret addr/adsr/pitch (see below)
0x35  1 byte   volume (moved)  4-bit volumes for left/right channels (6.7% resolution)
0x36  1 byte   attack (new)    envelope: attack duration (vol 0-100%, 10ms resolution)
0x37  1 byte   decay (new)     envelope: decay duration (vol 100-50%, 10ms resolution)
0x38  1 byte   sustain (new)   envelope: sustain duration (vol 50%, 10ms resolution)
0x39  1 byte   release (new)   envelope: release duration (vol 50-0%, 10ms resolution)
0x3a  2 bytes  length          length of waveform data to read (in bytes)
0x3c  2 bytes  addr            address to read waveform data from
0x3e  1 byte   detune (new)    fraction of semitone to raise (0x80 gives a quarter tone)
0x3f  1 byte   pitch           1-bit loop and 7-bit MIDI note (0x00 gives silence)
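To make the proposed layout concrete, here is a sketch that packs the twelve ports into the 16 bytes at 0x30-0x3f. The helper `pack_audio_ports` is hypothetical and only models the byte layout; it relies on Uxn storing shorts big-endian (high byte first).

```python
import struct

# vector, duration (shorts); mode, volume, attack, decay, sustain, release
# (bytes); length, addr (shorts); detune, pitch (bytes). Big-endian.
AUDIO_PORTS = struct.Struct(">HHBBBBBBHHBB")


def pack_audio_ports(*, vector: int, duration: int, mode: int, volume: int,
                     attack: int, decay: int, sustain: int, release: int,
                     length: int, addr: int, detune: int, pitch: int) -> bytes:
    """Return the 16 device bytes for ports 0x30-0x3f (hypothetical helper)."""
    return AUDIO_PORTS.pack(vector, duration, mode, volume,
                            attack, decay, sustain, release,
                            length, addr, detune, pitch)


if __name__ == "__main__":
    regs = pack_audio_ports(vector=0x0100, duration=125, mode=0x04, volume=0xFF,
                            attack=1, decay=2, sustain=3, release=4,
                            length=0x0100, addr=0x2000, detune=0xB5, pitch=0x3C)
    for offset, value in enumerate(regs):
        print(f"0x{0x30 + offset:02x}: {value:02x}")
```

The last two bytes come out as detune followed by pitch, which is the pair a single two-byte write to 0x3e would set.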

MODES

Mode consists of the bits xxWW xAAA.

The A bits select the articulation, which determines how to fill the extra time when the duration exceeds the envelope length, and which parts of the envelope to exclude (if any). Excluded stages are shown as underscores:

0x00  (xxxx x000)  regular (ADSR, pads with silence)
0x01  (xxxx x001)  short (AD_R, pads with silence)
0x02  (xxxx x010)  staccato (_D_R, pads with silence)
0x03  (xxxx x011)  staccatissimo (_D__, pads with silence)
0x04  (xxxx x100)  legato (ADSR, extends S)
0x05  (xxxx x101)  begin slur (ADS_, extends S)
0x06  (xxxx x110)  slur (__S_, extends S)
0x07  (xxxx x111)  end slur (__SR, extends S)

For shorter durations, the envelope is truncated starting from the end. For example, with short articulation the release will be truncated first, then the decay, and finally the attack.
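The fitting rules above can be sketched as follows. This is one possible reading of the proposal rather than a reference implementation: the function `fit_envelope` and the `ARTICULATIONS` table are hypothetical names, and the truncation order (R, then S, then D, then A) is taken from the "starting from the end" rule.

```python
# Stages kept by each articulation, and whether spare time extends sustain
# (True) or is padded with silence (False).
ARTICULATIONS = {
    0: ("ADSR", False),  # regular
    1: ("ADR", False),   # short
    2: ("DR", False),    # staccato
    3: ("D", False),     # staccatissimo
    4: ("ADSR", True),   # legato
    5: ("ADS", True),    # begin slur
    6: ("S", True),      # slur
    7: ("SR", True),     # end slur
}


def fit_envelope(adsr: tuple[int, int, int, int], duration_ms: int,
                 articulation: int) -> tuple[dict[str, int], int]:
    """Return (stage lengths in ms, trailing silence in ms)."""
    keep, extend_sustain = ARTICULATIONS[articulation & 0x07]
    # Each ADSR byte counts 10ms units; excluded stages get zero length.
    stages = {name: (value * 10 if name in keep else 0)
              for name, value in zip("ADSR", adsr)}
    total = sum(stages.values())
    if duration_ms == 0:
        duration_ms = total  # duration derived from the envelope itself
    silence = 0
    if total < duration_ms:
        spare = duration_ms - total
        if extend_sustain:
            stages["S"] += spare
        else:
            silence = spare
    else:
        excess = total - duration_ms
        for name in "RSDA":  # truncate starting from the end
            cut = min(excess, stages[name])
            stages[name] -= cut
            excess -= cut
    return stages, silence


if __name__ == "__main__":
    print(fit_envelope((10, 20, 30, 40), 1500, 0))  # regular: padded with silence
    print(fit_envelope((10, 20, 30, 40), 1500, 4))  # legato: sustain extended
    print(fit_envelope((10, 20, 30, 40), 250, 1))   # short: release cut, then decay
```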

The W bits correspond to waveform type:

0x00  (xx00 xxxx)  waveform sampled at 44100 Hz (44.1 kHz)
0x10  (xx01 xxxx)  waveform sampled at 22050 Hz
0x20  (xx10 xxxx)  waveform sampled at 11025 Hz
0x30  (xx11 xxxx)  waveform sampled at  5512 Hz

Upsampling will be performed by repeating sample values as many times as needed (2x, 4x, or 8x). The underlying sound engine is still expected to play sounds at 44.1 kHz. For sounds that were not actually sampled at a lower rate, selecting one of the lower rates has the effect of "pitch shifting" the sample down one or more octaves.
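A minimal sketch of decoding the mode byte and upsampling by repetition is shown below. The names `decode_mode` and `upsample` are hypothetical, and the bit positions follow the `xxWW xAAA` layout (W in bits 5-4, A in bits 2-0), which is an assumption about how the mask maps onto the byte.

```python
def decode_mode(mode: int) -> tuple[int, int]:
    """Split a mode byte into (articulation, rate_shift)."""
    articulation = mode & 0x07       # AAA: bits 2-0
    rate_shift = (mode >> 4) & 0x03  # WW: bits 5-4 (assumed from xxWW xAAA)
    return articulation, rate_shift


def upsample(samples: bytes, rate_shift: int) -> bytes:
    """Repeat each sample 1x, 2x, 4x, or 8x to reach the 44.1 kHz engine rate."""
    repeat = 1 << rate_shift
    return bytes(s for s in samples for _ in range(repeat))


if __name__ == "__main__":
    articulation, rate_shift = decode_mode(0x24)  # legato, 11025 Hz waveform
    print(articulation, rate_shift)
    print(upsample(bytes([1, 2, 3]), 1).hex())
```

Sample repetition is the cheapest form of upsampling; it needs no interpolation or filtering, at the cost of a rougher sound.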

Appendix B: not currently supported

There are some features which would be nice to add but which are not strictly necessary and would require more significant changes. They could potentially be supported in the future using additional bits from the mode port, by new devices, or by a larger change.

  • Vibrato/Tremolo
  • Portamento/Glissando/Glide
  • Effects (reverb, equalization, overdrive, etc.)
  • Frequency generators/software synths

If we want to support even slower builds/fades, we could use unused mode bits to change the units used by the envelope. That would allow coarser resolutions and therefore longer envelope stages.

Appendix C: existing specification

ADDR  SIZE     NAME      DESCRIPTION
0x30  2 bytes  vector    callback address to use when note finishes playing
0x32  2 bytes  position  read current position in sample
0x34  1 byte   output    read envelope loudness at this moment (0x000 to 0x888)
0x35           (unused)
0x36           (unused)
0x37           (unused)
0x38  2 bytes  adsr      four 4-bit envelope values (attack/decay/sustain/release)
0x3a  2 bytes  length    length of waveform data to read in bytes
0x3c  2 bytes  addr      address to read waveform data from
0x3e  1 byte   volume    4-bit volumes for left/right channels
0x3f  1 byte   pitch     1-bit loop and 7-bit MIDI note