UXN Audio Proposal (v2)

(Updated with input from bd and neauoire)

Problems

Currently the UXN audio device doesn't work very well for playing complex music. There are a few reasons for this:

  • Note duration is conflated with envelope shape
  • Envelope resolution (67ms) limits tempos/subdivisions
  • Using the audio callback requires explicitly scheduling pauses/silence

Proposal outline

One way to improve the situation is to disentangle the envelope specification from the note duration, and more generally make it easier to specify things that a composer will frequently need to change (pitch, articulation, duration) without having to change the underlying voice (waveform/envelope settings).

This proposal makes four changes:

  1. Add a two-byte duration port that configures a note's duration in milliseconds. The longest possible note is about 66 seconds.

  2. Double the size of the adsr port. This means replacing the existing two-byte port with four one-byte ports for attack, decay, sustain, and release. Since we have 4 extra bits per stage, we can make the step size of each stage finer, going from 67ms to 10ms (so 0x01 means 10ms). The longest envelope stage is now about 2.6s (up from 1s previously). We special-case sustain and instead treat its value as a fraction x/255 (i.e. 0.0 to 1.0).

  3. Move various ports around, both to improve the layout and prepare for future additions. In particular an expansion port for possible MIDI operations and a detune port for microtonal music are likely (but are left unspecified by this proposal).

  4. Recommend that emulators use a separate wst and rst for evaluating the audio vector (when possible). Code run from the audio vector should not expect to read existing values from wst or rst (and should not leave values behind). This allows emulators to use a separate audio thread for evaluating callbacks without needing to pause other execution.
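
The numeric ranges quoted in items 1 and 2 follow directly from the port widths. A minimal Python sketch of that arithmetic (the constant names are illustrative, and the old step size assumes the existing 4-bit value 0xf maps to the 1s maximum mentioned above):

```python
# Ranges implied by the proposed port widths.
DURATION_STEP_MS = 1   # two-byte duration port, 1ms per unit
STAGE_STEP_MS = 10     # one-byte attack/decay/release ports, 10ms per unit

max_duration_ms = 0xFFFF * DURATION_STEP_MS  # 65535 ms, roughly 66 seconds
max_stage_ms = 0xFF * STAGE_STEP_MS          # 2550 ms, roughly 2.6 seconds

# Existing 4-bit envelope values (assumption: 0xf corresponds to 1 second).
old_stage_step_ms = 1000 / 0xF               # roughly 67 ms per step
```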

Note duration and tempo

Together, the duration and vector ports specify the timing of the audio device precisely. The given note should be played for the number of milliseconds specified by duration, at which point the vector should be called to play the next note (or next silence). For example, if the duration is 0x04b0 then the note should play for 1.2 seconds (1200 ms).
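
Because durations are plain milliseconds, a tempo maps onto the port with simple arithmetic. The sketch below is only an illustration: the helper names `note_duration_ms` and `duration_bytes` are hypothetical and not part of the proposal.

```python
def note_duration_ms(bpm, beats=1.0):
    """Return the duration port value (in ms) for a note lasting `beats` beats."""
    ms = round(60_000 * beats / bpm)
    if not 0 <= ms <= 0xFFFF:
        raise ValueError("duration does not fit in the two-byte port")
    return ms


def duration_bytes(ms):
    """Split a duration into (high, low) bytes; Uxn shorts are big-endian."""
    return (ms >> 8) & 0xFF, ms & 0xFF


quarter = note_duration_ms(120)          # 500 ms at 120 BPM
sixteenth = note_duration_ms(120, 0.25)  # 125 ms at 120 BPM
```

At 50 BPM a one-beat note comes out as 1200 ms, which is the 0x04b0 value used in the example above.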

More flexible envelope settings

The ADSR ports determine how loud the note should be at any given moment. The ADR ports (attack, decay, and release) are all specified in 10ms increments (e.g. 0x03 is 30ms). The S port for sustain behaves differently: it specifies how much of the "leftover" duration (the time not taken up by attack, decay, and release) to hold before the release begins, as a fraction x/255. So with a value of 0xff the note would hold as long as possible, and with 0x00 the release would occur just after the decay ends.

(If the duration is short, parts of the envelope may be truncated.)

Since each component has its own port, it's also much easier to adjust one without having to fiddle with bit masks, shifting, etc.
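
One possible reading of these rules, sketched in Python. The proposal does not say how a short note truncates the envelope or what shape the ramps take, so two things here are assumptions: stages are clipped in the order attack, decay, release, and ramps are linear between the volume levels listed in Appendix A. The function names are hypothetical.

```python
def envelope_stages(duration_ms, attack, decay, sustain, release):
    """Return (attack_ms, decay_ms, sustain_ms, release_ms) for one note."""
    remaining = duration_ms
    clipped = []
    for value in (attack, decay, release):
        ms = min(value * 10, remaining)  # assumption: clip stages in this order
        clipped.append(ms)
        remaining -= ms
    attack_ms, decay_ms, release_ms = clipped
    # Sustain holds x/255 of whatever time is left over.
    sustain_ms = remaining * sustain // 255
    return attack_ms, decay_ms, sustain_ms, release_ms


def envelope_level(t_ms, stages):
    """Volume from 0.0 to 1.0 at t_ms (assumption: linear ramps)."""
    attack_ms, decay_ms, sustain_ms, release_ms = stages
    if t_ms < attack_ms:
        return t_ms / attack_ms                 # 0% up to 100%
    t_ms -= attack_ms
    if t_ms < decay_ms:
        return 1.0 - 0.5 * t_ms / decay_ms      # 100% down to 50%
    t_ms -= decay_ms
    if t_ms < sustain_ms:
        return 0.5                              # hold at 50%
    t_ms -= sustain_ms
    if t_ms < release_ms:
        return 0.5 * (1.0 - t_ms / release_ms)  # 50% down to 0%
    return 0.0                                  # silent until the note ends


held = envelope_stages(1200, 0x03, 0x05, 0xFF, 0x0C)      # sustain fills the gap
staccato = envelope_stages(1200, 0x03, 0x05, 0x00, 0x0C)  # release right after decay
```

With a 1200 ms note, `held` leaves 1000 ms of sustain before the release, while `staccato` releases immediately after the decay and stays silent for the remainder of the note.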

Appendix A: proposed specification

ADDR  SIZE     NAME      DESCRIPTION
0x30  2 bytes  vector    callback address to use when note finishes playing
0x32  2 bytes  duration  duration to play sound in milliseconds (1ms resolution)
0x34  1 byte   attack    envelope: attack duration (vol 0-100%, 10ms resolution)
0x35  1 byte   decay     envelope: decay duration (vol 100-50%, 10ms resolution)
0x36  1 byte   sustain   envelope: sustain fraction (vol 50%, x/255 of free time)
0x37  1 byte   release   envelope: release duration (vol 50-0%, 10ms resolution)
0x38  2 bytes  addr      address to read waveform data from
0x3a  2 bytes  length    length of waveform data to read (in bytes)
0x3c  1 byte   volume    4-bit volumes for left/right channels (6.7% resolution)
0x3d  1 byte             (unused - reserved for expansion)
0x3e  1 byte   pitch     1-bit loop and 7-bit MIDI note (0x00 gives silence)
0x3f  1 byte             (unused - reserved for detune)
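
To make the layout concrete, the table above can be written as a 16-byte record. This is a sketch only: the `pack_audio_ports` helper is hypothetical, the example values are arbitrary, and the two reserved ports are simply written as zero.

```python
import struct

# Field order follows Appendix A; ">" selects big-endian, matching Uxn shorts.
AUDIO_LAYOUT = struct.Struct(">HHBBBBHHBBBB")


def pack_audio_ports(vector, duration, attack, decay, sustain, release,
                     addr, length, volume, pitch):
    """Return the 16 bytes for ports 0x30-0x3f; reserved ports are zero."""
    return AUDIO_LAYOUT.pack(vector, duration, attack, decay, sustain, release,
                             addr, length, volume, 0x00, pitch, 0x00)


page = pack_audio_ports(vector=0x0200, duration=0x04B0,
                        attack=0x03, decay=0x05, sustain=0xFF, release=0x0C,
                        addr=0x1000, length=0x0100,
                        volume=0xFF,  # both 4-bit channel volumes at maximum
                        pitch=0x3C)   # MIDI note 60, top bit left at zero
```

Indexing `page` by the low nibble of the port address gives the byte for that port, so `page[0x2:0x4]` holds the duration and `page[0xe]` holds the pitch.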

Appendix B: existing specification

ADDR  SIZE     NAME      DESCRIPTION
0x30  2 bytes  vector    callback address to use when note finishes playing
0x32  2 bytes  position  read current position in sample
0x34  1 byte   output    read envelope loudness at this moment (0x000 to 0x888)
0x35                     (unused)
0x36                     (unused)
0x37                     (unused)
0x38  2 bytes  adsr      four 4-bit envelope values (attack/decay/sustain/release)
0x3a  2 bytes  length    length of waveform data to read in bytes
0x3c  2 bytes  addr      address to read waveform data from
0x3e  1 byte   volume    4-bit volumes for left/right channels
0x3f  1 byte   pitch     1-bit loop and 7-bit MIDI note