# UXN Audio Proposal

## Problems

Currently the UXN audio device doesn't work very well for playing
complex music. There are a few reasons for this:

 * Note duration is conflated with envelope shape
 * Envelope resolution (67ms) limits tempos/subdivisions
 * Microtonal music is not possible (according to the spec)
 * Using the audio callback requires scheduling pauses/silence explicitly
 
## Proposal outline

One way to improve the situation is to disentangle the envelope
specification from the note duration, and more generally make it
easier to specify things that a composer will frequently need to
change (pitch, articulation, duration) without having to change the
underlying voice (waveform/envelope settings).

This proposal does four things:

 1. Add a two-byte `duration` port that configures a note's duration
    in milliseconds. The longest possible note is about 66 seconds.

 2. Double the size of the `adsr` port. This means replacing the
    existing two-byte port with four one-byte ports for `attack`,
    `decay`, `sustain`, and `release`. Since each stage gains 4 extra
    bits, we will refine the step size of each stage from about 67ms
    (1/15 s) to 10ms (so 0x01 means 10ms). The longest envelope stage is
    now 2.55s (0xff × 10ms, up from 1s previously).
    
 3. Add a one-byte `mode` port, which declares what kind of note or
    sound is being played. This provides an easy way to specify
    different behaviors such as:
     * staccato, legato, or standard playing styles
     * different sample rates (44.1, 22.05, 11.025, or about 5.5 kHz)
     * looping or non-looping playback

 4. Move the `volume` port to `0x35` and add a one-byte `detune` port.
    A zero value (`0x00`) indicates a "normal" semitone pitch, and
    non-zero values indicate a fractional amount to add: the pitch is
    raised by `detune/256` of a semitone (`detune × 100/256` cents). For
    example, a value of `0x80` raises the pitch by a quarter-tone (50
    cents). The port is placed just before `pitch` so that microtonal
    music can write a "micro-pitch" using one `DEO2` instruction.
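To make the `detune` arithmetic concrete, here is a minimal Python sketch of how an emulator might turn a `detune`/`pitch` pair into a frequency. It assumes standard MIDI tuning (note 69 = A4 = 440 Hz), which this proposal does not specify, and the function names are hypothetical.

```python
A4_NOTE = 69      # assumed: MIDI note 69 is A4
A4_HZ = 440.0     # assumed: A4 is tuned to 440 Hz

def detune_cents(detune: int) -> float:
    """Cents added by a detune byte: detune/256 of a 100-cent semitone."""
    return detune * 100.0 / 256.0

def micro_pitch_hz(detune: int, pitch: int) -> float:
    """Frequency in Hz of a 7-bit MIDI note raised by detune/256 semitones.

    The loop bit is assumed to be masked off already, and pitch 0x00
    (silence in the proposal) is not special-cased here.
    """
    if not (0 <= detune <= 0xFF and 0 <= pitch <= 0x7F):
        raise ValueError("detune must be 0-255 and pitch 0-127")
    note = pitch + detune / 256.0          # fractional MIDI note number
    return A4_HZ * 2.0 ** ((note - A4_NOTE) / 12.0)

print(detune_cents(0x80))                  # 50.0
print(round(micro_pitch_hz(0x00, 69), 2))  # 440.0
```

Treating `pitch + detune/256` as a single fractional note number keeps the usual equal-tempered formula unchanged.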

## Microtonal music

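Here's how to encode the 17-tone equal temperament scale (17ET) as
`detune/pitch` pairs starting from middle C (`0x3c`). Each step of the
scale is 1200/17 ≈ 70.588 cents, so we add 70.588 cents for each step,
divide the running total by 100, and use the quotient as the number of
semitones above `0x3c` and the remainder as extra cents. The detune
byte is that remainder expressed in 256ths of a semitone, rounded
(`round(remainder × 256 / 100)`). Each pair is shown as a short with
`detune` in the high byte and `pitch` in the low byte: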

```
  pitch  1: #003c  (0 semitones +  0.00 cents)
  pitch  2: #b53c  (0 semitones + 70.59 cents)
  pitch  3: #693d  (1 semitone  + 41.18 cents)
  pitch  4: #1e3e  (2 semitones + 11.76 cents)
  pitch  5: #d33e  (2 semitones + 82.35 cents)
  pitch  6: #883f  (3 semitones + 52.94 cents)
  pitch  7: #3c40  (4 semitones + 23.53 cents)
  pitch  8: #f140  (4 semitones + 94.12 cents)
  pitch  9: #a641  (5 semitones + 64.71 cents)
  pitch 10: #5a42  (6 semitones + 35.29 cents)
  pitch 11: #0f43  (7 semitones +  5.88 cents)
  pitch 12: #c443  (7 semitones + 76.47 cents)
  pitch 13: #7844  (8 semitones + 47.06 cents)
  pitch 14: #2d45  (9 semitones + 17.65 cents)
  pitch 15: #e245  (9 semitones + 88.24 cents)
  pitch 16: #9746 (10 semitones + 58.82 cents)
  pitch 17: #4b47 (11 semitones + 29.41 cents)
  pitch 18: #0048 (12 semitones +  0.00 cents)
```

While it's somewhat cumbersome to calculate these detune values in
advance, it only has to be done for one octave and the resulting
microtonal pitches can be compactly stored and used.
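As an illustration, the 17ET detune/pitch values can be regenerated with a short script. This is a sketch assuming the detune byte is `round(remainder × 256 / 100)`; `edo_table` is a hypothetical helper, and passing a different `steps` value should give other equal temperaments.

```python
def edo_table(steps: int = 17, base_note: int = 0x3C):
    """Return (detune, pitch, semitones, cents) for each step of one octave."""
    rows = []
    for k in range(steps + 1):                  # include the octave itself
        total_cents = k * 1200 / steps          # position of step k in cents
        semis, cents = divmod(total_cents, 100.0)
        semis = int(semis)
        detune = round(cents * 256 / 100)       # remainder in 256ths of a semitone
        if detune == 256:                       # rounding spilled into the next semitone
            detune, semis = 0, semis + 1
            cents -= 100.0                      # now a tiny negative offset
        rows.append((detune, base_note + semis, semis, cents))
    return rows

for i, (detune, pitch, semis, cents) in enumerate(edo_table(), start=1):
    print(f"pitch {i:2}: #{detune:02x}{pitch:02x}  ({semis} semitones + {cents:5.2f} cents)")
```

The spill check never triggers for 17ET (the largest remainder is about 94 cents) but keeps the helper safe for other step counts.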

## Note duration and tempo

The `duration` and `vector` ports precisely specify the audio device
behavior. The given note should be played for a number of milliseconds
specified by `duration`, at which point the `vector` should be called
to play the next note (or next silence). If the envelope given by the
ADSR ports is shorter than `duration`, the *mode* defines how to extend
the note (using the *note type* bits). If the envelope is longer, it
will be shortened to fit, starting with S/R but also truncating D and A
if necessary.
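One way to read these rules is sketched below in Python. The mapping from note type to the stages it uses follows the `N` bits table in Appendix A, but the exact truncation order (sustain, then release, then decay, then attack) is our assumption, since the text only says to start with S/R. All times are in milliseconds, so an envelope byte would be multiplied by 10 first; `fit_envelope` is a hypothetical name.

```python
STAGES = ("attack", "decay", "sustain", "release")

# Stages each note type keeps (N bits: standard, staccato, legato, slurred).
USED = {
    0: {"attack", "decay", "sustain", "release"},  # standard
    1: {"attack", "decay", "release"},             # staccato: ignores S
    2: {"attack", "decay", "sustain", "release"},  # legato
    3: {"sustain"},                                # slurred: ignores A, D, R
}

# Assumed order in which stages are shortened when the envelope is too long.
TRUNCATE_ORDER = ("sustain", "release", "decay", "attack")

def fit_envelope(duration_ms, adsr_ms, note_type):
    """Return {stage: ms, 'silence': ms} whose values sum to duration_ms."""
    env = {st: (adsr_ms[i] if st in USED[note_type] else 0)
           for i, st in enumerate(STAGES)}
    excess = sum(env.values()) - duration_ms
    # Envelope too long: shrink stages in TRUNCATE_ORDER until it fits.
    for st in TRUNCATE_ORDER:
        if excess <= 0:
            break
        cut = min(env[st], excess)
        env[st] -= cut
        excess -= cut
    silence = 0
    if excess < 0:                 # envelope too short: extend by note type
        if note_type in (2, 3):    # legato / slurred: hold the sustain level
            env["sustain"] += -excess
        else:                      # standard / staccato: pad with silence
            silence = -excess
    env["silence"] = silence
    return env

print(fit_envelope(500, (100, 100, 100, 100), 2))
```

Because the result always sums to `duration`, the `vector` callback fires at the same moment regardless of note type; only the shape of the sound changes.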

Composers can choose a duration for the smallest subdivision needed
(e.g. 125ms per 16th note to achieve 120 bpm) and then compute precise
durations for 8th notes, quarter notes, dotted-8th notes, whole notes,
and so on. Similarly, composers can use the same envelope with
staccato and legato notes to easily achieve different articulations
for different passages.
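For example, the duration arithmetic at 120 bpm can be sketched as follows. The helpers are hypothetical and assume the beat is a quarter note.

```python
def beat_ms(bpm: float) -> float:
    """Length of one beat (assumed to be a quarter note) in milliseconds."""
    return 60_000 / bpm

def note_ms(bpm: float, quarters: float) -> int:
    """Value for the 2-byte duration port, rounded to the nearest millisecond."""
    ms = round(beat_ms(bpm) * quarters)
    if not 0 <= ms <= 0xFFFF:
        raise ValueError("duration does not fit in the 2-byte duration port")
    return ms

NOTE_VALUES = {"16th": 0.25, "8th": 0.5, "dotted 8th": 0.75,
               "quarter": 1.0, "half": 2.0, "whole": 4.0}

for name, quarters in NOTE_VALUES.items():
    print(f"{name:>10}: {note_ms(120, quarters)} ms")
```

At tempos where a beat is not a whole number of milliseconds (e.g. 130 bpm), rounding each note on its own can drift by up to half a millisecond per note; accumulating exact start times and rounding the running total avoids that.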

## More flexible envelope and waveform settings

The new envelope stage range (10ms to 2.55s per stage) allows more complex
envelopes to be specified, from slow builds and fades to very fast
attacks and releases. Similarly, allowing waveforms to be specified at
lower sampling rates potentially allows more interesting percussion
instruments to be specified without using too many bytes of the ROM.

For comparison, the NES uses variable-frequency samples to allow basic
voices/sounds without using too much space. For NTSC devices the
supported range is 4182-33144 Hz.

## Appendix A: proposed specification

```
ADDR  SIZE     NAME      DESCRIPTION
0x30  2 bytes  vector    callback address to use when note finishes playing
0x32  2 bytes  duration  (new) duration to play sound in milliseconds (1ms resolution)
0x34  1 byte   mode      (new) configures how to interpret addr/envelope/pitch (see below)
0x35  1 byte   volume    (moved) 4-bit volumes for left/right channels (6.7% resolution)
0x36  1 byte   attack    (new) envelope: attack duration (vol 0-100%, 10ms resolution)
0x37  1 byte   decay     (new) envelope: decay duration (vol 100-50%, 10ms resolution)
0x38  1 byte   sustain   (new) envelope: sustain duration (vol 50%, 10ms resolution)
0x39  1 byte   release   (new) envelope: release duration (vol 50-0%, 10ms resolution)
0x3a  2 bytes  length    length of waveform data to read (in bytes)
0x3c  2 bytes  addr      address to read waveform data from
0x3e  1 byte   detune    (new) fraction of a semitone to raise, in 256ths (0x80 gives a quarter tone)
0x3f  1 byte   pitch     1-bit loop and 7-bit MIDI note (0x00 gives silence)
```
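A minimal sketch of how a program or test harness might lay out these 16 port bytes, using Python's `struct` module with big-endian 2-byte fields (matching how UXN stores shorts). `pack_audio_ports` and `env_byte` are hypothetical helpers; note how `detune` and `pitch` end up adjacent, so a single short like `#b53c` covers both.

```python
import struct

# Field order and sizes follow the proposed table (0x30..0x3f).
PORT_FORMAT = ">HHBBBBBBHHBB"  # big-endian: 2-byte ports are stored high byte first

def env_byte(ms: float) -> int:
    """Convert an envelope stage length in ms to a 10ms-resolution byte."""
    value = round(ms / 10)
    if not 0 <= value <= 0xFF:
        raise ValueError("envelope stage must be between 0 and 2550 ms")
    return value

def pack_audio_ports(vector, duration_ms, mode, volume,
                     attack, decay, sustain, release,
                     length, addr, detune, pitch) -> bytes:
    """Return the 16 bytes for ports 0x30..0x3f (struct.error if a field overflows)."""
    return struct.pack(PORT_FORMAT, vector, duration_ms, mode, volume,
                       attack, decay, sustain, release,
                       length, addr, detune, pitch)

ports = pack_audio_ports(vector=0x0200, duration_ms=500, mode=0x02, volume=0xFF,
                         attack=env_byte(20), decay=env_byte(50),
                         sustain=env_byte(300), release=env_byte(100),
                         length=0x0100, addr=0x8000, detune=0xB5, pitch=0x3C)
print(ports.hex())  # 020001f402ff02051e0a01008000b53c
```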

MODES

Mode consists of the bits `Lxxx WWNN`.

The `N` bits correspond to note type:

```
0x00  (xxxx xx00)  standard note (uses ADSR, extends with silence)
0x01  (xxxx xx01)  staccato note (uses ADR, ignores S, extends with silence)
0x02  (xxxx xx10)  legato note (uses ADSR, extends S as needed)
0x03  (xxxx xx11)  slurred note (uses S, ignores ADR, extends S as needed)
```

Since we are no longer computing note duration from the ADSR
durations, the note type specifies what to do when the duration is
different from the envelope. For shorter durations, the sustain and/or
release are truncated; for longer durations it varies by type.

The `W` bits correspond to waveform type:

```
0x00  (xxxx 00xx)  waveform sampled at 44100 Hz (44.1 kHz)
0x04  (xxxx 01xx)  waveform sampled at 22050 Hz
0x08  (xxxx 10xx)  waveform sampled at 11025 Hz
0x0c  (xxxx 11xx)  waveform sampled at  5512 Hz
```

Upsampling will be performed by repeating sample values as many times
as needed (2x, 4x, or 8x). The underlying sound engine is still
expected to play sounds at 44.1 kHz.
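The repetition step can be sketched in a few lines, assuming `w_bits` is the two-bit `W` field already extracted from `mode`; `upsample` is a hypothetical helper.

```python
# Repetition factor for each value of the two W bits.
W_FACTOR = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}

def upsample(samples: bytes, w_bits: int) -> bytes:
    """Repeat each sample so lower-rate data can be played at 44.1 kHz."""
    factor = W_FACTOR[w_bits & 0b11]
    return bytes(s for s in samples for _ in range(factor))

print(upsample(b"\x10\x80", 0b01).hex())   # 10108080
```

Repeating values (a zero-order hold) is cheap but adds some high-frequency artifacts; interpolating between samples would be smoother, but the proposal specifies repetition.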

The `L` bit corresponds to whether to loop or not:

```
0x00 (0xxx xxxx) play once (do not loop)
0x80 (1xxx xxxx) repeat note indefinitely
```

Looping will continue until a new `pitch` is written (at which point
that note's looping behavior will be used).
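Putting the three fields together, a mode byte could be built and decoded as below. This sketch follows the `Lxxx WWNN` layout literally (so the `W` field occupies bits 2-3 and the three `x` bits are left at zero); `make_mode` and `parse_mode` are hypothetical names.

```python
NOTE_TYPES = ("standard", "staccato", "legato", "slurred")   # N bits
SAMPLE_RATES = (44100, 22050, 11025, 5512)                   # W bits

def make_mode(note_type: int, rate_index: int, loop: bool) -> int:
    """Pack the fields into a mode byte laid out as Lxxx WWNN."""
    if not (0 <= note_type <= 3 and 0 <= rate_index <= 3):
        raise ValueError("note_type and rate_index must be 0-3")
    return (0x80 if loop else 0) | (rate_index << 2) | note_type

def parse_mode(mode: int) -> dict:
    """Unpack a mode byte; the three 'x' bits are ignored."""
    return {
        "loop": bool(mode & 0x80),
        "sample_rate": SAMPLE_RATES[(mode >> 2) & 0b11],
        "note_type": NOTE_TYPES[mode & 0b11],
    }

print(hex(make_mode(2, 1, True)))   # 0x86
print(parse_mode(0x86))
```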

## Appendix B: not currently supported

There are some features which would be nice to add but which are not
strictly necessary and would require more significant changes. They
could potentially be supported in the future using additional bits
from the `mode` port, by new devices, or by a larger change.

 * Vibrato/Tremolo
 * Portamento/Glissando/Glide
 * Effects (reverb, equalization, overdrive, etc.)
 * Frequency generators/software synths

## Appendix C: existing specification

```
ADDR  SIZE     NAME      DESCRIPTION
0x30  2 bytes  vector    callback address to use when note finishes playing
0x32  2 bytes  position  read current position in sample
0x34  1 byte   output    read envelope loudness at this moment (0x000 to 0x888)
0x35                     (unused)
0x36                     (unused)
0x37                     (unused)
0x38  2 bytes  adsr      four 4-bit envelope values (attack/decay/sustain/release)
0x3a  2 bytes  length    length of waveform data to read in bytes
0x3c  2 bytes  addr      address to read waveform data from
0x3e  1 byte   volume    4-bit volumes for left/right channels
0x3f  1 byte   pitch     1-bit loop and 7-bit MIDI note
```
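Existing ROMs could be migrated by converting the old `adsr` short into the proposed four envelope bytes. This is a sketch assuming attack is the most significant nibble, as the (attack/decay/sustain/release) ordering suggests; `convert_adsr` is a hypothetical helper.

```python
def convert_adsr(old_adsr: int) -> bytes:
    """Map the existing 2-byte adsr value to proposed attack/decay/sustain/release bytes."""
    if not 0 <= old_adsr <= 0xFFFF:
        raise ValueError("adsr is a 2-byte value")
    # Assumed nibble order: attack is the most significant nibble.
    nibbles = [(old_adsr >> shift) & 0xF for shift in (12, 8, 4, 0)]
    # One old unit is 1/15 s (about 66.7 ms); one new unit is 10 ms,
    # so each nibble n becomes round(n * 100 / 15) new units.
    return bytes(round(n * 100 / 15) for n in nibbles)

print(list(convert_adsr(0xF123)))   # [100, 7, 13, 20]
```

Old values that are multiples of 3 convert exactly; the rest are off by at most a few milliseconds.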