From 374e92d844e4c5b561b919e2b4004a5fa7255a96 Mon Sep 17 00:00:00 2001
From: d_m
Date: Mon, 21 Aug 2023 23:10:02 -0400
Subject: [PATCH] updates to spec

---
 audio.md | 76 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 41 insertions(+), 35 deletions(-)

diff --git a/audio.md b/audio.md
index ae0f801..23ebc41 100644
--- a/audio.md
+++ b/audio.md
@@ -33,9 +33,8 @@ This proposal does four things:
 3. Add a one-byte `mode` port, which declares what kind of note or
    sound is being played. This provides an easy way to specify
    different behaviors such as:
- * staccato, legato, or standard playing styles
- * different sample rates (44.1, 22.05, 11.025)
- * looping or non-looping playback
+ * articulation (e.g. staccato, legato, etc.)
+ * different sample rates (44.1k, 22.05k, 11.025k, 5512)

 4. Move the `volume` port to `0x5` and add a one-byte `detune`
    port. A zero value (`0x00`) indicates a "normal" semitone pitch, and
@@ -48,13 +47,13 @@ This proposal does four things:
 ## Microtonal music

 Here's how to encode the 17-tone equal temperment scale (17ET) as
-`detune/pitch` pairs starting from middle C (`0x3c`). Since each step
-of the scale consists of 70.588 cents, we can get accurate pitches and
-detunes by adding 70.588 for each step then dividing by 100 and using
-the quotient and remainder:
+`detune/pitch` pairs starting from middle C (`0x3c`, i.e. `#3c`).
+Since each step of the scale consists of 70.588 cents, we can get
+accurate pitches and detunes by adding 70.588 for each step then
+dividing by 100 and using the quotient and remainder:

 ```
- pitch 1: #003c (0 semitones + 0.00 cents)
+ pitch 1: #003c (0 semitones + 0.00 cents) -- root note is C (#3c)
  pitch 2: #b53c (0 semitones + 70.59 cents)
  pitch 3: #693d (1 semitones + 41.18 cents)
  pitch 4: #1e3e (2 semitones + 11.76 cents)
@@ -64,14 +63,14 @@ the quotient and remainder:
  pitch 8: #f140 (4 semitones + 94.12 cents)
  pitch 9: #a641 (5 semitones + 64.70 cents)
  pitch 10: #5a42 (6 semitones + 35.29 cents)
- pitch 11: #0f43 (7 semitones + 5.88 cents)
+ pitch 11: #0f43 (7 semitones + 5.88 cents) -- almost perfect 5th (#43)
  pitch 12: #c443 (7 semitones + 76.47 cents)
  pitch 13: #7844 (8 semitones + 47.06 cents)
  pitch 14: #2d45 (9 semitones + 17.64 cents)
  pitch 15: #e245 (9 semitones + 88.23 cents)
  pitch 16: #9746 (10 semitones + 58.82 cents)
  pitch 17: #4b47 (11 semitones + 29.41 cents)
- pitch 18: #0048 (12 semitones + 0.00 cents)
+ pitch 18: #0048 (12 semitones + 0.00 cents) -- octave is C (#48)
 ```

 While it's somewhat cumbersome to calculate these detune values in
@@ -87,7 +86,8 @@ to play the next note (or next silence). If the specified ADSR ports
 have a shorter duration, the *mode* defines how to extend the pitch
 (using the *note type* bits). If the ADSR ports have a longer
 duration, then the ADSR will be shortened to fit, starting with S/R
-but also truncating D and A if necessary.
+but also truncating D and A if necessary. If duration is zero the
+duration will be calculated dynamically from ADSR, as it is now.

 Composers can choose a duration for the smallest subdivision needed
 (e.g. 125ms per 16th note to achieve 120 bpm) and then compute precise
@@ -127,45 +127,51 @@ ADDR SIZE NAME DESCRIPTION

 MODES

-Mode consists of the bits `Lxxx WWNN`.
+Mode consists of the bits `xxWW xAAA`.

-The `N` bits correspond to note type:
+The `A` bits correspond to articulation, which determines how to fill
+extra space when the duration exceeds the envelope length, and also
+which parts of the envelope to exclude (if any) which is denoted with
+an underscore:

 ```
-0x00 (xxxx xx00) standard note (uses ADSR, extends with silence)
-0x01 (xxxx xx01) staccato note (uses ADR, ignores S, extends with silence)
-0x02 (xxxx xx10) legato note (uses ADSR, extends S as needed)
-0x03 (xxxx xx11) slurred note (uses S, ignores ADR, extends S as needed)
+0x00 (xxxx x000) regular (ADSR, pads with silence)
+0x01 (xxxx x001) short (AD_R, pads with silence)
+0x02 (xxxx x010) staccato (_D_R, pads with silence)
+0x03 (xxxx x011) staccatissimo (_D__, pads with silence)
+0x04 (xxxx x100) legato (ADSR, extends S)
+0x05 (xxxx x101) begin slur (ADS_, extends S)
+0x06 (xxxx x110) slur (__S_, extends S)
+0x07 (xxxx x111) end slur (__SR, extends S)
 ```

-Since we are no longer computing note duration from the ADSR
-durations, the note type specifies what to do when the duration is
-different than the envelope. For shorter durations, the sustain and/or
-release are truncated; for longer durations it varies by type.
+For shorter durations, the envelope is truncated starting from the
+end. For example, with `short` articulation the release will be
+truncated first, then the decay, and finally the attack.
+
+The `L` bit corresponds to whether to loop or not:
+
+```
+0x00 (xxxx 0xxx) play once (do not loop)
+0x80 (xxxx 1xxx) repeat note indefinitely
+```
+
+Looping will continue until a new `pitch` is written (at which point
+that note's looping behavior will be used).

 The `W` bits correspond to waveform type:

 ```
-0x00 (xxxx 00xx) waveform sampled at 44100 Hz (44.1 kHz)
-0x40 (xxxx 01xx) waveform sampled at 22050 Hz
-0x80 (xxxx 10xx) waveform sampled at 11025 Hz
-0xc0 (xxxx 11xx) waveform sampled at 5512 Hz
+0x00 (xx00 xxxx) waveform sampled at 44100 Hz (44.1 kHz)
+0x40 (xx01 xxxx) waveform sampled at 22050 Hz
+0x80 (xx10 xxxx) waveform sampled at 11025 Hz
+0xc0 (xx11 xxxx) waveform sampled at 5512 Hz
 ```

 Upsampling will be performed by repeating sample values as many times
 as needed (2x, 4x, or 8x). The underlying sound engine is still
 expected to play sounds at 44.1 kHz.

-The `L` bit corresponds to whether to loop or not:
-
-```
-0x00 (0xxx xxxx) play once (do not loop)
-0x80 (1xxx xxxx) repeat note indefinitely
-```
-
-Looping will continue until a new `pitch` is written (at which point
-that note's looping behavior will be used).
-
 ## Appendix B: not currently supported

 There are some features which would be nice to add but which are not