UXN ROM METADATA PROPOSAL
by d6
TL;DR SUMMARY
i'm proposing adding four bytes to the start of a rom.
- "uxn0" means there is no additional metadata
- "uxn1" means there is up to 5904 bytes of additional metadata
- roms not starting with "uxn" are treated as having no metadata
emulators will need to skip the metadata to load program memory. for
"uxn0" that means skipping those four bytes. for "uxn1" it means
reading the next two bytes and skipping the metadata based on that.
for roms not starting with "uxn" no skipping is needed.
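
as a rough sketch of the skipping rules above (not part of the proposal itself), an emulator's loader might find the start of program data like this. the function name `program_offset` is hypothetical. the sketch assumes the mode byte is the ASCII digit ("0" or "1") and that the two-byte size is big-endian like other uxn shorts; the proposal does not state the byte order.

```python
def program_offset(rom: bytes) -> int:
    """Return the index of the first byte to load at address 0x0100."""
    if rom[:3] != b"uxn":
        return 0  # legacy rom: no header, load everything
    mode = rom[3:4]
    if mode == b"0":
        return 4  # skip only the four bytes "uxn0"
    if mode == b"1":
        if len(rom) < 6:
            raise ValueError("truncated uxn1 header")
        # assumption: total-size is big-endian and counts from the "u" of "uxn1"
        return int.from_bytes(rom[4:6], "big")
    raise ValueError(f"unsupported metadata mode {mode!r}")
```

the emulator would then copy `rom[program_offset(rom):]` into memory starting at 0x0100, exactly as it does with a whole rom file today.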
this metadata could be used by other roms such as loader.tal,
emulators, or even websites cataloging uxn roms.
INTRODUCTION
currently uxn rom files are just the data that will be loaded into the
VM's memory on start up (starting with address 0x100 since the zero
page is skipped). this means that the maximum rom size is 65280 bytes,
although most roms are smaller since trailing zeros are left out.
this simplicity is great, but comes with some downsides:
- roms aren't identifiable beyond their file name
- roms don't contain any attribution information, credits, or licenses
- roms don't contain any version information (rom version or uxn version)
- roms don't contain any icon or preview information
while it would be nice to just start requiring all of these things,
that would create a major burden on assembler and emulator authors. i
think there's a smoother path forward.
PROPOSAL
i propose adding four bytes to the start of every rom:
- the literal 3 bytes "uxn"
- a fourth metadata mode byte
the bytes "uxn" correspond to the instructions STA2r ADD2r JSR2r, so
we aren't at risk of creating an ambiguity with valid uxn roms which
previously would have worked, since a rom starting with STA2r would
immediately crash.
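
the opcode claim above can be checked with a small decoder. this is only a sketch: it assumes the usual uxn byte layout (low five bits select the opcode, 0x20 is short mode, 0x40 is return mode, 0x80 is keep mode) and it deliberately refuses the opcode-zero family (BRK, LIT and friends), which needs special handling. the name `mnemonic` is hypothetical.

```python
BASE_OPCODES = (
    "BRK INC POP NIP SWP ROT DUP OVR EQU NEQ GTH LTH JMP JCN JSR STH "
    "LDZ STZ LDR STR LDA STA DEI DEO ADD SUB MUL DIV AND ORA EOR SFT"
).split()

def mnemonic(byte: int) -> str:
    """Name a uxn instruction byte, e.g. 0x75 -> 'STA2r'."""
    base = byte & 0x1F
    if base == 0:
        raise ValueError("opcode 0x00 family is not handled by this sketch")
    name = BASE_OPCODES[base]
    if byte & 0x20:
        name += "2"  # short mode
    if byte & 0x80:
        name += "k"  # keep mode
    if byte & 0x40:
        name += "r"  # return mode
    return name

print([mnemonic(b) for b in b"uxn"])
```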
this proposal just covers metadata modes 0 and 1, but in the
future we could have up to 254 other modes to use (though we might
choose to forbid those later to keep things simple).
UXN0 FORMAT
the "uxn0" format would be exactly what we have now, just with those
four bytes at the start. assembler authors could choose to only
support creating "uxn0" roms without very much extra effort over what
they do now. emulator authors could easily adapt their current work to
read this format. rom files that lack a "uxn" at the start would
continue to work (though in the future we might choose to deprecate
this).
UXN1 FORMAT
the "uxn1" format would provide some extra metadata:
- total-size (2 bytes): total metadata size (including "uxn1")
- uxn-version (2 bytes): 0x0000 for unspecified, 0x0001 for current
- name-size (1 byte): size of the following name string in bytes
- name (n bytes): the name string (ASCII/UTF-8)
- version-size (1 byte): size of the version string in bytes
- version (n bytes): the version string (ASCII/UTF-8)
- author-size (1 byte): size of the author string in bytes
- author (n bytes): the author string (ASCII/UTF-8)
- desc-size (2 bytes): size of the description string in bytes
- desc (n bytes): the description string (ASCII/UTF-8) (4096 max)
- icon-type (1 byte): the size and depth of the icon
- icon-palette (n bytes): the icon's color theme (6 max)
- icon-data (n bytes): the icon's ICN or CHR data (1024 max)
we limit descriptions to a 4096 byte maximum. this helps put a
reasonable upper bound on the size of metadata.
the minimal "uxn1" header (assuming the strings and icon are all
empty) would be 10 bytes after the "uxn1" prefix (2 + 2 + 1 + 1 + 1 +
2 + 1), or 14 bytes counting the prefix, so its total-size field would
hold 14. emulator implementors could read total-size and then seek
past this metadata to read the rom data.
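
to make the field layout concrete, here is a minimal sketch of a header writer. the function name and keyword arguments are hypothetical, multi-byte sizes are assumed to be big-endian (the proposal does not say), and the icon palette and data are passed through without checking them against icon-type.

```python
def build_uxn1_header(name=b"", version=b"", author=b"", desc=b"",
                      icon_type=0x00, icon_palette=b"", icon_data=b"",
                      uxn_version=0x0001) -> bytes:
    """Serialize the uxn1 fields in the order listed in the proposal."""
    for label, value in (("name", name), ("version", version), ("author", author)):
        if len(value) > 255:
            raise ValueError(f"{label} is longer than 255 bytes")
    if len(desc) > 4096:
        raise ValueError("desc is longer than 4096 bytes")
    body = b"".join([
        uxn_version.to_bytes(2, "big"),
        bytes([len(name)]), name,
        bytes([len(version)]), version,
        bytes([len(author)]), author,
        len(desc).to_bytes(2, "big"), desc,
        bytes([icon_type]), icon_palette, icon_data,
    ])
    # total-size covers "uxn1" (4), the size field itself (2), and the body
    total = 4 + 2 + len(body)
    return b"uxn1" + total.to_bytes(2, "big") + body
```

calling `build_uxn1_header()` with no arguments yields the minimal header: "uxn1", then a total-size of 14, then eight more bytes.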
UXN1 ICON TYPES
the icon types would be defined by:
- bit 8: is icon present? (0x80 yes, 0x00 no)
- bit 7: transparency of color 1? (0x40 transparent, 0x00 solid)
- bit 6: color depth? (0x20 2-bit color (CHR), 0x00 1-bit color (ICN))
- bits 3-5: unused
- bits 1-2: icon dimensions (0x00: 8x8, 0x01: 16x16, 0x02: 32x32, 0x03: 64x64)
so in table form that would mean:
ICON  ICON       PALETTE  IMAGE DATA  TRANSPARENT
BYTE  FORMAT     SIZE     SIZE        COLOR 1?
0x00  no icon    0 bytes  0 bytes     n/a
0x80  8x8 ICN    3 bytes  8 bytes     no
0x81  16x16 ICN  3 bytes  32 bytes    no
0x82  32x32 ICN  3 bytes  128 bytes   no
0x83  64x64 ICN  3 bytes  512 bytes   no
0xa0  8x8 CHR    6 bytes  16 bytes    no
0xa1  16x16 CHR  6 bytes  64 bytes    no
0xa2  32x32 CHR  6 bytes  256 bytes   no
0xa3  64x64 CHR  6 bytes  1024 bytes  no
0xc0  8x8 ICN    3 bytes  8 bytes     yes
0xc1  16x16 ICN  3 bytes  32 bytes    yes
0xc2  32x32 ICN  3 bytes  128 bytes   yes
0xc3  64x64 ICN  3 bytes  512 bytes   yes
0xe0  8x8 CHR    6 bytes  16 bytes    yes
0xe1  16x16 CHR  6 bytes  64 bytes    yes
0xe2  32x32 CHR  6 bytes  256 bytes   yes
0xe3  64x64 CHR  6 bytes  1024 bytes  yes
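
the table rows follow mechanically from the bit layout, which a reader could sketch as below. `describe_icon` and its dictionary keys are hypothetical names. the sketch treats any byte without 0x80 set as "no icon" and ignores the unused bits, neither of which the proposal spells out.

```python
def describe_icon(icon_type: int) -> dict:
    """Derive icon format and byte counts from the icon-type byte."""
    if not icon_type & 0x80:
        return {"present": False, "palette_bytes": 0, "data_bytes": 0}
    side = 8 << (icon_type & 0x03)      # 8, 16, 32 or 64 pixels
    is_chr = bool(icon_type & 0x20)     # 2-bit CHR vs 1-bit ICN
    tiles = (side // 8) ** 2            # icon is a square of 8x8 tiles
    return {
        "present": True,
        "side": side,
        "format": "CHR" if is_chr else "ICN",
        "transparent": bool(icon_type & 0x40),
        "palette_bytes": 6 if is_chr else 3,
        "data_bytes": tiles * (16 if is_chr else 8),
    }
```

for example, `describe_icon(0xe3)` reports a transparent 64x64 CHR icon with a 6-byte palette and 1024 bytes of image data, matching the last row of the table.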
icons would be stored in 8x8 tiles, as mandated by the ICN and CHR
formats. they would be read left-to-right, top-to-bottom. while
external applications might have an easier time with other formats
(e.g. BMP, PNG, etc.) this icon format is primarily designed for ease
of use by other uxn roms running inside varvara.
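
for external tools, the tile ordering means a little unpacking before the icon can be drawn as ordinary rows of pixels. the sketch below handles only 1-bit ICN data and assumes each tile is eight bytes, one byte per row, with the most significant bit as the leftmost pixel. `icn_to_rows` is a hypothetical name.

```python
def icn_to_rows(data: bytes, side: int) -> list[str]:
    """Unpack tiled ICN data into `side` strings of '#' (set) and '.' (clear)."""
    tiles_per_row = side // 8
    if len(data) != tiles_per_row * tiles_per_row * 8:
        raise ValueError("icon data length does not match icon size")
    grid = [["."] * side for _ in range(side)]
    for tile in range(tiles_per_row * tiles_per_row):
        # tiles run left-to-right, then top-to-bottom
        tx, ty = tile % tiles_per_row, tile // tiles_per_row
        for row in range(8):
            byte = data[tile * 8 + row]
            for col in range(8):
                if byte & (0x80 >> col):
                    grid[ty * 8 + row][tx * 8 + col] = "#"
    return ["".join(line) for line in grid]
```

a CHR icon would need the same walk with sixteen bytes per tile and two bitplanes combined per pixel.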
the maximal "uxn1" header size (assuming all strings are maximum
length and using an icon format of 0xe3) would be 5904 bytes. while
this is substantial it is unlikely to push most rom sizes over 64k
(the point at which working with them from within varvara becomes
annoying). since the maximum rom data size is 65280, authors are
guaranteed to be able to use up to 255 bytes of metadata while still
keeping the total rom size under 65536 even if they are using the
maximum amount of rom data.
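
the size arithmetic can be spelled out from the field list. this is a back-of-envelope sketch with hypothetical constant names. note that summing the field maxima exactly as listed gives 5901 bytes after "uxn1" (5905 counting it), a few bytes off the 5904 quoted above, so either that figure or one of the listed field sizes may need confirming.

```python
MAGIC = 4                                  # the bytes "uxn1"
FIXED = 2 + 2 + 1 + 1 + 1 + 2 + 1          # sizes, uxn-version, icon-type
MIN_HEADER = MAGIC + FIXED                 # every string and the icon empty

# three 255-byte strings, a 4096-byte description, a 0xe3 icon
MAX_VARIABLE = 3 * 255 + 4096 + 6 + 1024
MAX_HEADER = MIN_HEADER + MAX_VARIABLE

MAX_ROM_DATA = 0x10000 - 0x0100            # memory minus the zero page
METADATA_BUDGET = 0xFFFF - MAX_ROM_DATA    # keeps the whole file under 65536

print(MIN_HEADER, MAX_HEADER, MAX_ROM_DATA, METADATA_BUDGET)
```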
TEXT FORMATS
uxn isn't likely to ever support unicode very well, so why did i write
ASCII/UTF-8 for text formats? my take is that the text in roms is
likely to be used both by uxn and by external systems that
probably _can_ handle UTF-8. i expect most authors to stick to the
ASCII subset that uxn can handle.
we have some alternatives to consider:
(a) require 7-bit ASCII only
(b) add metadata about text encoding
(c) mandate another particular encoding (latin-1)
(d) leave the behavior of 8-bit values (0x80 - 0xff) unspecified
i don't think (a) has any advantages over UTF-8 (uxn programs will
still need to ignore 8-bit inputs in either case). option (b) sounds
like a nightmare from within uxn programs and outside varvara UTF-8 is
about as general as we really need to be (in my opinion). option (c)
doesn't feel better than just limiting ourselves to ASCII (but maybe
that's my own cultural bias speaking) and (d) sounds like total chaos.
so my take is that inside varvara only ASCII values are likely to be
well-supported, but for display outside varvara UTF-8 feels like the
best option (e.g. allowing authors to write their names correctly).
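
one way to picture the split between the two kinds of consumer: the same stored bytes are shown in full by a UTF-8 aware tool, while an ASCII-only display substitutes anything it cannot draw. `display_text` is a hypothetical helper and the "?" placeholder is an arbitrary choice, not something the proposal specifies.

```python
def display_text(raw: bytes, ascii_only: bool) -> str:
    """Render a metadata string for an ASCII-only or a UTF-8 aware display."""
    if ascii_only:
        # keep printable 7-bit ASCII, replace every other byte
        return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)
    return raw.decode("utf-8", errors="replace")

author = "José".encode("utf-8")
print(display_text(author, ascii_only=True))
print(display_text(author, ascii_only=False))
```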
another weirder option would be to provide graphical tiles or font
data that authors could use to encode program text. embedding a
font/tileset just for handling metadata feels very heavy but it would
ensure authors can display metadata in any language they can draw.
TOOLING
one advantage of this proposal is that assemblers can just produce
"uxn0" and let some other tool prepend metadata later. authors could
run a metadata tool which reads a uxn rom along with the desired
strings and icon data and produces a full-featured "uxn1" rom.
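
the two tooling steps could be as small as the sketch below. both function names are hypothetical, and `attach_metadata` expects a complete "uxn1" header whose total-size field is assumed to be big-endian and to equal the header's own length.

```python
def to_uxn0(rom: bytes) -> bytes:
    """Prefix a legacy rom with "uxn0"; leave already-tagged roms alone."""
    return rom if rom[:3] == b"uxn" else b"uxn0" + rom

def attach_metadata(rom: bytes, header: bytes) -> bytes:
    """Combine a legacy or uxn0 rom with a prebuilt uxn1 header."""
    if header[:4] != b"uxn1":
        raise ValueError("header must start with uxn1")
    if int.from_bytes(header[4:6], "big") != len(header):
        raise ValueError("total-size does not match header length")
    if rom[:4] == b"uxn1":
        raise ValueError("rom already carries uxn1 metadata")
    # drop a "uxn0" prefix if present, otherwise treat as a legacy rom
    program = rom[4:] if rom[:4] == b"uxn0" else rom
    return header + program
```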
additionally it would be fairly easy to write tools to extract the
metadata from roms into other formats that are easy for external tools
to consume and work with (e.g. PNG or BMP for icons, JSON or TXT for
metadata, etc.)
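
an extraction tool might look roughly like this. `parse_uxn1` and the dictionary keys are hypothetical, sizes are assumed to be big-endian, and the icon palette and data are skipped rather than decoded (total-size already says where the program begins).

```python
import json

def parse_uxn1(rom: bytes) -> dict:
    """Read the text fields of a uxn1 header into a plain dictionary."""
    if rom[:4] != b"uxn1":
        raise ValueError("not a uxn1 rom")
    total = int.from_bytes(rom[4:6], "big")
    pos = 6

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > total or pos + n > len(rom):
            raise ValueError("truncated uxn1 header")
        chunk = rom[pos:pos + n]
        pos += n
        return chunk

    meta = {"uxn_version": int.from_bytes(take(2), "big")}
    for field in ("name", "version", "author"):
        size = take(1)[0]                       # one-byte length prefix
        meta[field] = take(size).decode("utf-8", errors="replace")
    desc_size = int.from_bytes(take(2), "big")  # two-byte length prefix
    meta["desc"] = take(desc_size).decode("utf-8", errors="replace")
    meta["icon_type"] = take(1)[0]
    meta["program_offset"] = total
    return meta

def metadata_json(rom: bytes) -> str:
    return json.dumps(parse_uxn1(rom), ensure_ascii=False, indent=2)
```

a cataloging website could run `metadata_json` over each rom file and index the result without ever starting an emulator.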
CONCLUSION
adding metadata to roms is undoubtedly annoying but will pay real
dividends as we move forward. among other things, it will:
- ensure authors are credited for their work
- display what license or copyright covers a rom
- let users reliably determine which rom version is newest
- allow launchers to display nice images and names
- make it easier to produce online catalogs of uxn roms
- provide dates, places, and other historical info
- let authors write dedications or nice messages
thanks for considering this feature.