uxn metadata proposal

~d6 2022-11-11 21:00:50 -05:00
parent d293f8f84d
commit 09e7609dc8
1 changed files with 143 additions and 33 deletions

@ -1,37 +1,63 @@
UXN ROM METADATA PROPOSAL
by d6

TL;DR SUMMARY

i'm proposing adding four bytes to the start of a rom.

- "uxn0" means there is no additional metadata
- "uxn1" means there are up to 5904 bytes of additional metadata
- roms not starting with "uxn" are treated as having no metadata

emulators will need to skip the metadata to load program memory. for
"uxn0" that means skipping those four bytes. for "uxn1" it means
reading the next two bytes and skipping the metadata based on that.
for roms not starting with "uxn" no skipping is needed.

this metadata could be used by other roms such as loader.tal,
emulators, or even websites cataloging uxn roms.
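
the skipping rules above can be sketched in a few lines. this is only
an illustration, not a reference loader: split_rom is a made-up name,
and two details are assumptions the proposal doesn't pin down here:
that the mode byte is an ASCII digit ("0" is 0x30, "1" is 0x31), and
that total-size is big-endian and counts its own two bytes.

```python
from typing import Optional, Tuple

MAGIC = b"uxn"  # literal 3-byte prefix from the proposal


def split_rom(data: bytes) -> Tuple[Optional[int], bytes, bytes]:
    """Return (mode, metadata, program) for a rom file's raw bytes.

    mode is None for legacy roms that do not start with "uxn".
    """
    if len(data) < 4 or data[:3] != MAGIC:
        return None, b"", data  # legacy rom: nothing to skip
    mode = data[3]
    if mode == ord("0"):  # ASSUMPTION: mode byte is an ASCII digit
        return 0, b"", data[4:]
    if mode == ord("1"):
        if len(data) < 6:
            raise ValueError("uxn1 rom is too short to hold total-size")
        # ASSUMPTION: big-endian total-size that counts its own two bytes
        total_size = int.from_bytes(data[4:6], "big")
        end = 4 + total_size
        if total_size < 2 or end > len(data):
            raise ValueError("total-size does not fit inside the file")
        return 1, data[4:end], data[end:]
    raise ValueError(f"unsupported metadata mode byte: {mode:#04x}")
```

an emulator would then copy the returned program bytes to address
0x100 exactly as it does today, whichever of the three cases applied.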

INTRODUCTION

currently uxn rom files are just the data that will be loaded into the
VM's memory on start up (starting with address 0x100 since the zero
page is skipped). this means that the maximum rom size is 65280 bytes,
although most roms are smaller since trailing zeros are left out.

this simplicity is great, but comes with some downsides:

- roms aren't identifiable beyond their file name
- roms don't contain any attribution information, credits, or licenses
- roms don't contain any version information (rom version or uxn version)
- roms don't contain any icon or preview information

while it would be nice to just start requiring all of these things,
that would create a major burden on assembler and emulator authors. i
think there's a smoother path forward.

PROPOSAL

i propose adding four bytes to the start of every rom:

- the literal 3 bytes "uxn"
- a fourth metadata mode byte

this proposal just covers metadata modes 0 and 1, but in the
future we could have up to 254 other modes to use (though we might
choose to forbid those later to keep things simple).

UXN0 FORMAT

the "uxn0" format would be exactly what we have now, just with those
four bytes at the start. assembler authors could choose to only
support creating "uxn0" roms without very much extra effort over what
they do now. emulator authors could easily adapt their current work to
read this format. rom files that lack a "uxn" at the start would
continue to work (though in the future we might choose to deprecate
this).
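
to show how little an assembler would have to change, here is a
minimal sketch of wrapping existing rom data as "uxn0". the function
name to_uxn0 is made up for illustration, and it assumes the mode byte
is the ASCII character "0" rather than a zero byte.

```python
MAX_ROM_DATA = 0x10000 - 0x0100  # 65280 bytes: memory minus the zero page


def to_uxn0(raw: bytes) -> bytes:
    """Prefix raw rom data with the four-byte "uxn0" header."""
    if len(raw) > MAX_ROM_DATA:
        raise ValueError("rom data does not fit in uxn memory")
    # ASSUMPTION: the mode byte is ASCII "0" (0x30), not the value 0x00
    return b"uxn0" + raw
```

the rom data itself is untouched, so stripping the first four bytes
gives back exactly the file an assembler produces today.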

UXN1 FORMAT

the "uxn1" format would provide some extra metadata:
@ -43,22 +69,106 @@ the "uxn1" format would provide some extra metadata:
- version (n bytes): the version string (ASCII/UTF-8)
- author-size (1 byte): size of the author string in bytes
- author (n bytes): the author string (ASCII/UTF-8)
- desc-size (2 bytes): size of the description string in bytes
- desc (n bytes): the description string (ASCII/UTF-8) (4096 max)
- icon-type (1 byte): the size and depth of the icon
- icon-palette (n bytes): the icon's color theme (6 max)
- icon-data (n bytes): the icon's ICN or CHR data (1024 max)

we limit descriptions to a 4096 byte maximum. this helps put a
reasonable upper bound on the size of metadata.

the minimal "uxn1" header size (assuming the strings and icon are all
empty) would be 10 bytes (2 + 2 + 1 + 1 + 1 + 2 + 1). emulator
implementors could read total-size and then seek past this metadata to
read the rom data.
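
the string fields listed above all follow the same size-then-bytes
pattern, so one helper can read them. the sketch below only covers the
author and description fields. read_field and read_author_and_desc are
made-up names, and big-endian byte order for desc-size is an
assumption (it matches how uxn stores shorts).

```python
from typing import Tuple


def read_field(buf: bytes, pos: int, size_bytes: int, limit: int) -> Tuple[str, int]:
    """Read one size-prefixed string at pos; return (text, next_pos)."""
    start = pos + size_bytes
    if start > len(buf):
        raise ValueError("metadata ends inside a size field")
    # ASSUMPTION: multi-byte sizes are stored big-endian
    size = int.from_bytes(buf[pos:start], "big")
    end = start + size
    if size > limit or end > len(buf):
        raise ValueError("string field is too long for this metadata")
    return buf[start:end].decode("utf-8"), end


def read_author_and_desc(buf: bytes, pos: int) -> Tuple[str, str, int]:
    """Read author (1-byte size) then desc (2-byte size, 4096 max)."""
    author, pos = read_field(buf, pos, 1, 255)
    desc, pos = read_field(buf, pos, 2, 4096)
    return author, desc, pos
```

the returned position points at the icon-type byte, which is where an
icon reader would pick up.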

UXN1 ICON TYPES

the icon types would be defined by:

- bit 8: is icon present? (0x80 yes, 0x00 no)
- bit 7: transparency of color 1? (0x40 transparent, 0x00 solid)
- bit 6: color depth? (0x20 2-bit color (CHR), 0x00 1-bit color (ICN))
- bits 3-5: unused
- bits 1-2: icon dimensions (0x00: 8x8, 0x01: 16x16, 0x02: 32x32, 0x03: 64x64)

so in table form that would mean:

ICON  ICON       PALETTE  IMAGE DATA  TRANSPARENT
BYTE  FORMAT     SIZE     SIZE        COLOR 1?
0x00  no icon    0 bytes  0 bytes     n/a
0x80  8x8 ICN    3 bytes  8 bytes     no
0x81  16x16 ICN  3 bytes  32 bytes    no
0x82  32x32 ICN  3 bytes  128 bytes   no
0x83  64x64 ICN  3 bytes  512 bytes   no
0xa0  8x8 CHR    6 bytes  16 bytes    no
0xa1  16x16 CHR  6 bytes  64 bytes    no
0xa2  32x32 CHR  6 bytes  256 bytes   no
0xa3  64x64 CHR  6 bytes  1024 bytes  no
0xc0  8x8 ICN    3 bytes  8 bytes     yes
0xc1  16x16 ICN  3 bytes  32 bytes    yes
0xc2  32x32 ICN  3 bytes  128 bytes   yes
0xc3  64x64 ICN  3 bytes  512 bytes   yes
0xe0  8x8 CHR    6 bytes  16 bytes    yes
0xe1  16x16 CHR  6 bytes  64 bytes    yes
0xe2  32x32 CHR  6 bytes  256 bytes   yes
0xe3  64x64 CHR  6 bytes  1024 bytes  yes
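
every row of the table can be derived from the bit layout, which a
reader of the metadata might do roughly like this. IconType and
decode_icon_type are made-up names. two choices here are assumptions:
any byte with the 0x80 bit clear is treated as "no icon", and the
unused bits are ignored.

```python
from typing import NamedTuple


class IconType(NamedTuple):
    present: bool
    transparent: bool   # is color 1 transparent?
    is_chr: bool        # True for 2-bit CHR, False for 1-bit ICN
    side: int           # icon width and height in pixels
    palette_size: int   # bytes of icon-palette
    data_size: int      # bytes of icon-data


def decode_icon_type(byte: int) -> IconType:
    """Decode the icon-type byte into sizes, following the table."""
    if not byte & 0x80:
        # ASSUMPTION: a clear "present" bit means no icon, whatever else is set
        return IconType(False, False, False, 0, 0, 0)
    is_chr = bool(byte & 0x20)
    side = 8 << (byte & 0x03)            # 8, 16, 32 or 64
    tiles = (side // 8) ** 2             # icons are made of 8x8 tiles
    bytes_per_tile = 16 if is_chr else 8
    palette_size = 6 if is_chr else 3
    return IconType(True, bool(byte & 0x40), is_chr, side,
                    palette_size, tiles * bytes_per_tile)
```

for example, decode_icon_type(0xa3) describes a 64x64 CHR icon with a
6-byte palette and 1024 bytes of image data, matching the table.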

icons would be stored in 8x8 tiles, as mandated by the ICN and CHR
formats. they would be read left-to-right, top-to-bottom. while
external applications might have an easier time with other formats
(e.g. BMP, PNG, etc.) this icon format is primarily designed for ease
of use by other uxn roms running inside varvara.
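
for external tools, the tile order described above means a little
reshuffling to get plain pixel rows. here is a sketch for 1-bit ICN
icons only. icn_to_rows is a made-up name, and it assumes the usual
ICN layout of one byte per tile row with the highest bit as the
leftmost pixel.

```python
from typing import List


def icn_to_rows(data: bytes, side: int) -> List[List[int]]:
    """Turn tiled ICN data into side x side rows of 0/1 pixels."""
    tiles_per_row = side // 8
    if len(data) != tiles_per_row * tiles_per_row * 8:
        raise ValueError("icon data size does not match icon dimensions")
    rows = [[0] * side for _ in range(side)]
    for tile in range(tiles_per_row * tiles_per_row):
        # tiles are stored left-to-right, top-to-bottom
        tile_x, tile_y = tile % tiles_per_row, tile // tiles_per_row
        for r in range(8):
            byte = data[tile * 8 + r]
            for c in range(8):
                # ASSUMPTION: bit 7 of each row byte is the leftmost pixel
                rows[tile_y * 8 + r][tile_x * 8 + c] = (byte >> (7 - c)) & 1
    return rows
```

a CHR icon would need a second bit plane per tile (16 bytes per tile),
which this sketch leaves out.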

the maximal "uxn1" header size (assuming all strings are maximum
length and using an icon format of 0xa3) would be 5904 bytes. while
this is substantial, it is unlikely to push most rom sizes over 64k
(the point at which working with them from within varvara becomes
annoying). since the maximum rom data size is 65280, authors are free
to use up to 255 bytes of metadata while still keeping the total rom
size under 65536.

TEXT FORMATS

uxn isn't likely to ever support unicode very well, so why did i write
ASCII/UTF-8 for text formats? my take is that the text in roms is
likely to be used both by uxn and by external systems that
probably _can_ handle UTF-8. i expect most authors to stick to the
ASCII subset that uxn can handle.

we have some alternatives to consider:

(a) require 7-bit ASCII only
(b) add metadata about text encoding
(c) mandate another particular encoding (latin-1)
(d) leave the behavior of 8-bit values (0x80 - 0xff) unspecified

i don't think (a) has any advantages over UTF-8 (uxn programs will still
need to ignore 8-bit inputs in either case). option (b) sounds like a
nightmare from within uxn programs, and outside varvara UTF-8 is about
as general as we really need to be in my opinion. option (c) doesn't
feel much better than just limiting ourselves to ASCII (but maybe
that's my own cultural bias speaking) and (d) sounds like a nightmare.

so my take is that inside varvara only ASCII values are likely to be
well-supported, but for display outside varvara UTF-8 feels like the
best option (e.g. allowing authors to write their names correctly).
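
to make the split concrete, here is a rough sketch of the two ways the
same author bytes might be shown. ascii_only is a made-up helper that
mimics a uxn program ignoring 8-bit values, while an external tool can
simply decode the bytes as UTF-8. the sample name is only an
illustration.

```python
def ascii_only(raw: bytes) -> str:
    """Drop every byte in 0x80-0xff, as a uxn-side reader might."""
    return bytes(b for b in raw if b < 0x80).decode("ascii")


author = "Zoë".encode("utf-8")    # hypothetical author string from a rom
inside_varvara = ascii_only(author)
outside_varvara = author.decode("utf-8")
```

here inside_varvara ends up as "Zo" while outside_varvara keeps the
full "Zoë", which is the tradeoff the paragraph above accepts.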

another weirder option would be to provide graphical tiles or font
data that authors could use to encode program text. embedding a
font/tileset just for handling metadata feels very heavy but it would
ensure authors can display metadata in any language they can draw.

CONCLUSION

adding metadata to roms is undoubtedly annoying but will pay real
dividends as we move forward. among other things, it will:

- ensure authors are credited for their work
- display what license or copyright covers a rom
- let users reliably determine which rom version is newest
- allow launchers to display nice images and names
- provide dates, places, and other historical info
- let authors write dedications or nice messages

thanks for considering this feature.