UXN ROM METADATA PROPOSAL

by d6

TL;DR SUMMARY

i'm proposing adding four bytes to the start of a rom.

 - "uxn0" means there is no additional metadata
 - "uxn1" means there is up to 5904 bytes of additional metadata
 - roms not starting with "uxn" are treated as having no metadata

emulators will need to skip the metadata to load program memory. for "uxn0" that means skipping those four bytes. for "uxn1" it means reading the next two bytes and skipping the metadata based on that. for roms not starting with "uxn" no skipping is needed.

this metadata could be used by other roms such as loader.tal, emulators, or even websites cataloging uxn roms.

INTRODUCTION

currently uxn rom files are just the data that will be loaded into the VM's memory on start up (starting with address 0x100 since the zero page is skipped). this means that the maximum rom size is 65280 bytes, although most roms are smaller since trailing zeros are left out.

this simplicity is great, but comes with some downsides:

 - roms aren't identifiable beyond their file name
 - roms don't contain any attribution information, credits, or licenses
 - roms don't contain version information (rom version or uxn version)
 - roms don't contain any icon or preview information

while it would be nice to just start requiring all of these things, that would create a major burden on assembler and emulator authors. i think there's a smoother path forward.

PROPOSAL

i propose adding four bytes to the start of every rom:

 - the literal 3 bytes "uxn"
 - a fourth metadata mode byte

the bytes "uxn" correspond to the instructions STA2r ADD2r JSR2r, so we aren't at risk of creating an ambiguity with valid uxn roms which previously would have worked, since a rom starting with STA2r would immediately crash.

this proposal just covers metadata modes 0 and 1, but in the future we could have up to 254 other modes to use (though we might choose to forbid those later to keep things simple).
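
as a rough illustration of the skipping rules above, here is a minimal python sketch. it makes a few assumptions the proposal doesn't spell out: the mode byte is the ASCII character "0" or "1", the two-byte size is big-endian (like uxn shorts) and counts from the start of the file (including the four "uxn1" bytes), and unknown modes are rejected. the name program_offset is made up for this example.

```python
MAGIC = b"uxn"


def program_offset(rom: bytes) -> int:
    """Return the file offset where program data starts (hypothetical helper)."""
    # roms that don't start with "uxn" are legacy roms: nothing to skip.
    if len(rom) < 4 or rom[:3] != MAGIC:
        return 0

    mode = rom[3:4]
    if mode == b"0":  # assumption: the mode byte is ASCII "0", not 0x00
        return 4      # skip just the four-byte prefix

    if mode == b"1":  # assumption: the mode byte is ASCII "1"
        if len(rom) < 6:
            raise ValueError("truncated uxn1 header")
        # assumption: big-endian size, measured from the start of the file
        total = int.from_bytes(rom[4:6], "big")
        if total < 6 or total > len(rom):
            raise ValueError("implausible metadata size")
        return total

    # modes other than 0 and 1 are not covered by the proposal
    raise ValueError(f"unsupported metadata mode {mode!r}")
```

an emulator would then copy rom[program_offset(rom):] into memory starting at address 0x100.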

UXN0 FORMAT

the "uxn0" format would be exactly what we have now, just with those four bytes at the start.

assembler authors could choose to only support creating "uxn0" roms without very much extra effort over what they do now.

emulator authors could easily adapt their current work to read this format.

rom files that lack a "uxn" at the start would continue to work (though in the future we might choose to deprecate this).

UXN1 FORMAT

the "uxn1" format would provide some extra metadata:

 - total-size (2 bytes): total metadata size (including "uxn1")
 - uxn-version (2 bytes): 0x0000 for unspecified, 0x0001 for current
 - name-size (1 byte): size of the following name string in bytes
 - name (n bytes): the name string (ASCII/UTF-8)
 - version-size (1 byte): size of the version string in bytes
 - version (n bytes): the version string (ASCII/UTF-8)
 - author-size (1 byte): size of the author string in bytes
 - author (n bytes): the author string (ASCII/UTF-8)
 - desc-size (2 bytes): size of the description string in bytes
 - desc (n bytes): the description string (ASCII/UTF-8) (4096 max)
 - icon-type (1 byte): the size and depth of the icon
 - icon-palette (n bytes): the icon's color theme (6 max)
 - icon-data (n bytes): the icon's ICN or CHR data (1024 max)

we limit descriptions to a 4096 byte maximum. this helps put a reasonable upper bound on the size of metadata.

the minimal "uxn1" header size (assuming the strings and icon are all empty) would be 10 bytes (2 + 2 + 1 + 1 + 1 + 2 + 1), or 14 bytes counting the "uxn1" prefix itself.

emulator implementors could read total-size and then seek past this metadata to read the rom data.

UXN1 ICON TYPES

the icon types would be defined by:

 - bit 8: is icon present? (0x80 yes, 0x00 no)
 - bit 7: transparency of color 1? (0x40 transparent, 0x00 solid)
 - bit 6: color depth? (0x20 2-bit color (CHR), 0x00 1-bit color (ICN))
 - bits 3-5: unused
 - bits 1-2: icon dimensions (0x00: 8x8, 0x01: 16x16, 0x02: 32x32, 0x03: 64x64)

so in table form that would mean:

 ICON  ICON       PALETTE  IMAGE DATA  TRANSPARENT
 BYTE  FORMAT     SIZE     SIZE        COLOR 1?
 0x00  no icon    0 bytes  0 bytes     n/a
 0x80  8x8 ICN    3 bytes  8 bytes     no
 0x81  16x16 ICN  3 bytes  32 bytes    no
 0x82  32x32 ICN  3 bytes  128 bytes   no
 0x83  64x64 ICN  3 bytes  512 bytes   no
 0xa0  8x8 CHR    6 bytes  16 bytes    no
 0xa1  16x16 CHR  6 bytes  64 bytes    no
 0xa2  32x32 CHR  6 bytes  256 bytes   no
 0xa3  64x64 CHR  6 bytes  1024 bytes  no
 0xc0  8x8 ICN    3 bytes  8 bytes     yes
 0xc1  16x16 ICN  3 bytes  32 bytes    yes
 0xc2  32x32 ICN  3 bytes  128 bytes   yes
 0xc3  64x64 ICN  3 bytes  512 bytes   yes
 0xe0  8x8 CHR    6 bytes  16 bytes    yes
 0xe1  16x16 CHR  6 bytes  64 bytes    yes
 0xe2  32x32 CHR  6 bytes  256 bytes   yes
 0xe3  64x64 CHR  6 bytes  1024 bytes  yes

icons would be stored in 8x8 tiles, as mandated by the ICN and CHR formats. they would be read left-to-right, top-to-bottom.

while external applications might have an easier time with other formats (e.g. BMP, PNG, etc.) this icon format is primarily designed for ease of use by other uxn roms running inside varvara.

the maximal "uxn1" header size (assuming all strings are maximum length and using an icon format of 0xe3) would be 5904 bytes. while this is substantial it is unlikely to push most rom sizes over 64k (the point at which working with them from within varvara becomes annoying).

since the maximum rom data size is 65280, authors are guaranteed to be able to use up to 255 bytes of metadata while still keeping the total rom size under 65536 even if they are using the maximum amount of rom data.

TEXT FORMATS

uxn isn't likely to ever support unicode very well, so why did i write ASCII/UTF-8 for text formats? my take is that the text in roms is likely to be used both by uxn and by external systems that probably _can_ handle UTF-8. i expect most authors to stick to the ASCII subset that uxn can handle.
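
to make the "uxn1" field layout, the icon-type bits, and the text handling above more concrete, here is a hedged python sketch of an external reader. it assumes big-endian two-byte values and decodes strings as UTF-8, replacing invalid bytes rather than failing. the names icon_layout and parse_uxn1 are invented for this example and are not part of the proposal.

```python
def icon_layout(icon_type: int):
    """Decode an icon-type byte into (palette_size, data_size, transparent)."""
    if not icon_type & 0x80:             # bit 8 clear: no icon present
        return 0, 0, False
    is_chr = bool(icon_type & 0x20)      # bit 6: 2-bit CHR vs 1-bit ICN
    side = 8 << (icon_type & 0x03)       # bits 1-2: 8, 16, 32 or 64 pixels
    data_size = side * side // 8 * (2 if is_chr else 1)
    palette_size = 6 if is_chr else 3
    return palette_size, data_size, bool(icon_type & 0x40)


def parse_uxn1(rom: bytes) -> dict:
    """Read the metadata fields of a "uxn1" rom in the order listed above."""
    if rom[:4] != b"uxn1":
        raise ValueError("not a uxn1 rom")
    pos = 4

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(rom):
            raise ValueError("truncated metadata")
        chunk = rom[pos:pos + n]
        pos += n
        return chunk

    def number(width: int) -> int:
        # assumption: multi-byte values are big-endian
        return int.from_bytes(take(width), "big")

    def text(size_width: int, limit: int) -> str:
        size = number(size_width)
        if size > limit:
            raise ValueError("string exceeds its maximum size")
        # external tools can decode UTF-8; bad bytes become U+FFFD
        return take(size).decode("utf-8", errors="replace")

    meta = {}
    meta["total_size"] = number(2)
    meta["uxn_version"] = number(2)
    meta["name"] = text(1, 255)
    meta["version"] = text(1, 255)
    meta["author"] = text(1, 255)
    meta["desc"] = text(2, 4096)
    meta["icon_type"] = number(1)
    palette_size, data_size, transparent = icon_layout(meta["icon_type"])
    meta["icon_transparent"] = transparent
    meta["icon_palette"] = take(palette_size)
    meta["icon_data"] = take(data_size)
    if pos != meta["total_size"]:
        raise ValueError("total-size does not match the fields read")
    return meta
```

checking total-size against the bytes actually consumed is a choice made for this sketch. an emulator that only wants the program data can ignore the individual fields and seek straight to total-size.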

we have some alternatives to consider:

 (a) require 7-bit ASCII only
 (b) add metadata about text encoding
 (c) mandate another particular encoding (latin-1)
 (d) leave the behavior of 8-bit values (0x80 - 0xff) unspecified

i don't think (a) has any advantages over UTF-8 (uxn programs will still need to ignore 8-bit inputs in either case). option (b) sounds like a nightmare from within uxn programs, and outside varvara UTF-8 is about as general as we really need to be (in my opinion). option (c) doesn't feel better than just limiting ourselves to ASCII (but maybe that's my own cultural bias speaking) and (d) sounds like total chaos.

so my take is that inside varvara only ASCII values are likely to be well-supported, but for display outside varvara UTF-8 feels like the best option (e.g. allowing authors to write their names correctly).

another weirder option would be to provide graphical tiles or font data that authors could use to encode program text. embedding a font/tileset just for handling metadata feels very heavy, but it would ensure authors can display metadata in any language they can draw.

TOOLING

one advantage of this proposal is that assemblers can just produce "uxn0" and let some other tool prepend metadata later. authors could run a metadata tool which reads a uxn rom along with the desired strings and icon data and produces a full-featured "uxn1" rom.

additionally it would be fairly easy to write tools to extract the metadata from roms into other formats that are easy for external tools to consume and work with (e.g. PNG or BMP for icons, JSON or TXT for metadata, etc.)

CONCLUSION

adding metadata to roms is undoubtedly annoying but will pay real dividends as we move forward.
among other things, it will:

 - ensure authors are credited for their work
 - display what license or copyright covers a rom
 - let users reliably determine which rom version is newest
 - allow launchers to display nice images and names
 - make it easier to produce online catalogs of uxn roms
 - provide dates, places, and other historical info
 - let authors write dedications or nice messages

thanks for considering this feature.
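
as an appendix-style illustration, the metadata tool described under TOOLING could look roughly like the python sketch below. it is only a sketch under assumptions: two-byte values are big-endian, the mode byte is an ASCII digit, and the icon palette and data are passed through without being checked against icon-type. the names build_uxn1 and add_metadata are hypothetical.

```python
def build_uxn1(program: bytes, name: str = "", version: str = "",
               author: str = "", desc: str = "", uxn_version: int = 1,
               icon_type: int = 0x00, icon_palette: bytes = b"",
               icon_data: bytes = b"") -> bytes:
    """Prepend a "uxn1" metadata header to raw program data."""

    def field(text: str, size_width: int, limit: int) -> bytes:
        raw = text.encode("utf-8")
        if len(raw) > limit:
            raise ValueError("string too long for its size field")
        # assumption: size prefixes are big-endian
        return len(raw).to_bytes(size_width, "big") + raw

    body = (uxn_version.to_bytes(2, "big")
            + field(name, 1, 255)
            + field(version, 1, 255)
            + field(author, 1, 255)
            + field(desc, 2, 4096)
            + bytes([icon_type])
            + icon_palette   # not validated against icon_type here
            + icon_data)
    # total-size covers "uxn1" (4 bytes), itself (2 bytes), and the body
    total = 4 + 2 + len(body)
    return b"uxn1" + total.to_bytes(2, "big") + body + program


def add_metadata(rom: bytes, **fields) -> bytes:
    """Turn a "uxn0" or legacy rom into a "uxn1" rom."""
    if rom[:4] == b"uxn1":
        raise ValueError("rom already carries uxn1 metadata")
    # strip the four-byte prefix from "uxn0" roms; legacy roms pass through
    program = rom[4:] if rom[:4] == b"uxn0" else rom
    return build_uxn1(program, **fields)
```

for example, add_metadata(rom, name="demo", author="d6") would yield a rom whose program data starts right after a 20-byte header.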