( regex.tal )
( )
( compiles regex expression strings into regex nodes, then uses )
( regex nodes to match input strings. )
( )
( this currently only supports matching an entire string, as )
( opposed to searching for a matching substring, or extracting )
( matching subgroups. )
( )
( regex node types: )
( )
( NAME   DESCRIPTION                           STRUCT               )
( empty  matches empty string                  [ #01 next* ]        )
( dot    matches any one char                  [ #02 next* ]        )
( lit    matches one specific char (c)         [ #03 c^ next* ]     )
( or     matches either left or right          [ #04 left* right* ] )
( star   matches expr zero-or-more times       [ #05 r* next* ]     )
(        (NOTE: r.expr.next must be r)                              )
( )
( `or` and `star` have the same structure and are handled by the )
( same code (;do-or). however, the node types are kept different )
( to make it clearer how to parse and assemble the nodes. )
( )
( concatenation isn't a node, it is implied by the *next addr. )
( a next value of #0000 signals the end of the regex. )
( )
( in these docs str* is an address to a null-terminated string. )
( regexes should not include nulls and cannot match them (other )
( than the null which signals the end of a string). )
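( )
( example: a sketch of the node layout, assuming the expression "ab" )
( is compiled into a freshly reset arena. A is a hypothetical name for )
( the first arena address, used only for illustration: )
( )
(   A+0: 03 'a A+4  -- lit node whose next* is the second node )
(   A+4: 03 'b 0000 -- lit node whose next* of 0000 ends the regex )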
%null? { #00 EQU }
%debug { #ff #0e DEO }
%emit { #18 DEO }
%space { #20 emit }
%newline { #0a emit }
%quit! { #01 #0f DEO BRK }
( ERROR HANDLING )
( using error! will print the given message before causing )
( the interpreter to halt. )
@error! ( msg* -> )
LIT '! emit space
&loop LDAk ,&continue JCN ,&done JMP
&continue LDAk emit INC2 ,&loop JMP
&done POP2 newline quit!
( error messages )
@unknown-node-type "unknown 20 "node 20 "type 00
@mismatched-parens "mismatched 20 "parenthesis 00
@stack-is-full "stack 20 "is 20 "full 00
@stack-is-empty "stack 20 "is 20 "empty 00
@arena-is-full "arena 20 "is 20 "full 00
@star-invariant "star 20 "invariant 20 "failed 00
@plus-invariant "plus 20 "invariant 20 "failed 00
@qmark-invariant "question 20 "mark 20 "invariant 20 "failed 00
( REGEX MATCHING )
( use stored regex to match against a stored string. )
( )
( regex* should be the address of a compiled regex )
( such as that returned from ;compile. )
( )
( str* should be a null-terminated string. )
( )
( returns true if the regex matches the entire string, and false otherwise. )
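( )
( example: a usage sketch, not part of the library. the labels )
( my-expr and my-text are hypothetical and would be defined by the )
( caller, e.g. as "a.*b 00 and "axxb 00 : )
( )
(   ;my-text ;my-expr ;compile JSR2 ;match JSR2 )
( )
( compile consumes expr* and leaves regex* above str*, which is the )
( argument order that match expects. )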
@match ( str* regex* -> bool^ )
;reset-stack JSR2
;loop JMP2
( loop used during matching )
( )
( we don't use the return stack here since that )
( complicates the back-tracking we need to do. )
( ultimately this code will issue a JMP2r to )
( return a boolean, which is where the stack )
( effects signature comes from. )
@loop ( s* r* -> bool^ )
LDAk #01 EQU ;do-empty JCN2
LDAk #02 EQU ;do-dot JCN2
LDAk #03 EQU ;do-literal JCN2
LDAk #04 EQU ;do-or JCN2
LDAk #05 EQU ;do-or JCN2 ( same code as the or case )
;unknown-node-type ;error! JSR2
( used when we hit a dead-end during matching. )
( )
( if stack is non-empty we have a point we can resume from. )
@goto-backtrack ( -> bool^ )
;stack-exist JSR2 ,&has-stack JCN ( do we have stack? )
#00 JMP2r ( no, return false )
&has-stack ;pop4 JSR2 ;goto-next JMP2 ( yes, resume from the top )
( follow the given address (next*) to continue matching )
@goto-next ( str* next* -> bool^ )
DUP2 #0000 GTH2 ,&has-next JCN
POP2 LDA null? ,&end-of-string JCN
;goto-backtrack JMP2
&end-of-string #01 JMP2r
&has-next ;loop JMP2
( handle the empty node -- just follow the next pointer )
@do-empty ( str* regex* -> bool^ )
INC2 LDA2 ( load next )
;goto-next JMP2 ( jump to next )
( handle dot -- match any one character )
@do-dot ( str* regex* -> bool^ )
INC2 LDA2 STH2 ( load and stash next )
LDAk #00 NEQ ,&non-empty JCN ( is there a char? )
POP2r POP2 ;goto-backtrack JMP2 ( no, clear stacks and backtrack )
&non-empty INC2 STH2r ;goto-next JMP2 ( yes, inc s, restore and jump )
( handle literal -- match one specific character )
@do-literal ( str* regex* -> bool^ )
INC2
LDAk STH ( store c )
INC2 LDA2 STH2 ROTr ( store next, move c to top )
LDAk
STHr EQU ,&matches JCN ( do we match this char? )
POP2r POP2 ;goto-backtrack JMP2 ( no, clear stacks and backtrack )
&matches
INC2 STH2r ;goto-next JMP2 ( yes, inc s, restore and jump )
( handle or -- try the left branch but backtrack to the right if needed )
( )
( this also handles asteration, since it ends up having the same structure )
@do-or ( str* regex* -> bool^ )
INC2 OVR2 OVR2 #0002 ADD2 ( s r+1 s r+3 )
LDA2 ;push4 JSR2 ( save (s, right) in the stack for possible backtracking )
LDA2 ;loop JMP2 ( continue on left branch )
( REGEX PARSING )
( track the position in the input string )
@pos $2
( track how many levels deep we are in parentheses )
@parens $2
( read and increment pos )
@read ( -> c^ )
;pos LDA2k ( pos s )
LDAk STHk #00 EQU ( pos s c=0 [c] )
,&is-eof JCN ( pos s [c] )
INC2 ( pos s+1 [c] )
SWP2 STA2 ,&return JMP ( [c] )
&is-eof POP2 POP2
&return STHr ( c )
JMP2r
( is pos currently pointing to a star? )
@peek-to-star ( -> is-star^ )
;pos LDA2 LDA LIT '* EQU JMP2r
( is pos currently pointing to a plus? )
@peek-to-plus ( -> is-plus^ )
;pos LDA2 LDA LIT '+ EQU JMP2r
( is pos currently pointing to a qmark? )
@peek-to-qmark ( -> is-qmark^ )
;pos LDA2 LDA LIT '? EQU JMP2r
( just increment pos )
@skip
;pos LDA2 INC2 ;pos STA2 JMP2r
( TODO: )
( 1. character groups: [] and [^] )
( 2. symbolic escapes, e.g. \n )
( STRETCH GOALS: )
( a. ^ and $ )
( b. counts: {n} and {m,n} )
( c. substring matching, i.e. searching )
( d. subgroup extraction )
( e. back-references, e.g. \1 )
( compile an expression string into a regex graph )
( )
( the regex will be allocated in the arena; if there is not )
( sufficient space an error will be thrown. )
( )
( the stack will also be used during parsing although unlike )
( the arena it will be released once compilation ends. )
@compile ( expr* -> regex* )
;pos STA2
#0000 ;parens STA2
;reset-stack JSR2
;compile-region JMP2
( the basic strategy here is to build a stack of non-or )
( expressions to be joined together at the end of the )
( region. each stack entry has two regex addresses: )
( - the start of the regex )
( - the current tail of the regex )
( when we concatenate a new node to a regex we update )
( the second of these but not the first. )
( )
( the bottom of the stack for a given region is denoted )
( by #ffff #ffff. above that we start with #0000 #0000 )
( to signal an empty node. )
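( )
( example: a sketch of how the stack evolves for the expression "a|b", )
( writing A and B for the addresses of the two literal nodes. these )
( names are for illustration only. each entry is shown as start/tail: )
( )
(   at start: ffff/ffff 0000/0000 )
(   after a:  ffff/ffff A/A )
(   after |:  ffff/ffff A/A 0000/0000 )
(   after b:  ffff/ffff A/A B/B )
( )
( at the end of the region the frame is unrolled into a single or-node )
( over A and B, and both branches get the same empty node as next*. )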
@compile-region ( -> r2* )
#ffff #ffff ;push4 JSR2 ( stack delimiter )
#0000 #0000 ;push4 JSR2 ( stack frame start )
@compile-region-loop
;read JSR2
DUP #00 EQU ;c-done JCN2
DUP LIT '| EQU ;c-or JCN2
DUP LIT '. EQU ;c-dot JCN2
DUP LIT '( EQU ;c-lpar JCN2
DUP LIT ') EQU ;c-rpar JCN2
DUP LIT '\ EQU ;c-esc JCN2
DUP LIT '* EQU ;c-star JCN2
DUP LIT '+ EQU ;c-plus JCN2
DUP LIT '? EQU ;c-qmark JCN2
;c-char JMP2
( either finalize the given r0/r1 or else wrap it in )
( a star node if a star is coming up next. )
( )
( we use this look-ahead approach rather than compiling )
( star nodes directly since the implementation is simpler. )
@c-peek-and-finalize ( r0* r1* -> r2* )
;peek-to-star JSR2 ( r0 r1 next-is-star? ) ,&next-is-star JCN
;peek-to-plus JSR2 ( r0 r1 next-is-plus? ) ,&next-is-plus JCN
;peek-to-qmark JSR2 ( r0 r1 next-is-qmark? ) ,&next-is-qmark JCN
,&finally JMP ( r0 r1 )
&next-is-star ;skip JSR2 POP2 ;alloc-star JSR2 DUP2 ,&finally JMP
&next-is-plus ;skip JSR2 POP2 ;alloc-plus JSR2 DUP2 ,&finally JMP
&next-is-qmark ;skip JSR2 POP2 ;alloc-qmark JSR2 DUP2 ,&finally JMP
&finally ;push-next JSR2 ;compile-region-loop JMP2
( called when we reach EOF of the input string )
( )
( as with c-rpar we have to unroll the current level )
( of the stack, building any or-nodes that are needed. )
( )
( this is where we detect unclosed parentheses. )
@c-done ( c^ -> r2* )
POP
;parens LDA2 #0000 GTH2 ,&mismatched-parens JCN
;unroll-stack JSR2 POP2 JMP2r
&mismatched-parens ;mismatched-parens ;error! JSR2
( called when we read "|" )
( )
( since we defer building or-nodes until the end of the region )
( we just start a new stack frame and continue. )
@c-or ( c^ -> r2* )
POP
#0000 #0000 ;push4 JSR2
;compile-region-loop JMP2
( called when we read "(" )
( )
( this causes us to: )
( )
( 1. increment parens )
( 2. start a new region on the stack )
( 3. jump to compile-region to start parsing the new region )
@c-lpar ( c^ -> r2* )
POP
;parens LDA2 INC2 ;parens STA2 ( parens++ )
;compile-region JMP2
( called when we read ")" )
( )
( this causes us to: )
( )
( 1. check for mismatched parens )
( 2. decrement parens )
( 3. unroll the current region on the stack into one regex node )
( 4. finalize that node and append it to the previous region )
( 5. continue parsing )
@c-rpar ( c^ -> r2* )
POP
;parens LDA2 #0000 EQU2 ,&mismatched-parens JCN
;parens LDA2 #0001 SUB2 ;parens STA2 ( parens-- )
;unroll-stack JSR2
;c-peek-and-finalize JMP2
&mismatched-parens ;mismatched-parens ;error! JSR2
( called when we read "." )
( )
( allocates a dot-node and continues. )
@c-dot ( c^ -> r2* )
POP
;alloc-dot JSR2 ( dot )
DUP2 ;c-peek-and-finalize JMP2
( TODO: escaping rules not quite right )
( called when we read "\" )
( )
( allocates a literal of the next character. )
( )
( this doesn't currently handle any special escape sequences. )
@c-esc ( c^ -> r2* )
POP
;read JSR2
;c-char JMP2
( called when we read any other character )
( )
( allocates a literal-node and continues. )
@c-char ( c^ -> r2* )
;alloc-lit JSR2 ( lit )
DUP2 ;c-peek-and-finalize JMP2
( called if we parse a "*" )
( )
( actually calling this means the code broke an invariant somewhere. )
@c-star ( c^ -> regex* )
POP
;star-invariant ;error! JSR2
( called if we parse a "+" )
( )
( actually calling this means the code broke an invariant somewhere. )
@c-plus ( c^ -> regex* )
POP
;plus-invariant ;error! JSR2
( called if we parse a "?" )
( )
( actually calling this means the code broke an invariant somewhere. )
@c-qmark ( c^ -> regex* )
POP
;qmark-invariant ;error! JSR2
( ALLOCATING REGEX NODES )
@alloc3 ( mode^ -> r* )
#0000 ROT ( 00 00 mode^ )
#03 ;alloc JSR2 ( 00 00 mode^ addr* )
STH2k STA ( addr <- mode )
STH2kr INC2 STA2 ( addr+1 <- 0000 )
STH2r JMP2r ( return addr )
@alloc-empty ( -> r* )
#01 ;alloc3 JMP2
@alloc-dot ( -> r* )
#02 ;alloc3 JMP2
@alloc-lit ( c^ -> r* )
#03 #0000 SWP2 ( 0000 c^ 03 )
#04 ;alloc JSR2 ( 0000 c^ 03 addr* )
STH2k STA ( addr <- 03 )
STH2kr INC2 STA ( addr+1 <- c )
STH2kr #0002 ADD2 STA2 ( addr+2 <- 0000 )
STH2r JMP2r ( return addr )
@alloc-or ( right* left* -> r* )
#05 ;alloc JSR2 STH2 ( r l [x] )
#04 STH2kr STA ( r l [x] )
STH2kr INC2 STA2 ( r [x] )
STH2kr #0003 ADD2 STA2 ( [x] )
STH2r JMP2r
@alloc-star ( expr* -> r* )
#05 ;alloc JSR2 STH2 ( expr [r] )
#05 STH2kr STA ( expr [r] )
DUP2 STH2kr INC2 STA2 ( expr [r] )
#0000 STH2kr #0003 ADD2 STA2 ( expr [r] )
STH2kr SWP2 ( r expr [r] )
;set-next JSR2 ( [r] )
STH2r JMP2r
@alloc-plus ( expr* -> r* )
#05 ;alloc JSR2 STH2 ( expr [r] )
#05 STH2kr STA ( expr [r] )
DUP2 STH2kr INC2 STA2 ( expr [r] )
#0000 STH2kr #0003 ADD2 STA2 ( expr [r] )
STH2r SWP2 STH2k ( r expr [expr] )
;set-next JSR2 ( [expr] )
STH2r JMP2r
@alloc-qmark ( expr* -> r* )
;alloc-empty JSR2 STH2k ( expr e [e] )
OVR2 ;set-next JSR2 ( expr [e] )
#05 ;alloc JSR2 STH2 ( expr [r e] )
#04 STH2kr STA ( expr [r e] )
STH2kr INC2 STA2 ( [r e] )
SWP2r STH2r STH2kr ( e r [r] )
#0003 ADD2 STA2 ( [r] )
STH2r JMP2r
( if r is 0000, allocate an empty node )
@alloc-if-null ( r* -> r2* )
ORAk ,&return JCN POP2 ;alloc-empty JSR2 &return JMP2r
( unroll one region of the parsing stack, returning )
( a single node consisting of an alternation of )
( all elements on the stack. )
( )
( this unrolls until it hits #ffff #ffff, which it )
( also removes from the stack. )
@unroll-stack ( -> start* end* )
;pop4 JSR2 STH2 ( r )
#00 STH ( count items in stack frame )
;alloc-if-null JSR2 ( replace 0000 with empty )
&loop ( r* )
;pop4 JSR2 POP2 ( r x )
DUP2 #ffff EQU2 ( r x x-is-end? ) ,&done JCN
INCr ( items++ )
;alloc-or JSR2 ( r|x ) ,&loop JMP
&done
( r ffff )
POP2
STHr ,&is-or JCN
STH2r JMP2r
&is-or
POP2r
;alloc-empty JSR2 OVR2 OVR2 SWP2 ( r empty empty r )
;set-next-or JSR2
JMP2r
( add r to the top of the stack. )
( )
( in particular, this will write r into tail.next )
( before replacing tail with r. )
@push-next ( r0 r1 -> )
;pop4 JSR2 ( r0 r1 x0 x1 )
DUP2 #0000 EQU2 ( r0 r1 x0 x1 x1=0? ) ,&is-zero JCN
STH2 ROT2 STH2r ( r1 x0 r0 x1 )
;set-next JSR2 SWP2 ( x0 r1 )
;push4 JSR2
JMP2r
&is-zero POP2 POP2 ;push4 JSR2 JMP2r
( load the given address: )
( )
( 1. if it points to 0000, update it to target )
( 2. otherwise, call set-next on it )
@set-next-addr ( target* addr* -> )
LDA2k #0000 EQU2 ( target addr addr=0? ) ,&is-zero JCN
LDA2 ;set-next JSR2 JMP2r
&is-zero STA2 JMP2r
( set regex.next to target )
@set-next ( target* regex* -> )
LDAk #01 NEQ ,&!1 JCN INC2 ;set-next-addr JSR2 JMP2r
&!1 LDAk #02 NEQ ,&!2 JCN INC2 ;set-next-addr JSR2 JMP2r
&!2 LDAk #03 NEQ ,&!3 JCN #0002 ADD2 ;set-next-addr JSR2 JMP2r
&!3 LDAk #04 NEQ ,&!4 JCN INC2 ;set-next-addr JSR2 JMP2r
&!4 LDAk #05 NEQ ,&!5 JCN #0003 ADD2 ;set-next-addr JSR2 JMP2r
&!5 ;unknown-node-type ;error! JSR2
@set-next-or-addr ( target* addr* -> )
LDA2k #0000 EQU2 ( target addr addr=0? ) ,&is-zero JCN
LDA2 ;set-next-or JSR2 JMP2r
&is-zero STA2 JMP2r
( this is used when first building or-nodes )
( structure will always be: )
( [x1, [x2, [x3, ..., [xm, xn]]]] )
( so we recurse on the right side but not the left. )
@set-next-or ( target* regex* -> )
LDAk #04 NEQ ,&!4 JCN
OVR2 OVR2 INC2 ;set-next-addr JSR2
#0003 ADD2 ;set-next-or-addr JSR2 JMP2r
&!4 ;set-next JMP2
( STACK OPERATIONS )
( )
( we always push/pop 4 bytes at a time. the stack has a fixed )
( maximum size it can use, defined by ;stack-top. )
( )
( the stack can be cleared using ;reset-stack, which resets )
( the stack pointers but does not zero out any memory. )
( )
( stack size is 4096 bytes here but is configurable. )
( in some cases it could be very small but this will limit )
( how many branches can be parsed and executed. )
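( )
( example: a round-trip sketch using arbitrary values: )
( )
(   #1234 #5678 ;push4 JSR2 -- stack-pos grows by 4 )
(   ;pop4 JSR2 -- leaves 1234 5678 again, stack-pos shrinks by 4 )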
( push 4 bytes onto the stack )
@push4 ( str* regex* -> )
;assert-stack-avail JSR2 ( check for space )
;stack-pos LDA2 #0002 ADD2 STA2 ( cell[2:3] <- regex )
;stack-pos LDA2 STA2 ( cell[0:1] <- str )
;stack-pos LDA2 #0004 ADD2 ;stack-pos STA2 ( pos += 4 )
JMP2r
( pop 4 bytes from the stack )
@pop4 ( -> str* regex* )
;assert-stack-exist JSR2 ( check that the stack is non-empty )
;stack-pos LDA2 ( load stack-pos )
#0002 SUB2 LDA2k STH2 ( pop and stash regex )
#0002 SUB2 LDA2k STH2 ( pop and stash str )
;stack-pos STA2 ( save new stack-pos )
STH2r STH2r ( restore str and regex )
JMP2r
( -> size^ )
@frame-size
#00 STH ;stack-pos LDA2
&loop
#0004 SUB2 LDA2k #ffff EQU2 ,&done JCN
INCr ,&loop JMP
&done
POP2 STHr JMP2r ( drop the scan address, return the count )
( reset stack pointers )
@reset-stack ( -> )
;stack-bot ;stack-pos STA2 JMP2r ( pos <- stack-bot )
( can more stack be allocated? )
@stack-avail ( -> bool^ )
;stack-pos LDA2 ;stack-top LTH2 JMP2r
( is the stack non-empty? )
@stack-exist ( -> bool^ )
;stack-pos LDA2 ;stack-bot GTH2 JMP2r
( error if stack is full )
@assert-stack-avail ( -> )
;stack-avail JSR2 ,&ok JCN ;stack-is-full ;error! JSR2 &ok JMP2r
( error if stack is empty )
@assert-stack-exist ( -> )
;stack-exist JSR2 ,&ok JCN ;stack-is-empty ;error! JSR2 &ok JMP2r
( stack-pos points to the next free stack position (or the top if full). )
@stack-pos :stack-bot ( the next position to insert at )
( stack-bot is the address of the first stack position. )
( stack-top is the address of the first byte beyond the stack. )
@stack-bot $1000 @stack-top ( holds 1024 steps (4096 bytes) )
( ARENA OPERATIONS )
( )
( the arena represents a heap of memory that can easily be )
( allocated in small amounts. )
( )
( the entire arena can be reclaimed using ;reset-arena, but )
( unlike systems such as malloc/free, the arena cannot reclaim )
( smaller amounts of memory. )
( )
( the arena is used to allocate regex graph nodes, which are )
( dynamically-allocated as the regex string is parsed. once )
( a regex is no longer needed the arena may be reclaimed. )
( )
( arena size is 1024 bytes here but is configurable. )
( smaller sizes would likely be fine but will limit the )
( overall complexity of regexes to be parsed and executed. )
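( )
( example: a sketch of two allocations, assuming ;reset-arena has )
( just been called: )
( )
(   #03 ;alloc JSR2 -- returns arena-bot, arena-pos is now arena-bot + 3 )
(   #05 ;alloc JSR2 -- returns arena-bot + 3, arena-pos is now arena-bot + 8 )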
( reclaim all the memory used by the arena )
@reset-arena ( -> )
;arena-bot ;arena-pos STA2 JMP2r
( currently caller is responsible for zeroing out memory if needed )
@alloc ( size^ -> addr* )
#00 SWP ( size* )
;arena-pos LDA2 STH2k ADD2 ( pos+size* [pos] )
DUP2 ;arena-top GTH2 ( pos+size pos+size>top? [pos] )
,&error JCN ( pos+size [pos] )
;arena-pos STA2 ( pos += size [pos] )
STH2r JMP2r ( pos )
&error POP2 POP2r ;arena-is-full ;error! JSR2
@arena-pos :arena-bot ( the next position to allocate )
@arena-bot $400 @arena-top ( holds up to 1024 bytes )