Commit graph

15 commits

Author SHA1 Message Date
aa6346eafa tokenizer: Add backtick delimited strings
This is useful when writing strings that contain a lot of backslashes,
e.g. in regular expressions.
2023-11-07 21:45:27 +01:00
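The following C sketch shows how a tokenizer might read a backtick-delimited string. It assumes that the content is taken verbatim up to the next backtick, with no escape processing, which is what makes backslash-heavy regular expressions easier to write. The helper `read_backtick_string` is hypothetical. The real apfl tokenizer works on a byte stream and may treat embedded backticks or end of input differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reads a backtick-delimited string starting at
   src[*pos], which must be the opening backtick. Copies the raw content
   into out (no escape processing, so backslashes are kept as-is) and
   advances *pos past the closing backtick. Returns false if the string
   is unterminated or out is too small. */
static bool read_backtick_string(const char *src, size_t *pos,
                                 char *out, size_t out_cap)
{
    size_t i = *pos;
    size_t n = 0;

    if (src[i] != '`')
        return false;
    i++; /* skip the opening backtick */

    while (src[i] != '`') {
        if (src[i] == '\0')
            return false; /* unterminated string */
        if (n + 1 >= out_cap)
            return false; /* output buffer too small */
        out[n++] = src[i++];
    }

    out[n] = '\0';
    *pos = i + 1; /* skip the closing backtick */
    return true;
}
```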
387b4bfd99 tokenizer: Use same number parsing code as in tonumber 2023-09-04 14:04:40 +02:00
0209fe6a51 tokenizer: Use unsigned char 2023-09-03 16:36:10 +02:00
52ea737975 tokenizer: Manage last position automatically when unreading byte 2023-09-03 16:24:50 +02:00
97f5986781 Implement tonumber 2023-07-03 23:52:49 +02:00
6056e3a450 Implement pairs
Pairs are 2-tuples of values that are constructed and matched with the `::`
operator. They can also be matched with the `:` operator; in that case the
LHS is an expression, and the pair only matches if the pair's LHS matches
the result of that expression.

Pairs should be useful for doing something similar to what sum types /
tagged unions do in statically typed languages, e.g. you could write
something like:

    some := (symbol) # Something that creates a unique value
    filter-map := {
      _ [] -> []
      f [x ~xs] ->
        {
          some:y -> [y ~(filter-map f xs)]
          nil -> filter-map f xs
        } (f x)
    }
    filter-map {
      x?even -> some :: (* x 10)
      _ -> nil
    } some-list
2023-03-22 23:54:03 +01:00
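To make the tagged-union idea concrete, here is a heavily simplified C model of the matching rule. It assumes values are plain pointers compared by identity, so a unique "symbol" is just a distinct address. `struct pair`, `make_pair` and `match_tagged` are hypothetical names and do not reflect apfl's actual value representation. `match_tagged` plays the role of the pattern `some:y` in the example above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pair model: two untyped values, compared by identity.
   Real apfl values are richer; this only illustrates the matching rule. */
struct pair {
    const void *lhs; /* e.g. a unique tag such as `some` */
    const void *rhs; /* the payload */
};

/* Construct a pair, like `lhs :: rhs`. */
static struct pair make_pair(const void *lhs, const void *rhs)
{
    struct pair p = { lhs, rhs };
    return p;
}

/* Emulates the pattern `tag:y`: matches only if the pair's LHS is the
   same value as tag (here: the same address), then binds the RHS to *y. */
static bool match_tagged(struct pair p, const void *tag, const void **y)
{
    if (p.lhs != tag)
        return false;
    *y = p.rhs;
    return true;
}
```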
e0881c558c strings+io: Make chars unsigned 2023-02-13 22:31:18 +01:00
c18df0ab19 Make apfl_string_builder_init easier to use
It was a bit silly that you first had to declare a string builder variable
and then pass a reference to it into the init function, which could not
fail anyway. Instead, the function now just returns a ready-to-use string
builder :)
2022-10-30 22:51:51 +01:00
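A before/after sketch of this interface change, in C. The struct layout and the names `string_builder_init_old` / `string_builder_init` are hypothetical stand-ins for `apfl_string_builder_init`; the point is only that an initializer which cannot fail can return the value directly.

```c
#include <stddef.h>

/* Hypothetical string builder; field names are assumptions. */
struct string_builder {
    char *data;
    size_t len;
    size_t cap;
};

/* Before (sketch): the caller declares a variable, then passes a pointer. */
static void string_builder_init_old(struct string_builder *sb)
{
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
}

/* After (sketch): initialization cannot fail, so just return the value.
   Usage: struct string_builder sb = string_builder_init(); */
static struct string_builder string_builder_init(void)
{
    return (struct string_builder){ .data = NULL, .len = 0, .cap = 0 };
}
```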
b7c88635d9 Redo source reader interface
The callback and the opaque data are now grouped together in a struct
instead of being passed individually into the tokenizer.

This also exposes the string source reader struct and therefore removes
the need to heap-allocate it. Neat!
2022-04-15 22:35:36 +02:00
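A C sketch of the grouped interface: the callback and its opaque data travel together in one struct, and because the string reader's struct is public, it can live on the caller's stack. All names and the callback signature here are assumptions for illustration, not apfl's actual API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source reader: callback and opaque data grouped together.
   read() copies up to *len bytes into buf and sets *len to the number of
   bytes actually read (0 at end of input). Returns false on error. */
struct source_reader {
    bool (*read)(void *opaque, char *buf, size_t *len);
    void *opaque;
};

/* Exposed string reader state, so callers can keep it on the stack. */
struct string_source_reader {
    const char *str;
    size_t remaining;
};

static bool string_source_read(void *opaque, char *buf, size_t *len)
{
    struct string_source_reader *r = opaque;
    size_t n = *len < r->remaining ? *len : r->remaining;

    memcpy(buf, r->str, n);
    r->str += n;
    r->remaining -= n;
    *len = n;
    return true;
}

/* Wrap a (possibly stack-allocated) string reader as a source reader. */
static struct source_reader string_source_reader_as_reader(struct string_source_reader *r)
{
    return (struct source_reader){ .read = string_source_read, .opaque = r };
}
```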
90a80152e1 Implement mark&sweep garbage collection and bytecode compilation
Instead of the previous refcount-based garbage collection, we're now using
a basic tri-color mark&sweep collector. This is done to support cyclical
value relationships in the future (functions can form cycles; none of the
values implemented up to this point can).

The collector maintains a set of roots and a set of objects (grouped into
blocks). GC-enabled objects are no longer allocated manually, but are
allocated by the GC. The GC also wraps an allocator, so it knows when we
run out of memory and will try to recover from this situation by
performing a full collection cycle.

The tri-color abstraction was chosen for two reasons:

- We don't have to maintain a list of objects that need to be marked, we
  can simply grab the next grey one.
- It should allow us to later implement incremental collection (right now
  we only do a stop-the-world collection).

This also switches to a bytecode-based evaluation of the code: We no longer
directly evaluate the AST, but first compile it into a series of
instructions that are evaluated in a separate step. This was done in
preparation for implementing functions: We only need to turn a function
body into instructions once, instead of evaluating the node again with each
call of the function. Also, since an instruction list is implemented as a
GC object, this removes manual memory management of the function body and
its child nodes. Since the GC and the bytecode go hand in hand, this was
done in one (giant) commit.

As a downside, we've now lost the ability to do list matching on
assignments. I've already started working on implementing this in the new
architecture, but left it out of this commit, as it's already quite a large
commit :)
2022-04-11 22:24:22 +02:00
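A minimal stop-the-world tri-color mark&sweep in C, illustrating the first point above: instead of keeping a mark stack, the marker simply scans the block for the next grey object. The object model (a fixed array, children referenced by index) and all names are hypothetical, and the repeated scan is naive (quadratic in the worst case); a real collector would scan block by block or remember where it left off. The example also shows why this was done: an unreachable cycle gets freed, which refcounting cannot do.

```c
#include <stdbool.h>
#include <stddef.h>

enum color { WHITE, GREY, BLACK };

#define MAX_CHILDREN 2

/* Hypothetical object model: all objects live in one array (the "block")
   and reference up to MAX_CHILDREN other objects by index (-1 = none). */
struct object {
    bool in_use;
    enum color color;
    int children[MAX_CHILDREN];
};

/* Shade a white object grey: reachable, but children not yet scanned. */
static void shade(struct object *objs, int i)
{
    if (i >= 0 && objs[i].color == WHITE)
        objs[i].color = GREY;
}

/* Find the next grey object by scanning the block; no separate list of
   objects to mark is needed. Returns -1 when no grey objects are left. */
static int next_grey(const struct object *objs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (objs[i].in_use && objs[i].color == GREY)
            return (int)i;
    return -1;
}

/* Full collection cycle; returns the number of freed objects. */
static size_t collect(struct object *objs, size_t n, const int *roots, size_t nroots)
{
    for (size_t i = 0; i < n; i++)
        objs[i].color = WHITE;
    for (size_t r = 0; r < nroots; r++)
        shade(objs, roots[r]);

    /* Mark: blacken grey objects, shading their white children grey. */
    int g;
    while ((g = next_grey(objs, n)) >= 0) {
        for (int c = 0; c < MAX_CHILDREN; c++)
            shade(objs, objs[g].children[c]);
        objs[g].color = BLACK;
    }

    /* Sweep: everything still white is unreachable. */
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (objs[i].in_use && objs[i].color == WHITE) {
            objs[i].in_use = false;
            freed++;
        }
    }
    return freed;
}
```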
ebf3fc89ff Introduce allocator abstraction
We now no longer call malloc/free/... directly, but use an allocator object
that is passed around.

This was mainly done in preparation for a garbage collector: The collector
will need to know how much memory we're using, and introducing the
allocator abstraction will allow the GC to hook into memory allocation and
observe the memory usage.

This has other potential applications:

- We could now be embedded into applications that can't use the libc
  allocator.
- There could be an allocator that limits the total amount of used memory,
  e.g. for sandboxing purposes.
- In our tests we could use this to simulate out of memory conditions
  (implement an allocator that fails at the n-th allocation, increase n by
  one and restart the test until there are no more faked OOM conditions).

The function signature of the allocator is essentially identical to the
one Lua uses (`lua_Alloc`).
2022-02-08 22:53:13 +01:00
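A C sketch of a Lua-style allocator function (`void *(*)(void *ud, void *ptr, size_t osize, size_t nsize)`, where `nsize == 0` frees and anything else behaves like `realloc`), together with the test allocator idea from the list above: one that tracks memory usage and fakes an out-of-memory condition at the n-th call. The names `alloc_fn`, `struct allocator`, `libc_alloc` and `failing_alloc` are hypothetical, not apfl's.

```c
#include <stddef.h>
#include <stdlib.h>

/* Lua-style allocator: nsize == 0 frees ptr and returns NULL; otherwise
   it behaves like realloc (ptr == NULL allocates a new block). */
typedef void *(*alloc_fn)(void *ud, void *ptr, size_t osize, size_t nsize);

struct allocator {
    alloc_fn fn;
    void *ud; /* opaque user data passed to fn */
};

/* Plain libc-backed allocator. */
static void *libc_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    (void)ud;
    (void)osize;
    if (nsize == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, nsize);
}

/* Test allocator: tracks bytes in use and fails the fail_at-th
   (1-based) allocation/reallocation to simulate out of memory. */
struct failing_alloc_state {
    size_t calls;
    size_t fail_at;
    size_t in_use;
};

static void *failing_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    struct failing_alloc_state *st = ud;

    if (nsize == 0) {
        if (ptr != NULL)
            st->in_use -= osize;
        free(ptr);
        return NULL;
    }
    if (++st->calls == st->fail_at)
        return NULL; /* fake OOM; the old block (if any) stays valid */

    void *p = realloc(ptr, nsize);
    if (p != NULL)
        st->in_use = st->in_use - (ptr != NULL ? osize : 0) + nsize;
    return p;
}
```

The usage-tracking allocator is also where a GC could hook in: it sees every size change, so it can tell how much memory is in use and react when an allocation fails.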
d81bef9184 parser/tokenizer: Save textual data as refcounted strings
This avoids creating refcounted strings during evaluation and makes it
easier to use the same parsed string in multiple places (should be
useful once we implement functions).
2022-01-18 21:18:27 +01:00
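A C sketch of a refcounted string that the parser could create once and the evaluator could share by bumping the count instead of copying. The type and function names are hypothetical; apfl's actual string type is not shown here (and later commits replace refcounting with a GC).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted string with its bytes stored inline. */
struct rc_string {
    size_t refcount;
    size_t len;
    char data[]; /* flexible array member, NUL-terminated */
};

static struct rc_string *rc_string_new(const char *s, size_t len)
{
    struct rc_string *str = malloc(sizeof *str + len + 1);
    if (str == NULL)
        return NULL;
    str->refcount = 1;
    str->len = len;
    memcpy(str->data, s, len);
    str->data[len] = '\0';
    return str;
}

/* Share an existing string, e.g. reuse a parsed string during evaluation. */
static struct rc_string *rc_string_retain(struct rc_string *str)
{
    str->refcount++;
    return str;
}

/* Drop one reference; frees the string when the last one is gone. */
static void rc_string_release(struct rc_string *str)
{
    if (str != NULL && --str->refcount == 0)
        free(str);
}
```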
6439f4f8ce Tokenizer: Disallow ASCII control characters outside strings 2022-01-07 23:39:06 +01:00
c288c333ca Continue work on parser
Seems that we can parse most things now :). Assignments don't work yet,
though. Also, we're currently leaking memory pretty badly.
2021-12-15 21:47:17 +01:00
d094ed7bd5 Initial commit 2021-12-15 21:47:17 +01:00