Status Update: February 2025

· im tosti


Well, as could have potentially been expected (I did, but not to this degree), it's currently crunch time where I work, and so I have very little energy to spend on my own projects. As a result, Nabel barely made any progress this month!

What I'm going to do is go about it in a more tiered approach, with the goal of releasing minimal functionality bigints and bigrats in March, which can then be expanded upon further.

So if you were here for Nabel news, here you go, and see you next month. To apologize for this state of affairs, let me talk a bit about some completely unrelated stuff.

Maps #

It has come to my attention that OpenStreetMap around where I live isn't in the best state. I'm planning to start going (starting tomorrow from the date of publication actually) to the nice coffee shop that opened nearby at least weekly. While I'm out and about doing that (as well as during any other walks I take), I figured I'd try and improve its state a bit. If you didn't know about it yet, OpenStreetMap exists, and is very cool. "Oh I use corporate-backed maps actually." Great news, basically all of them (with the notably exception of Google Maps) are backed by OSM.

Persistent HAMTs #

Hash Array-Mapped Tries are really cool. They're also very useful for implementing "Persistent" maps, i.e. those where a modification of it still leaves you with the original, where most of the structure between the two (old and new) is shared. This is desirable for immutable data structure implementations, and is how clojure's maps work.

There's just one small problem: having them be implemented entirely on top of the language's GC is really hard on said GC. Imagine a map you interactively build up, discarding the old versions. Each of those intermediate versions will cause not one, but the entire line of internal nodes to have to be GCd. It can be ok in some circumstances, like with Java's generational GC, but even then it's non-ideal.

I've been interested in writing my own language (I have plenty of ideas and roughly know what I want and what I want) for a while now, and I know I'll want it to be immutable-first, but also for it to have a simple tracing gc. This means that I need to resolve this conflict (or just accept truly terrible performance).

One approach I have in mind is something I call "internal" reference counting. Basically, the "top level" nodes (the ones that represent the entire trie) are handled by the language's GC. However, all of the internal nodes are kept track of by the actual data structure, and are cleaned up during gc via reference counting.

The obvious potential problem here is with concurrency: what if a reference count goes "up" at the same time it goes "down". It's actually not a concern in this case though, for the simple reason that the reference count only goes down during GC. And the GC is stop-the-world. If the GC isn't stop the world, then we can have a lock somewhere, but you know.

I don't have ANY data to back up whether this'll help with the performance by the way. I'd like to find out though! I might try and implement a PoC for janet, just so I can check out how this approach behaves.