Synbil: what?
I think I've stumbled across a way to build languages that is more portable by default than the traditional methods. Its other potential benefits, such as embedding a whole language as a library, make me quite excited about speeding up famously slow languages.
To explain what it is that I found, I have to explain a common pattern that I'd run into when developing compilers.
So here's the scenario, right?
I'm on system A, with language X, and want to make language Y available on A.
I write the bootstrap compiler in (or transpile it to) X, and now Y is on A. I like to get things working before I get them working well, so the backend of this compiler usually is Y itself, rather than whatever assembly is on A. All well and good, but this is where the headache begins...
Now nine times out of ten, if I want a compiler for Y, I want a self-hosted compiler. But then I've ended up having to write two compilers for the benefits of one, and generally endured bootstrapping headaches. Why bootstrap at all? This led to a provocative question.
What if we shrunk the compiler by building syntax in a metalanguage?
This would certainly fix the problem at hand. Instead of "building a compiler" for language X, make a library in metalanguage M that implements the precedence and parsing, make a library that implements macros for the syntax to transform it into Y, and you're done!
No more compiler-bootstrap and compiler in X and Y, just two versions in M, one 100% in language X, and one mostly in language Y, with the rest being the implementation of Y, written in X.
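To make "syntax as a library" a bit more concrete, here is a rough sketch in Python. It is purely illustrative: the `SYNTAX` table, the `add(...)`/`mul(...)` output format, and every function name are assumptions of mine, not part of any real metalanguage. The point is only that precedence and the rewrite macros can live in a plain data table, while one small generic parser does the rest.

```python
import re

# Hypothetical "syntax library": operator -> (precedence, macro).
# Each macro rewrites its two operands into target-language text.
SYNTAX = {
    "+": (1, lambda a, b: f"add({a}, {b})"),
    "*": (2, lambda a, b: f"mul({a}, {b})"),
}

def tokenize(src):
    # Words/numbers, or any single non-space punctuation character.
    return re.findall(r"\w+|[^\s\w]", src)

def parse(tokens, syntax, min_prec=0):
    # Precedence climbing, driven entirely by the syntax table.
    left = tokens.pop(0)
    while tokens and tokens[0] in syntax and syntax[tokens[0]][0] >= min_prec:
        prec, macro = syntax[tokens.pop(0)]
        right = parse(tokens, syntax, prec + 1)  # prec + 1 => left-associative
        left = macro(left, right)
    return left

def translate(src, syntax=SYNTAX):
    return parse(tokenize(src), syntax)

print(translate("1 + 2 * 3"))  # add(1, mul(2, 3))
```

Swapping in a different table changes the language without touching `parse`, which is the property the whole idea leans on.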
Thinking about this led to more provocative questions, mostly centered around the idea of making a language not just portable, but downright pocket-able.
Why not build syntax incrementally?
Why not write syntax in our code?
How small can the metalanguage be?
and maybe even,
Can we implement types in userspace?
We'll talk about all of these.
These all had immediate problems, which I'll go through now, one by one, as if we were designing it together, before we talk about specific design.
Problem 1: Now we have to deal with a metalanguage
I know, not ideal. We'll settle by making it as small as possible. We'll decide now that only primitives for syntax-building are going to be included, whatever those look like.
Problem 2: Writing syntax uses syntax
On the one hand, it'd be nice to implement a language as a library. But how does one go about implementing the syntax? We need code that runs, and writing explicit parser code in userspace would put us right back at square one.
Instead, we imagine a metalanguage where the compiler and programmer are in constant communication, with the programmer explicitly building some sort of parser throughout the metaprogram. I've thought of a few ways to do this:
- Every time symbols are added, build an entire BNF tree. Metacompiler generates parser from BNF.
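A minimal sketch of that BNF-style option, in Python, might look like the following. Big caveats: the `Grammar` class and its methods are hypothetical names, it interprets the rule table directly instead of generating a parser, it only recognizes input rather than building a tree, and it would loop forever on left-recursive rules. It is just meant to show the grammar growing as the program adds symbols.

```python
class Grammar:
    def __init__(self):
        # nonterminal -> list of alternatives, each a tuple of symbols
        self.rules = {}

    def add(self, name, *alternatives):
        # Extend the grammar in place; alternatives are space-separated symbols.
        self.rules.setdefault(name, []).extend(
            tuple(alt.split()) for alt in alternatives
        )

    def match(self, symbol, tokens, pos=0):
        """Yield every position where `symbol` can end when started at `pos`."""
        if symbol not in self.rules:  # anything without a rule is a terminal
            if pos < len(tokens) and tokens[pos] == symbol:
                yield pos + 1
            return
        for alt in self.rules[symbol]:
            ends = [pos]
            for part in alt:
                ends = [e2 for e in ends for e2 in self.match(part, tokens, e)]
            yield from ends

    def accepts(self, start, tokens):
        # True if some derivation of `start` consumes all tokens.
        return len(tokens) in self.match(start, tokens)


g = Grammar()
g.add("num", "0", "1")
g.add("expr", "num", "( expr )")
print(g.accepts("expr", ["if", "1", "else", "0"]))  # False: no such syntax yet

g.add("expr", "if expr else expr")  # the program adds a new symbol
print(g.accepts("expr", ["if", "1", "else", "0"]))  # True
```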
Problem 3: Imports require a clean namespace
The whole idea of modular programming, imports, and code reuse rests on the assumption that one's namespace is kept clean, to prevent things like value, module, or type name clashes.
But we're not importing variable names here, it's much more inconvenient than that! We're importing syntax names, which means symbol clashes need to be resolved before the import, and symbols may contain several identifiers - think about the symbol if{_}else{_}. Let's work through some possible solutions here:
- Maybe you only use one language at a time? If you want to parse a section as another language, switch to it fully, and leave the metalanguage's symbols behind?
  - Sadly, this would make so many use cases impossible that it just needs to be vetoed.
- Maybe keep the metalanguage's symbols, but rename them if necessary?
  - This could work, but renaming symbols manually makes inter-op via embedding difficult and painful.
  - This would require at least one symbol to be reserved for changing the metalanguage's symbols, which does suck a bit.
- Do we even need all of the metalanguage's symbols? At least some of the time that we import a language, we might really just want to use it and return a value, with no syntax building in the imported language and no use of any other language. In that case, why have anything but the symbols to destroy and build syntax?
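Whichever option wins, the import step has to notice clashes first. Here is a small Python sketch of one way that check could work, under an assumption I'm making up for illustration: a symbol is written as a string with `{_}` holes, and two symbols clash if they share any fixed piece. The function names are hypothetical too.

```python
def keywords(symbol):
    """Split a symbol such as 'if{_}else{_}' into its fixed pieces."""
    return [piece for piece in symbol.split("{_}") if piece]

def clashes(host_symbols, imported_symbols):
    """List (keyword, host symbol, imported symbol) for every shared piece."""
    owner = {}
    for sym in host_symbols:
        for kw in keywords(sym):
            owner.setdefault(kw, sym)  # first host symbol using this piece
    return [
        (kw, owner[kw], sym)
        for sym in imported_symbols
        for kw in keywords(sym)
        if kw in owner
    ]

host = ["if{_}else{_}", "{_}+{_}"]
imported = ["while{_}", "{_}else{_}"]
print(clashes(host, imported))  # [('else', 'if{_}else{_}', '{_}else{_}')]
```

Note how `else` alone is enough to cause trouble, even though neither full symbol matches the other. That is exactly why multi-identifier symbols make this harder than ordinary name clashes.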