I'm on a train! (Disclaimer: I'm not on a train any more.)
What better thing to do, during the hot, hot July days, than something semi-mechanical involving very little brain power? With student-facing work briefly calming down (we've just about weathered the storm of appeals and complaints following the publication of exam results), and the all-too-brief Summer holidays approaching, I caught myself thinking that it wasn't fair that Doug and his team should have all the fun of porting SBCL to a new architecture; I want a go, too! (The fact that I have no time notwithstanding.)
But, to what? 64-bit powerpc is definitely on the move; I've missed that boat. SBCL already supports most current architectures, and “supports” a fair number of obsolete ones too. One thought that did catch my brain, though, was prompted by the recent publication by ARM of a set of claims purporting to demonstrate the dangers of the RISC-V architecture, in comparison to ARM's own. Well, since SBCL already runs on 32-bit and 64-bit ARM platforms, how hard could porting to RISC-V be?
I don't know the answer to that yet! This first post merely covers the straightforward – some might even say “tedious” – architecture-neutral changes required to get SBCL to the point that it could start to consider compiling code for a new backend.
The SBCL build has roughly seven phases (depending a bit on how you count):
1. build configuration;
2. build the cross-compiler using the host Lisp, generating target-specific header files;
3. build the runtime and garbage collector using a C compiler and platform assembler, generating an executable;
4. build the target compiler using the cross-compiler, generating target-specific fasl files;
5. build the initial ("cold") image, using the genesis program to simulate the effect of loading fasl files, generating a target Lisp memory image;
6. run all the code that phase 5 delayed because it couldn't simulate the effects of loading fasls (e.g. many side-effectful top-level forms);
7. build everything else, primarily PCL (a full implementation of CLOS), and save the resulting image.
1. Define a keyword for a new backend architecture
Probably the most straightforward thing to do, and something that allows me to (pretty much) say that we're one seventh of the way there: defining a new keyword for a new backend architecture is almost as simple as adding a line or two to make-config.sh. Almost. We run some devious dastardly consistency checks on our target *features*, and those need to be updated too. This gets make-config.sh running, and allows make-host-1.sh to run its consistency checks.
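As a rough illustration of the shape of this step, here is a minimal, self-contained shell sketch. It is not the contents of make-config.sh: the function names guess_arch and check_features, the riscv keyword, and the rule "exactly one architecture keyword among the features" are all assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch only; this is NOT the real make-config.sh.

# Map a machine name (in the style of `uname -m` output) to a backend
# keyword. The "riscv" keyword is an assumed name for the new backend.
guess_arch() {
    case "$1" in
        x86_64|amd64)    echo x86-64 ;;
        aarch64|arm64)   echo arm64 ;;
        riscv32|riscv64) echo riscv ;;   # the "line or two" for the new port
        *)               echo unknown ;;
    esac
}

# Assumed consistency rule: a feature list must name exactly one
# architecture keyword. Succeeds (exit status 0) only in that case.
check_features() {
    count=0
    for f in "$@"; do
        case "$f" in
            x86-64|arm64|riscv) count=$((count + 1)) ;;
        esac
    done
    [ "$count" -eq 1 ]
}

arch=$(guess_arch riscv64)
if check_features "$arch" unix linux; then
    echo "ok: $arch"   # prints "ok: riscv"
fi
```

The point of the sketch is that the new keyword has to be known in two places: where it is chosen, and where the feature list is validated. Forgetting the second is what makes "almost as simple" only almost.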
2. Construct minimal backend-specific files
Phase 2, building the cross-compiler, sounds like something straightforward: after all, if we don't have the constraint that the cross-compiler will do anything useful, not even run, then surely just producing minimal files where the build system expects to find them will do, and Lisp’s usual late-binding nature will allow us to get away with the rest. No?
Well, that could have been the case, but SBCL itself does impose some constraints, and the minimal files are a lot less minimal than you might think. The translation of IR2 (second intermediate representation) to machine code involves calling a number of VOPs (virtual operations) by name: and those calls by name perform compile-time checking that the virtual operation template name already exists. So we have to define stub VOPs for all those virtual operations called by name, including those in optimizing vector initialisation, for all of our specialised vector types. These minimal definitions do nothing other than pacify the safety checks in the cross-compiler code.
Phase 2 also generates a number of header files used in the compilation of the C-based runtime, propagating constant and object layout definitions to the garbage collector and other support routines. We won't worry about that for now; we just have to ignore an error about this at first-genesis time for a while. And with this, make-host-1.sh runs to completion.
4. Build the target compiler using the cross-compiler
At this point, we have a compiler backend that compiles. It almost certainly doesn't run, and even when it does run, we will want to ask it to compile simple forms of our choosing. For now, we just add a new file to the start of the build order, and put in a very simple piece of code, to see how we get on. (Spoiler alert: not very well). We've added the :trace-file flag so that the cross-compiler will print information about its compilation, which will be useful as and when we get as far as actually compiling things.
The next steps, moving beyond the start of step 4, will involve making decisions about how we are going to support SBCL on RISC-V. As yet, we have not made any decisions about register function (some of those, such as the C stack and frame pointer, will be dictated by the platform ABI, but many won't, such as exactly how we will partition the register set), or about how the stack works (again, the C stack will be largely determined, but Lisp will have its own stack, kept separate for ease of implementing garbage collection).
(Observant readers will note that phase 3 is completely unstarted. It's not necessary yet, though it does need to be completed for phase 4 to run to completion, as some of the cross-compilation depends on information about the runtime memory layout. Some of the details in phase 3 depend on the decisions made in phase 4, though, so that's why I'm doing things in this order, and not, say, because I don't have a RISC-V machine to run anything on -- it was pleasantly straightforward to bring up Fedora under QEMU which should be fine as a place to get started.)
And this concludes the initial SBCL porting work that can be done with very little brain. I do not know where, if anywhere, this effort will go; I am notionally on holiday, and it is also hot: two factors which suggest that work on this, if any, will be accomplished slowly. I'd be very happy to collaborate, if anyone else is interested, or else plod along in my own slow way, writing things up as I go.