Implementing FORTH on macOS

2022-01-24

by David Jones

SixtyForth

A few years ago i implemented FORTH for 64-bit Intel architecture (AMD64). This was on Linux and i had deliberately not spent much time thinking “portably”: most obviously not about operating systems, but also not about CPU architecture or word size.

After buying a second-hand 2015 MacBook Pro in the second March of the long year of the pandemic, i would every now and then tinker with SixtyForth to try to get it working on macOS.

With a bit of a push over the weekend i managed to get a basic implementation going: it reaches the interactive prompt, but the nano-style line editor does not work.

What is involved? Well, the 2015 mac predates the switch to Apple Silicon so it is a 64-bit Intel architecture. That's a good start. macOS implements a largely POSIX subsystem, so that too is a good start.

Many blog articles on the very low-level details, even ones that are using assembly, assume you are writing assembly and C, or are targeting the C runtime (so you may have a conventional main(), or are willing to link to the C library).

There are not many hints for people who eschew C entirely.

Building

Building is the first stumbling block. I use nasm for the assembler. macOS and Linux use different object file formats. Linux uses ELF, so i use

nasm -w+error -g -f elf64 -o 64th.o -l ignore/listing 64th.asm

but macOS uses Mach-O, so i need -f macho64:

nasm -w+error -g -f macho64 -o 64th.o -l ignore/listing 64th.asm

The source for SixtyForth is currently two files: one 64th.asm file, and one rc.4 file. 64th.asm implements a fairly minimal interpreter, and rc.4 (run commands FORTH) contains the rest of the implementation, written in itself. The .asm file is compiled using nasm (above), the rc.4 file is “compiled” using objcopy making an object file that has a byte-copy of the rc.4 text file. As part of FORTH startup the rc.4 file is passed to EVALUATE as a single string.

On macOS i couldn't wrangle objcopy to do this. So the rc.4 file is now compiled using nasm and an INCBIN directive. The end result is the same, a .o file that has a byte-copy of the text file. The extra file is a little bit more annoying, but in a way it’s a more portable toolchain because we already use nasm and we remove objcopy as a dependency.

The linker step also changes. Relatively straightforward on Linux:

ld -g -o 64th 64th.o rc.o

On macOS:

ld -L$(xcode-select -p)/SDKs/MacOSX.sdk/usr/lib -o 64th -arch x86_64 -segprot __DICT rwx rwx -macosx_version_min 10.13 -lSystem -no_pie -e _start asmrc.o 64th.o

Of note:

-e option can be used to select the entrypoint;
-arch must be explicit (good idea anyway);
-L option seems over-the-top? xcrun didn’t seem to help, but i have a hard time believing that i have to link to -lSystem like that;
the -segprot __DICT rwx rwx selects a non-default memory protection for a custom segment, more on that next.

Sections and self-modifying code

The sections have to have different names when using the Mach-O format: SEGNAME,sectname. Conventionally both parts start with __, for example: __DICT,__data. In Mach-O all the sections with the same SEGNAME (segment name) are joined together in a segment.

By default in macOS a segment is not both writable and executable. Data segments are writable and not executable; code segments are executable and not writable.

Which eventually leads to the line in the linker: -segprot __DICT rwx rwx.

SixtyForth requires that memory is both writable and executable in order to implement the class-defining word DOES>. At compile time DOES> pokes machine code into the word containing it, and arranges at run time for the most recently defined word to execute that machine code.

A future port to Apple Silicon (ARM based, extremely not x86-64) would require a different strategy as it is not possible to have a segment that is created by the linker that is both writable and executable. It is possible at runtime by using mmap().

Currently in SixtyForth DOES> is only used in the definition of CONSTANT. Alternative strategies could be:

Implement DOES> so that it does not compile machine code (not really in the spirit of FORTH and very slightly less efficient);
Put the modifiable dictionary in a suitable JIT mmap() segment;
Implement CONSTANT (or the run-time part of it) in assembly;
Eliminate the use of CONSTANT in rc.4.

Some of these strategies combine. It is possible to mmap() a segment and link the dictionary to it so that new definitions may use DOES>. There is a separate question as to whether it is desirable to do this before or after the rc.4 code wants to use CONSTANT.

System Call numbers and structures

The syscall numbers change between Linux and macOS; for example write() is syscall 1 on Linux and syscall 0x2000004 on macOS. For system calls that take in-memory structures, like the termios family of ioctl() requests, the in-memory structures are different.

Because both systems implement the Sys V AMD64 (aka x86-64) ABI the rest of the calling conventions do not change. Same registers and same syscall instruction.

SixtyForth only makes about 6 different system calls (read, write, mmap, exit, open, ioctl).

syscall numbers are taken from https://sigsegv.pl/osx-bsd-syscalls/ with the not-very-well documented caveat that you have to add 0x2000000 to all the syscalls.

The flags (for mmap()) and the structs for ioctl() are a bit more tricky. Sure you can poke around header files and you get stuff like:

#define TIOCGWINSZ      _IOR('t', 104, struct winsize)  /* get window size */

For TIOCGWINSZ (get winsize). Hmm. I suppose i can go and find the definition of _IOR and guess the size of struct winsize, but in practice it was easier to write a C program to print out the value. Which in this case is 0x40087468. Incidentally it's nice to see traditions living on: that 't' in the macro is 0x74 ASCII which becomes the hex-digits 74 in that long number. The idea of encoding the subsystem in the ioctl() command and using t for terminal goes back to the first implementation of ioctl() in Unix Version 7.

It’s similar for the structs. You can go to the header files again, but i still need to know the exact memory format for tcflag_t and how cc_t c_cc[NCCS] is packed. On Linux there is an extra wrinkle in that often the C library interface and the syscall interface use a different struct and there is glue code to shuffle data around.

Best thing to do is create a buffer, make the syscall, and hexdump the buffer to work out the exact memory layout with whatever struct definitions you have available as a hint. This is one of the reasons that DUMP is implemented in assembly.

It may seem dodgy or underhand to obtain knowledge in this way, but the knowledge obtained absolutely cannot change because it forms part of the ABI with the operating system. For as long as the OS guarantees that old binaries will work, it cannot change the length or layout of structs and it cannot change flag values.

argc, argv, envp

On Linux at program start $rsp contains the address of argc, which is immediately followed in memory by the argv vector (essentially argv = &argc + 8 bytes) and then the envp vector. It so happens that SixtyForth doesn’t use $rsp so it is unchanged by the time that the rc.4 code wants to inspect the program arguments.

On macOS argc, argv, envp are as if passed as arguments. The initial register state has them in $rdi, $rsi, $rdx (thanks lldb). Those registers are used by SixtyForth and because we need those values later they have to be specially saved. I added code to the _start entrypoint to save them in a block of memory that can be recalled with the FORTH word iplsave.

The Unified Future

So far all of this work has been done on a branch. What will a unified future look like?

In .asm we can have macros for the syscall numbers and the SECTION directives. The FORTH side of the iplsave mechanism can be the same, but the assembly that implements it will have to be different; so i guess that can be a macro too.

For the FORTH side of system calls we can use symbols with different numeric values for syscall numbers, flags, and struct offsets. Where tho?

Options:

define numeric values in assembly, one file per OS;
at runtime in FORTH by using not-yet-defined LINUX? MACOS? words;
a separate per-OS FORTH file that is included;
edit the rc.4 file with text-hackery prior to compiling it.

In principle i prefer a separate .4 file. In practice it will mean splitting the rc.4 file up as well. Probably a good thing to be honest:

portable rc.4
os-variant.4
mildly os-dependent rc.4