One of the goals we have with GoToSocial is to make self-hosting your own fediverse instance really easy. In practice this means a few things:
- Providing a single static binary for easy deployment and without requiring containers
- Being very mindful of how much compute and memory we require
- No dependencies on external databases like Postgres, background job systems like Sidekiq (which would in turn require Redis), or an object store for media storage, even though we do support using some of these things
Though we don’t want to require an external database, we do still need a database. For GoToSocial, that means supporting SQLite. However, SQLite is a database-as-a-library rather than a client-server system. To use it in Go, you typically need Cgo. But Cgo brings its own problems: it requires a C toolchain at build time, it makes cross-platform builds hard and, depending on how SQLite is linked, the target system may now also need to have SQLite installed.
So instead we’ve used the Modernc SQLite project. What this does is transpile (or convert) the SQLite C code base to Go. The end result is a native Go “port” of SQLite that we can use in any Go project. No Cgo, no C toolchain, and we can continue to ship a static binary. This is an impressive feat of engineering, but it comes at no small cost. The project consists of three parts: its own libc that implements all the primitives SQLite needs, the transpiler itself, and the resulting Go code. As far as I know this is the only Go project that’s taken this approach and gotten to something usable in practice.
A bug can be hiding in any of these, and whenever we’ve had issues it has proven difficult to track down the cause. It’s also hard to know for sure that the Go code matches the intent and behaviour of the original C code, which adds another dimension to debugging.
Enter WebAssembly
I recently stumbled on Nuno Cruces’ work where he compiles SQLite to WASM and then provides the necessary wrapping code to use it as a Go library. This takes the form of go-sqlite3. It leverages the Go-native and dependency-free WASM runtime wazero that’s being built by the folks at Tetrate Labs.
This approach remains entirely Cgo-free but has the benefit of coming with significantly less code to maintain. The wrapping code to access SQLite is very limited and the implementation straightforward. You don’t have to be a compiler engineer to understand it, and if necessary it’s something we can maintain ourselves within the GoToSocial project. This does introduce a dependency on wazero, but Tetrate Labs seems to be in a good place and wazero sees a lot of use beyond their own products.
During the 0.16 release cycle Kim added support to GoToSocial for using the WASM-based SQLite library (behind a build tag) and we’ve both been running it ever since. We’ve been very happy with it, and so far the performance has been the same as with the old approach. Now that we’re on the way to 0.17 this has become our default way of using SQLite. If no issues arise I’ll submit a PR to remove support for the old approach from our code base during the 0.18 cycle.
Media processing
One other thing that GoToSocial needs to do is handle media: images, audio and video that people share in their posts. Because we want to be mindful of people’s privacy, we only support image formats where we can currently redact location metadata. That means we support most formats from which we can strip EXIF data, such as JPEG, PNG and WebP, though not HEIF.
For video we unfortunately can’t do that yet. We would also like to be able to generate a good preview image. But that requires opening the video stream and capturing a non-blank frame. There are no Cgo-free libraries for this in Go with good coverage of the various container formats and codecs. Writing our own is a huge task where we’d have to go dumpster diving through big and complex specifications.
We’re going to try and do what many other projects also do: use ffmpeg. But we’re of course doing it our way, because we don’t want ffmpeg to be a runtime dependency users have to deal with. Instead Kim has been making progress on compiling ffmpeg to WASM, so that we can then use it through wazero. This lets us embed a build of ffmpeg in the binaries we distribute, which keeps our promise of a single static binary that’s easy to deploy and means we don’t have to cope with a spread of different ffmpeg versions either.
One additional benefit, more so for the media processing than for SQLite, is that each WASM module runs in its own sandbox. It doesn’t have access to the host process’ memory and it can’t crash the host process either. Given that vulnerabilities do turn up in media libraries, and that we’re at times dealing with potentially maliciously crafted files (from remote instances), this insulates us a bit more from such issues.
Even more “FFI”
We’re far from the only ones who’ve latched on to this approach. Many folks have started to use WASM to wrap various C libraries and other things instead of using Cgo, as you can see on the wazero community page.
There is also the wasilibs project on GitHub, which similarly compiles a number of tools to WASM targeting WASI and then wraps them for easy use in Go through wazero. This seems to be how Buf is shipping a set of protoc plugins for various languages other than Go in their CLI.
As wazero continues to improve, this approach should hopefully become more and more viable. It provides an easier way of using C libraries in Go with far fewer toolchain issues, though it does come with slightly larger binaries because the compiled WASM code is included. It still fits nicely with Go’s build-a-single-binary approach, including the ability to embed static assets. Binaryen also keeps improving, which should hopefully help with both speed and binary size. I’m not sure if there’s anything the Go team can do to help the wazero project or push more in this direction, but I’m hoping they’re at least keeping an eye on it.
In this post I’ve so far only talked about Cgo-free FFI, but going through WASM means we get an FFI-like mechanism for a lot more than the C ecosystem. Anything written in any language that can be compiled to WASM starts to come within reach.