Rust can replace Protobuf

Rust has a lot to like. Protobuf has a lot to hate. Can we make Protobuf better by replacing it with Rust? What follows is part of a long tradition of replacing things with Rust [1][2][3][4][5][6].
Protobuf is a language that describes an interface. An interface description language (IDL), if you will. It's also a serialization format that transports data across the wire. It's also a command line tool (protoc) that compiles the IDL into language-specific packages to read and write said serialization format. It generates code in many languages.
Rust is a language that can do everything, but it's very boilerplate-heavy. Not good for simple interface definitions. Blessedly, Rust supports macros, which reduce boilerplate. Macros can auto-generate virtually any Rust code at compile time. There are two kinds of macros that are important for replacing Protobuf.
- Macros like wasm-bindgen (JS) and pyo3 (Python) that generate glue code to bind to foreign function interfaces. With hardly any effort, we can expose our basic Rust types to JavaScript and Python.
- Serde, a framework whose derive macros describe how any Rust struct or enum should be serialized. It's format-agnostic, so you can plug in any serialization format you want, such as JSON or msgpack.
If it's not abundantly obvious what we're going to do, let me make it clear: we can write our interface using Rust structs, bind them to other languages, and serialize them with serde. Rust bindings in any language give us the portability of Protobuf with the expressiveness of Rust/serde types.
Protobuf is bad
Perhaps because it has to interface with so many languages that don't share the same features, some things are just needlessly verbose. Google publishes a standalone Python library (not baked into the generated Protobuf code) just to smooth over the painful ergonomics in Python. For example, every field has a default value, so if you don't check HasField everywhere you can't tell the difference between a field that was explicitly set to its default and one that was never set. (The pythonic approach would be to simply check if not field.)
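For contrast, here is a minimal sketch of how Rust's Option encodes that distinction in the type itself. The Settings and DefaultedSettings structs and the describe function are made up for illustration; they are not part of any Protobuf library or generated code.

```rust
/// Hypothetical message with an always-defaulted scalar:
/// "never set" and "set to 0" are indistinguishable.
#[derive(Debug, Default, PartialEq)]
pub struct DefaultedSettings {
    pub retries: i32,
}

/// The same hypothetical message with presence tracked in the type.
#[derive(Debug, Default, PartialEq)]
pub struct Settings {
    pub retries: Option<i32>,
}

/// Report what the sender actually told us about `retries`.
pub fn describe(settings: &Settings) -> String {
    match settings.retries {
        None => "retries not set".to_string(),
        Some(0) => "retries explicitly set to 0".to_string(),
        Some(n) => format!("retries set to {}", n),
    }
}
```

With the Option version, a caller is forced by the compiler to handle the "not set" case, so there is no separate presence check to forget.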
For a language that's meant to describe data, Protobuf is extremely limited in its expressiveness. There are lots of complaints about this online. To provide an example we can use, consider this interface meant to represent arithmetic expressions:
syntax = "proto3";
package algebraic;
// An algebraic data type for arithmetic expressions
message Expr {
  oneof expr {
    int32 number = 1;
    BinaryOp binary_op = 2;
    UnaryOp unary_op = 3;
  }
}
// Represents a binary operation (e.g., addition, multiplication)
message BinaryOp {
  enum Operator {
    ADD = 0;
    SUBTRACT = 1;
    MULTIPLY = 2;
    DIVIDE = 3;
  }
  Operator op = 1;
  Expr left = 2;
  Expr right = 3;
}
// Represents a unary operation (e.g., negation)
message UnaryOp {
  enum Operator {
    NEGATE = 0;
  }
  Operator op = 1;
  Expr operand = 2;
}
Probably, you won't "just use" the generated Protobuf types in your language. It's going to be tedious. So instead, you'll write your own types and convert them to/from Protobuf. This isn't wrong per se, it's similar to how you probably wouldn't "just use" JSON off the wire. But it is annoying. If we're going through the effort to define our types, we should get more for our troubles.
In Rust we could represent the above far more concisely:
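pub enum Expr {
    Number(i32),
    Op(Op),
}
// Recursive fields are boxed; without the indirection the types would have infinite size.
pub struct UnaryOp(Box<Expr>);
pub struct BinaryOp(Box<Expr>, Box<Expr>);
pub enum Op {
    Add(BinaryOp),
    Subtract(BinaryOp),
    Multiply(BinaryOp),
    Divide(BinaryOp),
    Negate(UnaryOp),
}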
Not only is this more concise, it may map more neatly to our data model. Rust enum variants can carry values, which means we can represent all operations as a single type Op instead of two types: BinaryOp and UnaryOp. More flexible types like this make it easier to express what we mean in our APIs.
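To make that concrete, here is a small sketch of an evaluator over the same shape of type. The eval function and the num helper are my own illustrative additions, and the recursive fields are boxed because Rust requires indirection in a self-referential type.

```rust
/// An arithmetic expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(i32),
    Op(Op),
}

#[derive(Debug, Clone, PartialEq)]
pub struct UnaryOp(pub Box<Expr>);

#[derive(Debug, Clone, PartialEq)]
pub struct BinaryOp(pub Box<Expr>, pub Box<Expr>);

#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Add(BinaryOp),
    Subtract(BinaryOp),
    Multiply(BinaryOp),
    Divide(BinaryOp),
    Negate(UnaryOp),
}

/// Evaluate an expression, returning `None` on overflow or division by zero.
pub fn eval(expr: &Expr) -> Option<i32> {
    match expr {
        Expr::Number(n) => Some(*n),
        Expr::Op(op) => match op {
            Op::Add(BinaryOp(l, r)) => eval(l)?.checked_add(eval(r)?),
            Op::Subtract(BinaryOp(l, r)) => eval(l)?.checked_sub(eval(r)?),
            Op::Multiply(BinaryOp(l, r)) => eval(l)?.checked_mul(eval(r)?),
            Op::Divide(BinaryOp(l, r)) => eval(l)?.checked_div(eval(r)?),
            Op::Negate(UnaryOp(e)) => eval(e)?.checked_neg(),
        },
    }
}

/// Hypothetical helper to cut down on `Box::new` noise when building trees.
pub fn num(n: i32) -> Box<Expr> {
    Box::new(Expr::Number(n))
}
```

Because Op is one enum, the compiler checks that the match covers every operation. Add a variant later and eval stops compiling until it handles the new case.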
Serde is great
The 10th most downloaded crate on crates.io is serde. It's usually the first dependency I add to a project. It's so good I have a hard time using other languages that don't have something similar.
We can annotate a struct with serde derive macros, and convert it to any number of formats.
This example is from the serde docs. Observe that the Point struct has no knowledge of which formats it will be serialized to, only that it can be serialized. We can delay the choice of format all the way until the moment we absolutely need it.
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };
    let serialized = serde_json::to_string(&point).unwrap();
    println!("serialized = {}", serialized);
    let deserialized: Point = serde_json::from_str(&serialized).unwrap();
    println!("deserialized = {:?}", deserialized);
}
This can be used, for example, to respond intelligently to different Accept headers. We can use JSON for a smooth development experience and switch to MsgPack when we want better performance in prod.
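Here is a rough, dependency-free sketch of that negotiation step. The Format enum, from_accept, and encode are names I made up for illustration. In a real project the two encoders would be calls into serde format crates; here they are written by hand, and the MessagePack bytes are not claimed to match any particular crate's output. Matching the header with contains is also a simplification, since real Accept headers can carry lists and quality values.

```rust
/// Wire formats the server is willing to produce.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Json,
    MsgPack,
}

impl Format {
    /// Pick a format from an `Accept` header value, defaulting to JSON.
    pub fn from_accept(accept: Option<&str>) -> Format {
        match accept {
            Some(value) if value.contains("application/msgpack") => Format::MsgPack,
            _ => Format::Json,
        }
    }

    /// The `Content-Type` to send back with the response.
    pub fn content_type(self) -> &'static str {
        match self {
            Format::Json => "application/json",
            Format::MsgPack => "application/msgpack",
        }
    }
}

pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Encode a point in the chosen format (hand-written stand-ins for serde).
pub fn encode(point: &Point, format: Format) -> Vec<u8> {
    match format {
        Format::Json => format!("{{\"x\":{},\"y\":{}}}", point.x, point.y).into_bytes(),
        Format::MsgPack => {
            // 0x82 = fixmap with 2 entries; 0xa1 = 1-byte fixstr key; 0xd2 = int32 value.
            let mut out = vec![0x82];
            for (key, value) in [(b'x', point.x), (b'y', point.y)] {
                out.extend_from_slice(&[0xa1, key, 0xd2]);
                out.extend_from_slice(&value.to_be_bytes());
            }
            out
        }
    }
}
```

The point to notice is that Point never mentions a format. The choice is made once, at the edge, from the request.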
Ridl
I've published a demo of a crate I call ridl (sounds like "riddle") which does roughly everything I said above. It's free for all to see and modify. I can't say for sure that people should be using this. What I can say, though, is that if you have a reason to invent another IDL, maybe start with Rust. If you think this idea could actually be something, then please open a pull request against ridl and start hacking. I'd love to see people's ideas.
At the time of writing, a ridl object looks something like this, with support for Python and JavaScript.
#[cfg_attr(feature = "py", ridl::popo("hello"))]
#[cfg_attr(feature = "wasm", ridl::pojso)]
pub struct Hello {
    pub name: String,
}
That's it! A whole message in just a few lines of code.
For now, ridl serves as a proof-of-concept and further development will depend on how much people find value in this idea. After all, maximizing value is what we're here to do.
To write hello in Python:
from typing import Annotated

from fastapi import FastAPI, Header, Response
from rust_idl import hello

app = FastAPI()


@app.get("/hello/{name}")
def greet(name: str, accept: Annotated[str | None, Header()] = None):
    greeting = hello.Hello(name)
    # Pick the wire format from the Accept header; JSON is the default.
    if accept == "application/msgpack":
        response = Response(content=greeting.to_msgpack())
        response.headers["Content-Type"] = "application/msgpack"
    else:
        response = Response(content=greeting.to_json())
        response.headers["Content-Type"] = "application/json"
    return response
To read hello in JavaScript:
import init, { Hello } from "rust-idl";

await init(); // load wasm

const greet = async (name: string): Promise<Hello> => {
  const result = await fetch(`${WEBSERVER_URL}/hello/${name}`);
  const payload = new Uint8Array(await result.arrayBuffer());
  return Hello.from_json(payload);
};
Future
There are limitations. For example, pyo3 doesn't support enum values (enums whose variants carry data), like the arithmetic example from earlier. Similarly, generic types present some challenges in language bindings. These may get better over time as the Rust ecosystem evolves, or they may be fundamental limitations that are never fully resolved.
Ridl may not be significantly more expressive than Protobuf without enum values and generics, but it can only get better. Protobuf is a highly constrained language, but Rust is a very expressive language. For example, ridl could, with the help of some more custom proc macros, rewrite the value enums into something pyo3 can bind.
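As a sketch of what such a rewrite could produce, here is a data-carrying enum flattened by hand into a fieldless tag plus one optional slot per variant. Everything here (Value, ValueKind, FlatValue, into_value) is hypothetical. It is not what ridl generates today, only one shape a proc macro might emit so that a binding layer only ever sees plain structs and fieldless enums.

```rust
/// A data-carrying enum, the kind of type that is hard to bind directly.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(i32),
    Text(String),
}

/// Fieldless tag naming which variant is present.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueKind {
    Number,
    Text,
}

/// Flattened form: a tag plus one optional slot per variant.
#[derive(Debug, Clone, PartialEq)]
pub struct FlatValue {
    pub kind: ValueKind,
    pub number: Option<i32>,
    pub text: Option<String>,
}

impl From<Value> for FlatValue {
    fn from(value: Value) -> Self {
        match value {
            Value::Number(n) => FlatValue { kind: ValueKind::Number, number: Some(n), text: None },
            Value::Text(t) => FlatValue { kind: ValueKind::Text, number: None, text: Some(t) },
        }
    }
}

impl FlatValue {
    /// Convert back to the enum, rejecting payloads that contradict the tag.
    pub fn into_value(self) -> Result<Value, String> {
        match (self.kind, self.number, self.text) {
            (ValueKind::Number, Some(n), None) => Ok(Value::Number(n)),
            (ValueKind::Text, None, Some(t)) => Ok(Value::Text(t)),
            (kind, _, _) => Err(format!("payload does not match tag {:?}", kind)),
        }
    }
}
```

The trade-off is visible in into_value: the flattened struct can represent states the enum cannot, so the conversion back has to validate. That is roughly the same burden Protobuf's oneof puts on you, except here a macro could write it once.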
There are some ergonomic improvements to be made. For example, having a single proc macro instead of one-per-language. Also the build steps have to be run separately for each language (wasm-pack for JS, maturin for Python, etc.) but even that could be improved.
The future API for ridl could be much better. Here's the simplest one you could imagine:
#[ridl]
pub struct Hello {
    name: String,
}
// build with `cargo build -F js -F py` or similar
Serde lacks the backward compatibility guarantees of the Protobuf serialization format. While serde can't serialize Protobuf (or anything like it), we don't have to use serde either. We could invent a new format. Or you could just, you know, not break backwards-compatibility.
There are other conceivable benefits to using Rust over Protobuf and its ilk. For example, you could extend your types with bits of logic. Arguably, that might stretch the idea of an IDL a bit too far. Taken to the extreme, you might as well write your whole app in Rust. But delivering value sometimes means being pragmatic. Just like this whole blog post is about being pragmatic. And rewriting everything in Rust is pragmatic.