In this series of posts we’re going to explore ActivityPub, the protocol that powers microblogging across the Fediverse.
This post is going to focus on how ActivityPub models microblogging. We’re going to dive into the three main parts: Actors, Activities and Objects. We’ll also take a look at how we use these to achieve microblogging in practice.
Actor
An Actor in ActivityPub is meant to represent someone or something performing an activity. The “something” here is important: it doesn’t have to be a person, it could be a service.
In practice you’ll find this is normally the Person from the Activity Vocabulary. This means they’ll be described as type: "Person" and should have a name attribute too.
Object
The object represents things that you may share or exchange over social media. It’s things like a Note, Audio, an Image, a Place etc. The special Tombstone object is used to represent an object that has been deleted.
In the context of ActivityPub you’ll primarily run into the Note, as that is the equivalent of a tweet. Officially it is:
A short written work typically less than a single paragraph in length.
Most clients allow messages of around 500 characters, which is somewhere between 70 and 125 words. A paragraph is usually anywhere between 100 and 200 words, so this works out.
Activity
This is where it gets a little more complicated. There are two types of activities: the regular activity and the intransitive activity. The regular activity can also be called a transitive activity but the spec doesn’t seem to use that term.
Intransitive activities can be thought of as things that are happening to the actor itself. For example, the Actor has Arrived at a location or is Travelling. Asking a Question, typically used to represent a poll, is the last of the intransitives and is a little odd in that it normally doesn’t have an actor. Because you’re not doing an action to a thing, the intransitive activity never has an object it is referencing.
Regular activities represent something the Actor is doing or has done to an Object, such as Create a Note, Like an object or Follow a Person.
Regular activities always have an Actor representing who is doing it and an Object, the target to which it is done.
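The difference between the two kinds of activity can be sketched as a small shape check in Python. This is only an illustration of the rules described above, not part of any spec or library; the function names are made up for this post, and the set only lists the three intransitive types mentioned here.

```python
# Intransitive activity types mentioned above (Arrive, Travel, Question).
INTRANSITIVE_TYPES = {"Arrive", "Travel", "Question"}


def is_intransitive(activity: dict) -> bool:
    """Return True when the activity's type is one of the intransitive types."""
    return activity.get("type") in INTRANSITIVE_TYPES


def check_shape(activity: dict) -> list:
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    if is_intransitive(activity):
        # Intransitive activities are not done *to* anything.
        if "object" in activity:
            problems.append("intransitive activity should not carry an object")
    else:
        # Regular activities need both who did it and what it was done to.
        for key in ("actor", "object"):
            if key not in activity:
                problems.append(f"regular activity is missing {key!r}")
    return problems
```

A Like with both an actor and an object passes, while a Follow without an object is reported as incomplete.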
Creating a Note is how you tweet, and this is what the message looks like:
{"@context": "https://www.w3.org/ns/activitystreams",
 "type": "Create",
 "id": "https://social.example/alyssa/posts/9282e9cc-14d0-42b3-a758-d6aeca6c876b",
 "to": ["https://social.example/alyssa/followers/",
        "https://www.w3.org/ns/activitystreams#Public"],
 "actor": "https://social.example/alyssa/",
 "object": {"type": "Note",
            "id": "https://social.example/alyssa/posts/d18c55d4-8a63-4181-9745-4e6cf7938fa1",
            "attributedTo": "https://social.example/alyssa/",
            "to": ["https://social.example/alyssa/followers/",
                   "https://www.w3.org/ns/activitystreams#Public"],
            "content": "Lending books to friends is nice. Getting them back is even nicer! :)"}}
Microblogging
In order to microblog, aka tweet, we use the Client-to-Server and Server-to-Server APIs.
Both the Client-to-Server and Server-to-Server APIs use your Actor’s inbox and outbox to do things. We discover where your inbox and outbox are located through something called WebFinger.
You and other servers interact with your inbox and outbox over HTTP, exchanging JSON messages. Your inbox is used to receive messages: other servers POST to it, but you never do, you only retrieve messages from it with GET.
The outbox holds the objects you create and the activities you perform; they’re what eventually federates out. Your server picks up objects and activities in your outbox and then tries to do something useful with them. Servers then drop those activities in the inboxes of your followers, which is how your followers become aware of you having done something like Creating a Note.
Discovering a user’s inbox and outbox
WebFinger, also known as RFC 7033, is how we go from a handle like @name@domain.tld to a set of endpoints that tell us where your inbox and outbox are. It’s an “over the web”, read HTTP, implementation of the old Finger User Information protocol, RFC 1288. ActivityPub and WebFinger are completely separate specs and ActivityPub doesn’t mandate the use of WebFinger, but it’s what happens in practice.
The way this works is that on domain.tld an endpoint exists at /.well-known/webfinger to which you pass a query of resource=acct:name@domain.tld. This should then return a JSON document, and within its links array you’ll find an entry with rel=self and type=application/activity+json, whose href we can query to get more information about you, like your inbox.
$ curl 'https://mastodon.social/.well-known/webfinger?resource=acct:gargron@mastodon.social'
{
  "subject": "acct:Gargron@mastodon.social",
  "aliases": [
    "https://mastodon.social/@Gargron",
    "https://mastodon.social/users/Gargron"
  ],
  "links": [
    {
      "rel": "http://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://mastodon.social/@Gargron"
    },
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://mastodon.social/users/Gargron"
    },
    {
      "rel": "http://ostatus.org/schema/1.0/subscribe",
      "template": "https://mastodon.social/authorize_interaction?uri={uri}"
    }
  ]
}
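The same lookup can be sketched in Python using only the standard library. This is a rough illustration rather than a complete client: the function names are made up for this post, no request is actually sent, and accepting the ld+json media type alongside activity+json is an assumption about what some servers may return.

```python
from typing import Optional
from urllib.parse import urlencode

# Media types treated as "this link points at an ActivityPub actor".
ACTOR_TYPES = {
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
}


def webfinger_url(handle: str) -> str:
    """Turn a handle like @name@domain.tld into its WebFinger query URL."""
    name, domain = handle.lstrip("@").split("@", 1)
    query = urlencode({"resource": f"acct:{name}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def actor_url(jrd: dict) -> Optional[str]:
    """Find the href of the rel=self link in a WebFinger response, if any."""
    for link in jrd.get("links", []):
        if link.get("rel") == "self" and link.get("type") in ACTOR_TYPES:
            return link.get("href")
    return None
```

Note that urlencode percent-encodes the `:` and `@` in the resource parameter, which is equivalent to the unencoded form used in the curl command above.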
So we continue on, and we now query the ActivityPub server. Keep in mind that you can’t assume that @name@domain.tld is actually hosted on domain.tld; they might use a subdomain like social.domain.tld or have it delegated to a different domain entirely. This is why we go through WebFinger. It provides that layer of indirection decoupling the handle from the domain the ActivityPub server is hosted on.
$ curl 'https://mastodon.social/users/Gargron' -H 'Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams"'
{
  ...
  "id": "https://mastodon.social/users/Gargron",
  "type": "Person",
  "following": "https://mastodon.social/users/Gargron/following",
  "followers": "https://mastodon.social/users/Gargron/followers",
  "inbox": "https://mastodon.social/users/Gargron/inbox",
  "outbox": "https://mastodon.social/users/Gargron/outbox",
  "featured": "https://mastodon.social/users/Gargron/collections/featured",
  "featuredTags": "https://mastodon.social/users/Gargron/collections/tags",
  "preferredUsername": "Gargron",
  ...
  "publicKey": {
    "id": "https://mastodon.social/users/Gargron#main-key",
    "owner": "https://mastodon.social/users/Gargron",
    "publicKeyPem": "..."
  },
  "endpoints": {
    "sharedInbox": "https://mastodon.social/inbox"
  },
  ...
}
There’s a lot more information in there, but for our purposes we now know enough. We’ve found where their inbox and outbox are located, a number of URLs for different collections like who they are following and who their followers are etc. We also now know their publicKey, which we’ll use to verify the HTTP signature of a message that gets POSTed to our inbox.
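As a rough sketch, fetching the actor and pulling out the parts we care about could look like this in Python. The function names and the shape of the returned dictionary are made up for this post; only the field names (inbox, outbox, endpoints, publicKey) come from the response above.

```python
import json
from urllib.request import Request, urlopen

# Same Accept header as in the curl command above.
ACCEPT = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def fetch_actor(url: str) -> dict:
    """GET an actor document, asking for the ActivityStreams representation."""
    request = Request(url, headers={"Accept": ACCEPT})
    with urlopen(request) as response:
        return json.load(response)


def federation_info(actor: dict) -> dict:
    """Pull the fields needed to deliver to, and verify, an actor."""
    public_key = actor.get("publicKey", {})
    return {
        "inbox": actor["inbox"],
        "outbox": actor["outbox"],
        # Not every actor advertises a shared inbox, so this may be None.
        "shared_inbox": actor.get("endpoints", {}).get("sharedInbox"),
        "key_id": public_key.get("id"),
        "public_key_pem": public_key.get("publicKeyPem"),
    }
```

Calling federation_info(fetch_actor("https://mastodon.social/users/Gargron")) would perform the request shown above and reduce the response to those five values.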
Client-to-Server
Client-to-Server is how an ActivityPub application could interact with your server so you can send messages. In practice this rarely happens because most applications are coded against Mastodon’s API, so servers tend to implement that instead.
The idea with C2S is that it’s more or less the same as Server-to-Server. This is helpful when working on an implementation, since if you implement one part you’ve almost got the other too.
There is a slight asymmetry though: for a bunch of things you do in C2S you don’t have to go through an Activity. The idea is you only POST the Note object to your outbox, which your server then picks up, wraps in a Create activity and posts that to your outbox. Since there’s now a Create activity in your outbox, the server picks that up and does what is necessary to POST that to the inboxes of whoever you intended the message for. You can however also use the Create activity yourself directly.
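The wrapping step might look something like the following Python sketch. The function name and the way IDs are minted (a random UUID under the actor’s posts/ path, mirroring the example message earlier) are assumptions for illustration; a real server picks its own ID scheme. Copying the addressing fields from the object onto the Create follows what the ActivityPub spec recommends for this case.

```python
import uuid

ADDRESSING_FIELDS = ("to", "bto", "cc", "bcc", "audience")


def wrap_in_create(obj: dict, actor_id: str) -> dict:
    """Wrap a bare object, such as a Note, in a Create activity."""
    obj = dict(obj)  # shallow copy so the caller's object isn't modified
    obj.setdefault("id", f"{actor_id}posts/{uuid.uuid4()}")
    obj.setdefault("attributedTo", actor_id)
    activity = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "id": f"{actor_id}posts/{uuid.uuid4()}",
        "actor": actor_id,
        "object": obj,
    }
    # The activity is addressed to the same audience as the object it wraps.
    for field in ADDRESSING_FIELDS:
        if field in obj:
            activity[field] = obj[field]
    return activity
```

Passing in a Note that only has a type, a to list and content yields a message shaped like the Create example shown earlier.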
The Block activity is unique to the C2S API, since we don’t want to federate out the fact that we’ve blocked something (someone) and inform them of that fact.
Server-to-Server
Server-to-Server is how ActivityPub federates; it’s how we share messages between people on different instances. Nearly everyone implements this, as without it you’re not connected to the rest of the ActivityPub fediverse, though some instances explicitly choose not to federate at all.
This is very similar to the C2S API. When activities that are meant to federate are posted to your outbox, your server goes and figures out who it should send that activity to. Once it’s built up the list of everyone it should go to, it’ll look up everyone’s inbox and POST the activity to it.
Sometimes users have a sharedInbox. You can POST an activity to that inbox and then the receiving instance will figure out which users’ inboxes on it should receive the activity. This means that if you have 50 followers all on the same instance and each of those followers has the same sharedInbox, your server will only have to do one POST request. Collapsing a really big list of followers into shared inboxes wherever possible greatly reduces the amount of traffic we need to federate a message.
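The collapsing step is essentially deduplication. A minimal Python sketch, assuming we already hold the actor document of every follower (the function name is made up for this post):

```python
def delivery_targets(followers: list) -> set:
    """Collapse follower actor documents into the unique URLs to POST to."""
    targets = set()
    for actor in followers:
        shared = actor.get("endpoints", {}).get("sharedInbox")
        # Prefer the shared inbox; fall back to the follower's own inbox.
        targets.add(shared or actor["inbox"])
    return targets
```

With 50 followers on one instance that all advertise the same sharedInbox, this returns a single URL, so the activity is POSTed once instead of 50 times.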
Other servers will POST messages to your inbox and that’s how you become aware of one of the people you follow having sent out a message. When receiving a message we check the HTTP signature with the publicKeyPem of the user the message was supposedly sent by. This ensures you can’t go around impersonating people by dropping messages in other people’s inboxes.
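To give an idea of what checking the signature involves, here is a partial Python sketch. It assumes the draft “Signing HTTP Messages” scheme with a Signature header of key="value" pairs, which is what Mastodon documents; ActivityPub itself doesn’t mandate a scheme, so treat that as an assumption. The function names are made up, and the actual RSA verification against publicKeyPem is left out because it needs a cryptography library outside the standard library.

```python
import re


def parse_signature_header(value: str) -> dict:
    """Split a Signature header into its key="value" parameters."""
    return dict(re.findall(r'(\w+)="([^"]*)"', value))


def signing_string(params: dict, method: str, path: str, headers: dict) -> str:
    """Rebuild the text the sender signed, from the headers it listed."""
    lowered = {name.lower(): value for name, value in headers.items()}
    lines = []
    for name in params["headers"].split():
        if name == "(request-target)":
            # Pseudo-header covering the HTTP method and path.
            lines.append(f"(request-target): {method.lower()} {path}")
        else:
            lines.append(f"{name}: {lowered[name]}")
    return "\n".join(lines)
```

The keyId parameter tells us which actor to fetch, and therefore which publicKeyPem to use. The receiving server would then base64-decode the signature parameter and verify it over the rebuilt string with that key.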
Recap
ActivityPub is a relatively simple system. Actors are performing Activities on Objects. You create messages in your outbox which your server then goes and plops into other people’s inboxes. This is how you send out a message to other people in the Fediverse, notably the people following you. You in turn will be receiving messages in your inbox from other people in the Fediverse, specifically from folks you are following.
In order to map a handle to a set of HTTP endpoints to post messages to, we first use WebFinger to find the user’s ActivityPub instance. Once we have that we query the instance for the user, which should return an object with at least the inbox and outbox, and potentially other collections as well.