In this series of posts we’re going to explore ActivityPub, the protocol that powers microblogging across the Fediverse.

This post is going to focus on how ActivityPub models microblogging. We’re going to dive into the three main parts: Actors, Activities and Objects. We’ll also take a look at how we use these to achieve microblogging in practice.

Actor

An Actor in ActivityPub represents someone or something performing an activity. The something here is important: an Actor doesn’t have to be a person, it could be a service.

In practice you’ll find this is normally the Person from the Activity Vocabulary. This means they’ll be described as type: "Person" and should have a name attribute too.

Object

The object represents things that you may share or exchange over social media. It’s things like a Note, Audio, an Image, a Place etc. The special Tombstone object is used to represent an object that has been deleted.

In the context of ActivityPub you’ll primarily run into the Note, as that is the equivalent of a tweet. Officially it is:

A short written work typically less than a single paragraph in length.

Most clients allow messages of around 500 characters, which is somewhere between 70 and 125 words. A paragraph is usually anywhere between 100 and 200 words, so this works out.

Activity

This is where it gets a little more complicated. There are two types of activities: the regular activity and the intransitive activity. The regular activity can also be called a transitive activity but the spec doesn’t seem to use that term.

Intransitive activities can be thought of as things that are happening to the actor itself. For example, the Actor has Arrived at a location or is Travelling. Asking a Question, typically used to represent a poll, is the last of the intransitives and is a little odd in that it normally doesn’t have an actor. Because you’re not doing an action to a thing, the intransitive activity never has an object it is referencing.

Regular activities represent something the Actor is doing or has done to an Object, such as Create a Note, Like an object or Follow a Person. Regular activities always have an Actor representing who is doing it and an Object, the target to which it is done.

Creating a Note is how you tweet, and this is what the message looks like:

{"@context": "https://www.w3.org/ns/activitystreams",
 "type": "Create",
 "id": "https://social.example/alyssa/posts/9282e9cc-14d0-42b3-a758-d6aeca6c876b",
 "to": ["https://social.example/alyssa/followers/",
        "https://www.w3.org/ns/activitystreams#Public"],
 "actor": "https://social.example/alyssa/",
 "object": {"type": "Note",
            "id": "https://social.example/alyssa/posts/d18c55d4-8a63-4181-9745-4e6cf7938fa1",
            "attributedTo": "https://social.example/alyssa/",
            "to": ["https://social.example/alyssa/followers/",
                   "https://www.w3.org/ns/activitystreams#Public"],
            "content": "Lending books to friends is nice.  Getting them back is even nicer! :)"}}

Microblogging

In order to microblog, aka tweet, we use the Client-to-Server and Server-to-Server APIs.

Both the Client-to-Server and Server-to-Server APIs use your Actor’s inbox and outbox to do things. We discover where your inbox and outbox are located through something called WebFinger.

You and other servers interact with your inbox and outbox over HTTP. Your client POSTs JSON messages to your outbox, and other servers POST them to your inbox. Your inbox is where you receive messages, so you never POST to it yourself, you only retrieve messages from it.

The outbox holds the objects you create and the activities you perform; they’re what eventually federates out. Your server picks up objects and activities in your outbox and then tries to do something useful with them. It then drops those activities in the inboxes of your followers, which is how your followers become aware of you having done something like Creating a Note.

Discovering a user’s inbox and outbox

WebFinger, also known as RFC 7033, is how we go from a handle like @name@domain.tld to a set of endpoints that tell us where your inbox and outbox are. It’s an “over the web”, read HTTP, implementation of the old Finger User Information protocol, RFC 1288. ActivityPub and WebFinger are completely separate specs and ActivityPub doesn’t mandate the use of WebFinger, but it’s what happens in practice.

The way this works is that on domain.tld an endpoint exists at /.well-known/webfinger to which you pass a query of resource=acct:name@domain.tld. This should then return a JSON document, and within its links array you’ll find an entry with rel=self and type=application/activity+json. Its href is the URL we can query to get more information about you, like your inbox.

$ curl 'https://mastodon.social/.well-known/webfinger?resource=acct:gargron@mastodon.social'

{
  "subject": "acct:Gargron@mastodon.social",
  "aliases": [
    "https://mastodon.social/@Gargron",
    "https://mastodon.social/users/Gargron"
  ],
  "links": [
    {
      "rel": "http://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://mastodon.social/@Gargron"
    },
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://mastodon.social/users/Gargron"
    },
    {
      "rel": "http://ostatus.org/schema/1.0/subscribe",
      "template": "https://mastodon.social/authorize_interaction?uri={uri}"
    }
  ]
}
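
The lookup above can be sketched in a few lines of Python. This is a minimal sketch, not a full client: the function names webfinger_url and actor_url are made up for illustration, nothing is fetched over the network, and it assumes the server advertises the actor with type application/activity+json as in the response shown.

```python
from typing import Optional
from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL for a handle like @name@domain.tld."""
    name, _, domain = handle.lstrip("@").partition("@")
    if not name or not domain:
        raise ValueError(f"not a valid handle: {handle!r}")
    # urlencode percent-encodes the ':' and '@' in the resource value.
    query = urlencode({"resource": f"acct:{name}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def actor_url(jrd: dict) -> Optional[str]:
    """Return the href of the rel=self ActivityPub link, or None if absent."""
    for link in jrd.get("links", []):
        if (link.get("rel") == "self"
                and link.get("type") == "application/activity+json"):
            return link.get("href")
    return None
```

Passing the parsed JSON response from above to actor_url gives back the https://mastodon.social/users/Gargron URL we query next.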

So we continue on, and we now query the ActivityPub server. Keep in mind that you can’t assume that @name@domain.tld is actually hosted on domain.tld. They might use a subdomain like social.domain.tld or have it delegated to a different domain entirely. This is why we go through WebFinger. It provides a layer of indirection, decoupling the handle from the domain the ActivityPub server is hosted on.

$ curl 'https://mastodon.social/users/Gargron' -H 'Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams"'

{
...
  "id": "https://mastodon.social/users/Gargron",
  "type": "Person",
  "following": "https://mastodon.social/users/Gargron/following",
  "followers": "https://mastodon.social/users/Gargron/followers",
  "inbox": "https://mastodon.social/users/Gargron/inbox",
  "outbox": "https://mastodon.social/users/Gargron/outbox",
  "featured": "https://mastodon.social/users/Gargron/collections/featured",
  "featuredTags": "https://mastodon.social/users/Gargron/collections/tags",
  "preferredUsername": "Gargron",
...
  "publicKey": {
    "id": "https://mastodon.social/users/Gargron#main-key",
    "owner": "https://mastodon.social/users/Gargron",
    "publicKeyPem": "..."
  },
  "endpoints": {
    "sharedInbox": "https://mastodon.social/inbox"
  },
...
}

There’s a lot more information in there, but for our purposes we now know enough. We’ve found where their inbox and outbox are located, along with a number of URLs for other collections, like who they are following and who their followers are. We also now know their publicKey, which we’ll use to verify the HTTP signature of a message that gets POSTed to our inbox.
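
As a rough sketch, pulling those pieces out of the actor document could look like this in Python. The function name actor_endpoints and the keys of the returned dictionary are made up for illustration. It assumes the document has already been fetched and parsed, and it treats sharedInbox and publicKey as optional, since not every server has to provide them.

```python
def actor_endpoints(actor: dict) -> dict:
    """Collect the URLs and key material we need from an actor document."""
    endpoints = actor.get("endpoints") or {}
    public_key = actor.get("publicKey") or {}
    return {
        # inbox and outbox are required on every ActivityPub actor.
        "inbox": actor["inbox"],
        "outbox": actor["outbox"],
        # These two are optional, so fall back to None when missing.
        "shared_inbox": endpoints.get("sharedInbox"),
        "public_key_pem": public_key.get("publicKeyPem"),
    }
```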

Client-to-Server

Client-to-Server is how an ActivityPub application could interact with your server so you can send messages. In practice this doesn’t happen because most applications are coded against Mastodon’s API, so everyone tends to implement that instead.

The idea with C2S is that it’s more or less the same as Server-to-Server. This is helpful when working on an implementation, since if you implement one part you’ve almost got the other too.

There is a slight asymmetry though: for a bunch of things you do in C2S, you don’t have to go through an Activity. The idea is you only POST the Note object to your outbox, which your server then picks up, wraps in a Create activity and posts to your outbox. Since there’s now a Create activity in your outbox, the server picks that up and does what is necessary to POST it to the inboxes of whoever you intended the message for. You can however also use the Create activity yourself directly.
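
Here is a minimal sketch of what that wrapping step might look like on the server. The function name wrap_in_create and the id scheme (a random UUID under the actor’s URL, mirroring the example message earlier) are assumptions for illustration; a real server picks its own ids and also has to validate and store what it receives.

```python
import uuid


def wrap_in_create(note: dict, actor_id: str) -> dict:
    """Wrap a bare object POSTed to the outbox in a Create activity."""
    note = dict(note)  # copy, so the caller's object is left untouched
    note.setdefault("attributedTo", actor_id)
    # Hypothetical id scheme: a fresh UUID under the actor's URL.
    note.setdefault("id", f"{actor_id}posts/{uuid.uuid4()}")
    create = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "id": f"{actor_id}posts/{uuid.uuid4()}",
        "actor": actor_id,
        "object": note,
    }
    # The addressing on the object is copied onto the activity.
    for field in ("to", "bto", "cc", "bcc", "audience"):
        if field in note:
            create[field] = note[field]
    return create
```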

The Block activity is unique to the C2S API, since we don’t want to federate out the fact that we’ve blocked something (someone) and inform them of that fact.

Server-to-Server

Server-to-Server is how ActivityPub federates; it’s how we share messages between people on different instances. Almost everyone implements this, as without it you’re not connected to the rest of the ActivityPub fediverse. Some instances explicitly choose not to federate at all.

This is very similar to the C2S API. When activities that are meant to federate are posted to your outbox, your server goes and figures out who it should send that activity to. Once it’s built up the list of everyone it should go to, it’ll look up everyone’s inbox and POST the activity to it.

Sometimes users have a sharedInbox. You can POST an activity to that inbox and then the receiving instance will figure out which users’ inboxes on it should receive the activity. This means that if you have 50 followers all on the same instance and each of those followers has the same sharedInbox, your server will only have to do one POST request. Reducing a really big list of followers down to as few shared inboxes as possible greatly reduces the amount of traffic we need to federate a message.
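
The reduction can be sketched as collecting delivery URLs into a set, preferring the sharedInbox whenever a follower advertises one. The function name delivery_targets is made up for illustration, and the sketch assumes the followers’ actor documents have already been fetched.

```python
def delivery_targets(followers: list) -> set:
    """Reduce a list of follower actor documents to the URLs we POST to."""
    targets = set()
    for actor in followers:
        shared = (actor.get("endpoints") or {}).get("sharedInbox")
        # Fall back to the personal inbox when there is no shared one.
        targets.add(shared or actor["inbox"])
    return targets
```

With 50 followers who all point at the same sharedInbox, the set ends up with a single URL, so a single POST is enough.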

Other servers will POST messages to your inbox and that’s how you become aware of one of the people you follow having sent out a message. When receiving a message we check the HTTP signature with the publicKeyPem of the user the message was supposedly sent by. This ensures you can’t go around impersonating people by dropping messages in other people’s inboxes.
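
To give a feel for what gets checked, here is a sketch of the first half of verification: parsing the Signature header and rebuilding the string that was signed. It assumes the draft HTTP Signatures scheme that Fediverse servers commonly use, where the header lists keyId, headers and signature; the function names are made up for illustration. The actual cryptographic check of the signature against publicKeyPem needs an RSA library and is left out here.

```python
import re


def parse_signature_header(value: str) -> dict:
    """Split a Signature header into its key="value" parameters."""
    return dict(re.findall(r'(\w+)="([^"]*)"', value))


def signing_string(method: str, path: str, headers: dict,
                   signed_headers: str) -> str:
    """Rebuild the string the sender signed, in the order they listed."""
    lowered = {name.lower(): value for name, value in headers.items()}
    lines = []
    for name in signed_headers.split():
        if name == "(request-target)":
            # Pseudo-header covering the HTTP method and request path.
            lines.append(f"(request-target): {method.lower()} {path}")
        else:
            lines.append(f"{name}: {lowered[name]}")
    return "\n".join(lines)
```

The keyId parameter tells us which actor’s publicKey to fetch, and the rebuilt string is what the signature is verified against.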

Recap

ActivityPub is a relatively simple system. Actors are performing Activities on Objects. You create messages in your outbox which your server then goes and plops into other people’s inboxes. This is how you send out a message to other people in the Fediverse, notably the people following you. You in turn will be receiving messages in your inbox from other people in the Fediverse, specifically from folks you are following.

In order to map a handle to a set of HTTP endpoints to post messages to, we first use WebFinger to find the user’s ActivityPub instance. Once we have that we query the instance for the user which should return an object with at least the inbox and outbox, and potentially other collections as well.