While thinking about the Fediverse, a question popped into my head: just how centralised is it? By centralised I don’t mean the fact that mastodon.social is a huge instance, but centralised in the sense of which ISPs are hosting instances across the Fediverse.

Mapping the Fediverse to Autonomous Systems

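Technically speaking, the way this is achieved is very simple: get a list of instances, resolve each instance’s name to its IP addresses, and look up the AS that each IP belongs to.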

One problem is that this results in over 14k instances, and for each one we do both an A and an AAAA query, which doubles the number of queries. This is something I’ll improve later, as once we have an answer to the A query we don’t really need the quad-A for our purposes. Some instances are also dead or don’t resolve. Once we have the IP, the ASN lookup is trivial using a local copy of the MaxMind database.
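To make the "A first, quad-A only if needed" idea concrete, here is a minimal Python sketch. It is not the code from the repository: the function names are made up for illustration, `socket.getaddrinfo` goes through the system resolver rather than issuing raw DNS queries, and `asn_of` is a hypothetical callable standing in for whatever does the lookup against the local MaxMind database.

```python
import socket
from collections import Counter


def resolve_first_ip(host, lookup=socket.getaddrinfo):
    """Return one IP for host, trying IPv4 (A) first and IPv6 (AAAA) only as a fallback."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            answers = lookup(host, None, family=family, type=socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # no address of this family, or the name does not resolve
        if answers:
            # Each answer is (family, type, proto, canonname, sockaddr);
            # the first element of sockaddr is the IP address as a string.
            return answers[0][4][0]
    return None  # dead instance or unresolvable name


def count_by_asn(hosts, asn_of, resolve=resolve_first_ip):
    """Count instances per AS number, skipping hosts that do not resolve.

    asn_of is an assumed callable mapping an IP string to an AS number,
    for example a thin wrapper around a local ASN database.
    """
    counts = Counter()
    for host in hosts:
        ip = resolve(host)
        if ip is None:
            continue
        counts[asn_of(ip)] += 1
    return counts
```

Since the IPv6 lookup only happens when the IPv4 one comes back empty, instances with an A record cost a single lookup instead of two.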

There’s a but, because of course there is. Many ISPs have multiple AS numbers (due to historical reasons, mergers and acquisitions etc.) so the code tries to dedupe them by matching on the name. Thankfully most networks have their name consistently set, but some of course don’t. That’ll need fixing later.
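As a rough illustration of deduping by name, the sketch below merges per-AS rows whose names match. The function name and the row layout are assumptions for the example, not the actual code, and it goes slightly beyond exact matching by ignoring case and extra whitespace.

```python
def merge_by_name(rows):
    """Merge (asn, name, instance_count) rows that share a network name.

    Names are compared case-insensitively with whitespace collapsed; that
    normalisation is an assumption of this sketch. Returns a list of dicts
    sorted by instance count, largest first.
    """
    merged = {}
    for asn, name, count in rows:
        key = " ".join(name.split()).casefold()
        # Keep the first spelling we see as the display name.
        entry = merged.setdefault(key, {"name": name, "instances": 0, "asns": []})
        entry["instances"] += count
        if asn not in entry["asns"]:
            entry["asns"].append(asn)
    return sorted(merged.values(), key=lambda e: e["instances"], reverse=True)
```

A network whose ASes carry different names is not merged by this approach. Handling that would need something extra, such as a hand-maintained alias table.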

The code and resulting data can be found here. The code for this is extremely rough and will be improved and reorganised over time. But I did manage to complete a first round of data collection! In the future I’ll probably run this on a weekly basis from some cheap VPS because I suspect that using GitHub Actions for this might get me into trouble.

I’m purposefully not mapping countries here. GeoIP-based DNS and anycast may result in IPs for certain names that the MaxMind database will place in a country that the instance isn’t actually located in. We can’t use the TLD for this either as nothing says that .se has to be hosted in Sweden. As such we limit ourselves to the ISP.

Results

After 2.5hrs of waiting for all the queries to complete (I’m doing this on my home connection right now so I’m extremely careful about not doing too many DNS requests and upsetting my ISP) the results are in! Here’s our top 10:

| Name | Instances | AS Number(s) |
|---|---|---|
| OVH SAS | 2129 | 16276, 35540 |
| CLOUDFLARENET | 1998 | 13335 |
| DIGITALOCEAN-ASN | 1864 | 14061 |
| Hetzner Online GmbH | 1827 | 24940, 213230, 212317 |
| Linode, LLC | 753 | 63949 |
| netcup GmbH | 472 | 197540 |
| AMAZON-02 | 358 | 16509 |
| AS-CHOOPA | 325 | 20473 |
| Online S.a.s. | 290 | 12876 |
| ORACLE-BMC-31898 | 285 | 31898 |

OVH and Hetzner are heavy hitters here. They’re both European companies, which is also rather interesting. DigitalOcean and Linode round out the top 5, though Linode already has significantly fewer instances than Hetzner. Cloudflare doesn’t host instances, it just fronts them. I suspect many of those instances are also hosted with the other providers in the top 5, but there’s no way for me to know and correct the numbers.

The numbers for Amazon/AWS should be higher because there’s a second and differently named AS for them too. I’ll get around to fixing the AS deduping at some point. Somewhat surprising is just how many instances are hosted on Oracle Cloud. Why people, why? Microsoft Azure and Google Cloud Platform do make it into the top 20. But in total the Big 3 cloud providers host only about 5% of the Fediverse. That’s pretty cool!

Future work

As already noted, the code is ugly and there’s a lot of room for improvement. That’ll be up first, but I just wanted to get the data out there for folks to look at. Once that’s out of the way I’d like to generate some pretty charts for people to look at, to complement the current table.

I also want to add lookups for each instance’s authoritative name servers and the ASes hosting those to get a more complete picture of what ISPs the Fediverse is heavily dependent on.