Thinking about the Fediverse, a question popped into my head: just how centralised is it? By centralised I don’t mean the fact that mastodon.social is a huge instance, but centralised in the sense of which ISPs are hosting instances across the Fediverse.
Mapping the Fediverse to Autonomous Systems
Technically speaking, the way this is achieved is very simple:
- Query the instances.social API for all instances it knows about
- Look up the IP address(es) of each instance through DNS
- Use the MaxMind ASN database to find the AS number for the associated IP
One problem is that this results in over 14k instances, and for each one we do both an A and an AAAA query, which doubles the number of queries. This is something I’ll improve later, as once we have an answer to the A query we don’t really need the quad-A for our purposes. Some instances are also dead or don’t resolve. Once we have the IP, the ASN lookup is trivial using a local copy of the MaxMind database.
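As a rough illustration of the "skip the quad-A once the A query answers" idea, here is a minimal Python sketch. It is not the code from the repository: the function names `lookup` and `first_address` are made up for this example, and it uses the standard library's `socket.getaddrinfo` rather than a dedicated DNS client. The resolver is passed in as a parameter so the fallback logic can be exercised without touching the network.

```python
import socket
from typing import Callable, List, Optional


def lookup(host: str, family: int) -> List[str]:
    """Return the addresses of one family for host, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    except (socket.gaierror, UnicodeError):
        # Dead instances and malformed names are treated as "no answer".
        return []
    # The first element of each sockaddr tuple is the address string,
    # for both IPv4 and IPv6 results.
    return sorted({info[4][0] for info in infos})


def first_address(
    host: str,
    resolver: Callable[[str, int], List[str]] = lookup,
) -> Optional[str]:
    """Try IPv4 (A) first and only fall back to IPv6 (AAAA) if that yields nothing."""
    for family in (socket.AF_INET, socket.AF_INET6):
        addrs = resolver(host, family)
        if addrs:
            return addrs[0]
    return None
```

With something like this, an instance that answers over IPv4 costs a single lookup, and only IPv6-only or dead instances pay for the second one.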
There’s a but, because of course there is. Many ISPs have multiple AS numbers (for historical reasons, mergers, acquisitions and so on), so the code tries to dedupe them by matching on the name. Thankfully most networks have their name consistently set, but some of course don’t. That’ll need fixing later.
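To make the name-based deduping concrete, here is a small Python sketch of one way to do it. It is an illustration under assumptions, not the actual implementation: the `normalise` and `group_by_network` helpers are hypothetical, and the normalisation (case-folding and collapsing whitespace) is just a guess at a reasonable rule. Input records are `(instance, asn, organisation name)` tuples, for example as produced by an ASN database lookup.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalise(name: str) -> str:
    """Case-fold and collapse whitespace so trivially different names match."""
    return " ".join(name.casefold().split())


def group_by_network(
    records: Iterable[Tuple[str, int, str]],
) -> List[Tuple[str, int, List[int]]]:
    """Merge AS numbers that share an organisation name.

    Returns (name, instance count, sorted AS numbers) rows, largest network first.
    """
    counts: Dict[str, int] = defaultdict(int)
    asns: Dict[str, Set[int]] = defaultdict(set)
    labels: Dict[str, str] = {}
    for _instance, asn, org in records:
        key = normalise(org)
        labels.setdefault(key, org.strip())  # keep the first spelling for display
        counts[key] += 1
        asns[key].add(asn)
    rows = [(labels[key], counts[key], sorted(asns[key])) for key in counts]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

This only merges ASes whose names are identical after normalisation, so a network that registers its ASes under genuinely different names still shows up as separate rows.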
The code and resulting data can be found here. The code for this is extremely rough and will be improved and reorganised over time. But I did manage to complete a first round of data collection! In the future I’ll probably run this on a weekly basis from some cheap VPS, because I suspect that using GitHub Actions for this might get me into trouble.
I’m purposefully not mapping countries here. GeoIP-based DNS and anycast may result in IPs for certain names that the MaxMind database will place in a country that the instance isn’t actually located in. We can’t use the TLD for this either as nothing says that .se has to be hosted in Sweden. As such we limit ourselves to the ISP.
After 2.5 hours of waiting for all the queries to complete (I’m doing this on my home connection right now, so I’m being extremely careful not to send too many DNS requests and upset my ISP), the results are in! Here’s our top 10:
| Network | Instances | AS numbers |
|---|---|---|
| OVH SAS | 2129 | 16276, 35540 |
| Hetzner Online GmbH | 1827 | 24940, 213230, 212317 |
OVH and Hetzner are heavy hitters here. They’re both European companies, which is also rather interesting. DigitalOcean and Linode round out the top 5, though Linode already has significantly fewer instances than Hetzner. Cloudflare doesn’t host instances, it just fronts them. I suspect many of those instances are also hosted by the providers in the top 5, but there’s no way for me to know and correct the numbers.
The numbers for Amazon/AWS should be higher because there’s a second and differently named AS for them too. I’ll get around to fixing the AS deduping at some point. Somewhat surprising is just how many instances are hosted on Oracle Cloud. Why people, why? Microsoft Azure and Google Cloud Platform do make it into the top 20. But in total the Big 3 cloud providers host only about 5% of the Fediverse. That’s pretty cool!
As already noted, the code is ugly and there’s a lot of room for improvement. That’ll be up first, but I just wanted to get the data out there for folks to look at. Once that’s out of the way I’d like to generate some pretty charts to complement the current table.
I also want to add lookups for each instance’s authoritative name servers and the ASes hosting those to get a more complete picture of what ISPs the Fediverse is heavily dependent on.