Pulumi is a pleasant surprise

Dec 20, 2022 (Jun 29, 2024)

For a few years now I’ve had my eyes on Pulumi. Every time I looked at it I came away thinking “hmm, that seems nice”. I never got around to using it though, and nobody around me had either, so it was hard to get a sense of whether it would live up to my hopes. I recently decided to redo some of my personal infrastructure, and since Pulumi has Hetzner Cloud support I decided to give it a go and see what would happen.

The main aspect I really like about Pulumi is that it is just code. It distinctly feels like what I would have loved Terraform to be. The DSL approach of Terraform is interesting, but it has proven incredibly limiting because it makes it ridiculously hard to express the mess that is the real world. In theory Terraform’s DSL would be superior if the world were regular and predictable, if resources weren’t dynamic over the course of their lifetime, and if people didn’t already have a mess of infrastructure to begin with. I’m not the only person who feels that way, and the existence of AWS CDK is very telling in that respect. Terraform might still be the reigning monarch, but between Pulumi, Crossplane and CDK the throne HashiCorp is sitting on has become very wobbly.

Now, there’s one important caveat. I’ve used Terraform from early versions right up to 1.2. I’ve used it for personal infra, for small startup infra with a couple hundred resources in one cloud, and at a large public company with thousands upon thousands of resources spread over the three major cloud providers. I’ve used it and debugged it in anger. With Pulumi I don’t have that level of experience, as I’ve just started using it to manage some of my personal cloud infra. This means I don’t know how it works out at scale, but at least for my own infra it is much more pleasant to work with. Given that it is code and can be developed as such, I expect Pulumi will scale a lot better to larger teams and larger amounts of infrastructure.

Infrastructure as code

The whole point of IaC was exactly that: code. Don’t click-ops your infrastructure together; write software that does it. Develop that software like any other: have tests, do PRs and code review, build the abstractions that you need, and so on. If you’ve ever heard of DevOps, this is part of it.

Unfortunately, IaC got downgraded to Infrastructure as DSL. We saw this in the configuration management space too, especially in Puppet vs. Chef. But where Puppet evolved its DSL and gained a lot of expressiveness over time, Terraform never really did. Terraform still feels incredibly constraining, and not in an “it makes you think about good design” kind of way, but in an “it’s a pain to get real stuff done with it” kind of way. In many cases we’ve had to entirely rebuild infrastructure in order to be able to properly manage it with Terraform. That certainly yielded some improvements too, but I’ve never really felt that much rework was justified just to get to a point where Terraform could be used.

Pulumi lets me write actual code, with loops, conditionals and data pulled in from wherever I please, including infrastructure provisioned elsewhere and through other means. It’s a bit verbose at times, but I can easily write a wrapper or two to generate the underlying structures, making this a non-issue in practice. Since it’s code I can test those wrappers and ensure they behave as expected. You can test a lot more and you’re not limited to integration testing with super slow development cycles. Since I can write it in TypeScript, Go, Python and more, I also have access to tons of libraries from their respective ecosystems. You still get deterministic infrastructure, and it still has a reconciliation loop within which it plans and executes changes to your infrastructure. In that sense it’s very similar to Terraform or even a Kubernetes Operator.
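To make the wrapper idea concrete, here is a minimal sketch in plain Python. It deliberately uses no Pulumi SDK: `ServerSpec` and `web_servers` are hypothetical names invented for this illustration, and the server types and location are placeholder values. It only shows the shape of the thing: a short description goes in, a loop and a conditional expand it, and the result can be checked with an ordinary unit test.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServerSpec:
    """Plain description of one server; a stand-in for real resource arguments."""
    name: str
    server_type: str
    location: str
    labels: dict = field(default_factory=dict)


def web_servers(env: str, count: int, location: str = "fsn1") -> list:
    """Expand a short description into one spec per server."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Conditional: production gets a bigger machine than every other environment.
    server_type = "cx31" if env == "prod" else "cx11"
    # Loop: one spec per requested server, numbered from 1.
    return [
        ServerSpec(
            name=f"{env}-web-{i}",
            server_type=server_type,
            location=location,
            labels={"env": env, "role": "web"},
        )
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    for spec in web_servers("prod", 2):
        print(spec)
```

In a real program each spec would be handed to whatever server resource the provider SDK offers. The point is that the naming and sizing rules live in a function that can be tested without touching a cloud account.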

There are also more advanced features like Component Resources and Dynamic Providers which unlock a lot more possibilities than Terraform modules ever will.

Under the covers

Much like other solutions in this space, Pulumi starts by building a desired state model of the infrastructure based on your code. It then figures out the drift based on the current state and tries to reconcile that by creating, updating and deleting resources.
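As a rough illustration of that comparison, and not of how Pulumi implements it, the sketch below diffs two plain dictionaries. The function name `plan` and the resource names are made up for the example; real engines also track dependencies, ordering and replacements, which are left out here.

```python
def plan(desired: dict, current: dict) -> dict:
    """Compare two {name: properties} mappings and list what has to change."""
    return {
        # In the desired model but not in the current state.
        "create": sorted(desired.keys() - current.keys()),
        # Present on both sides, but the properties differ.
        "update": sorted(
            name
            for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
        # In the current state but no longer desired.
        "delete": sorted(current.keys() - desired.keys()),
    }


desired = {
    "web-1": {"size": "small"},
    "web-2": {"size": "medium"},
    "db-1": {"size": "large"},
}
current = {
    "web-1": {"size": "small"},
    "web-2": {"size": "small"},
    "old-1": {"size": "small"},
}

if __name__ == "__main__":
    print(plan(desired, current))
```

Here `web-1` is left alone, `web-2` needs an update, `db-1` has to be created and `old-1` deleted.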

The interesting bit is how it goes from your code to infrastructure. This is a three-step process. A “language host” takes your code and transforms that into resources that are registered with a “deployment engine”. The deployment engine is responsible for figuring out what needs to change. It figures out what the state of a resource should be and then uses a “resource provider” to make those changes.
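The division of labour can be sketched as a toy model. Everything here is hypothetical and heavily simplified: `Engine`, `RecordingProvider` and `program` are names invented for this example, there is no Pulumi API in it, and the real components run as separate processes rather than as Python objects in one file.

```python
class RecordingProvider:
    """Stand-in for a resource provider: a real one would call a cloud API."""

    def __init__(self):
        self.calls = []

    def create(self, name, props):
        self.calls.append(("create", name))

    def update(self, name, props):
        self.calls.append(("update", name))

    def delete(self, name):
        self.calls.append(("delete", name))


class Engine:
    """Stand-in for the deployment engine: collects registrations, decides changes."""

    def __init__(self, state, provider):
        self.state = dict(state)  # last known state of the infrastructure
        self.provider = provider
        self.registered = {}      # desired resources, filled in by the program

    def register(self, name, props):
        self.registered[name] = props

    def apply(self):
        # Create what is missing and update what differs.
        for name, props in self.registered.items():
            if name not in self.state:
                self.provider.create(name, props)
            elif self.state[name] != props:
                self.provider.update(name, props)
        # Delete what is in the state but was not registered this time.
        for name in list(self.state):
            if name not in self.registered:
                self.provider.delete(name)
        self.state = dict(self.registered)


def program(engine):
    """The user's code; executing it plays the role of the language host."""
    for i in (1, 2):
        engine.register(f"web-{i}", {"size": "small"})


provider = RecordingProvider()
engine = Engine(
    state={"web-1": {"size": "small"}, "old-1": {"size": "small"}},
    provider=provider,
)
program(engine)  # step 1: code is turned into resource registrations
engine.apply()   # steps 2 and 3: the engine decides, the provider makes the changes
```

The program never talks to the provider directly. It only registers what it wants, which is what lets the engine compare that against the state before anything is changed.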

Resource providers need to be available in the same language as you write code in. So if you write Go and want to provision things in Hetzner, a resource provider written in Go for Hetzner needs to exist. This feels a bit unfortunate. Thinking about it from a high level, if we go from Language Host to Deployment Engine, then it should be possible for the Deployment Engine to use any Resource Provider to provision those resources. But it would require a well-defined way to “lower” the resources you declared into a common format which in turn could be passed to any resource provider. That comes with a lot of engineering challenges and probably a bunch of lowest common denominator issues and limitations.

Given most cloud providers maintain SDKs for Go, Python and JavaScript this isn’t much of a problem. Terraform being written in Go means Terraform providers are written in Go. This results in an SDK being available in Go for anything even remotely cloud-adjacent. Pulumi can also use Terraform providers itself in case there isn’t a Pulumi resource provider available yet.

Secrets, data and state

One aspect I really appreciate is that Pulumi has a built-in way to handle secrets. They’re encrypted by default, even in state. This is a huge quality of life improvement over Terraform. I don’t need Vault to do this safely, mess with stuff like GPG-encrypted secrets or use a custom provider like the one from 1Password in order to pull my secrets. I can add those encrypted secrets to the repo and not think about it. For teams you might want a different solution and those are available, but for my personal infra this is perfectly scrumptious.

There’s also a simple way to pass in configuration data through the Pulumi.<stack>.yaml files. You’re not limited to simple key-value pairs there either. You can do lists and maps and nested things. Go nuts. This is very similar to Terraform variables but there’s less setup work for them and because we’re writing actual code the data is much easier to validate and transform too.
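As an example of what validating and transforming that data can look like, here is a small sketch in plain Python. The `stack_config` dictionary is a hypothetical stand-in for a nested value that has already been read from the stack configuration; how it gets loaded is left out, and `Firewall` and `parse_firewalls` are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Firewall:
    """Typed, validated view of one firewall entry."""
    name: str
    allowed_ports: tuple


def parse_firewalls(config: dict) -> list:
    """Validate a nested config value and turn it into typed objects."""
    firewalls = []
    for name, entry in config.get("firewalls", {}).items():
        ports = entry.get("ports", [])
        # Validate: every port must be an integer in the valid TCP/UDP range.
        for port in ports:
            if not isinstance(port, int) or not 1 <= port <= 65535:
                raise ValueError(f"firewall {name!r}: invalid port {port!r}")
        # Transform: drop duplicates and sort for a stable result.
        firewalls.append(Firewall(name=name, allowed_ports=tuple(sorted(set(ports)))))
    return firewalls


# Hypothetical stack configuration after it has been parsed into Python values.
stack_config = {
    "firewalls": {
        "web": {"ports": [443, 80, 443]},
        "ssh": {"ports": [22]},
    }
}

if __name__ == "__main__":
    for firewall in parse_firewalls(stack_config):
        print(firewall)
```

A bad value fails with a readable error before any resource is touched, and the rest of the program works with clean, deduplicated data.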

Pulumi by default uses the Pulumi Service (available for free for personal use) to manage state. This is also a massive quality of life improvement, as managing state locally like Terraform does by default is challenging. If you lose that state you’re screwed because, despite Terraform’s import feature being a thing, it’s barely functional in practice. I’ve had to recraft Terraform state by hand a few times. It’s not fun. Despite the default, it’s entirely possible to switch to local state if that’s more your thing.

Conclusion

The ability to manage infrastructure using a programming language you’re already familiar with is a huge benefit. This should make adopting Pulumi early in a company’s lifecycle a no-brainer. It’s accessible to any team and doesn’t require specialised tooling knowledge. The fact that it deals with state and secrets properly out of the box relieves me of a bunch of upfront decision making and makes it safe to dig in and start experimenting.

Pulumi has been very interesting to get going with. There are some things that don’t feel like idiomatic Go when writing Pulumi, but a lot of that is fixable with a bit of code on my side. Everything else so far has been a joy. I highly recommend you give it a try, and not just for personal projects either.