Mo cloud, mo problems

I’m always surprised when I find infrastructure problems that transcend company size and scale. A few have been coming up for me recently, all of which make me think there’s a startup idea or two in here somewhere.

Title of this post inspired by The Notorious B.I.G. – “Mo Money Mo Problems”

The first is capacity management. Most cloud pricing models today are consumption-based, i.e. you pay as you go for the amount that you consume. As a customer, you don’t have to plan and commit ahead of time. Seems simple, except this makes it nearly impossible for the cloud or service provider to plan how much capacity (servers) to keep on hand.

At Google, each product area had dedicated resource management teams responsible for forecasting capacity needs, and those forecasts fed into capacity orders. There is a very sophisticated resource economy within Google Compute, without which things would crumble. Even for serverless cloud offerings within Google Cloud, our largest customers needed to tell us when they were expecting surges in traffic (Black Friday, New Year’s Eve, etc.) so that we could provision capacity ahead of time.

There is no escaping the fact that cloud isn’t infinitely elastic. Of course, Google’s size and scale do allow it a large margin of error when it comes to capacity: it’s relatively easy to move provisioned capacity around between customers and product areas when you have so much of it, assuming it is fungible (which is not always the case). Even at Google, products offered reservations or commitments: customers either paid a premium or committed ahead of time in exchange for a discount, to ensure that capacity was available when needed.

Every startup (including Temporal) that offers a consumption-based cloud product or service faces the same problem. In many ways, startups have it worse because you’re often paying upfront for cloud capacity (regardless of the provider) and then passing those costs to customers as they use your service. If the customers don’t turn up, you’re sitting on unused capacity (or inventory in commerce parlance). The only way out is to subvert the pure consumption-based model with the “hint-about-consumption based model”… which kind of defeats the purpose, but what are you gonna do?
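To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is hypothetical: the `Month` record, the unit cost, the 20% headroom, and the idea of simply adding hinted surges to a baseline are illustrative assumptions, not how Temporal or Google actually plans capacity.

```python
from dataclasses import dataclass


@dataclass
class Month:
    provisioned_units: int  # capacity paid for upfront (hypothetical unit)
    consumed_units: int     # capacity customers actually used


def idle_capacity_cost(month: Month, unit_cost: float) -> float:
    """Cost of capacity that was paid for but never consumed."""
    idle = max(month.provisioned_units - month.consumed_units, 0)
    return idle * unit_cost


def provision_with_hints(baseline: int, hinted_surges: list[int],
                         headroom_pct: int = 20) -> int:
    """Size capacity from a baseline plus customer-supplied surge hints.

    Assumes hints are additive and applies a flat headroom percentage,
    rounding up to a whole unit.
    """
    expected_peak = baseline + sum(hinted_surges)
    total = expected_peak * (100 + headroom_pct)
    return (total + 99) // 100  # integer ceiling division


if __name__ == "__main__":
    # Provisioned for 1000 units, customers only used 600.
    print(idle_capacity_cost(Month(1000, 600), unit_cost=2.0))
    # Two customers hint at surges of 300 and 200 units on top of a 1000 baseline.
    print(provision_with_hints(1000, [300, 200]))
```

The first function is the “inventory” problem: whatever you provision beyond actual usage is pure cost. The second is the “hint-about-consumption” workaround: the more customers tell you ahead of time, the less you have to guess.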

The second infrastructure problem is how much of building a SaaS product is repeatable yet painful. I asked ChatGPT to generate a non-exhaustive list for me (which I slightly modified). It did pretty well, which tells you how much of this stuff is boilerplate.

  1. User Authentication and Authorization
  2. Billing, Metering, and Payment Processing
  3. High Availability, Backup, and Recovery
  4. APIs / CLIs for automation
  5. Audit Logging, Analytics, and Reporting
  6. Subscription Management
  7. Customer Relationship Management
  8. Compliance and Legal
  9. Security
  10. Documentation and Support

There’s a plethora of companies adding value in each bucket, but the experience of putting it all together and customizing it for your business is awful. In larger companies like Google, there are entire teams of hundreds (if not thousands) of people dedicated to defining and building each item on the list. Meanwhile, a startup would be lucky to have one person dedicated to each. I imagine a magical (AI-driven?) future where a single entity takes care of all the SaaS scaffolding for me, so that I can focus on my core business logic.

Shameless plug: Temporal is this single entity if you’re aiming for “reliability scaffolding”. If you want to get rid of message queues, retries, complex error handling, state management etc., Temporal is the answer. If you’re interested in exploring Temporal Cloud for free (and you work at a startup), reach out.
