The Problems with "Cloud-Prem"

Sep 21, 2021

"Cloud prem" (cloud + on-premise) is an increasingly common deployment pattern. Vendors deploy their software into the client's own cloud, under an isolated account or a separate VPC (see my SaaS Isolation Patterns). I first heard the term on Tomasz Tunguz's blog.

In practice, this means packaging an application as Terraform or Kubernetes configuration. This is how you might deploy something like Databricks in your own cloud. Startups like Replicated offer this as a service by packaging your application with Kubernetes.

Since vendors don't pay for the cloud resources, they should theoretically see higher gross margins (avoiding the "cloud tax"). In addition, data security is much less of an issue because the data never leaves the client's account.

But there are downsides, many of which are why we switched to SaaS in the first place.

Customers can often stay on previous versions in the cloud prem model, leading to version skew. This is often touted as a feature of cloud prem, taking some of the pressure off internal IT teams to do updates and migrations. Multi-tenant SaaS, by contrast, puts the burden of operating the software on the vendor and only exposes functionality through APIs.

Supporting old versions can severely reduce product velocity at a company. Security patches need to be backported, and data migrations need to be performed for each customer.
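To make the backporting cost concrete, here is a minimal sketch in Python. The customer names, version numbers, and the `backport_targets` helper are all hypothetical, and it assumes a vulnerability affects every version at or after the one that introduced it. The point is only that each distinct release line still running in the field is one more branch a fix has to land on.

```python
from collections import defaultdict

# Hypothetical fleet: customer -> deployed version (major, minor, patch).
DEPLOYED = {
    "acme": (3, 2, 1),
    "globex": (3, 2, 4),
    "initech": (2, 9, 0),
    "umbrella": (4, 0, 0),
}


def backport_targets(deployed, introduced_in):
    """Group affected customers by release line (major, minor).

    Assumes the bug exists in every version >= introduced_in, so each
    affected release line needs its own patched build.
    """
    lines = defaultdict(list)
    for customer, version in deployed.items():
        if version >= introduced_in:  # tuples compare element by element
            lines[version[:2]].append(customer)
    return {line: sorted(names) for line, names in sorted(lines.items())}


targets = backport_targets(DEPLOYED, introduced_in=(3, 0, 0))
for (major, minor), customers in targets.items():
    print(f"backport to {major}.{minor}.x for: {', '.join(customers)}")
```

In a multi-tenant SaaS, the equivalent table has a single row: the version the vendor is currently running.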

Cloud prem deployments inherently don't share resources. If services are completely isolated in a separate cloud account, there can be significant redundancy (e.g., running a separate database just for the application). This makes it more expensive for customers to run it themselves, both in time, since they aren't experts, and in money, because of the duplicated resources.

For a more concrete example, take Snowflake and Databricks. Snowflake has a completely cloud-based offering versus Databricks's cloud prem model. When Snowflake makes an improvement to its data compression or query engine, it can immediately be rolled out to all customers with a behind-the-scenes migration. Databricks can't roll out a change like that as quickly, since customers are on different versions.

Customers can opt to fully integrate the application into their account, de-duplicating redundant infrastructure. Except now, the integration problem is even trickier.

Customers will begin to rely on parts of your internal implementation that you didn't plan to expose. To quote Hyrum's Law (read: Keep Your API Surface Small):

With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.
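A small illustration of how this plays out, as a sketch only: the function names and job names below are hypothetical. The vendor's documented contract says ordering is unspecified, but the customer's integration code quietly depends on it anyway, so an internal change that honors the contract still breaks them.

```python
def list_jobs_v1():
    """Documented contract: returns job names, order unspecified."""
    return ["ingest", "transform", "export"]  # happens to be creation order


def list_jobs_v2():
    """Same contract; an internal change now returns the names sorted."""
    return sorted(["ingest", "transform", "export"])


def customer_first_stage(list_jobs):
    # Customer code that assumes the first entry is the first pipeline stage.
    return list_jobs()[0]


print(customer_first_stage(list_jobs_v1))  # ingest
print(customer_first_stage(list_jobs_v2))  # export
```

When the application is deeply integrated into the customer's own account, far more of this kind of incidental behavior is observable than through a hosted API.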

Yet customers continue to push for this model because of compliance concerns. It's much easier to get a new service through security review when there is no chance that sensitive data will leave the customer's cloud account.

As go-to-market continues to be extremely important, vendors will continue to offer the most extensive API surfaces they can to garner adoption. However, I'm not sure what it will look like when vendors have to maintain these deployments in the long run.
