ETL vs. Platform Extensibility

May 27, 2022

Stripe announced Stripe Apps this week, allowing customers to build custom experiences right into the dashboard. Last week, Stripe announced Stripe Data Pipeline, an ETL (extract, transform, load) service that syncs Stripe data to a data warehouse, where engineers can run analytics against it. And it's not just Stripe. Shopify has extended its platform with Shopify Apps powered by WebAssembly.

These moves reignite an age-old tension between SaaS platforms and extensibility:

SaaS platforms want to be the system of record, but can't possibly satisfy every downstream data use case. If the use case is critical enough, users will churn to a more open service.

In the past, customers have chosen to extract (the E in ETL) their data from these SaaS platforms. They do this through data integration platforms – Zapier for consumers; MuleSoft, Fivetran, and Airbyte for the enterprise. Some companies, like Census, even specialize in carting data from the data warehouse back into SaaS tools. Unfortunately, even with reliable glue and robust pipelines, these services don't control the API at either end of the pipeline (see the M:N API Problem).
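To make the extract-transform-load flow concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `RAW_CHARGES` records are made up and do not mirror any real Stripe payload, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical records standing in for what a SaaS export might return.
RAW_CHARGES = [
    {"id": "ch_1", "amount": 1250, "currency": "usd", "status": "succeeded"},
    {"id": "ch_2", "amount": 800, "currency": "usd", "status": "failed"},
    {"id": "ch_3", "amount": 4300, "currency": "eur", "status": "succeeded"},
]


def extract():
    """Extract: pull raw records from the source (here, an in-memory list)."""
    return list(RAW_CHARGES)


def transform(records):
    """Transform: keep successful charges, convert minor units to major units."""
    return [
        (r["id"], r["amount"] / 100, r["currency"].upper())
        for r in records
        if r["status"] == "succeeded"
    ]


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS charges "
        "(id TEXT PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO charges VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite in memory plays the role of the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# Once the data is in the warehouse, analytics run there, not in the SaaS.
totals = dict(
    conn.execute("SELECT currency, SUM(amount) FROM charges GROUP BY currency")
)
```

Note that the final query never touches the source system, which is exactly the shift described above: the work happens wherever the data lands.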

SaaS companies don't like this. Data gravity creates vendor lock-in. Moving data outside your platform shifts eyeballs and developers away from your service. They're working with the data somewhere else (maybe even on a different SaaS). Extraction turns systems of record into dumb data collection points.

SaaS platforms have responded in a few ways. If you're big enough, you buy the services extracting your data – Salesforce bought MuleSoft in 2018 for $6.5 billion. You can also choose to restrict your API in some way – a smaller API surface, breaking changes, restricted partners, or breaking behavior.

The other option is to build extensibility into your SaaS platform. In recent years, technology has made this increasingly easy to do. In the past, you'd have to go the Salesforce route and build a completely alternative software stack (custom languages, databases, UI frameworks). Platforms of the past risked allowing too much extensibility, pushing themselves down the value chain, and losing the end-user relationship. Imagine an extensible platform that is entirely abstracted over – it becomes an API-as-a-Service. This isn't the worst outcome, but it can be disastrous for some SaaS categories (like CRM).

There's a way to provide extensibility and scripting inside your application without giving away too much. React has provided an embedded layer for UI extensions. WebAssembly and edge runtimes make it easy for services to run untrusted code. It's easy enough to embed workflow builders and orchestration systems into these applications.
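As a rough illustration of extensibility without giving away too much, the sketch below shows a platform that runs customer-supplied hooks but remains the system of record. The `Platform` class, its methods, and `flag_large_orders` are all hypothetical names invented for this example. It is also not a security sandbox: handing the extension a copy only protects the platform's data from mutation, and real isolation of untrusted code would need something like a WebAssembly runtime.

```python
import copy
from typing import Callable, Dict, List

# An extension takes a copy of a record and returns annotations to attach.
Extension = Callable[[dict], dict]


class Platform:
    """Hypothetical SaaS platform that stays the system of record."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}
        self._extensions: List[Extension] = []

    def register_extension(self, extension: Extension) -> None:
        self._extensions.append(extension)

    def create_record(self, record_id: str, data: dict) -> dict:
        # The platform stores its own copy of the canonical record.
        self._records[record_id] = copy.deepcopy(data)
        annotations: dict = {}
        for extension in self._extensions:
            # Extensions only ever see a copy, never the stored record.
            annotations.update(extension(copy.deepcopy(data)))
        return annotations

    def get_record(self, record_id: str) -> dict:
        return copy.deepcopy(self._records[record_id])


def flag_large_orders(order: dict) -> dict:
    """Example customer extension: flag orders over an arbitrary threshold."""
    needs_review = order["total"] > 1000
    order["total"] = 0  # only changes the extension's private copy
    return {"needs_review": needs_review}


platform = Platform()
platform.register_extension(flag_large_orders)
annotations = platform.create_record("order_1", {"total": 2500})
```

The design choice here is that custom logic comes to the data rather than the data leaving for the logic, so the platform keeps both the record and the user's attention.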

Another way to look at it is through bundling vs. unbundling. Are we in a bundling (few but extensible platforms) or unbundling (many small but specific APIs) phase? I don't think that platforms like Stripe and Shopify will be able to hold on to as much data as they did in the past. They might have to operate in a world where customers expect their data to live in cloud data warehouses. But they will retain strict ownership of the data regardless of where it lives.