Databricks at $62B: The Open-Source Bait-and-Switch Is the Best Business Model in Enterprise Software
Databricks gave away Apache Spark, Delta Lake, and MLflow for free. Then it built the governance layer on top and charged enterprises $2.4B a year for the privilege of managing their own data. Snowflake's pivot to open formats is the clearest admission yet: Databricks won the architecture war.
By Erik Sundberg, Developer Tools · Mar 25, 2026
Frequently Asked Questions
How did Databricks reach a $62 billion valuation?
Databricks reached a $62 billion valuation through a combination of rapid revenue growth (approximately $2.4B in annualized revenue as of early 2026, up from $1.6B in 2024), a defensible open-core business model, and the strategic acquisition of MosaicML in 2023 for $1.3B. The company's open-source contributions (Apache Spark, Delta Lake, MLflow) created massive developer adoption at zero acquisition cost, and Databricks then monetized the governance and management layers that enterprises require on top of those open-source foundations. The $62B valuation works out to roughly 26x annualized revenue, consistent with high-growth enterprise data infrastructure companies.
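A quick back-of-envelope check on the multiple, using only the figures quoted in this answer:

```python
valuation_b = 62.0       # reported valuation, in $B
revenue_2026_b = 2.4     # annualized revenue, early 2026, in $B
revenue_2024_b = 1.6     # 2024 revenue, in $B

multiple = valuation_b / revenue_2026_b
print(f"Revenue multiple: {multiple:.1f}x")   # ~25.8x, i.e. roughly 26x

growth = revenue_2026_b / revenue_2024_b - 1
print(f"Growth since 2024: {growth:.0%}")     # 50%
```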
What is the open-core business model and why is it effective?
The open-core model involves open-sourcing the foundational compute or runtime layer of a software product, which removes switching costs at that layer and drives bottom-up developer adoption, while charging for proprietary management, governance, security, and support layers on top. The model works because: (1) open-source adoption provides zero-cost distribution at scale, (2) enterprises that adopt the open-source layer inevitably need the enterprise features that only the original vendor provides, and (3) the governance and metadata layers are structurally stickier than the compute layer. Databricks executed this across four successive layers (Spark, Delta Lake, Unity Catalog, MosaicML), each time expanding the surface area of monetizable enterprise features.
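To make the "open layer" concrete: the sketch below spins up the Apache-licensed engine and table format with no Databricks account involved, using only the open-source pyspark and delta-spark packages (the local path and table are illustrative).

```python
# pip install pyspark delta-spark
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("open-core-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# An ACID table on plain local storage: the free tier of the open-core stack.
spark.range(100).write.format("delta").mode("overwrite").save("/tmp/events")
print(spark.read.format("delta").load("/tmp/events").count())  # 100
```

Everything the enterprise then pays for (governance, access control, lineage) sits above this free, portable layer.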
Why is Unity Catalog more important than Databricks' compute platform?
Unity Catalog is the metadata and governance layer that sits across all of Databricks' compute. Once an enterprise maps its data assets, access policies, lineage, and compliance rules into Unity Catalog, switching away from Databricks requires not just migrating compute workloads but rebuilding the entire governance architecture. This makes Unity Catalog dramatically stickier than the Spark or Delta Lake layers, which are technically portable. Governance metadata (data lineage, access policies, audit trails, semantic tags) is organizational knowledge that cannot be easily exported or replicated on another platform. It is the enterprise equivalent of a CRM's contact history: the accumulation is the moat.
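A sketch of the kind of metadata that accumulates in the catalog, assuming a Databricks notebook where `spark` is predefined; the catalog, schema, table, and principal names (main, sales, orders, analysts) are hypothetical.

```python
# Access policy: stored in Unity Catalog, not in the underlying Delta files.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")

# Semantic tags used for data classification and compliance reporting.
spark.sql("ALTER TABLE main.sales.orders SET TAGS ('contains_pii' = 'true')")

# Lineage and audit trails accrue automatically as queries run. None of this
# metadata travels with the data files if the tables are migrated elsewhere,
# which is the switching cost described above.
```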
What does Snowflake's pivot to Apache Iceberg mean for the competitive landscape?
Snowflake's announcement that it would natively support Apache Iceberg — the open table format that competes with Databricks' Delta Lake — is a strategic concession. It acknowledges that data gravity is shifting toward open formats that customers own and control, rather than proprietary formats that lock data inside a vendor's platform. Snowflake adopted Iceberg because it was losing deals to Databricks on architecture grounds: enterprises were choosing Delta Lake specifically because it is open and portable. By supporting Iceberg, Snowflake validated the open-format thesis. But it also complicated its own lock-in story, since the primary reason to pay Snowflake's premium was proprietary performance on proprietary storage. The Iceberg pivot buys Snowflake table-stakes parity; it does not change the strategic momentum in Databricks' favor.
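The portability argument is easy to demonstrate with the open-source pyiceberg package: the sketch below creates and reads an Iceberg table with no Snowflake or Databricks in the loop (the local SQLite catalog and the table names are illustrative).

```python
# pip install "pyiceberg[sql-sqlite,pyarrow]"
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# A throwaway local catalog standing in for a production metastore.
catalog = load_catalog(
    "local",
    type="sql",
    uri="sqlite:////tmp/iceberg_catalog.db",
    warehouse="file:///tmp/iceberg_warehouse",
)
catalog.create_namespace("demo")

rows = pa.table({"id": pa.array([1, 2, 3], pa.int64())})
events = catalog.create_table("demo.events", schema=rows.schema)
events.append(rows)

# Any engine that speaks Iceberg (Spark, Trino, DuckDB, Snowflake, Databricks)
# can now query the same files and metadata.
print(events.scan().to_arrow().num_rows)  # 3
```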
How does the MosaicML acquisition position Databricks for AI?
The $1.3B MosaicML acquisition in 2023 gave Databricks LLM training and fine-tuning capabilities (specifically, the MPT model series and the MosaicML training platform) that slot directly into the enterprise data workflow. The strategic logic is a replay of the Spark-to-Delta Lake playbook: enterprises already running data workloads on Databricks can now train and fine-tune models on the same platform, using the same data governance layer (Unity Catalog), without moving data to an external AI vendor. This eliminates the data-export step that most enterprise AI projects require and positions Databricks as the single platform for data engineering, analytics, and AI model training. As AI training workloads scale, Databricks captures a larger share of enterprise compute spend without any additional customer acquisition cost.
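For a feel of what "training on the same platform" means at the code level, here is a toy sketch using Composer, the open-source training library behind the MosaicML platform; the model and data are hypothetical stand-ins for a real fine-tuning workload.

```python
# pip install mosaicml torch
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from composer import Trainer
from composer.models import ComposerClassifier

# Hypothetical stand-ins for a real model and a governed training dataset.
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
X, y = torch.randn(512, 32), torch.randint(0, 4, (512,))

trainer = Trainer(
    model=ComposerClassifier(module=net, num_classes=4),
    train_dataloader=DataLoader(TensorDataset(X, y), batch_size=64),
    max_duration="2ep",  # Composer duration strings: "2ep" = two epochs
)
trainer.fit()
```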