Databricks at $62B: The Open-Source Bait-and-Switch Is the Best Business Model in Enterprise Software
Databricks gave away Apache Spark, Delta Lake, and MLflow for free. Then it built the governance layer on top and charged enterprises $2.4B a year for the privilege of managing their own data. Snowflake's pivot to open formats is the clearest admission yet: Databricks won the architecture war.
By Erik Sundberg, Developer Tools · Mar 25, 2026
Frequently Asked Questions
How did Databricks reach a $62 billion valuation?
Databricks reached a $62 billion valuation through a combination of rapid revenue growth (approximately $2.4B in annualized revenue as of early 2026, up from $1.6B in 2024), a defensible open-core business model, and the strategic acquisition of MosaicML in 2023 for $1.3B. The company's open-source contributions (Apache Spark, Delta Lake, MLflow) created massive developer adoption at zero acquisition cost, and Databricks then monetized the governance and management layers that enterprises require on top of those open-source foundations. The $62B valuation works out to roughly 26x annualized revenue, consistent with high-growth enterprise data infrastructure companies.
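A quick back-of-envelope check on the multiple, using only the figures quoted in this answer:

```python
valuation_b = 62.0       # reported valuation, in $B
revenue_2026_b = 2.4     # annualized revenue, early 2026, in $B
revenue_2024_b = 1.6     # 2024 revenue, in $B

multiple = valuation_b / revenue_2026_b
print(f"Revenue multiple: {multiple:.1f}x")   # ~25.8x, i.e. roughly 26x

growth = revenue_2026_b / revenue_2024_b - 1
print(f"Growth since 2024: {growth:.0%}")     # 50%
```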
What is the open-core business model and why is it effective?
The open-core model involves open-sourcing the foundational compute or runtime layer of a software product, which removes switching costs at that layer and drives bottom-up developer adoption, while charging for proprietary management, governance, security, and support layers on top. The model works because: (1) open-source adoption provides zero-cost distribution at scale, (2) enterprises that adopt the open-source layer inevitably need the enterprise features that only the original vendor provides, and (3) the governance and metadata layers are structurally stickier than the compute layer. Databricks executed this across four successive layers (Spark, Delta Lake, Unity Catalog, MosaicML), each time expanding the surface area of monetizable enterprise features.
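To make the "open layer" concrete: the sketch below spins up the Apache-licensed engine and table format with no Databricks account involved, using only the open-source pyspark and delta-spark packages (the local path and table are illustrative).

```python
# pip install pyspark delta-spark
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("open-core-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# An ACID table on plain local storage: the free tier of the open-core stack.
spark.range(100).write.format("delta").mode("overwrite").save("/tmp/events")
print(spark.read.format("delta").load("/tmp/events").count())  # 100
```

Everything the enterprise then pays for (governance, access control, lineage) sits above this free, portable layer.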
Why is Unity Catalog more important than Databricks' compute platform?
Unity Catalog is the metadata and governance layer that sits across all of Databricks' compute. Once an enterprise maps its data assets, access policies, lineage, and compliance rules into Unity Catalog, switching away from Databricks requires not just migrating compute workloads but rebuilding the entire governance architecture. This makes Unity Catalog dramatically stickier than the Spark or Delta Lake layers, which are technically portable. Governance metadata (data lineage, access policies, audit trails, semantic tags) is organizational knowledge that cannot be easily exported or replicated on another platform. It is the enterprise equivalent of a CRM's contact history: the accumulation is the moat.
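A sketch of the kind of metadata that accumulates in the catalog, assuming a Databricks notebook where `spark` is predefined; the catalog, schema, table, and principal names (main, sales, orders, analysts) are hypothetical.

```python
# Access policy: stored in Unity Catalog, not in the underlying Delta files.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")

# Semantic tags used for data classification and compliance reporting.
spark.sql("ALTER TABLE main.sales.orders SET TAGS ('contains_pii' = 'true')")

# Lineage and audit trails accrue automatically as queries run. None of this
# metadata travels with the data files if the tables are migrated elsewhere,
# which is the switching cost described above.
```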
What does Snowflake's pivot to Apache Iceberg mean for the competitive landscape?
Snowflake's announcement that it would natively support Apache Iceberg — the open table format that competes with Databricks' Delta Lake — is a strategic concession. It acknowledges that data gravity is shifting toward open formats that customers own and control, rather than proprietary formats that lock data inside a vendor's platform. Snowflake adopted Iceberg because it was losing deals to Databricks on architecture grounds: enterprises were choosing Delta Lake specifically because it is open and portable. By supporting Iceberg, Snowflake validated the open-format thesis. But it also complicated its own lock-in story, since the primary reason to pay Snowflake's premium was proprietary performance on proprietary storage. The Iceberg pivot buys Snowflake table-stakes parity; it does not change the strategic momentum in Databricks' favor.
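The portability argument is easy to demonstrate with the open-source pyiceberg package: the sketch below creates and reads an Iceberg table with no Snowflake or Databricks in the loop (the local SQLite catalog and the table names are illustrative).

```python
# pip install "pyiceberg[sql-sqlite,pyarrow]"
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# A throwaway local catalog standing in for a production metastore.
catalog = load_catalog(
    "local",
    type="sql",
    uri="sqlite:////tmp/iceberg_catalog.db",
    warehouse="file:///tmp/iceberg_warehouse",
)
catalog.create_namespace("demo")

rows = pa.table({"id": pa.array([1, 2, 3], pa.int64())})
events = catalog.create_table("demo.events", schema=rows.schema)
events.append(rows)

# Any engine that speaks Iceberg (Spark, Trino, DuckDB, Snowflake, Databricks)
# can now query the same files and metadata.
print(events.scan().to_arrow().num_rows)  # 3
```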
How does the MosaicML acquisition position Databricks for AI?
The $1.3B MosaicML acquisition in 2023 gave Databricks LLM training and fine-tuning capabilities (specifically, the MPT model series and the MosaicML training platform) that slot directly into the enterprise data workflow. The strategic logic is a replay of the Spark-to-Delta Lake playbook: enterprises already running data workloads on Databricks can now train and fine-tune models on the same platform, using the same data governance layer (Unity Catalog), without moving data to an external AI vendor. This eliminates the data-export step that most enterprise AI projects require and positions Databricks as the single platform for data engineering, analytics, and AI model training. As AI training workloads scale, Databricks captures a larger share of enterprise compute spend without any additional customer acquisition cost.
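For a feel of what "training on the same platform" means at the code level, here is a toy sketch using Composer, the open-source training library behind the MosaicML platform; the model and data are hypothetical stand-ins for a real fine-tuning workload.

```python
# pip install mosaicml torch
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from composer import Trainer
from composer.models import ComposerClassifier

# Hypothetical stand-ins for a real model and a governed training dataset.
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
X, y = torch.randn(512, 32), torch.randint(0, 4, (512,))

trainer = Trainer(
    model=ComposerClassifier(module=net, num_classes=4),
    train_dataloader=DataLoader(TensorDataset(X, y), batch_size=64),
    max_duration="2ep",  # Composer duration strings: "2ep" = two epochs
)
trainer.fit()
```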