DeepSeek Spent $5.6M Training a Model That Rivals GPT-4. The AI Cost Curve Just Broke.
A 150-person team in Hangzhou trained a 671-billion-parameter model for less than the cost of a Series A. NVIDIA lost $589 billion in a single day. Open-source models now match frontier performance at 1/100th the cost. The entire AI industry's margin thesis just got rewritten -- and the Jevons Paradox says demand will only accelerate.
By Raj Patel, AI & Infrastructure · Mar 9, 2026
DeepSeek R1 trained a 671-billion-parameter model for $5.6M that matches GPT-4 on major benchmarks. This breakdown covers the training economics, the $589 billion NVIDIA crash, open-source vs. closed model performance, inference cost collapse, the Jevons Paradox in AI compute, and what it all means for the industry's margin structure.
Frequently Asked Questions
What is DeepSeek R1 and who made it?
DeepSeek R1 is a 671-billion-parameter large language model released on January 20, 2025, by DeepSeek, an AI lab based in Hangzhou, China. The company was founded by Liang Wenfeng, co-founder of High-Flyer, a quantitative hedge fund managing approximately $8 billion in assets. DeepSeek operates with roughly 150-200 employees and a core model team of just 63 people. R1 uses a Mixture-of-Experts (MoE) architecture that activates only 37 billion parameters per token, making it far more efficient per token than a dense model of comparable size. It was trained on 2,048 NVIDIA H800 GPUs for approximately 2.788 million GPU hours.
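The efficiency claim above comes down to simple arithmetic: only a small fraction of the model's weights participate in any given token. A back-of-envelope sketch (illustrative only; the actual routing and FLOP accounting in DeepSeek R1 are more involved):

```python
# Toy arithmetic for the MoE active-parameter fraction cited in the text.
# These two figures come from the article; everything else is derived.
TOTAL_PARAMS = 671e9    # total parameters in DeepSeek R1
ACTIVE_PARAMS = 37e9    # parameters activated per token via MoE routing

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # about 5.5%

# A dense 671B model touches every parameter on every token, so its
# per-token compute is roughly 671/37, i.e. ~18x the MoE configuration.
dense_vs_moe = TOTAL_PARAMS / ACTIVE_PARAMS
print(f"Dense-vs-MoE per-token compute ratio: {dense_vs_moe:.0f}x")
```

In other words, R1 carries the capacity of a 671B model while paying per-token compute closer to a 37B model.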
How much did DeepSeek R1 cost to train?
DeepSeek R1 cost approximately $5.6 million in compute to train, based on 2.788 million H800 GPU hours (roughly $2 per GPU-hour). For comparison, GPT-4 is estimated to have cost $78-100 million or more to train, and GPT-5 reportedly cost $500 million per training run, with total development costs of $1.25-2.5 billion. That makes DeepSeek R1 roughly 14-18x cheaper than GPT-4 and nearly 90x cheaper than a single GPT-5 training run. The low training cost was achieved through the MoE architecture, aggressive engineering optimization, and the fact that DeepSeek's parent company High-Flyer had accumulated a substantial GPU stockpile before US export controls were extended to cover the H800.
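The cost figures above can be sanity-checked in a few lines. All inputs are the numbers cited in this article; the per-GPU-hour rate and ratios are derived:

```python
# Back-of-envelope check of the training-cost figures in the text.
GPU_HOURS = 2.788e6     # reported H800 GPU hours for DeepSeek R1
TOTAL_COST = 5.6e6      # reported compute cost, USD

cost_per_gpu_hour = TOTAL_COST / GPU_HOURS
print(f"Implied H800 rate: ${cost_per_gpu_hour:.2f}/GPU-hour")  # ~$2.01

# Cost ratios against the GPT-4 and GPT-5 estimates cited above
gpt4_low, gpt4_high = 78e6, 100e6
gpt5_per_run = 500e6
print(f"GPT-4 ratio: {gpt4_low / TOTAL_COST:.0f}-{gpt4_high / TOTAL_COST:.0f}x")  # 14-18x
print(f"GPT-5 per-run ratio: {gpt5_per_run / TOTAL_COST:.0f}x")  # ~89x
```

The implied ~$2 per H800 GPU-hour is close to prevailing cloud rental rates for that class of hardware, which is why the $5.6M figure, while disputed as a measure of total program cost, is internally consistent as a compute bill.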
How does DeepSeek compare to GPT-4 on benchmarks?
DeepSeek R1 outperforms GPT-4 on several major benchmarks. On MMLU (Massive Multitask Language Understanding), R1 scores 90.8% versus GPT-4's 87.2%. On AIME 2024 (a competitive mathematics exam), R1 scores 79.8% compared to GPT-4's 9.3% -- a gap of over 70 percentage points. On MATH-500, R1 scores 97.3%. The subsequent DeepSeek V3.2-Speciale model scored 96.0% on AIME, beating even GPT-5-High's 94.6%. These results demonstrate that a model trained for $5.6 million can match or exceed models that cost 10-100x more to develop.
What was the DeepSeek stock market crash?
On January 27, 2025 -- the first trading day after DeepSeek R1 gained viral attention -- NVIDIA's stock fell approximately 17%, erasing $589 billion in market capitalization in a single session, the largest single-day market cap loss for any company in US stock market history. The broader US tech sector lost roughly $1 trillion in value that day as investors recalculated whether the massive capital expenditures planned for AI infrastructure were justified if models could be trained at a fraction of the assumed cost. NVIDIA nevertheless recovered fully within a month and went on to reach a $5.03 trillion market cap by October 2025, as the market concluded that cheaper AI would drive more demand, not less.
What is the Jevons Paradox in AI?
The Jevons Paradox, described by economist William Stanley Jevons in 1865, states that when a resource becomes more efficient to use, total consumption of that resource increases rather than decreases. In AI, this means that as training and inference costs decline -- inference costs fell roughly 280x, from $20 to $0.07 per million tokens, between November 2022 and October 2024 -- total compute demand grows dramatically. Jensen Huang has noted that reasoning models consume 100x more compute than standard inference, and some projections put AI at 20% of US electricity consumption by 2030. Cheaper models do not reduce infrastructure spending; they expand the addressable market for AI applications, creating net new demand that exceeds the efficiency gains.
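The ~280x price drop can be annualized to show how steep the curve is. The start and end prices are from the text; the ~23-month window and the compounding math are my own framing:

```python
# Annualizing the inference price decline cited above.
# Nov 2022 -> Oct 2024 is roughly 23 months (approximation).
price_start, price_end = 20.0, 0.07  # USD per million tokens
months = 23

total_drop = price_start / price_end              # ~286x overall
monthly_factor = total_drop ** (1 / months)       # compounded monthly decline
annual_drop = monthly_factor ** 12
print(f"Total drop: {total_drop:.0f}x")
print(f"Implied annualized decline: ~{annual_drop:.0f}x per year")
```

An implied ~19x annual price decline is far faster than classic Moore's Law hardware curves, which is the crux of the Jevons argument: at that rate, use cases that were uneconomical one year become trivially cheap the next.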
Is open-source AI catching up to closed models?
Yes, and the gap is closing rapidly. Open-source models now average 89.6% of closed-model performance across standard benchmarks. On MMLU specifically, the gap between the best open and closed models shrank from 17.5 points to just 0.3 points in a single year. The average time for an open-source model to match a new closed-model benchmark dropped from 27 weeks to 13 weeks. Alibaba's Qwen family has surpassed 700 million downloads on Hugging Face with over 113,000 derivative models, and Chinese-origin models overtook US-origin models in total Hugging Face downloads by summer 2025. DeepSeek R1 itself, as an open-weight model, demonstrated that frontier-level performance no longer requires frontier-level budgets.