DeepSeek Spent $5.6M Training a Model That Rivals GPT-4. The AI Cost Curve Just Broke.
A 150-person team in Hangzhou trained a 671-billion-parameter model for less than the cost of a Series A. NVIDIA lost $589 billion in a single day. Open-source models now match frontier performance at 1/100th the cost. The entire AI industry's margin thesis just got rewritten -- and the Jevons Paradox says demand will only accelerate.
By Raj Patel, AI & Infrastructure · Mar 9, 2026
DeepSeek R1 trained a 671-billion-parameter model for $5.6M that matches GPT-4 on major benchmarks. This breakdown covers the training economics, the $589 billion NVIDIA crash, open-source vs. closed model performance, inference cost collapse, the Jevons Paradox in AI compute, and what it all means for the industry's margin structure.
Frequently Asked Questions
What is DeepSeek R1 and who made it?
DeepSeek R1 is a 671-billion-parameter large language model released on January 20, 2025, by DeepSeek, an AI lab based in Hangzhou, China. The company was founded by Liang Wenfeng, co-founder of High-Flyer, a quantitative hedge fund managing approximately $8 billion in assets. DeepSeek operates with roughly 150-200 employees and a core model team of just 63 people. R1 uses a Mixture-of-Experts (MoE) architecture that activates only 37 billion parameters per token, making it far more efficient per token than a dense model of comparable size. It was trained on 2,048 NVIDIA H800 GPUs for approximately 2.788 million GPU hours.
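The efficiency claim above comes down to simple arithmetic: only a small fraction of the model's weights participate in any given token. A back-of-envelope sketch (illustrative only; the actual routing and FLOP accounting in DeepSeek R1 are more involved):

```python
# Toy arithmetic for the MoE active-parameter fraction cited in the text.
# These two figures come from the article; everything else is derived.
TOTAL_PARAMS = 671e9    # total parameters in DeepSeek R1
ACTIVE_PARAMS = 37e9    # parameters activated per token via MoE routing

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # about 5.5%

# A dense 671B model touches every parameter on every token, so its
# per-token compute is roughly 671/37, i.e. ~18x the MoE configuration.
dense_vs_moe = TOTAL_PARAMS / ACTIVE_PARAMS
print(f"Dense-vs-MoE per-token compute ratio: {dense_vs_moe:.0f}x")
```

In other words, R1 carries the capacity of a 671B model while paying per-token compute closer to a 37B model.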
How much did DeepSeek R1 cost to train?
DeepSeek R1 cost approximately $5.6 million in compute to train, based on 2.788 million H800 GPU hours (roughly $2 per GPU-hour). For comparison, GPT-4 is estimated to have cost $78-100 million or more to train, and GPT-5 reportedly cost $500 million per training run, with total development costs of $1.25-2.5 billion. That makes DeepSeek R1 roughly 14-18x cheaper than GPT-4 and nearly 90x cheaper than a single GPT-5 training run. The low training cost was achieved through the MoE architecture, aggressive engineering optimization, and the fact that DeepSeek's parent company High-Flyer had accumulated a substantial GPU stockpile before US export controls were extended to cover the H800.
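The cost figures above can be sanity-checked in a few lines. All inputs are the numbers cited in this article; the per-GPU-hour rate and ratios are derived:

```python
# Back-of-envelope check of the training-cost figures in the text.
GPU_HOURS = 2.788e6     # reported H800 GPU hours for DeepSeek R1
TOTAL_COST = 5.6e6      # reported compute cost, USD

cost_per_gpu_hour = TOTAL_COST / GPU_HOURS
print(f"Implied H800 rate: ${cost_per_gpu_hour:.2f}/GPU-hour")  # ~$2.01

# Cost ratios against the GPT-4 and GPT-5 estimates cited above
gpt4_low, gpt4_high = 78e6, 100e6
gpt5_per_run = 500e6
print(f"GPT-4 ratio: {gpt4_low / TOTAL_COST:.0f}-{gpt4_high / TOTAL_COST:.0f}x")  # 14-18x
print(f"GPT-5 per-run ratio: {gpt5_per_run / TOTAL_COST:.0f}x")  # ~89x
```

The implied ~$2 per H800 GPU-hour is close to prevailing cloud rental rates for that class of hardware, which is why the $5.6M figure, while disputed as a measure of total program cost, is internally consistent as a compute bill.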
How does DeepSeek compare to GPT-4 on benchmarks?
DeepSeek R1 outperforms GPT-4 on several major benchmarks. On MMLU (Massive Multitask Language Understanding), R1 scores 90.8% versus GPT-4's 87.2%. On AIME 2024 (a competitive mathematics exam), R1 scores 79.8% compared to GPT-4's 9.3% -- a gap of over 70 percentage points. On MATH-500, R1 scores 97.3%. The subsequent DeepSeek V3.2-Speciale model scored 96.0% on AIME, beating even GPT-5-High's 94.6%. These results demonstrate that a model trained for $5.6 million can match or exceed models that cost 10-100x more to develop.
What was the DeepSeek stock market crash?
On January 27, 2025 -- the first trading day after DeepSeek R1 gained viral attention -- NVIDIA's stock fell approximately 17%, erasing $589 billion in market capitalization in a single session, the largest single-day market cap loss for any company in US stock market history. The broader US tech sector lost roughly $1 trillion in value that day as investors recalculated whether the massive capital expenditures planned for AI infrastructure were justified if models could be trained at a fraction of the assumed cost. NVIDIA nevertheless recovered fully within a month and went on to reach a $5.03 trillion market cap by October 2025, as the market concluded that cheaper AI would drive more demand, not less.
What is the Jevons Paradox in AI?
The Jevons Paradox, described by economist William Stanley Jevons in 1865, states that when a resource becomes more efficient to use, total consumption of that resource increases rather than decreases. In AI, this means that as training and inference costs decline -- inference costs fell roughly 280x, from $20 to $0.07 per million tokens, between November 2022 and October 2024 -- total compute demand grows dramatically. Jensen Huang has noted that reasoning models consume 100x more compute than standard inference, and some projections put AI at 20% of US electricity consumption by 2030. Cheaper models do not reduce infrastructure spending; they expand the addressable market for AI applications, creating net new demand that exceeds the efficiency gains.
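The ~280x price drop can be annualized to show how steep the curve is. The start and end prices are from the text; the ~23-month window and the compounding math are my own framing:

```python
# Annualizing the inference price decline cited above.
# Nov 2022 -> Oct 2024 is roughly 23 months (approximation).
price_start, price_end = 20.0, 0.07  # USD per million tokens
months = 23

total_drop = price_start / price_end              # ~286x overall
monthly_factor = total_drop ** (1 / months)       # compounded monthly decline
annual_drop = monthly_factor ** 12
print(f"Total drop: {total_drop:.0f}x")
print(f"Implied annualized decline: ~{annual_drop:.0f}x per year")
```

An implied ~19x annual price decline is far faster than classic Moore's Law hardware curves, which is the crux of the Jevons argument: at that rate, use cases that were uneconomical one year become trivially cheap the next.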
Is open-source AI catching up to closed models?
Yes, and the gap is closing rapidly. Open-source models now average 89.6% of closed-model performance across standard benchmarks. On MMLU specifically, the gap between the best open and closed models shrank from 17.5 points to just 0.3 points in a single year. The average time for an open-source model to match a new closed-model benchmark dropped from 27 weeks to 13 weeks. Alibaba's Qwen family has surpassed 700 million downloads on Hugging Face with over 113,000 derivative models, and Chinese-origin models overtook US-origin models in total Hugging Face downloads by summer 2025. DeepSeek R1 itself, as an open-weight model, demonstrated that frontier-level performance no longer requires frontier-level budgets.