
DeepSeek Illustrated in 10 Pages

The document discusses DeepSeek's large language models (LLMs) and the Transformer architecture behind them, covering the LLM training stages of pretraining, supervised fine-tuning and reinforcement learning. It also introduces DeepSeek-R1, whose training emphasizes reasoning-oriented reinforcement learning and passes through several intermediate model checkpoints. It closes with links for further reading on related topics and methods.

DeepSeek Illustrated in 10 Pages (PDF)

1 DeepSeek
1.1 DeepSeek
1.2 DeepSeek
1.3 DeepSeek

2 DeepSeek
2.1 LLM
2.2 Transformer
2.3 LLM
2.3.1 Pretraining
2.3.2 Supervised Fine-Tuning, SFT
2.3.3 Reinforcement Learning, RL

3 DeepSeek-R1
3.1 DeepSeek-R1
3.1.1 1 R1-Zero
3.1.2 2
3.2 R1-Zero
3.3
3.4 DeepSeek-R1

4

1 DeepSeek

1.1 DeepSeek

DeepSeek
1.
2. Fine-tuning
3.

DeepSeek

DeepSeek

• DeepSeek R1

1.2 DeepSeek

DeepSeek
ollama ollama

Ollama (Figure 1)

Figure 1:

ollama
10 (Figure 2)

Figure 2: Ollama

Run ollama pull deepseek-r1:1.5b to download the deepseek-r1 model (Figure 3).

Figure 3: DeepSeek-r1
To run DeepSeek, open cmd (Windows) or a terminal and enter ollama run deepseek-r1:1.5b (Figure 4).

Figure 4: Ollama deepseek-r1
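Besides the interactive ollama run session, the Ollama background service also answers HTTP requests, which is handy for calling the model from a script. A minimal Python sketch, assuming the service is listening on its default address http://localhost:11434 and that deepseek-r1:1.5b has already been pulled. The helper names build_generate_request and generate are hypothetical; only the /api/generate endpoint and its model, prompt, stream and response fields come from Ollama's API.

```python
import json
import urllib.request

# Default address of a local Ollama service (assumption: default port 11434, not changed).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "deepseek-r1:1.5b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "deepseek-r1:1.5b", timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama service and return the generated text."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the reply is a single JSON object;
    # its "response" field holds the whole completion.
    return body["response"]


# Usage (needs the Ollama service running and the model pulled):
# print(generate("Write a Python function that reverses a string."))
```

Setting stream to False keeps the sketch simple at the cost of waiting for the full answer; leaving streaming on returns one JSON object per line instead.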

1.3 DeepSeek

DeepSeek
Ask a Python question; the reply contains a think section.

Figure 5: deepseek-r1

think (Figure 6):

Figure 6: deepseek-r1
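When the reply is consumed in code, it helps to separate the think section from the final answer. A minimal sketch, assuming the raw text wraps the reasoning in <think>...</think> tags, as deepseek-r1 output typically looks when printed in the terminal; split_think is a hypothetical helper name.

```python
import re
from typing import Tuple

# Assumption: reasoning appears once, wrapped in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw deepseek-r1 completion."""
    match = THINK_RE.search(text)
    if match is None:
        # No think section: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the tags is the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

For example, split_think("<think>\nUse slicing.\n</think>\n\nreturn s[::-1]") yields ("Use slicing.", "return s[::-1]").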

2 DeepSeek
DeepSeek-R1 LLM

AI Large Language Model (LLM) LLM NLP

LLM
LLM

2.1 LLM

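Model tags such as deepseek-r1:1.5b, qwen:7b and llama:8b end in a size suffix.

In 1.5b, 7b and 8b, the b stands for billion: 7b means 7 billion parameters and 8b means 8 billion. These parameters are the model's weights and biases (weight+bias).
Transformer Transformer
7 billion 8 billion
ImageNet 20NewsGroup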
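The size suffix translates directly into arithmetic. A minimal sketch, assuming a plain fully connected layer for the weight+bias count and 16-bit storage (2 bytes per parameter) for the memory estimate; the function names are hypothetical, and the estimate covers weights only, not activations or the KV cache. Models pulled through Ollama are often quantized to fewer bits, so their download size can be noticeably smaller.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """A dense layer y = W x + b has n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out


def params_from_tag(tag: str) -> float:
    """Turn a size tag such as 'deepseek-r1:1.5b' into a parameter count ('b' = billion)."""
    size = tag.rsplit(":", 1)[-1].lower()
    if not size.endswith("b"):
        raise ValueError(f"unrecognised size tag: {tag!r}")
    return float(size[:-1]) * 1e9


def weight_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the weights alone; the default 2 bytes/param assumes fp16/bf16."""
    return n_params * bytes_per_param / 1e9


# One 4096 x 4096 layer: 16,777,216 weights + 4,096 biases.
print(linear_params(4096, 4096))
# qwen:7b stored in 16 bits needs about 14 GB for its weights.
print(weight_memory_gb(params_from_tag("qwen:7b")))
```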

Scaling Laws

Scaling Laws
Scaling Laws

Transformer Scaling Laws


Transformer
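Scaling laws are usually written as power laws. A minimal sketch of one common form, loss as a function of parameter count N: L(N) = (N_c / N)^alpha, where N_c and alpha are constants fitted to experiments. The constants below are purely illustrative, not measured values.

```python
def power_law_loss(n_params: float, n_c: float, alpha: float) -> float:
    """Predicted loss L(N) = (N_c / N) ** alpha for a model with N parameters."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


# Hypothetical constants, chosen only to show the shape of the curve.
N_C = 1e14
ALPHA = 0.08

for n in (1.5e9, 7e9, 8e9, 70e9):
    print(f"{n / 1e9:>5.1f}b params -> predicted loss {power_law_loss(n, N_C, ALPHA):.3f}")
```

Because the curve is a power law, every doubling of N multiplies the predicted loss by the same factor 2^(-alpha): larger models keep improving, but each doubling buys a smaller absolute drop.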

2.2 Transformer

LLMs are built on the Transformer, proposed by Google in 2017.


RNN, LSTM
The main components of the Transformer are:
1. Self-Attention
2. Multi-Head Attention
3. Feed-Forward Network (FFN)
4. Positional Encoding
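The first two components can be written out in a few lines. A minimal pure-Python sketch of the standard scaled dot-product formulation from the original Transformer, softmax(Q K^T / sqrt(d_k)) V, plus a multi-head wrapper. The projection matrices are supplied by the caller and all function names are hypothetical; DeepSeek's own models use their own attention variants, which this sketch does not reproduce.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, one softmax per query row."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(x: Matrix, w_q: List[Matrix], w_k: List[Matrix],
                         w_v: List[Matrix], w_o: Matrix) -> Matrix:
    """Project x separately for each head, attend, concatenate the heads, then apply W_O."""
    heads = [self_attention(matmul(x, wq), matmul(x, wk), matmul(x, wv))
             for wq, wk, wv in zip(w_q, w_k, w_v)]
    # Row i of the concatenation is head 0's row i, then head 1's row i, and so on.
    concat = [[val for h in heads for val in h[i]] for i in range(len(x))]
    return matmul(concat, w_o)
```

Each head gets its own projections so it can attend to different relationships; W_O then mixes the concatenated heads back to the model width.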

Transformer

1.
2.
3. AI
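The last two components in the Transformer list above, the FFN and Positional Encoding, can be sketched the same way. This assumes the original Transformer's choices: a ReLU between two linear layers, and fixed sinusoidal encodings PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Function names are hypothetical; many current LLMs swap in other activations and position schemes.

```python
import math
from typing import List

Matrix = List[List[float]]


def dense(row: List[float], w: Matrix, b: List[float]) -> List[float]:
    """row (length n_in) times w (n_in x n_out) plus bias b (length n_out)."""
    return [sum(r * w[k][c] for k, r in enumerate(row)) + b[c] for c in range(len(b))]


def ffn(x: Matrix, w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]) -> Matrix:
    """Position-wise FFN(x) = max(0, x W1 + b1) W2 + b2, applied to every position (row) separately."""
    return [dense([max(0.0, h) for h in dense(row, w1, b1)], w2, b2) for row in x]


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> Matrix:
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            i = j // 2  # dimensions 2i and 2i+1 share one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def add_positions(x: Matrix) -> Matrix:
    """Add the positional encoding to token embeddings x (seq_len x d_model)."""
    pe = sinusoidal_positional_encoding(len(x), len(x[0]))
    return [[a + p for a, p in zip(xr, pr)] for xr, pr in zip(x, pe)]
```

Attention by itself ignores token order, which is why position information has to be added to the embeddings before the first layer.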
2.3 LLM

2.3.1 Pretraining

LLM 1.
2.
3.
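LLM pretraining is typically next-token prediction. At each position the model outputs a score (logit) for every vocabulary entry, and the loss is the cross-entropy of the token that actually comes next. A minimal plain-Python sketch, assuming the logits have already been produced by some model; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def next_token_loss(logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Mean cross-entropy of predicting tokens[t + 1] from the logits at position t."""
    if len(logits) != len(tokens) or len(tokens) < 2:
        raise ValueError("need one logit vector per token and at least two tokens")
    losses = [-log_softmax(logits[t])[tokens[t + 1]] for t in range(len(tokens) - 1)]
    return sum(losses) / len(losses)
```

A model that guesses uniformly over a vocabulary of V tokens scores log(V); training pushes the loss below that by making the true next token more likely.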

2.3.2 Supervised Fine-Tuning, SFT

SFT
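SFT reuses the next-token loss, but trains on curated prompt/response pairs. A common implementation choice, assumed here rather than stated in the text, is to mask the prompt so that only response tokens contribute to the loss. sft_loss is a hypothetical name.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def sft_loss(logits: Sequence[Sequence[float]], tokens: Sequence[int], prompt_len: int) -> float:
    """Next-token cross-entropy counted only where the target is a response token.

    tokens = prompt tokens followed by response tokens. Position t predicts tokens[t + 1],
    so the response targets tokens[prompt_len:] come from positions prompt_len - 1 onward.
    """
    if not 1 <= prompt_len < len(tokens):
        raise ValueError("prompt_len must leave at least one response token")
    targets = range(prompt_len - 1, len(tokens) - 1)
    losses = [-log_softmax(logits[t])[tokens[t + 1]] for t in targets]
    return sum(losses) / len(losses)
```

Masking the prompt means the model learns how to answer rather than how to reproduce the user's question.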

2.3.3 Reinforcement Learning, RL

RL RLHF (Reinforcement Learning from Human Feedback)

RLHF

• 1

• 2

• 3
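In the widely used InstructGPT-style RLHF recipe, humans compare pairs of model responses, a reward model is trained on those preferences, and the policy is then optimized against the reward model (commonly with PPO). The reward-model step can be sketched as below. It assumes the pairwise loss -log(sigmoid(r_chosen - r_rejected)), which is a common choice rather than something the text specifies; the function names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """log(1 + e^x), computed without overflow for large |x|."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss -log(sigmoid(r_chosen - r_rejected)).

    Algebraically equal to softplus(r_rejected - r_chosen): near zero when the
    preferred response already scores much higher, large when the order is wrong.
    """
    return softplus(r_rejected - r_chosen)
```

With equal scores the loss is log 2; it only depends on the score difference, so the reward model learns a ranking rather than absolute values.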

3 DeepSeek-R1

3.1 DeepSeek-R1

DeepSeek-R1
AI RL SFT
AI
DeepSeek-V3
SFT +
(Figure 7)

Figure 7: R1

DeepSeek-R1 is trained starting from DeepSeek-V3-Base.

3.1.1 1 R1-Zero

In Figure 7, Reasoning-Oriented Reinforcement Learning produces an Interim reasoning model (Figure 8).

DeepSeek-R1
DeepSeek-R1-Zero
R1-Zero learns to produce long Chain-of-Thought (CoT) reasoning without SFT (Figure 7; see Section 3.2).

3.1.2 2

R1-Zero
DeepSeek

In Figure 7, the General Reinforcement Learning stage applies RL to the SFT checkpoint (see Section 3.3).

3.2 R1-Zero

SFT (Figure 8)
SFT

Figure 8: Interim reasoning model

DeepSeek
R1-Zero
R1-Zero SFT
(Figure 9) V3

Figure 9: R1-Zero

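Figure 10 compares R1-Zero with OpenAI o1 using two metrics: pass@1, the average accuracy over 16 sampled responses per question, and cons@16, the accuracy of the majority-vote answer over those 16 samples. DeepSeek-R1-Zero reaches performance comparable to OpenAI o1.

Figure 10: R1-Zero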
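The pass@1 and cons@16 metrics above can be computed from sampled answers as follows. A minimal sketch, assuming each question has one reference answer that can be checked by exact string match. With 16 samples per question, pass_at_1 gives pass@1 and majority_vote_accuracy gives cons@16. Both names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def pass_at_1(samples: Sequence[Sequence[str]], answers: Sequence[str]) -> float:
    """Mean over questions of the fraction of sampled answers that are correct."""
    per_question = [sum(s == a for s in ss) / len(ss) for ss, a in zip(samples, answers)]
    return sum(per_question) / len(per_question)


def majority_vote_accuracy(samples: Sequence[Sequence[str]], answers: Sequence[str]) -> float:
    """Fraction of questions whose most common sampled answer is correct (cons@k).

    Ties go to the answer that appears first, since Counter.most_common
    orders equal counts by first occurrence.
    """
    hits = [Counter(ss).most_common(1)[0][0] == a for ss, a in zip(samples, answers)]
    return sum(hits) / len(hits)
```

Majority voting is usually at least as high as pass@1 because a model that is right more often than it is wrong on a question gets full credit for it, which is why the cons@16 curve sits above the pass@1 curve.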

3.3

Preference Tuning (Figure 11)
R1

helpfulness, safety, Llama

DeepSeek-R1 R1-Zero
AI

Figure 11: R1
3.4 DeepSeek-R1

Reasoning-Oriented RL
CoT

DeepSeek-R1 R1-Zero

AI


4
https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1
https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts
