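OK, here is a **timeline that runs from "NIC receives a packet → interrupt → kernel → thread woken → user `read` returns"**. It is annotated with **rough timings starting at the nanosecond scale**. These are orders of magnitude only, and real figures depend on the hardware, driver, and kernel configuration. Play it in your head like a movie, step by step, until the relationship between threads, interrupts, and the scheduler is locked in.

---

# 0) Initial state (t = 0)

```text id="t0"
The CPU is running thread A (your program)
Thread A calls read(socket)
→ the kernel finds no data in the socket's receive buffer
→ thread A blocks (goes to sleep)
→ the scheduler switches to thread B (for example, another program)
```

👉 Current state:

```text id="state0"
The CPU is running thread B
Thread A is waiting for data (asleep)
```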
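Here is a minimal user-space sketch of this starting state. It uses Python's standard `socket` and `threading` modules. A local `socket.socketpair()` stands in for the network socket, so no NIC is involved, and a worker thread plays thread A. Calling `recv()` on an empty blocking socket puts that thread to sleep inside the kernel until the peer writes. `blocking_read_demo` is a hypothetical helper name.

```python
import socket
import threading
import time


def blocking_read_demo(payload: bytes = b"hello", delay_s: float = 0.05):
    """Return (was_still_blocked, data) for a thread blocked in recv()."""
    sock_a, sock_b = socket.socketpair()  # local connected pair, no NIC involved
    result = {}

    def thread_a():
        # Empty receive buffer: the kernel puts this thread to sleep here.
        result["data"] = sock_a.recv(1024)

    t = threading.Thread(target=thread_a)
    t.start()
    time.sleep(delay_s)              # meanwhile thread A sits blocked in the kernel
    was_still_blocked = t.is_alive()
    sock_b.sendall(payload)          # data arrives; the kernel wakes thread A
    t.join(timeout=2.0)
    sock_a.close()
    sock_b.close()
    return was_still_blocked, result.get("data")
```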
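---

# 1) The NIC receives data (t ≈ on the order of 10 ns)

```text id="t1"
The NIC receives a packet (from the network)
```

Inside the NIC:

```text id="t1_1"
1) Hash the packet headers → select an RX queue (RSS)
2) Take the next free receive descriptor in that queue's ring
3) Prepare the DMA transfer into the buffer that descriptor points to
```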
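Step 1 can be sketched in Python as follows. The sketch assumes the NIC implements receive-side scaling (RSS) with a Toeplitz hash over the IPv4 source/destination addresses and TCP ports, followed by a lookup in an indirection table. The property it shows is that packets of the same flow always land on the same RX queue. The 40-byte key, the 128-entry round-robin table, and the helper names are illustrative assumptions. Real NICs ship their own default key, and drivers may reprogram both the key and the table.

```python
import ipaddress
import struct

# Hypothetical example key: 40 arbitrary bytes (real NICs ship their own default).
RSS_KEY = bytes((i * 37 + 11) % 256 for i in range(40))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for each set input bit i, XOR in key bits i..i+31."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short for this input")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):  # input bit i, MSB first
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def select_rx_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    num_queues: int = 4, table_size: int = 128) -> int:
    # Hash input for IPv4/TCP: src addr, dst addr, src port, dst port.
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))
    h = toeplitz_hash(RSS_KEY, data)
    # Indirection table spread round-robin over the queues;
    # the low bits of the hash select an entry.
    table = [i % num_queues for i in range(table_size)]
    return table[h % table_size]
```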
---

# 2) DMA writes into memory (t ≈ 100 ns)

```text id="t2"
NIC → DMA → memory (the RX ring buffer)
```

👉 At this point:

```text id="state1"
The data is already in memory
The CPU knows nothing about it yet
Thread A is still asleep
```
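This is a toy model of an RX descriptor ring that makes "the data is in memory but the CPU doesn't know" concrete. It assumes a simplified descriptor holding a buffer, a length, and a "DMA done" status bit; real descriptor layouts are NIC-specific. The class and method names are hypothetical. `nic_dma_write` plays the NIC. `driver_poll` plays the CPU side, which only learns about a packet when it actually looks at the ring.

```python
from dataclasses import dataclass, field


@dataclass
class RxDescriptor:
    buf: bytearray = field(default_factory=lambda: bytearray(2048))
    length: int = 0
    done: bool = False  # set by the "NIC" once the DMA write has finished


class RxRing:
    def __init__(self, size: int = 8):
        self.descs = [RxDescriptor() for _ in range(size)]
        self.nic_tail = 0    # next descriptor the NIC will fill
        self.cpu_head = 0    # next descriptor the driver will read

    def nic_dma_write(self, packet: bytes) -> bool:
        d = self.descs[self.nic_tail]
        if d.done:           # ring full: the driver has not consumed this slot yet
            return False     # a real NIC would drop the packet and count the drop
        d.buf[:len(packet)] = packet
        d.length = len(packet)
        d.done = True        # data is in memory; no interrupt, no CPU involved yet
        self.nic_tail = (self.nic_tail + 1) % len(self.descs)
        return True

    def driver_poll(self):
        """CPU side: collect every completed descriptor, then recycle it."""
        packets = []
        while self.descs[self.cpu_head].done:
            d = self.descs[self.cpu_head]
            packets.append(bytes(d.buf[:d.length]))
            d.done = False   # hand the slot back to the NIC
            self.cpu_head = (self.cpu_head + 1) % len(self.descs)
        return packets
```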
---

# 3) The NIC decides whether to raise an interrupt (t ≈ 100 ns to μs)

That depends on the interrupt coalescing policy:

```text id="t3"
Possible outcomes:
- interrupt immediately
- wait and raise one interrupt for several packets
```

Assume it fires now:

```text id="t3_1"
The NIC sends an MSI (a PCIe memory write)
```
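The coalescing choice can be modeled as a simple rule. The model loosely mirrors the two receive knobs Linux exposes through `ethtool -C` (`rx-usecs` and `rx-frames`): fire when enough packets have accumulated, or when the oldest pending packet has waited long enough. This is a simplified sketch under that assumption. Actual NIC timers and adaptive coalescing behave differently per vendor, and the class name is hypothetical. Setting `rx_frames=1` reproduces the "interrupt immediately" case.

```python
class RxCoalescer:
    """Toy model: raise an IRQ after rx_frames packets or rx_usecs of waiting."""

    def __init__(self, rx_usecs: float, rx_frames: int):
        self.rx_usecs = rx_usecs
        self.rx_frames = rx_frames
        self.pending = 0
        self.first_arrival_us = None  # arrival time of the oldest unsignalled packet

    def _fire(self) -> bool:
        self.pending = 0
        self.first_arrival_us = None
        return True

    def on_packet(self, now_us: float) -> bool:
        """Called per received packet; returns True if an IRQ is raised now."""
        if self.pending == 0:
            self.first_arrival_us = now_us
        self.pending += 1
        if self.pending >= self.rx_frames:
            return self._fire()
        return self.on_timer(now_us)

    def on_timer(self, now_us: float) -> bool:
        """Called periodically; fires if the oldest pending packet waited too long."""
        if self.pending and now_us - self.first_arrival_us >= self.rx_usecs:
            return self._fire()
        return False
```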
---

# 4) The interrupt reaches the CPU (t ≈ μs)

```text id="t4"
The Local APIC receives the interrupt
→ and signals its CPU core (say, CPU2)
```

The CPU at this moment:

```text id="state2"
is executing thread B
```
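---

# 5) The CPU takes the interrupt (the key point)

At an instruction boundary:

```text id="t5"
CPU:
1) saves the interrupted context (the hardware pushes the return address and flags;
   the kernel's entry code saves the rest of thread B's registers)
2) looks up the handler in the IDT
3) jumps to the NIC's interrupt service routine (ISR)
```

👉 Now:

```text id="state3"
The CPU is not running any thread's code
The CPU is running "interrupt handling code"
```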
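This toy model shows that interrupts are only taken at instruction boundaries. It assumes a fake CPU that runs thread B as a list of Python callables, one per "instruction". The fake CPU checks a pending-interrupt flag between instructions and dispatches through a dict standing in for the IDT. An IRQ that "arrives" during an instruction is only handled after that instruction completes. The saved context is restored afterwards, so the ISR's register clobbering never leaks into thread B. Vector `0x40` and all names are hypothetical; none of this reflects real x86 encoding.

```python
class ToyCPU:
    """Toy CPU: interrupts are only recognised between 'instructions'."""

    def __init__(self):
        self.regs = {"acc": 0}      # thread B's architectural state
        self.idt = {}               # vector -> handler, standing in for the IDT
        self.pending_vector = None  # latched by the "Local APIC"
        self.log = []

    def run(self, instructions, irq_arrives_during=None, vector=0x40):
        for step, instr in enumerate(instructions):
            if step == irq_arrives_during:
                self.pending_vector = vector      # IRQ arrives mid-instruction...
            instr(self.regs)                      # ...but the instruction completes
            if self.pending_vector is not None:   # instruction boundary: check IRQs
                vec, self.pending_vector = self.pending_vector, None
                saved = dict(self.regs)           # 1) save thread B's context
                handler = self.idt[vec]           # 2) look up the IDT
                handler(self)                     # 3) run the ISR
                self.regs = saved                 # restore context, resume thread B
                self.log.append(("irq", vec, "after_step", step))
        return self.regs["acc"]
```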
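---

# 6) The interrupt "top half" (hardirq, very short)

```text id="t6"
ISR (interrupt service routine):
- masks this device's (RX queue's) interrupt, to prevent an interrupt storm
- records that follow-up work is needed (schedules NAPI, raising the NET_RX softirq)
- returns immediately
```

Time spent:

```text id="t6_1"
Typically under 5 μs (it must be very short)
```

👉 At this point:

```text id="state4"
The data has not really been processed yet
It has only been "registered"
```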
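Here is the top-half pattern modeled in Python. The assumptions are a hypothetical `RxQueue` with an interrupt-enable flag and a per-CPU structure holding a poll list and a set of pending softirqs. In the Linux kernel, the corresponding steps are roughly "mask the queue's IRQ, call `napi_schedule()`, which raises `NET_RX_SOFTIRQ`". The Python names below only mimic that shape, and no packet data is touched.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class RxQueue:
    name: str
    irq_enabled: bool = True


@dataclass
class PerCpu:
    poll_list: list = field(default_factory=list)      # queues awaiting softirq work
    softirq_pending: set = field(default_factory=set)  # raised softirq types


def nic_hardirq(q: RxQueue, cpu: PerCpu) -> None:
    """Top half: do the minimum and return; no packet is touched here."""
    q.irq_enabled = False              # mask this queue's IRQ (storm protection)
    if q not in cpu.poll_list:
        cpu.poll_list.append(q)        # like napi_schedule(): queue for polling
    cpu.softirq_pending.add("NET_RX")  # defer the real work to the softirq
```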
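---

# 7) Entering the softirq (the bottom half, where the real work happens)

```text id="t7"
The CPU enters the softirq (NET_RX)
```

This step may:

```text id="t7_1"
- run immediately (on the way out of the hardirq)
- or be deferred and run later (for example, in the ksoftirqd kernel thread under heavy load)
```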
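The "now or later" choice can be sketched as follows, loosely modeled on the restart limit in Linux's `__do_softirq()`. Softirqs run inline as the hardirq exits. If handlers keep raising new work for too many rounds, the remainder is handed to the per-CPU `ksoftirqd` thread so the interrupted task is not starved. The limit of 10 rounds is an assumption for illustration. The kernel's additional time budget and need-resched check are omitted, and the function name is hypothetical.

```python
def irq_exit_softirq(handler_reraised, max_restart: int = 10):
    """
    Run pending softirq work inline as the hardirq exits. While handlers keep
    re-raising work, loop again, but after `max_restart` rounds hand the rest
    to the ksoftirqd thread instead of starving the interrupted task.

    handler_reraised(round_no) -> True if that round left more work pending.
    Returns (rounds_run_inline, deferred_to_ksoftirqd).
    """
    rounds = 0
    pending = True
    while pending:
        rounds += 1
        pending = handler_reraised(rounds)
        if pending and rounds >= max_restart:
            return rounds, True   # "wake ksoftirqd": finish later as a thread
    return rounds, False
```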
---

## What the softirq does (the core)

```text id="t7_2"
1) pulls packets from the ring buffer in batches
2) hands them to the network protocol stack (IP / TCP)
3) finds the matching socket
4) puts the data into that socket's receive buffer
```
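Here is a sketch of that batch-processing loop, assuming a NAPI-style budget. The poll function takes at most `budget` packets per call (64 matches the common default NAPI weight). It delivers each packet to the socket whose port matches, as a crude stand-in for real IP/TCP demultiplexing. It reports the IRQ as re-enabled only when it finishes under budget, meaning the ring ran dry. Packets are modeled as `(dst_port, payload)` tuples and sockets as a dict of deques. These representations and the function name are assumptions for illustration.

```python
from collections import deque


def net_rx_poll(ring: deque, sockets: dict, budget: int = 64):
    """NAPI-style poll: handle at most `budget` packets.

    Returns (processed, irq_reenabled). If the ring ran dry before the budget
    was used up, polling is finished and the queue's IRQ can be re-enabled.
    """
    processed = 0
    while ring and processed < budget:
        dst_port, payload = ring.popleft()     # 1) take a packet from the ring
        rx_queue = sockets.get(dst_port)       # 2-3) toy "stack": find the socket
        if rx_queue is not None:
            rx_queue.append(payload)           # 4) queue data on the socket
        # packets for unknown ports are silently dropped in this sketch
        processed += 1
    irq_reenabled = processed < budget         # under budget: back to interrupts
    return processed, irq_reenabled
```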
---

# 8) Waking thread A (the turning point)

```text id="t8"
The kernel finds that:
thread A is waiting on this socket (it sits on the socket's wait queue)
```

So:

```text id="t8_1"
wake_up(thread A)
→ thread A goes from sleeping → runnable (ready)
```

👉 Now:

```text id="state5"
Thread A is able to run
but is not running yet
```
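The state change can be modeled without real threads. The model assumes a hypothetical `Task` with a state field, a per-socket wait queue, and a run queue. The point it encodes is that `wake_up` only moves thread A from the wait queue onto a run queue and marks it runnable. Nothing here makes A actually execute; that is the scheduler's job.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)
class Task:
    name: str
    state: str = "RUNNING"   # RUNNING, RUNNABLE, or SLEEPING


class Socket:
    def __init__(self):
        self.rx_data = deque()
        self.wait_queue = deque()   # tasks blocked in read() on this socket


def block_on_read(task: Task, sock: Socket) -> None:
    task.state = "SLEEPING"
    sock.wait_queue.append(task)


def deliver_and_wake(sock: Socket, payload: bytes, run_queue: deque) -> None:
    """Softirq side: queue the data, then wake the waiters (simplified: all)."""
    sock.rx_data.append(payload)
    while sock.wait_queue:
        task = sock.wait_queue.popleft()
        task.state = "RUNNABLE"      # ready to run, but not running
        run_queue.append(task)       # the scheduler decides when it runs
```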
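---

# 9) The scheduler decides whether to switch (the key part!)

At this moment the CPU is still:

```text id="state6"
executing the softirq (it has not returned to thread B yet)
```

Two cases follow.

---

## Case A: no immediate preemption (common)

```text id="t9A"
the softirq finishes
→ returns from the interrupt
→ thread B resumes
```

Then, at some later point:

```text id="t9A_1"
the scheduler runs at the next scheduling point
(timer tick, or thread B blocks or yields)
finds that thread A should run (for example, it has higher priority)
→ switches to thread A
```

---

## Case B: immediate preemption

```text id="t9B"
the wakeup marks thread B as "needs rescheduling"
→ on the way out of the softirq/interrupt, the kernel sees that flag and calls the scheduler
→ switches directly to thread A
```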
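Whether case A or case B happens boils down to a check made at wakeup time. The sketch below is loosely modeled on the wakeup-preemption rule of the older CFS scheduler: preempt if the running task's virtual runtime exceeds the woken task's by more than a wakeup granularity. Kernels since 6.6 use EEVDF, which decides differently. The 1 ms granularity and all names here are illustrative assumptions. `return_from_interrupt` also simplifies by switching straight to the woken task, where the real scheduler would pick the best runnable task.

```python
from dataclasses import dataclass

WAKEUP_GRANULARITY_NS = 1_000_000   # illustrative 1 ms


@dataclass
class Task:
    name: str
    vruntime_ns: int
    need_resched: bool = False


def check_preempt_wakeup(curr: Task, woken: Task,
                         gran_ns: int = WAKEUP_GRANULARITY_NS) -> bool:
    """Set curr.need_resched if the woken task should preempt it (case B)."""
    if curr.vruntime_ns - woken.vruntime_ns > gran_ns:
        curr.need_resched = True
    return curr.need_resched


def return_from_interrupt(curr: Task, woken: Task) -> str:
    """On the way back out, honour the flag (case B) or resume curr (case A)."""
    if curr.need_resched:
        curr.need_resched = False
        return woken.name        # simplified: schedule() picks the woken task
    return curr.name             # thread B simply continues
```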
---

# 10) Thread A resumes (t ≈ μs to ms)

```text id="t10"
The CPU starts executing thread A
```

At this point:

```text id="t10_1"
thread A resumes inside the kernel's read path
→ copies the data from the socket receive buffer into buf
→ read() returns the number of bytes copied
```

👉 What the user program sees:

```text id="state7"
"The data has arrived"
```
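Here is one way to get a feel for the "μs to ms" figure on your own machine. It uses Python's `time.perf_counter_ns()` and a local `socketpair()`, and it timestamps just before the peer writes and just after the blocked `recv()` returns. Because no NIC is involved, this measures only the wakeup, scheduling, and copy part of the chain, not the DMA or interrupt steps. The helper name is hypothetical, and results vary widely with load and kernel configuration. For example, `measure_wakeup_latency_ns()[1] / 1000` gives the figure in microseconds.

```python
import socket
import threading
import time


def measure_wakeup_latency_ns(payload: bytes = b"ping"):
    """Return (data, ns from just before sendall() to the blocked recv() returning)."""
    reader, writer = socket.socketpair()
    out = {}
    ready = threading.Event()

    def thread_a():
        ready.set()
        data = reader.recv(1024)          # blocks until the write below
        out["t_recv_done"] = time.perf_counter_ns()
        out["data"] = data

    t = threading.Thread(target=thread_a)
    t.start()
    ready.wait()
    time.sleep(0.01)                      # give thread A time to block in recv()
    t_send = time.perf_counter_ns()
    writer.sendall(payload)
    t.join(timeout=2.0)
    reader.close()
    writer.close()
    return out["data"], out["t_recv_done"] - t_send
```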
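---

# 11) Compressing the whole chain into one "golden path"

```text id="gold"
NIC receives a packet
→ DMA writes it into memory
→ an interrupt is raised
→ the CPU enters the ISR (registers the work)
→ the softirq processes the data
→ the data goes into the socket buffer
→ thread A is woken
→ the scheduler switches to it
→ thread A continues executing read()
```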
---

# 12) The three things to lock in

---

## 1) What does the interrupt do?

```text id="lock1"
Interrupt = telling the CPU: "something happened"
```

---

## 2) What does DMA do?

```text id="lock2"
DMA = puts the data into memory ahead of time
(without waiting for the CPU)
```

---

## 3) What does the thread do?

```text id="lock3"
Thread = the execution entity that ultimately processes the data
```
---

# 13) The one sentence to internalize

```text id="final"
The interrupt does not "process your data";
it only pulls the CPU into the kernel.
The real work is done by the softirq and the thread,
and who gets to run is decided by the scheduler.
```
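---

Once this timeline feels completely natural, the next, more hardcore step is:

👉 **Why can DPDK cut out this whole chain (no interrupts, no kernel network stack, no scheduler wakeups) and still be faster?**

That layer is the "extreme performance model".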