441 lines
6.2 KiB
Markdown
441 lines
6.2 KiB
Markdown
好,直接给你一版**“一屏吃透文件系统数据结构”**,从最底层磁盘组织 → inode → 数据块索引 → 目录结构 → 空闲空间管理 → 现代高级结构,一次性压满信息密度,尽量做到你说的“每屏最大信息量”。
|
||
|
||
---
|
||
|
||
# 一、文件系统的本质抽象(先立一个统一模型)
|
||
|
||
你可以把整个文件系统抽象成三张核心表:
|
||
|
||
```text
|
||
(1) inode表: inode_id → metadata(权限/大小/时间/指针)
|
||
(2) 数据映射: (inode_id, offset) → block
|
||
(3) 目录映射: (parent_inode, filename) → inode_id
|
||
```
|
||
|
||
👉 所有“文件结构”本质都在优化这三件事:
|
||
|
||
* 如何表示文件数据(第2条)
|
||
* 如何快速查找文件名(第3条)
|
||
* 如何管理空间(块分配)
|
||
|
||
---
|
||
|
||
# 二、文件数据的组织方式(核心重点)
|
||
|
||
## 1️⃣ 连续分配(Contiguous Allocation)
|
||
|
||
```text
|
||
inode:
|
||
start_block
|
||
length
|
||
```
|
||
|
||
```text
|
||
[block100][block101][block102][block103]...
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* 顺序读写极快(完美局部性)
|
||
* 实现简单
|
||
|
||
✘ 缺点:
|
||
|
||
* 文件增长困难(需要搬迁)
|
||
* 外部碎片严重
|
||
|
||
👉 现在基本不用(但思想在 extent 里复活)
|
||
|
||
---
|
||
|
||
## 2️⃣ 链式分配(Linked Allocation)
|
||
|
||
```text
|
||
block100 → block500 → block20 → block9 ...
|
||
```
|
||
|
||
inode 只存:
|
||
|
||
```text
|
||
first_block
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* 无外部碎片
|
||
* 动态扩展容易
|
||
|
||
✘ 致命问题:
|
||
|
||
* 随机访问 O(n)
|
||
* 每个block都要存指针(浪费)
|
||
* cache locality 极差
|
||
|
||
👉 FAT 文件系统就是这个思路(用数组模拟链表)
|
||
|
||
---
|
||
|
||
## 3️⃣ FAT(File Allocation Table)
|
||
|
||
本质:
|
||
|
||
```text
|
||
FAT[block] = next_block
|
||
```
|
||
|
||
```text
|
||
FAT:
|
||
100 → 500
|
||
500 → 20
|
||
20 → EOF
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* 比链式更容易管理
|
||
* 支持随机访问(通过表)
|
||
|
||
✘ 问题:
|
||
|
||
* FAT表巨大(必须常驻内存)
|
||
* 可靠性差(单点损坏)
|
||
|
||
---
|
||
|
||
## 4️⃣ 多级索引(inode经典结构,ext2/3)
|
||
|
||
inode结构:
|
||
|
||
```text
|
||
inode:
|
||
direct[12]
|
||
indirect
|
||
double_indirect
|
||
triple_indirect
|
||
```
|
||
|
||
访问逻辑:
|
||
|
||
```text
|
||
offset → 第几个block → 走指针树
|
||
```
|
||
|
||
结构:
|
||
|
||
```text
|
||
inode
|
||
/ | \
|
||
direct indirect double
|
||
|
|
||
[block pointers]
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* 支持大文件
|
||
* 随机访问 O(1)(理论)
|
||
|
||
✘ 问题:
|
||
|
||
* 小文件浪费指针
|
||
* 大文件访问多次IO(多级跳转)
|
||
|
||
👉 这是 UNIX 经典设计
|
||
|
||
---
|
||
|
||
## 5️⃣ Extent(现代主流,ext4)
|
||
|
||
核心思想:
|
||
|
||
👉 不记录“每个block”,而记录“连续区间”
|
||
|
||
```text
|
||
extent:
|
||
(logical_start, physical_start, length)
|
||
```
|
||
|
||
例:
|
||
|
||
```text
|
||
(0, 100, 50) // 文件0-49 → 磁盘100-149
|
||
```
|
||
|
||
结构:
|
||
|
||
```text
|
||
inode
|
||
└── extent tree
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* 极大减少元数据
|
||
* 顺序读写性能好
|
||
* 减少碎片
|
||
|
||
👉 本质:**连续分配的升级版 + 树结构管理**
|
||
|
||
---
|
||
|
||
## 6️⃣ B-tree / B+tree(XFS / Btrfs)
|
||
|
||
```text
|
||
[key]
|
||
/ | \
|
||
child child child
|
||
```
|
||
|
||
key可能是:
|
||
|
||
```text
|
||
(file_offset → block)
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* O(log n) 查找
|
||
* 高扇出 → 极少IO
|
||
* 适合大规模数据
|
||
|
||
👉 XFS:extent索引用B+tree
|
||
👉 Btrfs:所有东西都是B-tree
|
||
|
||
---
|
||
|
||
## 7️⃣ Copy-on-Write(Btrfs / ZFS核心)
|
||
|
||
写数据时:
|
||
|
||
```text
|
||
旧块不改 → 写新块 → 更新指针
|
||
```
|
||
|
||
结构:
|
||
|
||
```text
|
||
旧树 ----\
|
||
\--> 新树(部分共享)
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* snapshot免费
|
||
* 崩溃一致性好
|
||
* 支持版本回滚
|
||
|
||
✘ 缺点:
|
||
|
||
* 写放大
|
||
* 碎片增加
|
||
|
||
---
|
||
|
||
# 三、目录结构(名字 → inode)
|
||
|
||
## 1️⃣ 线性表(ext2早期)
|
||
|
||
```text
|
||
"file1" → inode 10
|
||
"file2" → inode 20
|
||
```
|
||
|
||
查找:
|
||
|
||
```text
|
||
O(n)
|
||
```
|
||
|
||
---
|
||
|
||
## 2️⃣ 哈希表(HTree,ext4)
|
||
|
||
```text
|
||
hash(filename) → bucket → inode
|
||
```
|
||
|
||
结构:
|
||
|
||
```text
|
||
root
|
||
/ \
|
||
bucket bucket
|
||
```
|
||
|
||
✔ 平均:
|
||
|
||
```text
|
||
O(log n)
|
||
```
|
||
|
||
---
|
||
|
||
## 3️⃣ B-tree目录(XFS / Btrfs)
|
||
|
||
```text
|
||
filename → inode
|
||
```
|
||
|
||
用B+tree组织:
|
||
|
||
✔ 优点:
|
||
|
||
* 有序(支持range scan)
|
||
* 高效查找
|
||
|
||
---
|
||
|
||
# 四、空闲空间管理(磁盘怎么分配)
|
||
|
||
## 1️⃣ 位图(bitmap)
|
||
|
||
```text
|
||
1 = used
|
||
0 = free
|
||
```
|
||
|
||
```text
|
||
001011000100...
|
||
```
|
||
|
||
✔ 优点:
|
||
|
||
* 查找连续空间方便
|
||
* 实现简单
|
||
|
||
---
|
||
|
||
## 2️⃣ 空闲链表
|
||
|
||
```text
|
||
free → free → free
|
||
```
|
||
|
||
✘ 问题:
|
||
|
||
* 找连续空间困难
|
||
|
||
---
|
||
|
||
## 3️⃣ 分组(grouping)
|
||
|
||
一次记录多个空闲块:
|
||
|
||
```text
|
||
block → [free1, free2, free3...]
|
||
```
|
||
|
||
---
|
||
|
||
## 4️⃣ B-tree管理(现代FS)
|
||
|
||
```text
|
||
free_extent:
|
||
(start, length)
|
||
```
|
||
|
||
用树管理:
|
||
|
||
✔ 高效查找大块连续空间
|
||
|
||
---
|
||
|
||
# 五、日志(Journaling)
|
||
|
||
目的:
|
||
|
||
👉 防止系统崩溃导致文件系统损坏
|
||
|
||
流程:
|
||
|
||
```text
|
||
写日志 → 提交 → 写实际数据
|
||
```
|
||
|
||
模式:
|
||
|
||
1. write-ahead logging
|
||
2. metadata journaling(ext4默认)
|
||
3. full journaling(慢但安全)
|
||
|
||
---
|
||
|
||
# 六、把所有结构统一成一张“全景图”
|
||
|
||
```text
|
||
用户操作:
|
||
open("/a/b/file")
|
||
|
||
↓ 路径解析
|
||
目录结构(HTree / B-tree)
|
||
(parent_inode, name) → inode
|
||
|
||
↓ inode
|
||
inode table
|
||
|
||
↓ 数据定位
|
||
extent / B-tree / indirect block
|
||
|
||
↓ block
|
||
page cache → disk
|
||
|
||
↓ 磁盘
|
||
物理block
|
||
```
|
||
|
||
---
|
||
|
||
# 七、一句话总结(给你压缩记忆)
|
||
|
||
```text
|
||
文件系统 = inode + 数据索引结构 + 目录索引 + 空闲管理 + 一致性机制
|
||
```
|
||
|
||
演化路径:
|
||
|
||
```text
|
||
连续 → 链式 → FAT → inode多级索引 → extent → B-tree → CoW
|
||
```
|
||
|
||
---
|
||
|
||
# 八、你现在应该形成的“认知模型”(很重要)
|
||
|
||
你可以这样理解:
|
||
|
||
```text
|
||
早期FS:像数组 + 指针
|
||
现代FS:像数据库(KV + B-tree + WAL)
|
||
```
|
||
|
||
---
|
||
|
||
# 九、如果你继续往下深挖(我建议你下一步搞这个)
|
||
|
||
下一步最关键的是搞清楚:
|
||
|
||
👉 一次 read() 发生了什么:
|
||
|
||
```text
|
||
用户态
|
||
→ VFS
|
||
→ dentry cache
|
||
→ inode cache
|
||
→ page cache
|
||
→ block layer
|
||
→ IO scheduler
|
||
→ disk
|
||
```
|
||
|
||
这个链路一通:
|
||
|
||
👉 文件系统 + 虚拟内存 + IO 子系统就串起来了
|
||
|
||
---
|
||
|
||
如果你下一步想继续,我可以给你一版:
|
||
|
||
👉 **“从 open 到 read 到 page cache 命中/缺页的全流程 + 内核数据结构图(task_struct / file / inode / dentry 全串)”**
|
||
|
||
那个是操作系统理解的分水岭。
|