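You can remember the **inode (index node)** in one sentence:

👉 **inode = a file's "unique identity + metadata + entry point for locating its data"**

It does not store the file name; it only describes "what this file is and where its data lives".

Below is a dense walkthrough of the **full chain from on-disk structure → kernel object → system calls**, aligned with modern Linux (what ext4 / XFS / Btrfs share, and where they differ).

---
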
# 1. What an inode is on disk (on-disk)

Taking ext4 as the typical case (other filesystems follow similar ideas with different structures):

```text
inode (128 B / 256 B / extensible)
├── i_mode         (file type + rwx permissions)
├── i_uid / i_gid  (owner)
├── i_size         (file size)
├── i_atime        (access time)
├── i_mtime        (modification time)
├── i_ctime        (status change time)
├── i_links_count  (hard link count)
├── i_blocks       (blocks occupied)
├── i_flags        (immutable / append-only, etc.)
├── i_block        (the core: data pointers or extent root)
└── extra          (ACL / xattr / checksum ...)
```

Key point:

```text
inode number = the file's unique ID (within one filesystem)
```

👉 Disk layout (simplified; the block and inode bitmaps are omitted):

```text
[superblock][group desc][inode table][data blocks]
                         ↑
                         inodes live here
```

👉 Every file = one inode

👉 Inodes are either **preallocated** (ext4, where the inode count is fixed when the filesystem is created) or allocated dynamically (XFS / Btrfs)

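Most of the fields listed above can be inspected from user space through `stat()`. Here is a minimal sketch, assuming Python on a POSIX system; `describe_inode` is a name made up for this illustration, not part of any library.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Return the inode metadata that stat() exposes for path."""
    st = os.stat(path)
    return {
        "inode_number": st.st_ino,
        "file_type_and_mode": stat.filemode(st.st_mode),  # e.g. "-rw-------"
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size_bytes": st.st_size,
        "hard_links": st.st_nlink,
        "mtime": st.st_mtime,
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello")
        tmp.flush()
        for field, value in describe_inode(tmp.name).items():
            print(f"{field}: {value}")
```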
---

# 2. What an inode does not store (very important)

An inode **does not store the file name**; file names live in directories:

```text
directory entry
"name" → inode number
```

So:

```text
multiple file names → the same inode (hard links)
```

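To see that a directory really is a "name → inode number" table, you can list one. A small sketch, assuming Python on a POSIX filesystem that supports hard links; `list_directory_entries` is a hypothetical helper name.

```python
import os
import tempfile


def list_directory_entries(directory):
    """Return {name: inode number} for every entry in directory."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "file1")
        with open(original, "w") as handle:
            handle.write("data")
        # A hard link is just a second directory entry for the same inode.
        os.link(original, os.path.join(workdir, "file2"))
        print(list_directory_entries(workdir))
```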
---

# 3. How an inode locates data (the core mechanism)

This is where the key to modern filesystems lies.

## 1️⃣ Legacy: multi-level index (ext2/3)

```text
inode.i_block:
  [ direct pointers ]
  [ single indirect ]
  [ double indirect ]
  [ triple indirect ]
```

Lookup path:

```text
offset → block index → pointer hops → data block
```

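The "offset → block index → pointer hops" step can be sketched as plain arithmetic. This is a toy model, not filesystem code: it assumes the classic ext2 layout of 12 direct pointers (a number the text above does not state) and 1024 pointers per indirect block, which corresponds to 4 KiB blocks with 4-byte block pointers. The function names are made up for this example.

```python
DIRECT_POINTERS = 12  # classic ext2/ext3 layout (assumption, see lead-in)


def block_index_for_offset(offset, block_size=4096):
    """Convert a byte offset into a logical block index."""
    return offset // block_size


def locate_block(block_index, pointers_per_block=1024):
    """Return (level, slots): which indirection level serves this block,
    and the slot followed at each hop below the inode."""
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    n = pointers_per_block
    if block_index < DIRECT_POINTERS:
        return "direct", (block_index,)
    block_index -= DIRECT_POINTERS
    if block_index < n:
        return "single", (block_index,)
    block_index -= n
    if block_index < n * n:
        return "double", divmod(block_index, n)
    block_index -= n * n
    if block_index < n ** 3:
        first, rest = divmod(block_index, n * n)
        return "triple", (first, *divmod(rest, n))
    raise ValueError("offset exceeds what triple indirection can address")


if __name__ == "__main__":
    for offset in (0, 48 * 1024, 5 * 1024 * 1024, 5 * 1024 ** 3):
        print(offset, locate_block(block_index_for_offset(offset)))
```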
---

## 2️⃣ Modern: extent tree (ext4)

```text
inode
 └── extent tree root
       ├── (logical → physical, length)
       ├── ...
```

👉 One extent describes one contiguous run of blocks:

```text
(0 → block100, len=50)
```

That single record maps logical blocks 0 to 49 onto physical blocks 100 to 149.

Advantages:

```text
less metadata + faster sequential reads + less fragmentation
```

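An extent lookup is essentially a search over sorted ranges. The sketch below is a flat-list stand-in for the tree (a real extent tree adds index nodes on top); `Extent` and `lookup` are names invented for this illustration, and the second extent in the demo is made-up data.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    logical_start: int   # first logical block covered
    physical_start: int  # first physical block on disk
    length: int          # number of contiguous blocks


def lookup(extents, logical_block) -> Optional[int]:
    """Map a logical block to a physical block, or None for a hole.

    extents must be sorted by logical_start and must not overlap.
    """
    starts = [extent.logical_start for extent in extents]
    pos = bisect_right(starts, logical_block) - 1
    if pos < 0:
        return None
    extent = extents[pos]
    offset = logical_block - extent.logical_start
    if offset >= extent.length:
        return None  # falls in a gap after this extent
    return extent.physical_start + offset


if __name__ == "__main__":
    extents = [Extent(0, 100, 50), Extent(50, 400, 10)]
    for block in (0, 49, 50, 60):
        print(block, lookup(extents, block))
```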
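---

## 3️⃣ XFS / Btrfs: B+tree / B-tree

For a large file, the inode holds little more than:

```text
root pointer
```

The actual mapping lives in a tree:

```text
inode → B+tree → extent → block
```

The details differ: XFS keeps a short extent list inline in the inode and switches to a B+tree only once that list no longer fits, while Btrfs stores file extents as items in the filesystem B-tree, keyed by inode number.

👉 In essence: the inode is only the entry point; the block mapping lives in a tree structure

---
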
# 4. The inode inside the kernel (in-memory)

Once an on-disk inode is loaded into memory it becomes:

```c
/* Abridged from the kernel's struct inode; most fields are omitted. */
struct inode {
    umode_t                        i_mode;    /* file type + permissions */
    kuid_t                         i_uid;     /* owner */
    loff_t                         i_size;    /* file size in bytes */

    struct super_block            *i_sb;      /* owning filesystem */

    struct address_space          *i_mapping; /* page cache */

    const struct inode_operations *i_op;      /* lookup, create, ... */
    const struct file_operations  *i_fop;     /* read, write, ... */

    atomic_t                       i_count;   /* reference count */
};
```

Key understanding:

```text
on-disk inode   = static data
in-memory inode = live object + behavior (function pointers)
```

👉 i_op / i_fop:

* entry points for operations such as read / write / lookup / create
* how the "one interface, many filesystems" design (VFS) is implemented

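The "data + function pointers" idea can be mimicked in a few lines. This is only a toy model in Python, not kernel code: `ToyInode`, `vfs_read`, `fs_a_read` and `fs_b_read` are all hypothetical names, and a dict of callables stands in for the `i_fop` table.

```python
class ToyInode:
    """In-memory object: data fields plus a table of operations."""

    def __init__(self, ino, size, file_ops):
        self.i_ino = ino
        self.i_size = size
        self.i_fop = file_ops  # plays the role of the i_fop pointer


def vfs_read(inode, offset, count):
    """Generic entry point: dispatch through the inode's operation table."""
    return inode.i_fop["read"](inode, offset, count)


# Two pretend filesystems supplying their own read implementation.
def fs_a_read(inode, offset, count):
    return f"fs_a: inode {inode.i_ino}, bytes {offset}..{offset + count - 1}"


def fs_b_read(inode, offset, count):
    return f"fs_b: inode {inode.i_ino}, bytes {offset}..{offset + count - 1}"


if __name__ == "__main__":
    # The caller uses the same vfs_read() regardless of the filesystem.
    print(vfs_read(ToyInode(1, 10, {"read": fs_a_read}), 0, 4))
    print(vfs_read(ToyInode(2, 10, {"read": fs_b_read}), 4, 4))
```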
---

# 5. The VFS layer: the unified inode abstraction (very important)

Linux does not operate on ext4 / XFS directly; it goes through:

```text
VFS (Virtual File System)
```

Unified structures:

```text
struct file    // an open file
struct inode   // the file itself
struct dentry  // path cache
```

Relationship:

```text
path → dentry → inode → data
```

---

# 6. The full inode path of one open() (focus on this one)

```text
open("/a/b/file")
```

Kernel flow:

```text
1. Path resolution
   "/" → "a" → "b" → "file"

2. For each component, check the dentry cache
   is (name → inode) already cached?

3. On a miss:
   → read the directory from disk
   → find the inode number

4. Read the inode (inode table / B-tree)

5. Create a struct file

6. Return the fd
```

👉 The central hop:

```text
filename → inode number → inode → data
```

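Steps 1 to 4 above can be sketched as a component-by-component walk with a lookup cache in front of the directories. This is a toy model under loud assumptions: `ToyFilesystem` and its methods are invented for illustration, dicts stand in for on-disk directories, and permission checks, mounts and symlinks are ignored. The only real-world detail borrowed is that ext2/3/4 use inode 2 for the root directory.

```python
class ToyFilesystem:
    """Directories map names to inode numbers; a cache sits in front."""

    ROOT_INO = 2  # ext2/3/4 use inode 2 for the root directory

    def __init__(self):
        self.directories = {self.ROOT_INO: {}}  # dir inode -> {name: inode}
        self.dentry_cache = {}                  # (parent inode, name) -> inode
        self.disk_lookups = 0                   # counts cache misses
        self._next_ino = self.ROOT_INO + 1

    def add(self, parent_ino, name, is_dir=False):
        """Create an entry under parent_ino and return its new inode number."""
        ino = self._next_ino
        self._next_ino += 1
        self.directories[parent_ino][name] = ino
        if is_dir:
            self.directories[ino] = {}
        return ino

    def resolve(self, path):
        """Walk path one component at a time, consulting the cache first."""
        current = self.ROOT_INO
        for name in filter(None, path.split("/")):
            key = (current, name)
            if key not in self.dentry_cache:
                self.disk_lookups += 1  # miss: "read the directory from disk"
                self.dentry_cache[key] = self.directories[current][name]
            current = self.dentry_cache[key]
        return current


if __name__ == "__main__":
    fs = ToyFilesystem()
    a = fs.add(ToyFilesystem.ROOT_INO, "a", is_dir=True)
    b = fs.add(a, "b", is_dir=True)
    fs.add(b, "file")
    print(fs.resolve("/a/b/file"), fs.disk_lookups)
    print(fs.resolve("/a/b/file"), fs.disk_lookups)  # second walk hits the cache
```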
---

# 7. What the inode does during read()

```text
read(fd, buf, size)
```

Path:

```text
fd → struct file → inode → address_space → page cache
```

Key link:

```text
inode.i_mapping → page cache
```

Flow:

```text
1. Look in the page cache (keyed by inode + offset)
2. Hit → return directly
3. Miss:
   → the inode maps the offset to a block (extent / B-tree)
   → disk I/O is issued
   → the page cache is filled
```

👉 So:

```text
inode = the index root of the page cache
```

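The hit / miss flow above can be modelled with a dictionary keyed by (inode number, page index). This is a deliberately simplified sketch: `ToyPageCache` is a made-up class, the 4096-byte page size is an assumption, the "disk" is a callback you supply, and there is no eviction or readahead.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyPageCache:
    def __init__(self, read_page_from_disk):
        self._pages = {}  # (inode number, page index) -> bytes
        self._read_page_from_disk = read_page_from_disk
        self.disk_reads = 0

    def read(self, ino, offset, size):
        """Return up to size bytes from offset, filling the cache on a miss."""
        out = bytearray()
        end = offset + size
        while offset < end:
            index, start = divmod(offset, PAGE_SIZE)
            key = (ino, index)
            if key not in self._pages:  # miss: go to "disk", then cache
                self.disk_reads += 1
                self._pages[key] = self._read_page_from_disk(ino, index)
            page = self._pages[key]
            chunk = page[start:min(PAGE_SIZE, start + end - offset)]
            if not chunk:  # short page: end of file reached
                break
            out += chunk
            offset += len(chunk)
        return bytes(out)


if __name__ == "__main__":
    backing = bytes(range(256)) * 40  # 10240 bytes of pretend file data

    def fake_disk(ino, index):
        return backing[index * PAGE_SIZE:(index + 1) * PAGE_SIZE]

    cache = ToyPageCache(fake_disk)
    cache.read(7, 4090, 10)   # spans two pages, both miss
    cache.read(7, 0, 100)     # page 0 is already cached
    print(cache.disk_reads)
```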
---

# 8. How the inode changes on write

```text
write(fd, ...)
```

Flow:

```text
1. Write into the page cache (the pages become dirty)
2. Update the inode:
   - size (if the file grew)
   - mtime (and ctime)
3. Deferred writeback
4. Journaling / CoW keeps the on-disk state consistent
```

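The inode updates in step 2 are visible from user space: compare `stat()` before and after a write. A minimal sketch, assuming Python on a POSIX system; `size_and_mtime` and `append_and_sync` are hypothetical helper names, and the `fsync` call is there only to show how to request writeback immediately instead of waiting for the deferred path.

```python
import os
import tempfile


def size_and_mtime(path):
    """Return (size in bytes, mtime in nanoseconds) from the file's inode."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def append_and_sync(path, payload):
    """Append payload, then ask the kernel to write it back right away."""
    with open(path, "ab") as handle:
        handle.write(payload)      # buffered in user space by Python
        handle.flush()             # hand the bytes to the kernel (page cache)
        os.fsync(handle.fileno())  # request writeback of data and inode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "log")
        open(path, "wb").close()
        print("before:", size_and_mtime(path))
        append_and_sync(path, b"abc")
        print("after: ", size_and_mtime(path))
```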
---

# 9. What deleting a file (unlink) really does

```text
rm file
```

It does not "delete the data". Instead:

```text
1. In the directory:
   the name → inode mapping is removed

2. inode.i_links_count--

3. If it reaches 0 and no process still has the file open:
   → the inode is reclaimed
   → the data blocks are reclaimed
```

👉 Key conclusion:

```text
whether the file's content exists ≠ whether it has a file name
```

This explains:

* a deleted file that a process still has open → the file still exists
* how temporary files work

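The "deleted but still in use" case is easy to reproduce. A small sketch, assuming Python on a local POSIX filesystem (network filesystems such as NFS may behave differently); `unlink_while_open` is a name invented for this example. On a typical local Linux filesystem the link count read through the open descriptor drops to 0 while the content stays readable.

```python
import os
import tempfile


def unlink_while_open(directory):
    """Create a file, unlink it while holding it open, and keep using it."""
    path = os.path.join(directory, "scratch")
    with open(path, "w+b") as handle:
        handle.write(b"still here")
        handle.flush()
        os.unlink(path)  # removes the name, not the inode
        name_exists = os.path.exists(path)
        links = os.fstat(handle.fileno()).st_nlink
        handle.seek(0)
        content = handle.read()  # the data is still reachable via the fd
    # Closing the file drops the last reference; only now can the
    # filesystem reclaim the inode and its blocks.
    return name_exists, links, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(unlink_while_open(workdir))
```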
---

# 10. Hard links vs soft links (from the inode's point of view)

## Hard link

```text
file1 → inode 100
file2 → inode 100
```

👉 The same inode

---

## Soft link (symbolic link)

```text
link → inode 200 → "path/to/file"
```

👉 The symlink has its own inode, which stores the target path as a string (on ext4, short targets fit inside the inode itself; longer ones go into a data block)

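Both kinds of link can be compared side by side with `stat()` and `lstat()`. A minimal sketch, assuming Python on a POSIX filesystem that supports hard links and symlinks; `compare_links` and the dictionary keys are made up for this illustration.

```python
import os
import tempfile


def compare_links(directory):
    """Create one hard link and one symlink to a file and compare inodes."""
    target = os.path.join(directory, "file1")
    hard = os.path.join(directory, "file2")
    soft = os.path.join(directory, "link")
    with open(target, "w") as handle:
        handle.write("payload")
    os.link(target, hard)     # second name for the same inode
    os.symlink(target, soft)  # new inode whose content is a path
    return {
        "hard_link_shares_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(target).st_nlink,
        # lstat() inspects the symlink itself rather than following it.
        "symlink_has_own_inode": os.lstat(soft).st_ino != os.stat(target).st_ino,
        "symlink_stores": os.readlink(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for key, value in compare_links(workdir).items():
            print(f"{key}: {value}")
```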
---

# 11. How inodes behave for different file types

The file type bits in `i_mode` decide the behavior:

```text
regular file      → data blocks
directory         → name → inode mapping
character device  → device id
block device      → device id
socket            → kernel object
pipe              → kernel buffer
```

👉 So:

```text
"everything is a file" = everything is an inode
```

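The type bits are exposed to user space as part of `st_mode`, and the standard `stat` module has a predicate for each type in the table above. A short sketch, assuming Python on a POSIX system; `file_kind` and its labels are hypothetical.

```python
import os
import stat
import tempfile


def file_kind(path):
    """Classify path by the type bits of its mode, without following symlinks."""
    mode = os.lstat(path).st_mode
    checks = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISSOCK, "socket"),
        (stat.S_ISFIFO, "pipe (FIFO)"),
    ]
    for predicate, label in checks:
        if predicate(mode):
            return label
    return "unknown"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        fifo = os.path.join(workdir, "pipe")
        os.mkfifo(fifo)
        for path in (workdir, fifo, __file__):
            print(path, "->", file_kind(path))
```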
---

# 12. How the inode has evolved in modern filesystems

## ext4

* inode size fixed when the filesystem is created
* extent tree
* inode tables (one per block group)

## XFS

* inodes allocated dynamically
* managed with B+trees
* better suited to very large filesystems

## Btrfs

* an inode is just one item in a B-tree
* no "fixed inode table"
* fully object-style:

```text
(key = inode_id, value = metadata)
```

---

# 13. The most important unified view (to tie it all together)

```text
inode = an "object" in the filesystem
```

It carries:

```text
1. Identity    (inode number)
2. Attributes  (metadata)
3. Data entry  (block mapping)
4. Behavior    (VFS function pointers)
```

---

# 14. The one-sentence summary

```text
the file name is only an entry point; the inode is the file itself
```

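---

# 15. If you want to go further (strongly recommended)

The next step is to connect this whole chain end to end (very important):

```text
fd → file → inode → page cache → block → disk
```

If you like, I can give you:

👉 **"The complete kernel call path from user-space read() to disk I/O + a diagram of how all the structs relate (source level)"**

Once that clicks, your understanding of operating systems will move up a level.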