Sure 😃, here is a systematic, detailed walkthrough of **`torch.nn`**: it is essentially the core module for building neural networks in PyTorch.

---

# 🔹 1. What is `torch.nn`?

* `torch.nn` stands for **neural network**.
* It provides the building blocks for networks: **layers** (modules), **containers**, **loss functions**, and utility functions.
* Core idea: treat a network as a composition of **modules (`Module`)**; each module can hold parameters (weights, biases) and can nest other modules.

---
# 🔹 2. The core class: `nn.Module`

The base class for all neural networks; almost everything (layers, models, loss functions) inherits from it.

### Typical usage

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)  # 10 input features -> 20 output features
        self.fc2 = nn.Linear(20, 1)   # 20 input features -> 1 output feature

    def forward(self, x):  # defines the forward pass
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = MyModel()
```

👉 Every model needs to define `__init__` and `forward`.
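Once defined, the model is used by calling it directly; a minimal sketch (the batch size 4 is an arbitrary assumption):

```python
x = torch.randn(4, 10)  # a batch of 4 samples, 10 features each (assumed sizes)
out = model(x)          # calling the module invokes forward() via __call__
print(out.shape)        # torch.Size([4, 1])
```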
---

# 🔹 3. Common layers

`torch.nn` ships many common layers. The usual categories:
### (1) Fully connected layer (Linear)

```python
nn.Linear(in_features, out_features)
```

* Just a matrix multiplication plus a bias (`y = x @ W.T + b`)
* The workhorse of MLPs
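A minimal shape sketch (all sizes here are arbitrary assumptions):

```python
import torch
import torch.nn as nn

fc = nn.Linear(10, 20)   # weight: (20, 10), bias: (20,)
x = torch.randn(5, 10)   # batch of 5 samples
y = fc(x)
print(y.shape)           # torch.Size([5, 20])
```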
---

### (2) Convolutional layers (CNN)

```python
nn.Conv1d, nn.Conv2d, nn.Conv3d
```

* The core of convolutional neural networks
* Used to extract spatial / temporal features
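A minimal sketch of how `nn.Conv2d` transforms a batch of images (channel counts and image size are assumptions):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)  # (batch, channels, height, width), assumed sizes
y = conv(x)
print(y.shape)                 # torch.Size([8, 16, 32, 32]); padding=1 keeps H and W
```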
---

### (3) Recurrent networks (RNN / LSTM / GRU)

```python
nn.RNN, nn.LSTM, nn.GRU
```

* Used for sequence data (text, time series)
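A minimal sketch with assumed sizes; note that `nn.LSTM` returns `(output, (h_n, c_n))`:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
x = torch.randn(4, 15, 10)   # (batch, seq_len, features), assumed sizes
output, (h_n, c_n) = lstm(x)
print(output.shape)          # torch.Size([4, 15, 32]), hidden state at every step
print(h_n.shape)             # torch.Size([1, 4, 32]), final hidden state
```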
---

### (4) Normalization layers

```python
nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm
```

* Make training more stable and speed up convergence
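A minimal sketch of the two most common variants (sizes are assumptions); roughly speaking, BatchNorm normalizes each channel across the batch, while LayerNorm normalizes the features of each sample independently:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)           # one mean/var per channel, computed over the batch
ln = nn.LayerNorm(64)             # normalizes the last dimension of each sample

img = torch.randn(8, 16, 32, 32)  # (batch, channels, H, W), assumed sizes
seq = torch.randn(8, 10, 64)      # (batch, seq_len, features), assumed sizes

print(bn(img).shape)              # torch.Size([8, 16, 32, 32])
print(ln(seq).shape)              # torch.Size([8, 10, 64])
```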
---

### (5) Regularization layers

```python
nn.Dropout(p=0.5)
```

* Randomly "drops" (zeroes) activations to prevent overfitting
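One gotcha worth a sketch: dropout is only active in training mode, so remember to switch with `train()` / `eval()`:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the entries zeroed, survivors scaled by 1/(1-p) = 2.0

drop.eval()
print(drop(x))  # identity: all ones, dropout disabled
```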
---

# 🔹 4. Common containers

Containers combine multiple layers into a single module.
### (1) Sequential

```python
model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 1)
)
```

* Stacks layers in order; good for simple feed-forward models.
### (2) ModuleList / ModuleDict

```python
self.layers = nn.ModuleList([nn.Linear(10, 20), nn.Linear(20, 30)])
self.dict = nn.ModuleDict({
    "fc1": nn.Linear(10, 20),
    "fc2": nn.Linear(20, 1)
})
```

* More flexible: modules can be composed dynamically, and unlike a plain Python list they properly register their parameters with the parent model.
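The usual pattern is to iterate a `ModuleList` inside `forward`; a minimal sketch (layer count and sizes are assumptions):

```python
import torch
import torch.nn as nn

class DeepNet(nn.Module):
    def __init__(self):
        super().__init__()
        # a plain Python list here would NOT register the parameters
        self.layers = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

net = DeepNet()
print(net(torch.randn(2, 10)).shape)  # torch.Size([2, 10])
```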
---

# 🔹 5. Common loss functions

`torch.nn` provides many common losses:
* **Regression**

```python
nn.MSELoss()  # mean squared error
nn.L1Loss()   # mean absolute error
```

* **Classification**

```python
nn.CrossEntropyLoss()  # multi-class cross entropy
nn.BCELoss()           # binary cross entropy
nn.NLLLoss()           # negative log likelihood
```

* **Other**

```python
nn.SmoothL1Loss()  # Huber loss
```
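The shape conventions trip people up, so here is a minimal sketch for `nn.CrossEntropyLoss` (sizes are assumptions): it expects raw logits and integer class indices, not softmax outputs or one-hot vectors:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)            # (batch, num_classes), raw scores: no softmax
targets = torch.tensor([0, 2, 1, 2])  # (batch,), integer class indices
loss = criterion(logits, targets)
print(loss.item())                    # a scalar
```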
---

# 🔹 6. Activation functions

Activation functions also live in `torch.nn`, in two flavors: **class** (module) and **functional**:
* Class form (modules)

```python
nn.ReLU(), nn.Sigmoid(), nn.Tanh()
```

* Function form (functional interface)

```python
torch.relu(x), torch.sigmoid(x), torch.tanh(x)
```

👉 Rule of thumb: use the class form inside `nn.Sequential` and the function form inside `forward`.
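Both styles compute exactly the same thing; a minimal side-by-side sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5)

act = nn.ReLU()      # class form: a module you can place in a container
y1 = act(x)

y2 = torch.relu(x)   # function form: a plain call, handy inside forward()

print(torch.equal(y1, y2))  # True
```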
---

# 🔹 7. The standard workflow

A typical training step:
```python
import torch
import torch.nn as nn
import torch.optim as optim

# 1. Define the model
model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 1)
)

# 2. Define the loss function
criterion = nn.MSELoss()

# 3. Define the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)

# 4. Forward pass + backward pass + update
x = torch.randn(5, 10)  # input (batch=5, features=10)
y = torch.randn(5, 1)   # targets

output = model(x)            # forward pass
loss = criterion(output, y)  # compute the loss
optimizer.zero_grad()        # clear old gradients
loss.backward()              # backpropagate
optimizer.step()             # update the parameters
```
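In practice that step sits inside a loop, and inference happens with gradients disabled; a minimal sketch (the epoch count and reusing the same `x`, `y` batch are assumptions, real code would iterate over a DataLoader):

```python
for epoch in range(100):  # assumed epoch count
    output = model(x)
    loss = criterion(output, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# inference: switch to eval mode and disable gradient tracking
model.eval()
with torch.no_grad():
    pred = model(torch.randn(1, 10))
```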
---

# 🔹 8. Quick memory aids

* **Every layer, loss function, and network is a subclass of `nn.Module`**
* **Training is three steps: forward → loss → backward**
* **When composing modules, work out the input/output shapes first**

---

Want me to put together a **torch.nn mind map (layers → containers → loss functions → activation functions → workflow)** so you can remember it all from a single picture?