feat: replace non-overlapping windows with sliding-window sequence sampling
- Remove sample-level shuffle before transforms (broke SimulateEvents) - Add _sliding_window_fn: yields overlapping sequences with configurable stride - Add sequence-level shuffle after grouping (preserves temporal coherence) - Add sliding_window_stride to TrainConfig (stride=1 for full overlap) - Update create_train/val_loader and train.py to pass stride - AGENTS.md: document known issues (cross-shard boundary, SimulateEvents state) - AGENTS.md: add cuda:7 device preference Generated by Mistral Vibe (deepseek-v4-flash). Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit is contained in:
20
AGENTS.md
20
AGENTS.md
@@ -169,3 +169,23 @@ uv run python -m benchmark.benchmark --checkpoint checkpoints/best.pt
|
||||
- 速度归一化统计量:待重新计算
|
||||
- 模型预测 `[v_right, v_forward]`(右向和前向速度)
|
||||
- 所有代码在项目根目录下以 `uv run python -m <module>` 运行
|
||||
- GPU 优先使用 `cuda:7`,训练时添加 `--device cuda:7`
|
||||
|
||||
## 已知问题
|
||||
|
||||
### 1. 滑窗跨 shard 边界
|
||||
|
||||
`dataset.py` 中滑窗实现基于 WebDataset 串联后的连续流,不感知 shard 边界。当样本恰好处于 shard 末尾时,序列会跨越到下一个 shard 的起始帧。
|
||||
|
||||
- 影响:每个 shard 边界处约有 `seq_len` 个序列包含跨 shard 样本(占总数 <1%)
|
||||
- 修复思路:在 `_sliding_window_fn` 中注入 shard 边界标记,遇到边界时清空缓冲区
|
||||
- 严重程度:低。若 shard 内帧数远大于 seq_len,可忽略
|
||||
|
||||
### 2. `SimulateEvents` 跨 shard 状态残留
|
||||
|
||||
`EventProcessor` 内部维护 `_prev_frame` 用于帧差计算。跨 shard 时,新 shard 的第一帧会与上一个 shard 最后一帧计算差,产生错误的事件帧。
|
||||
|
||||
- 影响:每个 shard 的第 1 帧事件帧错误,涉及该帧的所有滑窗序列均受影响
|
||||
- 每 shard 错误帧数:1 帧(加上滑窗放大,约 `seq_len` 个序列各包含此帧)
|
||||
- 修复:在 shard 边界处调用 `EventProcessor.reset()`。需在 `_build_pipeline` 中插入边界信号或改用按 shard 独立处理的方案
|
||||
- 严重程度:低。每 shard 仅 1 帧,训练数据量大时可忽略
|
||||
|
||||
Reference in New Issue
Block a user