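STDD incident scene
[2026/6/1 15:20:52] at c: stdd: create a local dev branch and implement the requirements in note BsJqGgNFUTDK
[2026/6/1 15:22:24] H02 - ARM: Requirements read. Now exploring the current code structure to confirm the scope of the change.
[2026/6/1 15:24:18] H02 - ARM: Now creating the dev branch, then writing the proposal.
[2026/6/1 15:24:20] at c: During development you may need the asset net-value CSV and the factor CSV; ask me for them when the time comes
[2026/6/1 15:25:36] H02 - ARM: Phase 1 (UNDERSTAND) complete, Gate 1 awaiting confirmation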
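Change: 2026-06-01-factor-dual-chain
Why: the asset DataFrame currently feeds both the pricing chain and the analysis chain, so the two cannot be told apart. Factor data that comes from a different source than the net-value data cannot be passed in.
What Changes (6 modifications):
plan_bt.py
• Changes: __init__ gains a factor parameter; init_asset_info → init_asset_info_factor, returning a 3-tuple; save_csv writes factor.csv; plan_metrics/plan_plot use factor for analysis
port_bt.py
• Changes: __init__ gains a factor parameter and splits into _net_value/_factor; from_plan passes it through; run() prices with _net_value and analyses with _factor
manager.py
• Changes: none required
risk.py
• Changes: none required
Key constraints:
- factor column names match net_value exactly
- factor defaults to asset (backward compatible)
- plan_plot still plots asset (net value); analysis uses factor
Success Criteria:
- [ ] With the default call (no factor passed), backtest results are identical to those before the change
- [ ] With an independent factor passed in, the pricing chain uses asset and the analysis chain uses factor
- [ ] The output directory contains *_factor.csv
Please confirm whether the proposal can move on to Phase 2 (SPEC).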
[2026/6/1 15:31:38] H02 - ARM: OK, I will ask when I need the data during development. Moving on to Phase 2.
[2026/6/1 15:31:39] H02 - ARM: 📚 skill_view: "stdd"
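[2026/6/1 15:33:04] at c: Confirmed
[2026/6/1 15:35:44] H02 - ARM: Phase 2 (SPEC) complete, Gate 2 awaiting confirmation
The following documents have been generated:
design.md
• Description: technical design (decisions, architecture, risks)
specs/factor-dual-chain.md
• Description: behavior spec, 5 Scenarios
test-plan.md
• Description: test plan, 8 TCs, P0 first
Key design decisions:
1. Split PortBacktest internally into _net_value (pricing) and _factor (analysis); the Manager/Risk interfaces stay unchanged
2. plan_metrics analyses with factor; plan_plot plots with asset
3. factor defaults to asset, for backward compatibility
Highest-risk item: replacing asset with _net_value in the pricing chain. A regression run on real data is required to confirm identical results.
Please review the design documents above and confirm whether we can move on to Phase 3 (SLICE).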
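Editor's sketch of design decisions 1 and 3, using plain dicts in place of the project's DataFrames. Only the factor parameter, the _net_value/_factor split, and the default factor = asset come from the proposal. The class body, the two accessor names, and the dict-of-columns Table type are hypothetical, and comparing column order as well as names is an assumption about what "match exactly" means.

```python
from typing import Dict, List, Optional

# Stand-in for a DataFrame: column name -> column values (hypothetical).
Table = Dict[str, List[float]]


class PortBacktest:
    """Hypothetical skeleton; only the net-value/factor split is shown."""

    def __init__(self, asset: Table, factor: Optional[Table] = None) -> None:
        # Default factor = asset, so existing call sites behave as before.
        if factor is None:
            factor = asset
        # Constraint from the proposal: factor columns match net_value columns.
        # Assumption: order matters as well as names.
        if list(factor) != list(asset):
            raise ValueError("factor columns must match net_value columns exactly")
        self._net_value = asset  # pricing chain
        self._factor = factor    # analysis chain

    def pricing_source(self) -> Table:
        """Data that run() would use for pricing."""
        return self._net_value

    def analysis_source(self) -> Table:
        """Data that the metrics would use for analysis."""
        return self._factor
```

With no factor argument both accessors return the same object, which is the property the first success criterion relies on.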
[2026/6/1 15:36:44] at c: Confirmed
[2026/6/1 15:38:45] H02 - ARM: ⏳ Working — 18 min — waiting for stream response (90s, no chunks yet)
[2026/6/1 15:54:15] H02 - ARM: 🔄 Primary model failed — switching to fallback: Qwen3.5-27B via v100
[2026/6/1 15:54:16] H02 - ARM: API call failed after 3 retries: Request timed out.
[2026/6/1 15:54:16] H02 - ARM: ⚠️ The model provider failed after retries. I kept raw provider details out of chat; check gateway logs for diagnostics.
Local compute incident scene
(base) x99@x99:~/gguf$ tail -f ~/gguf/llama_server.log| grep --line-buffered -E "slot create_check: |srv stop|slot print_timing: |srv load_model|llama_model_load_from_file_impl|error"
[54473] 4354.48.024.209 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 2048, progress = 0.02, t = 8.92 s / 229.67 tokens per second
[54473] 4354.56.935.587 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 4096, progress = 0.05, t = 17.83 s / 229.74 tokens per second
[54473] 4355.05.938.933 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 6144, progress = 0.07, t = 26.83 s / 228.98 tokens per second
[54473] 4355.15.029.593 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 8192, progress = 0.10, t = 35.92 s / 228.05 tokens per second
[54473] 4355.24.175.420 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 10240, progress = 0.12, t = 45.07 s / 227.21 tokens per second
[54473] 4355.33.365.091 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 12288, progress = 0.14, t = 54.26 s / 226.47 tokens per second
[54473] 4355.42.515.876 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 14336, progress = 0.17, t = 63.41 s / 226.09 tokens per second
[54473] 4355.51.671.913 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 16384, progress = 0.19, t = 72.56 s / 225.78 tokens per second
[54473] 4356.00.932.661 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 18432, progress = 0.22, t = 81.83 s / 225.26 tokens per second
[54473] 4356.10.040.015 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 20480, progress = 0.24, t = 90.93 s / 225.22 tokens per second
4356.44.579.102 E srv operator(): http client error: Failed to read connection
[54473] 4356.18.815.865 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 22528, progress = 0.26, t = 99.71 s / 225.94 tokens per second
[54473] 4356.27.725.767 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 24576, progress = 0.29, t = 108.62 s / 226.26 tokens per second
[54473] 4356.36.732.149 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 26624, progress = 0.31, t = 117.63 s / 226.35 tokens per second
[54473] 4356.45.812.252 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 28672, progress = 0.34, t = 126.71 s / 226.29 tokens per second
[54473] 4356.55.038.156 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 30720, progress = 0.36, t = 135.93 s / 226.00 tokens per second
[54473] 4357.04.194.607 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 32768, progress = 0.38, t = 145.09 s / 225.85 tokens per second
[54473] 4357.13.407.185 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 34816, progress = 0.41, t = 154.30 s / 225.64 tokens per second
[54473] 4357.22.636.455 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 36864, progress = 0.43, t = 163.53 s / 225.43 tokens per second
[54473] 4357.32.003.747 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 38912, progress = 0.46, t = 172.90 s / 225.06 tokens per second
[54473] 4357.41.492.706 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 40960, progress = 0.48, t = 182.39 s / 224.58 tokens per second
[54473] 4357.50.934.420 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 43008, progress = 0.50, t = 191.83 s / 224.20 tokens per second
[54473] 4358.00.435.795 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 45056, progress = 0.53, t = 201.33 s / 223.79 tokens per second
[54473] 4358.10.034.384 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 47104, progress = 0.55, t = 210.93 s / 223.32 tokens per second
4358.46.835.007 E srv operator(): http client error: Failed to read connection
[54473] 4358.19.667.244 I slot print_timing: id 0 | task 188844 | prompt processing, n_tokens = 49152, progress = 0.58, t = 220.56 s / 222.85 tokens per second
[54473] 4358.38.086.286 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 2048, progress = 0.02, t = 8.89 s / 230.38 tokens per second
[54473] 4358.47.012.796 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 4096, progress = 0.05, t = 17.82 s / 229.90 tokens per second
[54473] 4358.56.002.651 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 6144, progress = 0.07, t = 26.81 s / 229.20 tokens per second
[54473] 4359.05.078.851 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 8192, progress = 0.10, t = 35.88 s / 228.30 tokens per second
[54473] 4359.14.209.374 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 10240, progress = 0.12, t = 45.01 s / 227.49 tokens per second
[54473] 4359.23.384.987 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 12288, progress = 0.14, t = 54.19 s / 226.76 tokens per second
[54473] 4359.32.520.500 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 14336, progress = 0.17, t = 63.32 s / 226.39 tokens per second
[54473] 4359.41.673.322 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 16384, progress = 0.19, t = 72.48 s / 226.06 tokens per second
[54473] 4359.50.910.839 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 18432, progress = 0.22, t = 81.71 s / 225.57 tokens per second
[54473] 4360.00.019.934 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 20480, progress = 0.24, t = 90.82 s / 225.49 tokens per second
[54473] 4360.08.819.306 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 22528, progress = 0.26, t = 99.62 s / 226.13 tokens per second
[54473] 4360.17.767.305 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 24576, progress = 0.29, t = 108.57 s / 226.36 tokens per second
4360.53.273.986 E srv operator(): http client error: Failed to read connection
[54473] 4360.27.565.176 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 26624, progress = 0.31, t = 118.37 s / 224.92 tokens per second
[54473] 4360.36.640.288 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 28672, progress = 0.34, t = 127.44 s / 224.98 tokens per second
[54473] 4360.45.883.037 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 30720, progress = 0.36, t = 136.69 s / 224.75 tokens per second
[54473] 4360.55.039.837 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 32768, progress = 0.38, t = 145.84 s / 224.68 tokens per second
[54473] 4361.04.217.080 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 34816, progress = 0.41, t = 155.02 s / 224.59 tokens per second
[54473] 4361.13.488.979 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 36864, progress = 0.43, t = 164.29 s / 224.38 tokens per second
[54473] 4361.22.837.075 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 38912, progress = 0.46, t = 173.64 s / 224.10 tokens per second
[54473] 4361.32.327.205 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 40960, progress = 0.48, t = 183.13 s / 223.67 tokens per second
[54473] 4361.41.800.308 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 43008, progress = 0.50, t = 192.60 s / 223.30 tokens per second
[54473] 4361.51.278.616 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 45056, progress = 0.53, t = 202.08 s / 222.96 tokens per second
[54473] 4362.00.867.426 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 47104, progress = 0.55, t = 211.67 s / 222.53 tokens per second
[54473] 4362.10.499.716 I slot print_timing: id 0 | task 188876 | prompt processing, n_tokens = 49152, progress = 0.58, t = 221.30 s / 222.10 tokens per second
^C
(base) x99@x99:~/gguf$ cat 0410.ini
[Qwen3.5-27B]
model = /home/x99/gguf/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled.i1-Q4_K_M.gguf
chat-template-file = /home/x99/gguf/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled.i1-Q4_K_M.jinja
ctx-size = 262144
parallel = 2
kv-unified = 1
n-gpu-layers = 99
tensor-split = 0.87,1
flash-attn = on
cache-type-k = q8_0
cache-type-v = q8_0
no-mmap = 1
no-kv-offload = 0
no-warmup = 0
threads = 12
batch-size = 2048
ubatch-size = 512
temp = 0.3
min-p = 0.0
#top-p = 0.95
#repeat-penalty = 1.1
timeout = 3600
reasoning = on
reasoning-budget = 1024
chat-template-kwargs = {"enable_thinking":true}
[embed]
model = /home/x99/gguf/Qwen3-Embedding-0.6B.Q4_K_M.gguf
chat-template-file = /home/x99/gguf/Qwen3-Embedding-0.6B.jinja
ctx-size = 8192
parallel = 2
kv-unified = 1
device = none
no-warmup = 0
mlock = 1
threads = 12
batch-size = 512
ubatch-size = 512
pooling = cls
embedding = 1
(base) x99@x99:~/gguf$ nvitop
Mon Jun 01 16:04:24 2026
╒═════════════════════════════════════════════════════════════════════════════╕
│ NVITOP 1.7.0 Driver Version: 580.126.20 CUDA Driver Version: 13.0 │
├───────────────────────────────┬──────────────────────┬──────────────────────┤
│ GPU Name Persistence-M│ Bus-Id Disp.A │ Volatile Uncorr. ECC │
│ Fan Temp Perf Pwr:Usage/Cap│ Memory-Usage │ GPU-Util Compute M. │
╞═══════════════════════════════╪══════════════════════╪══════════════════════╪═══════════════════════════════════════════════╤═════════════════════════════════════════════════╕
│ 0 ..la V100-SXM2-16GB Off │ 00000000:05:00.0 Off │ 0 │ MEM: ████████████████████████████████▊ 91.1% │ MBW: ███ 8% @ 877MHz │
│ N/A 73C P0 105W / 300W │ 14933MiB / 16384MiB │ 52% Default │ UTL: ██████████████████▊ 52% @ 1530MHz │ PWR: █████████████▎ 105W 35.0% │
├───────────────────────────────┼──────────────────────┼──────────────────────┼───────────────────────────────────────────────┼─────────────────────────────────────────────────┤
│ 1 ..la V100-SXM2-16GB Off │ 00000000:06:00.0 Off │ 0 │ MEM: ███████████████████████████████▏ 86.4% │ MBW: ▏ 0% @ 877MHz │
│ N/A 70C P0 87W / 300W │ 14159MiB / 16384MiB │ 0% Default │ UTL: ▏ 0% @ 1530MHz │ PWR: ███████████ 87W 29.1% │
╘═══════════════════════════════╧══════════════════════╧══════════════════════╧═══════════════════════════════════════════════╧═════════════════════════════════════════════════╛
[ CPU: ███████████ 8.7% UPTIME: 17.7 days ] ( Load Average: 1.61 1.80 1.83 )
[ MEM: █████████████████████████████████████████████████████████████████████▌ 54.7% USED: 18.41GiB ] [ SWP: ████████████▎ 55.4% ]
╒═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Processes: x99@x99 │
│ GPU PID USER GPU-MEM %SM %GMBW %CPU %MEM TIME COMMAND │
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ 0 434837 G 1002 4.45MiB 0 0 0.0 0.1 13.1 days /usr/lib/xorg/Xorg vt3 -displayfd 3 -auth /run/user/1002/gdm/Xauthority -nolisten tcp -background none -no.. │
│ 0 429237 G x99 4.45MiB 0 0 0.0 0.1 13.1 days /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -nolisten tcp -background none -no.. │
│ 0 429473 X x99 313.9MiB 0 0 0.5 0.3 13.1 days /usr/libexec/gnome-remote-desktop-daemon │
│ 0 1388953 C x99 14294MiB 31 4 100.0 25.5 72:42:23 /home/x99/llama.cpp/build/bin/llama-server --api-key-file /home/x99/gguf/apikey.txt --chat-template-file /.. │
│ 0 1389519 C x99 308.0MiB 0 0 0.5 8.2 72:41:33 /home/x99/llama.cpp/build/bin/llama-server --api-key-file /home/x99/gguf/apikey.txt --chat-template-file /.. │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 1 434837 G 1002 4.45MiB 0 0 0.0 0.1 13.1 days /usr/lib/xorg/Xorg vt3 -displayfd 3 -auth /run/user/1002/gdm/Xauthority -nolisten tcp -background none -no.. │
│ 1 429237 G x99 4.45MiB 0 0 0.0 0.1 13.1 days /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -nolisten tcp -background none -no.. │
│ 1 1388953 C x99 13834MiB 24 4 100.0 25.5 72:42:23 /home/x99/llama.cpp/build/bin/llama-server --api-key-file /home/x99/gguf/apikey.txt --chat-template-file /.. │
│ 1 1389519 C x99 308.0MiB 0 0 0.5 8.2 72:41:33 /home/x99/llama.cpp/build/bin/llama-server --api-key-file /home/x99/gguf/apikey.txt --chat-template-file /.. │
╘═══════════════════════════════════════════════════════════════════════════════╛
(base) x99@x99:~/gguf$
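Troubleshooting record 1
Timeline: 15:38:45 Hermes Phase 3 → 15:54:15 primary model times out, fallback to V100 → llama-server receives tasks 188844/188876 and starts PP (prompt processing) → HTTP disconnects after ~220 s → all 3 retries time out → failure
PP speed: 49K tokens @ ~225 t/s (far below the 436 t/s previously recorded for small prompts)
→ task 188844: 220.56 s / 222.85 t/s (over the full 49K)
→ task 188876: 221.30 s / 222.10 t/s (over the full 49K)
→ the cumulative rate drifts only from ~230 to ~222 t/s across the 2K chunks, so there is no sharp decay
HTTP disconnects: 3 × "Failed to read connection" (log stamps 4356.44 / 4358.46 / 4360.53, read as minutes.seconds)
→ llama-server timeout=3600 is nowhere near reached, so a relay layer cut the connection first
→ after the disconnect llama-server keeps processing to completion, but the result cannot be returned
VRAM: GPU0 14933/16384 MiB (91.1%) Util 52% | GPU1 14159/16384 MiB (86.4%) Util 0%
→ temperatures 73°C/70°C, power draw only 105 W/87 W (cap 300 W); the GPUs are far from saturated
Root-cause chain:
Direct → the HTTP relay layer disconnects at ~220 s, so Hermes never receives a response
Indirect → PP of 49K tokens takes 220 s, which falls inside the timeout window
Deep → still to be investigated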
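Editor's sketch for pulling the PP figures out of llama_server.log instead of reading them by eye. The regular expression is written against the "slot print_timing" lines captured above; the PPSample type and both function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class PPSample(NamedTuple):
    task: int
    n_tokens: int
    progress: float
    elapsed_s: float
    tokens_per_s: float


# Matches the tail of a "slot print_timing" prompt-processing line.
_PP_LINE = re.compile(
    r"task (\d+) \| prompt processing, n_tokens = (\d+), "
    r"progress = ([\d.]+), t = ([\d.]+) s / ([\d.]+) tokens per second"
)


def parse_pp_lines(lines: Iterable[str]) -> List[PPSample]:
    """Return one PPSample per matching line, in log order."""
    samples = []
    for line in lines:
        m = _PP_LINE.search(line)
        if m:
            samples.append(
                PPSample(int(m[1]), int(m[2]), float(m[3]), float(m[4]), float(m[5]))
            )
    return samples


def last_sample_per_task(samples: Iterable[PPSample]) -> Dict[int, PPSample]:
    """Keep only the final logged sample of each task."""
    latest: Dict[int, PPSample] = {}
    for s in samples:
        latest[s.task] = s
    return latest
```

Feeding it the capture above gives the last logged sample of each task, which is where the 220.56 s and 221.30 s figures come from. The logged rate is cumulative (n_tokens divided by t), not per chunk.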
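Troubleshooting record 2
FRP config: frpc heartbeatInterval=120 heartbeatTimeout=600 tcpMux=true TLS=true
frps vhostHTTPTimeout=1200 (20 minutes)
Conclusion: the FRP layer is not the bottleneck; PP takes only ~220 s, far less than 1200 s. The timeout occurs elsewhere.
Reinterpreting the 225 t/s PP figure:
At progress=0.58, n_tokens=49152 → taking progress as a fraction of ctx-size=262144, 0.58×262144≈152K
Already cached ≈ 152K − 49K = 103K (from the multi-turn conversation history)
Under flash-attention, PP cost grows with the total context, not only with the new tokens
49K PP with an empty cache: attention ∫₀^{49K} x dx ≈ 1.2B units
49K PP on top of 103K cached: ∫_{103K}^{152K} x dx ≈ 6.25B units → ~5× the compute
A measured 225 t/s against 436 t/s for a small prompt on an empty cache is therefore expected, not a configuration regression
Compared with the ini comment "12 cores 2048/512: 50K pp300" it is ~25% short; that part may come from GPU1 not participating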
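Editor's sketch that reproduces the two integrals above. It keeps the notes' simplified model, in which the attention cost of a token is proportional to its position, and the notes' reading of 103K cached plus 49K new tokens; the function name is hypothetical. The model ignores the non-attention part of PP, which scales with the number of new tokens only.

```python
def attention_units(start_tokens: float, end_tokens: float) -> float:
    """Integral of x dx from start_tokens to end_tokens.

    Models the attention cost of processing tokens start..end when each
    new token attends to every token before it.
    """
    return (end_tokens ** 2 - start_tokens ** 2) / 2.0


# Figures taken from the notes: 49K new tokens, 103K already cached.
empty_cache = attention_units(0, 49_000)
warm_cache = attention_units(103_000, 152_000)
ratio = warm_cache / empty_cache

print(f"empty cache: {empty_cache:.4g} units")
print(f"warm cache:  {warm_cache:.4g} units")
print(f"ratio:       {ratio:.2f}x")
```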
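Timing analysis (stamps are minutes.seconds; token positions are read from where each error line sits in the log, because the error lines are not in timestamp order):
task 188844: PP starts at 4354 → 1st disconnect stamped 4356.44 (at ~20K) → continues → 2nd disconnect stamped 4358.46 (at ~47K) → last PP line at 4358.19
task 188876: first PP line at 4358.38 → 3rd disconnect stamped 4360.53 (at ~25K) → last PP line at 4362.10
Consecutive attempts start ~220-230 s apart, which suggests a client timeout of roughly 180-240 s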
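Editor's sketch for cross-checking the timing by timestamp rather than by line position. The field layout minutes.seconds.milliseconds.microseconds is an assumption, inferred from the seconds field wrapping at 60 in the capture above. The function name is hypothetical. Task start times are estimated as the stamp of a task's first PP line minus its logged t.

```python
from typing import List


def stamp_to_seconds(stamp: str) -> float:
    """Convert 'MMMM.SS.mmm.uuu' to seconds (layout assumed, see lead-in)."""
    minutes, seconds, millis, micros = (int(part) for part in stamp.split("."))
    return minutes * 60 + seconds + millis / 1e3 + micros / 1e6


# Stamps of the three "Failed to read connection" lines.
error_stamps = ["4356.44.579.102", "4358.46.835.007", "4360.53.273.986"]
error_s: List[float] = [stamp_to_seconds(s) for s in error_stamps]

# Gaps between consecutive disconnects, in seconds.
gaps = [later - earlier for earlier, later in zip(error_s, error_s[1:])]

# Estimated task starts: first PP line stamp minus its cumulative t.
start_188844 = stamp_to_seconds("4354.48.024.209") - 8.92
start_188876 = stamp_to_seconds("4358.38.086.286") - 8.89
start_gap = start_188876 - start_188844

print([round(g, 1) for g in gaps], round(start_gap, 1))
```

The output can be set against the position-based estimates in the notes; where the two readings disagree, the ordering of the log lines is the thing to recheck.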
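VRAM correction: the nvitop snapshot was taken while idle (16:04:24); GPU Util during PP is unknown
GPU0's 14294 MiB covers: model weights + KV cache + the 103K cached context
GPU1's 13834 MiB covers: GPU1's share of model weights + KV cache
Narrowing down the actual bottleneck:
① FRP ✅ fine (1200 s timeout)
② Nginx proxy_read_timeout? Not sure
③ Hermes gateway HTTP timeout? Needs checking on the ARM side
④ Even with the timeout issue out of the way, a ~220 s PP is hard to accept for a workflow like STDD that needs many round trips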
Troubleshooting record 3
Network path confirmed:
v1008080.atibm.com → this host (ARM) → Docker net → nginx (container) → frps → frpc → llama-server(127.0.0.1:54473)
Key settings in nginx frps.conf:
server_name ~^v100808\d\.atibm\.com$;
proxy_pass http://frps:8080;
proxy_buffering off;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
Conclusion: Nginx proxy_read_timeout=300s and PP takes only 220 s → Nginx ✅ is not the bottleneck either!
All three relay layers are ruled out:
✅ llama-server timeout=3600 (>220s)
✅ FRP vhostHTTPTimeout=1200 (>220s)
✅ Nginx proxy_read_timeout=300 (>220s)
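Editor's sketch of the elimination step as a table lookup, so the check can be rerun if a timeout or the observed duration changes. The three values are the ones recorded above; the dictionary and function names are hypothetical, and treating every limit as a simple "fires once the silent interval reaches it" threshold is a simplification.

```python
from typing import Dict, List

# Timeouts recorded in the notes, in seconds.
relay_timeouts_s: Dict[str, float] = {
    "llama-server timeout": 3600,
    "FRP vhostHTTPTimeout": 1200,
    "Nginx proxy_read_timeout": 300,
}


def layers_that_could_fire(timeouts: Dict[str, float], observed_s: float) -> List[str]:
    """Return the layers whose limit is at or below the observed duration."""
    return [name for name, limit in timeouts.items() if limit <= observed_s]


suspects = layers_that_could_fire(relay_timeouts_s, observed_s=220)
print(suspects or "no relay layer can explain a cut at 220 s")
```

An empty list at 220 s is what pushes the search to the client side.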
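More precise disconnect times (t = cumulative PP time, read from the log lines next to each error):
task 188844 1st disconnect: at ~20K tokens, t≈90-100 s
task 188844 2nd disconnect: at ~47K tokens, t≈210-220 s
task 188876 1st disconnect: at ~25K tokens, t≈108-118 s
All <300 s, so nginx cannot have triggered the timeout
The disconnect must come from the client (Hermes agent) side.
The V100 provider has no explicit request_timeout_seconds setting (it is configured via env vars),
so the HTTP client falls back to its default timeout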
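Troubleshooting record 4
llama.cpp upgrade: old version (?) → version 9449 (e22b0de60)
Before upgrade: TG (token generation) 63 t/s, PP 273 t/s (layer, 0.87,1, reasoning=on)
After upgrade: TG 103 t/s, PP 573 t/s ✅ back to baseline
split-mode test conclusions (2×V100, MoE model Qwen3.6-35B):
layer mode: TG 103 t/s, PP 573 t/s ✅ (suits MoE, little pipeline communication)
row mode: TG 34 t/s, PP 210 t/s ❌ (MoE needs several all-reduces per layer; NVLink cannot keep up)
tensor mode: not supported for MoE, unusable
tensor-split = 0.87,1 in layer mode:
Purpose: compensate for GPU0's fixed overhead of ~1 GB embedding table + ~0.5 GB CUDA context
row mode also needs tensor-split for balance, but the embedding table is pinned to GPU0 and the TG loss of MoE+row is unacceptable
→ layer + tensor-split=0.87,1 is the right combination for this hardware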
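Editor's sketch of what tensor-split = 0.87,1 means as shares. llama.cpp treats the values as proportions, so only their ratio matters; the function name is hypothetical.

```python
from typing import List, Sequence


def split_shares(weights: Sequence[float]) -> List[float]:
    """Normalise tensor-split weights into per-GPU fractions."""
    total = sum(weights)
    return [w / total for w in weights]


shares = split_shares([0.87, 1.0])
for gpu, share in enumerate(shares):
    print(f"GPU{gpu}: {share:.1%} of the split tensors")
```

GPU0 ends up with a bit under half of the split, which leaves room for the fixed overhead listed above.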
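The nvitop idle-sampling issue:
GPU1 Util 0% only means there was no work at the sampling instant; it does not mean GPU1 sits out during PP
In layer mode GPU1 takes part in KV cache reads and decoding; both cards work during TG
Conditions for the 100 t/s baseline:
12 cores + 2048/512 batch/ubatch + reasoning=on + llama.cpp 9449
Older versions (before llama.cpp 9449) drop to 63 t/s
Still to confirm:
The default HTTP client timeout of the Hermes agent's V100 provider
→ Fix: hermes config set providers.V100.request_timeout_seconds 600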
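Editor's sketch of the failure mode behind the fix above: a backend that stays silent while it works, and a client whose read timeout is shorter than that silence. Hermes's actual HTTP client is not shown in these notes, so this uses Python's standard http.client purely as an illustration. The handler, the function names, and the 0.5 s delay standing in for the ~220 s of PP are all hypothetical.

```python
import http.client
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SilentThenReply(BaseHTTPRequestHandler):
    """Hypothetical backend: sends nothing for delay_s, then replies."""

    delay_s = 0.5  # stands in for the long prompt-processing phase

    def do_GET(self):
        time.sleep(self.delay_s)
        body = b"done"
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client has already hung up

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host: str, port: int, timeout_s: float) -> str:
    """GET / with an explicit socket timeout; report a timeout as a string."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read().decode()
    except socket.timeout:
        return "timeout"
    finally:
        conn.close()


def run_demo():
    """Run one request with a too-short timeout and one with a generous one."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SilentThenReply)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        return fetch(host, port, 0.1), fetch(host, port, 5.0)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The first request gives up while the server is still working, as in the "Failed to read connection" lines; the second waits long enough to receive the reply.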