Command line
Daily maintenance - 4060
Run manually, line by line
# cuda - 4060
lms runtime select llama.cpp-win-x86_64-nvidia-cuda-avx2@2.13.0
lms load at/models/qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3.i1-q4_k_m.gguf --identifier 4060-Qwen3.5-4B --parallel 1 --context-length 20480 --gpu max
lms load nomic-embed-text-v1.5 --identifier CPU-nomic-embed
One-line terminal script
# 4060
lms runtime select llama.cpp-win-x86_64-nvidia-cuda-avx2@2.13.0 & lms status | findstr /C:"Qwen3.5-9B" >nul && (echo Model Qwen3.5-9B already loaded) || (echo Loading Qwen3.5-4B-claude... && lms load models@q4_k_s --identifier Qwen3.5-9B --parallel 1 --context-length 20480 --gpu max) & lms status & lms log stream
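The guard pattern in the one-liner above (check status before loading, so a re-run never loads twice) can also be sketched for the MobaXterm shell. This is a sketch: the `loaded` helper and the stand-in listing string are illustrative, not real `lms ps` output.

```shell
# Return success if a status listing already mentions the identifier.
loaded() { printf '%s\n' "$1" | grep -q "$2"; }

# Stand-in for `status=$(lms ps)`; the real output format will differ.
status="4060-Qwen3.5-4B   ...loaded"
if loaded "$status" "4060-Qwen3.5-4B"; then
  echo "already loaded"
else
  echo "loading..."  # here you would run the lms load command from above
fi
```

The same check-then-load shape works for any identifier; only the grep pattern changes.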
Activation commands
- Bootstrap the CLI: "C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap
- Program status: lms status
Chat
- lms chat
runtime
- List available runtimes: lms runtime ls
- Select a runtime: lms runtime select llama.cpp-win-x86_64-nvidia-cuda12-avx2@2.8.0
- Survey GPUs: lms runtime survey
Model management
- Download a model: lms get xxxxxxxx
- List downloaded model files: lms ls
- Preload estimate (no actual load): lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --estimate-only
Model loading (GPU):
- lms load --gpu 1 at/models/qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3.i1-q4_k_m.gguf --identifier 4060-Qwen3.5-4b-claude
- lms load --gpu 0 at/models/llama4-dolphin-8b.q4_k_m.gguf --identifier 780M-llama4-8B
- lms load --gpu 0 at/models/qwen3.5-4b-python-coder-q4_k_m.gguf --identifier 780M-Qwen3.5-4b-python
load_tensors: offloaded 23/25 layers to GPU  <- indicates the model is running in GPU mode
One-line cmd command: skip loading if the model is already loaded, then watch the log
lms status | findstr /C:"qwen3.5-4b" >nul && (echo Model already loaded) || (echo Loading... && lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3-i1) & lms status | findstr /C:"embed" >nul && (echo Model already loaded) || (echo Loading... && lms load --gpu 1 text-embedding-nomic-embed-text-v1.5) & lms status & lms log stream
Model loading (CPU):
- lms load --gpu 0 qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3-i1
load_tensors: offloaded 0/25 layers to GPU  <- indicates the model is running in CPU mode
- Unload a model: lms unload <model-id> or lms unload --all
- List running models: lms ps
lms service management
- Daemon: lms daemon up
- Process check: tasklist | findstr lms.exe
- Server status: lms server status
- Start the server: lms server start
- Stop the server: lms server stop
- Server logs: lms log stream, and C:\Users\cat\.lmstudio\server-logs\2026-04
- GPU usage: nvidia-smi, or nvitop (a Python-environment tool)
Server check: curl the OpenAI-compatible HTTP API
2026-03-31 09:30.31 /home/mobaxterm
curl -k -H "Authorization: Bearer sk-lm-MCnjmxJZ:******" https://openai1.atibm.com/v1/models
{
  "data": [
    {
      "id": "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled",
      "object": "model",
      "owned_by": "organization_owner"
    },
    {
      "id": "qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m",
      "object": "model",
      "owned_by": "organization_owner"
    }
  ],
  "object": "list"
}
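Beyond listing models, inference itself can be smoke-tested through the same OpenAI-compatible API. A sketch, assuming the same base URL and (redacted) key as above, with the model id taken from the /v1/models response:

```shell
# Request body for POST /v1/chat/completions (OpenAI-compatible schema).
BODY='{"model":"qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'
printf '%s\n' "$BODY"
# Send it (key redacted, as in the notes above):
# curl -k -H "Authorization: Bearer sk-lm-MCnjmxJZ:******" -H "Content-Type: application/json" \
#      -d "$BODY" https://openai1.atibm.com/v1/chat/completions
```

A 200 response with a `choices` array confirms the loaded model is actually serving tokens, not just listed.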
Troubleshooting
Is CUDA detected and usable: nvidia-smi
2026-04-01 09:38.05 /home/mobaxterm
nvidia-smi
Wed Apr  1 09:38:28 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.79                 Driver Version: 595.79         CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name            Driver-Model       | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf     Pwr:Usage/Cap      |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060      WDDM  |   00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8             N/A / 115W  |    4403MiB /  8188MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           12080    C+G   ...ram Files\Tencent\QQNT\QQ.exe          N/A  |
|    0   N/A  N/A           22496    C     ...es_AI\LM Studio\LM Studio.exe          N/A  |
+-----------------------------------------------------------------------------------------+
- Loading a 4.8 GB 4B model into 8 GB of VRAM can blow out the VRAM
- Keep the context length under control; 4096 or 8192 is recommended
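The VRAM blow-up is mostly KV cache, whose size grows linearly with context length. A rough estimate in shell arithmetic (a sketch: the layer/head numbers below are assumptions for a ~4B model, not values read from this GGUF):

```shell
# kv_bytes = 2 (K and V) * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_element
n_layers=36; n_kv_heads=8; head_dim=128; bytes=2   # assuming an f16 KV cache
for n_ctx in 4096 8192 20480; do
  kv_mib=$(( 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes / 1024 / 1024 ))
  echo "n_ctx=$n_ctx -> KV cache ~ ${kv_mib} MiB"
done
```

Under these assumptions, 20480 tokens costs roughly 2.9 GB on top of the ~4.8 GB model weights, which already crowds an 8 GB card; 4096 or 8192 stays comfortably under the limit.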
Using the AMD GPU: Settings -> Runtime -> switch to the Vulkan runtime, then use lms runtime survey to list the GPUs
-----------------------Switching which GPU lms sees-----------------------
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>vulkaninfo | findstr "deviceName"
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
        deviceName = AMD Radeon 780M Graphics
        deviceName = NVIDIA GeForce RTX 4060
Disabling the 4060 in the hardware settings makes LM Studio see only the 780M.
-----------------------Log-----------------------
2026-03-30 16:23:08 [DEBUG] LlamaV4::load called with model path: D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-03-30 16:23:09 [DEBUG] srv load_model: loading model 'D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf'
2026-03-30 16:23:09 [DEBUG] llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon 780M Graphics) (unknown id) - 32010 MiB free
-----------------------Performance comparison-----------------------
Thinking time for the same prompt on the same 4B model:
- CUDA (RTX 4060, dGPU): 4 s  <- ~7.5x faster
- Vulkan (AMD 780M, iGPU): 30 s
- CPU (AMD 7840U): 34 s
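The 7.5x figure above is simply the Vulkan/CUDA ratio; the same arithmetic can be checked in shell (times scaled to tenths of a second to keep integer math):

```shell
cpu=340; vulkan=300; cuda=40          # 34 s, 30 s, 4 s, in tenths of a second
vk_x=$(( vulkan * 10 / cuda ))        # speedup x10: 75 -> 7.5x
cpu_x=$(( cpu * 10 / cuda ))          # speedup x10: 85 -> 8.5x
echo "CUDA vs Vulkan: $(( vk_x / 10 )).$(( vk_x % 10 ))x"
echo "CUDA vs CPU:    $(( cpu_x / 10 )).$(( cpu_x % 10 ))x"
```

So against the CPU the 4060 is closer to 8.5x; the quoted 7.5x is relative to the 780M Vulkan run.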