Command line
Program
- Initialize the CLI: "C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap
- Program status: lms status
Models
- Download a model: lms get xxxxxxxx
- List model files: lms ls
- Estimate a model load (dry run): lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes --estimate-only
- Load a model onto the GPU: lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes
  --yes: Automatically approve all prompts; useful for scripting. If multiple models match the model key, the model is loaded on the preferred device (if set), otherwise the first matching model is loaded.
  load_tensors: offloaded 23/25 layers to GPU  <- indicates the model is running in GPU mode
- One-line cmd command: load the model only if it is not already loaded, then watch the logs:
  lms status | findstr /C:"qwen3.5-4b" >nul && (echo model already loaded) || (echo loading... && lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes) & lms status & lms log stream
- Load a model onto the CPU: lms load --gpu 0 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes
  load_tensors: offloaded 0/25 layers to GPU  <- indicates the model is running in CPU mode
- Unload a model: lms unload <model-id> / lms unload --all
- List running models: lms ps
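The "load only if not already loaded" cmd one-liner above can also be sketched in Python. This is an illustrative sketch, not an official API: it shells out to `lms ps` and does the same plain substring match as `findstr`, so the exact output format of `lms ps` is an assumption here.

```python
import subprocess

def model_loaded(ps_output: str, model_key: str) -> bool:
    """Substring check over `lms ps` output, mirroring the findstr one-liner:
    a partial key like "qwen3.5-4b" matches the full model id too."""
    return any(model_key in line for line in ps_output.splitlines())

def ensure_loaded(model_key: str) -> None:
    """Load the model with lms only if it is not already running (sketch)."""
    ps = subprocess.run(["lms", "ps"], capture_output=True, text=True).stdout
    if model_loaded(ps, model_key):
        print("model already loaded")
    else:
        print("loading...")
        subprocess.run(["lms", "load", "--gpu", "1", model_key, "--yes"],
                       check=True)
```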
Server
- Run the daemon: lms daemon up
- Process status: tasklist | findstr lms.exe
- Server status: lms server status
- Start the server: lms server start
- Stop the server: lms server stop
- Server logs: lms log stream
- GPU resources: nvidia-smi, or nvitop (a Python-based tool)
Verify the server: curl the OpenAI-compatible HTTP API
2026-03-31 09:30.31 /home/mobaxterm  curl -k -H "Authorization: Bearer sk-lm-MCnjmxJZ:******" https://openai1.atibm.com/v1/models
{
  "data": [
    {
      "id": "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled",
      "object": "model",
      "owned_by": "organization_owner"
    },
    {
      "id": "qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m",
      "object": "model",
      "owned_by": "organization_owner"
    }
  ],
  "object": "list"
}
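The `/v1/models` endpoint returns the standard OpenAI list format, so the check above can be scripted. A minimal Python sketch that extracts the served model ids; the response body here is copied from the curl output above rather than fetched live:

```python
import json

# /v1/models response body, copied from the curl example above.
RESPONSE = """{
  "data": [
    {"id": "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled",
     "object": "model", "owned_by": "organization_owner"},
    {"id": "qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m",
     "object": "model", "owned_by": "organization_owner"}
  ],
  "object": "list"
}"""

def model_ids(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [m["id"] for m in json.loads(body)["data"]]

print(model_ids(RESPONSE))
```

In a real check you would fetch the body with `urllib.request` (passing the same Bearer token) and assert that the model you expect is in the list.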
Chat
- lms chat
runtime
- List available runtimes: lms runtime ls
- Switch runtime: lms runtime select llama.cpp-win-x86_64-nvidia-cuda12-avx2@2.8.0
- Survey GPUs: lms runtime survey
Troubleshooting
Check whether CUDA is recognized and usable: nvidia-smi
2026-04-01 09:38.05 /home/mobaxterm  nvidia-smi
Wed Apr  1 09:38:28 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.79                 Driver Version: 595.79         CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name            Driver-Model       | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060      WDDM  |   00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8             N/A /  115W |   4403MiB /    8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      12080   C+G   ...ram Files\Tencent\QQNT\QQ.exe              N/A    |
|    0   N/A  N/A      22496   C     ...es_AI\LM Studio\LM Studio.exe              N/A    |
+-----------------------------------------------------------------------------------------+
- Problem: loading a 4B model (4.8 GB) on 8 GB of VRAM exhausts the VRAM
- Fix: control the context length; 4096 or 8192 is recommended
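The reason the context length matters is that the KV cache grows linearly with it. A back-of-the-envelope estimate, assuming the usual layout (2 tensors per layer, K and V, each `n_ctx × n_kv_heads × head_dim` f16 elements); the default parameters below (24 layers, 8 KV heads, head dim 128) are illustrative placeholders for a ~4B model, not values taken from this specific GGUF:

```python
def kv_cache_mib(n_ctx: int, n_layers: int = 24, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in MiB: K and V tensors per layer,
    each n_ctx * n_kv_heads * head_dim elements (f16 = 2 bytes)."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem / 2**20

for ctx in (4096, 8192, 32768):
    print(f"n_ctx={ctx}: ~{kv_cache_mib(ctx):.0f} MiB")
```

Under these assumptions, 4096 context costs ~384 MiB and 8192 ~768 MiB, which still fits next to a 4.8 GB model in 8 GB of VRAM, while a 32k context (~3 GiB) does not.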
Using an AMD GPU: Settings -> Runtime -> switch to Vulkan mode
----------------------- Switching which GPU lms sees -----------------------
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>vulkaninfo | findstr "deviceName"
WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
        deviceName = AMD Radeon 780M Graphics
        deviceName = NVIDIA GeForce RTX 4060
Disabling the 4060 in the hardware settings makes LM Studio see only the 780M.
----------------------- Logs -----------------------
2026-03-30 16:23:08 [DEBUG] LlamaV4::load called with model path: D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-03-30 16:23:09 [DEBUG] srv load_model: loading model 'D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf'
2026-03-30 16:23:09 [DEBUG] llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon 780M Graphics) (unknown id) - 32010 MiB free
----------------------- Performance comparison -----------------------
Same prompt, same 4B model, thinking time:
  CPU (AMD 7840U):    34 s
  Vulkan (AMD 780M):  30 s
  CUDA (RTX 4060):     4 s  <- about 7.5x faster than the iGPU
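The "7.5x" figure above is just the ratio of thinking times; a few lines of arithmetic (using only the measurements from this comparison) make the relative speedups explicit:

```python
# Thinking-time measurements from the comparison above, in seconds.
timings = {
    "CUDA (RTX 4060)": 4,
    "Vulkan (AMD 780M)": 30,
    "CPU (AMD 7840U)": 34,
}

fastest = min(timings.values())
for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {secs} s ({secs / fastest:.1f}x the fastest)")
```

So CUDA is 30/4 = 7.5x faster than the Vulkan iGPU path and 34/4 = 8.5x faster than pure CPU on this prompt; with a single prompt the numbers are anecdotal, not a benchmark.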