Command line
- Management
  - Initialize the CLI: "C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap
```
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>"C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap

✓ Already Installed

The path C:\Users\cat\.lmstudio\bin is already in the PATH environment variable.

(i) If Windows cannot find the CLI tool, please try again in a new terminal window.
(i) If you are using an integrated terminal in an editor (such as VS Code), please try to restart the editor.
```
  - Check program status: lms status
```
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms status

Server: ON (port: 1234)

Loaded Models
  · qwen3.5-2b-claude-4.6-opus-reasoning-distilled - 2.68 GB
```
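The `Server: ON (port: 1234)` line above means the local server is reachable over HTTP, and LM Studio's server exposes an OpenAI-compatible REST API. As a minimal sketch (assuming the default base URL `http://localhost:1234/v1`), the loaded models can also be listed programmatically instead of via `lms status`:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # default LM Studio server address (assumption)

def model_ids(models_json: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [m["id"] for m in models_json.get("data", [])]

def list_models() -> list[str]:
    """Query the running server for its available model identifiers."""
    with urllib.request.urlopen(f"{BASE_URL}/models") as resp:
        return model_ids(json.load(resp))
```

Calling `list_models()` while the server is ON should return the same identifiers that `lms status` and `lms ls` report.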
- Models
  - Download a model: lms get xxxxxxxx
  - List downloaded model files: lms ls
```
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms ls

You have 10 models, taking up 69.03 GB of disk space.

LLM                                                                          PARAMS    ARCH        SIZE       DEVICE
llama4-dolphin-8b                                                            8B        Llama       4.92 GB    Local
qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled                             752M      qwen35      811.84 MB  Local
qwen3.5-0.8b-coder-calude-full                                               752M      qwen35      527.51 MB  Local  ✓ LOADED
qwen3.5-14b-a3b-claude-4.6-opus-reasoning-distilled-reap                     A3B       qwen35moe   9.46 GB    Local
qwen3.5-24b-a3b-claude-opus-gemini-3.1-pro-reasoning-distilled-heretic-i1    24B-A3B   qwen35moe   14.08 GB   Local
qwen3.5-2b-claude-4.6-opus-reasoning-distilled                               1.9B      qwen35      2.68 GB    Local  ✓ LOADED
qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-i1                       35B-A3B   qwen35moe   24.76 GB   Local
qwen3.5-4b-claude-4.6-opus-reasoning-distilled                               4.2B      qwen35      5.16 GB    Local
qwen3.5-9b-claude-4.6-opus-reasoning-distilled-v2                            9.0B      qwen35      6.55 GB    Local

EMBEDDING                               PARAMS   ARCH         SIZE       DEVICE
text-embedding-nomic-embed-text-v1.5             Nomic BERT   84.11 MB   Local
```
  - Load a model on the GPU: lms load --gpu 1 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled
```
------------------------- Command ------------------------
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms load --gpu 1 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled

Model loaded successfully in 3.74s. (774.23 MiB)

To use the model in the API/SDK, use the identifier "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled".

------------------------- Log ------------------------
2026-03-30 14:57:42 [DEBUG] ggml_cuda_init: found 1 CUDA devices (Total VRAM: 8187 MiB):
  Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes, VRAM: 8187 MiB
...
load_tensors: offloading 22 repeating layers to GPU
load_tensors: offloaded 23/25 layers to GPU        <--------- shows it is running in GPU mode
load_tensors: CPU_Mapped model buffer size = 301.49 MiB
load_tensors: CUDA0 model buffer size = 719.95 MiB
```
  - Load a model on the CPU: lms load --gpu 0 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled
```
------------------------- Command ------------------------
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms load --gpu 0 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled

Model loaded successfully in 3.64s. (774.23 MiB)

To use the model in the API/SDK, use the identifier "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled".

------------------------- Log ------------------------
[2026-03-30 14:59:27][DEBUG] ggml_cuda_init: found 1 CUDA devices (Total VRAM: 8187 MiB):
  Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes, VRAM: 8187 MiB
...
load_tensors: offloaded 0/25 layers to GPU        <--------- GPU not used; running in CPU mode
load_tensors: CPU_Mapped model buffer size = 763.78 MiB
```
  - Unload a model: lms unload <model-id>
```
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms unload qwen3.5-0.8b-coder-calude-full

Model "qwen3.5-0.8b-coder-calude-full" unloaded.
```
  - List running models: lms ps
```
D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms ps

IDENTIFIER                                       MODEL                                            STATUS   SIZE        CONTEXT   PARALLEL   DEVICE   TTL
qwen3.5-0.8b-coder-calude-full                   qwen3.5-0.8b-coder-calude-full                   IDLE     527.51 MB   262144    4          Local
qwen3.5-2b-claude-4.6-opus-reasoning-distilled   qwen3.5-2b-claude-4.6-opus-reasoning-distilled   IDLE     2.68 GB     262144    4          Local
```
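The `offloaded N/M layers to GPU` line in the load logs is the quickest way to tell whether a model actually landed on the GPU. A small sketch against the llama.cpp-style log format shown above (the function name is illustrative, not part of any lms API):

```python
import re

def offload_mode(log_line: str) -> str:
    """Classify a llama.cpp-style load log line as gpu, cpu, or partial."""
    m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_line)
    if not m:
        raise ValueError("no offload info found in line")
    offloaded, total = int(m.group(1)), int(m.group(2))
    if offloaded == 0:
        return "cpu"       # nothing on the GPU
    if offloaded == total:
        return "gpu"       # fully offloaded
    return "partial"       # e.g. 23/25: most layers on GPU, the rest on CPU

print(offload_mode("load_tensors: offloaded 23/25 layers to GPU"))  # partial
print(offload_mode("load_tensors: offloaded 0/25 layers to GPU"))   # cpu
```

Note that the `--gpu 1` run above is strictly speaking a partial offload (23/25), which is still effectively GPU mode, while `--gpu 0` yields 0/25.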
- Server
  - Check the server: lms server status
  - Start the server: lms server start
  - Stop the server: lms server stop
  - Server process usage: tasklist | findstr lms.exe
  - GPU usage: nvidia-smi, or nvitop (a Python-based tool)
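Once `lms server start` reports the server is up, the usual consumer is the OpenAI-compatible chat completions endpoint. A hedged sketch, assuming the default port 1234 and a model identifier from `lms ls` (the helper names here are illustrative):

```python
import json
import urllib.request

def build_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(model: str, prompt: str) -> str:
    """Send one chat turn to the local server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",  # assumed default address
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

For example, `chat("qwen3.5-2b-claude-4.6-opus-reasoning-distilled", "hello")` uses the identifier printed by `lms load`.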
Troubleshooting
- Is CUDA detected and usable: nvidia-smi
```
C:\windows\system32>nvidia-smi

Unable to determine the device handle for GPU0: 0000:01:00.0: GPU is lost.
Reboot the system to recover this GPU
```
- VRAM exhausted when loading a 4B (4.8 GB) model on 8 GB of VRAM: cap the context length; 4096 or 8192 is recommended.
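The context length matters because the KV cache grows linearly with it, on top of the model weights. A rough back-of-the-envelope estimator (the 36 layers, 8 KV heads, and head_dim 128 below are illustrative placeholders, not the real architecture of the 4B model):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: 2 tensors (K and V) per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

GiB = 1024 ** 3
# Placeholder architecture numbers; compare a capped vs the full 262144 context.
for ctx in (4096, 8192, 262144):
    print(f"ctx={ctx}: {kv_cache_bytes(36, 8, 128, ctx) / GiB:.2f} GiB")
```

With these placeholder numbers, an 8192-token context costs about 1.1 GiB of KV cache, while the full 262144-token context shown in `lms ps` would need around 36 GiB, which explains why an 8 GB card is overwhelmed unless the context is capped.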