LM Studio

Command Line

  • Management
    • Bootstrap the CLI: "C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap

      D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>"C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap
        ✓ Already Installed
      The path C:\Users\cat\.lmstudio\bin is already in the PATH environment variable.
        (i) If Windows cannot find the CLI tool, please try again in a new terminal window.
        (i) If you are using an integrated terminal in an editor (such as VS Code), please try to restart the editor.
    • Check server and model status: lms status

      D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms status
      Server: ON (port: 1234)
      Loaded Models
        · qwen3.5-2b-claude-4.6-opus-reasoning-distilled - 2.68 GB
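With the server ON (port 1234 above), LM Studio exposes an OpenAI-compatible HTTP API, and the loaded model is addressed by the identifier shown in the status output. A minimal sketch of calling it from Python, assuming the standard `/v1/chat/completions` route; the helper name `build_chat_payload` is mine, not part of any SDK:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # port shown by `lms status`

def build_chat_payload(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the local LM Studio server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("qwen3.5-2b-claude-4.6-opus-reasoning-distilled", "Hello!")  # needs the server ON
```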
  • Models
    • Download a model: lms get <model-name>
    • List downloaded model files: lms ls

      D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms ls
      You have 10 models, taking up 69.03 GB of disk space.
      LLM                                                                          PARAMS     ARCH         SIZE         DEVICE
      llama4-dolphin-8b                                                            8B         Llama        4.92 GB      Local
      qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled                             752M       qwen35       811.84 MB    Local
      qwen3.5-0.8b-coder-calude-full                                               752M       qwen35       527.51 MB    Local     ✓ LOADED
      qwen3.5-14b-a3b-claude-4.6-opus-reasoning-distilled-reap                     A3B        qwen35moe    9.46 GB      Local
      qwen3.5-24b-a3b-claude-opus-gemini-3.1-pro-reasoning-distilled-heretic-i1    24B-A3B    qwen35moe    14.08 GB     Local
      qwen3.5-2b-claude-4.6-opus-reasoning-distilled                               1.9B       qwen35       2.68 GB      Local     ✓ LOADED
      qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-i1                       35B-A3B    qwen35moe    24.76 GB     Local
      qwen3.5-4b-claude-4.6-opus-reasoning-distilled                               4.2B       qwen35       5.16 GB      Local
      qwen3.5-9b-claude-4.6-opus-reasoning-distilled-v2                            9.0B       qwen35       6.55 GB      Local
      EMBEDDING                               PARAMS    ARCH          SIZE        DEVICE
      text-embedding-nomic-embed-text-v1.5              Nomic BERT    84.11 MB    Local
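The SIZE column mixes MB and GB. A small hypothetical helper (not part of lms) normalizes the units so the per-model sizes can be summed and sanity-checked against the reported total:

```python
def parse_size(text: str) -> float:
    """Convert a size string like '2.68 GB' or '811.84 MB' to gigabytes (1 GB = 1024 MB)."""
    value, unit = text.split()
    factors = {"MB": 1 / 1024, "GB": 1.0}
    return float(value) * factors[unit]

# SIZE values copied from the `lms ls` output above
sizes = ["4.92 GB", "811.84 MB", "527.51 MB", "9.46 GB", "14.08 GB",
         "2.68 GB", "24.76 GB", "5.16 GB", "6.55 GB", "84.11 MB"]
total_gb = sum(parse_size(s) for s in sizes)
print(f"{total_gb:.2f} GB")  # close to the 69.03 GB that `lms ls` reports
```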
    • Load a model with GPU offload: lms load --gpu 1 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled

      -------------------------Command------------------------
      D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms load --gpu 1 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled
      Model loaded successfully in 3.74s.
      (774.23 MiB)
      To use the model in the API/SDK, use the identifier "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled".
      -------------------------Log------------------------
      2026-03-30 14:57:42 [DEBUG]
       ggml_cuda_init: found 1 CUDA devices (Total VRAM: 8187 MiB):
        Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes, VRAM: 8187 MiB
        ...
      load_tensors: offloading 22 repeating layers to GPU
      load_tensors: offloaded 23/25 layers to GPU		<--------- indicates the model is running in GPU mode
      load_tensors:   CPU_Mapped model buffer size =   301.49 MiB
      load_tensors:        CUDA0 model buffer size =   719.95 MiB
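The `offloaded N/M layers to GPU` line is the reliable signal for which mode the model actually runs in. A sketch that classifies such a line; the function name is mine, and the log format is assumed from the output above:

```python
import re

def offload_mode(log_line: str) -> str:
    """Classify a llama.cpp 'load_tensors: offloaded N/M layers to GPU' log line."""
    m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_line)
    if not m:
        raise ValueError("not an offload log line")
    offloaded, total = int(m.group(1)), int(m.group(2))
    if offloaded == 0:
        return "cpu"    # no layers on the GPU -> pure CPU mode
    if offloaded == total:
        return "gpu"    # everything offloaded
    return f"mixed ({offloaded}/{total} layers on GPU)"

print(offload_mode("load_tensors: offloaded 23/25 layers to GPU"))  # mixed (23/25 layers on GPU)
print(offload_mode("load_tensors: offloaded 0/25 layers to GPU"))   # cpu
```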
    • Load a model on the CPU only: lms load --gpu 0 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled

      -------------------------Command------------------------
      D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms load --gpu 0 qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled
      Model loaded successfully in 3.64s.
      (774.23 MiB)
      To use the model in the API/SDK, use the identifier "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled".
      -------------------------Log------------------------
      [2026-03-30 14:59:27][DEBUG] ggml_cuda_init: found 1 CUDA devices (Total VRAM: 8187 MiB):
        Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes, VRAM: 8187 MiB
        ...
      load_tensors: offloaded 0/25 layers to GPU <--------- no layers on the GPU: the model is running in CPU mode
      load_tensors:   CPU_Mapped model buffer size =   763.78 MiB
    • Unload a model: lms unload <model-identifier>

      D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms unload qwen3.5-0.8b-coder-calude-full
      Model "qwen3.5-0.8b-coder-calude-full" unloaded.
    • List loaded (running) models: lms ps

      D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>lms ps
      IDENTIFIER                                        MODEL                                             STATUS    SIZE         CONTEXT    PARALLEL    DEVICE    TTL
      qwen3.5-0.8b-coder-calude-full                    qwen3.5-0.8b-coder-calude-full                    IDLE      527.51 MB    262144     4           Local
      qwen3.5-2b-claude-4.6-opus-reasoning-distilled    qwen3.5-2b-claude-4.6-opus-reasoning-distilled    IDLE      2.68 GB      262144     4           Local
  • Server
    • Check the server: lms server status
    • Start the server: lms server start
    • Stop the server: lms server stop
    • Server process usage: tasklist | findstr lms.exe
    • GPU usage: nvidia-smi, or nvitop (a Python-based tool)

Troubleshooting

  • Check whether the CUDA GPU is detected and usable: nvidia-smi

    C:\windows\system32>nvidia-smi
    Unable to determine the device handle for GPU0: 0000:01:00.0: GPU is lost.  Reboot the system to recover this GPU
  • Loading a 4B (4.8 GB) model on an 8 GB GPU can exhaust VRAM: limit the context length; 4096 or 8192 is recommended.
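The `lms ps` output above shows a 262144-token context, and KV-cache memory grows linearly with context length, which is why a 4.8 GB model can blow past 8 GB of VRAM. A back-of-the-envelope sketch; the layer and head counts are illustrative defaults, not the real model's config:

```python
def kv_cache_gib(ctx_len, n_layers=36, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Estimate KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * ctx * dtype bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

for ctx in (4096, 8192, 262144):
    print(f"ctx={ctx:>6}: ~{kv_cache_gib(ctx):.2f} GiB")
```

With these illustrative numbers, a 4096-token context needs well under 1 GiB of KV cache, while the full 262144-token context alone needs roughly 36 GiB, far beyond the model weights themselves.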