LMS

Command line

Program

  • Initialize the CLI: "C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap
  • CLI status: lms status

Models

  • Download a model: lms get xxxxxxxx
  • List downloaded model files: lms ls
  • Estimate a model load (no actual load): lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes --estimate-only
  • Load a model on the GPU: lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes

    • --yes: automatically approve all prompts. Useful for scripting. If multiple models match the model key, the model is loaded on the preferred device (if set); otherwise the first matching model is loaded.
    load_tensors: offloaded 23/25 layers to GPU		<- indicates GPU mode
    One-line cmd command: load the model only if it is not already loaded, then view the logs
    lms status | findstr /C:"qwen3.5-4b" >nul && (echo Model already loaded) || (echo Loading... && lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes) & lms status & lms log stream
  • Load a model on the CPU: lms load --gpu 0 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --yes

    load_tensors: offloaded 0/25 layers to GPU      <-- indicates CPU mode
  • Unload a model: lms unload <model ID> / --all
  • List running models: lms ps
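To tell from the server log whether a load ended up on the GPU or the CPU, the `offloaded N/M layers to GPU` line shown above can be parsed mechanically. A minimal Python sketch (the helper name is made up for illustration):

```python
import re

def offload_mode(log_line: str) -> str:
    """Classify a llama.cpp load_tensors log line as 'gpu' or 'cpu' mode."""
    m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_line)
    if not m:
        raise ValueError("not a load_tensors offload line")
    offloaded = int(m.group(1))
    # Any layers on the GPU means GPU mode; 0/N means pure CPU inference.
    return "gpu" if offloaded > 0 else "cpu"

print(offload_mode("load_tensors: offloaded 23/25 layers to GPU"))  # gpu
print(offload_mode("load_tensors: offloaded 0/25 layers to GPU"))   # cpu
```

The same check could be wired into the conditional-load one-liner above instead of matching on the model name.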

Server

  • Run the daemon: lms daemon up
  • Process status: tasklist | findstr lms.exe
  • Server status: lms server status
  • Start the server: lms server start
  • Stop the server: lms server stop
  • Server logs: lms log stream
  • GPU resources: nvidia-smi, or nvitop (a Python-based tool)
  • Verify the server: curl the OpenAI-compatible HTTP API

      2026-03-31   09:30.31   /home/mobaxterm  curl -k -H "Authorization: Bearer sk-lm-MCnjmxJZ:******" https://openai1.atibm.com/v1/models
    {
      "data": [
        {
          "id": "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled",
          "object": "model",
          "owned_by": "organization_owner"
        },
        {
          "id": "qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m",
          "object": "model",
          "owned_by": "organization_owner"
        }
      ],
      "object": "list"
    }       
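The `/v1/models` response above is plain JSON, so a script can pull out the model IDs directly. A small Python sketch using the response body from this setup:

```python
import json

# The /v1/models response body captured by the curl check above.
body = '''
{
  "data": [
    {"id": "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled",
     "object": "model", "owned_by": "organization_owner"},
    {"id": "qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m",
     "object": "model", "owned_by": "organization_owner"}
  ],
  "object": "list"
}
'''

# Collect just the model IDs, e.g. to check that a model is served.
model_ids = [m["id"] for m in json.loads(body)["data"]]
print(model_ids)
```

In a live check, `body` would come from an HTTP GET against the same endpoint instead of a pasted string.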

Chat

  • lms chat

Runtime

  • List available runtimes: lms runtime ls
  • Select a runtime: lms runtime select llama.cpp-win-x86_64-nvidia-cuda12-avx2@2.8.0
  • Detect GPUs: lms runtime survey

Troubleshooting

  • Check whether CUDA is detected and usable: nvidia-smi

    
      2026-04-01   09:38.05   /home/mobaxterm  nvidia-smi
    Wed Apr  1 09:38:28 2026
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 595.79                 Driver Version: 595.79         CUDA Version: 13.2     |
    +-----------------------------------------+------------------------+----------------------+
    | GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 4060      WDDM  |   00000000:01:00.0 Off |                  N/A |
    |  0%   43C    P8            N/A  /  115W |    4403MiB /   8188MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0   N/A  N/A           12080    C+G   ...ram Files\Tencent\QQNT\QQ.exe      N/A      |
    |    0   N/A  N/A           22496      C   ...es_AI\LM Studio\LM Studio.exe      N/A      |
    +-----------------------------------------------------------------------------------------+
    
  • Loading a 4B (4.8 GB) model with 8 GB of VRAM can blow out GPU memory
    • Limit the context length; 4096 or 8192 is recommended
  • Using an AMD GPU: Settings → Runtime → switch to the Vulkan mode
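Why the context length matters for VRAM: the KV cache grows linearly with n_ctx and sits in GPU memory next to the weights. A back-of-the-envelope sketch, assuming illustrative figures (36 layers, 8 KV heads, head dim 128, f16 cache) rather than exact Qwen parameters:

```python
def kv_cache_bytes(n_ctx, n_layers=36, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Rough KV-cache size; all shape numbers here are assumptions."""
    # K and V each hold n_ctx * n_kv_heads * head_dim f16 values per layer.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

print(kv_cache_bytes(8192) / 2**30)   # ≈ 1.1 GiB on top of the 4.8 GB weights
print(kv_cache_bytes(32768) / 2**30)  # ≈ 4.5 GiB — no longer fits in 8 GB
```

With these assumed shapes, 8192 context adds about 1.1 GiB, which still fits beside a 4.8 GB model on an 8 GB card; a 32k context would not.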

    -----------------------Switching lms GPU detection-----------------------
    D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>vulkaninfo | findstr "deviceName"
    WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
            deviceName        = AMD Radeon 780M Graphics
            deviceName        = NVIDIA GeForce RTX 4060
    Disabling the 4060 in the hardware settings makes LM Studio detect only the 780M
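The `deviceName` lines in the vulkaninfo dump can also be extracted in a script, e.g. to confirm which adapters Vulkan sees before or after disabling one. A sketch over output like the above:

```python
import re

# Trimmed vulkaninfo output as captured above.
vulkan_out = """WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
        deviceName        = AMD Radeon 780M Graphics
        deviceName        = NVIDIA GeForce RTX 4060"""

# One entry per Vulkan-visible adapter, in enumeration order.
devices = re.findall(r"deviceName\s*=\s*(.+)", vulkan_out)
print(devices)
```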
    -----------------------Logs-----------------------
    2026-03-30 16:23:08 [DEBUG]
     LlamaV4::load called with model path: D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf
    LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
    2026-03-30 16:23:09 [DEBUG]
     srv    load_model: loading model 'D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf'
    2026-03-30 16:23:09 [DEBUG]
     llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon 780M Graphics) (unknown id) - 32010 MiB free  
     -----------------------Performance comparison-----------------------
     Thinking time for the same prompt on the same 4B model: Vulkan (780M) = 30 s, CPU (7840U) = 34 s, CUDA (4060) = 4 s
     
     Inference time (seconds):

     Platform                   Time
     CPU  (AMD 7840U)           34 s
     iGPU (AMD 780M, Vulkan)    30 s
     dGPU (RTX 4060, CUDA)       4 s   <- ~7.5x faster!
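The speedup figures follow directly from the measured times, with CUDA as the baseline (the 7.5x is 30 s / 4 s against the iGPU):

```python
# Measured thinking times from the comparison above, in seconds.
times_s = {"CPU (7840U)": 34.0, "iGPU Vulkan (780M)": 30.0, "dGPU CUDA (4060)": 4.0}

baseline = times_s["dGPU CUDA (4060)"]
speedup = {name: t / baseline for name, t in times_s.items()}
print(speedup)  # CUDA is 7.5x faster than the iGPU, 8.5x faster than the CPU
```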