lms command-line operations

Command line

Routine maintenance - 4060

  • Run manually, line by line

    # cuda - 4060
    lms runtime select llama.cpp-win-x86_64-nvidia-cuda-avx2@2.13.0
    
    lms load at/models/qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3.i1-q4_k_m.gguf --identifier 4060-Qwen3.5-4B --parallel 1 --context-length 20480 --gpu max
    
    lms load nomic-embed-text-v1.5 --identifier CPU-nomic-embed
  • Terminal script

    # 4060
    lms runtime select llama.cpp-win-x86_64-nvidia-cuda-avx2@2.13.0 & lms status | findstr /C:"Qwen3.5-9B" >nul && (echo Model Qwen3.5-9B already loaded) || (echo Loading Qwen3.5-9B... && lms load models@q4_k_s --identifier Qwen3.5-9B --parallel 1 --context-length 20480 --gpu max) & lms status & lms log stream
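The one-liner above is hard to read and maintain. A hypothetical, more readable batch-file version of the same steps (same runtime, model, and flags as shown above; assumes `lms` is on PATH):

```shell
:: check-and-load.bat - readable sketch of the one-liner above (Windows cmd).
:: Uses only commands and flags that appear on this page.
@echo off
lms runtime select llama.cpp-win-x86_64-nvidia-cuda-avx2@2.13.0

lms status | findstr /C:"Qwen3.5-9B" >nul
if errorlevel 1 (
    echo Loading Qwen3.5-9B...
    lms load models@q4_k_s --identifier Qwen3.5-9B --parallel 1 --context-length 20480 --gpu max
) else (
    echo Model Qwen3.5-9B already loaded
)

lms status
lms log stream
```

`if errorlevel 1` is true when `findstr` did not match, i.e. the model is not yet loaded, which is the same check the one-liner performs with `&& / ||`.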

Activation commands

  • Bootstrap the CLI: "C:\Users\cat\.lmstudio\bin\lms.exe" bootstrap
  • Program status: lms status

Chat

  • lms chat

runtime

  • List available runtimes: lms runtime ls
  • Select a runtime: lms runtime select llama.cpp-win-x86_64-nvidia-cuda12-avx2@2.8.0
  • Detect GPUs: lms runtime survey

Model management

  • Download a model: lms get xxxxxxxx
  • List downloaded model files: lms ls
  • Estimate memory before loading (dry run): lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2 --estimate-only
  • Load a model on GPU:

    • lms load --gpu 1 at/models/qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3.i1-q4_k_m.gguf --identifier 4060-Qwen3.5-4b-claude
    • lms load --gpu 0 at/models/llama4-dolphin-8b.q4_k_m.gguf --identifier 780M-llama4-8B
    • lms load --gpu 0 at/models/qwen3.5-4b-python-coder-q4_k_m.gguf  --identifier 780M-Qwen3.5-4b-python
    load_tensors: offloaded 23/25 layers to GPU        <- indicates GPU mode
    One-line cmd command: load the models only if not already loaded, then view the logs
    lms status | findstr /C:"qwen3.5-4b" >nul && (echo Model already loaded) || (echo Loading... && lms load --gpu 1 qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3-i1) & lms status | findstr /C:"embed" >nul && (echo Model already loaded) || (echo Loading... && lms load --gpu 1 text-embedding-nomic-embed-text-v1.5) & lms status & lms log stream
  • Load a model on CPU: lms load --gpu 0 qwen3.5-4b-claude-4.6-opus-reasoning-distill-heretic-v3-i1

    load_tensors: offloaded 0/25 layers to GPU      <- indicates CPU mode
  • Unload a model: lms unload <identifier>, or lms unload --all
  • List running models: lms ps

lms service management

  • Start the daemon: lms daemon up
  • Process status: tasklist | findstr lms.exe
  • Server status: lms server status
  • Start the server: lms server start
  • Stop the server: lms server stop
  • Server logs: lms log stream, or the files under C:\Users\cat\.lmstudio\server-logs\2026-04
  • GPU resources: nvidia-smi, or nvitop (a Python-based tool)
  • Server verification: curl the OpenAI-compatible HTTP endpoint

      2026-03-31   09:30.31   /home/mobaxterm  curl -k -H "Authorization: Bearer sk-lm-MCnjmxJZ:******" https://openai1.atibm.com/v1/models
    {
      "data": [
        {
          "id": "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled",
          "object": "model",
          "owned_by": "organization_owner"
        },
        {
          "id": "qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m",
          "object": "model",
          "owned_by": "organization_owner"
        }
      ],
      "object": "list"
    }       
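The model IDs in the /v1/models response above can be pulled out without extra tooling. A small sketch for a Unix-like shell (such as the MobaXterm session shown), using only grep and sed on the sample response:

```shell
# Extract the "id" fields from an OpenAI-compatible /v1/models response.
# The sample body below is the response shown above, abbreviated.
response='{
  "data": [
    {"id": "qwen3.5-0.8b-claude-4.6-opus-reasoning-distilled", "object": "model"},
    {"id": "qwen3.5-4b-claude-4.6-opus-reasoning-distilled-v2@q4_k_m", "object": "model"}
  ],
  "object": "list"
}'
ids=$(printf '%s\n' "$response" | grep -o '"id": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/')
echo "$ids"
```

For anything beyond a quick check, `jq -r '.data[].id'` does the same thing more robustly.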

Troubleshooting

  • Check whether CUDA is detected and usable: nvidia-smi

    
      2026-04-01   09:38.05   /home/mobaxterm  nvidia-smi
    Wed Apr  1 09:38:28 2026
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 595.79                 Driver Version: 595.79         CUDA Version: 13.2     |
    +-----------------------------------------+------------------------+----------------------+
    | GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 4060      WDDM  |   00000000:01:00.0 Off |                  N/A |
    |  0%   43C    P8            N/A  /  115W |    4403MiB /   8188MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0   N/A  N/A           12080    C+G   ...ram Files\Tencent\QQNT\QQ.exe      N/A      |
    |    0   N/A  N/A           22496      C   ...es_AI\LM Studio\LM Studio.exe      N/A      |
    +-----------------------------------------------------------------------------------------+
    
  • Loading a 4.8 GB 4B model into 8 GB of VRAM can exhaust GPU memory
    • Limit the context length; 4096 or 8192 is recommended
  • Using the AMD GPU: Settings -> Runtime -> switch to Vulkan mode, then use lms runtime survey to list the detected GPUs
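Context length dominates VRAM use because of the KV cache, which grows linearly with the context. A rough estimate in bash arithmetic (the layer/head dimensions below are illustrative assumptions for a 4B-class model, not values read from the actual GGUF):

```shell
# KV cache size ~= 2 (K and V) * layers * ctx * kv_heads * head_dim * bytes_per_elem.
# All model dimensions here are assumptions, not the real model config.
layers=36; kv_heads=8; head_dim=128; bytes=2   # fp16 cache
for ctx in 20480 8192 4096; do
  kv_mib=$(( 2 * layers * ctx * kv_heads * head_dim * bytes / 1024 / 1024 ))
  echo "ctx=${ctx}: KV cache ~ ${kv_mib} MiB"
done
```

Under these assumptions a 20480-token context costs roughly 2.9 GiB of cache on top of the ~4.8 GB of weights, which would not fit in 8 GB; dropping to 4096 cuts the cache to a few hundred MiB.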

    ----------------------- Switching which GPU lms detects -----------------------
    D:\softWin\ProgramFiles_green\MobaXterm.25.4\Data>vulkaninfo | findstr "deviceName"
    WARNING: [Loader Message] Code 0 : windows_read_data_files_in_registry: Registry lookup failed to get layer manifest files.
            deviceName        = AMD Radeon 780M Graphics
            deviceName        = NVIDIA GeForce RTX 4060
    Disabling the 4060 in the hardware settings makes LM Studio detect only the 780M
    ----------------------- Log -----------------------
    2026-03-30 16:23:08 [DEBUG]
     LlamaV4::load called with model path: D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf
    LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
    2026-03-30 16:23:09 [DEBUG]
     srv    load_model: loading model 'D:\softWin\ProgramFiles_AI\LMStudioModels\usemodels\Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF\Qwen3.5-4B.Q8_0.gguf'
    2026-03-30 16:23:09 [DEBUG]
     llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon 780M Graphics) (unknown id) - 32010 MiB free  
     ----------------------- Performance comparison -----------------------
     Reasoning time for the same prompt and the same 4B model: Vulkan 780M = 30 s, CPU 7840U = 34 s, CUDA 4060 = 4 s
     
     Inference time (seconds)
       CPU  (AMD 7840U)          34 s
       iGPU (Vulkan, AMD 780M)   30 s
       dGPU (CUDA, RTX 4060)      4 s   <- ~7.5x faster than the iGPU