llama.cpp deployment on Windows

Deployment: the hands-off way (prebuilt binaries)

Download a prebuilt release

https://github.com/ggml-org/llama.cpp/releases
https://github.com/ggml-org/llama.cpp/releases/download/b8772/llama-b8772-bin-win-cuda-13.1-x64.zip
https://github.com/ggml-org/llama.cpp/releases/download/b8772/llama-b8772-bin-win-hip-radeon-x64.zip
https://github.com/ggml-org/llama.cpp/releases/download/b8772/llama-b8772-bin-win-vulkan-x64.zip

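Unzip and run.
The Vulkan build detects both the NVIDIA card and the AMD card. OK.

The CUDA 12 and CUDA 13 builds detect no GPU at all, which is a problem.

Fix: install CUDA Toolkit 13.2
- Copy these 3 DLLs into the same directory as llama-server.exe: cudart64_13.dll cublas64_13.dll cublasLt64_13.dll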
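
The copy step can be scripted. The following is a minimal Python sketch, not part of the original notes: the function name copy_cuda_runtime is hypothetical, and the caller has to supply the directory that actually holds the DLLs on their machine.

```python
import shutil
from pathlib import Path

# DLL names exactly as listed in these notes for the CUDA 13.x runtime.
CUDA_DLLS = ["cudart64_13.dll", "cublas64_13.dll", "cublasLt64_13.dll"]


def copy_cuda_runtime(cuda_bin_dir, target_dir, names=CUDA_DLLS):
    """Copy each named DLL from cuda_bin_dir into target_dir.

    Returns the list of names that were not found in cuda_bin_dir,
    so an empty list means every DLL was copied.
    """
    src = Path(cuda_bin_dir)
    dst = Path(target_dir)
    missing = []
    for name in names:
        candidate = src / name
        if candidate.is_file():
            shutil.copy2(candidate, dst / name)
        else:
            missing.append(name)
    return missing
```

A call would pass the CUDA Toolkit's DLL directory as the first argument and the folder containing llama-server.exe as the second. A non-empty return value means the first argument points at the wrong directory.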

(jupyter314)D:\softWin\ProgramFiles_AI\llama\cuda>llama-server --list-devices
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 8187 MiB):
  Device 0: NVIDIA GeForce RTX 4060, compute capability 8.9, VMM: yes, VRAM: 8187 MiB
load_backend: loaded CUDA backend from D:\softWin\ProgramFiles_AI\llama\cuda\ggml-cuda.dll
load_backend: loaded RPC backend from D:\softWin\ProgramFiles_AI\llama\cuda\ggml-rpc.dll
load_backend: loaded CPU backend from D:\softWin\ProgramFiles_AI\llama\cuda\ggml-cpu-zen4.dll
Available devices:
  CUDA0: NVIDIA GeForce RTX 4060 (8187 MiB, 7106 MiB free)
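
To check detection from a script rather than by eye, the "Available devices:" section of the log above can be parsed. This is a rough Python sketch that assumes the output keeps the line format shown in that log; other llama.cpp versions may print it differently, and parse_devices is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   "  CUDA0: NVIDIA GeForce RTX 4060 (8187 MiB, 7106 MiB free)"
DEVICE_LINE = re.compile(r"^\s*(\w+):\s+(.+?)\s+\((\d+) MiB, (\d+) MiB free\)\s*$")


def parse_devices(output):
    """Return the devices listed after the 'Available devices:' line."""
    devices = []
    in_list = False
    for line in output.splitlines():
        if line.strip() == "Available devices:":
            in_list = True
            continue
        if not in_list:
            continue
        match = DEVICE_LINE.match(line)
        if match:
            devices.append({
                "id": match.group(1),
                "name": match.group(2),
                "total_mib": int(match.group(3)),
                "free_mib": int(match.group(4)),
            })
    return devices
```

An empty list from parse_devices corresponds to the "no GPU detected" situation described earlier.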

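Deployment: let Trae handle the build

------------------- Selected model: GPT-5.3-Codex -------------------------
Prompt 1: The goal is that D:\softWin\ProgramFiles_AI\llama\cuda_b\llama-server --list-devices shows the 4060. Keep looping until the task is done.
Trae ran many steps automatically, then stalled at the Visual Studio install. Microsoft is hopeless: even installing software from its own ecosystem is this painful.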

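Prompt 2: What is the current progress? It seems stuck. There was a single popup asking me to approve the Visual Studio install, and nothing has moved since. I need feedback so I know that both the system and you are still working.

Prompt 3: Treat D:\softWin\ProgramFiles_AI\llama\llama.cpp_release\llama-b8851-bin-win-cuda-13.1-x64.zip as another possible route. A local build from scratch is not required. The other download is still in progress and can be verified later; for now, verify this prebuilt package first.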

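Trae's exploration succeeded
------------------- Prebuilt package runs successfully ------------------
Yes. Following your instruction to "verify the prebuilt package first", it now works end to end. Results:
- The llama-b8851-bin-win-cuda-13.1-x64.zip you provided has been unzipped and verified.
- The key problem was not a missing GPU or driver; the CUDA runtime DLLs were not on the search path.
- After installing CUDA 13.2, the DLLs are actually in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64, not in the root of bin.
- Once this directory was added to PATH, --list-devices immediately detected the 4060.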
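
The PATH fix does not have to be system-wide. As an illustration, this Python sketch builds an environment with one extra directory at the front of PATH for a single child process. The helper name env_with_dir_on_path is hypothetical, and the directory to add is whatever location holds the CUDA DLLs on the machine in question.

```python
import os


def env_with_dir_on_path(extra_dir, base_env=None):
    """Return a copy of the environment with extra_dir prepended to PATH.

    base_env defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("PATH", "")
    env["PATH"] = extra_dir + os.pathsep + current if current else extra_dir
    return env


# Example use on Windows (not executed here; paths are the ones from these notes):
#   import subprocess
#   cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64"
#   exe = r"D:\softWin\ProgramFiles_AI\llama\cuda\llama-server.exe"
#   subprocess.run([exe, "--list-devices"], env=env_with_dir_on_path(cuda_bin))
```

Scoping the change to one process leaves the system PATH untouched, which makes it easier to try several CUDA versions side by side.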

------------------- Minimum dependencies for the prebuilt package (Windows) -------------
- NVIDIA driver: provides nvcuda.dll.
- CUDA 13.x runtime DLLs: cudart64_13.dll cublas64_13.dll cublasLt64_13.dll
- VC++ runtime (usually already present on the system).
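
A quick diagnostic for the list above is to look up each DLL in the directories named by PATH. The Python sketch below does only that; it is an approximation, because Windows also searches the executable's own directory and system directories, and find_on_path is a hypothetical helper name.

```python
import os
from pathlib import Path

# DLL names taken from the dependency list in these notes.
REQUIRED_DLLS = ["nvcuda.dll", "cudart64_13.dll", "cublas64_13.dll", "cublasLt64_13.dll"]


def find_on_path(names, path_value, sep=os.pathsep):
    """Map each name to the first PATH directory entry containing it, or None."""
    dirs = [Path(p) for p in path_value.split(sep) if p]
    found = {}
    for name in names:
        found[name] = next(
            (str(d / name) for d in dirs if (d / name).is_file()),
            None,
        )
    return found
```

Calling find_on_path(REQUIRED_DLLS, os.environ.get("PATH", "")) and looking for None values shows which DLLs would not be resolved through PATH.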

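Prompt 4: No build is needed anymore. Clean up everything that was set up for compiling.

Deployment: building for both AMD and NVIDIA cards

Basic build tools (MSVC + CMake + Git)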

# Download directory for components
cd D:\softWin\ProgramFiles_AI\llama\envinstall

# 1. Download and install Git (silent mode)
# curl.exe -x http://127.0.0.1:7890 -L -o git_setup.exe "https://github.com/git-for-windows/git/releases/download/v2.44.0.windows.1/Git-2.44.0-64-bit.exe"
#.\git_setup.exe /VERYSILENT /NORESTART

# 2. Download and install CMake, then add it to the system PATH
curl.exe -x http://127.0.0.1:7890 -L -o cmake_setup.msi "https://github.com/Kitware/CMake/releases/download/v3.29.2/cmake-3.29.2-windows-x86_64.msi"
msiexec.exe /i cmake_setup.msi /quiet /qn /norestart
Add to the PATH environment variable: C:\Program Files\CMake\bin

# 3. Download Visual Studio 2022 Build Tools and install the core C++ components
curl.exe -x http://127.0.0.1:7890 -L -o vs_bt.exe "https://aka.ms/vs/17/release/vs_buildtools.exe"
.\vs_bt.exe --quiet --wait --norestart --nocache `
    --add Microsoft.VisualStudio.Workload.VCTools `
    --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 `
    --add Microsoft.VisualStudio.Component.Windows11SDK.22621
Add to the PATH environment variable: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.44.35207\bin\Hostx64\x64
Then open Visual Studio Installer and modify the installation: because this machine runs Windows 10, also tick Windows SDK (10.0.19041.0) (out of support).

# Verify
cl  # should print the MSVC version
cmake --version  # ≥3.25
git --version
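
The same verification can be done from a script. This Python sketch looks up the three tools on PATH and parses the first line of "cmake --version" so that it can be compared against the 3.25 minimum noted above. The helper names are hypothetical, and the regular expression assumes the usual "cmake version X.Y.Z" wording.

```python
import re
import shutil


def tools_on_path(names=("cl", "cmake", "git")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def parse_cmake_version(text):
    """Extract (major, minor, patch) from 'cmake --version' output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def cmake_is_new_enough(text, minimum=(3, 25, 0)):
    """True if the parsed version is at least `minimum`."""
    version = parse_cmake_version(text)
    return version is not None and version >= minimum
```

If tools_on_path() reports None for cl, the shell was most likely not started as a Visual Studio developer prompt or the MSVC bin directory is missing from PATH.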

Clone llama.cpp

cd D:\softWin\ProgramFiles_AI\llama
git clone https://github.com/ggerganov/llama.cpp

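# Original assumption: no GPU SDK is needed, a normal GPU driver is enough.
# In practice, configuring with -DGGML_CUDA=ON requires the CUDA Toolkit (nvcc), and -DGGML_VULKAN=ON requires the Vulkan SDK.
# Build from the llama.cpp source into one build directory per backend; each directory gets its own exe for loading models.

Building llama.cpp (abandoned: I could not get it working, so I downloaded the prebuilt binaries from Releases instead)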

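-------------- Must be built from the VS terminal ------------------
Developer PowerShell for VS 2022

# ------------------- 1. Build the NVIDIA version -------------------
# Built for the RTX 4060 (compute capability 8.9, hence CMAKE_CUDA_ARCHITECTURES=89)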
d:;cd D:\softWin\ProgramFiles_AI\llama\cuda
cmake ../llama.cpp -G "Visual Studio 17 2022" -A x64 -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 --debug-trycompile
cmake --build . --config Release --target llama-cli --target llama-server --target llama-quantize -j 16
cd ..

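# ------------------- 2. Build the AMD version -------------------
# Note: this assumes the ROCm/HIP SDK was installed manually without breaking the driver
d:;cd D:\softWin\ProgramFiles_AI\llama\amd
$env:HIP_PATH = "C:\Program Files\AMD\ROCm\6.0" # confirm against the actual install path
# Caution: -DGGML_VULKAN=ON below configures a Vulkan build, the same as section 3, and does not use HIP_PATH.
# A HIP build would use -DGGML_HIP=ON together with a GPU target for the card, which these notes do not record.
cmake ../llama.cpp -DGGML_VULKAN=ON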
cmake --build . --config Release --target llama-cli --target llama-server --target llama-quantize -j 16
cd ..

# ------------------- 3. Build the Vulkan version (new) -------------------
# No special compiler needed; MSVC is enough. The most broadly compatible option.
d:;cd D:\softWin\ProgramFiles_AI\llama\vulkan
cmake ../llama.cpp -DGGML_VULKAN=ON
cmake --build . --config Release --target llama-cli --target llama-server --target llama-quantize -j 16
cd ..

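# ------------------- 4. Set persistent aliases (PowerShell) -------------------
# Add the following to your $PROFILE
# The build directories above sit directly under llama\ (cuda, amd, vulkan), not under llama\llama.cpp.
# With the Visual Studio generator, Release binaries normally land in bin\Release; adjust if your layout differs.
$binPath = "D:\softWin\ProgramFiles_AI\llama"
Set-Alias llama-cuda   "$binPath\cuda\bin\Release\llama-cli.exe"
Set-Alias llama-amd    "$binPath\amd\bin\Release\llama-cli.exe"
Set-Alias llama-vulkan "$binPath\vulkan\bin\Release\llama-cli.exe"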