DeepSeek 大语言模型

仓库

官网 https://www.deepseek.com/
safetensors格式
- R1 github仓库 https://github.com/deepseek-ai/DeepSeek-R1
- R1-8B hf仓库 https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- R1-8B文档 https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- 本地部署 https://techdiylife.github.io/blog/blog.html?category1=c02&blogid=0034
- 本地部署 https://zhuanlan.zhihu.com/p/20585987738
GGUF格式
- ollama官网 https://ollama.com/download/windows 执行ollama run deepseek-r1:8b

本地推理方案

llama.cpp： https://github.com/ggerganov/llama.cpp
- 特点：基于C/C++，支持Window，Mac，Linux；支持AMD显卡；API访问支持（兼容OpenAI API），支持Multimodal models
- 推荐的模型：llama支持，qwen1.5，Yi
- 使用方法
  Yi模型提供： https://github.com/01-ai/Yi?tab=readme-ov-file#quick-start---llamacpp
ollama： https://github.com/ollama/ollama
- 官网：https://ollama.ai/
- 特点：llama官方文档称其为最简单的方式, 图标非常可爱。支持Window，Mac，Linux；支持AMD显卡；支持视觉模型；REST API
- 推荐的模型：llama，qwen1.5
HuggingFace TGI： https://github.com/huggingface/text-generation-inference
- 特点：Huggingface出品，支持Window，Mac，Linux；支持AMD显卡；API
- 推荐的模型：llama
- 使用方法
  llama模型提供：https://github.com/meta-llama/llama-recipes/tree/main/recipes/inference/model_servers/hf_text_generation_inference
TensorRT-LLM （NVidia）：https://github.com/NVIDIA/TensorRT-LLM
- 特点：英伟达出品，2023年10月开始
- 推荐的模型：mistral
MLC LLM： https://github.com/mlc-ai/mlc-llm
- 特点：目标是要支持所有平台，ios，Android，Web Browser，也支持Linux，Win，macOS
- 推荐的模型：llama 推荐ios，android量化版运行工具
liteLLM：https://github.com/BerriAI/litellm
- 特点：通过OpenAI API格式方法LLM
- 推荐的模型：llama推荐的支持OpenAI Style API
LlamaIndex：https://github.com/run-llama/llama_index
- 特点：支持RAG应用（retrieval augmented generation ）
- 推荐的模型：Qwen模型提到
闻达：https://github.com/wenda-LLM/wenda
- 特点：国产，支持知识库对接，本地，在线搜索。对话管理，多用户支持
- 推荐的模型：ChatGLM有提到
llamafile：https://github.com/Mozilla-Ocho/llamafile
- 特点：基于llama.cpp开发的工具
- 推荐的模型：qwen1.5提到
ScaleLLM：https://github.com/vectorch-ai/ScaleLLM
- 特点：小Team，开发中
- 推荐的模型：Yi
LM studio：https://lmstudio.ai/拖入模型文件即可
Google AI studio：https://aistudio.google.com/在线AI api key接入平台

vLLM方案

拉取仓库
- 排除大文件拉取 GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - 仓库添加ssh权限
    - 本地生成密钥 ssh-keygen -t ed25519 -C 'email@x.com' -f ~/.ssh/file_name
    - 本地挂载密钥 ssh-add ~/.ssh/file_name
    - 远端添加公钥 cat ~/.ssh/file_name
    - 增加域名信息 vi ~/.ssh/config
      # hf.co配置
      Host hf.co
      HostName hf.co
      IdentityFile ~/.ssh/file_name
    - 本地验证 ssh -T git@hf.co
    - 本地增加指 vi ~/.ssh/known_hosts （官方ssh帮助文件）
- 支持lfs大文件 git lfs install
- 拉去lfs大文件 git lfs pull（我是从网页单下的大文件）
安装：pip install vLLM
本地运行：vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
- ModuleNotFoundError: No module named 'resource'
  目前vLLM只支持linux系统，不能运行在windows

SGlang方案

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2

Ollama方案

安装ollama
- 下载解压zip https://github.com/ollama/ollama/releases
- 配置环境变量
  - 执行路径 OLLAMA_HOME = D:\softWin\ProgramFiles_AI\ollama-windows-amd64-v0.5.7
  - 模型路径 OLLAMA_MODELS = D:\softWin\ProgramFiles_AI\ollama-models
  - 定义端口 OLLAMA_HOST = 8001 默认 http://localhost:11434
  - 跨域配置 OLLAMA_ORIGINS = *
- 验证：ollama -v
启动服务
- 启动 ollama serve
- ps后台：Start-Process -FilePath "ollama" -ArgumentList "serve" -WindowStyle Hidden
- 停止 ollama stop
- 查看进程：tasklist | findstr /I “open-webui ollama”
- 结束进程：taskkill /IM ollama.exe /F
挂载模型
- 查看模型：ollama list
- 运行：ollama run deepseek-r1:8b
- 停止：ollama stop deepseek-r1:8b
启动可视化
- 配置python环境
  - 开启conda终端
  - 创建环境：conda create -n openwebui_311 python=3.11
  - 激活环境：conda activate openwebui_311
- 安装web：pip install --upgrade open-webui
  - 本地版本：pip show open-webui
  - 仓库版本：pip index versions open-webui
  - 官方版本：(Invoke-RestMethod -Uri "https://pypi.org/pypi/open-webui/json").info.version
  - 官方更新：pip install --upgrade --no-cache-dir -i https://pypi.org/simple open-webui
- 启动web
  - 直接启动：open-webui serve
  - cmd后台：start /b conda run -n openwebui_311 open-webui serve
  - ps后台：Start-Process -FilePath "conda.exe" -ArgumentList “run -n openwebui_311 open-webui serve --port 8080”-WindowStyle Hidden
  - 验证：
  - 访问web：http://localhost:8080，默认已关联ollama serve的http://localhost:11434
  - 结束进程：taskkill /IM open-webui.exe /F