continue-plugin-maintenance-manual

Continue Plugin Maintenance Manual

This manual covers installation, configuration, maintenance, and troubleshooting of the Continue plugin, an AI coding assistant for VS Code and JetBrains IDEs.

1. Quick Start

1.1 Installing the Continue Plugin

VS Code

1. Open VS Code
2. Open the Extensions panel (Ctrl+Shift+X / Cmd+Shift+X)
3. Search for "Continue"
4. Click Install

JetBrains IDE (IntelliJ, PyCharm, WebStorm, etc.)

1. Open the IDE settings
2. Go to Plugins -> Marketplace
3. Search for "Continue"
4. Click Install

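1.2 Initial Configuration

After installation, open the configuration file:

VS Code: run the command Continue: Edit Settings JSON
JetBrains: open ~/.continue/config.json (optionally extended by ~/.continue/config.ts)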

2. Installation Guide

2.1 System Requirements

Component	Minimum requirement
VS Code	1.80.0+
JetBrains IDE	2022.3+
Node.js	16.0.0+ (if using local models)
Python	3.7+ (if using local models)
RAM	8 GB (16 GB recommended)

2.2 Installing Dependencies

Required for running local models

Install Ollama (recommended for local LLMs)

# macOS
brew install ollama
ollama serve

# Linux
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve

# Windows
# Download and install from https://ollama.ai/download

Install LM Studio (alternative)

# Download from https://lmstudio.ai/
# Start LM Studio and set the local API port

2.3 Verifying the Installation

1. Open VS Code or your JetBrains IDE
2. Open the Continue sidebar
3. Select the model you configured
4. Send a test message: Hello, can you help me debug code?

3. Core Configuration

3.1 Configuration File Locations

IDE	Configuration file path
VS Code	`.vscode/settings.json` or `~/.continue/config.json`
JetBrains	`~/.continue/config.json` (optionally extended by `~/.continue/config.ts`)

3.2 Basic Configuration Template

{
  "models": [
    {
      "title": "Default Model",
      "provider": "ollama",
      "model": "llama3.2:latest",
      "apiKey": "",
      "contextWindow": 131072,
      "temperature": 0.7
    }
  ],
  "tabAutocompleteEnabled": true,
  "userToken": "",
  "embeddingsModel": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}

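3.3 Configuration Options

Option	Type	Description	Default
`models`	array	List of model configurations	`[]`
`tabAutocompleteEnabled`	boolean	Enables Tab autocomplete	`true`
`embeddingsModel`	object	Embedding model configuration	`null`
`contextWindow`	number	Context window size in tokens (set per model)	`4096`
`temperature`	number	Sampling temperature (set per model)	`0.7`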
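
Before restarting the IDE, it can help to sanity-check the shape of the configuration. The following is a minimal Python sketch, assuming the field names used in the template above (`models`, `title`, `provider`, `model`, `contextWindow`). `check_config` and `check_config_file` are hypothetical helpers, not part of Continue, and Continue's own validation may differ.

```python
import json

# Keys every model entry carries in the template above (an assumption of this
# sketch, not a statement about Continue's own schema validation).
REQUIRED_MODEL_KEYS = ("title", "provider", "model")


def check_config(config):
    """Return a list of problems found; an empty list means the shape looks fine."""
    models = config.get("models")
    if not isinstance(models, list) or not models:
        return ["'models' must be a non-empty list"]
    problems = []
    for i, entry in enumerate(models):
        if not isinstance(entry, dict):
            problems.append(f"models[{i}] must be an object")
            continue
        for key in REQUIRED_MODEL_KEYS:
            if not entry.get(key):
                problems.append(f"models[{i}] is missing '{key}'")
        window = entry.get("contextWindow")
        if window is not None and (not isinstance(window, int) or window <= 0):
            problems.append(f"models[{i}].contextWindow must be a positive integer")
    return problems


def check_config_file(path):
    """Load a JSON config file and run the same checks on it."""
    with open(path, encoding="utf-8") as handle:
        return check_config(json.load(handle))
```

Calling `check_config_file` with the full path to `config.json` returns an empty list when every model entry has the three required keys.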

4. Model Configuration

4.1 Supported Model Providers

4.1.1 Ollama (local models)

{
  "title": "Llama 3.2 (Ollama)",
  "provider": "ollama",
  "model": "llama3.2:latest",
  "contextWindow": 131072,
  "temperature": 0.7,
  "apiBase": "http://localhost:11434"
}

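Supported models:

llama3.2 - general-purpose model with balanced performance
llama3.1 - larger model sizes (8B and up)
mistral - lightweight model
codellama - code-specialized model
qwen2.5-coder - code-specialized model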

4.1.2 OpenAI

{
  "title": "GPT-4o",
  "provider": "openai",
  "model": "gpt-4o",
  "apiKey": "sk-...",
  "contextWindow": 128000,
  "temperature": 0.7,
  "apiBase": "https://api.openai.com/v1"
}

4.1.3 Anthropic (Claude)

{
  "title": "Claude 3.5 Sonnet",
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "apiKey": "sk-ant-...",
  "contextWindow": 200000,
  "temperature": 0.7
}

4.1.4 Google (Gemini)

{
  "title": "Gemini 2.0 Flash",
  "provider": "google",
  "model": "gemini-2.0-flash",
  "apiKey": "AIza...",
  "contextWindow": 1048576,
  "temperature": 0.7
}

4.1.5 Groq (fast inference)

{
  "title": "Llama 3.3 (Groq)",
  "provider": "groq",
  "model": "llama-3.3-70b-versatile",
  "apiKey": "gsk_...",
  "contextWindow": 128000,
  "temperature": 0.7
}

4.2 Multi-Model Configuration

{
  "models": [
    {
      "title": "GPT-4o (Primary)",
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "sk-...",
      "contextWindow": 128000,
      "role": "default"
    },
    {
      "title": "Llama 3.2 (Local)",
      "provider": "ollama",
      "model": "llama3.2:latest",
      "contextWindow": 131072,
      "role": "autocomplete"
    },
    {
      "title": "Claude 3.5 Sonnet",
      "provider": "anthropic",
      "model": "claude-3-5-sonnet-20241022",
      "apiKey": "sk-ant-...",
      "contextWindow": 200000,
      "role": "editor"
    }
  ]
}

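4.3 Model Role Assignment

Role	Description	Recommended models
`default`	Default chat model	GPT-4o, Claude 3.5
`autocomplete`	Tab autocomplete	A small local model such as Llama 3.2
`editor`	Code editing and refactoring	GPT-4o, Claude 3.5
`embeddings`	Code embeddings	nomic-embed-text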
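
To make the role table concrete, here is a small Python sketch of how a model entry could be looked up by role, with a fallback to the `default` role. It assumes the `role` field exactly as written in the multi-model example in 4.2. `model_for_role` is a hypothetical helper for illustration and is not Continue's own selection logic.

```python
def model_for_role(models, role, fallback_role="default"):
    """Return the first entry whose 'role' matches, else the fallback role's entry.

    Falls back to the first model in the list, or None for an empty list.
    """
    for wanted in (role, fallback_role):
        for entry in models:
            if entry.get("role") == wanted:
                return entry
    return models[0] if models else None


models = [
    {"title": "GPT-4o (Primary)", "role": "default"},
    {"title": "Llama 3.2 (Local)", "role": "autocomplete"},
]
chosen = model_for_role(models, "autocomplete")
```

Here `chosen` is the local Llama 3.2 entry, while asking for a role that no entry declares (for example `editor`) falls back to the `default` entry.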

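5. Advanced Configuration

5.1 Custom Prompts

{
  "customCommands": [
    {
      "name": "explain",
      "description": "Explain the currently selected code",
      "prompt": "Please explain in Chinese, in detail, what the following code does and how it executes:\n\n{{selection}}"
    },
    {
      "name": "optimize",
      "description": "Optimize code performance",
      "prompt": "Please optimize the performance of the following code, focusing on time and space complexity:\n\n{{selection}}"
    },
    {
      "name": "test",
      "description": "Generate unit tests",
      "prompt": "Please generate complete unit tests for the following code, including edge cases:\n\n{{selection}}"
    }
  ],
  "systemMessage": "You are a professional programming assistant skilled in many programming languages. Please answer in Chinese and provide clear explanations and examples."
}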

5.2 Context Provider Configuration

{
  "contextProviders": [
    {
      "name": "file",
      "description": "Add a file to the context"
    },
    {
      "name": "codebase",
      "description": "Add the whole codebase to the context"
    },
    {
      "name": "folder",
      "description": "Add a folder to the context"
    },
    {
      "name": "git",
      "description": "Add Git-related context"
    },
    {
      "name": "terminal",
      "description": "Add terminal output to the context"
    }
  ]
}

5.3 Embedding Model Configuration

{
  "embeddingsModel": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://localhost:11434"
  },
  "indexing": {
    "enabled": true,
    "updateInterval": 300000,
    "maxContextChars": 1000000
  }
}

5.4 Autocomplete Configuration

{
  "tabAutocompleteEnabled": true,
  "tabAutocompleteModel": {
    "title": "Llama 3.2 (Fast)",
    "provider": "ollama",
    "model": "llama3.2:latest"
  },
  "tabAutocompleteOptions": {
    "requirePrefixChars": 1,
    "debounceMs": 100,
    "maxPromptTokens": 150,
    "fuzzyMatchThreshold": 2.0,
    "useSuffix": true
  }
}

5.5 Editor Feature Configuration

{
  "editSummary": {
    "enabled": true,
    "showOnFileChange": true
  },
  "codebaseIndex": {
    "enabled": true,
    "embeddingsProvider": "ollama"
  },
  "chatPanelWidth": 400
}

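6. Troubleshooting

6.1 Connecting to Local Inference Engines

Continue supports several local inference engines. This section gives the configuration for the most common ones and a guide to diagnosing connection problems.

6.1.1 Ollama API Configuration

Default configuration:

{
  "title": "Ollama - Llama 3.2",
  "provider": "ollama",
  "model": "llama3.2:latest",
  "apiBase": "http://localhost:11434",
  "contextWindow": 131072,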
  "temperature": 0.7
}

Full configuration options:

{
  "title": "Ollama - Advanced",
  "provider": "ollama",
  "model": "llama3.2:latest",
  "apiBase": "http://localhost:11434",
  "contextWindow": 131072,
  "temperature": 0.7,
  "options": {
    "num_thread": 4,
    "num_ctx": 131072,
    "num_predict": 2048,
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.1
  }
}

Connection tests:

# 1. Check that the Ollama service is running
curl http://localhost:11434/api/tags

# 2. Check that a specific model exists
curl http://localhost:11434/api/show -d '{"name": "llama3.2"}'

# 3. Send a test generation request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello",
  "stream": false
}'

# 4. Check which process is using the port
lsof -i :11434
netstat -an | grep 11434

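Common problems:

Problem	Cause	Solution
Connection refused	Ollama is not running	`ollama serve`
Model not found	The model has not been downloaded	`ollama pull llama3.2`
Timeout	The model is slow to load	Increase the timeout setting
CORS error	Cross-origin restriction	Set `OLLAMA_ORIGINS="*"`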
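
The "Model not found" case can be checked from the `/api/tags` response before changing any configuration. Below is a minimal Python sketch, assuming the response shape `{"models": [{"name": ...}]}` that the `jq` filter `.models[].name` in the connection test script of this manual relies on. `has_model` is a hypothetical helper, and a bare model name is treated as the `latest` tag, which is how Ollama resolves untagged names.

```python
def has_model(tags_response, wanted):
    """True if `wanted` appears among the model names in an /api/tags response.

    A name without a tag (e.g. "llama3.2") is looked up as "<name>:latest".
    """
    if ":" not in wanted:
        wanted = wanted + ":latest"
    names = [entry.get("name", "") for entry in tags_response.get("models", [])]
    return wanted in names


# Example payload in the assumed shape (trimmed to the field used here).
sample = {"models": [{"name": "llama3.2:latest"}, {"name": "nomic-embed-text:latest"}]}
found = has_model(sample, "llama3.2")
missing = has_model(sample, "mistral")
```

If the check fails for the model named in the Continue configuration, `ollama pull` for that model is the fix listed in the table above.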

6.1.2 llama.cpp API Configuration

Starting the llama.cpp server:

# Basic start
./llama-server -m models/llama3.2.gguf -c 131072

# Start with the full set of parameters
./llama-server \
  -m models/llama3.2.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 131072 \
  --batch-size 512 \
  --threads 4 \
  --n-predict 2048 \
  --verbose

Continue configuration:

{
  "title": "llama.cpp - Llama 3.2",
  "provider": "llama.cpp",
  "model": "llama3.2",
  "apiBase": "http://localhost:8080",
  "contextWindow": 131072,
  "temperature": 0.7,
  "options": {
    "n_gpu_layers": -1,
    "n_threads": 4,
    "n_batch": 512,
    "n_predict": 2048,
    "repeat_penalty": 1.1,
    "top_p": 0.9
  }
}

Using OpenAI-compatible mode (recommended):

# Start command
./llama-server \
  -m models/llama3.2.gguf \
  --host 0.0.0.0 \
  --port 8080

{
  "title": "llama.cpp (OpenAI Compatible)",
  "provider": "openai",
  "model": "llama3.2",
  "apiBase": "http://localhost:8080/v1",
  "apiKey": "dummy",
  "contextWindow": 131072,
  "temperature": 0.7
}

Connection tests:

# 1. Check the service status
curl http://localhost:8080/v1/models

# 2. Test chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7
  }'

# 3. Test streaming output
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

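Common problems:

Problem	Cause	Solution
GPU not detected	Wrong GPU parameter	Set `n_gpu_layers=-1`
Out of memory	Context too large	Reduce the `ctx-size` parameter
Slow inference	Inference is running on the CPU	Enable GPU acceleration
Model load failed	Invalid GGUF file	Check the model file format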

6.1.3 LM Studio API Configuration

LM Studio setup:

1. Open LM Studio
2. Go to Settings -> Server
3. Configure the API port (default 1234)
4. Enable CORS (if cross-origin access is needed)

Continue configuration:

{
  "title": "LM Studio - Local",
  "provider": "lmstudio",
  "model": "local",
  "apiBase": "http://localhost:1234",
  "contextWindow": 200000,
  "temperature": 0.7
}

Using OpenAI-compatible mode (recommended):

{
  "title": "LM Studio (OpenAI Compatible)",
  "provider": "openai",
  "model": "local",
  "apiBase": "http://localhost:1234/v1",
  "apiKey": "",
  "contextWindow": 200000,
  "temperature": 0.7
}

Connection tests:

# 1. Check the service status
curl http://localhost:1234/v1/models

# 2. Test chat completion
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7
  }'

# 3. Get server information
curl http://localhost:1234/v1/server/info

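Common problems:

Problem	Cause	Solution
No model loaded	No model is loaded in LM Studio	Load a model in LM Studio
Port in use	The port is already taken	Change the port setting
Connection timeout	The server is not started	Start the LM Studio server
CORS blocked	Cross-origin restriction	Enable the CORS setting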

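6.1.4 Comparison of the Three Engines

Feature	Ollama	llama.cpp	LM Studio
Installation complexity	⭐ Simple	⭐⭐⭐ Complex	⭐⭐ Moderate
Configuration flexibility	⭐⭐ Moderate	⭐⭐⭐ High	⭐⭐ Moderate
GPU support	⭐⭐⭐ Automatic	⭐⭐ Requires configuration	⭐⭐⭐ Automatic
Model format	GGUF	GGUF	GGUF
Default port	11434	8080	1234
Chinese documentation	⭐⭐⭐	⭐⭐	⭐⭐⭐
GUI	❌	❌	✅

6.1.5 Unified Configuration Template

The following configuration covers all three engines; enable the ones you need:

{
  "models": [
    {
      "title": "Ollama - Llama 3.2",
      "provider": "ollama",
      "model": "llama3.2:latest",
      "apiBase": "http://localhost:11434",
      "contextWindow": 131072,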
      "temperature": 0.7,
      "enabled": true
    },
    {
      "title": "llama.cpp - Qwen2.5-Coder",
      "provider": "openai",
      "model": "qwen2.5-coder",
      "apiBase": "http://localhost:8080/v1",
      "apiKey": "dummy",
      "contextWindow": 200000,
      "temperature": 0.7,
      "enabled": false
    },
    {
      "title": "LM Studio - Mixtral",
      "provider": "openai",
      "model": "local",
      "apiBase": "http://localhost:1234/v1",
      "apiKey": "",
      "contextWindow": 200000,
      "temperature": 0.7,
      "enabled": false
    }
  ],
  "tabAutocompleteModel": {
    "title": "Ollama - Llama 3.2",
    "provider": "ollama",
    "model": "llama3.2:latest"
  }
}

6.2 Diagnosing Connection Problems

General connection test script

Create a test-connections.sh script:

#!/bin/bash

echo "=== Testing Local AI Engine Connections ==="
echo ""

# Test Ollama
echo "[Ollama] http://localhost:11434"
if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
    echo "  ✅ Connected"
    curl -s http://localhost:11434/api/tags | jq -r ".models[].name" 2>/dev/null
else
    echo "  ❌ Not available"
fi
echo ""

# Test llama.cpp
echo "[llama.cpp] http://localhost:8080"
if curl -s http://localhost:8080/v1/models > /dev/null 2>&1; then
    echo "  ✅ Connected"
else
    echo "  ❌ Not available"
fi
echo ""

# Test LM Studio
echo "[LM Studio] http://localhost:1234"
if curl -s http://localhost:1234/v1/models > /dev/null 2>&1; then
    echo "  ✅ Connected"
else
    echo "  ❌ Not available"
fi
echo ""

echo "=== Test Complete ==="

Run the test:

chmod +x test-connections.sh
./test-connections.sh
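
On machines without `bash` or `curl` (for example a default Windows setup), the same probe can be written in Python. The sketch below uses only the standard library and probes the same three URLs as `test-connections.sh`. `is_reachable` and `report` are hypothetical helper names, and the 2-second timeout is an assumed default chosen so that an unresponsive port cannot stall the check.

```python
import http.client
import urllib.error
import urllib.request

# Same endpoints as test-connections.sh.
ENDPOINTS = {
    "Ollama": "http://localhost:11434/api/tags",
    "llama.cpp": "http://localhost:8080/v1/models",
    "LM Studio": "http://localhost:1234/v1/models",
}


def is_reachable(url, timeout=2.0):
    """True if the URL answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, HTTP errors, and malformed URLs
        # all count as "not reachable" for this check.
        return False


def report(endpoints=ENDPOINTS):
    """Print one status line per engine."""
    for name, url in endpoints.items():
        status = "Connected" if is_reachable(url) else "Not available"
        print(f"[{name}] {url}: {status}")
```

Calling `report()` prints one line per engine, mirroring the output of the shell script.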

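Problem: invalid API key

Symptom: the error "Invalid API key" is shown

Solution:

# 1. Check that the key was copied correctly
printf '%s' "sk-..." | wc -c  # check the length (printf avoids counting a trailing newline)

# 2. Verify the key format
# OpenAI: sk-xxxxxxxxxxxxxxxxxxxx
# Anthropic: sk-ant-xxxxxxxxxxxxxxxxxxxx
# Google: AIza...xxxxx

# 3. Check whether the key has expired or been revoked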
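
A common cause of this error is pasting a key into the wrong provider's entry. The following Python sketch guesses the provider from the key prefixes listed in this manual (`sk-ant-`, `sk-`, `AIza`, `gsk_`). `guess_provider` is a hypothetical helper. A matching prefix only shows that the key looks right and says nothing about whether it is valid or still active.

```python
# Prefixes as listed in this manual. "sk-ant-" must be tested before "sk-",
# because every Anthropic-style key also starts with "sk-".
KEY_PREFIXES = (
    ("sk-ant-", "anthropic"),
    ("gsk_", "groq"),
    ("AIza", "google"),
    ("sk-", "openai"),
)


def guess_provider(api_key):
    """Return the provider whose prefix matches the key, or None if none does."""
    key = api_key.strip()  # stray whitespace from copy-paste is a frequent culprit
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    return None


def key_matches_provider(api_key, provider):
    """True if the key's prefix is consistent with the configured provider."""
    return guess_provider(api_key) == provider
```

For example, `key_matches_provider(" sk-ant-abc ", "openai")` is false, which points at a key placed under the wrong `provider`.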

6.3 Firewall and Network Configuration

macOS firewall settings

# Check the firewall status
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getstatus

# Allow Ollama through the firewall
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /Applications/Ollama.app
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblockapp /Applications/Ollama.app

Linux firewall settings (ufw)

# Allow the ports
sudo ufw allow 11434  # Ollama
sudo ufw allow 8080   # llama.cpp
sudo ufw allow 1234   # LM Studio

# Check the status
sudo ufw status

Windows firewall settings

# Add inbound rules
New-NetFirewallRule -DisplayName "Ollama" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Allow
New-NetFirewallRule -DisplayName "llama.cpp" -Direction Inbound -LocalPort 8080 -Protocol TCP -Action Allow
New-NetFirewallRule -DisplayName "LM Studio" -Direction Inbound -LocalPort 1234 -Protocol TCP -Action Allow

6.4 Remote Access Configuration

Allowing remote connections

Ollama:

# Set the environment variable
export OLLAMA_HOST=0.0.0.0:11434

# Start the service
ollama serve

llama.cpp:

./llama-server -m model.gguf --host 0.0.0.0 --port 8080

LM Studio: change Host to 0.0.0.0 in the settings

Access through an SSH tunnel

# Forward the remote server's port to the local machine
ssh -L 11434:localhost:11434 user@remote-server

Deploying with Docker

# Ollama Docker
docker run -d \
  --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

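# llama.cpp Docker (model.gguf stands for a model file placed in ./models)
docker run -d \
  --name llama-cpp \
  -p 8080:8080 \
  -v "$(pwd)/models:/models" \
  ghcr.io/ggerganov/llama.cpp:server \
  -m /models/model.gguf --host 0.0.0.0 --port 8080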

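6.5 Performance Issues

Problem: slow responses

Diagnostic steps:

# 1. Check network latency
ping api.openai.com

# 2. Check the local model's resource usage
htop

# 3. Check disk space
df -h

# 4. Check memory usage (Linux)
free -h

Optimizations: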

{
  "models": [
    {
      "title": "Optimized Model",
      "provider": "ollama",
      "model": "llama3.2:latest",
      "contextWindow": 65536,
      "temperature": 0.7,
      "maxTokens": 2048,
      "numThreads": 4,
      "numCtx": 65536
    }
  ],
  "tabAutocompleteOptions": {
    "debounceMs": 200,
    "maxPromptTokens": 100
  }
}

Local engine performance comparison

Engine	Model	Time to first token	Throughput	Memory usage
Ollama	llama3.2 3B	~500ms	30 tok/s	~4GB
Ollama	llama3.2 7B	~1s	25 tok/s	~8GB
llama.cpp	qwen2.5-coder 7B	~800ms	28 tok/s	~6GB
LM Studio	mistral 7B	~600ms	32 tok/s	~7GB

Test environment: M2 Pro, 32GB RAM, GPU acceleration
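
The two latency columns combine into a rough end-to-end estimate: total time is approximately the time to first token plus the number of output tokens divided by the throughput. The Python sketch below works through the first table row. `estimated_seconds` is a hypothetical helper, the 300-token reply length is an assumed example value, and the result is only as good as the approximate figures in the table.

```python
def estimated_seconds(first_token_ms, tokens_per_second, output_tokens):
    """Rough response time: time to first token plus steady-state generation time."""
    return first_token_ms / 1000 + output_tokens / tokens_per_second


# First table row (Ollama, llama3.2 3B): ~500 ms to first token, 30 tok/s.
# A 300-token reply is an assumed example length.
total = estimated_seconds(first_token_ms=500, tokens_per_second=30, output_tokens=300)
# 0.5 s + 300 / 30 s = 10.5 s
```

For long replies the throughput term dominates, so a faster time to first token matters most for short completions such as Tab autocomplete.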

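6.6 Memory Issues

Problem: high memory usage

Ollama memory optimization:

# Keep only one model loaded at a time
export OLLAMA_MAX_LOADED_MODELS=1

# Use a smaller or quantized model
ollama pull llama3.2:3b  # 3B version
ollama pull llama3.1:8b-instruct-q4_K_M  # 8B, 4-bit quantized

llama.cpp memory optimization: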

# --n-gpu-layers 99 offloads as many layers as possible to the GPU
# --ctx-size 32768 caps the context size
./llama-server \
  -m model.gguf \
  --n-gpu-layers 99 \
  --ctx-size 32768

Continue configuration optimization:

{
  "contextWindow": 65536,
  "indexing": {
    "enabled": false
  },
  "tabAutocompleteOptions": {
    "maxPromptTokens": 100
  }
}

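GPU memory issues

Check GPU usage:

# NVIDIA GPU
nvidia-smi

# Apple Silicon (reports GPU power and utilization; memory is shared with the CPU)
sudo powermetrics --samplers gpu_power -i 1000

# AMD GPU
rocm-smi

Solutions:

Use a quantized model (Q4_K_M, Q5_K_M)
Reduce the context window
Use a smaller model
Split inference between CPU and GPU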

6.7 Log Debugging

Getting debug logs

VS Code:

# Open the developer tools
Ctrl+Shift+P -> "Developer: Toggle Developer Tools"

# View the Continue extension logs
Ctrl+Shift+P -> "Extensions: Show Output"
Select "Continue"

Log file locations:

# Linux
~/.config/Code/logs/
~/.continue/logs/

# macOS
~/Library/Logs/Code/
~/Library/Application Support/Continue/

# Windows
%APPDATA%\Code\logs\
%APPDATA%\Continue\

Enabling verbose logging

{
  "debug": true,
  "verboseLogging": true
}

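Ollama debug logs

# Enable debug mode
OLLAMA_DEBUG=1 ollama serve

# View the server logs
journalctl -e -u ollama          # Linux (systemd service)
cat ~/.ollama/logs/server.log    # macOS

llama.cpp debug logs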

./llama-server -m model.gguf --verbose

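6.8 Quick Reference for Common Problems

Problem	Possible cause	Solution
Slow responses	Network latency	Use a local model or a CDN
Out of memory	Context too large	Reduce contextWindow
Cannot connect	Blocked by the firewall	Check the firewall settings
Autocomplete not working	Configuration error	Check tabAutocompleteEnabled
Embedding fails	Model not downloaded	Run `ollama pull nomic-embed-text`
GPU not used	Wrong parameter	Set `n_gpu_layers=-1`
Model fails to load	Insufficient disk space	Check free space with `df -h`
CORS error	Cross-origin restriction	Set the allowed origins
Timeout error	Model too large	Use a smaller model
Garbled Chinese text	Encoding problem	Make sure UTF-8 encoding is used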

6.9 Diagnostic Script

System health-check script (save as health-check.sh):

#!/bin/bash

echo "=== Continue Plugin Health Check ==="
echo ""

# System Resources
echo "[System Resources]"
echo "Memory: $(free -h | awk '/^Mem:/ {print $3"/"$2}')"
echo "Disk: $(df -h / | awk 'NR==2 {print $3"/"$2}')"
echo ""

# Local Engines
echo "[Local AI Engines]"
for port in 11434 8080 1234; do
    if curl -s "http://localhost:$port" > /dev/null 2>&1; then
        echo "  Port $port: ✅ Available"
    else
        echo "  Port $port: ❌ Not available"
    fi
done
echo ""

# Ollama Models
if command -v ollama &> /dev/null; then
    echo "[Ollama Models]"
    ollama list 2>/dev/null || echo "  Ollama not running"
    echo ""
fi

echo "=== Check Complete ==="

Run the check:

chmod +x health-check.sh
./health-check.sh

7. Best Practices

7.1 Production Configuration

{
  "models": [
    {
      "title": "Primary Model",
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "${OPENAI_API_KEY}",
      "contextWindow": 128000,
      "temperature": 0.7
    },
    {
      "title": "Local Fallback",
      "provider": "ollama",
      "model": "llama3.2:latest",
      "contextWindow": 65536,
      "temperature": 0.7
    }
  ],
  "tabAutocompleteEnabled": true,
  "tabAutocompleteModel": {
    "title": "Local Fallback",
    "provider": "ollama",
    "model": "llama3.2:latest"
  }
}

7.2 Managing Environment Variables

# .env file
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AIza..."
export GROQ_API_KEY="gsk_..."

# Reference it in the configuration
"apiKey": "${OPENAI_API_KEY}"

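If the `${...}` placeholders are not expanded by the plugin version in use, one workaround is to keep a template file and render the final configuration yourself. The Python sketch below does this with `string.Template` from the standard library, whose `${NAME}` syntax matches the placeholder style shown above. `expand_env` is a hypothetical helper, and rendering from a template is an assumption of this sketch rather than a documented Continue workflow.

```python
import os
from string import Template


def expand_env(text, env=None):
    """Replace ${NAME} placeholders in `text` with values from the environment.

    Raises KeyError when a referenced variable is not set, so a missing key
    fails loudly instead of producing a config with an empty apiKey.
    """
    mapping = os.environ if env is None else env
    return Template(text).substitute(mapping)


template_text = '{"apiKey": "${OPENAI_API_KEY}"}'
rendered = expand_env(template_text, {"OPENAI_API_KEY": "sk-test"})
```

A literal dollar sign in the template has to be written as `$$`, because `string.Template` treats a single `$` as the start of a placeholder.
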
7.3 Shared Team Configuration

{
  "teamConfig": {
    "enabled": true,
    "configUrl": "https://your-server.com/config.json"
  }
}

7.4 Security Recommendations

{
  "security": {
    "enableContentFiltering": true,
    "enableLogging": false,
    "enableTelemetry": false
  }
}

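8. Performance Optimization

8.1 Model Selection Recommendations

Scenario	Recommended model	Rationale
Chat	GPT-4o / Claude 3.5	High-quality answers
Code completion	Llama 3.2 (local)	Low latency
Code review	GPT-4o	Identifies problems accurately
Documentation generation	GPT-4o	Structured output
Rapid prototyping	Llama 3.2	Low cost

8.2 Caching Strategy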

{
  "cache": {
    "enabled": true,
    "ttl": 3600000,
    "maxSize": 100
  }
}

8.3 Resource Limits

{
  "resourceLimits": {
    "maxConcurrentRequests": 3,
    "maxContextChars": 100000,
    "maxTokensPerRequest": 4096
  }
}

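9. Frequently Asked Questions (FAQ)

Q1: How do I switch models?

A: Click the current model name at the top of the Continue sidebar and select another model.

Q2: How do I add a custom model?

A: Add a new model configuration object to the models array.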

{
  "models": [
    {
      "title": "My Custom Model",
      "provider": "openai",
      "model": "your-model-name",
      "apiKey": "your-api-key",
      "apiBase": "https://your-api-endpoint.com"
    }
  ]
}

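Q3: How do I disable autocomplete?

A: Set "tabAutocompleteEnabled": false

Q4: How do I reset the configuration?

# Delete the configuration file
rm ~/.continue/config.json
# Then restart the IDE

Q5: How do I interact with a model from the command line?

# Using curl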
curl -X POST http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Appendix

A. API Endpoint Reference

Provider	API endpoint
OpenAI	https://api.openai.com/v1
Anthropic	https://api.anthropic.com/v1
Google	https://generativelanguage.googleapis.com/v1beta
Groq	https://api.groq.com/openai/v1

B. Keyboard Shortcuts

Function	VS Code	JetBrains
Open Continue	`Ctrl+Shift+P` → Continue	`Cmd+Shift+A` → Continue
Explain selected code	`Ctrl+Shift+P` → Continue: Explain	-
Generate tests for selected code	`Ctrl+Shift+P` → Continue: Test	-
Document selected code	`Ctrl+Shift+P` → Continue: Doc	-

C. Version Compatibility

Continue version	VS Code	JetBrains	Release date
0.1.0+	1.80.0+	2022.3+	2024-01
0.2.0+	1.85.0+	2023.1+	2024-06
0.3.0+	1.90.0+	2023.3+	2024-11

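D. Resource Links

Official documentation: https://docs.continue.dev
GitHub repository: https://github.com/continuedev/continue
Community forum: https://discord.gg/vapESyrZmR

Version History

Version	Date	Changes
1.0.0	2024-12-01	Initial version
1.1.0	2025-01-15	Added Groq support
1.2.0	2025-02-20	Improved autocomplete performance
1.3.0	2026-04-08	Added detailed configuration for local inference engines (Ollama, llama.cpp, LM Studio)

Last updated: 2026-04-08

Maintainer: [Your Name]

Contact: [Your Email]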
