从零开始部署DeepSeek：基于Ollama+Flask的本地化AI对话系统

从零开始部署DeepSeek：基于Ollama+Flask的本地化AI对话系统

一、部署背景与工具选型

在AI大模型遍地开花的2025年，DeepSeek R1凭借其出色的推理能力和开源特性成为开发者首选。本文将以零基础视角，通过以下工具链实现本地化部署：

1.Ollama：轻量级模型管理工具，支持一键拉取、运行模型

Ollama 是一个功能强大的大语言模型管理端，专为下载、运行和调用大型语言模型（如DeepSeek）而设计。它提供了以下几个核心功能：

模型下载：支持从官方仓库下载不同规模的模型。
模型运行：通过API提供服务，让用户与模型进行交互。
实时对话模拟：以流式方式展示模型回复。

2.DeepSeek : 平衡性能与资源消耗的中等规模模型

DeepSeek 是目标语言模型，可以根据您的硬件配置选择不同的模型规模（如7B、13B等）。它具有以下特点：

可扩展性：根据内存和计算资源调整模型大小。
高效训练：支持并行训练以加速模型收敛。

3.Python+Flask：构建Web交互界面，支持流式响应

Flask 是一个轻量级的Web框架，广泛应用于快速开发Web应用。结合Python，它为构建动态Web界面提供了灵活性：

API集成：通过Flask API调用Ollama服务。
前端动态：使用JavaScript和HTML创建交互式对话框。

4.HTML+JavaScript：实现类ChatGPT的对话交互体验

HTML 是构建Web界面的基础语言，用于设计用户友好的对话框，并将其嵌入到Flask应用中。您可以通过以下方式集成：

响应式布局：使用CSS样式表确保界面在不同设备上适配。
动态内容展示：通过JavaScript更新页面内容，模拟模型回复，需要模拟打字机方式。

**备注说明**: 不想自己编程的话，可使用chatbox直接接入ollama

二、环境准备与模型部署

1. 安装Ollama

Windows/Mac用户：
访问Ollama官网下载安装包，双击完成安装。 （github下载很慢，需要使用加速或从其他地方下载）

Linux用户：

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl start ollama

2. 下载DeepSeek模型

根据硬件配置选择模型（显存要求参考）：

基础版（1.5B）：适合笔记本（8GB内存）
标准版（7B）：推荐配置（16GB内存+8GB显存）
加强版（70B）：需专业显卡（如RTX 5090）

# 拉取7B模型
ollama run deepseek-r1:7b

上述指令会完成下载和运行两个步骤（ollama支持断线续传功能，而且我发现速度慢下来的时候，手动停止然后再启动接着下载模型的话，速度会恢复上来），成功运行后可通过命令行直接向deepseek提问，如下：

另外，也可以在命令行手动执行ollama相关指令：

无参数启动

C:\Users\arbboter\Desktop>ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

命令	功能	示例
`ollama serve`	启动模型	`ollama serve`
`ollama pull`	下载模型	`ollama pull deepseek-r1:7b`
`ollama list`	查看本地模型	`ollama list`
`ollama rm`	删除模型	`ollama rm deepseek-r1:1.5b`
`ollama cp`	复制模型	`ollama cp deepseek-r1:7b my-backup`

3.配置ollama

# 限制本地访问（默认配置）
export OLLAMA_HOST=127.0.0.1

# 允许局域网访问
export OLLAMA_HOST=0.0.0.0

# 启用调试
export OLLAMA_DEBUG=1

# 自定义模型路径（可能需重启机器生效）
export OLLAMA_MODELS=~\ollama\models

三、构建Web对话系统

备注说明：选型`flask`及代码编写基本是由`deepseek`完成，本人只负责提问和修缮

1. 项目结构

deepseek-web/
├── app.py          # Flask主程序
├── templates/
│   └── index.html  # 聊天界面

2. Flask后端实现（app.py）

# app.py
from flask import Flask, render_template, request, Response
import ollama
import logging
from datetime import datetime

app = Flask(__name__)

# 配置日志
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('chat.log'),
        logging.StreamHandler()
    ]
)

def chat_ollama(user_message, stream):
    host = 'http://192.168.1.10:11434'
    cli = ollama.Client(host=host)
    response = cli.chat(
        model='deepseek-r1:7b',
        messages=[{'role': 'user', 'content': user_message}],
        stream=stream,
        # options={'temperature': 0.7}
    )
    return response

@app.route('/')
def index():
    return render_template('index.html')


@app.route('/api/chat', methods=['POST'])
def chat():
    """流式聊天接口"""
    def generate(user_message):
        try:
            app.logger.info(f"流式处理开始: {user_message[:50]}...")

            stream = chat_ollama(user_message, True)
            for chunk in stream:
                content = chunk['message']['content']
                if content.startswith('<think>'):
                    content = content.replace('<think>', '', 1)
                elif content.startswith('</think>'):
                    content = content.replace('</think>', '\n', 1)
                app.logger.debug(f"发送数据块: {content}")
                yield f"{content}"
                
            app.logger.info("流式处理完成")

        except Exception as e:
            app.logger.error(f"流式错误: {str(e)}")
            yield f"[ERROR] {str(e)}\n\n"

    return Response(generate(request.json['message']), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5001, debug=True)

3. 前端实现（index.html）

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>AI 对话助手</title>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/github.min.css">
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>
    <style>
        :root {
            --primary-color: #10a37f;
            --bg-color: #f0f2f5;
        }
        body {
            margin: 0;
            padding: 20px;
            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
            background-color: var(--bg-color);
            max-width: 800px;
            margin: 0 auto;
        }
        #chat-container {
            height: 70vh;
            overflow-y: auto;
            border: 1px solid #e0e0e0;
            border-radius: 8px;
            padding: 15px;
            background: white;
            margin-bottom: 20px;
        }
        .message {
            margin: 12px 0;
            display: flex;
            gap: 15px;
        }
        .user-message {
            justify-content: flex-end;
        }
        .message-content {
            max-width: 80%;
            padding: 12px 16px;
            border-radius: 8px;
        }
        .assistant-message .message-content {
            background: #f8f9fa;
            border: 1px solid #e0e0e0;
        }
        .user-message .message-content {
            background: var(--primary-color);
            color: white;
        }
        #input-container {
            display: flex;
            gap: 10px;
        }
        #user-input {
            flex: 1;
            padding: 12px;
            border: 1px solid #e0e0e0;
            border-radius: 8px;
            resize: none;
            min-height: 44px;
        }
        button {
            background: var(--primary-color);
            color: white;
            border: none;
            padding: 0 20px;
            border-radius: 8px;
            cursor: pointer;
            transition: opacity 0.2s;
        }
        button:disabled {
            opacity: 0.6;
            cursor: not-allowed;
        }
        .typing-indicator {
            display: inline-block;
            padding: 8px 12px;
            background: #f8f9fa;
            border-radius: 8px;
            border: 1px solid #e0e0e0;
        }
        .dot {
            display: inline-block;
            width: 6px;
            height: 6px;
            margin-right: 3px;
            background: #ccc;
            border-radius: 50%;
            animation: bounce 1.4s infinite;
        }
        @keyframes bounce {
            0%, 80%, 100% { transform: translateY(0) }
            40% { transform: translateY(-6px) }
        }

        /* markdown基础样式 */
        .markdown-content {
            line-height: 1.6;
            transition: opacity 0.3s;
        }

        .markdown-content:not(.markdown-rendered) {
            opacity: 0.5;
        }

        .markdown-content h1 { font-size: 2em; margin: 0.67em 0; }
        .markdown-content h2 { font-size: 1.5em; margin: 0.83em 0; }
        .markdown-content pre { 
            background: #f5f5f5;
            padding: 1em;
            border-radius: 4px;
            overflow-x: auto;
        }
    </style>
</head>
<body>
    <div id="chat-container"></div>
    <div id="input-container">
        <textarea id="user-input" placeholder="输入消息..." rows="1"></textarea>
        <button id="send-btn" onclick="sendMessage()">发送</button>
    </div>

    <script>
        const chatContainer = document.getElementById('chat-container');
        const userInput = document.getElementById('user-input');
        const sendBtn = document.getElementById('send-btn');

        // 滚动到底部
        function scrollToBottom() {
            
        }

        function renderMarkdown(options = {}) {
            // 合并配置参数
            const config = {
                selector: options.selector || '.markdown-content',
                breaks: options.breaks ?? true,
                gfm: options.gfm ?? true,
                highlight: options.highlight || null
            };

            // 配置Marked
            marked.setOptions({
                breaks: config.breaks,
                gfm: config.gfm,
                highlight: config.highlight
            });

            // 渲染处理器
            const render = () => {
                document.querySelectorAll(config.selector).forEach(container => {
                    if (container.dataset.rendered) return;
                    
                    // 创建虚拟容器避免内容闪烁
                    const virtualDiv = document.createElement('div');
                    virtualDiv.style.display = 'none';
                    virtualDiv.innerHTML = container.innerHTML.trim();
                    
                    // 执行Markdown转换
                    container.innerHTML = marked.parse(virtualDiv.innerHTML);
                    container.dataset.rendered = true;
                    
                    // 添加加载动画
                    container.classList.add('markdown-rendered');
                });
            };

            // 自动执行渲染
            if (document.readyState === 'complete') {
                render();
            } else {
                document.addEventListener('DOMContentLoaded', render);
            }
        }

        // 添加用户消息
        function addUserMessage(content) {
            const messageDiv = document.createElement('div');
            messageDiv.className = 'message user-message';
            messageDiv.innerHTML = `
                <div class="message-content">${content}</div>
            `;
            chatContainer.appendChild(messageDiv);
        }

        // 添加AI消息（流式）
        async function addAssistantMessageStream() {
            const messageDiv = document.createElement('div');
            messageDiv.className = 'message assistant-message';
            messageDiv.innerHTML = `
                <div class="message-content markdown-content">
                    <div class="typing-indicator">
                        <span class="dot"></span>
                        <span class="dot" style="animation-delay: 0.2s"></span>
                        <span class="dot" style="animation-delay: 0.4s"></span>
                    </div>
                </div>
            `;
            chatContainer.appendChild(messageDiv);
            return messageDiv.querySelector('.message-content');
        }

        // 发送消息
        async function sendMessage() {
            const content = userInput.value.trim();
            if (!content) return;

            sendBtn.disabled = true;
            userInput.disabled = true;
            userInput.value = '';
            
            addUserMessage(content);
            const responseContainer = await addAssistantMessageStream();

            try {
                const response = await fetch('/api/chat', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        // 如果需要认证
                        // 'Authorization': 'Bearer YOUR_TOKEN'
                    },
                    body: JSON.stringify({ message: content })
                });

                if (!response.ok) throw new Error('请求失败');
                
                await this.createStreamTypewriter(response, responseContainer, {});
                this.scrollToBottom();
            } catch (error) {
                responseContainer.innerHTML = '❌ 请求出错: ' + error.message;
            } finally {
                sendBtn.disabled = false;
                userInput.disabled = false;
                userInput.focus();
            }
        }

        // 输入框事件处理
        userInput.addEventListener('keydown', (e) => {
            if (e.key === 'Enter' && !e.shiftKey && !e.ctrlKey) {
                e.preventDefault();
                sendMessage();
            } else if (e.key === 'Enter' && (e.ctrlKey || e.metaKey)) {
                userInput.value += '\n';
            }
        });

        async function createStreamTypewriter(stream, container, options = {}) {
                const config = {
                    baseSpeed: 50,
                    maxSpeedup: 3,
                    retryCount: 3,
                    ...options
                };
                this.reader = null;

                // 状态控制
                let isDestroyed = false;
                let cursorVisible = true;
                let renderQueue = [];
                let retryCounter = 0;

                // DOM元素初始化
                const cursor = document.createElement('span');
                cursor.className = 'typewriter-cursor';
                cursor.textContent = '▌';
                container.append(cursor);

                // 光标动画
                const cursorInterval = setInterval(() => {
                    cursor.style.opacity = cursorVisible ? 1 : 0;
                    cursorVisible = !cursorVisible;
                }, 600);

                // 核心渲染逻辑
                const renderEngine = () => {
                    if (renderQueue.length === 0 || isDestroyed) return;

                    // 动态调速算法
                    const speed = Math.max(
                        config.baseSpeed / config.maxSpeedup,
                        config.baseSpeed - renderQueue.length * 2
                    );

                    const fragment = document.createDocumentFragment();
                    while (renderQueue.length > 0) {
                        const char = renderQueue.shift();
                        fragment.append(document.createTextNode(char));
                    }

                    container.insertBefore(fragment, cursor);
                    setTimeout(() => requestAnimationFrame(renderEngine), speed);
                };

                // 流数据处理
                const processStream = async () => {
                    try {
                        this.reader = stream.body.getReader();
                        // Fetch模式处理
                        while (!isDestroyed) {
                            const { done, value } = await this.reader.read();
                            if (done) break;
                            renderQueue.push(...new TextDecoder().decode(value).split(''));
                            if (!renderQueue.length) continue;
                            requestAnimationFrame(renderEngine);
                        }
                    } catch (err) {
                        if (retryCounter++ < config.retryCount && !isDestroyed) {
                            processStream();
                        } else {
                            destroy();
                            throw new Error('Stream connection failed');
                        }
                    } finally {
                        container.innerHTML = marked.parse(container.textContent.replace('▌', ''));
                        destroy();
                    }
                };

                // 资源清理
                const destroy = () => {
                    if (isDestroyed) return;
                    isDestroyed = true;
                    clearInterval(cursorInterval);
                    cursor.remove();
                    if (stream.cancel) stream.cancel();
                    if (stream.close) stream.close();
                };

                // 启动引擎
                processStream();
                return { destroy };
            };
    </script>
</body>
</html>

四、启动与优化

1. 运行系统

flask run --port 5001

访问 http://localhost:5001 即可开始对话，如下：
主页面

2. 性能优化技巧

量化加速：使用ollama run deepseek-r1:7b --quantize q4_0 减少显存占用
GPU加速：在Ollama配置中启用CUDA支持

五、安全注意事项

端口防护：

export OLLAMA_HOST=127.0.0.1  # 禁止外部访问
netstat -an | grep 11434      # 验证监听地址

防火墙规则：

sudo ufw deny 11434/tcp      # 禁用Ollama外部端口

六、扩展应用场景

私有知识库：接入LangChain处理本地文档
自动化脚本：通过API实现代码生成/自动Debug
硬件控制：结合HomeAssistant实现语音智能家居

完整代码已开源（其实没有)：GitHub仓库地址
通过本教程，您已掌握从模型部署到应用开发的全流程。本地化AI部署不仅降低成本，更保障了数据隐私，为开发者提供了真正的「AI自由」！