Hands-On Guide: Wrapping Whisper in a FastAPI Service with High-Concurrency Handling (Bundle Included)

2025/4/17 14:02:47


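This guide walks through a detailed example of wrapping OpenAI's Whisper model with FastAPI to expose a REST API that can serve a number of requests concurrently. The main steps and sample code follow.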

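1. Environment Setup

Python: Python 3.8 or later is recommended.
Dependencies:
- FastAPI: a lightweight, high-performance Python web framework.
- Uvicorn: the ASGI server used to run FastAPI.
- Whisper: an open-source speech recognition model. It depends on PyTorch, so install torch first (pick the build that matches your hardware). The sample code also requires ffmpeg to be on the system PATH.
- python-multipart: required by FastAPI to parse uploaded files.

Install the dependencies with pip: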

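pip install fastapi uvicorn python-multipart
# pip install git+https://github.com/openai/whisper.git  (installing from GitHub often fails on unreliable networks)
pip install openai-whisper
pip install torch  # pick the build that matches your hardware

# Example: installing a CUDA build of torch
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126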
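
Before moving on, it can help to confirm that the packages and ffmpeg are actually visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name check_environment and its default module list are assumptions made for illustration, not part of any of the libraries above.

```python
import importlib.util
import shutil


def check_environment(modules=("fastapi", "uvicorn", "whisper", "torch")):
    """Report which modules are importable and whether ffmpeg is on PATH."""
    # find_spec returns None when a top-level module cannot be found
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # shutil.which returns None when the executable is not on PATH
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Anything reported as MISSING should be installed before starting the service.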

2. Project Structure

The project can be laid out as follows:

whisper_fastapi/
├── models
├── app.py
└── requirements.txt

requirements.txt can contain:

fastapi
uvicorn
python-multipart
torch
# git+https://github.com/openai/whisper.git
openai-whisper

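3. Writing the FastAPI Application

app.py covers the following:

Model loading
To avoid reloading the model on every request, load each model once and keep it in a global variable. The sample code loads a model the first time it is requested and caches it in a global dictionary.
Endpoint definition
A POST endpoint receives the audio file (WAV or MP3 in this example) as a file upload, using FastAPI's UploadFile and File.
Concurrent execution
Whisper transcription is slow and CPU- or GPU-bound, so it runs in a thread pool. The synchronous transcription function is called through asyncio.get_running_loop().run_in_executor(...), which keeps the event loop free to accept other requests while a transcription is in progress.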
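
The thread-pool pattern described above can be tried in isolation before Whisper is involved. This is a minimal sketch, assuming a hypothetical slow_task function (a time.sleep call) as a stand-in for the blocking transcription; the names slow_task, run_concurrently, and timed_demo are illustrative only.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def slow_task(seconds: float) -> float:
    """Stand-in for a blocking call such as a model transcription."""
    time.sleep(seconds)
    return seconds


async def run_concurrently(durations):
    """Submit every blocking call to the pool and wait for all of them."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, slow_task, d) for d in durations]
    return await asyncio.gather(*futures)


def timed_demo(durations=(0.2, 0.2, 0.2, 0.2)):
    """Run the tasks and return their results plus the wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(run_concurrently(durations))
    return results, time.perf_counter() - start
```

With four workers, the four 0.2-second tasks overlap, so the total time should be close to a single task rather than the sum. Real transcription will not scale this cleanly, because the threads then compete for the same CPU or GPU.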

The sample code follows:

import sys
import shutil
import tempfile
import asyncio
import warnings
import torch  # used to detect whether CUDA is available
from fastapi import FastAPI, UploadFile, File, HTTPException, Query
from fastapi.responses import JSONResponse
import whisper  # OpenAI's Whisper model
from concurrent.futures import ThreadPoolExecutor

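# Make sure ffmpeg is available
if shutil.which("ffmpeg") is None:
    sys.exit("Error: ffmpeg not found. Install ffmpeg and make sure its directory is on the system PATH.")

app = FastAPI(title="Whisper FastAPI Service")

# Local directory for model files. whisper.load_model looks here first and
# downloads the model into this directory if the file is missing.
local_model_dir = "./models"

# Global dictionary that caches loaded models so they are not loaded twice
loaded_models = {}

# Pick the device automatically: GPU if CUDA is available, otherwise CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cpu":
    warnings.filterwarnings("ignore", message="FP16 is not supported on CPU; using FP32 instead")
print(f"Using device: {device}")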


def load_model_if_needed(model_name: str):
    """
    Return the cached model for model_name.
    If it is not cached yet, load it from the local model directory,
    place it on the selected device, and store it in the cache.
    """
    if model_name not in loaded_models:
        try:
            model = whisper.load_model(model_name, download_root=local_model_dir, device=device)
            loaded_models[model_name] = model
        except Exception as e:
            raise RuntimeError(
                f"Failed to load Whisper model {model_name}; check that the model name is valid "
                f"and that the model file exists in (or can be downloaded to) {local_model_dir}"
            ) from e
    return loaded_models[model_name]


# Thread pool for concurrent work (model loading and transcription can both be slow)
executor = ThreadPoolExecutor(max_workers=4)


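def transcribe_audio(model, file_path: str) -> dict:
    """
    Transcribe the given audio file and return the result.
    fp16 is switched on or off to match the device:
      - GPU: fp16=True
      - CPU: fp16=False
    On failure, return a dict with a single "error" key instead of raising.
    """
    try:
        result = model.transcribe(file_path, fp16=(device == "cuda"))
        return result
    except Exception as e:
        return {"error": str(e)}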


@app.post("/transcribe")
async def transcribe(
        file: UploadFile = File(...),
        model_name: str = Query("base", description="Model to use (e.g. tiny, base, small, medium, large)")
):
    """
    Accept an uploaded audio file and an optional model_name,
    transcribe the audio with that model, and return the result.
    """
    if file.content_type not in [
        "audio/wav",
        "audio/x-wav",
        "audio/wave",
        "audio/x-pn-wav",
        "audio/mpeg",
        "audio/mp3"
    ]:
        raise HTTPException(status_code=400, detail="Unsupported file type; please upload a WAV or MP3 audio file")

    # Keep the uploaded file's extension if it has one, otherwise default to .wav
    filename = file.filename or ""
    suffix = "." + filename.rsplit(".", 1)[-1] if "." in filename else ".wav"

    loop = asyncio.get_running_loop()

    # Save the upload inside a temporary directory. The directory and its contents
    # are deleted when the block exits, so temp files do not pile up across requests.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = f"{tmp_dir}/upload{suffix}"
        try:
            with open(tmp_path, "wb") as tmp:
                tmp.write(await file.read())
        except Exception as e:
            raise HTTPException(status_code=500, detail="Failed to save the temporary file") from e

        # Load the requested model in the thread pool (if it is not loaded yet)
        try:
            model = await loop.run_in_executor(executor, load_model_if_needed, model_name)
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e)) from e

        # Run the transcription in the thread pool so it does not block the event loop
        transcription_result = await loop.run_in_executor(executor, transcribe_audio, model, tmp_path)

    if "error" in transcription_result:
        raise HTTPException(status_code=500, detail=transcription_result["error"])

    return JSONResponse(content=transcription_result)


if __name__ == "__main__":
    import uvicorn

    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)

4. Running and Deployment

Local testing
From the project directory, run:
```
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```
Open http://localhost:8000/docs to browse the API documentation that FastAPI generates automatically and to try the endpoint.
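
The endpoint can also be called from a script. The following is a minimal sketch of a client that uses only the standard library; the helper names build_multipart and transcribe_file are assumptions made for illustration, and the sketch assumes the server is running locally on port 8000 and that the file is a WAV recording.

```python
import json
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type):
    """Build a multipart/form-data body containing a single file field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path, url="http://localhost:8000/transcribe?model_name=base"):
    """Upload a WAV file to the /transcribe endpoint and return the parsed JSON."""
    with open(path, "rb") as f:
        audio = f.read()
    # The field name "file" matches the endpoint's UploadFile parameter
    body, content_type = build_multipart("file", "audio.wav", audio, "audio/wav")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

The part's Content-Type has to be one of the audio types the endpoint accepts, otherwise the server answers with a 400 error.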
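Notes on concurrency
- ThreadPoolExecutor hands transcription off to worker threads so that the blocking, compute-heavy work does not stall the event loop. This gives a degree of concurrency, bounded by max_workers and by the CPU or GPU underneath.
- In production, consider GPU inference and size the thread or process count to the server's hardware. Uvicorn can also start multiple worker processes. Each process then loads its own copy of the model, and --workers cannot be combined with --reload. For example: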
```
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4
```
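
One detail to keep in mind with a shared thread pool is that a global model cache is touched by several threads at once, so two requests arriving together for a model that is not cached yet could both start loading it. A lock around the cache is one way to avoid that. The following is a minimal sketch under that assumption; get_or_load and the loader argument are hypothetical names, with loader standing in for a call such as whisper.load_model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_models = {}
_models_lock = threading.Lock()


def get_or_load(name, loader):
    """Return the cached object for name, calling loader(name) at most once."""
    # Only one thread at a time may check and fill the cache
    with _models_lock:
        if name not in _models:
            _models[name] = loader(name)
        return _models[name]
```

Holding the lock while loading keeps the sketch short, at the cost of making requests for other models wait during a load. Whether concurrent transcribe calls on one model instance are safe is a separate question that this sketch does not address.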
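Error handling and logging
Add exception handling, logging, and monitoring as needed. The code above is a simple example that you can extend to fit your requirements.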
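
As one possible starting point for logging, a small decorator can record how long each blocking call takes and log any exception before re-raising it. This is a minimal sketch using the standard logging module; the decorator name log_duration and the logger name whisper_api are assumptions made for illustration.

```python
import functools
import logging
import time

logger = logging.getLogger("whisper_api")


def log_duration(func):
    """Log the run time of func, and log the traceback if it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("%s failed", func.__name__)
            raise
        finally:
            # Runs on both success and failure
            logger.info("%s took %.2f s", func.__name__, time.perf_counter() - start)
    return wrapper
```

Applied to a synchronous function such as the transcription helper, it leaves the return value and any raised exception unchanged. The info-level timing lines only appear once logging is configured, for example with logging.basicConfig(level=logging.INFO).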

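5. Summary

Environment and dependencies: make sure Python, FastAPI, Uvicorn, Whisper, and their dependencies are installed correctly.
Cached model loading: load each model once and reuse it, which keeps response times down.
Endpoint: FastAPI exposes a /transcribe endpoint that transcribes an uploaded audio file, for example:
http://localhost:8000/transcribe?model_name=base
Concurrency: run the slow transcription call in a thread pool, and scale further with Uvicorn's deployment options.

That completes a simple FastAPI wrapper around the Whisper model. It accepts concurrent calls and exposes speech-to-text as a service.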
