Elegant text-to-speech built on Microsoft TTS

2025/7/4 2:55:28

Project overview

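This project grew out of edge-tts, which I had come across earlier. edge-tts is a Python library for converting text to speech. It relies on Microsoft's online Text-to-Speech service (the one behind the Read Aloud feature in Microsoft Edge), so text can be turned into speech from a local machine with very little effort. Of all the text-to-speech options, it is no exaggeration to call it the "easiest to use", and it includes many of the voices popular with streamers and video creators (Xiaoxiao, Yunyang, Yunxi...).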

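Local usage is as follows:

# Install the edge-tts library
pip install edge-tts

# List the supported voices
edge-tts --list-voices

# Run the following command to generate audio (using the Chinese voice Xiaoxiao)
edge-tts --voice zh-CN-XiaoxiaoNeural --text "Hello, world!" --write-media hello.mp3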

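Later, I wrapped this library in a simple Python web service so that text-to-speech can be called over plain HTTP.

The project mainly uses Flask + edge-tts + gunicorn + COS to implement this web service. Generated files can be kept locally or uploaded to Tencent Cloud COS, and the project can run locally or be deployed to an application server with Docker.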
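
Since the service is meant to run under gunicorn, a configuration file is one way to pin the settings down. The sketch below is an assumption and is not taken from the repository: the file name gunicorn.conf.py, the worker count and the timeout are illustrative, and only port 2020 comes from the code in this post.

```python
# gunicorn.conf.py (hypothetical file; all values are illustrative assumptions)

# Listen on all interfaces, on the same port the Flask dev server uses in this post
bind = "0.0.0.0:2020"

# A small number of worker processes; tune this for the host
workers = 2

# Speech synthesis calls an online service, so allow slow requests (seconds)
timeout = 120
```

Assuming the Flask object is named app inside a module app.py, the service could then be started with: gunicorn -c gunicorn.conf.py app:app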

Implementation details

1. Import the Flask framework and flask_cors to set up a simple Flask web service

from flask import Flask, request
from flask_cors import CORS


app = Flask(__name__, static_folder='tts')  # Use tts as the static folder
CORS(app)  # With this setting, requests from all origins are allowed



if __name__ == "__main__":
    app.run(port=2020, host="127.0.0.1", debug=True)

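2. Receive the HTTP request and convert the text

2.1 The commonly used Chinese voices are wrapped in a simple lookup table, so that the API can take a short voice ID as a parameter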


voiceMap = {
    "xiaoxiao": "zh-CN-XiaoxiaoNeural",
    "xiaoyi": "zh-CN-XiaoyiNeural",
    "yunjian": "zh-CN-YunjianNeural",
    "yunxi": "zh-CN-YunxiNeural",
    "yunxia": "zh-CN-YunxiaNeural",
    "yunyang": "zh-CN-YunyangNeural",
    "xiaobei": "zh-CN-liaoning-XiaobeiNeural",
    "xiaoni": "zh-CN-shaanxi-XiaoniNeural",
    "hiugaai": "zh-HK-HiuGaaiNeural",
    "hiumaan": "zh-HK-HiuMaanNeural",
    "wanlung": "zh-HK-WanLungNeural",
    "hsiaochen": "zh-TW-HsiaoChenNeural",
    "hsiaoyu": "zh-TW-HsiaoYuNeural",
    "yunjhe": "zh-TW-YunJheNeural",
}


def getVoiceById(voiceId):
    return voiceMap.get(voiceId)
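
Because the voice ID arrives as a request parameter, a slightly more forgiving lookup can be useful. The sketch below is a hypothetical variant of getVoiceById, not the project's code: the name resolve_voice, the trimming and the case-insensitive match are all assumptions, and VOICE_MAP is only a subset of the table above.

```python
# Illustrative subset of the voice table from this post
VOICE_MAP = {
    "xiaoxiao": "zh-CN-XiaoxiaoNeural",
    "yunxi": "zh-CN-YunxiNeural",
    "hiugaai": "zh-HK-HiuGaaiNeural",
}


def resolve_voice(voice_id):
    """Return the full voice name for a short ID, or None if it is unknown.

    Hypothetical helper: trims whitespace and ignores case before the lookup.
    """
    if not voice_id:
        return None
    return VOICE_MAP.get(voice_id.strip().lower())
```

Like getVoiceById, it returns None for an unknown ID, so the caller can keep the existing "error params" branch.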

2.2 The dealAudio endpoint receives the parameters and calls createAudio to convert the text to audio


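import os
import subprocess


def createAudio(text, file_name, voiceId):
    # remove_html is a project helper (not shown in this post) that strips HTML tags
    new_text = remove_html(text)
    print(f"Text without html tags: {new_text}")
    voice = getVoiceById(voiceId)
    if not voice:
        return "error params"

    # Absolute path on the local disk
    filePath = os.path.join(os.getcwd(), "tts", file_name)
    # Relative path, used as the URL path and as the COS object key
    relativePath = "/tts/" + file_name
    os.makedirs(os.path.dirname(filePath), exist_ok=True)
    if not os.path.exists(filePath):
        # Create the file with open() first, for compatibility with macOS
        open(filePath, 'a').close()

    # Pass the arguments as a list (no shell), so quotes or shell
    # metacharacters in the text cannot break or inject into the command.
    # check=True raises CalledProcessError if edge-tts exits with an error.
    subprocess.run(
        ['edge-tts', '--voice', voice, '--text', new_text, '--write-media', filePath],
        check=True,
    )
    # Choose between cloud storage and local storage here
    # Upload to Tencent Cloud COS and return the cloud storage URL
    # url = uploadCos(filePath, relativePath)

    # Keep the audio locally and return its URL directly
    # (relativePath already starts with a slash)
    url = f'http://127.0.0.1:2020{relativePath}'
    return url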
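
createAudio relies on a remove_html helper that this post does not show. The following is a minimal stand-in built on the standard library's html.parser, written under the assumption that the helper only needs to drop tags and keep the text; the real implementation in the repository may differ.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def remove_html(text):
    """Hypothetical stand-in for the project's remove_html helper."""
    parser = _TextExtractor()
    parser.feed(text or "")
    parser.close()
    # Collapse runs of whitespace left behind by removed tags
    return " ".join("".join(parser.parts).split())
```

Stripping markup before synthesis keeps the voice from reading tag names aloud, and character references such as &amp; are decoded along the way.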



@app.route('/dealAudio',methods=['POST','GET'])
def dealAudio():
    text = getParameter('text')
    file_name = getParameter('file_name')
    voice = getParameter('voice')
    return createAudio(text, file_name, voice)
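
Because file_name comes straight from the request and is joined onto the tts directory, a value such as ../x.mp3 could point outside that directory. The sketch below shows one way to guard against this; safe_audio_path is a hypothetical helper and is not part of the project.

```python
import os


def safe_audio_path(base_dir, file_name):
    """Return the absolute path of file_name inside base_dir.

    Hypothetical helper: raises ValueError if the name would escape base_dir
    or does not name a file below it.
    """
    base = os.path.abspath(base_dir)
    target = os.path.abspath(os.path.join(base, file_name))
    # commonpath equals base only when target lies inside base
    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError(f"invalid file name: {file_name!r}")
    return target
```

A call such as safe_audio_path("tts", file_name) could replace the manual path join in createAudio, with the ValueError mapped to the same "error params" response.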

As the commented-out lines show, you can choose between two storage options: local storage and cloud storage.
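
To make the interface concrete, here is a small client sketch that uses only the standard library. It assumes the service is running at the address given and that getParameter (not shown in this post) reads text, file_name and voice from the query string on GET requests; the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_deal_audio_url(base_url, text, file_name, voice):
    """Build the GET URL for the /dealAudio endpoint with encoded parameters."""
    query = urlencode({"text": text, "file_name": file_name, "voice": voice})
    return f"{base_url.rstrip('/')}/dealAudio?{query}"


def request_audio(base_url, text, file_name, voice):
    """Call the service and return the response body (the audio URL) as text."""
    with urlopen(build_deal_audio_url(base_url, text, file_name, voice)) as resp:
        return resp.read().decode("utf-8")
```

For example, request_audio("http://127.0.0.1:2020", "Hello, world!", "hello.mp3", "xiaoxiao") would return the URL of the generated file when the service is running locally.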

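3. Using and serving the audio files

3.1 Local storage: the audio is already saved locally when it is generated, so all that is needed is an endpoint that serves local static files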


from flask import send_from_directory


# Add a route that handles requests for static files
@app.route('/static/<path:filename>')
def serve_static(filename):
    return send_from_directory(app.static_folder, filename)

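3.2 Uploading to Tencent Cloud COS

Use the SDK provided by Tencent Cloud to upload the file to COS (it could equally be uploaded to another cloud vendor's object storage, depending on preference). Once the upload has finished, the locally generated file is deleted.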


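import logging
import os
import sys

from qcloud_cos import CosConfig, CosS3Client


# Upload to COS
def uploadCos(file_path, relativePath):
    # Tencent Cloud COS V5 Python SDK; supports Python 2.6, Python 2.7 and Python 3.x
    # Install with pip: pip install -U cos-python-sdk-v5
    # For the latest available COS regions, see https://www.qcloud.com/document/product/436/6224
    logging.basicConfig(level=logging.INFO, stream=sys.stdout)

    # Set the user properties: secret_id, secret_key, region, etc. Appid has been removed from CosConfig; include it in the Bucket parameter instead. Bucket has the form BucketName-Appid
    region = ''      # Replace with your region; the region of an existing bucket is shown in the console, https://console.cloud.tencent.com/cos5/bucket
    secret_id = ''     # Replace with your SecretId; view and manage it in the CAM console, https://console.cloud.tencent.com/cam/capi
    secret_key = ''   # Replace with your SecretKey; view and manage it in the CAM console, https://console.cloud.tencent.com/cam/capi
    bucket_name = ''

    # For the full list of regions supported by COS, see https://www.qcloud.com/document/product/436/6224
    token = None               # Not needed with a permanent key; required with a temporary key. For generating and using temporary keys, see https://cloud.tencent.com/document/product/436/14048
    domain = None # Optional; if omitted, the bucket is accessed through the COS regional domain. A custom domain or the bucket's global acceleration domain can also be used
    config = CosConfig(Region=region, SecretId=secret_id, SecretKey=secret_key, Token=token, Domain=domain)  # Build the configuration object
    client = CosS3Client(config)
    # Simple upload from a file stream; open the same local path that is deleted afterwards
    with open(file_path, 'rb') as fp: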
        response = client.put_object(
            Bucket=bucket_name,
            Body=fp,
            Key=relativePath,
            StorageClass='STANDARD',
            ContentType='audio/mpeg'
        )
        print(response['ETag'])
    # Delete the local file once the upload has finished
    os.remove(file_path)

    # Build the URL for accessing the file
    url = f"https://{bucket_name}.cos.{region}.myqcloud.com{relativePath}"
    print("File URL:", url)
    return url
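
The upload function above leaves region, secret_id, secret_key and bucket_name as empty strings to be filled in. One common alternative to hard-coding credentials is to read them from environment variables. The sketch below assumes that approach; the function name load_cos_settings and the COS_* variable names are hypothetical and are not defined by the project or by the COS SDK.

```python
import os


def load_cos_settings(env=None):
    """Read the COS settings from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    settings = {
        "region": env.get("COS_REGION", ""),
        "secret_id": env.get("COS_SECRET_ID", ""),
        "secret_key": env.get("COS_SECRET_KEY", ""),
        "bucket_name": env.get("COS_BUCKET", ""),
    }
    # Fail early with a clear message instead of sending an unsigned request
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError("missing COS settings: " + ", ".join(missing))
    return settings
```

The returned dictionary would then supply the values passed to CosConfig and put_object, which keeps secrets out of the source tree and out of Docker images.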

The project is open source on GitHub. Repository: edge-tts https://github.com/lyz1810/edge-tts