Running Baidu PaddlePaddle's PaddleSpeech on Windows: Speech Recognition (ASR) and Speech Synthesis (TTS)

Preface

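PaddleSpeech is a speech toolkit built on PaddlePaddle, Baidu's deep learning framework. It provides self-supervised learning models, SOTA/streaming ASR with punctuation, streaming TTS with a text frontend, a speaker verification system, end-to-end speech translation, and keyword spotting.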

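I ran into many problems while deploying and running it and consulted a lot of material. This post is a detailed record of the whole process, from installation and deployment to running tests.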

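Contents

Preface
Prerequisites
Setting Up the Runtime Environment
- Install Anaconda
  - Installation
  - Check the conda version
Project Deployment
- Create a dedicated virtual environment for PaddleSpeech
- Install a C++ build environment
- Install PaddleSpeech
  - Install pytest-runner
  - Install paddlepaddle
  - Install paddlespeech
  - Create a PaddleSpeech working folder
Quick Start
- Automatic Speech Recognition (ASR)
  - Prepare test audio
  - Speech recognition
  - Errors and fixes
  - Recognize again
- Speech Synthesis TTS (Text-to-Speech)
  - Speech synthesis
  - Errors and fixes
  - Synthesize again
- About logs
Using the Services
- Starting the service
  - Prepare the service configuration file
References: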

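Prerequisites

The official requirements are as follows:

Python >= 3.7 (I use Python 3.9 here)
The latest version of PaddlePaddle
A C++ build environment (on Windows, install it through Visual Studio Installer)
Tip: we recommend using the Baidu mirror https://mirror.baidu.com/pypi/simple when installing paddlepaddle, and the Tsinghua mirror https://pypi.tuna.tsinghua.edu.cn/simple when installing paddlespeech.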
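To confirm that an interpreter meets the Python >= 3.7 requirement, you can run a minimal check like the one below with that interpreter. The helper name `meets_min_python` is my own and purely illustrative. It only compares the major and minor version numbers.

```python
import sys

# Minimum version from the official requirements above (Python >= 3.7).
MIN_PYTHON = (3, 7)


def meets_min_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if meets_min_python() else "too old"
    print(f"Python {current} at {sys.executable}: {verdict}")
```

Run it once more after activating the conda environment later on. The printed `sys.executable` shows which interpreter is actually being used.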

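Setting Up the Runtime Environment

Install Anaconda

Installation

The installation steps are skipped here; see the article "Installing and Using Anaconda".

Check the conda version

In the Anaconda PowerShell Prompt, enter the following command:

conda info

My conda version is 23.1.0.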

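Project Deployment

Create a dedicated virtual environment for PaddleSpeech

In the Anaconda PowerShell Prompt, enter the following commands. The first lists the existing environments. The second creates a virtual environment named paddle_speech with Python 3.9. Python 3.9 is recommended; I have tested it myself and it works.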

conda env list
conda create -n paddle_speech python=3.9

Activate the conda virtual environment:

conda activate paddle_speech

Install the conda dependencies of paddlespeech:

conda install -y -c conda-forge sox libsndfile bzip2

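Install a C++ build environment

(If a C++ build environment is already installed on your system, skip this step.)

On Windows, install Visual Studio (or the Visual C++ Build Tools from the link below) to get a C++ build environment.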

https://visualstudio.microsoft.com/visual-cpp-build-tools/

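The key step is to select the C++ desktop development workload in Visual Studio Installer.

[Start] -> [Search] -> Visual Studio Installer

Select "Desktop development with C++" and install it.

For more help, see discussion #1195.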

Install PaddleSpeech

Install pytest-runner

On some systems, installing kaldiio fails because of problems with the default package index. To avoid this, install pytest-runner first:

pip install pytest-runner -i https://pypi.tuna.tsinghua.edu.cn/simple

Install paddlepaddle

paddlespeech depends on paddlepaddle, so install paddlepaddle first:

pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple

Install paddlespeech

Finally, install paddlespeech:

pip install paddlespeech -i https://pypi.tuna.tsinghua.edu.cn/simple
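After installation, you can check which versions pip actually installed into the active environment. The sketch below uses the standard library's `importlib.metadata`, which is available from Python 3.8 and so works in the 3.9 environment used here. The helper name is hypothetical. PaddlePaddle also ships its own self-test, `paddle.utils.run_check()`, which is worth running once after installing paddlepaddle.

```python
from importlib import metadata

# Distribution names exactly as used in the pip commands above.
PACKAGES = ("pytest-runner", "paddlepaddle", "paddlespeech")


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in PACKAGES:
        print(f"{name:15} {installed_version(name) or 'NOT INSTALLED'}")
```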

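Many libraries are installed along the way. The final successful installation is shown in the figure.

Create a PaddleSpeech working folder

Manually create a working folder for PaddleSpeech to hold temporary files and input/output files.

My path here is C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech

In the Anaconda PowerShell Prompt, change to that directory: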

cd C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech

Quick Start

Automatic Speech Recognition (ASR)

Prepare test audio

Download the sample test audio files below. Downloading them directly with Thunder (Xunlei) works fine:

https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav

After downloading, put them in:

C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech
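Thunder is not required; any downloader works. As an alternative, here is a minimal standard-library sketch. It downloads a sample file and reports its channel count, sample rate and duration, so you can see what the ASR step will be fed. The helpers `download` and `describe_wav` are hypothetical names of my own. `describe_wav` only handles PCM WAV files, which is what Python's `wave` module reads.

```python
import urllib.request
import wave
from pathlib import Path

# The two sample files listed above.
SAMPLE_URLS = (
    "https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav",
    "https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav",
)


def download(url, dest_dir):
    """Save `url` into dest_dir under the file name taken from the URL."""
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(target))
    return target


def describe_wav(path):
    """Return (channels, sample_rate, duration_in_seconds) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return wav.getnchannels(), rate, wav.getnframes() / rate
```

Usage, from the working folder: `for url in SAMPLE_URLS: print(describe_wav(download(url, ".")))`.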

Speech recognition

Let's recognize the zh.wav file:

cd C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech
conda activate paddle_speech
paddlespeech asr --lang zh --input zh.wav
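To recognize several files in one go, you can drive the same CLI call from Python. This is a minimal sketch under a few assumptions:

- `paddlespeech` is on PATH, that is, the paddle_speech environment is active.
- The CLI prints the recognized text to standard output.
- Every file uses the same `--lang`.

The helper names are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def build_asr_command(wav_path, lang="zh"):
    """The same `paddlespeech asr --lang ... --input ...` call shown above."""
    return ["paddlespeech", "asr", "--lang", lang, "--input", str(wav_path)]


def recognize_folder(folder, lang="zh", pattern="*.wav"):
    """Run the CLI once per matching file and collect whatever it prints."""
    # Ask the child Python process to write UTF-8 so Chinese text decodes cleanly.
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    results = {}
    for wav in sorted(Path(folder).glob(pattern)):
        completed = subprocess.run(
            build_asr_command(wav, lang),
            capture_output=True,
            encoding="utf-8",
            errors="replace",
            env=env,
            check=True,  # raises CalledProcessError if the CLI fails
        )
        results[wav.name] = completed.stdout.strip()
    return results
```

Usage: `print(recognize_folder(".", pattern="zh*.wav"))`.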

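The output is as follows:

It failed with an error!

Errors and fixes

The DeprecationWarning is only a warning and does not affect normal operation.

The AttributeError is a real error and must be fixed.

Error message: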

AttributeError: module 'numpy' has no attribute 'complex'.
`np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Reference issue: numpy 1.24.3 error #3235

Cause analysis:

The deprecation for the aliases np.object, np.bool, np.float, np.complex, np.str, and np.int is expired (introduces NumPy 1.20). Some of these will now give a FutureWarning in addition to raising an error since they will be mapped to the NumPy scalars in the future.

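In other words, NumPy 1.24 removed the long-deprecated np.complex alias. Code in PaddleSpeech or one of its dependencies still uses it, so the installed numpy is too new.

Solution:

Downgrade numpy to 1.23.5, the final 1.23 release and the last version line that still provides these aliases: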

pip uninstall numpy
pip install numpy==1.23.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
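To see whether the numpy in the active environment is affected, compare its version with 1.24, the release that removed the aliases. This is a minimal sketch; the helper names are hypothetical, and it only looks at the major and minor version numbers.

```python
from importlib import metadata

# NumPy 1.24 removed np.object, np.bool, np.float, np.complex, np.str and np.int.
ALIASES_REMOVED_IN = (1, 24)


def major_minor(version):
    """Turn a version string such as '1.23.5' into the tuple (1, 23)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_legacy_aliases(version):
    """True if this NumPy version still provides aliases such as np.complex."""
    return major_minor(version) < ALIASES_REMOVED_IN


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        installed = None
    if installed is None:
        print("numpy is not installed in this environment")
    elif has_legacy_aliases(installed):
        print(f"numpy {installed}: np.complex still exists")
    else:
        print(f"numpy {installed}: np.complex is gone; pin numpy==1.23.5")
```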

Recognize again

Let's recognize the zh.wav file again:

cd C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech
conda activate paddle_speech
paddlespeech asr --lang zh --input zh.wav

Recognition succeeds!

Speech Synthesis TTS (Text-to-Speech)

Speech synthesis

Synthesize speech with the following command:

paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！" --output output.wav
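The same TTS call can also be scripted. This is a minimal sketch, assuming `paddlespeech` is on PATH. Passing the arguments as a list means the Chinese sentence needs no shell quoting. The helper names are hypothetical.

```python
import subprocess


def build_tts_command(text, output="output.wav"):
    """The same `paddlespeech tts --input ... --output ...` call shown above."""
    return ["paddlespeech", "tts", "--input", text, "--output", output]


def synthesize(text, output="output.wav"):
    """Run the CLI and return the output path; raises if the CLI exits non-zero."""
    subprocess.run(build_tts_command(text, output), check=True)
    return output
```

Usage: `synthesize("你好，欢迎使用百度飞桨深度学习框架！")`.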

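Another error!

Errors and fixes

The DeprecationWarning is only a warning and does not affect normal operation.

The ImportError is a real error and must be fixed.

Error message: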

 from timer import timer
ImportError: cannot import name 'timer' from 'timer' (C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\win32\timer.pyd)

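Cause analysis:

1. First, pin down where the error comes from. Somewhere in PaddleSpeech's Python code there is the statement `from timer import timer`, and that import fails.

2. Next, read the message itself. "cannot import name 'timer' from 'timer'" shows that the timer package is not simply missing. A missing package normally produces "No module named xxx" instead (ImportError, or ModuleNotFoundError in Python 3). We can check whether timer is installed with pip list:

Clearly, the timer package is installed.

3. "cannot import name 'timer' from 'timer'" means Python did find a module called timer, but that module contains no name timer to import. So the module was not missing; the wrong one was loaded. This suggests there are two or more modules named timer on the system, with the one that wins the import lacking a timer function. They may be different versions, or unrelated packages that happen to share the name.

The path at the end of the message confirms this: C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\win32\timer.pyd. Python loaded timer.pyd from the per-user site-packages directory of Python 3.9, not from the site-packages of the paddle_speech virtual environment. Every Python 3.9 interpreter run by this user searches that directory by default, including the one in the conda environment. A .pyd file is a compiled Python extension module on Windows, essentially a DLL. The win32 folder is where pywin32 keeps its extension modules, so this timer.pyd is most likely pywin32's and unrelated to the timer package PaddleSpeech needs. So there really are two modules named timer.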
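Instead of reading the traceback, you can ask Python directly which file a given import would load. This minimal sketch uses the standard library (`importlib.util.find_spec`, `site.getusersitepackages`); the helper name `locate_module` is hypothetical. Run it with the paddle_speech environment's python.

```python
import importlib.util
import site
import sys


def locate_module(name):
    """Return the file a top-level `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("user site-packages enabled:", site.ENABLE_USER_SITE)
    print("user site-packages path:", site.getusersitepackages())
    print("timer resolves to:", locate_module("timer"))
    # Earlier sys.path entries win, so the search order explains the shadowing.
    for entry in sys.path:
        print("   ", entry)
```

If `timer resolves to:` points into the per-user `AppData\Roaming\Python\Python39\site-packages` tree, that directory is shadowing the environment. You can also set the environment variable `PYTHONNOUSERSITE=1` (or run `python -s`) to keep the per-user directory off `sys.path`. This may be a less invasive alternative to renaming files, but I have not verified it against this exact setup.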

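Solution:

Rename the timer.pyd shown in the error (for example, to timer_rename.pyd) so that the timer installed in the virtual environment is used instead. Deleting it also works but is riskier. Either way, any program that relies on that copy of the module will no longer find it.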
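The rename can be done in Explorer or with a few lines of Python. This is a minimal sketch; `rename_shadowing_module` is a hypothetical helper. It refuses to overwrite an existing file, and you can undo it by renaming the file back.

```python
from pathlib import Path


def rename_shadowing_module(pyd_path, suffix="_rename"):
    """Rename e.g. timer.pyd to timer_rename.pyd and return the new path.

    Raises FileExistsError instead of overwriting an existing target.
    """
    old = Path(pyd_path)
    new = old.with_name(f"{old.stem}{suffix}{old.suffix}")
    if new.exists():
        raise FileExistsError(new)
    return old.rename(new)  # Path.rename returns the new path on Python 3.8+
```

Usage: `rename_shadowing_module(r"C:\Users\Administrator\AppData\Roaming\Python\Python39\site-packages\win32\timer.pyd")`.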

Synthesize again

paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！" --output output.wav

Success!

About logs

An exp folder is created automatically in the working directory, and all log files are saved under exp\log\.
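When something fails, the newest log is usually the one you want. This minimal sketch finds it. I don't know PaddleSpeech's exact log file naming, so it simply picks the most recently modified file anywhere under exp\log. The log encoding is also an assumption (UTF-8, with undecodable bytes replaced). The helper name is hypothetical.

```python
from pathlib import Path


def latest_log(workdir="."):
    """Return the most recently modified file under <workdir>/exp/log, or None."""
    log_dir = Path(workdir) / "exp" / "log"
    if not log_dir.is_dir():
        return None
    files = [p for p in log_dir.rglob("*") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    newest = latest_log()
    if newest is None:
        print(r"no log files under exp\log yet")
    else:
        print(newest)
        # Show only the tail of the file; encoding is assumed to be UTF-8.
        print(newest.read_text(encoding="utf-8", errors="replace")[-2000:])
```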

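Using the Services

After installation, developers can start speech recognition, speech synthesis, audio classification, and other services with a single command.

Starting the service

Prepare the service configuration file

Go to PaddleSpeech on Gitee and download the source code.

In the source tree, find /demos/speech_server/conf/application.yaml.

In the current working directory (my path is C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech), create a folder named speech_server, create a folder named conf inside it, and copy application.yaml into /speech_server/conf/.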
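You can also script the two folders and the copy. This is a minimal sketch using `shutil`; `install_server_config` is a hypothetical helper. The source path in the usage line is a placeholder for wherever you unpacked the Gitee source.

```python
import shutil
from pathlib import Path


def install_server_config(source_yaml, workdir="."):
    """Copy application.yaml to <workdir>/speech_server/conf/ and return the new path."""
    conf_dir = Path(workdir) / "speech_server" / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)  # creates speech_server and conf
    return Path(shutil.copy2(source_yaml, conf_dir / "application.yaml"))
```

Usage (placeholder path): `install_server_config(r"C:\path\to\PaddleSpeech\demos\speech_server\conf\application.yaml")`.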

Start the service with the following command:

paddlespeech_server start --config_file ./speech_server/conf/application.yaml

The service starts successfully and listens on port 8090.
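Before calling the service from another window, a plain TCP connect is a quick way to confirm that something is listening on port 8090. This is a minimal sketch; `is_port_open` is a hypothetical helper. It only proves that the port accepts connections, not that the ASR and TTS engines loaded correctly.

```python
import socket


def is_port_open(host="127.0.0.1", port=8090, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```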

Calling the speech recognition service

Open a new Anaconda PowerShell Prompt and enter the following commands to call the speech recognition service:

conda env list
conda activate paddle_speech
cd C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input zh.wav
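paddlespeech_client is the documented way to call the service. To call it from your own Python code instead, you can talk to the server's HTTP API directly. The sketch below rests on an assumption taken from my reading of the PaddleSpeech server source, not verified against every version:

- The ASR endpoint is POST /paddlespeech/asr.
- The JSON body contains a base64 `audio` field plus `audio_format`, `sample_rate`, `lang` ("zh_cn") and `punc`.

Check these names against the version you installed. The helper names are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_asr_request(wav_path, host="127.0.0.1", port=8090, sample_rate=16000):
    """Build a POST request for the (assumed) /paddlespeech/asr endpoint."""
    audio_b64 = base64.b64encode(Path(wav_path).read_bytes()).decode("ascii")
    # ASSUMPTION: field names mirror the server's ASR request model.
    payload = {
        "audio": audio_b64,
        "audio_format": "wav",
        "sample_rate": sample_rate,
        "lang": "zh_cn",
        "punc": False,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/paddlespeech/asr",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize_via_http(wav_path, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_asr_request(wav_path, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `print(recognize_via_http("zh.wav"))`. If the response layout matches my assumption, the text is under `["result"]["transcription"]`.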

Calling the speech synthesis service

Open a new Anaconda PowerShell Prompt and enter the following commands to call the speech synthesis service:

conda env list
conda activate paddle_speech
cd C:\Users\Administrator\Documents\ftp\qianyuhui\src\PaddleSpeech\PaddleSpeech
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav
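The same HTTP approach can be used for synthesis. This sketch also rests on an assumption from my reading of the server source, not verified against every version:

- The TTS endpoint is POST /paddlespeech/tts, taking a JSON body with `text`, `spk_id`, `speed` and `volume`.
- It returns base64-encoded WAV audio at `["result"]["audio"]`.

Check these against your installed version. The helper names are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_tts_request(text, host="127.0.0.1", port=8090):
    """Build a POST request for the (assumed) /paddlespeech/tts endpoint."""
    # ASSUMPTION: field names mirror the server's TTS request model.
    payload = {"text": text, "spk_id": 0, "speed": 1.0, "volume": 1.0}
    return urllib.request.Request(
        f"http://{host}:{port}/paddlespeech/tts",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_audio_from_response(response, out_path="output.wav"):
    """Decode base64 audio at response['result']['audio'] (assumed layout) to a file."""
    audio_bytes = base64.b64decode(response["result"]["audio"])
    Path(out_path).write_bytes(audio_bytes)
    return len(audio_bytes)


def synthesize_via_http(text, out_path="output.wav", **kwargs):
    """Send the request, write the returned audio, and return the bytes written."""
    request = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        response = json.loads(resp.read().decode("utf-8"))
    return save_audio_from_response(response, out_path)
```

Usage: `synthesize_via_http("您好，欢迎使用百度飞桨语音合成服务。")`.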