Free speech recognition and transcription (better than iFlytek, according to the author): www.funsound.cn
Funsound speech recognition toolkit: https://github.com/pika-online/Funsound/tree/main
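1. Introduction
This post describes a simple way to deploy multi-channel speech transcription. It uses multithreading so that several audio files can be transcribed in the background at the same time. Only an outline of the implementation is given here; for building a full offline transcription server and client, see the Funsound toolkit code. The examples below deploy multiple whisper transcription engines.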
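2. Approach
- Initialize K speech recognition engines.
- Assign each engine one worker (a thread), and collect the workers into a thread group, workers.
- Submit the audio to be transcribed to workers. Scheduling is automatic: the task goes to the idlest worker, and a task_id is returned.
- For each audio file, poll its progress by task_id until the task finishes.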
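The four steps above can be sketched with the standard library alone. This is a minimal illustration of the idea, not the Funsound implementation: EchoEngine, PoolWorker, submit and run_demo are hypothetical names invented for the example, the "engine" only sleeps and returns a string, and "idlest" is approximated as "fewest pending tasks".

```python
import itertools
import queue
import threading
import time


class EchoEngine:
    """Stand-in for a real ASR engine (hypothetical); returns a fake transcript."""

    def __init__(self, engine_id):
        self.engine_id = engine_id

    def inference(self, audio_file):
        time.sleep(0.01)  # simulate decoding time
        return f"engine {self.engine_id} transcribed {audio_file}"


class PoolWorker(threading.Thread):
    """One thread bound to one engine, consuming tasks from its own queue."""

    def __init__(self, wid, engine, progress):
        super().__init__(daemon=True)
        self.wid = wid
        self.engine = engine
        self.progress = progress   # shared dict: task_id -> status record
        self.tasks = queue.Queue()
        self.pending = 0           # tasks queued or running on this worker
        self.lock = threading.Lock()

    def run(self):
        while True:
            task_id, audio_file = self.tasks.get()
            self.progress[task_id] = {"status": "RUNNING", "result": None}
            try:
                result = self.engine.inference(audio_file)
                self.progress[task_id] = {"status": "SUCCESS", "result": result}
            except Exception as exc:
                self.progress[task_id] = {"status": "FAIL", "result": str(exc)}
            finally:
                with self.lock:
                    self.pending -= 1


_task_ids = itertools.count()


def submit(workers, progress, audio_file):
    """Queue the file on the worker with the fewest pending tasks."""
    worker = min(workers, key=lambda w: w.pending)
    task_id = next(_task_ids)
    progress[task_id] = {"status": "PENDING", "result": None}
    with worker.lock:
        worker.pending += 1
    worker.tasks.put((task_id, audio_file))
    return task_id


def run_demo(num_workers=3, num_files=6):
    """Start the pool, submit fake files, and poll until every task is done."""
    progress = {}
    workers = [PoolWorker(i, EchoEngine(i), progress) for i in range(num_workers)]
    for w in workers:
        w.start()
    ids = [submit(workers, progress, f"audio_{n}.wav") for n in range(num_files)]
    while any(progress[t]["status"] not in ("SUCCESS", "FAIL") for t in ids):
        time.sleep(0.01)
    return [progress[t] for t in ids]
```

Giving each worker its own queue keeps one engine per thread, which matters when an engine holds a model in memory and is not safe to share between threads. A single shared queue would also work and balances load automatically, at the cost of not knowing in advance which worker will take a task.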
3. Workflow
3.1 Initialize the ASR engine
from funsound.whisper.asr import ASR

def init_engine(id):
    # id names the per-engine log file and is passed on to init_state
    engine = ASR(model_id='funasr_models/keepitsimple/faster-whisper-large-v3',
                 cfg_file='conf/whisper.yaml',
                 log_file=f'log/whisper-{id}.log')
    engine.init_state(id)
    return engine
3.2 Define the worker
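from funsound.common.executor import Worker

def processor(self, params):
    # params is the list passed at submission; its first item is the audio path
    audio_file = params[0]
    result = self.engine.inference(audio_file)
    return result

# attach the function to Worker so that every worker runs it for each task
Worker.processor = processor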
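Assigning a plain function to a class attribute is what makes the snippet above work: Python binds the function as a method, so self is the worker instance and self.engine is whatever engine was loaded into it. The sketch below shows the mechanism in isolation. DemoWorker and UpperEngine are hypothetical stand-ins written for this example and say nothing about how the real Worker class is built.

```python
class DemoWorker:
    """Hypothetical stand-in for a worker class; only the hook is modelled."""

    def __init__(self, wid):
        self.wid = wid
        self.engine = None

    def load_engine(self, engine):
        self.engine = engine

    def processor(self, params):
        raise NotImplementedError("assign a processor before running tasks")


class UpperEngine:
    """Fake engine whose 'transcript' is the upper-cased file name."""

    def inference(self, audio_file):
        return audio_file.upper()


def processor(self, params):
    # params is a list; in this example the first item is the audio path
    audio_file = params[0]
    return self.engine.inference(audio_file)


# Assigning a function to a class attribute turns it into a method,
# so every instance (existing or future) picks up the new behaviour.
DemoWorker.processor = processor

worker = DemoWorker(wid=0)
worker.load_engine(UpperEngine())
transcript = worker.processor(["test1.wav"])
```

Subclassing and overriding processor would achieve the same result without modifying the class globally, which is the safer choice if different worker groups in one process need different processing logic.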
3.3 Initialize multiple standby engines
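nj = 5  # run 5 channels in parallel
workers = []
for id in range(nj):
    engine = init_engine(id)
    worker = Worker(wid=id, log_file=f'log/worker-{id}.log')
    worker.load_engine(engine=engine)
    workers.append(worker)
# launch is not imported in the snippets shown here; it is presumably provided by the Funsound toolkit
launch(workers)  # start the workers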
3.4 Submit an audio transcription task
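audio_file = "funsound/examples/test1.wav"
# submit_task is not imported in the snippets shown here; it is presumably provided by the Funsound toolkit
task_id = submit_task(workers, params=[audio_file])
The audio is assigned to the idlest worker for processing.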
3.5 Poll progress until transcription completes
import time

while True:
    prgs = get_task_progress(task_id)
    print(prgs)
    # stop once the task reaches a terminal state
    if prgs['status'] in ["SUCCESS", "FAIL"]:
        break
    time.sleep(1)
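A bare polling loop never returns if a task gets stuck, so a long-running service may want an upper bound on the wait. The helper below is one possible sketch. wait_for_task and make_fake_progress are hypothetical names, the progress function is passed in as an argument rather than imported from Funsound, and only the "SUCCESS" and "FAIL" statuses come from the loop above; "RUNNING" is an assumed placeholder for any non-terminal state.

```python
import time


def wait_for_task(get_progress, task_id, interval=1.0, timeout=600.0):
    """Poll get_progress(task_id) until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        prgs = get_progress(task_id)
        if prgs["status"] in ("SUCCESS", "FAIL"):
            return prgs
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {prgs['status']} after {timeout}s"
            )
        time.sleep(interval)


def make_fake_progress(steps):
    """Return a fake progress function that reports SUCCESS on call number `steps`."""
    calls = {"n": 0}

    def get_progress(task_id):
        calls["n"] += 1
        status = "SUCCESS" if calls["n"] >= steps else "RUNNING"
        return {"status": status, "calls": calls["n"]}

    return get_progress
```

With the real toolkit, the call would look like wait_for_task(get_task_progress, task_id), assuming get_task_progress keeps the single-argument form used in the loop above. time.monotonic is used for the deadline because it is unaffected by system clock adjustments.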
Get the recognition result: