Free speech recognition and transcription (claimed to outperform iFlytek): www.funsound.cn
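Preface
Paraformer produces its recognition result by running greedy search over the acoustic posteriors, so recognizing custom command (wake) words still has to go through the ASR model. We can therefore build a template for each command word on top of Paraformer and match commands against those templates; the final results are far better than those of the iFlytek voice control SDK.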
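Approach
The overall idea is shown in the figure. For example, to build a template for the command "前翻页" (previous page), we only need to look at how the energy of "前" / "翻" / "页" is distributed in the decoded acoustic posteriors. Paraformer also provides a hotword module; adding the command words as hotwords further improves command-word recall.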
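The template idea can be sketched on a toy posterior matrix. This is only an illustration under stated assumptions: the six-token vocabulary, the `TOKEN2ID` mapping, and the helper names `template_score` and `match_command` are hypothetical and not part of Paraformer. The scoring rule (per decoded position take the best posterior among the word's tokens, then average over positions) mirrors the implementation given in this post.

```python
import numpy as np

# Hypothetical toy vocabulary; a real Paraformer vocabulary is far larger.
VOCAB = ["<blank>", "前", "后", "翻", "页", "停"]
TOKEN2ID = {tok: i for i, tok in enumerate(VOCAB)}

def template_score(posterior, word):
    """Average, over decoded positions, of the best posterior among the word's tokens."""
    ids = [TOKEN2ID[ch] for ch in word]
    per_position_best = posterior[:, ids].max(axis=1)
    return float(per_position_best.mean())

def match_command(posterior, words):
    """Score every command word and return the best one with all scores."""
    scores = {w: template_score(posterior, w) for w in words}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up posteriors for 3 decoded positions (each row sums to 1).
posterior = np.array([
    [0.05, 0.70, 0.10, 0.05, 0.05, 0.05],  # mass mostly on "前"
    [0.05, 0.05, 0.05, 0.75, 0.05, 0.05],  # mass mostly on "翻"
    [0.05, 0.05, 0.05, 0.05, 0.75, 0.05],  # mass mostly on "页"
])
best, scores = match_command(posterior, ["前翻页", "后翻页", "停"])
print(best, scores)
```

Note that "后翻页" still scores reasonably here because it shares two tokens with "前翻页"; the two commands are separated only by the first position.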
Code
The algorithm is implemented as follows:
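import numpy as np

def kws(self, waveform_list, WORDS=None, as_hotwords=True):
    """Match each utterance against the command words; return [score, word] pairs."""
    WORDS = WORDS or []  # avoid a mutable default argument

    # Load the word list: map every command word to its token ids
    WORDS_IDXS = []
    for WORD in WORDS:
        WORD_IDX = self.converter.tokens2ids(list(WORD))
        WORDS_IDXS.append(WORD_IDX)

    # Decode; the command words are optionally passed in as hotwords
    _, AM_SCORES, VALID_TOKEN_LENS, US_ALPHAS, US_PEAKS = self.__call__(
        waveform_list=waveform_list,
        hotwords=" ".join(WORDS) if as_hotwords else "")

    RESULTS = []
    for am_score, valid_token_len in zip(AM_SCORES, VALID_TOKEN_LENS):
        am_score = am_score[:valid_token_len-1]  # keep the first valid_token_len-1 positions
        best_score = -float('inf')
        best_word = None  # stays None when WORDS is empty
        for WORD, WORD_IDX in zip(WORDS, WORDS_IDXS):
            tgt_score = am_score[:, WORD_IDX]  # (positions, tokens in WORD)
            _max = np.max(tgt_score, axis=1)   # best token of WORD at each position
            mean_score = np.mean(_max)         # average over positions
            if mean_score > best_score:
                best_score = mean_score
                best_word = WORD
        RESULTS.append([best_score, best_word])
    return RESULTS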
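Because `kws` always returns the best-scoring word, even for audio that contains no command, a caller may want to reject low scores. The sketch below is one possible post-processing step, not part of the original code: the helper `accept_commands`, the threshold value, and the example scores are all assumptions, and the right threshold depends on the scale of the acoustic scores and has to be tuned on held-out data.

```python
def accept_commands(results, threshold):
    """Keep each best word only if its score reaches the threshold, else None."""
    return [
        best_word if best_score >= threshold else None
        for best_score, best_word in results
    ]

# Made-up scores in the same [best_score, best_word] layout that kws returns.
results = [[-0.2, "前翻页"], [-3.5, "后翻页"]]
decisions = accept_commands(results, threshold=-1.0)
print(decisions)
```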
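Recall test
Recall was measured on a validation set of 6 command words, 27 speakers, and 820 utterances; a certain amount of noise was added to the test set.
The results show that the current Paraformer-based voice control performs quite well.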
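For readers who want to run a similar test, per-command recall can be computed as in the sketch below. The helper `per_command_recall` is hypothetical and the labels are made-up illustration data, not the results of the test described above.

```python
from collections import Counter

def per_command_recall(references, predictions):
    """Recall per command: correctly recognized utterances / all utterances of that command."""
    total = Counter(references)
    hits = Counter(ref for ref, pred in zip(references, predictions) if ref == pred)
    return {cmd: hits[cmd] / total[cmd] for cmd in total}

# Made-up labels: the spoken command and the word returned by the matcher.
references = ["前翻页", "前翻页", "后翻页", "后翻页"]
predictions = ["前翻页", "后翻页", "后翻页", "后翻页"]
recall = per_command_recall(references, predictions)
print(recall)
```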