Foreword

The author has previously covered two voice-related articles:

- How to build in-car voice interaction: Google Voice Interaction has the answer — describes how a 3rd-party app can quickly invoke the system's voice interaction service through the Voice Interaction API to complete basic confirmation and selection dialogs.
- Facing the principle: 5 diagrams to fully understand the Android TextToSpeech mechanism — focuses on how a TTS engine app provides the Text-to-Speech service, and how 3rd-party apps conveniently call it.

The last missing piece is how to provide a SpeechRecognizer recognition service to the system, how 3rd-party apps use it, and how the system ties the two together. This article fills in that gap.
How to implement a recognition service?
First we need to provide an implementation of the recognition service. In short, extend RecognitionService and implement its most important abstract methods:

- First, define an abstract recognition engine interface, IRecognitionEngine.
- When the RecognitionService starts, obtain the engine vendor's implementation instance.
- In onStartListening(), parse the parameters from the recognition request Intent (language, maximum result count, etc.), wrap them into a JSON string, and pass them to the engine's start-recognition call. The engine then adjusts its recognition accordingly and reports the relevant states and results along the way, such as speech onset beginningOfSpeech(), end of speech endOfSpeech(), and intermediate results partialResults().
- In onStopListening(), call the engine's stop-recognition; the engine likewise reports back, e.g. the final results via results().
- In onCancel(), call the engine's release() to unbind the engine and free its resources.
```kotlin
interface IRecognitionEngine {
    fun init()
    fun startASR(parameter: String, callback: Callback?)
    fun stopASR(callback: Callback?)
    fun release(callback: Callback?)
}

class CommonRecognitionService : RecognitionService() {
    private val recognitionEngine: IRecognitionEngine by lazy {
        RecognitionProvider.provideRecognition()
    }

    override fun onCreate() {
        super.onCreate()
        recognitionEngine.init()
    }

    override fun onStartListening(intent: Intent?, callback: Callback?) {
        val params: String = "" // TODO: parse parameters from the intent
        recognitionEngine.startASR(params, callback)
    }

    override fun onStopListening(callback: Callback?) {
        recognitionEngine.stopASR(callback)
    }

    override fun onCancel(callback: Callback?) {
        recognitionEngine.release(callback)
    }
}
```
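The TODO above leaves open how the request extras might be flattened into the parameter string handed to the engine. Below is a minimal, hypothetical sketch of that step: a plain Map stands in for the Intent's extras Bundle so it runs on a regular JVM, and the JSON key names (`language`, `maxResults`, `partialResults`) are illustrative assumptions, not part of any framework API.

```kotlin
// Hypothetical sketch: flatten recognition request extras into a JSON string
// that an IRecognitionEngine-style API could accept. A Map stands in for the
// Intent's extras Bundle so the sketch runs on a plain JVM.
fun buildEngineParams(extras: Map<String, Any?>): String {
    // Pick out the extras the engine cares about, with fallback defaults.
    val language = extras["android.speech.extra.LANGUAGE"] ?: "en-US"
    val maxResults = extras["android.speech.extra.MAX_RESULTS"] ?: 1
    val partial = extras["android.speech.extra.PARTIAL_RESULTS"] ?: false
    // Hand-rolled JSON keeps the sketch dependency-free.
    return """{"language":"$language","maxResults":$maxResults,"partialResults":$partial}"""
}

fun main() {
    val params = buildEngineParams(
        mapOf(
            "android.speech.extra.LANGUAGE" to "en-US",
            "android.speech.extra.MAX_RESULTS" to 3,
            "android.speech.extra.PARTIAL_RESULTS" to true
        )
    )
    println(params) // {"language":"en-US","maxResults":3,"partialResults":true}
}
```

In a real service the same extraction would read `intent.getStringExtra(RecognizerIntent.EXTRA_LANGUAGE)` and friends; the point is only that the engine sees one serialized parameter blob.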
Of course, don't forget to declare it in the Manifest:
```xml
<service
    android:name=".recognition.service.CommonRecognitionService"
    android:exported="true">
    <intent-filter>
        <action android:name="android.speech.RecognitionService"/>
    </intent-filter>
</service>
```
How to request recognition?
First, declare the audio-capture permission; since it is a runtime permission, the corresponding runtime-permission request logic is also needed in code.

```xml
<manifest ... >
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
</manifest>
```
Additionally, on Android 11 and above, a package-visibility query declaration for the recognition service must be added.
```xml
<manifest ... >
    ...
    <queries>
        <intent>
            <action android:name="android.speech.RecognitionService" />
        </intent>
    </queries>
</manifest>
```
Once the permissions are in place, it's best to first check whether any recognition service is available on the system at all; if not, simply bail out.
```kotlin
class RecognitionHelper(val context: Context) {
    fun prepareRecognition(): Boolean {
        if (!SpeechRecognizer.isRecognitionAvailable(context)) {
            Log.e("RecognitionHelper", "System has no recognition service yet.")
            return false
        }
        ...
    }
}
```
If a service is available, create the recognizer entry-point instance via SpeechRecognizer's static factory method, which must be called on the main thread.
```kotlin
class RecognitionHelper(val context: Context) : RecognitionListener {
    private lateinit var recognizer: SpeechRecognizer

    fun prepareRecognition(): Boolean {
        ...
        recognizer = SpeechRecognizer.createSpeechRecognizer(context)
        ...
    }
}
```
Of course, if the system ships more than one service and you know the component name, you can pin the recognition implementation:
```java
public static SpeechRecognizer createSpeechRecognizer (Context context,
        ComponentName serviceComponent)
```
Next, set the recognition listener, whose callbacks correspond to the various states during recognition, for example:

- onPartialResults() delivers intermediate results; fetch the recognized strings from the Bundle via getStringArrayList(String) using the SpeechRecognizer#RESULTS_RECOGNITION key.
- onResults() delivers the final results; parse them the same way.
- onBeginningOfSpeech(): the start of speech was detected.
- onEndOfSpeech(): the end of speech was detected.
- onError() reports errors matching the SpeechRecognizer#ERROR_XXX constants; for instance, without microphone permission it returns ERROR_INSUFFICIENT_PERMISSIONS.
- and so on.
```kotlin
class RecognitionHelper(val context: Context) : RecognitionListener {
    ...
    fun prepareRecognition(): Boolean {
        ...
        recognizer.setRecognitionListener(this)
        return true
    }

    override fun onReadyForSpeech(p0: Bundle?) {
        TODO("Not yet implemented")
    }

    override fun onBeginningOfSpeech() {
        TODO("Not yet implemented")
    }

    override fun onRmsChanged(p0: Float) {
        TODO("Not yet implemented")
    }

    override fun onBufferReceived(p0: ByteArray?) {
        TODO("Not yet implemented")
    }

    override fun onEndOfSpeech() {
        TODO("Not yet implemented")
    }

    override fun onError(p0: Int) {
        TODO("Not yet implemented")
    }

    override fun onResults(p0: Bundle?) {
        TODO("Not yet implemented")
    }

    override fun onPartialResults(p0: Bundle?) {
        TODO("Not yet implemented")
    }

    override fun onEvent(p0: Int, p1: Bundle?) {
        TODO("Not yet implemented")
    }
}
```
Then create the Intent carrying the necessary recognition parameters and start listening. The extras include:

- EXTRA_LANGUAGE_MODEL: required; the preferred language model, e.g. the free-form LANGUAGE_MODEL_FREE_FORM model set in the code below, or the web-search-oriented LANGUAGE_MODEL_WEB_SEARCH model.
- EXTRA_PARTIAL_RESULTS: optional; whether the service should deliver intermediate results during recognition, false by default.
- EXTRA_MAX_RESULTS: optional; the maximum number of results the service may return, an int.
- EXTRA_LANGUAGE: optional; the recognition language, defaulting to the Locale.getDefault() region language. (The author used the recognition service provided by Google Assistant, which does not yet support Chinese, hence the Locale here is ENGLISH.)
- etc.

Two things to note: 1. this call must happen after the listener above has been set; 2. it must be made on the main thread:
```kotlin
class RecognitionHelper(val context: Context) : RecognitionListener {
    ...
    fun startRecognition() {
        val intent = createRecognitionIntent()
        recognizer.startListening(intent)
    }
    ...
}

fun createRecognitionIntent() = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
    putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
    putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3)
    putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.ENGLISH)
}
```
Next we add a layout that uses the RecognitionHelper above to initialize and start recognition, and display the results. We also add an interface for passing intermediate and final results to the UI, carrying the data back from the RecognitionListener.
```kotlin
interface ASRResultListener {
    fun onPartialResult(result: String)
    fun onFinalResult(result: String)
}

class RecognitionHelper(private val context: Context) : RecognitionListener {
    ...
    private lateinit var mResultListener: ASRResultListener

    fun prepareRecognition(resultListener: ASRResultListener): Boolean {
        ...
        mResultListener = resultListener
        ...
    }
    ...

    override fun onPartialResults(bundle: Bundle?) {
        bundle?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
            Log.d(
                "RecognitionHelper", "onPartialResults() with:$bundle" +
                        " results:$it"
            )
            mResultListener.onPartialResult(it[0])
        }
    }

    override fun onResults(bundle: Bundle?) {
        bundle?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
            Log.d(
                "RecognitionHelper", "onResults() with:$bundle" +
                        " results:$it"
            )
            mResultListener.onFinalResult(it[0])
        }
    }
}
```
Then the Activity implements this interface and renders the data into a TextView. To make the progression of intermediate results visible to the naked eye, each TextView update is staggered by a 300 ms delay.
```kotlin
class RecognitionActivity : AppCompatActivity(), ASRResultListener {
    private lateinit var binding: RecognitionLayoutBinding
    private val recognitionHelper: RecognitionHelper by lazy {
        RecognitionHelper(this)
    }
    private var updatingTextTimeDelayed = 0L
    private val mainHandler = Handler(Looper.getMainLooper())

    override fun onCreate(savedInstanceState: Bundle?) {
        ...
        if (!recognitionHelper.prepareRecognition(this)) {
            Toast.makeText(this, "Recognition not available", Toast.LENGTH_SHORT).show()
            return
        }
        binding.start.setOnClickListener {
            Log.d("RecognitionHelper", "startRecognition()")
            recognitionHelper.startRecognition()
        }
        binding.stop.setOnClickListener {
            Log.d("RecognitionHelper", "stopRecognition()")
            recognitionHelper.stopRecognition()
        }
    }

    override fun onStop() {
        super.onStop()
        Log.d("RecognitionHelper", "onStop()")
        recognitionHelper.releaseRecognition()
    }

    override fun onPartialResult(result: String) {
        Log.d("RecognitionHelper", "onPartialResult() with result:$result")
        updatingTextTimeDelayed += 300L
        mainHandler.postDelayed(
            {
                Log.d("RecognitionHelper", "onPartialResult() updating")
                binding.recoAsr.text = result
            }, updatingTextTimeDelayed
        )
    }

    override fun onFinalResult(result: String) {
        Log.d("RecognitionHelper", "onFinalResult() with result:$result")
        updatingTextTimeDelayed += 300L
        mainHandler.postDelayed(
            {
                Log.d("RecognitionHelper", "onFinalResult() updating")
                binding.recoAsr.text = result
            }, updatingTextTimeDelayed
        )
    }
}
```
Tap the "START RECOGNITION" button; the phone shows the mic-recording indicator in the top-right corner. After we say "Can you introduce yourself", the text appears in the TextView step by step, with a typewriter effect. The log below also reflects the recognition process:
```
// Initialization
08-15 22:43:13.963  6879  6879 D RecognitionHelper: onCreate()
08-15 22:43:14.037  6879  6879 E RecognitionHelper: audio recording permission granted
08-15 22:43:14.050  6879  6879 D RecognitionHelper: onStart()
// Start recognition
08-15 22:43:41.491  6879  6879 D RecognitionHelper: startRecognition()
08-15 22:43:41.577  6879  6879 D RecognitionHelper: onReadyForSpeech()
08-15 22:43:41.776  6879  6879 D RecognitionHelper: onRmsChanged() with:-2.0
...
08-15 22:43:46.532  6879  6879 D RecognitionHelper: onRmsChanged() with:-0.31999993
// Speech onset detected
08-15 22:43:46.540  6879  6879 D RecognitionHelper: onBeginningOfSpeech()
// 1st partial result: Can
08-15 22:43:46.541  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can]
08-15 22:43:46.541  6879  6879 D RecognitionHelper: onPartialResult() with result:Can
// 2nd partial result: Can you
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you
// 3rd partial result: Can you in
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you in], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you in]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you in
// 4th partial result: Can you intro
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you intro], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you intro]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you intro
// nth partial result: Can you introduce yourself
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you introduce yourself], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you introduce yourself]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you introduce yourself
// End of speech detected
08-15 22:43:46.543  6879  6879 D RecognitionHelper: onEndOfSpeech()
08-15 22:43:46.543  6879  6879 D RecognitionHelper: onEndOfSpeech()
08-15 22:43:46.545  6879  6879 D RecognitionHelper: onResults() with:Bundle[{results_recognition=[Can you introduce yourself], confidence_scores=[0.0]}] results:[Can you introduce yourself]
// Final result: Can you introduce yourself
08-15 22:43:46.545  6879  6879 D RecognitionHelper: onFinalResult() with result:Can you introduce yourself
```
How does the system dispatch?
Unlike Text-to-speech, SpeechRecognizer has no standalone entry in Settings; its default app is set in tandem with VoiceInteraction. But the following command dumps the system's default recognition service:

```shell
adb shell settings get secure voice_recognition_service
```
Dumping on the emulator shows that Google's recognition service is the default:

```
com.google.android.tts/com.google.android.apps.speech.tts.googletts.service.GoogleTTSRecognitionService
```
Dumping on a Samsung device shows Samsung's recognition service instead:

```
com.samsung.android.bixby.agent/.mainui.voiceinteraction.RecognitionServiceTrampoline
```
Let's explore how the recognition service works, starting from the few APIs mentioned in the request section.
Detecting the recognition service

The availability check is simple: query PackageManager with the dedicated Recognition action ("android.speech.RecognitionService"); if at least one resolvable service exists, the system is considered to have a recognition service available.
```java
public static boolean isRecognitionAvailable(final Context context) {
    final List<ResolveInfo> list = context.getPackageManager().queryIntentServices(
            new Intent(RecognitionService.SERVICE_INTERFACE), 0);
    return list != null && list.size() != 0;
}
```
Initializing the recognition service

As described in the "How to request recognition?" section, the static createSpeechRecognizer() performs the initialization: internally it checks that the Context exists and, depending on whether a recognition service component was specified, records the target service name.
```java
public static SpeechRecognizer createSpeechRecognizer(final Context context) {
    return createSpeechRecognizer(context, null);
}

public static SpeechRecognizer createSpeechRecognizer(final Context context,
        final ComponentName serviceComponent) {
    if (context == null) {
        throw new IllegalArgumentException("Context cannot be null");
    }
    checkIsCalledFromMainThread();
    return new SpeechRecognizer(context, serviceComponent);
}

private SpeechRecognizer(final Context context, final ComponentName serviceComponent) {
    mContext = context;
    mServiceComponent = serviceComponent;
    mOnDevice = false;
}
```
Calling setRecognitionListener() on the resulting SpeechRecognizer is slightly more involved:

- Check that the call originates from the main thread.
- Create the dedicated MSG_CHANGE_LISTENER Message.
- If the connection to SpeechRecognitionManagerService (the system service that handles Recognition requests) has not been established yet, queue the Message into a pending queue; once recognition is later started and the connection is created, the Message is forwarded to the Handler.
- Otherwise, post it to the Handler directly for dispatch.
```java
public void setRecognitionListener(RecognitionListener listener) {
    checkIsCalledFromMainThread();
    putMessage(Message.obtain(mHandler, MSG_CHANGE_LISTENER, listener));
}

private void putMessage(Message msg) {
    if (mService == null) {
        mPendingTasks.offer(msg);
    } else {
        mHandler.sendMessage(msg);
    }
}
```
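The queue-or-send decision in putMessage() is a small pattern worth isolating: work posted before the service connection exists is parked, then flushed in order once the connection succeeds. Here is a standalone sketch of the same idea with no Android types (a plain function queue stands in for Message/Handler; all names are ours):

```kotlin
import java.util.ArrayDeque

// Sketch of SpeechRecognizer's pending-task pattern: tasks arriving before
// the service connection is up are queued, then flushed in FIFO order on connect.
class PendingDispatcher {
    private var connected = false
    private val pending = ArrayDeque<() -> Unit>()
    val executed = mutableListOf<String>() // execution trace, for demonstration

    fun put(name: String, task: () -> Unit) {
        if (!connected) {
            pending.offer { executed.add(name); task() } // park until connected
        } else {
            executed.add(name); task()                   // dispatch immediately
        }
    }

    // Mirrors the onSuccess() connection callback: mark connected, drain the queue.
    fun onConnected() {
        connected = true
        while (pending.isNotEmpty()) pending.poll().invoke()
    }
}

fun main() {
    val d = PendingDispatcher()
    d.put("changeListener") {} // parked: not yet connected
    d.put("start") {}          // parked too
    d.onConnected()            // both flushed, in order
    d.put("stop") {}           // dispatched immediately
    println(d.executed)        // [changeListener, start, stop]
}
```

This is why a listener set before the first startListening() is never lost: the MSG_CHANGE_LISTENER Message simply waits in mPendingTasks until the session is created.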
The Handler then updates the listener instance via handleChangeListener():
```java
private Handler mHandler = new Handler(Looper.getMainLooper()) {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ...
            case MSG_CHANGE_LISTENER:
                handleChangeListener((RecognitionListener) msg.obj);
                break;
            ...
        }
    }
};

private void handleChangeListener(RecognitionListener listener) {
    if (DBG) Log.d(TAG, "handleChangeListener, listener=" + listener);
    mListener.mInternalListener = listener;
}
```
Starting recognition

startListening() first ensures the request Intent is non-null, otherwise it throws "intent must not be null"; it then checks that the caller is on the main thread, otherwise it throws the exception "SpeechRecognizer should be used only from the application's main thread". After that, it makes sure the service connection is ready; if not, it calls connectToSystemService() to establish the connection to the recognition service.
```java
public void startListening(final Intent recognizerIntent) {
    if (recognizerIntent == null) {
        throw new IllegalArgumentException("intent must not be null");
    }
    checkIsCalledFromMainThread();
    if (mService == null) {
        // First time connection: first establish a connection, then dispatch #startListening.
        connectToSystemService();
    }
    putMessage(Message.obtain(mHandler, MSG_START, recognizerIntent));
}
```
The first step of connectToSystemService() is getSpeechRecognizerComponentName(), which resolves the recognition service's component name: either the one specified by the requesting app, or the current recognition service stored in SettingsProvider under VOICE_RECOGNITION_SERVICE, which in practice matches the VoiceInteraction app. If no component name can be resolved, it stops there.

If the component name does exist, a create-session request is sent over IRecognitionServiceManager.aidl to SpeechRecognitionManagerService, the system service in SystemServer that manages speech recognition.
```java
/** Establishes a connection to system server proxy and initializes the session. */
private void connectToSystemService() {
    if (!maybeInitializeManagerService()) {
        return;
    }
    ComponentName componentName = getSpeechRecognizerComponentName();
    if (!mOnDevice && componentName == null) {
        mListener.onError(ERROR_CLIENT);
        return;
    }
    try {
        mManagerService.createSession(
                componentName,
                mClientToken,
                mOnDevice,
                new IRecognitionServiceManagerCallback.Stub() {
                    @Override
                    public void onSuccess(IRecognitionService service) throws RemoteException {
                        mService = service;
                        while (!mPendingTasks.isEmpty()) {
                            mHandler.sendMessage(mPendingTasks.poll());
                        }
                    }

                    @Override
                    public void onError(int errorCode) throws RemoteException {
                        mListener.onError(errorCode);
                    }
                });
    } catch (RemoteException e) {
        e.rethrowFromSystemServer();
    }
}
```
SpeechRecognitionManagerService delegates the handling to its SpeechRecognitionManagerServiceImpl implementation:
```java
// SpeechRecognitionManagerService.java
final class SpeechRecognitionManagerServiceStub extends IRecognitionServiceManager.Stub {
    @Override
    public void createSession(
            ComponentName componentName,
            IBinder clientToken,
            boolean onDevice,
            IRecognitionServiceManagerCallback callback) {
        int userId = UserHandle.getCallingUserId();
        synchronized (mLock) {
            SpeechRecognitionManagerServiceImpl service = getServiceForUserLocked(userId);
            service.createSessionLocked(componentName, clientToken, onDevice, callback);
        }
    }
    ...
}
```
SpeechRecognitionManagerServiceImpl in turn hands the binding to the app's recognition service over to the RemoteSpeechRecognitionService class; as shown below, RemoteSpeechRecognitionService is responsible for communicating with the recognition service.
```java
// SpeechRecognitionManagerServiceImpl.java
void createSessionLocked( ... ) {
    ...
    RemoteSpeechRecognitionService service = createService(creatorCallingUid, serviceComponent);
    ...
    service.connect().thenAccept(binderService -> {
        if (binderService != null) {
            try {
                callback.onSuccess(new IRecognitionService.Stub() {
                    @Override
                    public void startListening( ... )
                            throws RemoteException {
                        ...
                        service.startListening(recognizerIntent, listener, attributionSource);
                    }
                    ...
                });
            } catch (RemoteException e) {
                tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
            }
        } else {
            tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
        }
    });
}
```
Once the connection to the recognition service app is established (or already exists), the MSG_START Message is sent, and the main Handler continues in handleStartListening(). It first re-checks that mService exists, to avoid an NPE, and then sends the start-listening request to the AIDL proxy object:
```java
private Handler mHandler = new Handler(Looper.getMainLooper()) {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case MSG_START:
                handleStartListening((Intent) msg.obj);
                break;
            ...
        }
    }
};

private void handleStartListening(Intent recognizerIntent) {
    if (!checkOpenConnection()) {
        return;
    }
    try {
        mService.startListening(recognizerIntent, mListener, mContext.getAttributionSource());
    }
    ...
}
```
The AIDL is defined in the following file:
```java
// android/speech/IRecognitionService.aidl
oneway interface IRecognitionService {
    void startListening(in Intent recognizerIntent, in IRecognitionListener listener,
            in AttributionSource attributionSource);

    void stopListening(in IRecognitionListener listener);

    void cancel(in IRecognitionListener listener, boolean isShutdown);
    ...
}
```
The implementation of this AIDL lives in the system's recognition manager, SpeechRecognitionManagerServiceImpl:
```java
// com/android/server/speech/SpeechRecognitionManagerServiceImpl.java
void createSessionLocked( ... ) {
    ...
    service.connect().thenAccept(binderService -> {
        if (binderService != null) {
            try {
                callback.onSuccess(new IRecognitionService.Stub() {
                    @Override
                    public void startListening( ... ) {
                        attributionSource.enforceCallingUid();
                        if (!attributionSource.isTrusted(mMaster.getContext())) {
                            attributionSource = mMaster.getContext()
                                    .getSystemService(PermissionManager.class)
                                    .registerAttributionSource(attributionSource);
                        }
                        service.startListening(recognizerIntent, listener, attributionSource);
                    }
                    ...
                });
            } ...
        } else {
            tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
        }
    });
}
```
After that there is one more hop, through RemoteSpeechRecognitionService:
```java
// com/android/server/speech/RemoteSpeechRecognitionService.java
void startListening(Intent recognizerIntent, IRecognitionListener listener,
        @NonNull AttributionSource attributionSource) {
    ...
    synchronized (mLock) {
        if (mSessionInProgress) {
            tryRespondWithError(listener, SpeechRecognizer.ERROR_RECOGNIZER_BUSY);
            return;
        }
        mSessionInProgress = true;
        mRecordingInProgress = true;

        mListener = listener;
        mDelegatingListener = new DelegatingListener(listener, () -> {
            synchronized (mLock) {
                resetStateLocked();
            }
        });
        final DelegatingListener listenerToStart = this.mDelegatingListener;
        run(service ->
                service.startListening(
                        recognizerIntent,
                        listenerToStart,
                        attributionSource));
    }
}
```
Finally the concrete service implementation is invoked, which naturally lives in RecognitionService; its Binder thread posts a MSG_START_LISTENING Message to the main thread:
```java
/** Binder of the recognition service */
private static final class RecognitionServiceBinder extends IRecognitionService.Stub {
    ...
    @Override
    public void startListening(Intent recognizerIntent, IRecognitionListener listener,
            @NonNull AttributionSource attributionSource) {
        final RecognitionService service = mServiceRef.get();
        if (service != null) {
            service.mHandler.sendMessage(Message.obtain(service.mHandler,
                    MSG_START_LISTENING, service.new StartListeningArgs(
                            recognizerIntent, listener, attributionSource)));
        }
    }
    ...
}

private final Handler mHandler = new Handler() {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case MSG_START_LISTENING:
                StartListeningArgs args = (StartListeningArgs) msg.obj;
                dispatchStartListening(args.mIntent, args.mListener, args.mAttributionSource);
                break;
            ...
        }
    }
};
```
Upon receiving it, the Handler again delegates the actual work to dispatchStartListening(). Its most important job is to check whether the request Intent supplies an EXTRA_AUDIO_SOURCE (an active audio source), or whether the requesting app holds the RECORD_AUDIO permission.
```java
private void dispatchStartListening(Intent intent, final IRecognitionListener listener,
        @NonNull AttributionSource attributionSource) {
    try {
        if (mCurrentCallback == null) {
            boolean preflightPermissionCheckPassed =
                    intent.hasExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE)
                            || checkPermissionForPreflightNotHardDenied(attributionSource);
            if (preflightPermissionCheckPassed) {
                mCurrentCallback = new Callback(listener, attributionSource);
                RecognitionService.this.onStartListening(intent, mCurrentCallback);
            }

            if (!preflightPermissionCheckPassed || !checkPermissionAndStartDataDelivery()) {
                listener.onError(SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS);
                if (preflightPermissionCheckPassed) {
                    // If we attempted to start listening, cancel the callback
                    RecognitionService.this.onCancel(mCurrentCallback);
                    dispatchClearCallback();
                }
            }
            ...
        }
    } catch (RemoteException e) {
        Log.d(TAG, "onError call from startListening failed");
    }
}
```
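The branching in dispatchStartListening() is easy to misread, so here is its decision logic re-expressed as a standalone function (the names are ours, and the real method also creates the Callback and talks to the service): listening starts only when the preflight check passes, and ERROR_INSUFFICIENT_PERMISSIONS is reported when either the preflight or the data-delivery check fails, including the case where listening was started optimistically and must then be cancelled.

```kotlin
// Sketch of the permission gating in RecognitionService.dispatchStartListening().
// Returns the sequence of actions taken, so each branch is easy to inspect.
fun gateStartListening(
    hasAudioSourceExtra: Boolean,
    preflightPermissionOk: Boolean,
    dataDeliveryOk: Boolean
): List<String> {
    val actions = mutableListOf<String>()
    // Preflight passes if the intent supplies an audio source OR the caller
    // holds (at least softly) the record-audio permission.
    val preflightPassed = hasAudioSourceExtra || preflightPermissionOk
    if (preflightPassed) actions.add("onStartListening")
    if (!preflightPassed || !dataDeliveryOk) {
        actions.add("error:ERROR_INSUFFICIENT_PERMISSIONS")
        if (preflightPassed) {
            // Listening was started optimistically; roll it back.
            actions.add("onCancel")
        }
    }
    return actions
}

fun main() {
    println(gateStartListening(false, true, true))   // [onStartListening]
    println(gateStartListening(false, false, true))  // [error:ERROR_INSUFFICIENT_PERMISSIONS]
    println(gateStartListening(true, false, false))  // starts, errors, then cancels
}
```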
If either condition holds, the service's onStartListening() implementation is invoked to start recognition. The concrete logic is up to each service, which ultimately reports recognition states and results via the Callback, matching the RecognitionListener callbacks from the "How to request recognition?" section.

```java
protected abstract void onStartListening(Intent recognizerIntent, Callback listener);
```
Stopping recognition & canceling the service

The subsequent stopListening() and cancel() calls follow essentially the same path as starting recognition, ultimately reaching RecognitionService's onStopListening() and onCancel() callbacks respectively.

The only difference: stop merely pauses recognition while the connection to the recognizer app remains, whereas cancel disconnects and resets the related state.
```java
void cancel(IRecognitionListener listener, boolean isShutdown) {
    ...
    synchronized (mLock) {
        ...
        mRecordingInProgress = false;
        mSessionInProgress = false;

        mDelegatingListener = null;
        mListener = null;

        // Schedule to unbind after cancel is delivered.
        if (isShutdown) {
            run(service -> unbind());
        }
    }
}
```
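The stop-versus-cancel difference can be summarised as a tiny state model. This is a deliberate simplification of RemoteSpeechRecognitionService's fields with illustrative names of our own: stop only ends the recording, cancel also clears the session and, on shutdown, drops the binding.

```kotlin
// Simplified state model for the stop vs. cancel distinction described above.
// Field and method names are illustrative, not the framework's.
class RecognitionSession {
    var recording = false
    var sessionInProgress = false
    var bound = false

    fun start() { bound = true; sessionInProgress = true; recording = true }

    // stop: recording ends, but the session and the service binding survive.
    fun stop() { recording = false }

    // cancel: session state is reset; with isShutdown the binding is dropped too.
    fun cancel(isShutdown: Boolean) {
        recording = false
        sessionInProgress = false
        if (isShutdown) bound = false
    }
}

fun main() {
    val s = RecognitionSession()
    s.start(); s.stop()
    println("${s.recording} ${s.sessionInProgress} ${s.bound}") // false true true
    s.cancel(isShutdown = true)
    println("${s.recording} ${s.sessionInProgress} ${s.bound}") // false false false
}
```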
Conclusion

Finally, let's review the whole SpeechRecognizer chain end to end:

- An app that needs speech recognition sends a request through SpeechRecognizer.
- When recognition starts, SpeechRecognizer notifies the SpeechRecognitionManagerService system service in SystemServer via IRecognitionServiceManager.aidl, which fetches the default Recognition service package name from SettingsProvider.
- SpeechRecognitionManagerService does not bind directly; it delegates scheduling to SpeechRecognitionManagerServiceImpl.
- SpeechRecognitionManagerServiceImpl in turn hands binding and management over to RemoteSpeechRecognitionService.
- RemoteSpeechRecognitionService interacts with the concrete RecognitionService via IRecognitionService.aidl.
- RecognitionService switches to the main thread via a Handler, calls the recognition engine to process the request, and returns recognition states and results through its Callback inner class.
- RecognitionService then delivers the results back to SystemServer via IRecognitionListener.aidl, and further on to the app that issued the request.
Recommended reading

- How to build in-car voice interaction: Google Voice Interaction has the answer
- Facing the principle: 5 diagrams to fully understand the Android TextToSpeech mechanism
References

- SpeechRecognizer
- RecognitionService
- System Sample Project