Foreword

The author has previously covered two voice-related articles:

- How to build in-car voice interaction: Google Voice Interaction has the answer — describes how a 3rd-party app can quickly invoke the system's voice interaction service through the Voice Interaction API to complete basic confirmation and selection dialogs.
- Facing the principle: 5 diagrams to fully understand the Android TextToSpeech mechanism — focuses on how a TTS engine app provides the Text-to-Speech service, and how 3rd-party apps conveniently call it.

The last missing piece is how to provide a SpeechRecognizer recognition service to the system, how 3rd-party apps use it, and how the system ties the two together. This article fills in that gap.
How to implement a recognition service?
First we need to provide an implementation of the recognition service. In short, extend RecognitionService and implement its most important abstract methods:

- First, define an abstract recognition engine interface, IRecognitionEngine.
- When the RecognitionService starts, obtain the engine vendor's implementation instance.
- In onStartListening(), parse the parameters from the recognition request Intent (language, maximum result count, etc.), wrap them into a JSON string, and pass them to the engine's start-recognition call. The engine then adjusts its recognition accordingly and reports the relevant states and results along the way, such as speech onset beginningOfSpeech(), end of speech endOfSpeech(), and intermediate results partialResults().
- In onStopListening(), call the engine's stop-recognition; the engine likewise reports back, e.g. the final results via results().
- In onCancel(), call the engine's release() to unbind the engine and free its resources.
```kotlin
interface IRecognitionEngine {
    fun init()
    fun startASR(parameter: String, callback: Callback?)
    fun stopASR(callback: Callback?)
    fun release(callback: Callback?)
}

class CommonRecognitionService : RecognitionService() {
    private val recognitionEngine: IRecognitionEngine by lazy {
        RecognitionProvider.provideRecognition()
    }

    override fun onCreate() {
        super.onCreate()
        recognitionEngine.init()
    }

    override fun onStartListening(intent: Intent?, callback: Callback?) {
        val params: String = "" // TODO: parse parameters from the intent
        recognitionEngine.startASR(params, callback)
    }

    override fun onStopListening(callback: Callback?) {
        recognitionEngine.stopASR(callback)
    }

    override fun onCancel(callback: Callback?) {
        recognitionEngine.release(callback)
    }
}
```
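The TODO above leaves open how the request extras might be flattened into the parameter string handed to the engine. Below is a minimal, hypothetical sketch of that step: a plain Map stands in for the Intent's extras Bundle so it runs on a regular JVM, and the JSON key names (`language`, `maxResults`, `partialResults`) are illustrative assumptions, not part of any framework API.

```kotlin
// Hypothetical sketch: flatten recognition request extras into a JSON string
// that an IRecognitionEngine-style API could accept. A Map stands in for the
// Intent's extras Bundle so the sketch runs on a plain JVM.
fun buildEngineParams(extras: Map<String, Any?>): String {
    // Pick out the extras the engine cares about, with fallback defaults.
    val language = extras["android.speech.extra.LANGUAGE"] ?: "en-US"
    val maxResults = extras["android.speech.extra.MAX_RESULTS"] ?: 1
    val partial = extras["android.speech.extra.PARTIAL_RESULTS"] ?: false
    // Hand-rolled JSON keeps the sketch dependency-free.
    return """{"language":"$language","maxResults":$maxResults,"partialResults":$partial}"""
}

fun main() {
    val params = buildEngineParams(
        mapOf(
            "android.speech.extra.LANGUAGE" to "en-US",
            "android.speech.extra.MAX_RESULTS" to 3,
            "android.speech.extra.PARTIAL_RESULTS" to true
        )
    )
    println(params) // {"language":"en-US","maxResults":3,"partialResults":true}
}
```

In a real service the same extraction would read `intent.getStringExtra(RecognizerIntent.EXTRA_LANGUAGE)` and friends; the point is only that the engine sees one serialized parameter blob.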
Of course, don't forget to declare it in the Manifest:
```xml
<service
    android:name=".recognition.service.CommonRecognitionService"
    android:exported="true">
    <intent-filter>
        <action android:name="android.speech.RecognitionService"/>
    </intent-filter>
</service>
```
How to request recognition?
First, declare the audio-capture permission; since it is a runtime permission, the corresponding runtime-permission request logic is also needed in code.

```xml
<manifest ... >
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
</manifest>
```
Additionally, on Android 11 and above, a package-visibility query declaration for the recognition service must be added.
```xml
<manifest ... >
    ...
    <queries>
        <intent>
            <action android:name="android.speech.RecognitionService" />
        </intent>
    </queries>
</manifest>
```
Once the permissions are in place, it's best to first check whether any recognition service is available on the system at all; if not, simply bail out.
```kotlin
class RecognitionHelper(val context: Context) {
    fun prepareRecognition(): Boolean {
        if (!SpeechRecognizer.isRecognitionAvailable(context)) {
            Log.e("RecognitionHelper", "System has no recognition service yet.")
            return false
        }
        ...
    }
}
```
If a service is available, create the recognizer entry-point instance via SpeechRecognizer's static factory method, which must be called on the main thread.
```kotlin
class RecognitionHelper(val context: Context) : RecognitionListener {
    private lateinit var recognizer: SpeechRecognizer

    fun prepareRecognition(): Boolean {
        ...
        recognizer = SpeechRecognizer.createSpeechRecognizer(context)
        ...
    }
}
```
Of course, if the system ships more than one service and you know the component name, you can pin the recognition implementation:
```java
public static SpeechRecognizer createSpeechRecognizer (Context context,
        ComponentName serviceComponent)
```
Next, set the recognition listener, whose callbacks correspond to the various states during recognition, for example:

- onPartialResults() delivers intermediate results; fetch the recognized strings from the Bundle via getStringArrayList(String) using the SpeechRecognizer#RESULTS_RECOGNITION key.
- onResults() delivers the final results; parse them the same way.
- onBeginningOfSpeech(): the start of speech was detected.
- onEndOfSpeech(): the end of speech was detected.
- onError() reports errors matching the SpeechRecognizer#ERROR_XXX constants; for instance, without microphone permission it returns ERROR_INSUFFICIENT_PERMISSIONS.
- and so on.
```kotlin
class RecognitionHelper(val context: Context) : RecognitionListener {
    ...
    fun prepareRecognition(): Boolean {
        ...
        recognizer.setRecognitionListener(this)
        return true
    }

    override fun onReadyForSpeech(p0: Bundle?) {
        TODO("Not yet implemented")
    }

    override fun onBeginningOfSpeech() {
        TODO("Not yet implemented")
    }

    override fun onRmsChanged(p0: Float) {
        TODO("Not yet implemented")
    }

    override fun onBufferReceived(p0: ByteArray?) {
        TODO("Not yet implemented")
    }

    override fun onEndOfSpeech() {
        TODO("Not yet implemented")
    }

    override fun onError(p0: Int) {
        TODO("Not yet implemented")
    }

    override fun onResults(p0: Bundle?) {
        TODO("Not yet implemented")
    }

    override fun onPartialResults(p0: Bundle?) {
        TODO("Not yet implemented")
    }

    override fun onEvent(p0: Int, p1: Bundle?) {
        TODO("Not yet implemented")
    }
}
```
Then create the Intent carrying the necessary recognition parameters and start listening. The extras include:

- EXTRA_LANGUAGE_MODEL: required; the preferred language model, e.g. the free-form LANGUAGE_MODEL_FREE_FORM model set in the code below, or the web-search-oriented LANGUAGE_MODEL_WEB_SEARCH model.
- EXTRA_PARTIAL_RESULTS: optional; whether the service should deliver intermediate results during recognition, false by default.
- EXTRA_MAX_RESULTS: optional; the maximum number of results the service may return, an int.
- EXTRA_LANGUAGE: optional; the recognition language, defaulting to the Locale.getDefault() region language. (The author used the recognition service provided by Google Assistant, which does not yet support Chinese, hence the Locale here is ENGLISH.)
- etc.

Two things to note: 1. this call must happen after the listener above has been set; 2. it must be made on the main thread:
```kotlin
class RecognitionHelper(val context: Context) : RecognitionListener {
    ...
    fun startRecognition() {
        val intent = createRecognitionIntent()
        recognizer.startListening(intent)
    }
    ...
}

fun createRecognitionIntent() = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
    putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
    putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3)
    putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.ENGLISH)
}
```
Next we add a layout that uses the RecognitionHelper above to initialize and start recognition, and display the results. We also add an interface for passing intermediate and final results to the UI, carrying the data back from the RecognitionListener.
```kotlin
interface ASRResultListener {
    fun onPartialResult(result: String)
    fun onFinalResult(result: String)
}

class RecognitionHelper(private val context: Context) : RecognitionListener {
    ...
    private lateinit var mResultListener: ASRResultListener

    fun prepareRecognition(resultListener: ASRResultListener): Boolean {
        ...
        mResultListener = resultListener
        ...
    }
    ...

    override fun onPartialResults(bundle: Bundle?) {
        bundle?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
            Log.d(
                "RecognitionHelper", "onPartialResults() with:$bundle" +
                        " results:$it"
            )
            mResultListener.onPartialResult(it[0])
        }
    }

    override fun onResults(bundle: Bundle?) {
        bundle?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
            Log.d(
                "RecognitionHelper", "onResults() with:$bundle" +
                        " results:$it"
            )
            mResultListener.onFinalResult(it[0])
        }
    }
}
```
Then the Activity implements this interface and renders the data into a TextView. To make the progression of intermediate results visible to the naked eye, each TextView update is staggered by a 300 ms delay.
```kotlin
class RecognitionActivity : AppCompatActivity(), ASRResultListener {
    private lateinit var binding: RecognitionLayoutBinding
    private val recognitionHelper: RecognitionHelper by lazy {
        RecognitionHelper(this)
    }
    private var updatingTextTimeDelayed = 0L
    private val mainHandler = Handler(Looper.getMainLooper())

    override fun onCreate(savedInstanceState: Bundle?) {
        ...
        if (!recognitionHelper.prepareRecognition(this)) {
            Toast.makeText(this, "Recognition not available", Toast.LENGTH_SHORT).show()
            return
        }
        binding.start.setOnClickListener {
            Log.d("RecognitionHelper", "startRecognition()")
            recognitionHelper.startRecognition()
        }
        binding.stop.setOnClickListener {
            Log.d("RecognitionHelper", "stopRecognition()")
            recognitionHelper.stopRecognition()
        }
    }

    override fun onStop() {
        super.onStop()
        Log.d("RecognitionHelper", "onStop()")
        recognitionHelper.releaseRecognition()
    }

    override fun onPartialResult(result: String) {
        Log.d("RecognitionHelper", "onPartialResult() with result:$result")
        updatingTextTimeDelayed += 300L
        mainHandler.postDelayed(
            {
                Log.d("RecognitionHelper", "onPartialResult() updating")
                binding.recoAsr.text = result
            }, updatingTextTimeDelayed
        )
    }

    override fun onFinalResult(result: String) {
        Log.d("RecognitionHelper", "onFinalResult() with result:$result")
        updatingTextTimeDelayed += 300L
        mainHandler.postDelayed(
            {
                Log.d("RecognitionHelper", "onFinalResult() updating")
                binding.recoAsr.text = result
            }, updatingTextTimeDelayed
        )
    }
}
```
Tap the "START RECOGNITION" button; the phone shows the mic-recording indicator in the top-right corner. After we say "Can you introduce yourself", the text appears in the TextView step by step, with a typewriter effect. The log below also reflects the recognition process:
```
// Initialization
08-15 22:43:13.963  6879  6879 D RecognitionHelper: onCreate()
08-15 22:43:14.037  6879  6879 E RecognitionHelper: audio recording permission granted
08-15 22:43:14.050  6879  6879 D RecognitionHelper: onStart()
// Start recognition
08-15 22:43:41.491  6879  6879 D RecognitionHelper: startRecognition()
08-15 22:43:41.577  6879  6879 D RecognitionHelper: onReadyForSpeech()
08-15 22:43:41.776  6879  6879 D RecognitionHelper: onRmsChanged() with:-2.0
...
08-15 22:43:46.532  6879  6879 D RecognitionHelper: onRmsChanged() with:-0.31999993
// Speech onset detected
08-15 22:43:46.540  6879  6879 D RecognitionHelper: onBeginningOfSpeech()
// 1st partial result: Can
08-15 22:43:46.541  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can]
08-15 22:43:46.541  6879  6879 D RecognitionHelper: onPartialResult() with result:Can
// 2nd partial result: Can you
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you
// 3rd partial result: Can you in
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you in], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you in]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you in
// 4th partial result: Can you intro
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you intro], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you intro]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you intro
// nth partial result: Can you introduce yourself
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you introduce yourself], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you introduce yourself]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you introduce yourself
// End of speech detected
08-15 22:43:46.543  6879  6879 D RecognitionHelper: onEndOfSpeech()
08-15 22:43:46.543  6879  6879 D RecognitionHelper: onEndOfSpeech()
08-15 22:43:46.545  6879  6879 D RecognitionHelper: onResults() with:Bundle[{results_recognition=[Can you introduce yourself], confidence_scores=[0.0]}] results:[Can you introduce yourself]
// Final result: Can you introduce yourself
08-15 22:43:46.545  6879  6879 D RecognitionHelper: onFinalResult() with result:Can you introduce yourself
```
How does the system dispatch?
Unlike Text-to-speech, SpeechRecognizer has no standalone entry in Settings; its default app is set in tandem with VoiceInteraction. But the following command dumps the system's default recognition service:

```shell
adb shell settings get secure voice_recognition_service
```
Dumping on the emulator shows that Google's recognition service is the default:

```
com.google.android.tts/com.google.android.apps.speech.tts.googletts.service.GoogleTTSRecognitionService
```
Dumping on a Samsung device shows Samsung's recognition service instead:

```
com.samsung.android.bixby.agent/.mainui.voiceinteraction.RecognitionServiceTrampoline
```
Let's explore how the recognition service works, starting from the few APIs mentioned in the request section.
Detecting the recognition service

The availability check is simple: query PackageManager with the dedicated Recognition action ("android.speech.RecognitionService"); if at least one resolvable service exists, the system is considered to have a recognition service available.
```java
public static boolean isRecognitionAvailable(final Context context) {
    final List<ResolveInfo> list = context.getPackageManager().queryIntentServices(
            new Intent(RecognitionService.SERVICE_INTERFACE), 0);
    return list != null && list.size() != 0;
}
```
Initializing the recognition service

As described in the "How to request recognition?" section, the static createSpeechRecognizer() performs the initialization: internally it checks that the Context exists and, depending on whether a recognition service component was specified, records the target service name.
```java
public static SpeechRecognizer createSpeechRecognizer(final Context context) {
    return createSpeechRecognizer(context, null);
}

public static SpeechRecognizer createSpeechRecognizer(final Context context,
        final ComponentName serviceComponent) {
    if (context == null) {
        throw new IllegalArgumentException("Context cannot be null");
    }
    checkIsCalledFromMainThread();
    return new SpeechRecognizer(context, serviceComponent);
}

private SpeechRecognizer(final Context context, final ComponentName serviceComponent) {
    mContext = context;
    mServiceComponent = serviceComponent;
    mOnDevice = false;
}
```
Calling setRecognitionListener() on the resulting SpeechRecognizer is slightly more involved:

- Check that the call originates from the main thread.
- Create the dedicated MSG_CHANGE_LISTENER Message.
- If the connection to SpeechRecognitionManagerService (the system service that handles Recognition requests) has not been established yet, queue the Message into a pending queue; once recognition is later started and the connection is created, the Message is forwarded to the Handler.
- Otherwise, post it to the Handler directly for dispatch.
```java
public void setRecognitionListener(RecognitionListener listener) {
    checkIsCalledFromMainThread();
    putMessage(Message.obtain(mHandler, MSG_CHANGE_LISTENER, listener));
}

private void putMessage(Message msg) {
    if (mService == null) {
        mPendingTasks.offer(msg);
    } else {
        mHandler.sendMessage(msg);
    }
}
```
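The queue-or-send decision in putMessage() is a small pattern worth isolating: work posted before the service connection exists is parked, then flushed in order once the connection succeeds. Here is a standalone sketch of the same idea with no Android types (a plain function queue stands in for Message/Handler; all names are ours):

```kotlin
import java.util.ArrayDeque

// Sketch of SpeechRecognizer's pending-task pattern: tasks arriving before
// the service connection is up are queued, then flushed in FIFO order on connect.
class PendingDispatcher {
    private var connected = false
    private val pending = ArrayDeque<() -> Unit>()
    val executed = mutableListOf<String>() // execution trace, for demonstration

    fun put(name: String, task: () -> Unit) {
        if (!connected) {
            pending.offer { executed.add(name); task() } // park until connected
        } else {
            executed.add(name); task()                   // dispatch immediately
        }
    }

    // Mirrors the onSuccess() connection callback: mark connected, drain the queue.
    fun onConnected() {
        connected = true
        while (pending.isNotEmpty()) pending.poll().invoke()
    }
}

fun main() {
    val d = PendingDispatcher()
    d.put("changeListener") {} // parked: not yet connected
    d.put("start") {}          // parked too
    d.onConnected()            // both flushed, in order
    d.put("stop") {}           // dispatched immediately
    println(d.executed)        // [changeListener, start, stop]
}
```

This is why a listener set before the first startListening() is never lost: the MSG_CHANGE_LISTENER Message simply waits in mPendingTasks until the session is created.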
The Handler then updates the listener instance via handleChangeListener():
```java
private Handler mHandler = new Handler(Looper.getMainLooper()) {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ...
            case MSG_CHANGE_LISTENER:
                handleChangeListener((RecognitionListener) msg.obj);
                break;
            ...
        }
    }
};

private void handleChangeListener(RecognitionListener listener) {
    if (DBG) Log.d(TAG, "handleChangeListener, listener=" + listener);
    mListener.mInternalListener = listener;
}
```
Starting recognition

startListening() first ensures the request Intent is non-null, otherwise it throws "intent must not be null"; it then checks that the caller is on the main thread, otherwise it throws the exception "SpeechRecognizer should be used only from the application's main thread". After that, it makes sure the service connection is ready; if not, it calls connectToSystemService() to establish the connection to the recognition service.
```java
public void startListening(final Intent recognizerIntent) {
    if (recognizerIntent == null) {
        throw new IllegalArgumentException("intent must not be null");
    }
    checkIsCalledFromMainThread();
    if (mService == null) {
        // First time connection: first establish a connection, then dispatch #startListening.
        connectToSystemService();
    }
    putMessage(Message.obtain(mHandler, MSG_START, recognizerIntent));
}
```
The first step of connectToSystemService() is getSpeechRecognizerComponentName(), which resolves the recognition service's component name: either the one specified by the requesting app, or the current recognition service stored in SettingsProvider under VOICE_RECOGNITION_SERVICE, which in practice matches the VoiceInteraction app. If no component name can be resolved, it stops there.

If the component name does exist, a create-session request is sent over IRecognitionServiceManager.aidl to SpeechRecognitionManagerService, the system service in SystemServer that manages speech recognition.
```java
/** Establishes a connection to system server proxy and initializes the session. */
private void connectToSystemService() {
    if (!maybeInitializeManagerService()) {
        return;
    }
    ComponentName componentName = getSpeechRecognizerComponentName();
    if (!mOnDevice && componentName == null) {
        mListener.onError(ERROR_CLIENT);
        return;
    }
    try {
        mManagerService.createSession(
                componentName,
                mClientToken,
                mOnDevice,
                new IRecognitionServiceManagerCallback.Stub() {
                    @Override
                    public void onSuccess(IRecognitionService service) throws RemoteException {
                        mService = service;
                        while (!mPendingTasks.isEmpty()) {
                            mHandler.sendMessage(mPendingTasks.poll());
                        }
                    }

                    @Override
                    public void onError(int errorCode) throws RemoteException {
                        mListener.onError(errorCode);
                    }
                });
    } catch (RemoteException e) {
        e.rethrowFromSystemServer();
    }
}
```
SpeechRecognitionManagerService delegates the handling to its SpeechRecognitionManagerServiceImpl implementation:
```java
// SpeechRecognitionManagerService.java
final class SpeechRecognitionManagerServiceStub extends IRecognitionServiceManager.Stub {
    @Override
    public void createSession(
            ComponentName componentName,
            IBinder clientToken,
            boolean onDevice,
            IRecognitionServiceManagerCallback callback) {
        int userId = UserHandle.getCallingUserId();
        synchronized (mLock) {
            SpeechRecognitionManagerServiceImpl service = getServiceForUserLocked(userId);
            service.createSessionLocked(componentName, clientToken, onDevice, callback);
        }
    }
    ...
}
```
SpeechRecognitionManagerServiceImpl in turn hands the binding to the app's recognition service over to the RemoteSpeechRecognitionService class; as shown below, RemoteSpeechRecognitionService is responsible for communicating with the recognition service.
```java
// SpeechRecognitionManagerServiceImpl.java
void createSessionLocked( ... ) {
    ...
    RemoteSpeechRecognitionService service = createService(creatorCallingUid, serviceComponent);
    ...
    service.connect().thenAccept(binderService -> {
        if (binderService != null) {
            try {
                callback.onSuccess(new IRecognitionService.Stub() {
                    @Override
                    public void startListening( ... )
                            throws RemoteException {
                        ...
                        service.startListening(recognizerIntent, listener, attributionSource);
                    }
                    ...
                });
            } catch (RemoteException e) {
                tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
            }
        } else {
            tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
        }
    });
}
```
Once the connection to the recognition service app is established (or already exists), the MSG_START Message is sent, and the main Handler continues in handleStartListening(). It first re-checks that mService exists, to avoid an NPE, and then sends the start-listening request to the AIDL proxy object:
```java
private Handler mHandler = new Handler(Looper.getMainLooper()) {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case MSG_START:
                handleStartListening((Intent) msg.obj);
                break;
            ...
        }
    }
};

private void handleStartListening(Intent recognizerIntent) {
    if (!checkOpenConnection()) {
        return;
    }
    try {
        mService.startListening(recognizerIntent, mListener, mContext.getAttributionSource());
    }
    ...
}
```
The AIDL is defined in the following file:
```java
// android/speech/IRecognitionService.aidl
oneway interface IRecognitionService {
    void startListening(in Intent recognizerIntent, in IRecognitionListener listener,
            in AttributionSource attributionSource);

    void stopListening(in IRecognitionListener listener);

    void cancel(in IRecognitionListener listener, boolean isShutdown);
    ...
}
```
The implementation of this AIDL lives in the system's recognition manager, SpeechRecognitionManagerServiceImpl:
```java
// com/android/server/speech/SpeechRecognitionManagerServiceImpl.java
void createSessionLocked( ... ) {
    ...
    service.connect().thenAccept(binderService -> {
        if (binderService != null) {
            try {
                callback.onSuccess(new IRecognitionService.Stub() {
                    @Override
                    public void startListening( ... ) {
                        attributionSource.enforceCallingUid();
                        if (!attributionSource.isTrusted(mMaster.getContext())) {
                            attributionSource = mMaster.getContext()
                                    .getSystemService(PermissionManager.class)
                                    .registerAttributionSource(attributionSource);
                        }
                        service.startListening(recognizerIntent, listener, attributionSource);
                    }
                    ...
                });
            } ...
        } else {
            tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
        }
    });
}
```
After that there is one more hop, through RemoteSpeechRecognitionService:
```java
// com/android/server/speech/RemoteSpeechRecognitionService.java
void startListening(Intent recognizerIntent, IRecognitionListener listener,
        @NonNull AttributionSource attributionSource) {
    ...
    synchronized (mLock) {
        if (mSessionInProgress) {
            tryRespondWithError(listener, SpeechRecognizer.ERROR_RECOGNIZER_BUSY);
            return;
        }
        mSessionInProgress = true;
        mRecordingInProgress = true;

        mListener = listener;
        mDelegatingListener = new DelegatingListener(listener, () -> {
            synchronized (mLock) {
                resetStateLocked();
            }
        });
        final DelegatingListener listenerToStart = this.mDelegatingListener;
        run(service ->
                service.startListening(
                        recognizerIntent,
                        listenerToStart,
                        attributionSource));
    }
}
```
Finally the concrete service implementation is invoked, which naturally lives in RecognitionService; its Binder thread posts a MSG_START_LISTENING Message to the main thread:
```java
/** Binder of the recognition service */
private static final class RecognitionServiceBinder extends IRecognitionService.Stub {
    ...
    @Override
    public void startListening(Intent recognizerIntent, IRecognitionListener listener,
            @NonNull AttributionSource attributionSource) {
        final RecognitionService service = mServiceRef.get();
        if (service != null) {
            service.mHandler.sendMessage(Message.obtain(service.mHandler,
                    MSG_START_LISTENING, service.new StartListeningArgs(
                            recognizerIntent, listener, attributionSource)));
        }
    }
    ...
}

private final Handler mHandler = new Handler() {
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case MSG_START_LISTENING:
                StartListeningArgs args = (StartListeningArgs) msg.obj;
                dispatchStartListening(args.mIntent, args.mListener, args.mAttributionSource);
                break;
            ...
        }
    }
};
```
Upon receiving it, the Handler again delegates the actual work to dispatchStartListening(). Its most important job is to check whether the request Intent supplies an EXTRA_AUDIO_SOURCE (an active audio source), or whether the requesting app holds the RECORD_AUDIO permission.
```java
private void dispatchStartListening(Intent intent, final IRecognitionListener listener,
        @NonNull AttributionSource attributionSource) {
    try {
        if (mCurrentCallback == null) {
            boolean preflightPermissionCheckPassed =
                    intent.hasExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE)
                            || checkPermissionForPreflightNotHardDenied(attributionSource);
            if (preflightPermissionCheckPassed) {
                mCurrentCallback = new Callback(listener, attributionSource);
                RecognitionService.this.onStartListening(intent, mCurrentCallback);
            }

            if (!preflightPermissionCheckPassed || !checkPermissionAndStartDataDelivery()) {
                listener.onError(SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS);
                if (preflightPermissionCheckPassed) {
                    // If we attempted to start listening, cancel the callback
                    RecognitionService.this.onCancel(mCurrentCallback);
                    dispatchClearCallback();
                }
            }
            ...
        }
    } catch (RemoteException e) {
        Log.d(TAG, "onError call from startListening failed");
    }
}
```
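The branching in dispatchStartListening() is easy to misread, so here is its decision logic re-expressed as a standalone function (the names are ours, and the real method also creates the Callback and talks to the service): listening starts only when the preflight check passes, and ERROR_INSUFFICIENT_PERMISSIONS is reported when either the preflight or the data-delivery check fails, including the case where listening was started optimistically and must then be cancelled.

```kotlin
// Sketch of the permission gating in RecognitionService.dispatchStartListening().
// Returns the sequence of actions taken, so each branch is easy to inspect.
fun gateStartListening(
    hasAudioSourceExtra: Boolean,
    preflightPermissionOk: Boolean,
    dataDeliveryOk: Boolean
): List<String> {
    val actions = mutableListOf<String>()
    // Preflight passes if the intent supplies an audio source OR the caller
    // holds (at least softly) the record-audio permission.
    val preflightPassed = hasAudioSourceExtra || preflightPermissionOk
    if (preflightPassed) actions.add("onStartListening")
    if (!preflightPassed || !dataDeliveryOk) {
        actions.add("error:ERROR_INSUFFICIENT_PERMISSIONS")
        if (preflightPassed) {
            // Listening was started optimistically; roll it back.
            actions.add("onCancel")
        }
    }
    return actions
}

fun main() {
    println(gateStartListening(false, true, true))   // [onStartListening]
    println(gateStartListening(false, false, true))  // [error:ERROR_INSUFFICIENT_PERMISSIONS]
    println(gateStartListening(true, false, false))  // starts, errors, then cancels
}
```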
If either condition holds, the service's onStartListening() implementation is invoked to start recognition. The concrete logic is up to each service, which ultimately reports recognition states and results via the Callback, matching the RecognitionListener callbacks from the "How to request recognition?" section.

```java
protected abstract void onStartListening(Intent recognizerIntent, Callback listener);
```
Stopping recognition & canceling the service

The subsequent stopListening() and cancel() calls follow essentially the same path as starting recognition, ultimately reaching RecognitionService's onStopListening() and onCancel() callbacks respectively.

The only difference: stop merely pauses recognition while the connection to the recognizer app remains, whereas cancel disconnects and resets the related state.
```java
void cancel(IRecognitionListener listener, boolean isShutdown) {
    ...
    synchronized (mLock) {
        ...
        mRecordingInProgress = false;
        mSessionInProgress = false;

        mDelegatingListener = null;
        mListener = null;

        // Schedule to unbind after cancel is delivered.
        if (isShutdown) {
            run(service -> unbind());
        }
    }
}
```
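The stop-versus-cancel difference can be summarised as a tiny state model. This is a deliberate simplification of RemoteSpeechRecognitionService's fields with illustrative names of our own: stop only ends the recording, cancel also clears the session and, on shutdown, drops the binding.

```kotlin
// Simplified state model for the stop vs. cancel distinction described above.
// Field and method names are illustrative, not the framework's.
class RecognitionSession {
    var recording = false
    var sessionInProgress = false
    var bound = false

    fun start() { bound = true; sessionInProgress = true; recording = true }

    // stop: recording ends, but the session and the service binding survive.
    fun stop() { recording = false }

    // cancel: session state is reset; with isShutdown the binding is dropped too.
    fun cancel(isShutdown: Boolean) {
        recording = false
        sessionInProgress = false
        if (isShutdown) bound = false
    }
}

fun main() {
    val s = RecognitionSession()
    s.start(); s.stop()
    println("${s.recording} ${s.sessionInProgress} ${s.bound}") // false true true
    s.cancel(isShutdown = true)
    println("${s.recording} ${s.sessionInProgress} ${s.bound}") // false false false
}
```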
Conclusion

Finally, let's review the whole SpeechRecognizer chain end to end:

- An app that needs speech recognition sends a request through SpeechRecognizer.
- When recognition starts, SpeechRecognizer notifies the SpeechRecognitionManagerService system service in SystemServer via IRecognitionServiceManager.aidl, which fetches the default Recognition service package name from SettingsProvider.
- SpeechRecognitionManagerService does not bind directly; it delegates scheduling to SpeechRecognitionManagerServiceImpl.
- SpeechRecognitionManagerServiceImpl in turn hands binding and management over to RemoteSpeechRecognitionService.
- RemoteSpeechRecognitionService interacts with the concrete RecognitionService via IRecognitionService.aidl.
- RecognitionService switches to the main thread via a Handler, calls the recognition engine to process the request, and returns recognition states and results through its Callback inner class.
- RecognitionService then delivers the results back to SystemServer via IRecognitionListener.aidl, and further on to the app that issued the request.
Recommended reading

- How to build in-car voice interaction: Google Voice Interaction has the answer
- Facing the principle: 5 diagrams to fully understand the Android TextToSpeech mechanism
References

- SpeechRecognizer
- RecognitionService
- System Sample Project