Android's Standard Speech Recognition Framework: Wrapping, Calling, and Internals of SpeechRecognizer

How to implement a recognition service?

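First, we need to provide an implementation of the recognition service. In short, extend RecognitionService and implement its most important abstract methods:

Define an abstract recognition engine interface, IRecognitionEngine.
When RecognitionService starts, obtain the implementation instance supplied by the recognition engine provider.
In onStartListening(), parse the parameters in the recognition request Intent, such as the language and the maximum number of results, wrap them into a JSON string, and pass it to the engine to start recognition. The engine also adjusts its recognition behavior according to these parameters and reports the states and results that occur during recognition, such as beginning of speech via beginningOfSpeech(), end of speech via endOfSpeech(), and intermediate results via partialResults().
In onStopListening(), call the engine's stop method. The engine again has to report results back, such as the final recognition result via results().
In onCancel(), call the engine's release() to unbind the recognition engine and free its resources.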
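
The parameter-parsing step in onStartListening() can be sketched roughly as follows. This is a plain-Java model, not the article's implementation: a Map stands in for the Intent extras so that the sketch runs without Android, and the class name RecognitionParams as well as the JSON key names (language, maxResults, partialResults) are assumptions made up for illustration. The three key strings are the values of the corresponding RecognizerIntent.EXTRA_* constants.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: turns recognition request extras into a JSON string. */
final class RecognitionParams {
    // Values of RecognizerIntent.EXTRA_LANGUAGE / EXTRA_MAX_RESULTS / EXTRA_PARTIAL_RESULTS.
    static final String EXTRA_LANGUAGE = "android.speech.extra.LANGUAGE";
    static final String EXTRA_MAX_RESULTS = "android.speech.extra.MAX_RESULTS";
    static final String EXTRA_PARTIAL_RESULTS = "android.speech.extra.PARTIAL_RESULTS";

    /** Builds the JSON handed to the engine; the key names are assumptions. */
    static String toJson(Map<String, Object> extras) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("language",
                extras.getOrDefault(EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag()));
        params.put("maxResults", extras.getOrDefault(EXTRA_MAX_RESULTS, 1));
        params.put("partialResults", extras.getOrDefault(EXTRA_PARTIAL_RESULTS, false));

        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, Object> e : params.entrySet()) {
            if (json.length() > 1) json.append(',');
            json.append('"').append(e.getKey()).append("\":");
            Object value = e.getValue();
            if (value instanceof String) {
                json.append('"').append(escape((String) value)).append('"');
            } else {
                json.append(value); // numbers and booleans are emitted as-is
            }
        }
        return json.append('}').toString();
    }

    // Escapes backslashes and double quotes only; enough for language tags.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Object> extras = new LinkedHashMap<>();
        extras.put(EXTRA_LANGUAGE, "en-US");
        extras.put(EXTRA_MAX_RESULTS, 3);
        extras.put(EXTRA_PARTIAL_RESULTS, true);
        System.out.println(toJson(extras));
    }
}
```

In a real service the values would be read from the Intent itself (for example with getStringExtra() and getIntExtra()), and a JSON library would normally replace the hand-rolled string building.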

Interface definition

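import android.speech.RecognitionService

// Abstraction over a vendor-specific speech recognition engine.
interface IRecognitionEngine {
    fun init()
    fun startASR(parameter: String, callback: RecognitionService.Callback?)
    fun stopASR(callback: RecognitionService.Callback?)
    fun release(callback: RecognitionService.Callback?)
}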

Service implementation

class CommonRecognitionService : RecognitionService() {
    private val recognitionEngine: IRecognitionEngine by lazy {
        RecognitionProvider.provideRecognition()
    }

    override fun onCreate() {
        super.onCreate()
        recognitionEngine.init()
    }

    override fun onStartListening(intent: Intent?, callback: Callback?) {
        val params: String = "" // TODO: parse parameters from intent
        recognitionEngine.startASR(params, callback)
    }

    override fun onStopListening(callback: Callback?) {
        recognitionEngine.stopASR(callback)
    }

    override fun onCancel(callback: Callback?) {
        recognitionEngine.release(callback)
    }
}

And of course, do not forget to declare the service in the Manifest:

<service
    android:name=".recognition.service.CommonRecognitionService"
    android:exported="true">
    <intent-filter>
        <action android:name="android.speech.RecognitionService"/>
    </intent-filter>
</service>

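How to request recognition?

First, declare the RECORD_AUDIO permission for capturing audio in the Manifest, and add the code that requests it at runtime.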

<manifest ... >
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
</manifest>

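In addition, on Android 11 and above you also need a <queries> declaration so that the recognition service's package is visible to your app.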

<manifest ... >
    ...
    <queries>
        <intent>
            <action android:name="android.speech.RecognitionService" />
        </intent>
    </queries>
</manifest>

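Once the permission is granted, it is best to first check whether the system has an available Recognition service, and to stop right away if it does not.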

class RecognitionHelper(val context: Context) {
    fun prepareRecognition(): Boolean {
        if (!SpeechRecognizer.isRecognitionAvailable(context)) {
            Log.e("RecognitionHelper", "System has no recognition service yet.")
            return false
        }
        ...
    }
}

If a service is available, create the recognizer instance through the static factory method of SpeechRecognizer. It must be called on the main thread.

class RecognitionHelper(val context: Context) : RecognitionListener {
    private lateinit var recognizer: SpeechRecognizer

    fun prepareRecognition(): Boolean {
        ...
        recognizer = SpeechRecognizer.createSpeechRecognizer(context)
        ...
    }
}

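If the system has several services and you know the component name of the one you want, you can specify which implementation handles recognition: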

public static SpeechRecognizer createSpeechRecognizer(Context context, 
                ComponentName serviceComponent)

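Next, set the Recognition listener, whose callbacks correspond to the states of the recognition process:

onPartialResults(): returns intermediate results; read the recognized strings from the Bundle with the key SpeechRecognizer#RESULTS_RECOGNITION.
onResults(): returns the final recognition result, parsed in the same way.
onBeginningOfSpeech(): the start of speech was detected.
onEndOfSpeech(): the end of speech was detected.
onError(): returns an error code, for example ERROR_INSUFFICIENT_PERMISSIONS when the microphone permission is missing.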

class RecognitionHelper(val context: Context) : RecognitionListener {
    ...
    fun prepareRecognition(): Boolean {
        ...
        recognizer.setRecognitionListener(this)
        return true
    }

    override fun onReadyForSpeech(p0: Bundle?) {}
    override fun onBeginningOfSpeech() {}
    override fun onRmsChanged(p0: Float) {}
    override fun onBufferReceived(p0: ByteArray?) {}
    override fun onEndOfSpeech() {}
    override fun onError(p0: Int) {}
    override fun onResults(p0: Bundle?) {}
    override fun onPartialResults(p0: Bundle?) {}
    override fun onEvent(p0: Int, p1: Bundle?) {}
}
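
Since onError() only delivers an integer, it helps during debugging to turn that code back into a readable name. The following is a minimal sketch in plain Java: the class name RecognitionErrors is hypothetical, and the numeric values are those of the long-standing SpeechRecognizer.ERROR_* constants (1 to 9). Newer API levels define further codes, which fall through to the default branch here.

```java
/** Hypothetical debugging helper: maps SpeechRecognizer error codes to names. */
final class RecognitionErrors {
    static String describe(int errorCode) {
        switch (errorCode) {
            case 1: return "ERROR_NETWORK_TIMEOUT";
            case 2: return "ERROR_NETWORK";
            case 3: return "ERROR_AUDIO";
            case 4: return "ERROR_SERVER";
            case 5: return "ERROR_CLIENT";
            case 6: return "ERROR_SPEECH_TIMEOUT";
            case 7: return "ERROR_NO_MATCH";
            case 8: return "ERROR_RECOGNIZER_BUSY";
            case 9: return "ERROR_INSUFFICIENT_PERMISSIONS";
            default: return "UNKNOWN(" + errorCode + ")"; // codes added in newer API levels
        }
    }

    public static void main(String[] args) {
        // The code reported when the microphone permission is missing.
        System.out.println(describe(9));
    }
}
```

In an app, the switch would normally use the SpeechRecognizer constants directly instead of literal numbers.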

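After that, create the Intent that carries the required recognition parameters and start recognition. The parameters include:

EXTRA_LANGUAGE_MODEL: required; specifies the recognition model, such as LANGUAGE_MODEL_FREE_FORM or LANGUAGE_MODEL_WEB_SEARCH.
EXTRA_PARTIAL_RESULTS: optional; whether intermediate results should be returned during recognition. Defaults to false.
EXTRA_MAX_RESULTS: optional; the maximum number of results allowed to be returned.
EXTRA_LANGUAGE: optional; the recognition language. Defaults to Locale.getDefault().

Note two points about startListening():

It must be called after the listener has been set.
It must be called from the main thread.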

class RecognitionHelper(val context: Context) : RecognitionListener {
    ...
    fun startRecognition() {
        val intent = createRecognitionIntent()
        recognizer.startListening(intent)
    }
    ...
}

fun createRecognitionIntent() = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
    putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
    putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3)
    // EXTRA_LANGUAGE expects an IETF BCP 47 language tag string, not a Locale object.
    putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.ENGLISH.toLanguageTag())
}

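Finally, add a layout that uses RecognitionHelper to initialize and start recognition and that displays the results.

Also add an interface for passing intermediate and final recognition results to the UI, so that the data from RecognitionListener is carried back.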

interface ASRResultListener {
    fun onPartialResult(result: String)

    fun onFinalResult(result: String)
}

class RecognitionHelper(private val context: Context) : RecognitionListener {
    ...
    private lateinit var mResultListener: ASRResultListener

    fun prepareRecognition(resultListener: ASRResultListener): Boolean {
        ...
        mResultListener = resultListener
        ...
    }

    ...

    override fun onPartialResults(bundle: Bundle?) {
        bundle?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
            Log.d(
                "RecognitionHelper", "onPartialResults() with:$bundle" +
                        " results:$it"
            )

            // The list may be empty, so avoid indexing it directly.
            it.firstOrNull()?.let(mResultListener::onPartialResult)
        }
    }

    override fun onResults(bundle: Bundle?) {
        bundle?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
            Log.d(
                "RecognitionHelper", "onResults() with:$bundle" +
                        " results:$it"
            )

            // The list may be empty, so avoid indexing it directly.
            it.firstOrNull()?.let(mResultListener::onFinalResult)
        }
    }
}

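Next, the Activity implements this interface and shows the data in a TextView. To make the progression of intermediate results visible to the naked eye, each TextView update is scheduled 300 ms later than the previous one.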

class RecognitionActivity : AppCompatActivity(), ASRResultListener {
    private lateinit var binding: RecognitionLayoutBinding
    private val recognitionHelper: RecognitionHelper by lazy {
        RecognitionHelper(this)
    }

    private var updatingTextTimeDelayed = 0L
    private val mainHandler = Handler(Looper.getMainLooper())

    override fun onCreate(savedInstanceState: Bundle?) {
        ...

        if (!recognitionHelper.prepareRecognition(this)) {
            Toast.makeText(this, "Recognition not available", Toast.LENGTH_SHORT).show()

            return
        }

        binding.start.setOnClickListener {
            Log.d("RecognitionHelper", "startRecognition()")

            recognitionHelper.startRecognition()
        }

        binding.stop.setOnClickListener {
            Log.d("RecognitionHelper", "stopRecognition()")

            recognitionHelper.stopRecognition()
        }
    }

    override fun onStop() {
        super.onStop()
        Log.d("RecognitionHelper", "onStop()")

        recognitionHelper.releaseRecognition()
    }

    override fun onPartialResult(result: String) {
        Log.d("RecognitionHelper", "onPartialResult() with result:$result")

        updatingTextTimeDelayed += 300L
        mainHandler.postDelayed(
            {
                Log.d("RecognitionHelper", "onPartialResult() updating")
                binding.recoAsr.text = result
            }, updatingTextTimeDelayed
        )
    }

    override fun onFinalResult(result: String) {
        Log.d("RecognitionHelper", "onFinalResult() with result:$result")

        updatingTextTimeDelayed += 300L
        mainHandler.postDelayed(
            {
                Log.d("RecognitionHelper", "onFinalResult() updating")
                binding.recoAsr.text = result
            }, updatingTextTimeDelayed
        )
    }
}

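When we tap the “START RECOGNITION” button, the mic-recording indicator appears in the top right corner of the phone. After we say “Can you introduce yourself”, the text appears in the TextView step by step, giving a typewriter effect.

Below is the log captured during the process, which also reflects the recognition flow: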

// Initialization
08-15 22:43:13.963  6879  6879 D RecognitionHelper: onCreate()
08-15 22:43:14.037  6879  6879 E RecognitionHelper: audio recording permission granted
08-15 22:43:14.050  6879  6879 D RecognitionHelper: onStart()

// Recognition starts
08-15 22:43:41.491  6879  6879 D RecognitionHelper: startRecognition()
08-15 22:43:41.577  6879  6879 D RecognitionHelper: onReadyForSpeech()
08-15 22:43:41.776  6879  6879 D RecognitionHelper: onRmsChanged() with:-2.0
...
08-15 22:43:46.532  6879  6879 D RecognitionHelper: onRmsChanged() with:-0.31999993

// Beginning of speech detected
08-15 22:43:46.540  6879  6879 D RecognitionHelper: onBeginningOfSpeech()

// 1st recognition result: Can
08-15 22:43:46.541  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can]
08-15 22:43:46.541  6879  6879 D RecognitionHelper: onPartialResult() with result:Can

// 2nd recognition result: Can you
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you

// 3rd recognition result: Can you in
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you in], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you in]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you in

// 4th recognition result: Can you intro
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you intro], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you intro]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you intro

// nth recognition result: Can you introduce yourself
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResults() with:Bundle[{results_recognition=[Can you introduce yourself], android.speech.extra.UNSTABLE_TEXT=[]}] results:[Can you introduce yourself]
08-15 22:43:46.542  6879  6879 D RecognitionHelper: onPartialResult() with result:Can you introduce yourself

// End of speech detected
08-15 22:43:46.543  6879  6879 D RecognitionHelper: onEndOfSpeech()
08-15 22:43:46.543  6879  6879 D RecognitionHelper: onEndOfSpeech()
08-15 22:43:46.545  6879  6879 D RecognitionHelper: onResults() with:Bundle[{results_recognition=[Can you introduce yourself], confidence_scores=[0.0]}] results:[Can you introduce yourself]

// Final result recognized: Can you introduce yourself
08-15 22:43:46.545  6879  6879 D RecognitionHelper: onFinalResult() with result:Can you introduce yourself

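How does the system dispatch requests?

Unlike Text-to-speech, SpeechRecognizer has no dedicated entry in Settings. Its default App is configured together with the VoiceInteraction App.

The following command, however, dumps the system's default recognition service.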

adb shell settings get secure voice_recognition_service

On an emulator, the dump shows that Google's recognition service is installed by default.

com.google.android.tts/com.google.android.apps.speech.tts.googletts.service.GoogleTTSRecognitionService

On a Samsung device, the dump shows the recognition service provided by Samsung.

com.samsung.android.bixby.agent/.mainui.voiceinteraction.RecognitionServiceTrampoline

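Let's explore how the recognition service works, starting from the APIs mentioned in the section on requesting recognition.

Detecting the recognition service

Checking whether a service is available is simple: the framework queries PackageManager with the Recognition-specific Action (android.speech.RecognitionService). If at least one service can handle it, the system is considered to have a recognition service available.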

    public static boolean isRecognitionAvailable(final Context context) {
        final List<ResolveInfo> list = context.getPackageManager().queryIntentServices(
                new Intent(RecognitionService.SERVICE_INTERFACE), 0);
        return list != null && list.size() != 0;
    }

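Initializing the recognition service

As described in the "How to request recognition?" section, initialization is done by calling the static method createSpeechRecognizer(). Internally it checks that the Context is not null and, depending on whether a recognition service component was specified, records the name of the target service.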

    public static SpeechRecognizer createSpeechRecognizer(final Context context) {
        return createSpeechRecognizer(context, null);
    }

    public static SpeechRecognizer createSpeechRecognizer(final Context context,
            final ComponentName serviceComponent) {
        if (context == null) {
            throw new IllegalArgumentException("Context cannot be null");
        }
        checkIsCalledFromMainThread();
        return new SpeechRecognizer(context, serviceComponent);
    }

    private SpeechRecognizer(final Context context, final ComponentName serviceComponent) {
        mContext = context;
        mServiceComponent = serviceComponent;
        mOnDevice = false;
    }

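Calling setRecognitionListener() on the resulting SpeechRecognizer is slightly more involved:

Check that the call comes from the main thread.
Create a dedicated Message, MSG_CHANGE_LISTENER.
If the connection to SpeechRecognitionManagerService, the system service that handles Recognition requests, has not been established yet, put the Message into the pending queue first. Once the connection is created when recognition is started later, the Message is sent to the Handler.
Otherwise, send it straight to the Handler and wait for it to be dispatched.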

    public void setRecognitionListener(RecognitionListener listener) {
        checkIsCalledFromMainThread();
        putMessage(Message.obtain(mHandler, MSG_CHANGE_LISTENER, listener));
    }

    private void putMessage(Message msg) {
        if (mService == null) {
            mPendingTasks.offer(msg);
        } else {
            mHandler.sendMessage(msg);
        }
    }
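
The queue-until-connected pattern used by putMessage() can be modeled in a few lines of plain Java. This is only a simplified sketch: the class PendingTaskQueueDemo and its method names are made up for illustration, strings stand in for Message objects, and a list stands in for the Handler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Simplified model: buffer messages until the service connection exists. */
final class PendingTaskQueueDemo {
    private final Queue<String> pendingTasks = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>(); // stands in for the Handler
    private boolean connected = false;

    /** Mirrors putMessage(): queue while disconnected, deliver directly otherwise. */
    void putMessage(String msg) {
        if (!connected) {
            pendingTasks.offer(msg);
        } else {
            delivered.add(msg);
        }
    }

    /** Mirrors the onSuccess() callback: flush the queue in FIFO order. */
    void onServiceConnected() {
        connected = true;
        while (!pendingTasks.isEmpty()) {
            delivered.add(pendingTasks.poll());
        }
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        PendingTaskQueueDemo recognizer = new PendingTaskQueueDemo();
        recognizer.putMessage("MSG_CHANGE_LISTENER");
        recognizer.putMessage("MSG_START");
        System.out.println(recognizer.delivered()); // still empty: not connected yet
        recognizer.onServiceConnected();
        recognizer.putMessage("MSG_STOP");
        System.out.println(recognizer.delivered());
    }
}
```

Because the queue is flushed in insertion order, a listener set before the first startListening() call is guaranteed to be installed before the start request is handled.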

The Handler then updates the Listener instance through handleChangeListener().

    private Handler mHandler = new Handler(Looper.getMainLooper()) {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                ...
                case MSG_CHANGE_LISTENER:
                    handleChangeListener((RecognitionListener) msg.obj);
                    break;
                ...
            }
        }
    };

    private void handleChangeListener(RecognitionListener listener) {
        if (DBG) Log.d(TAG, "handleChangeListener, listener=" + listener);
        mListener.mInternalListener = listener;
    }

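Starting recognition

startListening() first makes sure the recognition Intent is not null; otherwise it throws an IllegalArgumentException with the message "intent must not be null". It then checks that the calling thread is the main thread; if not, it throws an exception with the message “SpeechRecognizer should be used only from the application’s main thread”.

After that it makes sure the service is ready. If it is not, it calls connectToSystemService() to establish the connection to the recognition service.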

    public void startListening(final Intent recognizerIntent) {
        if (recognizerIntent == null) {
            throw new IllegalArgumentException("intent must not be null");
        }
        checkIsCalledFromMainThread();

        if (mService == null) {
            // First time connection: first establish a connection, then dispatch #startListening.
            connectToSystemService();
        }
        putMessage(Message.obtain(mHandler, MSG_START, recognizerIntent));
    }

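The first step of connectToSystemService() is to call getSpeechRecognizerComponentName() to obtain the component name of the recognition service. It comes either from what the requesting App specified, or from the current recognition service stored in SettingsProvider under VOICE_RECOGNITION_SERVICE, which in practice matches the VoiceInteraction App. If no component name is found, the method returns.

If the component name does exist, a request to create a Session is sent through IRecognitionServiceManager.aidl to SpeechRecognitionManagerService, the system service in SystemServer that manages speech recognition.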

    /** Establishes a connection to system server proxy and initializes the session. */
    private void connectToSystemService() {
        if (!maybeInitializeManagerService()) {
            return;
        }

        ComponentName componentName = getSpeechRecognizerComponentName();

        if (!mOnDevice && componentName == null) {
            mListener.onError(ERROR_CLIENT);
            return;
        }

        try {
            mManagerService.createSession(
                    componentName,
                    mClientToken,
                    mOnDevice,
                    new IRecognitionServiceManagerCallback.Stub(){
                        @Override
                        public void onSuccess(IRecognitionService service) throws RemoteException {
                            mService = service;
                            while (!mPendingTasks.isEmpty()) {
                                mHandler.sendMessage(mPendingTasks.poll());
                            }
                        }

                        @Override
                        public void onError(int errorCode) throws RemoteException {
                            mListener.onError(errorCode);
                        }
                    });
        } catch (RemoteException e) {
            e.rethrowFromSystemServer();
        }
    }

SpeechRecognitionManagerService handles the request by delegating to SpeechRecognitionManagerServiceImpl.

    // SpeechRecognitionManagerService.java
    final class SpeechRecognitionManagerServiceStub extends IRecognitionServiceManager.Stub {
        @Override
        public void createSession(
                ComponentName componentName,
                IBinder clientToken,
                boolean onDevice,
                IRecognitionServiceManagerCallback callback) {
            int userId = UserHandle.getCallingUserId();
            synchronized (mLock) {
                SpeechRecognitionManagerServiceImpl service = getServiceForUserLocked(userId);
                service.createSessionLocked(componentName, clientToken, onDevice, callback);
            }
        }
        ...
    }

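SpeechRecognitionManagerServiceImpl in turn hands the work to the RemoteSpeechRecognitionService class, which binds to the App's recognition service. As the code shows, RemoteSpeechRecognitionService is responsible for communicating with the recognition service.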

    // SpeechRecognitionManagerServiceImpl.java
    void createSessionLocked( ... ) {
        ...
        RemoteSpeechRecognitionService service = createService(creatorCallingUid, serviceComponent);
        ...
        service.connect().thenAccept(binderService -> {
            if (binderService != null) {
                try {
                    callback.onSuccess(new IRecognitionService.Stub() {
                        @Override
                        public void startListening( ... )
                                        throws RemoteException {
                            ...
                            service.startListening(recognizerIntent, listener, attributionSource);
                        }
                        ...
                    });
                } catch (RemoteException e) {
                    tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
                }
            } else {
                tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
            }
        });
    }

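Once the connection to the recognition service App has been established, or if it already exists, the MSG_START Message is sent and the main Handler continues in handleStartListening(). It first checks again that mService exists, to avoid an NPE.

It then sends the start-listening request to the AIDL proxy object.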

    private Handler mHandler = new Handler(Looper.getMainLooper()) {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case MSG_START:
                    handleStartListening((Intent) msg.obj);
                    break;
                ...
            }
        }
    };

    private void handleStartListening(Intent recognizerIntent) {
        if (!checkOpenConnection()) {
            return;
        }
        try {
            mService.startListening(recognizerIntent, mListener, mContext.getAttributionSource());
        }
        ...
    }

This AIDL interface is defined in the following file:

// android/speech/IRecognitionService.aidl
oneway interface IRecognitionService {
    void startListening(in Intent recognizerIntent, in IRecognitionListener listener,
            in AttributionSource attributionSource);

    void stopListening(in IRecognitionListener listener);

    void cancel(in IRecognitionListener listener, boolean isShutdown);
    ...
}

The AIDL interface is implemented in the system's recognition management class, SpeechRecognitionManagerServiceImpl:

    // com/android/server/speech/SpeechRecognitionManagerServiceImpl.java
    void createSessionLocked( ... ) {
        ...
        service.connect().thenAccept(binderService -> {
            if (binderService != null) {
                try {
                    callback.onSuccess(new IRecognitionService.Stub() {
                        @Override
                        public void startListening( ...) {
                            attributionSource.enforceCallingUid();
                            if (!attributionSource.isTrusted(mMaster.getContext())) {
                                attributionSource = mMaster.getContext()
                                        .getSystemService(PermissionManager.class)
                                        .registerAttributionSource(attributionSource);
                            }
                            service.startListening(recognizerIntent, listener, attributionSource);
                        }
                        ...
                    });
                } ...
            } else {
                tryRespondWithError(callback, SpeechRecognizer.ERROR_CLIENT);
            }
        });
    }

After that, the call is relayed through one more layer, RemoteSpeechRecognitionService:

    // com/android/server/speech/RemoteSpeechRecognitionService.java
    void startListening(Intent recognizerIntent, IRecognitionListener listener,
            @NonNull AttributionSource attributionSource) {
        ...
        synchronized (mLock) {
            if (mSessionInProgress) {
                tryRespondWithError(listener, SpeechRecognizer.ERROR_RECOGNIZER_BUSY);
                return;
            }

            mSessionInProgress = true;
            mRecordingInProgress = true;

            mListener = listener;
            mDelegatingListener = new DelegatingListener(listener, () -> {
                synchronized (mLock) {
                    resetStateLocked();
                }
            });

            final DelegatingListener listenerToStart = this.mDelegatingListener;
            run(service ->
                    service.startListening(
                            recognizerIntent,
                            listenerToStart,
                            attributionSource));
        }
    }
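
The single-session guard in this method can be illustrated with a small plain-Java model. This is a sketch under simplifying assumptions, not the framework code: the class SessionGuardDemo and its methods are hypothetical, the return value 0 is an invented "accepted" marker, and 8 is the value of SpeechRecognizer.ERROR_RECOGNIZER_BUSY.

```java
/** Simplified model of the one-session-at-a-time guard. */
final class SessionGuardDemo {
    static final int ACCEPTED = 0;              // invented marker for this sketch
    static final int ERROR_RECOGNIZER_BUSY = 8; // value of SpeechRecognizer.ERROR_RECOGNIZER_BUSY

    private final Object lock = new Object();
    private boolean sessionInProgress = false;

    /** Accepts the request only when no other session is running. */
    int startListening() {
        synchronized (lock) {
            if (sessionInProgress) {
                return ERROR_RECOGNIZER_BUSY;
            }
            sessionInProgress = true;
            return ACCEPTED;
        }
    }

    /** Models the state reset that runs once the session has ended. */
    void onSessionFinished() {
        synchronized (lock) {
            sessionInProgress = false;
        }
    }

    public static void main(String[] args) {
        SessionGuardDemo remote = new SessionGuardDemo();
        System.out.println(remote.startListening()); // first request is accepted
        System.out.println(remote.startListening()); // second request is rejected as busy
        remote.onSessionFinished();
        System.out.println(remote.startListening()); // accepted again after the reset
    }
}
```

For an App this means that calling startListening() again while a session is still running on the same connection results in an ERROR_RECOGNIZER_BUSY callback rather than a second, parallel session.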

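Finally, the call reaches the concrete service implementation, which naturally lives in RecognitionService. There, the Binder thread sends a MSG_START_LISTENING Message to the main thread: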

    /** Binder of the recognition service */
    private static final class RecognitionServiceBinder extends IRecognitionService.Stub {
        ...
        @Override
        public void startListening(Intent recognizerIntent, IRecognitionListener listener,
                @NonNull AttributionSource attributionSource) {
            final RecognitionService service = mServiceRef.get();
            if (service != null) {
                service.mHandler.sendMessage(Message.obtain(service.mHandler,
                        MSG_START_LISTENING, service.new StartListeningArgs(
                                recognizerIntent, listener, attributionSource)));
            }
        }
        ...
    }

    private final Handler mHandler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case MSG_START_LISTENING:
                    StartListeningArgs args = (StartListeningArgs) msg.obj;
                    dispatchStartListening(args.mIntent, args.mListener, args.mAttributionSource);
                    break;
                ...
            }
        }
    };

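When the Handler receives the Message, it likewise hands the actual work to dispatchStartListening(). The most important part is checking whether the recognition Intent supplies an active audio source through EXTRA_AUDIO_SOURCE, or whether the requesting App holds the RECORD_AUDIO permission.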

    private void dispatchStartListening(Intent intent, final IRecognitionListener listener,
            @NonNull AttributionSource attributionSource) {
        try {
            if (mCurrentCallback == null) {
                boolean preflightPermissionCheckPassed =
                        intent.hasExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE)
                        || checkPermissionForPreflightNotHardDenied(attributionSource);
                if (preflightPermissionCheckPassed) {
                    mCurrentCallback = new Callback(listener, attributionSource);
                    RecognitionService.this.onStartListening(intent, mCurrentCallback);
                }

                if (!preflightPermissionCheckPassed || !checkPermissionAndStartDataDelivery()) {
                    listener.onError(SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS);
                    if (preflightPermissionCheckPassed) {
                        // If we attempted to start listening, cancel the callback
                        RecognitionService.this.onCancel(mCurrentCallback);
                        dispatchClearCallback();
                    }
                }
                ...
            }
        } catch (RemoteException e) {
            Log.d(TAG, "onError call from startListening failed");
        }
    }

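If either condition is met, the service's onStartListening() implementation is called to start recognition. The concrete logic is up to each service, which ultimately uses Callback to return recognition states and results. These correspond to the RecognitionListener callbacks described in the "How to request recognition?" section.

protected abstract void onStartListening(Intent recognizerIntent, Callback listener);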
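
The branching in dispatchStartListening() is easier to follow when reduced to a pure function. The sketch below is a plain-Java model: the class StartListeningDecision, its boolean inputs, and the event strings are hypothetical stand-ins for the real checks and callbacks, and it only mirrors the control flow of the excerpt shown above.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the permission branching when a start request is dispatched. */
final class StartListeningDecision {
    /**
     * @param hasAudioSourceExtra  the Intent carries EXTRA_AUDIO_SOURCE
     * @param notHardDenied        the preflight permission check did not hard-deny
     * @param dataDeliveryStarted  the second permission check (data delivery) passed
     * @return the calls that would be made, in order
     */
    static List<String> dispatch(boolean hasAudioSourceExtra, boolean notHardDenied,
            boolean dataDeliveryStarted) {
        List<String> events = new ArrayList<>();
        boolean preflightPassed = hasAudioSourceExtra || notHardDenied;
        if (preflightPassed) {
            events.add("onStartListening");
        }
        if (!preflightPassed || !dataDeliveryStarted) {
            events.add("onError(ERROR_INSUFFICIENT_PERMISSIONS)");
            if (preflightPassed) {
                // Listening was already attempted, so it has to be cancelled again.
                events.add("onCancel");
            }
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(dispatch(false, true, true));   // normal case
        System.out.println(dispatch(false, false, false)); // permission hard-denied
        System.out.println(dispatch(true, false, false));  // started, then cancelled
    }
}
```

The third case shows why a service implementation should be prepared for onCancel() to arrive right after onStartListening().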

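Stopping recognition & cancelling the service

The call chains for stopping recognition with stopListening() and cancelling the service with cancel() are largely the same as for starting recognition. They end at the onStopListening() and onCancel() callbacks of RecognitionService respectively.

The only difference is that stop merely halts recognition for the time being and keeps the connection to the recognition App, whereas cancel tears down the connection and resets the related state.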

    void cancel(IRecognitionListener listener, boolean isShutdown) {
        ...
        synchronized (mLock) {
            ...
            mRecordingInProgress = false;
            mSessionInProgress = false;

            mDelegatingListener = null;
            mListener = null;

            // Schedule to unbind after cancel is delivered.
            if (isShutdown) {
                run(service -> unbind());
            }
        }
    }

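Conclusion

Finally, let's use a diagram to review the overall flow of the SpeechRecognizer mechanism:

Figure: SpeechRecognition block diagram

An App that needs speech recognition sends a request through SpeechRecognizer.
When recognition starts, SpeechRecognizer resolves the default Recognition service from SettingsProvider, unless the App specified one, and notifies the SpeechRecognitionManagerService system service in SystemServer through IRecognitionServiceManager.aidl.
SpeechRecognitionManagerService does not perform the binding itself; it delegates scheduling to SpeechRecognitionManagerServiceImpl.
SpeechRecognitionManagerServiceImpl in turn hands binding and management to RemoteSpeechRecognitionService.
RemoteSpeechRecognitionService talks to the concrete recognition service, RecognitionService, through IRecognitionService.aidl.
RecognitionService switches to the main thread through a Handler, calls the recognition engine to process the request, and returns recognition states and results through its inner Callback class.
After that, RecognitionService passes the results to SystemServer through IRecognitionListener.aidl, and from there they reach the App that issued the request.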
