Android Rescue Mode (RescueParty): How It Works

Introduction


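Every Android device that ships today, whether a phone, a tablet, or an in-car head unit, has gone through many rounds of testing.

So its software quality has at least a baseline guarantee.

In practice, though, once a phone is on sale, no amount of testing can reproduce everything users will do with it.

Buyers include non-technical users, and when a phone misbehaves, their only recourse is the manufacturer's after-sales service.

Yet after-sales service today generally refuses warranty coverage for rooted devices and for devices flashed with unofficial software builds.

Android takes this into account and adds a Rescue Mode feature.

When things go seriously wrong, it can offer the user the option of a factory reset.

That mechanism is what this article analyzes.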


Rescue levels

The system defines a separate rescue level for each degree of severity:

    @VisibleForTesting
    static final int LEVEL_NONE = 0;
    @VisibleForTesting
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
    @VisibleForTesting
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
    @VisibleForTesting
    static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
    @VisibleForTesting
    static final int LEVEL_FACTORY_RESET = 4;

As the level rises from 0 to 4, the severity increases. Level 4 is a factory reset.
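As an illustration only, the sketch below models the five levels as a Java enum. The names RescueLevel, fromCode and isMoreSevereThan are hypothetical and do not come from AOSP. Next to each level it records the action RescueParty takes for it, as shown in executeRescueLevelInternal later in this article.

```java
import java.util.Arrays;

/**
 * Hypothetical enum (not AOSP code): the RescueParty levels in escalation
 * order, each paired with the action executeRescueLevelInternal takes for it.
 */
enum RescueLevel {
    NONE(0, "no action"),
    RESET_SETTINGS_UNTRUSTED_DEFAULTS(1, "reset settings with RESET_MODE_UNTRUSTED_DEFAULTS"),
    RESET_SETTINGS_UNTRUSTED_CHANGES(2, "reset settings with RESET_MODE_UNTRUSTED_CHANGES"),
    RESET_SETTINGS_TRUSTED_DEFAULTS(3, "reset settings with RESET_MODE_TRUSTED_DEFAULTS"),
    FACTORY_RESET(4, "reboot to recovery and prompt to wipe user data");

    final int code;
    final String action;

    RescueLevel(int code, String action) {
        this.code = code;
        this.action = action;
    }

    /** Maps the integer stored in the rescue-level property back to a level. */
    static RescueLevel fromCode(int code) {
        return Arrays.stream(values())
                .filter(level -> level.code == code)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown level " + code));
    }

    /** A higher code means a more disruptive mitigation. */
    boolean isMoreSevereThan(RescueLevel other) {
        return code > other.code;
    }
}
```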


App-level rescue implementation

The flowchart is as follows:
[Figure: flowchart of the app-level rescue process (image not available)]

Let's walk through the implementation in detail:
PWD:frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

    /**
     * Handle application death from an uncaught exception.  The framework
     * catches these for the main threads, so this should only matter for
     * threads created by applications. Before this method runs, the given
     * instance of {@link LoggingHandler} should already have logged details
     * (and if not it is run first).
     */
    private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
        private final LoggingHandler mLoggingHandler;

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            try {
                ensureLogging(t, e);

                // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
                if (mCrashing) return;
                mCrashing = true;

                // Try to end profiling. If a profiler is running at this point, and we kill the
                // process (below), the in-memory buffer will be lost. So try to stop, which will
                // flush the buffer. (This makes method trace profiling useful to debug crashes.)
                if (ActivityThread.currentActivityThread() != null) {
                    ActivityThread.currentActivityThread().stopProfiling();
                }

                // Bring up crash dialog, wait for it to be dismissed
                ActivityManager.getService().handleApplicationCrash(
                        mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
            } catch (Throwable t2) {
                if (t2 instanceof DeadObjectException) {
                    // System process is dead; ignore
                } else {
                    try {
                        Clog_e(TAG, "Error reporting crash", t2);
                    } catch (Throwable t3) {
                        // Even Clog_e() fails!  Oh well.
                    }
                }
            } finally {
                // Try everything to make sure this process goes away.
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
        }

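KillApplicationHandler is an inner class of RuntimeInit. Only one of its methods, uncaughtException, is excerpted here.
When an app thread throws an uncaught exception, this method handles it and then kills the process.
Along the way it calls ActivityManager.getService().handleApplicationCrash, which hands the crash to system_server for further processing.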
PWD:frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

    /**
     * Used by {@link com.android.internal.os.RuntimeInit} to report when an application crashes.
     * The application process will exit immediately after this call returns.
     * @param app object of the crashing app, null for the system server
     * @param crashInfo describing the exception
     */
    public void handleApplicationCrash(IBinder app,
            ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
        ProcessRecord r = findAppProcess(app, "Crash");
        final String processName = app == null ? "system_server"
                : (r == null ? "unknown" : r.processName);

        handleApplicationCrashInner("crash", r, processName, crashInfo);
    }

This comment is worth noting, because it confirms who the caller is:

Used by {@link com.android.internal.os.RuntimeInit} to report when an application crashes.

handleApplicationCrash then passes the crashing process name and the crash info on to handleApplicationCrashInner.
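The process-name logic is small enough to restate directly. CrashProcessNames.resolve is a hypothetical helper, not an AOSP method. It takes a plain Object in place of the IBinder, and the record's process name (or null) in place of the ProcessRecord, so it runs without Android classes.

```java
/**
 * Hypothetical sketch of how handleApplicationCrash chooses the reported name:
 * a null app token means system_server crashed; otherwise the name comes from
 * the process record, or "unknown" if no record was found.
 */
final class CrashProcessNames {
    private CrashProcessNames() {}

    /**
     * @param appToken   stands in for the IBinder of the crashing app (null for system_server)
     * @param recordName stands in for r.processName, or null when findAppProcess found no record
     */
    static String resolve(Object appToken, String recordName) {
        if (appToken == null) {
            return "system_server";
        }
        return recordName == null ? "unknown" : recordName;
    }
}
```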

    /* Native crash reporting uses this inner version because it needs to be somewhat
     * decoupled from the AM-managed cleanup lifecycle
     */
    void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
            ApplicationErrorReport.CrashInfo crashInfo) {
        EventLogTags.writeAmCrash(Binder.getCallingPid(),
                UserHandle.getUserId(Binder.getCallingUid()), processName,
                r == null ? -1 : r.info.flags,
                crashInfo.exceptionClassName,
                crashInfo.exceptionMessage,
                crashInfo.throwFileName,
                crashInfo.throwLineNumber);

        FrameworkStatsLog.write(FrameworkStatsLog.APP_CRASH_OCCURRED,
                Binder.getCallingUid(),
                eventType,
                processName,
                Binder.getCallingPid(),
                (r != null && r.info != null) ? r.info.packageName : "",
                (r != null && r.info != null) ? (r.info.isInstantApp()
                        ? FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__TRUE
                        : FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__FALSE)
                        : FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__UNAVAILABLE,
                r != null ? (r.isInterestingToUserLocked()
                        ? FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__FOREGROUND
                        : FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__BACKGROUND)
                        : FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__UNKNOWN,
                processName.equals("system_server") ? ServerProtoEnums.SYSTEM_SERVER
                        : (r != null) ? r.getProcessClassEnum()
                                      : ServerProtoEnums.ERROR_SOURCE_UNKNOWN
        );

        final int relaunchReason = r == null ? RELAUNCH_REASON_NONE
                        : r.getWindowProcessController().computeRelaunchReason();
        final String relaunchReasonString = relaunchReasonToString(relaunchReason);
        if (crashInfo.crashTag == null) {
            crashInfo.crashTag = relaunchReasonString;
        } else {
            crashInfo.crashTag = crashInfo.crashTag + " " + relaunchReasonString;
        }

        addErrorToDropBox(
                eventType, r, processName, null, null, null, null, null, null, crashInfo);

        mAppErrors.crashApplication(r, crashInfo);
    }

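Readers familiar with the Android logging system will recognize addErrorToDropBox as a very important error-handling function: it records the error in DropBox.
We will cover it in a dedicated follow-up article on logging.
What matters here is mAppErrors.crashApplication(r, crashInfo), implemented in AppErrors:
PWD:frameworks/base/services/core/java/com/android/server/am/AppErrors.java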

    /**
     * Bring up the "unexpected error" dialog box for a crashing app.
     * Deal with edge cases (intercepts from instrumented applications,
     * ActivityController, error intent receivers, that sort of thing).
     * @param r the application crashing
     * @param crashInfo describing the failure
     */
    void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
        final int callingPid = Binder.getCallingPid();
        final int callingUid = Binder.getCallingUid();

        final long origId = Binder.clearCallingIdentity();
        try {
            crashApplicationInner(r, crashInfo, callingPid, callingUid);
        } finally {
            Binder.restoreCallingIdentity(origId);
        }
    }

Now look at the implementation of crashApplicationInner:

    void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
            int callingPid, int callingUid) {
        long timeMillis = System.currentTimeMillis();
        String shortMsg = crashInfo.exceptionClassName;
        String longMsg = crashInfo.exceptionMessage;
        String stackTrace = crashInfo.stackTrace;
        if (shortMsg != null && longMsg != null) {
            longMsg = shortMsg + ": " + longMsg;
        } else if (shortMsg != null) {
            longMsg = shortMsg;
        }

        if (r != null) {
            mPackageWatchdog.onPackageFailure(r.getPackageListWithVersionCode(),
                    PackageWatchdog.FAILURE_REASON_APP_CRASH);

            mService.mProcessList.noteAppKill(r, (crashInfo != null
                      && "Native crash".equals(crashInfo.exceptionClassName))
                      ? ApplicationExitInfo.REASON_CRASH_NATIVE
                      : ApplicationExitInfo.REASON_CRASH,
                      ApplicationExitInfo.SUBREASON_UNKNOWN,
                    "crash");
        }

        final int relaunchReason = r != null
                ? r.getWindowProcessController().computeRelaunchReason() : RELAUNCH_REASON_NONE;

        AppErrorResult result = new AppErrorResult();
        int taskId;
        synchronized (mService) {
            /**
             * If crash is handled by instance of {@link android.app.IActivityController},
             * finish now and don't show the app error dialog.
             */
            if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
                    timeMillis, callingPid, callingUid)) {
                return;
            }

            // Suppress crash dialog if the process is being relaunched due to a crash during a free
            // resize.
            if (relaunchReason == RELAUNCH_REASON_FREE_RESIZE) {
                return;
            }

            /**
             * If this process was running instrumentation, finish now - it will be handled in
             * {@link ActivityManagerService#handleAppDiedLocked}.
             */
            if (r != null && r.getActiveInstrumentation() != null) {
                return;
            }

            // Log crash in battery stats.
            if (r != null) {
                mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
            }

            AppErrorDialog.Data data = new AppErrorDialog.Data();
            data.result = result;
            data.proc = r;

            // If we can't identify the process or it's already exceeded its crash quota,
            // quit right away without showing a crash dialog.
            if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
                return;
            }

            final Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

            taskId = data.taskId;
            msg.obj = data;
            mService.mUiHandler.sendMessage(msg);
        }

        int res = result.get();

        Intent appErrorIntent = null;
        MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_CRASH, res);
        if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {
            res = AppErrorDialog.FORCE_QUIT;
        }
        synchronized (mService) {
            if (res == AppErrorDialog.MUTE) {
                stopReportingCrashesLocked(r);
            }
            if (res == AppErrorDialog.RESTART) {
                mService.mProcessList.removeProcessLocked(r, false, true,
                        ApplicationExitInfo.REASON_CRASH, "crash");
                if (taskId != INVALID_TASK_ID) {
                    try {
                        mService.startActivityFromRecents(taskId,
                                ActivityOptions.makeBasic().toBundle());
                    } catch (IllegalArgumentException e) {
                        // Hmm...that didn't work. Task should either be in recents or associated
                        // with a stack.
                        Slog.e(TAG, "Could not restart taskId=" + taskId, e);
                    }
                }
            }
            if (res == AppErrorDialog.FORCE_QUIT) {
                long orig = Binder.clearCallingIdentity();
                try {
                    // Kill it with fire!
                    mService.mAtmInternal.onHandleAppCrash(r.getWindowProcessController());
                    if (!r.isPersistent()) {
                        mService.mProcessList.removeProcessLocked(r, false, false,
                                ApplicationExitInfo.REASON_CRASH, "crash");
                        mService.mAtmInternal.resumeTopActivities(false /* scheduleIdle */);
                    }
                } finally {
                    Binder.restoreCallingIdentity(orig);
                }
            }
            if (res == AppErrorDialog.APP_INFO) {
                appErrorIntent = new Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS);
                appErrorIntent.setData(Uri.parse("package:" + r.info.packageName));
                appErrorIntent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
            }
            if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {
                appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);
            }
            if (r != null && !r.isolated && res != AppErrorDialog.RESTART) {
                // XXX Can't keep track of crash time for isolated processes,
                // since they don't have a persistent identity.
                mProcessCrashTimes.put(r.info.processName, r.uid,
                        SystemClock.uptimeMillis());
            }
        }

        if (appErrorIntent != null) {
            try {
                mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));
            } catch (ActivityNotFoundException e) {
                Slog.w(TAG, "bug report receiver dissappeared", e);
            }
        }
    }
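Besides notifying PackageWatchdog, crashApplicationInner does two smaller things. It builds a long message from the exception class and message, and it treats a crash dialog that timed out or was cancelled as FORCE_QUIT. Below is a hypothetical, Android-free sketch of those two steps. The constant values are placeholders, not AppErrorDialog's real ones.

```java
/**
 * Hypothetical sketch of two steps in crashApplicationInner: building the
 * long message and normalizing the crash-dialog result.
 * The result constants are placeholders, not AppErrorDialog's values.
 */
final class CrashDialogLogic {
    static final int FORCE_QUIT = 1;
    static final int FORCE_QUIT_AND_REPORT = 2;
    static final int RESTART = 3;
    static final int MUTE = 4;
    static final int TIMEOUT = 5;
    static final int CANCEL = 6;
    static final int APP_INFO = 7;

    private CrashDialogLogic() {}

    /** "ExceptionClass: message" when both are known, otherwise whichever one is available. */
    static String longMessage(String exceptionClassName, String exceptionMessage) {
        if (exceptionClassName != null && exceptionMessage != null) {
            return exceptionClassName + ": " + exceptionMessage;
        } else if (exceptionClassName != null) {
            return exceptionClassName;
        }
        return exceptionMessage;
    }

    /** A dialog that timed out or was cancelled is handled like FORCE_QUIT. */
    static int normalizeResult(int res) {
        return (res == TIMEOUT || res == CANCEL) ? FORCE_QUIT : res;
    }
}
```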

When a crash occurs and the process record is known (r != null), mPackageWatchdog.onPackageFailure is called:

            mPackageWatchdog.onPackageFailure(r.getPackageListWithVersionCode(),
                    PackageWatchdog.FAILURE_REASON_APP_CRASH);

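onPackageFailure is implemented as follows:
PWD:frameworks/base/services/core/java/com/android/server/PackageWatchdog.java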

    /**
     * Called when a process fails due to a crash, ANR or explicit health check.
     *
     * <p>For each package contained in the process, one registered observer with the least user
     * impact will be notified for mitigation.
     *
     * <p>This method could be called frequently if there is a severe problem on the device.
     */
    public void onPackageFailure(List<VersionedPackage> packages,
            @FailureReasons int failureReason) {
        if (packages == null) {
            Slog.w(TAG, "Could not resolve a list of failing packages");
            return;
        }
        mLongTaskHandler.post(() -> {
            synchronized (mLock) {
                if (mAllObservers.isEmpty()) {
                    return;
                }
                boolean requiresImmediateAction = (failureReason == FAILURE_REASON_NATIVE_CRASH
                        || failureReason == FAILURE_REASON_EXPLICIT_HEALTH_CHECK);
                if (requiresImmediateAction) {
                    handleFailureImmediately(packages, failureReason);
                } else {
                    for (int pIndex = 0; pIndex < packages.size(); pIndex++) {
                        VersionedPackage versionedPackage = packages.get(pIndex);
                        // Observer that will receive failure for versionedPackage
                        PackageHealthObserver currentObserverToNotify = null;
                        int currentObserverImpact = Integer.MAX_VALUE;

                        // Find observer with least user impact
                        for (int oIndex = 0; oIndex < mAllObservers.size(); oIndex++) {
                            ObserverInternal observer = mAllObservers.valueAt(oIndex);
                            PackageHealthObserver registeredObserver = observer.registeredObserver;
                            if (registeredObserver != null
                                    && observer.onPackageFailureLocked(
                                    versionedPackage.getPackageName())) {
                                int impact = registeredObserver.onHealthCheckFailed(
                                        versionedPackage, failureReason);
                                if (impact != PackageHealthObserverImpact.USER_IMPACT_NONE
                                        && impact < currentObserverImpact) {
                                    currentObserverToNotify = registeredObserver;
                                    currentObserverImpact = impact;
                                }
                            }
                        }

                        // Execute action with least user impact
                        if (currentObserverToNotify != null) {
                            currentObserverToNotify.execute(versionedPackage, failureReason);
                        }
                    }
                }
            }
        });
    }

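When the failure reason is FAILURE_REASON_NATIVE_CRASH or FAILURE_REASON_EXPLICIT_HEALTH_CHECK, handleFailureImmediately is called, and in practice these failures are handled by the rollback mechanism. For every other reason, each failing package is reported to the registered observers. Here we focus on this non-rollback path: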

                    for (int pIndex = 0; pIndex < packages.size(); pIndex++) {
                        VersionedPackage versionedPackage = packages.get(pIndex);
                        // Observer that will receive failure for versionedPackage
                        PackageHealthObserver currentObserverToNotify = null;
                        int currentObserverImpact = Integer.MAX_VALUE;

                        // Find observer with least user impact
                        for (int oIndex = 0; oIndex < mAllObservers.size(); oIndex++) {
                            ObserverInternal observer = mAllObservers.valueAt(oIndex);
                            PackageHealthObserver registeredObserver = observer.registeredObserver;
                            if (registeredObserver != null
                                    && observer.onPackageFailureLocked(
                                    versionedPackage.getPackageName())) {
                                int impact = registeredObserver.onHealthCheckFailed(
                                        versionedPackage, failureReason);
                                if (impact != PackageHealthObserverImpact.USER_IMPACT_NONE
                                        && impact < currentObserverImpact) {
                                    currentObserverToNotify = registeredObserver;
                                    currentObserverImpact = impact;
                                }
                            }
                        }

                        // Execute action with least user impact
                        if (currentObserverToNotify != null) {
                            currentObserverToNotify.execute(versionedPackage, failureReason);
                        }
                    }

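For each failing package, the loop walks over all registered PackageHealthObservers. observer.onPackageFailureLocked records the failure and returns true only once the package has failed often enough within the monitoring window. Each such observer then reports, through onHealthCheckFailed, the user impact its mitigation would have. Observers reporting USER_IMPACT_NONE are skipped, and the one with the smallest impact is chosen and its execute method is called: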

    // Execute action with least user impact
    if (currentObserverToNotify != null) {
        currentObserverToNotify.execute(versionedPackage, failureReason);
    }
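The selection rule in the loop above can be restated without Android classes. This sketch makes three simplifications:

- A minimal hypothetical Observer interface stands in for PackageHealthObserver.
- 0 stands for USER_IMPACT_NONE.
- The onPackageFailureLocked threshold check is omitted, so every observer is consulted.

```java
import java.util.List;

/**
 * Hypothetical sketch of PackageWatchdog's least-impact rule: ask each
 * observer for its user impact, skip "none", execute only the smallest.
 */
final class LeastImpactDispatcher {
    static final int USER_IMPACT_NONE = 0;

    /** Minimal stand-in for PackageHealthObserver. */
    interface Observer {
        int onHealthCheckFailed(String packageName);
        void execute(String packageName);
    }

    /** Test double: always reports the same impact and counts executions. */
    static final class FixedImpactObserver implements Observer {
        final int impact;
        int executions;

        FixedImpactObserver(int impact) {
            this.impact = impact;
        }

        @Override public int onHealthCheckFailed(String packageName) { return impact; }
        @Override public void execute(String packageName) { executions++; }
    }

    private LeastImpactDispatcher() {}

    /** Returns the observer that was executed, or null if none volunteered. */
    static Observer dispatch(String packageName, List<? extends Observer> observers) {
        Observer chosen = null;
        int chosenImpact = Integer.MAX_VALUE;
        for (Observer o : observers) {
            int impact = o.onHealthCheckFailed(packageName);
            if (impact != USER_IMPACT_NONE && impact < chosenImpact) {
                chosen = o;
                chosenImpact = impact;
            }
        }
        if (chosen != null) {
            chosen.execute(packageName);
        }
        return chosen;
    }
}
```

Picking the smallest non-zero impact lets the least disruptive volunteer act when several observers could handle the same failure.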

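The Rescue Mode implementation, RescueParty, contains a nested class, RescuePartyObserver, that implements PackageHealthObserver.
PWD:frameworks/base/services/core/java/com/android/server/RescueParty.java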

    /**
     * Handle mitigation action for package failures. This observer will be register to Package
     * Watchdog and will receive calls about package failures. This observer is persistent so it
     * may choose to mitigate failures for packages it has not explicitly asked to observe.
     */
    public static class RescuePartyObserver implements PackageHealthObserver {

        @Override
        public boolean execute(@Nullable VersionedPackage failedPackage,
                @FailureReasons int failureReason) {
            if (isDisabled()) {
                return false;
            }
            if (failureReason == PackageWatchdog.FAILURE_REASON_APP_CRASH
                    || failureReason == PackageWatchdog.FAILURE_REASON_APP_NOT_RESPONDING) {
                int triggerUid = getPackageUid(mContext, failedPackage.getPackageName());
                incrementRescueLevel(triggerUid);
                executeRescueLevel(mContext,
                        failedPackage == null ? null : failedPackage.getPackageName());
                return true;
            } else {
                return false;
            }
        }
    }
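The decision in RescuePartyObserver.execute reduces to a filter on the failure reason. Below is a minimal sketch with hypothetical names and placeholder reason constants (not PackageWatchdog's real values). A counter stands in for the incrementRescueLevel plus executeRescueLevel pair. In the quoted source, failedPackage.getPackageName() is evaluated before the later failedPackage == null check; the sketch avoids that by never dereferencing the package.

```java
/**
 * Hypothetical sketch of RescuePartyObserver.execute: only app crashes and
 * ANRs escalate; every other reason is left to other observers.
 * Reason constants are placeholders, not PackageWatchdog's values.
 */
final class RescueObserverSketch {
    static final int FAILURE_REASON_NATIVE_CRASH = 1;
    static final int FAILURE_REASON_EXPLICIT_HEALTH_CHECK = 2;
    static final int FAILURE_REASON_APP_CRASH = 3;
    static final int FAILURE_REASON_APP_NOT_RESPONDING = 4;

    private final boolean disabled;
    int escalations;

    RescueObserverSketch(boolean disabled) {
        this.disabled = disabled;
    }

    /** Returns true when this observer handled the failure. */
    boolean execute(String failedPackage, int failureReason) {
        if (disabled) {
            return false;
        }
        if (failureReason == FAILURE_REASON_APP_CRASH
                || failureReason == FAILURE_REASON_APP_NOT_RESPONDING) {
            escalations++; // stands in for incrementRescueLevel + executeRescueLevel
            return true;
        }
        return false;
    }
}
```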

incrementRescueLevel raises the rescue level;
executeRescueLevel carries out the rescue action for the current level.

    /**
     * Escalate to the next rescue level. After incrementing the level you'll
     * probably want to call {@link #executeRescueLevel(Context, String)}.
     */
    private static void incrementRescueLevel(int triggerUid) {
        final int level = getNextRescueLevel();
        SystemProperties.set(PROP_RESCUE_LEVEL, Integer.toString(level));

        EventLogTags.writeRescueLevel(level, triggerUid);
        logCriticalInfo(Log.WARN, "Incremented rescue level to "
                + levelToString(level) + " triggered by UID " + triggerUid);
    }

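incrementRescueLevel calls getNextRescueLevel to compute the new level, stores it in the system property PROP_RESCUE_LEVEL, and logs which UID triggered the escalation: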

    /**
     * Get the next rescue level. This indicates the next level of mitigation that may be taken.
     */
    private static int getNextRescueLevel() {
        return MathUtils.constrain(SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE) + 1,
                LEVEL_NONE, LEVEL_FACTORY_RESET);
    }

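The logic is simple: read the current level from the property, add 1, and clamp the result to the range LEVEL_NONE..LEVEL_FACTORY_RESET, so the level never goes past factory reset.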
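Here is a runnable sketch of this read, increment, clamp and store cycle. It assumes a HashMap in place of SystemProperties, and the property key is a placeholder. The constrain helper performs the same clamping as MathUtils.constrain.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of incrementRescueLevel/getNextRescueLevel.
 * A HashMap stands in for SystemProperties; the key is a placeholder.
 */
final class RescueLevelCounter {
    static final int LEVEL_NONE = 0;
    static final int LEVEL_FACTORY_RESET = 4;
    static final String PROP_RESCUE_LEVEL = "hypothetical.rescue_level";

    private final Map<String, String> props = new HashMap<>();

    /** Clamps value into [min, max], like MathUtils.constrain. */
    static int constrain(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    /** Reads the stored level, defaulting to LEVEL_NONE. */
    int currentLevel() {
        return Integer.parseInt(props.getOrDefault(PROP_RESCUE_LEVEL, Integer.toString(LEVEL_NONE)));
    }

    /** Adds one to the stored level, clamps it, and stores it back. */
    int increment() {
        int next = constrain(currentLevel() + 1, LEVEL_NONE, LEVEL_FACTORY_RESET);
        props.put(PROP_RESCUE_LEVEL, Integer.toString(next));
        return next;
    }
}
```

On a fresh counter, five calls to increment() return 1, 2, 3, 4 and then 4 again: once factory reset is reached, the level stays there. One plausible reason for keeping the level in a system property rather than a field is that it then survives a restart of system_server.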

    private static void executeRescueLevel(Context context, @Nullable String failedPackage) {
        final int level = SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE);
        if (level == LEVEL_NONE) return;

        Slog.w(TAG, "Attempting rescue level " + levelToString(level));
        try {
            executeRescueLevelInternal(context, level, failedPackage);
            EventLogTags.writeRescueSuccess(level);
            logCriticalInfo(Log.DEBUG,
                    "Finished rescue level " + levelToString(level));
        } catch (Throwable t) {
            logRescueException(level, t);
        }
    }
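The wrapper's control flow can be sketched in plain Java. RescueExecutor, LevelAction and the in-memory log are hypothetical stand-ins for executeRescueLevelInternal and RescueParty's logging helpers. The key property is that any Throwable from the level action is caught and logged, so a failed rescue never propagates back to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of executeRescueLevel: skip LEVEL_NONE, otherwise run
 * the level's action and log success or failure without rethrowing.
 */
final class RescueExecutor {
    static final int LEVEL_NONE = 0;

    /** Stand-in for executeRescueLevelInternal. */
    interface LevelAction {
        void run(int level, String failedPackage) throws Exception;
    }

    final List<String> log = new ArrayList<>();
    private final LevelAction action;

    RescueExecutor(LevelAction action) {
        this.action = action;
    }

    void execute(int level, String failedPackage) {
        if (level == LEVEL_NONE) return;
        log.add("Attempting rescue level " + level);
        try {
            action.run(level, failedPackage);
            log.add("Finished rescue level " + level);
        } catch (Throwable t) {
            // Stand-in for logRescueException: record and swallow.
            log.add("Rescue level " + level + " failed: " + t.getMessage());
        }
    }
}
```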

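executeRescueLevel reads the current level, returns immediately at LEVEL_NONE, and otherwise passes the level and failedPackage to executeRescueLevelInternal, which does the actual work.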

    private static void executeRescueLevelInternal(Context context, int level, @Nullable
            String failedPackage) throws Exception {
        FrameworkStatsLog.write(FrameworkStatsLog.RESCUE_PARTY_RESET_REPORTED, level);
        switch (level) {
            case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS:
                resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_DEFAULTS, failedPackage);
                break;
            case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES:
                resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_CHANGES, failedPackage);
                break;
            case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS:
                resetAllSettings(context, Settings.RESET_MODE_TRUSTED_DEFAULTS, failedPackage);
                break;
            case LEVEL_FACTORY_RESET:
                // Request the reboot from a separate thread to avoid deadlock on PackageWatchdog
                // when device shutting down.
                Runnable runnable = new Runnable() {
                    @Override
                    public void run() {
                        try {
                            RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
                        } catch (Throwable t) {
                            logRescueException(level, t);
                        }
                    }
                };
                Thread thread = new Thread(runnable);
                thread.start();
                break;
        }
    }

Every level below factory reset calls resetAllSettings, each with a broader Settings reset mode than the last.

    private static void resetAllSettings(Context context, int mode, @Nullable String failedPackage)
            throws Exception {
        // Try our best to reset all settings possible, and once finished
        // rethrow any exception that we encountered
        Exception res = null;
        final ContentResolver resolver = context.getContentResolver();
        try {
            resetDeviceConfig(context, mode, failedPackage);
        } catch (Exception e) {
            res = new RuntimeException("Failed to reset config settings", e);
        }
        try {
            Settings.Global.resetToDefaultsAsUser(resolver, null, mode, UserHandle.USER_SYSTEM);
        } catch (Exception e) {
            res = new RuntimeException("Failed to reset global settings", e);
        }
        for (int userId : getAllUserIds()) {
            try {
                Settings.Secure.resetToDefaultsAsUser(resolver, null, mode, userId);
            } catch (Exception e) {
                res = new RuntimeException("Failed to reset secure settings for " + userId, e);
            }
        }
        if (res != null) {
            throw res;
        }
    }
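resetAllSettings is best-effort: one failing reset must not stop the others. Below is a generic sketch of that pattern, where BestEffortReset and Step are hypothetical names. As in the original, each failure overwrites the previous one, so only the last exception is rethrown, after every step has run.

```java
/**
 * Hypothetical sketch of the best-effort pattern in resetAllSettings:
 * run every step, remember a failure, rethrow it only at the end.
 */
final class BestEffortReset {
    /** One reset step, e.g. device config, Global settings, or one user's Secure settings. */
    interface Step {
        void run() throws Exception;
    }

    private BestEffortReset() {}

    static void runAll(Step... steps) throws Exception {
        Exception res = null;
        for (Step step : steps) {
            try {
                step.run();
            } catch (Exception e) {
                // Later failures overwrite earlier ones, matching the original.
                res = new RuntimeException("Failed to run reset step", e);
            }
        }
        if (res != null) {
            throw res;
        }
    }
}
```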

System-level rescue: factory reset

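When the factory-reset condition is met, that is, when the rescue level reaches LEVEL_FACTORY_RESET (4) on the fourth escalation, the following branch runs: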

                // Request the reboot from a separate thread to avoid deadlock on PackageWatchdog
                // when device shutting down.
                Runnable runnable = new Runnable() {
                    @Override
                    public void run() {
                        try {
                            RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
                        } catch (Throwable t) {
                            logRescueException(level, t);
                        }
                    }
                };
                Thread thread = new Thread(runnable);
                thread.start();
                break;

It calls RecoverySystem.rebootPromptAndWipeUserData to perform the factory reset.
The device reboots into recovery, which prompts the user to wipe data, i.e. the factory reset screen.
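The hand-off to a new thread can be sketched as follows. AsyncRebootRequest is hypothetical. The Action stands in for RecoverySystem.rebootPromptAndWipeUserData, and the error callback stands in for logRescueException. The sketch shows only that the calling thread returns at once while the request runs elsewhere. Per the source comment, this hand-off avoids a deadlock on PackageWatchdog while the device shuts down.

```java
import java.util.function.Consumer;

/**
 * Hypothetical sketch of the LEVEL_FACTORY_RESET branch: run the reboot
 * request on its own thread and route any failure to an error callback.
 */
final class AsyncRebootRequest {
    /** Stand-in for the reboot call, which may throw. */
    interface Action {
        void run() throws Exception;
    }

    private AsyncRebootRequest() {}

    /** Starts the action on a new thread and returns that thread immediately. */
    static Thread start(Action action, Consumer<Throwable> onError) {
        Thread thread = new Thread(() -> {
            try {
                action.run();
            } catch (Throwable t) {
                onError.accept(t);
            }
        });
        thread.start();
        return thread;
    }
}
```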


Reposted from blog.csdn.net/ChaoY1116/article/details/109642564