引言
现在的一个Android设备出货,比如手机,平板和车机,都肯定是经过了很多次的测试。
软件的品质起码是有一个基本的保障。
但是有个实际情况是,当手机在市场上面发售以后,测试是没有办法模拟出来用户的所有操作的。
市场上的消费者包括小白用户,当手机出现各种异常时,用户只能通过设备商售后处理。
而现在售后一般对ROOT,和自己烧一些不是官方发布的软件版本是不保修的。
Android考虑到了这一点,所以增加了救援模式的功能。
可以在严重时,提供给用户恢复出厂设置的选项。
这也就是本文分析的内容。
救援级别
针对不同问题的严重级别,系统定制了不同的救援等级,说明如下:
@VisibleForTesting
static final int LEVEL_NONE = 0;
@VisibleForTesting
static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
@VisibleForTesting
static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
@VisibleForTesting
static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
@VisibleForTesting
static final int LEVEL_FACTORY_RESET = 4;
我们可以看到,从0 -> 4其实就是随着严重的等级不断的提升,到了4,其实就是factory的操作。
APP级别救援实现
流程图如下:
我们来看下具体的实现过程:
PWD:frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
/**
* Handle application death from an uncaught exception. The framework
* catches these for the main threads, so this should only matter for
* threads created by applications. Before this method runs, the given
* instance of {@link LoggingHandler} should already have logged details
* (and if not it is run first).
*/
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
private final LoggingHandler mLoggingHandler;
@Override
public void uncaughtException(Thread t, Throwable e) {
try {
ensureLogging(t, e);
// Don't re-enter -- avoid infinite loops if crash-reporting crashes.
if (mCrashing) return;
mCrashing = true;
// Try to end profiling. If a profiler is running at this point, and we kill the
// process (below), the in-memory buffer will be lost. So try to stop, which will
// flush the buffer. (This makes method trace profiling useful to debug crashes.)
if (ActivityThread.currentActivityThread() != null) {
ActivityThread.currentActivityThread().stopProfiling();
}
// Bring up crash dialog, wait for it to be dismissed
ActivityManager.getService().handleApplicationCrash(
mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
} catch (Throwable t2) {
if (t2 instanceof DeadObjectException) {
// System process is dead; ignore
} else {
try {
Clog_e(TAG, "Error reporting crash", t2);
} catch (Throwable t3) {
// Even Clog_e() fails! Oh well.
}
}
} finally {
// Try everything to make sure this process goes away.
Process.killProcess(Process.myPid());
System.exit(10);
}
}
KillApplicationHandler是一个内部类,我们这边只截取了一个方法KillApplicationHandler。
当APP出现异常,被Kill掉后,会进入到该方法中去进行处理。
这里会调用ActivityManager.getService().handleApplicationCrash来进行后续的处理。
PWD:frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
/**
* Used by {@link com.android.internal.os.RuntimeInit} to report when an application crashes.
* The application process will exit immediately after this call returns.
* @param app object of the crashing app, null for the system server
* @param crashInfo describing the exception
*/
public void handleApplicationCrash(IBinder app,
ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
ProcessRecord r = findAppProcess(app, "Crash");
final String processName = app == null ? "system_server"
: (r == null ? "unknown" : r.processName);
handleApplicationCrashInner("crash", r, processName, crashInfo);
}
这个注释也很有意思:
Used by {@link com.android.internal.os.RuntimeInit} to report when an application crashes.
然后就去将Crash的ProcessName,和CrashInfo去通过handleApplicationCrashInner进行处理。
/* Native crash reporting uses this inner version because it needs to be somewhat
* decoupled from the AM-managed cleanup lifecycle
*/
void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
ApplicationErrorReport.CrashInfo crashInfo) {
EventLogTags.writeAmCrash(Binder.getCallingPid(),
UserHandle.getUserId(Binder.getCallingUid()), processName,
r == null ? -1 : r.info.flags,
crashInfo.exceptionClassName,
crashInfo.exceptionMessage,
crashInfo.throwFileName,
crashInfo.throwLineNumber);
FrameworkStatsLog.write(FrameworkStatsLog.APP_CRASH_OCCURRED,
Binder.getCallingUid(),
eventType,
processName,
Binder.getCallingPid(),
(r != null && r.info != null) ? r.info.packageName : "",
(r != null && r.info != null) ? (r.info.isInstantApp()
? FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__TRUE
: FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__FALSE)
: FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__UNAVAILABLE,
r != null ? (r.isInterestingToUserLocked()
? FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__FOREGROUND
: FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__BACKGROUND)
: FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__UNKNOWN,
processName.equals("system_server") ? ServerProtoEnums.SYSTEM_SERVER
: (r != null) ? r.getProcessClassEnum()
: ServerProtoEnums.ERROR_SOURCE_UNKNOWN
);
final int relaunchReason = r == null ? RELAUNCH_REASON_NONE
: r.getWindowProcessController().computeRelaunchReason();
final String relaunchReasonString = relaunchReasonToString(relaunchReason);
if (crashInfo.crashTag == null) {
crashInfo.crashTag = relaunchReasonString;
} else {
crashInfo.crashTag = crashInfo.crashTag + " " + relaunchReasonString;
}
addErrorToDropBox(
eventType, r, processName, null, null, null, null, null, null, crashInfo);
mAppErrors.crashApplication(r, crashInfo);
}
addErrorToDropBox函数如果熟悉android Log系统的同学,都会知道这个是一个非常重要的Error处理函数。
这个我们会在后续Log的分析文章中,进行专门的说明。
这里我们关心的是mAppErrors.crashApplication(r, crashInfo);
/**
* Bring up the "unexpected error" dialog box for a crashing app.
* Deal with edge cases (intercepts from instrumented applications,
* ActivityController, error intent receivers, that sort of thing).
* @param r the application crashing
* @param crashInfo describing the failure
*/
void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
final int callingPid = Binder.getCallingPid();
final int callingUid = Binder.getCallingUid();
final long origId = Binder.clearCallingIdentity();
try {
crashApplicationInner(r, crashInfo, callingPid, callingUid);
} finally {
Binder.restoreCallingIdentity(origId);
}
}
看下CrashApplicationInner的实现:
void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
int callingPid, int callingUid) {
long timeMillis = System.currentTimeMillis();
String shortMsg = crashInfo.exceptionClassName;
String longMsg = crashInfo.exceptionMessage;
String stackTrace = crashInfo.stackTrace;
if (shortMsg != null && longMsg != null) {
longMsg = shortMsg + ": " + longMsg;
} else if (shortMsg != null) {
longMsg = shortMsg;
}
if (r != null) {
mPackageWatchdog.onPackageFailure(r.getPackageListWithVersionCode(),
PackageWatchdog.FAILURE_REASON_APP_CRASH);
mService.mProcessList.noteAppKill(r, (crashInfo != null
&& "Native crash".equals(crashInfo.exceptionClassName))
? ApplicationExitInfo.REASON_CRASH_NATIVE
: ApplicationExitInfo.REASON_CRASH,
ApplicationExitInfo.SUBREASON_UNKNOWN,
"crash");
}
final int relaunchReason = r != null
? r.getWindowProcessController().computeRelaunchReason() : RELAUNCH_REASON_NONE;
AppErrorResult result = new AppErrorResult();
int taskId;
synchronized (mService) {
/**
* If crash is handled by instance of {@link android.app.IActivityController},
* finish now and don't show the app error dialog.
*/
if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
timeMillis, callingPid, callingUid)) {
return;
}
// Suppress crash dialog if the process is being relaunched due to a crash during a free
// resize.
if (relaunchReason == RELAUNCH_REASON_FREE_RESIZE) {
return;
}
/**
* If this process was running instrumentation, finish now - it will be handled in
* {@link ActivityManagerService#handleAppDiedLocked}.
*/
if (r != null && r.getActiveInstrumentation() != null) {
return;
}
// Log crash in battery stats.
if (r != null) {
mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
}
AppErrorDialog.Data data = new AppErrorDialog.Data();
data.result = result;
data.proc = r;
// If we can't identify the process or it's already exceeded its crash quota,
// quit right away without showing a crash dialog.
if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
return;
}
final Message msg = Message.obtain();
msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;
taskId = data.taskId;
msg.obj = data;
mService.mUiHandler.sendMessage(msg);
}
int res = result.get();
Intent appErrorIntent = null;
MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_CRASH, res);
if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {
res = AppErrorDialog.FORCE_QUIT;
}
synchronized (mService) {
if (res == AppErrorDialog.MUTE) {
stopReportingCrashesLocked(r);
}
if (res == AppErrorDialog.RESTART) {
mService.mProcessList.removeProcessLocked(r, false, true,
ApplicationExitInfo.REASON_CRASH, "crash");
if (taskId != INVALID_TASK_ID) {
try {
mService.startActivityFromRecents(taskId,
ActivityOptions.makeBasic().toBundle());
} catch (IllegalArgumentException e) {
// Hmm...that didn't work. Task should either be in recents or associated
// with a stack.
Slog.e(TAG, "Could not restart taskId=" + taskId, e);
}
}
}
if (res == AppErrorDialog.FORCE_QUIT) {
long orig = Binder.clearCallingIdentity();
try {
// Kill it with fire!
mService.mAtmInternal.onHandleAppCrash(r.getWindowProcessController());
if (!r.isPersistent()) {
mService.mProcessList.removeProcessLocked(r, false, false,
ApplicationExitInfo.REASON_CRASH, "crash");
mService.mAtmInternal.resumeTopActivities(false /* scheduleIdle */);
}
} finally {
Binder.restoreCallingIdentity(orig);
}
}
if (res == AppErrorDialog.APP_INFO) {
appErrorIntent = new Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS);
appErrorIntent.setData(Uri.parse("package:" + r.info.packageName));
appErrorIntent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
}
if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {
appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);
}
if (r != null && !r.isolated && res != AppErrorDialog.RESTART) {
// XXX Can't keep track of crash time for isolated processes,
// since they don't have a persistent identity.
mProcessCrashTimes.put(r.info.processName, r.uid,
SystemClock.uptimeMillis());
}
}
if (appErrorIntent != null) {
try {
mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));
} catch (ActivityNotFoundException e) {
Slog.w(TAG, "bug report receiver dissappeared", e);
}
}
}
在出现Crash的情况下,将会调用mPackageWatchdog的onPackageFailure函数。
mPackageWatchdog.onPackageFailure(r.getPackageListWithVersionCode(),
PackageWatchdog.FAILURE_REASON_APP_CRASH);
onPackageFailure的实现如下:
/**
* Called when a process fails due to a crash, ANR or explicit health check.
*
* <p>For each package contained in the process, one registered observer with the least user
* impact will be notified for mitigation.
*
* <p>This method could be called frequently if there is a severe problem on the device.
*/
public void onPackageFailure(List<VersionedPackage> packages,
@FailureReasons int failureReason) {
if (packages == null) {
Slog.w(TAG, "Could not resolve a list of failing packages");
return;
}
mLongTaskHandler.post(() -> {
synchronized (mLock) {
if (mAllObservers.isEmpty()) {
return;
}
boolean requiresImmediateAction = (failureReason == FAILURE_REASON_NATIVE_CRASH
|| failureReason == FAILURE_REASON_EXPLICIT_HEALTH_CHECK);
if (requiresImmediateAction) {
handleFailureImmediately(packages, failureReason);
} else {
for (int pIndex = 0; pIndex < packages.size(); pIndex++) {
VersionedPackage versionedPackage = packages.get(pIndex);
// Observer that will receive failure for versionedPackage
PackageHealthObserver currentObserverToNotify = null;
int currentObserverImpact = Integer.MAX_VALUE;
// Find observer with least user impact
for (int oIndex = 0; oIndex < mAllObservers.size(); oIndex++) {
ObserverInternal observer = mAllObservers.valueAt(oIndex);
PackageHealthObserver registeredObserver = observer.registeredObserver;
if (registeredObserver != null
&& observer.onPackageFailureLocked(
versionedPackage.getPackageName())) {
int impact = registeredObserver.onHealthCheckFailed(
versionedPackage, failureReason);
if (impact != PackageHealthObserverImpact.USER_IMPACT_NONE
&& impact < currentObserverImpact) {
currentObserverToNotify = registeredObserver;
currentObserverImpact = impact;
}
}
}
// Execute action with least user impact
if (currentObserverToNotify != null) {
currentObserverToNotify.execute(versionedPackage, failureReason);
}
}
}
}
});
}
在Crash的原因为Native_Crash和FAILURE_REASON_EXPLICIT_HEALTH_CHECK时,将会调用RollBack进行处理,但是其余的情况,将会进行进一步的通知。我们这里注意的是非RollBack的处理:
for (int pIndex = 0; pIndex < packages.size(); pIndex++) {
VersionedPackage versionedPackage = packages.get(pIndex);
// Observer that will receive failure for versionedPackage
PackageHealthObserver currentObserverToNotify = null;
int currentObserverImpact = Integer.MAX_VALUE;
// Find observer with least user impact
for (int oIndex = 0; oIndex < mAllObservers.size(); oIndex++) {
ObserverInternal observer = mAllObservers.valueAt(oIndex);
PackageHealthObserver registeredObserver = observer.registeredObserver;
if (registeredObserver != null
&& observer.onPackageFailureLocked(
versionedPackage.getPackageName())) {
int impact = registeredObserver.onHealthCheckFailed(
versionedPackage, failureReason);
if (impact != PackageHealthObserverImpact.USER_IMPACT_NONE
&& impact < currentObserverImpact) {
currentObserverToNotify = registeredObserver;
currentObserverImpact = impact;
}
}
}
// Execute action with least user impact
if (currentObserverToNotify != null) {
currentObserverToNotify.execute(versionedPackage, failureReason);
}
}
这里首先会注册PackageHealthObserver,然后调用相应的execute的函数:
// Execute action with least user impact
if (currentObserverToNotify != null) {
currentObserverToNotify.execute(versionedPackage, failureReason);
}
而我们救援模式的实现RescueParty,里面也继承并实现了PackageHealthObserver。
/**
* Handle mitigation action for package failures. This observer will be register to Package
* Watchdog and will receive calls about package failures. This observer is persistent so it
* may choose to mitigate failures for packages it has not explicitly asked to observe.
*/
public static class RescuePartyObserver implements PackageHealthObserver {
@Override
public boolean execute(@Nullable VersionedPackage failedPackage,
@FailureReasons int failureReason) {
if (isDisabled()) {
return false;
}
if (failureReason == PackageWatchdog.FAILURE_REASON_APP_CRASH
|| failureReason == PackageWatchdog.FAILURE_REASON_APP_NOT_RESPONDING) {
int triggerUid = getPackageUid(mContext, failedPackage.getPackageName());
incrementRescueLevel(triggerUid);
executeRescueLevel(mContext,
failedPackage == null ? null : failedPackage.getPackageName());
return true;
} else {
return false;
}
}
}
incrementRescueLevel的实现主要是去调整救援的等级;
executeRescueLevel是去执行救援操作
/**
* Escalate to the next rescue level. After incrementing the level you'll
* probably want to call {@link #executeRescueLevel(Context, String)}.
*/
private static void incrementRescueLevel(int triggerUid) {
final int level = getNextRescueLevel();
SystemProperties.set(PROP_RESCUE_LEVEL, Integer.toString(level));
EventLogTags.writeRescueLevel(level, triggerUid);
logCriticalInfo(Log.WARN, "Incremented rescue level to "
+ levelToString(level) + " triggered by UID " + triggerUid);
}
incrementRescueLevel是去调用getNextRescueLevel来进行计数;
/**
* Get the next rescue level. This indicates the next level of mitigation that may be taken.
*/
private static int getNextRescueLevel() {
return MathUtils.constrain(SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE) + 1,
LEVEL_NONE, LEVEL_FACTORY_RESET);
}
实现原理也很简单,每次对于计数+1.
private static void executeRescueLevel(Context context, @Nullable String failedPackage) {
final int level = SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE);
if (level == LEVEL_NONE) return;
Slog.w(TAG, "Attempting rescue level " + levelToString(level));
try {
executeRescueLevelInternal(context, level, failedPackage);
EventLogTags.writeRescueSuccess(level);
logCriticalInfo(Log.DEBUG,
"Finished rescue level " + levelToString(level));
} catch (Throwable t) {
logRescueException(level, t);
}
}
executeRescueLevel函数则是将当前的level和failedPackage进行传递,到executeRescueLevelInternal进行实现。
private static void executeRescueLevelInternal(Context context, int level, @Nullable
String failedPackage) throws Exception {
FrameworkStatsLog.write(FrameworkStatsLog.RESCUE_PARTY_RESET_REPORTED, level);
switch (level) {
case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS:
resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_DEFAULTS, failedPackage);
break;
case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES:
resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_CHANGES, failedPackage);
break;
case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS:
resetAllSettings(context, Settings.RESET_MODE_TRUSTED_DEFAULTS, failedPackage);
break;
case LEVEL_FACTORY_RESET:
// Request the reboot from a separate thread to avoid deadlock on PackageWatchdog
// when device shutting down.
Runnable runnable = new Runnable() {
@Override
public void run() {
try {
RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
} catch (Throwable t) {
logRescueException(level, t);
}
}
};
Thread thread = new Thread(runnable);
thread.start();
break;
}
}
在FactoryReset之前,进行的都是resetAllSettings的操作。
private static void resetAllSettings(Context context, int mode, @Nullable String failedPackage)
throws Exception {
// Try our best to reset all settings possible, and once finished
// rethrow any exception that we encountered
Exception res = null;
final ContentResolver resolver = context.getContentResolver();
try {
resetDeviceConfig(context, mode, failedPackage);
} catch (Exception e) {
res = new RuntimeException("Failed to reset config settings", e);
}
try {
Settings.Global.resetToDefaultsAsUser(resolver, null, mode, UserHandle.USER_SYSTEM);
} catch (Exception e) {
res = new RuntimeException("Failed to reset global settings", e);
}
for (int userId : getAllUserIds()) {
try {
Settings.Secure.resetToDefaultsAsUser(resolver, null, mode, userId);
} catch (Exception e) {
res = new RuntimeException("Failed to reset secure settings for " + userId, e);
}
}
if (res != null) {
throw res;
}
}
系统Factory Reset级别救援实现
当触发FactoryReset的条件时, 也就是到达五次的时候,会进入下面的操作:
// Request the reboot from a separate thread to avoid deadlock on PackageWatchdog
// when device shutting down.
Runnable runnable = new Runnable() {
@Override
public void run() {
try {
RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
} catch (Throwable t) {
logRescueException(level, t);
}
}
};
Thread thread = new Thread(runnable);
thread.start();
break;
将会调用RecoverySystem.rebootPromptAndWipeUserData来进行FactoryReset的操作。
也就是进入Factory Reset的界面了。