探究ANR原理-是谁控制了ANR的触发时间

默认标题_公众号封面首图_2022-07-17+14_01_36.png

一、ANR概述

ANR(Application Not responding)是指应用程序未响应,Android系统对于一些事件需要在一定的时间范围内完成,如果超过预定时间能未能得到有效响应或者响应时间过长,都会造成ANR。

1.1、Anr的触发时机

anr的触发时机有很多种,基本分为以下几点:

  • 输入事件:5s内无响应,如屏幕触碰事件;
  • 广播:执行前台广播(BroadcastReceiver)的onReceive()方法时10s没有处理完成,后台广播的超时时间为60s;
  • Service:前台Service20s内没有执行完毕;后台Service200s内没有执行完毕;
  • ContentProvider:10s内ContentProvider的publish未执行完;

从上面4点来看,每一种触发的时机都是在规定的时间内,看你消不消费得完自身的事件,而本文所探讨的问题就是,这些时间是由谁来制定,并且Anr的核心点在哪里。

二、Service的ANR

前台Service20s内没有执行完毕,后台Service200s内没有执行完毕,就会触发ANR。

开启service有两种方式,一个是startService一个是bindService,本文从startService开始分析,我们可以先看看service的启动流程。

uml_1-第 2 页.png

简单概括,AMS开启了service服务,传递到ActiveServices,ActiveServices是协助AMS完成启动、绑定等逻辑操作,在ActiveService的realStartServiceLocked方法中创建了service对象,开启了一个service的onCreate()。

service的ANR在于时间的控制,那开始计时的逻辑怎么看?

2.1、计时开始

从启动service的流程中发现,service启动计时的逻辑是从ActiveServices.realStartServiceLocked() 开始。

private void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
        IApplicationThread thread, int pid, UidRecord uidRecord, boolean execInFg,
        boolean enqueueOomAdj) throws RemoteException {
  
    bumpServiceExecutingLocked(r, execInFg, "create", null /* oomAdjReason */);
    mAm.updateLruProcessLocked(app, false, null);
    updateServiceForegroundLocked(psr, /* oomAdj= */ false);
    
    thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
                app.mState.getReportedProcState());
    
    requestServiceBindingsLocked(r, execInFg);

    updateServiceClientActivitiesLocked(psr, null, true);

    sendServiceArgsLocked(r, execInFg, true);
}

开始计时的操作bumpServiceExecutingLocked方法中

private boolean bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why,
        @Nullable String oomAdjReason) {
    
        ...
     if (timeoutNeeded) {
         scheduleServiceTimeoutLocked(r.app);
     }

}

抛出handler开始计时


void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.mServices.numberOfExecutingServices() == 0 || proc.getThread() == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageDelayed(msg, proc.mServices.shouldExecServicesFg()
            ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

从scheduleServiceTimeoutLocked方法中可以看到,内部发出了一个延时的消息,延时的时间由proc.mServices.shouldExecServicesFg()来判断,而proc.mServices.shouldExecServicesFg()指代的是是否在前台执行服务,也就是我们经常说的前台服务与后台服务的判断,

当为前台服务时,delay的时间为SERVICE_TIMEOUT = 20s

static final int SERVICE_TIMEOUT = 20 * 1000 * Build.HW_TIMEOUT_MULTIPLIER;

当为后台服务时,SERVICE_BACKGROUND_TIMEOUT = 20 * 10 = 200s

static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

我们再来看看Handler中msg=SERVICE_TIMEOUT_MSG时发生了什么?

final class MainHandler extends Handler {
    
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
        
        case SERVICE_TIMEOUT_MSG: {
            mServices.serviceTimeout((ProcessRecord) msg.obj);
        } break;
        
}
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        
        final ProcessServiceRecord psr = proc.mServices;
        
        final long now = SystemClock.uptimeMillis();
        //计算当前时间减去服务超时时间
        final long maxTime =  now -
                (psr.shouldExecServicesFg() ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        
        //遍历执行service列表,找出超时的service
        for (int i = psr.numberOfExecutingServices() - 1; i >= 0; i--) {
            ServiceRecord sr = psr.getExecutingServiceAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            //重新计时
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mProcessList.isInLruListLOSP(proc)) {
        //该服务已超时,记录anr的一些信息
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, "    ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortInstanceName;
        } else {
        //服务未超时,则重新监控下一个service是否超时
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, psr.shouldExecServicesFg()
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }

//将根据需要创建一个独立的线程,开始向用户报告ANR的相关信息
    if (anrMessage != null) {
        mAm.mAnrHelper.appNotResponding(proc, anrMessage);
    }
}

最后回进入到 ActiveServices.serviceTimeout()方法中,基本逻辑如下:

  1. 计算当前时间减去服务超时时间;
  2. 遍历服务列表,找出超时的服务;
  3. 如果该服务超时,记录anr信息;如果没有超时,则重新健康下一个service是否超时;
  4. 超时的service,会更加需要创建一个独立进程,开始向用户报告ANR的相关信息;

有开始ANRmsg,必然也有结束计时的逻辑,我们且往下屡屡。

2.2、计时结束

在调用bumpServiceExecutingLocked方法开启计时,接着就开始调用 scheduleCreateService开始服务的创建;

#ActivityThread

public final void scheduleCreateService(IBinder token,
        ServiceInfo info, CompatibilityInfo compatInfo, int processState) {
    ...
    sendMessage(H.CREATE_SERVICE, s);
}


case CREATE_SERVICE:
    handleCreateService((CreateServiceData)msg.obj);
    break;

private void handleCreateService(CreateServiceData data) {
    
    Service service = null;
    try {
        ...
        service = packageInfo.getAppFactory()
                .instantiateService(cl, data.info.name, data.intent);
        ...
        service.onCreate();
        ...
        ActivityManager.getService().serviceDoneExecuting(
            data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
  
    }
}

在handleCreateService中service开始回调onCreate(),也就是正式开启了一个service,而在开启service之后,AMS就开始结束service的计时。

private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing, boolean enqueueOomAdj) {
        ...
        mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);

}

利用Handler的removeMessage移除SERVICE_TIMEOUT_MSG消息,结束ANR的计时。

三、Broadcast的ANR

从广播的发送开始,利用Context来sendBroadcast,传递到AMS,同servcice的原理一样,有一个BroadcastQueue来协助AMS处理前台和后台广播。具体流程可以看下面的UML图。

uml_1-第 3 页.png

3.1、计时开始

关于ANR计时开始的逻辑从AMS的broadcastIntentLocked()方法就开始了,

final int broadcastIntentLocked() {

                     
     //处理并行广播
     //
    int NR = registeredReceivers != null ? registeredReceivers.size() : 0;
    if (!ordered && NR > 0) {
        //根据标志位FLAG_RECEIVER_FOREGROUND决定返回前台广播队列还是后台广播队列
        final BroadcastQueue queue = broadcastQueueForIntent(intent);
        BroadcastRecord r = new BroadcastRecord(queue, intent, callerApp, callerPackage,
                callerFeatureId, callingPid, callingUid, callerInstantApp, resolvedType,
                requiredPermissions, excludedPermissions, appOp, brOptions, registeredReceivers,
                resultTo, resultCode, resultData, resultExtras, ordered, sticky, false, userId,
                allowBackgroundActivityStarts, backgroundActivityStartsToken,
                timeoutExempt);
        if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST, "Enqueueing parallel broadcast " + r);
        final boolean replaced = replacePending
                && (queue.replaceParallelBroadcastLocked(r) != null);
        // Note: We assume resultTo is null for non-ordered broadcasts.
        if (!replaced) {
            queue.enqueueParallelBroadcastLocked(r);
            queue.scheduleBroadcastsLocked();
        }
        registeredReceivers = null;
        NR = 0;
    }

    
    //处理串行广播

    if ((receivers != null && receivers.size() > 0)
            || resultTo != null) {
        BroadcastQueue queue = broadcastQueueForIntent(intent);
        BroadcastRecord r = new BroadcastRecord(queue, intent, callerApp, callerPackage,
                callerFeatureId, callingPid, callingUid, callerInstantApp, resolvedType,
                requiredPermissions, excludedPermissions, appOp, brOptions,
                receivers, resultTo, resultCode, resultData, resultExtras,
                ordered, sticky, false, userId, allowBackgroundActivityStarts,
                backgroundActivityStartsToken, timeoutExempt);

        if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST, "Enqueueing ordered broadcast " + r);

        final BroadcastRecord oldRecord =
                replacePending ? queue.replaceOrderedBroadcastLocked(r) : null;
        if (oldRecord != null) {
            // Replaced, fire the result-to receiver.
            if (oldRecord.resultTo != null) {
                final BroadcastQueue oldQueue = broadcastQueueForIntent(oldRecord.intent);
                try {
                    oldQueue.performReceiveLocked(oldRecord.callerApp, oldRecord.resultTo,
                            oldRecord.intent,
                            Activity.RESULT_CANCELED, null, null,
                            false, false, oldRecord.userId);
                } 
            }
        } else {
            queue.enqueueOrderedBroadcastLocked(r);
            queue.scheduleBroadcastsLocked();
        }
    } else {
        // There was nobody interested in the broadcast, but we still want to record
        // that it happened.
        if (intent.getComponent() == null && intent.getPackage() == null
                && (intent.getFlags()&Intent.FLAG_RECEIVER_REGISTERED_ONLY) == 0) {
            // This was an implicit broadcast... let's record it for posterity.
            addBroadcastStatLocked(intent.getAction(), callerPackage, 0, 0, 0);
        }
    }

    return ActivityManager.BROADCAST_SUCCESS;
}

根据标识位来区分是否时前台广播还是后台广播

BroadcastQueue broadcastQueueForIntent(Intent intent) {
    final boolean isFg = (intent.getFlags() & Intent.FLAG_RECEIVER_FOREGROUND) != 0;
   
    return (isFg) ? mFgBroadcastQueue : mBgBroadcastQueue;
}

拿到前台或者后台的广播队列后,调用scheduleBroadcastsLocked方法开始Handler消息的分发,也就是ANR计时开始,

//前台广播超时时间 = 10s;后台广播 = 60s
static final int BROADCAST_FG_TIMEOUT = 10 * 1000 * Build.HW_TIMEOUT_MULTIPLIER;
static final int BROADCAST_BG_TIMEOUT = 60 * 1000 * Build.HW_TIMEOUT_MULTIPLIER;
//设置超时时间
long timeoutTime = r.receiverTime + mConstants.TIMEOUT;
setBroadcastTimeoutLocked(timeoutTime);

最后在handler发送超时时间的消息后,还是会在AnrHelper中发送ANR的相关信息。

3.2、计时结束

有handler发送消息的地方,也就有取消message的地方,在上面计时开始的流程中,processNextBroadcastLocked方法中在处理串行广播时,如果判断没有超时,就取消message。

do {
        cancelBroadcastTimeoutLocked();
        // ... and on to the next...
        addBroadcastToHistoryLocked(r);
 
              
} while (r == null);
final void cancelBroadcastTimeoutLocked() {
    if (mPendingBroadcastTimeoutMessage) {
        mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
        mPendingBroadcastTimeoutMessage = false;
    }
}

四、ContentProvider的ANR

uml_1-第 4 页.png

ContentProvider的创建和启动一般是在进程启动的时候就开始,我们在创建一个进程的时候都会调用AMS的attachApplication()方法,在这个方法中,会拿到ProvideInfo信息即ContentProvider,接着会调用新建进程的bindApplication()将查询到的ProviderInfo进行创建并启动ContentProvider。

4.1、计时开始

在ContentProvider的创建启动过程中, 有一个地方需要需要注意,那就是AMS的attachApplication()方法,

private boolean attachApplicationLocked(@NonNull IApplicationThread thread,
        int pid, int callingUid, long startSeq) {
    ...

    List<ProviderInfo> providers = normalMode
                                        ? mCpHelper.generateApplicationProvidersLocked(app)
                                        : null;

    if (providers != null && mCpHelper.checkAppInLaunchingProvidersLocked(app)) {
        Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
        msg.obj = app;
        mHandler.sendMessageDelayed(msg,
                ContentResolver.CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS);
    }

       ...

        
        thread.bindApplication(processName, appInfo, providerList,
                    instr2.mClass,
                    profilerInfo, instr2.mArguments,
                    instr2.mWatcher,
                    instr2.mUiAutomationConnection, testMode,
                    mBinderTransactionTrackingEnabled, enableTrackAllocation,
                    isRestrictedBackupMode || !normalMode, app.isPersistent(),
                    new Configuration(app.getWindowProcessController().getConfiguration()),
                    app.getCompat(), getCommonServicesLocked(app.isolated),
                    mCoreSettingsObserver.getCoreSettingsLocked(),
                    buildSerial, autofillOptions, contentCaptureOptions,
                    app.getDisabledCompatChanges(), serializedSystemFontMap);
      ...
}

在这个方法中,会先通过ContentProviderHelper拿到ProviderInfo列表,然后通过Handler延时发送CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG消息,而延时的时间为10s,

public static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS =
        10 * 1000 * Build.HW_TIMEOUT_MULTIPLIER;

跟到最后其实和service、广播的流程一样,最后会触发appNotResponding,向用户展示ANR相关信息。

public void appNotRespondingViaProvider(IBinder connection) {
      
        final ContentProviderConnection conn = (ContentProviderConnection) connection;
        ...
        final ProcessRecord host = conn.provider.proc;
        ...
        mHandler.post(new Runnable() {
            @Override
            public void run() {
                 mAppErrors.appNotResponding(host, null, null, false,
                        "ContentProvider not responding");
            }
        });
    }

4.2、计时结束

目前有两个地方会结束计时,

  1. 进程消失

当进程消失时,会清除进程的主要代码,在这里会remove掉ANR的计时,因为在新建进程时,会再次开启一个新的计时。

final boolean cleanUpApplicationRecordLocked(ProcessRecord app, int pid,
        boolean restarting, boolean allowRestart, int index, boolean replacingPid,
        boolean fromBinderDied) {
        ...

        mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, app);
}
  1. ContentProvider发布完成同样会移除ANR的超时消息。
    public final void publishContentProviders(IApplicationThread caller,
            List<ContentProviderHolder> providers) {
                    if (wasInLaunchingProviders) {
                        mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
                    }
            }
        }
    }

我正在参与掘金技术社区创作者签约计划招募活动,点击链接报名投稿

猜你喜欢

转载自juejin.im/post/7121217696594657293
ANR