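[Android] How an ANR Is Triggered and How to Analyze It

Preface

ANRs have always been among the harder Android problems to solve: they are hard to reproduce, and even once reproduced they are hard to analyze. This article walks through how an ANR is triggered and how to locate the cause from the log files once one has occurred. Monitoring ANRs in production is also tricky; after reading this article, on-device ANR monitoring solutions (such as WeChat's Matrix) should be easier to follow.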

When an ANR occurs, the screen looks like this:
[Image: ANR dialog]

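What Is an ANR

ANR stands for Application Not Responding: the app has been unresponsive for too long, and a dialog is shown on screen (see the figure above). It is not a RuntimeException and cannot be caught with try/catch. The dialog is shown by the system_server process, so the app process is not notified at the Java level (it can detect the ANR at the native layer, as analyzed later). When an ANR occurs, system_server prints a log to Logcat and writes more detailed logs as files under /data/anr (usually called ANR trace files).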

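Causes of ANR

An Android ANR is generally triggered by one of the following:

Service Timeout: a foreground service does not finish executing within 20 s, or a background service within 200 s;
BroadcastQueue Timeout: a foreground broadcast is not handled within 10 s, or a background broadcast within 60 s;
ContentProvider Timeout: a provider is not published within 10 s;
InputDispatching Timeout: an input event, either a key press or a touch, is not handled within 5 s.

The fourth case, input event timeout (mostly touch events), is the most common. Why does it time out? Usually because the main thread is blocked for some reason, such as a long-running task, heavy computation, a deadlock, or a sleep.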
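
To make the blocked-main-thread case concrete, here is a minimal sketch in plain Java (not Android code). A single-threaded executor stands in for the main thread, and the 5 s input timeout is scaled down to 50 ms so the example finishes quickly. The class and method names are hypothetical and exist only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class BlockedMainThreadDemo {
    /**
     * Queues a blocking task and then an "input event" on one worker thread.
     * Returns how many milliseconds the input event waited before being handled.
     */
    static long inputDelayMillis(long blockMillis) {
        // Assumption: a single-threaded executor models the app's main thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            mainThread.execute(() -> {
                try {
                    Thread.sleep(blockMillis); // the slow work on the "main thread"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            long enqueuedAt = System.nanoTime();
            // The input event can only be handled after the slow work returns.
            Callable<Long> inputEvent = System::nanoTime;
            Future<Long> handledAt = mainThread.submit(inputEvent);
            return TimeUnit.NANOSECONDS.toMillis(handledAt.get() - enqueuedAt);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        long timeoutMillis = 50; // scaled-down stand-in for the 5 s input timeout
        long delay = inputDelayMillis(200);
        System.out.println("input waited " + delay + " ms, timed out: " + (delay > timeoutMillis));
    }
}
```

The input event itself is trivial; it is late only because the work queued ahead of it on the same thread has not returned yet.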

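How an ANR Is Triggered

Service Timeout means that a Service lifecycle method such as onCreate took too long. The following uses onCreate as an example of how a Service triggers an ANR.

onCreate is called after startService, so we start from startService. The code below is based on Android SDK 29.
The call chain is: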

Context.startService
ContextImpl.startService
ActivityManagerService.startService
ActiveServices.startServiceLocked
ActiveServices.startServiceInnerLocked
ActiveServices.bringUpServiceLocked
ActiveServices.realStartServiceLocked

Now look at the code of ActiveServices.realStartServiceLocked:

  private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {
        ...
        // This call posts a delayed message (20 seconds for a foreground service)
        bumpServiceExecutingLocked(r, execInFg, "create");
        ...
        try {
            ...
            // Tell the app process to create the Service; onCreate is called on that path
            app.thread.scheduleCreateService(r, r.serviceInfo,
            ...
        } catch (DeadObjectException e) {
        ...

bumpServiceExecutingLocked, which posts the delayed message:

 private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
   ...
   scheduleServiceTimeoutLocked(r.app);
   ...
}

Next, scheduleServiceTimeoutLocked:

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        // Obtain the timeout message
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // Post it with a delay: 20 seconds for a foreground service, 200 seconds for a background one
        mAm.mHandler.sendMessageDelayed(msg,
                proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
    }

The constants SERVICE_TIMEOUT and SERVICE_BACKGROUND_TIMEOUT are defined as follows:

    // Path: com.android.server.am.ActiveServices.java
    
   // How long we wait for a service to finish executing.
    static final int SERVICE_TIMEOUT = 20*1000;

    // How long we wait for a service to finish executing.
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
    

So the ANR timeout is 20 seconds for a foreground service and ten times that, 200 seconds, for a background service.

Next, see how ActivityThread in the app process handles the create-Service request:

  private void handleCreateService(CreateServiceData data) {
        Service service = null;
        try {
            java.lang.ClassLoader cl = packageInfo.getClassLoader();
            service = packageInfo.getAppFactory()
                    .instantiateService(cl, data.info.name, data.intent);
        } catch (Exception e) {
            ...
        }

        try {
            ...
            ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
            context.setOuterContext(service);

            Application app = packageInfo.makeApplication(false, mInstrumentation);
            service.attach(context, this, data.info.name, data.token, app,
                    ActivityManager.getService());
            // Key point: call the lifecycle method onCreate
            service.onCreate();
            mServices.put(data.token, service);
            try {
                // Key point: tell AMS the Service has been created; this removes the delayed message from the handler
                ActivityManager.getService().serviceDoneExecuting(
                        data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
            } catch (RemoteException e) {
                throw e.rethrowFromSystemServer();
            }
            ...

After service.onCreate returns, the app process notifies AMS that the service has been created.

ActivityManagerService.serviceDoneExecuting ends up in ActiveServices.serviceDoneExecutingLocked:

  private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,boolean finishing) {
    ...
    // Remove the delayed message posted earlier
    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
    ...
  }

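So if the Service's onCreate finishes within 20 seconds, the delayed message is removed from the handler and never runs.

If onCreate has not finished within 20 seconds, the delayed message posted earlier is delivered.

That message is the one that handles the ANR.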
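
The schedule-then-cancel pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: a ScheduledExecutorService stands in for the AMS handler, Thread.sleep stands in for Service.onCreate, and the name ServiceTimeoutSketch is hypothetical. It is not the framework implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class ServiceTimeoutSketch {
    /**
     * Schedules a timeout callback, runs the work, then cancels the callback.
     * Returns true if the callback fired first, i.e. the work overran the timeout.
     */
    static boolean timedOut(long timeoutMillis, long workMillis) {
        ScheduledExecutorService handler = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean anrReported = new AtomicBoolean(false);
        try {
            // Plays the role of sendMessageDelayed(SERVICE_TIMEOUT_MSG, timeout).
            ScheduledFuture<?> timeoutMsg = handler.schedule(
                    () -> anrReported.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
            // Plays the role of Service.onCreate() running in the app process.
            Thread.sleep(workMillis);
            // Plays the role of removeMessages(SERVICE_TIMEOUT_MSG) in serviceDoneExecutingLocked.
            timeoutMsg.cancel(false);
            return anrReported.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            handler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast onCreate timed out: " + timedOut(500, 10));
        System.out.println("slow onCreate timed out: " + timedOut(30, 300));
    }
}
```

The point of the design is that the timeout is armed before the work starts, so nothing has to poll the app: either the completion call disarms it in time, or it fires on its own.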

The delayed message SERVICE_TIMEOUT_MSG is handled in MainHandler as follows:

//com.android.server.am.ActivityManagerService
final class MainHandler extends Handler {
        public MainHandler(Looper looper) {
            super(looper, null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
            ...
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            ...
            }
            ...
         }
 }

mServices.serviceTimeout is implemented as follows:

  void serviceTimeout(ProcessRecord proc) {
      ...
      proc.appNotResponding(null, null, null, null, false, 
      ...
  }

So a Service ANR ends up in ProcessRecord.appNotResponding.

Tracing the other ANR types shows that they also end up in ProcessRecord.appNotResponding. Input event timeout is one example:

//com.android.server.am.ActivityManagerService
   /**
     * Handle input dispatching timeouts.
     * @return whether input dispatching should be aborted or not.
     */
    boolean inputDispatchingTimedOut(ProcessRecord proc, String activityShortComponentName,
            ApplicationInfo aInfo, String parentShortComponentName,
            WindowProcessController parentProcess, boolean aboveSystem, String reason) {
        if (checkCallingPermission(FILTER_EVENTS) != PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }

        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }

        if (proc != null) {
            synchronized (this) {
                if (proc.isDebugging()) {
                    return false;
                }

                if (proc.getActiveInstrumentation() != null) {
                    Bundle info = new Bundle();
                    info.putString("shortMsg", "keyDispatchingTimedOut");
                    info.putString("longMsg", annotation);
                    finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                    return true;
                }
            }
            // An input event timeout also ends up in ProcessRecord.appNotResponding
            proc.appNotResponding(activityShortComponentName, aInfo,
                    parentShortComponentName, parentProcess, aboveSystem, annotation);
        }

        return true;
    }

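The timeout flows for input events, broadcasts and providers are not analyzed one by one here.

All roads lead to ProcessRecord.appNotResponding: every type of ANR ends up in this function.

Handling an ANR

Handling an ANR consists of the following steps:

Collect the pids of the processes whose stacks should be dumped
Tell each of these processes to dump its thread stacks; the output goes to /data/anr
Print the Logcat log
Show the ANR dialog for a foreground process; a background process gets no dialog

The details are as follows: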

//com.android.server.am.ProcessRecord
   void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
          String parentShortComponentName, WindowProcessController parentProcess,
          boolean aboveSystem, String annotation) {
      // Collect the pids whose stacks will be dumped: firstPids, lastPids and nativeProcs
      ArrayList<Integer> firstPids = new ArrayList<>(5);
      SparseArray<Boolean> lastPids = new SparseArray<>(20);

      synchronized (mService) {
          ...
          // In case we come through here for the same app before completing
          // this one, mark as anring now so we will bail out.
          setNotResponding(true);

          // Dump thread traces as quickly as we can, starting with "interesting" processes.
          firstPids.add(pid);

          // Don't dump other PIDs if it's a background ANR
          if (!isSilentAnr()) {
              int parentPid = pid;
              if (parentProcess != null && parentProcess.getPid() > 0) {
                  parentPid = parentProcess.getPid();
              }
              if (parentPid != pid) firstPids.add(parentPid);

              if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);

              for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
                  ProcessRecord r = getLruProcessList().get(i);
                  if (r != null && r.thread != null) {
                      int myPid = r.pid;
                      if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
                          if (r.isPersistent()) {
                              firstPids.add(myPid);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                          } else if (r.treatLikeActivity) {
                              firstPids.add(myPid);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                          } else {
                              lastPids.put(myPid, Boolean.TRUE);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                          }
                      }
                  }
              }
          }
      }
      // Start building the Logcat message
      // Log the ANR to the main log.
      StringBuilder info = new StringBuilder();
      info.setLength(0);
      info.append("ANR in ").append(processName);
      if (activityShortComponentName != null) {
          info.append(" (").append(activityShortComponentName).append(")");
      }
      info.append("\n");
      info.append("PID: ").append(pid).append("\n");
      if (annotation != null) {
          info.append("Reason: ").append(annotation).append("\n");
      }
      if (parentShortComponentName != null
              && parentShortComponentName.equals(activityShortComponentName)) {
          info.append("Parent: ").append(parentShortComponentName).append("\n");
      }

      ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

      // Collect the native pids to dump
      // don't dump native PIDs for background ANRs unless it is the process of interest
      String[] nativeProcs = null;
      if (isSilentAnr()) {
          for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
              if (NATIVE_STACKS_OF_INTEREST[i].equals(processName)) {
                  nativeProcs = new String[] { processName };
                  break;
              }
          }
      } else {
          nativeProcs = NATIVE_STACKS_OF_INTEREST;
      }

      int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
      ArrayList<Integer> nativePids = null;

      if (pids != null) {
          nativePids = new ArrayList<>(pids.length);
          for (int i : pids) {
              nativePids.add(i);
          }
      }
      // Key point: start dumping stacks
      // For background ANRs, don't pass the ProcessCpuTracker to
      // avoid spending 1/2 second collecting stats to rank lastPids.
      File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
              (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
              nativePids);

      String cpuInfo = null;
      if (isMonitorCpuUsage()) {
          mService.updateCpuStatsNow();
          synchronized (mService.mProcessCpuTracker) {
              cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
          }
          info.append(processCpuTracker.printCurrentLoad());
          info.append(cpuInfo);
      }

      info.append(processCpuTracker.printCurrentState(anrTime));

      // Write the message to Logcat
      Slog.e(TAG, info.toString());
      if (tracesFile == null) {
          // There is no trace file, so dump (only) the alleged culprit's threads to the log
          Process.sendSignal(pid, Process.SIGNAL_QUIT);
      }

      synchronized (mService) {
          ...
          // A background process is killed directly, with no ANR dialog
          if (isSilentAnr() && !isDebugging()) {
              kill("bg anr", true);
              return;
          }
          // Mark the app process as being in the ANR state
          // Set the app's notResponding state, and look up the errorReportReceiver
          makeAppNotRespondingLocked(activityShortComponentName,
                  annotation != null ? "ANR " + annotation : "ANR", info.toString());

          // mUiHandler can be null if the AMS is constructed with injector only. This will only
          // happen in tests.
          // Show the ANR dialog
          if (mService.mUiHandler != null) {
              // Bring up the infamous App Not Responding dialog
              Message msg = Message.obtain();
              msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
              msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);

              mService.mUiHandler.sendMessage(msg);
          }
      }
  }
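
The pid-collection part of appNotResponding can be modelled in plain Java as follows. This is a simplified sketch, not the framework code: AnrPidCollector, Proc and Result are hypothetical names, Proc keeps only the three fields the ordering logic reads, and the r.thread != null check is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class AnrPidCollector {
    /** Hypothetical, minimal stand-in for ProcessRecord. */
    static final class Proc {
        final int pid;
        final boolean persistent;
        final boolean treatLikeActivity;

        Proc(int pid, boolean persistent, boolean treatLikeActivity) {
            this.pid = pid;
            this.persistent = persistent;
            this.treatLikeActivity = treatLikeActivity;
        }
    }

    static final class Result {
        final List<Integer> firstPids = new ArrayList<>();
        final List<Integer> lastPids = new ArrayList<>();
    }

    /**
     * @param parentPid pid of the parent process, or a value <= 0 if there is none
     * @param lru       process list ordered from least to most recently used
     */
    static Result collect(int anrPid, int parentPid, int systemServerPid,
                          boolean silentAnr, List<Proc> lru) {
        Result result = new Result();
        result.firstPids.add(anrPid);      // the ANR process is always dumped first
        if (silentAnr) {
            return result;                 // background ANR: dump nothing else
        }
        int parent = parentPid > 0 ? parentPid : anrPid;
        if (parent != anrPid) result.firstPids.add(parent);
        if (systemServerPid != anrPid && systemServerPid != parent) {
            result.firstPids.add(systemServerPid);
        }
        // Walk from the most recently used process backwards.
        for (int i = lru.size() - 1; i >= 0; i--) {
            Proc p = lru.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == parent || p.pid == systemServerPid) {
                continue;
            }
            if (p.persistent || p.treatLikeActivity) {
                result.firstPids.add(p.pid);
            } else {
                result.lastPids.add(p.pid);
            }
        }
        return result;
    }
}
```

The ordering matters because the dump runs against a fixed time budget: processes in firstPids are dumped before anything else, so they are the least likely to be cut off.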

Next, see how ActivityManagerService dumps the stacks:

  File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
                (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
                nativePids);

The ActivityManagerService.dumpStackTraces function:

//com.android.server.am.ActivityManagerService
 public static File dumpStackTraces(ArrayList<Integer> firstPids,
           ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
           ArrayList<Integer> nativePids) {
       ArrayList<Integer> extraPids = null;

       Slog.i(TAG, "dumpStackTraces pids=" + lastPids + " nativepids=" + nativePids);

       // Measure CPU usage as soon as we're called in order to get a realistic sampling
       // of the top users at the time of the request.
       if (processCpuTracker != null) {
           processCpuTracker.init();
           try {
               Thread.sleep(200);
           } catch (InterruptedException ignored) {
           }

           processCpuTracker.update();
       ...
       // Create the ANR output file: ANR_TRACE_DIR = "/data/anr";
       final File tracesDir = new File(ANR_TRACE_DIR);
       // Each set of ANR traces is written to a separate file and dumpstate will process
       // all such files and add them to a captured bug report if they're recent enough.
       maybePruneOldTraces(tracesDir);

       // NOTE: We should consider creating the file in native code atomically once we've
       // gotten rid of the old scheme of dumping and lot of the code that deals with paths
       // can be removed.
       File tracesFile = createAnrDumpFile(tracesDir);
       if (tracesFile == null) {
           return null;
       }
       // The file has been created; start dumping
       dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids);
       return tracesFile;
   }

ActivityManagerService.dumpStackTraces:

 //com.android.server.am.ActivityManagerService
 public static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
            ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {

        Slog.i(TAG, "Dumping to " + tracesFile);

        // We don't need any sort of inotify based monitoring when we're dumping traces via
        // tombstoned. Data is piped to an "intercept" FD installed in tombstoned so we're in full
        // control of all writes to the file in question.

        // We must complete all stack dumps within 20 seconds.
        long remainingTime = 20 * 1000;

        // First collect all of the stacks of the most important pids.
        if (firstPids != null) {
            int num = firstPids.size();
            for (int i = 0; i < num; i++) {
                Slog.i(TAG, "Collecting stacks for pid " + firstPids.get(i));
                final long timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile,
                                                                remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
                           "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
                }
            }
        }

        // Next collect the stacks of the native pids
        if (nativePids != null) {
            for (int pid : nativePids) {
                Slog.i(TAG, "Collecting stacks for native pid " + pid);
                final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);

                final long start = SystemClock.elapsedRealtime();
                Debug.dumpNativeBacktraceToFileTimeout(
                        pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
                final long timeTaken = SystemClock.elapsedRealtime() - start;

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current native pid=" + pid +
                        "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with native pid " + pid + " in " + timeTaken + "ms");
                }
            }
        }

        // Lastly, dump stacks for all extra PIDs from the CPU tracker.
        if (extraPids != null) {
            for (int pid : extraPids) {
                Slog.i(TAG, "Collecting stacks for extra pid " + pid);

                final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current extra pid=" + pid +
                            "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with extra pid " + pid + " in " + timeTaken + "ms");
                }
            }
        }
        Slog.i(TAG, "Done dumping");
    }
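
The shared 20-second budget used by dumpStackTraces can be illustrated with a small plain-Java sketch. It is a simplification under assumptions: DumpBudgetSketch and fixedCost are hypothetical names, and each "dump" is modelled as a function that receives the remaining time and reports how long it took, instead of actually dumping a process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

final class DumpBudgetSketch {
    /** Hypothetical helper: a task that always reports the same cost in milliseconds. */
    static LongUnaryOperator fixedCost(long millis) {
        return remaining -> millis;
    }

    /**
     * Runs tasks in order against one shared budget. Each task receives the
     * remaining time and returns the time it took (both in milliseconds).
     * Returns the indices of the tasks that were started.
     */
    static List<Integer> runWithinBudget(long budgetMillis, List<LongUnaryOperator> tasks) {
        List<Integer> started = new ArrayList<>();
        long remaining = budgetMillis;
        for (int i = 0; i < tasks.size(); i++) {
            long taken = tasks.get(i).applyAsLong(remaining);
            started.add(i);
            remaining -= taken;
            if (remaining <= 0) {
                break; // deadline exceeded: the remaining tasks are skipped
            }
        }
        return started;
    }
}
```

A practical consequence for analysis: if an early process is slow to dump, later processes may be missing from the trace file altogether.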

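As shown, dumping traces uses two functions, dumpJavaTracesTombstoned and Debug.dumpNativeBacktraceToFileTimeout, for the Java layer and the native layer respectively. The native dump calls android.os.Debug directly, while the Java dump goes through dumpJavaTracesTombstoned. Look at the Java side first.

ActivityManagerService.dumpJavaTracesTombstoned: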

 /**
     * Dump java traces for process {@code pid} to the specified file. If java trace dumping
     * fails, a native backtrace is attempted. Note that the timeout {@code timeoutMs} only applies
     * to the java section of the trace, a further {@code NATIVE_DUMP_TIMEOUT_MS} might be spent
     * attempting to obtain native traces in the case of a failure. Returns the total time spent
     * capturing traces.
     */
    private static long dumpJavaTracesTombstoned(int pid, String fileName, long timeoutMs) {
        final long timeStart = SystemClock.elapsedRealtime();
        boolean javaSuccess = Debug.dumpJavaBacktraceToFileTimeout(pid, fileName,
                (int) (timeoutMs / 1000));
        if (javaSuccess) {
            // Check that something is in the file, actually. Try-catch should not be necessary,
            // but better safe than sorry.
            try {
                long size = new File(fileName).length();
                if (size < JAVA_DUMP_MINIMUM_SIZE) {
                    Slog.w(TAG, "Successfully created Java ANR file is empty!");
                    javaSuccess = false;
                }
            } catch (Exception e) {
                Slog.w(TAG, "Unable to get ANR file size", e);
                javaSuccess = false;
            }
        }
        if (!javaSuccess) {
            Slog.w(TAG, "Dumping Java threads failed, initiating native stack dump.");
            if (!Debug.dumpNativeBacktraceToFileTimeout(pid, fileName,
                    (NATIVE_DUMP_TIMEOUT_MS / 1000))) {
                Slog.w(TAG, "Native stack dump failed!");
            }
        }

        return SystemClock.elapsedRealtime() - timeStart;
    }
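
The fallback logic of dumpJavaTracesTombstoned, reduced to its decision structure, looks roughly like the sketch below. The names are hypothetical, the dumps are modelled as suppliers so the example runs without Android, and the value of MINIMUM_SIZE is an assumption chosen for illustration, not a value taken from the code quoted above.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

final class DumpFallbackSketch {
    // Assumption: stands in for JAVA_DUMP_MINIMUM_SIZE; the real value is not shown above.
    static final long MINIMUM_SIZE = 100;

    /**
     * Tries the Java dump first and falls back to the native dump if the Java dump
     * fails or leaves a file that is too small.
     * Returns "java", "native" or "none" depending on which dump succeeded.
     */
    static String dump(BooleanSupplier javaDump, LongSupplier fileSize,
                       BooleanSupplier nativeDump) {
        boolean javaSuccess = javaDump.getAsBoolean();
        if (javaSuccess && fileSize.getAsLong() < MINIMUM_SIZE) {
            javaSuccess = false; // reported success but wrote (almost) nothing
        }
        if (javaSuccess) {
            return "java";
        }
        return nativeDump.getAsBoolean() ? "native" : "none";
    }
}
```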

This in turn calls Debug.dumpJavaBacktraceToFileTimeout to perform the dump.

Now the Debug class:

//android.os.Debug
  /**
     * Append the Java stack traces of a given native process to a specified file.
     *
     * @param pid pid to dump.
     * @param file path of file to append dump to.
     * @param timeoutSecs time to wait in seconds, or 0 to wait forever.
     * @hide
     */
    public static native boolean dumpJavaBacktraceToFileTimeout(int pid, String file,
                                                                int timeoutSecs);

    /**
     * Append the native stack traces of a given process to a specified file.
     *
     * @param pid pid to dump.
     * @param file path of file to append dump to.
     * @param timeoutSecs time to wait in seconds, or 0 to wait forever.
     * @hide
     */
    public static native boolean dumpNativeBacktraceToFileTimeout(int pid, String file,
                                                                  int timeoutSecs);

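So dumping traces ultimately comes down to these two functions of android.os.Debug:
dumpJavaBacktraceToFileTimeout and dumpNativeBacktraceToFileTimeout

Both methods are declared native, so we need to look at the Android platform source.

Note that both methods are marked @hide, so apps cannot call them.

How the Native Layer Dumps Traces

Searching the Android source for the C++ code behind dumpJavaBacktraceToFileTimeout leads to frameworks/base/core/jni/android_os_Debug.cpp, which contains the implementation: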

frameworks/base/core/jni/android_os_Debug.cpp

static jboolean android_os_Debug_dumpJavaBacktraceToFileTimeout(JNIEnv* env, jobject clazz,
        jint pid, jstring fileName, jint timeoutSecs) {
    const bool ret = dumpTraces(env, pid, fileName, timeoutSecs, kDebuggerdJavaBacktrace);
    return ret ? JNI_TRUE : JNI_FALSE;
}

Following dumpTraces leads to debuggerd_trigger_dump in system/core/debuggerd/client/debuggerd_client.cpp:

bool debuggerd_trigger_dump(pid_t tid, DebuggerdDumpType dump_type, unsigned int timeout_ms,
                            unique_fd output_fd) {
  ...
  // Send the signal.
  const int signal = (dump_type == kDebuggerdJavaBacktrace) ? SIGQUIT : BIONIC_SIGNAL_DEBUGGER;
  sigval val = {.sival_int = (dump_type == kDebuggerdNativeBacktrace) ? 1 : 0};
  if (sigqueue(pid, signal, val) != 0) {
    log_error(output_fd, errno, "failed to send signal to pid %d", pid);
    return false;
  }
  ...
}

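Inside this function, sigqueue (bionic/libc/bionic/signal.cpp) sends a signal to the target process; for a Java backtrace the signal is SIGQUIT.

Next, look at where SIGQUIT is received.

Every app process has a SignalCatcher thread dedicated to handling SIGQUIT. See art/runtime/signal_catcher.cc: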

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  ...
  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

Once SIGQUIT is received, it is handed to HandleSigQuit:

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);

  // Note: The strings "Build fingerprint:" and "ABI:" are chosen to match the format used by
  // debuggerd. This allows, for example, the stack tool to work.
  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);

  if ((false)) {
    std::string maps;
    if (android::base::ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}

Along the way it calls DumpForSigQuit in art/runtime/runtime.cc, which collects more detailed information, including the thread stacks.

void Runtime::DumpForSigQuit(std::ostream& os) {
  // Print backtraces first since they are important do diagnose ANRs,
  // and ANRs can often be trimmed to limit upload size.
  thread_list_->DumpForSigQuit(os);
  GetClassLinker()->DumpForSigQuit(os);
  GetInternTable()->DumpForSigQuit(os);
  GetJavaVM()->DumpForSigQuit(os);
  GetHeap()->DumpForSigQuit(os);
  oat_file_manager_->DumpForSigQuit(os);
  if (GetJit() != nullptr) {
    GetJit()->DumpForSigQuit(os);
  } else {
    os << "Running non JIT\n";
  }
  DumpDeoptimizations(os);
  TrackedAllocators::Dump(os);
  GetMetrics()->DumpForSigQuit(os);
  os << "\n";

  BaseMutex::DumpAll(os);

  // Inform anyone else who is interested in SigQuit.
  {
    ScopedObjectAccess soa(Thread::Current());
    callbacks_->SigQuit();
  }
}

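An ANR dump prints a lot of information; see the relevant source code for details.

This completes the walk through the whole flow, from an ANR being triggered to its logs being written.

How to Analyze an ANR

Now that we know how an ANR works, let's see how to locate the cause once one has occurred.
As described above, an ANR is logged in two places: in Logcat, and in a trace file under /data/anr/.

Two scenarios are used below to reproduce an ANR: one caused by a long-running operation and one caused by a deadlock.

Scenario 1: ANR caused by a long-running operation

For simplicity, the main thread just sleeps for 10 s.

The Activity has a button; clicking it makes the main thread sleep for 10 s, as in the code below. This obviously triggers an ANR.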

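import android.os.Bundle
import android.os.SystemClock
import android.widget.Button
import androidx.appcompat.app.AppCompatActivity

class AnrTestActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_anr_test)
        // Block the main thread for 10 s on every click to provoke an ANR
        findViewById<Button>(R.id.button).setOnClickListener {
            SystemClock.sleep(10000)
        }
    }
}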

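Click the button twice in a row. The second touch cannot be handled while the first click is still sleeping on the main thread, so after 5 s the ANR dialog appears.
[Image: ANR dialog]
The Logcat output is as follows: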

2022-10-02 15:38:00.505 594-5381/system_process E/ActivityManager: ANR in com.devnn.demo (com.devnn.demo/.AnrTestActivity)
    PID: 5232
    Reason: Input dispatching timed out (f99e8bb com.devnn.demo/com.devnn.demo.AnrTestActivity (server) is not responding. Waited 5008ms for MotionEvent(deviceId=8, source=0x00005002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=22.8, yPrecision=12.8, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (804.9, 1173.9)]), policyFlags=0x62000000)
    Parent: com.devnn.demo/.AnrTestActivity
    Load: 0.05 / 0.01 / 0.0
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    ----- End output from /proc/pressure/memory -----
    
    CPU usage from 158257ms to 0ms ago (2022-10-02 15:35:18.256 to 2022-10-02 15:37:56.513):
      6.2% 279/[email protected]: 0.3% user + 5.9% kernel
      2.2% 292/[email protected]: 0% user + 2.1% kernel
      1.6% 594/system_server: 0.3% user + 1.3% kernel / faults: 1085 minor
      1.4% 300/[email protected]: 0% user + 1.4% kernel
      0.4% 277/android.hardware.audio.service.ranchu: 0% user + 0.4% kernel / faults: 10 minor
      0.2% 371/audioserver: 0% user + 0.2% kernel / faults: 4 minor
      0.2% 5232/com.devnn.demo: 0% user + 0.2% kernel / faults: 272 minor
      0.2% 318/surfaceflinger: 0% user + 0.2% kernel
      0% 16/ksoftirqd/1: 0% user + 0% kernel
      0% 365/adbd: 0% user + 0% kernel
      0% 477/llkd: 0% user + 0% kernel
      0% 872/[email protected]: 0% user + 0% kernel
      0% 10/rcu_preempt: 0% user + 0% kernel
      0% 2014/com.android.systemui: 0% user + 0% kernel / faults: 39 minor
      0% 9/ksoftirqd/0: 0% user + 0% kernel
      0% 1002/com.android.phone: 0% user + 0% kernel / faults: 100 minor
      0% 3645/kworker/0:2-events_power_efficient: 0% user + 0% kernel
      0% 157/logd: 0% user + 0% kernel
      0% 427/libgoldfish-rild: 0% user + 0% kernel / faults: 16 minor
      0% 3270/kworker/1:1-mm_percpu_wq: 0% user + 0% kernel
      0% 159/servicemanager: 0% user + 0% kernel
      0% 160/hwservicemanager: 0% user + 0% kernel
      0% 478/hostapd_nohidl: 0% user + 0% kernel
      0% 5346/kworker/u4:0-events_unbound: 0% user + 0% kernel
      0% 11/migration/0: 0% user + 0% kernel
      0% 15/migration/1: 0% user + 0% kernel
      0% 164/qemu-props: 0% user + 0% kernel
      0% 188/jbd2/dm-5-8: 0% user + 0% kernel
      0% 269/statsd: 0% user + 0% kernel
      0% 342/logcat: 0% user + 0% kernel
      0% 418/media.metrics: 0% user + 0% kernel / faults: 1 minor
      0% 442/[email protected]: 0% user + 0% kernel
      0% 761/wpa_supplicant: 0% user + 0% kernel
      0% 3615/logcat: 0% user + 0% kernel
      0% 5068/kworker/u4:1-phy0: 0% user + 0% kernel
    1.9% TOTAL: 0.1% user + 1.7% kernel + 0% softirq
    CPU usage from 20ms to 335ms later (2022-10-02 15:37:56.533 to 2022-10-02 15:37:56.848):
      22% 594/system_server: 15% user + 7.5% kernel / faults: 161 minor
        22% 5381/AnrConsumer: 7.5% user + 15% kernel
      6.9% 279/[email protected]: 0% user + 6.9% kernel
        6.9% 1215/[email protected]: 0% user + 6.9% kernel
      3.5% 292/[email protected]: 0% user + 3.5% kernel
    18% TOTAL: 8.6% user + 10% kernel

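Note that the system_process process must be selected in Logcat to see this output.

The Logcat output shows that the process with pid 5232 timed out while handling an input event. This log is printed by ProcessRecord.appNotResponding, analyzed above.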
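
When many such logs have to be triaged, the header fields can be pulled out mechanically. The following is a minimal sketch that relies only on the "ANR in", "PID:" and "Reason:" lines assembled in appNotResponding; AnrLogParser is a hypothetical helper, not part of any Android tooling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnrLogParser {
    private static final Pattern PROCESS = Pattern.compile("ANR in (\\S+)");
    private static final Pattern PID = Pattern.compile("(?m)^\\s*PID: (\\d+)");
    private static final Pattern REASON = Pattern.compile("(?m)^\\s*Reason: (.*)$");

    /** Returns the process name after "ANR in", or null if the line is missing. */
    static String processName(String log) {
        return first(PROCESS, log);
    }

    /** Returns the pid from the "PID:" line, or -1 if the line is missing. */
    static int pid(String log) {
        String value = first(PID, log);
        return value == null ? -1 : Integer.parseInt(value);
    }

    /** Returns the text after "Reason:", or null if the line is missing. */
    static String reason(String log) {
        return first(REASON, log);
    }

    private static String first(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String log = "E/ActivityManager: ANR in com.devnn.demo (com.devnn.demo/.AnrTestActivity)\n"
                + "    PID: 5232\n"
                + "    Reason: Input dispatching timed out (...)\n";
        System.out.println(processName(log) + " / " + pid(log) + " / " + reason(log));
    }
}
```

The reason text starts with the annotation passed to appNotResponding, so its prefix (for example "Input dispatching timed out") is a quick way to tell which of the four timeout types fired.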

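Next, look at the logs under /data/anr/.

Each trace file is the log of a single ANR. A new trace file is generated for every ANR, and the file is named after the time.
[Image: trace files under /data/anr]
The trace file is structured: overall, it is printed process by process.

Because an ANR is not necessarily caused by the app process itself and may be caused by a related process, the information for all relevant processes is written to the same file, roughly in the following structure.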

----- pid 5232 at 2022-10-02 15:37:56 -----
detailed log of process 5232
----- end 5232 -----

----- pid 594 at 2022-10-02 15:37:57 -----
detailed log of process 594
----- end 594 -----

----- pid xxx at xxxx-xx-xx xx:xx:xx -----
detailed log of process xxx
----- end xxx -----

The first process is the one in which the ANR occurred, usually the app process.
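
Since the file is delimited by these begin and end markers, it can be split into per-process sections before reading. The sketch below assumes exactly the marker format shown above; TraceSplitter is a hypothetical name, and if the same pid appears in more than one section, only the last section is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceSplitter {
    private static final Pattern BEGIN = Pattern.compile("----- pid (\\d+) at .* -----");
    private static final Pattern END = Pattern.compile("----- end (\\d+) -----");

    /** Maps each pid to the text between its begin and end markers, in file order. */
    static Map<Integer, String> split(String trace) {
        Map<Integer, String> sections = new LinkedHashMap<>();
        Integer current = null;
        StringBuilder body = new StringBuilder();
        for (String line : trace.split("\\R")) {
            Matcher begin = BEGIN.matcher(line);
            if (begin.matches()) {
                current = Integer.valueOf(begin.group(1));
                body.setLength(0);
            } else if (current != null && END.matcher(line).matches()) {
                sections.put(current, body.toString());
                current = null;
            } else if (current != null) {
                body.append(line).append('\n');
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String trace = "----- pid 5232 at 2022-10-02 15:37:56 -----\n"
                + "Cmd line: com.devnn.demo\n"
                + "----- end 5232 -----\n";
        System.out.println(split(trace).keySet());
    }
}
```

Because a LinkedHashMap preserves insertion order, the first key is the process that was dumped first, which is normally the process in which the ANR occurred.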

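The whole trace file is over 700 KB, too long to show in full, so only the main information for the app process is excerpted below.

Each process section begins with summary information, including the pid, the time of the ANR, and the process name.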

----- pid 5232 at 2022-10-02 15:37:56 -----
Cmd line: com.devnn.demo
Build fingerprint: 'Android/sdk_phone_x86_64/generic_x86_64:11/RSR1.210722.012/7758210:userdebug/test-keys'
ABI: 'x86_64'
Build type: optimized
Zygote loaded classes=15740 post zygote classes=1289
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes10.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes11.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes6.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes2.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes3.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes8.dex], parent #1
Done dumping class loaders
Classes initialized: 526 in 694.025ms
Intern table: 31792 strong; 523 weak
JNI: CheckJNI is on; globals=639 (plus 37 weak)
Libraries: libandroid.so libaudioeffect_jni.so libcompiler_rt.so libicu_jni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so librs_jni.so libsfplugin_ccodec.so libsoundpool.so libstats_jni.so libwebviewchromium_loader.so (14)
Heap: 46% free, 11MB/21MB; 75810 objects
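// part of the output is omitted here

The second part is the state and stack of every thread in the process, which is what we mainly need to look at: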


suspend all histogram:	Sum: 74.854ms 99% C.I. 0.005ms-43.315ms Avg: 3.742ms Max: 44.394ms
DALVIK THREADS (21):
"Signal Catcher" daemon prio=10 tid=4 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x12c40b10 self=0x7fada5a4af50
  | sysTid=5242 nice=-20 cgrp=top-app sched=0/0 handle=0x7fac275adcf0
  | state=R schedstat=( 21716542 2041235 2 ) utm=0 stm=2 core=0 HZ=100
  | stack=0x7fac274b6000-0x7fac274b8000 stackSize=995KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 000000000054da9e  /apex/com.android.art/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+126)
  native: #01 pc 000000000069615c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool, BacktraceMap*, bool) const+380)
  native: #02 pc 00000000006b7320  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+1088)
  native: #03 pc 00000000006b064d  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+557)
  native: #04 pc 00000000006af729  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+1817)
  native: #05 pc 00000000006aec28  /apex/com.android.art/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+824)
  native: #06 pc 00000000006470d9  /apex/com.android.art/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+201)
  native: #07 pc 000000000065ceb6  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1766)
  native: #08 pc 000000000065bc85  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::Run(void*)+357)
  native: #09 pc 00000000000c7d2a  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+58)
  native: #10 pc 000000000005f0c7  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+55)
  (no managed stack frames)

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5232 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 5775317077 4230577099 871 ) utm=286 stm=291 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:442)
  - locked <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:358)
  at android.os.SystemClock.sleep(SystemClock.java:131)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-0(AnrTestActivity.kt:17)
  at com.devnn.demo.AnrTestActivity.lambda$UpadNwrDNzrVyNaTI0ysWoH569M(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$UpadNwrDNzrVyNaTI0ysWoH569M.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)
  at android.view.View.access$3600(View.java:810)
  at android.view.View$PerformClick.run(View.java:28305)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:223)
  at android.app.ActivityThread.main(ActivityThread.java:7656)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)

"perfetto_hprof_listener" prio=10 tid=5 Native (still starting up)
  | group="" sCount=1 dsCount=0 flags=1 obj=0x0 self=0x7fada5a4cb20
  | sysTid=5243 nice=-20 cgrp=top-app sched=0/0 handle=0x7fac274afcf0
  | state=S schedstat=( 3314219 3983561 6 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7fac273b8000-0x7fac273ba000 stackSize=995KB
  | held mutexes=
  native: #00 pc 00000000000b1ec5  /apex/com.android.runtime/lib64/bionic/libc.so (read+5)
  native: #01 pc 000000000001cb70  /apex/com.android.art/lib64/libperfetto_hprof.so (void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ArtPlugin_Initialize::$_29> >(void*)+288)
  native: #02 pc 00000000000c7d2a  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+58)
  native: #03 pc 000000000005f0c7  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+55)
  (no managed stack frames)
  //... other threads omitted

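The first thread is the Signal Catcher daemon thread, which catches the SIGQUIT signal. Its presence in this dump also shows that the thread belongs to the app process. The second thread is our app's main thread: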

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5232 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 5775317077 4230577099 871 ) utm=286 stm=291 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:442)
  - locked <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:358)
  at android.os.SystemClock.sleep(SystemClock.java:131)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-0(AnrTestActivity.kt:17)
  at com.devnn.demo.AnrTestActivity.lambda$UpadNwrDNzrVyNaTI0ysWoH569M(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$UpadNwrDNzrVyNaTI0ysWoH569M.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)

The main thread is sleeping, which is why it cannot respond to input events.
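When reading a stack like this, the frame of interest is usually the topmost one that belongs to the app rather than to the framework. A minimal sketch of that lookup follows, assuming the caller knows the app's package prefix; `StackFrames` and `firstAppFrame` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Optional;

/** Picks the topmost app-owned frame out of a thread's stack lines (hypothetical helper). */
final class StackFrames {
    /**
     * Returns the first "at ..." frame whose class name starts with appPackage,
     * without the leading "at ", or empty if the stack has no such frame.
     */
    static Optional<String> firstAppFrame(List<String> stackLines, String appPackage) {
        String prefix = "at " + appPackage + ".";
        for (String raw : stackLines) {
            String line = raw.trim();
            if (line.startsWith(prefix)) {
                return Optional.of(line.substring("at ".length()));
            }
        }
        return Optional.empty();
    }
}
```

For the main-thread stack above and the package `com.devnn.demo`, this would return the `AnrTestActivity.kt:17` frame, which is the line that calls `SystemClock.sleep`.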

The first line of every thread entry has a fixed format:

"main" prio=5 tid=1 Sleeping

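The first field is the thread name, the second is its priority (prio), the third is the thread id (tid), and the fourth is the thread state. Daemon threads additionally carry the word daemon right after the name, as in the Signal Catcher entry above.

The key piece of information here is the thread state. It usually gives a rough idea of what caused the ANR. In this case the state is Sleeping, so the stack that follows tells us exactly where in the code the sleep happens.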
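The header format can be pulled apart with a single regular expression. This is a sketch under the assumption that headers always look like the samples in this article (`"name" [daemon] prio=N tid=N State ...`); `ThreadHeader` is a hypothetical class, not something the Android tooling provides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parsed form of a thread header line such as: "main" prio=5 tid=1 Sleeping */
final class ThreadHeader {
    final String name;
    final boolean daemon;
    final int priority;
    final int tid;
    final String state;

    private ThreadHeader(String name, boolean daemon, int priority, int tid, String state) {
        this.name = name;
        this.daemon = daemon;
        this.priority = priority;
        this.tid = tid;
        this.state = state;
    }

    // Quoted name, optional "daemon", prio, tid, then the state word.
    // Anything after the state (e.g. "(still starting up)") is ignored.
    private static final Pattern LINE = Pattern.compile(
            "^\"(.*)\"( daemon)? prio=(\\d+) tid=(\\d+) (\\S+).*$");

    /** Returns the parsed header, or null if the line is not a thread header. */
    static ThreadHeader parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new ThreadHeader(
                m.group(1),
                m.group(2) != null,
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                m.group(5));
    }
}
```

A quick first pass over a trace could parse every header and flag the entry named `main` whenever its state is something other than `Runnable` or `Native`.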

Next, let's look at an example of an ANR caused by a deadlock.

Scenario 2: ANR caused by a deadlock

    private fun clickTest() {
        val obj1 = Object()
        val obj2 = Object()

        Thread {
            synchronized(obj1) {
                Thread.sleep(100)
                // The worker thread already holds obj1's lock and now wants obj2's lock
                synchronized(obj2) {
                    Log.i("AnrTest", "sub")
                }
            }
        }.start()

        synchronized(obj2) {
            Thread.sleep(100)
            // The main thread already holds obj2's lock and now wants obj1's lock
            synchronized(obj1) {
                Log.i("AnrTest", "main")
            }
        }
    }

The Logcat output is shown below. It again reports that the app failed to respond to an input event.

2022-10-02 16:30:14.001 594-5956/system_process E/ActivityManager: ANR in com.devnn.demo (com.devnn.demo/.AnrTestActivity)
    PID: 5906
    Reason: Input dispatching timed out (1313584 com.devnn.demo/com.devnn.demo.AnrTestActivity (server) is not responding. Waited 5007ms for MotionEvent(deviceId=8, source=0x00005002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=22.8, yPrecision=12.8, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (721.0, 1641.9)]), policyFlags=0x62000000)
    Parent: com.devnn.demo/.AnrTestActivity
    Load: 0.8 / 0.67 / 0.39
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    ----- End output from /proc/pressure/memory -----
    
    CPU usage from 285508ms to 0ms ago (2022-10-02 16:25:25.779 to 2022-10-02 16:30:11.287):
      8.1% 279/[email protected]: 0.6% user + 7.4% kernel
      4.4% 292/[email protected]: 0.3% user + 4.1% kernel
      4.3% 594/system_server: 1.4% user + 2.8% kernel / faults: 19536 minor
      2.8% 318/surfaceflinger: 0.3% user + 2.4% kernel / faults: 871 minor
      2% 300/[email protected]: 0% user + 1.9% kernel
      0.5% 2014/com.android.systemui: 0% user + 0.5% kernel / faults: 4342 minor
      0.4% 365/adbd: 0% user + 0.4% kernel / faults: 946 minor
      0.2% 1152/com.android.launcher3: 0% user + 0.2% kernel / faults: 50 minor
      0.2% 157/logd: 0% user + 0.2% kernel / faults: 13 minor
      0.2% 277/android.hardware.audio.service.ranchu: 0% user + 0.1% kernel / faults: 5 minor
      0.2% 10/rcu_preempt: 0% user + 0.2% kernel
      0.1% 1002/com.android.phone: 0% user + 0% kernel / faults: 1267 minor

Logcat does not reveal the specific cause, so we need to look at the trace file.

----- pid 5906 at 2022-10-02 16:30:11 -----
Cmd line: com.devnn.demo
Build fingerprint: 'Android/sdk_phone_x86_64/generic_x86_64:11/RSR1.210722.012/7758210:userdebug/test-keys'
ABI: 'x86_64'
Build type: optimized

... irrelevant content omitted


"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5906 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 2792813804 2053378730 782 ) utm=161 stm=117 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at com.devnn.demo.AnrTestActivity.clickTest(AnrTestActivity.kt:48)
  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
  - locked <0x0188dfbd> (a java.lang.Object)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-1(AnrTestActivity.kt:21)
  at com.devnn.demo.AnrTestActivity.lambda$W1-GSjdjbC-dtyUoueoTRdjL4Es(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$W1-GSjdjbC-dtyUoueoTRdjL4Es.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)
  at android.view.View.access$3600(View.java:810)
  at android.view.View$PerformClick.run(View.java:28305)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:223)
  at android.app.ActivityThread.main(ActivityThread.java:7656)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)

The main thread's state is Blocked.

  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
  - locked <0x0188dfbd> (a java.lang.Object)

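The stack shows that the main thread is trying to acquire the lock on object 0x026f6b14, which is held by thread 2. At the same time, the main thread is holding the lock on object 0x0188dfbd.

Now look at the stack of thread 2: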

"Thread-5" prio=5 tid=2 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12db7fc0 self=0x7fada5a55630
  | sysTid=5953 nice=0 cgrp=top-app sched=0/0 handle=0x7fabdc49fcf0
  | state=S schedstat=( 1560220 17477159 3 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7fabdc39c000-0x7fabdc39e000 stackSize=1043KB
  | held mutexes=
  at com.devnn.demo.AnrTestActivity.clickTest$lambda-4(AnrTestActivity.kt:39)
  - waiting to lock <0x0188dfbd> (a java.lang.Object) held by thread 1
  - locked <0x026f6b14> (a java.lang.Object)
  at com.devnn.demo.AnrTestActivity.lambda$A4lEoLZVf4n-xUBZSqj2v3ihIqw(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$A4lEoLZVf4n-xUBZSqj2v3ihIqw.run(lambda:-1)
  at java.lang.Thread.run(Thread.java:923)

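Thread 2 is also Blocked. It is waiting for the lock on object 0x0188dfbd, which is held by thread 1, while itself holding the lock on object 0x026f6b14.

Each thread holds the lock the other one needs, so neither can proceed. This is an ANR caused by a deadlock.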
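The manual check above (follow each "held by thread N" reference until it loops back) can be expressed as a small wait-for graph walk. The sketch below assumes the trace lines look exactly like the excerpts in this article; `DeadlockFinder` and its method names are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Builds a wait-for graph from trace lines and looks for a cycle (hypothetical helper). */
final class DeadlockFinder {
    // Thread header, e.g. "main" prio=5 tid=1 Blocked
    private static final Pattern HEADER = Pattern.compile("^\".*\".* tid=(\\d+) .*$");
    // e.g. - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
    private static final Pattern WAITING =
            Pattern.compile("^- waiting to lock <0x[0-9a-f]+> .* held by thread (\\d+)$");

    /** Maps each blocked thread's tid to the tid of the thread holding the lock it wants. */
    static Map<Integer, Integer> waitsFor(List<String> traceLines) {
        Map<Integer, Integer> edges = new HashMap<>();
        int currentTid = -1;
        for (String raw : traceLines) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            Matcher waiting = WAITING.matcher(line);
            if (header.matches()) {
                currentTid = Integer.parseInt(header.group(1));
            } else if (waiting.matches() && currentTid != -1) {
                edges.put(currentTid, Integer.parseInt(waiting.group(1)));
            }
        }
        return edges;
    }

    /** Follows wait-for edges from start; returns the cycle through start, or an empty list. */
    static List<Integer> cycleFrom(int start, Map<Integer, Integer> edges) {
        List<Integer> path = new ArrayList<>();
        Integer current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = edges.get(current);
        }
        // The walk found a deadlock involving start only if it came back to start.
        return (current != null && current == start) ? path : new ArrayList<>();
    }
}
```

Starting the walk at tid 1 (the main thread) is usually enough for ANR analysis, because only a cycle that includes the main thread explains why input events are not being handled.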

Thread states in the trace file

When reading a trace file, you will see threads in many different states:

"Signal Catcher" daemon prio=10 tid=4 Runnable
"RenderThread" daemon prio=7 tid=21 Native
"DefaultDispatcher-worker-1" daemon prio=5 tid=22 TimedWaiting
"main" prio=5 tid=1 Blocked
"main" prio=5 tid=1 Sleeping
"main" prio=5 tid=1 MONITOR

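These are the main states. Several of them are already defined in the Thread class, but what do Native and MONITOR mean?

As a reminder, these are the thread states defined in the Thread class: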

//java.lang.Thread
public class Thread implements Runnable {
    public enum State {
        /**
         * Thread state for a thread which has not yet started.
         */
        NEW,

        /**
         * Thread state for a runnable thread.  A thread in the runnable
         * state is executing in the Java virtual machine but it may
         * be waiting for other resources from the operating system
         * such as processor.
         */
        RUNNABLE,

        /**
         * Thread state for a thread blocked waiting for a monitor lock.
         * A thread in the blocked state is waiting for a monitor lock
         * to enter a synchronized block/method or
         * reenter a synchronized block/method after calling
         * {@link Object#wait() Object.wait}.
         */
        BLOCKED,

        /**
         * Thread state for a waiting thread.
         * A thread is in the waiting state due to calling one of the
         * following methods:
         * <ul>
         *   <li>{@link Object#wait() Object.wait} with no timeout</li>
         *   <li>{@link #join() Thread.join} with no timeout</li>
         *   <li>{@link LockSupport#park() LockSupport.park}</li>
         * </ul>
         *
         * <p>A thread in the waiting state is waiting for another thread to
         * perform a particular action.
         *
         * For example, a thread that has called <tt>Object.wait()</tt>
         * on an object is waiting for another thread to call
         * <tt>Object.notify()</tt> or <tt>Object.notifyAll()</tt> on
         * that object. A thread that has called <tt>Thread.join()</tt>
         * is waiting for a specified thread to terminate.
         */
        WAITING,

        /**
         * Thread state for a waiting thread with a specified waiting time.
         * A thread is in the timed waiting state due to calling one of
         * the following methods with a specified positive waiting time:
         * <ul>
         *   <li>{@link #sleep Thread.sleep}</li>
         *   <li>{@link Object#wait(long) Object.wait} with timeout</li>
         *   <li>{@link #join(long) Thread.join} with timeout</li>
         *   <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
         *   <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
         * </ul>
         */
        TIMED_WAITING,

        /**
         * Thread state for a terminated thread.
         * The thread has completed execution.
         */
        TERMINATED;
    }
}

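VMThread, a class from the Dalvik-era runtime sources, contains the mapping between the native states and the Java ones: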

//VMThread.java
    /**
     * Holds a mapping from native Thread statuses to Java one. Required for
     * translating back the result of getStatus().
     */
    static final Thread.State[] STATE_MAP = new Thread.State[] {
        Thread.State.TERMINATED,     // ZOMBIE
        Thread.State.RUNNABLE,       // RUNNING
        Thread.State.TIMED_WAITING,  // TIMED_WAIT
        Thread.State.BLOCKED,        // MONITOR
        Thread.State.WAITING,        // WAIT
        Thread.State.NEW,            // INITIALIZING
        Thread.State.NEW,            // STARTING
        Thread.State.RUNNABLE,       // NATIVE
        Thread.State.WAITING,        // VMWAIT
        Thread.State.RUNNABLE        // SUSPENDED
    };

So NATIVE corresponds to RUNNABLE, and MONITOR corresponds to BLOCKED.
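The table above can be turned into a lookup by name. The sketch below copies only the ten entries from the quoted STATE_MAP; `NativeStateMap` and `toJavaState` are hypothetical names. State names that appear in the trace excerpts earlier but not in this table, such as Blocked or Sleeping, are deliberately left unmapped and yield null.

```java
/** Name-based lookup over the STATE_MAP table quoted above (hypothetical helper). */
final class NativeStateMap {
    // Native state names, in the same order as the comments in STATE_MAP.
    private static final String[] NATIVE_NAMES = {
        "ZOMBIE", "RUNNING", "TIMED_WAIT", "MONITOR", "WAIT",
        "INITIALIZING", "STARTING", "NATIVE", "VMWAIT", "SUSPENDED"
    };

    // Java-level states, copied entry by entry from STATE_MAP.
    private static final Thread.State[] STATE_MAP = {
        Thread.State.TERMINATED,     // ZOMBIE
        Thread.State.RUNNABLE,       // RUNNING
        Thread.State.TIMED_WAITING,  // TIMED_WAIT
        Thread.State.BLOCKED,        // MONITOR
        Thread.State.WAITING,        // WAIT
        Thread.State.NEW,            // INITIALIZING
        Thread.State.NEW,            // STARTING
        Thread.State.RUNNABLE,       // NATIVE
        Thread.State.WAITING,        // VMWAIT
        Thread.State.RUNNABLE        // SUSPENDED
    };

    /** Returns the Java state for a native state name (case-insensitive), or null if unknown. */
    static Thread.State toJavaState(String nativeName) {
        for (int i = 0; i < NATIVE_NAMES.length; i++) {
            if (NATIVE_NAMES[i].equalsIgnoreCase(nativeName)) {
                return STATE_MAP[i];
            }
        }
        return null;
    }
}
```

The practical takeaway is that a main thread shown as Native is, from Java's point of view, still RUNNABLE: it is executing or waiting inside native code rather than blocked on a Java monitor.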

That wraps up this introduction to how ANRs are produced and how to analyze them.


Reposted from blog.csdn.net/devnn/article/details/127138547