How the Android SSWD (System Server Watchdog) Works

Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission. https://blog.csdn.net/abm1993/article/details/82107801

Introduction

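I kept wondering how best to introduce Watchdog, and in the end the source comment says it most directly: “This class calls its monitor every minute. Killing this process if they don't return”. Blunt and to the point. To keep the system stable, Android uses this Watchdog to monitor a set of core services and threads. When one of them misbehaves or blocks, the Watchdog kills the system_server process so that the framework restarts, and it saves the state at the time of the problem. There are two kinds of watchdog: the hardware watchdog, which checks the hardware, and the system server watchdog, which checks the critical services and threads inside system_server (abbreviated to SSWD below). This article walks through how the latter works, based on the Android P code.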

What does the SSWD monitor?

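Using gdb to pull the mHandlerCheckers collection of the system watchdog thread out of a coredump shows which services and threads the SSWD monitors.
Critical system threads monitored by the Watchdog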

[000] = 0x184cbaa0 Lcom/android/server/Watchdog$HandlerChecker; foreground thread
[001] = 0x184cbf70 Lcom/android/server/Watchdog$HandlerChecker; main thread
[002] = 0x184cbfa0 Lcom/android/server/Watchdog$HandlerChecker; ui thread
[003] = 0x184cbfd0 Lcom/android/server/Watchdog$HandlerChecker; i/o thread
[004] = 0x184cc000 Lcom/android/server/Watchdog$HandlerChecker; display thread
[005] = 0x184cc030 Lcom/android/server/Watchdog$HandlerChecker; ActivityManager
[006] = 0x184cc060 Lcom/android/server/Watchdog$HandlerChecker; PowerManagerService
[007] = 0x184cc090 Lcom/android/server/Watchdog$HandlerChecker; main // same as main thread
[008] = 0x184cc0c0 Lcom/android/server/Watchdog$HandlerChecker; PackageManager
[009] = 0x184cc0f0 Lcom/android/server/Watchdog$HandlerChecker; PackageManager // same as above

fg->mMonitors: core services monitored for deadlock

[000] = 0x184cbf30 Lcom/android/server/Watchdog$BinderThreadMonitor;
[001] = 0x15b00a80 Lcom/android/server/am/ActivityManagerService;
[002] = 0x15b1f770 Lcom/android/server/power/PowerManagerService;
[003] = 0x172759f0 Lcom/sonymobile/server/mirrorpowersave/LcdPowerSaveService;
[004] = 0x15b02220 Lcom/android/server/wm/WindowManagerService;
[005] = 0x15e4ee58 Lcom/android/server/input/InputManagerService;
[006] = 0x15e78220 Lcom/android/server/NetworkManagementService;
[007] = 0x18028bf8 Lcom/android/server/media/MediaSessionService;
[008] = 0x1726a8b0 Lcom/android/server/media/MediaRouterService;
[009] = 0x13f0d010 Lcom/android/server/media/projection/MediaProjectionManagerService;
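
To show how a service can end up in that mMonitors list, here is a minimal plain-Java sketch. It assumes the monitor callback simply acquires and releases the service's own lock, which matches the watchdog trace later in this article where ActivityManagerService.monitor() is waiting to lock the service object. The Monitor interface, DemoService and MonitorDemo are hypothetical stand-ins written for this illustration, not the framework classes.

```java
/** Hypothetical stand-in for the watchdog's monitor callback. */
interface Monitor {
    void monitor();
}

/** A service that guards its state with its own intrinsic lock. */
final class DemoService implements Monitor {
    @Override
    public void monitor() {
        // Acquire and immediately release the service lock. This only
        // returns once no other thread is holding that lock.
        synchronized (this) { }
    }
}

final class MonitorDemo {
    /** Runs m.monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean returnsWithin(Monitor m, long timeoutMs) {
        Thread t = new Thread(m::monitor, "fg-checker");
        t.setDaemon(true); // do not keep the JVM alive if the thread stays blocked
        t.start();
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        DemoService service = new DemoService();
        // Lock is free: the monitor call comes back quickly.
        System.out.println("lock free: " + returnsWithin(service, 1000));
        synchronized (service) {
            // Lock is held by this thread: the monitor call cannot return in time.
            System.out.println("lock held: " + returnsWithin(service, 200));
        }
    }
}
```

The point of the empty synchronized block is that it does no work of its own; how long it takes is purely a measure of how long someone else has been holding the lock.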

How the SSWD works

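The check timeout is 60s. Each monitored service or thread is in one of four states, and the watchdog loop re-evaluates those states on every pass:

  • COMPLETED: healthy, nothing is blocked
  • WAITING: the check has been pending for less than 30s
  • WAITED_HALF: the check has been pending for more than 30s but less than 60s; at this point stack traces are dumped and CPU usage is sampled
  • OVERDUE: timed out; the state at the time of the timeout is saved and the restart is carried out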
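
To make the four states concrete, here is a small sketch of how one checker's state can be derived from how long its check has been pending. The per-checker rule follows getCompletionStateLocked() shown further down. The numeric values of the constants and the worstOf() aggregation (the worst state among all checkers wins) are my assumptions for illustration, since the article does not show them; WatchdogStates is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the four-state classification; constants are assumed to be ordered from healthy to worst. */
final class WatchdogStates {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of a single checker, given how long its check has been pending. */
    static int stateOf(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMs < waitMaxMs / 2) {
            return WAITING;       // still inside the first half of the window
        }
        if (latencyMs < waitMaxMs) {
            return WAITED_HALF;   // inside the second half of the window
        }
        return OVERDUE;           // the whole window has been used up
    }

    /** Assumed aggregation: the worst state among all checkers decides. */
    static int worstOf(List<Integer> states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }

    public static void main(String[] args) {
        long waitMax = 60_000L; // the 60s timeout from the text
        System.out.println(stateOf(true, 45_000L, waitMax));   // 0 (COMPLETED)
        System.out.println(stateOf(false, 10_000L, waitMax));  // 1 (WAITING)
        System.out.println(stateOf(false, 45_000L, waitMax));  // 2 (WAITED_HALF)
        System.out.println(stateOf(false, 60_000L, waitMax));  // 3 (OVERDUE)
        System.out.println(worstOf(Arrays.asList(0, 2, 1)));   // 2
    }
}
```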

Core code walkthrough

Detection algorithm

    @Override
    public void run() {
        boolean waitedHalf = false;
        File initialStack = null;
        final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
        processCpuTracker.init();
        while (true) {
            final List<HandlerChecker> blockedCheckers; // checkers (services/threads) found to be blocked
            final String subject;
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            synchronized (this) {
                long timeout = CHECK_INTERVAL; // sets the check frequency and keeps power use down
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked(); // kick off a check
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis(); // record the start time
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        wait(timeout); // wait for up to 30s
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                } // 30s have elapsed, carry on

                boolean fdLimitTriggered = false;
                if (mOpenFdMonitor != null) {
                    fdLimitTriggered = mOpenFdMonitor.monitor();
                }

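                // Core of the detection algorithm.
                // The 60s window is split into a first and a second 30s half; there are four possible results.
                if (!fdLimitTriggered) {
                    final int waitState = evaluateCheckerCompletionLocked(); // get the current check state
                    if (waitState == COMPLETED) { // healthy, go on to the next round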
                        // The monitors have returned; reset
                        waitedHalf = false;
                        continue;
                    } else if (waitState == WAITING) { // check still in progress
                        // still waiting but within their configured intervals; back off and recheck
                        continue;
                    } else if (waitState == WAITED_HALF) { // waited more than 30s
                        if (!waitedHalf) { // first dump stack traces and sample CPU usage
                            // We've waited half the deadlock-detection interval.  Pull a stack
                            // trace and wait another half.
                            ArrayList<Integer> pids = new ArrayList<Integer>();
                            pids.add(Process.myPid());
                            initialStack = ActivityManagerService.dumpStackTraces(true, pids,
                                    null, null, getInterestingNativePids());
                            waitedHalf = true;
                            processCpuTracker.update();
                        }
                        continue;
                    }

                    // something is overdue! Timed out: collect the blocked services and threads
                    blockedCheckers = getBlockedCheckersLocked();
                    subject = describeCheckersLocked(blockedCheckers);
                } else {
                    blockedCheckers = Collections.emptyList();
                    subject = "Open FD high water mark reached";
                }
                allowRestart = mAllowRestart;
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                Slog.w(TAG, "*** GOODBYE!");

                // Check if we should do system dump or not
                if (errorHandlingInfo.mSystemDump) {
                    mActivity.forceCrashDump(errorHandlingInfo);
                }

                Process.killProcess(Process.myPid());
                System.exit(10); // the system restarts
            }

            waitedHalf = false;
        }
    }
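
The inner wait loop in run() is worth a second look: wait(timeout) can return early, so the remaining time is recomputed from a monotonic clock until the full interval has passed. Below is a minimal plain-Java sketch of that pattern. System.nanoTime() stands in for SystemClock.uptimeMillis() (which, per the source comment, also stops counting while the device is asleep), and IntervalWait is a hypothetical name used only for this illustration.

```java
/** Sketch of a wait that always lasts the full interval, like the loop in run(). */
final class IntervalWait {
    /** Waits on lock until intervalMs has elapsed; returns the elapsed milliseconds. */
    static long waitFullInterval(Object lock, long intervalMs) {
        final long start = System.nanoTime(); // monotonic clock
        long remaining = intervalMs;
        synchronized (lock) {
            while (remaining > 0) {
                try {
                    lock.wait(remaining); // may return early (notify, spurious wake-up)
                } catch (InterruptedException e) {
                    // Like the original loop, note the interruption and keep waiting.
                }
                long elapsed = (System.nanoTime() - start) / 1_000_000L;
                remaining = intervalMs - elapsed; // recompute instead of trusting wait()
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        final Object lock = new Object();
        // A helper thread tries to wake the waiter early; the loop goes back to waiting.
        Thread waker = new Thread(() -> {
            synchronized (lock) {
                lock.notifyAll();
            }
        });
        waker.start();
        System.out.println("waited ms: " + waitFullInterval(lock, 100L));
    }
}
```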

The key checker class: HandlerChecker

    public final class HandlerChecker implements Runnable {
        private final Handler mHandler; // Handler of the thread being checked
        private final String mName;
        private final long mWaitMax; // maximum wait time, 60s
        private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>(); // only populated in the HandlerChecker of the foreground thread; holds the core system services that are checked for deadlock
        private boolean mCompleted; // whether the current check has completed
        private Monitor mCurrentMonitor; // the service currently being checked
        private long mStartTime; // start time of the current 60s check

        HandlerChecker(Handler handler, String name, long waitMaxMillis) {
            mHandler = handler;
            mName = name;
            mWaitMax = waitMaxMillis;
            mCompleted = true;
        }

        public void addMonitor(Monitor monitor) {
            mMonitors.add(monitor);
        }

        public void scheduleCheckLocked() {
            if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
            }

            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis(); // record when this check started
            mHandler.postAtFrontOfQueue(this); // post a message at the head of the target thread's MessageQueue
        }

        public boolean isOverdueLocked() { // has this check timed out?
            // mCompleted == false and more than 60s have passed without the check completing
            return (!mCompleted) && (SystemClock.uptimeMillis() > mStartTime + mWaitMax);
        }

        public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

        public Thread getThread() {
            return mHandler.getLooper().getThread();
        }

        public String getName() {
            return mName;
        }

        public String describeBlockedStateLocked() {
            if (mCurrentMonitor == null) {
                return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
            } else {
                return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                        + " on " + mName + " (" + getThread().getName() + ")";
            }
        }

        @Override
        public void run() {
            // phase 1: check for deadlock
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor(); // try to acquire the lock of each service
            }

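            // phase 2: reaching this point covers two cases
            // case 1: mMonitors.size() == 0, the check is whether the thread's Looper/MessageQueue (i.e. the thread itself) is blocked
            // case 2: mMonitors.size() != 0, the check is for deadlock: whether a lock in one of the services has been held for a long time without being released
            // Getting here means no lock is being held for too long and the thread is not blocked: the message posted by the check has run, so the queue is not stuck.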
            synchronized (Watchdog.this) {
                mCompleted = true; // mark the check as completed
                mCurrentMonitor = null; // clear the current-monitor record
            }
        }
    }
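
The "blocked in handler" case can be reproduced outside Android with a rough plain-Java simulation. In this sketch a single-thread executor plays the role of the monitored Looper thread, and the checker is just a task that flips a flag when it finally gets to run. MiniHandlerChecker and HandlerCheckDemo are hypothetical names, and the executor is only an approximation: it queues the check in FIFO order, whereas the real code uses postAtFrontOfQueue. In both cases the check still has to wait for whatever the thread is currently executing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Plain-Java stand-in for HandlerChecker; an executor plays the Handler. */
final class MiniHandlerChecker implements Runnable {
    private final ExecutorService target; // hypothetical substitute for mHandler
    private boolean completed = true;

    MiniHandlerChecker(ExecutorService target) {
        this.target = target;
    }

    synchronized void scheduleCheck() {
        if (!completed) {
            return; // a check is already in flight
        }
        completed = false;
        target.execute(this); // runs only when the target thread gets to it
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    @Override
    public synchronized void run() {
        completed = true; // reaching this point proves the thread is not stuck
    }
}

final class HandlerCheckDemo {
    /** Returns {completed while the thread was stuck, completed after it was freed}. */
    static boolean[] runDemo() {
        ExecutorService looperThread = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // A "message" that blocks the thread until we release it.
            looperThread.execute(() -> {
                try {
                    release.await(10, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            MiniHandlerChecker checker = new MiniHandlerChecker(looperThread);
            checker.scheduleCheck();
            Thread.sleep(200); // give the check a chance it cannot use
            boolean whileStuck = checker.isCompleted();
            release.countDown(); // unblock the thread
            looperThread.shutdown();
            looperThread.awaitTermination(5, TimeUnit.SECONDS);
            return new boolean[] {whileStuck, checker.isCompleted()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            looperThread.shutdownNow();
            return new boolean[] {false, false};
        }
    }

    public static void main(String[] args) {
        boolean[] result = runDemo();
        System.out.println("completed while stuck: " + result[0]);
        System.out.println("completed after release: " + result[1]);
    }
}
```

While the thread is stuck the flag stays false, which is exactly the condition the watchdog times against mWaitMax; once the thread is free again the queued check runs and the flag flips back.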

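Summary

Once we understand how the SSWD works, we can analyse SSWD problems that turn up in day-to-day development and testing much more accurately. From the analysis above there are two kinds of SSWD problem: a deadlock, and a blocked thread. Sometimes what gets reported looks like a deadlock or a blocked thread but is really caused by something else. Here is an SSWD problem I ran into recently during development and testing.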
system.log

2018-08-21 11:56:47.867  1602  2239 W Watchdog: dropbox thread timeout!
2018-08-21 11:56:47.867  1602  2239 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on ActivityManager (ActivityManager), Blocked in handler on PowerManagerService (PowerManagerService), Blocked in handler on main (main)
PROBABLE CAUSE OF PROBLEM:
system server watchdog

crash-arm64> adblog | grep Watchdog
2018-08-21 11:56:30.863   1602  2239 W Watchdog: CPU usage from 30187ms to 44ms ago (2018-08-21 13:56:00.676 to 2018-08-21 13:56:30.820):
2018-08-21 11:56:37.524   1602  2239 I Watchdog: Collecting Binder Transaction Status Information
2018-08-21 11:56:39.745   1602  2239 E Watchdog: First set of traces taken from /data/anr/anr_2018-08-21-13-55-53-415
2018-08-21 11:56:39.808   1602  2239 E Watchdog: Second set of traces taken from /data/anr/anr_2018-08-21-13-56-30-968
2018-08-21 11:56:47.867   1602  2239 W Watchdog: dropbox thread timeout!
2018-08-21 11:56:47.867   1602  2239 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on ActivityManager (ActivityManager), Blocked in handler on PowerManagerService (PowerManagerService), Blocked in handler on main (main)
2018-08-21 11:56:47.868   1602  2239 W Watchdog: android.fg annotated stack trace:
2018-08-21 11:56:47.868   1602  2239 W Watchdog:     at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:26669)
2018-08-21 11:56:47.869   1602  2239 W Watchdog:     - waiting to lock <0x0966967f> (a com.android.server.am.ActivityManagerService)
2018-08-21 11:56:47.869   1602  2239 W Watchdog:     at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:218)
2018-08-21 11:56:47.869   1602  2239 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:873)
2018-08-21 11:56:47.869   1602  2239 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:99)
2018-08-21 11:56:47.869   1602  2239 W Watchdog:     at android.os.Looper.loop(Looper.java:285)
2018-08-21 11:56:47.869   1602  2239 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:65)
2018-08-21 11:56:47.869   1602  2239 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:44)
2018-08-21 11:56:47.871   1602  2239 W Watchdog: main annotated stack trace:
2018-08-21 11:56:47.871   1602  2239 W Watchdog:     at com.android.server.am.ActivityManagerService.getIntentSender(ActivityManagerService.java:8683)
2018-08-21 11:56:47.871   1602  2239 W Watchdog:     - waiting to lock <0x0966967f> (a com.android.server.am.ActivityManagerService)
2018-08-21 11:56:47.871   1602  2239 W Watchdog:     at android.app.PendingIntent.getBroadcastAsUser(PendingIntent.java:568)
2018-08-21 11:56:47.871   1602  2239 W Watchdog:     at android.app.PendingIntent.getBroadcast(PendingIntent.java:552)
2018-08-21 11:56:47.871   1602  2239 W Watchdog:     at com.android.server.notification.NotificationManagerService.scheduleTimeoutLocked(NotificationManagerService.java:4861)
2018-08-21 11:56:47.872   1602  2239 W Watchdog:     at com.android.server.notification.NotificationManagerService$EnqueueNotificationRunnable.run(NotificationManagerService.java:4486)
2018-08-21 11:56:47.872   1602  2239 W Watchdog:     - locked <0x0081a5f9> (a java.lang.Object)
2018-08-21 11:56:47.872   1602  2239 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:873)
2018-08-21 11:56:47.872   1602  2239 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:99)
2018-08-21 11:56:47.872   1602  2239 W Watchdog:     at android.os.Looper.loop(Looper.java:285)
2018-08-21 11:56:47.872   1602  2239 W Watchdog:     at com.android.server.SystemServer.run(SystemServer.java:471)
2018-08-21 11:56:47.872   1602  2239 W Watchdog:     at com.android.server.SystemServer.main(SystemServer.java:311)
2018-08-21 11:56:47.873   1602  2239 W Watchdog:     at java.lang.reflect.Method.invoke(Native Method)
2018-08-21 11:56:47.873   1602  2239 W Watchdog:     at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
2018-08-21 11:56:47.873   1602  2239 W Watchdog:     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:838)
2018-08-21 11:56:47.877   1602  2239 W Watchdog: ActivityManager annotated stack trace:
2018-08-21 11:56:47.878   1602  2239 W Watchdog:     at com.android.server.am.ActivityManagerService$MainHandler.handleMessage(ActivityManagerService.java:2335)
2018-08-21 11:56:47.878   1602  2239 W Watchdog:     - waiting to lock <0x0966967f> (a com.android.server.am.ActivityManagerService)
2018-08-21 11:56:47.878   1602  2239 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:106)
2018-08-21 11:56:47.878   1602  2239 W Watchdog:     at android.os.Looper.loop(Looper.java:285)
2018-08-21 11:56:47.878   1602  2239 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:65)
2018-08-21 11:56:47.878   1602  2239 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:44)
2018-08-21 11:56:47.879   1602  2239 W Watchdog: PowerManagerService annotated stack trace:
2018-08-21 11:56:47.879   1602  2239 W Watchdog:     at com.android.server.am.ActivityManagerService.startService(ActivityManagerService.java:20916)
2018-08-21 11:56:47.879   1602  2239 W Watchdog:     - waiting to lock <0x0966967f> (a com.android.server.am.ActivityManagerService)
2018-08-21 11:56:47.879   1602  2239 W Watchdog:     at android.app.ContextImpl.startServiceCommon(ContextImpl.java:1562)
2018-08-21 11:56:47.879   1602  2239 W Watchdog:     at android.app.ContextImpl.startServiceAsUser(ContextImpl.java:1549)
2018-08-21 11:56:47.879   1602  2239 W Watchdog:     at com.android.server.power.PowerManagerService.handleSendIntentToIntelligentBacklight(PowerManagerService.java:2516)
2018-08-21 11:56:47.880   1602  2239 W Watchdog:     at com.android.server.power.PowerManagerService.access$3300(PowerManagerService.java:135)
2018-08-21 11:56:47.880   1602  2239 W Watchdog:     at com.android.server.power.PowerManagerService$PowerManagerHandler.handleMessage(PowerManagerService.java:4111)
2018-08-21 11:56:47.880   1602  2239 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:106)
2018-08-21 11:56:47.880   1602  2239 W Watchdog:     at android.os.Looper.loop(Looper.java:285)
2018-08-21 11:56:47.880   1602  2239 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:65)
2018-08-21 11:56:47.880   1602  2239 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:44)
2018-08-21 11:56:47.881   1602  2239 W Watchdog: main annotated stack trace:
2018-08-21 11:56:47.882   1602  2239 W Watchdog:     at com.android.server.am.ActivityManagerService.getIntentSender(ActivityManagerService.java:8683)
2018-08-21 11:56:47.882   1602  2239 W Watchdog:     - waiting to lock <0x0966967f> (a com.android.server.am.ActivityManagerService)
2018-08-21 11:56:47.882   1602  2239 W Watchdog:     at android.app.PendingIntent.getBroadcastAsUser(PendingIntent.java:568)
2018-08-21 11:56:47.882   1602  2239 W Watchdog:     at android.app.PendingIntent.getBroadcast(PendingIntent.java:552)
2018-08-21 11:56:47.882   1602  2239 W Watchdog:     at com.android.server.notification.NotificationManagerService.scheduleTimeoutLocked(NotificationManagerService.java:4861)
2018-08-21 11:56:47.882   1602  2239 W Watchdog:     at com.android.server.notification.NotificationManagerService$EnqueueNotificationRunnable.run(NotificationManagerService.java:4486)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     - locked <0x0081a5f9> (a java.lang.Object)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:873)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:99)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at android.os.Looper.loop(Looper.java:285)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at com.android.server.SystemServer.run(SystemServer.java:471)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at com.android.server.SystemServer.main(SystemServer.java:311)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at java.lang.reflect.Method.invoke(Native Method)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
2018-08-21 11:56:47.883   1602  2239 W Watchdog:     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:838)
2018-08-21 11:56:47.883   1602  2239 W Watchdog: *** GOODBYE!

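At first glance this looks like a deadlock combined with threads stuck in long-running operations. Let's keep digging and look at the state of every thread in the system server process.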

(gdb-arm) art info threads 
thread list len = 144
   1 "main" thin_tid=1 sysTid=1602 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x7473bb98 self: ('art::Thread'*)0x78fc814c00
        - waiting to lock <0x13e426b8> (a ('art::Thread'*)0x78fc85e800
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
  21 "PowerManagerService" thin_tid=22 sysTid=1678 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e46558 self: ('art::Thread'*)0x78f2778000
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
        - waiting to lock <0x163cd090> (a Ljava/lang/Object;) held by tid=2078 (WifiScanningService)
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
  50 "WifiScanningService" thin_tid=54 sysTid=2078 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e4b998 self: ('art::Thread'*)0x78df979800
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
  59 "DeviceStorageMonitorService" thin_tid=63 sysTid=2091 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e4e938 self: ('art::Thread'*)0x78d7a8b400
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
  60 "AudioService" thin_tid=64 sysTid=2092 kNative
      | group="N/A" sCount=0 dsCount=0 obj=0x13e4eba0 self: ('art::Thread'*)0x78d7a8c000
  61 "Binder:1602_3" thin_tid=65 sysTid=2095 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e4eeb0 self: ('art::Thread'*)0x78d7a90800
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
  
  98 "Binder:1602_7" thin_tid=86 sysTid=2685 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e51918 self: ('art::Thread'*)0x78f5d41800
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
  99 "Binder:1602_8" thin_tid=103 sysTid=2734 kNative
      | group="N/A" sCount=0 dsCount=0 obj=0x13e519d0 self: ('art::Thread'*)0x78ccc72400
 
 105 "Binder:1602_A" thin_tid=106 sysTid=5644 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e51f98 self: ('art::Thread'*)0x78ce693800
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 108 "Binder:1602_D" thin_tid=110 sysTid=5664 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e521d8 self: ('art::Thread'*)0x78c32da800
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 109 "Binder:1602_E" thin_tid=101 sysTid=5666 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e52268 self: ('art::Thread'*)0x78d7be1000
        - waiting to lock <0x163cd090> (a Ljava/lang/Object;) held by tid=2078 (WifiScanningService)
 111 "Binder:1602_10" thin_tid=113 sysTid=6152 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e526a0 self: ('art::Thread'*)0x78f5c31000
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 115 "Binder:1602_14" thin_tid=117 sysTid=6158 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e533a0 self: ('art::Thread'*)0x78d4c10c00
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
('art::Thread'*)0x78c32bac00
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 136 "Binder:1602_1B" thin_tid=138 sysTid=14505 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e55f98 self: ('art::Thread'*)0x78c329f800
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 137 "Binder:1602_1C" thin_tid=140 sysTid=15654 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e56048 self: ('art::Thread'*)0x78c4554000
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 138 "Binder:1602_1D" thin_tid=119 sysTid=16726 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e560e8 self: ('art::Thread'*)0x78df052c00
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 139 "Binder:1602_1E" thin_tid=131 sysTid=17690 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x13e56170 self: ('art::Thread'*)0x78d4b88000
        - waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)
 140 "Binder:1602_1F" thin_tid=129 sysTid=18957 kBlocked
      | group="N/A" sCount=0 dsCount=0 obj=0x14680f30 self: ('art::Thread'*)0x78df8aa000
        - waiting to lock <0x163cd090> (a Ljava/lang/Object;) held by tid=2078 
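
With a dump this long, a quick way to see who everyone is waiting on is to tally the "held by tid=N" suffixes. The sketch below does that over a few lines copied from the dump above. LockHolderTally is a hypothetical helper written for this article, not part of any gdb or ART tooling, and it counts matching lines rather than distinct threads.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts how many "waiting to lock" lines point at each lock-holding tid. */
final class LockHolderTally {
    private static final Pattern HELD_BY = Pattern.compile("held by tid=(\\d+)");

    static Map<Integer, Integer> countWaiters(List<String> dumpLines) {
        Map<Integer, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String line : dumpLines) {
            Matcher m = HELD_BY.matcher(line);
            if (m.find()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "- waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)",
            "- waiting to lock <0x163cd090> (a Ljava/lang/Object;) held by tid=2078 (WifiScanningService)",
            "- waiting to lock <0x13e426b8> (a Lcom/android/server/am/ActivityManagerService;) held by tid=2421 (Binder:1602_6)");
        System.out.println(countWaiters(sample)); // prints {2421=2, 2078=1}
    }
}
```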

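The blocked threads are all waiting on thread 2421, either directly or through WifiScanningService (tid 2078), which is itself waiting on 2421. So let's print the stack of thread 2421.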

(gdb-arm)  art bt
#0  Ljava/io/FileDescriptor;.sync()
#1  Landroid/os/FileUtils;.sync()
#2  Lcom/android/server/DropBoxManagerService;.add()
#3  Lcom/android/server/DropBoxManagerService$2;.add()
#4  Landroid/os/DropBoxManager;.addText()
#5  Lcom/android/server/am/ActivityManagerService$24;.run()
#6  Lcom/android/server/am/ActivityManagerService;.addErrorToDropBox()
#7  Lcom/android/server/am/ActivityManagerService;.addErrorToDropBox()
#8  Lcom/android/server/am/ActivityManagerService;.handleApplicationWtfInner()
#9  Lcom/android/server/am/ActivityManagerService;.handleApplicationWtf()
#10 Lcom/android/internal/os/RuntimeInit;.wtf()
#11 Landroid/util/Log$1;.onTerribleFailure()
#12 Landroid/util/Log;.wtf()
#13 Lcom/android/server/am/ActivityManagerService;.checkBroadcastFromSystem()
#14 Lcom/android/server/am/ActivityManagerService;.broadcastIntentLocked()
#15 Lcom/android/server/am/ActivityManagerService;.broadcastIntent()
#16 Landroid/app/ContextImpl;.sendBroadcastAsUser()
#17 Landroid/content/ContextWrapper;.sendBroadcastAsUser()
#18 L×××/system/debugbuild/DebugbuildHandler;.onBroadcast()
#19 L×××/common/CrashMonitor;.onBroadcast()

crash-arm64> kbt -s 2421 // kernel stack
Kernel stack for pid 2421

#0  0xffffff8304486434 in __switch_to+0x8c(+140)   
#1  [INLINE]   context_switch
#2  0xffffff8305606a9c in __schedule+0x30c(+780)   
#3  0xffffff8305607084 in schedule+0x38(+56)   
#4  0xffffff83046f3e88 in jbd2_log_wait_commit+0x94(+148)   // kernel fsync waiting for data to be synced to disk
#5  0xffffff83046f5604 in jbd2_complete_transaction+0x88(+136)   
#6  0xffffff8304693c78 in ext4_sync_file+0x1e8(+488)   
#7  0xffffff8304639764 in sys_fsync+0x54(+84)   
#8  0xffffff83044833b0 in el0_svc_naked+0x24(+36)

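From the stacks we can see that this thread was about to send a broadcast, but checkBroadcastFromSystem found that the broadcast was not protected and therefore unsafe, so it reported it through Log.wtf, which prints the stack and saves it as a log entry. The last frames show that the thread ended up waiting while syncing that data to disk. Because the thread list shows the ActivityManagerService lock held by tid 2421 the whole time, every thread that needs that lock is stuck behind this disk sync. We know that the system's dropbox logs are saved under /data/system/dropbox, so combining this with the system main log may tell us more: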

2018-08-21 11:54:10.005   1602  2091 W DeviceStorageMonitorService: java.io.IOException: Failed to free 1048576000 on storage device at /data
at com.android.server.storage.DeviceStorageMonitorService.check(DeviceStorageMonitorService.java:194)
at com.android.server.storage.DeviceStorageMonitorService.access$100(DeviceStorageMonitorService.java:73)
at com.android.server.storage.DeviceStorageMonitorService$1.handleMessage(DeviceStorageMonitorService.java:258)
2018-08-21 11:54:50.213   1208  1280 E storaged: getDiskStats failed with result NOT_SUPPORTED and size 0
2018-08-21 11:55:10.015   1602  2091 I PackageManager: Deleting preloaded file cache /data/preloads/file_cache
2018-08-21 11:55:10.274   1602 16726 I am_wtf: [0,1602,system_server,-1,ActivityManager,Sending non-protected broadcast *** from system 1602:system/1000 pkg android]
2018-08-21 11:55:10.333   1602 16726 I am_wtf: [0,1602,system_server,-1,ActivityManager,Sending non-protected broadcast *** from system 1602:system/1000 pkg android]
2018-08-21 11:55:10.346   1602 16726 I am_wtf: [0,1602,system_server,-1,ActivityManager,Sending non-protected broadcast *** from system 1602:system/1000 pkg android]
2018-08-21 11:55:10.547   1602  2091 W DeviceStorageMonitorService: java.io.IOException: Failed to free 1048576000 on storage device at /data
at com.android.server.storage.DeviceStorageMonitorService.check(DeviceStorageMonitorService.java:194)
at com.android.server.storage.DeviceStorageMonitorService.access$100(DeviceStorageMonitorService.java:73)
at com.android.server.storage.DeviceStorageMonitorService$1.handleMessage(DeviceStorageMonitorService.java:258)
2018-08-21 11:55:10.547   1602  2091 I storage_state: [41217664-9172-527a-b3d5-edabb50a7d69,1,1,116199424,23246843904]
2018-08-21 11:55:11.043   1602  2421 I am_wtf: [0,1602,system_server,-1,ActivityManager,Sending non-protected broadcast *** from system 1602:system/1000 pkg android]
2018-08-21 11:55:50.213   1208  1280 E storaged: getDiskStats failed with result NOT_SUPPORTED and size 0
2018-08-21 11:56:10.590   1602  2091 I storage_state: [41217664-9172-527a-b3d5-edabb50a7d69,1,0,2037506048,23246843904]

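The log shows that the /data partition was running out of free space (DeviceStorageMonitorService repeatedly failed to free storage on /data), and that is what led to the SSWD problem. As for the root cause, framework colleagues still need to analyse it in depth with the logs.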
