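I've had a bit more free time lately, so I'm organizing some notes and putting them on the blog.
The Java-layer memory leak detection tool I packaged is based mainly on the open-source LeakCanary project. Below is a brief look at how LeakCanary works.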

Introduction to LeakCanary

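LeakCanary is a tool for detecting memory leaks in the Java layer. Strictly speaking, it detects Activity leaks (it watches the Activity's onDestroy method, more on that later). It helps uncover many hidden memory problems and lowers the chance of memory leaks and OOM in an app.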

What is a memory leak?

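A memory leak is heap memory that a program allocated dynamically but, for some reason, did not release or cannot release. The memory is wasted, and the consequences range from slower execution to a crash.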

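Before getting into the internals, here are a few questions. The walkthrough below should answer them:
What kind of leaks can LeakCanary detect?
Can it find every memory leak?
How do you integrate LeakCanary?
How do you customize it for your own needs?
A deeper question: for a given leak, is the leak path the same every time?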

How it works

Initialization

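LeakCanary's entry point is LeakCanary.install(), which is called from Application.onCreate(). There are several ways to integrate it:
1: Put LeakCanary.install() directly in the Application's onCreate(), rebuild the apk, and test. Every app has to be rebuilt before testing, which gets tedious when there are many apps that are updated often.
2: Put LeakCanary.install() into the system and control it with a system property that is off by default. This requires changes to the system source and adds some bloat, and the property is checked every time an app is created, but it is much less work than option 1.
3: Use a hook on Application.onCreate so that LeakCanary.install() is injected automatically during testing. This is friendlier to the system because the test code stays separate from what the system runs, but it is also a headache because the hook has to be adapted for every version.
I went with option 3. The right choice depends on your needs.
Here is the main logic behind LeakCanary.install():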

  public static RefWatcher install(Application application) {
    return refWatcher(application).listenerServiceClass(DisplayLeakService.class)
        .excludedRefs(AndroidExcludedRefs.createAppDefaults().build())
        .buildAndInstall();
  }

install chains three method calls, and the AndroidRefWatcherBuilder helper class does the initialization work:

  /** Builder to create a customized {@link RefWatcher} with appropriate Android defaults. */
  public static AndroidRefWatcherBuilder refWatcher(Context context) {
    return new AndroidRefWatcherBuilder(context);
  }

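listenerServiceClass binds the service that handles the analysis result. excludedRefs excludes leak paths that can be ignored (some leaks come from the system implementation and have to be filtered out); they are enumerated in AndroidExcludedRefs. buildAndInstall does the main initialization work: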

  public RefWatcher buildAndInstall() {
    RefWatcher refWatcher = build();
    if (refWatcher != DISABLED) {
      LeakCanary.enableDisplayLeakActivity(context);
      ActivityRefWatcher.install((Application) context, refWatcher);
    }
    return refWatcher;
  }

buildAndInstall instantiates the RefWatcher and initializes ActivityRefWatcher. The main character is about to take the stage:

  public static void install(Application application, RefWatcher refWatcher) {
    new ActivityRefWatcher(application, refWatcher).watchActivities();
  }
  public void watchActivities() {
    // Make sure you don't get installed twice.
    stopWatchingActivities();
    application.registerActivityLifecycleCallbacks(lifecycleCallbacks);
  }

install calls watchActivities, which registers the Activity lifecycle callbacks:

  private final Application.ActivityLifecycleCallbacks lifecycleCallbacks =
      new Application.ActivityLifecycleCallbacks() {
        @Override public void onActivityCreated(Activity activity, Bundle savedInstanceState) {
        }

        @Override public void onActivityStarted(Activity activity) {
        }

        @Override public void onActivityResumed(Activity activity) {
        }

        @Override public void onActivityPaused(Activity activity) {
        }

        @Override public void onActivityStopped(Activity activity) {
        }

        @Override public void onActivitySaveInstanceState(Activity activity, Bundle outState) {
        }

        @Override public void onActivityDestroyed(Activity activity) {
          ActivityRefWatcher.this.onActivityDestroyed(activity);
        }
      };

  void onActivityDestroyed(Activity activity) {
    refWatcher.watch(activity);
  }

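This refWatcher is the one created by RefWatcher refWatcher = build(). At this point initialization is essentially complete. At runtime, whenever an Activity runs onDestroy, refWatcher.watch is called back, and that method starts the check for a leak.
One more note:
onActivityDestroyed is called from Application.dispatchActivityDestroyed, which in turn is called from Activity.onDestroy. In other words, when an Activity runs super.onDestroy, the registered lifecycle callback onActivityDestroyed fires and the watch begins: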

    protected void onDestroy() {
            .......
            getApplication().dispatchActivityDestroyed(this);
            .......
    }

How is a leak confirmed?

Here is RefWatcher's watch method:

  public void watch(Object watchedReference) {
    watch(watchedReference, "");
  }
  public void watch(Object watchedReference, String referenceName) {
    if (this == DISABLED) {
      return;
    }
    checkNotNull(watchedReference, "watchedReference");
    checkNotNull(referenceName, "referenceName");
    final long watchStartNanoTime = System.nanoTime();
    String key = UUID.randomUUID().toString();
    retainedKeys.add(key);
    final KeyedWeakReference reference =
        new KeyedWeakReference(watchedReference, key, referenceName, queue);

    ensureGoneAsync(watchStartNanoTime, reference);
  }

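watchedReference is the Activity reference that was passed in, key is a randomly generated unique string, referenceName is the empty string "", and queue is the ReferenceQueue initialized when the RefWatcher was created.
This part needs a little background on WeakReference and ReferenceQueue. Whenever the object a WeakReference points to is garbage collected, the weak reference is put into its associated ReferenceQueue (see the Reference source). So if we expect an object to be collected and its reference still has not appeared in the ReferenceQueue after the expected time, we can conclude there is a leak. The later checks and analysis all build on this logic; in my view this is the core principle of LeakCanary.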
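The WeakReference and ReferenceQueue behaviour described above can be tried outside Android. The following is only a minimal sketch on a plain JVM: the class name WeakRefQueueSketch, the leakedHolder field, and the retry counts are all made up for illustration, and the GC request is just a hint to the runtime, so the "released" case is typical rather than guaranteed.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class WeakRefQueueSketch {
  // Hypothetical static field that simulates a leak: while it is set,
  // the object it points to stays strongly reachable and cannot be collected.
  static Object leakedHolder;

  /** Requests a GC and pauses briefly so cleared references can be enqueued. */
  static void runGc() {
    Runtime.getRuntime().gc();
    try {
      Thread.sleep(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if ref shows up in queue within maxAttempts GC requests. */
  static boolean isEnqueuedAfterGc(WeakReference<Object> ref, ReferenceQueue<Object> queue,
      int maxAttempts) {
    for (int i = 0; i < maxAttempts; i++) {
      runGc();
      if (queue.poll() == ref) {
        return true;
      }
    }
    return false;
  }

  /** The watched object is still held by leakedHolder, so it is never enqueued. */
  static boolean leakedObjectIsEnqueued() {
    ReferenceQueue<Object> queue = new ReferenceQueue<>();
    leakedHolder = new Object();
    WeakReference<Object> ref = new WeakReference<>(leakedHolder, queue);
    return isEnqueuedAfterGc(ref, queue, 3);
  }

  /** Nothing else holds the watched object, so its reference is usually enqueued after a GC. */
  static boolean releasedObjectIsEnqueued() {
    ReferenceQueue<Object> queue = new ReferenceQueue<>();
    WeakReference<Object> ref = new WeakReference<>(new Object(), queue);
    return isEnqueuedAfterGc(ref, queue, 20);
  }

  public static void main(String[] args) {
    System.out.println("leaked object enqueued:   " + leakedObjectIsEnqueued());
    System.out.println("released object enqueued: " + releasedObjectIsEnqueued());
  }
}
```

The first case is the interesting one for leak detection: as long as something still holds the object, its weak reference never reaches the queue, no matter how many GCs run.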

  RefWatcher(WatchExecutor watchExecutor, DebuggerControl debuggerControl, GcTrigger gcTrigger,
      HeapDumper heapDumper, HeapDump.Listener heapdumpListener, ExcludedRefs excludedRefs) {
    this.watchExecutor = checkNotNull(watchExecutor, "watchExecutor");
    this.debuggerControl = checkNotNull(debuggerControl, "debuggerControl");
    this.gcTrigger = checkNotNull(gcTrigger, "gcTrigger");
    this.heapDumper = checkNotNull(heapDumper, "heapDumper");
    this.heapdumpListener = checkNotNull(heapdumpListener, "heapdumpListener");
    this.excludedRefs = checkNotNull(excludedRefs, "excludedRefs");
    retainedKeys = new CopyOnWriteArraySet<>();
    queue = new ReferenceQueue<>();
  }

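A quick explanation of these member variables:
watchExecutor: the executor that runs the leak check asynchronously; it is an instance of AndroidWatchExecutor
debuggerControl: queries whether a debugger is attached; no leak check is performed while debugging
queue: used to determine whether the object held by a weak reference has been GC'd
gcTrigger: gives the GC one more chance before a leak is declared
heapDumper: dumps the heap when a leak is suspected
heapdumpListener: analyzes the resulting dump file to find the cause of the leak
excludedRefs: excludes leaks caused by certain system bugs
retainedKeys: holds the keys of references that are waiting to be checked or that have leaked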

Back in watch, the Activity is then wrapped in a KeyedWeakReference. The key method is ensureGoneAsync, which runs the check asynchronously:

  private void ensureGoneAsync(final long watchStartNanoTime, final KeyedWeakReference reference) {
    watchExecutor.execute(new Retryable() {
      @Override public Retryable.Result run() {
        return ensureGone(reference, watchStartNanoTime);
      }
    });
  }

This calls the execute method of AndroidWatchExecutor:

  @Override public void execute(Retryable retryable) {
    if (Looper.getMainLooper().getThread() == Thread.currentThread()) {
      waitForIdle(retryable, 0);
    } else {
      postWaitForIdle(retryable, 0);
    }
  }
  void postWaitForIdle(final Retryable retryable, final int failedAttempts) {
    mainHandler.post(new Runnable() {
      @Override public void run() {
        waitForIdle(retryable, failedAttempts);
      }
    });
  }

  void waitForIdle(final Retryable retryable, final int failedAttempts) {
    // This needs to be called from the main thread.
    Looper.myQueue().addIdleHandler(new MessageQueue.IdleHandler() {
      @Override public boolean queueIdle() {
        postToBackgroundWithDelay(retryable, failedAttempts);
        return false;
      }
    });
  }

  void postToBackgroundWithDelay(final Retryable retryable, final int failedAttempts) {
    long exponentialBackoffFactor = (long) Math.min(Math.pow(2, failedAttempts), maxBackoffFactor);
    long delayMillis = initialDelayMillis * exponentialBackoffFactor;
    backgroundHandler.postDelayed(new Runnable() {
      @Override public void run() {
        Retryable.Result result = retryable.run();
        if (result == RETRY) {
          postWaitForIdle(retryable, failedAttempts + 1);
        }
      }
    }, delayMillis);
  }
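To get a feel for the retry delays that postToBackgroundWithDelay produces, the arithmetic can be pulled out into a standalone method. This is a sketch, not LeakCanary code: the class BackoffSketch is made up, the initial delay of 5 seconds is the default this article mentions, and the cap Long.MAX_VALUE / initialDelayMillis is an assumption chosen so the multiplication cannot overflow.

```java
import java.util.concurrent.TimeUnit;

class BackoffSketch {
  /** Same formula as above: initial delay times 2^failedAttempts, capped by maxBackoffFactor. */
  static long delayMillis(int failedAttempts, long initialDelayMillis, long maxBackoffFactor) {
    long exponentialBackoffFactor =
        (long) Math.min(Math.pow(2, failedAttempts), maxBackoffFactor);
    return initialDelayMillis * exponentialBackoffFactor;
  }

  public static void main(String[] args) {
    long initialDelayMillis = TimeUnit.SECONDS.toMillis(5);
    // Assumed cap: the largest factor that keeps the product within the range of long.
    long maxBackoffFactor = Long.MAX_VALUE / initialDelayMillis;
    for (int failedAttempts = 0; failedAttempts <= 4; failedAttempts++) {
      System.out.println("failedAttempts=" + failedAttempts + " -> "
          + delayMillis(failedAttempts, initialDelayMillis, maxBackoffFactor) + " ms");
    }
  }
}
```

With these inputs the delay doubles on every retry: 5 s, 10 s, 20 s, 40 s, 80 s.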

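AndroidRefWatcherBuilder defines the default delay, 5s:
private static final long DEFAULT_WATCH_DELAY_MILLIS = SECONDS.toMillis(5);
From the code: if the caller is on the main thread, it waits until the main thread is idle and then schedules the check on the background handler with a 5s delay. If the caller is not on the main thread, it first posts to the main thread, which does the same once it is idle. If the check returns RETRY, it is scheduled again with a longer delay. Here is what the check itself does: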

// Core method
  Retryable.Result ensureGone(final KeyedWeakReference reference, final long watchStartNanoTime) {
    long gcStartNanoTime = System.nanoTime();
    long watchDurationMs = NANOSECONDS.toMillis(gcStartNanoTime - watchStartNanoTime);

    removeWeaklyReachableReferences(); // first check: has the weak reference been enqueued?

    if (debuggerControl.isDebuggerAttached()) {
      // The debugger can create false leaks.
      return RETRY;
    }
    if (gone(reference)) { // already collected: the activity did not leak
      return DONE;
    }
    gcTrigger.runGc();   // still not collected: trigger one GC
    removeWeaklyReachableReferences(); // check again whether the weak reference was enqueued
    if (!gone(reference)) { // still not collected: suspect a leak, dump an hprof heap snapshot and analyze it
      long startDumpHeap = System.nanoTime();
      long gcDurationMs = NANOSECONDS.toMillis(startDumpHeap - gcStartNanoTime);

      File heapDumpFile = heapDumper.dumpHeap();
      if (heapDumpFile == RETRY_LATER) {
        // Could not dump the heap.
        return RETRY;
      }
      long heapDumpDurationMs = NANOSECONDS.toMillis(System.nanoTime() - startDumpHeap);
      heapdumpListener.analyze(
          new HeapDump(heapDumpFile, reference.key, reference.name, excludedRefs, watchDurationMs,
              gcDurationMs, heapDumpDurationMs));
    }
    return DONE;
  }

Here is the implementation of removeWeaklyReachableReferences:

  private void removeWeaklyReachableReferences() {
    // WeakReferences are enqueued as soon as the object to which they point to becomes weakly
    // reachable. This is before finalization or garbage collection has actually happened.
    KeyedWeakReference ref;
    while ((ref = (KeyedWeakReference) queue.poll()) != null) {
      retainedKeys.remove(ref.key);
    }
  }

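As mentioned above, once the object a WeakReference points to has been GC'd, the WeakReference appears in its ReferenceQueue. This method removes the keys of the WeakReferences whose objects have already been collected. The keys that remain belong to weak references whose objects have not been collected, which are the leak candidates.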

  private boolean gone(KeyedWeakReference reference) {
    return !retainedKeys.contains(reference.key);
  }

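So gone is easy to understand: it only checks whether the KeyedWeakReference's key is still in the retainedKeys set. If it is, there may be a leak.
ensureGone adds a second safeguard: if the object has not been collected, it triggers a GC itself and checks again. If the object is still not collected, it dumps a heap snapshot and analyzes it.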
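The key bookkeeping and the double check can be put together in a small standalone class. This is a simplified sketch under stated assumptions, not the real RefWatcher: MiniWatcherSketch, KeyedRef, pinned, and looksLeaked are made-up names, there is no debugger check, retry, or heap dump, and the GC request is only a hint to the JVM.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArraySet;

class MiniWatcherSketch {
  /** Weak reference that remembers the key it was registered under. */
  static final class KeyedRef extends WeakReference<Object> {
    final String key;

    KeyedRef(Object referent, String key, ReferenceQueue<Object> queue) {
      super(referent, queue);
      this.key = key;
    }
  }

  // Hypothetical static field used by the demo to keep one object strongly reachable.
  static Object pinned;

  private final Set<String> retainedKeys = new CopyOnWriteArraySet<>();
  private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

  /** Registers an object we expect to be collected soon. */
  KeyedRef watch(Object watched) {
    String key = UUID.randomUUID().toString();
    retainedKeys.add(key);
    return new KeyedRef(watched, key, queue);
  }

  /** Drops the keys of all references the GC has already enqueued. */
  private void removeWeaklyReachableReferences() {
    KeyedRef ref;
    while ((ref = (KeyedRef) queue.poll()) != null) {
      retainedKeys.remove(ref.key);
    }
  }

  boolean gone(KeyedRef ref) {
    return !retainedKeys.contains(ref.key);
  }

  /** Check, request one GC, check again. Returns true if the object is still retained. */
  boolean looksLeaked(KeyedRef ref) {
    removeWeaklyReachableReferences();
    if (gone(ref)) {
      return false;
    }
    Runtime.getRuntime().gc();
    try {
      Thread.sleep(100); // give the runtime a moment to enqueue cleared references
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    removeWeaklyReachableReferences();
    return !gone(ref);
  }

  public static void main(String[] args) {
    MiniWatcherSketch watcher = new MiniWatcherSketch();
    pinned = new Object(); // stays reachable through the static field
    KeyedRef stillHeld = watcher.watch(pinned);
    KeyedRef released = watcher.watch(new Object()); // nothing else holds this one
    System.out.println("still held -> suspected leak: " + watcher.looksLeaked(stillHeld));
    System.out.println("released   -> suspected leak: " + watcher.looksLeaked(released));
  }
}
```

The object behind pinned is always reported as a suspected leak. The released object is normally reported as not leaked, but a single GC request is not guaranteed to collect it, which is one reason the real tool confirms a suspicion with a heap dump.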

How is the leak information extracted from the heap snapshot?

As the end of ensureGone shows, heapdumpListener.analyze is called. heapdumpListener is a ServiceHeapDumpListener:

  public AndroidRefWatcherBuilder listenerServiceClass(
      Class<? extends AbstractAnalysisResultService> listenerServiceClass) {
    return heapDumpListener(new ServiceHeapDumpListener(context, listenerServiceClass));
  }

Here is ServiceHeapDumpListener's analyze method:

  @Override public void analyze(HeapDump heapDump) {
    checkNotNull(heapDump, "heapDump");
    HeapAnalyzerService.runAnalysis(context, heapDump, listenerServiceClass);
  }

It mainly calls HeapAnalyzerService.runAnalysis:

  public static void runAnalysis(Context context, HeapDump heapDump,
      Class<? extends AbstractAnalysisResultService> listenerServiceClass) {
    Intent intent = new Intent(context, HeapAnalyzerService.class);
    intent.putExtra(LISTENER_CLASS_EXTRA, listenerServiceClass.getName());
    intent.putExtra(HEAPDUMP_EXTRA, heapDump);
    context.startService(intent);
  }
  @Override protected void onHandleIntent(Intent intent) {
    if (intent == null) {
      CanaryLog.d("HeapAnalyzerService received a null intent, ignoring.");
      return;
    }
    String listenerClassName = intent.getStringExtra(LISTENER_CLASS_EXTRA);
    HeapDump heapDump = (HeapDump) intent.getSerializableExtra(HEAPDUMP_EXTRA);

    HeapAnalyzer heapAnalyzer = new HeapAnalyzer(heapDump.excludedRefs);

    AnalysisResult result = heapAnalyzer.checkForLeak(heapDump.heapDumpFile, heapDump.referenceKey);
    AbstractAnalysisResultService.sendResultToListener(this, listenerClassName, heapDump, result);
  }
}

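runAnalysis starts HeapAnalyzerService to process the heap dump. HeapAnalyzerService is an IntentService, so once it is started onHandleIntent is called back, which finds the shortest path and passes the result on for display. The main method to look at is checkForLeak: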

  public AnalysisResult checkForLeak(File heapDumpFile, String referenceKey) {
    long analysisStartNanoTime = System.nanoTime();

    if (!heapDumpFile.exists()) {
      Exception exception = new IllegalArgumentException("File does not exist: " + heapDumpFile);
      return failure(exception, since(analysisStartNanoTime));
    }

    try {
      HprofBuffer buffer = new MemoryMappedFileBuffer(heapDumpFile);
      HprofParser parser = new HprofParser(buffer);
      Snapshot snapshot = parser.parse();
      deduplicateGcRoots(snapshot);

      Instance leakingRef = findLeakingReference(referenceKey, snapshot);

      // False alarm, weak reference was cleared in between key check and heap dump.
      if (leakingRef == null) {
        return noLeak(since(analysisStartNanoTime));
      }

      return findLeakTrace(analysisStartNanoTime, snapshot, leakingRef);
    } catch (Throwable e) {
      return failure(e, since(analysisStartNanoTime));
    }
  }

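Several calls in checkForLeak are key, and they make up four steps:
1: HprofParser's parse method converts the hprof file into a Snapshot object. Tools such as MAT work on a parsed model of the hprof file in a similar way. The snapshot presumably holds mainly the object graph and the reference chains, which can then be queried freely.
2: HeapAnalyzer's deduplicateGcRoots method removes duplicate GC roots to reduce memory overhead. The logic is not complicated: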

  /**
   * Pruning duplicates reduces memory pressure from hprof bloat added in Marshmallow.
   */
  void deduplicateGcRoots(Snapshot snapshot) {
    // THashMap has a smaller memory footprint than HashMap.
    final THashMap<String, RootObj> uniqueRootMap = new THashMap<>();

    final Collection<RootObj> gcRoots = snapshot.getGCRoots();
    for (RootObj root : gcRoots) {
      String key = generateRootKey(root);
      if (!uniqueRootMap.containsKey(key)) {
        uniqueRootMap.put(key, root);
      }
    }

    // Repopulate snapshot with unique GC roots.
    gcRoots.clear();
    uniqueRootMap.forEach(new TObjectProcedure<String>() {
      @Override public boolean execute(String key) {
        return gcRoots.add(uniqueRootMap.get(key));
      }
    });
  }

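3: HeapAnalyzer's findLeakingReference method uses the key of the leaked reference to look up the leaked instance in the snapshot. The method is not complicated.
Some readers may wonder: what if there are several instances in memory, or some instances are leaked and others are not? Note that each key is generated for exactly one watched object, so the key leads straight to the object that actually leaked. Other instances that exist in memory are not processed.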

  private Instance findLeakingReference(String key, Snapshot snapshot) {
    ClassObj refClass = snapshot.findClass(KeyedWeakReference.class.getName());
    List<String> keysFound = new ArrayList<>();
    for (Instance instance : refClass.getInstancesList()) {
      List<ClassInstance.FieldValue> values = classInstanceValues(instance);
      String keyCandidate = asString(fieldValue(values, "key"));
      if (keyCandidate.equals(key)) {
        return fieldValue(values, "referent");
      }
      keysFound.add(keyCandidate);
    }
    throw new IllegalStateException(
        "Could not find weak reference with key " + key + " in " + keysFound);
  }

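4: HeapAnalyzer's findLeakTrace method. This is the key method: it computes the shortest path to a GC root and confirms whether there is a leak. If there is, it builds the leaking reference chain. The logic is a bit more complex, so I've added some comments in the code: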

  private AnalysisResult findLeakTrace(long analysisStartNanoTime, Snapshot snapshot,
      Instance leakingRef) {

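    // As the name suggests, this finds the shortest path from the leak to a GC root. Known
    // system issues (excludedRefs) are kept off the shortest path.
    ShortestPathFinder pathFinder = new ShortestPathFinder(excludedRefs);
    // Start the search. The method is complex, but it boils down to a breadth-first search
    // that checks reachability. I'll add it in an appendix for anyone interested.
    ShortestPathFinder.Result result = pathFinder.findPath(snapshot, leakingRef);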

    // False alarm, no strong reference path to GC Roots.
    if (result.leakingNode == null) {
      return noLeak(since(analysisStartNanoTime));
    }

    // Convert the shortest path into the LeakTrace to display: a list of LeakTraceElement
    // nodes along the path, representing the shortest leak path that was found.
    LeakTrace leakTrace = buildLeakTrace(result.leakingNode);

    String className = leakingRef.getClassObj().getClassName();

    // Side effect: computes retained size.
    snapshot.computeDominators();

    Instance leakingInstance = result.leakingNode.instance;

    long retainedSize = leakingInstance.getTotalRetainedSize(); // total size retained by this leak

    // TODO: check O sources and see what happened to android.graphics.Bitmap.mBuffer
    if (SDK_INT <= N_MR1) {
      retainedSize += computeIgnoredBitmapRetainedSize(snapshot, leakingInstance);
    }

    return leakDetected(result.excludingKnownLeaks, className, leakTrace, retainedSize,
        since(analysisStartNanoTime));
  }
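The breadth-first idea behind the path search can be shown on a toy object graph. This is a heavily simplified sketch and not ShortestPathFinder itself: the graph is a map of made-up node names, an excluded reference is written as the hypothetical string "holder->child" and is simply skipped, whereas the real class works on a parsed heap snapshot and handles exclusions in a more involved way.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class ShortestPathSketch {
  /**
   * Breadth-first search from the GC roots along "holder -> referenced object" edges.
   * Returns the path from a root to target, or an empty list if target is unreachable.
   */
  static List<String> findPath(Map<String, List<String>> references, List<String> gcRoots,
      String target, Set<String> excludedEdges) {
    Map<String, String> parent = new HashMap<>(); // visited node -> node it was reached from
    Queue<String> toVisit = new ArrayDeque<>();
    for (String root : gcRoots) {
      if (!parent.containsKey(root)) {
        parent.put(root, null);
        toVisit.add(root);
      }
    }
    while (!toVisit.isEmpty()) {
      String node = toVisit.poll();
      if (node.equals(target)) {
        // Walk the parent links back to the root, then reverse to get root -> target.
        List<String> path = new ArrayList<>();
        for (String n = node; n != null; n = parent.get(n)) {
          path.add(n);
        }
        Collections.reverse(path);
        return path;
      }
      for (String child : references.getOrDefault(node, Collections.<String>emptyList())) {
        if (excludedEdges.contains(node + "->" + child) || parent.containsKey(child)) {
          continue; // excluded reference or already visited
        }
        parent.put(child, node);
        toVisit.add(child);
      }
    }
    return Collections.emptyList();
  }

  /** Toy graph: MainActivity is reachable from two roots by paths of different length. */
  static Map<String, List<String>> demoGraph() {
    Map<String, List<String>> refs = new HashMap<>();
    refs.put("SingletonClass", Arrays.asList("Listener"));
    refs.put("Listener", Arrays.asList("MainActivity"));
    refs.put("Thread", Arrays.asList("Handler"));
    refs.put("Handler", Arrays.asList("Message"));
    refs.put("Message", Arrays.asList("Runnable"));
    refs.put("Runnable", Arrays.asList("MainActivity"));
    return refs;
  }

  public static void main(String[] args) {
    List<String> roots = Arrays.asList("Thread", "SingletonClass");
    System.out.println(findPath(demoGraph(), roots, "MainActivity", new HashSet<String>()));
    Set<String> excluded = new HashSet<>(Arrays.asList("Listener->MainActivity"));
    System.out.println(findPath(demoGraph(), roots, "MainActivity", excluded));
  }
}
```

Without exclusions the search returns the three-node path through SingletonClass and Listener. Once that reference is excluded, the longer path through Thread, Handler, Message, and Runnable is reported instead. If no path is left at all, the result is empty, which corresponds to the "no strong reference path to GC Roots" false alarm in findLeakTrace.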

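The last step hands the AnalysisResult to DisplayLeakService, which saves it and shows it (by notifying the user with a Notification). This is not part of the core mechanism, so I'll only touch on it briefly.
How to handle this part depends on your situation. For developers, finding the leak is what matters, and this part only displays it. If you don't want the display, this module can be removed. You can also do other things alongside it, such as uploading the result to a server.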

  public static void sendResultToListener(Context context, String listenerServiceClassName,
      HeapDump heapDump, AnalysisResult result) {
    Class<?> listenerServiceClass;
    try {
      listenerServiceClass = Class.forName(listenerServiceClassName);
    } catch (ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
    Intent intent = new Intent(context, listenerServiceClass);
    intent.putExtra(HEAP_DUMP_EXTRA, heapDump);
    intent.putExtra(RESULT_EXTRA, result);
    context.startService(intent);
  }

Questions

What kind of leaks can LeakCanary detect?
In one sentence: leaks tied to the Activity lifecycle.

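Can it find every memory leak?
My understanding is: not necessarily. LeakCanary detects leaks of Activity components. Activities are closely tied to the UI and are the most important of the four major components. As I see it, though, a customized LeakCanary could detect leaks for all four major components. Stock LeakCanary watches the Activity's onDestroy lifecycle method, and by the same logic the lifecycle methods of Service and Provider could be watched too, although I haven't tried this.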

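How do you integrate LeakCanary?
1: Put LeakCanary.install() directly in the Application's onCreate(), rebuild the apk, and test. Every app has to be rebuilt before testing, which gets tedious when there are many apps that are updated often.
2: Put LeakCanary.install() into the system and control it with a system property that is off by default. This requires changes to the system source and adds some bloat, and the property is checked every time an app is created, but it is much less work than option 1.
3: Use a hook on Application.onCreate so that LeakCanary.install() is injected automatically during testing. This is friendlier to the system because the test code stays separate from what the system runs, but it is also a headache because the hook has to be adapted for every version.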

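How do you customize it for your own needs?
The core of LeakCanary is identifying the leak and building the leak path, and that part rarely needs to change. Customization should follow your own requirements: the integration method, the handling of results, and the ExcludedRefs set all leave room for it.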

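A deeper question: for a given leak, is the leak path the same every time?
Not necessarily. In the vast majority of cases the path is the same. But when LeakCanary searches for the shortest path and the leak has several equally short paths to GC roots, the path it reports may differ from run to run. Put another way, if several GC roots can all reach the Activity, it is very likely leaking and deserves even more attention.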

Android Memory Optimization (1): How the Java-Layer Memory Leak Detection Tool Works (LeakCanary)