Analysis of the Method Trace Implementation on Android

Method Trace in Android

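Android's Java layer gives developers two Method Trace APIs they can call directly. The first is startMethodTracing in the android.os.Debug class. The second is the set of methods in the android.os.Trace class, such as Trace.beginSection and Trace.endSection. The two differ in scope. The Debug class can only monitor Java method calls. The Trace class is built on atrace, so its traces cover Java and native code in both the app and the system. Because atrace is based on ftrace, it can also capture detailed CPU activity.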

This article focuses on Java-level Method Trace, so it mainly examines how Debug.startMethodTracing is implemented underneath.

How the System Implements Method Trace

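Let's get straight to the point. This section splits Method Trace into three phases and analyzes each one: starting the trace, tracing in progress, and finishing the trace.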

Starting a Trace

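When Debug.startMethodTracing is called to begin tracing, the native API that actually runs is Trace::Start() in art/runtime/trace.cc. It takes the following parameters: the File pointer to write to, the buffer size, the flags, the TraceOutputMode (output mode), and the TraceMode (how tracing is implemented). The core of the function is choosing a method-call monitoring strategy based on the TraceMode value. TraceMode has only two values, kMethodTracing and kSampling. They correspond to calling Debug.startMethodTracing and Debug.startMethodTracingSampling from the Java layer, respectively.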

void Trace::Start(std::unique_ptr<File>&& trace_file_in,
                  size_t buffer_size,
                  int flags,
                  TraceOutputMode output_mode,
                  TraceMode trace_mode,
                  int interval_us) {
  // ... omitted
  // Create the Trace instance.
  {
    // Required since EnableMethodTracing calls ConfigureStubs which visits class linker classes.
    gc::ScopedGCCriticalSection gcs(self,
                                    gc::kGcCauseInstrumentation,
                                    gc::kCollectorTypeInstrumentation);
    ScopedSuspendAll ssa(__FUNCTION__);
    MutexLock mu(self, *Locks::trace_lock_);
    if (the_trace_ != nullptr) {
      // A trace instance already exists; ignore this request.
      LOG(ERROR) << "Trace already in progress, ignoring this request";
    } else {
      enable_stats = (flags & kTraceCountAllocs) != 0;
      the_trace_ = new Trace(trace_file.release(), buffer_size, flags, output_mode, trace_mode);
      if (trace_mode == TraceMode::kSampling) {
        CHECK_PTHREAD_CALL(pthread_create, (&sampling_pthread_, nullptr, &RunSamplingThread,
                                            reinterpret_cast<void*>(interval_us)),
                                            "Sampling profiler thread");
        the_trace_->interval_us_ = interval_us;
      } else {
        runtime->GetInstrumentation()->AddListener(
            the_trace_,
            instrumentation::Instrumentation::kMethodEntered |
                instrumentation::Instrumentation::kMethodExited |
                instrumentation::Instrumentation::kMethodUnwind);
        // TODO: In full-PIC mode, we don't need to fully deopt.
        // TODO: We can only use trampoline entrypoints if we are java-debuggable since in that case
        // we know that inlining and other problematic optimizations are disabled. We might just
        // want to use the trampolines anyway since it is faster. It makes the story with disabling
        // jit-gc more complex though.
        runtime->GetInstrumentation()->EnableMethodTracing(
            kTracerInstrumentationKey, /*needs_interpreter=*/!runtime->IsJavaDebuggable());
      }
    }
  }
}

if (the_trace_ != nullptr) {
  // A trace instance already exists; ignore this request.
  LOG(ERROR) << "Trace already in progress, ignoring this request";
}

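The implementation checks whether a trace instance already exists (the the_trace_ variable) before creating one. If it does, the call is ignored. Only when no instance exists is the Trace constructor called. Calling Debug.startMethodTracing more than once before the current trace has finished therefore has no effect.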
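The guard can be illustrated with a minimal C++ sketch. It assumes a hypothetical TraceSession class, which is not ART's Trace. One global instance is protected by a mutex, and a second Start() while a session is active is rejected. This mirrors the the_trace_ check performed under trace_lock_.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for ART's Trace; only the "one active instance" rule is modeled.
class TraceSession {
 public:
  explicit TraceSession(std::string path) : path_(std::move(path)) {}
  const std::string& path() const { return path_; }

  // Returns false (ignoring the request) if a session is already running,
  // like the "Trace already in progress" branch of Trace::Start().
  static bool Start(const std::string& path) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (current_ != nullptr) {
      return false;
    }
    current_ = std::make_unique<TraceSession>(path);
    return true;
  }

  // Ends the active session, if any, so that a later Start() can succeed.
  static void Stop() {
    std::lock_guard<std::mutex> lock(mutex_);
    current_.reset();
  }

  static bool IsRunning() {
    std::lock_guard<std::mutex> lock(mutex_);
    return current_ != nullptr;
  }

 private:
  std::string path_;
  static std::mutex mutex_;
  static std::unique_ptr<TraceSession> current_;
};

std::mutex TraceSession::mutex_;
std::unique_ptr<TraceSession> TraceSession::current_;
```

In this sketch, only Stop() clears the slot. A second Start() therefore succeeds only after the first session has ended.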

The Trace instance is created with new Trace(). After that, the actual method-call monitoring is set up according to the TraceMode. There are two approaches: sampling (TraceMode::kSampling) and instrumentation (TraceMode::kMethodTracing).

During the Trace

Sampling-Based Tracing

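Let's look at the sampling implementation first. It starts by creating a sampling worker thread with pthread_create. This thread runs Trace::RunSamplingThread(void* arg). Inside that function, the thread periodically obtains all threads via the Runtime object's GetThreadList(). It then calls GetSample on each thread to capture that thread's current call stack.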

void* Trace::RunSamplingThread(void* arg) {
  Runtime* runtime = Runtime::Current();
  intptr_t interval_us = reinterpret_cast<intptr_t>(arg);
  
  while (true) {
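    // ... omitted: sleep for interval_us, read the_trace_ into the_trace under trace_lock_,
    // and leave the loop once tracing has been stopped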
    {
     //..
      runtime->GetThreadList()->ForEach(GetSample, the_trace);
    }
  }

  runtime->DetachCurrentThread();
  return nullptr;
}
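The shape of this loop can be sketched in portable C++ with std::thread. Sampler and its sample_all callback are hypothetical names. The callback stands in for ThreadList::ForEach(GetSample, the_trace). The sketch models only the timing loop and leaves out thread suspension and stack walking.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical sampler: wakes up every `interval` and calls `sample_all`,
// loosely following the structure of Trace::RunSamplingThread.
class Sampler {
 public:
  Sampler(std::chrono::microseconds interval, std::function<void()> sample_all)
      : interval_(interval), sample_all_(std::move(sample_all)) {}
  ~Sampler() { Stop(); }

  void Start() {
    stop_.store(false);
    worker_ = std::thread([this] { Run(); });
  }

  // Signals the loop to exit and waits for the worker thread to finish.
  void Stop() {
    stop_.store(true);
    if (worker_.joinable()) {
      worker_.join();
    }
  }

 private:
  void Run() {
    while (!stop_.load()) {
      std::this_thread::sleep_for(interval_);
      if (stop_.load()) {
        break;  // Exit promptly once Stop() has been requested.
      }
      sample_all_();  // One pass over every thread per tick.
    }
  }

  std::chrono::microseconds interval_;
  std::function<void()> sample_all_;
  std::atomic<bool> stop_{false};
  std::thread worker_;
};
```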

Following GetSample further: it walks the stack with StackVisitor::WalkStack to get the current call stack and stores it in stack_trace. It then calls the_trace->CompareAndUpdateStackTrace(thread, stack_trace) to process the data.

static void GetSample(Thread* thread, void* arg) REQUIRES_SHARED(Locks::mutator_lock_) {
  std::vector<ArtMethod*>* const stack_trace = Trace::AllocStackTrace();
  StackVisitor::WalkStack(
      [&](const art::StackVisitor* stack_visitor) REQUIRES_SHARED(Locks::mutator_lock_) {
        ArtMethod* m = stack_visitor->GetMethod();
        // Ignore runtime frames (in particular callee save).
        if (!m->IsRuntimeMethod()) {
          stack_trace->push_back(m);
        }
        return true;
      },
      thread,
      /* context= */ nullptr,
      art::StackVisitor::StackWalkKind::kIncludeInlinedFrames);
  Trace* the_trace = reinterpret_cast<Trace*>(arg);
    
  // Update this thread's stack trace sample.
  the_trace->CompareAndUpdateStackTrace(thread, stack_trace);
}
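One detail matters for the next step. WalkStack visits frames starting from the innermost (top) frame, so stack_trace[0] is the top of the stack. That is why CompareAndUpdateStackTrace iterates with rbegin() when it walks bottom-up. The sketch below shows the filtering with a hypothetical Frame type in place of ArtMethod*. Its is_runtime_method flag models ArtMethod::IsRuntimeMethod().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical frame record used only for this illustration.
struct Frame {
  std::string method;
  bool is_runtime_method;
};

// `frames` is ordered innermost-first, the order in which a stack walk visits them.
// Runtime frames (such as callee-save frames) are skipped, as GetSample() does.
std::vector<std::string> CollectSample(const std::vector<Frame>& frames) {
  std::vector<std::string> stack_trace;
  for (const Frame& f : frames) {
    if (!f.is_runtime_method) {
      stack_trace.push_back(f.method);
    }
  }
  return stack_trace;  // stack_trace.front() is the top of the stack.
}
```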

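Next comes CompareAndUpdateStackTrace. Its main job is to compare the new stack with the previous one, determine which methods changed state, and write out those changes.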

void Trace::CompareAndUpdateStackTrace(Thread* thread,
                                       std::vector<ArtMethod*>* stack_trace) {
  CHECK_EQ(pthread_self(), sampling_pthread_);
  std::vector<ArtMethod*>* old_stack_trace = thread->GetStackTraceSample();
  // Update the thread's stack trace sample.
  thread->SetStackTraceSample(stack_trace);
  // Read timer clocks to use for all events in this trace.
  uint32_t thread_clock_diff = 0;
  uint32_t wall_clock_diff = 0;
  ReadClocks(thread, &thread_clock_diff, &wall_clock_diff);
  if (old_stack_trace == nullptr) {
    // If there's no previous stack trace sample for this thread, log an entry event for all
    // methods in the trace.
    for (auto rit = stack_trace->rbegin(); rit != stack_trace->rend(); ++rit) {
      LogMethodTraceEvent(thread, *rit, instrumentation::Instrumentation::kMethodEntered,
                          thread_clock_diff, wall_clock_diff);
    }
  } else {
    // If there's a previous stack trace for this thread, diff the traces and emit entry and exit
    // events accordingly.
    auto old_rit = old_stack_trace->rbegin();
    auto rit = stack_trace->rbegin();
    // Iterate bottom-up over both traces until there's a difference between them.
    while (old_rit != old_stack_trace->rend() && rit != stack_trace->rend() && *old_rit == *rit) {
      old_rit++;
      rit++;
    }
    // Iterate top-down over the old trace until the point where they differ, emitting exit events.
    for (auto old_it = old_stack_trace->begin(); old_it != old_rit.base(); ++old_it) {
      LogMethodTraceEvent(thread, *old_it, instrumentation::Instrumentation::kMethodExited,
                          thread_clock_diff, wall_clock_diff);
    }
    // Iterate bottom-up over the new trace from the point where they differ, emitting entry events.
    for (; rit != stack_trace->rend(); ++rit) {
      LogMethodTraceEvent(thread, *rit, instrumentation::Instrumentation::kMethodEntered,
                          thread_clock_diff, wall_clock_diff);
    }
    FreeStackTrace(old_stack_trace);
  }
}

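The main logic is as follows:

-   First, thread->GetStackTraceSample() returns the previously recorded stack trace sample, and thread->SetStackTraceSample() stores the thread's latest sample.
-   If the previous sample is null, the new stack trace is traversed bottom-up. LogMethodTraceEvent is called for every method with event type kMethodEntered, meaning all of these methods are treated as newly pushed onto the stack.
-   If the previous sample is not null, method events are recorded by comparing the new stack trace with the previous one:
    -   Traverse the old and new traces from the stack bottom until the frames differ; the first differing positions are old_rit and rit.
    -   For the old trace, log an exit event for each method from the stack top down to old_rit.
    -   For the new trace, log an entry event for each method from rit up to the stack top.
    -   For example, suppose the previous stack was `A B C D E F G` (bottom to top) and the latest is `A B C H B C D`. Then `D E F G` are treated as popped, and `H B C D` as pushed. For every sampling interval in which a method is not popped, its measured duration grows by one interval.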
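The diff step can be reproduced on plain strings. The sketch below follows the same iterator logic as CompareAndUpdateStackTrace. Stacks are stored top-first, as ART stores them. DiffStacks and StackDiff are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stacks are stored top-first: index 0 is the innermost frame.
using Stack = std::vector<std::string>;

struct StackDiff {
  std::vector<std::string> exited;   // reported top-down (innermost first)
  std::vector<std::string> entered;  // reported bottom-up (outermost first)
};

StackDiff DiffStacks(const Stack& old_stack, const Stack& new_stack) {
  StackDiff diff;
  auto old_rit = old_stack.rbegin();
  auto rit = new_stack.rbegin();
  // Walk both stacks bottom-up while the frames are identical.
  while (old_rit != old_stack.rend() && rit != new_stack.rend() && *old_rit == *rit) {
    ++old_rit;
    ++rit;
  }
  // Old frames above the common prefix have exited; report them top-down.
  for (auto it = old_stack.begin(); it != old_rit.base(); ++it) {
    diff.exited.push_back(*it);
  }
  // New frames above the common prefix have been entered; report them bottom-up.
  for (; rit != new_stack.rend(); ++rit) {
    diff.entered.push_back(*rit);
  }
  return diff;
}
```

With the example from the list (old `A B C D E F G`, new `A B C H B C D`, both written bottom to top), exits come out as G, F, E, D and entries as H, B, C, D. An empty previous sample makes every frame an entry, which matches the first-sample branch.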

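LogMethodTraceEvent does the actual recording, and every method event has a fixed size. When the TraceClockSource is kDual (wall time and thread CPU time are both recorded), each method event contains the following fields:

2 bytes of thread ID + 4 bytes from EncodeTraceMethodAndAction + 4 bytes of thread_clock_diff + 4 bytes of wall_clock_diff. Each method event therefore takes 14 bytes, and every field is written in little-endian order.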
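The 14-byte dual-clock record can be sketched as follows. The helpers are written in the spirit of ART's Append2LE and Append4LE, but Put2LE, Put4LE, Get4LE and EncodeDualClockRecord are hypothetical names. The 4-byte method value is taken as already encoded, so this sketch does not reproduce EncodeTraceMethodAndAction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One dual-clock record: tid (2) + method/action (4) + thread clock (4) + wall clock (4).
constexpr std::size_t kDualClockRecordSize = 14;
static_assert(kDualClockRecordSize == 2 + 4 + 4 + 4, "record size");
using Record = std::array<uint8_t, kDualClockRecordSize>;

// Little-endian writers: least significant byte first.
inline void Put2LE(uint8_t* buf, uint16_t v) {
  buf[0] = static_cast<uint8_t>(v);
  buf[1] = static_cast<uint8_t>(v >> 8);
}
inline void Put4LE(uint8_t* buf, uint32_t v) {
  for (int i = 0; i < 4; ++i) buf[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Reads back a 4-byte little-endian value.
inline uint32_t Get4LE(const uint8_t* buf) {
  return static_cast<uint32_t>(buf[0]) | (static_cast<uint32_t>(buf[1]) << 8) |
         (static_cast<uint32_t>(buf[2]) << 16) | (static_cast<uint32_t>(buf[3]) << 24);
}

Record EncodeDualClockRecord(uint16_t tid, uint32_t method_value,
                             uint32_t thread_clock_diff, uint32_t wall_clock_diff) {
  Record r{};
  Put2LE(r.data(), tid);
  Put4LE(r.data() + 2, method_value);
  Put4LE(r.data() + 6, thread_clock_diff);
  Put4LE(r.data() + 10, wall_clock_diff);
  return r;
}
```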

void Trace::LogMethodTraceEvent(Thread* thread, ArtMethod* method,
                                instrumentation::Instrumentation::InstrumentationEvent event,
                                uint32_t thread_clock_diff, uint32_t wall_clock_diff) {
  // Ensure we always use the non-obsolete version of the method so that entry/exit events have the
  // same pointer value.
  method = method->GetNonObsoleteMethod();

  // Advance cur_offset_ atomically.
  int32_t new_offset;
  int32_t old_offset = 0;

  // In the non-streaming case, we do a busy loop here trying to get
  // an offset to write our record and advance cur_offset_ for the
  // next use.
  if (trace_output_mode_ != TraceOutputMode::kStreaming) {
    old_offset = cur_offset_.load(std::memory_order_relaxed);  // Speculative read
    do {
      new_offset = old_offset + GetRecordSize(clock_source_);
      if (static_cast<size_t>(new_offset) > buffer_size_) {
        overflow_ = true;
        return;
      }
    } while (!cur_offset_.compare_exchange_weak(old_offset, new_offset, std::memory_order_relaxed));
  }

  TraceAction action = kTraceMethodEnter;
  switch (event) {
    case instrumentation::Instrumentation::kMethodEntered:
      action = kTraceMethodEnter;
      break;
    case instrumentation::Instrumentation::kMethodExited:
      action = kTraceMethodExit;
      break;
    case instrumentation::Instrumentation::kMethodUnwind:
      action = kTraceUnroll;
      break;
    default:
      UNIMPLEMENTED(FATAL) << "Unexpected event: " << event;
  }

  uint32_t method_value = EncodeTraceMethodAndAction(method, action);

  // Write data into the tracing buffer (if not streaming) or into a
  // small buffer on the stack (if streaming) which we'll put into the
  // tracing buffer below.
  //
  // These writes to the tracing buffer are synchronised with the
  // future reads that (only) occur under FinishTracing(). The callers
  // of FinishTracing() acquire locks and (implicitly) synchronise
  // the buffer memory.
  uint8_t* ptr;
  static constexpr size_t kPacketSize = 14U;  // The maximum size of data in a packet.
  uint8_t stack_buf[kPacketSize];             // Space to store a packet when in streaming mode.
  if (trace_output_mode_ == TraceOutputMode::kStreaming) {
    ptr = stack_buf;
  } else {
    ptr = buf_.get() + old_offset;
  }

  Append2LE(ptr, thread->GetTid());
  Append4LE(ptr + 2, method_value);
  ptr += 6;

  if (UseThreadCpuClock()) {
    Append4LE(ptr, thread_clock_diff);
    ptr += 4;
  }
  if (UseWallClock()) {
    Append4LE(ptr, wall_clock_diff);
  }
  static_assert(kPacketSize == 2 + 4 + 4 + 4, "Packet size incorrect.");

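  if (trace_output_mode_ == TraceOutputMode::kStreaming) {
    // ... omitted: streaming-mode bookkeeping before the packet is written
    WriteToBuf(stack_buf, sizeof(stack_buf));
  }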
}
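In non-streaming mode, concurrent threads reserve disjoint slices of one fixed-size buffer through a compare-and-swap loop on cur_offset_. When a record would not fit, they set overflow_ and drop the event instead of writing past the end. A minimal sketch of that reservation step follows. RecordBuffer, Reserve, and the header_size parameter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size trace buffer with lock-free slot reservation.
class RecordBuffer {
 public:
  RecordBuffer(std::size_t capacity, std::size_t record_size, std::size_t header_size)
      : buf_(capacity), record_size_(record_size), cur_offset_(header_size) {}

  // Returns a pointer to record_size_ writable bytes, or nullptr when the buffer is full.
  uint8_t* Reserve() {
    std::size_t old_offset = cur_offset_.load(std::memory_order_relaxed);
    std::size_t new_offset;
    do {
      new_offset = old_offset + record_size_;
      if (new_offset > buf_.size()) {
        overflow_.store(true, std::memory_order_relaxed);
        return nullptr;  // Drop the event once the buffer is full.
      }
      // On failure, compare_exchange_weak reloads old_offset and the loop retries.
    } while (!cur_offset_.compare_exchange_weak(old_offset, new_offset,
                                                std::memory_order_relaxed));
    return buf_.data() + old_offset;
  }

  bool overflow() const { return overflow_.load(std::memory_order_relaxed); }
  std::size_t used() const { return cur_offset_.load(std::memory_order_relaxed); }

 private:
  std::vector<uint8_t> buf_;
  std::size_t record_size_;
  std::atomic<std::size_t> cur_offset_;
  std::atomic<bool> overflow_{false};
};
```

Each thread that wins the CAS owns its slice exclusively, so no lock is needed for the write itself.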

Instrumentation-Based Tracing

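The second approach corresponds to TraceMode::kMethodTracing and uses the MethodTracing capability provided by art/runtime/instrumentation. When instrumentation->EnableMethodTracing is called, it calls runtime->GetClassLinker()->VisitClasses(&visitor) internally. This traverses the methods of every class and installs a stub on each method to monitor its entry and exit.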

All classes are traversed with an InstallStubsClassVisitor as the visitor, which installs stubs for every class:

void Instrumentation::UpdateStubs() {
   //...
  UpdateInstrumentationLevel(requested_level);
  if (requested_level > InstrumentationLevel::kInstrumentNothing) {
    InstallStubsClassVisitor visitor(this);
    runtime->GetClassLinker()->VisitClasses(&visitor);
    //...
  } else {
    InstallStubsClassVisitor visitor(this);
    runtime->GetClassLinker()->VisitClasses(&visitor);
    MaybeRestoreInstrumentationStack();
  }
}

Eventually InstallStubsForMethod is called for each method to hook it:

void Instrumentation::InstallStubsForMethod(ArtMethod* method) {
  if (!method->IsInvokable() || method->IsProxyMethod()) {
    return;
  }

  if (IsProxyInit(method)) {
    return;
  }

  if (InterpretOnly(method)) {
    UpdateEntryPoints(method, GetQuickToInterpreterBridge());
    return;
  }

  if (EntryExitStubsInstalled()) {
    // Install the instrumentation entry point if needed.
    if (CodeNeedsEntryExitStub(method->GetEntryPointFromQuickCompiledCode(), method)) {
      UpdateEntryPoints(method, GetQuickInstrumentationEntryPoint());
    }
    return;
  }

  // We're being asked to restore the entrypoints after instrumentation.
  CHECK_EQ(instrumentation_level_, InstrumentationLevel::kInstrumentNothing);
  // We need to have the resolution stub still if the class is not initialized.
  if (NeedsClinitCheckBeforeCall(method) && !method->GetDeclaringClass()->IsVisiblyInitialized()) {
    UpdateEntryPoints(method, GetQuickResolutionStub());
    return;
  }
  UpdateEntryPoints(method, GetOptimizedCodeFor(method));
}

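Installing the stubs is somewhat involved, and the details are beyond the scope of this article. As one example, Quick-compiled code must already have hooks reserved in advance via setMethodEntryHook and setMethodExitHook.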

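InstallStubsForMethod installs entry and exit hooks for each method. The hooks call the instrumentation's MethodEnterEvent and MethodExitEvent, respectively. These in turn iterate over every listener registered with the instrumentation and notify it of the method entry or exit event.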

void Instrumentation::MethodEnterEventImpl(Thread* thread, ArtMethod* method) const {
  if (HasMethodEntryListeners()) {
    for (InstrumentationListener* listener : method_entry_listeners_) {
      if (listener != nullptr) {
        listener->MethodEntered(thread, method);
      }
    }
  }
}

The Trace implementation therefore registers itself as a listener via the AddListener function that Instrumentation provides. This is how it monitors method entry and exit:

{   // Executed when the trace mode is not sampling
    
        runtime->GetInstrumentation()->AddListener(
            the_trace_,
            instrumentation::Instrumentation::kMethodEntered |
                instrumentation::Instrumentation::kMethodExited |
                instrumentation::Instrumentation::kMethodUnwind);
        // TODO: In full-PIC mode, we don't need to fully deopt.
        // TODO: We can only use trampoline entrypoints if we are java-debuggable since in that case
        // we know that inlining and other problematic optimizations are disabled. We might just
        // want to use the trampolines anyway since it is faster. It makes the story with disabling
        // jit-gc more complex though.
        runtime->GetInstrumentation()->EnableMethodTracing(
            kTracerInstrumentationKey, /*needs_interpreter=*/!runtime->IsJavaDebuggable());
 }
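Structurally, this is the observer pattern keyed by an event bitmask. A minimal sketch follows. Instrumentation, Listener, RecordingTrace, and the kEntered/kExited bits are simplified hypothetical stand-ins, not the real ART types. Only the listeners registered for an event receive it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event bits, analogous to kMethodEntered / kMethodExited.
enum Event : uint32_t { kEntered = 1u << 0, kExited = 1u << 1 };

struct Listener {
  virtual ~Listener() = default;
  virtual void MethodEntered(const std::string& method) = 0;
  virtual void MethodExited(const std::string& method) = 0;
};

class Instrumentation {
 public:
  // Registers the listener only for the events present in `events`.
  void AddListener(Listener* l, uint32_t events) {
    if (events & kEntered) entry_listeners_.push_back(l);
    if (events & kExited) exit_listeners_.push_back(l);
  }
  // Would be called from the entry/exit stubs; fans the event out to listeners.
  void MethodEnterEvent(const std::string& m) const {
    for (Listener* l : entry_listeners_) l->MethodEntered(m);
  }
  void MethodExitEvent(const std::string& m) const {
    for (Listener* l : exit_listeners_) l->MethodExited(m);
  }

 private:
  std::vector<Listener*> entry_listeners_;
  std::vector<Listener*> exit_listeners_;
};

// A trace-like listener that simply records what it receives.
struct RecordingTrace : Listener {
  std::vector<std::string> log;
  void MethodEntered(const std::string& m) override { log.push_back("enter " + m); }
  void MethodExited(const std::string& m) override { log.push_back("exit " + m); }
};
```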

Once a method entry or exit is observed, the logic is the same as in sampling mode: LogMethodTraceEvent is called to record the information.

void Trace::MethodEntered(Thread* thread, ArtMethod* method) {
  uint32_t thread_clock_diff = 0;
  uint32_t wall_clock_diff = 0;
  ReadClocks(thread, &thread_clock_diff, &wall_clock_diff);
  LogMethodTraceEvent(thread, method, instrumentation::Instrumentation::kMethodEntered,
                      thread_clock_diff, wall_clock_diff);
}

void Trace::MethodExited(Thread* thread,
                         ArtMethod* method,
                         instrumentation::OptionalFrame frame ATTRIBUTE_UNUSED,
                         JValue& return_value ATTRIBUTE_UNUSED) {
  uint32_t thread_clock_diff = 0;
  uint32_t wall_clock_diff = 0;
  ReadClocks(thread, &thread_clock_diff, &wall_clock_diff);
  LogMethodTraceEvent(thread,
                      method,
                      instrumentation::Instrumentation::kMethodExited,
                      thread_clock_diff,
                      wall_clock_diff);
}
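Instrumentation records an exact enter/exit pair for every call. A consumer can therefore recover inclusive per-call times by replaying one thread's events against a stack. The sketch below makes two assumptions. First, the events use a hypothetical in-memory Event type. Second, the decoded wall-clock field is a microsecond timestamp relative to the trace start. Unwind events would be handled like exits.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical decoded record for a single thread.
struct Event {
  std::string method;
  bool is_enter;     // false for an exit (or unwind)
  uint32_t wall_us;  // assumed: wall-clock time since trace start, in microseconds
};

// Sums inclusive wall time per method by matching each exit with its enter.
std::map<std::string, uint32_t> InclusiveTimes(const std::vector<Event>& events) {
  std::map<std::string, uint32_t> totals;
  std::vector<const Event*> stack;
  for (const Event& e : events) {
    if (e.is_enter) {
      stack.push_back(&e);
    } else if (!stack.empty()) {
      const Event* enter = stack.back();
      stack.pop_back();
      totals[enter->method] += e.wall_us - enter->wall_us;
    }
  }
  return totals;
}
```

Recursive calls are counted once per frame in this simplified version.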

Finish Trace

When the Java layer calls Debug.stopMethodTracing(), the call ends up in FinishTracing in the native trace.cc. This function first assembles the header section of the trace file.

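The header starts with basic information: the trace file version, the elapsed trace time, the clock type (wall time or CPU time), the number of method calls, the VM type, the process ID, and so on. Method events do not store each method's name directly. They store an internally generated ID instead, so the ID-to-method mapping must also go into the header. DumpMethodList writes this mapping.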

void Trace::FinishTracing() {
  size_t final_offset = 0;
  std::set<ArtMethod*> visited_methods;
  if (trace_output_mode_ == TraceOutputMode::kStreaming) {
    // Clean up.
    MutexLock mu(Thread::Current(), *streaming_lock_);
    STLDeleteValues(&seen_methods_);
  } else {
    final_offset = cur_offset_.load(std::memory_order_relaxed);
    GetVisitedMethods(final_offset, &visited_methods);
  }

  // Compute elapsed time.
  uint64_t elapsed = MicroTime() - start_time_;

  std::ostringstream os;
  // Record the trace version
  os << StringPrintf("%cversion\n", kTraceTokenChar);
  os << StringPrintf("%d\n", GetTraceVersion(clock_source_));
    
  os << StringPrintf("data-file-overflow=%s\n", overflow_ ? "true" : "false");
  // Record the clock type and the elapsed trace time
  if (UseThreadCpuClock()) {
    if (UseWallClock()) {
      os << StringPrintf("clock=dual\n");
    } else {
      os << StringPrintf("clock=thread-cpu\n");
    }
  } else {
    os << StringPrintf("clock=wall\n");
  }
  os << StringPrintf("elapsed-time-usec=%" PRIu64 "\n", elapsed);  
  if (trace_output_mode_ != TraceOutputMode::kStreaming) {
    size_t num_records = (final_offset - kTraceHeaderLength) / GetRecordSize(clock_source_);
    os << StringPrintf("num-method-calls=%zd\n", num_records);
  }
  os << StringPrintf("clock-call-overhead-nsec=%d\n", clock_overhead_ns_);
  os << StringPrintf("vm=art\n");
  os << StringPrintf("pid=%d\n", getpid());
  if ((flags_ & kTraceCountAllocs) != 0) {
    os << "alloc-count=" << Runtime::Current()->GetStat(KIND_ALLOCATED_OBJECTS) << "\n";
    os << "alloc-size=" << Runtime::Current()->GetStat(KIND_ALLOCATED_BYTES) << "\n";
    os << "gc-count=" <<  Runtime::Current()->GetStat(KIND_GC_INVOCATIONS) << "\n";
  }
  // Record thread information
  os << StringPrintf("%cthreads\n", kTraceTokenChar);
  DumpThreadList(os);
  // Record method information
  os << StringPrintf("%cmethods\n", kTraceTokenChar);
  DumpMethodList(os, visited_methods);
  os << StringPrintf("%cend\n", kTraceTokenChar);
  std::string header(os.str());
  //.....  
}

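Each entry in the method list contains the methodId, the pretty-printed descriptor of the declaring class, the methodName, the methodSignature, and the declaring class's source file. The methodId is a unique ID assigned during tracing to each recorded method.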
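The ID-to-method mapping can be sketched as follows. MethodTable and MethodInfo are hypothetical. The IDs here are plain sequential integers rather than ART's real method encoding. The tab-separated line layout is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical method descriptor.
struct MethodInfo {
  std::string declaring_class;
  std::string name;
  std::string signature;
  std::string source_file;
};

class MethodTable {
 public:
  // Returns the existing ID for `m`, or assigns the next one the first time it is seen.
  uint32_t IdFor(const MethodInfo& m) {
    auto key = std::make_tuple(m.declaring_class, m.name, m.signature);
    auto it = ids_.find(key);
    if (it != ids_.end()) return it->second;
    uint32_t id = static_cast<uint32_t>(methods_.size()) + 1;
    ids_.emplace(key, id);
    methods_.push_back(m);
    return id;
  }

  // Writes one line per method: id, class, name, signature, source file.
  void Dump(std::ostream& os) const {
    for (std::size_t i = 0; i < methods_.size(); ++i) {
      const MethodInfo& m = methods_[i];
      os << "0x" << std::hex << (i + 1) << std::dec << '\t' << m.declaring_class << '\t'
         << m.name << '\t' << m.signature << '\t' << m.source_file << '\n';
    }
  }

 private:
  std::map<std::tuple<std::string, std::string, std::string>, uint32_t> ids_;
  std::vector<MethodInfo> methods_;
};
```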

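Method trace likewise records thread IDs rather than thread names, so DumpThreadList writes the mapping between thread IDs and names. Because the header is only written when tracing ends, some threads may have exited during the trace. To keep those threads in the record, art/runtime/thread_list.cc handles thread exit specially. ThreadList::Unregister(Thread* self) calls Trace::StoreExitingThreadInfo to save the exiting thread's information in advance.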
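The idea of saving exiting threads can be sketched with a small registry. ThreadRegistry, Unregister, and DumpThreadList here are hypothetical simplifications, and the "tid, tab, name" line format is assumed. When a thread exits while tracing is active, its name is copied aside so that the header can still list it.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

class ThreadRegistry {
 public:
  void Register(int tid, const std::string& name) { live_[tid] = name; }

  // Called when a thread exits; while tracing, its name is kept for the header.
  void Unregister(int tid, bool tracing) {
    auto it = live_.find(tid);
    if (it == live_.end()) return;
    if (tracing) exited_[tid] = it->second;
    live_.erase(it);
  }

  // One "tid<TAB>name" line per thread, including threads that already exited.
  std::string DumpThreadList() const {
    std::map<int, std::string> all = exited_;
    for (const auto& entry : live_) all[entry.first] = entry.second;
    std::ostringstream os;
    for (const auto& entry : all) os << entry.first << '\t' << entry.second << '\n';
    return os.str();
  }

 private:
  std::map<int, std::string> live_;
  std::map<int, std::string> exited_;
};
```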

The resulting trace header is plain text, made up of the sections written above: version, threads, methods, and end.

Comparing the Trace Modes

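The sections above analyzed how the ART VM implements method trace in sampling and instrumentation modes. From the implementations, we can roughly compare their strengths and weaknesses. Instrumentation tracks every method's entry and exit precisely. Its method timings are therefore the most accurate and no method is missed, but it also has the higher performance overhead. Sampling infers entry and exit from the difference between two consecutive samples. Short-running methods may never be recorded, and the precision of computed durations depends on the sampling interval. In exchange, its performance overhead is lower.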

Method trace mode	Performance overhead	Accuracy
Instrumentation	Relatively high	Accurate
Sampling	Low	Moderate
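A small deterministic simulation makes the accuracy difference concrete. A method's sampled duration is roughly the number of samples that found it on the stack multiplied by the interval. A call shorter than the interval can fall entirely between two samples. The timeline below is invented for illustration, and Call and SampledDuration are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One hypothetical call on a single thread, active during [start_us, end_us).
struct Call {
  std::string method;
  uint64_t start_us;
  uint64_t end_us;
};

// Samples at t = 0, interval, 2 * interval, ... and charges one interval
// for every sample that finds `method` running.
uint64_t SampledDuration(const std::vector<Call>& calls, const std::string& method,
                         uint64_t interval_us, uint64_t total_us) {
  uint64_t hits = 0;
  for (uint64_t t = 0; t < total_us; t += interval_us) {
    for (const Call& c : calls) {
      if (c.method == method && c.start_us <= t && t < c.end_us) {
        ++hits;
        break;
      }
    }
  }
  return hits * interval_us;
}
```

With a 10 ms interval, a 25 ms call is reported as 30 ms. A 1 ms call that falls between samples is reported as 0. Instrumentation would report both exactly.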

Extensions

How Android Studio Handles Trace Files

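A trace file generated by art/runtime/trace.cc can be dragged into Android Studio (AS), which displays it directly as a Top Down view or a flame chart, so I looked into this part of the code. The trace file is parsed by the VmTraceParser class in the perflib project. After parsing, each method's information becomes a JavaMethodModel object. JavaMethodModel is a subclass of CaptureNode. TopDownNode merges identical CaptureNode objects, producing a tree structure that can be rendered as the Top Down view.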

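The FlameChart data structure in AS is also derived from TopDownNode. The two are not fundamentally different, since both are trees. The flame chart simply draws methods that take longer as wider bars.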
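Building such a top-down tree from sampled stacks takes only a few lines. Identical child frames under the same parent are merged, and each node's weight is the number of samples passing through it. That weight is the width of the node's bar in a flame chart. Node and AddSample are hypothetical names, not the Android Studio classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Node {
  int samples = 0;                                        // bar width in a flame chart
  std::map<std::string, std::unique_ptr<Node>> children;  // identical frames merged by name
};

// Adds one stack, ordered bottom-up (root frame first), to the tree.
void AddSample(Node& root, const std::vector<std::string>& bottom_up_stack) {
  Node* node = &root;
  node->samples++;
  for (const std::string& frame : bottom_up_stack) {
    std::unique_ptr<Node>& child = node->children[frame];
    if (!child) child = std::make_unique<Node>();
    child->samples++;
    node = child.get();
  }
}
```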

Applications of Method Trace

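A flame chart gives methods with different durations different widths, which makes expensive methods quicker to spot. The source analysis above shows that time-consuming methods can be monitored by sampling stacks. The Java layer can already collect stacks via Thread.getStackTrace, so we can build a simple Method Trace solution without any native capabilities. The implementation is straightforward: start a background thread that periodically samples the main thread's stack, and by default keep the stack records from the most recent n seconds.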
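The idea of keeping the last n seconds of stacks maps onto a time-ordered queue. The queue evicts old samples on insert and can return the samples that cover an incident window. This C++ sketch shows only the data structure, and SampleWindow is a hypothetical name. On Android, the stacks would come from Thread.getStackTrace() on a background thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

struct Sample {
  uint64_t time_ms;                // when the stack was captured
  std::vector<std::string> stack;  // captured frames
};

class SampleWindow {
 public:
  explicit SampleWindow(uint64_t keep_ms) : keep_ms_(keep_ms) {}

  // Appends a sample (timestamps must be non-decreasing) and evicts samples
  // older than keep_ms_ relative to the newest one.
  void Add(Sample s) {
    samples_.push_back(std::move(s));
    const uint64_t newest = samples_.back().time_ms;
    while (!samples_.empty() && newest - samples_.front().time_ms > keep_ms_) {
      samples_.pop_front();
    }
  }

  // Returns the samples captured in [from_ms, to_ms], e.g. for a slow-launch or jank report.
  std::vector<Sample> Between(uint64_t from_ms, uint64_t to_ms) const {
    std::vector<Sample> out;
    for (const Sample& s : samples_) {
      if (s.time_ms >= from_ms && s.time_ms <= to_ms) out.push_back(s);
    }
    return out;
  }

  std::size_t size() const { return samples_.size(); }

 private:
  uint64_t keep_ms_;
  std::deque<Sample> samples_;
};
```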

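As for concrete scenarios: we may detect a slow app launch, slow message handling, a slow page launch, or an ANR. In each case, knowing the threads' method calls during that period is very helpful for analysis. So when one of these happens, we can take the stack samples collected during that time window and report them together with the event. When a specific issue is analyzed on the APM platform, the platform can show the flame chart for that time window. Taking app-launch monitoring as an example, each slow-launch log sample can be displayed with its corresponding flame chart.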

From the flame chart, it is quick to determine that this slow launch was caused by Webview.init.

Case Study: Jank Monitoring

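Here is a concrete implementation. In offline (development) scenarios, taking jank as the example, we would like to be notified as soon as jank occurs. Tapping the notification should then show the stacks from the jank period as a flame chart, directly on the phone. I wrote a demo to illustrate this; the final result is shown below. The code is open source on GitHub at github.com/Knight-ZXW/… .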

The demo currently collects only wall time, not thread CPU time. I plan to add CPU time when I have the chance.

Summary

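This article first used the source code to build a rough picture of how Android implements Method Trace at the native layer. The instrumentation approach works by registering through Instrumentation's AddListener to receive method entry and exit events. The sampling approach starts a thread that periodically uses StackVisitor to capture the stacks of all threads. For online (production) use, we can follow the system's sampling approach: hooking and calling the StackVisitor-related APIs makes high-performance thread stack collection possible.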

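The extensions section briefly covered how Android Studio handles Method Trace files. Whether instrumentation or sampling is used, the result can be presented as a flame chart to quickly locate blocking methods. Developers who are less familiar with the native layer can simply use the Java Thread class's getStackTrace to collect a thread's current stack. On the application side, this article showed an example of sampling-based jank monitoring on Android (viewed through Looper messages). It displays method calls as a flame chart directly on the device.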