Today is another busy day. This is the start of the most hardcore JVM series on the web, beginning with TLAB. Because the article is very long and readers have different habits, it is published both as a single-page edition and as a multi-part series:
- The most hardcore JVM TLAB analysis on the web (single-page edition, without the bonus chapters)
- The most hardcore JVM TLAB analysis on the web 1. Introduction to the idea of memory allocation
- The most hardcore JVM TLAB analysis on the web 2. The TLAB life cycle and the problems it raises
- The most hardcore JVM TLAB analysis on the web 3. The JVM EMA expectation algorithm and TLAB-related JVM startup parameters
- The most hardcore JVM TLAB analysis on the web 4. Full analysis of the basic TLAB process
- The most hardcore JVM TLAB analysis on the web 5. Full analysis of the TLAB source code
- The most hardcore JVM TLAB analysis on the web 6. Summary of popular TLAB Q&A
- The most hardcore JVM TLAB analysis on the web (bonus chapter) 7. TLAB-related JVM log analysis
- The most hardcore JVM TLAB analysis on the web (bonus chapter) 8. Monitoring TLAB with JFR
9. OpenJDK HotSpot TLAB related source code analysis
If this chapter is hard to follow, you can skip straight to Chapter 10, Popular Q&A, which answers many frequently asked questions.
9.1. TLAB class structure
When a thread is initialized, its TLAB is initialized as well, provided the JVM has TLAB enabled (it is enabled by default and can be turned off with -XX:-UseTLAB).
A TLAB has the following fields (HeapWord* can be read as a memory address in the heap):
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp
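// Static global variables
static size_t _max_size; // maximum size of any TLAB
static int _reserve_for_allocation_prefetch; // space reserved for the Allocation Prefetch CPU cache optimization; can be ignored for now
static unsigned _target_refills; // expected number of refills per GC cycle
// The main fields of a TLAB
HeapWord* _start; // TLAB start address; heap addresses are always expressed as HeapWord*
HeapWord* _top; // address after the last allocation, i.e. where the next object will be placed
HeapWord* _end; // TLAB end address
size_t _desired_size; // desired TLAB size including the reserved space; memory sizes are always expressed as size_t in heap words, i.e. the byte count divided by HeapWordSize
size_t _refill_waste_limit; // maximum space a TLAB refill may waste. When the remaining space of the TLAB cannot satisfy a request, this value decides the strategy: if the remaining space is larger than this limit, the object is allocated directly in Eden and the TLAB is kept; otherwise the current TLAB is handed back to Eden and a new TLAB is requested from Eden for the allocation.
AdaptiveWeightedAverage _allocation_fraction; // EMA of the current TLAB allocation fraction
// Fields we do not need to care much about here
HeapWord* _allocation_end; // the real end address usable for allocation: the TLAB end address minus the reserved space (the header space reserved for a dummy object)
HeapWord* _pf_top; // used by the Allocation Prefetch CPU cache optimization; can be ignored for now
size_t _allocated_before_last_gc; // space the thread had allocated as of the last GC; used to compute how much the thread allocated in the current GC cycle (see Figure 10)
unsigned _number_of_refills; // allocation statistics: number of times the TLAB ran out of space and was refilled
unsigned _fast_refill_waste; // allocation statistics: waste from fast allocation (fast allocation means allocating directly in the TLAB); no longer used in current JVMs
unsigned _slow_refill_waste; // allocation statistics: waste from slow allocation (slow allocation means refilling a TLAB and allocating from it)
unsigned _gc_waste; // allocation statistics: space wasted at GC
unsigned _slow_allocations; // allocation statistics: number of slow allocations
size_t _allocated_size; // size of the allocated memory
size_t _bytes_since_last_sample_point; // used for JVM TI sampling; can be ignored here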
9.2. TLAB initialization
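First, when the JVM starts, the global TLAB settings need to be initialized:
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp
void ThreadLocalAllocBuffer::startup_initialization() {
// Initialize, i.e. zero, the statistics
ThreadLocalAllocStats::initialize();
// Assume that, on average, half of each thread's current TLAB is unused (wasted) when a GC scans it. Only the most recent TLAB has waste; the TLABs handed back by earlier refills are assumed to be fully used. The percentage of memory each thread wastes (that is, TLABWasteTargetPercent) is therefore 1/2 * 1 / (expected refills per thread per epoch) * 100.
// So the expected number of refills per thread per epoch is 50 / TLABWasteTargetPercent, which is 50 by default.
_target_refills = 100 / (2 * TLABWasteTargetPercent);
// The initial _target_refills must be at least 2, to reduce the chance of a GC during VM initialization
_target_refills = MAX2(_target_refills, 2U);
// If the C2 JIT compiler is present and enabled, reserve space for the Allocation Prefetch CPU cache optimization; this can be ignored here and is covered in another chapter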
#ifdef COMPILER2
if (is_server_compilation_mode_vm()) {
int lines = MAX2(AllocatePrefetchLines, AllocateInstancePrefetchLines) + 2;
_reserve_for_allocation_prefetch = (AllocatePrefetchDistance + AllocatePrefetchStepSize * lines) /
(int)HeapWordSize;
}
#endif
// Initialize the TLAB of the main thread
guarantee(Thread::current()->is_Java_thread(), "tlab initialization thread not Java thread");
Thread::current()->tlab().initialize();
log_develop_trace(gc, tlab)("TLAB min: " SIZE_FORMAT " initial: " SIZE_FORMAT " max: " SIZE_FORMAT,
min_size(), Thread::current()->tlab().initial_desired_size(), max_size());
}
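The arithmetic behind `_target_refills` can be checked with a small standalone sketch. This is not HotSpot code: `target_refills_for` is a hypothetical helper, and it rests on the same assumption as the comment above, namely that each thread's current TLAB is half full on average when a GC starts.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the two assignments in startup_initialization().
// waste_target_percent plays the role of TLABWasteTargetPercent.
unsigned target_refills_for(unsigned waste_target_percent) {
    assert(waste_target_percent > 0);
    // Half of the last TLAB is wasted: waste% = 100 / (2 * refills),
    // so refills = 100 / (2 * waste%).
    unsigned refills = 100 / (2 * waste_target_percent);
    // Never go below 2 refills per epoch.
    return std::max(refills, 2u);
}
```

With a waste target of 1 percent this gives 50 refills per epoch; for large waste targets the integer division drops below 2 and the lower bound takes over.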
Each thread maintains its own TLAB, and the TLABs of different threads differ in size. The size of a TLAB is mainly determined by the size of Eden, the number of threads, and each thread's object allocation rate. When a Java thread starts running, its TLAB is set up first:
src/hotspot/share/runtime/thread.cpp
void JavaThread::run() {
// initialize thread-local alloc buffer related fields
this->initialize_tlab();
// remaining code omitted
}
Setting up the TLAB simply calls the initialize method of ThreadLocalAllocBuffer:
src/hotspot/share/runtime/thread.hpp
void initialize_tlab() {
// If TLAB has not been disabled with -XX:-UseTLAB, initialize the TLAB
if (UseTLAB) {
tlab().initialize();
}
}
// Thread-Local Allocation Buffer (TLAB) support
ThreadLocalAllocBuffer& tlab() {
return _tlab;
}
ThreadLocalAllocBuffer _tlab;
The initialize method of ThreadLocalAllocBuffer initializes the TLAB fields described above that we care about:
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp
void ThreadLocalAllocBuffer::initialize() {
// Set the initial pointers; no memory has been allocated from Eden yet, so they are all NULL
initialize(NULL, // start
NULL, // top
NULL); // end
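// Compute the initial desired size and set it
set_desired_size(initial_desired_size());
// Total size available to all TLABs. Different GC implementations have different TLAB capacities; usually it is the size of Eden
// For G1 GC, for example, it equals (_policy->young_list_target_length() - _survivor.length()) * HeapRegion::GrainBytes, which can be read as the young generation minus the Survivor regions, i.e. Eden
size_t capacity = Universe::heap()->tlab_capacity(thread()) / HeapWordSize;
// Compute the fraction of the total TLAB capacity this thread's TLABs are expected to occupy
// The expected occupancy is this TLAB's size multiplied by the expected number of refills
float alloc_frac = desired_size() * target_refills() / (float) capacity;
// Record it as a sample for the EMA
_allocation_fraction.sample(alloc_frac);
// Compute the initial refill waste limit and set it
// As described in the earlier chapter on principles, the initial value is the TLAB size (_desired_size) / TLABRefillWasteFraction
set_refill_waste_limit(initial_refill_waste_limit());
// Reset the statistics
reset_statistics();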
reset_statistics();
}
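To make the `alloc_frac` sample more concrete, here is a minimal sketch of the calculation together with a plain exponential moving average. `SimpleEma` and `allocation_fraction` are hypothetical names, and the averaging is deliberately simplified: it uses a fixed weight, whereas HotSpot's `AdaptiveWeightedAverage` also adjusts the weight for the first few samples.

```cpp
#include <cassert>
#include <cstddef>

// Simplified fixed-weight exponential moving average (weight in percent).
// Assumption: the warm-up behaviour of AdaptiveWeightedAverage is left out.
struct SimpleEma {
    float average;
    unsigned weight_percent;
    explicit SimpleEma(unsigned w) : average(0.0f), weight_percent(w) {}
    void sample(float value) {
        average = ((100 - weight_percent) * average + weight_percent * value) / 100.0f;
    }
};

// Share of the total TLAB capacity that one thread is expected to consume
// per epoch: desired TLAB size times expected refills, divided by capacity.
float allocation_fraction(std::size_t desired_size_words,
                          unsigned target_refills,
                          std::size_t capacity_words) {
    assert(capacity_words > 0);
    return desired_size_words * target_refills / static_cast<float>(capacity_words);
}
```

For example, a 1000-word TLAB with 50 expected refills in a 1,000,000-word Eden yields a fraction of about 0.05, which is then fed into the average as one sample.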
9.2.1. How is the initial expected size calculated?
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp
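// Compute the initial size
size_t ThreadLocalAllocBuffer::initial_desired_size() {
size_t init_sz = 0;
// If the TLAB size was set with -XX:TLABSize, use it as the initial desired size
// Heap memory sizes are expressed as a number of HeapWords, hence TLABSize / HeapWordSize
if (TLABSize > 0) {
init_sz = TLABSize / HeapWordSize;
} else {
// Get the expected number of allocating threads in the current epoch; as described earlier, this is predicted with an EMA
unsigned int nof_threads = ThreadLocalAllocStats::allocating_threads_avg();
// Different GC implementations have different TLAB capacities; Universe::heap()->tlab_capacity(thread()) is usually the size of Eden
// For G1 GC, for example, it equals (_policy->young_list_target_length() - _survivor.length()) * HeapRegion::GrainBytes, which can be read as the young generation minus the Survivor regions, i.e. Eden
// The size is Eden size / (expected number of allocating threads in the current epoch * configured refills per thread per epoch)
// target_refills was already set when the JVM initialized the global TLAB configuration
init_sz = (Universe::heap()->tlab_capacity(thread()) / HeapWordSize) /
(nof_threads * target_refills());
// Apply object alignment to get the final size
init_sz = align_object_size(init_sz);
}
// Keep the size between min_size() and max_size()
// min_size is mainly determined by MinTLABSize
init_sz = MIN2(MAX2(init_sz, min_size()), max_size());
return init_sz;
}
// The minimum size is determined by MinTLABSize, expressed in HeapWords and object-aligned. The trailing alignment_reserve is the size of the object header of the dummy object used as filler (the JVM's CPU cache prefetch is ignored here and analyzed in detail in another chapter).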
static size_t min_size() {
return align_object_size(MinTLABSize / HeapWordSize) + alignment_reserve();
}
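The sizing rule can be reproduced outside the JVM with plain integers. The sketch below is only an illustration under stated assumptions: `initial_desired_size_words` and all of its parameters are hypothetical stand-ins for the HotSpot values (all sizes in heap words), and object alignment is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// eden_words stands in for tlab_capacity / HeapWordSize and threads for
// allocating_threads_avg(); tlab_size_flag_words is 0 when -XX:TLABSize is unset.
std::size_t initial_desired_size_words(std::size_t tlab_size_flag_words,
                                       std::size_t eden_words,
                                       unsigned threads,
                                       unsigned target_refills,
                                       std::size_t min_words,
                                       std::size_t max_words) {
    assert(threads > 0 && target_refills > 0 && min_words <= max_words);
    std::size_t size = tlab_size_flag_words > 0
        ? tlab_size_flag_words                       // explicit size was given
        : eden_words / (threads * target_refills);   // even share of Eden
    // Clamp into [min_words, max_words]; object alignment is omitted here.
    return std::min(std::max(size, min_words), max_words);
}
```

With a 1,000,000-word Eden, 10 allocating threads and 50 refills per epoch, each thread starts with a 2000-word TLAB, as long as that value lies between the minimum and the maximum.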
9.2.2. How is the maximum TLAB size determined?
Different GCs determine it in different ways:
In G1 GC, it is the size of a humongous object, i.e. half the size of a G1 region:
src/hotspot/share/gc/g1/g1CollectedHeap.cpp
// For G1 TLABs should not contain humongous objects, so the maximum TLAB size
// must be equal to the humongous object limit.
size_t G1CollectedHeap::max_tlab_size() const {
return align_down(_humongous_object_threshold_in_words, MinObjAlignment);
}
In ZGC, it is 1/8 of the page size. Similarly, in most cases Shenandoah GC also uses 1/8 of the size of each region. Both expect that at least 7/8 of a region does not have to be handed back, which reduces the scanning work when selecting the CSet:
src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
MaxTLABSizeWords = MIN2(ShenandoahElasticTLAB ? RegionSizeWords : (RegionSizeWords / 8), HumongousThresholdWords);
src/hotspot/share/gc/z/zHeap.cpp
const size_t ZObjectSizeLimitSmall = ZPageSizeSmall / 8;
For other GCs, it is the maximum size of an int array. This is related to filling the unused part of a TLAB with a dummy object; the reason has been explained earlier.
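The three region-based limits can be compared side by side in a small sketch. The function and parameter names are illustrative only, all sizes are in heap words, and the alignment step of the G1 code is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// G1: the humongous object threshold, taken here as half a region.
std::size_t g1_max_tlab_words(std::size_t region_words) {
    return region_words / 2;
}

// Shenandoah: 1/8 of a region (a full region with elastic TLABs),
// capped by the humongous threshold.
std::size_t shenandoah_max_tlab_words(std::size_t region_words,
                                      std::size_t humongous_threshold_words,
                                      bool elastic_tlab) {
    std::size_t cap = elastic_tlab ? region_words : region_words / 8;
    return std::min(cap, humongous_threshold_words);
}

// ZGC: 1/8 of a small page.
std::size_t z_max_tlab_words(std::size_t small_page_words) {
    return small_page_words / 8;
}
```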
9.3. TLAB allocates memory
When an object is created with new, instanceOop InstanceKlass::allocate_instance(TRAPS) is called:
src/hotspot/share/oops/instanceKlass.cpp
instanceOop InstanceKlass::allocate_instance(TRAPS) {
bool has_finalizer_flag = has_finalizer(); // Query before possible GC
int size = size_helper(); // Query before forming handle.
instanceOop i;
i = (instanceOop)Universe::heap()->obj_allocate(this, size, CHECK_NULL);
if (has_finalizer_flag && !RegisterFinalizersAtInit) {
i = register_finalizer(i, CHECK_NULL);
}
return i;
}
Its core is Universe::heap()->obj_allocate(this, size, CHECK_NULL), which allocates memory from the heap:
src/hotspot/share/gc/shared/collectedHeap.inline.hpp
inline oop CollectedHeap::obj_allocate(Klass* klass, int size, TRAPS) {
ObjAllocator allocator(klass, size, THREAD);
return allocator.allocate();
}
The object's memory is then allocated through the common ObjAllocator implementation:
src/hotspot/share/gc/shared/memAllocator.cpp
oop MemAllocator::allocate() const {
oop obj = NULL;
{
Allocation allocation(*this, &obj);
// Allocate heap memory; see the next method
HeapWord* mem = mem_allocate(allocation);
if (mem != NULL) {
obj = initialize(mem);
} else {
// The unhandled oop detector will poison local variable obj,
// so reset it to NULL if mem is NULL.
obj = NULL;
}
}
return obj;
}
HeapWord* MemAllocator::mem_allocate(Allocation& allocation) const {
// If TLAB is in use, allocate from the TLAB; see the next method for the allocation code
if (UseTLAB) {
HeapWord* result = allocate_inside_tlab(allocation);
if (result != NULL) {
return result;
}
}
// Otherwise allocate directly outside the TLAB
return allocate_outside_tlab(allocation);
}
HeapWord* MemAllocator::allocate_inside_tlab(Allocation& allocation) const {
assert(UseTLAB, "should use UseTLAB");
// Allocate memory from the current thread's TLAB: the TLAB fast path
HeapWord* mem = _thread->tlab().allocate(_word_size);
// If the allocation succeeded, return the result
if (mem != NULL) {
return mem;
}
// If the allocation failed, take the TLAB slow path, which either refills the TLAB or allocates directly from Eden
return allocate_inside_tlab_slow(allocation);
}
9.3.1. TLAB fast allocation
src/hotspot/share/gc/shared/threadLocalAllocBuffer.inline.hpp
inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
// Verify that the pointers are valid, i.e. that _top lies between _start and _end
invariants();
HeapWord* obj = top();
// If there is enough space, allocate the memory
if (pointer_delta(end(), obj) >= size) {
set_top(obj + size);
invariants();
return obj;
}
return NULL;
}
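The fast path is a classic bump-pointer allocation. The following self-contained sketch shows the same idea on an ordinary array; `MiniTlab` and the `HeapWord` alias are stand-ins invented for this example and are not the HotSpot types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using HeapWord = std::uintptr_t;  // stand-in: one machine word of heap memory

struct MiniTlab {
    HeapWord* start;
    HeapWord* top;   // next free word
    HeapWord* end;

    // Returns the start of the new object, or nullptr if it does not fit.
    HeapWord* allocate(std::size_t size_words) {
        if (static_cast<std::size_t>(end - top) >= size_words) {
            HeapWord* obj = top;
            top += size_words;   // bump the pointer; no lock or CAS is needed
            return obj;
        }
        return nullptr;          // the caller falls back to the slow path
    }
};
```

Because the buffer belongs to exactly one thread, the allocation is a comparison and an addition, which is the whole point of having a TLAB.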
9.3.2. TLAB slow allocation
src/hotspot/share/gc/shared/memAllocator.cpp
HeapWord* MemAllocator::allocate_inside_tlab_slow(Allocation& allocation) const {
HeapWord* mem = NULL;
ThreadLocalAllocBuffer& tlab = _thread->tlab();
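// If the remaining TLAB space is larger than the refill waste limit, record the slow allocation, raise the limit, and return NULL so that the caller allocates outside the TLAB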
if (tlab.free() > tlab.refill_waste_limit()) {
tlab.record_slow_allocation(_word_size);
return NULL;
}
// Recalculate the TLAB size
size_t new_tlab_size = tlab.compute_size(_word_size);
// Hand the current TLAB back to Eden
tlab.retire_before_allocation();
if (new_tlab_size == 0) {
return NULL;
}
// Compute the minimum size
size_t min_tlab_size = ThreadLocalAllocBuffer::compute_min_size(_word_size);
// Allocate a new TLAB and allocate the object inside it
mem = Universe::heap()->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size);
if (mem == NULL) {
assert(allocation._allocated_tlab_size == 0,
"Allocation failed, but actual size was updated. min: " SIZE_FORMAT
", desired: " SIZE_FORMAT ", actual: " SIZE_FORMAT,
min_tlab_size, new_tlab_size, allocation._allocated_tlab_size);
return NULL;
}
assert(allocation._allocated_tlab_size != 0, "Allocation succeeded but actual size not updated. mem at: "
PTR_FORMAT " min: " SIZE_FORMAT ", desired: " SIZE_FORMAT,
p2i(mem), min_tlab_size, new_tlab_size);
// If the JVM flag ZeroTLAB is enabled, zero the whole newly allocated TLAB memory
if (ZeroTLAB) {
// ..and clear it.
Copy::zero_to_words(mem, allocation._allocated_tlab_size);
} else {
// ...and zap just allocated object.
}
// Install the new memory as the current thread's TLAB
tlab.fill(mem, mem + _word_size, allocation._allocated_tlab_size);
// Return the memory address of the allocated object
return mem;
}
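The first branch of the slow path is the interesting decision: keep the TLAB or throw it away. A minimal sketch of just that decision follows; `TlabState`, `SlowPath` and `choose_slow_path` are hypothetical names, and `waste_increment` plays the role of TLABWasteIncrement.

```cpp
#include <cassert>
#include <cstddef>

enum class SlowPath { AllocateOutsideTlab, RetireAndRefill };

struct TlabState {
    std::size_t free_words;
    std::size_t refill_waste_limit;
    unsigned slow_allocations;
};

SlowPath choose_slow_path(TlabState& tlab, std::size_t waste_increment) {
    if (tlab.free_words > tlab.refill_waste_limit) {
        // Too much space left to throw away: keep the TLAB, allocate this
        // object outside of it, and raise the limit so that this branch
        // cannot be taken forever.
        tlab.refill_waste_limit += waste_increment;
        tlab.slow_allocations++;
        return SlowPath::AllocateOutsideTlab;
    }
    // Little space left: retiring the TLAB wastes at most the limit.
    return SlowPath::RetireAndRefill;
}
```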
9.3.2.1. TLAB maximum wasted space
The initial value of the TLAB maximum wasted space, _refill_waste_limit, is the TLAB size divided by TLABRefillWasteFraction:
src/hotspot/share/gc/shared/threadLocalAllocBuffer.hpp
size_t initial_refill_waste_limit() { return desired_size() / TLABRefillWasteFraction; }
On every slow allocation, record_slow_allocation(size_t obj_size) is called to record the slow allocation and, at the same time, to increase the TLAB maximum wasted space:
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp
void ThreadLocalAllocBuffer::record_slow_allocation(size_t obj_size) {
// On every slow allocation, _refill_waste_limit grows by refill_waste_limit_increment, i.e. TLABWasteIncrement
set_refill_waste_limit(refill_waste_limit() + refill_waste_limit_increment());
_slow_allocations++;
log_develop_trace(gc, tlab)("TLAB: %s thread: " INTPTR_FORMAT " [id: %2d]"
" obj: " SIZE_FORMAT
" free: " SIZE_FORMAT
" waste: " SIZE_FORMAT,
"slow", p2i(thread()), thread()->osthread()->thread_id(),
obj_size, free(), refill_waste_limit());
}
// refill_waste_limit_increment is simply the JVM flag TLABWasteIncrement
static size_t refill_waste_limit_increment() { return TLABWasteIncrement; }
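Because the limit grows with every slow allocation, a TLAB with a large unused tail cannot block refills forever. The sketch below counts how many slow allocations are recorded before the limit catches up. It is a simplification under stated assumptions: `slow_allocations_until_refill` is a hypothetical helper, the free space is assumed to stay constant (every request in between is too large for the TLAB), and the numbers in the usage note are sample values.

```cpp
#include <cassert>
#include <cstddef>

unsigned slow_allocations_until_refill(std::size_t free_words,
                                       std::size_t desired_size_words,
                                       std::size_t waste_fraction,
                                       std::size_t waste_increment) {
    assert(waste_fraction > 0 && waste_increment > 0);
    // Initial limit: desired size / TLABRefillWasteFraction.
    std::size_t limit = desired_size_words / waste_fraction;
    unsigned count = 0;
    // Each iteration models one allocation outside the TLAB.
    while (free_words > limit) {
        limit += waste_increment;   // TLABWasteIncrement
        ++count;
    }
    return count;
}
```

For a 6400-word TLAB with a fraction of 64 the limit starts at 100 words; with 120 free words and an increment of 4, five allocations go outside the TLAB before it is finally retired.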
9.3.2.2. Recalculating TLAB Size
The recalculated size is the smallest of three values: the heap space still available for TLAB allocation, the expected TLAB size plus the size of the current allocation request, and the maximum TLAB size:
src/hotspot/share/gc/shared/threadLocalAllocBuffer.inline.hpp
inline size_t ThreadLocalAllocBuffer::compute_size(size_t obj_size) {
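// Get the heap space still available for TLAB allocation
const size_t available_size = Universe::heap()->unsafe_max_tlab_alloc(thread()) / HeapWordSize;
// Take the smallest of: the space available for TLABs, the desired TLAB size plus the size of the current request, and the maximum TLAB size
size_t new_tlab_size = MIN3(available_size, desired_size() + align_object_size(obj_size), max_size());
// Make sure the size is not below the minimum, which has to fit the object plus the reserved dummy object header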
if (new_tlab_size < compute_min_size(obj_size)) {
log_trace(gc, tlab)("ThreadLocalAllocBuffer::compute_size(" SIZE_FORMAT ") returns failure",
obj_size);
return 0;
}
log_trace(gc, tlab)("ThreadLocalAllocBuffer::compute_size(" SIZE_FORMAT ") returns " SIZE_FORMAT,
obj_size, new_tlab_size);
return new_tlab_size;
}
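The same rule on plain numbers, as a hedged sketch: `compute_new_tlab_size` is a hypothetical function, sizes are in heap words, alignment is omitted, and `obj_words + reserve_words` is a simplified stand-in for compute_min_size().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns 0 when no TLAB of a usable size can be carved out of the heap.
std::size_t compute_new_tlab_size(std::size_t available_words,
                                  std::size_t desired_words,
                                  std::size_t obj_words,
                                  std::size_t max_words,
                                  std::size_t reserve_words) {
    // Room for the usual TLAB plus the object that triggered the refill.
    std::size_t wanted = desired_words + obj_words;
    std::size_t new_size = std::min({available_words, wanted, max_words});
    // The new TLAB must at least hold the object and the filler reserve.
    std::size_t min_size = obj_words + reserve_words;
    return new_size < min_size ? 0 : new_size;
}
```

A return value of 0 corresponds to the `new_tlab_size == 0` check in the slow path, where the allocation then falls back to the heap outside the TLAB.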
9.3.2.3. Put the current TLAB back on the heap
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp
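// Called from the TLAB slow path: hand the current TLAB back to the heap
void ThreadLocalAllocBuffer::retire_before_allocation() {
// Add the remaining space of the current TLAB to the slow refill waste
_slow_refill_waste += (unsigned int)remaining();
// Hand the TLAB back to the heap; the same method is called later during GC to hand back the TLABs of all threads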
retire();
}
// On the TLAB slow path, stats is NULL
// When called during GC, stats is used to record the data of each thread
void ThreadLocalAllocBuffer::retire(ThreadLocalAllocStats* stats) {
if (stats != NULL) {
accumulate_and_reset_statistics(stats);
}
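// If the current TLAB is valid
if (end() != NULL) {
invariants();
// Add the used space to the thread's allocated-bytes counter
thread()->incr_allocated_bytes(used_bytes());
// Fill the unused space with a dummy object
insert_filler();
// Clear the pointers of the current TLAB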
initialize(NULL, NULL, NULL);
}
}
9.4. GC-related TLAB operations
9.4.1. Before GC
Different GCs may implement this differently, but the timing of the TLAB operations is essentially the same. Taking G1 GC as an example, before the actual GC:
src/hotspot/share/gc/g1/g1CollectedHeap.cpp
void G1CollectedHeap::gc_prologue(bool full) {
// other code omitted
// Fill TLAB's and such
{
Ticks start = Ticks::now();
// Make sure the heap memory is parsable
ensure_parsability(true);
Tickspan dt = Ticks::now() - start;
phase_times()->record_prepare_tlab_time_ms(dt.seconds() * MILLIUNITS);
}
// other code omitted
}
Why make sure the heap memory is parsable? Because it lets the GC scan the objects on the heap faster. And what happens inside when the heap is made parsable? Mainly, the TLAB of each thread is handed back and its unused space is filled with a dummy object.
src/hotspot/share/gc/shared/collectedHeap.cpp
void CollectedHeap::ensure_parsability(bool retire_tlabs) {
// A real GC always happens at a safepoint; this is explained in detail in the later chapter on safepoints
assert(SafepointSynchronize::is_at_safepoint() || !is_init_completed(),
"Should only be called at a safepoint or at start-up");
ThreadLocalAllocStats stats;
for (JavaThreadIteratorWithHandle jtiwh; JavaThread *thread = jtiwh.next();) {
BarrierSet::barrier_set()->make_parsable(thread);
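// If TLAB is enabled globally
if (UseTLAB) {
// If retiring was requested, retire the TLAB
if (retire_tlabs) {
// Retire the TLAB by calling the retire method described in 9.3.2.3. Put the current TLAB back on the heap
thread->tlab().retire(&stats);
} else {
// If the TLAB is not retired now, fill it with a dummy object so that it can be parsed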
thread->tlab().make_parsable();
}
}
}
stats.publish();
}
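Why a dummy object makes the heap parsable is easiest to see on a toy model. In the sketch below, which is purely illustrative and not how HotSpot lays out objects, every object starts with one header word that stores its total size, and a heap walker steps from header to header. The names `Heap`, `walk` and `insert_filler` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: every object begins with one header word holding its total size
// in words (header included). A zero header means unformatted memory.
using Heap = std::vector<std::size_t>;

// Walks [from, to) object by object. Returns the number of objects found,
// or -1 if it runs into memory that is not covered by any object.
int walk(const Heap& heap, std::size_t from, std::size_t to) {
    int objects = 0;
    std::size_t pos = from;
    while (pos < to) {
        if (heap[pos] == 0) return -1;   // hole: the heap is not parsable
        pos += heap[pos];
        ++objects;
    }
    return objects;
}

// What retiring a TLAB does to its unused tail [top, end): cover it with a
// single dummy object so that a linear walk can step over it.
void insert_filler(Heap& heap, std::size_t top, std::size_t end) {
    if (top < end) heap[top] = end - top;
}
```

In this model a TLAB covering words 0 to 16 with two objects and `top` at 10 cannot be walked until the tail is covered by a filler. The filler itself needs room for a header, which is why a real TLAB keeps a small reserve at its end.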
9.4.2. After GC
Different GCs may implement this differently, but the timing of the TLAB operations is essentially the same. G1 GC is used as the example again. When does _desired_size change, and how? It is recalculated after GC:
src/hotspot/share/gc/g1/g1CollectedHeap.cpp
void G1CollectedHeap::gc_epilogue(bool full) {
// other code omitted
resize_all_tlabs();
}
src/hotspot/share/gc/shared/collectedHeap.cpp
void CollectedHeap::resize_all_tlabs() {
// Must be at a safepoint; a GC always is
assert(SafepointSynchronize::is_at_safepoint() || !is_init_completed(),
"Should only resize tlabs at safepoint");
// If both UseTLAB and ResizeTLAB are on (they are by default)
if (UseTLAB && ResizeTLAB) {
for (JavaThreadIteratorWithHandle jtiwh; JavaThread *thread = jtiwh.next(); ) {
// Recalculate the desired TLAB size of each thread
thread->tlab().resize();
}
}
}
The expected TLAB size of each thread is recalculated as follows:
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp
void ThreadLocalAllocBuffer::resize() {
assert(ResizeTLAB, "Should not call this otherwise");
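// Multiply the average of the _allocation_fraction EMA by the Eden size to get the amount of memory this thread's TLABs are predicted to occupy
size_t alloc = (size_t)(_allocation_fraction.average() *
(Universe::heap()->tlab_capacity(thread()) / HeapWordSize));
// Dividing by the target number of refills gives the new TLAB size, much like the calculation at initialization
size_t new_size = alloc / _target_refills;
// Keep the size between min_size and max_size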
new_size = clamp(new_size, min_size(), max_size());
size_t aligned_new_size = align_object_size(new_size);
log_trace(gc, tlab)("TLAB new size: thread: " INTPTR_FORMAT " [id: %2d]"
" refills %d alloc: %8.6f desired_size: " SIZE_FORMAT " -> " SIZE_FORMAT,
p2i(thread()), thread()->osthread()->thread_id(),
_target_refills, _allocation_fraction.average(), desired_size(), aligned_new_size);
// Set the new TLAB size
set_desired_size(aligned_new_size);
// Reset the TLAB maximum wasted space
set_refill_waste_limit(initial_refill_waste_limit());
}
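The resize step can also be tried on plain numbers. This is a sketch under assumptions, not the HotSpot routine: `resized_desired_size_words` is a hypothetical function, sizes are in heap words, and the final object alignment is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

std::size_t resized_desired_size_words(double allocation_fraction_avg,
                                       std::size_t capacity_words,
                                       unsigned target_refills,
                                       std::size_t min_words,
                                       std::size_t max_words) {
    assert(target_refills > 0 && min_words <= max_words);
    // Memory this thread is predicted to allocate through TLABs per epoch.
    std::size_t alloc =
        static_cast<std::size_t>(allocation_fraction_avg * capacity_words);
    // Spread that amount over the expected number of refills.
    std::size_t new_size = alloc / target_refills;
    // Clamp into [min_words, max_words].
    return std::min(std::max(new_size, min_words), max_words);
}
```

A thread whose averaged fraction is 0.25 of a 1,000,000-word Eden gets 5000-word TLABs at 50 refills per epoch, while a thread that allocates almost nothing is pushed up to the minimum size.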