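Background
The LEX business scenario currently has two modules, FM and LineHaulShuttle. Both are synchronized offline through ODPS and then sent by Blink to the gateway's MQ. Their QPS is 1000/s for FM and 600/s for LineHaulShuttle, which puts a short burst of load on the system. The system has not been load-tested yet, so we do not know how much pressure it can take. The following exception showed up while onboarding the LineHaul scenario: the thread pool was exhausted.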
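Investigation
Lindorm writes and risk-control dispatch run asynchronously on a thread pool, which improves both processing efficiency and throughput. Thread resources are limited, however, and that is how the exception above arises. Looking at the code:
Two problems stand out:
- The buffer queue capacity is 200. Once more than 200 tasks are waiting, which a burst well above the pool's processing rate produces quickly, the pool starts creating threads beyond the core size, up to the maximum.
- The rejection policy is AbortPolicy. Once the maximum thread count is reached and the queue is full, each new task is rejected with a RejectedExecutionException and never runs.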
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicInteger;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
// @NotNull comes from the project's annotation library (its import is not shown in the original).

@EnableAsync
@Configuration
public class AsyncThreadPoolConfig {
    /**
     * Core pool size (default number of threads)
     */
    private static final int CORE_POOL_SIZE = 8;
    /**
     * Maximum pool size
     */
    private static final int MAX_POOL_SIZE = 20;
    /**
     * Idle time allowed for threads (in seconds)
     */
    private static final int KEEP_ALIVE_TIME = 10;
    /**
     * Buffer queue capacity
     */
    private static final int QUEUE_CAPACITY = 200;
    /**
     * Thread name prefix
     */
    private static final String THREAD_NAME_PREFIX = "async-task-pool-";

    /**
     * When using the @Async annotation, specify this thread pool by name
     * @return the thread pool instance
     */
    @Bean("asyncTaskExecutor")
    public ThreadPoolTaskExecutor asyncTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(CORE_POOL_SIZE);
        executor.setMaxPoolSize(MAX_POOL_SIZE);
        executor.setQueueCapacity(QUEUE_CAPACITY);
        executor.setKeepAliveSeconds(KEEP_ALIVE_TIME);
        executor.setThreadNamePrefix(THREAD_NAME_PREFIX);
        executor.setThreadFactory(new ThreadFactory() {
            // Thread counter
            private final AtomicInteger threadNumber = new AtomicInteger(0);

            @Override
            public Thread newThread(@NotNull Runnable runnable) {
                Thread thread = new Thread(runnable, THREAD_NAME_PREFIX + threadNumber.getAndIncrement());
                if (thread.isDaemon()) {
                    thread.setDaemon(false);
                }
                if (thread.getPriority() != Thread.NORM_PRIORITY) {
                    thread.setPriority(Thread.NORM_PRIORITY);
                }
                return thread;
            }
        });
        // Rejection policy: refuse the task and throw an exception
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
        // Initialize
        executor.initialize();
        return executor;
    }
}
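The two observations above follow from the order in which a ThreadPoolExecutor takes in tasks: core threads first, then the queue, then extra threads up to the maximum, and only then rejection. Below is a minimal JDK-only sketch of that order with the same 8/20/200 sizes. The class name PoolGrowthDemo and the latch-blocked tasks are illustrative assumptions, not part of the service, and a plain ThreadPoolExecutor with a bounded LinkedBlockingQueue stands in for the Spring wrapper.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolGrowthDemo {

    // Submits n copies of the task to the pool.
    private static void submit(ThreadPoolExecutor pool, Runnable task, int n) {
        for (int i = 0; i < n; i++) {
            pool.execute(task);
        }
    }

    /**
     * Returns {pool size, queue size} after 8, 208 and 220 submissions,
     * followed by 1 if the 221st submission was rejected (else 0).
     */
    static int[] run() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 20, 10, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(200),
                new ThreadPoolExecutor.AbortPolicy());
        // Every task blocks until released, so no slot frees up during the demo.
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int[] out = new int[7];
        try {
            submit(pool, blocked, 8);       // fills the core threads
            out[0] = pool.getPoolSize();
            out[1] = pool.getQueue().size();
            submit(pool, blocked, 200);     // fills the queue, no new threads yet
            out[2] = pool.getPoolSize();
            out[3] = pool.getQueue().size();
            submit(pool, blocked, 12);      // queue is full: pool grows to the maximum
            out[4] = pool.getPoolSize();
            out[5] = pool.getQueue().size();
            try {
                pool.execute(blocked);      // 221st task has nowhere to go
            } catch (RejectedExecutionException e) {
                out[6] = 1;
            }
        } finally {
            release.countDown();            // let all accepted tasks finish
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] s = run();
        System.out.println("after 8:   pool=" + s[0] + " queue=" + s[1]);
        System.out.println("after 208: pool=" + s[2] + " queue=" + s[3]);
        System.out.println("after 220: pool=" + s[4] + " queue=" + s[5]);
        System.out.println("221st rejected: " + (s[6] == 1));
    }
}
```

In this sketch the pool stays at 8 threads until the queue holds 200 tasks, then grows to 20, and the 221st submission is rejected. The sizes are the only thing taken from the configuration above; real tasks finish and free slots, so the live system reaches saturation later than this worst case.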
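Reproducing the problem
Reproduce the problem locally.
Modify the thread pool configuration
Set both the core and maximum thread counts to 5, and the buffer queue capacity to 5.
private static final int CORE_POOL_SIZE = 5;
private static final int MAX_POOL_SIZE = 5;
private static final int QUEUE_CAPACITY = 5;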
Add an asynchronous method
private AtomicInteger atomicInteger = new AtomicInteger(0);

@Async("asyncTaskExecutor")
public void test() {
    int index = atomicInteger.getAndIncrement();
    System.out.println("start handle: " + index);
    try {
        // Simulate a short unit of work
        Thread.sleep(1);
    } catch (InterruptedException e) {
        System.out.println("--------------InterruptedException");
        // Restore the interrupt flag so the pool can see the interruption
        Thread.currentThread().interrupt();
    }
    System.out.println("handle over: " + index);
}
Add a test case
@RunWith(PandoraBootRunner.class)
@DelegateTo(SpringJUnit4ClassRunner.class)
@SpringBootTest(classes = {Application.class})
public class AsyncThreadPoolConfigTest extends TestCase {
    @Autowired
    private LindormService lindormService;

    @Test
    public void testAsyncTaskExecutor() {
        System.out.println("testAsyncTaskExecutor start");
        for (int i = 0; i < 100; i++) {
            lindormService.test();
        }
        System.out.println("testAsyncTaskExecutor over");
    }
}
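Test result
Once 10 tasks had been taken in (5 running plus 5 queued), the thread pool rejected the new tasks that followed.
Interlude
Question: why did the threads that had finished their work not pick up the later tasks?
Investigation showed that the rejection exception from the thread pool propagated to the main thread and aborted the program, so the remaining tasks were never submitted.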
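The effect described here can be sketched with the JDK alone: an unguarded submit loop stops at the first rejection, while a loop that catches the exception per task keeps going. The class name RejectionPropagationDemo and the latch-blocked tasks are illustrative assumptions. Because the tasks below never finish during the loop, the split is fixed at 10 accepted and 90 rejected, unlike a run with 1 ms tasks where slots free up along the way.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class RejectionPropagationDemo {

    // Same sizes as the local reproduction: core = max = 5, queue = 5.
    private static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                5, 5, 10, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(5),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // A task that blocks until the latch is released.
    private static Runnable blocked(CountDownLatch release) {
        return () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    /** Unguarded loop: returns how many tasks were submitted before the loop was aborted. */
    static int unguardedLoop(int total) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = newPool();
        int submitted = 0;
        try {
            for (int i = 0; i < total; i++) {
                pool.execute(blocked(release));
                submitted++;
            }
        } catch (RejectedExecutionException e) {
            // The exception left the loop; nothing after this point is submitted.
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return submitted;
    }

    /** Guarded loop: returns {accepted, rejected}; every task gets its own try/catch. */
    static int[] guardedLoop(int total) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = newPool();
        int accepted = 0;
        int rejected = 0;
        try {
            for (int i = 0; i < total; i++) {
                try {
                    pool.execute(blocked(release));
                    accepted++;
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return new int[] {accepted, rejected};
    }

    public static void main(String[] args) {
        System.out.println("unguarded loop submitted: " + unguardedLoop(100));
        int[] r = guardedLoop(100);
        System.out.println("guarded loop accepted=" + r[0] + " rejected=" + r[1]);
    }
}
```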
Modify the test method
private AtomicInteger atomicInteger = new AtomicInteger(0);

@Autowired
private ThreadPoolTaskExecutor executor;

public void test() {
    try {
        executor.execute(() -> {
            int index = atomicInteger.getAndIncrement();
            System.out.println("start handle: " + index);
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                System.out.println("--------------InterruptedException");
            }
            System.out.println("handle over: " + index);
        });
    } catch (Exception e) {
        // Catch the rejection here so the caller's loop keeps submitting
        System.out.println("executor exception: " + e);
    }
}
Final result
100 tasks were submitted in total: 70 failed and 30 succeeded.
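Fix
Approach:
- Enlarge the buffer queue, so that a queue that is too small does not lose data. Size it to cover the peak QPS where possible; for reference, the peak QPS during Double 11 was 1697.
- Set the maximum thread count equal to the core thread count. With a large queue, threads beyond the core size are only created once the queue is full, so a pool with a smaller core size would not use its full multithreading capacity.
- Use CallerRunsPolicy. Change the saturation policy so that a rejected task runs on the thread that submitted it, which avoids losing tasks.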
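The second point can be checked with a small JDK-only sketch: with a 2000-slot queue, a pool configured as core 8 / max 20 stays at 8 threads under a backlog of 100 tasks, while a pool with core = max = 20 uses all 20. The class name CoreVsMaxDemo, the backlog of 100 and the latch-blocked tasks are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class CoreVsMaxDemo {

    /** Submits `tasks` blocking tasks and returns {pool size, queue size} at that moment. */
    static int[] snapshot(int core, int max, int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 10, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();   // hold the thread so the backlog stays visible
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown();           // let the accepted tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] small = snapshot(8, 20, 2000, 100);
        int[] equal = snapshot(20, 20, 2000, 100);
        System.out.println("core=8,  max=20: threads=" + small[0] + " queued=" + small[1]);
        System.out.println("core=20, max=20: threads=" + equal[0] + " queued=" + equal[1]);
    }
}
```

In the first configuration the 12 extra threads would only appear after all 2000 queue slots were taken, which is why the maximum is effectively unused once the queue is enlarged.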
Verification
Code change
Change the saturation policy to CallerRunsPolicy and keep everything else unchanged.
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
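Test result
100 tasks were submitted in total: 0 failed and 100 succeeded.
Conclusion
The change works: tasks are no longer dropped when the thread pool is full.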
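To make the mechanism concrete, here is a JDK-only sketch of what CallerRunsPolicy does with an overflow task: it runs synchronously on the submitting thread instead of being rejected. The class name CallerRunsDemo, the small 2/2/2 sizes and the latch-blocked filler tasks are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class CallerRunsDemo {

    /** Saturates a tiny pool, submits one more task, and returns the name of the thread that ran it. */
    static String overflowThreadName() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 10, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(2),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        String[] ranOn = new String[1];
        try {
            // 2 tasks occupy the threads, 2 more fill the queue: the pool is saturated.
            for (int i = 0; i < 4; i++) {
                pool.execute(blocked);
            }
            // The 5th task is not dropped; execute() runs it right here on the caller.
            pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return ranOn[0];
    }

    public static void main(String[] args) {
        System.out.println("caller thread:    " + Thread.currentThread().getName());
        System.out.println("overflow ran on:  " + overflowThreadName());
    }
}
```

One consequence of this design choice: while the submitting thread is busy running the overflow task, it cannot submit anything else. That slows the producer down under load, which acts as back-pressure but also means the caller's own latency grows when the pool is saturated.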
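Final code
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicInteger;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
// @NotNull comes from the project's annotation library (its import is not shown in the original).

@EnableAsync
@Configuration
public class AsyncThreadPoolConfig {
    /**
     * Core pool size (default number of threads)
     */
    private static final int CORE_POOL_SIZE = 20;
    /**
     * Idle time allowed for threads (in seconds)
     */
    private static final int KEEP_ALIVE_TIME = 10;
    /**
     * Buffer queue capacity
     */
    private static final int QUEUE_CAPACITY = 2000;
    /**
     * Thread name prefix
     */
    private static final String THREAD_NAME_PREFIX = "async-task-pool-";

    /**
     * When using the @Async annotation, specify this thread pool by name
     * @return the thread pool instance
     */
    @Bean("asyncTaskExecutor")
    public ThreadPoolTaskExecutor asyncTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(CORE_POOL_SIZE);
        // Keep max equal to core: no thread beyond the core size is created until the
        // queue is full, so a smaller core size would leave the extra threads unused
        executor.setMaxPoolSize(CORE_POOL_SIZE);
        // Sizing reference: https://blog.csdn.net/huangshanchun/article/details/78567501
        executor.setQueueCapacity(QUEUE_CAPACITY);
        executor.setKeepAliveSeconds(KEEP_ALIVE_TIME);
        executor.setThreadNamePrefix(THREAD_NAME_PREFIX);
        executor.setThreadFactory(new ThreadFactory() {
            // Thread counter
            private final AtomicInteger threadNumber = new AtomicInteger(0);

            @Override
            public Thread newThread(@NotNull Runnable runnable) {
                Thread thread = new Thread(runnable, THREAD_NAME_PREFIX + threadNumber.getAndIncrement());
                if (thread.isDaemon()) {
                    thread.setDaemon(false);
                }
                if (thread.getPriority() != Thread.NORM_PRIORITY) {
                    thread.setPriority(Thread.NORM_PRIORITY);
                }
                return thread;
            }
        });
        // When the pool is saturated, run the task on the thread that submitted it
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        // Initialize
        executor.initialize();
        return executor;
    }
}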