MANUAL_FLUSH is enabled but the buffer is too big: Fix and Root Cause Analysis

Background

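I was recently developing a Flink job that writes data to Kudu. Everything worked in the test environment, but in production the job logged many errors like the following: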

org.apache.kudu.client.NonRecoverableException: MANUAL_FLUSH is enabled but the buffer is too big
	at org.apache.kudu.client.KuduException.transformException(KuduException.java:110) ~[blob_p-371f6d8c9621c84079fa2c5ecdaf852455599896-1bb2c1ac8a602c94efbddf261bf4997c:?]
	at org.apache.kudu.client.KuduSession.apply(KuduSession.java:93) ~[blob_p-371f6d8c9621c84079fa2c5ecdaf852455599896-1bb2c1ac8a602c94efbddf261bf4997c:?]
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:57) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:577) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:533) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.onProcessingTime(InternalTimerServiceImpl.java:284) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1425) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$16(StreamTask.java:1416) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:344) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:330) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) [flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) [flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) [flink-dist_2.11-1.13.3.jar:1.13.3]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) [flink-dist_2.11-1.13.3.jar:1.13.3]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
	Suppressed: org.apache.kudu.client.KuduException$OriginalException: Original asynchronous stack trace
		at org.apache.kudu.client.AsyncKuduSession.apply(AsyncKuduSession.java:596) ~[blob_p-371f6d8c9621c84079fa2c5ecdaf852455599896-1bb2c1ac8a602c94efbddf261bf4997c:?]
		at org.apache.kudu.client.KuduSession.apply(KuduSession.java:79) ~[blob_p-371f6d8c9621c84079fa2c5ecdaf852455599896-1bb2c1ac8a602c94efbddf261bf4997c:?]
		at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:57) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:577) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:533) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.onProcessingTime(InternalTimerServiceImpl.java:284) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1425) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$16(StreamTask.java:1416) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:344) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:330) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) ~[flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) [flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) [flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) [flink-dist_2.11-1.13.3.jar:1.13.3]
		at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) [flink-dist_2.11-1.13.3.jar:1.13.3]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]

Solution

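Here is the fix first. It takes two steps.
1. Set a larger buffer size (maxBufferSize) on the kuduSession, as follows:

kuduSession.setMutationBufferSpace(maxBufferSize);

maxBufferSize is a positive integer that counts buffered operations (rows), not bytes. Set it reasonably large, for example 10000.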

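2. Flush manually
Once the number of rows written since the last flush exceeds a threshold, call flush(). The threshold should be somewhat smaller than maxBufferSize, for example 9000.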

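// counts rows written since the last flush
i = 0;

for (Tuple2<Boolean, Row> data : dataList) {
    // buffer one row in the session
    errorMsg = kuduWriter.write(table, opsMapper, data);

    // flush well before the buffer reaches maxBufferSize
    if (++i > 9000) {
        kuduWriter.flush();
        i = 0;
    }
}

// flush whatever is still buffered after the loop
kuduWriter.flush();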

These two steps resolve the error above.
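
The flush-every-N-rows loop can be tried out without a Kudu cluster. The following is a minimal sketch under stated assumptions: Sink is a hypothetical stand-in for kuduWriter (it is not a Kudu class), rows are plain strings, and the loop flushes exactly at the threshold and skips the final flush when nothing is pending, which are small variations on the loop above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the "flush every N writes" loop. Sink is a hypothetical stand-in for kuduWriter. */
class BatchFlushDemo {

    /** Hypothetical writer: buffers rows in memory until flush() is called. */
    static class Sink {
        final List<String> buffered = new ArrayList<>();
        int flushCount = 0;
        int rowsFlushed = 0;

        void write(String row) {
            buffered.add(row);
        }

        void flush() {
            rowsFlushed += buffered.size();
            buffered.clear();
            flushCount++;
        }
    }

    /** Writes all rows, flushing whenever flushThreshold rows are pending, then once more for the remainder. */
    static void writeAll(Sink sink, List<String> rows, int flushThreshold) {
        int pending = 0;
        for (String row : rows) {
            sink.write(row);
            if (++pending >= flushThreshold) {
                sink.flush();
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.flush(); // push the final partial batch
        }
    }

    public static void main(String[] args) {
        Sink sink = new Sink();
        List<String> rows = new ArrayList<>();
        for (int n = 0; n < 25; n++) {
            rows.add("row-" + n);
        }
        writeAll(sink, rows, 10);
        // prints "3 flushes, 25 rows"
        System.out.println(sink.flushCount + " flushes, " + sink.rowsFlushed + " rows");
    }
}
```

As long as flushThreshold stays below the session's buffer size, the number of pending rows never reaches the limit, which is the point of step 2.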

Root Cause Analysis

The stack trace above shows that the error is raised from KuduSession.apply(). Following that call further down, the exception is ultimately thrown in this method:

org.apache.kudu.client.AsyncKuduSession.apply(final Operation operation)

The method is fairly long, so only the relevant part is shown here:

switch (flushMode) {
  case AUTO_FLUSH_SYNC: {
    // This case is handled above and is impossible here.
    // TODO(wdberkeley): Handle AUTO_FLUSH_SYNC just like other flush modes.
    assert false;
    break;
  }
  case MANUAL_FLUSH: {
    if (activeBufferSize >= mutationBufferMaxOps) {
      Status statusIllegalState =
          Status.IllegalState("MANUAL_FLUSH is enabled but the buffer is too big");
      throw new NonRecoverableException(statusIllegalState);
    }
    activeBuffer.getOperations().add(new BufferedOperation(tablet, operation));
    break;
  }
  ...

This code shows that the exception is thrown only when two conditions both hold:
1. flushMode = MANUAL_FLUSH, i.e. manual flush mode;
2. activeBufferSize >= mutationBufferMaxOps

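activeBufferSize is the amount of data already buffered. It can be thought of as the number of write() calls since the last flush (assuming each call writes one row).
mutationBufferMaxOps is the capacity of the buffer, i.e. the maximum number of operations it may hold.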

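In other words, in manual mode, data written to Kudu first goes into a buffer, and it is sent to the database only when flush() is called after some number of rows has accumulated.
If another row is written while the buffer already holds its maximum allowed number of operations, the error is thrown.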
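
To make the condition concrete, here is a simplified model of the check. It is only a sketch: Buffer is a hypothetical class written for illustration, not the Kudu client's implementation, and it throws IllegalStateException in place of NonRecoverableException. It keeps the same comparison as the excerpt (size >= max).

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the MANUAL_FLUSH buffer check. Not the Kudu client's actual classes. */
class ManualFlushBufferDemo {

    static class Buffer {
        private final int maxOps; // plays the role of mutationBufferMaxOps
        private final List<String> ops = new ArrayList<>();

        Buffer(int maxOps) {
            this.maxOps = maxOps;
        }

        /** Buffers one operation, or rejects it if the buffer is already full. */
        void apply(String op) {
            // Same comparison as in the excerpt: a full buffer rejects the next operation.
            if (ops.size() >= maxOps) {
                throw new IllegalStateException("MANUAL_FLUSH is enabled but the buffer is too big");
            }
            ops.add(op);
        }

        /** Empties the buffer and returns how many operations were flushed. */
        int flush() {
            int flushed = ops.size();
            ops.clear();
            return flushed;
        }
    }

    public static void main(String[] args) {
        Buffer buffer = new Buffer(3);
        buffer.apply("a");
        buffer.apply("b");
        buffer.apply("c");
        try {
            buffer.apply("d"); // fourth operation without a flush: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("flushed " + buffer.flush() + " ops");
        buffer.apply("d"); // accepted now that the buffer is empty
    }
}
```

With a capacity of 3, the fourth apply() without an intervening flush() fails, and the same call succeeds once flush() has emptied the buffer.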

So the fix is to make the buffer larger and call flush() in time, or to avoid manual flush mode altogether.

Reposted from blog.csdn.net/samur2/article/details/125660887