Troubleshooting SkyWalking with Arthas: id is too long, must be no longer than 512 bytes

id is too long, must be no longer than 512 bytes

Our SkyWalking deployment kept crashing with the CPU pegged at 100%. The skywalking-oap-server.log file was full of exceptions like the following:

2021-02-20 17:27:18,699 - org.apache.skywalking.oap.server.core.register.worker.RegisterPersistentWorker - 105 [DataCarrier.REGISTER_L2.BulkConsumePool.0.Thread] ERROR [] - Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 642;
org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 642;
	at org.elasticsearch.action.ValidateActions.addValidationError(ValidateActions.java:26) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.action.index.IndexRequest.validate(IndexRequest.java:183) ~[elasticsearch-6.3.2.jar:6.3.2]
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:515) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:508) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
	at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:348) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
	at org.apache.skywalking.oap.server.library.client.elasticsearch.ElasticSearchClient.forceInsert(ElasticSearchClient.java:241) ~[library-client-6.3.0.jar:6.3.0]
	at org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.RegisterEsDAO.forceInsert(RegisterEsDAO.java:51) ~[storage-elasticsearch-plugin-6.3.0.jar:6.3.0]
	at org.apache.skywalking.oap.server.core.register.worker.RegisterPersistentWorker.lambda$onWork$0(RegisterPersistentWorker.java:102) ~[server-core-6.3.0.jar:6.3.0]
	at java.util.HashMap$Values.forEach(HashMap.java:981) [?:1.8.0_221]
	at org.apache.skywalking.oap.server.core.register.worker.RegisterPersistentWorker.onWork(RegisterPersistentWorker.java:84) [server-core-6.3.0.jar:6.3.0]
	at org.apache.skywalking.oap.server.core.register.worker.RegisterPersistentWorker.access$100(RegisterPersistentWorker.java:35) [server-core-6.3.0.jar:6.3.0]
	at org.apache.skywalking.oap.server.core.register.worker.RegisterPersistentWorker$PersistentConsumer.consume(RegisterPersistentWorker.java:141) [server-core-6.3.0.jar:6.3.0]
	at org.apache.skywalking.apm.commons.datacarrier.consumer.MultipleChannelsConsumer.consume(MultipleChannelsConsumer.java:82) [apm-datacarrier-6.3.0.jar:6.3.0]
	at org.apache.skywalking.apm.commons.datacarrier.consumer.MultipleChannelsConsumer.run(MultipleChannelsConsumer.java:53) [apm-datacarrier-6.3.0.jar:6.3.0]
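Because the same exception repeats so often, it can be easier to pull the reported id sizes out of the log than to read it by eye. Here is a minimal sketch, assuming the log lines contain the message text shown above. The class name IdTooLongLogScan and its methods are hypothetical helpers, not part of SkyWalking or Elasticsearch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SkyWalking or Elasticsearch.
final class IdTooLongLogScan {
    // Matches the message text seen in the log above and captures the reported byte count.
    private static final Pattern TOO_LONG = Pattern.compile(
            "id is too long, must be no longer than 512 bytes but was: (\\d+)");

    // Returns the reported id size for every occurrence of the message in the given lines.
    static List<Integer> reportedSizes(List<String> lines) {
        List<Integer> sizes = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TOO_LONG.matcher(line);
            while (m.find()) {
                sizes.add(Integer.parseInt(m.group(1)));
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IdTooLongLogScan <path-to-log>");
            return;
        }
        // Assumption: the log file is small enough to read into memory and is valid UTF-8.
        List<Integer> sizes = reportedSizes(Files.readAllLines(Paths.get(args[0])));
        // Note: the message appears on both the ERROR line and the exception line,
        // so this counts occurrences of the message, not distinct exceptions.
        System.out.println("occurrences: " + sizes.size());
        int max = 0;
        for (int size : sizes) {
            max = Math.max(max, size);
        }
        System.out.println("largest reported id size: " + max);
    }
}
```

Running it against skywalking-oap-server.log gives a quick sense of how often the validation fails and how far over the limit the ids are.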

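I found this article online: https://www.cnblogs.com/kebibuluan/p/13633037.html
Problem analysis:

As the timestamps on the exceptions show, they were being logged at a furious rate. The exception message tells us that the id of a document SkyWalking was writing to Elasticsearch was too long. This is the Elasticsearch source that performs the check: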

        if (id != null && id.getBytes(StandardCharsets.UTF_8).length > 512) {
            validationException = addValidationError("id is too long, must be no longer than 512 bytes but was: " +
                            id.getBytes(StandardCharsets.UTF_8).length, validationException);
        }
See: elasticsearch/action/index/IndexRequest.java#L240
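To make the limit concrete, here is a minimal standalone sketch of the same check. It only mirrors the condition in the snippet above: the limit applies to the UTF-8 byte length of the id, not to its character count. The class IdLengthCheck, its methods, and the example endpoint are made up for illustration and are not Elasticsearch API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mirrors the condition in IndexRequest.validate().
final class IdLengthCheck {
    static final int MAX_ID_BYTES = 512;

    // The limit is measured in UTF-8 bytes, so multi-byte characters count more than once.
    static int utf8Length(String id) {
        return id.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean isTooLong(String id) {
        return id != null && utf8Length(id) > MAX_ID_BYTES;
    }

    public static void main(String[] args) {
        // Made-up endpoint name with a long comma-separated list in the path,
        // similar in shape to the kind of value that triggers the error.
        StringBuilder name = new StringBuilder("/api/example/");
        for (int i = 0; i < 35; i++) {
            if (i > 0) {
                name.append(',');
            }
            name.append("1234567890123456");
        }
        String id = "21_" + name + "_0";
        System.out.println(utf8Length(id) + " bytes, too long: " + isTooLong(id));

        // Two CJK characters, written as escapes; each takes 3 bytes in UTF-8.
        System.out.println(utf8Length("\u63a5\u53e3") + " bytes for 2 characters");
    }
}
```

A short ASCII id passes easily, while an endpoint name that embeds dozens of 16-digit numbers goes over the limit on its own.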

That article describes the approach but not the concrete steps, so I spell them out here.

Install Arthas

# Download
curl -O https://alibaba.github.io/arthas/arthas-boot.jar

# Run
java -Dfile.encoding=UTF-8 -jar arthas-boot.jar
# Note on -Dfile.encoding=UTF-8: this Java setting keeps Chinese characters in the Arthas output from being garbled

Result

[root@skywailing-aliyun tools]# java -Dfile.encoding=UTF-8 -jar arthas-boot.jar
[INFO] arthas-boot version: 3.4.5
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 22721 org.elasticsearch.bootstrap.Elasticsearch
  [2]: 18622 /data/skywalking/webapp/skywalking-webapp.jar
  [3]: 18863 org.apache.skywalking.oap.server.starter.OAPServerStartUp
3     # choose 3 here, the SkyWalking OAP server process
[INFO] arthas home: /root/.arthas/lib/3.4.6/arthas
[INFO] Try to attach process 18863
[INFO] Attach process 18863 success.
[INFO] arthas-client connect 127.0.0.1 3658
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.                           
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'                          
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.                          
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |                         
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'                          
                                                                                

wiki      https://arthas.aliyun.com/doc                                         
tutorials https://arthas.aliyun.com/doc/arthas-tutorials.html                   
version   3.4.6                                                                 
pid       18863                                                                 
time      2021-02-20 17:38:28                                                   

[arthas@18863]$ 

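Search by keyword

The error log contains the line below, which names the method that validates the index request. The request itself is what fails validation, so IndexRequest is the class to look up.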

at org.elasticsearch.action.index.IndexRequest.validate(IndexRequest.java:183) ~[elasticsearch-6.3.2.jar:6.3.2]

Use sc to search for classes
Use sm to search for methods

[arthas@18863]$ sc org.elasticsearch.action.index.*
org.apache.skywalking.oap.server.library.client.elasticsearch.ElasticSearchInsertRequest
org.elasticsearch.action.index.IndexRequest
org.elasticsearch.action.index.IndexResponse
org.elasticsearch.action.index.IndexResponse$$Lambda$340/154425708
org.elasticsearch.action.index.IndexResponse$Builder
Affect(row-cnt:5) cost in 15 ms.
# The wildcard search turns up org.elasticsearch.action.index.IndexRequest
# List its methods
[arthas@18863]$ sm org.elasticsearch.action.index.IndexRequest
org.elasticsearch.action.index.IndexRequest <init>()V
org.elasticsearch.action.index.IndexRequest <init>(Ljava/lang/String;Ljava/lang/String;)V
org.elasticsearch.action.index.IndexRequest <init>(Ljava/lang/String;)V
org.elasticsearch.action.index.IndexRequest <init>(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V
org.elasticsearch.action.index.IndexRequest validate()Lorg/elasticsearch/action/ActionRequestValidationException;
org.elasticsearch.action.index.IndexRequest getContentType()Lorg/elasticsearch/common/xcontent/XContentType;
org.elasticsearch.action.index.IndexRequest process(Lorg/elasticsearch/Version;Lorg/elasticsearch/cluster/metadata/MappingMetaData;Ljava/lang/String;)V
org.elasticsearch.action.index.IndexRequest writeTo(Lorg/elasticsearch/common/io/stream/StreamOutput;)V
org.elasticsearch.action.index.IndexRequest source()Lorg/elasticsearch/common/bytes/BytesReference;
org.elasticsearch.action.index.IndexRequest source(Ljava/util/Map;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source(Lorg/elasticsearch/common/bytes/BytesReference;Lorg/elasticsearch/common/xcontent/XContentType;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source([BIILorg/elasticsearch/common/xcontent/XContentType;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source(Lorg/elasticsearch/common/xcontent/XContentType;[Ljava/lang/Object;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source([Ljava/lang/Object;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source(Lorg/elasticsearch/common/xcontent/XContentBuilder;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source(Ljava/lang/String;Lorg/elasticsearch/common/xcontent/XContentType;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source(Ljava/util/Map;Lorg/elasticsearch/common/xcontent/XContentType;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest source([BLorg/elasticsearch/common/xcontent/XContentType;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest routing(Ljava/lang/String;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest routing(Ljava/lang/String;)Ljava/lang/Object;
org.elasticsearch.action.index.IndexRequest routing()Ljava/lang/String;
org.elasticsearch.action.index.IndexRequest opType(Ljava/lang/String;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest opType()Lorg/elasticsearch/action/DocWriteRequest$OpType;
org.elasticsearch.action.index.IndexRequest opType(Lorg/elasticsearch/action/DocWriteRequest$OpType;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest versionType(Lorg/elasticsearch/index/VersionType;)Ljava/lang/Object;
org.elasticsearch.action.index.IndexRequest versionType(Lorg/elasticsearch/index/VersionType;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest versionType()Lorg/elasticsearch/index/VersionType;
org.elasticsearch.action.index.IndexRequest isRetry()Z
org.elasticsearch.action.index.IndexRequest setPipeline(Ljava/lang/String;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest getPipeline()Ljava/lang/String;
org.elasticsearch.action.index.IndexRequest sourceAsMap()Ljava/util/Map;
org.elasticsearch.action.index.IndexRequest resolveVersionDefaults()J
org.elasticsearch.action.index.IndexRequest resolveRouting(Lorg/elasticsearch/cluster/metadata/MetaData;)V
org.elasticsearch.action.index.IndexRequest readFrom(Lorg/elasticsearch/common/io/stream/StreamInput;)V
org.elasticsearch.action.index.IndexRequest onRetry()V
org.elasticsearch.action.index.IndexRequest getAutoGeneratedTimestamp()J
org.elasticsearch.action.index.IndexRequest setShardId(Lorg/elasticsearch/index/shard/ShardId;)Lorg/elasticsearch/action/support/replication/ReplicationRequest;
org.elasticsearch.action.index.IndexRequest setShardId(Lorg/elasticsearch/index/shard/ShardId;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest parent(Ljava/lang/String;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest parent()Ljava/lang/String;
org.elasticsearch.action.index.IndexRequest type(Ljava/lang/String;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest type()Ljava/lang/String;
org.elasticsearch.action.index.IndexRequest toString()Ljava/lang/String;
org.elasticsearch.action.index.IndexRequest create(Z)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest version(J)Ljava/lang/Object;
org.elasticsearch.action.index.IndexRequest version(J)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest version()J
org.elasticsearch.action.index.IndexRequest id(Ljava/lang/String;)Lorg/elasticsearch/action/index/IndexRequest;
org.elasticsearch.action.index.IndexRequest id()Ljava/lang/String;
org.apache.skywalking.oap.server.library.client.elasticsearch.ElasticSearchInsertRequest <init>(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V
org.apache.skywalking.oap.server.library.client.elasticsearch.ElasticSearchInsertRequest source(Lorg/elasticsearch/common/xcontent/XContentBuilder;)Lorg/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchInsertRequest;
org.apache.skywalking.oap.server.library.client.elasticsearch.ElasticSearchInsertRequest source(Lorg/elasticsearch/common/xcontent/XContentBuilder;)Lorg/elasticsearch/action/index/IndexRequest;
Affect(row-cnt:52) cost in 19 ms.

The list includes the validate method, so the next step is to watch it directly.

Locate the problem

[arthas@18863]$ watch org.elasticsearch.action.index.IndexRequest validate 
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 1) cost in 115 ms, listenerId: 9
method=org.elasticsearch.action.index.IndexRequest.validate location=AtExit
ts=2021-02-20 17:09:48; [cost=1.001631ms] result=@ArrayList[
    @Object[][isEmpty=true;size=0],
    @ElasticSearchInsertRequest[index {
    
    [es_endpoint_inventory][type][21_/api/platform-merchants-by-merchant-no/394/8628888495116643,5182073620535000,3443172328875926,2250786343073631,8576092672333003,6763668243141332,6492958354264545,6371978301346528,4396232269781811,7076446563756034,6243473869242754,3209554209322923,2048713085087784,3578049506120495,9089969774634252,9584107801344760,1963194682021308,6372911380583361,6052625865198842,5360949050957454,6821866195213903,5793037639974033,1524789348846301,3028126582081557,8458758411205068,3949114501690683,5959222159114331,5451554830330944,3385326099946619,6865603164164470,4112671470578858,8784451002706920,5688016656337076,2982511945470498,1435713064978363_0], source[{
    
    "sequence":9870,"last_update_time":0,"heartbeat_time":1613812188462,"service_id":21,"name":"/api/platform-merchants-by-merchant-no/394/8628888495116643,5182073620535000,3443172328875926,2250786343073631,8576092672333003,6763668243141332,6492958354264545,6371978301346528,4396232269781811,7076446563756034,6243473869242754,3209554209322923,2048713085087784,3578049506120495,9089969774634252,9584107801344760,1963194682021308,6372911380583361,6052625865198842,5360949050957454,6821866195213903,5793037639974033,1524789348846301,3028126582081557,8458758411205068,3949114501690683,5959222159114331,5451554830330944,3385326099946619,6865603164164470,4112671470578858,8784451002706920,5688016656337076,2982511945470498,1435713064978363","detect_point":0,"register_time":1613812188462}]}],
    @ActionRequestValidationException[org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 642;],
]
method=org.elasticsearch.action.index.IndexRequest.validate location=AtExit

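The output shows the offending endpoint directly: /api/platform-merchants-by-merchant-no
The request path carries a comma-separated list of 35 sixteen-digit numbers as a path parameter. The document id is built from the service id (21), the endpoint name, and the detect point (0), so this one endpoint name pushes the id to 642 bytes.
The next step is to work with the developers to find which service exposes this endpoint.
Workaround: for now, remove the SkyWalking agent from the startup script of the application identified.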
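When passing the finding on to developers, the raw id is unwieldy. Below is a minimal sketch that splits an id of the shape seen in the watch output (service id, endpoint name, detect point, joined by underscores) and trims trailing numeric path segments. The id layout is inferred from this one sample, the trimming rule is my own heuristic, and the class EndpointInventoryId is hypothetical, not SkyWalking API.

```java
import java.util.Arrays;

// Hypothetical helper; the "<serviceId>_<endpointName>_<detectPoint>" layout is
// inferred from the watch output above, not taken from the SkyWalking source.
final class EndpointInventoryId {
    final int serviceId;
    final String endpointName;
    final int detectPoint;

    private EndpointInventoryId(int serviceId, String endpointName, int detectPoint) {
        this.serviceId = serviceId;
        this.endpointName = endpointName;
        this.detectPoint = detectPoint;
    }

    // Splits on the first and last underscore so underscores inside the name survive.
    static EndpointInventoryId parse(String id) {
        int first = id.indexOf('_');
        int last = id.lastIndexOf('_');
        if (first < 0 || last <= first) {
            throw new IllegalArgumentException("unexpected id layout: " + id);
        }
        return new EndpointInventoryId(
                Integer.parseInt(id.substring(0, first)),
                id.substring(first + 1, last),
                Integer.parseInt(id.substring(last + 1)));
    }

    // Heuristic: drop trailing path segments made only of digits and commas,
    // which look like path parameters rather than part of the route.
    static String stablePrefix(String endpointName) {
        String[] parts = endpointName.split("/");
        int end = parts.length;
        while (end > 0 && parts[end - 1].matches("[0-9,]+")) {
            end--;
        }
        return String.join("/", Arrays.copyOfRange(parts, 0, end));
    }

    public static void main(String[] args) {
        // Shortened stand-in for the id captured by watch.
        EndpointInventoryId parsed = parse(
                "21_/api/platform-merchants-by-merchant-no/394/8628888495116643,5182073620535000_0");
        System.out.println("service id:   " + parsed.serviceId);
        System.out.println("detect point: " + parsed.detectPoint);
        System.out.println("route prefix: " + stablePrefix(parsed.endpointName));
    }
}
```

The service id identifies the application that reports the endpoint, which narrows down where to remove the agent.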

Exit the debugging session

[arthas@18863]$ stop
Resetting all enhanced classes ...
Affect(class count: 2 , method count: 0) cost in 106 ms, listenerId: 0
Arthas Server is going to shut down...
[arthas@18863]$ session (16ba375e-c26d-4b1c-9398-2866ec3eed9b) is closed because server is going to shutdown.

References:
Using Arthas to help troubleshoot an unavailable production SkyWalking:
https://www.cnblogs.com/kebibuluan/p/13633037.html

Guide to the Arthas watch command:
https://my.oschina.net/u/3874284/blog/4306792

Arthas class/classloader commands, part 1: sc and sm:
https://blog.csdn.net/a772304419/article/details/108432685

Observing method inputs and outputs with the Arthas watch command:
https://www.cnblogs.com/doit8791/p/12040642.html


Reposted from blog.csdn.net/lswzw/article/details/113887840