How to resolve an Elasticsearch index in red status
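Everything in my cluster was working fine, but today I noticed that I am no longer receiving logs from packetbeat and that the shard health is red.
When I run GET _cat/shards, I get something like this: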
packetbeat-7.9.3-2020.10.28-000001 2 p STARTED 11428 3.8mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001 2 r STARTED 11428 3.8mb 10.13.81.13 VSELK-MASTER-03
packetbeat-7.9.3-2020.10.28-000001 9 r STARTED 11402 3.8mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001 9 p STARTED 11402 3.8mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001 4 p STARTED 11619 4mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001 4 r STARTED 11619 3.9mb 10.13.81.22 VSELK-DATA-02
packetbeat-7.9.3-2020.10.28-000001 5 r STARTED 11567 3.8mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001 5 p STARTED 11567 3.9mb 10.13.81.22 VSELK-DATA-02
packetbeat-7.9.3-2020.10.28-000001 1 r STARTED 11553 3.8mb 10.13.81.11 VSELK-MASTER-01
packetbeat-7.9.3-2020.10.28-000001 1 p STARTED 11553 3.9mb 10.13.81.22 VSELK-DATA-02
packetbeat-7.9.3-2020.10.28-000001 7 r UNASSIGNED
packetbeat-7.9.3-2020.10.28-000001 7 p UNASSIGNED
packetbeat-7.9.3-2020.10.28-000001 6 r UNASSIGNED
packetbeat-7.9.3-2020.10.28-000001 6 p UNASSIGNED
packetbeat-7.9.3-2020.10.28-000001 8 r STARTED 11630 4mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001 8 p STARTED 11630 3.9mb 10.13.81.21 VSELK-DATA-01
packetbeat-7.9.3-2020.10.28-000001 3 p STARTED 11495 4mb 10.13.81.12 VSELK-MASTER-02
packetbeat-7.9.3-2020.10.28-000001 3 r STARTED 11495 3.7mb 10.13.81.13 VSELK-MASTER-03
packetbeat-7.9.3-2020.10.28-000001 0 r STARTED 11713 4mb 10.13.81.11 VSELK-MASTER-01
packetbeat-7.9.3-2020.10.28-000001 0 p STARTED 11713 4mb 10.13.81.22 VSELK-DATA-02
When I run GET /_cluster/allocation/explain, I get:
{
"index" : "packetbeat-7.9.2-2020.10.22-000001","shard" : 6,"primary" : true,"current_state" : "unassigned","unassigned_info" : {
"reason" : "ALLOCATION_FAILED","at" : "2020-10-28T13:22:03.006Z","failed_allocation_attempts" : 5,"details" : """failed shard on node [RCeMt0uXQie_ax_Sp22hLw]: failed to create shard,failure java.io.IOException: failed to obtain in-memory shard lock
at org.elasticsearch.index.IndexService.createShard(IndexService.java:489)
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:763)
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:176)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:607)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:584)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:242)
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:504)
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:494)
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:471)
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:418)
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:162)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:674)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.lang.Thread.run(Thread.java:832)
Caused by: [packetbeat-7.9.2-2020.10.22-000001/RRAnRZrrRZiihscJ3bymig][[packetbeat-7.9.2-2020.10.22-000001][6]] org.elasticsearch.env.ShardLockObtainFailedException: [packetbeat-7.9.2-2020.10.22-000001][6]: obtaining shard lock for [starting shard] timed out after [5000ms],lock already held for [closing shard] with age [199852ms]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:869)
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775)
at org.elasticsearch.index.IndexService.createShard(IndexService.java:409)
... 16 more
""","last_allocation_status" : "no"
},"can_allocate" : "no","allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy","node_allocation_decisions" : [
{
"node_id" : "A_nOoYrdSSOAHNQrhfveNA","node_name" : "VSELK-DATA-02","transport_address" : "10.13.81.22:9300","node_attributes" : {
"ml.machine_memory" : "8365424640","ml.max_open_jobs" : "20","xpack.installed" : "true","data" : "cold","transform.node" : "true"
},"node_decision" : "no","store" : {
"found" : false
}
},{
"node_id" : "RCeMt0uXQie_ax_Sp22hLw","node_name" : "VSELK-MASTER-03","transport_address" : "10.13.81.13:9300","node_attributes" : {
"ml.machine_memory" : "8365068288","data" : "hot","store" : {
"in_sync" : true,"allocation_id" : "nMvn4c4vQp2efQQtIeKzlg"
},"deciders" : [
{
"decider" : "max_retry","decision" : "NO","explanation" : """shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry,[unassigned_info[[reason=ALLOCATION_FAILED],at[2020-10-28T13:22:03.006Z],failed_attempts[5],failed_nodes[[hHHRtd5HTCKJgLTBtgDbOw,RCeMt0uXQie_ax_Sp22hLw]],delayed=false,details[failed shard on node [RCeMt0uXQie_ax_Sp22hLw]: failed to create shard,lock already held for [closing shard] with age [199852ms]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:869)
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775)
at org.elasticsearch.index.IndexService.createShard(IndexService.java:409)
... 16 more
],allocation_status[deciders_no]]]"""
}
]
},{
"node_id" : "hHHRtd5HTCKJgLTBtgDbOw","node_name" : "VSELK-MASTER-01","transport_address" : "10.13.81.11:9300","transform.node" : "true","ml.max_open_jobs" : "20"
},"allocation_id" : "ByqJGtQSQT-p8dCCfk3VlA"
},{
"node_id" : "k_SgmMDMRfGi-IFLbI-cRw","node_name" : "VSELK-MASTER-02","transport_address" : "10.13.81.12:9300","node_attributes" : {
"ml.machine_memory" : "8365056000",{
"node_id" : "r4V_KqZDQ7mYi7AZea5eXQ","node_name" : "VSELK-DATA-01","transport_address" : "10.13.81.21:9300","data" : "warm","store" : {
"found" : false
}
}
]
}
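Can someone tell me what causes this kind of error and how to fix it? (Note that my cluster has 5 nodes, 3 master nodes and 2 data nodes, and all of them are up.)
Thanks for your help!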
Solution
You can follow the related GitHub issue, in particular this comment, to resolve the problem.
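In short, the max_retry decider in the allocation explain output shows that the shard has used up its 5 allocation attempts, so allocation has to be retried manually. You should try the following, safer command:
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'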
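The retry can also be scripted. The following is a minimal Python sketch, not part of the original answer, that sends the same reroute request and then scans the _cat/shards output for shards still marked UNASSIGNED. The base URL http://localhost:9200 (no authentication, no TLS) and the helper names build_retry_request, unassigned_shards, and retry_and_report are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: the cluster is reachable here without authentication or TLS.
BASE_URL = "http://localhost:9200"


def build_retry_request(base_url=BASE_URL):
    """Build the POST request for /_cluster/reroute?retry_failed=true."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


def unassigned_shards(cat_shards_text):
    """Return (index, shard, prirep) tuples for UNASSIGNED rows of _cat/shards."""
    found = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Default columns: index shard prirep state [docs store ip node]
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            found.append((fields[0], fields[1], fields[2]))
    return found


def retry_and_report(base_url=BASE_URL):
    """Request a retry of failed allocations, then list what is still unassigned."""
    with urllib.request.urlopen(build_retry_request(base_url)) as resp:
        acknowledged = json.load(resp).get("acknowledged")
    with urllib.request.urlopen(base_url.rstrip("/") + "/_cat/shards") as resp:
        remaining = unassigned_shards(resp.read().decode("utf-8"))
    return acknowledged, remaining


if __name__ == "__main__":
    ok, remaining = retry_and_report()
    print("acknowledged:", ok)
    for index, shard, prirep in remaining:
        print("still unassigned:", index, shard, prirep)
```

Allocation happens asynchronously, so a shard may still appear as UNASSIGNED or INITIALIZING for a short time after the request is acknowledged. If it stays unassigned, running GET /_cluster/allocation/explain again should show whether the shard lock is still held.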