HashiCorp Vault on k8s: getting error "1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity"

I'm deploying an HA Vault on k8s (EKS) and I'm getting the error below on one of the Vault pods, which I think is also causing the other pods to fail. Here is the output of kubectl get events:

26m         Normal    Created                        pod/vault-1                                 Created container vault
26m         Normal    Started                        pod/vault-1                                 Started container vault
26m         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m40s       Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-1                                 Successfully assigned vault-foo/vault-1 to ip-10-101-0-103.ec2.internal
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-acfc7e26-3616-4075-ab79-0c3f7b0f6470"
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-19d03d48-1de2-41f8-aadf-02d0a9f4bfbd"
48s         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
48s         Normal    Created                        pod/vault-1                                 Created container vault
99s         Normal    Started                        pod/vault-1                                 Started container vault
60s         Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
27m         Normal    TaintManagerEviction           pod/vault-2                                 Cancelling deletion of Pod vault-foo/vault-2
28m         Warning   FailedScheduling               pod/vault-2                                 0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu.
28m         Warning   FailedScheduling               pod/vault-2                                 0/5 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
27m         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-103.ec2.internal
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
27m         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
27m         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
26m         Normal    Created                        pod/vault-2                                 Created container vault
26m         Normal    Started                        pod/vault-2                                 Started container vault
26m         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m26s       Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m36s       Warning   FailedScheduling               pod/vault-2                                 0/7 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
114s        Warning   FailedScheduling               pod/vault-2                                 0/8 nodes are available: 1 Insufficient memory, 4 Insufficient cpu.
104s        Warning   FailedScheduling               pod/vault-2                                 0/9 nodes are available: 1 Insufficient memory, 2 node(s) had taint {node.kubernetes.io/not-ready: }, 4 Insufficient cpu.
93s         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-82.ec2.internal
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
83s         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
81s         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
38s         Normal    Created                        pod/vault-2                                 Created container vault
37s         Normal    Started                        pod/vault-2                                 Started container vault
38s         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
4s          Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-qwsmz    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-qwsmz to ip-10-101-2-91.ec2.internal
2m37s       Normal    Pulling                        pod/vault-agent-injector-d54bdc675-qwsmz    Pulling image "hashicorp/vault-k8s:latest"
2m36s       Normal    Pulled                         pod/vault-agent-injector-d54bdc675-qwsmz    Successfully pulled image "hashicorp/vault-k8s:latest"
2m36s       Normal    Created                        pod/vault-agent-injector-d54bdc675-qwsmz    Created container sidecar-injector
2m35s       Normal    Started                        pod/vault-agent-injector-d54bdc675-qwsmz    Started container sidecar-injector
28m         Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-wz9ws    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-wz9ws to ip-10-101-0-87.ec2.internal
28m         Normal    Pulled                         pod/vault-agent-injector-d54bdc675-wz9ws    Container image "hashicorp/vault-k8s:latest" already present on machine
28m         Normal    Created                        pod/vault-agent-injector-d54bdc675-wz9ws    Created container sidecar-injector
28m         Normal    Started                        pod/vault-agent-injector-d54bdc675-wz9ws    Started container sidecar-injector
3m22s       Normal    Killing                        pod/vault-agent-injector-d54bdc675-wz9ws    Stopping container sidecar-injector
3m22s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Readiness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: connection refused
3m18s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Liveness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: no route to host
28m         Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-wz9ws
2m38s       Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-qwsmz
28m         Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
2m38s       Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
28m         Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
28m         Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
26m         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
3m24s       Normal    DeletingLoadBalancer           service/vault-ui                            Deleting load balancer
3m23s       Warning   PortNotAllocated               service/vault-ui                            Port 32476 is not allocated; repairing
3m23s       Warning   ClusterIPNotAllocated          service/vault-ui                            Cluster IP 172.20.216.143 is not allocated; repairing
3m22s       Warning   FailedToUpdateEndpointSlices   service/vault-ui                            Error updating Endpoint Slices for Service vault-foo/vault-ui: failed to update vault-ui-crtg4 EndpointSlice for Service vault-foo/vault-ui: Operation cannot be fulfilled on endpointslices.discovery.k8s.io "vault-ui-crtg4": the object has been modified; please apply your changes to the latest version and try again
3m16s       Warning   FailedToUpdateEndpoint         endpoints/vault-ui                          Failed to update endpoint vault-foo/vault-ui: Operation cannot be fulfilled on endpoints "vault-ui": the object has been modified; please apply your changes to the latest version and try again
2m52s       Normal    DeletedLoadBalancer            service/vault-ui                            Deleted load balancer
2m39s       Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
2m36s       Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
96s         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
28m         Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-0 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-1 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-2 in StatefulSet vault successful
2m40s       Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-0 in StatefulSet vault successful
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-1 in StatefulSet vault successful
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-2 in StatefulSet vault successful

And here are my Helm chart values:

# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "latest"

  resources:
      requests:
        memory: 256Mi
        cpu: 250m
      limits:
        memory: 256Mi
        cpu: 250m

server:
  # Use the Enterprise Image
  image:
    repository: "hashicorp/vault-enterprise"
    tag: "1.5.0_ent"

  # These Resource Limits are in line with node requirements in the
  # Vault Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 8Gi
      cpu: 2000m
    limits:
      memory: 16Gi
      cpu: 2000m

  # For HA configuration, and because we need to manually init the vault,
  # we need to define custom readiness/liveness probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path .
  #extraVolumes:
  #  - type: secret
  #    name: tls-server
  #  - type: secret
  #    name: tls-ca
  #  - type: secret
  #    name: kms-creds
  extraVolumes:
    - type: secret
      name: vault-server-tls   
  
  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: true

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          tls_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }

        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200

  # For Added Security, edit the below
  #loadBalancerSourceRanges:
  #   - < Your IP RANGE Ex. 10.0.0.0/16 >
  #   - < YOUR SINGLE IP Ex. 1.78.23.3/32 >


What have I misconfigured?

Solution

There are a couple of problems here, and all of them are represented by the following error message:

0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.

You have 9 nodes, but none of them is available for scheduling, each for its own set of reasons. Note that a single node can be affected by several of these conditions at once, which is why the counts can add up to more than your total number of nodes (here 1+1+1+1+1+2+4 = 11 reported conditions across 9 nodes).

Let's break them down one by one:

  • Insufficient memory: run kubectl describe node <node-name> to check how much allocatable memory is left there, and check your pods' requests and limits. Note that Kubernetes blocks off the full amount of memory a pod requests, regardless of how much the pod actually uses. See the diagnostic sketch after this list.

  • Insufficient cpu: same as above.

  • node(s) didn't match pod affinity/anti-affinity: check your affinity/anti-affinity rules. Note that by default the Vault chart renders pod anti-affinity that keeps the server pods on separate nodes, so 3 replicas need 3 different nodes that each have the requested memory and CPU free.

  • node(s) didn't satisfy existing pods anti-affinity rules: same as above.

  • node(s) had volume node affinity conflict: happens when a pod cannot be scheduled because it would have to attach a volume that lives in a different availability zone. You can work around this by creating a storageclass for a single zone and using that storageclass in your PVC; see the StorageClass sketch at the end of this answer.

  • node(s) were unschedulable: the node was marked as Unschedulable (e.g. cordoned), which leads to the next issue below:

  • node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate: this corresponds to the NodeCondition Ready = False. You can check taints with kubectl describe node <node-name> and remove them with kubectl taint nodes <node-name> <taint-name>-. See Taints and Tolerations for more details.
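
A quick diagnostic sketch for the capacity, affinity, and taint items above, assuming kubectl access to the cluster; the node and pod names are copied from the events output and are placeholders for your own:

# How much memory/CPU does the node have left to allocate, and how much is already requested?
kubectl describe node ip-10-101-0-103.ec2.internal | grep -A 7 'Allocatable:'
kubectl describe node ip-10-101-0-103.ec2.internal | grep -A 10 'Allocated resources:'

# What are the Vault pods requesting, and which anti-affinity rules were rendered onto them?
kubectl get pods -n vault-foo -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.requests}{"\n"}{end}'
kubectl get pod vault-2 -n vault-foo -o jsonpath='{.spec.affinity}'

# Which nodes are tainted or cordoned?
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key,UNSCHEDULABLE:.spec.unschedulable'

# Remove the not-ready taint only once the node has actually recovered (note the trailing "-")
kubectl taint nodes ip-10-101-0-103.ec2.internal node.kubernetes.io/not-ready-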

There is also a GitHub thread with a similar issue that you may find useful.

Try checking/eliminating these issues one by one, starting with the first one listed above, since in some cases they have a "cascade effect" on each other.
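
For the volume node affinity conflict specifically, here is a minimal sketch of a zone-pinned StorageClass, assuming the in-tree AWS EBS provisioner and using us-east-1a as a placeholder zone; the class name is hypothetical, and you would point the chart's PVCs at it (the chart appears to expose this as dataStorage.storageClass and auditStorage.storageClass):

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-us-east-1a                     # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone   # older clusters label nodes with failure-domain.beta.kubernetes.io/zone instead
        values:
          - us-east-1a                     # placeholder: keep all volumes in one zone
EOF

Even without allowedTopologies, volumeBindingMode: WaitForFirstConsumer delays volume creation until a consuming pod is scheduled, which avoids most cross-zone conflicts for newly created PVCs.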
