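How to fix NGINX Ingress worker processes constantly restarting
I recently upgraded my ingress controller to kubernetes-ingress v1.10.0. The ingresses seem to route traffic correctly, but after checking the pod logs, I noticed a huge number of notices being generated: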
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2748
2021/02/10 09:40:23 [notice] 19#19: worker process 2748 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2745
2021/02/10 09:40:23 [notice] 19#19: worker process 2745 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
W0210 09:40:23.416499 1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.416812 1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.416912 1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2735
2021/02/10 09:40:23 [notice] 19#19: worker process 2735 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2737
2021/02/10 09:40:23 [notice] 19#19: worker process 2737 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2742 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2746
2021/02/10 09:40:23 [notice] 19#19: worker process 2746 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2744
2021/02/10 09:40:23 [notice] 19#19: worker process 2744 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2740
2021/02/10 09:40:23 [notice] 19#19: worker process 2740 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2736
2021/02/10 09:40:23 [notice] 19#19: worker process 2736 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2741
2021/02/10 09:40:23 [notice] 19#19: worker process 2734 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2741 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2739
2021/02/10 09:40:23 [notice] 19#19: worker process 2739 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2738
2021/02/10 09:40:23 [notice] 19#19: worker process 2738 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2743
2021/02/10 09:40:23 [notice] 19#19: worker process 2743 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2749
2021/02/10 09:40:23 [notice] 19#19: worker process 2749 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2747
2021/02/10 09:40:23 [notice] 19#19: worker process 2747 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [warn] 2718#2718: *6697105 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/79/0000214796 while reading upstream, client: xxxx, server: xxxx, request: "GET /xxxx HTTP/1.1", upstream: "xxxx", host: "xxxx", referrer: "xxxx"
2021/02/10 09:40:23 [notice] 2769#2769: signal process started
2021/02/10 09:40:23 [notice] 19#19: signal 1 (SIGHUP) received from 2769, reconfiguring
2021/02/10 09:40:23 [notice] 19#19: reconfiguring
2021/02/10 09:40:23 [notice] 19#19: using the "epoll" event method
2021/02/10 09:40:23 [notice] 19#19: start worker processes
2021/02/10 09:40:23 [notice] 19#19: start worker process 2770
2021/02/10 09:40:23 [notice] 19#19: start worker process 2771
2021/02/10 09:40:23 [notice] 19#19: start worker process 2772
2021/02/10 09:40:23 [notice] 19#19: start worker process 2773
2021/02/10 09:40:23 [notice] 19#19: start worker process 2774
2021/02/10 09:40:23 [notice] 19#19: start worker process 2775
2021/02/10 09:40:23 [notice] 19#19: start worker process 2776
2021/02/10 09:40:23 [notice] 19#19: start worker process 2777
2021/02/10 09:40:23 [notice] 19#19: start worker process 2778
2021/02/10 09:40:23 [notice] 19#19: start worker process 2779
2021/02/10 09:40:23 [notice] 19#19: start worker process 2780
2021/02/10 09:40:23 [notice] 19#19: start worker process 2781
2021/02/10 09:40:23 [notice] 19#19: start worker process 2782
2021/02/10 09:40:23 [notice] 19#19: start worker process 2783
2021/02/10 09:40:23 [notice] 19#19: start worker process 2784
2021/02/10 09:40:23 [notice] 19#19: start worker process 2785
90.114.22.230 - - [10/Feb/2021:09:40:23 +0000] "GET /xxxx HTTP/1.1" 200 352910 "xxxx" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0" "-"
2021/02/10 09:40:23 [notice] 2753#2753: gracefully shutting down
2021/02/10 09:40:23 [notice] 2755#2755: gracefully shutting down
2021/02/10 09:40:23 [notice] 2760#2760: gracefully shutting down
2021/02/10 09:40:23 [notice] 2755#2755: exiting
2021/02/10 09:40:23 [notice] 2753#2753: exiting
2021/02/10 09:40:23 [notice] 2762#2762: gracefully shutting down
2021/02/10 09:40:23 [notice] 2760#2760: exiting
2021/02/10 09:40:23 [notice] 2766#2766: gracefully shutting down
2021/02/10 09:40:23 [notice] 2762#2762: exiting
2021/02/10 09:40:23 [notice] 2766#2766: exiting
2021/02/10 09:40:23 [notice] 2759#2759: gracefully shutting down
2021/02/10 09:40:23 [notice] 2759#2759: exiting
2021/02/10 09:40:23 [notice] 2763#2763: gracefully shutting down
2021/02/10 09:40:23 [notice] 2761#2761: gracefully shutting down
2021/02/10 09:40:23 [notice] 2767#2767: gracefully shutting down
2021/02/10 09:40:23 [notice] 2763#2763: exiting
2021/02/10 09:40:23 [notice] 2767#2767: exiting
2021/02/10 09:40:23 [notice] 2761#2761: exiting
2021/02/10 09:40:23 [notice] 2760#2760: exit
2021/02/10 09:40:23 [notice] 2753#2753: exit
2021/02/10 09:40:23 [notice] 2766#2766: exit
2021/02/10 09:40:23 [notice] 2764#2764: gracefully shutting down
2021/02/10 09:40:23 [notice] 2764#2764: exiting
2021/02/10 09:40:23 [notice] 2752#2752: gracefully shutting down
2021/02/10 09:40:23 [notice] 2752#2752: exiting
2021/02/10 09:40:23 [notice] 2763#2763: exit
2021/02/10 09:40:23 [notice] 2762#2762: exit
2021/02/10 09:40:23 [notice] 2764#2764: exit
2021/02/10 09:40:23 [notice] 2759#2759: exit
2021/02/10 09:40:23 [notice] 2755#2755: exit
2021/02/10 09:40:23 [notice] 2752#2752: exit
2021/02/10 09:40:23 [notice] 2767#2767: exit
2021/02/10 09:40:23 [notice] 2761#2761: exit
2021/02/10 09:40:23 [notice] 2758#2758: gracefully shutting down
2021/02/10 09:40:23 [notice] 2758#2758: exiting
2021/02/10 09:40:23 [notice] 2756#2756: gracefully shutting down
2021/02/10 09:40:23 [notice] 2756#2756: exiting
2021/02/10 09:40:23 [notice] 2758#2758: exit
2021/02/10 09:40:23 [notice] 2756#2756: exit
2021/02/10 09:40:23 [notice] 2765#2765: gracefully shutting down
2021/02/10 09:40:23 [notice] 2765#2765: exiting
2021/02/10 09:40:23 [notice] 2757#2757: gracefully shutting down
2021/02/10 09:40:23 [notice] 2757#2757: exiting
2021/02/10 09:40:23 [notice] 2754#2754: gracefully shutting down
2021/02/10 09:40:23 [notice] 2754#2754: exiting
2021/02/10 09:40:23 [notice] 2754#2754: exit
2021/02/10 09:40:23 [notice] 2765#2765: exit
2021/02/10 09:40:23 [notice] 2757#2757: exit
I0210 09:40:23.604803 1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",Namespace:"xxxx",Name:"xxxx",UID:"82a71705-194e-4919-a7e2-a511d52c1a7a",APIVersion:"networking.k8s.io/v1beta1",ResourceVersion:"77919848",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.604873 1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"10246997-07ae-41e1-b811-0ec630647f3b",ResourceVersion:"182677830",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605520 1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"d628825f-1b06-4719-b4b0-4d971b8c0a54",ResourceVersion:"182677778",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605557 1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"4b7b1fa1-1d7d-41a5-9d97-5f5aee52ade7",ResourceVersion:"182678922",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605569 1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"b86b8b8e-82b9-40d0-b02d-073db557c0e1",ResourceVersion:"182678955",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605577 1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"585ccdee-9807-442e-9b4f-7d1a97264216",ResourceVersion:"182677754",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
W0210 09:40:23.614001 1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.614213 1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.614304 1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2763
2021/02/10 09:40:23 [notice] 19#19: worker process 2755 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2763 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2767 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2766
2021/02/10 09:40:23 [notice] 19#19: worker process 2752 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2753 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2766 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2756
2021/02/10 09:40:23 [notice] 19#19: worker process 2756 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2758 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2759 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2760 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2761 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2762 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2764 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2754
2021/02/10 09:40:23 [notice] 19#19: worker process 2754 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2765
2021/02/10 09:40:23 [notice] 19#19: worker process 2765 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
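This seems to loop endlessly and very quickly on all pods. I deployed my controller using these manifests and recreated the default server secret as mentioned in the release notes.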
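To put a number on "very quickly", the reloads in a saved copy of the log can be counted. The following is only a rough sketch: it assumes the log lines keep the format shown above, and the function name reloads_per_second is made up for illustration. It counts the lines where the NGINX master process reports receiving SIGHUP, which is the signal that triggers a reconfiguration.

```python
import re
from collections import Counter

# Matches the line the NGINX master process logs when it is told to reload.
# Assumption: timestamps and prefixes look like the log excerpt above.
RELOAD_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[notice\] \d+#\d+: "
    r"signal 1 \(SIGHUP\) received"
)


def reloads_per_second(lines):
    """Map each timestamp (one-second resolution) to the number of reloads seen."""
    counts = Counter()
    for line in lines:
        match = RELOAD_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, after saving the output of kubectl logs for one controller pod to a file, passing the open file object to reloads_per_second gives the number of reloads for each second covered by the log.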
The controller arguments are:
args:
  - -nginx-configmaps=$(POD_NAMESPACE)/nginx-config
  - -default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
  - -global-configuration=$(POD_NAMESPACE)/nginx-configuration
  - -report-ingress-status
  - -enable-prometheus-metrics
  - -enable-snippets

data:
  client-max-body-size: 50m
  proxy-read-timeout: 5m
  server-tokens: "False"
Any idea what is going on there and how to fix it?
Edit:
After some more research, I found that two of my ingresses are being constantly updated:
Name:             xxxx
Namespace:        xxxx
Address:
Default backend:  default-http-backend:80 (<none>)
TLS:
  xxxx terminates xxxx
Rules:
  Host  Path  Backends
  ----  ----  --------
  *     *     default-http-backend:80 (<none>)
Annotations:
  ingress.kubernetes.io/ssl-redirect: true
  kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"ingress.kubernetes.io/ssl-redirect":"true","kubernetes.io/ingress.class":"nginx","nginx.org/mergeable-ingress-type":"master"},"labels":{"app.kubernetes.io/component":"xxxx","app.kubernetes.io/instance":"xxxx","app.kubernetes.io/name":"xxxx","app.kubernetes.io/part-of":"xxxx","argocd.argoproj.io/instance":"xxxx"},"name":"xxxx","namespace":"xxxx"},"spec":{"rules":[{"host":"xxxx"}],"tls":[{"hosts":["xxxx"],"secretName":"xxxx"}]}}
  kubernetes.io/ingress.class: nginx
  nginx.org/mergeable-ingress-type: master
Events:
  Type    Reason          Age                       From                      Message
  ----    ------          ----                      ----                      -------
  Normal  AddedOrUpdated  3m5s (x2600127 over 6d)   nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated
  Normal  AddedOrUpdated  2m12s (x2599793 over 6d)  nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated
  Normal  AddedOrUpdated  66s (x2600182 over 6d)    nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","secretName":"xxxx"}]}}
    kubernetes.io/ingress.class: nginx
    nginx.org/mergeable-ingress-type: master
  creationTimestamp: "2021-01-18T09:55:07Z"
  generation: 1
  labels:
    app.kubernetes.io/component: xxxx
    app.kubernetes.io/instance: xxxx
    app.kubernetes.io/name: xxxx
    app.kubernetes.io/part-of: xxxx
    argocd.argoproj.io/instance: xxxx
  name: xxxx
  namespace: xxxx
  resourceVersion: "182677754"
  selfLink: /apis/extensions/v1beta1/namespaces/xxxx/ingresses/xxxx
  uid: 585ccdee-9807-442e-9b4f-7d1a97264216
spec:
  rules:
  - host: xxxx
  tls:
  - hosts:
    - xxxx
    secretName: xxxx
status:
  loadBalancer:
    ingress:
    - {}
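The event counters in the describe output above give a rough idea of how often each ingress is rewritten. The sketch below is only a back-of-the-envelope calculation: the helper name events_per_second is made up, and it assumes the "(xN over D)" part of the Age column uses a single unit such as 6d. Since "6d" is itself rounded, the result is approximate.

```python
import re

# Seconds per unit for the single-unit durations handled here (assumption:
# the "over" part looks like "6d", "12h", "30m" or "45s").
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
COUNT_RE = re.compile(r"\(x(\d+) over (\d+)([smhd])\)")


def events_per_second(age_column):
    """Estimate the event rate from an Age value such as '3m5s (x2600127 over 6d)'."""
    match = COUNT_RE.search(age_column)
    if match is None:
        raise ValueError(f"no '(xN over D)' counter found in {age_column!r}")
    count = int(match.group(1))
    span_seconds = int(match.group(2)) * UNITS[match.group(3)]
    return count / span_seconds
```

For the first event line, 2600127 events over roughly 6 days works out to about 5 updates per second.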
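My environment is managed by ArgoCD, but after checking the logs, the updates do not appear to come from ArgoCD. I wonder whether these updates are related to the -report-ingress-status option.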
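One way to narrow down what is rewriting the object is to compare two snapshots of the same ingress taken a few seconds apart, for example saved with kubectl get ingress using -o json. This is a minimal sketch under that assumption; the function names and the snapshot files are hypothetical, and lists are compared as a whole rather than element by element.

```python
import json


def changed_paths(old, new, prefix=""):
    """Yield dotted paths whose values differ between two parsed JSON objects."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            path = f"{prefix}.{key}" if prefix else key
            # A missing key is treated like a null value.
            yield from changed_paths(old.get(key), new.get(key), path)
    elif old != new:
        yield prefix


def diff_snapshots(old_path, new_path):
    """Load two JSON snapshot files and list the paths that changed."""
    with open(old_path) as f_old, open(new_path) as f_new:
        return list(changed_paths(json.load(f_old), json.load(f_new)))
```

If only metadata.resourceVersion and something under status change between snapshots, a status writer is a plausible suspect; if annotations or spec change, something else is reapplying the object.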
Edit 2:
I removed -report-ingress-status and it did not change anything.
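Solution
I don't know the real root cause, but I deleted all the TLS secrets, certificates, and ingresses that were being constantly updated and recreated them. That solved the problem.
Several incidents happened before this issue and may be related to it: 2 of my 3 ingress nodes failed, and a wrong CRD was applied during the upgrade before being quickly fixed.
That is all I can say for now, but deleting the resources related to the constantly updated ingresses and recreating them does solve the problem.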