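How to fix idle websocket connections on AKS being dropped after exactly 2 minutes in the browser
I am working on migrating our platform from on-prem to an AKS cluster. We have a websocket web application that I have routed through the Nginx load balancer ingress. The annotations used for the ingress are: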
kubernetes.io/ingress.class: nginx
kubernetes.io/ingress.allow-http: "false"
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
nginx.ingress.kubernetes.io/rewrite-target: /$1
I can confirm that this translates into the following nginx configuration:
location ~* "^/<service-name>/(.*)" {
    set $namespace "<app-name>";
    set $ingress_name "<service-name>-ingress";
    set $service_name "<service-name>";
    set $service_port "6006";
    set $location_path "/<service-name>/(.*)";
    rewrite_by_lua_block {
        lua_ingress.rewrite({
            force_ssl_redirect = false,
            ssl_redirect = true,
            force_no_ssl_redirect = false,
            use_port_in_redirects = false,
        })
        balancer.rewrite()
        plugins.run()
    }
    header_filter_by_lua_block {
        plugins.run()
    }
    body_filter_by_lua_block {
    }
    log_by_lua_block {
        balancer.log()
        monitor.call()
        plugins.run()
    }
    port_in_redirect off;
    set $balancer_ewma_score -1;
    set $proxy_upstream_name "<app-name>-<service-name>-6006";
    set $proxy_host $proxy_upstream_name;
    set $pass_access_scheme $scheme;
    set $pass_server_port $server_port;
    set $best_http_host $http_host;
    set $pass_port $pass_server_port;
    set $proxy_alternative_upstream_name "";
    client_max_body_size 1m;
    proxy_set_header Host $best_http_host;
    # Pass the extracted client certificate to the backend
    # Allow websocket connections
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header X-Request-ID $req_id;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header X-Forwarded-Host $best_http_host;
    proxy_set_header X-Forwarded-Port $pass_port;
    proxy_set_header X-Forwarded-Proto $pass_access_scheme;
    proxy_set_header X-Scheme $pass_access_scheme;
    # Pass the original X-Forwarded-For
    proxy_set_header X-Original-Forwarded-For $http_x_forwarded_for;
    # mitigate HTTPoxy Vulnerability
    # https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
    proxy_set_header Proxy "";
    # Custom headers to proxied server
    proxy_connect_timeout 5s;
    proxy_send_timeout 3600s;
    proxy_read_timeout 3600s;
    proxy_buffering off;
    proxy_buffer_size 4k;
    proxy_buffers 4 4k;
    proxy_max_temp_file_size 1024m;
    proxy_request_buffering on;
    proxy_http_version 1.1;
    proxy_cookie_domain off;
    proxy_cookie_path off;
    # In case of errors try the next upstream server before returning an error
    proxy_next_upstream error timeout;
    proxy_next_upstream_timeout 0;
    proxy_next_upstream_tries 3;
    rewrite "(?i)/<service-name>/(.*)" /$1 break;
    proxy_pass http://upstream_balancer;
    proxy_redirect off;
}
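Here comes the weird part. When I try to connect through the browser, it is able to find the service and create a websocket connection. However, after exactly 2 minutes (120 seconds) of inactivity, the connection is dropped.
I can also confirm that this only happens when the service is accessed through the browser (I tested with Chrome). When I connect to the websocket from the command line using wscat, the connection stays open for more than 10 minutes. I even tried replicating the headers that Chrome sends, to no avail. The headers of the request sent from the browser are: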
Host: <host>
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36
Upgrade: websocket
Origin: <origin>
Sec-WebSocket-Version: 13
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
And the nginx debug logs, from when the request is made until it disconnects, are as follows:
2020/12/02 18:06:59 [debug] 102#102: *11638 http request line: "GET /<service-name>/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e HTTP/1.1"
2020/12/02 18:06:59 [debug] 102#102: *11638 http uri: "/<service-name>/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:06:59 [debug] 102#102: *11638 test location: ~ "^/<service-name>/(.*)"
2020/12/02 18:06:59 [debug] 102#102: *11638 using configuration "^/<service-name>/(.*)"
2020/12/02 18:06:59 [debug] 102#102: *11638 http script value: "<service-name>-ingress"
2020/12/02 18:06:59 [debug] 102#102: *11638 http script value: "<service-name>"
2020/12/02 18:06:59 [debug] 102#102: *11638 http script value: "/<service-name>/(.*)"
2020/12/02 18:06:59 [debug] 102#102: *11638 http script value: "<app-name>-<service-name>-<port>"
2020/12/02 18:06:59 [debug] 102#102: *11638 http script var: "<app-name>-<service-name>-<port>"
2020/12/02 18:06:59 [debug] 102#102: *11638 http script regex: "(?i)/<service-name>/(.*)"
2020/12/02 18:06:59 [notice] 102#102: *11638 "(?i)/<service-name>/(.*)" matches "/<service-name>/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e", client: 10.216.148.206, server: <ingress-host>, request: "GET /<service-name>/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e HTTP/1.1", host: "<ingress-host>"
2020/12/02 18:06:59 [debug] 102#102: *11638 http script capture: "<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:06:59 [notice] 102#102: *11638 rewritten data: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e", args: "", host: "<ingress-host>"
2020/12/02 18:06:59 [debug] 102#102: *11638 lua rewrite handler, uri:"/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e" c:1 "GET /<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e HTTP/1.1
2020/12/02 18:06:59 [debug] 102#102: *11638 http finalize request: -4, "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?" a:1, c:2
2020/12/02 18:06:59 [debug] 102#102: *11638 http run request: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:06:59 [debug] 102#102: *11638 http upstream check client, write event:1, "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:06:59 [debug] 102#102: *11638 http upstream request: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:06:59 [debug] 102#102: *11638 lua header filter for user lua code, uri "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:06:59 [debug] 102#102: *11638 lua capture header filter, uri "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
Set-Cookie: INGRESSCOOKIE=1606932420.391.102.754662; Path=/<service-name>/(.*); Secure; HttpOnly
2020/12/02 18:06:59 [debug] 102#102: *11638 http output filter "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:06:59 [debug] 102#102: *11638 http copy filter: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:06:59 [debug] 102#102: *11638 lua body filter for user lua code, uri "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:06:59 [debug] 102#102: *11638 lua capture body filter, uri "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:06:59 [debug] 102#102: *11638 http postpone filter "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?" 00007FFF78E0D8C0
2020/12/02 18:06:59 [debug] 102#102: *11638 http copy filter: 0 "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:06:59 [debug] 102#102: *11638 http upstream request: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:06:59 [debug] 102#102: *11638 http upstream request: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:06:59 [debug] 102#102: *11638 http upstream request: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
I1202 18:06:59.635816 7 socket.go:213] msg: [{"host":"<ingress-host>","method":"POST","requestLength":138,"status":"200","upstreamResponseLength":36,"upstreamLatency":0.004,"upstreamResponseTime":0.084,"path":"\/<service-name>\/(.*)","requestTime":0.081,"ingress":"<service-name>-ingress","namespace":"<app-name>","service":"<service-name>","responseLength":318}]
2020/12/02 18:07:06 [debug] 102#102: *11536 test location: ~ "^/<service-name>/(.*)"
2020/12/02 18:07:19 [debug] 102#102: *11536 test location: ~ "^/<service-name>/(.*)"
2020/12/02 18:09:00 [debug] 102#102: *11638 http run request: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:09:00 [debug] 102#102: *11638 http output filter "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:09:00 [debug] 102#102: *11638 http copy filter: "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:09:00 [debug] 102#102: *11638 lua body filter for user lua code, uri "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:09:00 [debug] 102#102: *11638 lua capture body filter, uri "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:09:00 [debug] 102#102: *11638 http postpone filter "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?" 00007FFF78E0D8B0
2020/12/02 18:09:00 [debug] 102#102: *11638 http copy filter: 0 "/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e?"
2020/12/02 18:09:00 [debug] 102#102: *11638 http finalize request: 0, c:1
2020/12/02 18:09:00 [debug] 102#102: *11638 lua log handler, uri:"/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e" c:0
10.216.148.206 - - [02/Dec/2020:18:09:00 +0000] "GET /<service-name>/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e HTTP/1.1" 101 97 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36" 583 121.143 [<app-name>-<service-name>-<port>] [] 10.251.137.49:4444 0 121.144 101 32aa8370ebd8b2c93f150f9fa6a45ebd
2020/12/02 18:09:00 [debug] 102#102: *11638 http script var: "/<service-name>/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e"
2020/12/02 18:09:00 [debug] 102#102: *11638 http map: "/<service-name>/<websocket-endpoint>/3eccfbdc-f066-4e4d-a635-d2933503a68e" "1"
I1202 18:09:00.707439 7 socket.go:213] msg: [{"host":"<ingress-host>","method":"GET","requestLength":583,"status":"101","upstreamResponseLength":0,"upstreamLatency":0,"upstreamResponseTime":121.144,"requestTime":121.143,"responseLength":443}]
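To double-check the 2-minute figure against the log, here is a small Python sketch that compares the first and last timestamps of connection *11638 with the requestTime reported in the final socket.go line. The strings are copied from the log above; the variable names are only for illustration.

```python
import json
from datetime import datetime

LOG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S"

# Timestamps of the first and last nginx debug lines for connection *11638.
opened = datetime.strptime("2020/12/02 18:06:59", LOG_TIME_FORMAT)
closed = datetime.strptime("2020/12/02 18:09:00", LOG_TIME_FORMAT)
elapsed = (closed - opened).total_seconds()

# JSON payload of the final socket.go line, copied from the log.
msg = (
    '[{"host":"<ingress-host>","method":"GET","requestLength":583,'
    '"status":"101","upstreamResponseLength":0,"upstreamLatency":0,'
    '"upstreamResponseTime":121.144,"requestTime":121.143,"responseLength":443}]'
)
entry = json.loads(msg)[0]

print(elapsed)                               # wall-clock lifetime in seconds
print(entry["status"], entry["requestTime"])  # upgrade status and nginx request time
```

Both numbers come out at about 121 seconds for a request that was upgraded with status 101, which matches the drop I see after roughly 2 minutes of idle time.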
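This is bugging me, because this limitation does not exist on-prem. The only other difference between the two is that on AKS the connection goes over https (and wss), whereas on-prem we use the http (and ws) protocol. I want to understand what is causing the disconnection, and why. I know that best practice would be to disconnect connections that have been idle for more than 2 minutes and reconnect, or to implement an application-level ping-pong; however, I want to understand what is happening here and be able to explain why these changes are required from the dev team.
I have been losing my mind over this for the past few weeks. Any leads are much appreciated!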
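For reference, the application-level ping-pong workaround could look roughly like the Python sketch below. It is not tied to any websocket library: `send` is a hypothetical stand-in for whatever coroutine the application uses to write a message to the socket, and the `heartbeat` function, the "ping" payload, and the interval are all assumptions for illustration. In practice the interval would need to be comfortably below the 120 seconds observed here. This only keeps the connection busy; it does not explain what is closing it.

```python
import asyncio
from typing import Awaitable, Callable, List


async def heartbeat(
    send: Callable[[str], Awaitable[None]],
    interval: float,
    stop: asyncio.Event,
    payload: str = "ping",
) -> int:
    """Call send(payload) every `interval` seconds until `stop` is set.

    Returns the number of heartbeats that were sent.
    """
    sent = 0
    while not stop.is_set():
        try:
            # Returns early if `stop` is set; otherwise times out after `interval`.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            await send(payload)
            sent += 1
    return sent


async def demo() -> List[str]:
    """Run the heartbeat against a fake sender with a very short interval."""
    messages: List[str] = []

    async def fake_send(message: str) -> None:
        # Stand-in for a real websocket send; it just records the message.
        messages.append(message)

    stop = asyncio.Event()
    task = asyncio.create_task(heartbeat(fake_send, 0.01, stop))
    await asyncio.sleep(0.1)  # let a few heartbeats fire
    stop.set()
    await task
    return messages


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The receiving side would have to recognise the payload and either ignore it or answer with a matching pong.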