Cluster kubelet reports "unable to fetch pod logs" errors
Problem description:
Command to view the kubelet logs:
journalctl -xefu kubelet
Symptom description:
http://cdp.cestc.cn/product/#/project/defect/list?projectId=1501448966260252673
release image: 0.0.0-rc.1-20220425214427-20220426011558
kubelet version:
kubelet logs:
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.956630 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-controller-manager_kube-controller-manager-master1_629a3916-467a-4a84-8b22-77c8d439fce1: no such file or directory" pod="openshift-kube-controller-manager/kube-controller-manager-master1"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.956721 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/e2e-test-apiserver-7bpph_master1-debug_1f0ce55a-b823-408f-bbf0-2e29d858ffa7: no such file or directory" pod="e2e-test-apiserver-7bpph/master1-debug"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.956776 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-multus_multus-w9dlp_686a78a0-831d-4ae3-b53b-55a3907272cf: no such file or directory" pod="openshift-multus/multus-w9dlp"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.956928 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-authentication_oauth-openshift-698d6878c5-qpn5v_38bc5f65-c721-49ec-b201-0f53d962a9ad: no such file or directory" pod="openshift-authentication/oauth-openshift-698d6878c5-qpn5v"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.957433 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-ovn-kubernetes_ovnkube-master-197jr_3f40352a-6c7e-447f-83b3-e868b07a11f2: no such file or directory" pod="openshift-ovn-kubernetes/ovnkube-master-197jr"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.957500 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/e2e-test-apiserver-6gdb6_master1-debug_8122014f-e02f-4b7f-ad7d-022a3575031c: no such file or directory" pod="e2e-test-apiserver-6gdb6/master1-debug"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.957978 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-apiserver_apiserver-5cfb458fd9-9hjfb_2568519f-54fc-477b-b02d-740263520e85: no such file or directory" pod="openshift-apiserver/apiserver-5cfb458fd9-9hjfb"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.958549 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-etcd_etcd-master1_129d9527-2d4f-4ec1-98bb-0b5a1468130d: no such file or directory" pod="openshift-etcd/etcd-master1"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.958618 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-scheduler_openshift-kube-scheduler-master1_b6a11f4b-114d-4a1b-a2cb-175418450966: no such file or directory" pod="openshift-kube-scheduler/openshift-kube-scheduler-master1"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.958776 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-controller-manager_kube-controller-manager-master1_b5a9d269-be40-4227-84e3-eec31891cba3: no such file or directory" pod="openshift-kube-controller-manager/kube-controller-manager-master1"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.959025 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-apiserver_kube-apiserver-master1_6174e936-9908-4219-b585-65b502755f72: no such file or directory" pod="openshift-kube-apiserver/kube-apiserver-master1"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.959178 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-controller-manager_kube-controller-manager-master1_629a3916-467a-4a84-8b22-77c8d439fce1: no such file or directory" pod="openshift-kube-controller-manager/kube-controller-manager-master1"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.959625 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-dns-operator_dns-operator-584f87f5f5-ttbsq_be150f33-7814-41c7-b571-6a9aa14d6324: no such file or directory" pod="openshift-dns-operator/dns-operator-584f87f5f5-ttbsq"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.959764 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-network-operator_network-operator-fc586669f-t52jr_43f06636-3063-451a-9cd0-2c3123c0dd0a: no such file or directory" pod="openshift-network-operator/network-operator-fc586669f-t52jr"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.959881 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-kube-controller-manager_kube-controller-manager-master1_1f439b06-c1ae-44b1-9280-9618218a5e6a: no such file or directory" pod="openshift-kube-controller-manager/kube-controller-manager-master1"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.959937 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-ovn-kubernetes_ovnkube-master-197jr_3f40352a-6c7e-447f-83b3-e868b07a11f2: no such file or directory" pod="openshift-ovn-kubernetes/ovnkube-master-197jr"
Apr 26 10:31:51 master1 hyperkube[31187]: E0426 10:31:51.960099 31187 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-ingress_router-default-5fb748b976-fwkjl_5c20628b-18b4-4d96-abca-205329c32109: no such file or directory" pod="openshift-ingress/router-default-5fb748b976-fwkjl"
Apr 26 10:31:55 master1 hyperkube[31187]: W0426 10:31:55.390139 31187 container.go:586] Failed to update stats for container "/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5c20628b_18b4_4d96_abca_205329c32109.slice/crio-619152e20e0b148155561758304287245cf011333cdb5368e080d3e52b333931.scope": unable to determine device info for dir: /var/lib/containers/storage/overlay/be82a8bbe84408dc973a8fcb2c33ed5d3ed83cd3997efe1f60e7f8f86aa3a6ce/diff: stat failed on /var/lib/containers/storage/overlay/be82a8bbe84408dc973a8fcb2c33ed5d3ed83cd3997efe1f60e7f8f86aa3a6ce/diff with error: no such file or directory, continuing to push stats
The pod has already been deleted, but cAdvisor is still frantically trying to fetch its logs:
Apr 27 14:42:21 master1 hyperkube[31092]: E0427 14:42:21.978445 31092 cadvisor_stats_provider.go:147] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-cluster-node-tuning-operator_tuned-zs7t2_1daa8ead-d252-4802-9824-ade04974538d: no such file or directory" pod="openshift-cluster-node-tuning-operator/tuned-zs7t2"
[root@master1 fd]# kubectl get pods -A | grep tuned
openshift-cluster-node-tuning-operator   tuned-44gmv   1/1   Running   0   14h
openshift-cluster-node-tuning-operator   tuned-597nx   1/1   Running   0   13h
openshift-cluster-node-tuning-operator   tuned-7jrv5   1/1   Running   0   14h
openshift-cluster-node-tuning-operator   tuned-dr4bm   1/1   Running   0   13h
openshift-cluster-node-tuning-operator   tuned-fq7gd   1/1   Running   0   13h
openshift-cluster-node-tuning-operator   tuned-sxtv5   1/1   Running   0   14h
Anomalies in the logs:
1. e2e-test-apiserver failure
started: (19/2472/2761) "[sig-api-machinery][Feature:APIServer] TestTLSDefaults [Suite:openshift/conformance/parallel]"

[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:61
[BeforeEach] [sig-api-machinery][Feature:APIServer]
  github.com/openshift/origin/test/extended/util/client.go:142
STEP: Creating a kubernetes client
[BeforeEach] [sig-api-machinery][Feature:APIServer]
  github.com/openshift/origin/test/extended/util/client.go:116
goroutine 1 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
	runtime/debug/stack.go:24 +0x9f
github.com/openshift/origin/test/extended/util.FatalErr(0x79945c0, 0xc0025dfe00)
	github.com/openshift/origin/test/extended/util/client.go:684 +0x26
github.com/openshift/origin/test/extended/util.(*CLI).GetClientConfigForUser(0xc0015d7d40, 0xc000ad4e00, 0x1d, 0x0)
	github.com/openshift/origin/test/extended/util/client.go:714 +0x137
github.com/openshift/origin/test/extended/util.(*CLI).ChangeUser(0xc0015d7d40, 0xc000ad4e00, 0x1d, 0x1)
	github.com/openshift/origin/test/extended/util/client.go:168 +0x6a
github.com/openshift/origin/test/extended/util.(*CLI).SetupProject(0xc0015d7d40, 0x0, 0x0)
	github.com/openshift/origin/test/extended/util/client.go:242 +0x212
github.com/openshift/origin/test/extended/util.NewCLI.func1()
	github.com/openshift/origin/test/extended/util/client.go:116 +0x2a
github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0016c2f00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:113 +0xa3
github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0016c2f00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:64 +0x15c
github.com/onsi/ginkgo/internal/leafnodes.(*SetupNode).Run(0xc001668fa0, 0x8ed2f20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/setup_nodes.go:15 +0x87
github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc00145ba40, 0x0, 0x8ed2f20, 0xc00033c840)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/spec/spec.go:193 +0x28d
github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc00145ba40, 0x8ed2f20, 0xc00033c840)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/spec/spec.go:138 +0xf2
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0017ffcc0, 0xc00145ba40, 0x0)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:200 +0x111
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0017ffcc0, 0x1)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:170 +0x147
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0017ffcc0, 0xc001bc3a40)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:66 +0x117
github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc000304870, 0x8ed31e0, 0xc0025bbd10, 0x0, 0x0, 0xc00132e0b0, 0x1, 0x1, 0x8fabeb8, 0xc00033c840, ...)
	github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/suite/suite.go:62 +0x426
github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001f30f60, 0xc000c64350, 0x1, 0x1, 0x82d4f68, 0x4a09b80)
	github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:61 +0x418
main.newRunTestCommand.func1.1()
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x4e
github.com/openshift/origin/test/extended/util.WithCleanup(0xc00174bc18)
	github.com/openshift/origin/test/extended/util/test.go:168 +0x5f
main.newRunTestCommand.func1(0xc001c57b80, 0xc000c64350, 0x1, 0x1, 0x0, 0x0)
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x333
github.com/spf13/cobra.(*Command).execute(0xc001c57b80, 0xc000c64290, 0x1, 0x1, 0xc001c57b80, 0xc000c64290)
	github.com/spf13/cobra@v1.1.3/command.go:852 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc001c57180, 0x0, 0x8edbf00, 0xbd72c98)
	github.com/spf13/cobra@v1.1.3/command.go:960 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.1.3/command.go:897
main.main.func1(0xc001c57180, 0x0, 0x0)
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:84 +0x94
main.main()
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:85 +0x42c

Apr 27 02:18:58.436: FAIL: Get "https://api.ocp4qxx.cesclusterqxx.cn:6443/apis/user.openshift.io/v1/users/e2e-test-apiserver-b2sq4-user": dial tcp: lookup api.ocp4qxx.cesclusterqxx.cn on 223.5.5.5:53: dial udp 223.5.5.5:53: connect: resource temporarily unavailable
Related community issues:
https://github.com/kubernetes/kubernetes/issues/106957
Red Hat Bugzilla 2040399 – Excessive memory usage by the kubelet
Related code:
Problem analysis:
kubelet:
cAdvisor component:
https://github.com/google/cadvisor
cadvisor_stats_provider.go
// ListPodStats returns the stats of all the pod-managed containers.
func (p *cadvisorStatsProvider) ListPodStats() ([]statsapi.PodStats, error) {
	// Gets node root filesystem information and image filesystem stats, which
	// will be used to populate the available and capacity bytes/inodes in
	// container stats.
	rootFsInfo, err := p.cadvisor.RootFsInfo()
	if err != nil {
		return nil, fmt.Errorf("failed to get rootFs info: %v", err)
	}
	imageFsInfo, err := p.cadvisor.ImagesFsInfo()
	if err != nil {
		return nil, fmt.Errorf("failed to get imageFs info: %v", err)
	}
	infos, err := getCadvisorContainerInfo(p.cadvisor)
	if err != nil {
		return nil, fmt.Errorf("failed to get container info from cadvisor: %v", err)
	}

	filteredInfos, allInfos := filterTerminatedContainerInfoAndAssembleByPodCgroupKey(infos)
	// Map each container to a pod and update the PodStats with container data.
	podToStats := map[statsapi.PodReference]*statsapi.PodStats{}
	for key, cinfo := range filteredInfos {
		// On systemd using devicemapper each mount into the container has an
		// associated cgroup. We ignore them to ensure we do not get duplicate
		// entries in our summary. For details on .mount units:
		// http://man7.org/linux/man-pages/man5/systemd.mount.5.html
		if strings.HasSuffix(key, ".mount") {
			continue
		}
		// Build the Pod key if this container is managed by a Pod
		if !isPodManagedContainer(&cinfo) {
			continue
		}
		ref := buildPodRef(cinfo.Spec.Labels)

		// Lookup the PodStats for the pod using the PodRef. If none exists,
		// initialize a new entry.
		podStats, found := podToStats[ref]
		if !found {
			podStats = &statsapi.PodStats{PodRef: ref}
			podToStats[ref] = podStats
		}

		// Update the PodStats entry with the stats from the container by
		// adding it to podStats.Containers.
		containerName := kubetypes.GetContainerName(cinfo.Spec.Labels)
		if containerName == leaky.PodInfraContainerName {
			// Special case for infrastructure container which is hidden from
			// the user and has network stats.
			podStats.Network = cadvisorInfoToNetworkStats(&cinfo)
		} else {
			podStats.Containers = append(podStats.Containers, *cadvisorInfoToContainerStats(containerName, &cinfo, &rootFsInfo, &imageFsInfo))
		}
	}

	// Add each PodStats to the result.
	result := make([]statsapi.PodStats, 0, len(podToStats))
	for _, podStats := range podToStats {
		// Lookup the volume stats for each pod.
		podUID := types.UID(podStats.PodRef.UID)
		var ephemeralStats []statsapi.VolumeStats
		if vstats, found := p.resourceAnalyzer.GetPodVolumeStats(podUID); found {
			ephemeralStats = make([]statsapi.VolumeStats, len(vstats.EphemeralVolumes))
			copy(ephemeralStats, vstats.EphemeralVolumes)
			podStats.VolumeStats = append(append([]statsapi.VolumeStats{}, vstats.EphemeralVolumes...), vstats.PersistentVolumes...)
		}

		logStats, err := p.hostStatsProvider.getPodLogStats(podStats.PodRef.Namespace, podStats.PodRef.Name, podUID, &rootFsInfo)
		if err != nil {
			klog.ErrorS(err, "Unable to fetch pod log stats", "pod", klog.KRef(podStats.PodRef.Namespace, podStats.PodRef.Name))
		}
		etcHostsStats, err := p.hostStatsProvider.getPodEtcHostsStats(podUID, &rootFsInfo)
		if err != nil {
			klog.ErrorS(err, "Unable to fetch pod etc hosts stats", "pod", klog.KRef(podStats.PodRef.Namespace, podStats.PodRef.Name))
		}

		podStats.EphemeralStorage = calcEphemeralStorage(podStats.Containers, ephemeralStats, &rootFsInfo, logStats, etcHostsStats, false)

		// Lookup the pod-level cgroup's CPU and memory stats
		podInfo := getCadvisorPodInfoFromPodUID(podUID, allInfos)
		if podInfo != nil {
			cpu, memory := cadvisorInfoToCPUandMemoryStats(podInfo)
			podStats.CPU = cpu
			podStats.Memory = memory
			podStats.ProcessStats = cadvisorInfoToProcessStats(podInfo)
		}

		status, found := p.statusProvider.GetPodStatus(podUID)
		if found && status.StartTime != nil && !status.StartTime.IsZero() {
			podStats.StartTime = *status.StartTime
			// only append stats if we were able to get the start time of the pod
			result = append(result, *podStats)
		}
	}
	return result, nil
}
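Note the error handling here: when getPodLogStats fails, the error is only logged via klog.ErrorS and the loop moves on. The loop is driven by cAdvisor's cached container infos rather than by the pods the kubelet currently tracks, so every stats collection cycle re-logs the same "Unable to fetch pod log stats" message for any stale cached pod whose /var/log/pods directory is already gone. The log-stats lookup itself: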
func (h hostStatsProvider) getPodLogStats(podNamespace, podName string, podUID types.UID, rootFsInfo *cadvisorapiv2.FsInfo) (*statsapi.FsStats, error) {
	metricsByPath, err := h.podLogMetrics(podNamespace, podName, podUID)
	if err != nil {
		return nil, err
	}
	return metricsByPathToFsStats(metricsByPath, rootFsInfo)
}
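The directory that podLogMetrics walks follows the kubelet layout /var/log/pods/<namespace>_<pod-name>_<pod-UID>, exactly the paths shown in the errors above. A minimal sketch that reproduces the failing lookup (podLogDir is a hypothetical helper for illustration, not the kubelet's own code):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// podLogDir reproduces the kubelet's pod log directory layout:
// /var/log/pods/<namespace>_<pod-name>_<pod-UID>
func podLogDir(podNamespace, podName, podUID string) string {
	return filepath.Join("/var/log/pods",
		fmt.Sprintf("%s_%s_%s", podNamespace, podName, podUID))
}

func main() {
	dir := podLogDir("openshift-etcd", "etcd-master1",
		"129d9527-2d4f-4ec1-98bb-0b5a1468130d")
	// For a pod that has been deleted this fails with
	// "no such file or directory" -- the exact error surfaced by
	// cadvisor_stats_provider.go:147 in the kubelet log above.
	if _, err := os.Stat(dir); err != nil {
		fmt.Println(err)
	}
}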
kubernetes/vendor/github.com/google/cadvisor/manager/manager.go
Line 352:  for _, container := range m.containers {
Line 421:  cont, ok = m.containers[namespacedContainerName{
Line 554:  cont, ok := m.containers[namespacedContainerName{Name: containerName}]
Line 564:  containersMap := make(map[string]*containerData, len(m.containers))
Line 568:  for i := range m.containers {
Line 569:  if m.containers[i] == nil {
Line 572:  name := m.containers[i].info.Name
Line 574:  containersMap[m.containers[i].info.Name] = m.containers[i]
Line 593:  containers := make(map[string]*containerData, len(m.containers))
Line 596:  for name, cont := range m.containers {
Line 628:  cont, ok := m.containers[namespacedContainerName{
Line 635:  for contName, c := range m.containers {
Line 843:  _, ok := m.containers[namespacedName]
Line 915:  if _, ok := m.containers[namespacedName]; ok {
Line 980:  m.containers[namespacedName] = cont
Line 982:  m.containers[namespacedContainerName{
Line 1024: cont, ok := m.containers[namespacedName]
Line 1037: delete(m.containers, namespacedName)
Line 1039: delete(m.containers, namespacedContainerName{
Line 1067: cont, ok := m.containers[namespacedContainerName{
Line 1086: for name, d := range m.containers {
Line 1096: _, ok := m.containers[namespacedContainerName{
Line 1334: conts = make(map[*containerData]struct{}, len(m.containers))
Line 1335: for _, c := range m.containers {
After restarting the kubelet, manager.go starts reporting errors:
Apr 27 18:29:06 master1 hyperkube[2367407]: E0427 18:29:06.233891 2367407 manager.go:1132] Failed to create existing container: /pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podee9abc55_e6a5_4758_9c7e_7139b7d80112.slice/crio-a8220d69f2557a4a5a1794bcb3fb8c4b59f7f0caaa2a2f905faca48070e1ad89.scope: Error finding container a8220d69f2557a4a5a1794bcb3fb8c4b59f7f0caaa2a2f905faca48070e1ad89: Status 404 returned error &{%!s(*http.body=&{0xc000c7ba70 <nil> <nil> false false {0 0} false false false <nil>}) {%!s(int32=0) %!s(uint32=0)} %!s(bool=false) <nil> %!s(func(error) error=0x8944e0) %!s(func() error=0x894460)}
[root@master1 kubepods-besteffort.slice]# pwd
/sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice
[root@master1 kubepods-besteffort.slice]# ll | grep podc9ab82cb_a53a_4650_9dc4_2791342d80e2
drwxr-xr-x. 3 root root 0 Apr 27 15:27 kubepods-besteffort-podc9ab82cb_a53a_4650_9dc4_2791342d80e2.slice
[root@master1 kubepods-besteffort-podc9ab82cb_a53a_4650_9dc4_2791342d80e2.slice]# tree
.
├── cgroup.controllers
├── cgroup.events
├── cgroup.freeze
├── cgroup.kill
├── cgroup.max.depth
├── cgroup.max.descendants
├── cgroup.procs
├── cgroup.stat
├── cgroup.subtree_control
├── cgroup.threads
├── cgroup.type
├── cpu.pressure
├── cpu.stat
├── crio-302d5797eda0d1314e6c2f68a489bb6cc08db0fcdf1b291662c9d1f2aace073b.scope
│   ├── cgroup.controllers
│   ├── cgroup.events
│   ├── cgroup.freeze
│   ├── cgroup.kill
│   ├── cgroup.max.depth
│   ├── cgroup.max.descendants
│   ├── cgroup.procs
│   ├── cgroup.stat
│   ├── cgroup.subtree_control
│   ├── cgroup.threads
│   ├── cgroup.type
│   ├── cpu.pressure
│   ├── cpu.stat
│   ├── io.pressure
│   └── memory.pressure
├── io.pressure
└── memory.pressure

1 directory, 30 files
[root@master1 crio-302d5797eda0d1314e6c2f68a489bb6cc08db0fcdf1b291662c9d1f2aace073b.scope]# ll
total 0
-r--r--r--. 1 root root 0 Apr 27 15:27 cgroup.controllers
-r--r--r--. 1 root root 0 Apr 27 15:27 cgroup.events
-rw-r--r--. 1 root root 0 Apr 27 15:27 cgroup.freeze
--w-------. 1 root root 0 Apr 27 15:27 cgroup.kill
-rw-r--r--. 1 root root 0 Apr 27 15:27 cgroup.max.depth
-rw-r--r--. 1 root root 0 Apr 27 15:27 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Apr 27 15:27 cgroup.procs
-r--r--r--. 1 root root 0 Apr 27 15:27 cgroup.stat
-rw-r--r--. 1 root root 0 Apr 27 15:27 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Apr 27 15:27 cgroup.threads
-rw-r--r--. 1 root root 0 Apr 27 15:27 cgroup.type
-rw-r--r--. 1 root root 0 Apr 27 15:27 cpu.pressure
-r--r--r--. 1 root root 0 Apr 27 15:27 cpu.stat
-rw-r--r--. 1 root root 0 Apr 27 15:27 io.pressure
-rw-r--r--. 1 root root 0 Apr 27 15:27 memory.pressure
By the naming convention, the container name is:
a8220d69f2557a4a5a1794bcb3fb8c4b59f7f0caaa2a2f905faca48070e1ad89
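For illustration, a small sketch (containerIDFromScope is a hypothetical helper, not from kubelet or cAdvisor source) of how the container ID is recovered from a crio-<id>.scope cgroup directory name under this convention:

package main

import (
	"fmt"
	"strings"
)

// containerIDFromScope extracts the container ID from a CRI-O cgroup scope
// directory name such as "crio-<64-hex-char-id>.scope".
func containerIDFromScope(scope string) (string, bool) {
	if !strings.HasPrefix(scope, "crio-") || !strings.HasSuffix(scope, ".scope") {
		return "", false
	}
	return strings.TrimSuffix(strings.TrimPrefix(scope, "crio-"), ".scope"), true
}

func main() {
	id, ok := containerIDFromScope("crio-a8220d69f2557a4a5a1794bcb3fb8c4b59f7f0caaa2a2f905faca48070e1ad89.scope")
	fmt.Println(id, ok) // prints the container ID from the error above, true
}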
// Detect the existing subcontainers and reflect the setup here.
func (m *manager) detectSubcontainers(containerName string) error {
	added, removed, err := m.getContainersDiff(containerName)
	if err != nil {
		return err
	}

	// Add the new containers.
	for _, cont := range added {
		err = m.createContainer(cont.Name, watcher.Raw)
		if err != nil {
			klog.Errorf("Failed to create existing container: %s: %s", cont.Name, err)
		}
	}

	// Remove the old containers.
	for _, cont := range removed {
		err = m.destroyContainer(cont.Name)
		if err != nil {
			klog.Errorf("Failed to destroy existing container: %s: %s", cont.Name, err)
		}
	}

	return nil
}
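This is where the "Failed to create existing container" message seen after the kubelet restart comes from: detectSubcontainers finds the leftover crio-*.scope cgroup directory, tries to create a handler for it, and the query to CRI-O returns 404 because the container no longer exists. Since the error is only logged via klog.Errorf, the attempt repeats on every detection pass. The failing CRI-O query: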
// ContainerInfo returns information about a given container
func (c *crioClientImpl) ContainerInfo(id string) (*ContainerInfo, error) {
	req, err := getRequest("/containers/" + id)
	if err != nil {
		return nil, err
	}
	cInfo := ContainerInfo{}
	resp, err := c.client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// golang's http.Do doesn't return an error if non 200 response code is returned
	// handle this case here, rather than failing to decode the body
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("Error finding container %s: Status %d returned error %s", id, resp.StatusCode, resp.Body)
	}

	if err := json.NewDecoder(resp.Body).Decode(&cInfo); err != nil {
		return nil, err
	}
	if len(cInfo.IP) > 0 {
		return &cInfo, nil
	}
	if len(cInfo.IPs) > 0 {
		cInfo.IP = cInfo.IPs[0]
	}
	return &cInfo, nil
}
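As a side note, the garbled "&{%!s(*http.body=...)}" text in the manager.go:1132 error above appears to come from this fmt.Errorf: resp.Body is an io.ReadCloser, so formatting it with %s prints the underlying struct rather than the response text. A sketch of a more readable formulation (illustrative only, not the upstream patch; formatNotFound is a hypothetical name):

package crioclient

import (
	"fmt"
	"io"
	"net/http"
)

// formatNotFound renders a non-200 CRI-O response readably by reading the
// body first, instead of passing resp.Body directly to the %s verb.
func formatNotFound(id string, resp *http.Response) error {
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("error finding container %s: status %d (unreadable body: %v)", id, resp.StatusCode, err)
	}
	return fmt.Errorf("error finding container %s: status %d returned error %s", id, resp.StatusCode, body)
}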
Analysis using k8s-nettest as an example:
var _ = ginkgo.Describe("[sig-network] Internal connectivity", func() {
	f := framework.NewDefaultFramework("k8s-nettest")

	ginkgo.It("for TCP and UDP on ports 9000-9999 is allowed", func() {
		e2eskipper.SkipUnlessNodeCountIsAtLeast(2)
		clientConfig := f.ClientConfig()
		one := int64(0)
		ds := &appsv1.DaemonSet{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "webserver",
				Namespace: f.Namespace.Name,
			},
			Spec: appsv1.DaemonSetSpec{
				Selector: &metav1.LabelSelector{
					MatchLabels: map[string]string{
						"apps": "webserver",
					},
				},
				Template: v1.PodTemplateSpec{
					ObjectMeta: metav1.ObjectMeta{
						Labels: map[string]string{
							"apps": "webserver",
						},
					},
					Spec: v1.PodSpec{
						Tolerations: []v1.Toleration{
							{
								Key:      "node-role.kubernetes.io/master",
								Operator: v1.TolerationOpExists,
								Effect:   v1.TaintEffectNoSchedule,
							},
						},
						HostNetwork:                   true,
						TerminationGracePeriodSeconds: &one,
						Containers: []v1.Container{
							{
								Name:    "webserver",
								Image:   e2enetwork.NetexecImageName,
								Command: []string{"/agnhost", "netexec", fmt.Sprintf("--http-port=%v", nodeTCPPort), fmt.Sprintf("--udp-port=%v", nodeUDPPort)},
								Ports: []v1.ContainerPort{
									{Name: "tcp", ContainerPort: nodeTCPPort},
									{Name: "udp", ContainerPort: nodeUDPPort},
								},
								ReadinessProbe: &v1.Probe{
									InitialDelaySeconds: 10,
									Handler: v1.Handler{
										HTTPGet: &v1.HTTPGetAction{
											Port: intstr.FromInt(nodeTCPPort),
										},
									},
								},
								SecurityContext: &v1.SecurityContext{
									Capabilities: &v1.Capabilities{
										Add: []v1.Capability{"NET_RAW"},
									},
								},
							},
						},
					},
				},
			},
		}
		name := ds.Name
		ds, err := f.ClientSet.AppsV1().DaemonSets(f.Namespace.Name).Create(context.Background(), ds, metav1.CreateOptions{})
		o.Expect(err).NotTo(o.HaveOccurred())

		err = wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
			ds, err = f.ClientSet.AppsV1().DaemonSets(f.Namespace.Name).Get(context.Background(), name, metav1.GetOptions{})
			if err != nil {
				framework.Logf("unable to retrieve daemonset: %v", err)
				return false, nil
			}
			if ds.Status.ObservedGeneration != ds.Generation || ds.Status.NumberAvailable == 0 || ds.Status.NumberAvailable != ds.Status.DesiredNumberScheduled {
				framework.Logf("waiting for daemonset: %#v", ds.Status)
				return false, nil
			}
			return true, nil
		})
		o.Expect(err).NotTo(o.HaveOccurred())
		framework.Logf("daemonset ready: %#v", ds.Status)

		pods, err := f.ClientSet.CoreV1().Pods(f.Namespace.Name).List(context.Background(), metav1.ListOptions{LabelSelector: labels.Set(ds.Spec.Selector.MatchLabels).String()})
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(len(pods.Items)).To(o.Equal(int(ds.Status.NumberAvailable)), fmt.Sprintf("%#v", pods.Items))

		// verify connectivity across pairs of pods in parallel
		// TODO: on large clusters this is O(N^2), we could potentially sample or split by topology
		var testFns []func() error
		protocols := []v1.Protocol{v1.ProtocolTCP, v1.ProtocolUDP}
		ports := []int{nodeTCPPort, nodeUDPPort}
		for j := range pods.Items {
			for i := range pods.Items {
				if i == j {
					continue
				}
				for k := range protocols {
					func(i, j, k int) {
						testFns = append(testFns, func() error {
							from := pods.Items[j]
							to := pods.Items[i]
							protocol := protocols[k]
							testingMsg := fmt.Sprintf("[%s: %s -> %s:%d]", protocol, from.Spec.NodeName, to.Spec.NodeName, ports[k])
							testMsg := fmt.Sprintf("%s-from-%s-to-%s", "hello", from.Status.PodIP, to.Status.PodIP)
							command, err := testRemoteConnectivityCommand(protocol, "localhost:"+strconv.Itoa(nodeTCPPort), to.Status.HostIP, ports[k], testMsg)
							if err != nil {
								return fmt.Errorf("test of %s failed: %v", testingMsg, err)
							}
							res, err := commandResult(f.ClientSet.CoreV1(), clientConfig, from.Namespace, from.Name, "webserver", []string{"/bin/sh", "-cex", strings.Join(command, " ")})
							if err != nil {
								return fmt.Errorf("test of %s failed: %v", testingMsg, err)
							}
							if res != `{"responses":["`+testMsg+`"]}` {
								return fmt.Errorf("test of %s failed, unexpected response: %s", testingMsg, res)
							}
							return nil
						})
					}(i, j, k)
				}
			}
		}
		errs := parallelTest(6, testFns)
		o.Expect(errs).To(o.Equal([]error(nil)))
	})
})
Analysis conclusions:
1. The cAdvisor monitoring in Kubernetes keeps not only an in-memory cache but also a file cache. The container has been deleted, yet both the in-memory cache entry and the file cache still exist.
In-memory cache:
type manager struct {
	containers map[namespacedContainerName]*containerData
	// ... (other fields elided)
}
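A simplified model of this cache (heavily reduced; only the pieces needed for illustration are kept): entries are inserted when a container is detected and removed only by destroyContainer (the delete calls at Line 1037/1039 in the search results above), so a missed removal event leaves a stale entry that every subsequent stats cycle revisits:

package main

import "fmt"

type namespacedContainerName struct {
	Namespace string // e.g. "crio"
	Name      string // cgroup path of the container
}

type containerData struct{ alive bool }

type manager struct {
	containers map[namespacedContainerName]*containerData
}

// destroyContainer is the only place entries leave the cache.
func (m *manager) destroyContainer(name namespacedContainerName) {
	delete(m.containers, name)
}

func main() {
	m := &manager{containers: map[namespacedContainerName]*containerData{}}
	key := namespacedContainerName{Name: "/kubepods.slice/.../crio-a8220d69...scope"}
	m.containers[key] = &containerData{alive: true}
	// If the removal event is never observed, the entry stays cached
	// even after the real container is gone:
	fmt.Println(len(m.containers)) // 1
}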
Location of the file cache:
/sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice |
The container name is written directly into the directory name.
[root@master0 ~]# cd /sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice
[root@master0 kubepods-besteffort.slice]# ll
total 0
-r--r--r--. 1 root root 0 Apr 15 21:01 cgroup.controllers
-r--r--r--. 1 root root 0 Apr 15 21:01 cgroup.events
-rw-r--r--. 1 root root 0 Apr 15 21:01 cgroup.freeze
--w-------. 1 root root 0 Apr 15 21:01 cgroup.kill
-rw-r--r--. 1 root root 0 Apr 15 21:01 cgroup.max.depth
-rw-r--r--. 1 root root 0 Apr 15 21:01 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Apr 15 21:01 cgroup.procs
-r--r--r--. 1 root root 0 Apr 15 21:01 cgroup.stat
-rw-r--r--. 1 root root 0 Apr 15 21:01 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Apr 15 21:01 cgroup.threads
-rw-r--r--. 1 root root 0 Apr 15 21:01 cgroup.type
-rw-r--r--. 1 root root 0 Apr 15 21:01 cpu.pressure
-r--r--r--. 1 root root 0 Apr 15 21:01 cpu.stat
-rw-r--r--. 1 root root 0 Apr 15 21:01 io.pressure
drwxr-xr-x. 3 root root 0 Apr 18 16:49 kubepods-besteffort-pode2817b14_b381_4ac8_8b50_4fcb3500d7ae.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:21 kubepods-besteffort-pode28fe3ba_d1e2_4e2d_8559_4e9a883ca606.slice
drwxr-xr-x. 3 root root 0 Apr 18 16:49 kubepods-besteffort-pode29fd48b_f6b5_4ad2_a45d_093aff644bd6.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:32 kubepods-besteffort-pode375e87c_e6ed_41d4_bdc8_438be0f69c11.slice
drwxr-xr-x. 3 root root 0 Apr 18 22:36 kubepods-besteffort-pode3f11bb6_537c_4979_b7e8_01d840fbbf32.slice
drwxr-xr-x. 3 root root 0 Apr 18 22:39 kubepods-besteffort-pode4335413_19ce_48c5_8353_e09fe40399dc.slice
drwxr-xr-x. 3 root root 0 Apr 18 22:39 kubepods-besteffort-pode4bd2a8b_7c27_4de1_8901_a39c4996e251.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:09 kubepods-besteffort-pode663575b_83cf_4dfe_a784_afaf4f759d18.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:24 kubepods-besteffort-pode8d3ba2f_da34_49b3_b4d4_1d612c4c1b89.slice
drwxr-xr-x. 3 root root 0 Apr 18 16:47 kubepods-besteffort-pode95c1805_b49b_44b3_b087_00ebed080a18.slice
drwxr-xr-x. 3 root root 0 Apr 17 18:13 kubepods-besteffort-pode97bfd16_0272_4f77_b702_ef5f1bce41cf.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:36 kubepods-besteffort-poded520c2e_8d44_498e_a465_05e81fac52b4.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:19 kubepods-besteffort-podee2d8d06_9ed2_49c1_9a1c_bcb03545cdc2.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:23 kubepods-besteffort-podee8a97c9_dd80_44bd_9c1b_5b165a27fb3c.slice
drwxr-xr-x. 3 root root 0 Apr 18 22:42 kubepods-besteffort-podf13f6d1d_c50e_4429_aa6b_e8a807f105d7.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:19 kubepods-besteffort-podf248e3e0_1602_4458_bea0_98ad114191e5.slice
drwxr-xr-x. 3 root root 0 Apr 17 18:00 kubepods-besteffort-podf769839c_8a04_40d6_ba47_6b6bf39a7b32.slice
drwxr-xr-x. 3 root root 0 Apr 18 18:05 kubepods-besteffort-podf7cf6d82_b555_416b_afe1_4366caca0b1a.slice
drwxr-xr-x. 3 root root 0 Apr 19 06:51 kubepods-besteffort-podf826b2e3_a0c4_4d1f_ad14_fe756e4a5360.slice
drwxr-xr-x. 3 root root 0 Apr 19 02:05 kubepods-besteffort-podf927cb51_412b_420b_92ae_e7aa0c2df0d0.slice
drwxr-xr-x. 3 root root 0 Apr 16 15:23 kubepods-besteffort-podf935a471_8432_445c_8100_e69f144a6fa6.slice
drwxr-xr-x. 3 root root 0 Apr 18 16:48 kubepods-besteffort-podfa96f0eb_c13f_4265_8a7b_c81de9bbfca9.slice
drwxr-xr-x. 3 root root 0 Apr 18 17:07 kubepods-besteffort-podfd8463ff_e756_45d0_9bb9_6ec7228580c6.slice
-rw-r--r--. 1 root root 0 Apr 15 21:01 memory.pressure
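To cross-check which of these slice directories still belong to live pods, one can list them and convert the embedded UIDs back to the API format for comparison against the cluster's pod UIDs. A hypothetical diagnostic sketch, not part of any component:

package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	const dir = "/sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice"
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		// Per-pod slices are named kubepods-besteffort-pod<UID with _>.slice.
		if e.IsDir() && strings.HasPrefix(e.Name(), "kubepods-besteffort-pod") {
			uid := strings.TrimSuffix(strings.TrimPrefix(e.Name(), "kubepods-besteffort-pod"), ".slice")
			// Convert back to the API pod UID format for comparison against
			// kubectl get pods -A -o jsonpath='{..metadata.uid}'.
			fmt.Println(strings.ReplaceAll(uid, "_", "-"))
		}
	}
}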
Resolution: