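1. node-exporter
Node Exporter is the node-level metrics collector maintained by the Prometheus project. It collects host data such as CPU frequency information, disk I/O statistics, and available memory.
Deployment:
node-exporter has to run on every Kubernetes node, so it is deployed as a DaemonSet.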
[root@master01 7]# vim node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      # Share the host's PID, IPC, and network namespaces
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - name: node-exporter
          image: registry.cn-hangzhou.aliyuncs.com/abroad_images/node-exporter:latest
          ports:
            - containerPort: 9100
          resources:
            requests:
              cpu: 0.15
          securityContext:
            privileged: true
          args:
            # Read /proc and /sys from the host mounts declared below
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            # In newer node_exporter releases this flag is named
            # --collector.filesystem.mount-points-exclude
            - --collector.filesystem.ignored-mount-points
            # Pass the bare regex: quote characters inside the YAML string
            # would become part of the pattern
            - '^/(sys|proc|dev|host|etc)($|/)'
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      # Allow scheduling onto master nodes as well
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
# Apply (kaf is a shell alias for kubectl apply -f)
[root@master01 7]# kaf node-exporter.yaml
# Verify (kgp is a shell alias for kubectl get pods)
[root@master01 7]# kgp -n monitor | grep node-exporter
node-exporter-5bt9t 1/1 Running 0 53s
node-exporter-8dddq 1/1 Running 0 53s
node-exporter-8z67n 1/1 Running 0 53s
node-exporter-mqzm5 1/1 Running 0 53s
node-exporter-qs57w 1/1 Running 0 53s
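A DaemonSet should end up with one Running pod per schedulable node, so a quick sanity check is to count the Running node-exporter pods and compare the result with the node count. The sketch below is only an illustration: `count_running` is a hypothetical helper, and the sample listing is copied from the output above so the snippet runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: count node-exporter pods whose STATUS column
# (third field of `kubectl get pods` output) is "Running".
count_running() {
  awk '$1 ~ /^node-exporter-/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Sample listing taken from the verification output above.
sample='node-exporter-5bt9t 1/1 Running 0 53s
node-exporter-8dddq 1/1 Running 0 53s
node-exporter-8z67n 1/1 Running 0 53s
node-exporter-mqzm5 1/1 Running 0 53s
node-exporter-qs57w 1/1 Running 0 53s'

running=$(printf '%s\n' "$sample" | count_running)
echo "running node-exporter pods: $running"
```

On a real cluster, the same helper could read from `kubectl get pods -n monitor --no-headers`, and the result could be compared with `kubectl get nodes --no-headers | wc -l`.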
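Notes on node-exporter.yaml:
- hostPID: whether the pod shares the host's PID namespace. When true, Node Exporter can see the processes running on the host.
- hostIPC: whether the pod shares the host's IPC namespace. When true, Node Exporter can access the host's IPC resources.
- hostNetwork: whether the pod uses the host's network namespace. When true, Node Exporter sees the host's network interfaces and listens on port 9100 of the node itself.
Verification: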
[root@master01 7]# curl -s localhost:9100/metrics | grep cpu
# HELP node_cpu_guest_seconds_total Seconds the CPUs spent in guests (VMs) for each mode.
# TYPE node_cpu_guest_seconds_total counter
node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
node_cpu_guest_seconds_total{cpu="1",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="1",mode="user"} 0
node_cpu_guest_seconds_total{cpu="2",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="2",mode="user"} 0
node_cpu_guest_seconds_total{cpu="3",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="3",mode="user"} 0
node_cpu_guest_seconds_total{cpu="4",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="4",mode="user"} 0
node_cpu_guest_seconds_total{cpu="5",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="5",mode="user"} 0
node_cpu_guest_seconds_total{cpu="6",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="6",mode="user"} 0
node_cpu_guest_seconds_total{cpu="7",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="7",mode="user"} 0
...
...
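Each series in this output carries a `cpu` label, so the number of distinct label values equals the number of CPUs the exporter sees. As a rough sketch (the `count_cpus` helper is hypothetical and the sample is a shortened copy of the output above), the label can be extracted and counted like this:

```shell
#!/bin/sh
# Hypothetical helper: count distinct cpu="N" label values in
# node_cpu_guest_seconds_total series read from stdin.
count_cpus() {
  grep '^node_cpu_guest_seconds_total{' \
    | sed -E 's/.*cpu="([0-9]+)".*/\1/' \
    | sort -un \
    | wc -l \
    | tr -d ' '
}

# Shortened sample in the same format as the curl output above.
sample='# TYPE node_cpu_guest_seconds_total counter
node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
node_cpu_guest_seconds_total{cpu="1",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="1",mode="user"} 0
node_cpu_guest_seconds_total{cpu="2",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="2",mode="user"} 0'

cpus=$(printf '%s\n' "$sample" | count_cpus)
echo "distinct CPUs in sample: $cpus"
```

Against a live exporter, the input would come from `curl -s localhost:9100/metrics` instead of the sample variable.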
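1.1 Adding k8s-node monitoring
Add a new scrape job named k8s-nodes to prometheus-config.yaml.
node_exporter runs on every node, so the job can use role: node for service discovery. With this role the discovered __address__ defaults to the kubelet port 10250, which a relabel rule rewrites to 9100.
# Add the following configuration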
      ########## k8s-node scrape config ##########
      - job_name: k8s-nodes
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          # Rewrite the kubelet port 10250 to the node-exporter port 9100
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
          # Copy every Kubernetes node label onto the target
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          # role: node does not set the endpoints, pod, or namespace meta
          # labels, so the three rules below leave node targets unchanged
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
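To see what the first two relabel rules do to a target, the regexes can be tried outside Prometheus. The sketch below emulates them with `sed`; the function names and the sample IP address are made up for illustration. Prometheus anchors relabel regexes at both ends, so the `^` and `$` are written out explicitly here.

```shell
#!/bin/sh
# Emulates the `replace` rule on __address__: 10250 becomes 9100.
# When the regex does not match, the value passes through unchanged,
# which is also how the replace action behaves.
rewrite_address() {
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:9100/'
}

# Emulates the `labelmap` rule: the captured group becomes the new
# label name.
labelmap_name() {
  printf '%s\n' "$1" | sed -E 's/^__meta_kubernetes_node_label_(.+)$/\1/'
}

rewrite_address "192.168.1.10:10250"   # prints 192.168.1.10:9100
rewrite_address "192.168.1.10:9100"    # no match, prints 192.168.1.10:9100
labelmap_name "__meta_kubernetes_node_label_kubernetes_io_hostname"
```

The last call prints `kubernetes_io_hostname`, which is the label name the scraped series would carry for the node label `kubernetes.io/hostname`.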
# Complete configuration file
[root@master01 7]# vim prometheus-config.yaml
[root@master01 7]# cat prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
    ############ Scrape jobs ###################
    scrape_configs:
      ########## prometheus scrape config ##########
      - job_name: prometheus
        static_configs:
          - targets: ['127.0.0.1:9090']
            labels:
              instance: prometheus
      ########## kube-apiserver scrape config ##########
      - job_name: kube-apiserver
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
            action: keep
            regex: default;kubernetes
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      ########## kube-controller-manager scrape config ##########
      - job_name: 'kube-controller-manager'
        # Use Kubernetes pod discovery
        kubernetes_sd_configs:
          - role: pod
        # Force HTTPS
        scheme: https
        # TLS settings (verification skipped for a test environment)
        tls_config:
          insecure_skip_verify: true
        # Authenticate with the ServiceAccount token
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          # Keep only pods labeled component=kube-controller-manager
          - source_labels: [__meta_kubernetes_pod_label_component]
            regex: kube-controller-manager
            action: keep
          # Rewrite the target address to pod IP + port 10257
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            target_label: __address__
            replacement: "${1}:10257"
          # Force HTTPS (redundant but explicit)
          - source_labels: []
            regex: .*
            target_label: __scheme__
            replacement: https
          # Attach metadata labels
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      ########## kube-scheduler scrape config ##########
      - job_name: 'kube-scheduler'
        kubernetes_sd_configs:
          - role: pod
        scheme: https
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_component]
            regex: kube-scheduler
            action: keep
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            target_label: __address__
            replacement: "${1}:10259"
          - source_labels: []
            regex: .*
            target_label: __scheme__
            replacement: https
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      ########## kube-state-metrics scrape config ##########
      - job_name: kube-state-metrics
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-state-metrics
            action: keep
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            target_label: __address__
            replacement: ${1}:8080
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      ########## coredns scrape config ##########
      - job_name: coredns
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels:
              - __meta_kubernetes_service_label_k8s_app
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            target_label: __address__
            replacement: ${1}:9153
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      ########## etcd scrape config ##########
      - job_name: etcd
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels:
              - __meta_kubernetes_pod_label_component
            regex: etcd
            action: keep
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            target_label: __address__
            replacement: ${1}:2381
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      ########## kubelet scrape config ##########
      - job_name: kubelet
        metrics_path: /metrics/cadvisor
        scheme: https
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
      ########## k8s-node scrape config ##########
      - job_name: k8s-nodes
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - source_labels: [__meta_kubernetes_endpoints_name]
            action: replace
            target_label: endpoint
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
    ############ Alerting rule file locations ###################
    rule_files:
      - /etc/prometheus/rules/*.rules
# Apply the configuration
[root@master01 7]# kaf prometheus-config.yaml
# Hot-reload Prometheus manually so that the updated ConfigMap takes effect
# (the /-/reload endpoint is available only when Prometheus runs with --web.enable-lifecycle)
[root@master01 7]# curl -XPOST http://prometheus.zhang-qing.com/-/reload
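A bare `curl -XPOST` prints nothing on success, so it is easy to miss a failed reload. One possible wrapper, shown here only as a sketch, checks the HTTP status code instead; `reload_prometheus` is a hypothetical helper name, and treating anything other than 200 as a failure is an assumption of this example.

```shell
#!/bin/sh
# Hypothetical helper: POST to <base-url>/-/reload and report the result
# based on the HTTP status code returned by the server.
reload_prometheus() {
  reload_url="$1/-/reload"
  # -s silences the progress meter, -o discards the body,
  # -w prints only the status code.
  reload_code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$reload_url")
  if [ "$reload_code" = "200" ]; then
    echo "reload ok"
  else
    echo "reload failed: HTTP $reload_code" >&2
    return 1
  fi
}

# Usage with the URL from this document (not executed here):
# reload_prometheus http://prometheus.zhang-qing.com
```

Because an updated ConfigMap takes a short while to reach the mounted volume inside the pod, a reload issued immediately after applying the ConfigMap may still pick up the old file; retrying after a brief wait is a reasonable precaution.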
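2. Summary
- kube-state-metrics: converts the state of objects in the Kubernetes API into metrics that Prometheus can scrape.
- cAdvisor: monitors container resource usage and performance, collecting CPU, memory, disk, network, and filesystem metrics.
- node-exporter: collects host-level metrics such as CPU load, memory usage, disk space, and network traffic.
Together, these three tools cover cluster object state, containers, and hosts, giving a comprehensive view of how a Kubernetes cluster and its containerized applications are running.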