Prometheus - AI运维探索者 - Page 5

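Getting Started with PromQL: Basic Syntax and Time-Range Queries

I. PromQL is the core skill for working with Prometheus: only once you understand basic syntax such as instant vectors, range vectors, label filtering, and time offsets do later alerting and visualization configurations have something to build on. This article gets you started quickly through a set of common query examples. Prometheus ...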

Prometheus

3 years ago

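Alertmanager Email Alerting in Practice: Sending Notifications via 163 Mail

I. Enable the SMTP protocol: click [Settings] - [POP3/SMTP/IMAP]; click [Enable]; click [Continue enabling]; after scanning the QR code, send the SMS as prompted, then click [I have sent it]; copy the authorization code, then click [OK]. II. Configure the alert template: enter the working directory...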

Prometheus

3 years ago

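Prometheus Blackbox Monitoring: DNS Probe Configuration in Practice

I. DNS monitoring. Parameter explanation: Update the `prometheus-config.yaml` configuration: Open the Prometheus Targets page and you will see the `blackbox-k8s-service-dns` job defined above; on the Graph page, you can use `probe_succes...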

Prometheus

3 years ago

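Prometheus Blackbox Monitoring: TCP Probe Configuration in Practice

I. TCP probing. Reload Prometheus as described above, then open the Prometheus Targets page and you will see the `service-tcp-probe` job defined above. The Service then needs annotations added, and the following three lines are required. Example: the svc of a Java application: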

Prometheus

3 years ago

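Getting Started with Kube-State-Metrics: A Basic Component for K8S Cluster-Level Monitoring

I. Introduction to kube-state-metrics. kube-state-metrics is a Kubernetes component that queries the Kubernetes API server to collect state information about the various resources in Kubernetes (such as nodes, pods, and services), and these...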

Prometheus

3 years ago

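AlertManager Alert Grouping and Inhibition: Alert Grouping

I. To avoid alert storms, alert rules of the same type are placed in one group; for example, everything hardware-related is grouped under hardware, including load, CPU usage, memory usage, and disk. When an alert of this kind fires, within one "group_wait" window, ...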

Prometheus

3 years ago

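Prometheus Operator Custom Monitoring: Helm-Based Ingress-Nginx

I. Custom resources. Prometheus-operator periodically watches the apiserver in a loop to pick up the creation or update of CRD resources (such as servicemonitor), applies the configuration changes promptly to the running prometheus pod, and converts them into standard promethes...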

Prometheus

3 years ago

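Prometheus Custom Monitoring: Integrating Node Exporter on Cloud Hosts

I. Scraping data from virtual machines. 1.1 Configure and install node-exporter. Verify data collection: 1.2 Configure prometheus-config.yaml. Reload Prometheus as described above, then open the Prometheus Targets page and you will see the above-defined `other-ECS` ...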

Prometheus

3 years ago

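Integrating Prometheus with Alertmanager: Alert Configuration and Testing

I. Add alerting configuration to Prometheus. Modify the ConfigMap resource file prometheus-config.yaml with the following changes: - Add the AlertManager server address - Specify the path to the alert rule files - Add the alert rules that trigger alerts in Prometheus (...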

Prometheus

3 years ago

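Advanced Alertmanager Configuration: DingTalk Alerts and Silence Management

I. DingTalk as an alert channel. [Custom robot security settings - DingTalk Open Platform (dingtalk.com)](https://open.dingtalk.com/document/robots/customize-robot-security-settings) [Create a custom robot - DingTalk...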

Prometheus

3 years ago
