Prometheus monitoring falls into two categories:
- White-box monitoring
- Black-box monitoring
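White-box monitoring covers the runtime data we watch day to day: host resource usage, container status, databases, middleware, and so on. This is the infrastructure that supports the business and its services. White-box monitoring shows its actual internal state, and watching the metrics lets you anticipate problems before they happen and address potential risks early.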
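Black-box monitoring tests a service's externally visible behavior from the user's point of view. Common black-box checks include HTTP, TCP, DNS and ICMP probes. They verify whether sites and services are reachable, check network connectivity, track certificate expiry dates, and measure response times.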
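Comparison: the biggest difference is that black-box monitoring is failure-oriented. When a failure occurs, black-box monitoring detects it quickly, whereas white-box monitoring focuses on proactively discovering or predicting potential problems. A complete monitoring setup should surface potential problems from the white-box side and quickly detect failures that have already happened from the black-box side.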
Deploying Blackbox-Exporter in Kubernetes with Helm
Official repository: https://github.com/prometheus-community/helm-charts
It contains Helm charts for almost every Prometheus-related component.
- Add the repository
```shell
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
```
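- Download and extract the chart
```shell
$ helm pull prometheus-community/prometheus-blackbox-exporter
# helm pull saves the chart as <chart>-<version>.tgz
$ tar -zxvf prometheus-blackbox-exporter-*.tgz
$ cd prometheus-blackbox-exporter
```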
- Edit values.yaml to configure blackbox_exporter
```yaml
config:
  modules:
    http_2xx:                          # HTTP probe module; every Blackbox-Exporter probe is configured as a module
      prober: http
      timeout: 10s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]  # Go reports HTTP/2 responses as "HTTP/2.0"
        valid_status_codes: [200]      # defaults to any 2xx; set explicitly here so it is clear when graphing in Grafana
        method: GET
        headers:                       # note: a fixed Host header overrides the target's host name
          Host: prometheus.example.com
          Accept-Language: en-US
          Origin: example.com
        preferred_ip_protocol: "ip4"   # preferred IP protocol
        no_follow_redirects: false     # false means redirects ARE followed
    http_post_2xx:                     # HTTP POST probe module
      prober: http
      timeout: 10s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        method: POST                   # headers and body are optional for POST requests
        headers:                       # send the body as JSON
          Content-Type: application/json
        body: '{"text": "hello"}'
        preferred_ip_protocol: "ip4"
    tcp_connect:                       # TCP probe module
      prober: tcp
      timeout: 10s
    dns_tcp:                           # DNS-over-TCP probe module
      prober: dns
      dns:
        transport_protocol: "tcp"      # the default is udp
        preferred_ip_protocol: "ip4"   # the default is ip6
        query_name: "kubernetes.default.svc.cluster.local"  # name used to check the DNS server
        # query_type: "A"              # required for kube-dns, which does not support IPv6
```
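The ping and ICMP jobs later in this section request modules named `ping` and `icmp`. The modules above do not define either one, so Blackbox-Exporter would answer those probes with an unknown-module error.

Below is a minimal sketch of both modules. It assumes the chart's `config.modules` layout and IPv4 targets. The module names only have to match the `module` parameter in each job. Depending on how the container runs, the ICMP prober may also need the `NET_RAW` capability to send pings.

```yaml
config:
  modules:
    # ... http_2xx, http_post_2xx, tcp_connect and dns_tcp as above ...
    icmp:                              # referenced by the node_status job (module: [icmp])
      prober: icmp
      timeout: 5s
      icmp:
        preferred_ip_protocol: "ip4"   # assumption: IPv4-only targets
    ping:                              # referenced by the blackbox_ping_all job (module: [ping])
      prober: icmp
      timeout: 5s
      icmp:
        preferred_ip_protocol: "ip4"
```

Both entries use the same `icmp` prober. Two names are kept only because the two jobs below ask for different module names.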
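- Deploy Blackbox-Exporter
The release name determines the Service name. Naming the release prometheus-blackbox-exporter yields the Service prometheus-blackbox-exporter.monitor:9115 that the scrape configs below point to.
```shell
$ helm install prometheus-blackbox-exporter -n monitor .
```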
Prometheus Operator configuration
- HTTP monitoring (external domains)
```yaml
- job_name: 'blackbox_http_2xx'
  metrics_path: /probe
  params:
    module: [http_2xx]                 # Look for a HTTP 200 response.
  static_configs:
    - targets:
        #- http://prometheus.io        # Target to probe with http.
        #- https://prometheus.io       # Target to probe with https.
        - https://www.baidu.com        # Target to probe with https.
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: prometheus-blackbox-exporter.monitor:9115  # The blackbox exporter's real hostname:port.
```
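No job in this section uses the `http_post_2xx` module defined in values.yaml.

Below is a minimal sketch of a job that does. It assumes a hypothetical endpoint `https://api.example.com/echo` that accepts the module's JSON body. Apart from the job name, module and target, it mirrors the HTTP job above.

```yaml
- job_name: 'blackbox_http_post_2xx'   # hypothetical job name
  metrics_path: /probe
  params:
    module: [http_post_2xx]            # POST with the JSON body configured in values.yaml
  static_configs:
    - targets:
        - https://api.example.com/echo # hypothetical endpoint that accepts the POST
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: prometheus-blackbox-exporter.monitor:9115
```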
- Ping monitoring
On an internal network, ping (ICMP) can check whether servers are alive. The job below uses a module named `ping`. That module must be defined in the Blackbox-Exporter module configuration, and the values.yaml above does not include one:
```yaml
- job_name: 'blackbox_ping_all'
  scrape_interval: 1m
  metrics_path: /probe
  params:
    module: [ping]
  static_configs:
    - targets:
        - 64.115.3.100
      labels:
        instance: test
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: prometheus-blackbox-exporter.monitor:9115  # blackbox-exporter Service address and port
```
If Prometheus Operator was deployed with Helm, add these jobs under `prometheus.prometheusSpec.additionalScrapeConfigs` in its values.yaml and then upgrade the release.
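For reference, here is a sketch of how one of the jobs above nests in the Prometheus Operator chart's values.yaml. It assumes the kube-prometheus-stack chart, where the list lives under `prometheus.prometheusSpec.additionalScrapeConfigs`. Other charts or chart versions may use a different path.

```yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:           # a list of plain Prometheus scrape_config entries
      - job_name: 'blackbox_http_2xx'
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - https://www.baidu.com
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter.monitor:9115
```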
- Upgrade the release
```shell
$ vim values.yaml
$ helm upgrade prometheus -n monitor .
```
- Open the Targets page of the Prometheus dashboard. The jobs defined above should be listed there.
Other types of black-box probes can be added the same way.
- DNS monitoring
```yaml
- job_name: "blackbox-k8s-service-dns"
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe                 # /probe, not /metrics
  params:
    module: [dns_tcp]                  # use the DNS-over-TCP module
  static_configs:
    - targets:
        - kube-dns.kube-system:53      # do not omit the port
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: prometheus-blackbox-exporter.monitor:9115  # blackbox-exporter Service address and port
```
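UDP is the DNS prober's default transport, so it can be worth checking the same server over UDP as well as TCP.

Below is a sketch of a hypothetical `dns_udp` module, to be added next to `dns_tcp` under `config.modules`. `query_type` and `valid_rcodes` are standard DNS prober options, but the values chosen here are assumptions. A job would then use `module: [dns_udp]` with the same target and relabeling as the DNS job above.

```yaml
config:
  modules:
    dns_udp:                           # hypothetical module name
      prober: dns
      timeout: 5s
      dns:
        transport_protocol: "udp"
        preferred_ip_protocol: "ip4"
        query_name: "kubernetes.default.svc.cluster.local"
        query_type: "A"                # ask for IPv4 records only
        valid_rcodes:
          - NOERROR                    # treat any other response code as a failed probe
```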
- ICMP monitoring
```yaml
- job_name: node_status
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets: ['10.165.94.31']
      labels:
        instance: node_status
        group: 'node'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: prometheus-blackbox-exporter.monitor:9115  # blackbox-exporter Service address and port
```
- TCP monitoring
```yaml
- job_name: 'prometheus_port_status'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets: ['172.19.155.133:8765']
      labels:
        instance: 'port_status'
        group: 'tcp'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance           # overwrites the static instance label above with the target address
    - target_label: __address__
      replacement: prometheus-blackbox-exporter.monitor:9115  # blackbox-exporter Service address and port
```
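Every probe above exports `probe_success`, which is 1 when the probe succeeded and 0 when it failed. That metric is the usual basis for alerting.

Below is a minimal rule-file sketch. The alert name, the 2-minute `for` duration and the `severity` label are assumptions. It would be loaded through `rule_files` like the SSL rules below.

```yaml
groups:
  - name: blackbox.rules               # hypothetical group name
    rules:
      - alert: BlackboxProbeFailed     # hypothetical alert name
        # probe_success is 0 when the HTTP, TCP, DNS or ICMP probe failed
        expr: probe_success == 0
        for: 2m                        # ignore single failed probes
        labels:
          severity: critical
        annotations:
          summary: "Probe {{ $labels.job }} failed for {{ $labels.instance }}"
```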
- SSL certificate expiry monitoring
```yaml
rule_files:
  - ssl_expiry.rules
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]               # Look for a HTTP 200 response.
    static_configs:
      - targets:
          - https://example.com        # use https:// so the probe does a TLS handshake; a bare host name is probed over plain HTTP
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: prometheus-blackbox-exporter.monitor:9115  # blackbox-exporter Service address and port
```
ssl_expiry.rules:
```yaml
groups:
  - name: ssl_expiry.rules
    rules:
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time()
```
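The rule above computes the seconds left until the earliest certificate in the chain expires. It has no threshold, though, so it would fire for every probed site.

Below is a sketch of a complete rule. The 30-day threshold, 10-minute `for` duration and `severity` label are illustrative assumptions. `humanizeDuration` is a Prometheus template function that formats the remaining seconds for the alert text.

```yaml
groups:
  - name: ssl_expiry.rules
    rules:
      - alert: SSLCertExpiringSoon
        # seconds until expiry, compared with an assumed 30-day threshold (30 * 86400 s)
        expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
```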