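Node exporter collects basic operating-system metrics such as CPU, memory, and disk usage, and exposes them over an HTTP endpoint for Prometheus to scrape.
Run Node exporter with Docker. Because the container uses --net="host", it shares the host's network stack, so port 9100 is reachable without a -p mapping. Recent node_exporter releases deprecate --collector.filesystem.ignored-mount-points in favor of --collector.filesystem.mount-points-exclude; use whichever your image version accepts:
docker run -d \
  --name node-exporter \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  --restart=always \
  --net="host" \
  prom/node-exporter \
  --path.procfs /host/proc \
  --path.sysfs /host/sys \
  --collector.filesystem.ignored-mount-points '^/(sys|proc|dev|host|etc)($|/)'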
Fetch the metrics over the HTTP endpoint:
curl http://localhost:9100/metrics
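The endpoint returns plain text in the Prometheus exposition format: comment lines starting with #, followed by "name value" pairs. The following is a minimal sketch of pulling one value out of that text. The helper names sample_metrics and metric_value are hypothetical, and the sample values are made up so the snippet runs without a live exporter.

```shell
#!/bin/sh
# Made-up sample of the text that /metrics returns (values are illustrative).
sample_metrics() {
cat <<'EOF'
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8000000000
# HELP node_memory_MemFree_bytes Memory information field MemFree_bytes.
# TYPE node_memory_MemFree_bytes gauge
node_memory_MemFree_bytes 2000000000
EOF
}

# metric_value NAME: print the value of an unlabelled metric read from stdin.
# "# HELP" and "# TYPE" lines never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

sample_metrics | metric_value node_memory_MemFree_bytes
```

Against a running exporter the same helper could be fed from curl, for example: curl -s http://localhost:9100/metrics | metric_value node_memory_MemFree_bytes. Metrics that carry labels (such as node_cpu_seconds_total) would need a more careful parser than this.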
Consul is a service registry that exposes an HTTP API for adding and removing services; Prometheus can use it to discover scrape targets dynamically.
Install Consul with Docker:
docker run \
  --restart=always \
  --name consul \
  -d \
  -p 8500:8500 \
  consul
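Register a service with curl (the health-check URL is elided here as <health-check-URL>; set it to the HTTP endpoint Consul should probe every 5 seconds):
curl -X PUT \
  -d '{"id": "node03","name": "node03","address": "192.168.1.42","port": 9100,"tags": ["test"],"checks": [{"http": "<health-check-URL>","interval": "5s"}]}' \
  http://localhost:8500/v1/agent/service/register
Deregister a service node:
curl -X PUT \
  http://localhost:8500/v1/agent/service/deregister/node02
Once a service is registered, its status can be checked in the Consul UI.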
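When several nodes have to be registered, typing the JSON body by hand gets error-prone. The following is a small sketch of a helper that prints the registration body for one node. The function name service_payload is hypothetical, and the health check is left out on purpose because its URL depends on your environment.

```shell
#!/bin/sh
# service_payload ID ADDRESS PORT TAG: print a Consul service registration body.
# The ID is reused as the service name, as in the curl example above.
# Arguments are inserted verbatim, so they must not contain quotes or backslashes.
service_payload() {
    printf '{"id": "%s", "name": "%s", "address": "%s", "port": %s, "tags": ["%s"]}\n' \
        "$1" "$1" "$2" "$3" "$4"
}

service_payload node03 192.168.1.42 9100 test
```

The output can be piped straight into the register endpoint, since curl reads the request body from stdin when given -d @-: service_payload node03 192.168.1.42 9100 test | curl -X PUT -d @- http://localhost:8500/v1/agent/service/register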
Alertmanager receives the alerts sent by Prometheus and forwards them to recipients by email, WeChat, and other channels.
Prepare the directory:
test -d /etc/alertmanager || mkdir -pv /etc/alertmanager
Prepare the configuration file:
# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
templates:
  - '/etc/alertmanager/wechat.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat'
receivers:
  - name: 'wechat'
    wechat_configs:
      - corp_id: 'wwc08fcb42fc6fe93c'
        to_party: '2'
        agent_id: '1000002'
        api_secret: 'cLG91Xgcd3o3zPJp6NbOJV9m7SBIlhtCScxov3Hp-XQ'
        send_resolved: true
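Prepare the template file. Alertmanager templates use Go template syntax, so every action is wrapped in double braces:
# /etc/alertmanager/wechat.tmpl
{{ define "wechat.default.message" }}
{{ range .Alerts }}
========start==========
Alert severity: {{ .Labels.severity }}
Alert type: {{ .Labels.alertname }}
Affected host: {{ .Labels.instance }}
Alert summary: {{ .Annotations.summary }}
Alert details: {{ .Annotations.description }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
========end==========
{{ end }}
{{ end }}
Run the Docker container: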
docker run \
  --restart=always \
  -d \
  -p 9093:9093 \
  -v /etc/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  -v /etc/alertmanager/wechat.tmpl:/etc/alertmanager/wechat.tmpl \
  --name alertmanager \
  prom/alertmanager
View the container logs:
docker logs -f alertmanager
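Before Prometheus is wired in, the WeChat route can be exercised by pushing a synthetic alert to Alertmanager's v2 API (POST /api/v2/alerts, which accepts a JSON list of alerts). The following is a rough sketch that builds such a body. The function name test_alert_payload and the alert name ManualTest are made up for illustration; the labels and annotations are the ones the template above reads.

```shell
#!/bin/sh
# test_alert_payload INSTANCE: print a one-element alert list for /api/v2/alerts.
# INSTANCE is inserted verbatim, so it must not contain quotes or backslashes.
test_alert_payload() {
    printf '[{"labels": {"alertname": "ManualTest", "severity": "warning", "instance": "%s"}, ' "$1"
    printf '"annotations": {"summary": "manual test alert", "description": "sent by hand"}}]\n'
}

test_alert_payload node03
```

To send it, pipe the output into curl: test_alert_payload node03 | curl -X POST -H 'Content-Type: application/json' -d @- http://localhost:9093/api/v2/alerts. With group_wait set to 10s, the notification should go out roughly ten seconds later if the WeChat credentials are valid.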
Prometheus scrapes metrics from exporters and stores them; it also evaluates alerting rules and sends firing alerts to Alertmanager.
Prepare the directory:
test -d /etc/prometheus || mkdir -pv /etc/prometheus
Configuration file. Note that source_labels, regex, and action belong to a single relabel rule:
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "/etc/prometheus/*.rules"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "192.168.1.82:9093"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - "localhost:9090"
        labels:
          instance: prometheus
  - job_name: 'consul'
    consul_sd_configs:
      - server: '192.168.1.82:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*test.*
        action: keep
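The keep rule in the consul job retains only targets whose __meta_consul_tags value matches the regex, which Prometheus anchors at both ends. Prometheus documents that label as the service's tags joined by the tag separator (a comma by default); the sketch below assumes the value is also wrapped in that separator, for example ,test,. It emulates the match with grep to show which tag strings would survive. The function name matches_keep_rule is hypothetical.

```shell
#!/bin/sh
# matches_keep_rule REGEX TAGS: succeed if TAGS fully matches REGEX.
# Wrapping the pattern in ^( )$ emulates Prometheus's implicit anchoring.
matches_keep_rule() {
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}

for tags in ',test,' ',prod,' ',latest,'; do
    if matches_keep_rule '.*test.*' "$tags"; then
        echo "$tags kept"
    else
        echo "$tags dropped"
    fi
done
```

The loop reports ,latest, as kept, because "latest" contains the substring "test". If only the exact tag should match, a stricter pattern such as .*,test,.* is one option under the same assumption about the separators.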
Prepare the alerting rules file. Label and value references in annotations use double braces:
# /etc/prometheus/prometheus.rules
groups:
  - name: alert-rule
    rules:
      - alert: NodeFilesystemUsage-high
        expr: (1 - (node_filesystem_free_bytes{fstype=~"ext3|ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext3|ext4|xfs"})) * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: High Node Filesystem usage detected'
          description: '{{ $labels.instance }}: Node Filesystem usage is above 80% (current value is: {{ $value }})'
      - alert: NodeMemoryUsage
        expr: (100 - (((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100)) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: High Node Memory usage detected'
          description: '{{ $labels.instance }}: Node Memory usage is above 80% (current value is: {{ $value }})'
      - alert: NodeCPUUsage
        expr: (100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: Node High CPU usage detected'
          description: '{{ $labels.instance }}: Node CPU usage is above 80% (current value is: {{ $value }})'
Run the Docker container:
docker run \
  --restart=always \
  -d \
  -p 9090:9090 \
  -v /etc/prometheus:/etc/prometheus \
  prom/prometheus
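To sanity-check the 80% threshold in the NodeFilesystemUsage-high rule, the same arithmetic, (1 - free / size) * 100, can be reproduced by hand. This is only an illustrative sketch: the function name fs_usage_percent and the byte counts are made up.

```shell
#!/bin/sh
# fs_usage_percent FREE_BYTES SIZE_BYTES: same arithmetic as the rule expression,
# (1 - free / size) * 100, printed with one decimal place.
fs_usage_percent() {
    awk -v free="$1" -v size="$2" 'BEGIN { printf "%.1f\n", (1 - free / size) * 100 }'
}

# Hypothetical 100 GB filesystem with 15 GB free: prints 85.0
fs_usage_percent 15000000000 100000000000
```

A filesystem in that state is above the threshold, so the alert would fire once the condition has held for the 2 minutes set by for: 2m.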
Open the Prometheus UI:
http://localhost:9090
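Grafana is an open-source monitoring and visualization tool that can display the data collected by Prometheus and Node Exporter.
Install and start Grafana, then confirm that it is listening on port 3000:
wget https://dl.grafana.com/oss/release/grafana-6.0.2-1.x86_64.rpm
yum install grafana-6.0.2-1.x86_64.rpm -y
systemctl start grafana-server
systemctl enable grafana-server
ss -anltup | grep 3000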
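To add Node exporter charts in Grafana, import the dashboard template with ID 8919.
Install the pie chart plugin (plugin ID grafana-piechart-panel) and restart Grafana:
grafana-cli plugins install grafana-piechart-panel
systemctl restart grafana-server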
With the steps above, Node Exporter, Consul, Alertmanager, Prometheus, and Grafana are installed and configured, forming a complete monitoring and alerting system.
Reposted from: http://mnjfk.baihongyu.com/