
prometheus: celery, redis-export



This article covers prometheus: celery, redis-export, and also provides practical material on grafana+prometheus+node_exporter, "istio prometheus alerting: installing Prometheus AlertManager -- a pitfall", "istio prometheus alerting: Prometheus AlertManager", and "integrating jvm-exporter with k8s + Prometheus for monitoring and alerting".

Contents:

prometheus: celery, redis-export
grafana+prometheus+node_exporter
istio prometheus alerting: installing Prometheus AlertManager -- a pitfall
istio prometheus alerting: Prometheus AlertManager
Integrating jvm-exporter with k8s + Prometheus for monitoring and alerting

prometheus: celery, redis-export

https://github.com/nlighten/tomcat_exporter

https://github.com/prometheus/jmx_exporter

https://vexxhost.com/resources/tutorials/how-to-use-prometheus-to-monitor-your-centos-7-server/
 ./redis_exporter -redis.addr  redis://cache2.w95akh.ng.0001.cnn1.cache.amazonaws.com.cn:6379 -web.listen-address :19121  
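With redis_exporter listening on :19121, Prometheus still needs a scrape job pointing at it. A minimal sketch (the job name and the localhost target are placeholders for wherever the exporter actually runs):

scrape_configs:
  - job_name: redis
    static_configs:
      - targets: ['localhost:19121']  # the -web.listen-address from the redis_exporter command above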

Start a celery worker with a RabbitMQ broker (celery broker: rabbitmq):
docker run --link rabbitmq1:rabbit --name some-celery -d celery:latest

Check the cluster status:
docker run --link rabbitmq1:rabbit --rm celery celery status

Start a celery worker with a Redis broker:
$ docker run --link some-redis:redis -e CELERY_BROKER_URL=redis://redis --name some-celery -d celery

Check the cluster status:

$ docker run --link some-redis:redis -e CELERY_BROKER_URL=redis://redis --rm celery celery status

grafana+prometheus+node_exporter

Background

Trying it out

Since everything is started as non-root, the procedure is the same for each component:

  1. Download and unpack the tarball
  2. Write a systemd service unit file
  3. systemctl start/status/stop {service-name}

grafana

Download: https://grafana.com/grafana/download
To run it as non-root, download the Standalone Linux Binaries.

Extract:
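For example (a sketch assuming the v10.2.2 standalone Linux archive used in the unit file below; adjust the version and URL to match what the download page offers):

wget https://dl.grafana.com/oss/release/grafana-10.2.2.linux-amd64.tar.gz
tar -zxf grafana-10.2.2.linux-amd64.tar.gz -C /home/sysadmin/monitor/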

Configure the systemd unit:

[root@localhost system]# cat grafana-server.service 
[Unit]
Description=grafana - enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored.
Documentation=https://grafana.com/docs/grafana/latest/introduction/
After=network-online.target

[Service]
Type=simple
ExecStart=/home/sysadmin/monitor/grafana-v10.2.2/bin/grafana server \
--homepath /home/sysadmin/monitor/grafana-v10.2.2 \
--config /home/sysadmin/monitor/grafana-v10.2.2/conf/custom.ini
Restart=on-failure
User=sysadmin
Group=sysadmin

[Install]
WantedBy=multi-user.target
[root@localhost system]# 

Start it:

[root@localhost system]# systemctl start grafana-server.service 
[root@localhost system]# systemctl status grafana-server.service 
● grafana-server.service - grafana - enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored.
   Loaded: loaded (/etc/systemd/system/grafana-server.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-12-11 17:49:29 CST; 7s ago
     Docs: https://grafana.com/docs/grafana/latest/introduction/
 Main PID: 2548 (grafana)
    Tasks: 21 (limit: 100057)
   Memory: 102.2M
   CGroup: /system.slice/grafana-server.service
           └─2548 /home/sysadmin/monitor/grafana-v10.2.2/bin/grafana server --homepath /home/sysadmin/monitor/grafana-v10.2.2 --config /home/sysadmin/monitor/grafana-v10.2.2/conf/custom.ini

12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=sqlstore.transactions t=2023-12-11T17:49:33.234603441+08:00 level=info msg="Database locked, sleeping then retrying" error="database is lo>
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=ngalert.migration t=2023-12-11T17:49:33.339105773+08:00 level=info msg=Starting
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=ngalert.migration t=2023-12-11T17:49:33.339602491+08:00 level=info msg="No migrations to run"
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=http.server t=2023-12-11T17:49:33.388485736+08:00 level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl=/grafana sock>
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=ngalert.state.manager t=2023-12-11T17:49:33.644194974+08:00 level=info msg="State cache has been initialized" states=0 duration=421.887563>
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=ngalert.scheduler t=2023-12-11T17:49:33.644248371+08:00 level=info msg="Starting scheduler" tickInterval=10s
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=ticker t=2023-12-11T17:49:33.644342054+08:00 level=info msg=starting first_tick=2023-12-11T17:49:40+08:00
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=ngalert.multiorg.alertmanager t=2023-12-11T17:49:33.644355847+08:00 level=info msg="Starting MultiOrg Alertmanager"
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=plugins.update.checker t=2023-12-11T17:49:33.666795693+08:00 level=info msg="Update check succeeded" duration=443.293369ms
12月 11 17:49:34 localhost.localdomain grafana[2548]: logger=grafana.update.checker t=2023-12-11T17:49:34.026805884+08:00 level=info msg="Update check succeeded" duration=803.690713ms
[root@localhost system]# ps -ef |grep grafana
sysadmin    2548       1 15 17:49 ?        00:00:02 /home/sysadmin/monitor/grafana-v10.2.2/bin/grafana server --homepath /home/sysadmin/monitor/grafana-v10.2.2 --config /home/sysadmin/monitor/grafana-v10.2.2/conf/custom.ini
root        2572    1942  0 17:49 pts/1    00:00:00 grep --color=auto grafana
[root@localhost system]# 

Check stopping:

[root@localhost system]# systemctl stop grafana-server.service 
[root@localhost system]# systemctl status grafana-server.service 
● grafana-server.service - grafana - enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored.
   Loaded: loaded (/etc/systemd/system/grafana-server.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: https://grafana.com/docs/grafana/latest/introduction/

12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=ngalert.multiorg.alertmanager t=2023-12-11T17:49:33.644355847+08:00 level=info msg="Starting MultiOrg Alertmanager"
12月 11 17:49:33 localhost.localdomain grafana[2548]: logger=plugins.update.checker t=2023-12-11T17:49:33.666795693+08:00 level=info msg="Update check succeeded" duration=443.293369ms
12月 11 17:49:34 localhost.localdomain grafana[2548]: logger=grafana.update.checker t=2023-12-11T17:49:34.026805884+08:00 level=info msg="Update check succeeded" duration=803.690713ms
12月 11 17:50:22 localhost.localdomain grafana[2548]: logger=infra.usagestats t=2023-12-11T17:50:22.23100276+08:00 level=info msg="Usage stats are ready to report"
12月 11 17:50:56 localhost.localdomain systemd[1]: Stopping grafana - enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored....
12月 11 17:50:56 localhost.localdomain grafana[2548]: logger=server t=2023-12-11T17:50:56.762804431+08:00 level=info msg="Shutdown started" reason="System signal: terminated"
12月 11 17:50:56 localhost.localdomain grafana[2548]: logger=tracing t=2023-12-11T17:50:56.763057587+08:00 level=info msg="Closing tracing"
12月 11 17:50:56 localhost.localdomain grafana[2548]: logger=ticker t=2023-12-11T17:50:56.763272714+08:00 level=info msg=stopped last_tick=2023-12-11T17:50:50+08:00
12月 11 17:50:56 localhost.localdomain systemd[1]: grafana-server.service: Succeeded.
12月 11 17:50:56 localhost.localdomain systemd[1]: Stopped grafana - enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored..
[root@localhost system]# 

Enable start on boot:

[root@localhost system]# systemctl enable grafana-server.service 
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /etc/systemd/system/grafana-server.service.
[root@localhost system]# systemctl start grafana-server.service

prometheus

Download: https://prometheus.io/download/#prometheus

Extract:
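For example (a sketch assuming the 2.48.0 linux-amd64 tarball used in the unit file below):

wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar -zxf prometheus-2.48.0.linux-amd64.tar.gz -C /home/sysadmin/monitor/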

Prometheus startup parameters: https://prometheus.io/docs/prometheus/latest/command-line/pro...

Configure the systemd unit:

[root@localhost system]# cat prometheus.service 
[Unit]
Description=prometheus - open-source systems monitoring and alerting toolkit
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
Type=simple
PIDFile=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.pid
ExecStart=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus \
--config.file=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.yml \
--storage.tsdb.retention.time=15d \
--storage.tsdb.path=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/data/ \
--web.max-connections=512 \
--web.read-timeout=5m \
--query.max-concurrency=20 \
--query.timeout=2m
ExecReload=/bin/sh -c "/bin/kill -s HUP $(/bin/cat /home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.pid)"
ExecStop=/bin/sh -c "/bin/kill -s TERM $(/bin/cat /home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.pid)"
User=sysadmin
Group=sysadmin

[Install]
WantedBy=multi-user.target
[root@localhost system]# pwd
/etc/systemd/system
[root@localhost system]# 

Start it: systemctl start prometheus

[root@localhost system]# vi prometheus.service 
[root@localhost system]# systemctl start prometheus
Warning: The unit file, source configuration file or drop-ins of prometheus.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@localhost system]# systemctl daemon-reload
[root@localhost system]# systemctl start prometheus
[root@localhost system]# systemctl status prometheus.service 
● prometheus.service - prometheus - open-source systems monitoring and alerting toolkit
   Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-12-11 15:55:42 CST; 6s ago
     Docs: https://prometheus.io/docs/introduction/overview/
 Main PID: 2111 (prometheus)
    Tasks: 10 (limit: 100057)
   Memory: 30.4M
   CGroup: /system.slice/prometheus.service
           └─2111 /home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus --config.file=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.yml --storage.tsdb.retention.time=15d ->

12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.428Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=10 maxSegment=12
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.449Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=11 maxSegment=12
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.449Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=12 maxSegment=12
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.450Z caller=head.go:798 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=1.520374ms wal_>
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.452Z caller=main.go:1045 level=info fs_type=XFS_SUPER_MAGIC
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.452Z caller=main.go:1048 level=info msg="TSDB started"
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.452Z caller=main.go:1229 level=info msg="Loading configuration file" filename=/home/sysadmin/monitor/prometheus-2.48.0.>
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.453Z caller=main.go:1266 level=info msg="Completed loading of configuration file" filename=/home/sysadmin/monitor/prome>
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.453Z caller=main.go:1009 level=info msg="Server is ready to receive web requests."
12月 11 15:55:42 localhost.localdomain prometheus[2111]: ts=2023-12-11T07:55:42.453Z caller=manager.go:1012 level=info component="rule manager" msg="Starting rule manager..."

Check that the process is running:

[root@localhost system]# ps -ef |grep prometheus
sysadmin    2111       1  0 15:55 ?        00:00:00 /home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus --config.file=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.yml --storage.tsdb.retention.time=15d --storage.tsdb.path=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/data/ --web.max-connections=512 --web.read-timeout=5m --query.max-concurrency=20 --query.timeout=2m
root        2160    1942  0 16:04 pts/1    00:00:00 grep --color=auto prometheus

Check stopping (the unit ends up 'failed' only because ExecStop's kill reads a PID file that Prometheus never writes; the main process itself exits cleanly on TERM, as the log below shows):

[root@localhost system]# systemctl stop prometheus.service 
[root@localhost system]# systemctl status prometheus.service 
● prometheus.service - prometheus - open-source systems monitoring and alerting toolkit
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2023-12-11 16:06:38 CST; 5s ago
     Docs: https://prometheus.io/docs/introduction/overview/
  Process: 2182 ExecStop=/bin/sh -c /bin/kill -s TERM $(/bin/cat /home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.pid) (code=exited, status=1/FAILURE)
  Process: 2111 ExecStart=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus --config.file=/home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.yml --storage.tsdb.retention.ti>
 Main PID: 2111 (code=exited, status=0/SUCCESS)

12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.036Z caller=manager.go:1036 level=info component="rule manager" msg="Rule manager stopped"
12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.036Z caller=main.go:934 level=info msg="Stopping scrape manager..."
12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.036Z caller=main.go:879 level=info msg="Scrape discovery manager stopped"
12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.036Z caller=main.go:893 level=info msg="Notify discovery manager stopped"
12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.036Z caller=main.go:926 level=info msg="Scrape manager stopped"
12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.095Z caller=notifier.go:604 level=info component=notifier msg="Stopping notification manager..."
12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.095Z caller=main.go:1155 level=info msg="Notifier manager stopped"
12月 11 16:06:38 localhost.localdomain prometheus[2111]: ts=2023-12-11T08:06:38.095Z caller=main.go:1167 level=info msg="See you next time!"
12月 11 16:06:38 localhost.localdomain systemd[1]: prometheus.service: Failed with result 'exit-code'.
12月 11 16:06:38 localhost.localdomain systemd[1]: Stopped prometheus - open-source systems monitoring and alerting toolkit.
[root@localhost system]# ps -ef |grep prometheus
root        2191    1942  0 16:06 pts/1    00:00:00 grep --color=auto prometheus
[root@localhost system]# 

Enable start on boot:

[root@localhost system]# systemctl enable prometheus.service 
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.
[root@localhost system]# 

node_exporter

Download: https://prometheus.io/download/#node_exporter

Extract

Configure the systemd unit:

[root@localhost system]# cat node_exporter.service 
[Unit]
Description=node_exporter - Prometheus exporter for hardware and OS metrics
Documentation=https://github.com/prometheus/node_exporter; https://prometheus.io/docs/guides/node-exporter/;
After=network-online.target

[Service]
Type=simple
ExecStart=/home/sysadmin/monitor/node_exporter-1.7.0.linux-amd64/node_exporter
Restart=on-failure
User=sysadmin
Group=sysadmin

[Install]
WantedBy=multi-user.target
[root@localhost system]# 

Start it:

[root@localhost system]# systemctl start node_exporter.service 
[root@localhost system]# systemctl status node_exporter.service 
● node_exporter.service - node_exporter - Prometheus exporter for hardware and OS metrics
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-12-11 16:19:44 CST; 4s ago
     Docs: https://github.com/prometheus/node_exporter;
           https://prometheus.io/docs/guides/node-exporter/;
 Main PID: 2231 (node_exporter)
    Tasks: 6 (limit: 100057)
   Memory: 18.6M
   CGroup: /system.slice/node_exporter.service
           └─2231 /home/sysadmin/monitor/node_exporter-1.7.0.linux-amd64/node_exporter

12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=thermal_zone
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=time
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=timex
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=udp_queues
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=uname
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=vmstat
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=xfs
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=zfs
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9100
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9100
[root@localhost system]# ps -ef |grep node_exporter
sysadmin    2231       1  0 16:19 ?        00:00:00 /home/sysadmin/monitor/node_exporter-1.7.0.linux-amd64/node_exporter
root        2240    1942  0 16:20 pts/1    00:00:00 grep --color=auto node_exporter
[root@localhost system]# 

Check that stopping works:

[root@localhost system]# systemctl stop node_exporter.service 
[root@localhost system]# ps -ef |grep node
root        2249    1942  0 16:23 pts/1    00:00:00 grep --color=auto node
[root@localhost system]# systemctl status node_exporter.service 
● node_exporter.service - node_exporter - Prometheus exporter for hardware and OS metrics
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: https://github.com/prometheus/node_exporter;
           https://prometheus.io/docs/guides/node-exporter/;

12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=udp_queues
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=uname
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=vmstat
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=xfs
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=node_exporter.go:117 level=info collector=zfs
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9100
12月 11 16:19:45 localhost.localdomain node_exporter[2231]: ts=2023-12-11T08:19:45.017Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9100
12月 11 16:23:22 localhost.localdomain systemd[1]: Stopping node_exporter - Prometheus exporter for hardware and OS metrics...
12月 11 16:23:22 localhost.localdomain systemd[1]: node_exporter.service: Succeeded.
12月 11 16:23:22 localhost.localdomain systemd[1]: Stopped node_exporter - Prometheus exporter for hardware and OS metrics.
[root@localhost system]# 

Enable start on boot:

[root@localhost system]# systemctl enable node_exporter.service 
Created symlink /etc/systemd/system/multi-user.target.wants/node_exporter.service → /etc/systemd/system/node_exporter.service.
[root@localhost system]# 
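So far Prometheus only scrapes whatever prometheus.yml already lists; to collect the node metrics, node_exporter (listening on :9100 as the log above shows) still has to be added as a scrape target. A minimal sketch of the scrape_configs section in /home/sysadmin/monitor/prometheus-2.48.0.linux-amd64/prometheus.yml (job names and the localhost targets are placeholders):

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']   # Prometheus itself
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']   # node_exporter started above

Then restart Prometheus (systemctl restart prometheus.service) so it picks up the new targets.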

Results

Forgot the Grafana admin password? Reset it:

[sysadmin@localhost bin]$ ls -l
总用量 194400
-rwxr-xr-x 1 sysadmin sysadmin 195352632 11月 20 21:24 grafana
-rwxr-xr-x 1 sysadmin sysadmin   1851408 11月 20 21:24 grafana-cli
-rwxr-xr-x 1 sysadmin sysadmin   1851408 11月 20 21:24 grafana-server
[sysadmin@localhost bin]$ ./grafana-cli admin reset-admin-password monitor
Deprecation warning: The standalone 'grafana-cli' program is deprecated and will be removed in the future. Please update all uses of 'grafana-cli' to 'grafana cli'
INFO [12-11|17:56:36] Starting Grafana                         logger=settings version= commit= branch= compiled=1970-01-01T08:00:00+08:00
INFO [12-11|17:56:36] Config loaded from                       logger=settings file=/home/sysadmin/monitor/grafana-v10.2.2/conf/defaults.ini
INFO [12-11|17:56:36] Config loaded from                       logger=settings file=/home/sysadmin/monitor/grafana-v10.2.2/conf/custom.ini
INFO [12-11|17:56:36] Target                                   logger=settings target=[all]
INFO [12-11|17:56:36] Path Home                                logger=settings path=/home/sysadmin/monitor/grafana-v10.2.2
INFO [12-11|17:56:36] Path Data                                logger=settings path=/home/sysadmin/monitor/grafana-v10.2.2/data
INFO [12-11|17:56:36] Path Logs                                logger=settings path=/home/sysadmin/monitor/grafana-v10.2.2/data/log
INFO [12-11|17:56:36] Path Plugins                             logger=settings path=/home/sysadmin/monitor/grafana-v10.2.2/data/plugins
INFO [12-11|17:56:36] Path Provisioning                        logger=settings path=/home/sysadmin/monitor/grafana-v10.2.2/conf/provisioning
INFO [12-11|17:56:36] App mode production                      logger=settings
INFO [12-11|17:56:36] Connecting to DB                         logger=sqlstore dbtype=sqlite3
INFO [12-11|17:56:36] Starting DB migrations                   logger=migrator
INFO [12-11|17:56:36] migrations completed                     logger=migrator performed=0 skipped=608 duration=545.03µs
INFO [12-11|17:56:36] Validated license token                  logger=licensing appURL=http://localhost:3000/grafana/ source=disk status=NotFound
INFO [12-11|17:56:36] Envelope encryption state                logger=secrets enabled=true current provider=secretKey.v1

Admin password changed successfully ✔

[sysadmin@localhost bin]$ 

Create a dashboard:
node dashboard

istio prometheus alerting: installing Prometheus AlertManager -- a pitfall

Pitfall: because this Prometheus runs as a pod and cannot simply be restarted in place, the approach below really only applies to a standalone, non-containerized Prometheus installation.

1. Exec into the pod

kubectl exec -n istio-system -it grafana-694477c588-8rbvc /bin/bash

2. Download and install the Prometheus AlertManager data source plugin

grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource

3. Restart the grafana service

service grafana-server restart

4. Check the cause of errors

grafana-server -homepath /usr/share/grafana

 

Right. Something is already using port 3000.

Stop the grafana service:
systemctl stop grafana.service -l

Do a ps aux | grep grafana and kill any process that still exists.

Then start grafana using
systemctl start grafana.service

Tail the grafana logs:

tailf /var/log/grafana/grafana.log

5. List the locally installed plugins

grafana-cli plugins ls

service grafana-server restart

 

istio prometheus alerting: Prometheus AlertManager

1. Install alertmanager

   Run kubectl create -f against the following files (listed below; a combined command sketch follows the file listings):

         alertmanager-templates.yaml, configmap.yaml, deployment.yaml, service.yaml

apiVersion: v1
data:
  default.tmpl: |
    {{ define "__alertmanager" }}AlertManager{{ end }}
    {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

    {{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
    {{ define "__description" }}{{ end }}

    {{ define "__text_alert_list" }}{{ range . }}Labels:
    {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}
    {{ end }}Annotations:
    {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}
    {{ end }}Source: {{ .GeneratorURL }}
    {{ end }}{{ end }}


    {{ define "slack.default.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "slack.default.username" }}{{ template "__alertmanager" . }}{{ end }}
    {{ define "slack.default.fallback" }}{{ template "slack.default.title" . }} | {{ template "slack.default.titlelink" . }}{{ end }}
    {{ define "slack.default.pretext" }}{{ end }}
    {{ define "slack.default.titlelink" }}{{ template "__alertmanagerURL" . }}{{ end }}
    {{ define "slack.default.iconemoji" }}{{ end }}
    {{ define "slack.default.iconurl" }}{{ end }}
    {{ define "slack.default.text" }}{{ end }}


    {{ define "hipchat.default.from" }}{{ template "__alertmanager" . }}{{ end }}
    {{ define "hipchat.default.message" }}{{ template "__subject" . }}{{ end }}


    {{ define "pagerduty.default.description" }}{{ template "__subject" . }}{{ end }}
    {{ define "pagerduty.default.client" }}{{ template "__alertmanager" . }}{{ end }}
    {{ define "pagerduty.default.clientURL" }}{{ template "__alertmanagerURL" . }}{{ end }}
    {{ define "pagerduty.default.instances" }}{{ template "__text_alert_list" . }}{{ end }}


    {{ define "opsgenie.default.message" }}{{ template "__subject" . }}{{ end }}
    {{ define "opsgenie.default.description" }}{{ .CommonAnnotations.SortedPairs.Values | join " " }}
    {{ if gt (len .Alerts.Firing) 0 -}}
    Alerts Firing:
    {{ template "__text_alert_list" .Alerts.Firing }}
    {{- end }}
    {{ if gt (len .Alerts.Resolved) 0 -}}
    Alerts Resolved:
    {{ template "__text_alert_list" .Alerts.Resolved }}
    {{- end }}
    {{- end }}
    {{ define "opsgenie.default.source" }}{{ template "__alertmanagerURL" . }}{{ end }}


    {{ define "victorops.default.message" }}{{ template "__subject" . }} | {{ template "__alertmanagerURL" . }}{{ end }}
    {{ define "victorops.default.from" }}{{ template "__alertmanager" . }}{{ end }}


    {{ define "email.default.subject" }}{{ template "__subject" . }}{{ end }}
    {{ define "email.default.html" }}
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <!--
    Style and HTML derived from https://github.com/mailgun/transactional-email-templates


    The MIT License (MIT)

    Copyright (c) 2014 Mailgun

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all
    copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    SOFTWARE.
    -->
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="viewport" content="width=device-width"/>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>{{ template "__subject" . }}</title>

    </head>

    <body itemscope="" itemtype="http://schema.org/EmailMessage" bgcolor="#f6f6f6">

    <table bgcolor="#f6f6f6">
      <tr>
        <td valign="top"></td>
        <td width="600" valign="top">
          <div>
            <table width="100%" cellpadding="0" cellspacing="0" bgcolor="#fff">
              <tr>
                <td align="center" bgcolor="#E6522C" valign="top">
                  {{ .Alerts | len }} alert{{ if gt (len .Alerts) 1 }}s{{ end }} for {{ range .GroupLabels.SortedPairs }}
                    {{ .Name }}={{ .Value }}
                  {{ end }}
                </td>
              </tr>
              <tr>
                <td valign="top">
                  <table width="100%" cellpadding="0" cellspacing="0">
                    <tr>
                      <td valign="top">
                        <a href="{{ template "__alertmanagerURL" . }}">View in {{ template "__alertmanager" . }}</a>
                      </td>
                    </tr>
                    {{ if gt (len .Alerts.Firing) 0 }}
                    <tr>
                      <td valign="top">
                        <strong>[{{ .Alerts.Firing | len }}] Firing</strong>
                      </td>
                    </tr>
                    {{ end }}
                    {{ range .Alerts.Firing }}
                    <tr>
                      <td valign="top">
                        <strong>Labels</strong><br/>
                        {{ range .Labels.SortedPairs }}{{ .Name }} = {{ .Value }}<br/>{{ end }}
                        {{ if gt (len .Annotations) 0 }}<strong>Annotations</strong><br/>{{ end }}
                        {{ range .Annotations.SortedPairs }}{{ .Name }} = {{ .Value }}<br/>{{ end }}
                        <a href="{{ .GeneratorURL }}">Source</a><br/>
                      </td>
                    </tr>
                    {{ end }}

                    {{ if gt (len .Alerts.Resolved) 0 }}
                      {{ if gt (len .Alerts.Firing) 0 }}
                    <tr>
                      <td valign="top">
                        <br/>
                        <hr/>
                        <br/>
                      </td>
                    </tr>
                      {{ end }}
                    <tr>
                      <td valign="top">
                        <strong>[{{ .Alerts.Resolved | len }}] Resolved</strong>
                      </td>
                    </tr>
                    {{ end }}
                    {{ range .Alerts.Resolved }}
                    <tr>
                      <td valign="top">
                        <strong>Labels</strong><br/>
                        {{ range .Labels.SortedPairs }}{{ .Name }} = {{ .Value }}<br/>{{ end }}
                        {{ if gt (len .Annotations) 0 }}<strong>Annotations</strong><br/>{{ end }}
                        {{ range .Annotations.SortedPairs }}{{ .Name }} = {{ .Value }}<br/>{{ end }}
                        <a href="{{ .GeneratorURL }}">Source</a><br/>
                      </td>
                    </tr>
                    {{ end }}
                  </table>
                </td>
              </tr>
            </table>

            <div>
              <table width="100%">
                <tr>
                  <td align="center" valign="top"><a href="{{ .ExternalURL }}">Sent by {{ template "__alertmanager" . }}</a></td>
                </tr>
              </table>
            </div></div>
        </td>
        <td valign="top"></td>
      </tr>
    </table>

    </body>
    </html>

    {{ end }}

    {{ define "pushover.default.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "pushover.default.message" }}{{ .CommonAnnotations.SortedPairs.Values | join " " }}
    {{ if gt (len .Alerts.Firing) 0 }}
    Alerts Firing:
    {{ template "__text_alert_list" .Alerts.Firing }}
    {{ end }}
    {{ if gt (len .Alerts.Resolved) 0 }}
    Alerts Resolved:
    {{ template "__text_alert_list" .Alerts.Resolved }}
    {{ end }}
    {{ end }}
    {{ define "pushover.default.url" }}{{ template "__alertmanagerURL" . }}{{ end }}
  slack.tmpl: |
    {{ define "slack.devops.text" }}
    {{range .Alerts}}{{.Annotations.DESCRIPTION}}
    {{end}}
    {{ end }}
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: alertmanager-templates
  namespace: monitoring
alertmanager-templates.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitoring
data:
  config.yml: |-
    global:
      # ResolveTimeout is the time after which an alert is declared resolved
      # if it has not been updated.
      resolve_timeout: 5m

      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'smtp.126.com:25'
      smtp_from: 'xxx@126.com'
      smtp_auth_username: 'xxx@126.com'
      smtp_auth_password: 'xxx'

      # The API URL to use for Slack notifications.
      slack_api_url: 'https://hooks.slack.com/services/some/api/token'

    # The directory from which notification templates are read.
    templates:
    - '/etc/alertmanager-templates/*.tmpl'

    # The root route on which each incoming alert enters.
    route:

      # The labels by which incoming alerts are grouped together. For example,
      # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
      # be batched into a single group.

      group_by: ['alertname', 'cluster', 'service']

      # When a new group of alerts is created by an incoming alert, wait at
      # least 'group_wait' to send the initial notification.
      # This way ensures that you get multiple alerts for the same group that start
      # firing shortly after another are batched together on the first
      # notification.

      group_wait: 30s

      # When the first notification was sent, wait 'group_interval' to send a batch
      # of new alerts that started firing for that group.

      group_interval: 5m

      # If an alert has successfully been sent, wait 'repeat_interval' to
      # resend them.

      #repeat_interval: 1m
      repeat_interval: 15m

      # A default receiver

      # If an alert isn't caught by a route, send it to default.
      receiver: default

      # All the above attributes are inherited by all child routes and can
      # be overwritten on each.

      # The child route trees.
      routes:
      # Send severity=slack alerts to slack.
      # - match:
      #     severity: slack
      #   receiver: slack_alert
      - match:
          severity: email
        receiver: email_alert

    receivers:
    - name: 'default'
      slack_configs:
      - channel: '#alertmanager-test'
        text: '<!channel>{{ template "slack.devops.text" . }}'
        send_resolved: true

    - name: 'slack_alert'
      slack_configs:
      - channel: '#alertmanager-test'
        send_resolved: true
    - name: 'email_alert' # referenced by the severity=email route above
      email_configs:
      - to: xxx@ultrapower.com.cn # recipient email address
configmap.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      name: alertmanager
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: quay.io/prometheus/alertmanager:v0.7.1
        args:
          - '-config.file=/etc/alertmanager/config.yml'
          - '-storage.path=/alertmanager'
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: config-volume
          mountPath: /etc/alertmanager
        - name: templates-volume
          mountPath: /etc/alertmanager-templates
        - name: alertmanager
          mountPath: /alertmanager
      volumes:
      - name: config-volume
        configMap:
          name: alertmanager
      - name: templates-volume
        configMap:
          name: alertmanager-templates
      - name: alertmanager
        emptyDir: {}
deployment.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
  labels:
    name: alertmanager
  name: alertmanager
  namespace: monitoring
spec:
  selector:
    app: alertmanager
  type: NodePort
  ports:
  - name: alertmanager
    protocol: TCP
    port: 9093
    targetPort: 9093
service.yaml
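With the four manifests above saved locally, they can be created in one command (a sketch; the monitoring namespace is already set inside each file):

kubectl create -f alertmanager-templates.yaml -f configmap.yaml -f deployment.yaml -f service.yaml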

 

2. Configure the Prometheus connection to alertmanager

Edit the prometheus.yaml file and add:

alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - alertmanager:9093
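Before reloading Prometheus, the edited file can be sanity-checked with promtool, which ships alongside the prometheus binary (a sketch, run wherever a local copy of the config is available):

promtool check config prometheus.yaml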

 

 

 

3. Configure alerting rules

Edit the prometheus.yaml file and add the following:

 rule_files:
    - "/data/istio/prometheus-rules/*.rules.yml"

 

Mount the rule files:
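A sketch of the volume wiring this step needs, as a fragment to merge into the Prometheus Deployment spec (the ConfigMap name matches the one generated below, the mount path matches the rule_files glob above, and the container name is illustrative):

spec:
  template:
    spec:
      containers:
      - name: prometheus
        volumeMounts:
        - name: prometheus-rules
          mountPath: /data/istio/prometheus-rules
      volumes:
      - name: prometheus-rules
        configMap:
          name: prometheus-rules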

 

Generate a ConfigMap from the files:

 kubectl create configmap prometheus-rules --from-file=prometheus-rules -o yaml --namespace=istio-system

 

Note: the rules directory is named prometheus-rules and the rule files use the yml format, e.g.:

test.rules.yml

groups:
- name: my-group-name
  interval: 30s   # defaults to global interval
  rules:
  - record: instance:errors:rate5m
    expr: rate(errors_total[5m])
  - record: instance:requests:rate5m
    expr: rate(requests_total[5m])
  - alert: HighErrors
    # Expressions remain PromQL as before and can be spread over
    # multiple lines via YAML’s multi-line strings.
    expr: |
      sum without(instance) (instance:errors:rate5m)
      /
      sum without(instance) (instance:requests:rate5m)
    for: 5m
    labels:
      severity: critical
    annotations:
      description: "stuff's happening with {{ $labels.service }}"

 

Integrating jvm-exporter with k8s + Prometheus for monitoring and alerting

Background: the article "Monitoring the JVM with Prometheus + Grafana" describes how to monitor a Java application with jvm-exporter. In our scenario we need to monitor JVMs running inside a k8s cluster, so this post looks at integrating and extending Prometheus on k8s. Assume Prometheus has already been deployed in the cluster (see "Integrating prometheus + grafana monitoring on kubernetes"); kube-prometheus, however, does not ship a jvm-exporter integration, so we have to wire it up ourselves.

  1. Integrate jvm-exporter into the application

Integration is straightforward: add jvm-exporter as a javaagent to the java start command (a sketch follows); see "Monitoring the JVM with Prometheus + Grafana" for the details.
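A sketch of such a start command (the agent version 0.14.0 and port 9093 match the exporter output and the Service below; the jar path, exporter config file and application jar are placeholders):

java -javaagent:/opt/jmx_prometheus_javaagent-0.14.0.jar=9093:/opt/jmx-exporter.yaml -jar app.jar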

  2. Configure Prometheus service discovery

For services exposed through a Service, we can use the ServiceMonitor CRD defined by the prometheus-operator project to configure service discovery. A configuration template follows:

--- # ServiceMonitor auto-discovery rule
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # use the value of the 'metrics' label as the job label, i.e. the series get job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # auto-discover Services that carry the label metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; more than one can be listed
    - my-namespace
  endpoints:
  - port: http-metrics # port to scrape metrics from; this is the Service port *name*, i.e. spec.ports.name in the Service yaml
    interval: 15s # scrape interval

--- # Service template
apiVersion: v1
kind: Service
metadata:
  labels:
    metrics: jmx-metrics # the key label the ServiceMonitor discovers on
  name: jmx-metrics
  namespace: my-namespace
spec:
  ports:
  - name: http-metrics # matches spec.endpoints.port in the ServiceMonitor
    port: 9093 # the service port exposed by jmx-exporter
    targetPort: http-metrics # the port name exposed in the pod yaml
  selector:
    metrics: jmx-metrics # the Service's own label selector

The above configures auto-discovery for the jmx-metrics Service in the my-namespace namespace. Prometheus adds all pods behind this Service to monitoring and keeps fetching the current pod list from the apiserver, so new replicas are picked up automatically when the service scales out.

For workloads without a Service, for example instances exposed outside the cluster via HostPort, we can use a PodMonitor for discovery instead. A sample follows:

--- # PodMonitor auto-discovery rule (supported by recent versions; older versions may not have it)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # use the value of the 'metrics' label as the job label, i.e. the series get job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # auto-discover Pods that carry the label metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; more than one can be listed
    - my-namespace
  podMetricsEndpoints:
  - port: http-metrics # the name of the metrics port in the Pod yaml, i.e. the containerPort name
    interval: 15s # scrape interval
--- # template of the Pod to monitor
apiVersion: v1
kind: Pod
metadata:
  labels:
    metrics: jmx-metrics
  name: jmx-metrics
  namespace: my-namespace
spec:
  containers:
  - image: tomcat:9.0
    name: tomcat
    ports:
    - containerPort: 9093
      name: http-metrics
  3. Grant the Prometheus serviceAccount permissions in the corresponding namespace
--- # create a Role in the target namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: my-namespace
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
--- # bind the prometheus-k8s ServiceAccount to the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s # the serviceAccount used by the Prometheus pods; kube-prometheus uses prometheus-k8s by default
  namespace: monitoring
  4. Check service discovery in the Prometheus UI

Once discovery is configured successfully, the new targets appear in the Prometheus web UI.

  5. Add alerting rules

    Create a new alert rules file jvm-alert-rules.yaml with the following content:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: jvm-metrics-rules
      namespace: monitoring
    spec:
      groups:
      - name: jvm-metrics-rules
        rules:
        # GC took more than 10% of the time over the last 5 minutes
        - alert: GcTimeTooMuch
          expr: increase(jvm_gc_collection_seconds_sum[5m]) > 30
          for: 5m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} spends more than 10% of time in GC"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} GC time ratio above 10%, current value ({{ $value }}%)"
        # Too many GCs
        - alert: GcCountTooMuch
          expr: increase(jvm_gc_collection_seconds_count[1m]) > 30
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} more than 30 GCs per minute"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 30 GCs per minute, current value ({{ $value }})"
        # Too many full GCs
        - alert: FgcCountTooMuch
          expr: increase(jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep"}[1h]) > 3
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} more than 3 full GCs per hour"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 3 full GCs per hour, current value ({{ $value }})"
        # Non-heap memory usage above 80%
        - alert: NonheapUsageTooMuch
          expr: jvm_memory_bytes_used{job="jmx-metrics", area="nonheap"} / jvm_memory_bytes_max * 100 > 80
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} non-heap memory usage > 80%"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} non-heap memory usage above 80%, current value ({{ $value }}%)"
        # Memory usage warning
        - alert: HeighMemUsage
          expr: process_resident_memory_bytes{job="jmx-metrics"} / os_total_physical_memory_bytes * 100 > 85
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} RSS memory usage above 85%"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} RSS memory usage above 85%, current value ({{ $value }}%)"

    Run kubectl apply -f jvm-alert-rules.yaml to make the rules take effect.

  6. Add alert receivers

    Edit the receiver configuration:

global:
  resolve_timeout: 5m
route:
  group_by: ['job', 'alertname', 'pod']
  group_interval: 2m
  receiver: my-alert-receiver
  routes:
  - match: 
      job: jmx-metrics
    receiver: my-alert-receiver
    repeat_interval: 3h
receivers:
- name: my-alert-receiver
  webhook_configs:
  - url: http://mywebhook.com/
    max_alerts: 1
    send_resolved: true

Base64-encode the config and fill it into the corresponding alertmanager Secret:
kubectl edit -n monitoring Secret alertmanager-main

apiVersion: v1
data:
  alertmanager.yaml: KICAgICJyZWNlaXZlciI6ICJudWxsIg== # paste the base64 value here
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
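One way to produce that base64 value from the edited receiver config (a sketch; alertmanager.yaml here is a local file holding the config shown above):

base64 -w0 alertmanager.yaml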

After you save and exit the editor, the change takes effect after a short while.

With that, the JVM monitoring setup is complete.

Appendix: sample output from the jvm-exporter endpoint; take whichever metrics you need from it.

# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 218.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 40.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 219.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 249.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_threads_state Current count of threads by state
# TYPE jvm_threads_state gauge
jvm_threads_state{state="NEW",} 0.0
jvm_threads_state{state="RUNNABLE",} 49.0
jvm_threads_state{state="TIMED_WAITING",} 141.0
jvm_threads_state{state="TERMINATED",} 0.0
jvm_threads_state{state="WAITING",} 28.0
jvm_threads_state{state="BLOCKED",} 0.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_261-b12",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.553562144E9
jvm_memory_bytes_used{area="nonheap",} 6.5181496E7
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 4.08027136E9
jvm_memory_bytes_committed{area="nonheap",} 6.8747264E7
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.08027136E9
jvm_memory_bytes_max{area="nonheap",} 1.317011456E9
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 4.294967296E9
jvm_memory_bytes_init{area="nonheap",} 2555904.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 2.096832E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 3.9320064E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 4893112.0
jvm_memory_pool_bytes_used{pool="Par Eden Space",} 1.71496168E8
jvm_memory_pool_bytes_used{pool="Par Survivor Space",} 7.1602832E7
jvm_memory_pool_bytes_used{pool="CMS Old Gen",} 1.310463144E9
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.3396352E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 4.0239104E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 5111808.0
jvm_memory_pool_bytes_committed{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_committed{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_committed{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} 5.36870912E8
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 5.28482304E8
jvm_memory_pool_bytes_max{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_max{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_max{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_init{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_init{pool="CMS Old Gen",} 2.147483648E9
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP os_free_physical_memory_bytes FreePhysicalMemorySize (java.lang<type=OperatingSystem><>FreePhysicalMemorySize)
# TYPE os_free_physical_memory_bytes gauge
os_free_physical_memory_bytes 9.1234304E8
# HELP os_committed_virtual_memory_bytes CommittedVirtualMemorySize (java.lang<type=OperatingSystem><>CommittedVirtualMemorySize)
# TYPE os_committed_virtual_memory_bytes gauge
os_committed_virtual_memory_bytes 2.2226296832E10
# HELP os_total_swap_space_bytes TotalSwapSpaceSize (java.lang<type=OperatingSystem><>TotalSwapSpaceSize)
# TYPE os_total_swap_space_bytes gauge
os_total_swap_space_bytes 0.0
# HELP os_max_file_descriptor_count MaxFileDescriptorCount (java.lang<type=OperatingSystem><>MaxFileDescriptorCount)
# TYPE os_max_file_descriptor_count gauge
os_max_file_descriptor_count 1048576.0
# HELP os_system_load_average SystemLoadAverage (java.lang<type=OperatingSystem><>SystemLoadAverage)
# TYPE os_system_load_average gauge
os_system_load_average 4.97
# HELP os_total_physical_memory_bytes TotalPhysicalMemorySize (java.lang<type=OperatingSystem><>TotalPhysicalMemorySize)
# TYPE os_total_physical_memory_bytes gauge
os_total_physical_memory_bytes 1.073741824E10
# HELP os_system_cpu_load SystemCpuLoad (java.lang<type=OperatingSystem><>SystemCpuLoad)
# TYPE os_system_cpu_load gauge
os_system_cpu_load 1.0
# HELP os_free_swap_space_bytes FreeSwapSpaceSize (java.lang<type=OperatingSystem><>FreeSwapSpaceSize)
# TYPE os_free_swap_space_bytes gauge
os_free_swap_space_bytes 0.0
# HELP os_available_processors AvailableProcessors (java.lang<type=OperatingSystem><>AvailableProcessors)
# TYPE os_available_processors gauge
os_available_processors 6.0
# HELP os_process_cpu_load ProcessCpuLoad (java.lang<type=OperatingSystem><>ProcessCpuLoad)
# TYPE os_process_cpu_load gauge
os_process_cpu_load 0.14194299011052938
# HELP os_open_file_descriptor_count OpenFileDescriptorCount (java.lang<type=OperatingSystem><>OpenFileDescriptorCount)
# TYPE os_open_file_descriptor_count gauge
os_open_file_descriptor_count 717.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.004494197
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jmx_scrape_cached_beans Number of beans with their matching rule cached
# TYPE jmx_scrape_cached_beans gauge
jmx_scrape_cached_beans 0.0
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="direct",} 61.0
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="ParNew",} 77259.0
jvm_gc_collection_seconds_sum{gc="ParNew",} 2399.831
jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep",} 1.0
jvm_gc_collection_seconds_sum{gc="ConcurrentMarkSweep",} 0.29
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1759604.89
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.608630226597E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 717.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.2226292736E10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.644765696E9
# HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
# TYPE jmx_exporter_build_info gauge
jmx_exporter_build_info{version="0.14.0",name="jmx_prometheus_javaagent",} 1.0
# HELP jvm_memory_pool_allocated_bytes_total Total bytes allocated in a given JVM memory pool. Only updated after GC, not continuously.
# TYPE jvm_memory_pool_allocated_bytes_total counter
jvm_memory_pool_allocated_bytes_total{pool="Par Survivor Space",} 1.42928399936E11
jvm_memory_pool_allocated_bytes_total{pool="CMS Old Gen",} 2.862731656E9
jvm_memory_pool_allocated_bytes_total{pool="Code Cache",} 2.8398656E7
jvm_memory_pool_allocated_bytes_total{pool="Compressed Class Space",} 4912848.0
jvm_memory_pool_allocated_bytes_total{pool="Metaspace",} 3.9438872E7
jvm_memory_pool_allocated_bytes_total{pool="Par Eden Space",} 1.32737951722432E14
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 7282.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 7317.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 35.0

