A First Look at Prometheus, Part 1

March 16, 2019

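Overview

Key Features

A multi-dimensional data model in which each time series is identified by a metric name and key/value pairs.
PromQL, a flexible query language that takes advantage of this dimensionality.
No reliance on distributed storage; a single server node works on its own.
Time series are collected with a pull model over HTTP.
Pushing time series is supported through an intermediary gateway.
Targets are found through service discovery or static configuration.
Multiple modes of graphing and dashboard support.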

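Components

The Prometheus ecosystem consists of multiple components, many of which are optional.

The Prometheus server, which scrapes and stores time series data.
Client libraries for instrumenting application code.
A push gateway for supporting short-lived jobs.
Special-purpose exporters for services such as HAProxy, StatsD, and Graphite.
An alertmanager that handles alerts.
Various other support tools.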

Architecture

This diagram illustrates the architecture of Prometheus and some of its ecosystem components.

Prometheus architecture

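When Does It Fit?

Prometheus works well for recording purely numeric time series. It suits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures, and its support for multi-dimensional data collection and querying is a particular strength.

When Does It Not Fit?

Prometheus is not a good fit if you need 100% complete and accurate data, for example for per-request billing, because samples can be lost during collection. Its design favors staying usable while other parts of the system are failing over guaranteeing that every data point is captured.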

First Steps with Prometheus

Download the latest stable release for your platform. At the time of writing, for Linux on amd64: https://github.com/prometheus/prometheus/releases/download/v2.7.2/prometheus-2.7.2.linux-amd64.tar.gz

tar xvfz prometheus-*.tar.gz
cd prometheus-*

On Linux, the Prometheus server is a single executable binary. Run it with --help to list the available options.

[developer@prometheus-2.7.2.linux-amd64]$ ./prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
      --config.file="prometheus.yml"
                                 Prometheus configuration file path.
      --web.listen-address="0.0.0.0:9090"
                                 Address to listen on for UI, API, and telemetry.
      --web.read-timeout=5m      Maximum duration before timing out read of the request, and closing idle connections.
      --web.max-connections=512  Maximum number of simultaneous connections.
      --web.external-url=<URL>   The URL under which Prometheus is externally reachable (for example, if Prometheus is
                                 served via a reverse proxy). Used for generating relative and absolute links back to
                                 Prometheus itself. If the URL has a path portion, it will be used to prefix all HTTP
                                 endpoints served by Prometheus. If omitted, relevant URL components will be derived
                                 automatically.
      --web.route-prefix=<path>  Prefix for the internal routes of web endpoints. Defaults to path of --web.external-url.
      --web.user-assets=<path>   Path to static asset directory, available at /user.

Configuring Prometheus

Prometheus is configured through prometheus.yml.

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
  - static_configs:
    - targets:

rule_files:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

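The configuration has four blocks: global, alerting, rule_files, and scrape_configs. The global block holds server-wide settings, and two are set here. scrape_interval controls how often Prometheus scrapes its targets; it can be overridden for individual targets. evaluation_interval controls how often Prometheus evaluates rules. Prometheus uses rules to create new time series and to generate alerts.

The alerting block configures the Alertmanager.

The rule_files block lists the locations of any rules the Prometheus server should load. For now there are none.

The scrape_configs block defines which resources Prometheus monitors. Because Prometheus exposes its own metrics on an HTTP endpoint, it can scrape and monitor its own health. The default configuration contains a single job named prometheus that collects the metrics Prometheus produces about itself; static_configs lists the addresses (host and port) at which the targets are exposed.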

Starting Prometheus

./prometheus --config.file=prometheus.yml

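Open http://localhost:9090 to check that the server started. With the configuration above, Prometheus scrapes itself every 15 seconds at http://localhost:9090/metrics, so give it a short while to collect some data.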

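Using the Expression Browser

Open http://localhost:9090/graph to run queries from the web UI. For example, enter promhttp_metric_handler_requests_total to see the current value of that metric. The metric carries several labels, so promhttp_metric_handler_requests_total{code="200"} restricts the result to requests that returned HTTP status code 200, and count(promhttp_metric_handler_requests_total) counts how many time series (distinct label combinations) the metric has.

For more on the expression language, see the documentation at https://prometheus.io/docs/querying/basics/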

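Using the Graphing Interface

On the Graph tab, the following expression graphs the per-second rate of requests to Prometheus that returned HTTP status code 200, averaged over the last minute.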

rate(promhttp_metric_handler_requests_total{code="200"}[1m])
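To make the idea behind rate() more concrete, here is a minimal Go sketch. It is not Prometheus's implementation: the sample type and the simpleRate function are hypothetical names, and the real rate() also extrapolates to the edges of the time window. The sketch only shows the core idea, which is to sum the increases between consecutive samples, treat a drop in value as a counter reset, and divide by the elapsed time.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair for one counter series.
type sample struct {
	ts    float64 // seconds since the epoch
	value float64 // counter value at ts
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset, so the value after the
// reset counts as new increase. It returns 0 for fewer than two samples.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].value - samples[i-1].value
		if delta < 0 {
			// Counter reset: the counter restarted from zero.
			delta = samples[i].value
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Four scrapes, 15 seconds apart, with a reset before the last one.
	window := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 10}}
	fmt.Printf("%.3f requests/second\n", simpleRate(window))
}
```

The reset handling is why rate() is meant for counters: a gauge that legitimately goes down would be misread as a restart.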

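Monitoring Other Targets

Looking only at Prometheus's own metrics does not show what it can really do. To monitor host-level metrics, see the guide "Monitoring Linux or macOS host metrics using a node exporter".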

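Concepts

Data Model

Metric Names and Labels

A metric name must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. The colon is a reserved character.
A label name must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use.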
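The two naming rules can be checked with Go's regexp package. This is a small sketch: the function names validMetricName and validUserLabelName are hypothetical, and the patterns are the ones quoted above, anchored so the whole name has to match.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Patterns from the data model, anchored to the full string.
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validMetricName reports whether name is a syntactically valid metric name.
func validMetricName(name string) bool {
	return metricNameRE.MatchString(name)
}

// validUserLabelName reports whether name is a valid label name that is
// not in the "__" namespace reserved for internal use.
func validUserLabelName(name string) bool {
	return labelNameRE.MatchString(name) && !strings.HasPrefix(name, "__")
}

func main() {
	for _, n := range []string{"http_requests_total", "job:request_rate", "2xx_count"} {
		fmt.Printf("metric %q valid: %t\n", n, validMetricName(n))
	}
	for _, n := range []string{"code", "__name__", "status-code"} {
		fmt.Printf("label %q usable: %t\n", n, validUserLabelName(n))
	}
}
```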

Samples

Each sample consists of:

A float64 value.
A timestamp with millisecond precision.

Element	Value
prometheus_http_request_duration_seconds_count{handler="/api/v1/query",instance="10.40.10.203:9090",job="prometheus"}	146 @1552299953.836

Notation

A time series is written as <metric name>{<label name>=<label value>, ...}.
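As an illustration of this notation, the following Go sketch renders a metric name and a label set as a single string. formatSeries is a hypothetical helper, not part of any Prometheus library. It sorts the labels by name so the output is stable, and it relies on Go's %q verb for quoting, which is close to but not guaranteed to be identical to Prometheus's own escaping rules.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSeries renders a metric name and its labels in the
// <metric name>{<label name>="<label value>", ...} notation.
// Labels are sorted by name so the output is deterministic.
func formatSeries(name string, labels map[string]string) string {
	if len(labels) == 0 {
		return name
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q wraps the value in double quotes and escapes special characters.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	fmt.Println(formatSeries("promhttp_metric_handler_requests_total",
		map[string]string{"code": "200", "job": "prometheus"}))
}
```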

Metric Types

The Prometheus client libraries offer four core metric types.

Counter

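A counter is a cumulative metric whose value only ever increases, unless it is reset (for example when the process restarts). When naming a counter, the _total suffix is recommended.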

Gauge

A gauge is a metric whose value can go up and down. Gauges describe the current state of a system, so their samples can both increase and decrease.
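The difference between the two types can be modelled in a few lines of Go. This is a simplified sketch, not the client library's implementation: the counter and gauge types here are hypothetical, are not safe for concurrent use, and return an error on a negative increment where a real client library may behave differently.

```go
package main

import (
	"errors"
	"fmt"
)

// counter is a hypothetical model of a Counter: it can only go up.
type counter struct{ value float64 }

// Add increases the counter; negative deltas are rejected.
func (c *counter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.value += delta
	return nil
}

// gauge is a hypothetical model of a Gauge: it can move in both directions.
type gauge struct{ value float64 }

// Set replaces the current value.
func (g *gauge) Set(v float64) { g.value = v }

// Add changes the value by delta, which may be negative.
func (g *gauge) Add(delta float64) { g.value += delta }

func main() {
	var requests counter
	_ = requests.Add(1)
	_ = requests.Add(2)
	err := requests.Add(-1) // rejected: counters never go down
	fmt.Println("requests_total:", requests.value, "rejected negative add:", err != nil)

	var inFlight gauge
	inFlight.Set(5)
	inFlight.Add(-2) // fine: gauges track a current level
	fmt.Println("in_flight_requests:", inFlight.value)
}
```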

Histogram

Histograms and summaries are mainly used to measure and analyze how samples are distributed.
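A Prometheus histogram exposes cumulative buckets, where each bucket counts the observations less than or equal to its upper bound, together with a running sum and count. The Go sketch below models that behaviour under simplifying assumptions: the histogram type and its methods are hypothetical names, the code is not safe for concurrent use, and the bucket bounds in main are arbitrary.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram is a hypothetical model of a cumulative-bucket histogram.
type histogram struct {
	upperBounds []float64 // sorted upper bounds, excluding the implicit +Inf
	counts      []uint64  // per-bucket counts; the last slot is the +Inf bucket
	sum         float64   // sum of all observed values
	count       uint64    // number of observations
}

// newHistogram creates a histogram with the given bucket upper bounds.
func newHistogram(upperBounds []float64) *histogram {
	bounds := append([]float64(nil), upperBounds...)
	sort.Float64s(bounds)
	return &histogram{upperBounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Observe records v in the first bucket whose upper bound is >= v,
// or in the +Inf bucket if v exceeds every bound.
func (h *histogram) Observe(v float64) {
	i := sort.SearchFloat64s(h.upperBounds, v)
	h.counts[i]++
	h.sum += v
	h.count++
}

// cumulative returns the bucket counts in cumulative form: each entry
// includes every observation less than or equal to that bucket's bound.
func (h *histogram) cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

func main() {
	h := newHistogram([]float64{0.1, 0.5, 1})
	for _, v := range []float64{0.05, 0.3, 0.4, 0.9, 2.5} {
		h.Observe(v)
	}
	// Buckets are le=0.1, le=0.5, le=1, le=+Inf.
	fmt.Println("cumulative buckets:", h.cumulative(), "count:", h.count)
}
```

Because the buckets are cumulative, the last (+Inf) bucket always equals the total observation count.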

Best Practices

Guides

Instrumenting a Go Application

Add the following to your code:

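package main

import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
        // Expose the default registry's metrics (Go runtime and process metrics) at /metrics.
        http.Handle("/metrics", promhttp.Handler())
        // ListenAndServe only returns on failure, so log the error and exit.
        log.Fatal(http.ListenAndServe(":2112", nil))
}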

Start the program: go run main.go
View the metrics: curl http://localhost:2112/metrics

File-Based Service Discovery for `scrape_configs`

Add an entry under scrape_configs in prometheus.yml:

scrape_configs:
- job_name: 'redis3.2'
  file_sd_configs:
  - files:
    - 'redis_targets.json'

Create a file named redis_targets.json:

[
  {
    "labels": {
      "job": "redis205"
    },
    "targets": [
      "localhost:9100"
    ]
  }
]

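When redis_targets.json changes, there is no need to restart the server; Prometheus picks up the new targets automatically. For example, a second target can be added like this: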

[
  {
    "labels": {
      "job": "redis205"
    },
    "targets": [
      "localhost:9100"
    ]
  },
  {
    "labels": {
      "job": "redis206"
    },
    "targets": [
      "localhost:9200"
    ]
  }
]
