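How statsd exporter bypasses Prometheus's metric checks
Building some unusual metrics in statsd exporter
statsd exporter is an exporter officially provided by Prometheus that converts StatsD metrics into Prometheus metrics.
In statsd exporter you can create series that share a metric name but carry different label sets. For example: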
qae_app_request_latencies_count{app="mrpink",role="web",task="default"} 1
qae_app_request_latencies_count{app="mrpink",itf="count_by_item",role="service",service_type="thrift",task="ThriftSvcInstance"} 1
Let's walk through a simple example.
Prepare the statsd exporter config file
- exporter_config.yaml
defaults:
  # Use glob matching for all mappings
  match_type: glob
  glob_disable_ordering: false
  ttl: 5m
mappings:
  - match: "qae.*.web.*.req_time"
    name: "qae_app_request_latencies"
    match_metric_type: timer
    timer_type: histogram
    labels:
      app: "$1"
      role: "web"
      task: "$2"
  # $1, $2, etc. in labels are the groups captured by match
  - match: "qae.*.service.*.*.*.req_time"
    name: "qae_app_request_latencies"
    match_metric_type: timer
    timer_type: histogram
    labels:
      app: "$1"
      role: "service"
      task: "$2"
      itf: "$4"
      service_type: "thrift"
  # Every other metric that does not follow the convention is dropped
  - match: "."
    match_type: regex
    action: drop
    name: "dropped"
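To make the capture groups concrete, here is a minimal sketch of how a glob such as "qae.*.web.*.req_time" could be turned into labels. It is not statsd_exporter's actual matcher: globToRegexp and expandLabels are hypothetical helpers, and the sketch assumes every * captures exactly one dot-free component.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp turns a glob such as "qae.*.web.*.req_time" into an anchored
// regexp in which every "*" captures one dot-free component.
// Simplified stand-in, not statsd_exporter's real matcher.
func globToRegexp(glob string) *regexp.Regexp {
	parts := strings.Split(glob, ".")
	for i, p := range parts {
		if p == "*" {
			parts[i] = `([^.]+)`
		} else {
			parts[i] = regexp.QuoteMeta(p)
		}
	}
	return regexp.MustCompile("^" + strings.Join(parts, `\.`) + "$")
}

// expandLabels matches name against glob and replaces $1, $2, ... in the
// label templates with the captured components. ok is false on no match.
func expandLabels(glob, name string, templates map[string]string) (map[string]string, bool) {
	re := globToRegexp(glob)
	idx := re.FindStringSubmatchIndex(name)
	if idx == nil {
		return nil, false
	}
	labels := make(map[string]string, len(templates))
	for key, tmpl := range templates {
		labels[key] = string(re.ExpandString(nil, tmpl, name, idx))
	}
	return labels, true
}

func main() {
	labels, ok := expandLabels(
		"qae.*.service.*.*.*.req_time",
		"qae.mrpink.service.ThriftSvcInstance.ThriftSvcInstance.count_by_item.req_time",
		map[string]string{"app": "$1", "role": "service", "task": "$2", "itf": "$4", "service_type": "thrift"},
	)
	fmt.Println(ok, labels)
}
```

With the service mapping above, $1 resolves to the app name, $2 to the task, and $4 to the interface name, which matches the label values shown in the example series.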
Start statsd exporter
Run go build in the statsd_exporter source directory, then start the binary:
$ go build -o /tmp/exporter . && /tmp/exporter --statsd.mapping-config=exporter_config.yaml --log.level=debug
ts=2023-05-31T23:42:11.802Z caller=main.go:292 level=info msg="Starting StatsD -> Prometheus Exporter" version="(version=, branch=, revision=f40cab3899c8effd25a5daee6f69e30eea796a96-modified)"
ts=2023-05-31T23:42:11.802Z caller=main.go:293 level=info msg="Build context" context="(go=go1.18.8, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2023-05-31T23:42:11.803Z caller=main.go:342 level=info msg="Accepting StatsD Traffic" udp=:9125 tcp=:9125 unixgram=
ts=2023-05-31T23:42:11.803Z caller=main.go:343 level=info msg="Accepting Prometheus Requests" addr=:9102
# As the log shows, the StatsD server listens on port 9125 and the Prometheus server on port 9102
Send metrics to statsd exporter with a StatsD client
- Code that sends the metrics
package main

import (
	"log"
	"time"

	"github.com/cactus/go-statsd-client/v5/statsd"
)

// sendMetrics emits one timer per mapping defined in exporter_config.yaml.
func sendMetrics(client statsd.Statter) {
	err := client.TimingDuration("web.default.req_time", time.Millisecond*500, 1)
	if err != nil {
		log.Fatalln(err)
	}
	err = client.TimingDuration("service.ThriftSvcInstance.ThriftSvcInstance.count_by_item.req_time", time.Millisecond*300, 1)
	if err != nil {
		log.Fatalln(err)
	}
}

func main() {
	config := &statsd.ClientConfig{
		Address: "127.0.0.1:9125",
		Prefix:  "qae.mrpink",
	}
	client, err := statsd.NewClientWithConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	sendMetrics(client)
}
Running the code above sends two timer metrics to statsd exporter:
qae.mrpink.web.default.req_time:500|ms
qae.mrpink.service.ThriftSvcInstance.ThriftSvcInstance.count_by_item.req_time:300|ms
Check the Prometheus metrics exposed by statsd exporter
$ curl -s localhost:9102/metrics | rg qae_app_request_latencies_count
qae_app_request_latencies_count{app="mrpink",role="web",task="default"} 1
qae_app_request_latencies_count{app="mrpink",itf="count_by_item",role="service",service_type="thrift",task="ThriftSvcInstance"} 1
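As you can see, statsd exporter holds two series with the same name, qae_app_request_latencies_count, but with different label sets.
Building the same metrics with Prometheus client_golang fails
If we try to expose the same metrics with client_golang, the official Go client provided by Prometheus, we run into an error: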
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	AppRequestLatencies = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name: "qae_app_request_latencies",
		Help: "web request count and req time latency",
	}, []string{"app", "role", "task"})
	// Same metric name, different help string and label names:
	// this second registration is what triggers the panic.
	ThriftAppRequestLatencies = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name: "qae_app_request_latencies",
		Help: "service request count and req time latency",
	}, []string{"app", "role", "task", "itf", "service_type"})
)

func SendMetrics() {
	dur1 := time.Duration(300) * time.Millisecond
	AppRequestLatencies.With(map[string]string{
		"app":  "mrpink",
		"role": "web",
		"task": "default",
	}).Observe(dur1.Seconds())
	dur2 := time.Duration(500) * time.Millisecond
	ThriftAppRequestLatencies.With(map[string]string{
		"app":          "mrpink",
		"role":         "service",
		"task":         "ThriftSvcInstance",
		"service_type": "thrift",
		"itf":          "count_by_item",
	}).Observe(dur2.Seconds())
	log.Println("Send metrics done")
}

func main() {
	addr := "0.0.0.0:8090"
	promMux := http.NewServeMux()
	promMux.Handle("/metrics", promhttp.Handler())
	go func() {
		time.Sleep(time.Second * 3)
		SendMetrics()
	}()
	// Log before ListenAndServe, which blocks until the server stops.
	log.Printf("Start prom server on %v\n", addr)
	err := http.ListenAndServe(addr, promMux)
	if err != nil {
		log.Fatalf("Starting the http server %v failed: %v\n", addr, err)
	}
}
Running the code above panics:
$ go run prom_client/main.go
panic: a previously registered descriptor with the same fully-qualified name as Desc{fqName: "qae_app_request_latencies", help: "service request count and req time latency", constLabels: {}, variableLabels: [{app <nil>} {role <nil>} {task <nil>} {itf <nil>} {service_type <nil>}]} has different label names or a different help string
goroutine 1 [running]:
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0x0?, {0xc000118010?, 0x1, 0x0?})
/home/xuyundong/.gvm/pkgsets/go1.18.8/global/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/registry.go:405 +0x7f
github.com/prometheus/client_golang/prometheus/promauto.Factory.NewHistogramVec({{0x922410?, 0xc0000a2960?}}, {{0x0, 0x0}, {0x0, 0x0}, {0x87e012, 0x19}, {0x88794e, 0x2a}, ...}, ...)
/home/xuyundong/.gvm/pkgsets/go1.18.8/global/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promauto/auto.go:362 +0x1cc
github.com/prometheus/client_golang/prometheus/promauto.NewHistogramVec(...)
/home/xuyundong/.gvm/pkgsets/go1.18.8/global/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promauto/auto.go:235
main.init()
/home/xuyundong/Github/Golang/statsd_client/prom_client/main.go:17 +0x269
exit status 2
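The check logic in Prometheus client_golang
This made me curious how statsd exporter manages to expose metrics with the same name but different labels. After a quick read through the source code of client_golang and statsd_exporter, I found the answer.
The versions of the code I read are:
client_golang defines two interfaces
In client_golang every metric type implements prometheus.Collector. All metrics are registered with an instance that implements the prometheus.Registerer interface, and that instance then gathers and outputs the metrics.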
The code we write to define a metric is:
opts := prometheus.HistogramOpts{
	Name: "qae_app_request_latencies",
	Help: "web request count and req time latency",
}
labels := []string{"app", "role", "task"}
AppRequestLatencies = promauto.NewHistogramVec(opts, labels)
What client_golang actually executes is shown below: it creates a Histogram metric and registers it with DefaultRegisterer. DefaultRegisterer is simply an instance that implements the prometheus.Registerer interface.
h := prometheus.NewHistogramVec(opts, labels)
prometheus.DefaultRegisterer.MustRegister(h)
// MustRegister calls Register under the hood and panics if Register returns an error
// prometheus.DefaultRegisterer.Register(h)
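The prometheus.Collector interface has a Describe method that is used to obtain information about a metric. It takes a channel as its argument, and the metric writes its Desc information to that channel.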
type Collector interface {
	// Desc carries the metric name, labels, help text, and so on
	Describe(chan<- *Desc)
	// ...
}
Next, let's look at how the Register function of DefaultRegisterer is implemented:
// Unrelated parts are omitted from the code below
func (r *Registry) Register(c Collector) error {
	// c is the Collector being registered
	var (
		descChan = make(chan *Desc, capDescChan)
	)
	go func() {
		c.Describe(descChan)
		close(descChan)
	}()
	r.mtx.Lock()
	defer func() {
		// descChan must be drained to avoid leaking the goroutine
		for range descChan {
		}
		r.mtx.Unlock()
	}()
	// ...
	for desc := range descChan {
		// desc.id is a hash computed from the metric name and the const label values.
		// If two registered metrics have exactly the same name and const label values,
		// the error duplicateDescErr is recorded
		if _, exists := r.descIDs[desc.id]; exists {
			duplicateDescErr = fmt.Errorf("descriptor %s already exists with the same fully-qualified name and const label values", desc)
		}
		// fqName is the metric name
		// dimHash is a hash computed from the metric's label names plus its help string
		// If two metrics share a name but differ in label names or help string, an error is returned
		if dimHash, exists := r.dimHashesByName[desc.fqName]; exists {
			if dimHash != desc.dimHash {
				return fmt.Errorf("a previously registered descriptor with the same fully-qualified name as %s has different label names or a different help string", desc)
			}
		} else {
			// ...
		}
	}
	// ...
}
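As the code above shows, Register obtains the metric's information through the Describe method and runs several checks on it. If two metrics with the same name have different label names (or different help strings), it returns the error "a previously registered descriptor with the same fully-qualified name as Desc... has different label names or a different help string".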
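The name-to-dimHash check can be reproduced with a small toy model. This is a sketch under stated assumptions, not client_golang's code: the registry type, the register method, and the FNV-based dimHash below are all hypothetical, and the first check (duplicate desc.id) is left out on purpose.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// dimHash hashes the help string together with the sorted label names.
// The real library uses its own hashing; FNV is only an illustration.
func dimHash(help string, labelNames []string) uint64 {
	names := append([]string(nil), labelNames...)
	sort.Strings(names)
	h := fnv.New64a()
	h.Write([]byte(help))
	h.Write([]byte{0xff}) // separator between help and label names
	h.Write([]byte(strings.Join(names, "\xff")))
	return h.Sum64()
}

// registry is a toy stand-in that keeps only the name -> dimHash map.
type registry struct {
	dimHashesByName map[string]uint64
}

var errDimMismatch = errors.New("same fully-qualified name, different label names or help string")

// register rejects a name that is already known with a different dimHash.
func (r *registry) register(name, help string, labelNames []string) error {
	h := dimHash(help, labelNames)
	if prev, exists := r.dimHashesByName[name]; exists && prev != h {
		return errDimMismatch
	}
	r.dimHashesByName[name] = h
	return nil
}

func main() {
	r := &registry{dimHashesByName: map[string]uint64{}}
	fmt.Println(r.register("qae_app_request_latencies", "latency", []string{"app", "role", "task"}))
	fmt.Println(r.register("qae_app_request_latencies", "latency", []string{"app", "role", "task", "itf", "service_type"}))
}
```

The first registration succeeds and the second returns the mismatch error, which is the situation MustRegister turns into a panic.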
How statsd exporter skips the check
So how does statsd exporter skip the checks above? The trick is simple: make Describe send nothing. Every metric in statsd exporter is wrapped in an uncheckedCollector.
uncheckedCollector is defined as follows:
type uncheckedCollector struct {
	c prometheus.Collector
}

func (u uncheckedCollector) Describe(_ chan<- *prometheus.Desc) {}
func (u uncheckedCollector) Collect(c chan<- prometheus.Metric) {
	u.c.Collect(c)
}
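Its Describe method sends no descriptors, so the loop in Register has nothing to inspect and the checks are skipped.
This is how statsd exporter ends up registering metrics with the same name but different labels.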
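The effect of an empty Describe can be seen in a pared-down model of the registration loop. Everything here is a hypothetical stand-in written for illustration (desc, collector, vec, unchecked, registry); it only mimics the shape of the real interfaces and uses nothing from client_golang.

```go
package main

import "fmt"

// desc is a reduced stand-in for prometheus.Desc.
type desc struct {
	name   string
	labels string // label names joined into one string for easy comparison
}

// collector is a reduced stand-in for prometheus.Collector.
type collector interface {
	describe(ch chan<- desc)
}

// vec always reports its descriptor, like a regular metric vector.
type vec struct{ d desc }

func (v vec) describe(ch chan<- desc) { ch <- v.d }

// unchecked wraps a collector and reports nothing from describe.
type unchecked struct{ c collector }

func (unchecked) describe(chan<- desc) {}

type registry struct{ labelsByName map[string]string }

// register drains the collector's descriptors and rejects a name that is
// already registered with different label names.
func (r *registry) register(c collector) error {
	ch := make(chan desc, 10)
	go func() {
		c.describe(ch)
		close(ch)
	}()
	for d := range ch {
		if prev, ok := r.labelsByName[d.name]; ok && prev != d.labels {
			return fmt.Errorf("%s: different label names (%s vs %s)", d.name, prev, d.labels)
		}
		r.labelsByName[d.name] = d.labels
	}
	return nil
}

func main() {
	web := vec{desc{"qae_app_request_latencies", "app,role,task"}}
	svc := vec{desc{"qae_app_request_latencies", "app,itf,role,service_type,task"}}

	checked := &registry{labelsByName: map[string]string{}}
	fmt.Println(checked.register(web), checked.register(svc)) // second call fails

	loose := &registry{labelsByName: map[string]string{}}
	fmt.Println(loose.register(unchecked{web}), loose.register(unchecked{svc})) // both succeed
}
```

Because the wrapped collectors never send a descriptor, the range loop in register runs zero times and no comparison ever happens, which is the same reason the real Register accepts them.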