Prometheus Learning Series (32): Writing Client Libraries

This document covers what functionality and API Prometheus client libraries should offer. The aim is consistency across libraries, making the common use cases easy, and avoiding functionality that may lead users down the wrong path.

At the time of writing, 10 languages are already supported, so by now we have a good sense of how to write a client. These guidelines aim to help authors of new client libraries produce good libraries.

First, Conventions

MUST / MUST NOT / SHOULD / SHOULD NOT / MAY have the meanings given in https://www.ietf.org/rfc/rfc2119.txt

In addition, ENCOURAGED means that a feature is desirable for a library to have, but it is acceptable if it is not present. In other words, a nice-to-have.

Things to keep in mind:

  • Take advantage of each language's features.
  • The common use cases should be easy.
  • The correct way to do something should be the easy way.
  • More complex use cases should be possible.

The common use cases are (in order):

  • Counters without labels, spread liberally around libraries/applications
  • Timing functions/blocks of code in Summaries/Histograms
  • Gauges to track the current state of things
  • Monitoring of batch jobs

Second, Overall structure

Clients MUST be written to be callback-based internally. Clients SHOULD generally follow the structure described here.

The key class is the Collector. It has a method (typically called collect) that returns zero or more metrics and their samples. Collectors are registered with a CollectorRegistry. Data is exposed by passing a CollectorRegistry to a class/method/function "bridge", which returns the metrics in a format Prometheus supports. Every time the CollectorRegistry is scraped, it must call back into the collect method of each of its Collectors.
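The callback structure can be sketched in plain Java. This is a minimal illustration, not the simpleclient API: `Sample`, `Collector` and `CollectorRegistry` below are hypothetical stand-ins that leave out labels, metric types and help text. The point it shows is that the registry holds no values of its own and calls back into every registered collector on each scrape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical, simplified sample: labels, type and help are omitted.
final class Sample {
    final String name;
    final double value;

    Sample(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

interface Collector {
    // Returns zero or more samples; invoked again on every scrape.
    List<Sample> collect();
}

final class CollectorRegistry {
    // Thread-safe and insertion-ordered, so scrapes can run during registration.
    private final Set<Collector> collectors = new CopyOnWriteArraySet<>();

    void register(Collector c) {
        if (!collectors.add(c)) {
            throw new IllegalArgumentException("Collector already registered");
        }
    }

    void unregister(Collector c) {
        collectors.remove(c);
    }

    // What an exposition "bridge" would call on each scrape.
    List<Sample> collectAll() {
        List<Sample> out = new ArrayList<>();
        for (Collector c : collectors) {
            out.addAll(c.collect());
        }
        return out;
    }
}
```

Because each `new CollectorRegistry()` is independent, the same collector can be registered with more than one registry.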

The interfaces most users interact with are the Counter, Gauge, Summary and Histogram Collectors. Each represents a single metric, and together they should cover the vast majority of use cases where users instrument their own code.

More advanced use cases (such as proxying from another monitoring/instrumentation system) require writing a custom Collector. Someone may also want to write a "bridge" that takes a CollectorRegistry and produces data in a format that a different monitoring/instrumentation system understands, so that users only have to think about one instrumentation system.

CollectorRegistry SHOULD offer register()/unregister() functions, and a Collector SHOULD be allowed to be registered with multiple CollectorRegistrys.

The client library must be thread-safe.

For non-OO languages such as C, client libraries should follow the spirit of this structure as much as is practical.

2.1 Naming

Client libraries SHOULD follow the function/method/class names mentioned in this document, keeping in mind the naming conventions of the language they are written in. For example, set_to_current_time() is a good method name in Python, but SetToCurrentTime() is better in Go, and setToCurrentTime() is the convention in Java. Where names differ for technical reasons (for example, the language does not allow function overloading), documentation/help strings SHOULD point users towards the other names.

Libraries MUST NOT offer functions/methods/classes with the same or similar names to the ones given here but with different semantics.

Third, Metrics

The Counter, Gauge, Summary and Histogram metric types are the primary interface used by users.

Counter and Gauge MUST be part of the client library. At least one of Summary and Histogram MUST be offered.

These should primarily be used as file-static variables, that is, global variables defined in the same file as the code they instrument. The client library SHOULD enable this. The common use case is instrumenting a piece of code overall, not a piece of code in the context of one instance of an object. Users should not have to worry about plumbing their metrics throughout their code; the client library should do that for them (and if it does not, users will write a wrapper around the library to make it "easier" - which rarely goes well).

There MUST be a default CollectorRegistry, and the standard metrics MUST by default register into it implicitly, with no special work required from the user. There MUST be a way to have metrics not register with the default CollectorRegistry, for use in batch jobs and unit tests. Custom collectors SHOULD also follow this.

Exactly how metrics should be created varies greatly by language. For some (Java, Go) a builder approach is best, whereas for others (Python) function arguments are rich enough to do it in a single call.

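For example, in the Java simpleclient we would have:

```java
import io.prometheus.client.Counter;

class YourClass {
  // Registered with the default CollectorRegistry when the class is loaded.
  static final Counter requests = Counter.build()
      .name("requests_total")
      .help("Requests.").register();
}
```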

This registers requests with the default CollectorRegistry. By calling build() rather than register(), the metric will not be registered (handy for unit tests); you can also pass a CollectorRegistry to register() (handy for batch jobs).
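The "register by default, but allow opting out" pattern can be sketched with a JDK-only builder. `Registry`, `SimpleCounter` and the builder's `create()` method are hypothetical names for this sketch, with `create()` standing in for "build the metric without registering it"; this is not the simpleclient implementation. The sketch also rejects a missing name or help string when the metric is built.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical registry that only remembers what was registered with it.
final class Registry {
    static final Registry DEFAULT = new Registry();
    final List<SimpleCounter> metrics = new CopyOnWriteArrayList<>();
}

final class SimpleCounter {
    final String name;
    final String help;
    private final DoubleAdder value = new DoubleAdder();

    private SimpleCounter(String name, String help) {
        this.name = name;
        this.help = help;
    }

    void inc() { value.add(1); }

    double get() { return value.sum(); }

    static Builder build() { return new Builder(); }

    static final class Builder {
        private String name;
        private String help;

        Builder name(String name) { this.name = name; return this; }

        Builder help(String help) { this.help = help; return this; }

        // Creates the metric without registering it anywhere (handy for unit tests).
        SimpleCounter create() {
            if (name == null || help == null) {
                throw new IllegalStateException("name and help are required");
            }
            return new SimpleCounter(name, help);
        }

        // The common case: register with the default registry.
        SimpleCounter register() { return register(Registry.DEFAULT); }

        // Register with an explicit registry (handy for batch jobs).
        SimpleCounter register(Registry registry) {
            SimpleCounter c = create();
            registry.metrics.add(c);
            return c;
        }
    }
}
```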

3.1 Counter

A Counter (https://prometheus.io/docs/concepts/metric_types/#counter) is a monotonically increasing counter. It MUST NOT allow the value to decrease, but it MAY be reset to 0 (for example, when the server restarts).

A counter must have the following methods:

  • inc(): Increment the counter by 1
  • inc(double v): Increment the counter by the given amount. MUST check that v >= 0.

A counter is ENCOURAGED to have:

  • A way to count exceptions thrown/raised in a given piece of code, and optionally only certain types of exceptions. This is count_exceptions in Python.

Counters MUST start at 0.
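A minimal counter meeting these rules might look like the sketch below. It is a hypothetical JDK-only class, not the simpleclient `Counter`; it uses `java.util.concurrent.atomic.DoubleAdder`, which starts at 0. `countExceptions(Class, Runnable)` is an invented name modelled on Python's count_exceptions. The check is written as `!(v >= 0)` so that NaN is rejected along with negative values.

```java
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical minimal counter; not the simpleclient class.
final class Counter {
    private final DoubleAdder value = new DoubleAdder(); // starts at 0

    void inc() { inc(1.0); }

    void inc(double v) {
        // !(v >= 0) also rejects NaN, which a plain v < 0 check would let through.
        if (!(v >= 0)) {
            throw new IllegalArgumentException("Counters can only increase: " + v);
        }
        value.add(v);
    }

    double get() { return value.sum(); }

    // Counts exceptions of the given type thrown by body, then rethrows them.
    void countExceptions(Class<? extends RuntimeException> type, Runnable body) {
        try {
            body.run();
        } catch (RuntimeException e) {
            if (type.isInstance(e)) {
                inc();
            }
            throw e;
        }
    }
}
```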

3.2 Gauge

A Gauge represents a value that can go up and down.

A gauge MUST have the following methods:

  • inc(): Increment the gauge by 1
  • inc(double v): Increment the gauge by the given amount
  • dec(): Decrement the gauge by 1
  • dec(double v): Decrement the gauge by the given amount
  • set(double v): Set the gauge to the given value

Gauges MUST start at 0; you MAY offer a way for a given gauge to start at a different number.

A gauge SHOULD have the following methods:

  • set_to_current_time(): Set the gauge to the current Unix time in seconds.

A gauge is ENCOURAGED to have:

  • A way to track in-progress requests in some piece of code/function. This is track_inprogress in Python.
  • A way to time a piece of code and set the gauge to its duration in seconds. This is useful for batch jobs. This is startTimer/setDuration in Java and the time() decorator/context manager in Python. This SHOULD match the pattern in Summary/Histogram (though with set() rather than observe()).
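Unlike a counter, a gauge needs set(), which a `DoubleAdder` cannot perform atomically. A common JDK-only workaround, assumed here, is to store the double's raw bits in an `AtomicLong` and update it with a compare-and-set loop. This is a hypothetical sketch, not the simpleclient `Gauge`; `trackInProgress` and `setDuration(Runnable)` are invented names modelled on Python's track_inprogress and Java's startTimer/setDuration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical minimal gauge; not the simpleclient class.
final class Gauge {
    // The double is stored as raw bits so set() and inc() are both lock-free.
    private final AtomicLong bits = new AtomicLong(Double.doubleToLongBits(0.0));

    void inc() { inc(1.0); }

    void inc(double v) {
        long prev;
        long next;
        do {
            prev = bits.get();
            next = Double.doubleToLongBits(Double.longBitsToDouble(prev) + v);
        } while (!bits.compareAndSet(prev, next));
    }

    void dec() { inc(-1.0); }

    void dec(double v) { inc(-v); }

    void set(double v) { bits.set(Double.doubleToLongBits(v)); }

    double get() { return Double.longBitsToDouble(bits.get()); }

    // Current Unix time in seconds.
    void setToCurrentTime() { set(System.currentTimeMillis() / 1000.0); }

    // Counts how many callers are currently inside body.
    void trackInProgress(Runnable body) {
        inc();
        try {
            body.run();
        } finally {
            dec();
        }
    }

    // Sets the gauge to how long body took, in seconds.
    void setDuration(Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            set((System.nanoTime() - start) / 1e9);
        }
    }
}
```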

3.3 Summary

A summary samples observations (usually things like request durations) over sliding windows of time and provides instantaneous insight into their distributions, frequencies, and sums.

A summary MUST NOT allow the user to set "quantile" as a label name, because it is used internally to designate summary quantiles. A summary is ENCOURAGED to offer quantiles as exports, although these cannot be aggregated and tend to be slow. A summary MUST allow having no quantiles, since _count/_sum alone is quite useful, and this MUST be the default.

A summary MUST have the following methods:

  • observe(double v): Observe the given amount

A summary SHOULD have the following methods:

  • Some way for users to time code in seconds. In Python this is the time() decorator/context manager. In Java it is startTimer/observeDuration. Units other than seconds MUST NOT be offered (if users want something else, they can do it by hand). This should follow the same pattern as Gauge/Histogram.

Summary _count/_sum MUST start at 0.
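The default, quantile-free summary reduces to two adders. The hypothetical sketch below (not the simpleclient `Summary`) tracks only _count and _sum, rejects "quantile" as a label name, and times code in seconds through an invented `time(Runnable)` method. Because count and sum are updated separately, a concurrent scrape could briefly see them out of step; a production library may need to handle that.

```java
import java.util.List;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical quantile-free summary; not the simpleclient class.
final class Summary {
    private final LongAdder count = new LongAdder();   // exported as <name>_count
    private final DoubleAdder sum = new DoubleAdder(); // exported as <name>_sum

    Summary(List<String> labelNames) {
        if (labelNames.contains("quantile")) {
            throw new IllegalArgumentException("'quantile' is reserved for summary quantiles");
        }
    }

    void observe(double v) {
        count.increment();
        sum.add(v);
    }

    // Observes how long body took, always in seconds.
    void time(Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            observe((System.nanoTime() - start) / 1e9);
        }
    }

    long count() { return count.sum(); }

    double sum() { return sum.sum(); }
}
```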

3.4 Histogram

Histograms allow aggregatable distributions of events, such as request latencies. At their core is a counter per bucket.

A histogram MUST NOT allow le as a user-set label, because le is used internally to designate buckets.

A histogram MUST offer a way to choose the buckets manually. Ways to set buckets in a linear(start, width, count) and exponential(start, factor, count) fashion SHOULD be offered. count MUST exclude the +Inf bucket.
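The two bucket helpers are short enough to show in full. `Buckets.linear` and `Buckets.exponential` are hypothetical names for this sketch; both return exactly `count` upper bounds, leaving the +Inf bucket implicit as the rule requires.

```java
// Hypothetical bucket helpers; the +Inf bucket is never included in the result.
final class Buckets {
    // count bounds: start, start + width, start + 2 * width, ...
    static double[] linear(double start, double width, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        double[] bounds = new double[count];
        for (int i = 0; i < count; i++) {
            bounds[i] = start + i * width;
        }
        return bounds;
    }

    // count bounds: start, start * factor, start * factor^2, ...
    static double[] exponential(double start, double factor, int count) {
        if (count < 1 || start <= 0 || factor <= 1) {
            throw new IllegalArgumentException("need count >= 1, start > 0, factor > 1");
        }
        double[] bounds = new double[count];
        bounds[0] = start;
        for (int i = 1; i < count; i++) {
            bounds[i] = bounds[i - 1] * factor;
        }
        return bounds;
    }
}
```

For example, linear(1, 2, 3) yields {1.0, 3.0, 5.0} and exponential(1, 2, 4) yields {1.0, 2.0, 4.0, 8.0}.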

A histogram SHOULD have the same default buckets as other client libraries. Buckets MUST NOT be changeable once the metric has been created.

A histogram MUST have the following methods:

  • observe(double v): Observe the given amount

A histogram SHOULD have the following methods:

  • Some way for users to time code in seconds. In Python this is the time() decorator/context manager. In Java it is startTimer/observeDuration. Units other than seconds MUST NOT be offered (if users want something else, they can do it by hand). This should follow the same pattern as Gauge/Summary.

Histogram _count/_sum and the buckets MUST start at 0.
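A hypothetical histogram core (not the simpleclient `Histogram`) is sketched below. It keeps one non-cumulative counter per bucket plus one for +Inf, and turns them into the cumulative counts exposed as the _bucket series only at read time. An observation goes into the first bucket whose upper bound is greater than or equal to it, matching the "less than or equal" meaning of le. The bounds are copied and never changed after construction, and "le" is rejected as a label name.

```java
import java.util.List;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical minimal histogram; not the simpleclient class.
final class Histogram {
    private final double[] upperBounds;  // strictly increasing; +Inf is implied
    private final LongAdder[] buckets;   // non-cumulative, last slot is +Inf
    private final DoubleAdder sum = new DoubleAdder();

    Histogram(double[] upperBounds, List<String> labelNames) {
        if (labelNames.contains("le")) {
            throw new IllegalArgumentException("'le' is reserved for bucket bounds");
        }
        for (int i = 1; i < upperBounds.length; i++) {
            if (upperBounds[i] <= upperBounds[i - 1]) {
                throw new IllegalArgumentException("bucket bounds must be increasing");
            }
        }
        this.upperBounds = upperBounds.clone(); // fixed once the metric exists
        this.buckets = new LongAdder[upperBounds.length + 1];
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new LongAdder();
        }
    }

    void observe(double v) {
        int i = 0;
        while (i < upperBounds.length && v > upperBounds[i]) {
            i++;
        }
        buckets[i].increment(); // i == upperBounds.length means the +Inf bucket
        sum.add(v);
    }

    // Cumulative counts as exposed in _bucket; the last entry (le="+Inf") equals _count.
    long[] cumulativeCounts() {
        long[] out = new long[buckets.length];
        long running = 0;
        for (int i = 0; i < buckets.length; i++) {
            running += buckets[i].sum();
            out[i] = running;
        }
        return out;
    }

    double sum() { return sum.sum(); }
}
```

With bounds {0.1, 1.0}, observing 0.05, 0.1, 0.5 and 3.0 gives cumulative counts {2, 3, 4}.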

Further metrics considerations

Providing additional functionality in metrics beyond what is documented above, as makes sense for a given language, is ENCOURAGED.

If there is a common use case you can make simpler, go for it, as long as it does not encourage undesirable behaviour (such as suboptimal metric/label layouts, or doing computation in the client).

3.5 Labels

Labels are one of the most powerful aspects of Prometheus, but they are easily abused. Accordingly, client libraries must be very careful in how they offer labels to users.

Client libraries MUST NOT, under any circumstances, allow users to use different label names for the same metric, whether for Gauge/Counter/Summary/Histogram or for any other Collector provided by the library.

Metrics from custom collectors should almost always have consistent label names. Because there are still rare but valid use cases where this is not the case, client libraries should not verify this.

While labels are powerful, the majority of metrics have no labels. Accordingly, the API should allow for labels but not be dominated by them.

A client library MUST allow a list of label names to be optionally specified when a Gauge/Counter/Summary/Histogram is created. A client library SHOULD support any number of label names. A client library MUST validate that label names meet the documented requirements.
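Label name validation can be done once, at metric creation time. The hypothetical helper below applies the Prometheus data-model rule that label names match `[a-zA-Z_][a-zA-Z0-9_]*`, reserves names beginning with `__` for internal use, and also rejects duplicate names in the same list (an extra check added for this sketch).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper run once when a labelled metric is created.
final class LabelNames {
    private static final Pattern VALID = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // Throws if any label name is malformed, reserved, or duplicated.
    static void validate(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String n : names) {
            if (!VALID.matcher(n).matches()) {
                throw new IllegalArgumentException("invalid label name: " + n);
            }
            if (n.startsWith("__")) {
                throw new IllegalArgumentException("label names starting with __ are reserved: " + n);
            }
            if (!seen.add(n)) {
                throw new IllegalArgumentException("duplicate label name: " + n);
            }
        }
    }
}
```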

The general way to provide access to the labelled dimensions of a metric is a labels() method that takes either a list of label values or a map from label name to label value, and returns a "Child". The usual .inc()/.dec()/.observe() methods can then be called on the Child.

The Child returned by labels() SHOULD be cacheable by the user, to avoid having to look it up again - this matters in latency-critical code.

Metrics with labels SHOULD support a remove() method with the same signature as labels(), which removes a Child from the metric so that it is no longer exported, and a clear() method that removes all Children from the metric. Both invalidate any cached Children.
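The labels()/Child pattern, together with remove() and clear(), can be sketched with a `ConcurrentHashMap` keyed by the list of label values. `LabeledCounter` and its nested `Child` are hypothetical names for this sketch, not the simpleclient API. Callers on hot paths keep the returned Child to skip the map lookup; after remove() or clear(), such a cached Child is simply no longer exported.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical labelled counter; not the simpleclient class.
final class LabeledCounter {
    // The per-label-values "Child"; callers may cache it to skip the lookup.
    static final class Child {
        private final DoubleAdder value = new DoubleAdder();

        void inc() { value.add(1); }

        double get() { return value.sum(); }
    }

    private final List<String> labelNames;
    private final ConcurrentMap<List<String>, Child> children = new ConcurrentHashMap<>();

    LabeledCounter(String... labelNames) {
        this.labelNames = Arrays.asList(labelNames.clone());
    }

    // Returns (creating on first use) the Child for these label values.
    Child labels(String... values) {
        if (values.length != labelNames.size()) {
            throw new IllegalArgumentException("expected " + labelNames.size() + " label values");
        }
        return children.computeIfAbsent(Arrays.asList(values.clone()), k -> new Child());
    }

    // Stops exporting one Child; a cached reference to it becomes stale.
    void remove(String... values) { children.remove(Arrays.asList(values)); }

    // Stops exporting every Child.
    void clear() { children.clear(); }

    int childCount() { return children.size(); }
}
```

Calling labels() once for each expected combination of values is also the simple way to initialise a Child so it is exported as 0 before the first event.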

There SHOULD be a way to initialise a given Child with the default value, usually just by calling labels(). Metrics without labels MUST always be initialised, to avoid problems with missing metrics.

3.6 Metric names

Metric names must follow the specification. As with label names, this MUST be enforced for Gauge/Counter/Summary/Histogram and for any other Collector provided with the library.

Many client libraries allow the name to be set in three parts, namespace_subsystem_name, of which only name is mandatory.

Dynamic/generated metric names, or generated parts of metric names, MUST be discouraged, except when a custom Collector is proxying from another instrumentation/monitoring system. Generated/dynamic metric names are a sign that you should be using labels instead.
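The three-part naming scheme can be sketched as a small helper that joins the non-empty parts with underscores and then checks the result against the Prometheus data-model pattern for metric names, `[a-zA-Z_:][a-zA-Z0-9_:]*`. `MetricNames.build` is a hypothetical name for this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical helper for namespace_subsystem_name style names.
final class MetricNames {
    private static final Pattern VALID = Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");

    // Joins the non-empty parts with '_'; only name is required.
    static String build(String namespace, String subsystem, String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name is required");
        }
        StringBuilder sb = new StringBuilder();
        if (namespace != null && !namespace.isEmpty()) {
            sb.append(namespace).append('_');
        }
        if (subsystem != null && !subsystem.isEmpty()) {
            sb.append(subsystem).append('_');
        }
        String full = sb.append(name).toString();
        if (!VALID.matcher(full).matches()) {
            throw new IllegalArgumentException("invalid metric name: " + full);
        }
        return full;
    }
}
```

For example, build("myapp", "http", "requests_total") returns "myapp_http_requests_total".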

3.7 Metric descriptions and help

Gauge/Counter/Summary/Histogram MUST require a metric description/help to be provided.

Any custom Collectors provided with the client library MUST have descriptions/help on their metrics.

It is suggested to make help a mandatory argument, but not to check that it has a certain length: if someone really does not want to write documentation, we are not going to convince them otherwise. Collectors provided with the library (and indeed everywhere we can within the ecosystem) SHOULD have good metric descriptions, to lead by example.

Fourth, Exposition

Clients MUST implement the text-based exposition format outlined in the exposition formats documentation.

A reproducible order of the exposed metrics is ENCOURAGED (especially for human-readable formats), provided it can be implemented without significant resource cost.
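A reproducible order is cheap when metric families are kept in, or copied into, a sorted map. The hypothetical `TextFormat.write` below renders unlabelled counters in the text format, with a `# HELP` line (escaping backslashes and newlines) and a `# TYPE` line before each sample, sorted by name. It is deliberately narrow: labels, other metric types and timestamps are omitted, and infinities are written as +Inf/-Inf.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical renderer for unlabelled counters in the text exposition format.
final class TextFormat {
    static String write(Map<String, String> help, Map<String, Double> values) {
        StringBuilder sb = new StringBuilder();
        // TreeMap iterates in name order, so repeated scrapes produce identical output.
        for (Map.Entry<String, Double> e : new TreeMap<>(values).entrySet()) {
            String name = e.getKey();
            sb.append("# HELP ").append(name).append(' ')
              .append(escapeHelp(help.getOrDefault(name, ""))).append('\n');
            sb.append("# TYPE ").append(name).append(" counter\n");
            sb.append(name).append(' ').append(formatValue(e.getValue())).append('\n');
        }
        return sb.toString();
    }

    // HELP text escapes backslash and newline.
    private static String escapeHelp(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n");
    }

    private static String formatValue(double v) {
        if (v == Double.POSITIVE_INFINITY) return "+Inf";
        if (v == Double.NEGATIVE_INFINITY) return "-Inf";
        return Double.toString(v); // NaN prints as "NaN"
    }
}
```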

Fifth, Standard and runtime collectors

Client libraries SHOULD offer as many of the standard exports documented below as they can.

These SHOULD be implemented as custom Collectors and registered by default on the default CollectorRegistry. There SHOULD be a way to disable them, as there are some very niche use cases where they get in the way.

5.1 Process metrics

These exports should have the prefix process_. If a value is not available in the language or runtime, the corresponding metric is not exported. All memory values are in bytes, and all times are in Unix time (seconds).

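| Metric name                   | Meaning                                           | Unit             |
|-------------------------------|---------------------------------------------------|------------------|
| process_cpu_seconds_total     | Total user and system CPU time spent              | seconds          |
| process_open_fds              | Number of open file descriptors                   | file descriptors |
| process_max_fds               | Maximum number of open file descriptors           | file descriptors |
| process_virtual_memory_bytes  | Virtual memory size                               | bytes            |
| process_resident_memory_bytes | Resident memory size                              | bytes            |
| process_heap_bytes            | Process heap size                                 | bytes            |
| process_start_time_seconds    | Start time of the process since the Unix epoch    | seconds          |
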
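On the JVM, only some of these values are available through standard APIs. The hypothetical `ProcessCollector` below reports process_start_time_seconds from `RuntimeMXBean.getStartTime()` (milliseconds since the Unix epoch, converted to seconds). Following the rule that unavailable values are not exported, it simply omits the metrics that would need OS- or vendor-specific APIs.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical process collector; exports only what the JDK reports portably.
final class ProcessCollector {
    static Map<String, Double> collect() {
        Map<String, Double> out = new LinkedHashMap<>();
        // getStartTime() is in milliseconds since the Unix epoch.
        long startMillis = ManagementFactory.getRuntimeMXBean().getStartTime();
        out.put("process_start_time_seconds", startMillis / 1000.0);
        // CPU time, file descriptors and memory sizes need OS- or vendor-specific
        // APIs, so in this sketch they are not exported at all.
        return out;
    }
}
```
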
5.2 Runtime metrics

In addition, client libraries are ENCOURAGED to offer whatever metrics make sense for their language's runtime (for example, garbage collection statistics), with an appropriate prefix such as go_, hotspot_ and so on.
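As an illustration of a runtime collector, garbage collection time is available on any JVM through `GarbageCollectorMXBean.getCollectionTime()`, which returns milliseconds or -1 when unsupported. The class name and the `jvm_gc_collection_seconds_total` metric name below are illustrative assumptions, not an established metric. The label value is the collector's name, inserted without escaping for brevity.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical runtime collector; the metric name is illustrative only.
final class GcCollector {
    // One sample per garbage collector, keyed by the rendered series name.
    static Map<String, Double> collect() {
        Map<String, Double> out = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the JVM does not report it
            if (millis < 0) {
                continue; // unavailable: leave the metric out rather than export a bogus value
            }
            out.put("jvm_gc_collection_seconds_total{gc=\"" + gc.getName() + "\"}", millis / 1000.0);
        }
        return out;
    }
}
```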

Sixth, Unit tests

Client libraries SHOULD have unit tests covering the core instrumentation library and exposition.

Client libraries are ENCOURAGED to offer ways that make it easy for users to unit-test their use of the instrumentation code. For example, Python offers CollectorRegistry.get_sample_value.
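A Java helper in the spirit of Python's get_sample_value could look like the sketch below. `TestableRegistry`, `counter()` and `getSampleValue()` are hypothetical names; the helper returns `null` for a sample that does not exist, so a test can tell "never created" apart from "created but still 0".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical registry with a test helper modelled on Python's get_sample_value.
final class TestableRegistry {
    private final Map<String, DoubleAdder> counters = new ConcurrentHashMap<>();

    // Returns the counter for name, creating it on first use.
    DoubleAdder counter(String name) {
        return counters.computeIfAbsent(name, n -> new DoubleAdder());
    }

    // Current value of a sample, or null if no such sample exists.
    Double getSampleValue(String name) {
        DoubleAdder a = counters.get(name);
        if (a == null) {
            return null;
        }
        return a.sum();
    }
}
```

A unit test can then pass a fresh registry to the code under test and assert on getSampleValue("requests_total") afterwards.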

Seventh, Packaging and dependencies

Ideally, a client library can be included in any application to add some instrumentation without breaking the application.

Accordingly, caution is advised when adding dependencies to the client library. For example, if you add a library that uses a Prometheus client requiring version x.y of some library, but the application uses x.z of that library elsewhere, will that have an adverse impact on the application?

Where this may arise, it is suggested that the core instrumentation be kept separate from the bridges/exposition of metrics in a given format. For example, in the Java simpleclient, the simpleclient module has no dependencies, and simpleclient_servlet contains the HTTP bits.

Eighth, Performance considerations

Because client libraries must be thread-safe, some form of concurrency control is required, and performance on multi-core machines and applications must be taken into account.

In our experience, mutexes perform the worst.

Processor atomic instructions tend to sit in the middle and are generally acceptable.

Approaches that avoid different CPUs mutating the same bit of RAM work best, such as the DoubleAdder used in the Java simpleclient. This comes at a memory cost, though.
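The JDK's `java.util.concurrent.atomic.DoubleAdder` illustrates this approach: under contention it spreads updates across several internal cells and only combines them when `sum()` is called, trading memory for less contention on a single location. The hypothetical `ContendedCounter.run` below increments one shared adder from several threads with no lock; it demonstrates correctness under concurrency, not a benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical demo: many writers, one DoubleAdder, no mutex.
final class ContendedCounter {
    static double run(int threads, int incrementsPerThread) throws InterruptedException {
        // Contended updates are spread over internal cells instead of one memory location.
        DoubleAdder total = new DoubleAdder();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    total.add(1);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return total.sum(); // exact here because all writers have finished
    }
}
```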

As noted above, the result of labels() should be cacheable. The concurrent maps that tend to back labelled metrics are relatively slow. Special-casing metrics without labels to avoid labels()-like lookups can help a lot.

Metrics SHOULD avoid blocking when they are incremented/decremented/set, because it is undesirable for the whole application to be held up while a scrape is in progress.

Benchmarks of the main instrumentation operations, including labels, are ENCOURAGED.

Resource consumption, particularly RAM, should be kept in mind when performing exposition. Consider reducing the memory footprint by streaming results, and possibly limiting the number of concurrent scrapes.

Ninth, Links

Prometheus official website address: https://prometheus.io/
My GitHub: https://github.com/Alrights/prometheus


Original post: blog.csdn.net/Coffin_monkey/article/details/93380671