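PromQL Guide
Disclaimer
This article is my personal translation and reorganization of "Querying basics | Prometheus", with a small number of additions. The translation is based on Prometheus version 2.43.
The purpose of this translation is to help more readers understand the official Prometheus documentation and to give Chinese-speaking users a more convenient reading experience.
At the time of writing (2023-04-24 11:22:20), internationalization of the Prometheus documentation was still under discussion, so I have not submitted a PR upstream. For details, see https://github.com/prometheus/docs/issues/2151.
If you find any copyright problem or translation error, please contact me by email at 956465331@qq.com or ufovsmba@gmail.com.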
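Basics
Overview
Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can be shown as a graph or viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
Examples
This document is meant as a reference. For learning, it may be easier to start with a couple of examples.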
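Expression language data types
In Prometheus's expression language, an expression or sub-expression can evaluate to one of four types:
- Instant vector - a set of time series containing a single sample for each time series, all sharing the same timestamp
- Range vector - a set of time series containing a range of data points over time for each time series
- Scalar - a simple numeric floating point value
- String - a simple string value; currently unused
Depending on the use case (e.g. when graphing vs. displaying the output of an expression), only some of these types are legal as the result of a user-specified expression. For example, an expression that returns an instant vector is the only type that can be directly graphed.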
Notes about the experimental native histograms:
- Ingesting native histograms has to be enabled via a feature flag.
- Once native histograms have been ingested into the TSDB (and even after disabling the feature flag again), both instant vectors and range vectors may now contain samples that aren’t simple floating point numbers (float samples) but complete histograms (histogram samples). A vector may contain a mix of float samples and histogram samples.
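Literals
String literals
Strings may be specified as literals in single quotes, double quotes or backticks.
PromQL follows the same escaping rules as Go. In single or double quotes a backslash begins an escape sequence, which may be followed by a, b, f, n, r, t, v or \. Specific characters can be provided using octal (\nnn) or hexadecimal (\xnn, \unnnn and \Unnnnnnnn).
No escaping is processed inside backticks. Unlike Go, Prometheus does not discard newlines inside backticks.
Example:
    "this is a string"
Float literals
Scalar float values can be written as literal integer or floating-point numbers in the following format (whitespace is included only for readability):
    [-+]?(
Examples:
    23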
Time series Selectors
Instant vector selectors
Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp (instant): in the simplest form, only a metric name is specified. This results in an instant vector containing elements for all time series that have this metric name.
This example selects all time series that have the http_requests_total metric name:
    http_requests_total
It is possible to filter these time series further by appending a comma separated list of label matchers in curly braces ({}).
This example selects only those time series with the http_requests_total metric name that also have the job label set to prometheus and their group label set to canary:
    http_requests_total{job="prometheus",group="canary"}
It is also possible to negatively match a label value, or to match label values against regular expressions. The following label matching operators exist:
- =: Select labels that are exactly equal to the provided string.
- !=: Select labels that are not equal to the provided string.
- =~: Select labels that regex-match the provided string.
- !~: Select labels that do not regex-match the provided string.
Regex matches are fully anchored. A match of env=~"foo" is treated as env=~"^foo$".
For example, this selects all http_requests_total time series for staging, testing, and development environments and HTTP methods other than GET.
    http_requests_total{environment=~"staging|testing|development",method!="GET"}
Label matchers that match empty label values also select all time series that do not have the specific label set at all. It is possible to have multiple matchers for the same label name.
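The matcher rules above can be modelled in a few lines. This is a minimal Python sketch, not how Prometheus implements selectors: the `matches` helper and the sample series are hypothetical, and Python's `re` module stands in for RE2, which differs in some syntax details.

```python
import re

def matches(labels, name, op, value):
    """Apply one PromQL-style label matcher to a label set (a dict)."""
    actual = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # fully anchored
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")

# Hypothetical series, described only by their labels.
series = [
    {"__name__": "http_requests_total", "environment": "staging", "method": "POST"},
    {"__name__": "http_requests_total", "environment": "staging-eu", "method": "POST"},
    {"__name__": "http_requests_total", "method": "GET"},
]

selected = [
    s for s in series
    if matches(s, "environment", "=~", "staging|testing|development")
    and matches(s, "method", "!=", "GET")
]
```

Only the first series is selected: "staging-eu" fails because the regex must match the whole value, and the third series fails the method matcher.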
Vector selectors must either specify a name or at least one label matcher that does not match the empty string. The following expression is illegal:
    {job=~".*"} # Bad!
In contrast, these expressions are valid as they both have a selector that does not match empty label values.
    {job=~".+"} # Good!
Label matchers can also be applied to metric names by matching against the internal __name__ label. For example, the expression http_requests_total is equivalent to {__name__="http_requests_total"}. Matchers other than = (!=, =~, !~) may also be used. The following expression selects all metrics that have a name starting with job::
    {__name__=~"job:.*"}
The metric name must not be one of the keywords bool, on, ignoring, group_left and group_right. The following expression is illegal:
    on{} # Bad!
A workaround for this restriction is to use the __name__ label:
    {__name__="on"} # Good!
All regular expressions in Prometheus use RE2 syntax.
Range Vector Selectors
Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant. Syntactically, a time duration is appended in square brackets ([]) at the end of a vector selector to specify how far back in time values should be fetched for each resulting range vector element.
In this example, we select all the values we have recorded within the last 5 minutes for all time series that have the metric name http_requests_total and a job label set to prometheus:
    http_requests_total{job="prometheus"}[5m]
Time Durations
Time durations are specified as a number, followed immediately by one of the following units:
- ms - milliseconds
- s - seconds
- m - minutes
- h - hours
- d - days - assuming a day has always 24h
- w - weeks - assuming a week has always 7d
- y - years - assuming a year has always 365d
Time durations can be combined, by concatenation. Units must be ordered from the longest to the shortest. A given unit must only appear once in a time duration.
Here are some examples of valid time durations:
    5h
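The ordering and uniqueness rules for combined durations can be checked mechanically. The sketch below is a hypothetical Python validator (`parse_duration_ms` is not a Prometheus API) that converts a duration string to milliseconds under the unit lengths listed above. The combined input `1h30m` is my own illustration.

```python
import re

# Unit lengths in milliseconds, keyed by unit suffix.
UNITS_MS = {
    "y": 365 * 24 * 3600 * 1000,
    "w": 7 * 24 * 3600 * 1000,
    "d": 24 * 3600 * 1000,
    "h": 3600 * 1000,
    "m": 60 * 1000,
    "s": 1000,
    "ms": 1,
}
ORDER = ["y", "w", "d", "h", "m", "s", "ms"]  # longest to shortest

def parse_duration_ms(text):
    """Parse e.g. '1h30m' into milliseconds, enforcing order and uniqueness."""
    tokens = re.findall(r"(\d+)(ms|[ywdhms])", text)
    # Reject empty input and any characters the tokenizer skipped over.
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError(f"not a valid duration: {text!r}")
    last = -1
    total = 0
    for number, unit in tokens:
        position = ORDER.index(unit)
        if position <= last:
            raise ValueError("units must go from longest to shortest, each at most once")
        last = position
        total += int(number) * UNITS_MS[unit]
    return total
```

With this sketch, `parse_duration_ms("1h30m")` succeeds, while `"30m1h"` and `"1h1h"` raise `ValueError`.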
Offset modifier
The offset modifier allows changing the time offset for individual instant and range vectors in a query.
For example, the following expression returns the value of http_requests_total 5 minutes in the past relative to the current query evaluation time:
    http_requests_total offset 5m
Note that the offset modifier always needs to follow the selector immediately, i.e. the following would be correct:
    sum(http_requests_total{method="GET"} offset 5m) # GOOD.
While the following would be incorrect:
    sum(http_requests_total{method="GET"}) offset 5m # INVALID.
The same works for range vectors. This returns the 5-minute rate that http_requests_total had a week ago:
    rate(http_requests_total[5m] offset 1w)
For comparisons with temporal shifts forward in time, a negative offset can be specified:
    rate(http_requests_total[5m] offset -1w)
Note that this allows a query to look ahead of its evaluation time.
@ modifier
The @ modifier allows changing the evaluation time for individual instant and range vectors in a query. The time supplied to the @ modifier is a unix timestamp and described with a float literal.
For example, the following expression returns the value of http_requests_total at 2021-01-04T07:40:00+00:00:
    http_requests_total @ 1609746000
Note that the @ modifier always needs to follow the selector immediately, i.e. the following would be correct:
    sum(http_requests_total{method="GET"} @ 1609746000) # GOOD.
While the following would be incorrect:
    sum(http_requests_total{method="GET"}) @ 1609746000 # INVALID.
The same works for range vectors. This returns the 5-minute rate that http_requests_total had at 2021-01-04T07:40:00+00:00:
    rate(http_requests_total[5m] @ 1609746000)
The @ modifier supports all representations of float literals described above within the limits of int64. It can also be used along with the offset modifier, where the offset is applied relative to the @ modifier time irrespective of which modifier is written first. These 2 queries will produce the same result.
    # offset after @
Additionally, start() and end() can also be used as values for the @ modifier as special values.
For a range query, they resolve to the start and end of the range query respectively and remain the same for all steps.
For an instant query, start() and end() both resolve to the evaluation time.
    http_requests_total @ start()
Note that the @ modifier allows a query to look ahead of its evaluation time.
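How offset and @ combine can be written as a small calculation. The following Python sketch is an illustration only: `effective_time` and its parameters are hypothetical names, and times are plain Unix seconds.

```python
def effective_time(eval_time, offset=0.0, at=None, query_start=None, query_end=None):
    """Return the timestamp (seconds) at which a selector reads its data.

    at may be None (no @ modifier), a Unix timestamp, or the strings
    "start()" / "end()". For an instant query there is no range, so both
    special values fall back to the evaluation time.
    """
    if at is None:
        base = eval_time
    elif at == "start()":
        base = eval_time if query_start is None else query_start
    elif at == "end()":
        base = eval_time if query_end is None else query_end
    else:
        base = float(at)
    # The offset is applied relative to the @ time, whichever was written first.
    return base - offset
```

For instance, `effective_time(1000, offset=300, at=1609746000)` gives 1609745700 regardless of modifier order, and a negative offset moves the read time forward.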
Subquery
Subquery allows you to run an instant query for a given range and resolution. The result of a subquery is a range vector.
Syntax: <instant_query> '[' <range> ':' [<resolution>] ']' [ @ <float_literal> ] [ offset <duration> ]
<resolution> is optional. Default is the global evaluation interval.
Operators
Prometheus supports many binary and aggregation operators. These are described in detail in the expression language operators page.
Functions
Prometheus supports several functions to operate on data. These are described in detail in the expression language functions page.
Comments
PromQL supports line comments that start with #. Example:
    # This is a comment
Gotchas
Staleness
When queries are run, timestamps at which to sample data are selected independently of the actual present time series data. This is mainly to support cases like aggregation (sum, avg, and so on), where multiple aggregated time series do not exactly align in time. Because of their independence, Prometheus needs to assign a value at those timestamps for each relevant time series. It does so by simply taking the newest sample before this timestamp.
If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series will be marked as stale. If a target is removed, its previously returned time series will be marked as stale soon afterwards.
If a query is evaluated at a sampling timestamp after a time series is marked stale, then no value is returned for that time series. If new samples are subsequently ingested for that time series, they will be returned as normal.
If no sample is found (by default) 5 minutes before a sampling timestamp, no value is returned for that time series at this point in time. This effectively means that time series “disappear” from graphs at times where their latest collected sample is older than 5 minutes or after they are marked stale.
Staleness will not be marked for time series that have timestamps included in their scrapes. Only the 5 minute threshold will be applied in that case.
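The lookup rule described in this section can be sketched as follows. This is a simplified Python model under stated assumptions: `value_at` is a hypothetical helper, `STALE` is a stand-in object for a staleness marker (the real encoding differs), and boundary handling at exactly 5 minutes is not modelled precisely.

```python
LOOKBACK_SECONDS = 5 * 60  # the default 5-minute threshold mentioned above
STALE = object()           # stand-in for a staleness marker

def value_at(samples, t, lookback=LOOKBACK_SECONDS):
    """Return the value a series contributes at sampling timestamp t, or None.

    samples is a list of (timestamp_seconds, value) sorted by timestamp.
    """
    newest = None
    for ts, value in samples:
        if ts > t:
            break
        newest = (ts, value)  # keep the newest sample at or before t
    if newest is None:
        return None
    ts, value = newest
    if value is STALE or t - ts > lookback:
        return None  # marked stale, or the newest sample is too old
    return value

# Hypothetical series: scraped twice, marked stale, then scraped again.
samples = [(0, 1.0), (60, 2.0), (120, STALE), (400, 3.0)]
```

With these samples the series has a value at t=90, disappears at t=130 because of the marker, returns at t=410, and disappears again at t=800 because its newest sample is older than the lookback.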
Avoiding slow queries and overloads
If a query needs to operate on a very large amount of data, graphing it might time out or overload the server or browser. Thus, when constructing queries over unknown data, always start building the query in the tabular view of Prometheus’s expression browser until the result set seems reasonable (hundreds, not thousands, of time series at most). Only when you have filtered or aggregated your data sufficiently, switch to graph mode. If the expression still takes too long to graph ad-hoc, pre-record it via a recording rule.
This is especially relevant for Prometheus’s query language, where a bare metric name selector like api_http_requests_total could expand to thousands of time series with different labels. Also keep in mind that expressions which aggregate over many time series will generate load on the server even if the output is only a small number of time series. This is similar to how it would be slow to sum all values of a column in a relational database, even if the output value is only a single number.
Operators
Binary operators
Prometheus’s query language supports basic logical and arithmetic operators. For operations between two instant vectors, the matching behavior can be modified.
Arithmetic binary operators
The following binary arithmetic operators exist in Prometheus:
- + (addition)
- - (subtraction)
- * (multiplication)
- / (division)
- % (modulo)
- ^ (power/exponentiation)
Binary arithmetic operators are defined between scalar/scalar, vector/scalar, and vector/vector value pairs.
Between two scalars, the behavior is obvious: they evaluate to another scalar that is the result of the operator applied to both scalar operands.
Between an instant vector and a scalar, the operator is applied to the value of every data sample in the vector. E.g. if a time series instant vector is multiplied by 2, the result is another vector in which every sample value of the original vector is multiplied by 2. The metric name is dropped.
Between two instant vectors, a binary arithmetic operator is applied to each entry in the left-hand side vector and its matching element in the right-hand vector. The result is propagated into the result vector with the grouping labels becoming the output label set. The metric name is dropped. Entries for which no matching entry in the right-hand vector can be found are not part of the result.
Trigonometric binary operators
The following trigonometric binary operators, which work in radians, exist in Prometheus:
- atan2 (based on https://pkg.go.dev/math#Atan2)
Trigonometric operators allow trigonometric functions to be executed on two vectors using vector matching, which isn’t available with normal functions. They act in the same manner as arithmetic operators.
Comparison binary operators
The following binary comparison operators exist in Prometheus:
- == (equal)
- != (not-equal)
- > (greater-than)
- < (less-than)
- >= (greater-or-equal)
- <= (less-or-equal)
Comparison operators are defined between scalar/scalar, vector/scalar, and vector/vector value pairs. By default they filter. Their behavior can be modified by providing bool after the operator, which will return 0 or 1 for the value rather than filtering.
Between two scalars, the bool modifier must be provided and these operators result in another scalar that is either 0 (false) or 1 (true), depending on the comparison result.
Between an instant vector and a scalar, these operators are applied to the value of every data sample in the vector, and vector elements between which the comparison result is false get dropped from the result vector. If the bool modifier is provided, vector elements that would be dropped instead have the value 0 and vector elements that would be kept have the value 1. The metric name is dropped if the bool modifier is provided.
Between two instant vectors, these operators behave as a filter by default, applied to matching entries. Vector elements for which the expression is not true or which do not find a match on the other side of the expression get dropped from the result, while the others are propagated into a result vector with the grouping labels becoming the output label set. If the bool modifier is provided, vector elements that would have been dropped instead have the value 0 and vector elements that would be kept have the value 1, with the grouping labels again becoming the output label set. The metric name is dropped if the bool modifier is provided.
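The difference between filtering and the bool modifier is easiest to see on a small vector. This Python sketch models only the vector/scalar case. The function name, the representation of a vector as a list of (labels, value) pairs, and the `up` series are all assumptions made for illustration.

```python
import operator

COMPARISONS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

def compare_vector_scalar(vector, op, scalar, use_bool=False):
    """Compare every sample in vector against scalar."""
    result = []
    for labels, value in vector:
        passed = COMPARISONS[op](value, scalar)
        if use_bool:
            # bool: keep every element, emit 0 or 1, drop the metric name.
            kept = {k: v for k, v in labels.items() if k != "__name__"}
            result.append((kept, 1.0 if passed else 0.0))
        elif passed:
            # Default: act as a filter and keep the original sample.
            result.append((labels, value))
    return result

# Hypothetical input vector.
up = [
    ({"__name__": "up", "job": "a"}, 1.0),
    ({"__name__": "up", "job": "b"}, 0.0),
]
filtered = compare_vector_scalar(up, "==", 1)
as_bool = compare_vector_scalar(up, "==", 1, use_bool=True)
```

Here `filtered` keeps only job "a" with its name and value intact, while `as_bool` keeps both elements with values 1 and 0 and no metric name.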
Logical/set binary operators
These logical/set binary operators are only defined between instant vectors:
- and (intersection)
- or (union)
- unless (complement)
vector1 and vector2 results in a vector consisting of the elements of vector1 for which there are elements in vector2 with exactly matching label sets. Other elements are dropped. The metric name and values are carried over from the left-hand side vector.
vector1 or vector2 results in a vector that contains all original elements (label sets + values) of vector1 and additionally all elements of vector2 which do not have matching label sets in vector1.
vector1 unless vector2 results in a vector consisting of the elements of vector1 for which there are no elements in vector2 with exactly matching label sets. All matching elements in both vectors are dropped.
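The three set operators can be modelled with plain set lookups. In this Python sketch a vector is a list of (labels, value) pairs, and I assume the metric name is excluded when comparing label sets, as is the default for vector matching. Function names and data are hypothetical.

```python
def key(labels):
    """Label set without the metric name, as a hashable matching key."""
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")

def vec_and(v1, v2):
    right = {key(labels) for labels, _ in v2}
    return [(labels, x) for labels, x in v1 if key(labels) in right]

def vec_or(v1, v2):
    left = {key(labels) for labels, _ in v1}
    return list(v1) + [(labels, x) for labels, x in v2 if key(labels) not in left]

def vec_unless(v1, v2):
    right = {key(labels) for labels, _ in v2}
    return [(labels, x) for labels, x in v1 if key(labels) not in right]

# Hypothetical vectors: only job "y" exists on both sides.
v1 = [({"__name__": "a", "job": "x"}, 1.0), ({"__name__": "a", "job": "y"}, 2.0)]
v2 = [({"__name__": "b", "job": "y"}, 5.0), ({"__name__": "b", "job": "z"}, 6.0)]
```

With this data, `vec_and` returns the job "y" element of v1, `vec_unless` returns the job "x" element, and `vec_or` returns all of v1 plus the job "z" element of v2.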
Vector matching
Operations between vectors attempt to find a matching element in the right-hand side vector for each entry in the left-hand side. There are two basic types of matching behavior: One-to-one and many-to-one/one-to-many.
Vector matching keywords
These vector matching keywords allow for matching between series with different label sets providing:
- on
- ignoring
Label lists provided to matching keywords will determine how vectors are combined. Examples can be found in One-to-one vector matches and in Many-to-one and one-to-many vector matches.
Group modifiers
These group modifiers enable many-to-one/one-to-many vector matching:
- group_left
- group_right
Label lists can be provided to the group modifier which contain labels from the “one”-side to be included in the result metrics.
Many-to-one and one-to-many matching are advanced use cases that should be carefully considered. Often a proper use of ignoring(<labels>) provides the desired outcome.
Grouping modifiers can only be used for comparison and arithmetic. The and, unless and or operations match with all possible entries in the right vector by default.
One-to-one vector matches
One-to-one finds a unique pair of entries from each side of the operation. In the default case, that is an operation following the format vector1 <operator> vector2. Two entries match if they have the exact same set of labels and corresponding values. The ignoring keyword allows ignoring certain labels when matching, while the on keyword allows reducing the set of considered labels to a provided list:
    <vector expr> <bin-op> ignoring(<label list>) <vector expr>
Example input:
    method_code:http_errors:rate5m{method="get", code="500"} 24
Example query:
    method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
This returns a result vector containing the fraction of HTTP requests with status code of 500 for each method, as measured over the last 5 minutes. Without ignoring(code) there would have been no match as the metrics do not share the same set of labels. The entries with methods put and del have no match and will not show up in the result:
    {method="get"} 0.04 // 24 / 600
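One-to-one matching with ignoring can be sketched as a dictionary join. The Python below is an illustration, not the engine's algorithm: `divide_one_to_one` is a hypothetical helper, the get values (24 and 600) come from the example above, and the post and del values are made up for the sketch. It does not detect duplicate matches, which a real one-to-one match must reject.

```python
def match_key(labels, ignoring=()):
    """Labels used for matching: everything except ignored labels and the name."""
    dropped = set(ignoring) | {"__name__"}
    return frozenset((k, v) for k, v in labels.items() if k not in dropped)

def divide_one_to_one(left, right, ignoring=()):
    """left / ignoring(...) right, keeping only entries with a match."""
    right_by_key = {match_key(labels, ignoring): value for labels, value in right}
    result = []
    for labels, value in left:
        k = match_key(labels, ignoring)
        if k in right_by_key:
            # The output labels are the matching labels; the name is dropped.
            result.append((dict(k), value / right_by_key[k]))
    return result

# Left side after the code="500" selector has been applied.
errors = [
    ({"__name__": "method_code:http_errors:rate5m", "method": "get", "code": "500"}, 24.0),
    ({"__name__": "method_code:http_errors:rate5m", "method": "post", "code": "500"}, 6.0),
]
requests = [
    ({"__name__": "method:http_requests:rate5m", "method": "get"}, 600.0),
    ({"__name__": "method:http_requests:rate5m", "method": "del"}, 34.0),
    ({"__name__": "method:http_requests:rate5m", "method": "post"}, 120.0),
]
ratios = divide_one_to_one(errors, requests, ignoring=["code"])
```

The del entry has no partner on the left and is absent from `ratios`. Without `ignoring=["code"]` nothing matches at all, because the code label exists only on the left.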
Many-to-one and one-to-many vector matches
Many-to-one and one-to-many matchings refer to the case where each vector element on the “one”-side can match with multiple elements on the “many”-side. This has to be explicitly requested using the group_left or group_right modifiers, where left/right determines which vector has the higher cardinality.
    <vector expr> <bin-op> ignoring(<label list>) group_left(<label list>) <vector expr>
The label list provided with the group modifier contains additional labels from the “one”-side to be included in the result metrics. For on a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.
Example query:
    method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
In this case the left vector contains more than one entry per method label value. Thus, we indicate this using group_left. The elements from the right side are now matched with multiple elements with the same method label on the left:
    {method="get", code="500"} 0.04 // 24 / 600
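A many-to-one match differs from the one-to-one case in two ways: several left entries may share one right entry, and the result keeps the labels of the "many" side. The Python sketch below illustrates this under assumptions: `divide_group_left` is hypothetical, the get/500 and get request values come from the example above, and the get/404 value is invented.

```python
def match_key(labels, ignoring=()):
    """Labels used for matching: everything except ignored labels and the name."""
    dropped = set(ignoring) | {"__name__"}
    return frozenset((k, v) for k, v in labels.items() if k not in dropped)

def divide_group_left(many, one, ignoring=()):
    """many / ignoring(...) group_left one."""
    one_by_key = {}
    for labels, value in one:
        k = match_key(labels, ignoring)
        if k in one_by_key:
            raise ValueError("more than one match on the 'one' side")
        one_by_key[k] = value
    result = []
    for labels, value in many:
        k = match_key(labels, ignoring)
        if k in one_by_key:
            # Keep all labels of the "many" side so results stay unique.
            kept = {n: v for n, v in labels.items() if n != "__name__"}
            result.append((kept, value / one_by_key[k]))
    return result

errors = [
    ({"__name__": "method_code:http_errors:rate5m", "method": "get", "code": "500"}, 24.0),
    ({"__name__": "method_code:http_errors:rate5m", "method": "get", "code": "404"}, 30.0),
]
requests = [({"__name__": "method:http_requests:rate5m", "method": "get"}, 600.0)]
ratios = divide_group_left(errors, requests, ignoring=["code"])
```

Both left entries match the single get entry on the right, and each result keeps its own code label.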
Aggregation operators
Prometheus supports the following built-in aggregation operators that can be used to aggregate the elements of a single instant vector, resulting in a new vector of fewer elements with aggregated values:
- sum (calculate sum over dimensions)
- min (select minimum over dimensions)
- max (select maximum over dimensions)
- avg (calculate the average over dimensions)
- group (all values in the resulting vector are 1)
- stddev (calculate population standard deviation over dimensions)
- stdvar (calculate population standard variance over dimensions)
- count (count number of elements in the vector)
- count_values (count number of elements with the same value)
- bottomk (smallest k elements by sample value)
- topk (largest k elements by sample value)
- quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
These operators can either be used to aggregate over all label dimensions or preserve distinct dimensions by including a without or by clause. These clauses may be used before or after the expression.
    <aggr-op> [without|by (<label list>)] ([parameter,] <vector expression>)
or
    <aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]
label list is a list of unquoted labels that may include a trailing comma, i.e. both (label1, label2) and (label1, label2,) are valid syntax.
without removes the listed labels from the result vector, while all other labels are preserved in the output. by does the opposite and drops labels that are not listed in the by clause, even if their label values are identical between all elements of the vector.
parameter is only required for count_values, quantile, topk and bottomk.
count_values outputs one time series per unique sample value. Each series has an additional label. The name of that label is given by the aggregation parameter, and the label value is the unique sample value. The value of each time series is the number of times that sample value was present.
topk and bottomk are different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector. by and without are only used to bucket the input vector.
quantile calculates the φ-quantile, the value that ranks at number φ*N among the N metric values of the dimensions aggregated over. φ is provided as the aggregation parameter. For example, quantile(0.5, ...) calculates the median, quantile(0.95, ...) the 95th percentile. For φ = NaN, NaN is returned. For φ < 0, -Inf is returned. For φ > 1, +Inf is returned.
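The φ-quantile and its special cases can be written out directly. The Python sketch below assumes linear interpolation between the two neighbouring sorted values at rank φ*(N-1). The text above only says "ranks at number φ*N", so the interpolation detail is my assumption, not something this document states.

```python
import math

def quantile(phi, values):
    """φ-quantile of a list of floats, with the special cases listed above."""
    if not values or math.isnan(phi):
        return math.nan
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    ordered = sorted(values)
    n = len(ordered)
    rank = phi * (n - 1)                  # fractional position in the sorted list
    lower = int(math.floor(rank))
    upper = min(n - 1, lower + 1)
    weight = rank - lower
    # Blend the two neighbours according to the fractional part of the rank.
    return ordered[lower] * (1 - weight) + ordered[upper] * weight
```

For example, the median of 1, 2, 3, 4 falls halfway between the second and third values.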
Example:
If the metric http_requests_total had time series that fan out by application, instance, and group labels, we could calculate the total number of seen HTTP requests per application and group over all instances via:
    sum without (instance) (http_requests_total)
Which is equivalent to:
    sum by (application, group) (http_requests_total)
If we are just interested in the total of HTTP requests we have seen in all applications, we could simply write:
    sum(http_requests_total)
To count the number of binaries running each build version we could write:
    count_values("version", build_version)
To get the 5 largest HTTP requests counts across all instances we could write:
    topk(5, http_requests_total)
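The equivalence between the without and by forms in the example above can be checked on a toy vector. This Python sketch is an illustration under assumptions: `sum_agg` is a hypothetical helper, a vector is a list of (labels, value) pairs, and the label values and sample values are invented.

```python
from collections import defaultdict

def sum_agg(vector, by=None, without=None):
    """Sum samples grouped by the labels that survive the by/without clause."""
    if by is None and without is None:
        by = ()  # no clause: aggregate over all label dimensions
    groups = defaultdict(float)
    for labels, value in vector:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            dropped = set(without) | {"__name__"}
            kept = {k: v for k, v in labels.items() if k not in dropped}
        groups[frozenset(kept.items())] += value
    return dict(groups)

def series(application, instance, group, value):
    labels = {"__name__": "http_requests_total", "application": application,
              "instance": instance, "group": group}
    return (labels, value)

http_requests_total = [
    series("web", "i1", "canary", 10.0),
    series("web", "i2", "canary", 20.0),
    series("web", "i3", "production", 5.0),
    series("api", "i1", "production", 7.0),
]
per_app_group = sum_agg(http_requests_total, without=["instance"])
```

Because the only labels are application, instance and group, removing instance leaves the same groups as keeping application and group.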
Binary operator precedence
The following list shows the precedence of binary operators in Prometheus, from highest to lowest.
1. ^
2. *, /, %, atan2
3. +, -
4. ==, !=, <=, <, >=, >
5. and, unless
6. or
Operators on the same precedence level are left-associative. For example, 2 * 3 % 2 is equivalent to (2 * 3) % 2. However ^ is right associative, so 2 ^ 3 ^ 2 is equivalent to 2 ^ (3 ^ 2).
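The two associativity examples can be verified numerically. Python happens to follow the same rules for these particular operators, so the short check below uses Python's own parser as a stand-in. Note that Python writes exponentiation as `**` where PromQL writes `^`.

```python
# * and % share a precedence level and associate to the left.
left_assoc = 2 * 3 % 2             # parsed as (2 * 3) % 2
alternative_left = 2 * (3 % 2)     # what right-associativity would give

# Exponentiation associates to the right.
right_assoc = 2 ** 3 ** 2          # parsed as 2 ** (3 ** 2)
alternative_right = (2 ** 3) ** 2  # what left-associativity would give
```

The grouping matters in both cases: the left-associative product is 0 rather than 2, and the right-associative power is 512 rather than 64.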
Operators for native histograms
Native histograms are an experimental feature. Ingesting native histograms has to be enabled via a feature flag. Once native histograms have been ingested, they can be queried (even after the feature flag has been disabled again). However, the operator support for native histograms is still very limited.
Logical/set binary operators work as expected even if histogram samples are involved. They only check for the existence of a vector element and don’t change their behavior depending on the sample type of an element (float or histogram).
The binary + operator between two native histograms and the sum aggregation operator to aggregate native histograms are fully supported. Even if the histograms involved have different bucket layouts, the buckets are automatically converted appropriately so that the operation can be performed. (With the currently supported bucket schemas, that’s always possible.) If either operator has to sum up a mix of histogram samples and float samples, the corresponding vector element is removed from the output vector entirely.
All other operators do not behave in a meaningful way. They either treat the histogram sample as if it were a float sample of value 0, or (in case of arithmetic operations between a scalar and a vector) they leave the histogram sample unchanged. This behavior will change to a meaningful one before native histograms are a stable feature.
Functions
Some functions have default arguments, e.g. year(v=vector(time()) instant-vector). This means that there is one argument v which is an instant vector, which if not provided will default to the value of the expression vector(time()).
Notes about the experimental native histograms:
- Ingesting native histograms has to be enabled via a feature flag. As long as no native histograms have been ingested into the TSDB, all functions will behave as usual.
- Functions that do not explicitly mention native histograms in their documentation (see below) effectively treat a native histogram as a float sample of value 0. (This is confusing and will change before native histograms become a stable feature.)
- Functions that do already act on native histograms might still change their behavior in the future.
- If a function requires the same bucket layout between multiple native histograms it acts on, it will automatically convert them appropriately. (With the currently supported bucket schemas, that’s always possible.)
abs()
abs(v instant-vector) returns the input vector with all sample values converted to their absolute value.
absent()
absent(v instant-vector) returns an empty vector if the vector passed to it has any elements (floats or native histograms) and a 1-element vector with the value 1 if the vector passed to it has no elements.
This is useful for alerting on when no time series exist for a given metric name and label combination.
    absent(nonexistent{job="myjob"})
In the first two examples, absent() tries to be smart about deriving labels of the 1-element output vector from the input vector.
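One way to picture the label derivation is that only equality matchers in the selector contribute labels to the output. The Python sketch below models that idea. The `absent` helper, its `matchers` argument, and the second selector with an `instance` regex matcher are hypothetical, and the real function handles more cases (for example, repeated matchers on the same label).

```python
def absent(vector, matchers):
    """Model of absent(): vector is the selector's result, matchers its label matchers.

    matchers is a list of (label_name, operator, value) tuples.
    """
    if vector:
        return []  # something exists, so nothing is absent
    labels = {}
    for name, op, value in matchers:
        # Only plain equality on ordinary labels pins down a definite value.
        if op == "=" and name != "__name__":
            labels[name] = value
    return [(labels, 1.0)]

# Selector nonexistent{job="myjob"} with no matching series.
simple = absent([], [("__name__", "=", "nonexistent"), ("job", "=", "myjob")])
# Hypothetical selector that adds a regex matcher on instance.
with_regex = absent([], [("job", "=", "myjob"), ("instance", "=~", ".*")])
```

Both calls yield a single element labelled job="myjob" with value 1, since the regex matcher gives no definite label value.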
absent_over_time()
absent_over_time(v range-vector) returns an empty vector if the range vector passed to it has any elements (floats or native histograms) and a 1-element vector with the value 1 if the range vector passed to it has no elements.
This is useful for alerting on when no time series exist for a given metric name and label combination for a certain amount of time.
    absent_over_time(nonexistent{job="myjob"}[1h])
In the first two examples, absent_over_time() tries to be smart about deriving labels of the 1-element output vector from the input vector.
ceil()
ceil(v instant-vector) rounds the sample values of all elements in v up to the nearest integer.
changes()
For each input time series, changes(v range-vector) returns the number of times its value has changed within the provided time range as an instant vector.
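Counting changes amounts to comparing each sample with its predecessor. A minimal Python sketch for a single series follows. The `changes` function here is a model operating on a plain list of values, and it does not treat NaN specially.

```python
def changes(values):
    """Number of times consecutive sample values differ within one range."""
    return sum(1 for previous, current in zip(values, values[1:]) if current != previous)
```

For the samples 1, 1, 2, 2, 1 this counts two changes (1 to 2, then 2 to 1). A range with a single sample has none.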
clamp()
clamp(v instant-vector, min scalar, max scalar) clamps the sample values of all elements in v to have a lower limit of min and an upper limit of max.
Special cases:
- Return an empty vector if min > max
- Return NaN if min or max is NaN
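The clamping rule and its two special cases fit in a few lines. This Python sketch models a vector as a list of (labels, value) pairs. The function and argument names are illustrative.

```python
import math

def clamp(vector, lower, upper):
    """Clamp every sample value into [lower, upper]."""
    if lower > upper:
        return []  # special case: empty vector
    result = []
    for labels, value in vector:
        if math.isnan(lower) or math.isnan(upper):
            result.append((labels, math.nan))  # special case: NaN bound
        else:
            result.append((labels, max(lower, min(upper, value))))
    return result

# Hypothetical input vector.
vector = [({"id": "a"}, -5.0), ({"id": "b"}, 3.0), ({"id": "c"}, 50.0)]
clamped = clamp(vector, 0.0, 10.0)
```

Values inside the limits pass through unchanged, and the others are replaced by the nearest limit.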
clamp_max()
clamp_max(v instant-vector, max scalar) clamps the sample values of all elements in v to have an upper limit of max.
clamp_min()
clamp_min(v instant-vector, min scalar) clamps the sample values of all elements in v to have a lower limit of min.
day_of_month()
day_of_month(v=vector(time()) instant-vector) returns the day of the month for each of the given times in UTC. Returned values are from 1 to 31.
day_of_week()
day_of_week(v=vector(time()) instant-vector) returns the day of the week for each of the given times in UTC. Returned values are from 0 to 6, where 0 means Sunday etc.
day_of_year()
day_of_year(v=vector(time()) instant-vector) returns the day of the year for each of the given times in UTC. Returned values are from 1 to 365 for non-leap years, and 1 to 366 in leap years.
days_in_month()
days_in_month(v=vector(time()) instant-vector) returns number of days in the month for each of the given times in UTC. Returned values are from 28 to 31.
delta()
delta(v range-vector) calculates the difference between the first and last value of each time series element in a range vector v, returning an instant vector with the given deltas and equivalent labels. The delta is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if the sample values are all integers.
The following example expression returns the difference in CPU temperature between now and 2 hours ago:
    delta(cpu_temp_celsius{host="zeus"}[2h])
delta acts on native histograms by calculating a new histogram where each component (sum and count of observations, buckets) is the difference between the respective component in the first and last native histogram in v. However, each element in v that contains a mix of float and native histogram samples within the range will be missing from the result vector.
delta should only be used with gauges and native histograms where the components behave like gauges (so-called gauge histograms).
deriv()
deriv(v range-vector) calculates the per-second derivative of the time series in a range vector v, using simple linear regression. The range vector must have at least two samples in order to perform the calculation. When +Inf or -Inf are found in the range vector, the slope and offset value calculated will be NaN.
deriv should only be used with gauges.
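The slope from a simple linear regression is the covariance of time and value divided by the variance of time. The Python sketch below computes that least-squares slope for one series. It illustrates the technique rather than the engine's code, and the function name and sample layout are assumptions.

```python
def deriv(samples):
    """Least-squares slope (value units per second) of (timestamp, value) samples."""
    n = len(samples)
    if n < 2:
        return None  # at least two samples are required
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    return covariance / variance
```

A gauge that rises by 1 every 10 seconds has a slope of 0.1 per second. An infinite sample value propagates through the arithmetic and makes the result NaN.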
exp()
exp(v instant-vector) calculates the exponential function for all elements in v. Special cases are:
- Exp(+Inf) = +Inf
- Exp(NaN) = NaN
floor()
floor(v instant-vector) rounds the sample values of all elements in v down to the nearest integer.
histogram_count() and histogram_sum()
Both functions only act on native histograms, which are an experimental feature. The behavior of these functions may change in future versions of Prometheus, including their removal from PromQL.
histogram_count(v instant-vector) returns the count of observations stored in a native histogram. Samples that are not native histograms are ignored and do not show up in the returned vector.
Similarly, histogram_sum(v instant-vector) returns the sum of observations stored in a native histogram.
Use histogram_count in the following way to calculate a rate of observations (in this case corresponding to “requests per second”) from a native histogram:
    histogram_count(rate(http_request_duration_seconds[10m]))
The additional use of histogram_sum enables the calculation of the average of observed values (in this case corresponding to “average request duration”):
    histogram_sum(rate(http_request_duration_seconds[10m]))
histogram_fraction()
This function only acts on native histograms, which are an experimental feature. The behavior of this function may change in future versions of Prometheus, including its removal from PromQL.
For a native histogram, histogram_fraction(lower scalar, upper scalar, v instant-vector) returns the estimated fraction of observations between the provided lower and upper values. Samples that are not native histograms are ignored and do not show up in the returned vector.
For example, the following expression calculates the fraction of HTTP requests over the last hour that took 200ms or less:
    histogram_fraction(0, 0.2, rate(http_request_duration_seconds[1h]))
The error of the estimation depends on the resolution of the underlying native histogram and how closely the provided boundaries are aligned with the bucket boundaries in the histogram.
+Inf and -Inf are valid boundary values. For example, if the histogram in the expression above included negative observations (which shouldn’t be the case for request durations), the appropriate lower boundary to include all observations less than or equal 0.2 would be -Inf rather than 0.
Whether the provided boundaries are inclusive or exclusive is only relevant if the provided boundaries are precisely aligned with bucket boundaries in the underlying native histogram. In this case, the behavior depends on the schema definition of the histogram. The currently supported schemas all feature inclusive upper boundaries and exclusive lower boundaries for positive values (and vice versa for negative values). Without a precise alignment of boundaries, the function uses linear interpolation to estimate the fraction. With the resulting uncertainty, it becomes irrelevant if the boundaries are inclusive or exclusive.
histogram_quantile()
histogram_quantile(φ scalar, b instant-vector)
calculates the φ-quantile (0 ≤ φ ≤ 1) from a conventional histogram or from a native histogram. (See histograms and summaries for a detailed explanation of φ-quantiles and the usage of the (conventional) histogram metric type in general.)
Note that native histograms are an experimental feature. The behavior of this function when dealing with native histograms may change in future versions of Prometheus.
The conventional float samples in b are considered the counts of observations in each bucket of one or more conventional histograms. Each float sample must have a label le where the label value denotes the inclusive upper bound of the bucket. (Float samples without such a label are silently ignored.) The other labels and the metric name are used to identify the buckets belonging to each conventional histogram. The histogram metric type automatically provides time series with the _bucket suffix and the appropriate labels.
The native histogram samples in b are treated each individually as a separate histogram to calculate the quantile from.
As long as no naming collisions arise, b may contain a mix of conventional and native histograms.
Use the rate() function to specify the time window for the quantile calculation.
Example: A histogram metric is called http_request_duration_seconds (and therefore the metric name for the buckets of a conventional histogram is http_request_duration_seconds_bucket). To calculate the 90th percentile of request durations over the last 10m, use the following expression in case http_request_duration_seconds is a conventional histogram:
1 | histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m])) |
For a native histogram, use the following expression instead:
1 | histogram_quantile(0.9, rate(http_request_duration_seconds[10m])) |
The quantile is calculated for each label combination in http_request_duration_seconds. To aggregate, use the sum() aggregator around the rate() function. Since the le label is required by histogram_quantile() to deal with conventional histograms, it has to be included in the by clause. The following expression aggregates the 90th percentile by job for conventional histograms:
1 | histogram_quantile(0.9, sum by (job, le) (rate(http_request_duration_seconds_bucket[10m]))) |
When aggregating native histograms, the expression simplifies to:
1 | histogram_quantile(0.9, sum by (job) (rate(http_request_duration_seconds[10m]))) |
To aggregate all conventional histograms, specify only the le
label:
1 | histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[10m]))) |
With native histograms, aggregating everything works as usual without any by
clause:
1 | histogram_quantile(0.9, sum(rate(http_request_duration_seconds[10m]))) |
The histogram_quantile() function interpolates quantile values by assuming a linear distribution within a bucket.
If b has 0 observations, NaN is returned. For φ < 0, -Inf is returned. For φ > 1, +Inf is returned. For φ = NaN, NaN is returned.
The following is only relevant for conventional histograms: If b contains fewer than two buckets, NaN is returned. The highest bucket must have an upper bound of +Inf. (Otherwise, NaN is returned.) If a quantile is located in the highest bucket, the upper bound of the second highest bucket is returned. A lower limit of the lowest bucket is assumed to be 0 if the upper bound of that bucket is greater than 0. In that case, the usual linear interpolation is applied within that bucket. Otherwise, the upper bound of the lowest bucket is returned for quantiles located in the lowest bucket.
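The rules for conventional histograms can be sketched in Python. This is a simplified illustration and not how Prometheus itself is written: the function name bucket_quantile and the input format (a list of (upper_bound, cumulative_count) pairs sorted by upper bound) are assumptions made for this example, and details such as non-monotonic bucket counts are not handled.

```python
import math

def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound; counts are cumulative, as in
    the `le` buckets of a conventional histogram.
    """
    if math.isnan(q):
        return math.nan
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    # Fewer than two buckets, or no +Inf bucket at the top: NaN.
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)

    if idx == len(buckets) - 1:
        # The quantile falls into the +Inf bucket: return the upper bound
        # of the second highest bucket.
        return buckets[-2][0]

    upper, count = buckets[idx]
    if idx == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0  # assumed lower limit of the lowest bucket
    else:
        lower, below = buckets[idx - 1]

    if count == below:
        # Empty lowest bucket with q = 0: avoid dividing by zero.
        return math.nan
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)
```

With the made-up buckets [(0.1, 10), (0.5, 30), (1.0, 40), (+Inf, 40)], the median has rank 20, which lands halfway through the second bucket, so the estimate is 0.1 + 0.4 × 10/20 = 0.3.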
holt_winters()
holt_winters(v range-vector, sf scalar, tf scalar)
produces a smoothed value for time series based on the range in v. The lower the smoothing factor sf, the more importance is given to old data. The higher the trend factor tf, the more trends in the data are considered. Both sf and tf must be between 0 and 1.
holt_winters
should only be used with gauges.
hour()
hour(v=vector(time()) instant-vector)
returns the hour of the day for each of the given times in UTC. Returned values are from 0 to 23.
idelta()
idelta(v range-vector)
calculates the difference between the last two samples in the range vector v
, returning an instant vector with the given deltas and equivalent labels.
idelta
should only be used with gauges.
increase()
increase(v range-vector)
calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.
The following example expression returns the number of HTTP requests as measured over the last 5 minutes, per time series in the range vector:
1 | increase(http_requests_total{job="api-server"}[5m]) |
increase acts on native histograms by calculating a new histogram where each component (sum and count of observations, buckets) is the increase between the respective component in the first and last native histogram in v. However, each element in v that contains a mix of float and native histogram samples within the range will be missing from the result vector.
increase
should only be used with counters and native histograms where the components behave like counters. It is syntactic sugar for rate(v)
multiplied by the number of seconds under the specified time range window, and should be used primarily for human readability. Use rate
in recording rules so that increases are tracked consistently on a per-second basis.
irate()
irate(v range-vector)
calculates the per-second instant rate of increase of the time series in the range vector. This is based on the last two data points. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
The following example expression returns the per-second rate of HTTP requests looking up to 5 minutes back for the two most recent data points, per time series in the range vector:
1 | irate(http_requests_total{job="api-server"}[5m]) |
irate should only be used when graphing volatile, fast-moving counters. Use rate for alerts and slow-moving counters, as brief changes in the rate can reset the FOR clause and graphs consisting entirely of rare spikes are hard to read.
Note that when combining irate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take an irate() first, then aggregate. Otherwise irate() cannot detect counter resets when your target restarts.
label_join()
For each timeseries in v, label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...) joins all the values of all the src_labels using separator and returns the timeseries with the label dst_label containing the joined value. There can be any number of src_labels in this function.
This example will return a vector with each time series having a foo label with the value a,b,c added to it:
1 | label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3") |
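The effect on a single series' label set can be mimicked in Python. This is only a sketch: the dictionary representation of labels and the Python function label_join are assumptions for illustration, and a missing source label is treated as an empty string.

```python
def label_join(labels, dst_label, separator, *src_labels):
    """Return a copy of `labels` with `dst_label` set to the joined values."""
    # A source label that is absent contributes an empty string.
    joined = separator.join(labels.get(name, "") for name in src_labels)
    result = dict(labels)
    result[dst_label] = joined
    return result


series = {"job": "api-server", "src1": "a", "src2": "b", "src3": "c"}
joined = label_join(series, "foo", ",", "src1", "src2", "src3")
```

Here joined keeps all original labels and gains foo with the value a,b,c, matching the PromQL example above.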
label_replace()
For each timeseries in v, label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string) matches the regular expression regex against the value of the label src_label. If it matches, the value of the label dst_label in the returned timeseries will be the expansion of replacement, together with the original labels in the input. Capturing groups in the regular expression can be referenced with $1, $2, etc. If the regular expression doesn’t match then the timeseries is returned unchanged.
This example will return timeseries with the values a:c at label service and a at label foo:
1 | label_replace(up{job="api-server",service="a:c"}, "foo", "$1", "service", "(.*):.*") |
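The matching and expansion step can be approximated in Python. Treat this as a sketch under assumptions: labels are a plain dictionary, the regular expression is matched against the whole label value, and Python's re module stands in for RE2, so patterns that the two engines handle differently are out of scope.

```python
import re

def label_replace(labels, dst_label, replacement, src_label, regex):
    """Return a copy of `labels` with `dst_label` set if `regex` matches."""
    match = re.fullmatch(regex, labels.get(src_label, ""))
    if match is None:
        return dict(labels)  # no match: the series is returned unchanged
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[dst_label] = match.expand(template)
    return result


series = {"job": "api-server", "service": "a:c"}
replaced = label_replace(series, "foo", "$1", "service", "(.*):.*")
```

The first capturing group matches the part before the colon, so replaced carries foo with the value a while service stays a:c.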
ln()
ln(v instant-vector)
calculates the natural logarithm for all elements in v. Special cases are:
- ln(+Inf) = +Inf
- ln(0) = -Inf
- ln(x < 0) = NaN
- ln(NaN) = NaN
log2()
log2(v instant-vector)
calculates the binary logarithm for all elements in v
. The special cases are equivalent to those in ln
.
log10()
log10(v instant-vector)
calculates the decimal logarithm for all elements in v
. The special cases are equivalent to those in ln
.
minute()
minute(v=vector(time()) instant-vector)
returns the minute of the hour for each of the given times in UTC. Returned values are from 0 to 59.
month()
month(v=vector(time()) instant-vector)
returns the month of the year for each of the given times in UTC. Returned values are from 1 to 12, where 1 means January etc.
predict_linear()
predict_linear(v range-vector, t scalar)
predicts the value of time series t seconds from now, based on the range vector v, using simple linear regression. The range vector must have at least two samples in order to perform the calculation. When +Inf or -Inf are found in the range vector, the slope and offset value calculated will be NaN.
predict_linear
should only be used with gauges.
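To make the regression step concrete, here is a small least-squares sketch in Python. The function signature (including the explicit eval_time argument) and the (timestamp, value) sample format are assumptions for this example; it illustrates simple linear regression in general rather than reproducing Prometheus's code.

```python
def predict_linear(samples, t, eval_time):
    """Predict the value `t` seconds after `eval_time`.

    `samples` is a list of (timestamp, value) pairs from the range vector.
    """
    n = len(samples)
    if n < 2:
        return None  # at least two samples are required
    # Shift timestamps so that x = 0 is the evaluation time.
    xs = [ts - eval_time for ts, _ in samples]
    ys = [value for _, value in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        return None  # all samples share one timestamp; no slope to fit
    slope = cov / var
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return intercept + slope * t


# A made-up gauge growing by 6 per minute, evaluated at the last sample.
samples = [(0, 10.0), (60, 16.0), (120, 22.0)]
prediction = predict_linear(samples, 60, eval_time=120)
```

The fitted slope is 0.1 per second and the fitted value at the evaluation time is 22, so the prediction one minute ahead is about 28.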
rate()
rate(v range-vector)
calculates the per-second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range’s time period.
The following example expression returns the per-second rate of HTTP requests as measured over the last 5 minutes, per time series in the range vector:
1 | rate(http_requests_total{job="api-server"}[5m]) |
rate acts on native histograms by calculating a new histogram where each component (sum and count of observations, buckets) is the rate of increase between the respective component in the first and last native histogram in v. However, each element in v that contains a mix of float and native histogram samples within the range will be missing from the result vector.
rate
should only be used with counters and native histograms where the components behave like counters. It is best suited for alerting, and for graphing of slow-moving counters.
Note that when combining rate()
with an aggregation operator (e.g. sum()
) or a function aggregating over time (any function ending in _over_time
), always take a rate()
first, then aggregate. Otherwise rate()
cannot detect counter resets when your target restarts.
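A toy calculation shows why the order matters. The sketch below is an assumption-laden simplification: it works on bare counter values, uses a reset-aware increase instead of a per-second rate to keep the arithmetic short, and ignores the extrapolation that Prometheus performs. The two targets and their values are invented.

```python
def increase(values):
    """Reset-aware increase over a list of counter values."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # A drop means the counter restarted from 0, so the new value
        # counts in full.
        total += cur - prev if cur >= prev else cur
    return total


# Two hypothetical targets; target_b restarts before its third sample.
target_a = [100, 110, 120]
target_b = [100, 110, 5]

# Per-series increase first, then sum: each reset is seen and corrected.
rate_then_sum = increase(target_a) + increase(target_b)

# Summing first hides the reset inside a larger number.
summed = [a + b for a, b in zip(target_a, target_b)]
sum_then_rate = increase(summed)
```

Taking the increase per series gives 20 + 15 = 35. Summing first produces the series 200, 220, 125, where the drop is mistaken for a reset of the whole sum and the result balloons to 145.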
resets()
For each input time series, resets(v range-vector)
returns the number of counter resets within the provided time range as an instant vector. Any decrease in the value between two consecutive samples is interpreted as a counter reset.
resets
should only be used with counters.
round()
round(v instant-vector, to_nearest=1 scalar)
rounds the sample values of all elements in v
to the nearest integer. Ties are resolved by rounding up. The optional to_nearest
argument allows specifying the nearest multiple to which the sample values should be rounded. This multiple may also be a fraction.
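The rounding rule can be written as a one-line formula: scale by the inverse of to_nearest, add one half, take the floor, and scale back. The Python helper below (the name promql_round is made up for this example) is a sketch of that formula.

```python
import math

def promql_round(value, to_nearest=1.0):
    """Round `value` to the nearest multiple of `to_nearest`, ties up."""
    inverse = 1.0 / to_nearest
    return math.floor(value * inverse + 0.5) / inverse
```

Note that "rounding up" means toward positive infinity: promql_round(2.5) gives 3 and promql_round(-2.5) gives -2. This differs from Python's built-in round(), which resolves ties to the nearest even integer. A fractional multiple works the same way, so promql_round(1.26, 0.5) gives 1.5.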
scalar()
Given a single-element input vector, scalar(v instant-vector)
returns the sample value of that single element as a scalar. If the input vector does not have exactly one element, scalar
will return NaN
.
sgn()
sgn(v instant-vector)
returns a vector with all sample values converted to their sign, defined as this: 1 if v is positive, -1 if v is negative and 0 if v is equal to zero.
sort()
sort(v instant-vector)
returns vector elements sorted by their sample values, in ascending order.
sort_desc()
Same as sort
, but sorts in descending order.
sqrt()
sqrt(v instant-vector)
calculates the square root of all elements in v
.
time()
time()
returns the number of seconds since January 1, 1970 UTC. Note that this does not actually return the current time, but the time at which the expression is to be evaluated.
timestamp()
timestamp(v instant-vector)
returns the timestamp of each of the samples of the given vector as the number of seconds since January 1, 1970 UTC.
vector()
vector(s scalar)
returns the scalar s
as a vector with no labels.
year()
year(v=vector(time()) instant-vector)
returns the year for each of the given times in UTC.
<aggregation>_over_time()
The following functions allow aggregating each series of a given range vector over time and return an instant vector with per-series aggregation results:
- avg_over_time(range-vector): the average value of all points in the specified interval.
- min_over_time(range-vector): the minimum value of all points in the specified interval.
- max_over_time(range-vector): the maximum value of all points in the specified interval.
- sum_over_time(range-vector): the sum of all values in the specified interval.
- count_over_time(range-vector): the count of all values in the specified interval.
- quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
- stddev_over_time(range-vector): the population standard deviation of the values in the specified interval.
- stdvar_over_time(range-vector): the population standard variance of the values in the specified interval.
- last_over_time(range-vector): the most recent point value in specified interval.
- present_over_time(range-vector): the value 1 for any series in the specified interval.
Note that all values in the specified interval have the same weight in the aggregation even if the values are not equally spaced throughout the interval.
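The equal-weight behaviour is easy to see in a sketch: the aggregations only look at the sample values, never at the timestamps. The helper below covers a subset of the functions listed above; its name and the dictionary it returns are assumptions for illustration.

```python
import statistics

def over_time(values):
    """Aggregate the sample values of one series within a range.

    Timestamps are deliberately absent: every value has the same weight.
    """
    return {
        "avg": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": len(values),
        "stddev": statistics.pstdev(values),    # population std deviation
        "stdvar": statistics.pvariance(values),  # population variance
        "last": values[-1],
    }


result = over_time([1.0, 2.0, 3.0, 6.0])
```

For these four values the average is 3 and the population variance is (4 + 1 + 0 + 9) / 4 = 3.5, regardless of how the samples are spaced in time.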
Trigonometric Functions
The trigonometric functions work in radians:
- acos(v instant-vector): calculates the arccosine of all elements in v (special cases).
- acosh(v instant-vector): calculates the inverse hyperbolic cosine of all elements in v (special cases).
- asin(v instant-vector): calculates the arcsine of all elements in v (special cases).
- asinh(v instant-vector): calculates the inverse hyperbolic sine of all elements in v (special cases).
- atan(v instant-vector): calculates the arctangent of all elements in v (special cases).
- atanh(v instant-vector): calculates the inverse hyperbolic tangent of all elements in v (special cases).
- cos(v instant-vector): calculates the cosine of all elements in v (special cases).
- cosh(v instant-vector): calculates the hyperbolic cosine of all elements in v (special cases).
- sin(v instant-vector): calculates the sine of all elements in v (special cases).
- sinh(v instant-vector): calculates the hyperbolic sine of all elements in v (special cases).
- tan(v instant-vector): calculates the tangent of all elements in v (special cases).
- tanh(v instant-vector): calculates the hyperbolic tangent of all elements in v (special cases).
The following are useful for converting between degrees and radians:
- deg(v instant-vector): converts radians to degrees for all elements in v.
- pi(): returns pi.
- rad(v instant-vector): converts degrees to radians for all elements in v.
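Examples
Simple time series selection
Return all time series with the metric http_requests_total:
1 | http_requests_total |
Return all time series with the metric http_requests_total and the given job and handler labels:
1 | http_requests_total{job="apiserver", handler="/api/comments"} |
Return a whole range of time (in this case, the 5 minutes up to the query time) for the same vector, making it a range vector:
1 | http_requests_total{job="apiserver", handler="/api/comments"}[5m] |
Note that an expression resulting in a range vector cannot be graphed directly, but it can be viewed in the tabular ("Console") view of the expression browser.
Using regular expressions, you can select time series only for jobs whose names match a certain pattern, in this case, all jobs that end with server:
1 | http_requests_total{job=~".*server"} |
All regular expressions in Prometheus use RE2 syntax.
To select all HTTP status codes except 4xx ones, you can run:
1 | http_requests_total{status!~"4.."} |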
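Subquery
Return the 5-minute rate of the http_requests_total metric for the past 30 minutes, with a resolution of 1 minute (one data point per minute, so about thirty data points):
1 | rate(http_requests_total[5m])[30m:1m] |
This is an example of a nested subquery. The subquery for the deriv function uses the default resolution. Note that using subqueries unnecessarily is unwise.
1 | max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:]) |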
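Using functions, operators, etc.
Return the per-second rate for all time series with the http_requests_total metric name, as measured over the last 5 minutes:
1 | rate(http_requests_total[5m]) |
Assuming that the http_requests_total time series all have the labels job (fanout by job name) and instance (fanout by instance of the job), we might want to sum over the rate of all instances, so we get fewer output time series, but still preserve the job dimension:
1 | sum by (job) (rate(http_requests_total[5m])) |
If we have two different metrics with the same dimensional labels, we can apply binary operators to them, and elements on both sides with the same label set will get matched and propagated to the output. For example, this expression returns the unused memory in MiB for every instance (on a fictional cluster scheduler exposing these metrics about the instances it runs):
1 | (instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024 |
The same expression, but summed by application, could be written like this:
1 | sum by (app, proc) ( |
If the same fictional cluster scheduler exposed CPU usage metrics like the following for every instance:
1 | instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"} |
...we could get the top 3 CPU users grouped by application (app) and process type (proc) like this:
1 | topk(3, sum by (app, proc) (rate(instance_cpu_time_ns[5m]))) |
Assuming this metric contains one time series per running instance, you could count the number of running instances per application like this:
1 | count by (app) (instance_cpu_time_ns) |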
HTTP API
The current stable HTTP API is reachable under /api/v1
on a Prometheus server. Any non-breaking additions will be added under that endpoint.
Format overview
The API response format is JSON. Every successful API request returns a 2xx
status code.
Invalid requests that reach the API handlers return a JSON error object and one of the following HTTP response codes:
- 400 Bad Request when parameters are missing or incorrect.
- 422 Unprocessable Entity when an expression can’t be executed (RFC4918).
- 503 Service Unavailable when queries time out or abort.
Other non-2xx
codes may be returned for errors occurring before the API endpoint is reached.
An array of warnings may be returned if there are errors that do not inhibit the request execution. All of the data that was successfully collected will be returned in the data field.
The JSON response envelope format is as follows:
1 | { |
Generic placeholders are defined as follows:
- <rfc3339 | unix_timestamp>: Input timestamps may be provided either in RFC3339 format or as a Unix timestamp in seconds, with optional decimal places for sub-second precision. Output timestamps are always represented as Unix timestamps in seconds.
- <series_selector>: Prometheus time series selectors like http_requests_total or http_requests_total{method=~"(GET|POST)"}, which need to be URL-encoded.
- <duration>: Prometheus duration strings. For example, 5m refers to a duration of 5 minutes.
- <bool>: boolean values (strings true and false).
Note: Names of query parameters that may be repeated end with []
.
Expression queries
Query language expressions may be evaluated at a single instant or over a range of time. The sections below describe the API endpoints for each type of expression query.
Instant queries
The following endpoint evaluates an instant query at a single point in time:
1 | GET /api/v1/query |
URL query parameters:
- query=<string>: Prometheus expression query string.
- time=<rfc3339 | unix_timestamp>: Evaluation timestamp. Optional.
- timeout=<duration>: Evaluation timeout. Optional. Defaults to and is capped by the value of the --query.timeout flag.
The current server time is used if the time
parameter is omitted.
You can URL-encode these parameters directly in the request body by using the POST
method and Content-Type: application/x-www-form-urlencoded
header. This is useful when specifying a large query that may breach server-side URL character limits.
The data
section of the query result has the following format:
1 | { |
<value>
refers to the query result data, which has varying formats depending on the resultType
. See the expression query result formats.
The following example evaluates the expression up
at the time 2015-07-01T20:10:51.781Z
:
1 | $ curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z' |
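The same request can be assembled programmatically. The following Python sketch only builds the request object and does not send it; the helper name build_instant_query and the base URL are assumptions for illustration, while the endpoint path, parameter names, and POST content type come from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_instant_query(base_url, expr, time=None, use_post=False):
    """Build (but do not send) a request for /api/v1/query."""
    params = {"query": expr}
    if time is not None:
        params["time"] = time
    encoded = urlencode(params)  # percent-encodes values as needed
    url = f"{base_url}/api/v1/query"
    if use_post:
        # Parameters travel URL-encoded in the request body.
        return Request(
            url,
            data=encoded.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
    return Request(f"{url}?{encoded}", method="GET")


get_req = build_instant_query(
    "http://localhost:9090", "up", time="2015-07-01T20:10:51.781Z"
)
post_req = build_instant_query(
    "http://localhost:9090", 'http_requests_total{job="apiserver"}', use_post=True
)
```

Against a running server, either object could be passed to urllib.request.urlopen(). The POST variant is the one to reach for when the expression is long enough to risk hitting URL length limits.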
Range queries
The following endpoint evaluates an expression query over a range of time:
1 | GET /api/v1/query_range |
URL query parameters:
- query=<string>: Prometheus expression query string.
- start=<rfc3339 | unix_timestamp>: Start timestamp, inclusive.
- end=<rfc3339 | unix_timestamp>: End timestamp, inclusive.
- step=<duration | float>: Query resolution step width in duration format or float number of seconds.
- timeout=<duration>: Evaluation timeout. Optional. Defaults to and is capped by the value of the --query.timeout flag.
You can URL-encode these parameters directly in the request body by using the POST
method and Content-Type: application/x-www-form-urlencoded
header. This is useful when specifying a large query that may breach server-side URL character limits.
The data
section of the query result has the following format:
1 | { |
For the format of the <value>
placeholder, see the range-vector result format.
The following example evaluates the expression up
over a 30-second range with a query resolution of 15 seconds.
1 | $ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s' |
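Since both start and end are inclusive, a 30-second range with a 15-second step is evaluated at three points in time. The Python sketch below lists those evaluation times; it assumes the expression is evaluated at start, start + step, and so on up to end, and the helper name is made up for this example.

```python
def evaluation_timestamps(start, end, step):
    """List the evaluation times of a range query (Unix seconds)."""
    if step <= 0:
        raise ValueError("step must be positive")
    timestamps = []
    ts = start
    while ts <= end:  # the end timestamp is inclusive
        timestamps.append(ts)
        ts += step
    return timestamps


points = evaluation_timestamps(0, 30, 15)
```

Here points contains 0, 15 and 30, so each returned series can carry up to three samples. If the range is not a multiple of the step, the last evaluation simply happens before end.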
Formatting query expressions
The following endpoint formats a PromQL expression in a prettified way:
1 | GET /api/v1/format_query |
URL query parameters:
query=<string>
: Prometheus expression query string.
You can URL-encode these parameters directly in the request body by using the POST
method and Content-Type: application/x-www-form-urlencoded
header. This is useful when specifying a large query that may breach server-side URL character limits.
The data
section of the query result is a string containing the formatted query expression. Note that any comments are removed in the formatted string.
The following example formats the expression foo/bar
:
1 | $ curl 'http://localhost:9090/api/v1/format_query?query=foo/bar' |
Querying metadata
Prometheus offers a set of API endpoints to query metadata about series and their labels.
NOTE: These API endpoints may return metadata for series for which there is no sample within the selected time range, and/or for series whose samples have been marked as deleted via the deletion API endpoint. The exact extent of additionally returned series metadata is an implementation detail that may change in the future.
Finding series by label matchers
The following endpoint returns the list of time series that match a certain label set.
1 | GET /api/v1/series |
URL query parameters:
- match[]=<series_selector>: Repeated series selector argument that selects the series to return. At least one match[] argument must be provided.
- start=<rfc3339 | unix_timestamp>: Start timestamp.
- end=<rfc3339 | unix_timestamp>: End timestamp.
You can URL-encode these parameters directly in the request body by using the POST
method and Content-Type: application/x-www-form-urlencoded
header. This is useful when specifying a large or dynamic number of series selectors that may breach server-side URL character limits.
The data
section of the query result consists of a list of objects that contain the label name/value pairs which identify each series.
The following example returns all series that match either of the selectors up
or process_start_time_seconds{job="prometheus"}
:
1 | $ curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=up' --data-urlencode 'match[]=process_start_time_seconds{job="prometheus"}' |
Getting label names
The following endpoint returns a list of label names:
1 | GET /api/v1/labels |
URL query parameters:
- start=<rfc3339 | unix_timestamp>: Start timestamp. Optional.
- end=<rfc3339 | unix_timestamp>: End timestamp. Optional.
- match[]=<series_selector>: Repeated series selector argument that selects the series from which to read the label names. Optional.
The data
section of the JSON response is a list of string label names.
Here is an example.
1 | $ curl 'localhost:9090/api/v1/labels' |
Querying label values
The following endpoint returns a list of label values for a provided label name:
1 | GET /api/v1/label/<label_name>/values |
URL query parameters:
- start=<rfc3339 | unix_timestamp>: Start timestamp. Optional.
- end=<rfc3339 | unix_timestamp>: End timestamp. Optional.
- match[]=<series_selector>: Repeated series selector argument that selects the series from which to read the label values. Optional.
The data
section of the JSON response is a list of string label values.
This example queries for all label values for the job
label:
1 | $ curl http://localhost:9090/api/v1/label/job/values |
Querying exemplars
This is experimental and might change in the future. The following endpoint returns a list of exemplars for a valid PromQL query for a specific time range:
1 | GET /api/v1/query_exemplars |
URL query parameters:
- query=<string>: Prometheus expression query string.
- start=<rfc3339 | unix_timestamp>: Start timestamp.
- end=<rfc3339 | unix_timestamp>: End timestamp.
1 | $ curl -g 'http://localhost:9090/api/v1/query_exemplars?query=test_exemplar_metric_total&start=2020-09-14T15:22:25.479Z&end=2020-09-14T15:23:25.479Z' |
Expression query result formats
Expression queries may return the following response values in the result
property of the data
section. <sample_value>
placeholders are numeric sample values. JSON does not support special float values such as NaN
, Inf
, and -Inf
, so sample values are transferred as quoted JSON strings rather than raw numbers.
The keys "histogram"
and "histograms"
only show up if the experimental native histograms are present in the response. Their placeholder <histogram>
is explained in detail in its own section below.
Range vectors
Range vectors are returned as result type matrix
. The corresponding result
property has the following format:
1 | [ |
Each series could have the "values"
key, or the "histograms"
key, or both. For a given timestamp, there will only be one sample of either float or histogram type.
Instant vectors
Instant vectors are returned as result type vector
. The corresponding result
property has the following format:
1 | [ |
Each series could have the "value"
key, or the "histogram"
key, but not both.
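Because sample values arrive as quoted strings, a client has to convert them back to numbers. The Python sketch below parses an instant-vector response; the response body is made up for illustration and modeled on the documented vector format, and histogram samples are skipped rather than decoded.

```python
import json
import math

# Made-up response body for illustration only.
body = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "job": "prometheus"},
       "value": [1435781451.781, "1"]},
      {"metric": {"__name__": "up", "job": "node"},
       "value": [1435781451.781, "NaN"]}
    ]
  }
}
"""

def parse_vector(text):
    """Return (labels, timestamp, float_value) for each float sample."""
    payload = json.loads(text)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    samples = []
    for series in data["result"]:
        if "value" not in series:
            continue  # histogram sample; not handled in this sketch
        timestamp, raw = series["value"]
        # float() accepts "NaN", "+Inf" and "-Inf" as well as plain numbers.
        samples.append((series["metric"], timestamp, float(raw)))
    return samples


samples = parse_vector(body)
```

The second sample comes back as a float NaN, which could not have been represented as a raw JSON number.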
Scalars
Scalar results are returned as result type scalar
. The corresponding result
property has the following format:
1 | [ <unix_time>, "<scalar_value>" ] |
Strings
String results are returned as result type string
. The corresponding result
property has the following format:
1 | [ <unix_time>, "<string_value>" ] |
Native histograms
The <histogram>
placeholder used above is formatted as follows.
Note that native histograms are an experimental feature, and the format below might still change.
1 | { |
The <boundary_rule>
placeholder is an integer between 0 and 3 with the following meaning:
- 0: “open left” (left boundary is exclusive, right boundary is inclusive)
- 1: “open right” (left boundary is inclusive, right boundary is exclusive)
- 2: “open both” (both boundaries are exclusive)
- 3: “closed both” (both boundaries are inclusive)
Note that with the currently implemented bucket schemas, positive buckets are “open left”, negative buckets are “open right”, and the zero bucket (with a negative left boundary and a positive right boundary) is “closed both”.
Targets
The following endpoint returns an overview of the current state of the Prometheus target discovery:
1 | GET /api/v1/targets |
Both the active and dropped targets are part of the response by default. labels
represents the label set after relabeling has occurred. discoveredLabels
represent the unmodified labels retrieved during service discovery before relabeling has occurred.
1 | $ curl http://localhost:9090/api/v1/targets |
The state query parameter allows the caller to filter by active or dropped targets (e.g., state=active, state=dropped, state=any). Note that an empty array is still returned for targets that are filtered out. Other values are ignored.
1 | $ curl 'http://localhost:9090/api/v1/targets?state=active' |
The scrapePool
query parameter allows the caller to filter by scrape pool name.
1 | $ curl 'http://localhost:9090/api/v1/targets?scrapePool=node_exporter' |
Rules
The /rules
API endpoint returns a list of alerting and recording rules that are currently loaded. In addition it returns the currently active alerts fired by the Prometheus instance of each alerting rule.
As the /rules
endpoint is fairly new, it does not have the same stability guarantees as the overarching API v1.
1 | GET /api/v1/rules |
URL query parameters:
- type=alert|record: return only the alerting rules (e.g. type=alert) or the recording rules (e.g. type=record). When the parameter is absent or empty, no filtering is done.
1 | $ curl http://localhost:9090/api/v1/rules |
Alerts
The /alerts
endpoint returns a list of all active alerts.
As the /alerts
endpoint is fairly new, it does not have the same stability guarantees as the overarching API v1.
1 | GET /api/v1/alerts |
Querying target metadata
The following endpoint returns metadata about metrics currently scraped from targets. This is experimental and might change in the future.
1 | GET /api/v1/targets/metadata |
URL query parameters:
- match_target=<label_selectors>: Label selectors that match targets by their label sets. All targets are selected if left empty.
- metric=<string>: A metric name to retrieve metadata for. All metric metadata is retrieved if left empty.
- limit=<number>: Maximum number of targets to match.
The data
section of the query result consists of a list of objects that contain metric metadata and the target label set.
The following example returns all metadata entries for the go_goroutines
metric from the first two targets with label job="prometheus"
.
1 | curl -G http://localhost:9091/api/v1/targets/metadata \ |
The following example returns metadata for all metrics for all targets with label instance="127.0.0.1:9090".
1 | curl -G http://localhost:9091/api/v1/targets/metadata \ |
Querying metric metadata
The following endpoint returns metadata about metrics currently scraped from targets. However, it does not provide any target information. This is considered experimental and might change in the future.
1 | GET /api/v1/metadata |
URL query parameters:
- limit=<number>: Maximum number of metrics to return.
- metric=<string>: A metric name to filter metadata for. All metric metadata is retrieved if left empty.
The data
section of the query result consists of an object where each key is a metric name and each value is a list of unique metadata objects, as exposed for that metric name across all targets.
The following example returns two metrics. Note that the metric http_requests_total has more than one object in the list. At least one target has a value for HELP that does not match the rest.
1 | curl -G http://localhost:9090/api/v1/metadata?limit=2 |
The following example returns metadata only for the metric http_requests_total
.
1 | curl -G http://localhost:9090/api/v1/metadata?metric=http_requests_total |
Alertmanagers
The following endpoint returns an overview of the current state of the Prometheus alertmanager discovery:
1 | GET /api/v1/alertmanagers |
Both the active and dropped Alertmanagers are part of the response.
1 | $ curl http://localhost:9090/api/v1/alertmanagers |
Status
The following status endpoints expose the current Prometheus configuration.
Config
The following endpoint returns the currently loaded configuration file:
1 | GET /api/v1/status/config |
The config is returned as a dumped YAML file. Due to a limitation of the YAML library, YAML comments are not included.
1 | $ curl http://localhost:9090/api/v1/status/config |
Flags
The following endpoint returns flag values that Prometheus was configured with:
1 | GET /api/v1/status/flags |
All values are of the result type string
.
1 | $ curl http://localhost:9090/api/v1/status/flags |
New in v2.2
Runtime Information
The following endpoint returns various runtime information properties about the Prometheus server:
1 | GET /api/v1/status/runtimeinfo |
The returned values are of different types, depending on the nature of the runtime property.
1 | $ curl http://localhost:9090/api/v1/status/runtimeinfo |
NOTE: The exact returned runtime properties may change without notice between Prometheus versions.
New in v2.14
Build Information
The following endpoint returns various build information properties about the Prometheus server:
1 | GET /api/v1/status/buildinfo |
All values are of the result type string
.
1 | $ curl http://localhost:9090/api/v1/status/buildinfo |
NOTE: The exact returned build properties may change without notice between Prometheus versions.
New in v2.14
TSDB Stats
The following endpoint returns various cardinality statistics about the Prometheus TSDB:
GET /api/v1/status/tsdb
headStats: This provides the following data about the head block of the TSDB:
- numSeries: The number of series.
- chunkCount: The number of chunks.
- minTime: The current minimum timestamp in milliseconds.
- maxTime: The current maximum timestamp in milliseconds.
seriesCountByMetricName: This will provide a list of metric names and their series count.
labelValueCountByLabelName: This will provide a list of the label names and their value count.
memoryInBytesByLabelName: This will provide a list of the label names and memory used in bytes. Memory usage is calculated by adding the length of all values for a given label name.
seriesCountByLabelPair: This will provide a list of label value pairs and their series count.
$ curl http://localhost:9090/api/v1/status/tsdb
New in v2.15
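The sketch below shows how a client might read these statistics: it converts the millisecond timestamps in headStats to datetimes and picks the entries with the highest counts. The sample numbers and metric names are invented, and the `name`/`value` entry shape is assumed from the upstream Prometheus documentation.

```python
from datetime import datetime, timezone


def head_time_range(head_stats):
    """Convert minTime/maxTime (milliseconds since epoch) to UTC datetimes."""
    def to_datetime(millis):
        return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

    return to_datetime(head_stats["minTime"]), to_datetime(head_stats["maxTime"])


def top_entries(entries, n=3):
    """Return the n entries with the largest "value", highest first."""
    return sorted(entries, key=lambda entry: entry["value"], reverse=True)[:n]


# Illustrative data only.
SAMPLE_HEAD_STATS = {
    "numSeries": 508,
    "chunkCount": 937,
    "minTime": 1591516800000,
    "maxTime": 1591520400000,
}
SAMPLE_SERIES_COUNTS = [
    {"name": "metric_a", "value": 4},
    {"name": "metric_b", "value": 20},
    {"name": "metric_c", "value": 9},
]

head_start, head_end = head_time_range(SAMPLE_HEAD_STATS)
busiest = top_entries(SAMPLE_SERIES_COUNTS, n=2)
```

Sorting seriesCountByMetricName this way is a quick first step when hunting for high-cardinality metrics.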
WAL Replay Stats
The following endpoint returns information about the WAL replay:
GET /api/v1/status/walreplay
- read: The number of segments replayed so far.
- total: The total number of segments needed to be replayed.
- progress: The progress of the replay (0 - 100%).
- state: The state of the replay. Possible states:
  - waiting: Waiting for the replay to start.
  - in progress: The replay is in progress.
  - done: The replay has finished.
$ curl http://localhost:9090/api/v1/status/walreplay
NOTE: This endpoint is available before the server has been marked ready and is updated in real time to facilitate monitoring the progress of the WAL replay.
New in v2.28
TSDB Admin APIs
These are APIs that expose database functionalities for the advanced user. These APIs are not enabled unless the --web.enable-admin-api flag is set.
Snapshot
Snapshot creates a snapshot of all current data into snapshots/<datetime>-<rand> under the TSDB’s data directory and returns the directory as the response. It will optionally skip snapshotting data that is only present in the head block and has not yet been compacted to disk.
POST /api/v1/admin/tsdb/snapshot
URL query parameters:
- skip_head=<bool>: Skip data present in the head block. Optional.
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
The snapshot now exists at <data-dir>/snapshots/20171210T211224Z-2be650b6d019eb54
New in v2.1 and supports PUT from v2.9
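A small sketch of building the snapshot request URL and resolving where the snapshot ends up on disk. It assumes the response carries the directory name under `data.name`, as in the upstream Prometheus documentation; the helper names and the `/prometheus` data directory are hypothetical.

```python
import posixpath
from urllib.parse import urlencode


def snapshot_url(base_url, skip_head=False):
    """Build the snapshot endpoint URL, adding skip_head only when requested."""
    url = base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot"
    if skip_head:
        url += "?" + urlencode({"skip_head": "true"})
    return url


def snapshot_path(data_dir, response):
    """Join the TSDB data directory with the snapshot name from the response."""
    return posixpath.join(data_dir, "snapshots", response["data"]["name"])


# Response shape assumed from the upstream docs; the name matches the example above.
SAMPLE_SNAPSHOT = {
    "status": "success",
    "data": {"name": "20171210T211224Z-2be650b6d019eb54"},
}

url = snapshot_url("http://localhost:9090", skip_head=True)
path = snapshot_path("/prometheus", SAMPLE_SNAPSHOT)
```

The request itself must be sent with POST (or PUT from v2.9), for example via `urllib.request.Request(url, method="POST")`.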
Delete Series
DeleteSeries deletes data for a selection of series in a time range. The actual data still exists on disk and is cleaned up in future compactions or can be explicitly cleaned up by hitting the Clean Tombstones endpoint.
If successful, a 204 is returned.
POST /api/v1/admin/tsdb/delete_series
URL query parameters:
- match[]=<series_selector>: Repeated label matcher argument that selects the series to delete. At least one match[] argument must be provided.
- start=<rfc3339 | unix_timestamp>: Start timestamp. Optional and defaults to minimum possible time.
- end=<rfc3339 | unix_timestamp>: End timestamp. Optional and defaults to maximum possible time.
Omitting both the start and end times clears all the data for the matched series in the database.
Example:
1 | $ curl -X POST \ |
NOTE: This endpoint marks samples from series as deleted, but will not necessarily prevent associated series metadata from still being returned in metadata queries for the affected time range (even after cleaning tombstones). The exact extent of metadata deletion is an implementation detail that may change in the future.
New in v2.1 and supports PUT from v2.9
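The repeated match[] parameter and the optional time bounds are easy to get wrong when assembling the URL by hand. The sketch below builds the delete_series URL with the standard library; `delete_series_url` is a hypothetical helper, and it only constructs the URL without sending anything.

```python
from urllib.parse import urlencode


def delete_series_url(base_url, matchers, start=None, end=None):
    """Build a delete_series URL with one match[] parameter per selector.

    start and end may be RFC3339 strings or Unix timestamps. Leaving both
    out targets all data for the matched series.
    """
    if not matchers:
        raise ValueError("at least one match[] argument must be provided")
    params = [("match[]", matcher) for matcher in matchers]
    if start is not None:
        params.append(("start", start))
    if end is not None:
        params.append(("end", end))
    # urlencode percent-encodes braces, quotes, and brackets in the selectors.
    query = urlencode(params)
    return base_url.rstrip("/") + "/api/v1/admin/tsdb/delete_series?" + query


bounded = delete_series_url(
    "http://localhost:9090",
    ["up", 'process_start_time_seconds{job="prometheus"}'],
    start=1600000000,
    end=1600003600,
)
```

Passing explicit start and end values is the safer default, since an unbounded request removes every sample of the matched series.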
Clean Tombstones
CleanTombstones removes the deleted data from disk and cleans up the existing tombstones. This can be used after deleting series to free up space.
If successful, a 204 is returned.
POST /api/v1/admin/tsdb/clean_tombstones
This takes no parameters or body.
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones
New in v2.1 and supports PUT from v2.9
Remote Write Receiver
Prometheus can be configured as a receiver for the Prometheus remote write protocol. This is not considered an efficient way of ingesting samples. Use it with caution for specific low-volume use cases. It is not suitable for replacing the ingestion via scraping and turning Prometheus into a push-based metrics collection system.
Enable the remote write receiver by setting --web.enable-remote-write-receiver. When enabled, the remote write receiver endpoint is /api/v1/write. Find more details here.
New in v2.33
Remote Read API
This is not currently considered part of the stable API and is subject to change even between non-major version releases of Prometheus.
Format overview
The API response format is JSON. Every successful API request returns a 2xx status code.
Invalid requests that reach the API handlers return a JSON error object and one of the following HTTP response codes:
- 400 Bad Request when parameters are missing or incorrect.
- 422 Unprocessable Entity when an expression can’t be executed (RFC4918).
- 503 Service Unavailable when queries time out or abort.
Other non-2xx codes may be returned for errors occurring before the API endpoint is reached.
An array of warnings may be returned if there are errors that do not inhibit the request execution. All of the data that was successfully collected will be returned in the data field.
The JSON response envelope format is as follows:
1 | { |
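One way a client might handle this envelope is sketched below. The field names `status`, `data`, `errorType`, `error`, and `warnings` are assumed from the upstream Prometheus documentation, because the snippet above is truncated; `PrometheusAPIError` and `unwrap` are hypothetical names.

```python
class PrometheusAPIError(Exception):
    """Raised when the envelope reports an error status."""

    def __init__(self, error_type, message):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def unwrap(envelope):
    """Return (data, warnings) for a successful envelope, raise otherwise."""
    if envelope.get("status") != "success":
        raise PrometheusAPIError(
            envelope.get("errorType", "unknown"), envelope.get("error", "")
        )
    # Warnings are optional and do not prevent the data from being returned.
    return envelope.get("data"), envelope.get("warnings", [])


# Illustrative envelopes, not real server output.
ok_data, ok_warnings = unwrap({"status": "success", "data": {"resultType": "vector"}})
```

Returning the warnings alongside the data lets the caller decide whether a partial result is acceptable.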
Generic placeholders are defined as follows:
- <rfc3339 | unix_timestamp>: Input timestamps may be provided either in RFC3339 format or as a Unix timestamp in seconds, with optional decimal places for sub-second precision. Output timestamps are always represented as Unix timestamps in seconds.
- <series_selector>: Prometheus time series selectors like http_requests_total or http_requests_total{method=~"(GET|POST)"}. These need to be URL-encoded.
- <duration>: Prometheus duration strings. For example, 5m refers to a duration of 5 minutes.
- <bool>: boolean values (strings true and false).
Note: Names of query parameters that may be repeated end with [].
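The placeholders above map onto a few small conversions on the client side. The following sketch uses only the Python standard library; the helper names are made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import quote, unquote


def to_unix_timestamp(moment):
    """Timezone-aware datetime -> Unix timestamp in seconds (float)."""
    return moment.timestamp()


def to_rfc3339(moment):
    """Timezone-aware datetime -> RFC3339 string in UTC with a Z suffix."""
    return moment.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def encode_selector(selector):
    """Percent-encode a series selector for use as a URL parameter value."""
    return quote(selector, safe="")


def bool_param(flag):
    """Python bool -> the string form expected for <bool> parameters."""
    return "true" if flag else "false"


moment = datetime(2015, 7, 1, 20, 10, 51, tzinfo=timezone.utc)
selector = 'http_requests_total{method=~"(GET|POST)"}'
encoded = encode_selector(selector)
```

Libraries that build query strings from a dictionary usually perform the encoding step themselves, so `encode_selector` is only needed when concatenating URLs manually.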
Remote Read API
This API provides data read functionality from Prometheus. This interface expects snappy compression. The API definition is located here.
Requests are made to the following endpoint: /api/v1/read
Samples
This returns a message that includes a list of raw samples.
Streamed Chunks
These streamed chunks utilize an XOR algorithm inspired by the Gorilla compression to encode the chunks. However, it provides resolution to the millisecond instead of to the second.
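To give a feel for why XOR encoding suits time series, the sketch below XORs the IEEE 754 bit patterns of consecutive sample values. It illustrates only the general Gorilla idea, under the assumption of 64-bit floats; it is not the actual Prometheus chunk format, and the function names are invented.

```python
import struct


def float_bits(value):
    """Return the 64-bit IEEE 754 pattern of a float as an unsigned int."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_delta(previous, current):
    """XOR the bit patterns of two consecutive sample values."""
    return float_bits(previous) ^ float_bits(current)


def leading_trailing_zeros(bits):
    """Count leading and trailing zero bits of a 64-bit XOR result."""
    if bits == 0:
        return 64, 64
    leading = 64 - bits.bit_length()
    trailing = (bits & -bits).bit_length() - 1
    return leading, trailing


unchanged = xor_delta(15.5, 15.5)    # identical values XOR to zero
doubled = leading_trailing_zeros(xor_delta(12.0, 24.0))
```

When neighbouring values are equal the XOR is zero, and when they are close only a short run of bits in the middle differs, so an encoder needs to store just that run plus its position.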