Prometheus
Introduction
This is a page on Prometheus.
Components
This diagram from iam-veeramalla is a really good picture of what goes on.
Types of Metrics
- Counter - Only goes up (it resets to zero when the process restarts), e.g. number of HTTP requests.
- Gauge - Goes up and down, e.g. CPU usage.
- Histogram - Counts observations in buckets, e.g. distribution of request duration.
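The three types can be sketched with toy Python classes. This is only an illustration of the behaviour described above: the class names, method names and bucket bounds are hypothetical stand-ins, not the API of any Prometheus client library.

```python
import math


class Counter:
    """Monotonic value: can only be incremented (it restarts at 0 with the process)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount


class Gauge:
    """Value that can be set to anything, so it can go up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Cumulative buckets: each bucket counts observations <= its upper bound."""

    def __init__(self, upper_bounds):
        # An extra +Inf bucket catches every observation.
        self.buckets = {bound: 0 for bound in [*upper_bounds, math.inf]}

    def observe(self, value):
        for bound in self.buckets:
            if value <= bound:
                self.buckets[bound] += 1


http_requests = Counter()
cpu_usage = Gauge()
request_duration = Histogram([0.1, 0.5, 1.0])  # hypothetical bounds in seconds

for duration in [0.05, 0.3, 0.7, 2.0]:
    http_requests.inc()
    request_duration.observe(duration)

cpu_usage.set(0.8)
cpu_usage.set(0.35)  # a gauge may drop again
```

Because the buckets are cumulative, the 2.0 second request only lands in the +Inf bucket, while the 0.05 second request is counted in every bucket.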
PromQL Cheat Sheet
The robot will probably do the work of writing the queries. The main purpose of this sheet is awareness of what is available.
PromQL
Label Query
To query a metric by label, put the label and a value inside {}, e.g.
http_requests_total{method="GET"}
This retrieves the series of the metric called http_requests_total that carry the label method="GET", such as
http_requests_total{method="GET", status="200", instance="web01", path="/items/1"}
You can AND conditions together by adding another label
http_requests_total{method="GET", status="200"}
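You can use the regex operator =~ with .* to get a starts-with match. Regex matchers are fully anchored, so "/items.*" only matches paths that begin with /items
http_requests_total{method="GET", status="200", path=~"/items.*"}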
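How label matchers combine can be sketched in Python. This is a simplified model, not Prometheus code: the `matches` helper and the sample series are hypothetical, only the `=` and `=~` operators are covered, and Python's `re` stands in for the RE2 syntax Prometheus uses.

```python
import re


def matches(labels, matchers):
    """Return True when every matcher accepts the series' labels (logical AND)."""
    for name, op, pattern in matchers:
        value = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = value == pattern
        elif op == "=~":
            # fullmatch mirrors the fully anchored regex matching of PromQL
            ok = re.fullmatch(pattern, value) is not None
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical series of http_requests_total, described by their labels.
series = [
    {"method": "GET", "status": "200", "path": "/items/1"},
    {"method": "GET", "status": "404", "path": "/items/2"},
    {"method": "POST", "status": "200", "path": "/items"},
    {"method": "GET", "status": "200", "path": "/users/7"},
]

selector = [("method", "=", "GET"), ("status", "=", "200"), ("path", "=~", "/items.*")]
selected = [s for s in series if matches(s, selector)]
```

Only the first series passes all three matchers. Each extra matcher can only narrow the result.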
Prometheus Vectors
Prometheus uses vectors to represent sets of time series data. There are two primary types:
Instant Vector
An instant vector represents a set of time series, each containing a single sample at a specific timestamp.
Characteristics
- Captures the latest value of each time series at a given moment.
- Used for real-time monitoring and alerting.
- Can be directly graphed or used in arithmetic and comparison operations.
Example
http_requests_total{status="200"}
This returns the current value of all time series with the metric name `http_requests_total` and label `status="200"`.
Range Vector
A range vector represents a set of time series, each containing multiple samples over a specified time range.
Characteristics
- Captures historical data for each time series.
- Used for trend analysis, rate calculations, and aggregations.
- Cannot be directly graphed without applying a function that reduces the range to a single value per timestamp.
Example
http_requests_total{status="200"}[5m]
This returns all samples from the last 5 minutes for each matching time series.
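The difference between the two selectors can be sketched in Python for a single series. The helper names are hypothetical, and two details are assumptions of this sketch: the window is treated as open on the left and closed on the right, and the instant selector looks back at most 300 seconds for the latest sample.

```python
def instant_sample(samples, at, lookback=300):
    """Latest (timestamp, value) at or before `at` within `lookback` seconds, else None."""
    candidates = [s for s in samples if at - lookback < s[0] <= at]
    return max(candidates, default=None)  # tuples compare by timestamp first


def range_samples(samples, at, window):
    """All (timestamp, value) pairs inside the window that ends at `at`."""
    return [s for s in samples if at - window < s[0] <= at]


# One series scraped every 60 seconds: (timestamp in seconds, counter value).
samples = [(0, 100), (60, 120), (120, 140), (180, 160), (240, 180), (300, 200)]

latest = instant_sample(samples, at=300)              # one sample per series
last_two_minutes = range_samples(samples, 300, 120)   # several samples per series
```

The instant selection yields one sample, the range selection yields a list, which is why a range vector needs a function such as `rate()` before it can be graphed.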
Range Vector Behavior with Missing Data
A range vector in Prometheus captures multiple samples for each time series over a specified time window. But what happens when no samples exist during part of that window?
Scenario
Suppose we query a metric over a 5-minute range:
http_requests_total[5m]
This retrieves all samples from the last 5 minutes for each matching time series. Now imagine the metric was scraped every minute, but no samples were recorded in the final minute.
Sample Timeline
Timestamp | Value |
---|---|
T-5m | 100 |
T-4m | 120 |
T-3m | 140 |
T-2m | 160 |
T-1m | (missing) |
Resulting Range Vector
The range vector will include only the samples that exist within the 5-minute window. If no sample exists for the last minute, that portion of the vector is simply absent.
Implications
- Functions like `rate()` or `increase()` will compute based on the available samples.
- If the last sample is missing, the function may return a lower value, or no result at all when fewer than two samples remain in the window.
- Grafana panels may show gaps or flat lines if the missing data affects aggregation.
Example with `rate()`
rate(http_requests_total[5m])
If the last sample is missing, `rate()` is calculated from the delta between the first and last available samples (T-5m and T-2m), so the result says nothing about the most recent interval and is less accurate.
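The arithmetic for the sample timeline can be sketched in Python. This is a deliberately simplified model: `simple_rate` is a hypothetical helper that only divides the delta between the first and last sample by the time between them, and it ignores the counter-reset handling and boundary extrapolation that the real `rate()` applies.

```python
def simple_rate(samples):
    """Per-second increase between first and last sample; None with fewer than two."""
    if len(samples) < 2:
        return None  # not enough data to compute a rate
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Samples from the timeline, as (seconds relative to T, value).
# The sample at T-1m is missing, so the last one is at T-2m.
window = [(-300, 100), (-240, 120), (-180, 140), (-120, 160)]

rate = simple_rate(window)  # (160 - 100) / 180 s = 1/3 per second
too_few = simple_rate([(-120, 160)])  # a single sample gives no result
```

The result covers only the three minutes between T-5m and T-2m, so whatever happened in the last two minutes is invisible to the calculation.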
Best Practices
- Ensure scrape intervals are consistent and aligned with query windows.
- Use functions like `last_over_time()` or `present_over_time()` to detect missing data.
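- Consider alerting on missing series using `absent()`, and on counters that have stopped increasing using `increase(metric[5m]) == 0`. The latter returns nothing when the series has no samples in the window, so it does not detect missing data by itself.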
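The idea of detecting missing scrapes can be sketched in Python for data that has already been fetched. `find_gaps` is a hypothetical helper, not a Prometheus function, and the tolerance of 1.5 scrape intervals is an arbitrary assumption.

```python
def find_gaps(timestamps, now, scrape_interval, tolerance=1.5):
    """Return (start, end) pairs where neighbouring points are further apart than expected."""
    limit = scrape_interval * tolerance
    points = sorted(timestamps) + [now]  # include the gap up to the query time
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > limit]


# Timestamps from the sample timeline, in seconds relative to T (now = 0).
scraped_at = [-300, -240, -180, -120]

gaps = find_gaps(scraped_at, now=0, scrape_interval=60)
```

Here the only gap reported is the one between T-2m and the query time, which corresponds to the missing sample at T-1m.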
Comparison Table
Feature | Instant Vector | Range Vector |
---|---|---|
Time Scope | Single timestamp | Time interval |
Data Points | One per series | Many per series |
Use Case | Current state | Historical analysis |
Graphing | Directly graphable | Requires a function that returns an instant vector |
Functions | Arithmetic, comparison | rate(), increase(), avg_over_time(), etc. |