Prometheus

From bibbleWiki

Introduction

This is a page on Prometheus.

Components

This diagram from iam-veeramalla is a really good picture of what goes on.

Types of Metrics

  • Counter - Only goes up (it restarts from zero when the process restarts), e.g. number of HTTP requests.
  • Gauge - Goes up and down, e.g. CPU usage.
  • Histogram - Counts observations in buckets, e.g. the distribution of request durations.
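
As a rough sketch of how each type is typically queried. The metric names `node_memory_MemAvailable_bytes` and `http_request_duration_seconds_bucket` are assumptions for illustration and do not come from this page:

```promql
# Counter: query the per-second rate of increase rather than the raw value
rate(http_requests_total[5m])

# Gauge: the current value is meaningful on its own
node_memory_MemAvailable_bytes

# Histogram: estimate the 95th percentile from the cumulative buckets (label "le")
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```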

PromQL Cheat Sheet

An AI assistant will probably write most of the queries. The main purpose of this section is awareness of what is available.

PromQL

Label Query

To query a metric by label, put the label name and value inside {} after the metric name, e.g.

http_requests_total{method="GET"}

This retrieves every time series of the metric http_requests_total whose method label is GET, for example:

http_requests_total{method="GET", status="200", instance="web01", path="/items/1"}

You can AND conditions together by adding another label matcher, separated by a comma

http_requests_total{method="GET", status="200"} 

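To match values that start with a prefix, use the regex matcher =~ with .* at the end. A plain = compares the literal string, so it would not treat .* as a pattern.

http_requests_total{method="GET", status="200", path=~"/items.*"}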
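
For awareness, PromQL has four label matcher operators. A minimal sketch using the labels from the examples above (the specific values are only illustrative):

```promql
# =   exact match
http_requests_total{method="GET"}

# !=  not equal
http_requests_total{status!="200"}

# =~  regex match (the pattern is anchored, so it must match the whole label value)
http_requests_total{path=~"/items/.*"}

# !~  negated regex match
http_requests_total{method!~"GET|HEAD"}
```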

Prometheus Vectors

Prometheus uses vectors to represent sets of time series data. There are two primary types:

Instant Vector

An instant vector represents a set of time series, each containing a single sample at a specific timestamp.

Characteristics

  • Captures the latest value of each time series at a given moment.
  • Used for real-time monitoring and alerting.
  • Can be directly graphed or used in arithmetic and comparison operations.

Example

http_requests_total{status="200"}

This returns the current value of all time series with the metric name `http_requests_total` and label `status="200"`.
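
The arithmetic and comparison operations mentioned above can be sketched as follows. The threshold of 1000 and the status value 500 are assumptions chosen for illustration:

```promql
# Comparison: keep only the series whose current value exceeds 1000
http_requests_total{status="200"} > 1000

# Arithmetic between two instant vectors: fraction of all requests that returned 500.
# Aggregating both sides with sum() avoids label mismatches between them.
sum(http_requests_total{status="500"}) / sum(http_requests_total)
```

Because these operate on raw counter values, the ratio covers the whole lifetime of the counters. For a recent error ratio you would normally wrap both sides in rate() first.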

Range Vector

A range vector represents a set of time series, each containing multiple samples over a specified time range.

Characteristics

  • Captures historical data for each time series.
  • Used for trend analysis, rate calculations, and aggregations.
  • Cannot be directly graphed without applying a function that reduces the range to a single value per timestamp.

Example

http_requests_total{status="200"}[5m]

This returns all samples from the last 5 minutes for each matching time series.
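
A range vector usually gets passed to a function that reduces it back to an instant vector. A short sketch of some common ones, reusing the selector above:

```promql
# Per-second average rate of increase over the last 5 minutes
rate(http_requests_total{status="200"}[5m])

# Total increase over the last 5 minutes
increase(http_requests_total{status="200"}[5m])

# Average of the raw sample values in the window (more useful for gauges)
avg_over_time(http_requests_total{status="200"}[5m])

# Highest raw sample value in the window
max_over_time(http_requests_total{status="200"}[5m])
```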

Range Vector Behavior with Missing Data

A range vector in Prometheus captures multiple samples for each time series over a specified time window. But what happens when no samples exist during part of that window?

Scenario

Suppose we query a metric over a 5-minute range:

http_requests_total[5m]

This retrieves all samples from the last 5 minutes for each matching time series. Now imagine the metric was scraped every minute, but no samples were recorded in the final minute.

Sample Timeline

Timestamp   Value
T-5m        100
T-4m        120
T-3m        140
T-2m        160
T-1m        (missing)

Resulting Range Vector

The range vector will include only the samples that exist within the 5-minute window. If no sample exists for the last minute, that portion of the vector is simply absent.

Implications

  • Functions like `rate()` or `increase()` will compute based on the available samples.
  • If the last sample is missing, the function may return a lower value than expected. If fewer than two samples remain in the window, `rate()` and `increase()` return no result for that series rather than `NaN`.
  • Grafana panels may show gaps or flat lines if the missing data affects aggregation.

Example with `rate()`

rate(http_requests_total[5m])

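If the last sample is missing, `rate()` calculates from the delta between T-5m and T-2m, which gives a raw slope of (160 - 100) / 180s ≈ 0.33 requests per second. Prometheus extrapolates towards the edges of the window only by a limited amount, so the most recent minute is largely unaccounted for and the result is less accurate.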

Best Practices

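  • Keep scrape intervals consistent, and make the query window several times longer than the scrape interval so that a missed scrape still leaves at least two samples.
  • Use `last_over_time()` to bridge short gaps, and `present_over_time()` or `count_over_time()` to detect missing data.
  • Consider alerting on missing series using `absent()` or `absent_over_time()`. Note that `increase(metric[5m]) == 0` detects a counter that is still being scraped but has stopped increasing. It returns nothing when the series is missing.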
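
The detection ideas above might look like this in practice. The threshold of 4 samples is an assumption based on the 1-minute scrape interval in the scenario, not a fixed rule:

```promql
# Returns 1 when no series with this name exists at evaluation time
absent(http_requests_total)

# Returns 1 when no sample at all was seen during the last 5 minutes
absent_over_time(http_requests_total[5m])

# Samples per series in the window; fewer than 4 suggests missed scrapes
count_over_time(http_requests_total[5m]) < 4

# Counter is still being scraped but has not increased
increase(http_requests_total[5m]) == 0
```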

Comparison Table

Feature       Instant Vector           Range Vector
Time Scope    Single timestamp         Time interval
Data Points   One per series           Many per series
Use Case      Current state            Historical analysis
Graphing      Directly graphable       Requires aggregation
Functions     Arithmetic, comparison   rate(), increase(), avg_over_time(), etc.