Monitoring and Observability Setup
Introduction
My setup
Resources
Component | CPU Request | CPU Limit | Memory Request | Memory Limit |
---|---|---|---|---|
Prometheus | 500m | 1–2 cores | 512Mi | 2–4Gi |
Grafana | 100m | 500m | 128Mi | 512Mi–1Gi |
Loki | 300m | 1 core | 512Mi | 2Gi |
OTEL Collector | 200m | 500m | 256Mi | 512Mi–1Gi |
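
As a rough sketch of how a row of this table maps onto a Kubernetes manifest, the fragment below sets the Prometheus values in a container's `resources` stanza. The container name and image are placeholders, and where the table gives a range the lower bound is used as the limit; that choice is an assumption, not a recommendation.

```yaml
# Fragment of a pod template (e.g. inside a Deployment or StatefulSet).
# Container name and image are illustrative placeholders.
containers:
  - name: prometheus
    image: prom/prometheus
    resources:
      requests:
        cpu: 500m        # CPU Request column
        memory: 512Mi    # Memory Request column
      limits:
        cpu: "1"         # lower end of the 1–2 core range (assumed choice)
        memory: 2Gi      # lower end of the 2–4Gi range (assumed choice)
```

Summed across the four components, the requests come to 1100m CPU and 1408Mi memory, which is the capacity the scheduler has to find before the whole stack can start.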
Persistent Data
Component | Needs Persistence? | What It Stores | Notes |
---|---|---|---|
Prometheus | Yes | Time-series metrics (TSDB) | Use a PVC for `/prometheus` or `/data`. Retention defaults to 15d; tune via `--storage.tsdb.retention.time`. |
Grafana | Optional | Dashboards, users, config | If using SQLite (default), persist `/var/lib/grafana`. For Postgres/MySQL, persist the DB. |
Loki | Yes | Log chunks, index, metadata | Persist `/loki` or use object storage (e.g., MinIO, S3). Index and chunk retention are configurable. |
OTEL Collector | No | N/A | Stateless by design. Doesn’t store data unless you add a file exporter or buffering. |
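
One way the Prometheus row could look in practice is a PersistentVolumeClaim mounted at `/prometheus`, with the retention flag set explicitly. This is a minimal sketch: the claim name, the 20Gi size, and the access mode are assumptions made for illustration and are not taken from the tables.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi            # assumed size; scale with retention and series count
---
# Pod template fragment that mounts the claim above.
containers:
  - name: prometheus
    image: prom/prometheus
    args:
      # Setting args replaces the image defaults, so the config path is repeated here.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=15d   # the default, stated explicitly
    volumeMounts:
      - name: data
        mountPath: /prometheus
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: prometheus-data
```

Depending on the storage class, the pod may also need a `securityContext.fsGroup` so that the non-root Prometheus user can write to the volume.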
Backups
Component | Persistence Required? | Suggested Setup | Retention Strategy |
---|---|---|---|
Prometheus | Minimal | Use emptyDir or hostPath (emptyDir data is lost when the pod is removed from the node) | Retention: ~6h to 24h, e.g. `--storage.tsdb.retention.time=6h` |
Grafana | Optional | No volume needed unless storing user dashboards | Use provisioning (ConfigMaps) for dashboards |
Loki | Yes-ish | Use emptyDir or hostPath | Set retention via config. Recent Loki versions handle this in the compactor with `limits_config.retention_period`; `table_manager` is the older mechanism |
OTEL Collector | No | Stateless | No action needed |
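
The lightweight variant in the table above can be sketched as follows: Prometheus writes to an `emptyDir` with a 6h retention window, and Grafana reads its dashboards from a provisioning file shipped in a ConfigMap instead of from a volume. The volume name, the `sizeLimit`, the ConfigMap name, and the dashboard path are all assumptions made for the example.

```yaml
# Pod template fragment: Prometheus on scratch storage.
containers:
  - name: prometheus
    image: prom/prometheus
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=6h
    volumeMounts:
      - name: scratch
        mountPath: /prometheus
volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 2Gi           # assumed cap so short-lived metrics cannot fill the node
---
# Grafana dashboard provider, intended to be mounted under
# /etc/grafana/provisioning/dashboards/ in the Grafana container.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-provider   # hypothetical name
data:
  provider.yaml: |
    apiVersion: 1
    providers:
      - name: default
        type: file
        options:
          path: /var/lib/grafana/dashboards   # assumed location of dashboard JSON files
```

With this layout nothing in the Prometheus or Grafana pods has to survive a restart: metrics older than the retention window are expected to be gone anyway, and dashboards are recreated from the ConfigMap on startup.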