Commit 24930d6

docs: Add ingestion troubleshooting topic
1 parent 448cc74 commit 24930d6

4 files changed

+1323
-0
lines changed

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+---
+title: Monitoring and alerting ingest
+description: Set up Prometheus alerting rules to detect common ingestion issues before they impact your logging pipeline.
+weight: 100
+---
+
+
+# Monitoring and alerting ingest
+
+Set up Prometheus alerting rules to detect common ingestion issues before they impact your logging pipeline. Create a file named `loki-ingestion-alerts.yml` (or add to your existing Prometheus rules file) with the following alerting rules:
+
+```yaml
+# File: loki-ingestion-alerts.yml
+# Add this file to your Prometheus rule_files configuration:
+#   rule_files:
+#     - /etc/prometheus/rules/loki-ingestion-alerts.yml
+
+groups:
+  - name: loki_ingestion
+    rules:
+      # Rate limit alerts
+      - alert: LokiRequestRateLimited
+        expr: sum by (tenant) (rate(loki_discarded_samples_total{reason="rate_limited"}[5m])) > 0
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Tenant {{ $labels.tenant }} is being rate limited"
+          description: "Tenant {{ $labels.tenant }} has exceeded ingestion rate limits. Consider increasing limits or reducing log volume."
+
+      # Stream limit alerts
+      - alert: LokiStreamLimitReached
+        expr: sum by (tenant) (rate(loki_discarded_samples_total{reason="stream_limit"}[5m])) > 0
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Tenant {{ $labels.tenant }} has reached stream limit"
+          description: "Tenant {{ $labels.tenant }} has exceeded max_global_streams_per_user. Reduce label cardinality or increase the limit."
+
+      # WAL alerts
+      - alert: LokiWALDiskFull
+        expr: increase(loki_ingester_wal_disk_full_failures_total[5m]) > 0
+        for: 1m
+        labels:
+          severity: critical
+        annotations:
+          summary: "WAL disk is full on {{ $labels.instance }}"
+          description: "Ingester {{ $labels.instance }} cannot write to WAL due to disk space. Data durability is compromised."
+
+      # Validation errors
+      - alert: LokiHighValidationErrors
+        expr: sum by (reason) (rate(loki_discarded_samples_total{reason=~"invalid_labels|line_too_long|out_of_order"}[5m])) > 10
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "High rate of {{ $labels.reason }} validation errors"
+          description: "Loki is discarding logs due to {{ $labels.reason }}. Check your log shipping configuration."
+```
+
+To enable these alerts, add the rules file to your Prometheus configuration:
+
+```yaml
+# prometheus.yml
+rule_files:
+  - /etc/prometheus/rules/loki-ingestion-alerts.yml
+```
+
+{{< admonition type="tip" >}}
+The [Loki mixin](https://github.com/grafana/loki/tree/main/production/loki-mixin) provides a comprehensive set of pre-built dashboards and alerting rules for monitoring Loki in production.
+{{< /admonition >}}
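
The alerting rules added in this file could be checked before deployment with Prometheus rule unit tests. The following is a minimal sketch, not part of the commit: the test file name `loki-ingestion-alerts_test.yml`, the tenant `team-a`, and the synthetic sample values are all assumptions chosen for illustration, and the rules file is assumed to sit in the same directory as the test file.

```yaml
# loki-ingestion-alerts_test.yml (hypothetical file name)
rule_files:
  - loki-ingestion-alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Synthetic counter: grows by 60 per minute for 15 minutes,
      # so the 5m rate is positive for tenant "team-a" (assumed tenant name).
      - series: 'loki_discarded_samples_total{tenant="team-a", reason="rate_limited"}'
        values: '0+60x15'
    alert_rule_test:
      # The expression is already true here, but `for: 5m` has not elapsed,
      # so no firing alert is expected.
      - eval_time: 3m
        alertname: LokiRequestRateLimited
        exp_alerts: []
      # The expression has been true for more than 5 minutes, so the alert should fire.
      - eval_time: 10m
        alertname: LokiRequestRateLimited
        exp_alerts:
          - exp_labels:
              severity: warning
              tenant: team-a
            exp_annotations:
              summary: "Tenant team-a is being rate limited"
              description: "Tenant team-a has exceeded ingestion rate limits. Consider increasing limits or reducing log volume."
```

A test file like this is run with `promtool test rules loki-ingestion-alerts_test.yml`. The expected annotations must match the rendered templates exactly, so they need updating whenever the rule text changes.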

docs/sources/configure/bp-configure.md

Lines changed: 47 additions & 0 deletions
@@ -81,3 +81,50 @@ Loki and Promtail have flags which will dump the entire config object to stderr
 `-print-config-stderr` works well when invoking Loki from the command line, as you can get a quick output of the entire Loki configuration.

 `-log-config-reverse-order` is the flag Grafana runs Loki with in all our environments. The configuration entries are reversed, so that the order of the configuration reads correctly top to bottom when viewed in Grafana's Explore.
+
+## Recommended production limits
+
+```yaml
+limits_config:
+  # Rate limits
+  ingestion_rate_strategy: global
+  ingestion_rate_mb: 10
+  ingestion_burst_size_mb: 20
+  per_stream_rate_limit: 3MB
+  per_stream_rate_limit_burst: 15MB
+
+  # Stream limits
+  max_global_streams_per_user: 10000
+  max_streams_per_user: 0
+
+  # Validation
+  max_line_size: 256KB
+  max_line_size_truncate: false
+  max_label_name_length: 1024
+  max_label_value_length: 2048
+  max_label_names_per_series: 15
+
+  # Time constraints
+  reject_old_samples: true
+  reject_old_samples_max_age: 168h # 7 days
+  creation_grace_period: 10m
+  unordered_writes: true
+```
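
The values in `limits_config` apply to every tenant. When one tenant legitimately needs more headroom, a per-tenant override in Loki's runtime configuration file is one way to raise its limits without changing the global defaults. The sketch below is not part of the commit: the file path, the tenant ID `team-a`, and the override values are assumptions chosen for illustration.

```yaml
# In the main Loki configuration: point Loki at a runtime configuration file.
runtime_config:
  file: /etc/loki/runtime-config.yaml   # hypothetical path
  period: 10s                           # how often Loki reloads the file
---
# /etc/loki/runtime-config.yaml (hypothetical path)
# Per-tenant overrides; limits not listed here fall back to limits_config.
overrides:
  team-a:                               # hypothetical tenant ID
    ingestion_rate_mb: 20
    ingestion_burst_size_mb: 40
    max_global_streams_per_user: 20000
```

Because the runtime file is reloaded periodically, overrides like these can take effect without restarting Loki.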
+
+## Ingester configuration
+
+```yaml
+ingester:
+  # Chunk settings
+  chunk_idle_period: 30m
+  chunk_target_size: 1572864 # 1.5 MB
+  chunk_encoding: snappy
+  max_chunk_age: 2h
+
+  # WAL settings
+  wal:
+    enabled: true
+    dir: /loki/wal
+    checkpoint_duration: 5m
+    flush_on_shutdown: true
+```
File renamed without changes.
