---
title: Monitoring and alerting on ingest
description: Set up Prometheus alerting rules to detect common ingestion issues before they impact your logging pipeline.
weight: 100
---

# Monitoring and alerting on ingest

Set up Prometheus alerting rules to detect common ingestion issues before they impact your logging pipeline. Create a file named `loki-ingestion-alerts.yml` (or add the rules to your existing Prometheus rules file) with the following alerting rules:

```yaml
# File: loki-ingestion-alerts.yml
# Add this file to your Prometheus rule_files configuration:
# rule_files:
#   - /etc/prometheus/rules/loki-ingestion-alerts.yml

groups:
  - name: loki_ingestion
    rules:
      # Rate limit alerts
      - alert: LokiRequestRateLimited
        expr: sum by (tenant) (rate(loki_discarded_samples_total{reason="rate_limited"}[5m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} is being rate limited"
          description: "Tenant {{ $labels.tenant }} has exceeded ingestion rate limits. Consider increasing limits or reducing log volume."

      # Stream limit alerts
      - alert: LokiStreamLimitReached
        expr: sum by (tenant) (rate(loki_discarded_samples_total{reason="stream_limit"}[5m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tenant {{ $labels.tenant }} has reached stream limit"
          description: "Tenant {{ $labels.tenant }} has exceeded max_global_streams_per_user. Reduce label cardinality or increase the limit."

      # WAL alerts
      - alert: LokiWALDiskFull
        expr: increase(loki_ingester_wal_disk_full_failures_total[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "WAL disk is full on {{ $labels.instance }}"
          description: "Ingester {{ $labels.instance }} cannot write to WAL due to disk space. Data durability is compromised."

      # Validation errors
      - alert: LokiHighValidationErrors
        expr: sum by (reason) (rate(loki_discarded_samples_total{reason=~"invalid_labels|line_too_long|out_of_order"}[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rate of {{ $labels.reason }} validation errors"
          description: "Loki is discarding logs due to {{ $labels.reason }}. Check your log shipping configuration."
```

To enable these alerts, add the rules file to your Prometheus configuration:

```yaml
# prometheus.yml
rule_files:
  - /etc/prometheus/rules/loki-ingestion-alerts.yml
```

{{< admonition type="tip" >}}
The [Loki mixin](https://github.com/grafana/loki/tree/main/production/loki-mixin) provides a comprehensive set of pre-built dashboards and alerting rules for monitoring Loki in production.
{{< /admonition >}}