🚀 An intelligent, LLM-enhanced log parser pipeline that converts multi-format raw logs into structured JSON, learns from missed patterns, and evolves using Drain3 & open-source LLMs.

mrsahiljaiswal/adaptive-log-parser

🧭 Adaptive Log Parser System with LLM-Driven Intelligence

A smart log processing pipeline where logs β€” regardless of source, structure, or format β€” are:

βœ… Automatically analyzed and understood
🧠 Matched against known or discovered structures
πŸ“¦ Converted into clean JSON for downstream use (RAG, dashboards, alerts)
πŸ” Continuously improved by learning from what it fails to parse


🚀 Phase-Wise Implementation Roadmap

✅ Phase 1: Rule-Based Multi-Pattern Log Parser

Status: ✅ Implemented

  • Uses manually defined regex patterns for known formats (Apache, Syslog, SSH, etc.)
  • Converts matching log lines into JSONL
  • Logs that do not match are skipped and stored separately
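
A minimal sketch of the multi-pattern idea, assuming named-group regexes tried in order. The two patterns and the `parse_line` helper below are hypothetical illustrations, not the regexes shipped in this repo:

```python
import json
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration only; real formats vary more than this.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "apache_access": re.compile(
        r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
    ),
    "syslog": re.compile(
        r'^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
        r'(?P<host>\S+) (?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$'
    ),
}


def parse_line(line: str) -> Optional[dict]:
    """Return a dict for the first pattern that matches, or None if none do."""
    stripped = line.strip()
    for name, pattern in PATTERNS.items():
        match = pattern.match(stripped)
        if match:
            return {"format": name, **match.groupdict()}
    return None  # caller treats None as "skipped"


if __name__ == "__main__":
    sample = "Jan 12 06:25:41 web01 sshd[2211]: Failed password for root from 10.0.0.5 port 22 ssh2"
    print(json.dumps(parse_line(sample)))  # one JSONL record
```

Because the first matching pattern wins, the order of the dictionary matters when two formats overlap; more specific patterns would normally go first.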

🔄 Phase 2: Feedback-Aware Parser with Skipped Log Tracker

Goal: Track all unmatched lines for improvement

Features:

  • Saves unparsed lines to SkippedLogs/
  • Records file name and line number for traceability
  • Enables continuous learning and correction
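
One way the tracker could look, as a sketch. The JSONL entry layout (`file`, `line_number`, `raw`) and the output file names are assumptions made for this example, and `parse_line` stands for any function that returns a dict on a match and `None` otherwise:

```python
import json
from pathlib import Path
from typing import Callable, Optional, Tuple


def parse_file(
    log_path: Path,
    parsed_dir: Path,
    skipped_dir: Path,
    parse_line: Callable[[str], Optional[dict]],
) -> Tuple[int, int]:
    """Write parsed and skipped lines to separate JSONL files; return both counts."""
    parsed_dir.mkdir(parents=True, exist_ok=True)
    skipped_dir.mkdir(parents=True, exist_ok=True)
    parsed_path = parsed_dir / f"{log_path.stem}.jsonl"
    skipped_path = skipped_dir / f"{log_path.stem}.skipped.jsonl"
    parsed_count = skipped_count = 0

    with log_path.open(encoding="utf-8", errors="replace") as src, \
            parsed_path.open("w", encoding="utf-8") as parsed_out, \
            skipped_path.open("w", encoding="utf-8") as skipped_out:
        for line_number, raw in enumerate(src, start=1):
            line = raw.rstrip("\n")
            if not line.strip():
                continue  # blank lines are neither parsed nor skipped
            record = parse_line(line)
            if record is not None:
                parsed_out.write(json.dumps(record) + "\n")
                parsed_count += 1
            else:
                # Keep the source file and line number for traceability.
                entry = {"file": log_path.name, "line_number": line_number, "raw": line}
                skipped_out.write(json.dumps(entry) + "\n")
                skipped_count += 1
    return parsed_count, skipped_count
```

Storing the raw line next to its origin means a later phase can reprocess the skipped file without reopening the original log.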

🧠 Phase 3: LLM-Assisted Pattern Discovery

Goal: Dynamically extract structure from unknown log formats using open-source LLMs like Mistral, Gemma, or LLaMA3.

Steps:

  • Pass skipped lines to an LLM with a prompt like:
    You are a log analysis assistant. Given the following log line, extract:
    - timestamp
    - level
    - message
    Return the output as JSON.
    
  • Cache and validate LLM outputs
  • Add to training or deployable pattern bank

Benefits:

  • Reduces the need to hand-write a new regex for every format
  • Handles unstructured, unknown, or mixed-format logs
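
The prompt, cache, and validation steps above could be wired together roughly as follows. This is a sketch under assumptions: `call_llm` is a hypothetical callable (prompt in, raw model text out) standing in for whichever client is used, and the required keys simply mirror the three fields named in the prompt:

```python
import json
from typing import Callable, Dict, Optional

PROMPT_TEMPLATE = (
    "You are a log analysis assistant. Given the following log line, extract:\n"
    "- timestamp\n"
    "- level\n"
    "- message\n"
    "Return the output as JSON.\n\n"
    "Log line: {line}"
)
REQUIRED_KEYS = {"timestamp", "level", "message"}


def validate_llm_output(text: str) -> Optional[dict]:
    """Accept only a JSON object that contains every required key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def extract_fields(
    line: str,
    call_llm: Callable[[str], str],  # hypothetical: prompt -> raw model reply
    cache: Dict[str, dict],
) -> Optional[dict]:
    """Return validated fields for a line, calling the model only on a cache miss."""
    if line in cache:
        return cache[line]
    result = validate_llm_output(call_llm(PROMPT_TEMPLATE.format(line=line)))
    if result is not None:
        cache[line] = result  # only validated replies are cached
    return result
```

Models often wrap JSON in extra prose or code fences, so a real pipeline would likely strip that before validation; invalid replies are deliberately left uncached here so they can be retried.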

🧬 Phase 4: Self-Training Log Template Miner (Drain3 / Spell)

Goal: Automatically learn templates and clusters from logs

Features:

  • Use Drain3 to:
    • Discover static and dynamic fields
    • Group logs into clusters
    • Mine templates like User <*> logged in from <*>
  • Store mined templates for downstream use or learning
  • Use clustering insights to guide new pattern or anomaly detection
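
To make the idea of static and dynamic fields concrete, here is a toy, standard-library-only illustration of template mining. It is not Drain3's algorithm (Drain3 routes lines through a fixed-depth parse tree before comparing them); it only shows how similar lines collapse into one template with wildcards. The function names and the 0.5 threshold are assumptions for the example:

```python
from typing import Dict, List

WILDCARD = "<*>"


def similarity(template: List[str], tokens: List[str]) -> float:
    """Share of positions with identical tokens; 0.0 when lengths differ."""
    if len(template) != len(tokens):
        return 0.0
    same = sum(1 for a, b in zip(template, tokens) if a == b)
    return same / len(tokens)


def mine_templates(lines: List[str], threshold: float = 0.5) -> List[Dict[str, object]]:
    """Greedily cluster lines and replace differing tokens with a wildcard."""
    clusters: List[Dict[str, object]] = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        best, best_score = None, 0.0
        for cluster in clusters:
            score = similarity(cluster["template"], tokens)
            if score >= threshold and score > best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"template": tokens, "size": 1})
        else:
            # Tokens that differ between template and line become dynamic fields.
            best["template"] = [
                a if a == b else WILDCARD for a, b in zip(best["template"], tokens)
            ]
            best["size"] += 1
    return [{"template": " ".join(c["template"]), "size": c["size"]} for c in clusters]


if __name__ == "__main__":
    logs = [
        "User alice logged in from 10.0.0.1",
        "User bob logged in from 10.0.0.2",
        "Disk usage at 91%",
    ]
    for cluster in mine_templates(logs):
        print(cluster)
```

With the three sample lines, the two login lines merge into `User <*> logged in from <*>` and the disk line stays in its own cluster.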

♻️ Phase 5: Autonomous Parser Evolution Engine

Goal: Build a self-improving parser system

How:

  • Reprocess skipped lines periodically
  • Generate new patterns from LLM or Drain3
  • Validate outputs with scoring or confidence thresholds
  • Add verified patterns to live_parser_patterns.json
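
A sketch of the validate-then-promote step, assuming the score is the share of sample lines a candidate regex matches. The layout written to `live_parser_patterns.json` (a name mapped to `regex` and `score`) and the 0.8 threshold are assumptions for illustration, not the project's actual schema:

```python
import json
import re
from pathlib import Path
from typing import List


def match_rate(regex: str, lines: List[str]) -> float:
    """Fraction of lines the candidate matches; 0.0 for an invalid regex."""
    try:
        pattern = re.compile(regex)
    except re.error:
        return 0.0
    if not lines:
        return 0.0
    return sum(1 for line in lines if pattern.match(line)) / len(lines)


def promote_pattern(
    name: str,
    regex: str,
    sample_lines: List[str],
    patterns_file: Path,
    threshold: float = 0.8,
) -> bool:
    """Add the candidate to the live pattern file only if it scores high enough."""
    score = match_rate(regex, sample_lines)
    if score < threshold:
        return False
    patterns = {}
    if patterns_file.exists():
        patterns = json.loads(patterns_file.read_text(encoding="utf-8"))
    patterns[name] = {"regex": regex, "score": round(score, 3)}
    patterns_file.write_text(json.dumps(patterns, indent=2), encoding="utf-8")
    return True
```

Match rate alone would also accept an over-broad pattern such as `.*`, so a real check would probably add a second signal, for example requiring named groups or testing that the pattern does not match lines from other clusters.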

📈 Optional Enhancements

| Feature | Description |
| --- | --- |
| 🧪 Accuracy scoring | Manual or LLM-assisted evaluation |
| 🧠 Confidence thresholds | Auto-accept LLM outputs above threshold |
| 📊 Parsing dashboard | Visualize logs parsed, templates learned, anomalies |
| 🔐 Secure fine-tuning | Handle PII-sensitive logs privately |
| 💬 RAG-based querying | Ask questions from logs via embedded vector DB |

✅ Log Intelligence Pipeline Diagram

graph TD
  A[Raw Logs] --> B[Regex-based Parser]
  B -->|Parsed| C[JSONL Logs]
  B -->|Skipped| D[SkippedLogs/]
  D --> E[LLM Analysis & Labeling]
  D --> F[Drain3 Template Mining]
  E --> G[Auto-Generated Patterns]
  F --> G
  G --> H[Updated Parser Patterns]
  H --> B
  C --> I[RAG / Vector DB]

📁 Suggested Folder Structure

log-parser-intelligent/
├── logs/                  # Raw input logs
├── ParsedLogs/            # Parsed JSONL files
├── SkippedLogs/           # Unmatched logs with trace info
├── Anomalies/             # Drain3-flagged anomalies
├── Patterns/
│   ├── live_parser_patterns.json
│   └── learned_templates.json
├── llm_prompts/
│   └── log_schema_extraction.txt
├── vectorstore/           # For RAG embeddings
├── drain3_snapshot.json   # Template cluster snapshot
└── README.md              # This file

🛠️ Setup & Usage

  1. Clone this repo
  2. Install dependencies:
    pip install drain3 openai chromadb
  3. Run the multi-parser:
    python parse_logs.py --input ./logs --output ./ParsedLogs
  4. Run LLM-assist:
    python enrich_with_llm.py --input ./SkippedLogs --output ./ParsedLogs

🙋 Contributing

Want to add new patterns, LLM prompt styles, or vector search capabilities?
Feel free to fork and raise a PR.


🧠 Credits & Stack

  • Drain3
  • ChromaDB
  • Open-source LLMs: Mistral / Gemma / LLaMA3 via Ollama
  • Inspired by real-world log intelligence & observability challenges

📬 Contact

Feel free to connect for ideas, issues or collaborations:
