A Python tool for extracting, analyzing, and managing metadata from Markdown-based knowledge bases. The processor parses Markdown files to extract tags, headings, links, and other structured information, supporting advanced knowledge management workflows.
- Extracts metadata, tags, and structural elements from Markdown files
- Modular architecture for analyzers, extractors, and enrichers
- Easily extensible for new metadata types or processing logic
- Modern command-line interface with rich terminal UI
- Interactive mode for guided workflows
- Real-time file watching and continuous processing
- Comprehensive test suite
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/knowledgebase-processor.git
  cd knowledgebase-processor
  ```

- Install Poetry (if not already installed):

  ```bash
  curl -sSL https://install.python-poetry.org | python3 -
  ```

- Install dependencies:

  ```bash
  poetry install
  ```
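To confirm the install, the package should be importable from Poetry's environment. A minimal sanity check (the module path `knowledgebase_processor.cli` is taken from the module invocation shown later in this README):

```python
# Run inside the project: poetry run python check_install.py
import importlib

# An ImportError here means dependencies were not installed correctly.
cli = importlib.import_module("knowledgebase_processor.cli")
print(f"OK: imported {cli.__name__}")
```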
The Knowledge Base Processor provides a modern command-line interface with two aliases: `kb` and `kbp`.
```bash
# Initialize a new knowledge base in the current directory
kb init

# Process documents in the current directory
kb scan

# Search for content
kb search "todo items"

# Process and sync to SPARQL endpoint in one command
kb publish --endpoint http://localhost:3030/kb

# Enter interactive mode (just run kb without arguments)
kb
```

Configure the processor for your documents:
```bash
kb init                 # Interactive setup
kb init ~/Documents     # Initialize specific directory
kb init --name "My KB"  # Set project name
```

Process documents and extract knowledge entities:
```bash
kb scan                          # Scan current directory
kb scan ~/Documents              # Scan specific directory
kb scan --pattern "*.md"         # Only process Markdown files
kb scan --watch                  # Watch for changes
kb scan --sync --endpoint <url>  # Process + sync to SPARQL
```

Search your processed knowledge base:
```bash
kb search "machine learning"     # Full-text search
kb search --type todo "project"  # Search specific entity types
kb search --tag important        # Search by tags
```

Process and sync to SPARQL endpoint in one command:
```bash
kb publish                   # Use default endpoint
kb publish --endpoint <url>  # Specify endpoint
kb publish --watch           # Continuous publishing mode
kb publish --graph <uri>     # Specify named graph
```

Sync already processed data to SPARQL endpoint:
```bash
kb sync                   # Sync to default endpoint
kb sync --endpoint <url>  # Specify endpoint
kb sync --clear           # Clear endpoint before sync
```
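Once data is synced, the triples can be queried directly from the endpoint. A minimal sketch using the SPARQLWrapper library (the `/query` path is an assumption based on a default Fuseki setup; adjust it to match your endpoint):

```python
# pip install SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumes a Fuseki-style endpoint; change the URL to match your configuration.
sparql = SPARQLWrapper("http://localhost:3030/kb/query")
sparql.setReturnFormat(JSON)

# Count all triples in the store as a quick sanity check after `kb sync`.
sparql.setQuery("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")
results = sparql.query().convert()
print("Triples in store:", results["results"]["bindings"][0]["n"]["value"])
```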
Display knowledge base statistics and status:

```bash
kb status             # Show current status
kb status --detailed  # Show detailed statistics
```

View and manage configuration:
```bash
kb config show                # Display current config
kb config set endpoint <url>  # Set SPARQL endpoint
kb config reset               # Reset to defaults
```

Run `kb` without any arguments to enter interactive mode with a guided interface:
```bash
kb
```

Generate RDF/TTL files during processing:
```bash
kb scan --rdf-output ./rdf_output
```

Watch for file changes and automatically process:
```bash
kb scan --watch
kb publish --watch
```
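Conceptually, watch mode keeps a filesystem observer running and re-processes files as they change. An illustrative sketch with the watchdog library (the general pattern, not the processor's actual implementation):

```python
# pip install watchdog
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class MarkdownHandler(FileSystemEventHandler):
    """React to Markdown changes; a real handler would re-run extraction."""

    def on_modified(self, event):
        if not event.is_directory and event.src_path.endswith(".md"):
            print(f"Re-processing {event.src_path}")

observer = Observer()
observer.schedule(MarkdownHandler(), path=".", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)  # keep the main thread alive while the observer works
except KeyboardInterrupt:
    observer.stop()
observer.join()
```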
```bash
# Run CLI as a module
python -m knowledgebase_processor.cli --help
```

Run all tests using the provided script:
```bash
poetry run python scripts/run_tests.py
```

Or use pytest directly:
```bash
poetry run pytest
poetry run pytest tests/cli/  # Test CLI specifically
```

The processor uses a service-oriented architecture with clear separation between:
- CLI Layer: User interface and command handling
- Service Layer: Business logic and orchestration
- Data Layer: Document processing and persistence
See ARCHITECTURE_V2.md for detailed architecture documentation.
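As a rough illustration of that layering (every class and method name below is hypothetical, not the processor's actual API):

```python
# Illustrative only: hypothetical names showing the CLI -> service -> data flow.
from pathlib import Path

class DocumentStore:                      # Data layer: persistence
    def save(self, path: Path, metadata: dict) -> None:
        print(f"persisting metadata for {path}")

class ScanService:                        # Service layer: orchestration
    def __init__(self, store: DocumentStore) -> None:
        self.store = store

    def scan(self, directory: Path) -> int:
        count = 0
        for md_file in directory.glob("*.md"):
            self.store.save(md_file, {"size": md_file.stat().st_size})
            count += 1
        return count

def cli_scan(directory: str) -> None:     # CLI layer: argument handling only
    n = ScanService(DocumentStore()).scan(Path(directory))
    print(f"Processed {n} document(s)")

if __name__ == "__main__":
    cli_scan(".")
```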
The processor can be configured via:
- Command-line arguments (highest priority)
- Configuration file (`.kbp/config.yaml`)
- Environment variables
- Default values
Example configuration file:
```yaml
knowledge_base:
  path: /path/to/documents
  patterns:
    - "*.md"
    - "*.markdown"
sparql:
  endpoint: http://localhost:3030/kb
  graph: http://example.org/kb
processing:
  batch_size: 100
  parallel: true
```

The processor handles wikilinks (`[[A wikilink]]`) and extracts them as relationships between documents.
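For intuition, wikilink extraction amounts to matching the double-bracket pattern and recording a source-to-target relationship. A minimal sketch (illustrative, not the processor's actual extractor):

```python
import re

# Matches [[Target]] and [[Target|Display text]]-style wikilinks.
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")

def extract_wikilinks(markdown: str) -> list[str]:
    """Return the link targets referenced by a Markdown document."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(markdown)]

text = "See [[Project Notes]] and [[2024 Plan|the plan]] for details."
print(extract_wikilinks(text))  # ['Project Notes', '2024 Plan']
```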
Fork the repository, create a feature branch, and submit a pull request. Please ensure all tests pass before submitting.
[Add your license information here]