179 changes: 179 additions & 0 deletions README_v2.md
@@ -0,0 +1,179 @@
# Aria v2 - Enhanced with LangChain Agents

Aria v2 includes enhanced agent capabilities using LangChain tools for web search, Wikipedia lookup, and mathematical calculations.

## What's New in v2

### 🔧 Enhanced Tool System

- **DuckDuckGo Web Search**: Get current information and news
- **Wikipedia Integration**: Access detailed encyclopedia information
- **Smart Calculator**: Handle natural language math queries
- **Tool Usage Tracking**: Monitor which tools are being used

### 🤖 LangChain Agent Integration

- Automatic tool selection based on user queries
- Seamless integration with existing LLM system
- Enhanced response generation using tool results
- Maintains Aria's sarcastic personality

## Usage

### Running Aria v2

```bash
# Use the enhanced version
python main_v2.py

# Or use the original version
python main.py

# Test tools without full UI (useful for headless testing)
python test_agent.py
```

### Tool Trigger Keywords

#### Calculator Tool

- "calculate", "compute", "math"
- Mathematical operators: +, -, \*, /
- Natural language: "What is 15 \* 7?"

#### Web Search Tool

- "search", "find", "current", "recent", "news", "today"
- Examples: "Search for recent AI news", "What's happening today?"

#### Wikipedia Tool

- "what is", "who is", "definition"
- Examples: "What is machine learning?", "Who is Einstein?"
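The keyword matching described above can be sketched roughly as follows. This is an illustration of the technique, not the project's actual detection code; `pick_tool` and `TOOL_KEYWORDS` are names we made up for the sketch:

```python
# Simplified sketch of keyword-based tool routing, mirroring the trigger
# lists above. Identifiers here are illustrative, not the project's own.
from typing import Optional

TOOL_KEYWORDS = {
    "calculator": ["calculate", "compute", "math", "+", "-", "*", "/"],
    "web_search": ["search", "find", "current", "recent", "news", "today"],
    "wikipedia_search": ["what is", "who is", "definition"],
}

def pick_tool(user_input: str) -> Optional[str]:
    """Return the first tool whose trigger keywords appear in the query."""
    text = user_input.lower()
    for tool, keywords in TOOL_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return tool
    return None  # no tool matched; fall through to the plain LLM
```

Note that matching is substring-based, so a query like "What is 25 * 4?" routes to the calculator (via `*`) before the Wikipedia check ever runs.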

## Example Interactions

```
🎙 You: What is 25 * 4?
🔧 Tool Used: calculator - Query: What is 25 * 4?
🤖 Aria: 25*4 = 100. Math wizard at your service, human.

🎙 You: Search for recent AI news
🔧 Tool Used: web_search - Query: Search for recent AI news
🤖 Aria: Found some AI drama. The robots are still plotting, apparently.

🎙 You: What is quantum computing?
🔧 Tool Used: wikipedia_search - Query: What is quantum computing?
🤖 Aria: Quantum stuff. Basically computers that exist and don't exist simultaneously.
```

## File Structure

```
├── main_v2.py                  # Enhanced main file with agent integration
├── components/
│   ├── llm_enhanced.py         # Enhanced LLM wrapper with agent capabilities
│   ├── lungchanAgents/
│   │   └── lungchainAgent.py   # Core LangChain agent implementation
│   └── tools/
│       └── tools.py            # Tool implementations with usage tracking
├── test_agent.py               # Test script for agent functionality
└── README_v2.md                # This file
```

## Technical Details

### Enhanced LLM Wrapper

The `EnhancedLLMWithAgent` class wraps the original LLM and adds:

- Automatic tool detection and execution
- Response enhancement using tool results
- Compatibility with existing interface
- Tool usage statistics

### Agent Architecture

- **Tool Selection**: Keyword-based detection determines which tools to use
- **Execution**: Tools are executed with user input and results captured
- **Response Generation**: LLM generates responses based on tool results
- **Logging**: All tool usage is tracked and logged
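The four steps above can be sketched as a single pipeline. All names below are hypothetical stand-ins for the project's agent classes, shown only to make the flow concrete:

```python
# Illustrative sketch of the select → execute → respond → log pipeline
# described above. Names are stand-ins, not the project's actual API.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def answer(user_input, select_tool, run_tool, llm_generate, stats):
    tool = select_tool(user_input)        # 1. keyword-based tool selection
    if tool is None:
        return llm_generate(user_input)   # no tool needed: plain LLM reply
    result = run_tool(tool, user_input)   # 2. execute tool, capture result
    stats[tool] = stats.get(tool, 0) + 1  # 4. track usage
    log.info("Tool used: %s - Query: %s", tool, user_input)
    prompt = f"{user_input}\n\nTool result:\n{result}"
    return llm_generate(prompt)           # 3. respond using tool output
```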

### Safety Features

- Mathematical expressions are safely evaluated
- Web search results are sanitized
- Error handling with graceful fallbacks
- Tool usage limitations and monitoring
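Safe evaluation of arithmetic is usually done by walking the expression's AST instead of calling `eval` directly. The project's actual calculator may differ; the sketch below shows one common approach, allowing only numeric literals and arithmetic operators:

```python
# One common technique for evaluating arithmetic safely: parse the input
# and permit only numbers and arithmetic operators. A sketch, not the
# project's actual calculator implementation.
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression; reject anything else."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)
```

Anything outside that whitelist — names, calls, attribute access — raises `ValueError` instead of executing.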

## Dependencies

Additional packages required for v2:

```bash
pip install langchain langchain-community duckduckgo-search ddgs
```

## Troubleshooting

### Import Errors

If you get import errors, make sure you're in the correct directory and all dependencies are installed:

```bash
pip install langchain langchain-community duckduckgo-search ddgs
```

### UI Initialization Error

If you get `AttributeError: 'NoneType' object has no attribute 'get'`, ensure you're using the correct startup method:

```bash
# ✅ Correct way (fixed in v2)
python main_v2.py

# ❌ Wrong way that causes UI errors
python main_v2.py --config missing_file.json
```

### Model Loading Time

The first startup may take 1-2 minutes while loading the 8GB model. You'll see:

```
Loading Aria v2 with LangChain Agent...
Loading model from: /path/to/model.gguf
Model file exists: True
```

### Web Search Issues

If web search returns no results, try more specific queries or check your internet connection.

### Tool Not Triggering

Make sure your query contains the appropriate keywords. You can also check the tool usage logs for debugging.

### UI Issues in Terminal Environments

If running on a headless server or via SSH, you may need X11 forwarding:

```bash
ssh -X username@hostname
# or
export DISPLAY=:0
```

## Configuration

The agent uses the same configuration as the original Aria system. Tool behavior can be customized by modifying the `AgentTools` class in `components/tools/tools.py`.

## Performance

Typical added latency per tool:

- Calculator: <10ms
- Wikipedia: 1-3 seconds
- Web Search: 1-5 seconds depending on network

The system gracefully falls back to regular LLM responses if tools fail.
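That graceful fallback can be sketched as a simple try/except around the tool pipeline. Names here are illustrative, not the project's actual methods:

```python
# Sketch of the graceful-fallback behavior described above: if the tool
# pipeline raises, answer with the plain LLM instead. Names are stand-ins.
def respond(user_input, run_tool_pipeline, plain_llm):
    try:
        return run_tool_pipeline(user_input)
    except Exception as exc:
        print(f"Tool pipeline failed ({exc}); falling back to plain LLM")
        return plain_llm(user_input)
```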
86 changes: 78 additions & 8 deletions components/llm.py
@@ -1,7 +1,9 @@
import sys
import os
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
from .utils import remove_emojis
from .lungchanAgents.lungchainAgent import LangChainAgent


class Llm:
@@ -17,6 +19,9 @@ def __init__(self, params=None):
        self.system_message = self.params.get("system_message", None)
        self.verbose = self.params.get("verbose", None)

        # Initialize LangChain agent
        self.agent = LangChainAgent(self, self.system_message)

        if self.custom_path != "":
            model_path = self.custom_path
        elif isinstance(self.model_file, list):
@@ -25,21 +30,86 @@ def __init__(self, params=None):
        else:
            model_path = hf_hub_download(self.model_name, filename=self.model_file)

-        self.llm = Llama(
-            model_path=model_path,
-            n_gpu_layers=self.num_gpu_layers,
-            n_ctx=self.context_length,
-            chat_format=self.chat_format,
-            verbose=self.verbose,
-        )
+        print(f"Loading model from: {model_path}")
+        print(f"Model file exists: {os.path.exists(model_path) if model_path else False}")
+        print(f"Context length set to: {self.context_length}")  # Debug context length
+        print(f"GPU layers set to: {self.num_gpu_layers}")  # Debug GPU layers
+
+        try:
+            self.llm = Llama(
+                model_path=model_path,
+                n_gpu_layers=self.num_gpu_layers,
+                n_ctx=self.context_length,
+                chat_format=self.chat_format,
+                verbose=True,  # Enable verbose for debugging
+            )
+        except Exception as e:
+            print(f"Error loading model: {e}")
+            print("Trying with reduced GPU layers...")
+            try:
+                self.llm = Llama(
+                    model_path=model_path,
+                    n_gpu_layers=0,  # Try CPU-only
+                    n_ctx=self.context_length,
+                    chat_format=self.chat_format,
+                    verbose=True,
+                )
+            except Exception as e2:
+                print(f"Failed with CPU-only: {e2}")
+                raise e2

        self.messages = [{"role": "system", "content": self.system_message}]

    def should_use_tools(self, user_input: str) -> bool:
        """Determine if the query requires tool usage"""
        return self.agent.should_use_tools(user_input)

    def execute_tool(self, user_input: str) -> str:
        """Execute appropriate tool based on user input"""
        return self.agent.process_with_tools(user_input)

    def get_tool_usage_stats(self):
        """Get statistics about tool usage"""
        return self.agent.get_tool_stats()

    def get_answer(self, ui, ap, tts, data):
        # Use the LangChain agent to get response (it handles tool usage internally)
        try:
            response = self.agent.get_response(data, ui, ap, tts)
            return response
        except Exception as e:
            print(f"Agent error: {e}")
            # Fallback to basic LLM response
            return self._basic_llm_response(ui, ap, tts, data)

    def _basic_llm_response(self, ui, ap, tts, data):
        """Fallback method for basic LLM response without agent"""
        self.messages.append({"role": "user", "content": data})

        # Define stop tokens to prevent unwanted continuation
        stop_tokens = [
            "<|im_end|>",
            "<|im_start|>",
            "<|end_of_text|>",
            "<|eot_id|>",
            "User:",
            "Human:",
            "Assistant:",
            "\nUser:",
            "\nHuman:",
            "\nWhat",
            "What is the capital",
            "I found this information",
            "Next question",
            "Please provide",
        ]

        outputs = self.llm.create_chat_completion(
-            self.messages, stream=self.streaming_output
+            self.messages,
+            stream=self.streaming_output,
+            stop=stop_tokens,
+            max_tokens=50,  # Reduce max tokens for shorter responses
+            temperature=0.7,  # Reduce randomness for more predictable output
        )

        if self.streaming_output: