feat: Add support for separate LLM and embedding model endpoints #1434


Status: Open · wants to merge 2 commits into `master`
Conversation

@sriramsowmithri9807 (Contributor) commented Jun 13, 2025

Description (fixes #1367)

This PR adds support for using different endpoints for LLM inference and for embedding models, which is particularly useful when running a separate llama.cpp server for each function.

Changes Made

  • Added LLM_ENDPOINT and EMBEDDING_ENDPOINT configuration options
  • Updated GenericLLMProvider to handle custom base URLs for different providers
  • Enhanced embedding initialization to support separate endpoints
  • Improved configuration handling for both LLM and embedding providers
  • Added proper environment variable support for endpoint configuration
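
As a rough illustration of the intended behavior, endpoint resolution for the two model types could be sketched as below. The helper name `resolve_endpoint` and the env-var-over-config precedence are assumptions for illustration, not the PR's actual code:

```python
import os
from typing import Optional

# Hypothetical sketch: resolve_endpoint() is illustrative only.
# LLM_ENDPOINT / EMBEDDING_ENDPOINT are the options this PR introduces;
# the precedence (environment variable wins over config file) is an assumption.
def resolve_endpoint(kind: str, config: dict) -> Optional[str]:
    """Return the base URL to use for 'llm' or 'embedding' requests."""
    key = "LLM_ENDPOINT" if kind == "llm" else "EMBEDDING_ENDPOINT"
    return os.environ.get(key) or config.get(key)
```

With two llama.cpp servers on ports 8080 and 8081, the LLM provider and the embedding provider would each receive their own base URL instead of sharing one.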

How to Test

  1. Set up your environment variables:

     ```shell
     export LLM_ENDPOINT="http://localhost:8080/v1"
     export EMBEDDING_ENDPOINT="http://localhost:8081/v1"
     export FAST_LLM="openai:llama3"          # or your preferred model
     export EMBEDDING="openai:llama-embed"    # or your preferred embedding model
     ```

Or update your config file:

```json
{
    "LLM_ENDPOINT": "http://localhost:8080/v1",
    "EMBEDDING_ENDPOINT": "http://localhost:8081/v1",
    "FAST_LLM": "openai:llama3",
    "EMBEDDING": "openai:llama-embed"
}
```
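
The `FAST_LLM` and `EMBEDDING` values above follow a `provider:model` convention. A minimal sketch of splitting such a spec (the helper name is hypothetical and not taken from the repository's code):

```python
def split_model_spec(spec: str) -> tuple:
    """Split a 'provider:model' string (e.g. 'openai:llama3') into (provider, model).

    Illustrative helper mirroring the FAST_LLM / EMBEDDING format shown above.
    """
    provider, _, model = spec.partition(":")
    return provider, model
```

Here `openai` names the OpenAI-compatible API that llama.cpp servers expose, while the model part selects which model the server should use.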

sriramsowmithri9807 and others added 2 commits June 13, 2025 17:40
- Added LLM_ENDPOINT and EMBEDDING_ENDPOINT configuration options
- Updated GenericLLMProvider to handle custom base URLs
- Enhanced embedding initialization to use separate endpoints
- Improved configuration handling for both LLM and embedding providers