thomasgegout
Collaborator

Purpose

Add AI-powered intelligent email search functionality in the search bar using the Albert API from Etalab to enable semantic search capabilities beyond traditional keyword matching.

Proposal

This PR introduces a deep search system that lets users find relevant emails with natural language queries, using two different methods.

Key Features:

  • RAG Search: deep search built on the RAG capabilities of the Albert API
  • Smart Email Processing: full content indexing, including attachment names and metadata
  • Contextual Search: a second deep search method that relies only on the context window of albert-large (Mistral Small)
  • Frontend Integration: a new field "Demander à l'IA de vous aider à rechercher..." ("Ask the AI to help you search...") in the search bar

thomasgegout marked this pull request as draft July 28, 2025 12:04
Comment on lines +15 to +17
base_url: str = os.getenv("AI_BASE_URL", "https://albert.api.etalab.gouv.fr/v1")
api_key: str = os.getenv("AI_API_KEY", "")
model: str = os.getenv("AI_MODEL", "albert-large")
Contributor

You should retrieve these from Django settings.
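
A possible shape for this, as a sketch under assumptions: the setting names `AI_BASE_URL`, `AI_API_KEY`, and `AI_MODEL` mirror the environment variables in the diff, while `AIConfig` and `load_ai_config` are hypothetical names. The function accepts any settings-like object so that it runs standalone; in the project it would presumably be called with `django.conf.settings`.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class AIConfig:
    """Connection parameters for the AI backend (hypothetical container)."""

    base_url: str
    api_key: str
    model: str


def load_ai_config(settings) -> AIConfig:
    """Read the AI parameters from a settings-like object.

    Defaults are the same values the diff passes to os.getenv().
    """
    return AIConfig(
        base_url=getattr(settings, "AI_BASE_URL", "https://albert.api.etalab.gouv.fr/v1"),
        api_key=getattr(settings, "AI_API_KEY", ""),
        model=getattr(settings, "AI_MODEL", "albert-large"),
    )


# Stand-in for django.conf.settings so the sketch runs without Django.
# The values below are placeholders, not real credentials or model names.
fake_settings = SimpleNamespace(AI_API_KEY="secret", AI_MODEL="some-other-model")
config = load_ai_config(fake_settings)
```

The environment lookup would then live in the settings module only, and tests could override the values with Django's usual settings overrides.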

return True

# Reindex if it's been more than 2 hours since last indexing (to handle any API issues)
if self.last_index_time and (current_time - self.last_index_time) > 7200:
Contributor

You could declare the duration as a class attribute

Suggested change
- if self.last_index_time and (current_time - self.last_index_time) > 7200:
+ if self.last_index_time and (current_time - self.last_index_time) > self.INDEX_STALE_TIME:
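
For context, a minimal sketch of where the attribute could be declared. Only `INDEX_STALE_TIME` and `last_index_time` come from the suggestion; the class name `EmailIndexer` and the method `is_index_stale` are hypothetical, and the other reindexing conditions from the diff are left out.

```python
import time
from typing import Optional


class EmailIndexer:
    """Hypothetical holder of the indexing state; only the staleness check is shown."""

    # Reindex after 2 hours, expressed in seconds (same value as the literal 7200).
    INDEX_STALE_TIME = 2 * 60 * 60

    def __init__(self, last_index_time: Optional[float] = None) -> None:
        self.last_index_time = last_index_time

    def is_index_stale(self, current_time: Optional[float] = None) -> bool:
        """Return True when the last indexing is older than INDEX_STALE_TIME."""
        if current_time is None:
            current_time = time.time()
        if not self.last_index_time:
            # Never indexed: this check alone does not trigger a reindex,
            # matching the truthiness test in the original condition.
            return False
        return (current_time - self.last_index_time) > self.INDEX_STALE_TIME
```

Naming the duration also makes it easy to shorten in tests by overriding the attribute on a subclass.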

# Use default embedding model as fallback
self.embeddings_model = "embeddings-small"

def collection_exists(self, user_id: str) -> bool:
Contributor

This method is duplicated.

"content": email.get('body', ''),
"metadata": {
"subject": email.get('subject', ''),
"sender": email.get('sender', ''),
Contributor

It could be interesting to index recipients too, no? (to and cc)
Maybe also whether the email has attachments, just through a boolean value?
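
Something along these lines, as a sketch: the `content`, `subject`, and `sender` fields follow the diff, while the input keys `to`, `cc`, and `attachments`, the output key `has_attachments`, and the function name `build_document` are assumptions about the email dict, not confirmed by the PR.

```python
from typing import Any, Dict


def build_document(email: Dict[str, Any]) -> Dict[str, Any]:
    """Build the document sent for indexing, with recipients and an attachment flag."""
    metadata = {
        "subject": email.get("subject", ""),
        "sender": email.get("sender", ""),
        # Assumed keys: "to", "cc" and "attachments" are not confirmed by the diff.
        "to": list(email.get("to", [])),
        "cc": list(email.get("cc", [])),
        # Only a boolean is indexed, not the attachment contents.
        "has_attachments": bool(email.get("attachments")),
    }
    return {"content": email.get("body", ""), "metadata": metadata}


document = build_document(
    {
        "body": "Quarterly report attached.",
        "subject": "Report",
        "sender": "alice@example.com",
        "to": ["bob@example.com"],
        "attachments": ["report.pdf"],
    }
)
```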

Comment on lines +55 to +60
# Get user ID from authenticated user
if not hasattr(request, 'user') or not request.user.is_authenticated:
    return Response({
        'success': False,
        'error': 'Authentication required'
    }, status=status.HTTP_401_UNAUTHORIZED)
Contributor

If the user is not authenticated, the IsAuthenticated permission should already have returned a 401 response.

user_id=user_id,
user_query=query,
api_client=chatbot.api_client,
max_results=10
Contributor

Could we let the API consumer set this property through a query param? Or is it a threshold for performance purposes?
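
Both concerns can be covered at once, for example with a bounded query param. This is a framework-free sketch: the parameter name `max_results` comes from the diff, while `parse_max_results` and the ceiling of 50 are hypothetical. In a DRF view the mapping would presumably be `request.query_params`.

```python
from typing import Mapping

DEFAULT_MAX_RESULTS = 10  # value currently hard-coded in the diff
MAX_RESULTS_CEILING = 50  # hypothetical upper bound to protect performance


def parse_max_results(query_params: Mapping[str, str]) -> int:
    """Return the requested result count, clamped to [1, MAX_RESULTS_CEILING].

    Missing or non-integer values fall back to DEFAULT_MAX_RESULTS.
    """
    raw = query_params.get("max_results")
    if raw is None:
        return DEFAULT_MAX_RESULTS
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_MAX_RESULTS
    return max(1, min(value, MAX_RESULTS_CEILING))
```

Falling back silently is one option; returning a 400 on invalid input would be equally defensible.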

logger = logging.getLogger(__name__)


@api_view(['POST'])
Contributor

IMO this should be a GET endpoint, not a POST, as it does not create a resource; the query can then easily be read from a query param.

Contributor

You tried to standardize the response format, but as it stands it will be really hard to maintain. IMO a DRF Serializer could help factor that out.

2 participants