# WhispererAI

An intelligent voice-based AI assistant that transcribes speech and answers questions in real time using OpenAI's Whisper and Llama models.
## Features

- 🎤 Real-time Audio Recording & Transcription: Capture and convert speech to text instantly.
- 🧠 Local Speech Recognition: Utilizes the Whisper Base model for efficient on-device processing.
- 💡 AI-Powered Responses: Leverages Llama (via Ollama) for intelligent question answering.
- 🔊 High-Quality Audio Processing: Includes noise filtering for clearer audio input.
- 🚀 CUDA Acceleration: Supports GPU acceleration for faster performance.
- 💻 Cross-Platform Compatibility: Works on Windows, Linux, and macOS.
## Tech Stack

- Programming Language: Python 3.8+
- Speech-to-Text: OpenAI Whisper (Base model)
- Language Model: Llama (via Ollama)
- Core Libraries:
  - PyTorch
  - Transformers
  - SoundFile
- Audio Backend: FFMPEG
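To make the flow concrete, here is a minimal sketch of the transcribe-then-answer pipeline. It is illustrative rather than the project's actual code: it assumes the Transformers ASR pipeline (which needs FFMPEG to decode audio files) and Ollama's standard REST endpoint on `localhost:11434`; the file name `question.wav` is hypothetical.

```python
# Minimal sketch of the core pipeline: Whisper transcription -> Llama answer.
# Assumes Ollama is running locally on its default port (11434).
import requests
from transformers import pipeline

# Load the Whisper Base model for local speech-to-text.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

def answer_from_audio(wav_path: str) -> str:
    # 1) Transcribe the recorded audio file.
    question = asr(wav_path)["text"]

    # 2) Ask the local Llama model via Ollama's REST API.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2", "prompt": question, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(answer_from_audio("question.wav"))  # hypothetical input file
```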
## Prerequisites

- 🐍 Python 3.8 or higher.
- 🎮 CUDA-capable GPU (optional, but highly recommended for performance).
- 🎞️ FFMPEG installed and accessible in your system's PATH.
- 📦 Ollama installed and running locally.
- 🎧 A compatible audio input device (defaults to HyperX Cloud Stinger Core Wireless on Windows, or the system default otherwise).
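Most of these can be sanity-checked with a few lines of Python before launching. This is an optional, illustrative check, not part of the project:

```python
# Optional preflight check for the prerequisites above.
import shutil
import sys

import torch  # installed with the project's dependencies

assert sys.version_info >= (3, 8), "Python 3.8+ is required"
print("CUDA available:", torch.cuda.is_available())          # GPU is optional
print("FFMPEG on PATH:", shutil.which("ffmpeg") is not None)
```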
## Installation

1. Clone the Repository:

   ```bash
   git clone https://github.com/yourusername/WhispererAI.git
   cd WhispererAI
   ```

2. Set Up a Virtual Environment:

   ```bash
   python -m venv venv
   ```

   Then activate it:

   - On Windows: `venv\Scripts\activate`
   - On macOS/Linux: `source venv/bin/activate`

3. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```
4. Install and Run Ollama:

   - Download and install Ollama from [ollama.com](https://ollama.com).
   - Ensure the Ollama service is running.
   - Pull the Llama model you intend to use, e.g.:

     ```bash
     ollama pull llama3.2
     ```
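Before moving on, you can confirm that the service is reachable and the model is installed by querying Ollama's standard `/api/tags` endpoint. An optional, illustrative check:

```python
# Check that the local Ollama server is up and list the installed models.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Installed models:", models)  # expect an entry like "llama3.2:latest"
```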
## Usage

Launch the application:

```bash
python app.py
```
Then interact with the assistant using the following keys:

- Press `R` to start recording your voice.
- Press `S` to stop recording and process the audio.
- Press `C` to clear the terminal screen.
- Press `Q` to quit the application.
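Internally, a control scheme like this amounts to a small dispatch loop. The sketch below is an invented illustration of the key handling, not the actual `app.py` implementation; the handler functions are placeholders:

```python
# Illustrative key-dispatch loop for the controls above (not the real app.py).
def start_recording():            # placeholder handler
    print("Recording...")

def stop_and_process():           # placeholder handler
    print("Transcribing and answering...")

def clear_screen():
    print("\033c", end="")        # ANSI reset; clears most terminals

while True:
    key = input("Command [R/S/C/Q]: ").strip().lower()
    if key == "r":
        start_recording()
    elif key == "s":
        stop_and_process()
    elif key == "c":
        clear_screen()
    elif key == "q":
        break
```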
## Configuration

The application comes with the following default settings:

- Audio Sample Rate: 48 kHz
- Audio Channels: Mono
- Whisper Model: `openai/whisper-base`
- LLM (via Ollama): `llama3.2` (ensure this model is available in your Ollama setup)
- Processing Device: CUDA (if available), otherwise CPU
- Audio Filters:
  - High-pass: 50 Hz
  - Low-pass: 15 kHz
  - Volume Boost: 1.5x
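Those filter defaults amount to a band-limiting chain plus gain. A plausible implementation with SciPy is sketched below; SciPy itself is an assumption (the project's actual filter code may differ), while the constants mirror the defaults above:

```python
# Sketch of the default filter chain: 50 Hz high-pass, 15 kHz low-pass, 1.5x gain.
# SciPy is assumed here; app.py may implement the filtering differently.
import numpy as np
from scipy.signal import butter, sosfilt

SAMPLE_RATE = 48_000    # 48 kHz, mono
HIGHPASS_HZ = 50
LOWPASS_HZ = 15_000
VOLUME_BOOST = 1.5

def clean_audio(samples: np.ndarray) -> np.ndarray:
    # Band-pass between 50 Hz and 15 kHz to drop rumble and hiss.
    sos = butter(4, [HIGHPASS_HZ, LOWPASS_HZ], btype="bandpass",
                 fs=SAMPLE_RATE, output="sos")
    filtered = sosfilt(sos, samples)
    # Apply the 1.5x boost, clipping to the valid [-1, 1] float range.
    return np.clip(filtered * VOLUME_BOOST, -1.0, 1.0)
```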
### Audio Device Selection

- Windows: Attempts to automatically detect "Microphone (HyperX Cloud Stinger Core Wireless DTS)".
- Linux/macOS: Uses the default system audio input device.
- ℹ️ If the preferred device isn't found, the application will list the available audio devices; you may need to modify `app.py` to specify your device.
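To find your device's exact name or index before editing `app.py`, a listing like the following works. The `sounddevice` library here is an assumption, not necessarily what the app uses; any PortAudio-based listing is equivalent:

```python
# List audio input devices so you can pick one to configure in app.py.
import sounddevice as sd

for index, device in enumerate(sd.query_devices()):
    if device["max_input_channels"] > 0:  # inputs only
        print(f"{index}: {device['name']}")
```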
## Contributing

Contributions are highly encouraged and welcome! If you have improvements or bug fixes, please:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/YourAmazingFeature`).
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`).
4. Push to the branch (`git push origin feature/YourAmazingFeature`).
5. Open a Pull Request.
## Notes

- Ensure Ollama is running with the specified model before starting WhispererAI.
- Configure your audio input device in `app.py` if the default settings don't work for your setup.
- For the best performance, a CUDA-capable GPU is recommended.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
Happy Whispering! 🎬