This project provides a FastAPI-based backend for performing the following tasks:
- OCR (Optical Character Recognition): Extract Arabic text from images.
- Text-to-Speech (TTS): Convert text to speech and return an audio file.
- Endpoint: `/ocr`
- Method: `POST`
- Description: Upload an image to extract Arabic text using OCR.
- Request:
  - `file` (form-data): The image file.
  - `lang` (form-data, optional): Language code (default: `ara`).
- Response:
  - Extracted text in JSON format.
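
As a rough illustration of the request described above, the following Python sketch uploads an image to `/ocr` using only the standard library. The base URL assumes the default uvicorn host and port, and the helper names `build_multipart` and `ocr_image` are hypothetical. The shape of the JSON response is not documented here, so the sketch returns the parsed JSON as-is rather than assuming a particular key.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: default uvicorn host and port


def build_multipart(fields, filename, file_bytes, file_field="file"):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields, e.g. the optional `lang` code.
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode("utf-8"))
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8")
        )
        parts.append(f"{value}\r\n".encode("utf-8"))
    # The file part, with a MIME type guessed from the file name.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(f"--{boundary}\r\n".encode("utf-8"))
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode("utf-8")
    )
    parts.append(f"Content-Type: {mime}\r\n\r\n".encode("utf-8"))
    parts.append(file_bytes)
    parts.append(b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def ocr_image(path, lang="ara"):
    """POST an image to /ocr and return the parsed JSON response."""
    image = Path(path)
    body, content_type = build_multipart(
        {"lang": lang}, image.name, image.read_bytes()
    )
    req = request.Request(
        f"{BASE_URL}/ocr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `ocr_image("page.png")` would send the file with the default `ara` language code; `page.png` is a placeholder file name.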
- Endpoint: `/tts`
- Method: `POST`
- Description: Convert text to speech and return an audio file.
- Request:
  - `text` (form-data): Text to convert to speech.
  - `voice` (form-data, optional): Voice name (default: `Aisha`).
- Response:
  - Audio file in MP3 format.
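
A minimal Python sketch of a `/tts` client, again using only the standard library. It assumes the server runs on the default uvicorn host and port and that the endpoint reads its fields through FastAPI form parameters, which accept URL-encoded bodies as well as multipart ones. The helper names `build_tts_request` and `synthesize` and the output file name are hypothetical.

```python
from urllib import parse, request

BASE_URL = "http://localhost:8000"  # assumption: default uvicorn host and port


def build_tts_request(text, voice="Aisha"):
    """Build a POST request for /tts carrying `text` and `voice` as form fields."""
    # urlencode percent-encodes non-ASCII text (such as Arabic) as UTF-8.
    body = parse.urlencode({"text": text, "voice": voice}).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/tts",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", voice="Aisha"):
    """Send the text to /tts and write the returned MP3 bytes to out_path."""
    with request.urlopen(build_tts_request(text, voice)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path
```

With the server running, `synthesize("مرحبا")` would save the returned audio as `speech.mp3` in the current directory.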
- OCR:
  - Use a file input to upload an image.
  - Send a `POST` request to `/ocr` with the image file and optional language code.
  - Display the extracted text from the response.
- Text-to-Speech:
  - Send a `POST` request to `/tts` with the text and optional voice name.
  - Play or download the returned MP3 file.
- Install dependencies: `pip install -r requirements.txt`
- Run the FastAPI server: `uvicorn main:app --reload`
- Access the API documentation at `http://localhost:8000/docs` to test the endpoints interactively.
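
To confirm from a script that the server started correctly, one option is to read the OpenAPI schema that FastAPI serves at `/openapi.json` by default and list the registered routes. This is a sketch under that assumption; the helper names `list_routes` and `fetch_schema` are hypothetical.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: default uvicorn host and port

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema):
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_schema(base_url=BASE_URL):
    """Download and parse the OpenAPI schema from a running server."""
    with request.urlopen(f"{base_url}/openapi.json") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `list_routes(fetch_schema())` should include entries for `/ocr` and `/tts`.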
Ensure that CUDA is installed on your computer to enable GPU acceleration for supported libraries.
- Download CUDA from the NVIDIA CUDA Toolkit website.
- Follow the installation instructions for your operating system.
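
After installation, a quick check like the following can show whether the NVIDIA tools are visible from Python. It is only a sketch: it looks for the `nvidia-smi` driver utility and the `nvcc` compiler on `PATH`, which indicates that the driver and toolkit are installed. It does not confirm that any particular library in `requirements.txt` was built with GPU support. The function names are hypothetical.

```python
import shutil
import subprocess


def nvidia_driver_visible():
    """Return True if `nvidia-smi` is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def cuda_toolkit_visible():
    """Return True if the CUDA compiler `nvcc` is on PATH."""
    return shutil.which("nvcc") is not None
```

If both functions return `False`, the libraries will most likely fall back to the CPU, provided they support that.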