TheKnower0x0/TTS-API

TTS-API

Overview

This project provides a FastAPI-based backend for performing the following tasks:

  1. OCR (Optical Character Recognition): Extract Arabic text from images.
  2. Text-to-Speech (TTS): Convert text to speech and return an audio file.

Endpoints

1. /ocr - Extract Arabic Text from Images

  • Method: POST
  • Description: Upload an image to extract Arabic text using OCR.
  • Request:
    • file (form-data): The image file.
    • lang (form-data, optional): Language code (default: ara).
  • Response:
    • Extracted text in JSON format.
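
A minimal Python client sketch for /ocr, using only the standard library. It assumes the server runs at uvicorn's default address (http://localhost:8000) and that the endpoint accepts the two fields above as multipart/form-data. The helper names (`build_ocr_request`, `ocr`) are hypothetical. Because this README does not document the keys of the JSON response, `ocr` returns the decoded object unchanged.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Assumption: the server runs locally on uvicorn's default host and port.
BASE_URL = "http://localhost:8000"


def build_ocr_request(image_bytes: bytes, filename: str, lang: str = "ara",
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST to /ocr carrying `file` and `lang`."""
    boundary = uuid.uuid4().hex
    # Guess the image MIME type from the file extension, e.g. image/png.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    lang_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="lang"\r\n\r\n'
        f"{lang}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = lang_part + file_header + image_bytes + b"\r\n" + closing
    return urllib.request.Request(
        f"{base_url}/ocr",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def ocr(image_path: str, lang: str = "ara", base_url: str = BASE_URL):
    """Send an image to /ocr and return the decoded JSON body as-is."""
    with open(image_path, "rb") as f:
        request = build_ocr_request(f.read(), os.path.basename(image_path), lang, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `ocr("page.png")` (a hypothetical image path) posts the file with `lang=ara` and returns whatever JSON the server produced.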

2. /tts - Convert Text to Speech

  • Method: POST
  • Description: Convert text to speech and return an audio file.
  • Request:
    • text (form-data): Text to convert to speech.
    • voice (form-data, optional): Voice name (default: Aisha).
  • Response:
    • Audio file in MP3 format.
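
A similar standard-library sketch for /tts. It assumes the server reads `text` and `voice` as form fields (FastAPI's `Form` parameters accept `application/x-www-form-urlencoded` bodies as well as multipart ones) and that it runs at http://localhost:8000. The helper names `build_tts_request` and `tts`, and the output file name, are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the server runs locally on uvicorn's default host and port.
BASE_URL = "http://localhost:8000"


def build_tts_request(text: str, voice: str = "Aisha",
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a form-encoded POST to /tts carrying `text` and `voice`."""
    # urlencode percent-encodes the UTF-8 bytes, so Arabic text is transmitted safely.
    body = urllib.parse.urlencode({"text": text, "voice": voice}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def tts(text: str, out_path: str = "speech.mp3", voice: str = "Aisha",
        base_url: str = BASE_URL) -> str:
    """Send text to /tts and write the returned MP3 bytes to `out_path`."""
    request = build_tts_request(text, voice, base_url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so an error body is never written to the MP3 file.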

How to Use with a Frontend

  1. OCR:

    • Use a file input to upload an image.
    • Send a POST request to /ocr with the image file and an optional language code.
    • Display the extracted text from the response.
  2. Text-to-Speech:

    • Send a POST request to /tts with the text and an optional voice name.
    • Play or download the returned MP3 file.
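
Whatever language the frontend uses, two details are easy to get wrong: serializers can escape Arabic into `\uXXXX` sequences (Python's `json.dumps` does so by default), and an error response can arrive where audio was expected. The Python sketch below illustrates both checks with hypothetical helper names: it re-serializes an OCR body with `ensure_ascii=False`, and it saves a /tts body only if it starts like MP3 data (an ID3 tag or an MPEG audio frame sync). This is a heuristic, not a full format check.

```python
import json


def format_ocr_response(raw: bytes) -> str:
    """Pretty-print an OCR JSON body, keeping Arabic characters readable."""
    # ensure_ascii=False stops json.dumps from escaping Arabic as \uXXXX sequences.
    return json.dumps(json.loads(raw.decode("utf-8")), ensure_ascii=False, indent=2)


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: MP3 data starts with an ID3 tag or an MPEG audio frame sync."""
    if data.startswith(b"ID3"):
        return True
    # A frame header begins with 11 set bits: 0xFF, then the top 3 bits of the next byte.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_mp3(data: bytes, path: str = "speech.mp3") -> str:
    """Write /tts response bytes to disk, refusing bodies that are not MP3."""
    if not looks_like_mp3(data):
        # A FastAPI error response would typically be JSON, not audio.
        raise ValueError("response does not look like MP3 audio")
    with open(path, "wb") as f:
        f.write(data)
    return path
```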

Setup

  1. Install dependencies:
    pip install -r requirements.txt
  2. Run the FastAPI server:
    uvicorn main:app --reload
  3. Access the API documentation at http://localhost:8000/docs to test the endpoints interactively.
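
When the server is started from a script (for example, before running client tests), it can help to wait until uvicorn is accepting requests. This sketch polls the documentation URL from step 3 until it returns HTTP 200; the function name and the default timeouts are assumptions.

```python
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/docs",
                    timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and socket timeouts all derive from OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, start `uvicorn main:app` in the background, then call `wait_for_server()` before sending the first /ocr or /tts request.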

Prerequisites

Install CUDA

Ensure that CUDA is installed on your computer to enable GPU acceleration for supported libraries.
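
A quick standard-library sketch for checking a machine before starting the server. It looks for `nvidia-smi` (installed with the NVIDIA driver) and `nvcc` (the CUDA toolkit compiler) on the PATH. This is only a heuristic: this README does not say which libraries use the GPU, and some Python packages ship their own CUDA runtime, so a missing `nvcc` does not necessarily mean GPU acceleration is unavailable. The function names are hypothetical.

```python
import shutil
import subprocess


def nvidia_driver_present() -> bool:
    """True if the `nvidia-smi` utility (shipped with the NVIDIA driver) is on PATH."""
    return shutil.which("nvidia-smi") is not None


def cuda_toolkit_version():
    """Return the `release` line from `nvcc --version`, or None if nvcc is unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    for line in result.stdout.splitlines():
        # e.g. "Cuda compilation tools, release 12.1, V12.1.105"
        if "release" in line:
            return line.strip()
    return None


if __name__ == "__main__":
    print("NVIDIA driver found:", nvidia_driver_present())
    print("CUDA toolkit:", cuda_toolkit_version() or "not found on PATH")
```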
