OCR and form parsing API with queue-based processing, supporting both TrOCR and Qwen Vision models for document processing workflows.
- Docker & Docker Compose
- NVIDIA GPU with CUDA support (recommended)
- 8GB+ available GPU memory
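Before deploying, it can help to confirm that the GPU is visible both on the host and from inside Docker. A quick sanity check (the CUDA image tag below is illustrative, and the second command assumes the NVIDIA Container Toolkit is installed):

# Verify the NVIDIA driver and GPU are visible on the host
nvidia-smi

# Verify Docker containers can access the GPU (requires the NVIDIA Container Toolkit;
# the CUDA image tag is only an example)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi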
Step 1: Create Required Directories
# Create the training data directory (required for the mounted volume)
sudo mkdir -p /home/trocr_training
sudo chown -R 1000:1000 /home/trocr_training

Step 2: Deploy Services
# GPU deployment (recommended)
bash deploy.sh gpu
# CPU deployment (fallback)
bash deploy.sh cpu

The API will be available at http://localhost:8000.
Note: The /home/trocr_training directory will store training examples for improving TrOCR models.
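Once the services are up, a quick request against the health endpoint confirms the API is reachable:

# Overall API health
curl http://localhost:8000/health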
When DEPLOYED_OCR=Qwen is configured (the current default):

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/ | GET | API documentation and health status |
| /api/v1/parse | POST | Submit form parsing job |
| /api/v1/parse/priority | POST | Submit high-priority parsing job |
| /api/v1/parse/status/{job_id} | GET | Check job status and retrieve results |
| /api/v1/parse/queue/status | GET | Monitor queue performance |
| /api/v1/parse/health | GET | Worker health diagnostics |
| /api/v1/parse/gpu/status | GET | GPU resource monitoring |
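For urgent documents, the priority endpoint can be used in place of the standard one; a sketch, assuming it accepts the same multipart fields as /api/v1/parse:

# Submit a high-priority parsing job
curl -X POST "http://localhost:8000/api/v1/parse/priority" \
-F "[email protected]" \
-F "llm_prompt=Extract all form fields as JSON"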
When DEPLOYED_OCR=TrOCR is configured:
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/ | GET | List available TrOCR models |
| /api/v1/ | POST | Submit OCR job to queue |
| /api/v1/priority | POST | Submit high-priority OCR job |
| /api/v1/status/{job_id} | GET | Check OCR job status and results |
| /api/v1/queue/status | GET | Monitor OCR queue performance |
| /api/v1/health | GET | OCR worker health diagnostics |
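For example, querying the model list in this mode:

# List available TrOCR models (GET, TrOCR mode only)
curl http://localhost:8000/api/v1/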
Form Parsing (Qwen Vision Mode):
curl -X POST "http://localhost:8000/api/v1/parse" \
-F "[email protected]" \
-F "llm_prompt=Extract all form fields as JSON"OCR Processing (TrOCR Mode):
curl -X POST "http://localhost:8000/api/v1/" \
-F "[email protected]"Job Submission Response:
{
"success": true,
"job_id": "uuid-string",
"message": "Form parse job submitted. Poll for status using job_id."
}

Job Status Response:
{
"success": true,
"status": "completed",
"message": "Job completed successfully",
"result": {
"success": true,
"filename": "document.png",
"execution_time": 102.5,
"data": { /* extracted content */ }
}
}

- Asynchronous job processing with unique job IDs (see the polling sketch after this list)
- Priority queue support for urgent documents
- GPU resource management with automatic memory optimization
- Redis-backed job status tracking
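Because jobs are processed asynchronously, clients submit a job, capture the job_id, and poll the status endpoint until processing finishes. A minimal polling sketch using jq (the "failed" status value is an assumption; only "completed" appears in the responses above):

# Submit a parsing job and capture the job_id from the response
JOB_ID=$(curl -s -X POST "http://localhost:8000/api/v1/parse" \
-F "[email protected]" \
-F "llm_prompt=Extract all form fields as JSON" | jq -r '.job_id')

# Poll the status endpoint until the job leaves the queue
while true; do
  STATUS=$(curl -s "http://localhost:8000/api/v1/parse/status/$JOB_ID" | jq -r '.status')
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
    break
  fi
  sleep 5
done

# Print the extracted result
curl -s "http://localhost:8000/api/v1/parse/status/$JOB_ID" | jq '.result'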
- Qwen Vision: Advanced form parsing and document understanding
- TrOCR: Traditional OCR for text extraction
- PaddleOCR: Text detection and bounding box identification
The API supports two operational modes, configured via a config file.
Qwen Vision Mode (Current Default):
DEPLOYED_OCR=Qwen # Enables form parsing endpoints
QWEN_VL_MODEL=Qwen/Qwen2.5-VL-3B-Instruct

TrOCR Mode:
DEPLOYED_OCR=TrOCR # Enables OCR processing endpoints
DEFAULT_TROCR_MODEL=trocr-large-stage1

| Model | Minimum GPU Memory | Recommended GPU Memory |
|---|---|---|
| Qwen Vision | 8GB+ | 16GB+ |
| TrOCR | 4GB+ | 8GB+ |
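To check how much GPU memory is actually free before choosing a mode:

# Report per-GPU total and free memory
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv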
# Overall API health
curl http://localhost:8000/health
# Worker-specific health
curl http://localhost:8000/api/v1/parse/health
# GPU resource status
curl http://localhost:8000/api/v1/parse/gpu/status

# Current queue status
curl http://localhost:8000/api/v1/parse/queue/status

# View running containers
docker ps
# View logs
docker logs ocr-service
# Stop services
docker-compose down

- File validation and size limits
- Input sanitization
- Error handling and logging
- Single-worker configuration for memory stability
- GPU memory management and cleanup
- Background job processing
- Redis-based job persistence
- Horizontal scaling support via multiple instances (see the sketch after this list)
- Load balancing ready
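As a sketch of horizontal scaling with Docker Compose (assuming the API service is named ocr-service in the compose file and that published ports are handled by a load balancer rather than fixed per container):

# Run three API instances behind an external load balancer
docker compose up -d --scale ocr-service=3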
# Install dependencies
pip install -r backend/requirements/requirements-gpu.txt
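# The queue requires a reachable Redis instance; assuming default connection
# settings, one can be started locally with Docker
docker run -d --name redis -p 6379:6379 redis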
# Run development server
uvicorn backend.app:app --reload --host 0.0.0.0 --port 8000

- Processing Time: ~100 seconds per document (Qwen Vision)
- Memory Usage: ~7GB GPU memory baseline
- Throughput: Sequential processing with queue management
- Uptime: designed for continuous operation, with health endpoints for monitoring
- CUDA Out of Memory: Ensure 8GB+ GPU memory available
- Job Timeouts: Check GPU resource allocation
- Queue Backlog: Monitor worker health and GPU status
# Check container logs
docker logs -f ocr-service
# Monitor GPU usage
nvidia-smi
# Redis connection test
docker exec redis redis-cli ping

Built with FastAPI • Docker • CUDA • Redis