Commit aa3d881

Merge pull request #112 from getomni-ai/mark/enable-streaming
Stream OCR result by page & code restructure
2 parents fab0c9f + 777da09 commit aa3d881

14 files changed, +609 -484 lines changed

README.md

Lines changed: 18 additions & 8 deletions
@@ -15,7 +15,7 @@ The general logic:
 - Pass each image to GPT and ask nicely for Markdown
 - Aggregate the responses and return Markdown

-Try out the hosted version here: https://getomni.ai/ocr-demo
+Try out the hosted version here: <https://getomni.ai/ocr-demo>

 ## Getting Started

@@ -76,9 +76,13 @@ const result = await zerox({
 cleanup: true, // Clear images from tmp after run.
 concurrency: 10, // Number of pages to run at a time.
 correctOrientation: true, // True by default, attempts to identify and correct page orientation.
+errorMode: ErrorMode.IGNORE, // ErrorMode.THROW or ErrorMode.IGNORE, defaults to ErrorMode.IGNORE.
 maintainFormat: false, // Slower but helps maintain consistent formatting.
+maxRetries: 1, // Number of retries to attempt on a failed page, defaults to 1.
 maxTesseractWorkers: -1, // Maximum number of tesseract workers. Zerox will start with a lower number and only reach maxTesseractWorkers if needed.
-model: 'gpt-4o-mini' // Model to use (gpt-4o-mini or gpt-4o).
+model: "gpt-4o-mini", // Model to use (gpt-4o-mini or gpt-4o).
+onPostProcess: async ({ page, progressSummary }) => Promise<void>, // Callback function to run after each page is processed.
+onPreProcess: async ({ imagePath, pageNumber }) => Promise<void>, // Callback function to run before each page is processed.
 outputDir: undefined, // Save combined result.md to a file.
 pagesToConvertAsImages: -1, // Page numbers to convert to image as array (e.g. `[1, 2, 3]`) or a number (e.g. `1`). Set to -1 to convert all pages.
 tempDir: "/os/tmp", // Directory to use for temporary files (default: system temp directory).
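
The options added in this hunk (`errorMode`, `maxRetries`, `onPreProcess`, `onPostProcess`) describe per-page retries, error handling, and progress callbacks. The sketch below shows one way such options could interact. It is not zerox's implementation: `processPages`, `PageWorker`, and the result types are hypothetical names, `ErrorMode` is a local stand-in rather than the library's export, the `"ERROR"` status value is an assumption (the diff only shows `'SUCCESS'`), and pages are processed sequentially for simplicity even though the README describes a `concurrency` option.

```typescript
// Local stand-in for illustration only; NOT imported from zerox.
const ErrorMode = { THROW: "THROW", IGNORE: "IGNORE" } as const;
type ErrorMode = (typeof ErrorMode)[keyof typeof ErrorMode];

// Hypothetical shapes, loosely modelled on the fields shown in this diff.
interface PageResult {
  page: number;
  content?: string;
  error?: string;
  status: "SUCCESS" | "ERROR"; // "ERROR" is an assumed value
}

interface Summary {
  failedPages: number;
  successfulPages: number;
  totalPages: number;
}

interface SketchOptions {
  errorMode?: ErrorMode;
  maxRetries?: number;
  onPreProcess?: (args: { imagePath: string; pageNumber: number }) => Promise<void>;
  onPostProcess?: (args: { page: PageResult; progressSummary: Summary }) => Promise<void>;
}

// Assumed worker: turns one page image into Markdown and may throw.
type PageWorker = (imagePath: string, pageNumber: number) => Promise<string>;

async function processPages(
  imagePaths: string[],
  worker: PageWorker,
  options: SketchOptions = {}
): Promise<{ pages: PageResult[]; summary: Summary }> {
  const { errorMode = ErrorMode.IGNORE, maxRetries = 1, onPreProcess, onPostProcess } = options;
  const pages: PageResult[] = [];
  const summary: Summary = { failedPages: 0, successfulPages: 0, totalPages: imagePaths.length };

  for (let i = 0; i < imagePaths.length; i++) {
    const pageNumber = i + 1;
    const imagePath = imagePaths[i];
    if (onPreProcess) await onPreProcess({ imagePath, pageNumber });

    let page: PageResult = { page: pageNumber, status: "ERROR" };
    // One initial attempt plus up to maxRetries retries.
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        const content = await worker(imagePath, pageNumber);
        page = { page: pageNumber, content, status: "SUCCESS" };
        break;
      } catch (err) {
        page = { page: pageNumber, error: String(err), status: "ERROR" };
      }
    }

    // THROW aborts the whole run; IGNORE records the failure and continues.
    if (page.status === "ERROR" && errorMode === ErrorMode.THROW) {
      throw new Error(`Page ${pageNumber} failed: ${page.error}`);
    }

    if (page.status === "SUCCESS") summary.successfulPages++;
    else summary.failedPages++;
    pages.push(page);

    // Pass a copy so callbacks cannot mutate the running totals.
    if (onPostProcess) await onPostProcess({ page, progressSummary: { ...summary } });
  }

  return { pages, summary };
}
```

With `ErrorMode.IGNORE`, a page that still fails after its retries is recorded and the run continues, so callers would need to check the summary. With `ErrorMode.THROW`, the first exhausted page rejects the whole call.
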
@@ -132,17 +136,23 @@ Request #3 => page_2_markdown + page_3_image
 '**Terms:** \n' +
 'Order ID : CA-2012-AB10015140-40974 ',
 page: 1,
-contentLength: 747
+contentLength: 747,
+status: 'SUCCESS',
 }
-]
+],
+summary: {
+failedPages: 0,
+successfulPages: 1,
+totalPages: 1,
+},
 }
 ```

 ## Python Zerox

 (Python SDK - supports vision models from different providers like OpenAI, Azure OpenAI, Anthropic, AWS Bedrock etc)

-### Installation:
+### Installation

 - Install **poppler** on the system, it should be available in path variable. See the [pdf2image documentation](https://pdf2image.readthedocs.io/en/latest/installation.html) for instructions by platform.
 - Install py-zerox:
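
The example output in the hunk above gains a per-page `status` and a top-level `summary`. Since the default `ErrorMode.IGNORE` does not throw on a failed page, a caller may want to inspect these fields after a run. The sketch below is one way to do that. The interfaces are inferred from the example output only: the `pages` key name is an assumption (the diff does not show it), as is treating any status other than `'SUCCESS'` as a failure, and `failedPageNumbers` and `describeRun` are hypothetical helpers, not part of zerox.

```typescript
// Shapes inferred from the example output in this diff; the `pages` key
// name and any status value other than "SUCCESS" are assumptions.
interface PageOutput {
  page: number;
  contentLength: number;
  status: string;
}

interface OutputSummary {
  failedPages: number;
  successfulPages: number;
  totalPages: number;
}

interface OcrOutput {
  pages: PageOutput[];
  summary: OutputSummary;
}

// Hypothetical helper: page numbers whose status is not "SUCCESS".
function failedPageNumbers(output: OcrOutput): number[] {
  return output.pages.filter((p) => p.status !== "SUCCESS").map((p) => p.page);
}

// Hypothetical helper: one-line, human-readable description of a run.
function describeRun(output: OcrOutput): string {
  const { failedPages, successfulPages, totalPages } = output.summary;
  if (failedPages === 0) {
    return `all ${totalPages} page(s) succeeded`;
  }
  const failed = failedPageNumbers(output).join(", ");
  return `${successfulPages}/${totalPages} page(s) succeeded; failed: ${failed}`;
}

// Values taken from the example output shown in the diff.
const example: OcrOutput = {
  pages: [{ page: 1, contentLength: 747, status: "SUCCESS" }],
  summary: { failedPages: 0, successfulPages: 1, totalPages: 1 },
};
console.log(describeRun(example));
```
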
@@ -285,7 +295,7 @@ Returns
 - ZeroxOutput:
 Contains the markdown content generated by the model and also some metadata (refer below).

-### Example Output (Output from "azure/gpt-4o-mini"):
+### Example Output (Output from "azure/gpt-4o-mini")

 `Note: The output is mannually wrapped for this documentation for better readability.`

@@ -340,7 +350,7 @@ ZeroxOutput(
 )
 ````

-## Supported File Types:
+## Supported File Types

 We use a combination of `libreoffice` and `graphicsmagick` to do document => image conversion. For non-image / non-pdf files, we use libreoffice to convert that file to a pdf, and then to an image.

@@ -373,7 +383,7 @@ We use a combination of `libreoffice` and `graphicsmagick` to do document => ima

 ## Credits

-- [Litellm](https://github.com/BerriAI/litellm): https://github.com/BerriAI/litellm | This powers our python sdk to support all popular vision models from different providers.
+- [Litellm](https://github.com/BerriAI/litellm): <https://github.com/BerriAI/litellm> | This powers our python sdk to support all popular vision models from different providers.

 ### License
