Performant calculation of image and text element page coverage #124

I very much like the new `Page.getTextBlocks(images=True)`, thank you for adding that :). One of the most important uses for us is to calculate:

- `% of the page area` covered by **image blocks**
- and `% of the page area` covered by **text blocks**. 

We compare these derived values against a threshold to decide whether a page needs to be processed with OCR as part of our pipeline. We currently calculate the total union area of the rectangular blocks using packages with C bindings, such as NumPy or Shapely. However, we would prefer to avoid these dependencies, *although doing this in pure Python could be much slower*.
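For reference, the union-area calculation can be done in pure Python without NumPy or Shapely. Below is a minimal sketch using coordinate compression: the distinct x edges split the page into vertical slabs, and within each slab the y-intervals of the covering rectangles are merged so overlaps are counted only once. It assumes block tuples laid out as `(x0, y0, x1, y1, text, block_no, block_type)` with `block_type` 1 for images and 0 for text. As far as we know this matches PyMuPDF's block tuples, but check it against your version. `union_area`, `coverage`, `page_coverage`, and `needs_ocr` (including its thresholds) are hypothetical helpers, not part of any library.

```python
from typing import Iterable, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def union_area(rects: Iterable[Rect]) -> float:
    """Area of the union of axis-aligned rectangles; overlaps count once."""
    # Drop empty or inverted boxes (e.g. ones clipped away entirely).
    rects = [r for r in rects if r[2] > r[0] and r[3] > r[1]]
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        # Between two consecutive x edges, a rectangle either spans the
        # whole slab or does not touch its interior.
        spans = sorted((r[1], r[3]) for r in rects if r[0] <= left and r[2] >= right)
        covered = 0.0
        cur_start = cur_end = None
        for y0, y1 in spans:
            if cur_end is None or y0 > cur_end:
                # Disjoint interval: close the previous run, start a new one.
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = y0, y1
            else:
                # Overlapping interval: extend the current run.
                cur_end = max(cur_end, y1)
        if cur_end is not None:
            covered += cur_end - cur_start
        total += covered * (right - left)
    return total


def coverage(rects: Iterable[Rect], page: Rect) -> float:
    """Fraction of the page rectangle covered by the union of rects."""
    page_area = (page[2] - page[0]) * (page[3] - page[1])
    if page_area <= 0:
        return 0.0
    # Clip each rectangle to the page so off-page parts are not counted.
    clipped = (
        (max(r[0], page[0]), max(r[1], page[1]), min(r[2], page[2]), min(r[3], page[3]))
        for r in rects
    )
    return union_area(clipped) / page_area


def page_coverage(blocks: Sequence[tuple], page: Rect) -> Tuple[float, float]:
    """Return (image coverage, text coverage); assumes block[6] is the type."""
    image_rects = [tuple(b[:4]) for b in blocks if b[6] == 1]
    text_rects = [tuple(b[:4]) for b in blocks if b[6] == 0]
    return coverage(image_rects, page), coverage(text_rects, page)


def needs_ocr(image_cov: float, text_cov: float,
              image_min: float = 0.5, text_max: float = 0.05) -> bool:
    """Hypothetical heuristic: mostly image and almost no extractable text."""
    return image_cov >= image_min and text_cov <= text_max


if __name__ == "__main__":
    page = (0.0, 0.0, 100.0, 100.0)
    blocks = [
        (0, 0, 60, 50, "<image>", 0, 1),    # two overlapping image blocks
        (40, 0, 100, 50, "<image>", 1, 1),
        (0, 60, 20, 70, "some text", 2, 0),
    ]
    img, txt = page_coverage(blocks, page)
    print(img, txt, needs_ocr(img, txt))
```

Here the two image blocks overlap by a 20 x 50 strip, so their union covers half the page instead of the 0.6 a naive sum of areas would give. This approach is roughly O(n² log n) for n blocks, which should be acceptable for a typical page. For pages with thousands of blocks, a sweep line with a segment tree brings it down to O(n log n) at the cost of more code.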

This is a general `feature request`: it would be nice to have metrics like these *(with a highly performant implementation)* as part of the page model.