A sample Streamlit application for analyzing documents using AWS Textract and Bedrock - Amazon Nova Micro Model. Users can upload documents (images or PDFs) and get quick insights.
- Document upload (supports PNG, JPG, JPEG, PDF)
- Text extraction using AWS Textract
- Document analysis using AWS Bedrock (Nova Micro model)
- Custom prompts for tailored analysis
- Execution Time Tracking
-
Clone the repository:
git clone https://github.com/aws-samples/textract-bedrock-document-insights.git cd textract-bedrock-document-insights -
Install the required dependencies:
pip install -r requirements.txt
Set the following environment variables:
S3_BUCKET: The name of your S3 bucket for document storageAWS_REGION: The AWS region to use (defaults to "us-east-1" if not set)
You can set these variables in your environment or optionally use a .env file as below:
- Create a
.envfile in the root directory of the project:touch .env
- Add the above environment variables to the .env file
- The application will automatically load these variables using python-dotenv.
Note: Make sure to add .env to your .gitignore file to keep these details secure.
Run the Streamlit application:
streamlit run src/main.py
Then, open your web browser and navigate to the URL provided by Streamlit.
- boto3
- streamlit
- python-dotenv
Make sure you have the necessary AWS credentials configured to access S3, Textract, and Bedrock services, along with model access to Amazon Nova Micro.
If you prefer to run this project in a Python virtual environment, follow these steps:
-
Create a virtual environment in your project directory:
python -m venv venv
-
Activate the virtual environment:
- On macOS/Linux:
source venv/bin/activate - On Windows:
venv\Scripts\activate
- On macOS/Linux:
-
Install the required packages:
pip install -r requirements.txt
-
Run the application:
streamlit run src/main.py
-
When you're done, you can deactivate the virtual environment:
deactivate
Note: You'll need to activate the virtual environment each time you want to run the project.
