
Commit 872fb30

Author: Bob Strahan
Merge branch 'develop' v0.3.7
2 parents a58204f + d2f1ee3 · commit 872fb30

117 files changed · +15065 −12580 lines changed

.clinerules

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# Cline's Memory Bank

I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation - it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task - this is not optional.

## Memory Bank Structure

The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy:

```mermaid
flowchart TD
    PB[projectbrief.md] --> PC[productContext.md]
    PB --> SP[systemPatterns.md]
    PB --> TC[techContext.md]

    PC --> AC[activeContext.md]
    SP --> AC
    TC --> AC

    AC --> P[progress.md]
```

### Core Files (Required)
1. `projectbrief.md`
   - Foundation document that shapes all other files
   - Created at project start if it doesn't exist
   - Defines core requirements and goals
   - Source of truth for project scope

2. `productContext.md`
   - Why this project exists
   - Problems it solves
   - How it should work
   - User experience goals

3. `activeContext.md`
   - Current work focus
   - Recent changes
   - Next steps
   - Active decisions and considerations
   - Important patterns and preferences
   - Learnings and project insights

4. `systemPatterns.md`
   - System architecture
   - Key technical decisions
   - Design patterns in use
   - Component relationships
   - Critical implementation paths

5. `techContext.md`
   - Technologies used
   - Development setup
   - Technical constraints
   - Dependencies
   - Tool usage patterns

6. `progress.md`
   - What works
   - What's left to build
   - Current status
   - Known issues
   - Evolution of project decisions

### Additional Context
Create additional files/folders within memory-bank/ when they help organize:
- Complex feature documentation
- Integration specifications
- API documentation
- Testing strategies
- Deployment procedures
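The required reads above can be sketched as a small pre-flight check. This is purely illustrative: the six file names come from the Core Files list, the `memory-bank/` directory name follows the Additional Context section, and the helper itself is hypothetical rather than part of any tooling.

```python
from pathlib import Path

# The six required core files listed above.
CORE_FILES = [
    "projectbrief.md",
    "productContext.md",
    "activeContext.md",
    "systemPatterns.md",
    "techContext.md",
    "progress.md",
]

def missing_core_files(root="memory-bank"):
    """Return the core files that are absent and would need to be created."""
    base = Path(root)
    return [name for name in CORE_FILES if not (base / name).is_file()]
```

Running such a check at the start of a session would surface exactly which files the "Create Missing Files" step in the Plan Mode flowchart needs to produce.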
## Core Workflows

When responding to a user query, I work in one of two modes: PLAN mode and ACT mode, described in the subsections below. I ALWAYS start in PLAN mode for new tasks or significant changes, and transition to ACT mode only after a comprehensive plan has been created and approved by the user.

### Plan Mode
In plan mode, I focus on understanding requirements, reasoning through solutions, and creating a comprehensive plan before any implementation. I DO NOT use tools to create/modify files or directory structures in this mode.

```mermaid
flowchart TD
    Start[Start] --> ReadFiles[Read Memory Bank]
    ReadFiles --> CheckFiles{Files Complete?}

    CheckFiles -->|No| CreateMissing[Create Missing Files]
    CreateMissing --> GatherInfo[Gather Requirements]

    CheckFiles -->|Yes| Verify[Verify Context]
    Verify --> GatherInfo

    GatherInfo --> Reasoning[Explicit Reasoning]
    Reasoning --> Alternatives[Consider Alternatives]
    Alternatives --> Strategy[Develop Strategy]
    Strategy --> PlanDoc[Create Planning Document]
    PlanDoc --> UserInput[Request User Feedback]
    UserInput --> Approval{User Approves?}
    Approval -->|Yes| TransitionToAct[Transition to ACT Mode]
    Approval -->|No| RefineStrategy[Refine Strategy]
    RefineStrategy --> PlanDoc
```

#### Planning Document Structure
Every comprehensive plan must include:
1. **Problem Understanding**: Clear articulation of the problem/task
2. **Requirements Analysis**: Explicit and implicit requirements
3. **Solution Alternatives**: At least 2-3 approaches with pros/cons
4. **Selected Approach**: Detailed implementation strategy with reasoning
5. **Implementation Steps**: Specific, actionable steps with dependencies
6. **Testing Strategy**: How to verify the solution works
7. **Risks & Mitigations**: Potential issues and how to address them

### Act Mode
In act mode, I implement the approved plan, using all available tools to read, write, and execute commands.

```mermaid
flowchart TD
    Start[Start] --> Context[Check Memory Bank]
    Context --> ReviewPlan[Review Approved Plan]
    ReviewPlan --> Update[Update Documentation]
    Update --> Execute[Execute Task]
    Execute --> Verify[Verify Implementation]
    Verify --> Document[Document Changes]
```
I am always in exactly one mode, and every response begins with `MODE: PLAN` or `MODE: ACT` to indicate which. I transition from PLAN to ACT only after creating a comprehensive plan and receiving explicit user approval, and I keep the same mode unless told to change.

## Documentation Updates

Memory Bank updates occur:
1. When discovering new project patterns
2. After implementing significant changes
3. When the user requests **update memory bank** (MUST review ALL files)
4. When context needs clarification

```mermaid
flowchart TD
    Start[Update Process]

    subgraph Process
        P1[Review ALL Files]
        P2[Document Current State]
        P3[Clarify Next Steps]
        P4[Document Insights & Patterns]

        P1 --> P2 --> P3 --> P4
    end

    Start --> Process
```

Note: When triggered by **update memory bank**, I MUST review every memory bank file, even if some don't require updates. Focus particularly on activeContext.md and progress.md as they track current state.
## Project Intelligence (.clinerules)

The .clinerules file is my learning journal for each project. It captures important patterns, preferences, and project intelligence that help me work more effectively. As I work with you and the project, I'll discover and document key insights that aren't obvious from the code alone.

```mermaid
flowchart TD
    Start{Discover New Pattern}

    subgraph Learn [Learning Process]
        D1[Identify Pattern]
        D2[Validate with User]
        D3[Document in .clinerules]
    end

    subgraph Apply [Usage]
        A1[Read .clinerules]
        A2[Apply Learned Patterns]
        A3[Improve Future Work]
    end

    Start --> Learn
    Learn --> Apply
```

### What to Capture
- Critical implementation paths
- User preferences and workflow
- Project-specific patterns
- Known challenges
- Evolution of project decisions
- Tool usage patterns

The format is flexible - focus on capturing valuable insights that help me work more effectively with you and the project. Think of .clinerules as a living document that grows smarter as we work together.

REMEMBER: After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity, as my effectiveness depends entirely on its accuracy.

REMEMBER: I always use mermaid diagrams when I want to visualize any concepts.

CHANGELOG.md

Lines changed: 88 additions & 0 deletions
@@ -5,6 +5,94 @@ SPDX-License-Identifier: MIT-0
## [Unreleased]

### Added

### Fixed

## [0.3.7]

### Added

- **Criteria Validation Service Class**
  - New document validation service that evaluates documents against dynamic business rules using Large Language Models (LLMs)
  - **Key Capabilities**: Dynamic business rules configuration, asynchronous processing with concurrent criteria evaluation, intelligent text chunking for large documents, multi-file processing with summarization, comprehensive cost and performance tracking
  - **Primary Use Cases**: Healthcare prior authorization workflows, compliance validation, business rule enforcement, quality assurance, and audit preparation
  - **Architecture Features**: Seamless integration with the IDP pipeline using the common Bedrock client, unified metering with automatic token usage tracking, S3 operations using standardized file operations, configuration compatibility with the existing IDP config system
  - **Advanced Features**: Configurable criteria questions without code changes, robust error handling with graceful degradation, Pydantic-based input/output validation with automatic data cleaning, comprehensive timing metrics and token usage tracking
  - **Limitation**: Python idp_common support only; not yet implemented within deployed pattern workflows

- **Document Process Flow Visualization**
  - Added interactive visualization of Step Functions workflow execution for document processing
  - Visual representation of processing steps with status indicators and execution details
  - Detailed step information including inputs, outputs, and error messages
  - Timeline view showing chronological execution of all processing steps
  - Auto-refresh capability for monitoring active executions in real time
  - Support for Map state visualization with iteration details
  - Error diagnostics with detailed error messages for troubleshooting
  - Automatic selection of failed steps for quick issue identification

- **Granular Assessment Service for Scalable Confidence Evaluation**
  - New granular assessment approach that breaks assessment down into smaller, focused tasks for improved accuracy and performance
  - **Key Benefits**: Better accuracy through focused prompts, cost optimization via prompt caching, reduced latency through parallel processing, and scalability for complex documents
  - **Task Types**: Simple batch tasks (groups of 3-5 simple attributes), group tasks (individual group attributes), and list item tasks (individual list items for maximum accuracy)
  - **Configuration**: Configurable batch sizes (`simple_batch_size`, `list_batch_size`) and parallel processing (`max_workers`) for performance tuning
  - **Prompt Caching**: Leverages LLM caching capabilities with cached base content (document context, images, OCR data) and dynamic task-specific content
  - **Use Cases**: Ideal for bank statements with hundreds of transactions, documents with 10+ attributes, complex nested structures, and performance-critical scenarios
  - **Backward Compatibility**: Maintains the same interface as the standard assessment service with a seamless migration path
  - **Enhanced Documentation**: Comprehensive documentation in `docs/assessment.md` and example notebooks for both standard and granular approaches
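The tuning knobs above can be sketched roughly as follows. Only the parameter names `simple_batch_size`, `list_batch_size`, and `max_workers` come from this changelog; the configuration shape and the batching helper are hypothetical.

```python
# Hypothetical configuration shape; only the three parameter names are from the changelog.
granular_config = {
    "granular": {
        "enabled": True,
        "simple_batch_size": 4,  # simple attributes grouped a few per task
        "list_batch_size": 1,    # one list item per task for maximum accuracy
        "max_workers": 8,        # parallel task execution
    }
}

def plan_simple_batches(attributes, batch_size):
    """Split simple attributes into focused batches of at most batch_size each."""
    return [attributes[i:i + batch_size] for i in range(0, len(attributes), batch_size)]
```

For a bank statement with hundreds of transactions, each list item would become its own task while simple attributes share small batches, which is where the parallelism and prompt caching pay off.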

- **Reporting Database now has Document Sections Tables to enable querying across document fields**
  - Added a comprehensive document sections storage system that automatically creates tables for each section type (classification)
  - **Dynamic Table Creation**: AWS Glue Crawler automatically discovers new section types and creates corresponding tables (e.g., `invoice`, `receipt`, `bank_statement`)
  - **Configurable Crawler Schedule**: Support for manual, every 15 minutes, hourly, or daily (default) crawler execution via the `DocumentSectionsCrawlerFrequency` parameter
  - **Partitioned Storage**: Data organized by section type and date for efficient querying with Amazon Athena

- **Partition Projections for Evaluation and Metering tables**
  - **Automated Partition Management**: Eliminates the need for `MSCK REPAIR TABLE` operations with projection-based partition discovery
  - **Performance Benefits**: Athena can efficiently prune partitions based on date ranges without manual partition loading
  - **Backward Compatibility Warning**: The partition structure change from `year=2024/month=03/day=15/` to `date=2024-03-15/` means that data saved in the evaluation or metering tables prior to v0.3.7 will not be visible in Athena queries after updating. To retain access to historical data, you can either:
    - Manually reorganize existing S3 data to match the new partition structure
    - Create separate Athena tables pointing to the old partition structure for historical queries
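The layout change can be illustrated with two small prefix builders. This is a sketch: only the two partition prefix formats come from the changelog, the function names are made up.

```python
from datetime import date

def old_partition_prefix(d):
    """Pre-v0.3.7 layout: year=YYYY/month=MM/day=DD/ (required MSCK REPAIR TABLE)."""
    return f"year={d.year}/month={d.month:02d}/day={d.day:02d}/"

def new_partition_prefix(d):
    """v0.3.7+ layout: date=YYYY-MM-DD/ (discoverable via Athena partition projection)."""
    return f"date={d.isoformat()}/"
```

Because the two prefixes never match, objects written under the old scheme stay invisible to a table defined over the new one, which is why the two migration options above exist.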

- **Optimize the classification process for single class configurations in Pattern-2**
  - Detects when only a single document class is defined in the configuration
  - Automatically classifies all document pages as that single class
  - Creates a single section containing all pages
  - Bypasses the backend service calls (Bedrock or SageMaker) completely
  - Logs an INFO message indicating the optimization is active
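A minimal sketch of this short-circuit, under the assumption that classification yields a list of sections; the function and field names here are hypothetical, not the actual Pattern-2 service API.

```python
import logging

logger = logging.getLogger(__name__)

def classify_pages(pages, classes):
    """Short-circuit classification when only one class is configured (sketch)."""
    if len(classes) == 1:
        only = classes[0]
        logger.info("Single class '%s' configured; skipping backend classification", only)
        # One section containing every page; no Bedrock or SageMaker call is made.
        return [{"classification": only, "pages": list(pages)}]
    raise NotImplementedError("the multi-class path would call the backend service")
```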

- **Skip the extraction process for classes with no attributes in Pattern 2/3**
  - Add early detection logic in the extraction class to check for empty/missing attributes
  - Return zero metering data and empty JSON results when no attributes are defined
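The early exit might look like this. Illustrative only: the function signature, config shape, and result dictionary are assumptions, not the actual idp_common API.

```python
def extract_section(section_class, config):
    """Skip LLM extraction entirely when the class defines no attributes (sketch)."""
    attributes = config.get("classes", {}).get(section_class, {}).get("attributes") or []
    if not attributes:
        # Zero metering data and an empty JSON result; no model call is made.
        return {"results": {}, "metering": {"input_tokens": 0, "output_tokens": 0}}
    raise NotImplementedError("the normal LLM extraction path would run here")
```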

- **Enhanced State Machine Optimization for Very Large Documents**
  - Improved document compression to store only section IDs rather than full section objects
  - Modified the state machine workflow to eliminate nested result structures and reduce payload size
  - Added OutputPath filtering to remove intermediate results from state machine execution
  - Streamlined the assessment step to replace extraction results instead of nesting them
  - Resolves "size exceeding the maximum number of bytes service limit" errors for documents with 500+ pages

### Changed
- **Default behavior for image attachment in Pattern-2 and Pattern-3**
  - If the prompt contains a `{DOCUMENT_IMAGE}` placeholder, keep the current behavior (insert the image at the placeholder)
  - If the prompt does NOT contain a `{DOCUMENT_IMAGE}` placeholder, do NOT attach the image at all
  - Previously, if the (classification or extraction) prompt did NOT contain a `{DOCUMENT_IMAGE}` placeholder, the image was appended at the end of the content array anyway
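The new rule can be sketched as a simplified model of content-array assembly; the block shapes are illustrative, and only the `{DOCUMENT_IMAGE}` placeholder semantics come from this changelog.

```python
def build_content(prompt, image_block):
    """Attach the image only where {DOCUMENT_IMAGE} appears; otherwise omit it."""
    if "{DOCUMENT_IMAGE}" in prompt:
        before, after = prompt.split("{DOCUMENT_IMAGE}", 1)
        return [{"text": before}, image_block, {"text": after}]
    # Pre-v0.3.7 behavior appended image_block here unconditionally; now it is dropped.
    return [{"text": prompt}]
```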
- **Modified default assessment prompt for token efficiency**
  - Removed `confidence_reason` from the output to avoid consuming unnecessary output tokens
  - Refactored the task_prompt layout to improve `<<CACHEPOINT>>` placement for efficiency whether granular mode is enabled or disabled
- **Enhanced .clinerules with comprehensive memory bank workflows**
  - Enhanced the Plan Mode workflow with requirements gathering, reasoning, and a user approval loop

### Fixed
- Fixed UI list deletion issue where empty lists were not saved correctly - #18
- Improved structure and clarity of the idp_common Python package documentation
- Improved the View/Edit Configuration UI to clarify that Class and Attribute descriptions are used in the classification and extraction prompts
- Automated UI updates for the "HITL (A2I) Status" field in the Document list and document details section
- Fixed image display issue in PagesPanel where URLs containing special characters (commas, spaces) would fail to load, by properly URL-encoding S3 object keys in presigned URL generation

## [0.3.6]

### Fixed

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.6
+0.3.7
