@tectonia tectonia commented Jun 6, 2025

This pull request introduces a new LLM-based example, the invoice processing pipeline, together with its configurations, dependencies, workflows, and documentation. The key changes enable invoice processing experiments, integrate Azure OpenAI services, and extend the shared pipeline tooling.

Invoice Processing Pipeline:

  • New CI Workflow: Added invoice_processing_ci_pipeline.yml to define a continuous integration workflow for invoice processing, supporting pull requests and workflow dispatch triggers.
  • Pipeline Configurations: Updated config/config.yaml to include configurations for the invoice processing pipelines (invoice_processing_pr and invoice_processing_dev), each with its own compute cluster and dataset settings.
  • Experiment Configuration: Added experiment_config.yaml to define parameters for data preparation, prediction, and scoring in invoice processing experiments.
  • Predict Component: Created mlops/invoice_processing/components/predict.yml to define the predict component for the pipeline, integrating Azure OpenAI service inputs and outputs.
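
Calls to a hosted model service such as Azure OpenAI can fail transiently (timeouts, rate limits), which is presumably why python-retry appears among the new dependencies. The sketch below illustrates the general retry-with-exponential-backoff idea using only the standard library. It does not use python-retry's API and is not taken from this PR; `call_with_retry` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    Hypothetical helper for illustration; not part of this PR.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2^(attempt - 1) seconds, plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A predict step could wrap its model call as `call_with_retry(lambda: client_call(prompt), retry_on=(TimeoutError,))`, where `client_call` stands in for whatever function actually issues the request. The `sleep` parameter is injectable so the backoff can be tested without waiting.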

Dependency Updates:

  • Requirements Files: Updated dependencies in .github/requirements/build_validation_requirements.txt and .github/requirements/execute_job_requirements.txt to include newer versions of mlflow and azure-ai-ml, plus additional libraries such as azureml-fsspec, Levenshtein, and python-retry.
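
The Levenshtein dependency suggests that the scoring step compares extracted invoice fields against ground truth by edit distance. As a rough sketch of that idea, assuming string-valued fields and a per-field normalized similarity, the code below implements the distance in pure Python so it runs without the library. The function names and the example field names (`total`, `vendor`) are hypothetical and are not taken from the PR.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(predicted: str, expected: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, expected) / longest


def score_fields(predicted: Dict[str, str], ground_truth: Dict[str, str]) -> Dict[str, float]:
    """Score each ground-truth field; a missing prediction scores 0.0."""
    return {
        field: similarity(predicted[field], expected) if field in predicted else 0.0
        for field, expected in ground_truth.items()
    }
```

For example, `score_fields({"total": "100.00"}, {"total": "100.00", "vendor": "Contoso"})` scores the matching field as 1.0 and the missing one as 0.0. Normalizing by the longer string's length is one design choice among several; the actual pipeline may normalize or aggregate differently.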

Workflow Improvements:

  • Build Validation Workflow: Updated build_validation_workflow.yml to set PYTHONPATH for test execution.
  • Azure CLI Token Cache: Added a step to clear Azure CLI token cache in execute_shell_code/action.yml for better security and reliability.

Documentation Updates:

  • Experiment Configuration Guide: Added docs/how-to/ConfigureExperiments.md to provide detailed instructions on configuring experiments, including .env file setup and pipeline configurations.
  • Prompts and Strategies: Added docs/how-to/PromptsAndExtractionStrategies.md to document prompt creation and extraction strategies for invoice processing.

Data and Config Updates:

  • Data Configuration: Updated config/data_config.json to include new datasets for invoice processing (invoice_processing_test and invoice_processing_test_gt).
  • Config Utils: Enhanced mlops/common/config_utils.py to support loading experiment_config.yaml alongside the main configuration file.

Together, these changes add end-to-end support for the invoice processing pipeline, from CI workflows and configuration through to documentation for onboarding and experimentation.

@tectonia tectonia marked this pull request as ready for review June 27, 2025 15:18
@tectonia tectonia changed the title from "Contributing LLM-based example" to "LLM-based example" Jun 27, 2025
@tectonia tectonia closed this Jun 27, 2025
@tectonia tectonia reopened this Aug 27, 2025
Trigger pr checks with minor change
Trigger another pr check