A fully automated AWS SageMaker Pipeline that ingests a raw “fake news” dataset, cleans & balances it, trains a RoBERTa classifier, evaluates its performance, and—if it meets your quality gates—packages & registers the model for deployment after human approval.
- Data Registration & Understanding
- Pipeline Definition (the stages below are wired together as in the sketch after this list)
- Processing (clean, balance, transform, split)
- Training (fine-tune the RoBERTa classifier on the train and validation splits)
- Evaluation (score the trained model on the held-out test split)
- Conditional Model Registration
- Human approval and SageMaker endpoint deployment
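A minimal sketch of how these stages could be expressed with the SageMaker Python SDK is shown below. The script names (`preprocess.py`, `train.py`, `evaluate.py`), S3 locations, instance types, framework versions, metric path, and the 0.90 accuracy threshold are illustrative assumptions, not the project's actual values.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

session = sagemaker.Session()
bucket = session.default_bucket()
role = sagemaker.get_execution_role()  # resolve the role ARN yourself if running outside SageMaker

# 1. Processing: clean, balance, transform, and split the raw dataset.
processor = SKLearnProcessor(framework_version="1.2-1", role=role,
                             instance_type="ml.m5.xlarge", instance_count=1)
process_step = ProcessingStep(
    name="ProcessFakeNewsData",
    processor=processor,
    inputs=[ProcessingInput(source=f"s3://{bucket}/fake-news/raw",  # assumed raw-data location
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(output_name=n, source=f"/opt/ml/processing/{n}")
             for n in ("train", "validation", "test")],
    code="preprocess.py",  # placeholder processing script
)

# 2. Training: fine-tune RoBERTa with the Hugging Face estimator ("train.py" is a placeholder).
estimator = HuggingFace(entry_point="train.py", source_dir="scripts", role=role,
                        transformers_version="4.26", pytorch_version="1.13", py_version="py39",
                        instance_type="ml.p3.2xlarge", instance_count=1,
                        hyperparameters={"model_name": "roberta-base", "epochs": 3})
train_step = TrainingStep(
    name="TrainRobertaClassifier",
    estimator=estimator,
    inputs={name: TrainingInput(
                process_step.properties.ProcessingOutputConfig.Outputs[name].S3Output.S3Uri)
            for name in ("train", "validation")},
)

# 3. Evaluation: score the model on the test split and write evaluation.json.
#    In practice this job needs a container with transformers installed (e.g. a HuggingFaceProcessor).
evaluation_report = PropertyFile(name="EvaluationReport", output_name="evaluation",
                                 path="evaluation.json")
eval_step = ProcessingStep(
    name="EvaluateModel",
    processor=processor,
    inputs=[ProcessingInput(source=train_step.properties.ModelArtifacts.S3ModelArtifacts,
                            destination="/opt/ml/processing/model"),
            ProcessingInput(source=process_step.properties.ProcessingOutputConfig
                            .Outputs["test"].S3Output.S3Uri,
                            destination="/opt/ml/processing/test")],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
    code="evaluate.py",  # placeholder evaluation script
    property_files=[evaluation_report],
)

# 4. Conditional registration: register only when the quality gate passes;
#    "PendingManualApproval" keeps the package out of production until a human approves it.
register_step = RegisterModel(
    name="RegisterFakeNewsModel",
    estimator=estimator,
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/json"], response_types=["application/json"],
    inference_instances=["ml.m5.xlarge"], transform_instances=["ml.m5.xlarge"],
    model_package_group_name="fake-news-roberta",
    approval_status="PendingManualApproval",
)
quality_gate = ConditionGreaterThanOrEqualTo(
    left=JsonGet(step_name=eval_step.name, property_file=evaluation_report,
                 json_path="metrics.accuracy.value"),  # assumed metrics layout
    right=0.90,                                        # assumed quality gate
)
condition_step = ConditionStep(name="CheckQualityGate", conditions=[quality_gate],
                               if_steps=[register_step], else_steps=[])

pipeline = Pipeline(name="FakeNewsRobertaPipeline", sagemaker_session=session,
                    steps=[process_step, train_step, eval_step, condition_step])
```

Calling `pipeline.upsert(role_arn=role)` and then `pipeline.start()` creates and runs the pipeline; the `PendingManualApproval` status on the registered package is what gates endpoint deployment behind human review.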
- Python 3.8 or above
- AWS account with permissions for SageMaker, S3, IAM, CloudWatch
- AWS CLI v2 configured
boto3, sagemaker, transformers
```
pip install boto3 sagemaker protobuf transformers pandas
```
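After installing the packages and running `aws configure`, a quick sanity check along these lines confirms the SDKs can reach your account; the role ARN below is a placeholder.

```python
import boto3
import sagemaker

# Uses the credentials and region configured via `aws configure`.
session = sagemaker.Session()
print("Account:", boto3.client("sts").get_caller_identity()["Account"])
print("Region:", session.boto_region_name)
print("Default S3 bucket:", session.default_bucket())

try:
    # Resolves automatically inside SageMaker Studio / notebook instances.
    role = sagemaker.get_execution_role()
except Exception:
    # Elsewhere, supply your SageMaker execution role ARN explicitly (placeholder below).
    role = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"
print("Execution role:", role)
```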