Due 11/13/25 11:59PM PT
75 points total + 10 points extra credit
In this assignment, you will implement methods from two recent papers which focus on builing or evaluating human-ai interaction:
- CoGym - a framework for building and evaluating collaborative agents that take initiative at the correct time and work together with humans in parallel.
- AutoMetrics - a library for generating automatic evaluators to approximate human-judgements
You will also have an opportunity to recieve extra credit by competing on a leaderboard to build the best automatic evaluators.
This assignment will require that you use your Google Cloud Credits. We estimate that this assignment will take fewer than $20. If you do not have enough credit remaining please contact the teaching team.
hw3.ipynb- Main Jupyter notebook containing the complete assignment workflow. This is where you'll implement most of your code and run experiments.writeup.md- Template for your written responses and analysis. You must fill this out with your answers.
.env.example- example of how you need to structure your.envfile.env- file you will create to store your GEMINI_API_KEYrequirements.txt- file specifying the library requirements that you need to install for the assignment
submission.py- file to modify with your customgenerate_evaluatormethod to submit to the leaderboard.
-
Install required packages (we suggest you install them in Conda environment):
pip install -r requirements.txt
-
Copy
.env.exampleto.envand fill in the API keys. -
Open and run through
hw3.ipynbfollowing the instructions
You must submit a ZIP file containing:
-
All code files including:
hw3.ipynb(with all cells executed and outputs visible)submission.pyfor submitting to the leaderboard.envso that we can run your submission on our private test set
-
Completed writeup:
- Export
writeup.mdwith all TODO sections filled out as a PDF file.
- Export
For submitting to the gradescope leaderboard you may find submit.sh useful for zipping your files for submission. For canvas upload please be additionally sure to convert writeup.md to a PDF file.
This is the first homework that will involve gradescope submission, so please be careful to remember to upload to BOTH gradescope and canvas.
- Test that your code runs before submission
- Include all outputs in your Jupyter notebook submission
- Double-check that all TODO sections in
writeup.mdare completed
Good luck building and evaluating human-ai interaction!