CS329X HW3: Building and Evaluating Human-AI Interaction

Due 11/13/25 11:59PM PT

75 points total + 10 points extra credit

Overview

In this assignment, you will implement methods from two recent papers that focus on building or evaluating human-AI interaction:

CoGym - a framework for building and evaluating collaborative agents that take initiative at the correct time and work together with humans in parallel.
AutoMetrics - a library for generating automatic evaluators that approximate human judgments.

You will also have an opportunity to receive extra credit by competing on a leaderboard to build the best automatic evaluators.

This assignment requires you to use your Google Cloud Credits. We estimate that it will cost less than $20. If you do not have enough credit remaining, please contact the teaching team.

File Structure

Core Files

hw3.ipynb - Main Jupyter notebook containing the complete assignment workflow. This is where you'll implement most of your code and run experiments.
writeup.md - Template for your written responses and analysis. You must fill this out with your answers.

Environment

.env.example - example of how you need to structure your .env file
.env - file you will create to store your GEMINI_API_KEY
requirements.txt - file specifying the library requirements that you need to install for the assignment

Leaderboard

submission.py - file to modify with your custom generate_evaluator method to submit to the leaderboard.

Environment Setup

Install the required packages (we suggest installing them in a Conda environment):
```
pip install -r requirements.txt
```
Copy .env.example to .env and fill in the API keys.
Open hw3.ipynb and run through it, following the instructions.
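
Before launching the notebook, it can help to confirm that .env actually contains a non-empty GEMINI_API_KEY. The following is a minimal sketch using only the standard library. The function names load_env_file and require_gemini_key are hypothetical and not part of the assignment code, and the sketch assumes .env uses plain KEY=VALUE lines as in a typical .env.example:

```python
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_gemini_key(path=".env"):
    """Return GEMINI_API_KEY from the file, or raise if it is missing or empty."""
    key = load_env_file(path).get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError(f"GEMINI_API_KEY is not set in {path}")
    return key
```

Calling require_gemini_key() from the project directory raises an error early if the key was never filled in, rather than failing partway through an experiment.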

Submission Requirements

What to Submit

You must submit a ZIP file containing:

All code files including:
- hw3.ipynb (with all cells executed and outputs visible)
- submission.py for submitting to the leaderboard
- .env so that we can run your submission on our private test set
Completed writeup:
- Export writeup.md with all TODO sections filled out as a PDF file.

For submitting to the Gradescope leaderboard, you may find submit.sh useful for zipping your files. For the Canvas upload, please also be sure to convert writeup.md to a PDF file.
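
If submit.sh is inconvenient on your platform, the ZIP can also be built from Python. This is only a sketch of the idea and not a replacement for the provided script: the function name build_submission_zip, the default archive name submission.zip, and the extra_files parameter are assumptions made for illustration, and the required file list is taken from the "What to Submit" section above.

```python
import zipfile
from pathlib import Path

# Files listed under "What to Submit"; add your exported PDF via extra_files.
REQUIRED_FILES = ["hw3.ipynb", "submission.py", ".env"]


def build_submission_zip(project_dir=".", zip_name="submission.zip", extra_files=()):
    """Zip the required files (plus any extras), failing if one is missing."""
    project = Path(project_dir)
    names = list(REQUIRED_FILES) + list(extra_files)
    missing = [name for name in names if not (project / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")
    zip_path = project / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # arcname keeps the files at the top level of the archive.
            archive.write(project / name, arcname=name)
    return zip_path
```

Failing on a missing file is a deliberate choice here: it is easier to notice a forgotten .env locally than after an upload.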

This is the first homework that involves a Gradescope submission, so please remember to upload to BOTH Gradescope and Canvas.

Important Notes

Test that your code runs before submission
Include all outputs in your Jupyter notebook submission
Double-check that all TODO sections in writeup.md are completed
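
Two of the checks above can be partly automated. The sketch below is an illustration rather than part of the assignment: the helper names are hypothetical, it assumes hw3.ipynb is a standard nbformat 4 notebook (a JSON file with a top-level "cells" list), and it assumes that unfinished sections of writeup.md still contain the literal text "TODO".

```python
import json
from pathlib import Path


def unexecuted_code_cells(notebook_path="hw3.ipynb"):
    """Return indices of non-empty code cells that were never executed."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    pending = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be stored as a string or as a list of strings.
        source = "".join(cell.get("source", []))
        if source.strip() and cell.get("execution_count") is None:
            pending.append(index)
    return pending


def remaining_todos(writeup_path="writeup.md"):
    """Return 1-based line numbers in the write-up that still contain 'TODO'."""
    lines = Path(writeup_path).read_text(encoding="utf-8").splitlines()
    return [number for number, line in enumerate(lines, start=1) if "TODO" in line]
```

Both functions returning empty lists is a reasonable signal that the notebook was fully run and the write-up is filled in, though it does not replace reading through your outputs.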

Good luck building and evaluating human-AI interaction!

SALT-NLP/cs329x_hw3

Folders and files

Latest commit

History

Repository files navigation

CS329X HW3: Building and Evaluating Human-AI Interaction

Overview

File Structure

Core Files

Environment

Leaderboard

Environment Setup

Submission Requirements

What to Submit

Important Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages