Skip to content

Conversation

baptistecolle
Copy link
Contributor

This is a WIP blog post for the Jupyter Agent 2 project.

There are still a few graphs to be added, and maybe release some artifacts for it

@lvwerra @ayukh let me know what you think?

I modified the name from data-agent to jupter-agent-2, as I thought that could be more impactful

title: "😎 Creating a Data Science Agent from Scratch"
thumbnail: /blog/assets/jupyter-agent-2/thumbnail.png
authors:
- user: baptistecolle
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know the order you guys prefer


---

## ⚙️ Processing Pipeline
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here i think adding a graph could be cool


*Challenge:* Many datasets were unavailable.
*Trick:* Since LLMs are strong at code and have a decent world model, we prompted them to **act as a code interpreter** when the dataset was missing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also clarify that not just many datasets were not available, but also incorrectly mapped in metadata or were not specific in the metadata

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can expand that section later if needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, feel free to commit to the branch! I wanted this to be like a common pull request, so feel free to modify and reformat a lot. Worst case, ping me on Slack for some questions if you have. So I will let you handle this section as you know it best

Copy link
Member

@pcuenca pcuenca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick pass. Cool work. I'd suggest to maybe contextualize more how the demo works, and how it relates with the training process we describe.

We built a pipeline to automatically fetch these datasets, ensuring the code inside notebooks could actually run. The goal was to later train the model on actual code execution.

### 3. Edu scoring
We scored notebooks based on educational quality. We saw that using the whole notebook was not optimal, as many contained trivial or broken code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did we score them? Did we train/use a separate model?

This is a follow-up to our earlier work on [jupyter-agent (v1)](https://huggingface.co/spaces/lvwerra/jupyter-agent).

The **Jupyter Agent** is a data science agent that can execute code directly inside a Jupyter notebook. Think of it like *Cursor*, but living natively inside your data science workflow.
For this demo we use **QwenCoder**, currently one of the strongest coding models.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the relationship with the Qwen 4B model that is discussed in the rest of the post?

Copy link
Contributor

@merveenoyan merveenoyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot! 💗
IMHO one big thing missing is instructions on how to use it locally, we only see the link to the Space.


# Creating a Data Science Agent from Scratch

Check out our new demo here: [huggingface.co/spaces/lvwerra/jupyter-agent-2](https://huggingface.co/spaces/lvwerra/jupyter-agent-2).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Check out our new demo here: [huggingface.co/spaces/lvwerra/jupyter-agent-2](https://huggingface.co/spaces/lvwerra/jupyter-agent-2).
Check out our new demo [here](https://huggingface.co/spaces/lvwerra/jupyter-agent-2).

This is a follow-up to our earlier work on [jupyter-agent (v1)](https://huggingface.co/spaces/lvwerra/jupyter-agent).

The **Jupyter Agent** is a data science agent that can execute code directly inside a Jupyter notebook. Think of it like *Cursor*, but living natively inside your data science workflow.
For this demo we use **QwenCoder**, currently one of the strongest coding models.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be nice to link to the ckpt


We set out to **train a small data agent model** that could perform better on DABStep.

Our first choice was **Qwen-4B**: extremely small (fast to iterate with, easy to run), yet strong enough to act in agentic scenarios.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be nice to link to checkpoint, which Qwen is it? Qwen3?

| For the year 2023, focusing on the merchant *Crossfit Hanna*, if we incentivize users to switch to a different Authorization Characteristics Indicator, which option would be the most cost-effective? | E:346.49 |

This benchmark remains challenging for today’s LLMs — especially for smaller models.
You can explore the live leaderboard here: [huggingface.co/spaces/adyen/DABstep](https://huggingface.co/spaces/adyen/DABstep).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can explore the live leaderboard here: [huggingface.co/spaces/adyen/DABstep](https://huggingface.co/spaces/adyen/DABstep).
You can explore the live leaderboard [here] (https://huggingface.co/spaces/adyen/DABstep).

you can also embed this to the blog, and it would be nice to link to that blog too either here or at the end of the blog

- Rich metadata for each notebook (authors, datasets used, etc.).


## ⚙️ Processing Pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting multiple subheaders with little text breaks readability a bit

Some training steps were particularly interesting:

- For trace generation, we used LLMs to generate QA pairs, which gave us a **verifiable environment**.
- Finally, we fine-tuned **Qwen-4B** with [TRL](https://huggingface.co/docs/trl).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to link to the dataset

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also where can people find the fine-tuned checkpoint?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also where can people find the fine-tuned checkpoint?

It's still WIP 😄 I will upload everything later today with that section in the blog updated with more information+links

- *Distillation:* Investigate knowledge distillation, which has shown strong results for improving small models.
- *Reinforcement Learning (RL):* Build an RL environment, which has been shown to achieve state-of-the-art performance on agentic tasks. Since our QA setup already provides a verifiable environment, we could leverage it directly for RL training.

Maybe this will lead to… **Jupyter-Agent 3.** 😉
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to end with a call for action here on people trying it out

@baptistecolle baptistecolle marked this pull request as draft August 29, 2025 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants