@NikhilSinha1 (Contributor) commented Oct 6, 2025

Summary

We want to add checkpointing to cog-runtime so that we can checkpoint and restore models after they have completed setup. To that end, this PR introduces the Checkpointer object, which exposes the ability to checkpoint and restore the model.

To enable this, we also want to restore some of coglet's ability to communicate with the parent process over signals rather than over webhooks, which it was switched to in this PR.
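
For readers skimming the summary, here is a minimal sketch of the restore-or-cold-start flow it describes. Everything in it is an assumption made for illustration: the `Checkpointer` interface shape, the in-memory `memCheckpointer`, `errNoCheckpoint`, and `startRunner` are hypothetical names and are not the types added by this PR (the real `Restore` in the diff also returns a command, which is left out here).

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Checkpointer is a hypothetical interface for this sketch only.
type Checkpointer interface {
	Checkpoint(ctx context.Context) error
	Restore(ctx context.Context) (callback func() error, err error)
}

// errNoCheckpoint is a hypothetical sentinel meaning "nothing to restore".
var errNoCheckpoint = errors.New("no checkpoint available")

// memCheckpointer is an in-memory stand-in that only remembers
// whether a checkpoint has been taken.
type memCheckpointer struct{ saved bool }

func (c *memCheckpointer) Checkpoint(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	c.saved = true
	return nil
}

func (c *memCheckpointer) Restore(ctx context.Context) (func() error, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if !c.saved {
		return nil, errNoCheckpoint
	}
	// The callback stands in for any work to run after the restore.
	return func() error { return nil }, nil
}

// startRunner tries to restore first. If no checkpoint exists it falls
// back to a cold start (setup itself is elided) and then checkpoints.
func startRunner(ctx context.Context, cp Checkpointer) (string, error) {
	callback, err := cp.Restore(ctx)
	if err == nil {
		if err := callback(); err != nil {
			return "", err
		}
		return "restored", nil
	}
	if !errors.Is(err, errNoCheckpoint) {
		return "", err
	}
	if err := cp.Checkpoint(ctx); err != nil {
		return "", err
	}
	return "cold", nil
}

// demo starts the same runner twice against one checkpointer.
func demo() (string, string) {
	cp := &memCheckpointer{}
	first, _ := startRunner(context.Background(), cp)
	second, _ := startRunner(context.Background(), cp)
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first, second)
}
```

In this sketch the first start has nothing to restore, so it takes the cold path and checkpoints; the second start restores from that checkpoint.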

@NikhilSinha1 NikhilSinha1 requested a review from a team as a code owner October 6, 2025 17:59
@NikhilSinha1 NikhilSinha1 changed the title from "Add mode for cog-runtime to use signals" to "Add checkpointer to cog-runtime" Oct 7, 2025
// Derive the runtime context from the manager's context
runtimeContext, runtimeCancel := context.WithCancel(ctx)

cmd, callback, err := cp.Restore(runtimeContext)

NikhilSinha1 (Contributor, Author) commented:

Call this postRunnerStart or something


m.runners[0] = runner
m.monitoringWG.Go(func() {
    m.monitorRunnerSubprocess(m.ctx, DefaultRunnerName, runner)

NikhilSinha1 (Contributor, Author) commented:

Move this after the goto

if s.cfg.SignalMode {
    // This runs an infinite loop for handling signals, so we explicitly
    // do not want to put it in a wait group of any kind
    go s.handler.HandleSignals()

NikhilSinha1 (Contributor, Author) commented:

Add a context for cancelling
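
One possible shape for this, as a rough sketch rather than the handler in this PR: the loop selects on both the signal channel and `ctx.Done()`, so cancelling the context ends the goroutine. The names `handleSignals`, `onSignal`, and `stopsOnCancel` are hypothetical, and the choice of `SIGUSR1` is an assumption (it is only defined on Unix-like systems).

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// handleSignals is a hypothetical stand-in for a signal loop.
// It forwards each received signal to onSignal until ctx is cancelled.
func handleSignals(ctx context.Context, onSignal func(os.Signal)) {
	ch := make(chan os.Signal, 1)
	// SIGUSR1 is an assumed choice for this sketch.
	signal.Notify(ch, syscall.SIGUSR1)
	// Stop delivery to ch when the loop exits.
	defer signal.Stop(ch)

	for {
		select {
		case <-ctx.Done():
			return
		case sig := <-ch:
			onSignal(sig)
		}
	}
}

// stopsOnCancel reports whether the loop exits after the context
// is cancelled, waiting at most two seconds.
func stopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		handleSignals(ctx, func(os.Signal) {})
	}()

	cancel()
	select {
	case <-done:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("handler stopped:", stopsOnCancel())
}
```

With a loop like this the goroutine can still live outside any wait group, but it no longer outlives the context it was started under.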

@meatballhat (Contributor) left a comment

We went through this synchronously 🎉🌮🎉

@nmurthy left a comment

overall LGTM, OOC have you been testing with any specific cogs?
