@kvinwang kvinwang commented Jul 15, 2025

This PR implements CVM disk backup in both the built-in vmm and the external script dstack-backup.py.
Because a backup usually takes a very long time, running it with the external script dstack-backup.py is recommended over the vmm RPC.
How to use:

  1. Install the qmpbackup dependency
  2. Enable the QMP socket in vmm.toml
  3. Schedule dstack-backup.py with crontab or a similar task scheduler, using a daily trigger. For example:
    30 2 * * * /path/to/dstack-backup.sh
    
    where dstack-backup.sh can be written as:
    #!/bin/bash
    cd /opt/meta-dstack/build
    source .venv/bin/activate
    # use flock so a new run exits immediately if the previous backup is still running
    flock -n ./backup.lck ./dstack-backup.py --vmm-work-dir . --max-backups 3 --full-interval 7d --inc-interval 1d

It backs up the disks of running VMs to a configured directory.
Run dstack-backup.py --help to show the arguments.
There are two kinds of backups: full and incremental. By default, it performs a full backup weekly and an incremental backup daily.
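
As a rough illustration of how the two intervals could interact, here is a minimal sketch. It is not taken from dstack-backup.py: `parse_interval` and `choose_backup_level` are hypothetical names, and the only interval format confirmed above is the `7d` / `1d` style (the other unit suffixes are assumptions).

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed unit suffixes; only "d" appears in the example command above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text: str) -> timedelta:
    """Parse a string such as '7d' or '12h' into a timedelta."""
    value, unit = int(text[:-1]), text[-1]
    return timedelta(seconds=value * _UNIT_SECONDS[unit])


def choose_backup_level(
    now: datetime,
    last_full: Optional[datetime],
    last_inc: Optional[datetime],
    full_interval: timedelta,
    inc_interval: timedelta,
) -> Optional[str]:
    """Return 'full', 'incremental', or None when no backup is due yet."""
    # A full backup is needed if none exists or the last one is too old.
    if last_full is None or now - last_full >= full_interval:
        return "full"
    # Otherwise measure the incremental interval from the most recent backup.
    last_any = max(last_full, last_inc) if last_inc else last_full
    if now - last_any >= inc_interval:
        return "incremental"
    return None
```

With `--full-interval 7d --inc-interval 1d`, a scheduler built this way would pick a full backup once a week and an incremental backup on the other days.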

@kvinwang kvinwang requested a review from Leechael July 15, 2025 03:29
.open(BACKUP_LOCK_FILE)
.context("Failed to create backup lock file, there is another backup in progress")?;
// Run /dstack/hooks/pre-backup if it exists
let pre_backup_hook = "/dstack/hooks/pre-backup";
Collaborator

How can we set up hooks for pre-backup and post-backup?

Collaborator Author

Just write some shell commands into the pre-backup/post-backup files. Usually nothing needs to be done. Some apps may want to flush their app data to disk before a backup.

Collaborator

Are they set up only by node operators? I'm wondering who can actually create these hook scripts.

Collaborator Author

@kvinwang kvinwang Jul 16, 2025

> Are they set up only by node operators? I'm wondering who can actually create these hook scripts.

It's up to the app to define the logic in the hooks, for example flushing its MySQL database inside the CVM. The hooks are there for future use. I don't think we need to worry about this until a concrete use case appears.
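
For illustration only, the "run the hook if it exists" behaviour could look roughly like the sketch below. The PR implements this in Rust inside the guest; `run_hook` is a hypothetical helper, and its timeout and failure handling are assumptions rather than what the PR does. Only the path /dstack/hooks/pre-backup is taken from the diff.

```python
import os
import subprocess


def run_hook(path: str, timeout: float = 60.0) -> bool:
    """Run the shell hook at `path` if it exists.

    Returns True when the hook is absent or exits with status 0,
    and False when it exits with a non-zero status.
    """
    if not os.path.isfile(path):
        return True  # no hook installed, so there is nothing to do

    # Run through /bin/sh so the hook file does not need an executable bit.
    result = subprocess.run(
        ["/bin/sh", path],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if the hook hangs
    )
    if result.returncode != 0:
        print(f"hook {path} failed: {result.stderr.strip()}")
        return False
    return True
```

An app that needs consistent on-disk state could put its flush command in /dstack/hooks/pre-backup, and the backup would call `run_hook("/dstack/hooks/pre-backup")` before taking the snapshot.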

Comment on lines +691 to +710
if backup_level == "full" {
    // clear the bitmaps
    let output = Command::new("qmpbackup")
        .arg("--socket")
        .arg(&qmp_socket)
        .arg("cleanup")
        .arg("--remove-bitmap")
        .output()
        .context("Failed to clear bitmaps")?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        warn!("Failed to clear bitmaps for {id}: {stderr}");
    }
    // Switch to new dir and symbol link the latest to it
    let timestamp = chrono::Utc::now().format("%Y%m%dZ%H%M%S").to_string();
    let new_dir = backup_dir.join(&timestamp);
    fs::create_dir_all(&new_dir).context("Failed to create backup directory")?;
    if fs::symlink_metadata(&latest_dir).is_ok() {
        fs::remove_file(&latest_dir)
            .context("Failed to remove latest directory link")?;
    }
    fs::os::unix::fs::symlink(&timestamp, &latest_dir)
        .context("Failed to create latest directory link")?;
}
Collaborator

I suggest adding a checksum for each backup.

Collaborator Author

Good point. This feature would be better added to the qmpbackup command, since it controls the filename generation.

Collaborator

Even better: can we verify the checksum before restoring?

Collaborator Author

> Even better: can we verify the checksum before restoring?

We should certainly add verification to qmprestore if a checksum is added to qmpbackup.
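
Until qmpbackup and qmprestore support this natively, the idea can be sketched outside those tools. The following is a minimal sketch, not part of this PR or of qmpbackup: the manifest name `SHA256SUMS` and the helpers `write_manifest` and `verify_manifest` are hypothetical. It writes one SHA-256 line per file after a backup completes and re-checks them before a restore.

```python
import hashlib
from pathlib import Path
from typing import List

MANIFEST_NAME = "SHA256SUMS"  # hypothetical manifest file name


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path) -> Path:
    """Record a checksum line for every file under backup_dir."""
    manifest = backup_dir / MANIFEST_NAME
    lines = []
    for file in sorted(backup_dir.rglob("*")):
        if file.is_file() and file != manifest:
            rel = file.relative_to(backup_dir).as_posix()
            lines.append(f"{sha256_of(file)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest


def verify_manifest(backup_dir: Path) -> List[str]:
    """Return the relative paths that are missing or whose checksum differs."""
    bad = []
    for line in (backup_dir / MANIFEST_NAME).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. an empty manifest
        expected, name = line.split("  ", 1)
        target = backup_dir / name
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(name)
    return bad
```

A restore wrapper could call `verify_manifest` on the backup directory first and refuse to proceed if the returned list is non-empty.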

@kvinwang kvinwang force-pushed the disk-backup branch 3 times, most recently from 2ab1f2d to f57a83b Compare July 18, 2025 03:19