HPC Snapshots
Simulation snapshots are a crucial feature that enables you to capture and preserve the state of a simulation during its execution. This feature is particularly useful for monitoring, debugging, and recovering simulation data, even when the execution is interrupted or encounters errors.
What are snapshots?
A snapshot is a point-in-time capture of your simulation's resources, including its files and results. When you trigger a snapshot, Qarnot creates a copy of all output resources at that specific moment, making them available for retrieval regardless of the task's final status.
Types of snapshots
Qarnot offers two primary methods for creating task snapshots:
Periodic snapshots
Periodic snapshots enable automatic, scheduled captures of a task's state at regular intervals. By configuring periodic snapshots via the HPC simulation submit form, you can ensure that your task's progress is continuously backed up without manual intervention. This provides a safety net against potential data loss and enables you to track the evolution of your case over time.
You can also set up periodic snapshots through our SDK or our API after the simulation has been launched, while it is running.
Manual snapshots
Manual snapshots allow you to capture the state of a simulation on demand. You can trigger a manual snapshot at any point during a run’s execution by making an API call to the snapshot endpoint. This is particularly useful when you want to check intermediate results or preserve data at critical points in your workload's execution.
Snapshots' behavior
When launching a simulation on Qarnot, your environment will be built with several directories. The /job directory is crucial as it links your resources with your cloud folder. Everything you attach to a simulation during setup will be placed in /job.
Important: Snapshots specifically target the content of the /job directory. This has different implications depending on your workflow:
Interactive workflow: When working interactively, make sure to place everything you want to retrieve in the /job directory before performing a snapshot. Any files stored outside this directory will not be included in the snapshot.
Batch mode: In batch mode, results are placed in the /job directory by default, so no additional precautions are necessary. Your results will be automatically included in snapshots.
Snapshots' parameters
You can fine-tune your snapshots by configuring specific parameters that control which files are included or excluded. These parameters allow for precise control over the snapshot content:
File filtering parameters
Whitelist: Use this parameter with regular expressions to specify which files to include in the snapshot. Only files matching the regular expression will be copied. For example, you can use this to capture only files with specific extensions like .log or .txt.
Blacklist: Use this parameter with regular expressions to exclude specific files from the snapshot. Files matching the regular expression will be skipped during the snapshot process. This is particularly useful when you want to filter out large files that might not be necessary for your debugging or monitoring purposes.
Scheduling parameters
Interval: This parameter is specific to periodic snapshots and allows you to set the frequency, in seconds, at which snapshots are automatically created. For example, setting an interval of 3600 would create a new snapshot every hour.
Benefits of using snapshots
Data recovery: Retrieve intermediate results even if a simulation fails to complete
Progress monitoring: Check the evolution of your run’s outputs during execution
Debugging: Analyze intermediate states to identify issues in long-running cases
Risk mitigation: Protect against potential data loss due to infrastructure failures
Using snapshots
Triggering a manual snapshot
With the API
To create a manual snapshot of a running case, send a POST request to the following endpoint:
POST /v{version}/tasks/{taskUuid}/snapshot
This will immediately capture the current state of all output resources associated with the simulation.
For more information, please consult our API documentation.
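As a rough sketch of the call described above, the request can be prepared with Python's standard library. Only the endpoint path comes from this page; the base URL, the version string, and the Authorization header format are assumptions made for illustration and should be checked against the API documentation. The helper name build_snapshot_request is hypothetical.

```python
import urllib.request

# Assumed values, not confirmed by this page: check the API documentation.
API_BASE_URL = "https://api.qarnot.com"
API_VERSION = "1"


def build_snapshot_request(task_uuid: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that triggers a manual snapshot."""
    url = f"{API_BASE_URL}/v{API_VERSION}/tasks/{task_uuid}/snapshot"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the method is forced to POST below
        method="POST",
        headers={"Authorization": token},  # header format is an assumption
    )


request = build_snapshot_request("TASK-UUID", "your_auth_token")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before touching a running task.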
With the Python SDK
Setting up manual snapshots
To trigger a manual snapshot for a running task using the Python SDK:
import qarnot
# Connect to Qarnot
conn = qarnot.Connection(client_token="your_auth_token")
# Retrieve an existing task
task = conn.retrieve_task("TASK-UUID")
# Trigger a manual snapshot
task.instant()
Configuring periodic snapshots
With the API
To set up automatic snapshots at regular intervals, use the periodic snapshot endpoint:
POST /v{version}/tasks/{taskUuid}/snapshot/periodic
You can specify the frequency of snapshots using the request body parameters, allowing you to tailor the snapshot schedule to your specific needs.
For more information, please consult our API documentation.
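The periodic endpoint takes its settings in the request body. The sketch below shows one way such a request might be prepared in Python. The JSON field name "interval", the base URL, the version string, and the Authorization header format are all assumptions, since this page does not list them; the helper name is hypothetical as well.

```python
import json
import urllib.request


def build_periodic_snapshot_request(base_url, version, task_uuid, token, interval_seconds):
    """Prepare (but do not send) the POST request that enables periodic snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    url = f"{base_url}/v{version}/tasks/{task_uuid}/snapshot/periodic"
    # "interval" is an assumed field name; check the API documentation.
    body = json.dumps({"interval": interval_seconds}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": token,  # header format is an assumption
            "Content-Type": "application/json",
        },
    )


# Example values only: one snapshot per hour for a placeholder task.
request = build_periodic_snapshot_request(
    "https://api.qarnot.com", "1", "TASK-UUID", "your_auth_token", 3600
)
print(request.full_url)
print(request.data.decode("utf-8"))
```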
With the Python SDK
The Qarnot Python SDK allows you to integrate snapshot functionality directly into your Python workflows. To configure automatic periodic snapshots for a task:
import qarnot
# Connect to Qarnot
conn = qarnot.Connection(client_token="your_auth_token")
# Retrieve an existing task
task = conn.retrieve_task("your-task-uuid")
# Configure periodic snapshots every hour (3600 seconds); the filtering options that follow are optional
task.snapshot(3600)
# Optional: exclude processor directories with a regex
task.snapshot_blacklist = r"processor\d+"  # Set snapshots blacklist
task.results_blacklist = r"processor\d+"  # Set results blacklist
# Optional: include only log.* files with a regex
task.snapshot_whitelist = r"log\..*"  # Set snapshots whitelist
task.results_whitelist = r"log\..*"  # Set results whitelist
The snapshot() method accepts the interval parameter: the time in seconds between each snapshot.
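Before applying filters to a real task, it can help to preview locally which files a pair of patterns would keep. The following is a minimal sketch, not the platform's actual matching logic: it assumes patterns are applied with re.search against each relative path and that the blacklist is applied after the whitelist, neither of which is confirmed by this page. The file names are hypothetical.

```python
import re


def preview_snapshot_files(paths, whitelist=None, blacklist=None):
    """Return the paths a whitelist/blacklist pair would keep, under the stated assumptions."""
    kept = []
    for path in paths:
        # Assumed rule: with a whitelist, only matching paths are candidates.
        if whitelist is not None and not re.search(whitelist, path):
            continue
        # Assumed rule: any path matching the blacklist is dropped.
        if blacklist is not None and re.search(blacklist, path):
            continue
        kept.append(path)
    return kept


# Hypothetical contents of a /job directory.
files = ["log.simpleFoam", "processor0/U", "processor12/p", "system/controlDict"]

print(preview_snapshot_files(files, blacklist=r"processor\d+"))
# ['log.simpleFoam', 'system/controlDict']
print(preview_snapshot_files(files, whitelist=r"log\..*"))
# ['log.simpleFoam']
```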
On our web platform
On the last page of the simulation submission form, the snapshots section lets you set up the parameters previously discussed for your periodic snapshots.
Retrieving snapshot data
Snapshots are stored alongside the simulation’s regular results and can be accessed using the standard result retrieval mechanisms. Each snapshot creates a timestamped directory within your task's results, making it easy to identify and access specific point-in-time captures.
Best practices
Consider storage implications when determining snapshot frequency and filters
⚠️ We strongly advise limiting your snapshots to a maximum of 1 GB and keeping the frequency as low as possible for your use case. Snapshots that are too frequent, or that copy a large number or volume of files, can degrade the performance of your computation.
Configure periodic snapshots for long-running or critical cases
Use manual snapshots at key milestones in your workflow
Label or document snapshot timestamps for easier reference
Include snapshot retrieval in your error handling workflows
For interactive workflows, ensure all important data is in the /job directory before triggering snapshots
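To make the storage advice above concrete, the number of periodic snapshots and their cumulative volume can be estimated before choosing an interval. This is a back-of-the-envelope sketch: it assumes every snapshot stores a full copy of the filtered files and that the first snapshot is taken one interval after the start, which may not match the platform's exact behavior. The function name and the example figures are hypothetical.

```python
def estimate_snapshot_volume(run_seconds, interval_seconds, snapshot_size_gb):
    """Estimate how many periodic snapshots a run produces and their total size in GB."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Assumption: one snapshot per completed interval, each a full copy.
    count = run_seconds // interval_seconds
    return count, count * snapshot_size_gb


# Example: a 24-hour run, one snapshot per hour, 1 GB per snapshot.
count, total_gb = estimate_snapshot_volume(24 * 3600, 3600, 1.0)
print(f"{count} snapshots, about {total_gb:.0f} GB in total")
```

Doubling the interval halves the estimated volume, which is often a simpler lever than tightening the filters.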