This page provides the Qarnot documentation for the HPC application.

HPC Snapshots

Simulation snapshots are a crucial feature that enables you to capture and preserve the state of a simulation during its execution. This feature is particularly useful for monitoring, debugging, and recovering simulation data, even when the execution is interrupted or encounters errors.

What are snapshots?

A snapshot is a point-in-time capture of your simulation's resources, including its files and results. When you trigger a snapshot, Qarnot creates a copy of all output resources at that specific moment, making them available for retrieval regardless of the task's final status.

Types of snapshots

Qarnot offers two primary methods for creating task snapshots:

Periodic snapshots

Periodic snapshots enable automatic, scheduled captures of a task's state at regular intervals. By configuring periodic snapshots via the HPC simulation submit form, you can ensure that your task's progress is continuously backed up without manual intervention. This provides a safety net against potential data loss and enables you to track the evolution of your case over time.

Using our SDK or our API, periodic snapshots can also be set up after the simulation’s launch, during its execution.

Manual snapshots

Manual snapshots allow you to capture the state of a simulation on demand. You can trigger a manual snapshot at any point during a run’s execution by making an API call to the snapshot endpoint. This is particularly useful when you want to check intermediate results or preserve data at critical points in your workload's execution.

Snapshot behavior

When launching a simulation on Qarnot, your environment will be built with several directories. The /job directory is crucial as it links your resources with your cloud folder. Everything you attach to a simulation during setup will be placed in /job.

Important: Snapshots specifically target the content of the /job directory. This has different implications depending on your workflow:

Interactive workflow: When working interactively, make sure to place everything you want to retrieve in the /job directory before performing a snapshot. Any files stored outside this directory will not be included in the snapshot.
Batch mode: In batch mode, results are placed in the /job directory by default, so no additional precautions are necessary. Your results will be automatically included in snapshots.
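
For the interactive case, the staging step can be scripted. The following is a minimal sketch, not part of the Qarnot tooling: the helper name stage_for_snapshot is hypothetical, and it only assumes that anything copied under /job is picked up by the next snapshot, as described above.

```python
import shutil
from pathlib import Path


def stage_for_snapshot(paths, job_dir="/job"):
    """Copy files or directories into job_dir so a snapshot can capture them.

    Hypothetical helper: job_dir defaults to the /job directory described above.
    """
    job = Path(job_dir)
    job.mkdir(parents=True, exist_ok=True)
    staged = []
    for source in map(Path, paths):
        target = job / source.name
        if source.is_dir():
            # dirs_exist_ok lets repeated staging refresh an existing copy
            shutil.copytree(source, target, dirs_exist_ok=True)
        else:
            shutil.copy2(source, target)
        staged.append(target)
    return staged
```

Calling stage_for_snapshot(["/tmp/run.log", "/scratch/case"]) before triggering a snapshot would place copies of both under /job. The paths in that call are illustrative only.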

Snapshot parameters

You can fine-tune your snapshots by configuring specific parameters that control which files are included or excluded. These parameters allow for precise control over the snapshot content:

File filtering parameters

Whitelist: Use this parameter with regular expressions to specify which files to include in the snapshot. Only files matching the regular expression will be copied. For example, you can use this to capture only files with specific extensions like .log or .txt.
Blacklist: Use this parameter with regular expressions to exclude specific files from the snapshot. Files matching the regular expression will be skipped during the snapshot process. This is particularly useful when you want to filter out large files that might not be necessary for your debugging or monitoring purposes.
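
To preview what a pair of filters would keep, you can test the regular expressions locally on a list of file names. This sketch uses Python's re module and makes two assumptions that are not confirmed on this page: that a pattern matches anywhere in the path (re.search semantics), and that the blacklist is applied after the whitelist when both are set. The function select_files is a hypothetical helper.

```python
import re


def select_files(paths, whitelist=None, blacklist=None):
    """Return the paths kept by optional whitelist/blacklist regexes.

    Assumption: patterns may match anywhere in the path (re.search).
    """
    keep = re.compile(whitelist) if whitelist else None
    drop = re.compile(blacklist) if blacklist else None
    selected = []
    for path in paths:
        if keep and not keep.search(path):
            continue  # not matched by the whitelist
        if drop and drop.search(path):
            continue  # matched by the blacklist
        selected.append(path)
    return selected


files = ["log.simpleFoam", "processor0/U", "processor12/p", "system/controlDict"]
print(select_files(files, whitelist=r"log\..*"))       # ['log.simpleFoam']
print(select_files(files, blacklist=r"processor\d+"))  # ['log.simpleFoam', 'system/controlDict']
```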

Scheduling parameters

Interval: This parameter is specific to periodic snapshots and allows you to set the frequency, in seconds, at which snapshots are automatically created. For example, setting an interval of 3600 would create a new snapshot every hour.
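
When choosing an interval, it helps to estimate how many snapshots a run will trigger and how much data they copy in total. The sketch below is plain arithmetic under two simplifying assumptions: the first snapshot fires one full interval after the start, and every snapshot copies the same volume. The function name snapshot_budget is hypothetical.

```python
def snapshot_budget(run_seconds, interval_seconds, snapshot_bytes):
    """Estimate the number of periodic snapshots in a run and the bytes copied.

    Assumes the first snapshot fires after one full interval and that each
    snapshot copies snapshot_bytes.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    count = run_seconds // interval_seconds
    return count, count * snapshot_bytes


# A 24-hour run, one snapshot per hour, 500 MB per snapshot
count, total = snapshot_budget(24 * 3600, 3600, 500 * 10**6)
print(count, total / 10**9)  # 24 12.0
```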

Benefits of using snapshots

Data recovery: Retrieve intermediate results even if a simulation fails to complete
Progress monitoring: Check the evolution of your run’s outputs during execution
Debugging: Analyze intermediate states to identify issues in long-running cases
Risk mitigation: Protect against potential data loss due to infrastructure failures

Using snapshots

Triggering a manual snapshot

With the API

To create a manual snapshot of a running case, send a POST request to the following endpoint:

POST /v{version}/tasks/{taskUuid}/snapshot

This will immediately capture the current state of all output resources associated with the simulation.

For more information, please consult our API documentation.
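
As an illustration, the request can be built with Python's standard library. This is a sketch under stated assumptions, not a reference client: the base URL, the version value "1", and the use of a bare token in the Authorization header are all placeholders to check against the API documentation. Only the path /v{version}/tasks/{taskUuid}/snapshot comes from this page.

```python
import urllib.request


def build_snapshot_request(base_url, version, task_uuid, token):
    """Build (without sending) the POST request that triggers a manual snapshot."""
    url = f"{base_url.rstrip('/')}/v{version}/tasks/{task_uuid}/snapshot"
    return urllib.request.Request(
        url,
        data=b"",       # empty body
        method="POST",
        headers={"Authorization": token},  # assumed authentication header
    )


# Placeholder values: replace with your API host, version, task UUID and token
req = build_snapshot_request("https://api.example.com", "1", "TASK-UUID", "your_auth_token")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```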

With the Python SDK

Setting up manual snapshots

To trigger a manual snapshot for a running task using the Python SDK:

import qarnot

# Connect to Qarnot
conn = qarnot.Connection(client_token="your_auth_token")

# Retrieve an existing task
task = conn.retrieve_task("TASK-UUID")

# Trigger a manual snapshot
task.instant()

Configuring periodic snapshots

With the API

To set up automatic snapshots at regular intervals, use the periodic snapshot endpoint:

POST /v{version}/tasks/{taskUuid}/snapshot/periodic

You can specify the frequency of snapshots using the request body parameters, allowing you to tailor the snapshot schedule to your specific needs.

For more information, please consult our API documentation.
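
The request body can be assembled as JSON before sending it to the periodic endpoint. In this sketch the field names interval, whitelist and blacklist are assumptions that mirror the parameter names used on this page; confirm the exact schema in the API documentation. The function periodic_snapshot_body is a hypothetical helper.

```python
import json


def periodic_snapshot_body(interval, whitelist=None, blacklist=None):
    """Serialize a request body for the periodic snapshot endpoint.

    Field names are assumptions; check them against the API documentation.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive number of seconds")
    body = {"interval": interval}
    if whitelist is not None:
        body["whitelist"] = whitelist
    if blacklist is not None:
        body["blacklist"] = blacklist
    return json.dumps(body)


# Hourly snapshots that skip processor directories
print(periodic_snapshot_body(3600, blacklist=r"processor\d+"))
```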

With the Python SDK

The Qarnot Python SDK allows you to integrate snapshot functionality directly into your Python workflows. To configure automatic periodic snapshots for a task:

import qarnot

# Connect to Qarnot
conn = qarnot.Connection(client_token="your_auth_token")

# Retrieve an existing task
task = conn.retrieve_task("your-task-uuid")

# Configure periodic snapshots (every hour)
task.snapshot(3600)

# Optional: blacklist processor directories with a regex
task.snapshot_blacklist = r"processor\d+"  # Set snapshots blacklist
task.results_blacklist = r"processor\d+"  # Set results blacklist

# Optional: whitelist only log.* files with a regex
task.snapshot_whitelist = r"log\..*"  # Set snapshots whitelist
task.results_whitelist = r"log\..*"  # Set results whitelist

The snapshot() method accepts one parameter, interval: the time in seconds between two consecutive snapshots.

On our web platform

On the last page of the simulation submission form, the snapshots section lets you set the parameters described above for your periodic snapshots.

Retrieving snapshot data

Snapshots are stored alongside the simulation's regular results and can be accessed using the standard result retrieval mechanisms. Each snapshot creates a timestamped directory within your task's results, making it easy to identify and access specific point-in-time captures.

Best practices

Consider storage implications when determining snapshot frequency and filters

⚠️ We strongly advise limiting each snapshot to a maximum of 1 GB and keeping the snapshot frequency as low as your use case allows. Snapshots that are too frequent, or that copy a large number or volume of files, can degrade the performance of your computation.

Configure periodic snapshots for long-running or critical cases
Use manual snapshots at key milestones in your workflow
Label or document snapshot timestamps for easier reference
Include snapshot retrieval in your error handling workflows
For interactive workflows, ensure all important data is in the /job directory before triggering snapshots
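
To check the size advice above before triggering a snapshot, you can measure the directory that will be captured. This is a minimal sketch using only the standard library. It reads the 1 GB limit as 10**9 bytes, which is an assumption, and it ignores any whitelist or blacklist you may have configured. The function names are hypothetical.

```python
import os

MAX_SNAPSHOT_BYTES = 10**9  # 1 GB, read as decimal gigabytes (assumption)


def directory_size(root):
    """Sum the sizes of all regular files under root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked data is not counted
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def within_snapshot_limit(root, limit=MAX_SNAPSHOT_BYTES):
    """Return True when the content of root fits the advised snapshot size."""
    return directory_size(root) <= limit
```

For example, within_snapshot_limit("/job") returns False when /job holds more than the advised volume, which suggests adding a blacklist or lowering the snapshot frequency.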