Data Storage

This Core Concept page gives an overview of the key topics you need to understand storage. Start here for foundational knowledge, then explore our guides and tutorials for more in-depth learning.

Storage is a key element of high-performance computing: to run a simulation or any other algorithmic computation, you need somewhere to store its input files (data, code, configuration files, and so on).

To simplify storage management, we have abstracted several underlying technologies while maintaining high performance. This article explains which component is used, when, and for what purpose.

Fundamental Principles

Transfer from/to your task execution environment

Your data is stored in one or more buckets in one of our data centers. When you launch a task, this data is transferred to an on-site storage system and then mounted into your task execution environment through a Network File System (NFS). All files are available under /job unless you specify otherwise in your task configuration.

While your files are being transferred, your task has the status PartiallyDispatched or FullyDispatched.

When your task completes, the files in /job that were created or modified are transferred to your destination bucket; during this step, the task status is UploadingResults.
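If you monitor a task from a script, the three statuses above tell you which direction data is moving. The following minimal Python sketch maps only those status names to the transfer step they indicate. How you retrieve the status (API, SDK, console) is outside its scope, and every other status is simply treated as "no transfer in progress".

```python
from typing import Optional

# Only the statuses named on this page; any other status means no
# storage transfer is reported by this helper.
STORAGE_PHASES = {
    "PartiallyDispatched": "inputs: bucket -> NFS (/job)",
    "FullyDispatched": "inputs: bucket -> NFS (/job)",
    "UploadingResults": "results: /job -> destination bucket",
}

def storage_phase(status: str) -> Optional[str]:
    """Return the data-transfer step implied by a task status, or None."""
    return STORAGE_PHASES.get(status)

def is_transferring(status: str) -> bool:
    """True while files are moving between a bucket and /job."""
    return status in STORAGE_PHASES
```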

Billing

Every Qarnot account includes 10 GiB of bucket storage, which means you can keep up to 10 GiB of data in your buckets. Please contact us if you need more storage capacity.
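Before uploading, you may want to check locally whether a dataset fits in the included 10 GiB. A GiB is 1024^3 bytes, so the quota is 10 × 1024^3 = 10,737,418,240 bytes. The Python sketch below sums file sizes under a directory and compares the total with that figure. The `already_used` parameter is a hypothetical input you would fill in yourself, and the platform's own accounting remains authoritative; this is only a local estimate.

```python
import os

# 1 GiB = 1024**3 bytes, so the included quota is 10,737,418,240 bytes.
INCLUDED_QUOTA_BYTES = 10 * 1024**3

def directory_size(path: str) -> int:
    """Total size in bytes of the regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def fits_in_quota(path: str, already_used: int = 0) -> bool:
    """True if adding `path` keeps total bucket usage within the 10 GiB quota."""
    return already_used + directory_size(path) <= INCLUDED_QUOTA_BYTES
```

For example, `fits_in_quota("./inputs")` estimates whether a fresh account can hold the contents of a local `inputs` folder.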

Ephemeral storage for data generated during computation (CFD solving, data processing, and so on) that is not saved to your buckets is included in the computing price.

Data transfer (ingress, egress) is always included.

Core Components

Buckets

Data that is not being processed is stored in S3-compatible object storage, so you can use any S3-compatible tool to interact with this cold data. Because this data is either waiting to be processed on Qarnot or consists of computation results, it is not meant to be stored on buckets indefinitely. As a consequence, we guarantee neither redundancy nor high availability. For more information, please read our bucket documentation.
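Because buckets are S3-compatible, a generic S3 client such as boto3 can upload your inputs. The sketch below assumes boto3 is installed. The endpoint URL is a placeholder (take the real one, and your credentials, from the bucket documentation), and the idea that object keys reappear as relative paths under /job is an assumption rather than something this page states.

```python
import os
from typing import Iterator, Tuple

# Placeholder: replace with the storage endpoint from the bucket documentation.
S3_ENDPOINT = "https://storage.example.com"

def iter_uploads(local_dir: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (local_path, object_key) pairs for every file under local_dir.

    Keys are forward-slash paths relative to local_dir, so the bucket mirrors
    the local directory layout (assumed to be the layout seen under /job).
    """
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            yield path, prefix + rel

def upload_directory(local_dir: str, bucket: str, access_key: str, secret_key: str) -> None:
    """Upload a local directory to a bucket through the S3 API using boto3."""
    import boto3  # imported here so iter_uploads works without boto3 installed

    s3 = boto3.client(
        "s3",
        endpoint_url=S3_ENDPOINT,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    for path, key in iter_uploads(local_dir):
        s3.upload_file(path, bucket, key)  # Filename, Bucket, Key
```

Any other S3-compatible tool works the same way: point it at the storage endpoint instead of the AWS default.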

NFS

Once your data has been transferred for processing, it is stored on a local NFS server and mounted into your task at /job.

Temporary data is stored on the same server: under /cache for data local to an instance, or under /share for data shared between task instances (when Lustre is not configured).
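A task script can keep these locations straight with a small helper. This Python sketch only encodes the default paths described on this page (your task configuration may change them). The kind names `results`, `scratch` and `shared` are hypothetical labels, and the `root` parameter exists solely so the function can be tested outside a Qarnot task.

```python
from pathlib import Path

# Default mount points described on this page. Only /job is synchronized
# back to the destination bucket when the task completes.
MOUNTS = {
    "results": "/job",    # inputs and outputs; new or modified files are uploaded
    "scratch": "/cache",  # temporary data local to this instance
    "shared": "/share",   # temporary data shared between instances (no Lustre)
}

def storage_dir(kind: str, root: str = "/") -> Path:
    """Return the directory for a kind of data, rooted at `root`."""
    try:
        mount = MOUNTS[kind]
    except KeyError:
        raise ValueError(f"unknown storage kind: {kind!r}") from None
    return Path(root) / mount.lstrip("/")
```

Writing intermediate files under `storage_dir("scratch")` rather than /job avoids uploading them at the end of the task.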

Lustre

When running in a cluster, several nodes need to read and write files simultaneously. Under this kind of concurrent access, NFS is prone to corruption and cannot guarantee consistency while delivering high performance. For this reason, we added a Lustre file system to our cluster nodes. Lustre supports parallel file access, allowing multiple nodes to read from and write to the same file simultaneously.

The Lustre scratchpad is used for temporary data storage during computations. It is not backed up, meaning that data stored there is at risk of being lost if any component fails.
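A common way to use parallel file access is to let each instance write its own, non-overlapping byte range of one shared file. The Python sketch below (Unix only, since it relies on `os.pwrite`) shows that pattern. The file path, the fixed `BLOCK_SIZE`, and how each instance obtains its rank are all assumptions, because this page does not give the Lustre mount point or an instance-numbering scheme.

```python
import os

BLOCK_SIZE = 4096  # bytes owned by each instance (illustrative value)

def write_block(path: str, rank: int, payload: bytes) -> None:
    """Write this instance's block at offset rank * BLOCK_SIZE in a shared file.

    Each rank owns a disjoint byte range, so concurrent writers never
    overwrite each other's data. The file is created if missing, never truncated.
    """
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than BLOCK_SIZE")
    data = payload.ljust(BLOCK_SIZE, b"\0")  # pad to a full block
    offset = rank * BLOCK_SIZE
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        while data:  # pwrite may write fewer bytes than requested
            written = os.pwrite(fd, data, offset)
            data = data[written:]
            offset += written
    finally:
        os.close(fd)

def read_block(path: str, rank: int) -> bytes:
    """Read back one rank's block, dropping the trailing zero padding."""
    with open(path, "rb") as f:
        f.seek(rank * BLOCK_SIZE)
        return f.read(BLOCK_SIZE).rstrip(b"\0")
```

The code itself runs on any POSIX file system; what Lustre adds is consistent concurrent access when the instances run on different nodes. Since the scratchpad is not backed up, a reasonable final step is to copy the assembled file into /job, whose new or modified files are uploaded to your destination bucket when the task completes.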

For more information please read our Lustre documentation.

Whitelisting / Blacklisting

To control precisely which files are used for computation, included in snapshots, or downloaded at the end of your computation, Qarnot lets you whitelist (include) or blacklist (exclude) files or folders for the upload, snapshot, and download operations.

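Files are selected (or excluded) using a regular expression (regex). For example, to select all image files you can use the following regex:

[^\s]+(.*?)\.(jpg|jpeg|png|gif|JPG|JPEG|PNG|GIF)$

If you write this expression inside a JSON string or a non-raw string literal, each backslash must be doubled (\\s, \\.).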
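To preview which files a pattern would keep, you can apply it locally before launching a task. This Python sketch writes the example image expression as a raw string (r"..."), so each backslash appears once. It assumes search-style matching against relative paths and that a blacklist is applied after a whitelist; these are assumptions, so check the whitelist/blacklist documentation for the platform's exact semantics.

```python
import re
from typing import Iterable, List, Optional

# Example whitelist from this page: files ending in a common image extension.
IMAGE_REGEX = r"[^\s]+(.*?)\.(jpg|jpeg|png|gif|JPG|JPEG|PNG|GIF)$"

def select_files(paths: Iterable[str],
                 whitelist: Optional[str] = None,
                 blacklist: Optional[str] = None) -> List[str]:
    """Keep paths matching `whitelist` (if set) and not matching `blacklist` (if set)."""
    kept = []
    for path in paths:
        if whitelist is not None and not re.search(whitelist, path):
            continue  # not authorized by the whitelist
        if blacklist is not None and re.search(blacklist, path):
            continue  # explicitly excluded
        kept.append(path)
    return kept
```

For instance, combining `IMAGE_REGEX` as a whitelist with `r"^tmp/"` as a blacklist keeps every image except those in a top-level `tmp` folder.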

For more information please read our whitelist/blacklist documentation.

Related Articles

For more information on how to use our platform, please take a look at our guides, tutorials, and documentation articles: