Monitoring resources

Compute resources

Every compute node on Qarnot has resources that a task can use. These include, but are not limited to, CPU and RAM.

In many cases, a task should use most, if not all, of the available resources to get the best performance. It can also be useful to monitor RAM usage, since running out of memory can cause your task to crash.

One way to do this is through our Python SDK or our command-line interface.

Monitoring CPU and RAM loads

Our SDK can fetch semi-live updates of CPU and RAM usage from the API. The script below launches a task that sleeps for 2 minutes and, every 10 seconds, prints the task's CPU and RAM usage to your terminal.

Python

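import qarnot
from datetime import datetime

# Connect to the Qarnot API with your secret token
conn = qarnot.Connection(client_token='<<<MY_SECRET_TOKEN>>>')

# Create a task with the docker-batch profile and a single instance
task = conn.create_task('CPU-RAM-monitoring', 'docker-batch', 1)
# The container simply sleeps for 2 minutes
task.constants['DOCKER_CMD'] = 'sleep 120'

task.submit()
last_state = ''
done = False

while not done:
    # Print the task state whenever it changes
    if task.state != last_state:
        last_state = task.state
        print("** {}".format(last_state))

    # While the task runs, print the CPU and memory usage of its first instance
    if task.state == 'FullyExecuting':
        running = task.status.running_instances_info
        # Instance info may not be available yet right after the state change
        if running and running.per_running_instance_info:
            instance_info = running.per_running_instance_info[0]
            cpu = instance_info.cpu_usage
            memory = instance_info.current_memory_mb
            print("\n*******************************\n")
            print('Current Timestamp : ', datetime.now())
            print("Current CPU usage : {:.2f} %".format(cpu))
            print("Current memory usage : {:.2f} MB".format(memory))

    # Wait up to 10 seconds; wait() returns True once the task has finished
    done = task.wait(10)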

Bash

#!/bin/bash

# Note: the following assumes that you have installed the JSON processor [jq](https://jqlang.github.io/jq/manual/#basic-filters)

# =============== Task creation =============== #

# Your info
export QARNOT_CLIENT_TOKEN="<<<MY_SECRET_TOKEN>>>"

# Create and run a task that sleeps for 2 minutes
qarnot task create \
--name "CPU-RAM-monitoring" \
--shortname "1234567890" \
--profile docker-batch \
--instance 1 \
--constants "DOCKER_CMD=sleep 120"

# =============== Task info processing =============== #

###############################################
# Utility function to fetch the task info
###############################################
get_info () {
    qarnot task info --id "1234567890"
}

last_state=""

while true ; do
    # Fetch task info and extract the task state
    # (jq -r prints strings without surrounding quotes)
    info=$(get_info)
    state=$(echo "$info" | jq -r '.[0].State')

    # Print changes of state to stdout
    if [[ "$state" != "$last_state" ]] ; then
        last_state=$state
        echo "$last_state"
    fi

    # Stop once the task is completed
    completed=$(echo "$info" | jq -r '.[0].Completed')
    if [[ "$completed" == "true" ]] ; then
        exit 0
    fi

    # If the task is executing, print the CPU and memory usage of its first instance
    if [[ "$state" == "FullyExecuting" ]] ; then
        instance_info=$(echo "$info" | jq '.[0].Status.RunningInstancesInfo.PerRunningInstanceInfo[0]')
        cpu_usage=$(echo "$instance_info" | jq '.CpuUsage')
        memory_usage=$(echo "$instance_info" | jq '.MemoryUsage')

        echo "*******************************"
        echo "Current Timestamp : $(date)"
        echo "Current CPU usage : ${cpu_usage}"
        echo "Current memory usage : ${memory_usage}"
    fi

    # Wait 10 seconds before refreshing the info
    sleep 10
done

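The script above submits a task whose container sleeps, prints each change in the task's state, and, while the task is in the FullyExecuting state, prints a timestamp along with the CPU and memory usage of its first instance every 10 seconds. It stops once the task has completed.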

You can modify this script to monitor other metrics; the available fields are listed in the SDK documentation.

For example, to print the execution time, add the following lines to the monitoring script, inside the block that handles the FullyExecuting state, right after instance_info is assigned.

Python

execution_time = instance_info.execution_time_sec
print("Current execution time : {:.2f} s".format(execution_time))

Bash

# Add inside the FullyExecuting block, right after instance_info is set.
# Assumption: the CLI exposes this field as ExecutionTimeSec, following the naming of CpuUsage.
execution_time=$(echo "$instance_info" | jq '.ExecutionTimeSec')
echo "Current execution time : ${execution_time} s"
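The monitoring loops on this page print raw readings. To act on the two concerns raised at the start (keeping the CPU busy and catching RAM usage before it overflows), you could collect those readings and summarize them. The sketch below is a hypothetical helper, not part of the Qarnot SDK: the Sample class, the summarize_samples function, and the 90% memory threshold are all assumptions. The available memory is passed in by hand rather than read from the SDK; check the SDK documentation for the field that reports it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One reading taken from the monitoring loop (hypothetical structure)."""
    cpu_percent: float  # e.g. the value of instance_info.cpu_usage
    memory_mb: float    # e.g. the value of instance_info.current_memory_mb


def summarize_samples(samples: List[Sample], max_memory_mb: float,
                      memory_threshold: float = 0.9) -> dict:
    """Return average CPU usage, peak memory, and a flag if peak memory crossed the threshold.

    max_memory_mb is the memory available to the instance; here it is supplied
    by the caller (assumption), not fetched from the Qarnot API.
    """
    if not samples:
        raise ValueError("no samples collected")
    avg_cpu = sum(s.cpu_percent for s in samples) / len(samples)
    peak_memory = max(s.memory_mb for s in samples)
    peak_ratio = peak_memory / max_memory_mb
    return {
        "avg_cpu_percent": avg_cpu,
        "peak_memory_mb": peak_memory,
        "peak_memory_ratio": peak_ratio,
        "memory_warning": peak_ratio >= memory_threshold,
    }


if __name__ == "__main__":
    # Made-up readings for illustration only, not real Qarnot output
    readings = [Sample(95.0, 1200.0), Sample(97.5, 3500.0), Sample(92.5, 3900.0)]
    report = summarize_samples(readings, max_memory_mb=4096.0)
    print("Average CPU usage : {:.2f} %".format(report["avg_cpu_percent"]))
    print("Peak memory usage : {:.2f} MB ({:.0%} of available)".format(
        report["peak_memory_mb"], report["peak_memory_ratio"]))
    if report["memory_warning"]:
        print("Warning: memory usage is close to the available RAM")
```

In the Python monitoring loop, you would create an empty list before the loop, append Sample(cpu, memory) each time a reading is printed, and call summarize_samples once the task has finished. A low average CPU usage suggests the task is not using the node fully, while a memory warning suggests it is at risk of running out of RAM.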