Troubleshooting


If your computation fails, you should follow this troubleshooting guide to find out the cause of the error:

1. Check the task state

You can check the task state on Tasq, or through the Python SDK or the CLI:

Python

import qarnot

# Authenticate with your API token, then look up the task by its UUID
conn = qarnot.Connection(client_token="<<< PUT YOUR SECRET TOKEN >>>")
task = conn.retrieve_task("<<< PUT YOUR TASK UUID >>>")
print(task.state)

Bash

# Authenticate the CLI with your API token
export QARNOT_CLIENT_TOKEN=<<<MY_SECRET_TOKEN>>>
# Print the task details, including its state
qarnot task info --id <<<TASK_UUID>>>

The task’s unique identifier (UUID) is shown on Tasq; you can also store it when you launch your task.

If the state is "cancelled", your task was aborted. Only you can abort your own tasks, either with the "ABORT" button on Tasq or through the SDKs, so simply relaunch your computation and let it run to completion.

If the state is "error", you will need to dig deeper and follow the next steps.
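The decision described in this step can be sketched in Python. This is a minimal sketch, not part of the Qarnot SDK: `next_step` is a hypothetical helper, and the state names are the two quoted above, compared case-insensitively because the exact casing returned by `task.state` is an assumption here.

```python
def next_step(state):
    """Map a task state to the next troubleshooting action (hypothetical helper)."""
    normalized = str(state).strip().lower()
    if normalized == "cancelled":
        # The task was aborted from Tasq or through an SDK.
        return "relaunch the computation and let it run to completion"
    if normalized == "error":
        # The state alone is not enough: the error code says more.
        return "check the error code (step 2)"
    return "no failure reported by the task state"


if __name__ == "__main__":
    # With the SDK you would pass task.state; a literal keeps the sketch offline.
    print(next_step("error"))
```

Keeping the mapping in a plain function makes it easy to call from a monitoring script without contacting the API during tests.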

2. Check the error code

The error code of your task gives you more insight into the cause of the error. The existing error codes are listed in the Error codes article. You can find the error code of your task:

on Tasq, by clicking on the task and then on the JSON tab;

with the Python SDK or the CLI:

Python

# Reuses the `task` object retrieved in step 1
print(task.errors)

Bash

# Print only the Errors field of the first task in the JSON output
qarnot task info --id <<<TASK_UUID>>> | jq -r '.[0].Errors'

The error message usually makes the next step clear. Here are a few examples:

"Resource download failed: Invalid credentials given for docker registry": check that the constant DOCKER_REPO is properly filled in and, if it is a private repository, that you provided the correct credentials. More information is available in the Off the shelf Docker images section and in the Custom Docker images section. We recommend using Qarnot Containers Registry to store container images.
"You ran out of credits": add credits to your account.
"Internal error": the error occurred inside the computation. If this is your error code, follow the next steps.

3. Check the computation logs

On Tasq, click on your task and go through the STDOUT and STDERR tabs to find the source of the error.

If your logs are not precise enough and you are the developer of the script, add more logging so that you can pinpoint the root cause of the error and fix it.
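If the script is written in Python, one possible way to get more detailed logs is to wrap each stage of the computation so that its inputs, result and any traceback reach STDERR. The names `run_step` and `my_script` are hypothetical; this is a sketch, not a required pattern.

```python
import logging
import sys

logging.basicConfig(
    stream=sys.stderr,
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("my_script")


def run_step(name, func, *args):
    """Run one stage of the computation, logging its inputs, result and failures."""
    log.debug("starting %s with args=%r", name, args)
    try:
        result = func(*args)
    except Exception:
        # log.exception records the full traceback at ERROR level.
        log.exception("step %s failed", name)
        raise
    log.debug("finished %s -> %r", name, result)
    return result
```

For example, `run_step("load", open, "/job/input.txt")` would log the path before opening it and the traceback if the file is missing; the file name is only illustrative.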

If your logs are not precise enough and you are not the developer of the script:

if it is widely used software, search online for similar errors.
otherwise, try to get in touch directly with the developers of the project.

Here are a few common mistakes to watch out for:

your inputs are not in the path you expect. By default, on Qarnot, your input bucket is available in /job.
your network access is not what you expect. For security reasons, some profiles do not have access to the internet. That is the case for docker-batch and many others. Check that the profile you chose has the access you need: more information in the Choosing the right profile guide or the Docker article.
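Both mistakes can be detected early by a short check at the start of the script. The sketch below assumes inputs are expected under /job, as stated above; the helper names, the probe address 1.1.1.1:53 and the timeout are illustrative assumptions, not Qarnot settings.

```python
import os
import socket


def missing_inputs(expected, root="/job"):
    """Return the expected relative paths that do not exist under root."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


def has_internet(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Printing the result of `missing_inputs([...])` and `has_internet()` as the first lines of the computation puts the answer at the top of STDOUT, where it is easy to spot.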

STDOUT and STDERR are fetched on a best-effort basis. If you want to get all the logs of your computation, you must write them to an output file and upload that file: check our example in the Fetching logs section.
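For a Python script, writing the logs to a file as well as to STDERR could look like the sketch below. `setup_logging` and the logger name are hypothetical, and the log path is left as a parameter: where the file must be written so that it gets uploaded is described in the Fetching logs section, not assumed here.

```python
import logging
import sys


def setup_logging(log_path):
    """Send log records both to STDERR and to a file at log_path."""
    logger = logging.getLogger("my_script")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stderr), logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Because the file handler receives every record, the file stays complete even when the STDERR view on Tasq is not.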

4. Step by step tests

Logs usually give a good idea of what the error is, but testing in a simpler environment can complement this information. The best practice is to work in layers of complexity: start with the simplest environment and add layers until the error appears. This gives you important clues about where the error comes from:

Launch your computation on your local machine.
Launch your computation in a container with access to the terminal.
- If your local input files are in an inputs/ folder, it will look like this: docker run -it --rm -v "$(pwd)/inputs":/job <<<MY-DOCKER-REPO>>> /bin/bash
- You can then manually launch your computation from within the container: python3 my_script.py, for instance.
Launch your computation directly in a container, without an interactive terminal: docker run --rm -v "$(pwd)/inputs":/job <<<MY-DOCKER-REPO>>> python3 my_script.py
Launch your computation on Qarnot.
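The two container steps differ only in whether a command is passed, which a small helper can make explicit. This is a sketch that builds the argument lists without running Docker; `docker_command`, the image name and the script name are placeholders. The host path is made absolute because Docker treats a bare name given to -v as a named volume rather than a local folder.

```python
import os
import shlex


def docker_command(image, inputs_dir, command=None):
    """Build the docker run arguments for one layer of the test sequence."""
    # Bind mounts need an absolute host path; a bare name would be a named volume.
    mount = f"{os.path.abspath(inputs_dir)}:/job"
    args = ["docker", "run", "--rm", "-v", mount]
    if command is None:
        # No command given: open an interactive shell inside the container.
        return args[:2] + ["-it"] + args[2:] + [image, "/bin/bash"]
    return args + [image] + shlex.split(command)


if __name__ == "__main__":
    # "my-docker-repo" and "my_script.py" are placeholders.
    print(shlex.join(docker_command("my-docker-repo", "inputs")))
    print(shlex.join(docker_command("my-docker-repo", "inputs", "python3 my_script.py")))
```

The returned lists can be passed to `subprocess.run` once Docker is available on the machine.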

If you still cannot solve your problem, send an email to our support team: support-compute@qarnot-computing.com.

For more information on monitoring and debugging, please consult the following articles:

Core concept Tasks, which describes how tasks behave and their states
Monitoring resources
Monitor your computations with the Web UI
Monitor your computations with a SDK
Fetching logs
Error codes
