Troubleshooting

If your computation fails, follow this troubleshooting guide to find the cause of the error:

1. Check the task state

You can access the task state on Tasq, or through the Python SDK or the CLI with:

Python

import qarnot
# Authenticate with your API token, then fetch the task by its UUID
conn = qarnot.Connection(client_token="<<< PUT YOUR SECRET TOKEN >>>")
task = conn.retrieve_task("<<< PUT YOUR TASK UUID >>>")
print(task.state)

Bash

export QARNOT_CLIENT_TOKEN="<<<MY_SECRET_TOKEN>>>"
qarnot task info --id "<<<TASK_UUID>>>"

The task's unique identifier (UUID) is displayed on Tasq. You can also store it when you launch your task.

If the state is "cancelled", your task was aborted. Only you can abort your own tasks, either with the "ABORT" button on Tasq or through the SDKs, so re-launch your computation and let it run without aborting it.

If the state is "error", you will need to investigate further by following the next steps.
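The decision described above can be sketched as a small helper. This is only an illustration, not part of the Qarnot SDK: `next_step` is a hypothetical function, and the state names are the ones used in this guide, compared case-insensitively on the assumption that the exact strings returned by `task.state` may differ.

```python
def next_step(state):
    """Map a task state to the action suggested in this guide.

    Hypothetical helper, not part of the Qarnot SDK.
    """
    # Normalize so that "Cancelled", "cancelled" and " CANCELLED " all match.
    normalized = str(state).strip().lower()
    if normalized == "cancelled":
        return "Task was aborted: re-launch it without aborting."
    if normalized == "error":
        return "Task failed: check the error code, then the logs."
    return "No failure reported by the task state."


print(next_step("cancelled"))
```

With a task retrieved as shown in step 1, the call would be `next_step(task.state)`.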

2. Check the error code

The error code of your task gives you more insight into the cause of the error. You can find a list of the existing error codes. To retrieve the error code of your task, use:

Python

# "task" is the object retrieved with conn.retrieve_task() in step 1
print(task.errors)

Bash

qarnot task info --id "<<<TASK_UUID>>>" | jq -r '.[0].Errors'
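If you would rather post-process the CLI output in Python than with jq, the same extraction can be sketched as follows. The only things taken from the command above are the shape of the output (a JSON list of tasks) and the `Errors` key. The function name `extract_errors` and every field inside the sample payload are hypothetical.

```python
import json


def extract_errors(cli_output):
    """Return the "Errors" field of the first task, similar to jq '.[0].Errors'.

    Unlike jq, this returns an empty list instead of null when there is
    no task or no error.
    """
    tasks = json.loads(cli_output)
    if not tasks:
        return []
    return tasks[0].get("Errors") or []


# Hypothetical payload: the inner fields are made up for illustration.
sample = (
    '[{"Uuid": "hypothetical-uuid", '
    '"Errors": [{"Code": "EXAMPLE", "Message": "example message"}]}]'
)
for error in extract_errors(sample):
    print(error)
```

In practice, `cli_output` would be the text printed by `qarnot task info`, captured for instance with `subprocess.run(..., capture_output=True, text=True)`.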

Depending on the error message, the next steps should be quite clear. Here are a few examples:

3. Check the computation logs

On Tasq, click on your task and go through the STDOUT and STDERR tabs to find the source of the error.

If your logs are not precise enough and you are the developer of the script, add more logging so that you can pinpoint the root cause of the error and fix it.

If your logs are not precise enough and you are not the developer of the script:

Here are a few basic errors to be careful with:

The STDOUT and STDERR are fetched on a best-effort basis. If you want to get all the logs of your computation, you must write them to an output file and upload that file: check our example in the Fetching logs section.
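If your script is written in Python, one way to keep a complete copy of the logs is to send every record both to STDOUT and to a file. The sketch below uses only the standard `logging` module. The function name `setup_logging`, the logger name, and the log file path are assumptions. Where the file must be written so that it is uploaded with your results depends on your task configuration and is not shown here.

```python
import logging
import sys


def setup_logging(log_path):
    """Send log records both to STDOUT and to a file that can be uploaded."""
    logger = logging.getLogger("computation")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # verbose on purpose, to ease debugging
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A script would then call, for example, `logger = setup_logging("computation.log")` once at startup and use `logger.info(...)` or `logger.exception(...)` around each step.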

4. Step by step tests

Logs usually give a good idea of what the error is, but testing in a simpler environment can complement this information. The best practice in this case is to remove layers of complexity: start with the simplest environment and add layers until the error appears. The layer at which it appears gives you important clues about where the error comes from.
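The layer-by-layer approach can be sketched as a loop that runs checks from the simplest to the most complex and stops at the first failure. Everything here is illustrative: `first_failing_layer`, the layer names, and the three check functions are hypothetical placeholders for your own tests.

```python
def first_failing_layer(layers):
    """Run checks from simplest to most complex and stop at the first failure.

    `layers` is a list of (name, check) pairs, where `check` is a callable
    that raises an exception on failure. Returns (name, exception) for the
    first failing layer, or None if every layer passes.
    """
    for name, check in layers:
        try:
            check()
        except Exception as exc:
            return name, exc
    return None


def run_on_small_input():
    pass  # hypothetical: run the script on a tiny input


def run_in_container():
    pass  # hypothetical: run the same script inside your container image


def run_on_full_input():
    raise RuntimeError("simulated failure")  # stands in for the real error


layers = [
    ("small input", run_on_small_input),
    ("container", run_in_container),
    ("full input", run_on_full_input),
]
result = first_failing_layer(layers)
print(result)
```

Here the first two layers pass and the third one fails, which points at the full input rather than at the script or the container.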

If you still cannot solve your problem, send an email to our support team: support-compute@qarnot-computing.com.

Related Articles

For more information on monitoring and debugging, please consult the following articles.