If your computation fails, you should follow this troubleshooting guide to find out the cause of the error:
You can access the task state on Tasq, or through the Python SDK or the CLI:
Python
import qarnot
conn = qarnot.Connection(client_token = "<<< PUT YOUR SECRET TOKEN >>>")
task = conn.retrieve_task("<<< PUT YOUR TASK UUID >>>")
print(task.state)
Bash
export QARNOT_CLIENT_TOKEN=<<<MY_SECRET_TOKEN>>>
qarnot task info --id <<<TASK_UUID>>>
The task’s unique identifier (UUID) can be found on Tasq or stored when you launch your task.
If the state is "cancelled", your task was aborted. Only you can abort your own tasks, either with the "ABORT" button on Tasq or through the SDKs, so simply re-launch your computation without aborting it.
If the state is "error", you will need to dig deeper and follow the next steps.
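The state check above can be sketched as a small helper. This is only an illustration: the state strings "cancelled" and "error" are the ones quoted in this guide, the comparison is made case-insensitive because the exact spelling returned by the SDK is an assumption, and the function name and return values are hypothetical.

```python
def next_step(state: str) -> str:
    """Map a task state to the next troubleshooting action from this guide."""
    normalized = state.strip().lower()
    if normalized == "cancelled":
        # The task was aborted: run it again without aborting it.
        return "relaunch"
    if normalized == "error":
        # Look at the error code first, then at the logs.
        return "investigate"
    # Any other state is outside the scope of this guide.
    return "none"


# With the SDK this would be next_step(task.state); a literal is used here
# so the sketch runs without a token.
print(next_step("error"))
```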
The error code of your task gives you more insight into the cause of the error, and a list of the existing error codes is available. You can find the error code of your task with:
Python
# "task" is the object returned by conn.retrieve_task(...) in the first Python snippet
print(task.errors)
Bash
qarnot task info --id <<<TASK_UUID>>> | jq -r '.[0].Errors'
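If jq is not available, the same field can be read with a few lines of Python. This is a sketch under assumptions: the only structure taken from this guide is what the jq filter `.[0].Errors` implies (a JSON list whose first element has an `Errors` key). The function name and the fields inside the sample error object are hypothetical.

```python
import json


def extract_errors(task_info_json: str) -> list:
    """Return the Errors field of the first task in the CLI's JSON output."""
    tasks = json.loads(task_info_json)
    if not tasks:
        return []
    # Same path as the jq filter '.[0].Errors'; a missing or null field becomes [].
    return tasks[0].get("Errors") or []


# Hypothetical sample: only the top-level list and the "Errors" key come from
# the jq filter; the fields inside the error object are made up.
sample = '[{"Errors": [{"Code": "EXAMPLE_CODE", "Message": "example message"}]}]'
print(extract_errors(sample))
```

In practice the input string would be the captured output of `qarnot task info --id <<<TASK_UUID>>>`.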
Depending on the error message, the next steps should be quite clear. Here are a few examples:
On Tasq, click on your task and go through the STDOUT and STDERR tabs to find the source of the error.
If your logs are not precise enough and you are the developer of the script, add more logging until you can identify the exact root cause of the error and fix it.
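For a Python script, one possible way to get more detailed logs is the standard `logging` module set to the DEBUG level. The sketch below is illustrative only: the logger name "computation" and the `safe_divide` function are hypothetical stand-ins for your own code.

```python
import logging


def configure_logging(level: int = logging.DEBUG) -> logging.Logger:
    """Return a logger that writes detailed, timestamped records to STDERR."""
    logger = logging.getLogger("computation")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.StreamHandler()  # writes to STDERR by default
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def safe_divide(a: float, b: float, logger: logging.Logger):
    """Hypothetical computation step that logs its inputs and any failure."""
    logger.debug("safe_divide called with a=%r b=%r", a, b)
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception records the message together with the full traceback.
        logger.exception("division failed")
        return None
```

Logging the inputs of each step, plus the traceback on failure, is usually enough to tell which step and which values triggered the error.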
If your logs are not precise enough and you are not the developer of the script:
Here are a few basic errors to be careful with:
STDOUT and STDERR are fetched on a best-effort basis. If you want all the logs of your computation, you must write them to an output file and upload that file: check our example in the Fetching logs section.
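As a rough sketch of the "write the logs to an output file" idea, the wrapper below runs a command, echoes its output as usual, and keeps a complete copy in a file. It is not the example from the Fetching logs section: the function name and the log path are hypothetical, and how the file is then uploaded depends on how your task's output is configured.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(command: list, log_path: Path) -> int:
    """Run command, echo its output, and keep a full copy in log_path."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge STDERR into the same stream
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still visible in the live STDOUT
            log_file.write(line)    # complete copy kept in the file
        return process.wait()       # exit code of the wrapped command
```

For instance, `run_and_log([sys.executable, "my_script.py"], Path("output/logs.txt"))` would wrap a hypothetical `my_script.py` and leave the full log in a hypothetical `output/` directory.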
Logs always give a good idea of what the error is, but testing in a simpler environment can complement this information. The best practice in this case is to remove layers of complexity: start with the simplest environment and add layers until the error appears. This will give you important clues to understand where the error comes from:
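The layering approach can be sketched as a loop that runs checks from the simplest to the most complex and stops at the first one that fails. Everything here is hypothetical (the function name, the layer names, the checks); each check stands for whatever "run it in this environment" means for your computation.

```python
from typing import Callable, List, Optional, Tuple

# A layer is a name plus a check that raises an exception when it fails.
Layer = Tuple[str, Callable[[], None]]


def first_failing_layer(layers: List[Layer]) -> Optional[str]:
    """Run layers from simplest to most complex; return the first that fails."""
    for name, check in layers:
        try:
            check()
        except Exception as exc:
            print(f"layer '{name}' failed: {exc!r}")
            return name
        print(f"layer '{name}' passed")
    return None  # every layer passed
```

The layers could be, for example, "script on a tiny input on your machine", then "same script inside the container", then "full input". The first name returned tells you which added layer introduced the error.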
If you still cannot solve your problem, send an email to our support team: support-compute@qarnot-computing.com.
For more information on monitoring and debugging, please consult the following articles: