This is a step-by-step guide on how to use AutoML, specifically Auto-sklearn, on Qarnot with minimal user intervention, using a Binder Jupyter notebook as a graphical user interface. Binder is a free Jupyter notebook/lab hosting service that lets users share notebooks with others via a simple link.
We encourage you to read the standalone AutoML documentation to get a better understanding of how this software works.
If you are interested in another version, please send us an email at qlab@qarnot.com.
Before starting a calculation with the Python SDK, a few steps are required:
Note: in addition to the Python SDK, Qarnot provides C# and Node.js SDKs and a Command Line.
The data showcased in this tutorial is the Localization Data for Person Activity. It contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing the same scenario five times. The problem consists of classifying the activity type, among 11 different types (walking, falling, sitting, etc.), for each entry given the collected sensor data. You can download the data from this link.
Unlike the above linked AutoML tutorial, this is a multi-class classification problem, i.e. each data entry can have one of 11 different values for the activity type, as opposed to a binary classification where there are only two classes to predict (for example classifying images as dog or cat). This is a completely different Machine Learning problem solved with the exact same software.
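To make the binary versus multi-class distinction concrete, here is a minimal sketch that counts the labels in a target column and names the problem type. The helper names (`class_distribution`, `problem_type`) and the toy rows are made up for illustration and are not part of the notebook; only the `Class` column name comes from this tutorial.

```python
import csv
import io
from collections import Counter


def class_distribution(csv_file, target="Class"):
    """Count how many rows carry each label in the target column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[target] for row in reader)


def problem_type(counts):
    """Name the classification problem from the number of distinct labels."""
    n_labels = len(counts)
    if n_labels < 2:
        raise ValueError("a classification problem needs at least two labels")
    return "binary" if n_labels == 2 else "multi-class"


# Toy rows for illustration only; they are NOT taken from the real data set,
# and the feature columns "x" and "y" are hypothetical.
toy_csv = io.StringIO(
    "x,y,Class\n"
    "0.1,0.2,walking\n"
    "0.3,0.1,falling\n"
    "0.2,0.4,sitting\n"
    "0.5,0.9,walking\n"
)

counts = class_distribution(toy_csv)
print(counts)
print(problem_type(counts))
```

To run the same check on the downloaded file, pass `open("phpH4DHsK.csv", newline="")` instead of `toy_csv`. Based on the description above, it should report 11 distinct labels.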
Once you have downloaded the data set, all you have to do is click on the following link to get access to the Jupyter notebook hosted on Binder.
The page contains a number of fields. Here is an overview of the most important ones for this task:
... (.csv files for now). For this test case, make sure to upload the above linked data set, phpH4DHsK.csv.
target column: a menu with the data set's column names will be available. Only this field (target column) has to be set by the user. The rest are optional and have default values. For this test case, make sure it is set to Class in the drop-down menu.
total training time: The total time (in minutes) allocated to Auto-sklearn for this training task, after which the training stops and the results are sent back.
per run training time: Auto-sklearn trains multiple models in parallel within the given time limit. This parameter governs the time limit for each individual model. It can be set to around 10% of the total time for longer training times (> 60 minutes) and to a larger share for shorter ones (~33% for < 60 minutes). There is no fixed rule for this, and the user should experiment with different values.
... shift and/or ctrl.
... Adaboost estimator and exclude other estimators as they are already excluded by setting the first include parameter.
Once all the parameters have been set, you can launch the task on Qarnot by simply clicking on the Start Training on Qarnot! button.
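The rule of thumb for the per run training time can be sketched as a small helper. This is only an illustration of the percentages quoted above, not the notebook's actual code. The function names are hypothetical. `time_left_for_this_task` and `per_run_time_limit` are the Auto-sklearn parameter names for these two limits (both in seconds), but whether the notebook passes the form values to them in exactly this way is an assumption.

```python
def suggest_per_run_minutes(total_minutes):
    """Suggest a per-run limit from the rule of thumb: about 10% of the
    total for long trainings (> 60 min), about 33% for shorter ones."""
    if total_minutes <= 0:
        raise ValueError("total training time must be positive")
    if total_minutes > 60:
        return total_minutes / 10
    # The guide does not say what to do at exactly 60 minutes; treating it
    # as a "short" training is an arbitrary choice made for this sketch.
    return total_minutes / 3


def to_autosklearn_seconds(total_minutes, per_run_minutes=None):
    """Convert the two form values (minutes) into Auto-sklearn's
    time parameters, which are expressed in seconds."""
    if per_run_minutes is None:
        per_run_minutes = suggest_per_run_minutes(total_minutes)
    return {
        "time_left_for_this_task": int(round(total_minutes * 60)),
        "per_run_time_limit": int(round(per_run_minutes * 60)),
    }


print(to_autosklearn_seconds(120))  # long training: 10% per run
print(to_autosklearn_seconds(30))   # short training: about a third per run
```

These values are only a starting point. As noted above, there is no fixed rule, so it is worth trying a few different splits.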
You will get live progress updates on the different states of the task. Once training is complete, you can click on the Display outputs button to have a look at the graphs generated by the training (a confusion matrix and a plot of accuracy over time).
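For readers unfamiliar with confusion matrices, the sketch below shows how one is built for a multi-class problem and how accuracy follows from it. It is a plain-Python illustration with made-up labels, not the code that produces the notebook's graphs. Rows are true classes and columns are predicted classes, the same convention scikit-learn uses.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where entry [i][j] counts the samples whose true
    class is labels[i] and whose predicted class is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for illustration only, not results from the tutorial.
labels = ["falling", "sitting", "walking"]
y_true = ["walking", "walking", "sitting", "falling", "sitting"]
y_pred = ["walking", "sitting", "sitting", "falling", "sitting"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(matrix))
```

Off-diagonal entries show which activities get confused with each other. Here one walking sample was predicted as sitting.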
If you wish, you can generate a link to download a zip file containing all the outputs of your task: mainly the graphs you see above, the trained model, and various logs with detailed performance metrics.
It is also possible to view these results from your bucket explorer by selecting the automl-binder-out bucket.
That’s it! If you have any questions, please contact qlab@qarnot.com and we will be glad to help!