Biotech
October 2021

BlastN on Qarnot Cloud

Basic Local Alignment Search Tool (BLAST) was originally a web-based tool for finding regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and computes the statistical significance of the matches. Different BLAST programs exist for different types of sequence data. In this article, we focus on the alignment of nucleotide sequences, and thus on the usage of BlastN.

Version

The test case uses BLAST 2.10.1, released in 2020.

If you are interested in another version, please send us an email at qlab@qarnot.com.

Prerequisites

Before launching a computation task, please make sure that you have a Qarnot account, your authentication token, and the Qarnot Python SDK installed on your computer.

Test case

This test case is a simple example of BLAST use, and more particularly of the BlastN tool, on Qarnot Cloud, using the Python SDK. We will align a query DNA sequence against a reference DNA sequence. Download the dataset containing the two local sequences, unzip it, and place both files in a folder named dataset-blastn. The headers of the two local sequence files are shown below:

NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
NM_005514.8 Homo sapiens major histocompatibility complex, class I, B (HLA-B), mRNA

In that same dataset-blastn folder, create a run_blastn.sh file and copy the following code into it.

run_blastn.sh

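#!/bin/bash

# Build a nucleotide BLAST database named chr6 from the chromosome 6 FASTA file
makeblastdb -in chr6.fna -dbtype nucl -parse_seqids -out chr6
# Align the HLA-B query against that database and write the report to results.out
blastn -db chr6 -query hla-b.fsa -out results.out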

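This script first builds a nucleotide BLAST database named chr6 from chr6.fna with makeblastdb, then aligns the query sequence in hla-b.fsa against that database with blastn and writes the report to results.out.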

Launching the case

Copy the following code into a Python script named blastn.py and save it in the directory that contains the dataset-blastn folder.

blastn.py

#!/usr/bin/env python3

# Import the Qarnot SDK
import qarnot

# Connect to the Qarnot platform
conn = qarnot.connection.Connection(client_token='MY_SECRET_TOKEN')

# Create a task
task = conn.create_task("BLASTN-demo", "docker-batch", 1)

# Create an input bucket with the case files
input_bucket = conn.create_bucket("blastn-demo-input")
input_bucket.sync_directory("dataset-blastn")
task.resources.append(input_bucket)

# Create an output bucket
output_bucket = conn.create_bucket("blastn-demo-output")
task.results = output_bucket

# Give parameters regarding the Docker image to be used
task.constants["DOCKER_REPO"] = "ncbi/blast"
task.constants["DOCKER_TAG"] = "2.10.1"
task.constants["DOCKER_CMD"] = "./run_blastn.sh"

# Submit the task and download results
task.run(output_dir="output")
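
The script above hard-codes the token in the source file. As a hedged alternative, the sketch below reads it from an environment variable instead. The variable name QARNOT_TOKEN and the helper load_token are hypothetical names chosen for this illustration and are not part of the Qarnot SDK.

```python
import os


def load_token(environ=None, var_name="QARNOT_TOKEN"):
    """Return the API token found in an environment mapping.

    QARNOT_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Qarnot token"
        )
    return token


# Possible use in blastn.py (requires the qarnot package):
# conn = qarnot.connection.Connection(client_token=load_token())
```

Keeping the token out of the script makes it easier to share blastn.py without exposing credentials.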

Be sure to replace MY_SECRET_TOKEN in the script with your authentication token so that the task can be launched on Qarnot. Make sure that all the input files mentioned above (one .fna file, one .fsa file, and one .sh file) are in the same folder named dataset-blastn. Your working directory should look like this:

blastn.py
dataset-blastn/
    chr6.fna
    hla-b.fsa
    run_blastn.sh

To launch this script, open a terminal in your working directory and execute python3 blastn.py.
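
Before launching, it can help to confirm that the input folder is complete. The following is a minimal sketch of such a check, assuming the file names used in run_blastn.sh (chr6.fna, hla-b.fsa) and the folder name dataset-blastn. The helper missing_inputs is a hypothetical name introduced for this example.

```python
from pathlib import Path

# File names taken from run_blastn.sh and the instructions above.
EXPECTED_FILES = ("chr6.fna", "hla-b.fsa", "run_blastn.sh")


def missing_inputs(folder="dataset-blastn", expected=EXPECTED_FILES):
    """Return the expected input files that are absent from `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


# Possible use from the working directory:
# missing = missing_inputs()
# if missing:
#     print("Missing input files:", ", ".join(missing))
```

An empty list means the three files are in place and the folder is ready to be synchronized to the input bucket.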

Results

At any given time, you can monitor the status of your task on our platform.
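
The status can also be followed from Python. The sketch below polls a task object until it finishes and logs each state change. It assumes the task exposes a state attribute, a wait(timeout) method that returns True once the task has ended, and a fresh_stdout() method, as the Qarnot SDK Task object is understood to do. The helper follow_task is a hypothetical name for this illustration.

```python
def follow_task(task, poll_seconds=5, log=print):
    """Poll a submitted task until it ends and return its final state.

    `task` is assumed to provide `state`, `wait(timeout)` and
    `fresh_stdout()`; any object with that interface will work.
    """
    last_state = None
    done = False
    while not done:
        # Report every state transition once.
        if task.state != last_state:
            last_state = task.state
            log(f"** {last_state}")
        # Forward any new standard output produced by the task.
        output = task.fresh_stdout()
        if output:
            log(output.rstrip())
        # Block for up to poll_seconds; True means the task has ended.
        done = task.wait(poll_seconds)
    if task.state != last_state:
        log(f"** {task.state}")
    return task.state
```

To try it in blastn.py, one option would be to replace task.run(output_dir="output") with task.submit(), then follow_task(task), then task.download_results("output"), assuming those SDK methods behave as their names suggest.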

You should now have an output folder in your working directory on your computer and a blastn-demo-output bucket, both containing all the output files (the built database and the sequence alignment scores in the results.out file, see below).

results.out

BLASTN 2.10.1+

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: chr6.fna
           1 sequences; 170,805,979 total letters



Query= NM_005514.8 Homo sapiens major histocompatibility complex, class I,
B (HLA-B), mRNA

Length=1536
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly   784     0.0  


>NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
Length=170805979

 Score = 784 bits (424),  Expect = 0.0
 Identities = 424/424 (100%), Gaps = 0/424 (0%)
 Strand=Plus/Minus

Query  1113      AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  1172
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354298  AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  31354239

Query  1173      GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  1232
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354238  GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  31354179

Query  1233      TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  1292
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354178  TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  31354119

Query  1293      TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  1352
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354118  TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  31354059

Query  1353      GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  1412
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354058  GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  31353999

Query  1413      CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  1472
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353998  CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  31353939

Query  1473      GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  1532
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353938  GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  31353879

Query  1533      TCCA  1536
                 ||||
Sbjct  31353878  TCCA  31353875

...
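
To pull the key figures out of a report like the one above, a small parser can be enough. This is a minimal sketch that only recognizes the "Score = ... Expect = ..." and "Identities = ... Gaps = ..." lines in the default pairwise format shown here. The helper summarize_hits is a hypothetical name, and other output formats or line variants are not handled.

```python
import re

# Patterns for the two summary lines that precede each pairwise alignment.
SCORE_RE = re.compile(r"Score = ([\d.]+) bits \((\d+)\),\s+Expect = (\S+)")
IDENT_RE = re.compile(r"Identities = (\d+)/(\d+) \((\d+)%\), Gaps = (\d+)/(\d+)")


def summarize_hits(report_text):
    """Return one dict per alignment found in a pairwise BLAST report."""
    hits = []
    for line in report_text.splitlines():
        match = SCORE_RE.search(line)
        if match:
            hits.append({
                "bits": float(match.group(1)),
                "raw_score": int(match.group(2)),
                "expect": match.group(3),  # kept as text, e.g. "0.0"
            })
            continue
        match = IDENT_RE.search(line)
        if match and hits:
            # Attach identity and gap counts to the most recent alignment.
            hits[-1].update(
                identities=int(match.group(1)),
                aligned=int(match.group(2)),
                gaps=int(match.group(4)),
            )
    return hits


# Possible use, assuming the report was downloaded to output/results.out:
# with open("output/results.out") as handle:
#     for hit in summarize_hits(handle.read()):
#         print(hit)
```

For larger analyses, asking blastn for a tabular output format is usually easier to parse than the pairwise report.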

Wrapping up

That’s it! If you have any questions, please contact qlab@qarnot.com and it will be our pleasure to help you!
