Basic Local Alignment Search Tool (BLAST) was originally a web-based tool for finding regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and computes the statistical significance of the matches. Different tools exist for different types of sequence data. In this article, we focus on the alignment of nucleotide sequences and thus on the usage of BlastN.
The test case uses BLAST 2.10.1, released in 2020.
If you are interested in another version, please send us an email at qlab@qarnot.com.
Before launching a computation task, please ensure that you already fulfill those requirements:
This test case is a simple example of BLAST use, and more particularly of the BlastN tool, on Qarnot Cloud, using the Python SDK. We will align a list of query DNA sequences against a list of reference DNA sequences. The dataset for this test case contains two local sequence files. Unzip them and place them both in a folder named dataset-blastn. The headers of the two local sequence files are shown below:
NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
NM_005514.8 Homo sapiens major histocompatibility complex, class I, B (HLA-B), mRNA
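To confirm that the unzipped files are the expected ones, their header lines can be listed with a few lines of Python. This is a minimal sketch, assuming the files are plain-text FASTA in which each header line starts with ">" and that they are named chr6.fna and hla-b.fsa as in the rest of this article. The helper name fasta_headers is ours and is not part of BLAST or the Qarnot SDK.

```python
from pathlib import Path


def fasta_headers(path):
    """Return the header lines of a FASTA file, without the leading '>'."""
    headers = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # In FASTA, a line starting with '>' introduces a new sequence record.
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


if __name__ == "__main__":
    # File names follow this article; adjust them if your dataset differs.
    for name in ("chr6.fna", "hla-b.fsa"):
        fasta = Path("dataset-blastn") / name
        if fasta.exists():
            print(name, fasta_headers(fasta))
        else:
            print(name, "not found")
```

Reading line by line keeps memory use low, which matters for the chromosome file since it holds about 170 million letters.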
In that same dataset-blastn folder, create a run_blastn.sh file and copy the following code into it.
run_blastn.sh
#!/bin/bash
makeblastdb -in chr6.fna -dbtype nucl -parse_seqids -out chr6
blastn -db chr6 -query hla-b.fsa -out results.out
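This code contains instructions to build a nucleotide BLAST database named chr6 from chr6.fna, then align the query sequences in hla-b.fsa against that database and write the report to results.out.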
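The two commands in run_blastn.sh, makeblastdb followed by blastn, can also be written as argument lists in Python, which makes each flag easier to see and to change. This is only an illustrative sketch: the function names blastn_commands and run_all are hypothetical, the flags are exactly those used in the shell script above, and running the commands assumes the BLAST+ executables are available on your PATH.

```python
import subprocess


def blastn_commands(reference="chr6.fna", query="hla-b.fsa",
                    db="chr6", out="results.out"):
    """Build the two command lines used in run_blastn.sh as argument lists."""
    # Step 1: build a nucleotide database from the reference FASTA file.
    make_db = ["makeblastdb", "-in", reference, "-dbtype", "nucl",
               "-parse_seqids", "-out", db]
    # Step 2: align the query sequences against that database.
    search = ["blastn", "-db", db, "-query", query, "-out", out]
    return [make_db, search]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_all(...) where BLAST+ is installed.
    for argv in blastn_commands():
        print(" ".join(argv))
```

On Qarnot the shell script remains the simplest option, since the ncbi/blast Docker image already provides both executables.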
Copy the following code into a Python script and save it in the same location as the dataset-blastn folder under the name blastn.py.
blastn.py
#!/usr/bin/env python3
# Import the Qarnot SDK
import qarnot
# Connect to the Qarnot platform
conn = qarnot.connection.Connection(client_token='MY_SECRET_TOKEN')
# Create a task
task = conn.create_task("BLASTN-demo", "docker-batch", 1)
# Create an input bucket with the case files
input_bucket = conn.create_bucket("blastn-demo-input")
input_bucket.sync_directory("dataset-blastn")
task.resources.append(input_bucket)
# Create an output bucket
output_bucket = conn.create_bucket("blastn-demo-output")
task.results = output_bucket
# Give parameters regarding the Docker image to be used
task.constants["DOCKER_REPO"] = "ncbi/blast"
task.constants["DOCKER_TAG"] = "2.10.1"
task.constants["DOCKER_CMD"] = "./run_blastn.sh"
# Submit the task and download results
task.run(output_dir="output")
Be sure you have copied your authentication token into the script (in place of the MY_SECRET_TOKEN placeholder) to be able to launch the task on Qarnot. Make sure that all input files mentioned above (1 fna file, 1 fsa file, 1 sh file) are in the same folder named dataset-blastn. Your working directory should look like this:
dataset-blastn/
    chr6.fna: Homo sapiens chromosome 6
    hla-b.fsa: Homo sapiens major histocompatibility complex
    run_blastn.sh: script to run the alignment using BlastN
blastn.py: Python script to run the computation on Qarnot
To launch this script, open a terminal in your working directory and execute python3 blastn.py.
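Before launching, the layout described above can be checked programmatically so that a missing file or a forgotten token is caught locally rather than after submission. The following is a minimal sketch, not part of the Qarnot SDK: the function name preflight and the constants are hypothetical, and the expected file names are the ones used in this article.

```python
from pathlib import Path

# Input files expected inside dataset-blastn/, as listed in this article.
EXPECTED_INPUTS = ("chr6.fna", "hla-b.fsa", "run_blastn.sh")
# Placeholder value used in blastn.py before the real token is pasted in.
PLACEHOLDER_TOKEN = "MY_SECRET_TOKEN"


def preflight(workdir=".", token=PLACEHOLDER_TOKEN):
    """Return a list of problems found; an empty list means ready to launch."""
    problems = []
    root = Path(workdir)
    dataset = root / "dataset-blastn"
    for name in EXPECTED_INPUTS:
        if not (dataset / name).is_file():
            problems.append(f"missing dataset-blastn/{name}")
    if not (root / "blastn.py").is_file():
        problems.append("missing blastn.py")
    if not token or token == PLACEHOLDER_TOKEN:
        problems.append("authentication token is still the placeholder")
    return problems


if __name__ == "__main__":
    for problem in preflight("."):
        print("Problem:", problem)
```

Returning the problems as a list, rather than raising on the first one, lets you fix everything in a single pass.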
At any given time, you can monitor the status of your task on our platform.
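The status can also be followed from Python by polling the task. The sketch below is written against any task-like object and rests on two assumptions that you should check against the SDK documentation for your version: that the task exposes a state attribute, and that wait(timeout=...) returns True once the task has finished. The helper name follow is hypothetical.

```python
def follow(task, poll_seconds=5, report=print):
    """Poll a task-like object until it finishes, reporting each state change.

    Assumes `task.state` is a string and `task.wait(timeout=...)` returns
    True when the task is finished (assumptions, see the text above).
    Returns the list of distinct consecutive states that were observed.
    """
    seen = []
    done = False
    while not done:
        state = task.state
        # Report only when the state differs from the last one seen.
        if not seen or seen[-1] != state:
            seen.append(state)
            report(f"task state: {state}")
        done = task.wait(timeout=poll_seconds)
    # Record the final state if it changed during the last wait.
    final = task.state
    if seen[-1] != final:
        seen.append(final)
        report(f"task state: {final}")
    return seen
```

To use it with the script above, you would need to submit the task without blocking and download the results afterwards instead of calling task.run(output_dir="output"). The exact calls for that depend on the SDK version and are not shown here.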
You should now have an output folder in your working directory on your computer and a blastn-demo-output bucket containing all output files (the built database and the sequence alignment scores in the results.out file, see below).
results.out
BLASTN 2.10.1+
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.
Database: chr6.fna
1 sequences; 170,805,979 total letters
Query= NM_005514.8 Homo sapiens major histocompatibility complex, class I,
B (HLA-B), mRNA
Length=1536
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value
NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly   784     0.0
>NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
Length=170805979
Score = 784 bits (424), Expect = 0.0
Identities = 424/424 (100%), Gaps = 0/424 (0%)
Strand=Plus/Minus
Query  1113      AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  1172
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354298  AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  31354239
Query  1173      GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  1232
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354238  GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  31354179
Query  1233      TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  1292
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354178  TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  31354119
Query  1293      TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  1352
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354118  TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  31354059
Query  1353      GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  1412
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354058  GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  31353999
Query  1413      CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  1472
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353998  CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  31353939
Query  1473      GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  1532
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353938  GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  31353879
Query  1533      TCCA  1536
                 ||||
Sbjct  31353878  TCCA  31353875
...
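The score lines of this report can be pulled out with a short Python script, for example to list every alignment with its bit score, E-value and percent identity. This is a minimal sketch, assuming the default pairwise text format shown above, in which each alignment has a "Score = ... bits (...), Expect = ..." line followed by an "Identities = ..., Gaps = ..." line. The names HSP_PATTERN and parse_hsps are ours, and other BLAST output formats would need a different parser.

```python
import re

# Matches the two summary lines printed before each alignment in the report.
HSP_PATTERN = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits\s*\((?P<raw>\d+)\),\s*"
    r"Expect\s*=\s*(?P<evalue>\S+)\s+"
    r"Identities\s*=\s*(?P<ident>\d+)/(?P<length>\d+)\s*\(\d+%\),\s*"
    r"Gaps\s*=\s*(?P<gaps>\d+)/\d+"
)


def parse_hsps(report_text):
    """Return one dict per alignment summary found in a BLAST text report."""
    hsps = []
    for match in HSP_PATTERN.finditer(report_text):
        evalue = match.group("evalue")
        # Some reports abbreviate values such as 1e-100 to e-100.
        if evalue.startswith("e"):
            evalue = "1" + evalue
        identities = int(match.group("ident"))
        length = int(match.group("length"))
        hsps.append({
            "bits": float(match.group("bits")),
            "raw_score": int(match.group("raw")),
            "evalue": float(evalue),
            "identities": identities,
            "length": length,
            "gaps": int(match.group("gaps")),
            "percent_identity": 100.0 * identities / length,
        })
    return hsps


if __name__ == "__main__":
    sample = (
        "Score = 784 bits (424), Expect = 0.0\n"
        "Identities = 424/424 (100%), Gaps = 0/424 (0%)\n"
        "Strand=Plus/Minus\n"
    )
    for hsp in parse_hsps(sample):
        print(hsp)
```

To apply it to the downloaded report, read output/results.out into a string and pass it to parse_hsps.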
That’s it! If you have any questions, please contact qlab@qarnot.com and it will be our pleasure to help you!