Applications/Ncbi-blast

From HPC

Revision as of 13:24, 5 April 2017

Application Details

  • Description: NCBI-Blast finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.
  • Version: 2.4.0
  • Modules: ncbi-blast/2.4.0
  • Licence: Open-source (BLAST is a registered trademark of the National Library of Medicine)
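The statistical significance mentioned above is reported as an expect value (E-value): the number of alignments at least as good as the observed one that would be expected by chance in a search of that size. In outline, BLAST converts a raw score S with the Karlin-Altschul formula, where λ and K are constants of the scoring system (estimated empirically for gapped alignments) and m, n are the query and database lengths. BLAST actually uses length-corrected effective values of m and n, which this sketch ignores, and the 300-residue query and 10^9-residue database in the worked line are hypothetical numbers. It shows roughly what bit score a hit needs to pass a threshold such as the -evalue 1e-5 used in the SLURM script:

```latex
\begin{align*}
E &= K\,m\,n\,e^{-\lambda S} = m\,n\,2^{-S'}, \qquad S' = \frac{\lambda S - \ln K}{\ln 2} \\
m\,n &= 300 \times 10^{9} = 3 \times 10^{11} \\
E \le 10^{-5} &\iff 2^{-S'} \le \frac{10^{-5}}{3 \times 10^{11}}
  \iff S' \ge \log_2\!\left(3 \times 10^{16}\right) \approx 54.7\ \text{bits}
\end{align*}
```

Because E scales with m·n, the same bit score is less significant against a larger database.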

Usage Examples

Command Set

ncbi-blast provides the following commands:

  • blast_formatter
  • blastdb_aliastool
  • blastdbcheck
  • blastdbcmd
  • blastn
  • blastp
  • blastx
  • convert2blastmask
  • deltablast
  • dustmasker
  • legacy_blast.pl
  • makeblastdb
  • makembindex
  • makeprofiledb
  • psiblast
  • rpsblast
  • rpstblastn
  • segmasker
  • tblastn
  • tblastx
  • update_blastdb.pl
  • windowmasker
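After loading the module, a quick way to confirm that the commands listed above are available is to look each one up on PATH. This is a minimal sketch assuming Bash; check_blast_commands is a hypothetical helper, not part of ncbi-blast, and it only checks that each name resolves, not which version it is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are on PATH.
# Prints one "found" or "missing" line per command and returns the
# number of missing commands (0 means everything was found).
check_blast_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf 'found\t%s\n' "$cmd"
    else
      printf 'missing\t%s\n' "$cmd"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, after module load ncbi-blast/2.4.0, running check_blast_commands blastp makeblastdb blast_formatter should print found for each command if the module loaded correctly.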

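SLURM Script

The script below splits a protein FASTA file into chunks and submits one single-node blastp job per chunk to the compute partition, writing the results in BLAST XML format (-outfmt 5). Run it from the directory where the chunk directories should be created, since basedir is taken from the current directory.
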
#!/bin/bash

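# Input protein FASTA file and how it is split into query chunks
fasta=/home/user1/maker.all.proteins.fasta
seqs_per_file=300                           # sequences per chunk file
files_per_dir=100                           # chunk files per numbered directory
BLAST_DB=/home/user1/database/nr_metazoa    # database passed to blastp -db
prefix=Tcancriformis_maker2                 # label used in the job names
threads_per_job=28                          # cores per job, also used for -num_threads
e=1e-5                                      # -evalue threshold
cul=10                                      # -culling_limit
n_seqs=50                                   # -max_target_seqs
fmt=5                                       # -outfmt 5 = XML
basedir=$(pwd)                              # chunk directories are created here
partition_script=/home/user1/ectools/partition.py

###########

echo -e "\nsplitting up files\n"
# Optional, disabled: strip FASTA descriptions after the first space ($genome is not set in this script)
#cat $fasta | sed 's/ .*//g' > $genome.fasta

# The loop below assumes partition.py creates numbered directories (0001, 0002, ...)
# holding chunk files whose names end in p0001, p0002, ...
python $partition_script $seqs_per_file $files_per_dir $fasta

# Visit the numbered directories from the highest down to 0001
count=$(ls -1 | grep -E "^[0-9]{4}" | wc -l)
for i in $(seq $count -1 1)
do
        current=$(printf "%04d" $i)
        echo -e "processing directory $current\n"
        cd $current

        # Write and submit one blastp job script per chunk file
        for p in $(ls -1 | grep -E "p[0-9]{4}$" | sort -nr)
        do

                echo -e "#!/bin/bash
#SBATCH -J b-$current-$p-$prefix
#SBATCH -N 1
#SBATCH --ntasks-per-node $threads_per_job
#SBATCH -o job-%j.out
#SBATCH -e job-%j.out
#SBATCH -p compute

#LOAD MODULE
module load ncbi-blast/2.4.0
#
date
cd $basedir/$current

echo -e \"\\\nNumber of sequences to process:\\\t\$(cat $basedir/$current/${p} | grep \">\" | wc -l)\"
echo -e \"\\\nTotal length of sequences (residues):\\\t\$(cat $basedir/$current/${p} | grep \">\" -v | perl -ne 'chomp; print \"\$_\"' | wc -m)\\\n\"

blastp -query ${p} -db $BLAST_DB -outfmt $fmt -max_target_seqs $n_seqs -culling_limit $cul -num_threads \$SLURM_NTASKS_PER_NODE -evalue $e -out ${p}-vs-nt-n$n_seqs.cul$cul.$e.blastp.out.xml

echo -e \"\\\nDONE\\\n\"
date" > run_blastp_${p}.slurm.sh
                sbatch run_blastp_${p}.slurm.sh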
        done
        cd ..
done
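The script depends on partition.py, whose source is not part of this page. Judging only by the patterns the loop searches for, it should leave contiguous numbered directories (0001, 0002, ...) containing chunk files whose names end in p followed by four digits. As a hypothetical stand-in under that assumption, awk can perform the same split; the function name split_fasta and the file names chunk.p0001, chunk.p0002, ... are illustrative choices, not partition.py's actual naming.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for partition.py (its real naming scheme is unknown).
# Splits a FASTA file into chunks of seqs_per_file sequences and puts
# files_per_dir chunks into each numbered directory 0001, 0002, ...
# Chunk files are named chunk.p0001, chunk.p0002, ...; the numbering
# restarts in every directory so it stays at four digits.
split_fasta() {
  local seqs_per_file=$1 files_per_dir=$2 fasta=$3
  awk -v spf="$seqs_per_file" -v fpd="$files_per_dir" '
    /^>/ {
      if (nseq % spf == 0) {                  # this header starts a new chunk
        if (out != "") close(out)             # keep only one output file open
        chunk = nseq / spf                    # 0-based chunk index
        dir = sprintf("%04d", int(chunk / fpd) + 1)
        if (chunk % fpd == 0) system("mkdir -p " dir)
        out = sprintf("%s/chunk.p%04d", dir, (chunk % fpd) + 1)
      }
      nseq++
    }
    out != "" { print > out }                 # headers and sequence lines go to the current chunk
  ' "$fasta"
}
```

With the settings in the script this would be called as split_fasta $seqs_per_file $files_per_dir $fasta in place of the python line. Whether partition.py also edits headers or names its files differently cannot be told from this page, so check its output before swapping it out.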

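The job scripts are generated with echo -e inside one double-quoted string, which is why the parts that must run later need layered escapes such as \\\n, \" and \$(...). A sketch of an alternative, assuming Bash: an unquoted here-document expands the submit-time settings ($current, $p, $BLAST_DB and so on) immediately, while \$ defers anything that must be evaluated when the job runs. The function name write_blastp_job is hypothetical, and the sequence and residue counts use a single awk pass in place of the grep, perl and wc pipeline. The function only writes the file; sbatch is still called separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one blastp job script for chunk file $2 in
# numbered directory $1 to the path $3. Expects BLAST_DB, prefix,
# threads_per_job, e, cul, n_seqs, fmt and basedir to be set by the caller.
write_blastp_job() {
  local current=$1 p=$2 job=$3
  # Unquoted EOF: $vars expand now, \$ stays literal for the job to expand at run time
  cat > "$job" <<EOF
#!/bin/bash
#SBATCH -J b-$current-$p-$prefix
#SBATCH -N 1
#SBATCH --ntasks-per-node $threads_per_job
#SBATCH -o job-%j.out
#SBATCH -e job-%j.out
#SBATCH -p compute

module load ncbi-blast/2.4.0
date
cd "$basedir/$current"

# Count query sequences and residues in one pass over the chunk file
awk '/^>/ {n++; next} {len += length(\$0)} END {printf "\nSequences to process:\t%d\nTotal residues:\t%d\n\n", n, len}' "$p"

blastp -query "$p" -db "$BLAST_DB" -outfmt $fmt -max_target_seqs $n_seqs \\
  -culling_limit $cul -num_threads \$SLURM_NTASKS_PER_NODE -evalue $e \\
  -out "$p-vs-nt-n$n_seqs.cul$cul.$e.blastp.out.xml"

echo -e "\nDONE\n"
date
EOF
}
```

Inside the inner loop this would be used as write_blastp_job "$current" "$p" "run_blastp_${p}.slurm.sh" followed by sbatch on the same file.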

Further Information

  • https://blast.ncbi.nlm.nih.gov/Blast.cgi