Applications/Obitools

From HPC

Latest revision as of 10:53, 16 November 2022

Application Details

  • Description: The OBITools package is a set of programs specifically designed for analysing NGS data in a DNA meta-barcoding context, taking into account taxonomic information
  • Version: 1.2.9
  • Module: obitools/1.2.9
  • Licence: CeCILL 2.1

Usage Areas

Once installed, the OBITools enrich the Unix command line interface with a set of new commands dedicated to NGS data processing. Most of them have a name starting with the obi prefix. They automatically recognize the input file format amongst most of the standard sequence file formats (i.e. fasta, fastq, EMBL, and GenBank formats). Nevertheless, options are available to enforce some format specificity such as the encoding system used in fastq files for quality codes. Most of the basic Unix commands have their OBITools equivalent (e.g. obihead vs head, obitail vs tail, obigrep vs grep), which is convenient for scientists familiar with Unix. The main difference between any standard Unix command and its OBITools counterpart is that the treatment unit is no longer the text line but the sequence record. As a sequence record is more complex than a single text line, the OBITools programs have many supplementary options compared to their Unix equivalents.
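
The difference between a text line and a sequence record can be illustrated without OBITools at all. The sketch below is only an illustration: the file and its sequences are made up, and plain `wc` and `grep` stand in for the line-based and record-based views. On the cluster, `obicount` (listed under Statistics below) is the tool meant for counting records.

```shell
# Build a tiny FASTA file (made-up sequences) to contrast line-based and record-based counting.
demo_fasta="$(mktemp)"
cat > "$demo_fasta" <<'EOF'
>seq1 sample=A
ACGTACGTAC
GTACGTACGT
>seq2 sample=B
TTGACCA
>seq3 sample=A
GGCATT
CCGA
EOF

# Line-oriented view: standard Unix tools count text lines.
line_count=$(wc -l < "$demo_fasta" | tr -d ' ')

# Record-oriented view: one record per '>' header, however many lines it spans.
record_count=$(grep -c '^>' "$demo_fasta")

echo "lines=$line_count records=$record_count"   # prints: lines=8 records=3
rm -f "$demo_fasta"
```

A line-based tool such as `head -n 2` would cut the first record in half here, whereas a record-based tool operates on whole records.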


Metabarcode design and quality assessment

  • ecoPCR: in silico PCR
  • ecoPrimers: new barcode markers and primers
  • ecotaxstat: gives the coverage of an ecoPCR output compared to the original ecoPCR database
  • ecotaxspecificity: Evaluates barcode resolution


File format conversions

  • obiconvert: converts sequence files to different output formats
  • obipr2: converts the PR2 database into an ecoPCR database
  • obisilva: converts the SILVA database into an ecoPCR database
  • obitaxonomy: manages taxonomic databases
  • obitab: converts a sequence file to a tabular file


Sequence annotations

  • ecotag: assigns sequences to taxa
  • obiannotate: adds/edits sequence record annotations
  • obiaddtaxids: adds taxids to sequence records using an ecoPCR database


Computations on sequences

  • illuminapairedend: aligns paired-end Illumina reads
  • ngsfilter: assigns sequence records to the corresponding experiment/sample based on DNA tags and primers
  • obicomplement: reverse-complements sequences
  • obiclean: tags a set of sequences for PCR/sequencing errors identification
  • obicut: trims sequences
  • obijoinpairedend: Joins paired-end reads
  • obiuniq: groups and dereplicates sequences


Sequence sampling and filtering

  • obiextract: extract samples from a dataset
  • obigrep: filters sequence file
  • obihead: extracts the first sequence records
  • obisample: randomly resamples sequence records
  • obiselect: selects representative sequence records
  • obisplit: splits a sequence file into a set of subfiles
  • obisubset: extracts a subset of samples from a dereplicated sequence file
  • obitail: extracts the last sequence records


Statistics over sequence file

  • ecodbtaxstat: gives taxonomic rank frequency of a given ecoPCR database
  • obicount: counts the number of sequence records
  • obistat: computes basic statistics for attribute values


Utilities

  • oligotag: Designs a set of oligonucleotides with specified properties
  • obidistribute: Distributes sequence records over several sequence records files
  • obisort: Sorts sequence records according to the value of a given attribute
  • obitaxonomy: manages taxonomic databases
  • ecofind: querying a taxonomic database


Further details are at http://metabarcoding.org/obitools/doc/scripts.html


Module

[username@login01 ~]$ module add obitools/1.2.9

Slurm Example


#!/bin/bash

#SBATCH -J obitools_database
#SBATCH -N 1
#SBATCH --ntasks-per-node 40
#SBATCH -o %N.%J.out
#SBATCH -e %N.%J.err 
#SBATCH -p highmem
#SBATCH --exclusive

module add obitools/1.2.9
module add python/anaconda/4.0/2.7
export PATH=$PATH:~/applications/ecoPCR/src/

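echo -e "Starttime: $(date)\n"

# Simulate an in silico PCR with ecoPCR:
# -e 3 allows up to 3 mismatches per primer; -l/-L bound the amplicon length (50 to 2000 bp)
echo "use ecoPCR to simulate an in silico PCR"
mkdir -p 16S_db
echo 16S
ecoPCR -d ./db/embl_last -e 3 -l 50 -L 2000 AGRGTTYGATYMTGGCTCAG GACGGGCGGTGWGTRCA > 16S.v05.ecopcr

# Keep only sequences whose taxid is resolved at species, genus and family rank
echo "clean the database"
obigrep -d ./db/embl_last --require-rank=species --require-rank=genus --require-rank=family 16S.v05.ecopcr > 16S_db/16S.v05_clean.fasta

# Dereplicate identical sequences
obiuniq -d ./db/embl_last 16S_db/16S.v05_clean.fasta > 16S_db/16S.v05_clean_uniq.fasta

# Keep only dereplicated sequences that still have a family-level taxid
obigrep -d ./db/embl_last --require-rank=family 16S_db/16S.v05_clean_uniq.fasta > 16S_db/16S.v05_clean_uniq_clean.fasta

# Give every sequence record a unique identifier
obiannotate --uniq-id 16S_db/16S.v05_clean_uniq_clean.fasta > 16S_db/db_16S.v05.fasta

echo -e "Endtime: $(date)\n"
echo done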
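
The job script above relies on `ecoPCR` and the `obi*` commands being on `PATH` once the modules are loaded and `PATH` is extended. A minimal pre-flight check, sketched below, can make a job stop early if one of them is missing. The function name `require_commands` is hypothetical and is not part of OBITools or Slurm.

```shell
# Return 0 only if every named command can be found on PATH; report the missing ones on stderr.
# (require_commands is a hypothetical helper, not an OBITools or Slurm command.)
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use in the job script, after the `module add` lines and the PATH export:
# require_commands ecoPCR obigrep obiuniq obiannotate || exit 1
```

Placing the check before the first `ecoPCR` call avoids occupying an exclusive node for a job that cannot run.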


Further Information

http://metabarcoding.org/obitools/doc/




