
Application Details

  • Description: The OBITools package is a set of programs specifically designed for analysing NGS data in a DNA meta-barcoding context, taking into account taxonomic information
  • Version: 1.2.9
  • Module: obitools/1.2.9
  • Licence: CeCILL 2.1

Usage Areas

Once installed, the OBITools enrich the Unix command line interface with a set of new commands dedicated to NGS data processing. Most of them have a name starting with the obi prefix. They automatically recognize the input file format amongst most of the standard sequence file formats (i.e. fasta, fastq, EMBL, and GenBank formats). Nevertheless, options are available to enforce some format specificity such as the encoding system used in fastq files for quality codes. Most of the basic Unix commands have their OBITools equivalent (e.g. obihead vs head, obitail vs tail, obigrep vs grep), which is convenient for scientists familiar with Unix. The main difference between any standard Unix command and its OBITools counterpart is that the treatment unit is no longer the text line but the sequence record. As a sequence record is more complex than a single text line, the OBITools programs have many supplementary options compared to their Unix equivalents.
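
The record-versus-line distinction can be illustrated with a small sketch. It builds a throwaway two-record FASTA file (the file name and contents are made up for illustration) and compares a line count with a record count using only standard Unix tools. The last step calls obihead only if it is found on the PATH, which assumes the obitools module has been loaded.

```shell
# Throwaway FASTA file: 2 sequence records spread over 6 text lines.
tmp=$(mktemp -d)
cat > "$tmp/demo.fasta" <<'EOF'
>seq1
ACGTACGT
ACGT
>seq2
TTGCA
TTG
EOF

# Line-oriented view: every text line counts.
lines=$(( $(wc -l < "$tmp/demo.fasta") ))
# Record-oriented view: one ">" header line per sequence record.
records=$(grep -c '^>' "$tmp/demo.fasta")
echo "text lines: $lines"
echo "sequence records: $records"

# head cuts on lines, so this returns only part of the first record.
head -n 2 "$tmp/demo.fasta"

# obihead cuts on records; it only runs if OBITools is available.
if command -v obihead >/dev/null 2>&1; then
    obihead -n 1 "$tmp/demo.fasta"
fi

rm -rf "$tmp"
```

Here head -n 2 stops in the middle of seq1, whereas a record-based tool is expected to return seq1 with its full sequence.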

Metabarcode design and quality assessment

  • ecoPCR: in silico PCR
  • ecoPrimers: designs new barcode markers and primers
  • ecotaxstat: reports the coverage of an ecoPCR output compared to the original ecoPCR database
  • ecotaxspecificity: evaluates barcode resolution

File format conversions

  • obiconvert: converts sequence files to different output formats
  • obipr2: converts the PR2 database into an ecoPCR database
  • obisilva: converts the SILVA database into an ecoPCR database
  • obitaxonomy: manages taxonomic databases
  • obitab: converts a sequence file to a tabular file

Sequence annotations

  • ecotag: assigns sequences to taxa
  • obiannotate: adds/edits sequence record annotations
  • obiaddtaxids: adds taxids to sequence records using an ecoPCR database

Computations on sequences

  • illuminapairedend: aligns paired-end Illumina reads
  • ngsfilter: assigns sequence records to the corresponding experiment/sample based on DNA tags and primers
  • obicomplement: reverse-complements sequences
  • obiclean: tags a set of sequences for PCR/sequencing errors identification
  • obicut: trims sequences
  • obijoinpairedend: joins paired-end reads
  • obiuniq: groups and dereplicates sequences

Sequence sampling and filtering

  • obiextract: extracts samples from a dataset
  • obigrep: filters sequence files
  • obihead: extracts the first sequence records
  • obisample: randomly resamples sequence records
  • obiselect: selects representative sequence records
  • obisplit: splits a sequence file into a set of subfiles
  • obisubset: extracts a subset of samples from a dereplicated sequence file
  • obitail: extracts the last sequence records

Statistics over sequence file

  • ecodbtaxstat: gives the taxonomic rank frequency of a given ecoPCR database
  • obicount: counts the number of sequence records
  • obistat: computes basic statistics for attribute values

Miscellaneous


  • oligotag: designs a set of oligonucleotides with specified properties
  • obidistribute: distributes sequence records over several sequence record files
  • obisort: sorts sequence records according to the value of a given attribute
  • obitaxonomy: manages taxonomic databases
  • ecofind: querying a taxonomic database

Further details are at


[username@login01 ~]$ module load obitools/1.2.9
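
After loading the module, it can be worth confirming that the commands used later are actually on the PATH. The following is a minimal sketch: check_tools is a hypothetical helper name, not part of OBITools, and the tool list is taken from the Slurm example on this page.

```shell
# check_tools is a hypothetical helper, not part of OBITools.
# It reports each command that cannot be found and returns non-zero
# if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Commands used in the Slurm example; ecoPCR is added to PATH separately there.
check_tools obigrep obiuniq obiannotate ecoPCR \
    || echo "some tools are missing; load the module or extend PATH first"
```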

Slurm Example


#!/bin/bash
#SBATCH -J obitools_database
#SBATCH --ntasks-per-node 40
#SBATCH -o %N.%J.out
#SBATCH -e %N.%J.err
#SBATCH -p highmem
#SBATCH --exclusive

module load obitools/1.2.9
module load python/anaconda/4.0/2.7
export PATH=$PATH:~/applications/ecoPCR/src/

echo -e "Starttime: $(date)\n"
echo use ecoPCR to simulate an in silico PCR
mkdir -p 16S_db
echo 16S
ecoPCR -d ./db/embl_last -e 3 -l 50 -L 2000 AGRGTTYGATYMTGGCTCAG GACGGGCGGTGWGTRCA > 16S.v05.ecopcr

echo clean the database
obigrep -d ./db/embl_last --require-rank=species --require-rank=genus --require-rank=family 16S.v05.ecopcr > 16S_db/16S.v05_clean.fasta

obiuniq -d ./db/embl_last 16S_db/16S.v05_clean.fasta > 16S_db/16S.v05_clean_uniq.fasta

obigrep -d ./db/embl_last --require-rank=family 16S_db/16S.v05_clean_uniq.fasta > 16S_db/16S.v05_clean_uniq_clean.fasta

obiannotate --uniq-id 16S_db/16S.v05_clean_uniq_clean.fasta > 16S_db/db_16S.v05.fasta

echo -e "Endtime: $(date)\n"
echo done
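
Once the job has finished, a quick way to see how many sequences survive each step is to count the records in the intermediate files. The sketch below assumes the outputs are FASTA, as their extensions suggest, and counts header lines with grep; count_records is a hypothetical helper name. With the module loaded, obicount should report a comparable figure.

```shell
# count_records is a hypothetical helper: it counts FASTA records
# by counting header lines (lines starting with ">").
count_records() {
    grep -c '^>' "$1"
}

# File names are those written by the Slurm example.
for f in 16S_db/16S.v05_clean.fasta \
         16S_db/16S.v05_clean_uniq.fasta \
         16S_db/16S.v05_clean_uniq_clean.fasta \
         16S_db/db_16S.v05.fasta; do
    if [ -s "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_records "$f")"
    else
        echo "missing or empty: $f" >&2
    fi
done
```

A sharp drop between two consecutive files points at the step (rank filtering or dereplication) that removed most of the sequences.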

Further Information