Applications/Obitools
Application Details
- Description: The OBITools package is a set of programs specifically designed for analysing NGS data in a DNA meta-barcoding context, taking into account taxonomic information
- Version: 1.2.9
- Module: obitools/1.2.9
- Licence: CeCILL 2.1
Usage Areas
Once installed, the OBITools enrich the Unix command line interface with a set of new commands dedicated to NGS data processing. Most of them have a name starting with the obi prefix. They automatically recognize the input file format amongst most of the standard sequence file formats (i.e. fasta, fastq, EMBL, and GenBank formats). Nevertheless, options are available to enforce some format specificity such as the encoding system used in fastq files for quality codes. Most of the basic Unix commands have their OBITools equivalent (e.g. obihead vs head, obitail vs tail, obigrep vs grep), which is convenient for scientists familiar with Unix. The main difference between any standard Unix command and its OBITools counterpart is that the treatment unit is no longer the text line but the sequence record. As a sequence record is more complex than a single text line, the OBITools programs have many supplementary options compared to their Unix equivalents.
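The difference between a text line and a sequence record can be illustrated with standard tools alone. The sketch below does not call any OBITools command: it builds a small hypothetical FASTA file (made-up sequences and attributes) and compares `head -n 2`, which returns two text lines, with an awk filter that returns the first two complete records, the unit that a command such as obihead works on.

```shell
#!/bin/bash
# Hypothetical demo file: three FASTA records, each sequence wrapped over two lines.
demo=$(mktemp)
cat > "$demo" <<'EOF'
>seq1 sample=A
ACGTACGT
ACGT
>seq2 sample=B
TTGACCAA
GGTT
>seq3 sample=A
CCCCGGGG
AAAA
EOF

# Line-oriented: head returns the first two text lines, i.e. an incomplete record.
first_lines=$(head -n 2 "$demo")

# Record-oriented: print every line until the header of record n+1 is reached.
first_records=$(awk -v n=2 '/^>/ { seen++ } seen > n { exit } { print }' "$demo")

# Count units both ways: text lines versus sequence records.
line_count=$(wc -l < "$demo" | tr -d ' ')
record_count=$(grep -c '^>' "$demo")

echo "lines: $line_count, records: $record_count"
echo "--- head -n 2 ---"
echo "$first_lines"
echo "--- first 2 records ---"
echo "$first_records"

rm -f "$demo"
```

Here `head` cuts the first record in the middle of its sequence, whereas the record-oriented filter keeps both records whole.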
Metabarcode design and quality assessment
- ecoPCR: in silico PCR
- ecoPrimers: identifies new barcode markers and their primers
- ecotaxstat: reports the coverage of an ecoPCR output compared to the original ecoPCR database
- ecotaxspecificity: evaluates barcode resolution
File format conversions
- obiconvert: converts sequence files to different output formats
- obipr2: converts the PR2 database into an ecoPCR database
- obisilva: converts the SILVA database into an ecoPCR database
- obitaxonomy: manages taxonomic databases
- obitab: converts a sequence file to a tabular file
Sequence annotations
- ecotag: assigns sequences to taxa
- obiannotate: adds/edits sequence record annotations
- obiaddtaxids: adds taxids to sequence records using an ecoPCR database
Computations on sequences
- illuminapairedend: aligns paired-end Illumina reads
- ngsfilter: assigns sequence records to the corresponding experiment/sample based on DNA tags and primers
- obicomplement: reverse-complements sequences
- obiclean: tags a set of sequences to identify PCR/sequencing errors
- obicut: trims sequences
- obijoinpairedend: joins paired-end reads
- obiuniq: groups and dereplicates sequences
Sequence sampling and filtering
- obiextract: extracts samples from a dataset
- obigrep: filters a sequence file
- obihead: extracts the first sequence records
- obisample: randomly resamples sequence records
- obiselect: selects representative sequence records
- obisplit: splits a sequence file into a set of subfiles
- obisubset: extracts a subset of samples from a sequence file
- obitail: extracts the last sequence records
Statistics over sequence file
- ecodbtaxstat: gives the taxonomic rank frequencies of a given ecoPCR database
- obicount: counts the number of sequence records
- obistat: computes basic statistics for attribute values
Utilities
- oligotag: designs a set of oligonucleotides with specified properties
- obidistribute: distributes sequence records over several sequence record files
- obisort: sorts sequence records according to the value of a given attribute
- obitaxonomy: manages taxonomic databases
- ecofind: queries a taxonomic database
Further details are at http://metabarcoding.org/obitools/doc/scripts.html
Module
[username@login01 ~]$ module add obitools/1.2.9
Slurm Example
#!/bin/bash
#SBATCH -J obitools_database
#SBATCH -N 1
#SBATCH --ntasks-per-node 40
#SBATCH -o %N.%J.out
#SBATCH -e %N.%J.err
#SBATCH -p highmem
#SBATCH --exclusive

module add obitools/1.2.9
module add python/anaconda/4.0/2.7

# Make the locally built ecoPCR binary available
export PATH=$PATH:~/applications/ecoPCR/src/

echo -e "Starttime: $(date)\n"

echo use ecoPCR to simulate an in silico PCR
mkdir -p 16S_db
echo 16S
# Up to 3 mismatches per primer (-e), amplicon length between 50 and 2000 bp (-l, -L)
ecoPCR -d ./db/embl_last -e 3 -l 50 -L 2000 AGRGTTYGATYMTGGCTCAG GACGGGCGGTGWGTRCA > 16S.v05.ecopcr

echo clean the database
# Keep only sequences annotated at species, genus and family rank
obigrep -d ./db/embl_last --require-rank=species --require-rank=genus --require-rank=family 16S.v05.ecopcr > 16S_db/16S.v05_clean.fasta
# Dereplicate identical sequences
obiuniq -d ./db/embl_last 16S_db/16S.v05_clean.fasta > 16S_db/16S.v05_clean_uniq.fasta
# Keep only dereplicated sequences that still have a family-rank annotation
obigrep -d ./db/embl_last --require-rank=family 16S_db/16S.v05_clean_uniq.fasta > 16S_db/16S.v05_clean_uniq_clean.fasta
# Ensure that every sequence record has a unique identifier
obiannotate --uniq-id 16S_db/16S.v05_clean_uniq_clean.fasta > 16S_db/db_16S.v05.fasta

echo -e "Endtime: $(date)\n"
echo done
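
Once the job has finished, it can be useful to see how many sequence records survived each cleaning step. The sketch below is one possible way to do this with grep alone, so it also runs without the module loaded. The helpers `count_records` and `report_pipeline` are hypothetical names, not part of OBITools, and the file names are the ones written by the Slurm example. With the module loaded, obicount is the OBITools command for counting records.

```shell
#!/bin/bash
# count_records is a hypothetical helper, not part of OBITools:
# it counts FASTA header lines (those starting with ">") in one file.
count_records() {
    if [ -r "$1" ]; then
        grep -c '^>' "$1"
    else
        echo "missing"
    fi
}

# Report the record count of each FASTA file produced by the Slurm example,
# in the order the script writes them. The directory defaults to 16S_db.
report_pipeline() {
    local dir=${1:-16S_db}
    local f
    for f in 16S.v05_clean.fasta 16S.v05_clean_uniq.fasta \
             16S.v05_clean_uniq_clean.fasta db_16S.v05.fasta; do
        printf '%s\t%s\n' "$f" "$(count_records "$dir/$f")"
    done
}

report_pipeline "${1:-16S_db}"
```

Each output line gives a file name and its record count, or "missing" if the file has not been written yet. The count should drop after the obiuniq step if the dataset contains duplicate sequences.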