Application Details

  • Description: ABySS is a de novo sequence assembler intended for short paired-end reads and large genomes.
  • Version: 1.5.2 (gcc-4.9.3)
  • Modules: abyss/1.5.2/gcc-4.9.3
  • Licence: Free, open-source

Usage Examples

Assemble a small synthetic data set

[username@login] module load abyss/1.5.2/gcc-4.9.3
[username@login] wget
[username@login] tar xzvf test-data.tar.gz
[username@login] abyss-pe k=25 name=test \
    in='test-data/reads1.fastq test-data/reads2.fastq'

Calculate assembly contiguity statistics

[username@login] module load abyss/1.5.2/gcc-4.9.3
[username@login] abyss-fac test-unitigs.fa
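
One of the contiguity statistics commonly quoted for an assembly is N50: the contig length L such that contigs of length L or longer contain at least half of the assembled bases. The sketch below is only an illustration of that definition; the n50 helper is hypothetical and is not part of ABySS. abyss-fac may apply its own minimum-length threshold, so its figures can differ from a naive calculation like this one.

```shell
# n50: read one contig length per line on stdin and print the N50.
# Hypothetical helper for illustration; not part of ABySS.
n50() {
    # Sort lengths from longest to shortest, then accumulate until the
    # running sum reaches half of the total number of bases.
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (running * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Example: five contigs totalling 100 bases.
printf '%s\n' 10 40 20 25 5 | n50
```

In this example the two longest contigs (40 and 25) together cover 65 of the 100 bases, so the reported N50 is 25.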

Parallel processing

The np option of abyss-pe specifies the number of processes to use for the parallel MPI job. Without any MPI configuration, this will allow you to use multiple cores on a single machine. To use multiple machines for assembly, you must create a hostfile for mpirun, which is described in the mpirun man page.

Do not run mpirun -np 8 abyss-pe. To run ABySS with 8 MPI processes, use abyss-pe np=8. The abyss-pe driver script starts the MPI processes itself, like so: mpirun -np 8 ABYSS-P.

The paired-end assembly stage is multithreaded, but must run on a single machine. The number of threads to use may be specified with the parameter j. The default value for j is the value of np.
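
As a minimal sketch of how np and j combine, the hypothetical helper abyss_cmd below prints an abyss-pe command line instead of running it. It assumes the k, name and in values from the test-data example earlier on this page, and it mirrors the documented default by falling back to np when j is not given.

```shell
# abyss_cmd: print (do not run) an abyss-pe command line.
# Arguments: np [j]. Hypothetical helper for illustration only.
abyss_cmd() {
    np=$1
    j=${2:-$np}   # j defaults to the value of np
    printf "abyss-pe np=%s j=%s k=25 name=test in='test-data/reads1.fastq test-data/reads2.fastq'\n" "$np" "$j"
}

abyss_cmd 8      # 8 MPI processes, 8 threads for the paired-end stage
abyss_cmd 8 4    # 8 MPI processes, 4 threads for the paired-end stage
```

Setting j lower than np can be useful when the paired-end stage should use fewer threads than the MPI stage uses processes.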

Note: this example is run interactively on a high-memory node. Access to such a node is normally obtained through the scheduler.

[username@c230 ~]$ module add abyss/1.5.2/gcc-4.9.3
[username@c230 ~]$ abyss-pe np=40 name=test k=48 n=8 in='test-1.fa test-3.fa'

Submitted through SLURM, the equivalent batch script is:

#!/bin/bash
#SBATCH -J abyss
#SBATCH -p highmem
#SBATCH --ntasks-per-node=40
#SBATCH -o %N.%j.%a.out
#SBATCH -e %N.%j.%a.err
#SBATCH --exclusive
#SBATCH -t 00:30:00

module purge
module load abyss/1.5.2/gcc-4.9.3

# Run your ABySS commands (np matches the 40 tasks requested above)

abyss-pe np=40 name=test k=48 n=8 in='test-1.fa test-3.fa'
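
The batch script requests 40 tasks per node, so the np value passed to abyss-pe should agree with that request. A possible way to keep the two in step, sketched below, is to read the SLURM_NTASKS_PER_NODE environment variable that SLURM exports when --ntasks-per-node is set. The pick_np helper and the fallback value of 1 are assumptions made for this illustration, and the command is only printed here rather than executed.

```shell
# pick_np: print the number of processes to hand to abyss-pe.
# Uses SLURM_NTASKS_PER_NODE when it is set, otherwise falls back to 1
# (an assumed default for runs outside a SLURM job).
pick_np() {
    echo "${SLURM_NTASKS_PER_NODE:-1}"
}

NP=$(pick_np)
# Print the command that the job script would run.
echo "abyss-pe np=${NP} name=test k=48 n=8 in='test-1.fa test-3.fa'"
```

With this approach, changing the --ntasks-per-node line is enough to change the process count used by the assembly.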

Further Information
