Applications/Trinityrnaseq

Revision as of 09:40, 19 April 2017 by Pysdlb (talk | contribs) (Pysdlb moved page Trinityrnaseq to Applications/Trinityrnaseq without leaving a redirect)


Application Details

  • Description: Trinity assembles transcript sequences from Illumina RNA-Seq data.
  • Version: 2.2.0 (compiled with 2.2.0)
  • Modules: trinityrnaseq/gcc/2.2.0
  • Licence: Open source, distributed via GitHub

Usage

Trinity is a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. It combines three independent software modules, Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. In brief, the process works as follows:

  • Inchworm assembles the RNA-seq data into unique transcript sequences, often generating a full-length transcript for the dominant isoform, but reporting only the unique portions of alternatively spliced transcripts.
  • Chrysalis groups the Inchworm contigs into clusters and constructs a complete de Bruijn graph for each cluster. Each cluster represents the full transcriptional complexity of a given gene (or of a set of genes that share sequences). Chrysalis then partitions the full read set among these disjoint graphs.
  • Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within each graph, ultimately reporting full-length transcripts for alternatively spliced isoforms and teasing apart transcripts that correspond to paralogous genes.
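
To make the de Bruijn graph idea a little more concrete, the sketch below lists the overlapping k-mers of a short sequence. It is a toy illustration only, not Trinity's implementation: the function name kmers, the example sequence, and the value of k are all assumptions chosen for readability.

```shell
#!/bin/bash

# Toy illustration (hypothetical helper, not part of Trinity):
# print every overlapping k-mer of a sequence, one per line.
kmers() {
    local seq=$1 k=$2 i
    for (( i = 0; i + k <= ${#seq}; i++ )); do
        printf '%s\n' "${seq:i:k}"
    done
}

# In one common formulation of a de Bruijn graph, each distinct k-mer is a
# node and two k-mers are linked when they overlap by k-1 bases.
# Count how often each 4-mer occurs in an example sequence.
kmers ACGTACGT 4 | sort | uniq -c
```

Repeated k-mers collapse onto the same node, which is why reads from transcripts that share sequence end up in the same graph.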


Assemble RNA-Seq data


[username@login] module add trinityrnaseq/gcc/2.2.0
[username@login] Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G 

The assembled transcripts are written to 'trinity_out_dir/Trinity.fasta'.
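
A quick sanity check after a run is to count the sequences in the output file. The following is a minimal sketch: the function name count_transcripts is hypothetical, and it assumes the default output location given above and that every assembled transcript has one FASTA header line starting with '>'.

```shell
#!/bin/bash

# Hypothetical helper: count the FASTA records in a Trinity assembly.
# Defaults to the output path used in the example above.
count_transcripts() {
    local fasta=${1:-trinity_out_dir/Trinity.fasta}
    if [ ! -s "$fasta" ]; then
        echo "No assembly found at $fasta" >&2
        return 1
    fi
    # Each transcript has exactly one header line beginning with '>'
    grep -c '^>' "$fasta"
}
```

Calling count_transcripts with no argument checks the default path; pass a different file name if the output directory was changed.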

A typical script would be:

#!/bin/bash

# Load the Trinity module
module add trinityrnaseq/gcc/2.2.0

# Keep an uncompressed copy of each read file alongside the .gz original.
# Trinity itself is given the .gz files below.
for prefix in reads.right reads.left reads2.right reads2.left; do
    if [ -e "$prefix.fq.gz" ] && [ ! -e "$prefix.fq" ]; then
        gunzip -c "$prefix.fq.gz" > "$prefix.fq"
    fi
done

#######################################################
##  Run Trinity to Generate Transcriptome Assemblies ##
#######################################################

Trinity --seqType fq --max_memory 2G \
    --left reads.left.fq.gz,reads2.left.fq.gz \
    --right reads.right.fq.gz,reads2.right.fq.gz \
    --SS_lib_type RF --CPU 4 --no_cleanup --normalize_reads

##### Done Running Trinity #####

# Exit here when the script is called without arguments
if [ $# -eq 0 ]; then
    exit 0
fi
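
The script passes several read files to --left and --right as comma-separated lists. When there are many files, the lists can be built rather than typed by hand. The sketch below is one possible way to do this; the helper name join_by_comma is hypothetical, and the command is only printed, not run, so it can be inspected first.

```shell
#!/bin/bash

# Hypothetical helper: join all arguments with commas.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# File names taken from the script above
left=$(join_by_comma reads.left.fq.gz reads2.left.fq.gz)
right=$(join_by_comma reads.right.fq.gz reads2.right.fq.gz)

# Print the resulting command line for inspection instead of running it
echo "Trinity --seqType fq --max_memory 2G --left $left --right $right --CPU 4"
```

The left and right lists must name the files in the same order so that paired reads stay matched.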



Further Information
