Applications/Trinityrnaseq

Latest revision as of 11:00, 16 November 2022

Application Details

  • Description: Trinity assembles transcript sequences from Illumina RNA-Seq data.
  • Version: 2.2.0, 2.5.1 and 2.8.2
  • Modules: trinityrnaseq/gcc/2.2.0, /2.5.1 and /2.8.2
  • Licence: Github, open-source

Usage

Trinity is a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. It combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works as follows:

  • Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
  • Chrysalis groups the Inchworm contigs into clusters and constructs a complete de Bruijn graph for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or set of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
  • Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.
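
As a toy illustration of the de Bruijn graph idea mentioned above (not Trinity's own implementation), the following sketch lists graph edges for short sequences read from standard input. The function name debruijn_edges and the default k of 4 are assumptions chosen for readability only.

```shell
#!/bin/bash
# Toy de Bruijn graph: every k-mer in a sequence contributes one edge from
# its (k-1)-length prefix to its (k-1)-length suffix. Identical edges are
# collapsed and reported with a count. Illustrative only.
debruijn_edges() {
    local k="${1:-4}"   # k-mer length (hypothetical default)
    awk -v k="$k" '
        {
            # Slide a window of length k along each input line.
            for (i = 1; i + k - 1 <= length($0); i++) {
                kmer = substr($0, i, k)
                edge = substr(kmer, 1, k - 1) " -> " substr(kmer, 2)
                count[edge]++
            }
        }
        END { for (e in count) print count[e], e }
    ' | sort -k2
}
```

For example, `printf 'ACGTA\n' | debruijn_edges 4` lists the edges ACG -> CGT and CGT -> GTA, each seen once.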


Assemble RNA-Seq data


[username@login] module add trinityrnaseq/gcc/2.8.2
[username@login] Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G 

The assembled transcripts are written to 'trinity_out_dir/Trinity.fasta'.
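
A quick sanity check after a run is to count the FASTA records in the output file named above. The sketch below is a minimal helper; the function name count_transcripts is an assumption, and the default path is simply the output location given on this page.

```shell
#!/bin/bash
# Count FASTA records (header lines beginning with '>') in an assembly file.
# Falls back to the output path named on this page when no argument is given.
count_transcripts() {
    local fasta="${1:-trinity_out_dir/Trinity.fasta}"
    if [ ! -s "$fasta" ]; then
        # File is missing or empty: report it and signal failure.
        echo "missing or empty: $fasta" >&2
        return 1
    fi
    grep -c '^>' "$fasta"
}
```

After an assembly finishes, `count_transcripts trinity_out_dir/Trinity.fasta` prints the number of assembled sequences, or returns a non-zero status if the file is absent.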

A typical script would be:

#!/bin/bash

if [ -e reads.right.fq.gz ] && [ ! -e reads.right.fq ]; then
    gunzip -c reads.right.fq.gz > reads.right.fq
fi

if [ -e reads.left.fq.gz ] && [ ! -e reads.left.fq ]; then
    gunzip -c reads.left.fq.gz > reads.left.fq
fi

if [ -e reads2.right.fq.gz ] && [ ! -e reads2.right.fq ]; then
    gunzip -c reads2.right.fq.gz > reads2.right.fq
fi

if [ -e reads2.left.fq.gz ] && [ ! -e reads2.left.fq ]; then
    gunzip -c reads2.left.fq.gz > reads2.left.fq
fi

#######################################################
##  Run Trinity to Generate Transcriptome Assemblies ##
#######################################################

Trinity --seqType fq --max_memory 2G \
        --left reads.left.fq.gz,reads2.left.fq.gz \
        --right reads.right.fq.gz,reads2.right.fq.gz \
        --SS_lib_type RF \
        --CPU 4 --no_cleanup --normalize_reads

##### Done Running Trinity #####

# Exit successfully when the script was called without arguments.
if [ $# -eq 0 ]; then
    exit 0
fi
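
When several read files are assembled together, as in the script above, the --left and --right options take comma-separated lists. The following sketch shows one way to build such a list and to check that each left-read file has a partner. The helper names join_csv and check_pairs are hypothetical, and the check assumes the reads*.left.fq.gz / reads*.right.fq.gz naming used in the script.

```shell
#!/bin/bash
# Join all arguments with commas, the list format passed to --left/--right.
join_csv() {
    local IFS=,
    echo "$*"
}

# Verify that every left-read file has a matching right-read file.
# Assumes names differ only by ".left." versus ".right." (as in the script).
check_pairs() {
    local left right
    for left in "$@"; do
        right="${left/.left./.right.}"
        if [ ! -e "$right" ]; then
            echo "no partner for $left (expected $right)" >&2
            return 1
        fi
    done
}
```

For instance, `join_csv reads.left.fq.gz reads2.left.fq.gz` prints `reads.left.fq.gz,reads2.left.fq.gz`, which can be passed directly to --left once `check_pairs` has succeeded on the same files.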


Further Information

  • https://github.com/trinityrnaseq/trinityrnaseq/wiki
