Difference between revisions of "Applications/Trinityrnaseq"
From HPC
Revision as of 13:49, 24 May 2019
Application Details
- Description: Trinity assembles transcript sequences from Illumina RNA-Seq data.
- Version: 2.2.0, 2.5.1 and 2.8.2
- Modules: trinityrnaseq/gcc/2.2.0, /2.5.1 and /2.8.2
- Licence: Open source (distributed via GitHub)
Usage
Trinity is a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. It combines three independent software modules, Inchworm, Chrysalis and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works as follows:
- Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
- Chrysalis groups the Inchworm contigs into clusters and constructs a complete de Bruijn graph for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or set of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
- Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.
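To make the graph terminology above more concrete, the sketch below splits a read into overlapping k-mers, the building blocks from which a de Bruijn graph is constructed. It is a toy illustration only and not how Trinity is implemented: the helper name `kmers`, the made-up read and the value k=3 are all assumptions chosen to keep the example small.

```shell
#!/bin/bash
# Toy illustration: print every overlapping k-mer of a sequence, one per line.
# 'kmers' is a hypothetical helper, not part of Trinity.
kmers() {
    local seq=$1 k=$2 i
    for (( i = 0; i + k <= ${#seq}; i++ )); do
        printf '%s\n' "${seq:i:k}"
    done
}

# Count how often each k-mer occurs in a small made-up read.
kmers ACGTACGT 3 | sort | uniq -c
```

In this read, ACG and CGT each occur twice while GTA and TAC occur once. Counts of this kind are what allow an assembler to tell well-supported sequence apart from rare variants or errors.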
Assemble RNA-Seq data
[username@login] module add trinityrnaseq/gcc/2.8.2
[username@login] Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G
The assembled transcripts are written to 'trinity_out_dir/Trinity.fasta'.
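As a quick sanity check after a run, the number of assembled transcripts can be obtained by counting the FASTA header lines in the output file. This is a minimal sketch, assuming the default output path given above; the function name `count_transcripts` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records (lines starting with '>') in a file.
count_transcripts() {
    local fasta=$1
    # Fail early if the file is missing or empty.
    if [ ! -s "$fasta" ]; then
        echo "No assembly found at $fasta" >&2
        return 1
    fi
    grep -c '^>' "$fasta"
}

# Example (after Trinity has finished):
#   count_transcripts trinity_out_dir/Trinity.fasta
```

A missing or empty file makes the function return a non-zero status, which is convenient when the check is part of a longer job script.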
A typical script would be:
#!/bin/bash

# Decompress each input file if only the gzipped copy is present.
if [ -e reads.right.fq.gz ] && [ ! -e reads.right.fq ]; then
    gunzip -c reads.right.fq.gz > reads.right.fq
fi

if [ -e reads.left.fq.gz ] && [ ! -e reads.left.fq ]; then
    gunzip -c reads.left.fq.gz > reads.left.fq
fi

if [ -e reads2.right.fq.gz ] && [ ! -e reads2.right.fq ]; then
    gunzip -c reads2.right.fq.gz > reads2.right.fq
fi

if [ -e reads2.left.fq.gz ] && [ ! -e reads2.left.fq ]; then
    gunzip -c reads2.left.fq.gz > reads2.left.fq
fi

#######################################################
## Run Trinity to Generate Transcriptome Assemblies ##
#######################################################

Trinity --seqType fq --max_memory 2G \
    --left reads.left.fq.gz,reads2.left.fq.gz \
    --right reads.right.fq.gz,reads2.right.fq.gz \
    --SS_lib_type RF --CPU 4 --no_cleanup --normalize_reads

##### Done Running Trinity #####

# Stop here when the script is called without arguments.
if [ $# -eq 0 ]; then
    exit 0
fi
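The four near-identical decompression steps in the script above could be folded into a single helper. The following is a sketch only, assuming every input follows the `name.fq.gz` naming used above; `decompress_if_needed` is a hypothetical name and not part of Trinity.

```shell
#!/bin/bash
# Hypothetical helper: create NAME.fq from NAME.fq.gz unless NAME.fq already exists.
decompress_if_needed() {
    local fq
    for fq in "$@"; do
        if [ -e "$fq.gz" ] && [ ! -e "$fq" ]; then
            gunzip -c "$fq.gz" > "$fq"
        fi
    done
}

# File names as used in the script above; inputs without a .gz copy are skipped.
decompress_if_needed reads.left.fq reads.right.fq reads2.left.fq reads2.right.fq
```

Keeping the check in one place means that adding a further pair of read files only requires extending the argument list.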