Applications/Trinityrnaseq
From HPC
Latest revision as of 11:00, 16 November 2022
Application Details
- Description: Trinity assembles transcript sequences from Illumina RNA-Seq data.
- Version: 2.2.0, 2.5.1 and 2.8.2
- Modules: trinityrnaseq/gcc/2.2.0, trinityrnaseq/gcc/2.5.1 and trinityrnaseq/gcc/2.8.2
- Licence: Open source (distributed via GitHub)
Usage
Trinity is a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. It combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works as follows:
- Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
- Chrysalis groups the Inchworm contigs into clusters and constructs a complete de Bruijn graph for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or set of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
- Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms and teasing apart transcripts that correspond to paralogous genes.
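To make the de Bruijn graph idea above more concrete, here is a toy sketch in bash. It is not Trinity's implementation: the function name `debruijn_edges` and the k-mer size used in the example are assumptions chosen for illustration. Each k-mer in a sequence is treated as an edge from its (k-1)-base prefix to its (k-1)-base suffix.

```shell
#!/bin/bash
# Toy illustration only; Trinity's own graph construction is far more involved.
# debruijn_edges is a hypothetical helper, not a Trinity command.
debruijn_edges() {
    local seq="$1" k="$2" i kmer
    # Slide a window of length k along the sequence.
    for (( i = 0; i + k <= ${#seq}; i++ )); do
        kmer="${seq:i:k}"
        # Edge: first k-1 bases -> last k-1 bases of the k-mer.
        echo "${kmer:0:k-1} -> ${kmer:1}"
    done
}

debruijn_edges ACGTAC 4
```

For the sequence ACGTAC with k = 4, this prints the three edges `ACG -> CGT`, `CGT -> GTA` and `GTA -> TAC`. Reads that overlap share nodes, which is how the graph links them into longer paths.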
Assemble RNA-Seq data
[username@login] module add trinityrnaseq/gcc/2.8.2
[username@login] Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G
The assembled transcripts are written to 'trinity_out_dir/Trinity.fasta'.
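A quick sanity check after a run is to count the FASTA records in the output file. The sketch below assumes the default output location given above; `count_transcripts` is a hypothetical helper name, not part of Trinity.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records by counting header lines.
count_transcripts() {
    local fasta="$1"
    # Every FASTA record starts with a '>' header line.
    # grep -c exits non-zero when the count is 0, so '|| true' keeps
    # scripts that use 'set -e' from stopping on an empty file.
    grep -c '^>' "$fasta" || true
}

# Report only if the output file from the run above exists.
if [ -e trinity_out_dir/Trinity.fasta ]; then
    echo "Assembled transcripts: $(count_transcripts trinity_out_dir/Trinity.fasta)"
fi
```

A count of zero, or a missing file, usually means the run did not finish, so the Trinity log output is the next place to look.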
A typical script would be:
#!/bin/bash

# Decompress any gzipped read files that have not been unpacked yet.
for name in reads.right reads.left reads2.right reads2.left; do
    if [ -e "$name.fq.gz" ] && [ ! -e "$name.fq" ]; then
        gunzip -c "$name.fq.gz" > "$name.fq"
    fi
done

#######################################################
## Run Trinity to Generate Transcriptome Assemblies ##
#######################################################

# Multiple read files per side are given as comma-separated lists.
Trinity --seqType fq --max_memory 2G \
    --left reads.left.fq.gz,reads2.left.fq.gz \
    --right reads.right.fq.gz,reads2.right.fq.gz \
    --SS_lib_type RF --CPU 4 --no_cleanup --normalize_reads

##### Done Running Trinity #####

# Stop here when the script was called without arguments.
if [ $# -eq 0 ]; then
    exit 0
fi
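When there are many read files, the comma-separated `--left` and `--right` lists can be built rather than typed by hand. The following is a minimal sketch, assuming the `<sample>.left.fq.gz` / `<sample>.right.fq.gz` naming used in the script above and file names without spaces. The helper names `join_by_comma` and `trinity_pair_args` are hypothetical, and the example prints the command instead of running it so it can be checked first.

```shell
#!/bin/bash
# Hypothetical helper: join all arguments with commas.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Hypothetical helper: print "--left a,b --right c,d" for every left-read
# file in a directory that has a matching right-read file.
trinity_pair_args() {
    local dir="$1" left right
    local lefts=() rights=()
    for left in "$dir"/*.left.fq.gz; do
        [ -e "$left" ] || continue          # the glob matched nothing
        right="${left%.left.fq.gz}.right.fq.gz"
        if [ ! -e "$right" ]; then
            echo "Skipping $left: no mate file $right" >&2
            continue
        fi
        lefts+=("$left")
        rights+=("$right")
    done
    # Fail if no complete pair was found.
    [ "${#lefts[@]}" -gt 0 ] || return 1
    echo "--left $(join_by_comma "${lefts[@]}") --right $(join_by_comma "${rights[@]}")"
}

# Print the assembled command for the current directory without running it.
if args=$(trinity_pair_args .); then
    echo "Trinity --seqType fq --max_memory 2G $args --SS_lib_type RF --CPU 4"
fi
```

Keeping the two lists in the same order matters, because the files are paired by position. Building both from the same loop guarantees that.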