Introduction

Summary

RNA-seq, also known as transcriptome sequencing, uses high-throughput sequencing to measure RNA expression levels in a biological sample and to analyze changes in transcription.

The pipeline automatically analyzes RNA-seq data and produces a series of figures and result tables. Human and mouse are the supported species. The pipeline consists of five analysis steps. First, Trimmomatic or NGS QC Toolkit performs quality control. Second, reads that pass quality control are mapped with STAR or HISAT2. Third, a gene expression matrix is obtained with featureCounts, HTSeq, or RSEM. Fourth, DESeq2 uses the sample group information to identify differentially expressed genes between groups. Finally, clusterProfiler performs GO and KEGG functional enrichment analysis on the differentially expressed genes identified by DESeq2.
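The five steps and their tool options can be summarized as a small lookup table. The sketch below is illustrative only and is not the pipeline's actual code or configuration format: the names `PIPELINE_STEPS`, `SUPPORTED_SPECIES`, and `build_plan` are hypothetical, and defaulting to the first listed tool for each step is an assumption made for the example.

```python
from typing import Dict, Optional

# Tool options for each step, in the order the steps run.
# The step names are hypothetical labels chosen for this sketch.
PIPELINE_STEPS = {
    "quality_control": ("Trimmomatic", "NGS QC Toolkit"),
    "mapping": ("STAR", "HISAT2"),
    "quantification": ("featureCounts", "HTSeq", "RSEM"),
    "differential_expression": ("DESeq2",),
    "enrichment": ("clusterProfiler",),
}

SUPPORTED_SPECIES = ("human", "mouse")


def build_plan(species: str, choices: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return one tool per step, validating the species and any tool choices."""
    if species not in SUPPORTED_SPECIES:
        raise ValueError(f"unsupported species: {species!r}")

    choices = choices or {}
    unknown_steps = set(choices) - set(PIPELINE_STEPS)
    if unknown_steps:
        raise ValueError(f"unknown steps: {sorted(unknown_steps)}")

    plan = {}
    for step, tools in PIPELINE_STEPS.items():
        # Assumption: fall back to the first listed tool when none is chosen.
        tool = choices.get(step, tools[0])
        if tool not in tools:
            raise ValueError(f"{tool!r} is not an option for step {step!r}")
        plan[step] = tool
    return plan
```

For example, `build_plan("mouse", {"mapping": "HISAT2"})` would select HISAT2 for mapping and keep the first listed tool for every other step, while `build_plan("rat")` would raise a `ValueError` because only human and mouse are supported.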

The schematic diagram of the RNA-seq pipeline is as follows:

Software and references:
  • Trimmomatic: Trimmomatic: a flexible trimmer for Illumina sequence data.
  • NGS QC Toolkit: NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data.
  • STAR: STAR: ultrafast universal RNA-seq aligner.
  • HISAT2: Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.
  • featureCounts: featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.
  • HTSeq: HTSeq--a Python framework to work with high-throughput sequencing data.
  • RSEM: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.
  • DESeq2: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2.
  • clusterProfiler: clusterProfiler: an R package for comparing biological themes among gene clusters.
Submit Data