Integrated High-throughput Sequencing Data Analysis for Plants
Author: Dr. Chenjiang You
Current Version: 0.8
Latest update: 02/09/2022
pRNASeqTools is a Perl- and R-based pipeline for automated general analysis of Illumina sequencing data from supported plant species.
It currently processes and analyzes small RNA-seq, mRNA-seq, degradome-seq, CLIP-seq, ChIP-seq, and WGBS-seq data. See below for more information on specific tasks.
The phasiRNA identification module was contributed by Dr. Xuan Ma.
If you have any questions or comments, please open an issue on GitHub or email Chenjiang You directly.
To run this pipeline on your own computer or server, several software dependencies must be installed. See INSTALL.md for detailed information.
For genome reference files, please contact Chenjiang You, who can provide pre-built genomes or instructions for new genomes.
The only input files needed are Illumina reads in FASTQ format, either uncompressed or compressed as .gz or .bz2. SRR accessions are also accepted.
Note: the FASTA format is not supported. You may convert FASTA files to FASTQ by adding artificial sequence names and qualities.
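The conversion mentioned in the note can be sketched in Python as follows. This is a minimal illustration, not part of pRNASeqTools: the function names are hypothetical, the quality character "I" (Phred 40 in the Phred+33 encoding) is an arbitrary placeholder, and the fallback name read_N for empty headers is an assumption.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            # Sequence lines of one record may be wrapped; collect them all.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(lines, quality_char="I"):
    """Return FASTQ text with a constant, artificial quality for every base."""
    out = []
    for index, (name, seq) in enumerate(read_fasta(lines), start=1):
        # Fall back to an artificial name when the FASTA header is empty.
        read_name = name or f"read_{index}"
        out.append(f"@{read_name}\n{seq}\n+\n{quality_char * len(seq)}\n")
    return "".join(out)


result = fasta_to_fastq([">r1 desc", "ACGT", "AC", ">", "GG"])
print(result, end="")
```

Because the function only iterates over lines, it should also accept an open text file handle. The artificial qualities carry no real information, so any quality-based filtering downstream will treat every base as high quality.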
To see the help information, simply run pRNASeqTools.
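The example commands in this README appear to share one pattern for the values of --control and --treatment: a sample name, an equals sign, then the replicates joined by +, with the two mates of a paired-end replicate joined by a comma. The Python sketch below assembles such a value. The pattern is inferred from the examples, not from the tool's help text, and the helper name sample_spec is hypothetical.

```python
def sample_spec(name, replicates):
    """Build a 'name=rep1+rep2+...' value for --control or --treatment.

    Each replicate is either one file name (or SRR accession) given as a
    string, or a (mate1, mate2) tuple for paired-end data.
    """
    parts = []
    for rep in replicates:
        if isinstance(rep, str):
            parts.append(rep)
        else:
            # Paired-end mates are joined by a comma, as in the paired example.
            parts.append(",".join(rep))
    # Replicates are joined by a plus sign.
    return f"{name}=" + "+".join(parts)


control = sample_spec(
    "control",
    ["control_1.fastq.gz", "control_2.fastq.gz", "control_3.fastq.gz"],
)
paired = sample_spec(
    "control",
    [("control_1_R1.fastq.gz", "control_1_R2.fastq.gz")],
)
print(control)
print(paired)
```

Building the value from a list keeps the replicate order explicit, which is useful when the file names are generated by a script rather than typed by hand.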
General analysis for small RNA-seq from samples control and treatment with 3 biological replicates
pRNASeqTools srna --adaptor AGATCGGAAGAGC --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2

Only map the small RNA reads to the genome and create read count files
pRNASeqTools srna --adaptor AGATCGGAAGAGC --mapping-only --control control=control_1.fastq+control_2.fastq+SRRXXXXXXX

Perform statistical analyses in the folder containing pre-processed data
pRNASeqTools srna --nomapping --control control=3 --treatment treatment=3

General analysis for mRNA-seq from samples control and treatment with 3 biological replicates
pRNASeqTools mrna --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2

General analysis for paired-end mRNA-seq from samples control and treatment with 3 biological replicates
pRNASeqTools mrna --control control=control_1_R1.fastq.gz,control_1_R2.fastq.gz+control_2_R1.fastq.gz,control_2_R2.fastq.gz+control_3_R1.fastq.gz,control_3_R2.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2

Truncation and tailing analysis of plant miRNAs
pRNASeqTools tt --adaptor AGATCGGAAGAGC --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2

Degradome data analysis
pRNASeqTools deg --adaptor AGATCGGAAGAGC --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2

Two-factor DE analysis
pRNASeqTools tf --control control=time1,3,time2,3 --treatment treatment=time1,3,time2,3

All output files are stored in the output directory.
Mapping statistics are stored in the log file log_xxxxxxxxx.txt.
Several groups of files are generated in the output directory:
- count files and nf files can be used for later --nomapping runs, which will not invoke the mapping procedures. The second to tenth columns of count files are the numbers of assigned small RNAs of length 18 to 26 nt.
- pdf files show the reproducibility of biological replicates and the relationship among samples.
- csv files contain the results of statistical analyses; the hyper and hypo files list the significant results selected according to the input parameters.
- bedgraph files are for visualization in IGV.
Note: Keywords are embedded in the file names, indicating the targets and methods.
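The column layout described for count files amounts to a direct mapping from column position to small RNA length: column 2 holds the 18-nt counts and column 10 holds the 26-nt counts. A minimal Python sketch of that mapping follows. It assumes tab-separated fields, an identifier in the first column, and integer counts; none of these is stated in this README, and the helper name and example row are made up for illustration.

```python
def counts_by_length(row, first_length=18, last_length=26):
    """Map columns 2-10 of one count-file row to small RNA lengths 18-26 nt.

    Assumes tab-separated fields and integer counts (both are assumptions).
    """
    fields = row.rstrip("\n").split("\t")
    lengths = range(first_length, last_length + 1)
    # Column 2 (index 1) is the 18-nt count, column 10 (index 9) the 26-nt count.
    values = fields[1:1 + len(lengths)]
    if len(values) != len(lengths):
        raise ValueError("expected at least 10 columns")
    return {length: int(value) for length, value in zip(lengths, values)}


# Made-up row for illustration only, not real pipeline output.
example_row = "feature_1\t5\t0\t2\t40\t7\t1\t90\t3\t0"
counts = counts_by_length(example_row)
print(counts[21])
```

Keying the result by length rather than by column index avoids off-by-one mistakes when selecting, for example, only the 21-nt and 24-nt classes.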
- miRNA reads are categorized in the out files. The second column shows the number of tailed nucleotides and the third column shows the number of truncated nucleotides.
- pdf files are bubble plots for each miRNA.

- bam files contain the mapped reads.
- txt files report the identified peaks on each transcript.

- Mapped reads are reported in bam files and read counts for each gene in txt files.
- Up-regulated and down-regulated DEG results are reported in total.hyper.csv and total.hypo.csv files.

- The two-factor (tf) mode can only run in srna and mrna output folders.
- Up-regulated and down-regulated DEG results are reported in total.hyper.csv and total.hypo.csv files.