
qProfiler2

Christina.xu edited this page Apr 22, 2020 · 24 revisions

Introduction

qprofiler2 is a standalone Java application that produces summary metrics for common file types used in next-generation sequencing. It can process BAM, FASTQ and VCF files, and in all cases the output is an XML file containing basic summary statistics. It is a newer version of qprofiler with many more features, including a VCF mode and a vastly expanded BAM mode.

requirements

  • Java 1.8
  • Multi-core machine (ideally) and 10GB of RAM
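The memory guideline can be checked on Linux before starting a long run. The function below is a minimal sketch and is not part of qprofiler2. The name qp2_mem_ok is hypothetical, the function assumes a Linux-style /proc/meminfo, and it reads "10GB" as 10 GiB.

```shell
# Hypothetical helper (not part of qprofiler2), Linux only.
# Succeeds if the MemTotal line of a meminfo-style file reports at least
# the requested number of GiB; fails otherwise.
# Example: qp2_mem_ok /proc/meminfo 10 || echo "warning: less than 10 GiB of RAM" >&2
qp2_mem_ok() {
  awk -v need_gib="$2" '
    /^MemTotal:/ { found = 1; kb = $2 }   # MemTotal is reported in kB
    END { exit !(found && kb >= need_gib * 1024 * 1024) }
  ' "$1"
}
```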

building qprofiler2

# First, clone the adamajava repository
git clone https://github.com/AdamaJava/adamajava

# Then move into the adamajava folder
cd adamajava

# Run gradle to build qprofiler2 and its dependent jar files
./gradlew :qprofiler2:build

This creates the qprofiler2 jar file, along with its dependent jars, in the qprofiler2/build/flat folder.

usage & options

java -jar qprofiler2.jar -h
usage: qprofiler2 [option...] --log logfile --loglevel INFO --output outputfile --input inputfile1 --input inputfile2 ... [-ntP 4 -ntC 16] 

Option                  Description                           
-------------------------------------------------                          
--format                VCF mode only; group VCF records according to user-specified format fields.
--fullBamHeader         Output whole BAM header in XML report; default is to only output HD and SQ lines                       
--help                  Shows this help message.            
--index                 File containing index data relating to --input file               
--input                 File containing data to be profiled (currently limited to BAM/SAM, FASTQ and VCF).
--log                   Log output file.       
--loglevel <LEVEL>      Logging level, e.g. INFO, DEBUG. Default=INFO.
--maxRecords <Integer>  Only process the first <Integer> records in the BAM file.                       
--ntConsumer <Integer>  Number of consumer threads created to process the input file (BAM files only).
--ntProducer <Integer>  Number of producer threads created to read the input file. Default=1.
--output                XML report file output by qprofiler2. Default=qprofiler2.xml.
--validation            How strict to be when reading a SAM or BAM. Possible values: {STRICT, LENIENT, SILENT}.                    
--version               Print version info.    
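
Because --input can be repeated, a small wrapper can assemble the command for several files. The following is a minimal POSIX sh sketch. qp2_cmd is a hypothetical helper, and it assumes that qprofiler2.jar is in the current directory and that no path contains spaces. It only prints the command, so the command can be checked before it is run.

```shell
# Hypothetical helper (not part of qprofiler2): print a qprofiler2 command
# that profiles one or more input files into a single XML report.
# Usage: qp2_cmd <output.xml> <logfile> <input1> [input2 ...]
# Assumes qprofiler2.jar is in the current directory and no path contains spaces.
qp2_cmd() {
  _out=$1
  _log=$2
  shift 2
  _cmd="java -jar qprofiler2.jar --output $_out --log $_log"
  for _f in "$@"; do
    _cmd="$_cmd --input $_f"   # one --input per file, as in the usage line
  done
  printf '%s\n' "$_cmd"
}
```

After checking the printed command, copy it into the shell to run it.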

NOTE:

  • BAM files mapped by BWA may need to be run with the optional parameter --validation SILENT; otherwise Picard will throw an exception.
  • If --output is not specified, output will be written to a default file (qprofiler2.xml) in the current directory.
  • If --ntConsumer and --ntProducer are not specified, qprofiler2 will run in single-threaded mode.
  • When running multi-threaded, we suggest using more consumer threads than producer threads, with a recommended ratio of 6:1, although the best setting depends on your machine. For example, the following command uses one thread (the default) to read the input file and 12 threads to process reads:
java -jar qprofiler2.jar -ntC 12 --input $somedir/$bam --output $somedir/${bam}.qp2.xml --log $somedir/${bam}.qp2.log
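
The suggested 6:1 ratio can be turned into concrete -ntP and -ntC values from a total thread budget. The sketch below shows one way to do this under stated assumptions. qp2_threads is a hypothetical helper that always keeps at least one producer and one consumer. The nproc command in the example comment comes from GNU coreutils. When the result has more than one producer, --index must also be given.

```shell
# Hypothetical helper (not part of qprofiler2): split a total thread budget
# into producer and consumer counts at roughly 1 producer per 6 consumers.
# Prints "<ntP> <ntC>". At least one of each is kept, so a budget of 1
# yields "1 1".
# Example: set -- $(qp2_threads "$(nproc)"); java -jar qprofiler2.jar -ntP "$1" -ntC "$2" ...
qp2_threads() {
  _total=$1
  _ntP=$(( _total / 7 ))        # each producer accounts for 7 threads: itself plus 6 consumers
  if [ "$_ntP" -lt 1 ]; then
    _ntP=1
  fi
  _ntC=$(( _total - _ntP ))
  if [ "$_ntC" -lt 1 ]; then
    _ntC=1
  fi
  printf '%s %s\n' "$_ntP" "$_ntC"
}
```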

If multiple producer threads are used, please also specify the BAM index file with --index, e.g.:

java -jar qprofiler2.jar -ntC 12 -ntP 2 --index $somedir/${bam}.bai --input $somedir/$bam --output $somedir/${bam}.qp2.xml --log $somedir/${bam}.qp2.log
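
Index files are commonly named either <name>.bam.bai or <name>.bai. The sketch below looks for both names. It requests multiple producers only when an index is found and falls back to a single producer otherwise. qp2_find_index and qp2_producer_opts are hypothetical helpers. The two naming conventions are an assumption about how the index was created, and paths are assumed to contain no spaces.

```shell
# Hypothetical helper (not part of qprofiler2): print the path of an index
# for the given BAM, trying "<name>.bam.bai" first and then "<name>.bai".
# Returns 1 if neither file exists.
qp2_find_index() {
  _bam=$1
  for _idx in "$_bam.bai" "${_bam%.bam}.bai"; do
    if [ -f "$_idx" ]; then
      printf '%s\n' "$_idx"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print the producer options for <bam> and a requested
# producer count <n>. More than one producer is used only if an index exists.
qp2_producer_opts() {
  _bam=$1
  _n=$2
  if [ "$_n" -gt 1 ] && _idx=$(qp2_find_index "$_bam"); then
    printf '%s\n' "-ntP $_n --index $_idx"
  else
    printf '%s\n' "-ntP 1"
  fi
}
```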
