public class CollectIndependentReplicateMetrics extends CommandLineProgram
The estimate is based on duplicate sets of size 2 and 3, and a separate estimate is produced from each set size. It assumes that the duplication rate (biological or otherwise) is independent of duplicate-set size. A significant difference between the two estimates may indicate that this assumption does not hold.
The duplicate sets are found using the mate-cigar (MC) tag, which is added by MergeBamAlignment or FixMateInformation.
This program will not work without the MC tag.
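The role of the MC tag can be illustrated with a small, self-contained sketch. It is hypothetical and is not Picard's implementation: the `Read` record, the `SetKey` fields, and the use of a precomputed 5' position for the read itself are assumptions for illustration. The point it shows is that a reverse-strand mate's 5' coordinate depends on the mate's reference span, which is only available from its CIGAR, i.e. from MC. Clipping is ignored for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Picard's code: groups paired reads into duplicate sets
 * keyed on the read's 5' position and the mate's 5' position. For a reverse-strand
 * mate, the 5' position needs the mate's reference span, taken from the MC tag.
 */
public class DuplicateSetSketch {

    // Minimal stand-in for a SAM record; mateCigar is the MC tag value, or null if absent.
    record Read(String name, String contig, int fivePrime, boolean reverse,
                String mateContig, int mateStart, boolean mateReverse, String mateCigar) {}

    // Reads assumed to come from the same original fragment share this key.
    record SetKey(String contig, int fivePrime, boolean reverse,
                  String mateContig, int mateFivePrime, boolean mateReverse) {}

    // Reference bases consumed by a CIGAR string (M, D, N, = and X consume the reference).
    static int referenceLength(String cigar) {
        int total = 0;
        int num = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                num = num * 10 + (c - '0');
                continue;
            }
            if ("MDN=X".indexOf(c) >= 0) {
                total += num;
            }
            num = 0;
        }
        return total;
    }

    static Map<SetKey, List<Read>> groupIntoSets(List<Read> reads) {
        Map<SetKey, List<Read>> sets = new HashMap<>();
        for (Read r : reads) {
            if (r.mateCigar() == null) {
                // Without MC the mate's end coordinate is unknown, so the read cannot be keyed.
                throw new IllegalArgumentException("Read " + r.name() + " lacks the MC tag");
            }
            // Mate start is the leftmost aligned base; a reverse mate's 5' end is its rightmost base.
            int mateFivePrime = r.mateReverse()
                    ? r.mateStart() + referenceLength(r.mateCigar()) - 1
                    : r.mateStart();
            SetKey key = new SetKey(r.contig(), r.fivePrime(), r.reverse(),
                    r.mateContig(), mateFivePrime, r.mateReverse());
            sets.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        return sets;
    }

    // Returns {number of size-2 sets, number of size-3 sets}; other sizes are ignored.
    static int[] countSizesTwoAndThree(Map<SetKey, List<Read>> sets) {
        int[] counts = new int[2];
        for (List<Read> set : sets.values()) {
            if (set.size() == 2) counts[0]++;
            else if (set.size() == 3) counts[1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Read> reads = List.of(
                new Read("a", "chr1", 100, false, "chr1", 300, true, "50M"),
                new Read("b", "chr1", 100, false, "chr1", 300, true, "50M"),
                new Read("c", "chr1", 250, true, "chr1", 40, false, "50M"),
                new Read("d", "chr1", 250, true, "chr1", 40, false, "50M"),
                new Read("e", "chr1", 250, true, "chr1", 40, false, "50M"),
                // Same mate start as a/b, but a shorter mate CIGAR puts its 5' end elsewhere.
                new Read("f", "chr1", 100, false, "chr1", 300, true, "30M"));
        int[] counts = countSizesTwoAndThree(groupIntoSets(reads));
        System.out.println("size-2 sets: " + counts[0] + ", size-3 sets: " + counts[1]);
    }
}
```

Here reads `a` and `b` form a size-2 set and `c`, `d`, `e` a size-3 set, while `f` is kept apart only because its MC value differs.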
An explanation of the calculation behind the estimate can be found in the IndependentReplicateMetric class.
The calculation assumes a diploid organism (more precisely, it assumes that only two alleles can appear at a HET site and that these two alleles appear with equal probability). It requires as input a VCF with genotypes for the sample in question. NOTE: This class is very much in alpha stage and still under heavy development (feel free to join!)
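To show how the two-allele, equal-probability assumption turns allele counts into a rate, here is a deliberately simplified sketch. It is not the formula in IndependentReplicateMetric: it assumes each duplicate set at a HET site is either all copies of one molecule (always showing one allele) or made entirely of independent molecules, each showing either allele with probability 1/2. Under that model, n independent reads fail to agree with probability 1 - 2(1/2)^n, which is 1/2 for n = 2 and 3/4 for n = 3, so the observed fraction of mixed-allele sets divided by that probability estimates the rate. Comparing the size-2 and size-3 estimates mirrors the consistency check described above. The class and method names are hypothetical.

```java
/**
 * Simplified, hypothetical model (not IndependentReplicateMetric's formula):
 * a duplicate set is either one molecule copied, or n independent molecules,
 * each read showing either HET allele with probability 1/2.
 */
public class ReplicateRateSketch {

    // Probability that n independent reads do not all carry the same allele: 1 - 2 * (1/2)^n.
    static double pHeterogeneousIfIndependent(int n) {
        return 1.0 - 2.0 * Math.pow(0.5, n);
    }

    // Inverts E[fraction of mixed-allele sets] = rate * pHeterogeneousIfIndependent(n).
    static double estimateRate(long heterogeneousSets, long totalSets, int setSize) {
        if (totalSets == 0) {
            return Double.NaN; // no informative sets, no estimate
        }
        double observed = (double) heterogeneousSets / totalSets;
        return observed / pHeterogeneousIfIndependent(setSize);
    }

    public static void main(String[] args) {
        // Illustrative counts only, not real data.
        double r2 = estimateRate(100, 1000, 2); // 0.10 / 0.50 = 0.2
        double r3 = estimateRate(150, 1000, 3); // 0.15 / 0.75 = 0.2
        // Similar values are consistent with a rate that does not depend on set size.
        System.out.printf("r2=%.3f r3=%.3f%n", r2, r3);
    }
}
```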
Modifier and Type | Field and Description |
---|---|
java.lang.String | BARCODE_BQ |
java.lang.String | BARCODE_TAG |
java.io.File | INPUT |
java.io.File | MATRIX_OUTPUT |
java.lang.Integer | MINIMUM_BARCODE_BQ |
java.lang.Integer | MINIMUM_BQ |
java.lang.Integer | MINIMUM_GQ |
java.lang.Integer | MINIMUM_MQ |
java.io.File | OUTPUT |
java.lang.String | SAMPLE |
java.lang.Integer | STOP_AFTER |
java.io.File | VCF |
Fields inherited from class CommandLineProgram: COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, TMP_DIR, VALIDATION_STRINGENCY, VERBOSITY
Constructor and Description |
---|
CollectIndependentReplicateMetrics() |
Modifier and Type | Method and Description |
---|---|
protected int | doWork() Do the work after command line has been parsed. |
Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getNestedOptions, getNestedOptionsForHelp, getStandardUsagePreamble, getVersion, instanceMain, instanceMainWithExit, parseArgs, setDefaultHeaders
@Option(shortName="MO", doc="Write the confusion matrix (of UMIs) to this file", optional=true) public java.io.File MATRIX_OUTPUT
@Option(shortName="GQ", doc="minimal value for the GQ field in the VCF to use variant site.", optional=true) public java.lang.Integer MINIMUM_GQ
@Option(shortName="MQ", doc="minimal value for the mapping quality of the reads to be used in the estimation.", optional=true) public java.lang.Integer MINIMUM_MQ
@Option(shortName="BQ", doc="minimal value for the base quality of a base to be used in the estimation.", optional=true) public java.lang.Integer MINIMUM_BQ
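The three quality options above could gate observations roughly as in the following hypothetical sketch. It assumes "minimal value" means an observation passes when its value is greater than or equal to the threshold, and that an allele observation is counted only when the site (GQ), the read (MQ) and the base (BQ) all pass. The `Thresholds` record and method names are illustrative, and the numbers in `main` are placeholders, not Picard defaults.

```java
/**
 * Hypothetical sketch of how MINIMUM_GQ, MINIMUM_MQ and MINIMUM_BQ might filter
 * allele observations. Assumes ">= threshold" passes; values are placeholders.
 */
public class QualityFilterSketch {

    record Thresholds(int minGq, int minMq, int minBq) {}

    // A variant site is used only if its genotype quality (GQ) reaches the threshold.
    static boolean useSite(int gq, Thresholds t) {
        return gq >= t.minGq();
    }

    // A read is used only if its mapping quality reaches the threshold.
    static boolean useRead(int mq, Thresholds t) {
        return mq >= t.minMq();
    }

    // A base at the variant site is used only if its base quality reaches the threshold.
    static boolean useBase(int bq, Thresholds t) {
        return bq >= t.minBq();
    }

    // An allele observation counts only when the site, the read and the base all pass.
    static boolean useObservation(int gq, int mq, int bq, Thresholds t) {
        return useSite(gq, t) && useRead(mq, t) && useBase(bq, t);
    }

    public static void main(String[] args) {
        Thresholds t = new Thresholds(90, 40, 17); // placeholder values
        System.out.println(useObservation(99, 60, 30, t)); // true
        System.out.println(useObservation(99, 20, 30, t)); // false: MQ 20 is below 40
    }
}
```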
@Option(shortName="ALIAS", doc="Name of sample to look at in VCF. Can be omitted if VCF contains only one sample.", optional=true) public java.lang.String SAMPLE
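The SAMPLE rule ("can be omitted if VCF contains only one sample") amounts to a small resolution step, sketched below. This is hypothetical: the method name and error messages are illustrative, and rejecting a named sample that is absent from the VCF is an assumption rather than documented behavior.

```java
import java.util.List;

/**
 * Hypothetical sketch of resolving which VCF sample to analyze: use SAMPLE if given,
 * otherwise fall back to the VCF's only sample. Error handling is an assumption.
 */
public class SampleResolutionSketch {

    static String resolveSample(String sampleOption, List<String> vcfSamples) {
        if (sampleOption != null) {
            // An explicit SAMPLE is assumed to have to appear among the VCF's samples.
            if (!vcfSamples.contains(sampleOption)) {
                throw new IllegalArgumentException("Sample " + sampleOption + " not found in VCF");
            }
            return sampleOption;
        }
        // SAMPLE may be omitted only when there is exactly one candidate.
        if (vcfSamples.size() == 1) {
            return vcfSamples.get(0);
        }
        throw new IllegalArgumentException(
                "SAMPLE must be specified when the VCF has " + vcfSamples.size() + " samples");
    }

    public static void main(String[] args) {
        System.out.println(resolveSample(null, List.of("NA12878")));    // NA12878
        System.out.println(resolveSample("B", List.of("A", "B", "C"))); // B
    }
}
```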
@Option(doc="Number of sets to examine before stopping.", optional=true) public java.lang.Integer STOP_AFTER
@Option(doc="Barcode Quality SAM tag.", optional=true) public java.lang.String BARCODE_BQ
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed.
Specified by: doWork in class CommandLineProgram