MarkDuplicates (picard 2.10.9-1-gaa8c979-SNAPSHOT API)

java.lang.Object
- picard.cmdline.CommandLineProgram
  - picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
    - picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
      - picard.sam.markduplicates.MarkDuplicates

Direct Known Subclasses:

SimpleMarkDuplicatesWithMateCigar
```
@DocumentedFeature
public class MarkDuplicates
extends AbstractMarkDuplicatesCommandLineProgram
```
A better duplication marking algorithm that handles all cases including clipped and gapped alignments.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`MarkDuplicates.DuplicateTaggingPolicy` Enum used to control how duplicates are flagged in the DT optional tag on each read.
`static class`	`MarkDuplicates.DuplicateType` Enum for the possible values that a duplicate read can be tagged with in the DT attribute.

Nested classes/interfaces inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
AbstractMarkDuplicatesCommandLineProgram.SamHeaderAndIterator

Field Summary

Fields
Modifier and Type	Field and Description
`java.lang.String`	`BARCODE_TAG`
`static java.lang.String`	`DUPLICATE_SET_INDEX_TAG` The attribute in the SAM/BAM file used to store which read was selected as the representative of a duplicate set.
`static java.lang.String`	`DUPLICATE_SET_SIZE_TAG` The attribute in the SAM/BAM file used to store the size of a duplicate set.
`static java.lang.String`	`DUPLICATE_TYPE_LIBRARY` The duplicate type tag value for duplicate type: library.
`static java.lang.String`	`DUPLICATE_TYPE_SEQUENCING` The duplicate type tag value for duplicate type: sequencing (optical & pad-hopping, or "co-localized").
`static java.lang.String`	`DUPLICATE_TYPE_TAG` The optional attribute in SAM/BAM files used to store the duplicate type.
`protected LibraryIdGenerator`	`libraryIdGenerator`
`int`	`MAX_FILE_HANDLES_FOR_READ_ENDS_MAP`
`int`	`MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP` This option is obsolete; read ends are now always spilled to disk. Formerly, if the SAM file had more than this many sequences, read ends were not spilled to disk because there would not be enough file handles.
`java.lang.String`	`READ_ONE_BARCODE_TAG`
`java.lang.String`	`READ_TWO_BARCODE_TAG`
`boolean`	`REMOVE_SEQUENCING_DUPLICATES`
`double`	`SORTING_COLLECTION_SIZE_RATIO`
`boolean`	`TAG_DUPLICATE_SET_MEMBERS`
`MarkDuplicates.DuplicateTaggingPolicy`	`TAGGING_POLICY`

Fields inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
ASSUME_SORT_ORDER, ASSUME_SORTED, COMMENT, DUPLICATE_SCORING_STRATEGY, INPUT, METRICS_FILE, OUTPUT, pgIdsSeen, PROGRAM_GROUP_COMMAND_LINE, PROGRAM_GROUP_NAME, PROGRAM_GROUP_VERSION, PROGRAM_RECORD_ID, REMOVE_DUPLICATES

Fields inherited from class picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
LOG, MAX_OPTICAL_DUPLICATE_SET_SIZE, OPTICAL_DUPLICATE_PIXEL_DISTANCE, opticalDuplicateFinder, READ_NAME_REGEX

Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY

Constructor Summary

Constructors
Constructor and Description

`MarkDuplicates()`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected int`	`doWork()` Main work method.
`static void`	`main(java.lang.String[] args)` Stock main method.
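As an illustration of what `main(java.lang.String[] args)` receives, the sketch below assembles an argument array from the field names listed on this page (`INPUT`, `OUTPUT`, `METRICS_FILE`, `TAG_DUPLICATE_SET_MEMBERS`). It assumes the legacy `KEY=value` argument syntax; the `MarkDuplicatesArgs` helper and the file names are hypothetical and not part of Picard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Picard: builds KEY=value arguments
// of the kind that could be passed to MarkDuplicates.main(String[]).
final class MarkDuplicatesArgs {

    static String[] build(String input, String output, String metrics, boolean tagSetMembers) {
        List<String> args = new ArrayList<String>();
        // Argument names mirror the public fields documented on this page.
        args.add("INPUT=" + input);
        args.add("OUTPUT=" + output);
        args.add("METRICS_FILE=" + metrics);
        if (tagSetMembers) {
            args.add("TAG_DUPLICATE_SET_MEMBERS=true");
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        // File names are placeholders.
        for (String arg : build("in.bam", "marked.bam", "metrics.txt", true)) {
            System.out.println(arg);
        }
    }
}
```

With Picard on the classpath, the resulting array could be handed to `MarkDuplicates.main`; check the argument syntax against the Picard version in use.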

Methods inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
finalizeAndWriteMetrics, getChainedPgIds, openInputs, trackOpticalDuplicates

Methods inherited from class picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
customCommandLineValidation, setupOpticalDuplicateFinder

Methods inherited from class picard.cmdline.CommandLineProgram
getCommandLine, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

DUPLICATE_TYPE_TAG
```
public static final java.lang.String DUPLICATE_TYPE_TAG
```
The optional attribute in SAM/BAM files used to store the duplicate type.

See Also:

Constant Field Values

DUPLICATE_TYPE_LIBRARY
```
public static final java.lang.String DUPLICATE_TYPE_LIBRARY
```
The duplicate type tag value for duplicate type: library.

See Also:

Constant Field Values

DUPLICATE_TYPE_SEQUENCING
```
public static final java.lang.String DUPLICATE_TYPE_SEQUENCING
```
The duplicate type tag value for duplicate type: sequencing (optical & pad-hopping, or "co-localized").

See Also:

Constant Field Values

DUPLICATE_SET_INDEX_TAG
```
public static final java.lang.String DUPLICATE_SET_INDEX_TAG
```
The attribute in the SAM/BAM file used to store which read was selected as the representative of a duplicate set.

See Also:

Constant Field Values

DUPLICATE_SET_SIZE_TAG
```
public static final java.lang.String DUPLICATE_SET_SIZE_TAG
```
The attribute in the SAM/BAM file used to store the size of a duplicate set.

See Also:

Constant Field Values

MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP

@Argument(shortName="MAX_SEQS",
          doc="This option is obsolete. ReadEnds will always be spilled to disk.")
public int MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP

This option is obsolete; read ends are now always spilled to disk. Formerly, if the SAM file had more than this many sequences, read ends were not spilled to disk because there would not be enough file handles.

MAX_FILE_HANDLES_FOR_READ_ENDS_MAP

@Argument(shortName="MAX_FILE_HANDLES",
          doc="Maximum number of file handles to keep open when spilling read ends to disk. Set this number a little lower than the per-process maximum number of file that may be open. This number can be found by executing the \'ulimit -n\' command on a Unix system.")
public int MAX_FILE_HANDLES_FOR_READ_ENDS_MAP

SORTING_COLLECTION_SIZE_RATIO

@Argument(doc="This number, plus the maximum RAM available to the JVM, determine the memory footprint used by some of the sorting collections.  If you are running out of memory, try reducing this number.")
public double SORTING_COLLECTION_SIZE_RATIO
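To make the relationship between this ratio and the JVM heap concrete, here is a minimal sketch. It assumes the number of records a sorting collection holds in RAM is roughly (maximum heap × ratio) / bytes per record. The class name, the `bytesPerRecord` parameter, and the sample values are illustrative assumptions, not Picard's actual figures.

```java
// Hypothetical sizing helper, not Picard code.
final class SortingCollectionSizing {

    // Assumed relationship: records in RAM = (max heap * ratio) / bytes per record.
    static int maxRecordsInMemory(long maxHeapBytes, double sizeRatio, int bytesPerRecord) {
        if (bytesPerRecord <= 0) {
            throw new IllegalArgumentException("bytesPerRecord must be positive");
        }
        return (int) ((maxHeapBytes * sizeRatio) / bytesPerRecord);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        // Halving the ratio halves the in-memory record budget.
        System.out.println(maxRecordsInMemory(heap, 0.25, 64));
        System.out.println(maxRecordsInMemory(heap, 0.125, 64));
    }
}
```

Under this assumption, lowering the ratio shrinks the in-memory buffer and forces earlier spilling to disk, which is why reducing it can help when memory runs out.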

BARCODE_TAG

@Argument(doc="Barcode SAM tag (ex. BC for 10X Genomics)",
          optional=true)
public java.lang.String BARCODE_TAG

READ_ONE_BARCODE_TAG

@Argument(doc="Read one barcode SAM tag (ex. BX for 10X Genomics)",
          optional=true)
public java.lang.String READ_ONE_BARCODE_TAG

READ_TWO_BARCODE_TAG

@Argument(doc="Read two barcode SAM tag (ex. BX for 10X Genomics)",
          optional=true)
public java.lang.String READ_TWO_BARCODE_TAG

TAG_DUPLICATE_SET_MEMBERS

@Argument(doc="If a read appears in a duplicate set, add two tags. The first tag, DUPLICATE_SET_SIZE_TAG (DS), indicates the size of the duplicate set. The smallest possible DS value is 2 which occurs when two reads map to the same portion of the reference only one of which is marked as duplicate. The second tag, DUPLICATE_SET_INDEX_TAG (DI), represents a unique identifier for the duplicate set to which the record belongs. This identifier is the index-in-file of the representative read that was selected out of the duplicate set.",
          optional=true)
public boolean TAG_DUPLICATE_SET_MEMBERS
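The two tags described above can be sketched as follows: every member of a duplicate set receives the same DS (set size) and the same DI (index-in-file of the representative read). The `DuplicateSetTags` class and its method are hypothetical, and records are modelled as plain file indices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the DS and DI tag values; not Picard code.
final class DuplicateSetTags {

    // Maps each member's index-in-file to {DS, DI}.
    static Map<Integer, int[]> tagsFor(List<Integer> memberIndices, int representativeIndex) {
        Map<Integer, int[]> tags = new HashMap<Integer, int[]>();
        if (memberIndices.size() < 2) {
            return tags; // a duplicate set has at least two reads, so DS >= 2
        }
        if (!memberIndices.contains(representativeIndex)) {
            throw new IllegalArgumentException("representative must belong to the set");
        }
        for (Integer index : memberIndices) {
            // DS = size of the set, DI = index-in-file of the representative read.
            tags.put(index, new int[] { memberIndices.size(), representativeIndex });
        }
        return tags;
    }
}
```

For a set of reads at file indices 4, 9 and 17 with read 9 as the representative, each of the three reads would carry DS=3 and DI=9.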

REMOVE_SEQUENCING_DUPLICATES

@Argument(doc="If true remove \'optical\' duplicates and other duplicates that appear to have arisen from the sequencing process instead of the library preparation process, even if REMOVE_DUPLICATES is false. If REMOVE_DUPLICATES is true, all duplicates are removed and this option is ignored.")
public boolean REMOVE_SEQUENCING_DUPLICATES
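The interaction between `REMOVE_DUPLICATES` and `REMOVE_SEQUENCING_DUPLICATES` described above reduces to a small decision rule. The sketch below is one reading of that description; the class and parameter names are hypothetical.

```java
// Hypothetical decision rule derived from the option descriptions; not Picard code.
final class DuplicateRemovalRule {

    // Returns true if the read should be written to the output file.
    static boolean shouldWrite(boolean isDuplicate,
                               boolean isSequencingDuplicate,
                               boolean removeDuplicates,
                               boolean removeSequencingDuplicates) {
        if (!isDuplicate) {
            return true; // non-duplicates are always kept
        }
        if (removeDuplicates) {
            return false; // all duplicates removed; the other option is ignored
        }
        // Only sequencing (e.g. optical) duplicates are removed here.
        return !(removeSequencingDuplicates && isSequencingDuplicate);
    }
}
```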

TAGGING_POLICY

@Argument(doc="Determines how duplicate types are recorded in the DT optional attribute.")
public MarkDuplicates.DuplicateTaggingPolicy TAGGING_POLICY
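One way to picture a tagging policy is as a function from (policy, duplicate type) to the value written in the DT attribute, or to no tag at all. The sketch below is an assumption-laden illustration: the `Policy` constants are invented stand-ins for `MarkDuplicates.DuplicateTaggingPolicy`, and the strings `"LB"` and `"SQ"` are assumed values for `DUPLICATE_TYPE_LIBRARY` and `DUPLICATE_TYPE_SEQUENCING`; consult Constant Field Values for the real ones.

```java
// Hypothetical model of a DT tagging policy; names and values are assumptions.
final class DuplicateTypeTagging {

    enum Policy { DONT_TAG, SEQUENCING_ONLY, ALL } // invented stand-in constants
    enum Kind { LIBRARY, SEQUENCING }

    static final String LIBRARY_VALUE = "LB";     // assumed DT value
    static final String SEQUENCING_VALUE = "SQ";  // assumed DT value

    // Returns the DT value for a duplicate read, or null when no tag is written.
    static String dtValue(Policy policy, Kind kind) {
        switch (policy) {
            case ALL:
                return kind == Kind.SEQUENCING ? SEQUENCING_VALUE : LIBRARY_VALUE;
            case SEQUENCING_ONLY:
                return kind == Kind.SEQUENCING ? SEQUENCING_VALUE : null;
            default:
                return null;
        }
    }
}
```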

libraryIdGenerator

protected LibraryIdGenerator libraryIdGenerator

Constructor Detail
- MarkDuplicates
```
public MarkDuplicates()
```

Method Detail
- main
```
public static void main(java.lang.String[] args)
```
  Stock main method.
- doWork
```
protected int doWork()
```
  Main work method. Reads the BAM file once and collects sorted information about the 5' ends of both ends of each read pair (or just the one end for unpaired reads). It then makes a pass through that information to determine duplicates, before re-reading the input file and writing it out with the duplicate flags set correctly.
  
  Specified by: doWork in class CommandLineProgram
  
  Returns: program exit status.
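The flow described for `doWork()` (collect sorted 5' end information, determine duplicates, then re-read the input and set flags) can be sketched in a heavily simplified form. Everything below is an illustrative assumption: the `ReadEnd` fields, the grouping key, and the "highest score wins" rule stand in for the real logic, and pairs, clipping, optical duplicates, and spilling to disk are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical sketch of a collect-sort-mark-rewrite flow; not Picard code.
final class TwoPassDuplicateSketch {

    // Illustrative stand-in for the information kept about one read end.
    static final class ReadEnd {
        final int indexInFile;
        final String library;
        final int contig;
        final int fivePrimePos;
        final boolean reverse;
        final int score;

        ReadEnd(int indexInFile, String library, int contig, int fivePrimePos, boolean reverse, int score) {
            this.indexInFile = indexInFile;
            this.library = library;
            this.contig = contig;
            this.fivePrimePos = fivePrimePos;
            this.reverse = reverse;
            this.score = score;
        }
    }

    // Reads with equal keys are candidates for the same duplicate set.
    static final Comparator<ReadEnd> BY_KEY = new Comparator<ReadEnd>() {
        public int compare(ReadEnd a, ReadEnd b) {
            int c = a.library.compareTo(b.library);
            if (c == 0) c = Integer.compare(a.contig, b.contig);
            if (c == 0) c = Integer.compare(a.fivePrimePos, b.fivePrimePos);
            if (c == 0) c = Boolean.compare(a.reverse, b.reverse);
            return c;
        }
    };

    // Sort the collected ends, then mark every read but the best-scoring one per run.
    static Set<Integer> findDuplicateIndices(List<ReadEnd> input) {
        List<ReadEnd> sorted = new ArrayList<ReadEnd>(input);
        Collections.sort(sorted, BY_KEY);
        Set<Integer> duplicates = new HashSet<Integer>();
        int start = 0;
        while (start < sorted.size()) {
            int end = start + 1;
            while (end < sorted.size() && BY_KEY.compare(sorted.get(start), sorted.get(end)) == 0) {
                end++;
            }
            ReadEnd best = sorted.get(start);
            for (int i = start + 1; i < end; i++) {
                if (sorted.get(i).score > best.score) best = sorted.get(i);
            }
            for (int i = start; i < end; i++) {
                if (sorted.get(i) != best) duplicates.add(sorted.get(i).indexInFile);
            }
            start = end;
        }
        return duplicates;
    }

    // "Re-read" the input in file order and report the duplicate flag for each record.
    static boolean[] flagsInFileOrder(List<ReadEnd> input) {
        Set<Integer> duplicates = findDuplicateIndices(input);
        boolean[] flags = new boolean[input.size()];
        for (int i = 0; i < input.size(); i++) {
            flags[i] = duplicates.contains(input.get(i).indexInFile);
        }
        return flags;
    }
}
```

Keeping only compact per-read keys between the two reads of the input is what makes the approach workable on large files: the full records are never all held at once, only the sorted end information and the resulting set of duplicate indices.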
