PositionBasedDownsampleSam (PICARD JDK API Documentation)

java.lang.Object
- picard.cmdline.CommandLineProgram
- - picard.sam.PositionBasedDownsampleSam

```
public class PositionBasedDownsampleSam
extends CommandLineProgram
```
Class to downsample a SAM or BAM file while keeping read pairs intact: either both ends of a pair are discarded or neither is. The program parses each read name to extract the position within the tile from which the read came, and bases the downsampling decision on that position.
Note 1: This is technology- and read-name-dependent. If your read names do not contain coordinate information, or if your BAM contains reads from multiple technologies (flowcell versions, sequencing machines), this will not work properly. The tool was designed with Illumina MiSeq/HiSeq in mind.
Note 2: The downsampling is _not_ random. It depends deterministically on the position of the read within its tile. Specifically, it draws an ellipse that covers a fraction FRACTION of the tile's area and of each of its edges, and uses this ellipse to decide whether to keep the record. Since reads with the same name have the same position (mates, secondary and supplementary alignments), the decision is the same for all of them.
Finally, the code has been designed to simulate, as accurately as possible, sequencing less; it is not designed to produce an exact downsample fraction. In particular, since reads may be distributed unevenly within the lanes/tiles, the resulting downsampling percentage will not be exactly the value given by the input argument FRACTION. One should re-run MarkDuplicates after downsampling in order to "expose" the duplicates whose representative has been downsampled away.
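
The position-based decision described above can be illustrated with a minimal sketch. This is not Picard's implementation: the class name `TilePositionSketch`, the tile dimensions, and the assumption that the last two colon-separated fields of the read name hold the x/y coordinates are all hypothetical, and the ellipse is simply centred on the tile (which only makes sense for fractions up to about 0.785).

```java
/** Simplified illustration only; not Picard's actual implementation. */
final class TilePositionSketch {

    /**
     * Extracts x/y from a read name whose last two colon-separated fields are
     * the coordinates, e.g. "M01234:55:000000000-A1B2C:1:1101:15589:1332".
     * The seven-field layout is an assumption made for this sketch.
     */
    static int[] parseXY(String readName) {
        String[] fields = readName.split(":");
        if (fields.length < 7) {
            throw new IllegalArgumentException(
                    "No coordinate information in read name: " + readName);
        }
        int x = Integer.parseInt(fields[fields.length - 2]);
        int y = Integer.parseInt(fields[fields.length - 1]);
        return new int[] {x, y};
    }

    /**
     * Keeps a read if it lies inside an ellipse, centred on the tile, whose
     * area is `fraction` of the tile area. Intended for 0 < fraction <= pi/4,
     * where the ellipse fits inside the tile.
     */
    static boolean keep(int x, int y, int tileWidth, int tileHeight, double fraction) {
        // Semi-axes a = k*W and b = k*H with pi*a*b = fraction*W*H, so k = sqrt(fraction/pi).
        double k = Math.sqrt(fraction / Math.PI);
        double dx = (x - tileWidth / 2.0) / (k * tileWidth);
        double dy = (y - tileHeight / 2.0) / (k * tileHeight);
        return dx * dx + dy * dy <= 1.0;
    }
}
```

Because the decision depends only on the coordinates parsed from the read name, every record that shares a name (mates, secondary and supplementary alignments) receives the same answer.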

Author:

Yossi Farjoun

Field Summary

Fields
Modifier and Type	Field and Description
`boolean`	`ALLOW_MULTIPLE_DOWNSAMPLING_DESPITE_WARNINGS`
`java.lang.Double`	`FRACTION`
`java.io.File`	`INPUT`
`java.io.File`	`OUTPUT`
`static java.lang.String`	`PG_PROGRAM_NAME`
`boolean`	`REMOVE_DUPLICATE_INFORMATION`
`java.lang.Long`	`STOP_AFTER`

Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, TMP_DIR, VALIDATION_STRINGENCY, VERBOSITY

Constructor Summary

Constructors
Constructor and Description
`PositionBasedDownsampleSam()`

Method Summary

Modifier and Type	Method and Description
`protected java.lang.String[]`	`customCommandLineValidation()` Put any custom command-line validation in an override of this method.
`protected int`	`doWork()` Do the work after command line has been parsed.
`static void`	`main(java.lang.String[] args)`

Methods inherited from class picard.cmdline.CommandLineProgram
getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getNestedOptions, getNestedOptionsForHelp, getStandardUsagePreamble, getVersion, instanceMain, instanceMainWithExit, parseArgs, setDefaultHeaders

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

INPUT

@Option(shortName="I",
        doc="The input SAM or BAM file to downsample.")
public java.io.File INPUT

OUTPUT

@Option(shortName="O",
        doc="The output, downsampled, SAM or BAM file to write.")
public java.io.File OUTPUT

FRACTION

@Option(shortName="F",
        doc="The (approximate) fraction of reads to be kept, between 0 and 1.",
        optional=false)
public java.lang.Double FRACTION
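
As a rough illustration of how the three options above fit together, the sketch below assembles an argument array in the `KEY=VALUE` style. The helper class `DownsampleArgsSketch` and the file names are hypothetical, and the `KEY=VALUE` syntax is an assumption about how this version of the command-line parser reads `@Option` fields; check it against your Picard version.

```java
/** Hypothetical helper for illustration; not part of Picard. */
final class DownsampleArgsSketch {

    /** Builds an argument array naming the INPUT, OUTPUT and FRACTION options. */
    static String[] buildArgs(String input, String output, double fraction) {
        return new String[] {
            "INPUT=" + input,
            "OUTPUT=" + output,
            "FRACTION=" + fraction
        };
    }

    public static void main(String[] unused) {
        String[] args = buildArgs("in.bam", "downsampled.bam", 0.25);
        for (String arg : args) {
            System.out.println(arg);
        }
        // With Picard on the classpath, such an array could presumably be handed to
        // the inherited instanceMain method of a PositionBasedDownsampleSam instance.
    }
}
```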

STOP_AFTER

@Option(doc="Stop after processing N reads, mainly for debugging.",
        optional=true)
public java.lang.Long STOP_AFTER

ALLOW_MULTIPLE_DOWNSAMPLING_DESPITE_WARNINGS

@Option(doc="Allow Downsampling again despite this being a bad idea with possibly unexpected results.",
        optional=true)
public boolean ALLOW_MULTIPLE_DOWNSAMPLING_DESPITE_WARNINGS

REMOVE_DUPLICATE_INFORMATION

@Option(doc="Determines whether the duplicate tag should be reset since the downsampling requires re-marking duplicates.")
public boolean REMOVE_DUPLICATE_INFORMATION
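
To make concrete what resetting duplicate information could mean, here is a small sketch that assumes "duplicate tag" refers to the duplicate bit (0x400) of the SAM FLAG field. The class `DuplicateFlagSketch` is hypothetical and works on raw integer flags; it does not show how Picard itself clears the information.

```java
/** Illustration on raw SAM FLAG values; not Picard's code. */
final class DuplicateFlagSketch {

    /** Bit 0x400 of the SAM FLAG field marks a PCR or optical duplicate. */
    static final int DUPLICATE_FLAG = 0x400;

    static boolean isDuplicate(int flags) {
        return (flags & DUPLICATE_FLAG) != 0;
    }

    /** Returns the flags with the duplicate bit cleared and all other bits untouched. */
    static int clearDuplicateFlag(int flags) {
        return flags & ~DUPLICATE_FLAG;
    }
}
```

After such a reset, duplicates would need to be marked again, which matches the advice in the class description to re-run MarkDuplicates on the downsampled file.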

PG_PROGRAM_NAME

public static java.lang.String PG_PROGRAM_NAME

Constructor Detail
- PositionBasedDownsampleSam
```
public PositionBasedDownsampleSam()
```

Method Detail
- main
```
public static void main(java.lang.String[] args)
```
- customCommandLineValidation
```
protected java.lang.String[] customCommandLineValidation()
```
  Description copied from class: CommandLineProgram
  
  Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by the command-line parser can be validated.
  
  Overrides:
  
  customCommandLineValidation in class CommandLineProgram
  
  Returns:
  
  null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
- doWork
```
protected int doWork()
```
  Description copied from class: CommandLineProgram
  
  Do the work after the command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
  
  Specified by:
  
  doWork in class CommandLineProgram
  
  Returns:
  
  program exit status.
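
The return contract of customCommandLineValidation (null when valid, otherwise an array of error messages) can be sketched as follows. The class `ValidationSketch`, the method `validateFraction`, and the message wording are hypothetical; the range check is based only on the FRACTION description ("between 0 and 1") and is not a copy of the actual override.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrates the null-or-error-array contract; not Picard's actual validation. */
final class ValidationSketch {

    /** Returns null if the fraction is acceptable, otherwise the error messages. */
    static String[] validateFraction(Double fraction) {
        List<String> errors = new ArrayList<>();
        if (fraction == null) {
            errors.add("FRACTION must be specified.");
        } else if (fraction.isNaN() || fraction < 0 || fraction > 1) {
            errors.add("FRACTION must be between 0 and 1, found: " + fraction);
        }
        // An empty error list means the command line is valid, signalled by null.
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }
}
```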
