Gamgee
You miserable little maggot. I'll stove your head in!
gamgee::Sam Class Reference

Utility class to manipulate a Sam record.

#include <sam.h>

Public Member Functions

 Sam ()=default
 initializes a null Sam.
 
 Sam (const std::shared_ptr< bam_hdr_t > &header, const std::shared_ptr< bam1_t > &body) noexcept
 creates a sam record that points to htslib memory already allocated
 
 Sam (const Sam &other)
 creates a deep copy of a sam record
 
Sam & operator= (const Sam &other)
 creates a deep copy of a sam record
 
 Sam (Sam &&other)=default
 moves Sam and its header accordingly. Shared pointers maintain state to all other associated objects correctly.
 
Sam & operator= (Sam &&other)=default
 move assignment of a Sam and its header. Shared pointers maintain state to all other associated objects correctly.
 
SamHeader header ()
 the header of the Sam record
 
uint32_t chromosome () const
 chromosome index of the read.
 
uint32_t alignment_start () const
 the reference position of the first base in the read
 
uint32_t alignment_stop () const
 returns a (1-based and inclusive) alignment stop position.
 
uint32_t unclipped_start () const
 calculates the theoretical alignment start of a read that has soft/hard-clips preceding the alignment
 
uint32_t unclipped_stop () const
 calculates the theoretical alignment stop of a read that has soft/hard-clips following the alignment
 
uint32_t mate_chromosome () const
 returns the integer representation of the mate's chromosome.
 
uint32_t mate_alignment_start () const
 returns a (1-based and inclusive) mate's alignment start position (as you would see in a Sam file).
 
uint32_t mate_alignment_stop () const
 returns a (1-based and inclusive) mate's alignment stop position.
 
uint32_t mate_alignment_stop (const SamTag< std::string > &mate_cigar_tag) const
 returns a (1-based and inclusive) mate's alignment stop position.
 
uint32_t mate_unclipped_start () const
 returns a (1-based and inclusive) mate's unclipped alignment start position.
 
uint32_t mate_unclipped_start (const SamTag< std::string > &mate_cigar_tag) const
 returns a (1-based and inclusive) mate's unclipped alignment start position.
 
uint32_t mate_unclipped_stop () const
 returns a (1-based and inclusive) mate's unclipped alignment stop position.
 
uint32_t mate_unclipped_stop (const SamTag< std::string > &mate_cigar_tag) const
 returns a (1-based and inclusive) mate's unclipped alignment stop position.
 
uint8_t mapping_qual () const
 returns the mapping quality of this alignment
 
int32_t insert_size () const
 inferred insert size as reported by the aligner
 
void set_chromosome (const uint32_t chr)
 simple setter for the chromosome index. Index is 0-based.
 
void set_alignment_start (const uint32_t start)
 simple setter for the alignment start.
 
void set_mate_chromosome (const uint32_t mchr)
 simple setter for the mate's chromosome index. Index is 0-based.
 
void set_mate_alignment_start (const uint32_t mstart)
 simple setter for the mate's alignment start.
 
void set_mapping_qual (const uint8_t mapq)
 simple setter for the alignment quality
 
void set_insert_size (const int32_t isize)
 simple setter for the insert size
 
std::string name () const
 returns the read name
 
Cigar cigar () const
 returns the cigar.
 
ReadBases bases () const
 returns the read bases.
 
BaseQuals base_quals () const
 returns the base qualities.
 
SamTag< int32_t > integer_tag (const std::string &tag_name) const
 retrieve an integer-valued tag by name.
 
SamTag< double > double_tag (const std::string &tag_name) const
 retrieve a double/float-valued tag by name.
 
SamTag< char > char_tag (const std::string &tag_name) const
 retrieve a char-valued tag by name.
 
SamTag< std::string > string_tag (const std::string &tag_name) const
 retrieve a string-valued tag by name.
 
bool paired () const
 whether or not this read is paired
 
bool properly_paired () const
 whether or not this read is properly paired (see definition in BAM spec)
 
bool unmapped () const
 whether or not this read is unmapped
 
bool mate_unmapped () const
 whether or not the mate read is unmapped
 
bool reverse () const
 whether or not this read is from the reverse strand
 
bool mate_reverse () const
 whether or not the mate read is from the reverse strand
 
bool first () const
 whether or not this read is the first read in a pair (or multiple pairs)
 
bool last () const
 whether or not this read is the last read in a pair (or multiple pairs)
 
bool secondary () const
 whether or not this read is a secondary alignment (see definition in BAM spec)
 
bool fail () const
 whether or not this read is marked as failing vendor (sequencer) quality control
 
bool duplicate () const
 whether or not this read is a duplicate
 
bool supplementary () const
 whether or not this read is a supplementary alignment (see definition in the BAM spec)
 
void set_paired ()
 
void set_not_paired ()
 
void set_unmapped ()
 
void set_not_unmapped ()
 
void set_mate_unmapped ()
 
void set_not_mate_unmapped ()
 
void set_reverse ()
 
void set_not_reverse ()
 
void set_mate_reverse ()
 
void set_not_mate_reverse ()
 
void set_first ()
 
void set_not_first ()
 
void set_last ()
 
void set_not_last ()
 
void set_secondary ()
 
void set_not_secondary ()
 
void set_fail ()
 
void set_not_fail ()
 
void set_duplicate ()
 
void set_not_duplicate ()
 
void set_supplementary ()
 
void set_not_supplementary ()
 
bool empty () const
 whether or not this Sam object is empty, meaning that the internal memory has not been initialized (i.e. a Sam object initialized with Sam()).
 

Friends

class SamWriter
 allows the writer to access the guts of the object
 
class SamBuilder
 builder needs access to the internals in order to build efficiently
 

Detailed Description

Utility class to manipulate a Sam record.

Constructor & Destructor Documentation

gamgee::Sam::Sam ( )
default

initializes a null Sam.

Note
this is only used internally by the iterators
Warning
if you need to create a Sam from scratch, use the builder instead
gamgee::Sam::Sam ( const std::shared_ptr< bam_hdr_t > &  header,
const std::shared_ptr< bam1_t > &  body 
)
explicit noexcept

creates a sam record that points to htslib memory already allocated

Note
the resulting Sam shares ownership of the pre-allocated memory via shared_ptr reference counting
gamgee::Sam::Sam ( const Sam &  other)

creates a deep copy of a sam record

Note
the copy will have exclusive ownership over the newly-allocated htslib memory until a data field (cigar, bases, etc.) is accessed, after which it will be shared via reference counting with the Cigar, etc. objects
does not perform a deep copy of the sam header; to copy the header, first get it via the header() function and then copy it via the usual C++ semantics
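
The ownership rules in the two notes above can be illustrated with a small standalone sketch. It does not use gamgee or htslib: FakeHeader, FakeBody and Record are hypothetical stand-ins, and the real class may be implemented differently. The sketch only shows the stated behaviour, a copy that allocates a new body while continuing to share the header.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the htslib header and record structs.
struct FakeHeader { std::string text; };
struct FakeBody   { std::string read_name; };

// Hypothetical record type that mimics the ownership rules described above.
class Record {
 public:
  Record(std::shared_ptr<FakeHeader> header, std::shared_ptr<FakeBody> body)
      : m_header{std::move(header)}, m_body{std::move(body)} {}

  // Copy: the header pointer is shared, the body is freshly allocated.
  Record(const Record& other)
      : m_header{other.m_header},
        m_body{std::make_shared<FakeBody>(*other.m_body)} {}

  const std::shared_ptr<FakeHeader>& header() const { return m_header; }
  const std::shared_ptr<FakeBody>& body() const { return m_body; }

 private:
  std::shared_ptr<FakeHeader> m_header;  // shared between copies
  std::shared_ptr<FakeBody> m_body;      // exclusive to each copy
};
```

After copying such a Record, both objects compare equal on header() but hold distinct body() pointers with equal contents.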
gamgee::Sam::Sam ( Sam &&  other)
default

moves Sam and its header accordingly. Shared pointers maintain state to all other associated objects correctly.

Member Function Documentation

uint32_t gamgee::Sam::alignment_start ( ) const
inline

the reference position of the first base in the read

Note
the internal encoding is 0-based to mimic that of the BAM files.
Returns
a (1-based and inclusive) alignment start position (as you would see in a Sam file).
uint32_t gamgee::Sam::alignment_stop ( ) const
inline

returns a (1-based and inclusive) alignment stop position.

Note
the internal encoding is 0-based to mimic that of the BAM files.
htslib's bam_endpos returns the coordinate of the first base AFTER the alignment, 0-based, so that translates into the last base IN the 1-based alignment.
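
The coordinate arithmetic in the note above can be written out as a minimal sketch. The helper names are hypothetical and htslib is not called; the inputs stand for the 0-based start and the 0-based "first base after the alignment" value that the note attributes to bam_endpos.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 0-based start -> 1-based inclusive start.
constexpr std::uint32_t one_based_start(std::uint32_t zero_based_start) {
  return zero_based_start + 1;
}

// Hypothetical helper: 0-based exclusive end -> 1-based inclusive stop.
// The two shifts cancel out, so the number is unchanged.
constexpr std::uint32_t one_based_stop(std::uint32_t zero_based_end_exclusive) {
  return zero_based_end_exclusive;
}
```

For instance, a 10-base alignment stored at 0-based position 99 covers 0-based positions 99 to 108, so the exclusive end is 109, and the 1-based inclusive interval is 100 to 109.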
BaseQuals gamgee::Sam::base_quals ( ) const
inline

returns the base qualities.

Warning
the objects returned by this member function will share underlying htslib memory with this object.
creates an object but doesn't copy the underlying values.
ReadBases gamgee::Sam::bases ( ) const
inline

returns the read bases.

Warning
the objects returned by this member function will share underlying htslib memory with this object.
creates an object but doesn't copy the underlying values.
SamTag< char > gamgee::Sam::char_tag ( const std::string &  tag_name) const

retrieve a char-valued tag by name.

Warning
creates an object but doesn't copy the underlying values.
Note
returns a SamTag with missing() == true if the read has no tag by this name
uint32_t gamgee::Sam::chromosome ( ) const
inline

chromosome index of the read.

Notice that chromosomes are listed in index order with regard to the header (so a 0-based number). Similar to Picard's getReferenceIndex().

Returns
the integer representation of the chromosome.
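
As a small illustration of what a 0-based header index means, the sketch below resolves an index against a list of sequence names. The helper and the vector of names are hypothetical; how gamgee's SamHeader exposes sequence names is not covered in this reference. Returning "*" for an out-of-range index is a choice made for this sketch, borrowed from the SAM convention for an unset reference name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: resolve a 0-based chromosome index against the
// sequence names in header order.
inline std::string sequence_name(const std::vector<std::string>& names,
                                 std::uint32_t index) {
  return index < names.size() ? names[index] : std::string{"*"};
}
```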
Cigar gamgee::Sam::cigar ( ) const
inline

returns the cigar.

Warning
the objects returned by this member function will share underlying htslib memory with this object.
creates an object but doesn't copy the underlying values.
SamTag< double > gamgee::Sam::double_tag ( const std::string &  tag_name) const

retrieve a double/float-valued tag by name.

Warning
creates an object but doesn't copy the underlying values.
Note
returns a SamTag with missing() == true if the read has no tag by this name
bool gamgee::Sam::duplicate ( ) const
inline

whether or not this read is a duplicate

bool gamgee::Sam::empty ( ) const
inline

whether or not this Sam object is empty, meaning that the internal memory has not been initialized (i.e. a Sam object initialized with Sam()).

bool gamgee::Sam::fail ( ) const
inline

whether or not this read is marked as failing vendor (sequencer) quality control

bool gamgee::Sam::first ( ) const
inline

whether or not this read is the first read in a pair (or multiple pairs)

SamHeader gamgee::Sam::header ( )
inline

the header of the Sam record

Returns
a newly created SamHeader object every time it's called but the htslib memory used by the header is the same (no new allocations).
int32_t gamgee::Sam::insert_size ( ) const
inline

inferred insert size as reported by the aligner

This is the signed observed insert size. If all segments are mapped to the same reference, the unsigned observed template length equals the number of bases from the leftmost mapped base to the rightmost mapped base. The leftmost segment has a plus sign and the rightmost has a minus sign. The sign of segments in the middle is undefined.

It is set as 0 for single read or when the information is unavailable.

Returns
a signed insert size or zero if it can't be inferred.
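
The sign convention described above can be sketched for the common two-segment case. This is an illustration only, not the aligner's or gamgee's code: the helper name is hypothetical, positions are 1-based and inclusive, and both segments are assumed to be mapped to the same reference. When both segments start at the same position the sketch picks a plus sign, which is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: signed template length as seen from the segment
// spanning [start, stop], whose mate spans [mate_start, mate_stop].
inline std::int32_t signed_template_length(std::uint32_t start, std::uint32_t stop,
                                           std::uint32_t mate_start,
                                           std::uint32_t mate_stop) {
  const std::uint32_t leftmost = std::min(start, mate_start);
  const std::uint32_t rightmost = std::max(stop, mate_stop);
  const auto length = static_cast<std::int32_t>(rightmost - leftmost + 1);
  // Plus sign for the leftmost segment, minus sign for the rightmost one.
  return start <= mate_start ? length : -length;
}
```

With segments at 100-149 and 300-349, the first segment would report 250 and its mate -250.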
SamTag< int32_t > gamgee::Sam::integer_tag ( const std::string &  tag_name) const

retrieve an integer-valued tag by name.

Warning
creates an object but doesn't copy the underlying values.
Note
returns a SamTag with missing() == true if the read has no tag by this name
bool gamgee::Sam::last ( ) const
inline

whether or not this read is the last read in a pair (or multiple pairs)

uint8_t gamgee::Sam::mapping_qual ( ) const
inline

returns the mapping quality of this alignment

uint32_t gamgee::Sam::mate_alignment_start ( ) const
inline

returns a (1-based and inclusive) mate's alignment start position (as you would see in a Sam file).

Note
the internal encoding is 0-based to mimic that of the BAM files.
uint32_t gamgee::Sam::mate_alignment_stop ( ) const

returns a (1-based and inclusive) mate's alignment stop position.

Note
the internal encoding is 0-based to mimic that of the BAM files.
Exceptions
std::invalid_argument  if called on a record that doesn't contain the mate cigar ("MC") tag.
uint32_t gamgee::Sam::mate_alignment_stop ( const SamTag< std::string > &  mate_cigar_tag) const

returns a (1-based and inclusive) mate's alignment stop position.

This overload is for usage when the user checks for the existence of the tag themselves and passes it in to avoid exception throwing. This is provided for performance conscious use of this function. This way you will only create one SamTag object for the mate cigar tag, instead of potentially two when checking for its availability and then calling this function. For example:

const auto tag = record.string_tag("MC"); // obtains the tag from the record (expensive operation)
if (!missing(tag))
    cout << record.mate_alignment_stop(tag) << endl; // this will reuse the tag you have already obtained

This is better than the alternative using the other overload where you have to either get the Tag twice or check for the exception thrown:

const auto tag = record.string_tag("MC"); // obtains the tag from the record (expensive operation)
if (!missing(tag))
    cout << record.mate_alignment_stop() << endl; // this will obtain a new tag internally (unnecessary)
Parameters
mate_cigar_tag  the MC tag as obtained via the string_tag("MC") API in Sam.
Warning
This overload DOES NOT throw an exception if the mate cigar tag is missing. Instead it returns mate_alignment_start(). Treat it as undefined behavior.
Note
the internal encoding is 0-based to mimic that of the BAM files.
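
How a stop position can be derived from a mate cigar string is sketched below. This is not gamgee's implementation: both helpers are hypothetical, and the cigar is taken as plain text in the form stored in the MC tag. The sketch relies on the SAM rule that the M, D, N, = and X operators consume reference bases. Returning the start unchanged for an empty cigar is a choice made for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: number of reference bases covered by a CIGAR string.
inline std::uint32_t reference_length(const std::string& cigar) {
  std::uint32_t total = 0;
  std::uint32_t run = 0;  // length of the operator currently being parsed
  for (const char c : cigar) {
    if (std::isdigit(static_cast<unsigned char>(c))) {
      run = run * 10 + static_cast<std::uint32_t>(c - '0');
      continue;
    }
    // M, D, N, = and X consume reference bases; I, S, H and P do not.
    if (c == 'M' || c == 'D' || c == 'N' || c == '=' || c == 'X') total += run;
    run = 0;
  }
  return total;
}

// Hypothetical helper: 1-based inclusive stop from a 1-based start and a cigar.
inline std::uint32_t mate_stop_from_cigar(std::uint32_t mate_start,
                                          const std::string& mate_cigar) {
  const std::uint32_t length = reference_length(mate_cigar);
  return length == 0 ? mate_start : mate_start + length - 1;
}
```

For a mate starting at 100 with cigar 4S10M2D5M3I, the reference length is 10 + 2 + 5 = 17, so the stop is 116.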
uint32_t gamgee::Sam::mate_chromosome ( ) const
inline

returns the integer representation of the mate's chromosome.

Notice that chromosomes are listed in index order with regards to the header (so a 0-based number).

bool gamgee::Sam::mate_reverse ( ) const
inline

whether or not the mate read is from the reverse strand

uint32_t gamgee::Sam::mate_unclipped_start ( ) const

returns a (1-based and inclusive) mate's unclipped alignment start position.

Exceptions
std::invalid_argument  if called on a record that doesn't contain the mate cigar ("MC") tag.
uint32_t gamgee::Sam::mate_unclipped_start ( const SamTag< std::string > &  mate_cigar_tag) const

returns a (1-based and inclusive) mate's unclipped alignment start position.

This overload is for usage when the user checks for the existence of the tag themselves and passes it in to avoid exception throwing. This is provided for performance conscious use of this function. This way you will only create one SamTag object for the mate cigar tag, instead of potentially two when checking for its availability and then calling this function. For example:

const auto tag = record.string_tag("MC"); // obtains the tag from the record (expensive operation)
if (!missing(tag))
    cout << record.mate_unclipped_start(tag) << endl; // this will reuse the tag you have already obtained

This is better than the alternative using the other overload where you have to either get the Tag twice or check for the exception thrown:

const auto tag = record.string_tag("MC"); // obtains the tag from the record (expensive operation)
if (!missing(tag))
    cout << record.mate_unclipped_start() << endl; // this will obtain a new tag internally (unnecessary)
Parameters
mate_cigar_tag  the MC tag as obtained via the string_tag("MC") API in Sam.
Warning
This overload DOES NOT throw an exception if the mate cigar tag is missing. Instead it returns mate_alignment_start(). Treat it as undefined behavior.
Note
the internal encoding is 0-based to mimic that of the BAM files.
uint32_t gamgee::Sam::mate_unclipped_stop ( ) const

returns a (1-based and inclusive) mate's unclipped alignment stop position.

Exceptions
std::invalid_argument  if called on a record that doesn't contain the mate cigar ("MC") tag.
uint32_t gamgee::Sam::mate_unclipped_stop ( const SamTag< std::string > &  mate_cigar_tag) const

returns a (1-based and inclusive) mate's unclipped alignment stop position.

This overload is for usage when the user checks for the existence of the tag themselves and passes it in to avoid exception throwing. This is provided for performance conscious use of this function. This way you will only create one SamTag object for the mate cigar tag, instead of potentially two when checking for its availability and then calling this function. For example:

const auto tag = record.string_tag("MC"); // obtains the tag from the record (expensive operation)
if (!missing(tag))
    cout << record.mate_unclipped_stop(tag) << endl; // this will reuse the tag you have already obtained

This is better than the alternative using the other overload where you have to either get the Tag twice or check for the exception thrown:

const auto tag = record.string_tag("MC"); // obtains the tag from the record (expensive operation)
if (!missing(tag))
    cout << record.mate_unclipped_stop() << endl; // this will obtain a new tag internally (unnecessary)
Parameters
mate_cigar_tag  the MC tag as obtained via the string_tag("MC") API in Sam.
Warning
This overload DOES NOT throw an exception if the mate cigar tag is missing. Instead it returns mate_alignment_start(). Treat it as undefined behavior.
Note
the internal encoding is 0-based to mimic that of the BAM files.
bool gamgee::Sam::mate_unmapped ( ) const
inline

whether or not the mate read is unmapped

std::string gamgee::Sam::name ( ) const
inline

returns the read name

Sam & gamgee::Sam::operator= ( const Sam &  other)

creates a deep copy of a sam record

Note
the copy will have exclusive ownership over the newly-allocated htslib memory until a data field (cigar, bases, etc.) is accessed, after which it will be shared via reference counting with the Cigar, etc. objects
does not perform a deep copy of the sam header; to copy the header, first get it via the header() function and then copy it via the usual C++ semantics

shared_ptr assignment will take care of deallocating the old sam record if necessary

Sam& gamgee::Sam::operator= ( Sam &&  other)
default

move assignment of a Sam and its header. Shared pointers maintain state to all other associated objects correctly.

bool gamgee::Sam::paired ( ) const
inline

whether or not this read is paired

bool gamgee::Sam::properly_paired ( ) const
inline

whether or not this read is properly paired (see definition in BAM spec)

bool gamgee::Sam::reverse ( ) const
inline

whether or not this read is from the reverse strand

bool gamgee::Sam::secondary ( ) const
inline

whether or not this read is a secondary alignment (see definition in BAM spec)

void gamgee::Sam::set_alignment_start ( const uint32_t  start)
inline

simple setter for the alignment start.

Warning
You should use (1-based and inclusive) alignment but internally this is stored 0-based to simplify BAM conversion.
void gamgee::Sam::set_chromosome ( const uint32_t  chr)
inline

simple setter for the chromosome index. Index is 0-based.

void gamgee::Sam::set_duplicate ( )
inline
void gamgee::Sam::set_fail ( )
inline
void gamgee::Sam::set_first ( )
inline
void gamgee::Sam::set_insert_size ( const int32_t  isize)
inline

simple setter for the insert size

void gamgee::Sam::set_last ( )
inline
void gamgee::Sam::set_mapping_qual ( const uint8_t  mapq)
inline

simple setter for the alignment quality

void gamgee::Sam::set_mate_alignment_start ( const uint32_t  mstart)
inline

simple setter for the mate's alignment start.

Warning
You should use (1-based and inclusive) alignment but internally this is stored 0-based to simplify BAM conversion.
void gamgee::Sam::set_mate_chromosome ( const uint32_t  mchr)
inline

simple setter for the mate's chromosome index. Index is 0-based.

void gamgee::Sam::set_mate_reverse ( )
inline
void gamgee::Sam::set_mate_unmapped ( )
inline
void gamgee::Sam::set_not_duplicate ( )
inline
void gamgee::Sam::set_not_fail ( )
inline
void gamgee::Sam::set_not_first ( )
inline
void gamgee::Sam::set_not_last ( )
inline
void gamgee::Sam::set_not_mate_reverse ( )
inline
void gamgee::Sam::set_not_mate_unmapped ( )
inline
void gamgee::Sam::set_not_paired ( )
inline
void gamgee::Sam::set_not_reverse ( )
inline
void gamgee::Sam::set_not_secondary ( )
inline
void gamgee::Sam::set_not_supplementary ( )
inline
void gamgee::Sam::set_not_unmapped ( )
inline
void gamgee::Sam::set_paired ( )
inline
void gamgee::Sam::set_reverse ( )
inline
void gamgee::Sam::set_secondary ( )
inline
void gamgee::Sam::set_supplementary ( )
inline
void gamgee::Sam::set_unmapped ( )
inline
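
The set_* and set_not_* members above carry no description in this reference. A plausible reading, stated here as an assumption, is that each pair sets or clears one bit of the SAM flag field and that the matching predicate tests it. The bit values below are those defined in the SAM/BAM specification; the enum and helper names are hypothetical and not part of gamgee.

```cpp
#include <cassert>
#include <cstdint>

// SAM flag bits as defined in the SAM/BAM specification.
enum SamFlag : std::uint16_t {
  PAIRED = 0x1, PROPERLY_PAIRED = 0x2, UNMAPPED = 0x4, MATE_UNMAPPED = 0x8,
  REVERSE = 0x10, MATE_REVERSE = 0x20, FIRST = 0x40, LAST = 0x80,
  SECONDARY = 0x100, FAIL = 0x200, DUPLICATE = 0x400, SUPPLEMENTARY = 0x800
};

// Hypothetical helpers showing the assumed set / clear / test pattern.
constexpr std::uint16_t set_flag(std::uint16_t flags, SamFlag bit) {
  return static_cast<std::uint16_t>(flags | bit);
}
constexpr std::uint16_t clear_flag(std::uint16_t flags, SamFlag bit) {
  return static_cast<std::uint16_t>(flags & ~bit);
}
constexpr bool has_flag(std::uint16_t flags, SamFlag bit) {
  return (flags & bit) != 0;
}
```

Under this reading, a flag value of 99 (1 + 2 + 32 + 64) describes a paired, properly paired, first-in-pair read whose mate is on the reverse strand.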
SamTag< std::string > gamgee::Sam::string_tag ( const std::string &  tag_name) const

retrieve a string-valued tag by name.

Warning
creates an object but doesn't copy the underlying values.
Note
returns a SamTag with missing() == true if the read has no tag by this name
bool gamgee::Sam::supplementary ( ) const
inline

whether or not this read is a supplementary alignment (see definition in the BAM spec)

uint32_t gamgee::Sam::unclipped_start ( ) const

calculates the theoretical alignment start of a read that has soft/hard-clips preceding the alignment

For example if the read has an alignment start of 100 but the first 4 bases were clipped (hard or soft clipped) then this method will return 96.

Returns
the alignment start (1-based, inclusive) adjusted for clipped bases.

Warning
Invalid to call on an unmapped read.
Invalid to call with cigar = null

uint32_t gamgee::Sam::unclipped_stop ( ) const

calculates the theoretical alignment stop of a read that has soft/hard-clips following the alignment

For example if the read has an alignment stop of 100 but the last 4 bases were clipped (hard or soft clipped) then this method will return 104.

Returns
the alignment stop (1-based, inclusive) adjusted for clipped bases.
Warning
Invalid to call on an unmapped read.
Invalid to call with cigar = null
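
The clipping arithmetic behind unclipped_start() and unclipped_stop() can be sketched as follows. This is not gamgee's code: the CigarOps type and both helpers are hypothetical, and gamgee's own Cigar class is not used. Leading soft or hard clips are subtracted from the start, and trailing ones are added to the stop. The sketch assumes a mapped read with a non-empty cigar and a start larger than the leading clip length, since the unsigned subtraction would otherwise wrap around.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CIGAR representation: (operator, length) pairs in read order.
using CigarOps = std::vector<std::pair<char, std::uint32_t>>;

inline bool is_clip(char op) { return op == 'S' || op == 'H'; }

// Subtract the clips that precede the aligned part from the 1-based start.
inline std::uint32_t unclipped_start(std::uint32_t alignment_start,
                                     const CigarOps& cigar) {
  std::uint32_t clipped = 0;
  for (auto it = cigar.begin(); it != cigar.end() && is_clip(it->first); ++it)
    clipped += it->second;
  return alignment_start - clipped;
}

// Add the clips that follow the aligned part to the 1-based stop.
inline std::uint32_t unclipped_stop(std::uint32_t alignment_stop,
                                    const CigarOps& cigar) {
  std::uint32_t clipped = 0;
  for (auto it = cigar.rbegin(); it != cigar.rend() && is_clip(it->first); ++it)
    clipped += it->second;
  return alignment_stop + clipped;
}
```

With a cigar of 1H3S50M4S, a start of 100 gives an unclipped start of 96, and a stop of 100 gives an unclipped stop of 104, matching the two examples in the text.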
bool gamgee::Sam::unmapped ( ) const
inline

whether or not this read is unmapped

Friends And Related Function Documentation

friend class SamBuilder
friend

builder needs access to the internals in order to build efficiently

friend class SamWriter
friend

allows the writer to access the guts of the object


The documentation for this class was generated from the following files: