Gamgee
You miserable little maggot. I'll stove your head in!
gamgee::Variant Class Reference

Utility class to manipulate a Variant record. More...

#include <variant.h>

Public Member Functions

 Variant ()=default
 initializes a null Variant
 
 Variant (const std::shared_ptr< bcf_hdr_t > &header, const std::shared_ptr< bcf1_t > &body) noexcept
 creates a Variant given htslib objects.
 
 Variant (const Variant &other)
 makes a deep copy of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.
 
Variant & operator= (const Variant &other)
 deep copy assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.
 
 Variant (Variant &&other)=default
 moves Variant and its header accordingly. Shared pointers maintain state to all other associated objects correctly.
 
Variant & operator= (Variant &&other)=default
 move assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.
 
VariantHeader header () const
 returns the header for this variant
 
bool missing () const
 returns true if this is a default-constructed Variant object with no data
 
uint32_t chromosome () const
 returns the integer representation of the chromosome. Notice that chromosomes are listed in index order with regards to the header (so a 0-based number). Similar to Picard's getReferenceIndex()
 
std::string chromosome_name () const
 returns the name of the chromosome by querying the header.
 
uint32_t alignment_start () const
 returns a 1-based alignment start position (as you would see in a VCF file).
 
uint32_t alignment_stop () const
 returns a 1-based alignment stop position, as you would see in a VCF INFO END tag, or the end position of the reference allele if there is no END tag.
 
float qual () const
 returns the Phred scaled site qual (probability that the site is not reference). See VCF spec.
 
uint32_t n_samples () const
 returns the number of samples in this Variant record
 
uint32_t n_alleles () const
 returns the number of alleles in this Variant record including the reference allele
 
std::string id () const
 returns the variant id field (typically dbsnp id)
 
std::string ref () const
 returns the ref allele in this Variant record
 
std::vector< std::string > alt () const
 returns the vector of alt alleles in this Variant record
 
VariantFilters filters () const
 returns a vector-like object with all the filters for this record
 
bool has_filter (const std::string &filter) const
 checks for the existence of a filter in this record
 
IndividualField< Genotype > genotypes () const
 special getter for the Genotype (GT) field. Returns a random access object with all the values in a given GT tag for all samples contiguous in memory.
 
IndividualField< IndividualFieldValue< int32_t > > integer_individual_field (const std::string &tag) const
 returns a random access object with all the values in a given individual field tag in integer format for all samples contiguous in memory.
 
IndividualField< IndividualFieldValue< float > > float_individual_field (const std::string &tag) const
 returns a random access object with all the values in a given individual field tag in float format for all samples contiguous in memory.
 
IndividualField< IndividualFieldValue< std::string > > string_individual_field (const std::string &tag) const
 returns a random access object with all the values in a given individual field tag in string format for all samples contiguous in memory.
 
IndividualField< IndividualFieldValue< int32_t > > individual_field_as_integer (const std::string &tag) const
 same as integer_individual_field but will attempt to convert underlying data to integer if possible.
 
IndividualField< IndividualFieldValue< float > > individual_field_as_float (const std::string &tag) const
 same as float_individual_field but will attempt to convert underlying data to float if possible.
 
IndividualField< IndividualFieldValue< std::string > > individual_field_as_string (const std::string &tag) const
 same as string_individual_field but will attempt to convert underlying data to string if possible.
 
IndividualField< IndividualFieldValue< int32_t > > integer_individual_field (const int32_t index) const
 returns a random access object with all the values in a given individual field tag index in integer format for all samples contiguous in memory.
 
IndividualField< IndividualFieldValue< float > > float_individual_field (const int32_t index) const
 returns a random access object with all the values in a given individual field tag index in float format for all samples contiguous in memory.
 
IndividualField< IndividualFieldValue< std::string > > string_individual_field (const int32_t index) const
 returns a random access object with all the values in a given individual field tag index in string format for all samples contiguous in memory.
 
IndividualField< IndividualFieldValue< int32_t > > individual_field_as_integer (const int32_t index) const
 same as integer_individual_field but will attempt to convert underlying data to integer if possible.
 
IndividualField< IndividualFieldValue< float > > individual_field_as_float (const int32_t index) const
 same as float_individual_field but will attempt to convert underlying data to float if possible.
 
IndividualField< IndividualFieldValue< std::string > > individual_field_as_string (const int32_t index) const
 same as string_individual_field but will attempt to convert underlying data to string if possible.
 
bool boolean_shared_field (const std::string &tag) const
 whether or not the tag is present
 
SharedField< int32_t > integer_shared_field (const std::string &tag) const
 returns a random access object with all the values in a given shared field tag in integer format contiguous in memory.
 
SharedField< float > float_shared_field (const std::string &tag) const
 returns a random access object with all the values in a given shared field tag in float format contiguous in memory.
 
SharedField< std::string > string_shared_field (const std::string &tag) const
 returns a random access object with all the values in a given shared field tag in string format contiguous in memory.
 
SharedField< int32_t > shared_field_as_integer (const std::string &tag) const
 same as integer_shared_field but will attempt to convert underlying data to integer if possible.
 
SharedField< float > shared_field_as_float (const std::string &tag) const
 same as float_shared_field but will attempt to convert underlying data to float if possible.
 
SharedField< std::string > shared_field_as_string (const std::string &tag) const
 same as string_shared_field but will attempt to convert underlying data to string if possible.
 
bool boolean_shared_field (const int32_t index) const
 whether or not the tag with this index is present
 
SharedField< int32_t > integer_shared_field (const int32_t index) const
 returns a random access object with all the values in a given shared field tag index in integer format contiguous in memory.
 
SharedField< float > float_shared_field (const int32_t index) const
 returns a random access object with all the values in a given shared field tag index in float format contiguous in memory.
 
SharedField< std::string > string_shared_field (const int32_t index) const
 returns a random access object with all the values in a given shared field tag index in string format contiguous in memory.
 
SharedField< int32_t > shared_field_as_integer (const int32_t index) const
 same as integer_shared_field but will attempt to convert underlying data to integer if possible.
 
SharedField< float > shared_field_as_float (const int32_t index) const
 same as float_shared_field but will attempt to convert underlying data to float if possible.
 
SharedField< std::string > shared_field_as_string (const int32_t index) const
 same as string_shared_field but will attempt to convert underlying data to string if possible.
 
AlleleMask allele_mask () const
 computes the allele types for all alleles (including the reference allele)
 

Static Public Member Functions

template<class VALUE , template< class > class ITER>
static boost::dynamic_bitset select_if (const ITER< VALUE > &first, const ITER< VALUE > &last, const std::function< bool(const decltype(*first)&value)> pred)
 functional-style set logic operations for variant field vectors More...
 

Friends

class VariantWriter
 
class VariantBuilder
 builder needs access to the internals in order to build efficiently More...
 
class ReferenceBlockSplittingVariantIterator
 

Detailed Description

Utility class to manipulate a Variant record.

Constructor & Destructor Documentation

gamgee::Variant::Variant ( )
default

initializes a null Variant

Note
this is only used internally by the iterators
Warning
if you need to create a Variant from scratch, use the builder instead
gamgee::Variant::Variant ( const std::shared_ptr< bcf_hdr_t > &  header,
const std::shared_ptr< bcf1_t > &  body 
)
explicit noexcept

creates a Variant given htslib objects.

creates a variant record that points to htslib memory already allocated

Note
used by all iterators
the resulting Variant shares ownership of the pre-allocated memory via shared_ptr reference counting
gamgee::Variant::Variant ( const Variant &  other)

makes a deep copy of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.

creates a deep copy of a variant record

Note
does not perform a deep copy of the variant header; to copy the header, first get it via the header() function and then copy it via the usual C++ semantics
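
The note above describes copying that duplicates the record while sharing the header. A minimal sketch of that ownership pattern follows. `Header`, `Body` and `Record` are hypothetical stand-ins invented for illustration; they are not gamgee or htslib types, and this is not the library's implementation.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the header and record payloads.
struct Header { std::string text; };
struct Body { int position; };

// Hypothetical record type: a copy gets its own Body but shares the Header.
class Record {
 public:
  Record(std::shared_ptr<Header> header, std::shared_ptr<Body> body)
      : m_header{std::move(header)}, m_body{std::move(body)} {}

  // Copy constructor: same Header allocation, new Body allocation.
  Record(const Record& other)
      : m_header{other.m_header}, m_body{clone(other.m_body)} {}

  // Copy assignment: shared_ptr releases the old body if this was its last owner.
  Record& operator=(const Record& other) {
    if (this != &other) {
      m_header = other.m_header;
      m_body = clone(other.m_body);
    }
    return *this;
  }

  const std::shared_ptr<Header>& header() const { return m_header; }
  const std::shared_ptr<Body>& body() const { return m_body; }

 private:
  // Duplicates the pointed-to Body, or returns null for an empty record.
  static std::shared_ptr<Body> clone(const std::shared_ptr<Body>& body) {
    return body ? std::make_shared<Body>(*body) : nullptr;
  }

  std::shared_ptr<Header> m_header;
  std::shared_ptr<Body> m_body;
};
```

In this sketch, a caller who also wants an independent header would copy the pointed-to `Header` explicitly, which mirrors the advice in the note.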
gamgee::Variant::Variant ( Variant &&  other)
default

moves Variant and its header accordingly. Shared pointers maintain state to all other associated objects correctly.

Member Function Documentation

uint32_t gamgee::Variant::alignment_start ( ) const
inline

returns a 1-based alignment start position (as you would see in a VCF file).

Note
the internal encoding is 0-based to mimic that of the BCF files.
uint32_t gamgee::Variant::alignment_stop ( ) const
inline

returns a 1-based alignment stop position, as you would see in a VCF INFO END tag, or the end position of the reference allele if there is no END tag.
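
The coordinate rules described for alignment_start and alignment_stop can be sketched as below. `SimpleRecord`, `start_1based` and `stop_1based` are hypothetical names used only for illustration, and the struct is a simplification rather than gamgee's internal layout.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified record used only to illustrate the arithmetic.
struct SimpleRecord {
  std::uint32_t pos0;                     // 0-based start, as stored in BCF
  std::string ref;                        // reference allele
  std::optional<std::uint32_t> info_end;  // 1-based INFO END value, if present
};

// 1-based start: the 0-based internal position plus one.
std::uint32_t start_1based(const SimpleRecord& r) { return r.pos0 + 1; }

// 1-based stop: INFO END when present, otherwise the last base of the
// reference allele (start + length - 1, which equals pos0 + length).
std::uint32_t stop_1based(const SimpleRecord& r) {
  if (r.info_end) return *r.info_end;
  return r.pos0 + static_cast<std::uint32_t>(r.ref.size());
}
```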

AlleleMask gamgee::Variant::allele_mask ( ) const

computes the allele types for all alleles (including the reference allele)

This function gives you an index vector (AlleleMask) that can be used to query genotypes for snp(), indel(), complex(), ...

Complexity is O(N) on the number of alleles.

Returns
a vector of AlleleType that can be used with many of Genotype member functions
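
To illustrate the idea of a per-allele type vector computed in one O(N) pass, here is a simplified sketch. The `AlleleKind` enum, the `classify_alleles` function and the length-based rules are assumptions made for this example; the actual AlleleType values and classification logic in gamgee may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical allele categories for this sketch only.
enum class AlleleKind { Reference, Snp, Insertion, Deletion, Other };

// One pass over the alleles: index 0 is the reference, followed by one
// entry per alt allele, classified by comparing lengths with the reference.
std::vector<AlleleKind> classify_alleles(const std::string& ref,
                                         const std::vector<std::string>& alts) {
  std::vector<AlleleKind> mask;
  mask.reserve(alts.size() + 1);
  mask.push_back(AlleleKind::Reference);
  for (const auto& alt : alts) {
    if (alt.size() == 1 && ref.size() == 1) {
      mask.push_back(AlleleKind::Snp);
    } else if (alt.size() > ref.size()) {
      mask.push_back(AlleleKind::Insertion);
    } else if (alt.size() < ref.size()) {
      mask.push_back(AlleleKind::Deletion);
    } else {
      mask.push_back(AlleleKind::Other);  // e.g. multi-base substitutions
    }
  }
  return mask;
}
```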
std::vector< std::string > gamgee::Variant::alt ( ) const

returns the vector of alt alleles in this Variant record

bool gamgee::Variant::boolean_shared_field ( const std::string &  tag) const

whether or not the tag is present

Note
bools are treated specially as vector<bool> is impossible given the spec
bool gamgee::Variant::boolean_shared_field ( const int32_t  index) const

whether or not the tag with this index is present

Note
bools are treated specially as vector<bool> is impossible given the spec
uint32_t gamgee::Variant::chromosome ( ) const
inline

returns the integer representation of the chromosome. Notice that chromosomes are listed in index order with regards to the header (so a 0-based number). Similar to Picard's getReferenceIndex()

std::string gamgee::Variant::chromosome_name ( ) const
inline

returns the name of the chromosome by querying the header.

VariantFilters gamgee::Variant::filters ( ) const

returns a vector-like object with all the filters for this record

IndividualField< IndividualFieldValue< float > > gamgee::Variant::float_individual_field ( const std::string &  tag) const

returns a random access object with all the values in a given individual field tag in float format for all samples contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
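
The warning above describes a lightweight accessor object that refers to existing values instead of duplicating them. A minimal sketch of that idea follows, assuming the values live in a reference-counted buffer. `FloatView` is a hypothetical class for illustration and does not reflect how IndividualField is actually implemented.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical view type: constructing or copying it shares the buffer
// rather than copying the values it exposes.
class FloatView {
 public:
  explicit FloatView(std::shared_ptr<const std::vector<float>> data)
      : m_data{std::move(data)} {}

  float operator[](std::size_t i) const { return (*m_data)[i]; }
  std::size_t size() const { return m_data->size(); }
  const float* data() const { return m_data->data(); }

 private:
  std::shared_ptr<const std::vector<float>> m_data;  // shared, never duplicated
};
```

Because every view holds a reference to the same buffer, the values stay alive as long as any view does, and creating a view costs a reference-count update rather than a copy of the data.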
IndividualField< IndividualFieldValue< float > > gamgee::Variant::float_individual_field ( const int32_t  index) const

returns a random access object with all the values in a given individual field tag index in float format for all samples contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< float > gamgee::Variant::float_shared_field ( const std::string &  tag) const

returns a random access object with all the values in a given shared field tag in float format contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< float > gamgee::Variant::float_shared_field ( const int32_t  index) const

returns a random access object with all the values in a given shared field tag index in float format contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< Genotype > gamgee::Variant::genotypes ( ) const

special getter for the Genotype (GT) field. Returns a random access object with all the values in a given GT tag for all samples contiguous in memory.

Warning
Only int8_t GT fields have been tested.
Missing GT fields are untested.
creates a new object but makes no copies of the underlying values.

If the variant is missing or the GT tag is missing, an empty IndividualField is returned.

bool gamgee::Variant::has_filter ( const std::string &  filter) const

checks for the existence of a filter in this record

VariantHeader gamgee::Variant::header ( ) const
inline

returns the header for this variant

Note
does not deep copy the header; returned VariantHeader object shares existing memory
std::string gamgee::Variant::id ( ) const

returns the variant id field (typically dbsnp id)

IndividualField< IndividualFieldValue< float > > gamgee::Variant::individual_field_as_float ( const std::string &  tag) const

same as float_individual_field but will attempt to convert underlying data to float if possible.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< float > > gamgee::Variant::individual_field_as_float ( const int32_t  index) const

same as float_individual_field but will attempt to convert underlying data to float if possible.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::individual_field_as_integer ( const std::string &  tag) const

same as integer_individual_field but will attempt to convert underlying data to integer if possible.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::individual_field_as_integer ( const int32_t  index) const

same as integer_individual_field but will attempt to convert underlying data to integer if possible.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< std::string > > gamgee::Variant::individual_field_as_string ( const std::string &  tag) const

same as string_individual_field but will attempt to convert underlying data to string if possible.

Warning
Only int8_t GT fields have been tested.
IndividualField< IndividualFieldValue< std::string > > gamgee::Variant::individual_field_as_string ( const int32_t  index) const

same as string_individual_field but will attempt to convert underlying data to string if possible.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::integer_individual_field ( const std::string &  tag) const

returns a random access object with all the values in a given individual field tag in integer format for all samples contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::integer_individual_field ( const int32_t  index) const

returns a random access object with all the values in a given individual field tag index in integer format for all samples contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< int32_t > gamgee::Variant::integer_shared_field ( const std::string &  tag) const

returns a random access object with all the values in a given shared field tag in integer format contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< int32_t > gamgee::Variant::integer_shared_field ( const int32_t  index) const

returns a random access object with all the values in a given shared field tag index in integer format contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
bool gamgee::Variant::missing ( ) const
inline

returns true if this is a default-constructed Variant object with no data

uint32_t gamgee::Variant::n_alleles ( ) const
inline

returns the number of alleles in this Variant record including the reference allele

uint32_t gamgee::Variant::n_samples ( ) const
inline

returns the number of samples in this Variant record

Variant & gamgee::Variant::operator= ( const Variant &  other)

deep copy assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.

creates a deep copy of a variant record

Parameters
other  the Variant to be copied
Note
does not perform a deep copy of the variant header; to copy the header, first get it via the header() function and then copy it via the usual C++ semantics

The shared_ptr assignment takes care of deallocating the old record if necessary.

Variant& gamgee::Variant::operator= ( Variant &&  other)
default

move assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.

float gamgee::Variant::qual ( ) const
inline

returns the Phred scaled site qual (probability that the site is not reference). See VCF spec.
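
For readers unfamiliar with Phred scaling, a quality Q corresponds to a probability p through Q = -10 log10(p). The helper names below are hypothetical and are not part of gamgee; the sketch only shows the arithmetic.

```cpp
#include <cmath>

// Converts a Phred-scaled quality to the probability it encodes.
double phred_to_probability(double q) { return std::pow(10.0, -q / 10.0); }

// Converts a probability (0 < p <= 1) to its Phred-scaled quality.
double probability_to_phred(double p) { return -10.0 * std::log10(p); }
```

Under this scaling a quality of 20 corresponds to a probability of 0.01, and a quality of 30 to 0.001. See the VCF specification for the exact meaning of the probability encoded by QUAL.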

std::string gamgee::Variant::ref ( ) const

returns the ref allele in this Variant record

template<class VALUE , template< class > class ITER>
static boost::dynamic_bitset gamgee::Variant::select_if ( const ITER< VALUE > &  first,
const ITER< VALUE > &  last,
const std::function< bool(const decltype(*first)&value)>  pred 
)
inline static

functional-style set logic operations for variant field vectors

This function applies the unary predicate pred to every element between first and last in the container without modifying them. It then produces a bitset with as many elements as the full iteration (from first to last). For all elements where the predicate returns true, the corresponding bit in the bitset will be 1 (set). For all elements where the predicate returns false, the corresponding bit will be 0 (unset).

The great advantage of this functionality is that you can perform one operation across all elements of an iterator, benefiting from data locality and cache prefetching, and get the result in a more manageable format (a bitset) for subsequent set-logic transformations. This is particularly fast for very large datasets. For example, in a VCF file with 1000+ samples, running several select_if calls end to end on all the samples and then combining the resulting bitsets using set logic is much faster than a single iteration that updates several IndividualFields, because those fields are far apart in the computer's memory.

For example:

const auto genotypes = record.genotypes(); // a "vector-like" with the genotypes of all samples in this record
const auto gqs = record.integer_individual_field("GQ"); // a "vector-like" with all the GQs of all samples in this record
const auto hets = Variant::select_if(genotypes.begin(), genotypes.end(), [](const auto& g) { return g.het(); }); // returns a bit set with all hets marked with 1's
const auto pass_gqs = Variant::select_if(gqs.begin(), gqs.end(), [](const auto& gq) { return gq[0] > 20; }); // returns a bit set with every sample with gq > 20 marked with 1's
const auto high_qual_hets = hets & pass_gqs; // a bit set with all the samples that are het and have gq > 20

In the above example you see two invocations of select_if using the same ITER but different VALUEs, but the user is oblivious to that because the lambda simply declares its parameter as const auto&.

This function is declared with template template arguments to facilitate parameter type deduction and enable usage without specifying the actual types being used. The whole goal here is to keep the user happily using auto everywhere and unaware of the iterator and value types underlying the data structures.

Note
pred can either be a function pointer, a function object or a lambda function.
This function can be called directly (ignoring the template parameters) as all the template parameters can be deduced from the function parameters.
Template Parameters
ITER  any iterator that has operator- defined to return the difference in number of elements between two ITER iterators
VALUE  the class of the objects ITER is iterating over (e.g. in IndividualFieldIterator<Genotype>, Genotype is the VALUE and IndividualFieldIterator is the ITER)
Parameters
first  iterator to the initial position in a sequence. The range includes the element pointed to by first.
last  iterator to the last position in a sequence. The range does not include the element pointed to by last.
pred  unary predicate (lambda) function that accepts an element in range [first, last) as argument and returns a value convertible to bool. The value returned indicates whether the element is considered a match in the context of this function.
Returns
a bitset indicating the samples for which the unary predicate is true
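
The select-then-combine pattern described above can be tried without gamgee or Boost using the dependency-free sketch below. It is a stand-in rather than the library's implementation: `select_if_sketch` and `mask_and` are hypothetical names, and std::vector<bool> takes the place of boost::dynamic_bitset.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Applies pred to every element in [first, last) and records the results
// as one bit per element, leaving the elements themselves untouched.
template <class Iter, class Pred>
std::vector<bool> select_if_sketch(Iter first, Iter last, Pred pred) {
  std::vector<bool> selected(
      static_cast<std::size_t>(std::distance(first, last)));
  std::size_t i = 0;
  for (auto it = first; it != last; ++it) selected[i++] = pred(*it);
  return selected;
}

// Element-wise AND of two masks; assumes both have the same size.
// This plays the role that operator& plays for bitsets.
std::vector<bool> mask_and(const std::vector<bool>& a,
                           const std::vector<bool>& b) {
  std::vector<bool> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] && b[i];
  return out;
}
```

Each call makes a single linear pass over one contiguous sequence, and the cheap mask combination happens afterwards, which is the access pattern the description above argues for.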
SharedField< float > gamgee::Variant::shared_field_as_float ( const std::string &  tag) const

same as float_shared_field but will attempt to convert underlying data to float if possible.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< float > gamgee::Variant::shared_field_as_float ( const int32_t  index) const

same as float_shared_field but will attempt to convert underlying data to float if possible.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< int32_t > gamgee::Variant::shared_field_as_integer ( const std::string &  tag) const

same as integer_shared_field but will attempt to convert underlying data to integer if possible.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< int32_t > gamgee::Variant::shared_field_as_integer ( const int32_t  index) const

same as integer_shared_field but will attempt to convert underlying data to integer if possible.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< string > gamgee::Variant::shared_field_as_string ( const std::string &  tag) const

same as string_shared_field but will attempt to convert underlying data to string if possible.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< string > gamgee::Variant::shared_field_as_string ( const int32_t  index) const

same as string_shared_field but will attempt to convert underlying data to string if possible.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< string > > gamgee::Variant::string_individual_field ( const std::string &  tag) const

returns a random access object with all the values in a given individual field tag in string format for all samples contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
IndividualField< IndividualFieldValue< string > > gamgee::Variant::string_individual_field ( const int32_t  index) const

returns a random access object with all the values in a given individual field tag index in string format for all samples contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< string > gamgee::Variant::string_shared_field ( const std::string &  tag) const

returns a random access object with all the values in a given shared field tag in string format contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.
SharedField< string > gamgee::Variant::string_shared_field ( const int32_t  index) const

returns a random access object with all the values in a given shared field tag index in string format contiguous in memory.

Warning
creates a new object but makes no copies of the underlying values.

Friends And Related Function Documentation

friend class VariantBuilder
friend

builder needs access to the internals in order to build efficiently

friend class VariantWriter
friend

The documentation for this class was generated from the following files: