Gamgee
You miserable little maggot. I'll stove your head in!
Utility class to manipulate a Variant record. More...
#include <variant.h>
Public Member Functions | |
Variant ()=default | |
initializes a null Variant More... | |
Variant (const std::shared_ptr< bcf_hdr_t > &header, const std::shared_ptr< bcf1_t > &body) noexcept | |
creates a Variant given htslib objects. More... | |
Variant (const Variant &other) | |
makes a deep copy of a Variant and its header. Shared pointers maintain state to all other associated objects correctly. More... | |
Variant & | operator= (const Variant &other) |
deep copy assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly. More... | |
Variant (Variant &&other)=default | |
moves Variant and its header accordingly. Shared pointers maintain state to all other associated objects correctly. More... | |
Variant & | operator= (Variant &&other)=default |
move assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly. More... | |
VariantHeader | header () const |
returns the header for this variant More... | |
bool | missing () const |
returns true if this is a default-constructed Variant object with no data More... | |
uint32_t | chromosome () const |
returns the integer representation of the chromosome. Notice that chromosomes are listed in index order with regards to the header (so a 0-based number). Similar to Picard's getReferenceIndex() More... | |
std::string | chromosome_name () const |
returns the name of the chromosome by querying the header. More... | |
uint32_t | alignment_start () const |
returns a 1-based alignment start position (as you would see in a VCF file). More... | |
uint32_t | alignment_stop () const |
returns a 1-based alignment stop position, as you would see in a VCF INFO END tag, or the end position of the reference allele if there is no END tag. More... | |
float | qual () const |
returns the Phred scaled site qual (probability that the site is not reference). See VCF spec. More... | |
uint32_t | n_samples () const |
returns the number of samples in this Variant record More... | |
uint32_t | n_alleles () const |
returns the number of alleles in this Variant record including the reference allele More... | |
std::string | id () const |
returns the variant id field (typically dbsnp id) More... | |
std::string | ref () const |
returns the ref allele in this Variant record More... | |
std::vector< std::string > | alt () const |
returns the vector of alt alleles in this Variant record More... | |
VariantFilters | filters () const |
returns a vector-like object with all the filters for this record More... | |
bool | has_filter (const std::string &filter) const |
checks for the existence of a filter in this record More... | |
IndividualField< Genotype > | genotypes () const |
special getter for the Genotype (GT) field. Returns a random access object with all the values in a given GT tag for all samples contiguous in memory. More... | |
IndividualField < IndividualFieldValue < int32_t > > | integer_individual_field (const std::string &tag) const |
returns a random access object with all the values in a given individual field tag in integer format for all samples contiguous in memory. More... | |
IndividualField < IndividualFieldValue< float > > | float_individual_field (const std::string &tag) const |
returns a random access object with all the values in a given individual field tag in float format for all samples contiguous in memory. More... | |
IndividualField < IndividualFieldValue < std::string > > | string_individual_field (const std::string &tag) const |
returns a random access object with all the values in a given individual field tag in string format for all samples contiguous in memory. More... | |
IndividualField < IndividualFieldValue < int32_t > > | individual_field_as_integer (const std::string &tag) const |
same as integer_individual_field but will attempt to convert underlying data to integer if possible. More... | |
IndividualField < IndividualFieldValue< float > > | individual_field_as_float (const std::string &tag) const |
same as float_individual_field but will attempt to convert underlying data to float if possible. More... | |
IndividualField < IndividualFieldValue < std::string > > | individual_field_as_string (const std::string &tag) const |
same as string_individual_field but will attempt to convert underlying data to string if possible. More... | |
IndividualField < IndividualFieldValue < int32_t > > | integer_individual_field (const int32_t index) const |
returns a random access object with all the values in a given individual field tag index in integer format for all samples contiguous in memory. More... | |
IndividualField < IndividualFieldValue< float > > | float_individual_field (const int32_t index) const |
returns a random access object with all the values in a given individual field tag index in float format for all samples contiguous in memory. More... | |
IndividualField < IndividualFieldValue < std::string > > | string_individual_field (const int32_t index) const |
returns a random access object with all the values in a given individual field tag index in string format for all samples contiguous in memory. More... | |
IndividualField < IndividualFieldValue < int32_t > > | individual_field_as_integer (const int32_t index) const |
same as integer_individual_field but will attempt to convert underlying data to integer if possible. More... | |
IndividualField < IndividualFieldValue< float > > | individual_field_as_float (const int32_t index) const |
same as float_individual_field but will attempt to convert underlying data to float if possible. More... | |
IndividualField < IndividualFieldValue < std::string > > | individual_field_as_string (const int32_t index) const |
same as string_individual_field but will attempt to convert underlying data to string if possible. More... | |
bool | boolean_shared_field (const std::string &tag) const |
whether or not the tag is present More... | |
SharedField< int32_t > | integer_shared_field (const std::string &tag) const |
returns a random access object with all the values in a given shared field tag in integer format contiguous in memory. More... | |
SharedField< float > | float_shared_field (const std::string &tag) const |
returns a random access object with all the values in a given shared field tag in float format contiguous in memory. More... | |
SharedField< std::string > | string_shared_field (const std::string &tag) const |
returns a random access object with all the values in a given shared field tag in string format contiguous in memory. More... | |
SharedField< int32_t > | shared_field_as_integer (const std::string &tag) const |
same as integer_shared_field but will attempt to convert underlying data to integer if possible. More... | |
SharedField< float > | shared_field_as_float (const std::string &tag) const |
same as float_shared_field but will attempt to convert underlying data to float if possible. More... | |
SharedField< std::string > | shared_field_as_string (const std::string &tag) const |
same as string_shared_field but will attempt to convert underlying data to string if possible. More... | |
bool | boolean_shared_field (const int32_t index) const |
whether or not the tag with this index is present More... | |
SharedField< int32_t > | integer_shared_field (const int32_t index) const |
returns a random access object with all the values in a given shared field tag index in integer format contiguous in memory. More... | |
SharedField< float > | float_shared_field (const int32_t index) const |
returns a random access object with all the values in a given shared field tag index in float format contiguous in memory. More... | |
SharedField< std::string > | string_shared_field (const int32_t index) const |
returns a random access object with all the values in a given shared field tag index in string format contiguous in memory. More... | |
SharedField< int32_t > | shared_field_as_integer (const int32_t index) const |
same as integer_shared_field but will attempt to convert underlying data to integer if possible. More... | |
SharedField< float > | shared_field_as_float (const int32_t index) const |
same as float_shared_field but will attempt to convert underlying data to float if possible. More... | |
SharedField< std::string > | shared_field_as_string (const int32_t index) const |
same as string_shared_field but will attempt to convert underlying data to string if possible. More... | |
AlleleMask | allele_mask () const |
computes the allele types for all alleles (including the reference allele) More... | |
Static Public Member Functions | |
template<class VALUE , template< class > class ITER> | |
static boost::dynamic_bitset | select_if (const ITER< VALUE > &first, const ITER< VALUE > &last, const std::function< bool(const decltype(*first)&value)> pred) |
functional-style set logic operations for variant field vectors More... | |
Friends | |
class | VariantWriter |
class | VariantBuilder |
builder needs access to the internals in order to build efficiently More... | |
class | ReferenceBlockSplittingVariantIterator |
Utility class to manipulate a Variant record.
gamgee::Variant::Variant | ( | ) | default |
initializes a null Variant
gamgee::Variant::Variant | ( | const Variant & | other | ) |
makes a deep copy of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.
creates a deep copy of a variant record
gamgee::Variant::Variant | ( | Variant && | other | ) | default |
moves Variant and its header accordingly. Shared pointers maintain state to all other associated objects correctly.
uint32_t gamgee::Variant::alignment_start | ( | ) | const | inline |
returns a 1-based alignment start position (as you would see in a VCF file).
uint32_t gamgee::Variant::alignment_stop | ( | ) | const | inline |
returns a 1-based alignment stop position, as you would see in a VCF INFO END tag, or the end position of the reference allele if there is no END tag.
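The rule above can be illustrated with a small stand-alone sketch. It is not gamgee's implementation: stop_position and its parameters are hypothetical names, and an info_end of 0 is an assumed convention meaning "no END tag".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of gamgee: computes a 1-based, inclusive stop.
// `start` is the 1-based start, `ref` the reference allele, and `info_end` the
// value of the INFO END tag, with 0 meaning "no END tag" (assumed convention).
std::uint32_t stop_position(const std::uint32_t start, const std::string& ref,
                            const std::uint32_t info_end) {
  assert(start >= 1 && !ref.empty());
  if (info_end != 0)
    return info_end;  // the END tag wins when it is present
  // otherwise, the last base covered by the reference allele
  return start + static_cast<std::uint32_t>(ref.size()) - 1;
}
```

For a record starting at position 100 with reference allele ACGT and no END tag, this yields 103.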
AlleleMask gamgee::Variant::allele_mask | ( | ) | const |
computes the allele types for all alleles (including the reference allele)
This function gives you an index vector (AlleleMask) that can be used to query genotypes for snp(), indel(), complex(), ...
Complexity is O(N) on the number of alleles.
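To make the O(N) claim concrete, here is a stand-alone sketch of one pass over the alleles. The classification rule (comparing allele lengths), the AlleleKind enum and classify_alleles are assumptions for illustration only; the actual AlleleMask contents and gamgee's rules may differ, and symbolic alleles such as <DEL> are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical allele categories; not gamgee's own enum.
enum class AlleleKind { REFERENCE, SNP, INSERTION, DELETION, OTHER };

// Visits each allele exactly once, so the cost is linear in the number of alleles.
// Index 0 of the result is the reference allele, followed by the alts in order.
std::vector<AlleleKind> classify_alleles(const std::string& ref,
                                         const std::vector<std::string>& alts) {
  assert(!ref.empty());
  std::vector<AlleleKind> mask;
  mask.reserve(alts.size() + 1);
  mask.push_back(AlleleKind::REFERENCE);
  for (const auto& alt : alts) {
    if (alt.size() == ref.size())  // same length: a SNP only if a single base
      mask.push_back(ref.size() == 1 ? AlleleKind::SNP : AlleleKind::OTHER);
    else if (alt.size() > ref.size())
      mask.push_back(AlleleKind::INSERTION);
    else
      mask.push_back(AlleleKind::DELETION);
  }
  return mask;
}
```

A mask built this way can be indexed by the allele numbers stored in a genotype, which is the kind of lookup the description above refers to.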
std::vector< std::string > gamgee::Variant::alt | ( | ) | const |
returns the vector of alt alleles in this Variant record
bool gamgee::Variant::boolean_shared_field | ( | const std::string & | tag | ) | const |
whether or not the tag is present
bool gamgee::Variant::boolean_shared_field | ( | const int32_t | index | ) | const |
whether or not the tag with this index is present
uint32_t gamgee::Variant::chromosome | ( | ) | const | inline |
returns the integer representation of the chromosome. Notice that chromosomes are listed in index order with regards to the header (so a 0-based number). Similar to Picard's getReferenceIndex()
std::string gamgee::Variant::chromosome_name | ( | ) | const | inline |
returns the name of the chromosome by querying the header.
VariantFilters gamgee::Variant::filters | ( | ) | const |
returns a vector-like object with all the filters for this record
IndividualField< IndividualFieldValue< float > > gamgee::Variant::float_individual_field | ( | const std::string & | tag | ) | const |
returns a random access object with all the values in a given individual field tag in float format for all samples contiguous in memory.
IndividualField< IndividualFieldValue< float > > gamgee::Variant::float_individual_field | ( | const int32_t | index | ) | const |
returns a random access object with all the values in a given individual field tag index in float format for all samples contiguous in memory.
SharedField< float > gamgee::Variant::float_shared_field | ( | const std::string & | tag | ) | const |
returns a random access object with all the values in a given shared field tag in float format contiguous in memory.
SharedField< float > gamgee::Variant::float_shared_field | ( | const int32_t | index | ) | const |
returns a random access object with all the values in a given shared field tag index in float format contiguous in memory.
IndividualField< Genotype > gamgee::Variant::genotypes | ( | ) | const |
special getter for the Genotype (GT) field. Returns a random access object with all the values in a given GT tag for all samples contiguous in memory.
If the variant is missing or the GT tag is missing, returns an empty IndividualField.
bool gamgee::Variant::has_filter | ( | const std::string & | filter | ) | const |
checks for the existence of a filter in this record
VariantHeader gamgee::Variant::header | ( | ) | const | inline |
returns the header for this variant
std::string gamgee::Variant::id | ( | ) | const |
returns the variant id field (typically dbsnp id)
IndividualField< IndividualFieldValue< float > > gamgee::Variant::individual_field_as_float | ( | const std::string & | tag | ) | const |
same as float_individual_field but will attempt to convert underlying data to float if possible.
IndividualField< IndividualFieldValue< float > > gamgee::Variant::individual_field_as_float | ( | const int32_t | index | ) | const |
same as float_individual_field but will attempt to convert underlying data to float if possible.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::individual_field_as_integer | ( | const std::string & | tag | ) | const |
same as integer_individual_field but will attempt to convert underlying data to integer if possible.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::individual_field_as_integer | ( | const int32_t | index | ) | const |
same as integer_individual_field but will attempt to convert underlying data to integer if possible.
IndividualField< IndividualFieldValue< std::string > > gamgee::Variant::individual_field_as_string | ( | const std::string & | tag | ) | const |
same as string_individual_field but will attempt to convert underlying data to string if possible.
IndividualField< IndividualFieldValue< std::string > > gamgee::Variant::individual_field_as_string | ( | const int32_t | index | ) | const |
same as string_individual_field but will attempt to convert underlying data to string if possible.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::integer_individual_field | ( | const std::string & | tag | ) | const |
returns a random access object with all the values in a given individual field tag in integer format for all samples contiguous in memory.
IndividualField< IndividualFieldValue< int32_t > > gamgee::Variant::integer_individual_field | ( | const int32_t | index | ) | const |
returns a random access object with all the values in a given individual field tag index in integer format for all samples contiguous in memory.
SharedField< int32_t > gamgee::Variant::integer_shared_field | ( | const std::string & | tag | ) | const |
returns a random access object with all the values in a given shared field tag in integer format contiguous in memory.
SharedField< int32_t > gamgee::Variant::integer_shared_field | ( | const int32_t | index | ) | const |
returns a random access object with all the values in a given shared field tag index in integer format contiguous in memory.
bool gamgee::Variant::missing | ( | ) | const | inline |
returns true if this is a default-constructed Variant object with no data
uint32_t gamgee::Variant::n_alleles | ( | ) | const | inline |
returns the number of alleles in this Variant record including the reference allele
uint32_t gamgee::Variant::n_samples | ( | ) | const | inline |
returns the number of samples in this Variant record
Variant & gamgee::Variant::operator= | ( | const Variant & | other | ) |
deep copy assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.
creates a deep copy of a variant record
other | the Variant to be copied |
shared_ptr assignment will take care of deallocating the old record if necessary
Variant & gamgee::Variant::operator= | ( | Variant && | other | ) | default |
move assignment of a Variant and its header. Shared pointers maintain state to all other associated objects correctly.
float gamgee::Variant::qual | ( | ) | const | inline |
returns the Phred scaled site qual (probability that the site is not reference). See VCF spec.
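As a reminder of what "Phred scaled" means, the following sketch converts between a Phred score Q and the underlying error probability P using Q = -10 * log10(P). The helper names are hypothetical and not part of gamgee; see the VCF specification for the exact meaning of the probability behind QUAL.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: Phred-scaled quality -> probability that the call is wrong.
double phred_to_error_probability(const double qual) {
  assert(qual >= 0.0);
  return std::pow(10.0, -qual / 10.0);
}

// Hypothetical helper: error probability in (0, 1] -> Phred-scaled quality.
double error_probability_to_phred(const double p) {
  assert(p > 0.0 && p <= 1.0);
  return -10.0 * std::log10(p);
}
```

Under this scale, a qual of 30 corresponds to an error probability of 0.001, and a qual of 20 to 0.01.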
std::string gamgee::Variant::ref | ( | ) | const |
returns the ref allele in this Variant record
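template<class VALUE , template< class > class ITER> static boost::dynamic_bitset gamgee::Variant::select_if | ( | const ITER< VALUE > & | first, | const ITER< VALUE > & | last, | const std::function< bool(const decltype(*first)&value)> | pred | ) | inline static |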
functional-style set logic operations for variant field vectors
This function applies the unary predicate pred to every element between first and last in the container without modifying them. It then produces a bitset with as many elements as the full iteration (from first to last). For all elements where the predicate returns true, the corresponding bit in the bitset will be 1 (set). For all elements where the predicate returns false, the corresponding bit will be 0 (unset).
The great advantage of this functionality is that you can perform one operation across all elements of an iterator, benefiting from data locality and cache prefetching, and get the result in a more manageable format (a bitset) for subsequent set-logic transformations. This is particularly fast for very large datasets. For example, in a VCF file with 1000+ samples, running several select_if calls back to back over all the samples and then combining the resulting bitsets with set logic is much faster than a single iteration that updates several IndividualFields, because those fields sit far apart in the computer's memory.
For example:
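A minimal, self-contained sketch of the idea, using stand-in types rather than gamgee's own classes. Call, FieldIter, both and high_quality_hets are hypothetical names introduced for illustration, the free function select_if only mirrors the shape of the member documented here, and std::vector<bool> stands in for the boost::dynamic_bitset that the real function returns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for a per-sample genotype value (hypothetical, not gamgee::Genotype).
struct Call {
  int first_allele;
  int second_allele;
  bool het() const { return first_allele != second_allele; }
};

// Stand-in iterator with a single template parameter, so that it can bind to
// the template template parameter ITER below.
template <class VALUE>
class FieldIter {
 public:
  explicit FieldIter(const VALUE* ptr) : m_ptr{ptr} {}
  const VALUE& operator*() const { return *m_ptr; }
  FieldIter& operator++() { ++m_ptr; return *this; }
  std::ptrdiff_t operator-(const FieldIter& other) const { return m_ptr - other.m_ptr; }
 private:
  const VALUE* m_ptr;
};

// Same shape as the documented signature: VALUE and ITER are deduced from the
// iterators, and decltype(*first) keeps pred out of template argument deduction.
template <class VALUE, template <class> class ITER>
std::vector<bool> select_if(const ITER<VALUE>& first, const ITER<VALUE>& last,
                            const std::function<bool(const decltype(*first)& value)> pred) {
  const auto n = last - first;
  assert(n >= 0);
  std::vector<bool> selected(static_cast<std::size_t>(n), false);
  auto it = first;
  for (std::size_t i = 0; i < selected.size(); ++i, ++it)
    selected[i] = pred(*it);  // one bit per element, set when the predicate holds
  return selected;
}

// Set-logic step: element-wise AND of two equally sized selections.
std::vector<bool> both(const std::vector<bool>& a, const std::vector<bool>& b) {
  assert(a.size() == b.size());
  std::vector<bool> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = a[i] && b[i];
  return out;
}

// Two passes, one per field, then one combination of the resulting selections.
std::vector<bool> high_quality_hets(const std::vector<Call>& calls,
                                    const std::vector<int>& gqs, const int min_gq) {
  const auto hets = select_if(FieldIter<Call>{calls.data()},
                              FieldIter<Call>{calls.data() + calls.size()},
                              [](const auto& c) { return c.het(); });
  const auto good = select_if(FieldIter<int>{gqs.data()},
                              FieldIter<int>{gqs.data() + gqs.size()},
                              [min_gq](const auto& gq) { return gq >= min_gq; });
  return both(hets, good);
}
```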
In the above example there are two invocations of select_if using the same ITER but different VALUEs, yet the user is oblivious to that, because the lambda function simply declares its parameter as const auto&.
This function is declared with template template parameters to facilitate parameter type deduction and enable usage without specifying the actual types being used. The goal is to let the user keep using auto everywhere, unaware of the iterator and value types underlying the data structures.
ITER | any iterator that has operator- defined to return the difference in number of elements between two ITER iterators |
VALUE | the class of the objects ITER is iterating over (e.g. in IndividualFieldIterator<Genotype>, Genotype is the VALUE and IndividualFieldIterator is the ITER) |
first | iterator to the initial position in a sequence. The range includes the element pointed to by first. |
last | iterator to the last position in a sequence. The range does not include the element pointed to by last. |
pred | unary predicate (lambda) function that accepts an element in range [first, last) as argument and returns a value convertible to bool. The value returned indicates whether the element is considered a match in the context of this function. |
SharedField< float > gamgee::Variant::shared_field_as_float | ( | const std::string & | tag | ) | const |
same as float_shared_field but will attempt to convert underlying data to float if possible.
SharedField< float > gamgee::Variant::shared_field_as_float | ( | const int32_t | index | ) | const |
same as float_shared_field but will attempt to convert underlying data to float if possible.
SharedField< int32_t > gamgee::Variant::shared_field_as_integer | ( | const std::string & | tag | ) | const |
same as integer_shared_field but will attempt to convert underlying data to integer if possible.
SharedField< int32_t > gamgee::Variant::shared_field_as_integer | ( | const int32_t | index | ) | const |
same as integer_shared_field but will attempt to convert underlying data to integer if possible.
SharedField< string > gamgee::Variant::shared_field_as_string | ( | const std::string & | tag | ) | const |
same as string_shared_field but will attempt to convert underlying data to string if possible.
SharedField< string > gamgee::Variant::shared_field_as_string | ( | const int32_t | index | ) | const |
same as string_shared_field but will attempt to convert underlying data to string if possible.
IndividualField< IndividualFieldValue< string > > gamgee::Variant::string_individual_field | ( | const std::string & | tag | ) | const |
returns a random access object with all the values in a given individual field tag in string format for all samples contiguous in memory.
IndividualField< IndividualFieldValue< string > > gamgee::Variant::string_individual_field | ( | const int32_t | index | ) | const |
returns a random access object with all the values in a given individual field tag index in string format for all samples contiguous in memory.
SharedField< string > gamgee::Variant::string_shared_field | ( | const std::string & | tag | ) | const |
returns a random access object with all the values in a given shared field tag in string format contiguous in memory.
SharedField< string > gamgee::Variant::string_shared_field | ( | const int32_t | index | ) | const |
returns a random access object with all the values in a given shared field tag index in string format contiguous in memory.
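friend class ReferenceBlockSplittingVariantIterator | friend |
friend class VariantBuilder | friend |
builder needs access to the internals in order to build efficiently
friend class VariantWriter | friend |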