Gamgee
You miserable little maggot. I'll stove your head in!
utility functions for the gamgee library More...
Classes | |
struct | HtsFileDeleter |
a functor object to delete an htsFile pointer More... | |
struct | HtsIndexDeleter |
a functor object to delete an hts file index pointer More... | |
struct | HtsIteratorDeleter |
a functor object to delete an hts file iterator pointer More... | |
struct | IFStreamDeleter |
a functor object to delete an ifstream More... | |
class | MergedVCFAllelesIdxLUT |
LUT class for storing mappings between allele vectors in the merged file and input VCF files. Since the number of alleles per site is expected to be small, this class sets the number of fields to 10, which makes subsequent re-allocations unlikely. The function resize_luts_if_needed() will almost always return immediately after failing the if condition. More... | |
class | MergedVCFLUTBase |
Base class to store look up information between fields of merged header and input headers. More... | |
struct | SamBodyDeleter |
a functor object to delete a bam1_t pointer More... | |
struct | SamHeaderDeleter |
a functor object to delete a bam_hdr_t pointer More... | |
class | ShortValueOptimizedStorage |
struct | SyncedReaderDeleter |
a functor object to delete a bcf_srs_t pointer More... | |
struct | VariantBodyDeleter |
a functor object to delete a bcf1_t pointer More... | |
struct | VariantHeaderDeleter |
a functor object to delete a bcf_hdr_t pointer More... | |
Typedefs | |
using | CombineAllelesLUT = MergedVCFAllelesIdxLUT< true, true > |
Enumerations | |
enum | VariantFieldType { VariantFieldType::NIL = 0, VariantFieldType::INT8 = 1, VariantFieldType::INT16 = 2, VariantFieldType::INT32 = 3, VariantFieldType::FLOAT = 5, VariantFieldType::STRING = 7 } |
an enumeration of the types in htslib for the format field values More... | |
Functions | |
std::shared_ptr< std::ifstream > | make_shared_ifstream (std::ifstream *ifstream_ptr) |
wraps a pre-allocated ifstream in a shared_ptr with correct deleter More... | |
std::shared_ptr< std::ifstream > | make_shared_ifstream (std::string filename) |
wraps an input file in a shared_ptr to an ifstream with correct deleter More... | |
template<class TYPE > | |
bool | allele_missing (const uint8_t *data_ptr, const uint32_t allele_index, const TYPE missing) |
template<class TYPE > | |
vector< int32_t > | allele_keys (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const TYPE missing, const TYPE vector_end) |
template<class TYPE > | |
vector< string > | allele_strings (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const TYPE missing, const TYPE vector_end) |
bool | allele_missing (const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const uint32_t allele_index) |
Returns true if the allele at position allele_index is missing. More... | |
vector< int32_t > | allele_keys (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr) |
Returns the genotype allele keys. More... | |
vector< string > | allele_strings (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr) |
Returns the genotype allele strings. More... | |
string | allele_key_to_string (const std::shared_ptr< bcf1_t > &body, const int32_t key_index) |
Returns the genotype allele string from this line. More... | |
uint32_t | allele_count (const bcf_fmt_t *const format_ptr) |
Counts the genotype alleles. More... | |
template<class TYPE > | |
int32_t | allele_key (const uint8_t *data_ptr, const uint32_t allele_index, const TYPE missing, const TYPE vector_end) |
int32_t | allele_key (const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const uint32_t allele_index) |
Returns the genotype allele at position allele_index. More... | |
shared_ptr< htsFile > | make_shared_hts_file (htsFile *hts_file_ptr) |
wraps a pre-allocated htsFile in a shared_ptr with correct deleter More... | |
shared_ptr< hts_idx_t > | make_shared_hts_index (hts_idx_t *hts_index_ptr) |
wraps a pre-allocated hts_idx_t in a shared_ptr with correct deleter More... | |
shared_ptr< hts_itr_t > | make_shared_hts_itr (hts_itr_t *hts_itr_ptr) |
wraps a pre-allocated hts_itr_t in a shared_ptr with correct deleter More... | |
shared_ptr< bam1_t > | make_shared_sam (bam1_t *sam_ptr) |
wraps a pre-allocated bam1_t in a shared_ptr with correct deleter More... | |
shared_ptr< bam_hdr_t > | make_shared_sam_header (bam_hdr_t *sam_header_ptr) |
wraps a pre-allocated bam_hdr_t in a shared_ptr with correct deleter More... | |
shared_ptr< bcf1_t > | make_shared_variant (bcf1_t *bcf_ptr) |
wraps a pre-allocated bcf1_t in a shared_ptr with correct deleter More... | |
shared_ptr< bcf_hdr_t > | make_shared_variant_header (bcf_hdr_t *bcf_hdr_ptr) |
wraps a pre-allocated bcf_hdr_t in a shared_ptr with correct deleter More... | |
std::shared_ptr< bcf_srs_t > | make_shared_synced_variant_reader (bcf_srs_t *synced_reader_ptr) |
wraps a pre-allocated bcf_srs_t in a shared_ptr with correct deleter More... | |
unique_ptr< htsFile, HtsFileDeleter > | make_unique_hts_file (htsFile *hts_file_ptr) |
wraps a pre-allocated htsFile in a unique_ptr with correct deleter More... | |
std::unique_ptr< hts_itr_t, HtsIteratorDeleter > | make_unique_hts_itr (hts_itr_t *hts_itr_ptr) |
wraps a pre-allocated hts_itr_t in a unique_ptr with correct deleter More... | |
bam1_t * | sam_deep_copy (bam1_t *original) |
creates a deep copy of an existing bam1_t More... | |
bam_hdr_t * | sam_header_deep_copy (bam_hdr_t *original) |
creates a deep copy of an existing bam_hdr_t More... | |
bcf1_t * | variant_deep_copy (bcf1_t *original) |
creates a deep copy of an existing bcf1_t More... | |
bcf_hdr_t * | variant_header_deep_copy (bcf_hdr_t *original) |
creates a deep copy of an existing bcf_hdr_t More... | |
bam1_t * | sam_shallow_copy (bam1_t *original) |
creates a shallow copy of an existing bam1_t: copies core fields but not the data buffer or fields related to the size of the data buffer More... | |
std::string | htslib_filter_name (bcf_hdr_t *header, bcf1_t *body, int index) |
helper function to translate an index into a string in the filter list More... | |
uint8_t | bcf_type_to_element_size (const int32_t htslib_type) |
Returns the number of bytes required to store each BCF_BT_* type. More... | |
uint8_t | int_encoded_type (const int32_t min_val, const int32_t max_val) |
Given a min and max value, determines whether int8, int16, or int32 BCF encoding is required. More... | |
kstring_t | initialize_htslib_buffer (const uint32_t initial_capacity) |
Returns a newly-allocated kstring_t buffer suitable for passing to htslib. More... | |
uint8_t | int_encoded_type (const int32_t val) |
uint8_t | int_encoded_size (const int32_t val) |
uint32_t | encoded_size (const int8_t field_type, const uint32_t field_length, bool add_type_descriptor=true) |
char | complement_base (const char base) |
std::string | complement (std::string &sequence) |
calculates the complement of a sequence in-place More... | |
std::string | complement (const std::string &sequence) |
calculates the complement of a sequence More... | |
std::string | reverse_complement (const std::string &sequence) |
calculates the reverse complement of a sequence More... | |
std::vector< std::string > | hts_string_array_to_vector (const char *const *const string_array, const uint32_t array_size) |
converts an array of c-strings into a vector<string> More... | |
char | complement (const char base) |
calculates the complement of a base More... | |
void | check_max_boundary (const uint32_t index, const uint32_t size, const std::string &prefix_msg) |
checks that an index is within bounds, i.e. less than size More... | |
void | check_max_boundary (const uint32_t index, const uint32_t size) |
checks that an index is within bounds, i.e. less than size More... | |
template<class TYPE > | |
bool | bcf_check_equal_element (const TYPE &x, const TYPE &y) |
| |
template<> | |
bool | bcf_check_equal_element< float > (const float &x, const float &y) |
Checks whether two float values from VCF fields are equal More... | |
template<class TYPE > | |
bool | bcf_is_vector_end_value (const TYPE &value) |
template<> | |
bool | bcf_is_vector_end_value< int32_t > (const int32_t &value) |
template<> | |
bool | bcf_is_vector_end_value< float > (const float &value) |
template<class ITER > | |
const uint8_t * | cache_and_advance_to_end_if_necessary (const uint8_t *current_ptr, const uint8_t *end_ptr, ITER &it) |
advances current ptr to end of the vector if the current element is bcf_*_vector_end More... | |
int32_t | convert_data_to_integer (const uint8_t *data_ptr, const int index, const uint8_t num_bytes_per_value, const VariantFieldType &type) |
converts the value in an index from the byte array into int32_t More... | |
float | convert_data_to_float (const uint8_t *data_ptr, const int index, const uint8_t num_bytes_per_value, const VariantFieldType &type) |
converts the value in an index from the byte array into float More... | |
std::string | convert_data_to_string (const uint8_t *data_ptr, const int index, const uint8_t num_bytes_per_value, const VariantFieldType &type) |
converts the value in an index from the byte array into string More... | |
uint8_t | size_for_type (const VariantFieldType &type, const bcf_fmt_t *const format_ptr) |
returns the number of bytes for a given VariantFieldType More... | |
uint8_t | size_for_type (const VariantFieldType &type, const bcf_info_t *const info_ptr) |
returns the number of bytes for a given VariantFieldType More... | |
bool | is_string_type (const int32_t &type) |
| |
template<typename... T> | |
auto | zip (const T &...containers) -> boost::iterator_range< boost::zip_iterator< decltype(boost::make_tuple(containers.begin()...))>> |
utility method to zip iterators together with simpler syntax than boost More... | |
Variables | |
const uint8_t | bcf_type_sizes [] = { 0, 1, 2, 4, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 } |
utility functions for the gamgee library
This namespace includes typical functors and templates for DNA sequence manipulation, as well as low-level routines for htslib memory management.
using gamgee::utils::CombineAllelesLUT = MergedVCFAllelesIdxLUT<true,true> |
enum gamgee::utils::VariantFieldType |
strong |
an enumeration of the types in htslib for the format field values
Enumerator | Value |
---|---|
NIL | 0 |
INT8 | 1 |
INT16 | 2 |
INT32 | 3 |
FLOAT | 5 |
STRING | 7 |
uint32_t gamgee::utils::allele_count | ( | const bcf_fmt_t *const | format_ptr | ) |
inline |
Counts the genotype alleles.
format_ptr | The GT field from the line. |
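int32_t gamgee::utils::allele_key | ( | const uint8_t * | data_ptr, |
const uint32_t | allele_index, | ||
const TYPE | missing, | ||
const TYPE | vector_end | ||
) |
inline |
int32_t gamgee::utils::allele_key | ( | const bcf_fmt_t *const | format_ptr, |
const uint8_t * | data_ptr, | ||
const uint32_t | allele_index | ||
) |
inline |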
Returns the genotype allele at position allele_index.
format_ptr | The GT field from the line. |
data_ptr | The GT for this sample. |
allele_index | The index within the GT field to return an integer allele. |
string gamgee::utils::allele_key_to_string | ( | const std::shared_ptr< bcf1_t > & | body, |
const int32_t | key_index | ||
) |
Returns the genotype allele string from this line.
body | The shared memory variant "line" from a vcf, or bcf. |
key_index | The integer representation of the allele within this "line". |
vector< int32_t > gamgee::utils::allele_keys | ( | const std::shared_ptr< bcf1_t > & | body, |
const bcf_fmt_t *const | format_ptr, | ||
const uint8_t * | data_ptr, | ||
const TYPE | missing, | ||
const TYPE | vector_end | ||
) |
vector< int32_t > gamgee::utils::allele_keys | ( | const std::shared_ptr< bcf1_t > & | body, |
const bcf_fmt_t *const | format_ptr, | ||
const uint8_t * | data_ptr | ||
) |
Returns the genotype allele keys.
body | The shared memory variant "line" from a vcf, or bcf. |
format_ptr | The GT field from the line. |
data_ptr | The GT for this sample. |
bool gamgee::utils::allele_missing | ( | const uint8_t * | data_ptr, |
const uint32_t | allele_index, | ||
const TYPE | missing | ||
) |
bool gamgee::utils::allele_missing | ( | const bcf_fmt_t *const | format_ptr, |
const uint8_t * | data_ptr, | ||
const uint32_t | allele_index | ||
) |
Returns true if the allele at position allele_index is missing.
format_ptr | The GT field from the line. |
data_ptr | The GT for this sample. |
allele_index | The index within the GT field to check. |
vector< string > gamgee::utils::allele_strings | ( | const std::shared_ptr< bcf1_t > & | body, |
const bcf_fmt_t *const | format_ptr, | ||
const uint8_t * | data_ptr, | ||
const TYPE | missing, | ||
const TYPE | vector_end | ||
) |
vector< string > gamgee::utils::allele_strings | ( | const std::shared_ptr< bcf1_t > & | body, |
const bcf_fmt_t *const | format_ptr, | ||
const uint8_t * | data_ptr | ||
) |
Returns the genotype allele strings.
body | The shared memory variant "line" from a vcf, or bcf. |
format_ptr | The GT field from the line. |
data_ptr | The GT for this sample. |
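bool gamgee::utils::bcf_check_equal_element | ( | const TYPE & | x, |
const TYPE & | y | ||
) |
inline |
bool gamgee::utils::bcf_check_equal_element< float > | ( | const float & | x, |
const float & | y | ||
) |
inline |
Checks whether two float values from VCF fields are equal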
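bool gamgee::utils::bcf_is_vector_end_value | ( | const TYPE & | value | ) |
inline |
bool gamgee::utils::bcf_is_vector_end_value< int32_t > | ( | const int32_t & | value | ) |
inline |
bool gamgee::utils::bcf_is_vector_end_value< float > | ( | const float & | value | ) |
inline |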
uint8_t gamgee::utils::bcf_type_to_element_size | ( | const int32_t | htslib_type | ) |
Returns the number of bytes required to store each BCF_BT_* type.
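As a rough illustration of how such a lookup can work, the sketch below indexes a table with the same values as bcf_type_sizes from this namespace. The names `type_sizes_sketch` and `element_size_sketch` are hypothetical, and masking the type code to its low 4 bits is an assumption made here to keep the index inside the 16-entry table; the library's actual implementation may differ.

```cpp
#include <cstdint>

// Same values as bcf_type_sizes in this namespace, indexed by the BCF_BT_* code.
const std::uint8_t type_sizes_sketch[] = {0, 1, 2, 4, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0};

// Hypothetical lookup (not the library's code): masking to the low 4 bits is an
// assumption that keeps the index inside the 16-entry table.
inline std::uint8_t element_size_sketch(const std::int32_t htslib_type) {
  return type_sizes_sketch[htslib_type & 0xF];
}
```

With the VariantFieldType codes, `element_size_sketch(3)` (INT32) and `element_size_sketch(5)` (FLOAT) both give 4 bytes, while code 7 (STRING) gives 1 byte per character.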
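const uint8_t * gamgee::utils::cache_and_advance_to_end_if_necessary | ( | const uint8_t * | current_ptr, |
const uint8_t * | end_ptr, | ||
ITER & | it | ||
) |
inline |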
advances current ptr to end of the vector if the current element is bcf_*_vector_end
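void gamgee::utils::check_max_boundary | ( | const uint32_t | index, |
const uint32_t | size, | ||
const std::string & | prefix_msg | ||
) |
inline |
checks that an index is within bounds, i.e. less than size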
index | the index between 0 and size to check |
size | one past the maximum valid index |
prefix_msg | additional string to prefix error message |
throws | an out_of_bounds exception if index is out of limits |
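void gamgee::utils::check_max_boundary | ( | const uint32_t | index, |
const uint32_t | size | ||
) |
inline |
checks that an index is within bounds, i.e. less than size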
index | the index between 0 and size to check |
size | one past the maximum valid index |
throws | an out_of_bounds exception if index is out of limits |
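The parameter table above implies that a valid index lies in [0, size). A minimal sketch of such a check follows. The name `check_max_boundary_sketch`, the message wording, and the choice of `std::out_of_range` as the exception type are all assumptions, since the documentation only describes an "out_of_bounds exception".

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for check_max_boundary: throws when index >= size.
// std::out_of_range is an assumption; the documentation does not name the exact type.
inline void check_max_boundary_sketch(const std::uint32_t index, const std::uint32_t size,
                                      const std::string& prefix_msg = "") {
  if (index >= size) {
    std::ostringstream message;
    message << prefix_msg << "index " << index << " must be less than " << size;
    throw std::out_of_range(message.str());
  }
}
```

Calling `check_max_boundary_sketch(2, 3)` returns silently, while `check_max_boundary_sketch(3, 3, "Sample ")` throws.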
std::string gamgee::utils::complement | ( | std::string & | sequence | ) |
calculates the complement of a sequence in-place
sequence | the sequence to turn into the complement |
std::string gamgee::utils::complement | ( | const std::string & | sequence | ) |
calculates the complement of a sequence
sequence | the sequence to get the complement from |
char gamgee::utils::complement | ( | const char | base | ) |
calculates the complement of a base
base | the base to get the complement from |
char gamgee::utils::complement_base | ( | const char | base | ) |
float gamgee::utils::convert_data_to_float | ( | const uint8_t * | data_ptr, |
const int | index, | ||
const uint8_t | num_bytes_per_value, | ||
const VariantFieldType & | type | ||
) |
converts the value in an index from the byte array into float
int32_t gamgee::utils::convert_data_to_integer | ( | const uint8_t * | data_ptr, |
const int | index, | ||
const uint8_t | num_bytes_per_value, | ||
const VariantFieldType & | type | ||
) |
converts the value in an index from the byte array into int32_t
The byte array's underlying data representation is record-specific, meaning that even numbers (like Integer) can be represented in multiple ways across records (some with uint8_t, others with uint32_t...), dictated by the maximum value in the field.
This function locates the correct index and creates a new value of the requested type to return to the user.
Exceptions | Throws std::invalid_argument if trying to create a string out of a numeric format or vice versa. All numeric type conversions are internally truncated or expanded accordingly. |
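To make the idea of a record-specific element width concrete, here is a minimal sketch that reads one element from a packed byte array and widens it to int32_t. The helper name `read_packed_integer_sketch` is hypothetical, it ignores the VariantFieldType argument, and it reads values in the host's byte order; it illustrates the indexing only and is not the library's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical helper: reads element `index` from a packed array whose elements
// are each `num_bytes_per_value` bytes wide, and widens it to int32_t.
// Assumption: values are stored in the host's byte order.
inline std::int32_t read_packed_integer_sketch(const std::uint8_t* data_ptr, const int index,
                                               const std::uint8_t num_bytes_per_value) {
  const std::uint8_t* p = data_ptr + static_cast<std::size_t>(index) * num_bytes_per_value;
  switch (num_bytes_per_value) {
    case 1: { std::int8_t v;  std::memcpy(&v, p, sizeof v); return v; }
    case 2: { std::int16_t v; std::memcpy(&v, p, sizeof v); return v; }
    case 4: { std::int32_t v; std::memcpy(&v, p, sizeof v); return v; }
    default: throw std::invalid_argument("unsupported element width");
  }
}
```

Using memcpy rather than a pointer cast avoids unaligned reads when the byte array does not start on a suitable boundary.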
std::string gamgee::utils::convert_data_to_string | ( | const uint8_t * | data_ptr, |
const int | index, | ||
const uint8_t | num_bytes_per_value, | ||
const VariantFieldType & | type | ||
) |
converts the value in an index from the byte array into string
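uint32_t gamgee::utils::encoded_size | ( | const int8_t | field_type, |
const uint32_t | field_length, | ||
bool | add_type_descriptor = true | ||
) |
inline |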
std::vector< std::string > gamgee::utils::hts_string_array_to_vector | ( | const char *const *const | string_array, |
const uint32_t | array_size | ||
) |
converts an array of c-strings into a vector<string>
Useful in many hts structs where a list of names is stored as a char** and we want to manipulate it in a vector<string> to maintain data contiguity and improve usability
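A conversion of this kind can be sketched in a few lines. The function name `string_array_to_vector_sketch` is hypothetical and the body is only one plausible way to write it, not necessarily how the library does it.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: copies `array_size` C strings into a vector<string>.
inline std::vector<std::string> string_array_to_vector_sketch(const char* const* const string_array,
                                                              const std::uint32_t array_size) {
  std::vector<std::string> result;
  result.reserve(array_size);  // one allocation for the vector's own storage
  for (std::uint32_t i = 0; i < array_size; ++i)
    result.emplace_back(string_array[i]);  // each C string is copied into a std::string
  return result;
}
```

Because every string is copied, the returned vector stays valid after the original char** has been released.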
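std::string gamgee::utils::htslib_filter_name | ( | bcf_hdr_t * | header, |
bcf1_t * | body, | ||
int | index | ||
) |
helper function to translate an index into a string in the filter list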
header | a VariantHeader htslib pointer |
body | a Variant htslib pointer |
index | the index of the filter you want access to |
kstring_t gamgee::utils::initialize_htslib_buffer | ( | const uint32_t | initial_capacity | ) |
Returns a newly-allocated kstring_t buffer suitable for passing to htslib.
The returned buffer is safe for htslib to call realloc() on, since it is initially allocated with malloc().
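The point about malloc() can be illustrated without htslib. The sketch below uses a stand-in struct, `kstring_sketch_t`, that mirrors the three fields of htslib's kstring_t (used length, allocated capacity, buffer pointer); both the struct and the function name are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-in with the same three fields as htslib's kstring_t (l = used length,
// m = allocated capacity, s = buffer), so the sketch compiles without htslib.
struct kstring_sketch_t {
  std::size_t l;
  std::size_t m;
  char* s;
};

// Hypothetical sketch: the buffer comes from malloc(), so C code may later
// grow it with realloc() and release it with free().
inline kstring_sketch_t initialize_buffer_sketch(const std::uint32_t initial_capacity) {
  kstring_sketch_t buffer;
  buffer.l = 0;
  buffer.m = initial_capacity;
  buffer.s = static_cast<char*>(std::malloc(initial_capacity));
  return buffer;
}
```

A buffer allocated with `new char[]` instead would make a later realloc() undefined behavior, which is why the allocation function matters here.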
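uint8_t gamgee::utils::int_encoded_size | ( | const int32_t | val | ) |
inline |
uint8_t gamgee::utils::int_encoded_type | ( | const int32_t | val | ) |
inline |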
uint8_t gamgee::utils::int_encoded_type | ( | const int32_t | min_val, |
const int32_t | max_val | ||
) |
Given a min and max value, determines whether int8, int16, or int32 BCF encoding is required.
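One way to picture this decision is the sketch below, which returns the smallest integer width whose range covers both bounds. The return codes match the VariantFieldType values in this namespace (INT8 = 1, INT16 = 2, INT32 = 3). All names are hypothetical, and the choice to exclude the lowest 8 values of each width (set aside in BCF for sentinels such as "missing" and "end of vector") is an assumption about the encoding rather than a description of the library's code.

```cpp
#include <cstdint>
#include <limits>

// Codes matching VariantFieldType: INT8 = 1, INT16 = 2, INT32 = 3.
constexpr std::uint8_t INT8_CODE = 1, INT16_CODE = 2, INT32_CODE = 3;

// Assumption: the lowest 8 values of each width are reserved for sentinels.
constexpr std::int32_t RESERVED_LOW_VALUES = 8;

// True if every value in [min_val, max_val] is representable in T without
// colliding with the reserved low values.
template <class T>
constexpr bool fits_sketch(const std::int32_t min_val, const std::int32_t max_val) {
  return min_val >= std::numeric_limits<T>::min() + RESERVED_LOW_VALUES &&
         max_val <= std::numeric_limits<T>::max();
}

// Hypothetical sketch: picks the narrowest encoding that holds both bounds.
inline std::uint8_t int_encoded_type_sketch(const std::int32_t min_val, const std::int32_t max_val) {
  if (fits_sketch<std::int8_t>(min_val, max_val)) return INT8_CODE;
  if (fits_sketch<std::int16_t>(min_val, max_val)) return INT16_CODE;
  return INT32_CODE;
}
```

Under these assumptions a field whose values span 0 to 100 encodes as INT8, while a single value of 128 is enough to push the whole field to INT16.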
bool gamgee::utils::is_string_type | ( | const int32_t & | type | ) |
inline |
shared_ptr< htsFile > gamgee::utils::make_shared_hts_file | ( | htsFile * | hts_file_ptr | ) |
wraps a pre-allocated htsFile in a shared_ptr with correct deleter
hts_file_ptr | an htslib raw file pointer |
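shared_ptr< hts_idx_t > gamgee::utils::make_shared_hts_index | ( | hts_idx_t * | hts_index_ptr | ) |
wraps a pre-allocated hts_idx_t in a shared_ptr with correct deleter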
hts_index_ptr | an htslib raw file index pointer |
shared_ptr< hts_itr_t > gamgee::utils::make_shared_hts_itr | ( | hts_itr_t * | hts_itr_ptr | ) |
wraps a pre-allocated hts_itr_t in a shared_ptr with correct deleter
hts_itr_ptr | an htslib raw file iterator pointer |
std::shared_ptr< std::ifstream > gamgee::utils::make_shared_ifstream | ( | std::ifstream * | ifstream_ptr | ) |
wraps a pre-allocated ifstream in a shared_ptr with correct deleter
ifstream_ptr | an ifstream raw file pointer |
std::shared_ptr< std::ifstream > gamgee::utils::make_shared_ifstream | ( | std::string | filename | ) |
wraps an input file in a shared_ptr to an ifstream with correct deleter
filename | the input filename |
shared_ptr< bam1_t > gamgee::utils::make_shared_sam | ( | bam1_t * | sam_ptr | ) |
wraps a pre-allocated bam1_t in a shared_ptr with correct deleter
sam_ptr | an htslib raw bam pointer |
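shared_ptr< bam_hdr_t > gamgee::utils::make_shared_sam_header | ( | bam_hdr_t * | sam_header_ptr | ) |
wraps a pre-allocated bam_hdr_t in a shared_ptr with correct deleter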
sam_header_ptr | an htslib raw bam header pointer |
std::shared_ptr< bcf_srs_t > gamgee::utils::make_shared_synced_variant_reader | ( | bcf_srs_t * | synced_reader_ptr | ) |
wraps a pre-allocated bcf_srs_t in a shared_ptr with correct deleter
synced_reader_ptr | an htslib synced BCF reader pointer |
shared_ptr< bcf1_t > gamgee::utils::make_shared_variant | ( | bcf1_t * | bcf_ptr | ) |
wraps a pre-allocated bcf1_t in a shared_ptr with correct deleter
bcf_ptr | an htslib raw vcf pointer |
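shared_ptr< bcf_hdr_t > gamgee::utils::make_shared_variant_header | ( | bcf_hdr_t * | bcf_hdr_ptr | ) |
wraps a pre-allocated bcf_hdr_t in a shared_ptr with correct deleter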
bcf_hdr_ptr | an htslib raw variant header pointer |
std::unique_ptr< htsFile, HtsFileDeleter > gamgee::utils::make_unique_hts_file | ( | htsFile * | hts_file_ptr | ) |
wraps a pre-allocated htsFile in a unique_ptr with correct deleter
hts_file_ptr | an htslib raw file pointer |
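The deleter-functor pattern behind these make_shared_* and make_unique_* helpers can be sketched without htslib. In the block below, `fake_handle_t`, `fake_close`, `FakeHandleDeleter` and the two factory functions are hypothetical stand-ins for a C handle such as htsFile, its C release function, and a functor like HtsFileDeleter; the counter exists only so the cleanup can be observed.

```cpp
#include <memory>

// Stand-in for a C handle such as htsFile; the counter records each release.
struct fake_handle_t {
  int* close_counter;
};

// Stand-in for the C library's release function.
inline void fake_close(fake_handle_t* handle) {
  ++*handle->close_counter;
  delete handle;
}

// Functor deleter, in the spirit of HtsFileDeleter.
struct FakeHandleDeleter {
  void operator()(fake_handle_t* handle) const { fake_close(handle); }
};

// unique_ptr carries the deleter in its type.
inline std::unique_ptr<fake_handle_t, FakeHandleDeleter> make_unique_fake_handle(fake_handle_t* ptr) {
  return std::unique_ptr<fake_handle_t, FakeHandleDeleter>(ptr);
}

// shared_ptr stores the deleter at construction, so the type stays plain.
inline std::shared_ptr<fake_handle_t> make_shared_fake_handle(fake_handle_t* ptr) {
  return std::shared_ptr<fake_handle_t>(ptr, FakeHandleDeleter());
}
```

The difference in return types mirrors the signatures on this page: the unique_ptr versions name their deleter struct, while the shared_ptr versions do not need to.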
std::unique_ptr< hts_itr_t, HtsIteratorDeleter > gamgee::utils::make_unique_hts_itr | ( | hts_itr_t * | hts_itr_ptr | ) |
wraps a pre-allocated hts_itr_t in a unique_ptr with correct deleter
hts_itr_ptr | an htslib raw file iterator pointer |
std::string gamgee::utils::reverse_complement | ( | const std::string & | sequence | ) |
calculates the reverse complement of a sequence
sequence | the sequence to get the reverse complement from |
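For readers unfamiliar with the operation, the sketch below shows one simple way to compute a complement and a reverse complement. The `*_sketch` names are hypothetical, only upper-case A, C, G and T are mapped, and passing every other character (such as N) through unchanged is an assumption; the library's handling of such characters is not documented here.

```cpp
#include <algorithm>
#include <string>

// Hypothetical sketch of a single-base complement.
inline char complement_sketch(const char base) {
  switch (base) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return base;  // assumption: other characters (e.g. 'N') pass through unchanged
  }
}

// Complements every base, leaving the input untouched.
inline std::string complement_sketch(const std::string& sequence) {
  std::string result(sequence.size(), ' ');
  std::transform(sequence.begin(), sequence.end(), result.begin(),
                 [](const char b) { return complement_sketch(b); });
  return result;
}

// Complements every base, then reverses the order.
inline std::string reverse_complement_sketch(const std::string& sequence) {
  std::string result = complement_sketch(sequence);
  std::reverse(result.begin(), result.end());
  return result;
}
```

For the input AACG, the complement is TTGC and the reverse complement is CGTT.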
bam1_t * gamgee::utils::sam_deep_copy | ( | bam1_t * | original | ) |
creates a deep copy of an existing bam1_t
original | an htslib raw bam pointer |
bam_hdr_t * gamgee::utils::sam_header_deep_copy | ( | bam_hdr_t * | original | ) |
creates a deep copy of an existing bam_hdr_t
original | an htslib raw bam header pointer |
bam1_t * gamgee::utils::sam_shallow_copy | ( | bam1_t * | original | ) |
creates a shallow copy of an existing bam1_t: copies core fields but not the data buffer or fields related to the size of the data buffer
uint8_t gamgee::utils::size_for_type | ( | const VariantFieldType & | type, |
const bcf_fmt_t *const | format_ptr | ||
) |
returns the number of bytes for a given VariantFieldType
uint8_t gamgee::utils::size_for_type | ( | const VariantFieldType & | type, |
const bcf_info_t *const | info_ptr | ||
) |
returns the number of bytes for a given VariantFieldType
bcf1_t * gamgee::utils::variant_deep_copy | ( | bcf1_t * | original | ) |
creates a deep copy of an existing bcf1_t
original | an htslib raw bcf pointer |
bcf_hdr_t * gamgee::utils::variant_header_deep_copy | ( | bcf_hdr_t * | original | ) |
creates a deep copy of an existing bcf_hdr_t
original | an htslib raw bcf header pointer |
auto gamgee::utils::zip | ( | const T &... | containers | ) | -> boost::iterator_range<boost::zip_iterator<decltype(boost::make_tuple(containers.begin()...))>> |
utility method to zip iterators together with simpler syntax than boost
This is a wrapper over boost's zip_iterator interface that simplifies the usage of zip iterators, especially in range-based for loops: several containers can be passed to zip() and iterated together as tuples.
For more details, see boost's zip_iterator documentation.
const uint8_t gamgee::utils::bcf_type_sizes[] = { 0, 1, 2, 4, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 } |