Gamgee
You miserable little maggot. I'll stove your head in!
utility functions for the gamgee library
Classes | |
| struct | HtsFileDeleter |
| a functor object to delete an htsFile pointer | |
| struct | HtsIndexDeleter |
| a functor object to delete an hts file index pointer | |
| struct | HtsIteratorDeleter |
| a functor object to delete an hts file iterator pointer | |
| struct | IFStreamDeleter |
| a functor object to delete an ifstream | |
| class | MergedVCFAllelesIdxLUT |
| LUT class for storing mappings between allele vectors in the merged file and input VCF files. Since the number of alleles per site is expected to be small, this class sets the number of fields to 10, which makes any subsequent re-allocations unlikely. The function resize_luts_if_needed() will almost always return immediately after failing the if condition. | |
| class | MergedVCFLUTBase |
| Base class to store look up information between fields of merged header and input headers. | |
| struct | SamBodyDeleter |
| a functor object to delete a bam1_t pointer | |
| struct | SamHeaderDeleter |
| a functor object to delete a bam_hdr_t pointer | |
| class | ShortValueOptimizedStorage |
| struct | SyncedReaderDeleter |
| a functor object to delete a bcf_srs_t pointer | |
| struct | VariantBodyDeleter |
| a functor object to delete a bcf1_t pointer | |
| struct | VariantHeaderDeleter |
| a functor object to delete a bcf_hdr_t pointer | |
Typedefs | |
| using | CombineAllelesLUT = MergedVCFAllelesIdxLUT< true, true > |
Enumerations | |
| enum | VariantFieldType { VariantFieldType::NIL = 0, VariantFieldType::INT8 = 1, VariantFieldType::INT16 = 2, VariantFieldType::INT32 = 3, VariantFieldType::FLOAT = 5, VariantFieldType::STRING = 7 } |
| an enumeration of the types in htslib for the format field values | |
Functions | |
| std::shared_ptr< std::ifstream > | make_shared_ifstream (std::ifstream *ifstream_ptr) |
| wraps a pre-allocated ifstream in a shared_ptr with correct deleter | |
| std::shared_ptr< std::ifstream > | make_shared_ifstream (std::string filename) |
| wraps an input file in a shared_ptr to an ifstream with correct deleter | |
| template<class TYPE > | |
| bool | allele_missing (const uint8_t *data_ptr, const uint32_t allele_index, const TYPE missing) |
| template<class TYPE > | |
| vector< int32_t > | allele_keys (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const TYPE missing, const TYPE vector_end) |
| template<class TYPE > | |
| vector< string > | allele_strings (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const TYPE missing, const TYPE vector_end) |
| bool | allele_missing (const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const uint32_t allele_index) |
| Returns true if the allele at position allele_index is missing. | |
| vector< int32_t > | allele_keys (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr) |
| Returns the genotype allele keys. | |
| vector< string > | allele_strings (const std::shared_ptr< bcf1_t > &body, const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr) |
| Returns the genotype allele strings. | |
| string | allele_key_to_string (const std::shared_ptr< bcf1_t > &body, const int32_t key_index) |
| Returns the genotype allele string from this line. | |
| uint32_t | allele_count (const bcf_fmt_t *const format_ptr) |
| Counts the genotype alleles. | |
| template<class TYPE > | |
| int32_t | allele_key (const uint8_t *data_ptr, const uint32_t allele_index, const TYPE missing, const TYPE vector_end) |
| int32_t | allele_key (const bcf_fmt_t *const format_ptr, const uint8_t *data_ptr, const uint32_t allele_index) |
| Returns the genotype allele at position allele_index. | |
| shared_ptr< htsFile > | make_shared_hts_file (htsFile *hts_file_ptr) |
| wraps a pre-allocated htsFile in a shared_ptr with correct deleter | |
| shared_ptr< hts_idx_t > | make_shared_hts_index (hts_idx_t *hts_index_ptr) |
| wraps a pre-allocated hts_idx_t in a shared_ptr with correct deleter | |
| shared_ptr< hts_itr_t > | make_shared_hts_itr (hts_itr_t *hts_itr_ptr) |
| wraps a pre-allocated hts_itr_t in a shared_ptr with correct deleter | |
| shared_ptr< bam1_t > | make_shared_sam (bam1_t *sam_ptr) |
| wraps a pre-allocated bam1_t in a shared_ptr with correct deleter | |
| shared_ptr< bam_hdr_t > | make_shared_sam_header (bam_hdr_t *sam_header_ptr) |
| wraps a pre-allocated bam_hdr_t in a shared_ptr with correct deleter | |
| shared_ptr< bcf1_t > | make_shared_variant (bcf1_t *bcf_ptr) |
| wraps a pre-allocated bcf1_t in a shared_ptr with correct deleter | |
| shared_ptr< bcf_hdr_t > | make_shared_variant_header (bcf_hdr_t *bcf_hdr_ptr) |
| wraps a pre-allocated bcf_hdr_t in a shared_ptr with correct deleter | |
| std::shared_ptr< bcf_srs_t > | make_shared_synced_variant_reader (bcf_srs_t *synced_reader_ptr) |
| wraps a pre-allocated bcf_srs_t in a shared_ptr with correct deleter | |
| unique_ptr< htsFile, HtsFileDeleter > | make_unique_hts_file (htsFile *hts_file_ptr) |
| wraps a pre-allocated htsFile in a unique_ptr with correct deleter | |
| std::unique_ptr< hts_itr_t, HtsIteratorDeleter > | make_unique_hts_itr (hts_itr_t *hts_itr_ptr) |
| wraps a pre-allocated hts_itr_t in a unique_ptr with correct deleter | |
| bam1_t * | sam_deep_copy (bam1_t *original) |
| creates a deep copy of an existing bam1_t | |
| bam_hdr_t * | sam_header_deep_copy (bam_hdr_t *original) |
| creates a deep copy of an existing bam_hdr_t | |
| bcf1_t * | variant_deep_copy (bcf1_t *original) |
| creates a deep copy of an existing bcf1_t | |
| bcf_hdr_t * | variant_header_deep_copy (bcf_hdr_t *original) |
| creates a deep copy of an existing bcf_hdr_t | |
| bam1_t * | sam_shallow_copy (bam1_t *original) |
| creates a shallow copy of an existing bam1_t: copies core fields but not the data buffer or fields related to the size of the data buffer | |
| std::string | htslib_filter_name (bcf_hdr_t *header, bcf1_t *body, int index) |
| helper function to translate an index into a string in the filter list | |
| uint8_t | bcf_type_to_element_size (const int32_t htslib_type) |
| Returns the number of bytes required to store each BCF_BT_* type. | |
| uint8_t | int_encoded_type (const int32_t min_val, const int32_t max_val) |
| Given a min and max value, determines whether int8, int16, or int32 BCF encoding is required. | |
| kstring_t | initialize_htslib_buffer (const uint32_t initial_capacity) |
| Returns a newly-allocated kstring_t buffer suitable for passing to htslib. | |
| uint8_t | int_encoded_type (const int32_t val) |
| uint8_t | int_encoded_size (const int32_t val) |
| uint32_t | encoded_size (const int8_t field_type, const uint32_t field_length, bool add_type_descriptor=true) |
| char | complement_base (const char base) |
| std::string | complement (std::string &sequence) |
| calculates the complement of a sequence in-place | |
| std::string | complement (const std::string &sequence) |
| calculates the complement of a sequence | |
| std::string | reverse_complement (const std::string &sequence) |
| calculates the reverse complement of a sequence | |
| std::vector< std::string > | hts_string_array_to_vector (const char *const *const string_array, const uint32_t array_size) |
| converts an array of c-strings into a vector<string> | |
| char | complement (const char base) |
| calculates the complement of a base | |
| void | check_max_boundary (const uint32_t index, const uint32_t size, const std::string &prefix_msg) |
| checks that an index is less than size | |
| void | check_max_boundary (const uint32_t index, const uint32_t size) |
| checks that an index is less than size | |
| template<class TYPE > | |
| bool | bcf_check_equal_element (const TYPE &x, const TYPE &y) |
| template<> | |
| bool | bcf_check_equal_element< float > (const float &x, const float &y) |
| checks whether two float values from VCF fields are equal | |
| template<class TYPE > | |
| bool | bcf_is_vector_end_value (const TYPE &value) |
| template<> | |
| bool | bcf_is_vector_end_value< int32_t > (const int32_t &value) |
| template<> | |
| bool | bcf_is_vector_end_value< float > (const float &value) |
| template<class ITER > | |
| const uint8_t * | cache_and_advance_to_end_if_necessary (const uint8_t *current_ptr, const uint8_t *end_ptr, ITER &it) |
| advances current ptr to end of the vector if the current element is bcf_*_vector_end | |
| int32_t | convert_data_to_integer (const uint8_t *data_ptr, const int index, const uint8_t num_bytes_per_value, const VariantFieldType &type) |
| converts the value in an index from the byte array into int32_t | |
| float | convert_data_to_float (const uint8_t *data_ptr, const int index, const uint8_t num_bytes_per_value, const VariantFieldType &type) |
| converts the value in an index from the byte array into float | |
| std::string | convert_data_to_string (const uint8_t *data_ptr, const int index, const uint8_t num_bytes_per_value, const VariantFieldType &type) |
| converts the value in an index from the byte array into string | |
| uint8_t | size_for_type (const VariantFieldType &type, const bcf_fmt_t *const format_ptr) |
| returns the number of bytes for a given VariantFieldType | |
| uint8_t | size_for_type (const VariantFieldType &type, const bcf_info_t *const info_ptr) |
| returns the number of bytes for a given VariantFieldType | |
| bool | is_string_type (const int32_t &type) |
| template<typename... T> | |
| auto | zip (const T &...containers) -> boost::iterator_range< boost::zip_iterator< decltype(boost::make_tuple(containers.begin()...))>> |
| utility method to zip iterators together with simpler syntax than boost | |
Variables | |
| const uint8_t | bcf_type_sizes [] = { 0, 1, 2, 4, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 } |
utility functions for the gamgee library
This namespace includes typical functors and templates for DNA sequence manipulation as well as low-level routines for htslib memory management.
| using gamgee::utils::CombineAllelesLUT = MergedVCFAllelesIdxLUT<true,true> |
| enum gamgee::utils::VariantFieldType | strong |
an enumeration of the types in htslib for the format field values
| Enumerator | |
|---|---|
| NIL | |
| INT8 | |
| INT16 | |
| INT32 | |
| FLOAT | |
| STRING | |
| uint32_t gamgee::utils::allele_count | ( | const bcf_fmt_t *const | format_ptr | ) | inline |
Counts the genotype alleles.
| format_ptr | The GT field from the line. |
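| int32_t gamgee::utils::allele_key | ( | const uint8_t * | data_ptr, |
| const uint32_t | allele_index, | ||
| const TYPE | missing, | ||
| const TYPE | vector_end | ||
| ) | inline |
| int32_t gamgee::utils::allele_key | ( | const bcf_fmt_t *const | format_ptr, |
| const uint8_t * | data_ptr, | ||
| const uint32_t | allele_index | ||
| ) | inline |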
Returns the genotype allele at position allele_index.
| format_ptr | The GT field from the line. |
| data_ptr | The GT for this sample. |
| allele_index | The index within the GT field to return an integer allele. |
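A minimal sketch of how a genotype allele key could be decoded from the raw GT bytes. It assumes htslib's GT encoding, in which each stored value is `(allele + 1) << 1` with the low bit marking phasing, so a value whose upper bits are zero means "missing". The names `allele_key_sketch`, `kMissingKey` and `kVectorEndKey` are hypothetical and are not part of gamgee; the sentinel keys returned by the real function may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical sentinel keys returned by this sketch (not gamgee's constants).
constexpr int32_t kMissingKey = std::numeric_limits<int32_t>::min();
constexpr int32_t kVectorEndKey = std::numeric_limits<int32_t>::min() + 1;

// Decodes the allele at allele_index from a GT byte array whose elements
// have type TYPE (int8_t, int16_t or int32_t, depending on the record).
template <class TYPE>
int32_t allele_key_sketch(const uint8_t* data_ptr, const uint32_t allele_index,
                          const TYPE missing, const TYPE vector_end) {
  assert(data_ptr != nullptr);
  TYPE value;
  // memcpy avoids alignment problems when reading wider types from raw bytes.
  std::memcpy(&value, data_ptr + allele_index * sizeof(TYPE), sizeof(TYPE));
  if (value == vector_end) return kVectorEndKey;  // padding past the sample's ploidy
  if (value == missing || (value >> 1) == 0) return kMissingKey;
  return (value >> 1) - 1;  // drop the phasing bit, undo the +1 offset
}
```

With int8 data `{2, 5, 0}` the sketch yields allele 0 (unphased), allele 1 (phased) and the missing key, in that order.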
| string gamgee::utils::allele_key_to_string | ( | const std::shared_ptr< bcf1_t > & | body, |
| const int32_t | key_index | ||
| ) |
Returns the genotype allele string from this line.
| body | The shared memory variant "line" from a vcf, or bcf. |
| key_index | The integer representation of the allele within this "line". |
| vector< int32_t > gamgee::utils::allele_keys | ( | const std::shared_ptr< bcf1_t > & | body, |
| const bcf_fmt_t *const | format_ptr, | ||
| const uint8_t * | data_ptr, | ||
| const TYPE | missing, | ||
| const TYPE | vector_end | ||
| ) |
| vector< int32_t > gamgee::utils::allele_keys | ( | const std::shared_ptr< bcf1_t > & | body, |
| const bcf_fmt_t *const | format_ptr, | ||
| const uint8_t * | data_ptr | ||
| ) |
Returns the genotype allele keys.
| body | The shared memory variant "line" from a vcf, or bcf. |
| format_ptr | The GT field from the line. |
| data_ptr | The GT for this sample. |
| bool gamgee::utils::allele_missing | ( | const uint8_t * | data_ptr, |
| const uint32_t | allele_index, | ||
| const TYPE | missing | ||
| ) |
| bool gamgee::utils::allele_missing | ( | const bcf_fmt_t *const | format_ptr, |
| const uint8_t * | data_ptr, | ||
| const uint32_t | allele_index | ||
| ) |
Returns true if the allele at position allele_index is missing.
| format_ptr | The GT field from the line. |
| data_ptr | The GT for this sample. |
| allele_index | The index within the GT field to return an integer allele. |
| vector< string > gamgee::utils::allele_strings | ( | const std::shared_ptr< bcf1_t > & | body, |
| const bcf_fmt_t *const | format_ptr, | ||
| const uint8_t * | data_ptr, | ||
| const TYPE | missing, | ||
| const TYPE | vector_end | ||
| ) |
| vector< string > gamgee::utils::allele_strings | ( | const std::shared_ptr< bcf1_t > & | body, |
| const bcf_fmt_t *const | format_ptr, | ||
| const uint8_t * | data_ptr | ||
| ) |
Returns the genotype allele strings.
| body | The shared memory variant "line" from a vcf, or bcf. |
| format_ptr | The GT field from the line. |
| data_ptr | The GT for this sample. |
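| bool gamgee::utils::bcf_check_equal_element | ( | const TYPE & | x, |
| const TYPE & | y | ||
| ) | inline |
| bool gamgee::utils::bcf_check_equal_element< float > | ( | const float & | x, |
| const float & | y | ||
| ) | inline |
checks whether two float values from VCF fields are equal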
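One reason a float specialization is needed: htslib marks missing and vector-end floats with special NaN bit patterns, and NaN never compares equal to itself with `==`. The sketch below is an assumption about how such a comparison can be done, not gamgee's code. The two bit patterns are taken to match htslib's `bcf_float_missing` and `bcf_float_vector_end`; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed to mirror htslib's bcf_float_missing and bcf_float_vector_end.
constexpr uint32_t kFloatMissingBits = 0x7F800001u;
constexpr uint32_t kFloatVectorEndBits = 0x7F800002u;

// Reads the raw bit pattern of a float without violating aliasing rules.
inline uint32_t float_bits(const float& value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Builds a float from a raw bit pattern (used to create the sentinels).
inline float float_from_bits(const uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Equal if the values compare equal, or if both carry the same sentinel pattern.
inline bool float_equal_sketch(const float& x, const float& y) {
  if (x == y) return true;
  const uint32_t bx = float_bits(x);
  const uint32_t by = float_bits(y);
  const bool both_missing = bx == kFloatMissingBits && by == kFloatMissingBits;
  const bool both_vector_end = bx == kFloatVectorEndBits && by == kFloatVectorEndBits;
  return both_missing || both_vector_end;
}
```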
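| bool gamgee::utils::bcf_is_vector_end_value | ( | const TYPE & | value | ) | inline |
| bool gamgee::utils::bcf_is_vector_end_value< int32_t > | ( | const int32_t & | value | ) | inline |
| bool gamgee::utils::bcf_is_vector_end_value< float > | ( | const float & | value | ) | inline |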
| uint8_t gamgee::utils::bcf_type_to_element_size | ( | const int32_t | htslib_type | ) |
Returns the number of bytes required to store each BCF_BT_* type.
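A sketch of the lookup, using the `bcf_type_sizes` table listed under Variables on this page. Masking the type code with `0xF` to stay inside the 16-entry table is an assumption of this sketch; how gamgee guards the index is not stated here. The `_sketch` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Same contents as bcf_type_sizes on this page; the index is the BCF_BT_* code.
const uint8_t type_sizes_sketch[] = {0, 1, 2, 4, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0};

// Returns the element size in bytes for a BCF_BT_* type code.
inline uint8_t type_to_element_size_sketch(const int32_t htslib_type) {
  assert(htslib_type >= 0);
  return type_sizes_sketch[htslib_type & 0xF];  // mask keeps the index in range (assumption)
}
```

With the VariantFieldType values from this page, INT8 (1) maps to 1 byte, INT16 (2) to 2, INT32 (3) and FLOAT (5) to 4, and STRING (7) to 1 byte per character.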
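| const uint8_t * gamgee::utils::cache_and_advance_to_end_if_necessary | ( | const uint8_t * | current_ptr, |
| const uint8_t * | end_ptr, | ||
| ITER & | it | ||
| ) | inline |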
advances current ptr to end of the vector if the current element is bcf_*_vector_end
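| void gamgee::utils::check_max_boundary | ( | const uint32_t | index, |
| const uint32_t | size, | ||
| const std::string & | prefix_msg | ||
| ) | inline |
checks that an index is less than size
| index | the index between 0 and size to check |
| size | one past the maximum valid index |
| prefix_msg | additional string to prefix error message |
throws an out_of_bounds exception if index is out of limits (greater than or equal to size)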
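| void gamgee::utils::check_max_boundary | ( | const uint32_t | index, |
| const uint32_t | size | ||
| ) | inline |
checks that an index is less than size
| index | the index between 0 and size to check |
| size | one past the maximum valid index |
throws an out_of_bounds exception if index is out of limits (greater than or equal to size)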
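The boundary check described above can be sketched as follows. This is an illustration only: `std::out_of_range` stands in for the "out_of_bounds" exception named in the text, the message wording is invented, and `check_max_boundary_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Throws if index is not a valid position in a container of the given size.
// Assumption: std::out_of_range is the exception type used.
inline void check_max_boundary_sketch(const uint32_t index, const uint32_t size,
                                      const std::string& prefix_msg = "") {
  if (index >= size) {  // size is one past the maximum valid index
    std::ostringstream message;
    message << prefix_msg << "index " << index << " is out of range for size " << size;
    throw std::out_of_range(message.str());
  }
}
```

Calling it with `(2, 3)` returns normally, while `(3, 3)` throws.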
| std::string gamgee::utils::complement | ( | std::string & | sequence | ) |
calculates the complement of a sequence in-place
| sequence | the sequence to turn into the complement |
| std::string gamgee::utils::complement | ( | const std::string & | sequence | ) |
calculates the complement of a sequence
| sequence | the sequence to get the complement from |
| char gamgee::utils::complement | ( | const char | base | ) |
calculates the complement of a base
| base | the base to get the complement from |
| char gamgee::utils::complement_base | ( | const char | base | ) |
| float gamgee::utils::convert_data_to_float | ( | const uint8_t * | data_ptr, |
| const int | index, | ||
| const uint8_t | num_bytes_per_value, | ||
| const VariantFieldType & | type | ||
| ) |
converts the value in an index from the byte array into float
| int32_t gamgee::utils::convert_data_to_integer | ( | const uint8_t * | data_ptr, |
| const int | index, | ||
| const uint8_t | num_bytes_per_value, | ||
| const VariantFieldType & | type | ||
| ) |
converts the value in an index from the byte array into int32_t
The byte array's underlying data representation is record specific, meaning that even numbers (like Integer) can be represented in multiple ways across records (some with uint8_t, others with uint32_t...), as dictated by the maximum value in the field.
This member function provides the correct index location and creates a new value of VALUE_TYPE to return to the user.
Will throw a std::invalid_argument exception if trying to create a string out of a numeric format or vice-versa. All numeric type conversions are internally truncated or expanded accordingly.
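A rough sketch of the conversion described above: locate the element by index and element width, read it with the width implied by the field type, and widen it to int32_t. `FieldTypeSketch` mirrors the VariantFieldType values on this page, and `to_integer_sketch` is a hypothetical name. The sketch deliberately ignores the mapping of missing and vector-end sentinels, and treating NIL as an error is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Mirrors the VariantFieldType enumeration listed on this page.
enum class FieldTypeSketch { NIL = 0, INT8 = 1, INT16 = 2, INT32 = 3, FLOAT = 5, STRING = 7 };

// Reads element `index` from a byte array and returns it as int32_t.
inline int32_t to_integer_sketch(const uint8_t* data_ptr, const int index,
                                 const uint8_t num_bytes_per_value,
                                 const FieldTypeSketch type) {
  assert(data_ptr != nullptr);
  const uint8_t* element = data_ptr + index * num_bytes_per_value;
  switch (type) {
    case FieldTypeSketch::INT8:  { int8_t v;  std::memcpy(&v, element, sizeof(v)); return v; }
    case FieldTypeSketch::INT16: { int16_t v; std::memcpy(&v, element, sizeof(v)); return v; }
    case FieldTypeSketch::INT32: { int32_t v; std::memcpy(&v, element, sizeof(v)); return v; }
    case FieldTypeSketch::FLOAT: { float v;   std::memcpy(&v, element, sizeof(v)); return static_cast<int32_t>(v); }
    default:
      // STRING cannot become a number; NIL is rejected too (assumption of this sketch).
      throw std::invalid_argument("cannot convert this field type to an integer");
  }
}
```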
| std::string gamgee::utils::convert_data_to_string | ( | const uint8_t * | data_ptr, |
| const int | index, | ||
| const uint8_t | num_bytes_per_value, | ||
| const VariantFieldType & | type | ||
| ) |
converts the value in an index from the byte array into string
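| uint32_t gamgee::utils::encoded_size | ( | const int8_t | field_type, |
| const uint32_t | field_length, | ||
| bool | add_type_descriptor = true | ||
| ) | inline |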
| std::vector< std::string > gamgee::utils::hts_string_array_to_vector | ( | const char *const *const | string_array, |
| const uint32_t | array_size | ||
| ) |
converts an array of c-strings into a vector<string>
Useful in many hts structs where a list of names is stored as a char** and we want to manipulate it in a vector<string> to maintain data contiguity and improve usability
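A sketch of the conversion, assuming every entry of the array is a valid null-terminated string. `string_array_to_vector_sketch` is a hypothetical name, not the gamgee function itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Copies array_size C strings into a vector<string>.
inline std::vector<std::string> string_array_to_vector_sketch(
    const char* const* const string_array, const uint32_t array_size) {
  assert(string_array != nullptr || array_size == 0);
  std::vector<std::string> result;
  result.reserve(array_size);  // one allocation for the vector's own storage
  for (uint32_t i = 0; i != array_size; ++i) {
    result.emplace_back(string_array[i]);  // copies each C string
  }
  return result;
}
```

The resulting vector owns its strings, so it stays valid after the original char** has been freed.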
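| std::string gamgee::utils::htslib_filter_name | ( | bcf_hdr_t * | header, |
| bcf1_t * | body, | ||
| int | index | ||
| ) |
helper function to translate an index into a string in the filter list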
| header | a VariantHeader htslib pointer |
| body | a Variant htslib pointer |
| index | the index of the filter you want access to |
| kstring_t gamgee::utils::initialize_htslib_buffer | ( | const uint32_t | initial_capacity | ) |
Returns a newly-allocated kstring_t buffer suitable for passing to htslib.
The returned buffer is safe for htslib to call realloc() on, since it is initially allocated with malloc().
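The point about malloc() can be illustrated with a stand-in struct. `KStringSketch` copies the three fields of htslib's kstring_t (length, capacity, data pointer) so that the example compiles without htslib; it and `initialize_buffer_sketch` are hypothetical names, and throwing `std::bad_alloc` on failure is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Stand-in with the same fields as htslib's kstring_t: length, capacity, data.
struct KStringSketch {
  std::size_t l;
  std::size_t m;
  char* s;
};

// Allocates the buffer with malloc so that C code may later realloc() or free() it.
inline KStringSketch initialize_buffer_sketch(const uint32_t initial_capacity) {
  KStringSketch buffer;
  buffer.l = 0;                 // nothing written yet
  buffer.m = initial_capacity;  // bytes available
  buffer.s = static_cast<char*>(std::malloc(initial_capacity));
  if (buffer.s == nullptr && initial_capacity > 0) throw std::bad_alloc();
  return buffer;
}
```

A buffer created with `new char[]` would not be safe here, because realloc() may only be applied to memory obtained from malloc(), calloc() or realloc().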
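| uint8_t gamgee::utils::int_encoded_size | ( | const int32_t | val | ) | inline |
| uint8_t gamgee::utils::int_encoded_type | ( | const int32_t | val | ) | inline |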
| uint8_t gamgee::utils::int_encoded_type | ( | const int32_t | min_val, |
| const int32_t | max_val | ||
| ) |
Given a min and max value, determines whether int8, int16, or int32 BCF encoding is required.
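A sketch of how the narrowest encoding might be chosen. The return values 1, 2 and 3 match INT8, INT16 and INT32 in the VariantFieldType enumeration on this page. The exact thresholds gamgee uses are not stated here; reserving the eight lowest values of each width for sentinels (missing, vector end) is an assumption borrowed from htslib's limits, and `int_encoded_type_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kInt8Type = 1;   // INT8 in VariantFieldType
constexpr uint8_t kInt16Type = 2;  // INT16 in VariantFieldType
constexpr uint8_t kInt32Type = 3;  // INT32 in VariantFieldType

// Picks the narrowest integer type able to hold every value in [min_val, max_val].
// Assumption: the eight lowest values of each width are reserved for sentinels.
inline uint8_t int_encoded_type_sketch(const int32_t min_val, const int32_t max_val) {
  assert(min_val <= max_val);
  if (min_val >= INT8_MIN + 8 && max_val <= INT8_MAX) return kInt8Type;
  if (min_val >= INT16_MIN + 8 && max_val <= INT16_MAX) return kInt16Type;
  return kInt32Type;
}
```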
| bool gamgee::utils::is_string_type | ( | const int32_t & | type | ) | inline |
| shared_ptr< htsFile > gamgee::utils::make_shared_hts_file | ( | htsFile * | hts_file_ptr | ) |
wraps a pre-allocated htsFile in a shared_ptr with correct deleter
| hts_file_ptr | an htslib raw file pointer |
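The "correct deleter" pattern shared by the make_shared_* and make_unique_* helpers can be sketched without htslib. `FakeHandle`, `fake_close`, `FakeHandleDeleter` and the two factory functions below are hypothetical stand-ins: in gamgee the resource would be an htslib type and the deleter functor would call the matching htslib release routine.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a C resource such as htsFile (not an htslib type).
struct FakeHandle {
  bool* closed_flag;  // lets the caller observe when the resource is released
};

// Stand-in for the C library's release routine.
inline void fake_close(FakeHandle* handle) {
  assert(handle != nullptr);
  *handle->closed_flag = true;
  delete handle;
}

// Functor deleter in the style of HtsFileDeleter.
struct FakeHandleDeleter {
  void operator()(FakeHandle* handle) const { fake_close(handle); }
};

// shared_ptr stores the deleter alongside the pointer, so the type stays plain.
inline std::shared_ptr<FakeHandle> make_shared_fake_handle(FakeHandle* handle_ptr) {
  return std::shared_ptr<FakeHandle>(handle_ptr, FakeHandleDeleter());
}

// unique_ptr carries the deleter in its type, as in make_unique_hts_file.
inline std::unique_ptr<FakeHandle, FakeHandleDeleter> make_unique_fake_handle(FakeHandle* handle_ptr) {
  return std::unique_ptr<FakeHandle, FakeHandleDeleter>(handle_ptr);
}
```

The release routine runs once, when the last shared_ptr copy (or the unique_ptr) goes out of scope, instead of a plain `delete` that would be wrong for a C-allocated resource.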
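| shared_ptr< hts_idx_t > gamgee::utils::make_shared_hts_index | ( | hts_idx_t * | hts_index_ptr | ) |
wraps a pre-allocated hts_idx_t in a shared_ptr with correct deleter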
| hts_index_ptr | an htslib raw file index pointer |
| shared_ptr< hts_itr_t > gamgee::utils::make_shared_hts_itr | ( | hts_itr_t * | hts_itr_ptr | ) |
wraps a pre-allocated hts_itr_t in a shared_ptr with correct deleter
| hts_itr_ptr | an htslib raw file iterator pointer |
| std::shared_ptr< std::ifstream > gamgee::utils::make_shared_ifstream | ( | std::ifstream * | ifstream_ptr | ) |
wraps a pre-allocated ifstream in a shared_ptr with correct deleter
| ifstream_ptr | an ifstream raw file pointer |
| std::shared_ptr< std::ifstream > gamgee::utils::make_shared_ifstream | ( | std::string | filename | ) |
wraps an input file in a shared_ptr to an ifstream with correct deleter
| filename | the input filename |
| shared_ptr< bam1_t > gamgee::utils::make_shared_sam | ( | bam1_t * | sam_ptr | ) |
wraps a pre-allocated bam1_t in a shared_ptr with correct deleter
| sam_ptr | an htslib raw bam pointer |
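| shared_ptr< bam_hdr_t > gamgee::utils::make_shared_sam_header | ( | bam_hdr_t * | sam_header_ptr | ) |
wraps a pre-allocated bam_hdr_t in a shared_ptr with correct deleter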
| sam_header_ptr | an htslib raw bam header pointer |
| std::shared_ptr< bcf_srs_t > gamgee::utils::make_shared_synced_variant_reader | ( | bcf_srs_t * | synced_reader_ptr | ) |
wraps a pre-allocated bcf_srs_t in a shared_ptr with correct deleter
| synced_reader_ptr | an htslib synced BCF reader pointer |
| shared_ptr< bcf1_t > gamgee::utils::make_shared_variant | ( | bcf1_t * | bcf_ptr | ) |
wraps a pre-allocated bcf1_t in a shared_ptr with correct deleter
| bcf_ptr | an htslib raw vcf pointer |
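| shared_ptr< bcf_hdr_t > gamgee::utils::make_shared_variant_header | ( | bcf_hdr_t * | bcf_hdr_ptr | ) |
wraps a pre-allocated bcf_hdr_t in a shared_ptr with correct deleter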
| bcf_hdr_ptr | an htslib raw variant header pointer |
| std::unique_ptr< htsFile, HtsFileDeleter > gamgee::utils::make_unique_hts_file | ( | htsFile * | hts_file_ptr | ) |
wraps a pre-allocated htsFile in a unique_ptr with correct deleter
| hts_file_ptr | an htslib raw file pointer |
| std::unique_ptr< hts_itr_t, HtsIteratorDeleter > gamgee::utils::make_unique_hts_itr | ( | hts_itr_t * | hts_itr_ptr | ) |
wraps a pre-allocated hts_itr_t in a unique_ptr with correct deleter
| hts_itr_ptr | an htslib raw file iterator pointer |
| std::string gamgee::utils::reverse_complement | ( | const std::string & | sequence | ) |
calculates the reverse complement of a sequence
| sequence | the sequence to get the reverse complement from |
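A sketch of the complement and reverse-complement operations. How gamgee treats lower-case or ambiguous bases is not stated on this page; passing any character other than upper-case A, C, G and T through unchanged is an assumption, and the `_sketch` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Complements one base. Assumption: characters other than A, C, G, T are kept as-is.
inline char complement_base_sketch(const char base) {
  switch (base) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return base;
  }
}

// Returns a complemented copy, leaving the input untouched.
inline std::string complement_sketch(const std::string& sequence) {
  std::string result(sequence);
  std::transform(result.begin(), result.end(), result.begin(), complement_base_sketch);
  return result;
}

// Complements every base, then reverses the order.
inline std::string reverse_complement_sketch(const std::string& sequence) {
  std::string result = complement_sketch(sequence);
  std::reverse(result.begin(), result.end());
  return result;
}
```

For example, the reverse complement of `AACG` is `CGTT`: the complement is `TTGC`, which is then reversed.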
| bam1_t * gamgee::utils::sam_deep_copy | ( | bam1_t * | original | ) |
creates a deep copy of an existing bam1_t
| original | an htslib raw bam pointer |
| bam_hdr_t * gamgee::utils::sam_header_deep_copy | ( | bam_hdr_t * | original | ) |
creates a deep copy of an existing bam_hdr_t
| original | an htslib raw bam header pointer |
| bam1_t * gamgee::utils::sam_shallow_copy | ( | bam1_t * | original | ) |
creates a shallow copy of an existing bam1_t: copies core fields but not the data buffer or fields related to the size of the data buffer
| uint8_t gamgee::utils::size_for_type | ( | const VariantFieldType & | type, |
| const bcf_fmt_t *const | format_ptr | ||
| ) |
returns the number of bytes for a given VariantFieldType
| uint8_t gamgee::utils::size_for_type | ( | const VariantFieldType & | type, |
| const bcf_info_t *const | info_ptr | ||
| ) |
returns the number of bytes for a given VariantFieldType
| bcf1_t * gamgee::utils::variant_deep_copy | ( | bcf1_t * | original | ) |
creates a deep copy of an existing bcf1_t
| original | an htslib raw bcf pointer |
| bcf_hdr_t * gamgee::utils::variant_header_deep_copy | ( | bcf_hdr_t * | original | ) |
creates a deep copy of an existing bcf_hdr_t
| original | an htslib raw bcf header pointer |
| auto gamgee::utils::zip | ( | const T &... | containers | ) | -> boost::iterator_range<boost::zip_iterator<decltype(boost::make_tuple(containers.begin()...))>> |
utility method to zip iterators together with simpler syntax than boost
This is a wrapper over boost's zip_iterator interface that simplifies the usage of zip iterators, especially in for-each loops, where several containers can be traversed in lockstep with a single zip(...) call.
For more details, see boost's zip_iterator documentation.
| const uint8_t gamgee::utils::bcf_type_sizes[] = { 0, 1, 2, 4, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 } |