gnomad.sample_qc.sex

gnomad.sample_qc.sex.adjusted_sex_ploidy_expr(...)

Create an entry expression to convert XY to haploid on non-PAR X/Y and XX to missing on Y.

gnomad.sample_qc.sex.adjust_sex_ploidy(mt, ...)

Convert males to haploid on non-PAR X/Y and set females to missing on Y.

gnomad.sample_qc.sex.gaussian_mixture_model_karyotype_assignment(sex_ht)

Annotate the input Table with an X karyotype, Y karyotype, and sex karyotype based on a Gaussian mixture model.

gnomad.sample_qc.sex.get_ploidy_cutoffs(ht)

Get chromosome X and Y ploidy cutoffs for XY and XX samples.

gnomad.sample_qc.sex.get_chr_x_hom_alt_cutoffs(ht, ...)

Get cutoffs for the fraction of homozygous alternate genotypes on chromosome X in 'XY' and 'XX' samples.

gnomad.sample_qc.sex.get_sex_expr(...[, ...])

Create a struct with X_karyotype, Y_karyotype, and sex_karyotype.

gnomad.sample_qc.sex.adjusted_sex_ploidy_expr(locus_expr, gt_expr, karyotype_expr, xy_karyotype_str='XY', xx_karyotype_str='XX')[source]

Create an entry expression to convert XY to haploid on non-PAR X/Y and XX to missing on Y.

Parameters:
  • locus_expr (LocusExpression) – Locus expression.

  • gt_expr (CallExpression) – Genotype expression.

  • karyotype_expr (StringExpression) – Sex karyotype expression.

  • xy_karyotype_str (str) – String representing XY karyotype. Default is “XY”.

  • xx_karyotype_str (str) – String representing XX karyotype. Default is “XX”.

Return type:

CallExpression

Returns:

Genotype adjusted for sex ploidy.
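The rule described above can be illustrated outside of Hail with a plain-Python sketch. This is not the library's implementation: the function name `adjust_call`, the region labels, and the tuple representation of a call are hypothetical, and treating heterozygous calls in haploid regions as missing is an assumption of this sketch that the documentation above does not state.

```python
from typing import Optional, Tuple

Call = Tuple[int, ...]  # allele indices, e.g. (0, 1) for a diploid het call


def adjust_call(
    region: str,
    gt: Optional[Call],
    karyotype: str,
    xy_karyotype_str: str = "XY",
    xx_karyotype_str: str = "XX",
) -> Optional[Call]:
    """Plain-Python illustration of the sex ploidy adjustment rule.

    `region` is a hypothetical label: "autosome", "x_par", "x_nonpar",
    "y_par" or "y_nonpar". None stands for a missing call.
    """
    if gt is None:
        return None
    on_y = region in ("y_par", "y_nonpar")
    haploid_region = region in ("x_nonpar", "y_nonpar")
    if karyotype == xx_karyotype_str and on_y:
        return None  # XX samples: calls on Y become missing
    if karyotype == xy_karyotype_str and haploid_region and len(gt) == 2:
        if gt[0] != gt[1]:
            # Assumption of this sketch: a het call has no haploid equivalent.
            return None
        return (gt[0],)  # XY samples: diploid hom call becomes haploid
    return gt  # any other karyotype or region is left unchanged
```

For example, `adjust_call("x_nonpar", (1, 1), "XY")` yields a haploid call, while the same call in `"x_par"` is returned unchanged.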

gnomad.sample_qc.sex.adjust_sex_ploidy(mt, sex_expr, male_str='male', female_str='female')[source]

Convert males to haploid on non-PAR X/Y and set females to missing on Y.

Parameters:
  • mt (MatrixTable) – Input MatrixTable

  • sex_expr (StringExpression) – Expression pointing to sex in MT (if not male_str or female_str, no change)

  • male_str (str) – String for males (default ‘male’)

  • female_str (str) – String for females (default ‘female’)

Return type:

MatrixTable

Returns:

MatrixTable with fixed ploidy for sex chromosomes

gnomad.sample_qc.sex.gaussian_mixture_model_karyotype_assignment(sex_ht, chrx_ploidy_expr='chrX_ploidy', chry_ploidy_expr='chrY_ploidy', karyotype_output_prefix='gmm')[source]

Annotate the input Table with an X karyotype, Y karyotype, and sex karyotype based on a Gaussian mixture model.

This function fits two-component Gaussian mixture models to chrx_ploidy_expr and chry_ploidy_expr to assign an X karyotype and a Y karyotype, which are then combined into the sex karyotype.

The following annotations are added:
  • {karyotype_output_prefix}_x_karyotype

  • {karyotype_output_prefix}_y_karyotype

  • {karyotype_output_prefix}_karyotype = {karyotype_output_prefix}_x_karyotype + {karyotype_output_prefix}_y_karyotype

Note

Because this uses a two-component Gaussian mixture model, every sample is assigned one of the following sex karyotypes: X, XX, XY, YY. It is recommended that this annotation be used only to split samples into XX and XY groups, which can then be passed to get_ploidy_cutoffs to determine XX and XY ploidy means and standard deviations.

Parameters:
  • sex_ht (Table) – Input Table with chromosome X and chromosome Y ploidy values.

  • chrx_ploidy_expr (Union[NumericExpression, str]) – Expression pointing to chromosome X ploidy in sex_ht. Default is ‘chrX_ploidy’.

  • chry_ploidy_expr (Union[NumericExpression, str]) – Expression pointing to chromosome Y ploidy in sex_ht. Default is ‘chrY_ploidy’.

  • karyotype_output_prefix (str) – String to use as the prefix for the Gaussian mixture model karyotype output. Default is ‘gmm’.

Return type:

Table

Returns:

Input Table with Gaussian mixture model karyotype annotations added.
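To make the idea of a two-component mixture on a single ploidy value concrete, here is a minimal standard-library sketch that fits a 1-D two-component Gaussian mixture with expectation-maximization and labels each value by its more likely component. It is not the library's code, which operates on a Hail Table: the function names, the min/max initialisation, and the `("X", "XX")` labels for the lower and higher chromosome X components are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def fit_two_component_gmm(
    values: Sequence[float], n_iter: int = 50
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D two-component Gaussian mixture with EM.

    Returns (means, stdevs, weights). Component 0 is initialised at the
    minimum of the data and component 1 at the maximum.
    """
    means = [min(values), max(values)]
    spread = (max(values) - min(values)) / 4 or 1.0
    stdevs = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, stdevs)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and stdev of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue  # empty component: keep its previous parameters
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            stdevs[k] = max(math.sqrt(var), 1e-3)  # floor avoids a zero stdev
            weights[k] = nk / len(values)
    return means, stdevs, weights


def assign_components(
    values: Sequence[float], labels: Tuple[str, str] = ("X", "XX")
) -> List[str]:
    """Label each value; the lower-mean component gets labels[0]."""
    means, stdevs, weights = fit_two_component_gmm(values)
    order = sorted(range(2), key=lambda k: means[k])
    out = []
    for v in values:
        # Compare log densities so values far from a component do not underflow.
        score = [
            math.log(weights[k])
            - math.log(stdevs[k])
            - 0.5 * ((v - means[k]) / stdevs[k]) ** 2
            for k in order
        ]
        out.append(labels[0] if score[0] >= score[1] else labels[1])
    return out
```

Note that every value receives one of the two labels regardless of how unusual it is, which is why the mixture assignment is better suited to a rough XX/XY split than to a final karyotype call.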

gnomad.sample_qc.sex.get_ploidy_cutoffs(ht, f_stat_cutoff=None, normal_ploidy_cutoff=5, aneuploidy_cutoff=6, group_by_expr=None)[source]

Get chromosome X and Y ploidy cutoffs for XY and XX samples.

Note

This assumes the input Hail Table has the fields chrX_ploidy and chrY_ploidy, plus f_stat if f_stat_cutoff is set.

Return a tuple of sex chromosome ploidy cutoffs: ((x_ploidy_cutoffs), (y_ploidy_cutoffs)).
  • x_ploidy_cutoffs: (upper cutoff for single X, (lower cutoff for double X, upper cutoff for double X), lower cutoff for triple X)

  • y_ploidy_cutoffs: ((lower cutoff for single Y, upper cutoff for single Y), lower cutoff for double Y)

Uses the normal_ploidy_cutoff parameter to determine the ploidy cutoffs for XX and XY karyotypes. Uses the aneuploidy_cutoff parameter to determine the cutoffs for sex aneuploidies.

Note

f_stat_cutoff or group_by_expr must be supplied. If f_stat_cutoff is supplied then f-stat is used to split the samples into roughly ‘XX’ and ‘XY’. If group_by_expr is supplied instead, then it must include an annotation grouping samples by ‘XX’ and ‘XY’. These are both only used to divide samples into XX and XY to determine means and standard deviations for these categories and are not used in the final karyotype annotation.

Parameters:
  • ht (Table) – Table with f_stat and sex chromosome ploidies

  • f_stat_cutoff (float) – f-stat to roughly divide ‘XX’ from ‘XY’ samples. Assumes XX samples are below cutoff and XY are above cutoff.

  • normal_ploidy_cutoff (int) – Number of standard deviations to use when determining sex chromosome ploidy cutoffs for XX, XY karyotypes.

  • aneuploidy_cutoff (int) – Number of standard deviations to use when determining sex chromosome ploidy cutoffs for aneuploidies.

  • group_by_expr (StringExpression) – Expression grouping samples into ‘XX’ and ‘XY’. Can be used instead of f_stat_cutoff.

Return type:

Tuple[Tuple[float, Tuple[float, float], float], Tuple[Tuple[float, float], float]]

Returns:

Tuple of ploidy cutoff tuples: ((x_ploidy_cutoffs), (y_ploidy_cutoffs))
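The nested return structure is easier to follow with a small worked sketch. The code below builds cutoffs of the documented shape as mean plus or minus a number of standard deviations, starting from plain Python lists of ploidies that have already been split into provisional XX and XY groups. Which group's statistics feed each cutoff, the use of the population standard deviation, and the name `sketch_ploidy_cutoffs` are all assumptions of this sketch and may not match the library exactly.

```python
from statistics import mean, pstdev


def sketch_ploidy_cutoffs(
    xx_x, xx_y, xy_x, xy_y, normal_ploidy_cutoff=5, aneuploidy_cutoff=6
):
    """Illustrative mean +/- k * stdev cutoffs from per-group ploidy lists.

    xx_x / xx_y: chrX and chrY ploidies of samples provisionally grouped as XX.
    xy_x / xy_y: the same for samples provisionally grouped as XY.
    """
    xx_x_mean, xx_x_sd = mean(xx_x), pstdev(xx_x)
    xy_x_mean, xy_x_sd = mean(xy_x), pstdev(xy_x)
    xx_y_mean, xx_y_sd = mean(xx_y), pstdev(xx_y)
    xy_y_mean, xy_y_sd = mean(xy_y), pstdev(xy_y)

    x_ploidy_cutoffs = (
        xy_x_mean + normal_ploidy_cutoff * xy_x_sd,  # upper cutoff for single X
        (
            xx_x_mean - normal_ploidy_cutoff * xx_x_sd,  # lower cutoff for double X
            xx_x_mean + normal_ploidy_cutoff * xx_x_sd,  # upper cutoff for double X
        ),
        xx_x_mean + aneuploidy_cutoff * xx_x_sd,  # lower cutoff for triple X
    )
    y_ploidy_cutoffs = (
        (
            xx_y_mean + normal_ploidy_cutoff * xx_y_sd,  # lower cutoff for single Y
            xy_y_mean + normal_ploidy_cutoff * xy_y_sd,  # upper cutoff for single Y
        ),
        xy_y_mean + aneuploidy_cutoff * xy_y_sd,  # lower cutoff for double Y
    )
    return x_ploidy_cutoffs, y_ploidy_cutoffs


# Toy ploidy values, invented purely for illustration.
x_cutoffs, y_cutoffs = sketch_ploidy_cutoffs(
    xx_x=[1.9, 2.0, 2.1],
    xx_y=[0.0, 0.01, 0.02],
    xy_x=[0.95, 1.0, 1.05],
    xy_y=[0.9, 1.0, 1.1],
)
```

Because the aneuploidy cutoff uses more standard deviations than the normal cutoff, a gap is left between the upper cutoff for double X and the lower cutoff for triple X; values falling in such gaps cannot be confidently assigned.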

gnomad.sample_qc.sex.get_chr_x_hom_alt_cutoffs(ht, chr_x_frac_hom_alt_expr, f_stat_cutoff=None, group_by_expr=None, cutoff_stdev=5)[source]

Get cutoffs for the fraction of homozygous alternate genotypes on chromosome X in ‘XY’ and ‘XX’ samples.

Note

This assumes the input Hail Table has the field ‘f_stat’ if f_stat_cutoff is set.

Return a tuple of cutoffs for the fraction of homozygous alternate genotypes (hom-alt/(hom-alt + het)) on chromosome X: ((lower cutoff for more than one X, upper cutoff for more than one X), lower cutoff for single X).

Uses the cutoff_stdev parameter to determine the cutoffs on the fraction of homozygous alternate genotypes (hom-alt/(hom-alt + het)) on chromosome X for ‘XX’ and ‘XY’ karyotypes.

Note

f_stat_cutoff or group_by_expr must be supplied. If f_stat_cutoff is supplied then f-stat is used to split the samples into roughly ‘XX’ and ‘XY’. If group_by_expr is supplied instead, then it must include an annotation grouping samples by ‘XX’ and ‘XY’. These are both only used to divide samples into XX and XY to determine means and standard deviations for these categories and are not used in the final karyotype annotation.

Parameters:
  • ht (Table) – Table with f_stat and fraction of homozygous alternate genotypes on chromosome X.

  • chr_x_frac_hom_alt_expr (NumericExpression) – Fraction of homozygous alternate genotypes (hom-alt/(hom-alt + het)) on chromosome X.

  • f_stat_cutoff (float) – f-stat to roughly divide ‘XX’ from ‘XY’ samples. Assumes XX samples are below cutoff and XY are above cutoff.

  • group_by_expr (StringExpression) – Expression grouping samples into ‘XX’ and ‘XY’. Can be used instead of f_stat_cutoff.

  • cutoff_stdev (int) – Number of standard deviations to use when determining the cutoffs on the fraction of homozygous alternate genotypes on chromosome X for XX, XY karyotypes.

Return type:

Tuple[Tuple[float, float], float]

Returns:

Tuple of cutoffs: ((lower cutoff for more than one X, upper cutoff for more than one X), lower cutoff for single X).
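As a rough illustration of the f-stat split and the resulting tuple, the following standard-library sketch computes the hom-alt fraction from genotype counts, divides samples at an f-stat cutoff, and derives cutoffs as mean plus or minus `cutoff_stdev` standard deviations. The helper names, the use of the population standard deviation, and the handling of samples exactly at the cutoff are assumptions of this sketch, not documented behaviour of the library.

```python
from statistics import mean, pstdev


def frac_hom_alt(n_hom_alt: int, n_het: int) -> float:
    """Fraction of homozygous alternate genotypes: hom-alt / (hom-alt + het)."""
    return n_hom_alt / (n_hom_alt + n_het)


def sketch_chr_x_hom_alt_cutoffs(samples, f_stat_cutoff, cutoff_stdev=5):
    """samples: iterable of (f_stat, chr_x_frac_hom_alt) pairs.

    XX samples are taken to be below the f-stat cutoff and XY samples above it;
    placing samples exactly at the cutoff in the XY group is a choice made here.
    """
    xx = [frac for f_stat, frac in samples if f_stat < f_stat_cutoff]
    xy = [frac for f_stat, frac in samples if f_stat >= f_stat_cutoff]
    xx_mean, xx_sd = mean(xx), pstdev(xx)
    xy_mean, xy_sd = mean(xy), pstdev(xy)
    return (
        (
            xx_mean - cutoff_stdev * xx_sd,  # lower cutoff for more than one X
            xx_mean + cutoff_stdev * xx_sd,  # upper cutoff for more than one X
        ),
        xy_mean - cutoff_stdev * xy_sd,  # lower cutoff for single X
    )


# Toy (f_stat, fraction) pairs, invented purely for illustration.
toy_samples = [
    (-0.1, 0.30), (0.0, 0.32), (0.1, 0.34),  # low f-stat: provisional XX
    (0.9, 0.98), (0.95, 0.99), (1.0, 1.00),  # high f-stat: provisional XY
]
hom_alt_cutoffs = sketch_chr_x_hom_alt_cutoffs(toy_samples, f_stat_cutoff=0.5)
```

Samples with a single X have almost no heterozygous calls on non-PAR chromosome X, so their fraction sits near 1, which is why only a lower cutoff is needed for that group.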

gnomad.sample_qc.sex.get_sex_expr(chr_x_ploidy, chr_y_ploidy, x_ploidy_cutoffs, y_ploidy_cutoffs, chr_x_frac_hom_alt_expr=None, chr_x_frac_hom_alt_cutoffs=None)[source]

Create a struct with X_karyotype, Y_karyotype, and sex_karyotype.

Note that X0 is currently returned as ‘X’.

Parameters:
  • chr_x_ploidy (NumericExpression) – Chromosome X ploidy (or relative ploidy).

  • chr_y_ploidy (NumericExpression) – Chromosome Y ploidy (or relative ploidy).

  • x_ploidy_cutoffs (Tuple[float, Tuple[float, float], float]) – Tuple of X chromosome ploidy cutoffs: (upper cutoff for single X, (lower cutoff for double X, upper cutoff for double X), lower cutoff for triple X).

  • y_ploidy_cutoffs (Tuple[Tuple[float, float], float]) – Tuple of Y chromosome ploidy cutoffs: ((lower cutoff for single Y, upper cutoff for single Y), lower cutoff for double Y).

  • chr_x_frac_hom_alt_expr (Optional[NumericExpression]) – Fraction of homozygous alternate genotypes (hom-alt/(hom-alt + het)) on chromosome X.

  • chr_x_frac_hom_alt_cutoffs (Optional[Tuple[Tuple[float, float], float]]) – Tuple of cutoffs for the fraction of homozygous alternate genotypes (hom-alt/(hom-alt + het)) on chromosome X: ((lower cutoff for more than one X, upper cutoff for more than one X), lower cutoff for single X).

Return type:

StructExpression

Returns:

Struct containing X_karyotype, Y_karyotype, and sex_karyotype.
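How the cutoff tuples translate into karyotype labels can be sketched in plain Python. The version below works on single float values rather than Hail expressions and ignores the optional hom-alt fraction arguments. The `"ambiguous"` label for values that fall between cutoffs, the strictness of each comparison, and the function name are assumptions of this sketch; only the tuple layouts, the three output field names, and the fact that X0 is returned as ‘X’ come from the documentation above.

```python
def sketch_sex_karyotype(chr_x_ploidy, chr_y_ploidy, x_ploidy_cutoffs, y_ploidy_cutoffs):
    """Classify one sample from its ploidies and the documented cutoff tuples."""
    x_upper_single, (x_lower_double, x_upper_double), x_lower_triple = x_ploidy_cutoffs
    (y_lower_single, y_upper_single), y_lower_double = y_ploidy_cutoffs

    if chr_x_ploidy < x_upper_single:
        x_karyotype = "X"
    elif x_lower_double < chr_x_ploidy < x_upper_double:
        x_karyotype = "XX"
    elif chr_x_ploidy >= x_lower_triple:
        x_karyotype = "XXX"
    else:
        x_karyotype = "ambiguous"  # falls in a gap between cutoffs

    if chr_y_ploidy < y_lower_single:
        y_karyotype = ""  # no Y chromosome detected
    elif y_lower_single < chr_y_ploidy < y_upper_single:
        y_karyotype = "Y"
    elif chr_y_ploidy >= y_lower_double:
        y_karyotype = "YY"
    else:
        y_karyotype = "ambiguous"

    if "ambiguous" in (x_karyotype, y_karyotype):
        sex_karyotype = "ambiguous"
    else:
        sex_karyotype = x_karyotype + y_karyotype

    return {
        "X_karyotype": x_karyotype,
        "Y_karyotype": y_karyotype,
        "sex_karyotype": sex_karyotype,
    }


# Hypothetical cutoffs in the documented tuple layout.
example_x_cutoffs = (1.2, (1.6, 2.4), 2.5)
example_y_cutoffs = ((0.05, 1.4), 1.5)
result = sketch_sex_karyotype(1.0, 1.0, example_x_cutoffs, example_y_cutoffs)
```

With these hypothetical cutoffs, a sample with one X and no detectable Y is labelled ‘X’, matching the note that X0 is returned as ‘X’.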