Getting Started

  1. Install Hail:

    pip install hail
    
  2. Use hailctl to start a Google Dataproc cluster with the gnomad package installed (see Hail on the Cloud for more detail on hailctl):

    hailctl dataproc start cluster-name --packages gnomad
    
  3. Connect to a Jupyter Notebook on the cluster:

    hailctl dataproc connect cluster-name notebook
    
  4. Import gnomAD data in Hail Table format:

    • gnomAD v2.1.1 variants:

      from gnomad.resources.grch37 import gnomad
      
      gnomad_v2_exomes = gnomad.public_release("exomes")
      exomes_ht = gnomad_v2_exomes.ht()
      exomes_ht.describe()
      
      gnomad_v2_genomes = gnomad.public_release("genomes")
      genomes_ht = gnomad_v2_genomes.ht()
      genomes_ht.describe()
      
    • gnomAD v3 variants:

      from gnomad.resources.grch38 import gnomad
      gnomad_v3_genomes = gnomad.public_release("genomes")
      ht = gnomad_v3_genomes.ht()
      ht.describe()
      
  5. Shut down the cluster when finished with it:

    hailctl dataproc stop cluster-name