Query information of genes

Overview

This how-to focuses on linking gene names to information from the NCBI databases. Whilst not JUMP-specific, it is useful for fetching more information on perturbations that our analyses deem important without having to search for them manually. We will use Biopython. This how-to only explores a subset of the options; the full Entrez documentation, which covers all of them, is a useful reference to bookmark.

## Procedure

import polars as pl
from Bio import Entrez
from broad_babel.query import get_mapper

We define the fields that we need and an email to provide to the server we will query.

Entrez.email = "example@email.com"
fields = (
    "Name",
    "Description",
    "Summary",
    "OtherDesignations",  # This gives us synonyms
)

As an example, we will use a set of genes that we found in a JUMP cluster.

genes = ("CHRM4", "SCAPER", "GPR176", "LY6K")

Get a dictionary that maps gene symbols to Entrez gene IDs.

ids = get_mapper(
    query=genes,
    input_column="standard_key",
    output_columns="standard_key,NCBI_Gene_ID",
)
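Before querying NCBI, it can help to confirm that every symbol was actually mapped. The following is a minimal sketch, not part of the original workflow. It assumes that symbols without a match are simply absent from the returned dictionary, which we have not verified for `get_mapper`. The helper name `find_unmapped` and the placeholder IDs are hypothetical.

```python
def find_unmapped(symbols, mapping):
    """Return the symbols that have no entry in the mapping, in input order."""
    return [symbol for symbol in symbols if symbol not in mapping]


# Stand-in for the dictionary returned by get_mapper; the values are
# placeholders for illustration, not real NCBI gene IDs.
example_ids = {"CHRM4": "ID_CHRM4", "SCAPER": "ID_SCAPER", "GPR176": "ID_GPR176"}
example_genes = ("CHRM4", "SCAPER", "GPR176", "LY6K")

unmapped = find_unmapped(example_genes, example_ids)
if unmapped:
    print(f"No Entrez ID found for: {', '.join(unmapped)}")
```

In the real workflow, `find_unmapped(genes, ids)` would be called right after `get_mapper`.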

# Fetch the summaries for these genes
entries = []
for id_ in ids.values():
    stream = Entrez.esummary(db="gene", id=id_)
    record = Entrez.read(stream)
    stream.close()  # Release the connection once the record is parsed

    # Each response holds a single summary; keep only the fields of interest
    summary = record["DocumentSummarySet"]["DocumentSummary"][0]
    entries.append({k: summary[k] for k in fields})

# Show the resultant information in a human-readable format
with pl.Config(fmt_str_lengths=1000):
    print(pl.DataFrame(entries))

shape: (4, 4)
┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐
│ Name   ┆ Description                 ┆ Summary                     ┆ OtherDesignations           │
│ ---    ┆ ---                         ┆ ---                         ┆ ---                         │
│ str    ┆ str                         ┆ str                         ┆ str                         │
╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡
│ GPR176 ┆ G protein-coupled receptor  ┆ Members of the G            ┆ G-protein coupled receptor  │
│        ┆ 176                         ┆ protein-coupled receptor    ┆ 176|probable G-protein      │
│        ┆                             ┆ family, such as GPR176, are ┆ coupled receptor 176        │
│        ┆                             ┆ cell surface receptors      ┆                             │
│        ┆                             ┆ involved in responses to    ┆                             │
│        ┆                             ┆ hormones, growth factors,   ┆                             │
│        ┆                             ┆ and neurotransmitters (Hata ┆                             │
│        ┆                             ┆ et al., 1995 [PubMed        ┆                             │
│        ┆                             ┆ 7893747]).[supplied by      ┆                             │
│        ┆                             ┆ OMIM, Jul 2008]             ┆                             │
│ CHRM4  ┆ cholinergic receptor        ┆ The muscarinic cholinergic  ┆ muscarinic acetylcholine    │
│        ┆ muscarinic 4                ┆ receptors belong to a       ┆ receptor M4|acetylcholine   │
│        ┆                             ┆ larger family of G          ┆ receptor, muscarinic 4      │
│        ┆                             ┆ protein-coupled receptors.  ┆                             │
│        ┆                             ┆ The functional diversity of ┆                             │
│        ┆                             ┆ these receptors is defined  ┆                             │
│        ┆                             ┆ by the binding of           ┆                             │
│        ┆                             ┆ acetylcholine and includes  ┆                             │
│        ┆                             ┆ cellular responses such as  ┆                             │
│        ┆                             ┆ adenylate cyclase           ┆                             │
│        ┆                             ┆ inhibition,                 ┆                             │
│        ┆                             ┆ phosphoinositide            ┆                             │
│        ┆                             ┆ degeneration, and potassium ┆                             │
│        ┆                             ┆ channel mediation.          ┆                             │
│        ┆                             ┆ Muscarinic receptors        ┆                             │
│        ┆                             ┆ influence many effects of   ┆                             │
│        ┆                             ┆ acetylcholine in the        ┆                             │
│        ┆                             ┆ central and peripheral      ┆                             │
│        ┆                             ┆ nervous system. The         ┆                             │
│        ┆                             ┆ clinical implications of    ┆                             │
│        ┆                             ┆ this receptor are unknown;  ┆                             │
│        ┆                             ┆ however, mouse studies link ┆                             │
│        ┆                             ┆ its function to adenylyl    ┆                             │
│        ┆                             ┆ cyclase inhibition.         ┆                             │
│        ┆                             ┆ [provided by RefSeq, Jul    ┆                             │
│        ┆                             ┆ 2008]                       ┆                             │
│ LY6K   ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen          │
│        ┆ member K                    ┆ binding activity of sperm   ┆ 6K|cancer/testis antigen    │
│        ┆                             ┆ to zona pellucida.          ┆ 97|lymphocyte antigen 6     │
│        ┆                             ┆ Predicted to act upstream   ┆ complex, locus              │
│        ┆                             ┆ of or within flagellated    ┆ K|up-regulated in lung      │
│        ┆                             ┆ sperm motility. Predicted   ┆ cancer 10                   │
│        ┆                             ┆ to be located in cell       ┆                             │
│        ┆                             ┆ surface; cytoplasm; and     ┆                             │
│        ┆                             ┆ plasma membrane. Predicted  ┆                             │
│        ┆                             ┆ to be active in acrosomal   ┆                             │
│        ┆                             ┆ vesicle. [provided by       ┆                             │
│        ┆                             ┆ Alliance of Genome          ┆                             │
│        ┆                             ┆ Resources, Apr 2022]        ┆                             │
│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │
│        ┆ protein in the ER           ┆ acid binding activity and   ┆ protein in the endoplasmic  │
│        ┆                             ┆ zinc ion binding activity.  ┆ reticulum|zinc finger       │
│        ┆                             ┆ Located in cytosol and      ┆ protein 291                 │
│        ┆                             ┆ nuclear speck. [provided by ┆                             │
│        ┆                             ┆ Alliance of Genome          ┆                             │
│        ┆                             ┆ Resources, Apr 2022]        ┆                             │
└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘
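The `OtherDesignations` column packs all synonyms into a single string separated by `|`, as seen in the table above. The following is a minimal sketch of how that string could be split into a list per gene. The helper name `split_designations` is hypothetical, and the two example rows are copied from the output above and trimmed to the relevant fields.

```python
def split_designations(entry):
    """Split the pipe-separated OtherDesignations string into a list of synonyms."""
    return [name.strip() for name in entry["OtherDesignations"].split("|") if name.strip()]


# Two rows copied from the table above, reduced to the fields used here.
example_entries = [
    {
        "Name": "GPR176",
        "OtherDesignations": "G-protein coupled receptor 176|probable G-protein coupled receptor 176",
    },
    {
        "Name": "SCAPER",
        "OtherDesignations": "S phase cyclin A-associated protein in the endoplasmic reticulum|zinc finger protein 291",
    },
]

# Map each gene symbol to its list of synonyms.
synonyms = {entry["Name"]: split_designations(entry) for entry in example_entries}

for name, names in synonyms.items():
    print(name, names)
```

The same helper can be applied to the `entries` list built earlier. On the polars side, `pl.col("OtherDesignations").str.split("|")` gives an equivalent list column.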