This how-to focuses on linking gene names to information from the NCBI databases. Whilst not JUMP-specific, it is useful for fetching more information on perturbations that our analyses deem important, without having to search for them manually. We will use Biopython. This how-to only explores a subset of the options; the full Entrez documentation, which covers all of them, is a useful reference to bookmark.

## Procedure
import polars as pl
from Bio import Entrez
from broad_babel.query import get_mapper
Downloading data from 'https://zenodo.org/records/12211976/files/babel.db' to file '/home/runner/.cache/pooch/2eaa6a2f4915f72d7100683f53982ed8-babel.db'.
We define the fields that we need and an email to provide to the server we will query.
Entrez.email = "example@email.com"
fields = (
    "Name",
    "Description",
    "Summary",
    "OtherDesignations",  # This gives us synonyms
)
As an example, we will use a set of genes that we found in a JUMP cluster.
genes = ("CHRM4", "SCAPER", "GPR176", "LY6K")
Get a dictionary that maps gene symbols to Entrez IDs.
ids = get_mapper(
    query=genes,
    input_column="standard_key",
    output_columns="standard_key,NCBI_Gene_ID",
)
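Before querying NCBI, it can help to confirm that every symbol was actually mapped. The sketch below assumes the mapper behaves like a plain dictionary keyed by gene symbol. `find_unmapped` is a hypothetical helper, and the IDs in the example are placeholders rather than real NCBI Gene IDs.

```python
from typing import Iterable, Mapping


def find_unmapped(symbols: Iterable[str], mapper: Mapping[str, str]) -> list[str]:
    """Return the symbols that have no entry in the mapper, in input order."""
    return [symbol for symbol in symbols if symbol not in mapper]


# Placeholder mapper for illustration only; the values are NOT real Entrez IDs.
example_mapper = {"CHRM4": "id-1", "SCAPER": "id-2", "GPR176": "id-3"}
example_genes = ("CHRM4", "SCAPER", "GPR176", "LY6K")

unmapped = find_unmapped(example_genes, example_mapper)
print(unmapped)
```

In this how-to, `find_unmapped(genes, ids)` would list any symbols worth double-checking before their summaries are fetched.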
# Fetch the summaries for these genes
entries = []
for id_ in ids.values():
    stream = Entrez.esummary(db="gene", id=id_)
    record = Entrez.read(stream)

    entries.append(
        {k: record["DocumentSummarySet"]["DocumentSummary"][0][k] for k in fields}
    )
# Show the resultant information in a human-readable format
with pl.Config(fmt_str_lengths=1000):
    print(pl.DataFrame(entries))
shape: (4, 4)
┌────────┬─────────────────────────────┬─────────────────────────────┬─────────────────────────────┐
│ Name ┆ Description ┆ Summary ┆ OtherDesignations │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞════════╪═════════════════════════════╪═════════════════════════════╪═════════════════════════════╡
│ GPR176 ┆ G protein-coupled receptor ┆ Members of the G ┆ G-protein coupled receptor │
│ ┆ 176 ┆ protein-coupled receptor ┆ 176|probable G-protein │
│ ┆ ┆ family, such as GPR176, are ┆ coupled receptor 176 │
│ ┆ ┆ cell surface receptors ┆ │
│ ┆ ┆ involved in responses to ┆ │
│ ┆ ┆ hormones, growth factors, ┆ │
│ ┆ ┆ and neurotransmitters (Hata ┆ │
│ ┆ ┆ et al., 1995 [PubMed ┆ │
│ ┆ ┆ 7893747]).[supplied by ┆ │
│ ┆ ┆ OMIM, Jul 2008] ┆ │
│ CHRM4 ┆ cholinergic receptor ┆ The muscarinic cholinergic ┆ muscarinic acetylcholine │
│ ┆ muscarinic 4 ┆ receptors belong to a ┆ receptor M4|acetylcholine │
│ ┆ ┆ larger family of G ┆ receptor, muscarinic 4 │
│ ┆ ┆ protein-coupled receptors. ┆ │
│ ┆ ┆ The functional diversity of ┆ │
│ ┆ ┆ these receptors is defined ┆ │
│ ┆ ┆ by the binding of ┆ │
│ ┆ ┆ acetylcholine and includes ┆ │
│ ┆ ┆ cellular responses such as ┆ │
│ ┆ ┆ adenylate cyclase ┆ │
│ ┆ ┆ inhibition, ┆ │
│ ┆ ┆ phosphoinositide ┆ │
│ ┆ ┆ degeneration, and potassium ┆ │
│ ┆ ┆ channel mediation. ┆ │
│ ┆ ┆ Muscarinic receptors ┆ │
│ ┆ ┆ influence many effects of ┆ │
│ ┆ ┆ acetylcholine in the ┆ │
│ ┆ ┆ central and peripheral ┆ │
│ ┆ ┆ nervous system. The ┆ │
│ ┆ ┆ clinical implications of ┆ │
│ ┆ ┆ this receptor are unknown; ┆ │
│ ┆ ┆ however, mouse studies link ┆ │
│ ┆ ┆ its function to adenylyl ┆ │
│ ┆ ┆ cyclase inhibition. ┆ │
│ ┆ ┆ [provided by RefSeq, Jul ┆ │
│ ┆ ┆ 2008] ┆ │
│ LY6K ┆ lymphocyte antigen 6 family ┆ Predicted to be involved in ┆ lymphocyte antigen │
│ ┆ member K ┆ binding activity of sperm ┆ 6K|cancer/testis antigen │
│ ┆ ┆ to zona pellucida. ┆ 97|lymphocyte antigen 6 │
│ ┆ ┆ Predicted to act upstream ┆ complex, locus │
│ ┆ ┆ of or within flagellated ┆ K|up-regulated in lung │
│ ┆ ┆ sperm motility. Predicted ┆ cancer 10 │
│ ┆ ┆ to be located in cell ┆ │
│ ┆ ┆ surface and plasma ┆ │
│ ┆ ┆ membrane. Predicted to be ┆ │
│ ┆ ┆ active in acrosomal ┆ │
│ ┆ ┆ vesicle. [provided by ┆ │
│ ┆ ┆ Alliance of Genome ┆ │
│ ┆ ┆ Resources, Jan 2025] ┆ │
│ SCAPER ┆ S-phase cyclin A associated ┆ Predicted to enable nucleic ┆ S phase cyclin A-associated │
│ ┆ protein in the ER ┆ acid binding activity and ┆ protein in the endoplasmic │
│ ┆ ┆ zinc ion binding activity. ┆ reticulum|zinc finger │
│ ┆ ┆ Acts upstream of or within ┆ protein 291 │
│ ┆ ┆ retina development in ┆ │
│ ┆ ┆ camera-type eye. Located in ┆ │
│ ┆ ┆ cytosol and nuclear speck. ┆ │
│ ┆ ┆ [provided by Alliance of ┆ │
│ ┆ ┆ Genome Resources, Jan 2025] ┆ │
└────────┴─────────────────────────────┴─────────────────────────────┴─────────────────────────────┘
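The `OtherDesignations` column packs all synonyms into a single string separated by `|`. Below is a minimal sketch of unpacking them, assuming the parsed record has the same `DocumentSummarySet`/`DocumentSummary` layout used in the procedure. `summaries_to_rows` is a hypothetical helper, and the mock record is a hand-written stand-in that only reuses values from the GPR176 and SCAPER rows of the table. It loops over every summary in the record rather than only the first, so it also covers records that hold more than one summary.

```python
from typing import Any, Mapping, Sequence


def summaries_to_rows(
    record: Mapping[str, Any], fields: Sequence[str]
) -> list[dict[str, Any]]:
    """Extract the requested fields from every summary in a parsed record.

    The pipe-separated OtherDesignations string is split into a list of synonyms.
    """
    rows = []
    for summary in record["DocumentSummarySet"]["DocumentSummary"]:
        row = {field: summary[field] for field in fields}
        if "OtherDesignations" in row:
            # An empty string means no synonyms, so return an empty list.
            designations = row["OtherDesignations"]
            row["OtherDesignations"] = designations.split("|") if designations else []
        rows.append(row)
    return rows


# Mock record for illustration: a stand-in for the parsed Entrez output,
# reusing values shown in the table above.
mock_record = {
    "DocumentSummarySet": {
        "DocumentSummary": [
            {
                "Name": "GPR176",
                "OtherDesignations": (
                    "G-protein coupled receptor 176"
                    "|probable G-protein coupled receptor 176"
                ),
            },
            {
                "Name": "SCAPER",
                "OtherDesignations": (
                    "S phase cyclin A-associated protein in the endoplasmic reticulum"
                    "|zinc finger protein 291"
                ),
            },
        ]
    }
}

for row in summaries_to_rows(mock_record, ("Name", "OtherDesignations")):
    print(row["Name"], row["OtherDesignations"])
```

Keeping the synonyms as a list makes it easier to search for an alias later, for example when a publication refers to SCAPER as zinc finger protein 291.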