Technologies
Celldega utilizes a suite of complementary technologies to develop an efficient web-based spatial-omics analysis and visualization toolkit.
Visualization Technologies
Spatial transcriptomics (ST) datasets can be very large and difficult for researchers to analyze and visualize collaboratively. Additionally, visualization that is linked to analysis is key to extracting biological insights. To address these issues, we built the Celldega viz module to help researchers interactively visualize large ST datasets within notebook-based workflows on the cloud (e.g., Terra.bio).
The Celldega Landscape visualization method (see Gallery) utilizes novel vector tiling approaches to enable interactive visualization of large ST datasets in a notebook environment or as a stand-alone webpage. This approach allows Celldega to visualize larger datasets than currently available open-source tools (e.g., datasets with hundreds of millions of transcripts). We also utilize modern web image data formats (WebP) to reduce the data storage burden for interactive visualization. The resulting LandscapeFiles data format serves as a compact and highly performant visualization-specific data format.
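As a rough illustration of the general tiling idea (not Celldega's actual implementation), the sketch below bins transcript coordinates into fixed-size square tiles, so that a viewer only needs to load the tiles that overlap the current viewport. The function names, the tile size, and the coordinate units are all assumptions made for this example.

```python
from collections import defaultdict
from math import floor


def assign_to_tiles(points, tile_size):
    """Group (x, y, gene) records by the (col, row) index of the tile containing them."""
    tiles = defaultdict(list)
    for x, y, gene in points:
        key = (floor(x / tile_size), floor(y / tile_size))
        tiles[key].append((x, y, gene))
    return dict(tiles)


def tiles_in_viewport(x_min, y_min, x_max, y_max, tile_size):
    """List the (col, row) indices of tiles that overlap a rectangular viewport."""
    col_range = range(floor(x_min / tile_size), floor(x_max / tile_size) + 1)
    row_range = range(floor(y_min / tile_size), floor(y_max / tile_size) + 1)
    return [(col, row) for col in col_range for row in row_range]


# Toy transcript records: (x, y, gene name); values are illustrative only.
transcripts = [(12.0, 8.5, "GeneA"), (260.4, 30.1, "GeneB"), (255.9, 510.0, "GeneA")]

tiles = assign_to_tiles(transcripts, tile_size=250)
visible = tiles_in_viewport(0, 0, 300, 100, tile_size=250)
```

With this layout, panning or zooming only changes which tile keys are requested, so the cost of rendering depends on the viewport rather than on the size of the full dataset.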
Terra.bio
Terra.bio is a cloud-based compute and data storage platform that is being developed by the Broad Institute of MIT and Harvard. We are utilizing Terra.bio to help Spatial Technology Platform clients access, analyze, and visualize their ST data.
Jupyter Widget
We utilize the Jupyter Widget ecosystem to build interactive spatial and data visualizations that support two-way communication between JavaScript (front-end) and Python (back-end). We use the AnyWidget implementation to build our custom widgets.
Deck.gl
Celldega uses the GPU-powered data visualization library deck.gl to create high-performance spatial and data visualizations.
Apache Parquet
Celldega uses the Apache Parquet file format for storing vectorized spatial data and metadata. This file format, in combination with the JavaScript library ParquetWASM and the Apache Arrow in-memory representation, is used to build Celldega's high-performance vector tiling spatial visualization functionality (see GeoArrow and GeoParquet in deck.gl).
ParquetWASM and Apache Arrow
ParquetWASM is a JavaScript library that reads Parquet files into Apache Arrow memory; it uses WebAssembly (WASM) to run Rust code in a browser environment. Apache Arrow is a columnar in-memory format that is used to hold data read from Apache Parquet files and pass it efficiently to deck.gl. For more information, please see GeoArrow and GeoParquet in deck.gl.
WebP
WebP is a modern image format developed by Google that offers efficient compression (both lossy and lossless) and is designed specifically for the web.
Deep Zoom
We utilize the Deep Zoom image schema, developed by Microsoft, to enable efficient visualization of large multi-channel microscopy images. Deep Zoom tile images are stored using the WebP image format.
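The sketch below illustrates how a Deep Zoom style image pyramid is laid out: each level halves the resolution of the level above it, down to a 1 x 1 pixel image at level 0. The function name and the 256-pixel tile size are assumptions for this example, and tile overlap is ignored for simplicity.

```python
from math import ceil


def deep_zoom_levels(width, height, tile_size=256):
    """Return (level, level_width, level_height, n_cols, n_rows) for each pyramid level."""
    # Highest level index equals ceil(log2(max dimension)), computed with integers.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)  # downsampling factor at this level
        level_w = ceil(width / scale)
        level_h = ceil(height / scale)
        n_cols = ceil(level_w / tile_size)
        n_rows = ceil(level_h / tile_size)
        levels.append((level, level_w, level_h, n_cols, n_rows))
    return levels


# Example: a 40,000 x 30,000 pixel microscopy image (illustrative dimensions).
pyramid = deep_zoom_levels(width=40000, height=30000)
```

For this example image the pyramid has 17 levels, and the full-resolution level is split into a 157 x 118 grid of tiles, so a viewer only fetches the handful of tiles needed for the current zoom level and position.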
Clustergrammer Visualization Approaches
The Celldega Matrix visualization builds upon the visualization approaches developed in the Clustergrammer project. This enables users to interactively explore high-dimensional datasets (e.g., single-cell gene expression data) alongside spatial data (e.g., cell distributions within a tissue).
Data Analysis Technologies
Scanpy and Squidpy
Celldega is built to interface with the AnnData and SpatialData objects, which enables users to easily import analysis results from Scanpy and Squidpy, respectively, into Celldega for downstream analysis and/or visualization.
GeoPandas
Celldega uses GeoPandas for efficient spatial operations and for storing collections of spatial objects (e.g., neighborhood multi-polygons) as GeoDataFrames.
LibPySal: Python Spatial Analysis Library Core
Celldega uses the Python Spatial Analysis Library (libpysal) for spatial analysis, namely for calculating alpha-shape cell-type neighborhoods.
Clustergrammer Data Analysis Approaches
The Celldega Cluster module builds upon the hierarchical clustering approaches developed in the Clustergrammer project. This enables users to perform hierarchical clustering on observations (e.g., single cells) and measurements (e.g., genes) and easily visualize these two orthogonal clustering results interactively using Celldega's Matrix visualization method.
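The idea of clustering both axes of a matrix can be sketched with SciPy as follows. This is a generic illustration rather than the Celldega Cluster module's API; the toy data, the average linkage method, and the Euclidean metric are assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# Toy data: 6 observations (rows) by 4 measurements (columns).
rng = np.random.default_rng(0)
matrix = rng.normal(size=(6, 4))

# Cluster the rows, then cluster the columns by transposing the matrix.
row_linkage = linkage(matrix, method="average", metric="euclidean")
col_linkage = linkage(matrix.T, method="average", metric="euclidean")

# Leaf orders of the two dendrograms give the display order for each axis.
row_order = leaves_list(row_linkage)
col_order = leaves_list(col_linkage)

# Reorder the matrix so that similar rows and similar columns sit next to each other.
reordered = matrix[np.ix_(row_order, col_order)]
```

The two dendrograms are computed independently, which is why the row and column orderings can be drawn along the two axes of the same heatmap.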