Consortia Data Processing

Brain Initiative Cell Census Network Processing

The Smart-seq2 Single Nucleus Multi-Sample (Multi-snSS2) pipeline supports data processing for the BRAIN Initiative Cell Census Network (BICCN). An overview of the BICCN pipeline resources is available on the BICCN's Pipelines page.

Multi-snSS2 reference files for BICCN data processing

The BICCN 2.0 Whole Mouse Brain Working Group uses the Ensembl GRCm38 reference for alignment and a modified GTF for gene annotation (see table below). All Multi-snSS2 pipeline reference inputs were created with the BuildIndices workflow.

BICCN processes single-nucleus data, which is enriched in pre-mRNAs containing introns. To account for this, the Multi-snSS2 workflow counts reads that map to exonic regions as well as reads that map to intronic regions, where an intronic region is any part of a contig that is neither exonic nor intergenic. The BuildIndices workflow uses the BuildStarSingleNucleus task to add intron annotations to the GTF with a custom Python script. These annotations enable intron counting with the featureCounts software.
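To make the idea of an intron annotation concrete, the following is a minimal sketch of how intronic intervals could be derived from the exon coordinates of a single gene. It is not the custom script used by the BuildStarSingleNucleus task; the function names (`merge_intervals`, `introns_from_exons`) and the choice to merge overlapping exons first are assumptions made for illustration.

```python
from typing import List, Tuple

# Coordinates are 1-based and inclusive, following the GTF convention.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between the merged exons of one gene.

    Each gap lies inside the gene (so it is not intergenic) and is not
    covered by any exon, which matches the definition of intronic above.
    """
    merged = merge_intervals(exons)
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        introns.append((prev_end + 1, next_start - 1))
    return introns


if __name__ == "__main__":
    # Hypothetical exon coordinates pooled from all transcripts of one gene.
    example_exons = [(100, 200), (300, 400), (150, 250)]
    print(introns_from_exons(example_exons))
```

In this sketch, exons from all transcripts of a gene are pooled before the gaps are computed, so a base counts as intronic only if no transcript of that gene treats it as exonic. Whether the actual script makes the same choice is not stated here.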

The custom GTF contains all annotations for any gene_id that has at least one transcript. This reduces the number of genes in the GTF to ~32,000.
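The filtering rule described above can be sketched as a two-pass scan over the GTF lines: first collect every gene_id that appears on a `transcript` feature, then keep only records whose gene_id is in that set. This is an illustrative assumption about how such a filter might look, not the code used by the BuildIndices workflow, and the helper names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Matches the gene_id attribute in the ninth GTF column, e.g. gene_id "ENSMUSG...";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_id_of(attributes: str) -> Optional[str]:
    """Extract the gene_id value from a GTF attribute string, if present."""
    match = GENE_ID_RE.search(attributes)
    return match.group(1) if match else None


def keep_genes_with_transcripts(lines: Iterable[str]) -> List[str]:
    """Keep header lines and all records of genes that have a transcript feature."""
    records = list(lines)

    # Pass 1: find every gene_id that has at least one transcript record.
    genes_with_transcript = set()
    for line in records:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "transcript":
            gene_id = gene_id_of(fields[8])
            if gene_id is not None:
                genes_with_transcript.add(gene_id)

    # Pass 2: keep comments and every record belonging to a retained gene.
    kept: List[str] = []
    for line in records:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and gene_id_of(fields[8]) in genes_with_transcript:
            kept.append(line)
    return kept
```

For a full-size GTF, the same logic could be applied by reading the file twice rather than holding all lines in memory.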

All reference files are available in a public Google bucket (see table below) and are accompanied by a README that details reference provenance (gs://gcp-public-data--broad-references/mm10/v0/README_mm10_singlecell_gencode.txt).

| Multi-snSS2 reference input name | Google bucket URI | Reference source | Description |
| --- | --- | --- | --- |
| annotations_gtf | gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/modified_gencode.vM23.primary_assembly.annotation.gtf | https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.annotation.gtf.gz | Modified GENCODE GTF including intron annotations that can be used for intron counting with featureCounts. |
| genome_ref_fasta | gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/modified_mm10.primary_assembly.genome.fa | https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/GRCm38.p6.genome.fa.gz | FASTA file used to create the STAR reference files. |
| tar_star_reference | gs://gcp-public-data--broad-references/mm10/v0/single_nucleus/star/modified_star_2.7.9a_primary_gencode_mouse_vM23.tar | NA (built with the BuildIndices workflow) | Reference files used for alignment with STAR. |
| adapter_list | gs://broad-gotc-test-storage/MultiSampleSmartSeq2SingleNucleus/adapters/Illumina_adapters_list.fa | See Illumina's overview on adapter sequences. | List of adapter sequences used for trimming. |
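As a convenience, the reference inputs from the table above can be collected into a workflow inputs file. The sketch below builds such a JSON document in Python. The workflow name `MultiSampleSmartSeq2SingleNucleus` and the `Workflow.input_name` key format are assumptions based on common WDL conventions and should be checked against the workflow definition before use; the URIs are copied from the table.

```python
import json
from typing import Dict

# Assumed workflow name; verify against the WDL before using these keys.
WORKFLOW = "MultiSampleSmartSeq2SingleNucleus"

# Common prefix of the single-nucleus mouse reference files listed in the table.
REF = "gs://gcp-public-data--broad-references/mm10/v0/single_nucleus"

reference_inputs: Dict[str, str] = {
    "annotations_gtf": f"{REF}/modified_gencode.vM23.primary_assembly.annotation.gtf",
    "genome_ref_fasta": f"{REF}/modified_mm10.primary_assembly.genome.fa",
    "tar_star_reference": f"{REF}/star/modified_star_2.7.9a_primary_gencode_mouse_vM23.tar",
    "adapter_list": (
        "gs://broad-gotc-test-storage/MultiSampleSmartSeq2SingleNucleus/"
        "adapters/Illumina_adapters_list.fa"
    ),
}


def build_inputs(workflow: str, refs: Dict[str, str]) -> Dict[str, str]:
    """Prefix each input name with the workflow name, WDL-inputs style."""
    return {f"{workflow}.{name}": uri for name, uri in refs.items()}


if __name__ == "__main__":
    # Print only the reference portion of an inputs file; sample-specific
    # inputs (for example FASTQ paths) are not covered by this sketch.
    print(json.dumps(build_inputs(WORKFLOW, reference_inputs), indent=2))
```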