
What's in the Smart-seq2 Multi Sample Pipeline Loom File?

The Loom is the default cell-by-gene matrix output of the Smart-seq2 Multi Sample pipeline. It is an HDF5 file, generated with Loompy v.3.0.6, that aggregates the individual Loom files output by the Smart-seq2 Single Sample pipeline.

Overall, the Loom contains global attributes describing the Loom and the workflow used to generate it (Table 1), column attributes detailing metrics for individual cells (Table 2), and row attributes detailing metrics for individual genes (Table 3).

The main matrix contains gene TPMs calculated by RSEM, and an additional layer contains the RSEM expected_counts (named "estimated_counts" in the Loom).

The tables below document the Loom metrics, list which tools generate them, and define them.

Table 1. Global Attributes

The global attributes in the Loom apply to the whole file, not to any specific part. The global attributes corresponding to the project_id, project_name, library, species, and organ workflow inputs are named according to the metadata conventions used for processing Human Cell Atlas data.

| Attribute | Details |
|---|---|
| LOOM_SPEC_VERSION | String with the Loom file spec version |
| CreationDate | Date the Loom file was generated |
| pipeline_version | Workflow version number |
| batch_id | Readout of the string used for the batch_id workflow input |
| batch_name | Readout of the optional string used for the batch_name workflow input |
| library_preparation_protocol.library_construction_approach | Readout of the optional string used for the library workflow input |
| donor_organism.genus_species | Readout of the optional string used for the species workflow input |
| specimen_from_organism.organ | Readout of the optional string used for the organ workflow input |
| project.provenance.document_id | Readout of the optional string used for the project_id workflow input |
| project.project_core.project_short_name | Readout of the optional string used for the project_name workflow input |
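Because several global attribute names differ from the workflow inputs they record, a small lookup table can help when scripting against the file. This is a minimal sketch that assumes the global attributes have already been read into a plain Python dict (for example, by copying values out of loompy's `ds.attrs`). The names `INPUT_TO_GLOBAL_ATTR` and `workflow_inputs_from_attrs` are hypothetical, and the sketch treats an absent attribute as an input that was not supplied, which is an assumption about how omitted optional inputs are recorded.

```python
# Workflow input -> global Loom attribute name, as listed in Table 1.
INPUT_TO_GLOBAL_ATTR = {
    "batch_id": "batch_id",
    "batch_name": "batch_name",
    "library": "library_preparation_protocol.library_construction_approach",
    "species": "donor_organism.genus_species",
    "organ": "specimen_from_organism.organ",
    "project_id": "project.provenance.document_id",
    "project_name": "project.project_core.project_short_name",
}


def workflow_inputs_from_attrs(global_attrs):
    """Hypothetical helper: recover workflow input values from global attributes.

    Inputs whose attribute is missing from `global_attrs` map to None
    (assumption: an omitted optional input leaves no attribute behind).
    """
    return {
        input_name: global_attrs.get(attr_name)
        for input_name, attr_name in INPUT_TO_GLOBAL_ATTR.items()
    }
```

For example, a dict containing `"donor_organism.genus_species": "Homo sapiens"` yields `"Homo sapiens"` under the `species` key.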

Table 2. Column Attributes (Cell Metrics)

| Cell Metrics | Program | Details |
|---|---|---|
| CellID | --- | The unique identifier for each cell; read from the input_id |
| cell_names | --- | The unique identifier for each cell; read from the input_id and identical to CellID |
| input_id | --- | The input_id listed in the pipeline configuration file |
| input_id_metadata_field | --- | Optional identifier for the metadata field used for the input_id |
| input_name | --- | The optional string provided in the pipeline configuration to further identify samples |
| input_name_metadata_field | --- | Optional identifier for the metadata field used for the input_name |
| ACCUMULATION_LEVEL | Picard | See Picard documentation |
| ALIGNED_READS | Picard | See Picard documentation |
| AT_DROPOUT | Picard | See Picard documentation |
| Aligned 0 time | HISAT2 | Number and percent of reads aligned 0 times |
| Aligned 1 time | HISAT2 | Number and percent of reads aligned exactly 1 time |
| Aligned >1 times | HISAT2 | Number and percent of reads aligned more than 1 time |
| BAD_CYCLES.UNPAIRED | Picard | See Picard documentation |
| CODING_BASES | Picard | See Picard documentation |
| CORRECT_STRAND_READS | Picard | See Picard documentation |
| ESTIMATED_LIBRARY_SIZE | Picard | See Picard documentation |
| GC_DROPOUT | Picard | See Picard documentation |
| GC_NC_0_19 | Picard | See Picard documentation |
| GC_NC_20_39 | Picard | See Picard documentation |
| GC_NC_40_59 | Picard | See Picard documentation |
| GC_NC_60_79 | Picard | See Picard documentation |
| GC_NC_80_100 | Picard | See Picard documentation |
| IGNORED_READS | Picard | See Picard documentation |
| INCORRECT_STRAND_READS | Picard | See Picard documentation |
| INTERGENIC_BASES | Picard | See Picard documentation |
| INTRONIC_BASES | Picard | See Picard documentation |
| MEAN_READ_LENGTH.UNPAIRED | Picard | See Picard documentation |
| MEDIAN_3PRIME_BIAS | Picard | See Picard documentation |
| MEDIAN_5PRIME_BIAS | Picard | See Picard documentation |
| MEDIAN_5PRIME_TO_3PRIME_BIAS | Picard | See Picard documentation |
| MEDIAN_CV_COVERAGE | Picard | See Picard documentation |
| NUM_R1_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
| NUM_R2_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
| NUM_UNEXPLAINED_READS | Picard | See Picard documentation |
| Overall alignment rate | HISAT2 | Overall percentage of reads that aligned |
| PCT_ADAPTER.UNPAIRED | Picard | See Picard documentation |
| PCT_CHIMERAS.UNPAIRED | Picard | See Picard documentation |
| PCT_CODING_BASES | Picard | See Picard documentation |
| PCT_CORRECT_STRAND_READS | Picard | See Picard documentation |
| PCT_INTERGENIC_BASES | Picard | See Picard documentation |
| PCT_INTRONIC_BASES | Picard | See Picard documentation |
| PCT_MRNA_BASES | Picard | See Picard documentation |
| PCT_PF_READS.UNPAIRED | Picard | See Picard documentation |
| PCT_PF_READS_ALIGNED.UNPAIRED | Picard | See Picard documentation |
| PCT_PF_READS_IMPROPER_PAIRS.UNPAIRED | Picard | See Picard documentation |
| PCT_R1_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
| PCT_R2_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
| PCT_READS_ALIGNED_IN_PAIRS.UNPAIRED | Picard | See Picard documentation |
| PCT_RIBOSOMAL_BASES | Picard | See Picard documentation |
| PCT_USABLE_BASES | Picard | See Picard documentation |
| PCT_UTR_BASES | Picard | See Picard documentation |
| PERCENT_DUPLICATION | Picard | See Picard documentation |
| PF_ALIGNED_BASES | Picard | See Picard documentation |
| PF_ALIGNED_BASES.UNPAIRED | Picard | See Picard documentation |
| PF_BASES | Picard | See Picard documentation |
| PF_HQ_ALIGNED_BASES.UNPAIRED | Picard | See Picard documentation |
| PF_HQ_ALIGNED_Q20_BASES.UNPAIRED | Picard | See Picard documentation |
| PF_HQ_ALIGNED_READS.UNPAIRED | Picard | See Picard documentation |
| PF_HQ_ERROR_RATE.UNPAIRED | Picard | See Picard documentation |
| PF_HQ_MEDIAN_MISMATCHES.UNPAIRED | Picard | See Picard documentation |
| PF_INDEL_RATE.UNPAIRED | Picard | See Picard documentation |
| PF_MISMATCH_RATE.UNPAIRED | Picard | See Picard documentation |
| PF_NOISE_READS.UNPAIRED | Picard | See Picard documentation |
| PF_READS.UNPAIRED | Picard | See Picard documentation |
| PF_READS_ALIGNED.UNPAIRED | Picard | See Picard documentation |
| PF_READS_IMPROPER_PAIRS.UNPAIRED | Picard | See Picard documentation |
| READS_ALIGNED_IN_PAIRS.UNPAIRED | Picard | See Picard documentation |
| READS_USED | Picard | See Picard documentation |
| READ_PAIRS_EXAMINED | Picard | See Picard documentation |
| READ_PAIR_DUPLICATES | Picard | See Picard documentation |
| READ_PAIR_OPTICAL_DUPLICATES | Picard | See Picard documentation |
| RIBOSOMAL_BASES | Picard | See Picard documentation |
| SECONDARY_OR_SUPPLEMENTARY_RDS | Picard | See Picard documentation |
| STRAND_BALANCE.UNPAIRED | Picard | See Picard documentation |
| TOTAL_CLUSTERS | Picard | See Picard documentation |
| TOTAL_READS.UNPAIRED | Picard | See Picard documentation |
| Total reads | HISAT2 | Total number of reads processed by HISAT2 (aligned and unaligned) |
| UNMAPPED_READS | Picard | See Picard documentation |
| UNPAIRED_READS_EXAMINED | Picard | See Picard documentation |
| UNPAIRED_READ_DUPLICATES | Picard | See Picard documentation |
| UTR_BASES | Picard | See Picard documentation |
| WINDOW_SIZE | Picard | See Picard documentation |
| alignable reads | RSEM .cnt file | The RSEM N1; the number of alignable reads |
| filtered reads | RSEM .cnt file | The RSEM N2; the number of reads filtered out due to too many alignments |
| multiple mapped | RSEM .cnt file | The RSEM nMulti; the number of reads aligned to multiple genes |
| strand | RSEM .cnt file | The RSEM read_type; describes whether the data are single- or paired-end |
| total alignments | RSEM .cnt file | The RSEM nHits; the total number of alignments |
| total reads | RSEM .cnt file | The RSEM N_tot; the total number of reads (unalignable + alignable + filtered) |
| unalignable reads | RSEM .cnt file | The RSEM N0; the number of unalignable reads |
| uncertain reads | RSEM .cnt file | The RSEM nUncertain; the number of reads aligned to multiple locations |
| unique aligned | RSEM .cnt file | The RSEM nUnique; the number of reads aligned uniquely to one gene |
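The RSEM-derived cell metrics come from the first three lines of RSEM's .cnt file, which hold `N0 N1 N2 N_tot`, then `nUnique nMulti nUncertain`, then `nHits read_type`. The sketch below parses those lines into the Loom column-attribute names from the table. It is an illustration, not the pipeline's own parsing code; the function name `parse_rsem_cnt` is hypothetical, and it keeps `strand` as RSEM's integer read_type code, which may differ from how the value is stored in the Loom.

```python
def parse_rsem_cnt(text):
    """Hypothetical helper: map an RSEM .cnt header to Loom column-attribute names.

    Only the first three lines are used:
      line 1: N0 N1 N2 N_tot
      line 2: nUnique nMulti nUncertain
      line 3: nHits read_type
    Any later lines are ignored.
    """
    lines = [line.split() for line in text.strip().splitlines()[:3]]
    if len(lines) < 3:
        raise ValueError("expected at least three lines in the .cnt header")
    n0, n1, n2, n_tot = (int(x) for x in lines[0][:4])
    n_unique, n_multi, n_uncertain = (int(x) for x in lines[1][:3])
    n_hits, read_type = (int(x) for x in lines[2][:2])

    # Consistency checks implied by the RSEM .cnt format.
    if n_tot != n0 + n1 + n2:
        raise ValueError("N_tot should equal N0 + N1 + N2")
    if n_unique + n_multi != n1:
        raise ValueError("nUnique + nMulti should equal N1")

    return {
        "unalignable reads": n0,
        "alignable reads": n1,
        "filtered reads": n2,
        "total reads": n_tot,
        "unique aligned": n_unique,
        "multiple mapped": n_multi,
        "uncertain reads": n_uncertain,
        "total alignments": n_hits,
        # RSEM read_type: 0/1 = single-end, 2/3 = paired-end (without/with quality scores).
        "strand": read_type,
    }
```

The two checks make it easy to spot a truncated or mismatched file before its values are compared with the corresponding Loom attributes.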

Table 3. Row Attributes (Gene Metrics)

| Gene Metrics | Program | Details |
|---|---|---|
| ensembl_ids | GENCODE GTF | The gene_id listed in the GENCODE GTF |
| gene_names | GENCODE GTF | The unique gene_name provided in the GENCODE GTF |
| Gene | GENCODE GTF | The unique gene_name provided in the GENCODE GTF; identical to gene_names |
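The gene metrics come from gene records in the GENCODE GTF, whose ninth tab-separated column holds `key "value";` pairs such as `gene_id` and `gene_name`. The sketch below extracts those two values from `gene` lines. It is a simplified illustration rather than the pipeline's code; the names `ATTR_RE` and `gene_rows` are hypothetical, and it does not reproduce how the pipeline orders genes or makes gene names unique.

```python
import re

# Matches quoted GTF attribute pairs, e.g. gene_name "DDX11L1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_rows(gtf_lines):
    """Hypothetical helper: yield (gene_id, gene_name) for each 'gene' record."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        # gene_id feeds ensembl_ids; gene_name feeds gene_names and Gene.
        yield attrs["gene_id"], attrs["gene_name"]
```

Transcript and exon lines repeat the same gene_id, which is why the sketch filters on the feature type in column 3 before reading attributes.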