What's in the Smart-seq2 Multi Sample Pipeline Loom File?
The Loom is the Smart-seq2 Multi Sample pipeline's default cell-by-gene matrix. It is an HDF5 file generated using Loompy v.3.0.6 that is an aggregate of the individual output Loom files from the Smart-seq2 Single Sample pipeline.
Overall, the Loom contains global attributes detailing information about the Loom and workflow used to generate it (Table 1), column attributes detailing metrics for individual cells (Table 2), and row metrics detailing metrics for individual genes (Table 3).
The matrix contains the calculated gene TPMs from the RSEM software and there is an additional layer containing RSEM expected_counts (named "estimated_counts" in the Loom).
The tables below document the Loom metrics, list which tools generate them, and define them.
Table 1. Global Attributes
The global attributes in the Loom apply to the whole file, not any specific part. The global attributes corresponding to the project_id
, project_name
, library
, species
, and organ
workflow inputs are named according to the metadata used for processing data from the Human Cell Atlas.
Attribute | Details |
---|---|
LOOM_SPEC_VERSION | String with the loom file spec version |
CreationDate | Date Loom file was generated |
pipeline_version | Workflow version number |
batch_id | Readout of the string used for the batch_id workflow input |
batch_name | Readout of the optional string used for the batch_name workflow input |
library_preparation_protocol.library_construction_approach | Readout of the optional string used for the library workflow input |
donor_organism.genus_species | Readout of the optional string used for the species workflow input |
specimen_from_organism.organ | Readout of the optional string used for the organ workflow input |
project.provenance.document_id | Readout of the optional string used for the project workflow input |
project.project_core.project_short_name | Readout of the optional string used for the project_name input |
Table 2. Column Attributes (Cell Metrics)
Cell Metrics | Program | Details |
---|---|---|
CellID | --- | The unique identifier for each cell; read from the input_id |
cell_names | --- | The unique identifier for each cell; read from the input_id and identical to CellID |
input_id | --- | The input_id listed in the pipeline configuration file |
input_id_metadata_field | --- | Optional identifier for the metadata field used for the input_id |
input_name | --- | The optional string provided in the pipeline configuration to further identify samples |
input_name_metadata_field | --- | Optional identifier for the metadata field used for the input_name |
ACCUMULATION_LEVEL | Picard | See Picard documentation |
ALIGNED_READS | Picard | See Picard documentation |
AT_DROPOUT | Picard | See Picard documentation |
Aligned 0 time | HISAT2 | Number and percent reads aligned 0 times |
Aligned 1 time | HISAT2 | Number and percent reads aligned 1 time |
Aligned >1 times | HISAT2 | Number and percent reads aligned more than 1 time |
BAD_CYCLES.UNPAIRED | Picard | See Picard documentation |
CODING_BASES | Picard | See Picard documentation |
CORRECT_STRAND_READS | Picard | See Picard documentation |
ESTIMATED_LIBRARY_SIZE | Picard | See Picard documentation |
GC_DROPOUT | Picard | See Picard documentation |
GC_NC_0_19 | Picard | See Picard documentation |
GC_NC_20_39 | Picard | See Picard documentation |
GC_NC_40_59 | Picard | See Picard documentation |
GC_NC_60_79 | Picard | See Picard documentation |
GC_NC_80_100 | Picard | See Picard documentation |
IGNORED_READS | Picard | See Picard documentation |
INCORRECT_STRAND_READS | Picard | See Picard documentation |
INTERGENIC_BASES | Picard | See Picard documentation |
INTRONIC_BASES | Picard | See Picard documentation |
MEAN_READ_LENGTH.UNPAIRED | Picard | See Picard documentation |
MEDIAN_3PRIME_BIAS | Picard | See Picard documentation |
MEDIAN_5PRIME_BIAS | Picard | See Picard documentation |
MEDIAN_5PRIME_TO_3PRIME_BIAS | Picard | See Picard documentation |
MEDIAN_CV_COVERAGE | Picard | See Picard documentation |
NUM_R1_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
NUM_R2_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
NUM_UNEXPLAINED_READS | Picard | See Picard documentation |
Overall alignment rate | HISAT2 | Overall percent of reads that aligned |
PCT_ADAPTER.UNPAIRED | Picard | See Picard documentation |
PCT_CHIMERAS.UNPAIRED | Picard | See Picard documentation |
PCT_CODING_BASES | Picard | See Picard documentation |
PCT_CORRECT_STRAND_READS | Picard | See Picard documentation |
PCT_INTERGENIC_BASES | Picard | See Picard documentation |
PCT_INTRONIC_BASES | Picard | See Picard documentation |
PCT_MRNA_BASES | Picard | See Picard documentation |
PCT_PF_READS.UNPAIRED | Picard | See Picard documentation |
PCT_PF_READS_ALIGNED.UNPAIRED | Picard | See Picard documentation |
PCT_PF_READS_IMPROPER_PAIRS.UNPAIRED | Picard | See Picard documentation |
PCT_R1_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
PCT_R2_TRANSCRIPT_STRAND_READS | Picard | See Picard documentation |
PCT_READS_ALIGNED_IN_PAIRS.UNPAIRED | Picard | See Picard documentation |
PCT_RIBOSOMAL_BASES | Picard | See Picard documentation |
PCT_USABLE_BASES | Picard | See Picard documentation |
PCT_UTR_BASES | Picard | See Picard documentation |
PERCENT_DUPLICATION | Picard | See Picard documentation |
PF_ALIGNED_BASES | Picard | See Picard documentation |
PF_ALIGNED_BASES.UNPAIRED | Picard | See Picard documentation |
PF_BASES | Picard | See Picard documentation |
PF_HQ_ALIGNED_BASES.UNPAIRED | Picard | See Picard documentation |
PF_HQ_ALIGNED_Q20_BASES.UNPAIRED | Picard | See Picard documentation |
PF_HQ_ALIGNED_READS.UNPAIRED | Picard | See Picard documentation |
PF_HQ_ERROR_RATE.UNPAIRED | Picard | See Picard documentation |
PF_HQ_MEDIAN_MISMATCHES.UNPAIRED | Picard | See Picard documentation |
PF_INDEL_RATE.UNPAIRED | Picard | See Picard documentation |
PF_MISMATCH_RATE.UNPAIRED | Picard | See Picard documentation |
PF_NOISE_READS.UNPAIRED | Picard | See Picard documentation |
PF_READS.UNPAIRED | Picard | See Picard documentation |
PF_READS_ALIGNED.UNPAIRED | Picard | See Picard documentation |
PF_READS_IMPROPER_PAIRS.UNPAIRED | Picard | See Picard documentation |
READS_ALIGNED_IN_PAIRS.UNPAIRED | Picard | See Picard documentation |
READS_USED | Picard | See Picard documentation |
READ_PAIRS_EXAMINED | Picard | See Picard documentation |
READ_PAIR_DUPLICATES | Picard | See Picard documentation |
READ_PAIR_OPTICAL_DUPLICATES | Picard | See Picard documentation |
RIBOSOMAL_BASES | Picard | See Picard documentation |
SECONDARY_OR_SUPPLEMENTARY_RDS | Picard | See Picard documentation |
STRAND_BALANCE.UNPAIRED | Picard | See Picard documentation |
TOTAL_CLUSTERS | Picard | See Picard documentation |
TOTAL_READS.UNPAIRED | Picard | See Picard documentation |
Total reads | HISAT2 | Total number of aligned reads |
UNMAPPED_READS | Picard | See Picard documentation |
UNPAIRED_READS_EXAMINED | Picard | See Picard documentation |
UNPAIRED_READ_DUPLICATES | Picard | See Picard documentation |
UTR_BASES | Picard | See Picard documentation |
WINDOW_SIZE | Picard | See Picard documentation |
alignable reads | RSEM cnt file | The number of alignable reads |
filtered reads | RSEM cnt file | The number of filtered reads due to too many alignments |
multiple mapped | RSEM cnt file | The number of reads aligned to multiple genes |
strand | RSEM cnt file | The RSEM read_type; describes if data is single- or paired-end |
total alignments | RSEM cnt file | The RSEM nHits; the number of total alignments |
total reads | RSEM cnt file | The number of total alignable reads |
unalignable reads | RSEM cnt file | The number of reads unalignable |
uncertain reads | RSEM cnt file | The number of reads aligned to multiple locations |
unique aligned | RSEM cnt file | The number of reads uniquely alignable to one gene |
Table 3. Row Attributes (Gene Metrics)
Gene Metrics | Program | Details |
---|---|---|
ensembl_ids | GENCODE GTF | The gene_id listed in the GENCODE GTF |
gene_names | GENCODE GTF | The unique gene_name provided in the GENCODE GTF |
Gene | GENCODE GTF | The unique gene_name provided in the GENCODE GTF; identical to attribute in gene_names |