FAQ

Read Tags

Longbow adds several tags to its output reads to record metadata about each read. The following table lists the tags added by each tool and describes the data they contain:

Read Tag	Type	Tool(s)	Description
YS	f	Annotate, Demultiplex	Longbow HMM model score (log probability)
SG	Z	Annotate, Demultiplex	Read segment information indicating the boundaries and labels of each segment in the read. For example: `SG:Z:random:0-53,Poly_T:54-85,cDNA:86-823,MARS:824-853,N:854-869,VENUS:870-894,CBC:895-910,UMI:911-920,Poly_T:921-951,cDNA:952-2030,MARS:2031-2060,O:2061-2076,VENUS:2077-2100,CBC:2101-2116,UMI:2117-2126,Poly_T:2127-2157,cDNA:2158-3747,MARS:3748-3777,P:3778-3794`
YN	Z	Annotate, Demultiplex	Name of the model used to annotate the reads
RC	i	Annotate, Demultiplex	1 if the read has been reverse-complemented; 0 otherwise
XQ	Z	Annotate, Demultiplex	Comma-delimited quality score ratios, one for each segment in a given read. Scores for random segments are set to 0/0. For every other segment, the ratio is the optimal (Smith-Waterman) alignment score against the expected sequence over twice the length of that sequence. For example: `XQ:Z:0/0,12/60,0/0,58/60,30/32,46/46,0/0,0/0,60/60,0/0,58/60,30/32,46/46,0/0,0/0,60/60,0/0,58/60,32/32`
YQ	f	Annotate, Demultiplex	Approximate read quality. Read quality is approximated by summing the individual segment scores and dividing that sum by the sum of the total possible (best) scores (as defined by the `XQ` tag).
ZS	i	Segment, Extract	1 if the given read is segmented; 0 otherwise
YV	i	Filter	1 if the given read is valid according to the expected order of the model segments; 0 otherwise
YK	Z	Filter	Name of the first valid adapter / named non-random region in the given read
YG	i	Filter	Number of valid adapters / named non-random regions in the given read
ZU	Z	Segment	Unique Molecular Identifier (UMI) sequence for the given segmented read
XU	i	Segment	Position (0-based) in the read of the first base in the annotated Unique Molecular Identifier (UMI) sequence for the given segmented read
CR	Z	Segment	Cell Barcode (CBC) sequence for the given segmented read
XB	i	Segment	Position (0-based) in the read of the first base in the Cell Barcode (CBC) sequence in the given segmented read
XF	f	Segment	Confidence factor for the Cell Barcode (CBC) in the given segmented read. The confidence factor is based on the quality of each base in the CBC and is given by `scale_factor * reduce(operator.mul, map(lambda q: 1. - 10 ** (-(ord(q) - 33.) / 10), qual_string))`, where `scale_factor = 100`
XA	Z	Segment	String containing the name of the tag that holds the Cell Barcode (CBC), followed by the name of the tag that holds the Unique Molecular Identifier (UMI): `XC-XM`
X1	Z	Segment	Sequence for Spatial Barcode 1 for the given segmented read
XP	i	Segment	Position (0-based) in the read of the first base in the Spatial Barcode 1 sequence in the given segmented read
X2	Z	Segment	Sequence for Spatial Barcode 2 for the given segmented read
XR	i	Segment	Position (0-based) in the read of the first base in the Spatial Barcode 2 sequence in the given segmented read
XM	Z	Segment	Raw Unique Molecular Identifier (UMI) sequence for the given segmented read (for IsoSeq3 compatibility)
XC	Z	Segment	Raw Cell Barcode (CBC) sequence for the given segmented read (for IsoSeq3 compatibility)
YC	i	Correct	True if and only if barcode correction could be performed (including a "correction" in which the original barcode did not change). False otherwise.
YP	i	Correct	True if and only if the barcode could be corrected AND the corrected barcode differs from the raw barcode. False otherwise.
ic	i	Segment	Sum of number of passes from all ZMWs used to create consensus. Always set to 1. (for IsoSeq3 compatibility)
im	Z	Segment	ZMW names associated with a given segmented read. Set to the name of the parent read for a segmented read (e.g. `m64013e_211031_055434/1/ccs`). (for IsoSeq3 compatibility)
is	i	Segment	Number of ZMWs associated with a given segmented read. Always set to 1. (for IsoSeq3 compatibility)
it	Z	Segment	List of barcodes / UMIs tagged/clipped during segmentation (e.g. `it:Z:CATTAGGTCATCCCTA,AAATTTTGGA`) (for IsoSeq3 compatibility)
zm	i	Segment	ZMW number from which the given read originates. (for IsoSeq3 compatibility)
XN	Z	Segment, Extract	Altered read name given by Longbow to a segmented read (used for debugging). This name consists of the original read name followed by the start and end positions of the segment on the original read, then the names of the bounding adapters / known regions. For example: `XN:Z:m64013e_211029_235558/24/ccs/0_1960/START-MARS`
pz	i	Correct	Offset of the new, corrected, barcode/tag relative to the original, raw value (0-based). NOTE: if the tag could not be corrected, this tag is not present in the output read.
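
The `SG`, `XQ`/`YQ`, and `XF` descriptions above can be illustrated with a short Python sketch. This is not Longbow's own code: the function names `parse_segments`, `approximate_quality`, and `cbc_confidence` are hypothetical, the zero-denominator handling is an assumption, and the quality string is assumed to be Phred+33 encoded, as the `ord(q) - 33` term in the `XF` formula suggests.

```python
import operator
from functools import reduce


def parse_segments(sg_value):
    """Split an SG tag value into (label, start, end) tuples."""
    segments = []
    for entry in sg_value.split(","):
        # Split on the last colon so the label is kept whole.
        label, _, span = entry.rpartition(":")
        start, end = span.split("-")
        segments.append((label, int(start), int(end)))
    return segments


def approximate_quality(xq_value):
    """Sum the segment scores in an XQ value and divide by the sum of the
    best possible scores, following the description of the YQ tag."""
    ratios = [ratio.split("/") for ratio in xq_value.split(",")]
    total = sum(int(score) for score, _ in ratios)
    best = sum(int(maximum) for _, maximum in ratios)
    # Assumption: return 0.0 when every segment is scored 0/0.
    return total / best if best else 0.0


def cbc_confidence(qual_string, scale_factor=100):
    """Apply the XF formula to a Phred+33 quality string."""
    # The initial value 1.0 is an addition so an empty string does not raise.
    return scale_factor * reduce(
        operator.mul,
        map(lambda q: 1.0 - 10 ** (-(ord(q) - 33.0) / 10), qual_string),
        1.0,
    )
```

For instance, `parse_segments("random:0-53,Poly_T:54-85")` returns `[("random", 0, 53), ("Poly_T", 54, 85)]`. The functions take the tag value only, that is, the text after the `SG:Z:` or `XQ:Z:` prefix.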

Read Tag Types

The SAM spec defines the following as types for read tags:

Type	Description
A	printable char
i	signed int
f	float
Z	printable string
H	Byte array in hex format
B	Integer or numeric array
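
As a rough illustration of how these type codes map onto values, the sketch below converts a text-format SAM optional field (`TAG:TYPE:VALUE`) into a Python value. The helper name `parse_sam_tag` is hypothetical and is not part of Longbow. It assumes the text SAM representation, in which a `B` array value starts with a subtype letter followed by comma-separated numbers.

```python
def parse_sam_tag(field):
    """Convert a 'TAG:TYPE:VALUE' field into a (tag, value) pair."""
    # Split at most twice: Z values such as SG contain colons themselves.
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":
        return tag, int(value)
    if type_code == "f":
        return tag, float(value)
    if type_code in ("A", "Z"):
        return tag, value
    if type_code == "H":
        # Hex string to raw bytes.
        return tag, bytes.fromhex(value)
    if type_code == "B":
        # The first item is the array subtype; 'f' means float, the rest are integers.
        subtype, *items = value.split(",")
        convert = float if subtype == "f" else int
        return tag, [convert(item) for item in items]
    raise ValueError(f"Unsupported SAM tag type: {type_code!r}")
```

For example, `parse_sam_tag("RC:i:1")` returns `("RC", 1)`, and an `SG:Z:...` field comes back with its full segment string intact.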