Heart of CLP.
This class classifies each ReadData by the reason it failed PF.
The classification is based on a small set of titrated flowcells sequenced at the Broad Institute by the Genomics Platform.
Three clusters were observed:
- numNs~24: these were found only near the boundaries of tiles and didn't seem to depend on concentration. For this reason they
were classified as MISALIGNED.
- numNs~0 and numQGtTwo<=8: these were found throughout the tiles and _decreased_ in number as the concentration of the library increased.
Thus it was concluded that these correspond to the EMPTY wells.
- numNs~0 and numQGtTwo>=12: these were found throughout the tiles and _increased_ in number as the concentration of the library increased.
Thus it was concluded that these correspond to the POLYCLONAL wells.
- The remaining reads were few in number and their classification wasn't clear, so they are left as UNKNOWN.
We use the length of the read as a parameter and scale the 8 and the 12 accordingly, as length/3 and length/2, but in reality this has only
been tested on length=24.
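
The decision rule described above can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation: the class name PfFailSketch and the helper classifyCounts are hypothetical, and several details are assumptions made only for the example. "numNs~24" is read as numNs >= length - 1, "numNs~0" as numNs <= 1, a no-call is taken to be 'N' or '.', and numQGtTwo is taken to count bases with quality strictly greater than 2.

```java
final class PfFailSketch {

    enum PfFailReason { MISALIGNED, EMPTY, POLYCLONAL, UNKNOWN }

    /** Counts no-calls and bases with quality above 2, then applies the thresholds. */
    static PfFailReason classify(final byte[] bases, final byte[] quals) {
        int numNs = 0;
        int numQGtTwo = 0;
        for (int i = 0; i < bases.length; i++) {
            // Assumption: a no-call is encoded as 'N' or '.'.
            if (bases[i] == 'N' || bases[i] == '.') {
                numNs++;
            }
            // Assumption: "QGtTwo" means a quality strictly greater than 2.
            if (quals[i] > 2) {
                numQGtTwo++;
            }
        }
        return classifyCounts(bases.length, numNs, numQGtTwo);
    }

    /** Applies the length-scaled thresholds (length/3 and length/2) to the two counts. */
    static PfFailReason classifyCounts(final int length, final int numNs, final int numQGtTwo) {
        // Assumption: "numNs ~ length" tolerates a single called base.
        if (numNs >= length - 1) {
            return PfFailReason.MISALIGNED;
        }
        // Assumption: "numNs ~ 0" tolerates a single no-call.
        if (numNs <= 1) {
            if (numQGtTwo <= length / 3) {
                return PfFailReason.EMPTY;       // 8 when length is 24
            }
            if (numQGtTwo >= length / 2) {
                return PfFailReason.POLYCLONAL;  // 12 when length is 24
            }
        }
        // Everything between the clusters stays unclassified.
        return PfFailReason.UNKNOWN;
    }
}
```

With length=24 the integer divisions give back the original thresholds of 8 and 12. For other read lengths the cutoffs scale proportionally, with the same caveat as above that only length=24 has been tested.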