Use of random barcodes in data analysis

I am interested in evaluating random barcodes, but I don't fully understand how they work. The barcode marks individual cDNAs, but how can it solve the problem of PCR artefacts? For example, suppose there are two barcodes at a particular position, one with 100 reads and the other with 200 reads. If PCR efficiency were the same for all cDNAs, both barcodes should have the same number of reads, so I would take the minimum, i.e., 100 reads. This is only my guess; I don't know whether it is correct.

A good example of random barcode analysis is Fig. 1C in “iCLIP Predicts the Dual Splicing Effects of TIA-RNA Interactions” by Wang et al., PLoS Biology, 2010. It shows the random barcode of each sequence, with the number of sequences sharing that barcode given in brackets. If multiple sequences that map to the same genomic position carry the same random barcode, they are counted as 1, because they are treated as PCR copies of a single cDNA molecule. In your example, only two different random barcodes map to that position, so the cDNA count is 2, regardless of whether each one was amplified into 100 or 200 reads. Counting distinct barcodes rather than reads is what corrects for PCR artefacts: uneven amplification changes the number of reads per barcode, but not the number of distinct barcodes.
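As a minimal sketch of this collapsing step, the Python below groups reads by mapping site and counts distinct random barcodes at each site. The input format (one `(chrom, strand, position, barcode)` tuple per read), the function name `count_cdnas`, and the barcode strings are all hypothetical, chosen only for illustration; real pipelines read mapped reads from alignment files and may also handle sequencing errors within barcodes, which this sketch ignores.

```python
from collections import defaultdict


def count_cdnas(reads):
    """Count distinct cDNA molecules per genomic site (illustrative sketch).

    `reads` is an iterable of (chrom, strand, position, barcode) tuples,
    one per sequenced read; this input format is a hypothetical assumption.
    Reads that share both the mapping site and the random barcode are
    treated as PCR copies of one cDNA, so they are counted only once.
    """
    barcodes_at = defaultdict(set)
    for chrom, strand, pos, barcode in reads:
        # A set keeps each barcode once per site, however many reads carry it.
        barcodes_at[(chrom, strand, pos)].add(barcode)
    # The cDNA count at a site is the number of distinct barcodes seen there.
    return {site: len(bcs) for site, bcs in barcodes_at.items()}


# The question's example: two barcodes at one position, with 100 and 200 reads.
# The barcode sequences and coordinates are made up for illustration.
reads = [("chr1", "+", 1000, "GCAT")] * 100 + [("chr1", "+", 1000, "TTAG")] * 200
print(len(reads))          # 300
print(count_cdnas(reads))  # {('chr1', '+', 1000): 2}
```

The raw read count (300) reflects how much each molecule was amplified, whereas the collapsed count (2) reflects how many distinct cDNAs were captured, which is the quantity the barcodes are designed to recover.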
