SNPer

SNPer is a Nextflow-based pipeline for calling within-host variants from sequencing read data.

Why did we develop this?

Calling within-host (intra-host) variants that occur below the level of consensus calls is important for (1) understanding how viruses diversify over the course of acute and chronic infections and (2) resolving transmission patterns, an application that shows particular promise when linkages are so close that the consensus genome sequences of different cases are identical.

While calling sub-consensus variants is useful, it can be challenging. Many real intra-host variants occur at frequencies low enough that they are hard to distinguish from artifactual variants arising from PCR errors, sequencing errors, and/or bioinformatic alignment errors.

SNPer is our attempt to create an easy-to-use pipeline for calling within-host variants that recapitulates the performance of state-of-the-art, project-specific variant calling processes.

SNPer benchmarking

Variants called by SNPer under different stringency parameters were compared to two "ground truth" datasets: a synthetic dataset created by mixing synthetic RNA controls at varied viral loads (described in Valesano et al.) and a real dataset of SARS-CoV-2 infections from a household cohort study in which samples were sequenced in duplicate (described in Bendall et al.). We use the comparison of SNPer intra-host variant calls to "ground truth" variant calls to identify the variant frequency thresholds and conditions under which biologically real variants can be distinguished from technical artifacts.
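The kind of threshold comparison described above can be illustrated with a small sketch. This is not SNPer's benchmarking code: the `Variant` class, the `sweep_thresholds` function, the toy data, and the choice of precision and recall as metrics are all hypothetical and used only for illustration. It filters calls at several minimum allele frequencies and scores each filtered set against a ground-truth set of variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical intra-host variant call: position, alleles, and allele frequency."""
    pos: int
    ref: str
    alt: str
    freq: float


def sweep_thresholds(calls, truth, thresholds):
    """Score calls against ground truth at each minimum-frequency threshold.

    calls: list of Variant objects (illustrative structure, not a SNPer format)
    truth: set of (pos, ref, alt) tuples treated as biologically real variants
    Returns a list of (threshold, precision, recall) tuples.
    """
    results = []
    for t in thresholds:
        # Keep only calls at or above the frequency threshold.
        kept = {(v.pos, v.ref, v.alt) for v in calls if v.freq >= t}
        tp = len(kept & truth)   # real variants that were called
        fp = len(kept - truth)   # calls absent from ground truth (likely artifacts)
        fn = len(truth - kept)   # real variants that were missed or filtered out
        # Precision/recall are undefined (NaN) when their denominators are zero.
        precision = tp / (tp + fp) if kept else float("nan")
        recall = tp / (tp + fn) if truth else float("nan")
        results.append((t, precision, recall))
    return results


# Toy example data (made up for illustration).
calls = [
    Variant(100, "A", "G", 0.20),
    Variant(250, "C", "T", 0.03),
    Variant(400, "G", "A", 0.01),  # not in ground truth: a likely artifact
]
truth = {(100, "A", "G"), (250, "C", "T")}

for t, p, r in sweep_thresholds(calls, truth, [0.005, 0.02, 0.05]):
    print(f"min freq {t:.3f}: precision {p:.2f}, recall {r:.2f}")
```

In this toy example, raising the threshold from 0.005 to 0.02 removes the artifactual call without losing any real variant, while 0.05 starts discarding a real low-frequency variant. A threshold sweep is meant to expose exactly this kind of trade-off.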

SNPer performance is described in detail in the linked PDF document.