
SNV Benchmarking Database - Comprehensive Variant Calling Evaluation

Overview: This dashboard provides systematic evaluation and comparison of variant calling performance across multiple sequencing technologies and computational pipelines. Each experiment represents a controlled benchmarking analysis using hap.py (a standard benchmarking tool) to compare variant calls against established genomic reference standards, including the GIAB (Genome in a Bottle), CMRG, and T2T truth sets.

Technologies Covered: Performance can be compared across the major sequencing platforms, both short-read (Illumina, MGI) and long-read (PacBio, Oxford Nanopore (ONT)), and across the variant callers DeepVariant (ML-based), GATK (traditional), DRAGEN (hardware-accelerated), and Clair3 (long-read optimized).

How to Navigate:
• Use sidebar filters to narrow by technology or caller
• Advanced Comparison: filter by technology/caller combinations
• Manual Selection: click rows to pick specific experiments
• Expand ▶ arrows for detailed experiment metadata

Variant Calling Performance Metrics

Performance Overview: Quantitative hap.py benchmarking results showing precision, recall, and F1-score for each technology-caller combination against validated truth sets.

Key Metrics: Precision is the fraction of called variants that are true positives, TP / (TP + FP). Recall is the fraction of true variants that are detected, TP / (TP + FN). F1-score is the harmonic mean of precision and recall and gives a single balanced measure. All three are reported as percentages, and higher values indicate better performance.
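
As a minimal sketch of how these three metrics relate, the function below derives them from raw true-positive, false-positive, and false-negative counts. The function name and the example counts are illustrative assumptions, not values or code taken from the dashboard or from hap.py.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) as fractions in [0, 1].

    tp, fp, fn are hypothetical counts of true positives, false
    positives, and false negatives for one experiment and variant type.
    """
    # Precision: share of called variants that match the truth set.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truth-set variants that were called.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up counts, for illustration only.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2%} recall={r:.2%} f1={f:.2%}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs, so a caller cannot reach a high F1 by excelling at only one of precision or recall.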

Exploring Results:
• Results reflect experiments selected in Tab 1 (filters or comparison modes)
• Use truth set filter above to focus on specific reference standards
• Click column headers to sort by any metric
• Each experiment shows two rows: SNP and INDEL variants

Performance Characterization Plots

Performance Plots: Precision vs. recall scatter plots with F1-score contour lines for visual comparison of variant calling performance. Curved lines represent constant F1-scores, helping identify the optimal precision-recall balance. Each point represents one experiment, colored by sequencing technology (Illumina, PacBio, ONT, MGI) and shaped by variant caller (DeepVariant, GATK, Clair3, DRAGEN).
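
The constant-F1 contour lines follow directly from the F1 definition: solving F1 = 2PR / (P + R) for precision gives P = F1 * R / (2R - F1), which is defined only where R > F1 / 2. The sketch below computes points on one such curve; the function name is hypothetical, and this is not necessarily how the dashboard draws its contours.

```python
from typing import Optional


def iso_f1_precision(f1: float, recall: float) -> Optional[float]:
    """Precision that yields the given F1 at the given recall.

    Returns None where no valid precision exists, i.e. where the
    curve would leave the [0, 1] plotting range.
    """
    denom = 2 * recall - f1
    if denom <= 0:
        # Recall is too low for this F1 to be reachable at all.
        return None
    precision = f1 * recall / denom
    # Precision above 1 is not achievable, so the point is off the plot.
    return precision if precision <= 1.0 else None


# Sample the F1 = 0.90 contour at a few recall values.
for recall in (0.85, 0.90, 0.95, 1.00):
    print(recall, iso_f1_precision(0.90, recall))
```

Points that lie on the same curve trade precision against recall while keeping F1 fixed, which is why experiments nearer the top-right corner cross higher-valued contours.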

How to Interact:
• Click and drag to zoom in
• Double-click to reset view
• Hover for quick performance metrics
• Click points and scroll down to view detailed experiment metadata

SNP Performance

INDEL Performance

Stratified Performance Analysis

Regional Breakdown: Performance metrics displayed across genomic regions with different sequence characteristics. Available stratifications include complexity-based regions (easy/difficult), GC content ranges, functional annotations (coding/non-coding), repetitive sequences (homopolymers, tandem repeats, segmental duplications, satellites), and specialized regions (MHC, low mappability areas). These stratifications follow GIAB genome stratification standards.
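
To illustrate what a stratified breakdown involves, the sketch below pools per-region counts and computes one F1-score per region using F1 = 2TP / (2TP + FP + FN). The record layout, region labels, and counts are hypothetical placeholders, and pooling counts before computing F1 is an assumption here, not a description of how the dashboard or hap.py aggregates results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (region label, TP, FP, FN).
Record = Tuple[str, int, int, int]


def f1_by_region(records: Iterable[Record]) -> Dict[str, float]:
    """Pool counts per region, then compute one F1 per region."""
    totals = defaultdict(lambda: [0, 0, 0])
    for region, tp, fp, fn in records:
        counts = totals[region]
        counts[0] += tp
        counts[1] += fp
        counts[2] += fn

    result = {}
    for region, (tp, fp, fn) in totals.items():
        denom = 2 * tp + fp + fn
        # Algebraically equal to the harmonic mean of precision and recall.
        result[region] = 2 * tp / denom if denom else 0.0
    return result


# Made-up counts, for illustration only.
example = [
    ("homopolymer", 80, 10, 10),
    ("homopolymer", 20, 0, 0),
    ("high_gc", 50, 25, 25),
]
for region, f1 in f1_by_region(example).items():
    print(f"{region}: F1={f1:.2%}")
```

Comparing per-region scores this way shows where a technology-caller combination loses accuracy, which a single genome-wide number would hide.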

How to Use:
• Results reflect experiments selected in Tab 1
• Choose genomic regions below (expand ▶ sections for more options)
• Click Update Analysis to generate stratified results

Select Regions to Analyze
Primary Stratifications
▶ Functional Regions
▶ Repetitive DNA Regions
Simple Repeats:
Tandem Repeats:
Non-Repetitive:
▶ Structural Complexity
▶ GC Composition
Low GC:
Normal GC:
High GC:
Extreme GC:
F1 Score
Precision
Recall

SNP Performance by Region


INDEL Performance by Region



No Data to Display

Please select some regions and experiments from previous tabs, then click 'Update Analysis'.