Abstract

Accurate identification of somatic mutations is crucial to the early diagnosis of cancers and to optimal personalized treatment of cancer patients. However, current somatic mutation detection mostly relies on inference from read alignments to a standard reference genome (such as GRCh38), which leads to reference-related false positives and false negatives. The Genome-in-a-Bottle (GIAB) consortium has publicly released a wide variety of sequencing data, including multiple short- and long-read technologies, Hi-C, and optical mapping, for a new broadly consented tumor-normal sample pair derived from the same individual (HG008), with the aim of developing high-quality somatic mutation benchmarks that advance sequencing technologies and analytical methods. Leveraging the latest advances in long-read technologies and assembly algorithms, we used these data to generate and then curate near-complete, chromosome-scale, haplotype-resolved assemblies for both normal tissue and tumor cell lines, enabling direct comparison of each tumor haplotype genome to its corresponding normal haplotype genome for accurate somatic mutation detection. Here we report an integrated analysis workflow for accurate detection of haplotype-specific somatic structural variations (SVs) in paired tumor-normal samples, which combines genetic marker identification, haplotype assembly-to-haplotype assembly mapping (SyRI, svim-asm, PAV), and read-alignment approaches (Severus, savana, Sniffles2, etc.). We first identified high-quality haplotype-specific genetic markers so that each chromosome from the two sets of haplotype assemblies in the tumor sample could be matched with its corresponding haplotype chromosome in the normal sample.
These genetic markers also allowed us to explore and identify genome-wide recombination events in this cancer sample, such as a complex series of inverted duplications on chr19 that is attached to chr22 at one end and to the opposite haplotype of chr19 at the other. We then generated two comprehensive somatic SV sets by incorporating multiple lines of evidence from matched tumor-normal haplotypes and long- and short-read mappings (including PacBio HiFi, ONT, and Illumina), with the two haplotype-resolved assemblies from the normal sample used as references. We curated each of the selected SVs in IGV and compared them with GIAB's curated draft somatic SV benchmark (v0.4, GRCh38-based). While most somatic SVs were identical between the two callsets, our assembly-based approach complemented the mapping-based benchmark in certain repetitive and complex regions. Finally, we assigned each somatic SV to its correct haplotype and generated a final haplotype-specific somatic SV callset. At present, we have confirmed 32 insertions and 62 deletions (larger than 50 bp), and 10 translocation and inversion breakpoints that do not have clear GRCh38 coordinates, mostly in centromeric satellite regions. While we continue to improve the haplotype-based somatic SVs, these analyses are already informing the ongoing refinement of high-quality somatic SV benchmarks for this tumor-normal pair.

Citation Format: Chunlin Xiao, Justin Wagner, Jennifer McDaniel, Francoise Thibaud-Nissen, Justin Zook. Marker-based tumor-normal haplotypes matching reveals genome-wide recombination and haplotype specific somatic structural variation discovery using GIAB de novo pancreatic tumor-normal assemblies [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86 (8Suppl): Abstract nr LB104.
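The marker-based matching step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that markers are opaque identifiers (for example, haplotype-specific k-mers or heterozygous variants), that each normal haplotype chromosome is given as a set of markers, and that each tumor haplotype chromosome is given as a list of markers in assembly order. The function names `build_marker_owner`, `match_haplotypes`, and `switch_points` are hypothetical.

```python
from collections import Counter


def build_marker_owner(normal_markers):
    """Map each marker to the single normal haplotype that carries it.

    Markers found on more than one normal haplotype are not
    haplotype-specific, so they are dropped.
    """
    owner = {}
    shared = set()
    for hap, markers in normal_markers.items():
        for m in markers:
            if m in owner and owner[m] != hap:
                shared.add(m)
            owner[m] = hap
    for m in shared:
        del owner[m]
    return owner


def match_haplotypes(tumor_markers, owner):
    """Assign each tumor chromosome to the normal haplotype with most shared markers.

    Returns {tumor_name: (best_normal_haplotype, fraction_of_informative_markers)}.
    """
    assignments = {}
    for name, markers in tumor_markers.items():
        votes = Counter(owner[m] for m in markers if m in owner)
        if not votes:
            assignments[name] = (None, 0.0)
            continue
        best, count = votes.most_common(1)[0]
        assignments[name] = (best, count / sum(votes.values()))
    return assignments


def switch_points(ordered_markers, owner):
    """List positions where consecutive informative markers change haplotype.

    A switch is only a candidate recombination or rearrangement site
    (assumption: it would still need curation, e.g. in IGV).
    """
    switches = []
    prev = None
    for idx, m in enumerate(ordered_markers):
        hap = owner.get(m)
        if hap is None:
            continue  # marker is uninformative or absent from the normal sample
        if prev is not None and hap != prev:
            switches.append((idx, prev, hap))
        prev = hap
    return switches


# Toy data (hypothetical marker names and contig labels)
normal = {"chr19_hapA": {"m1", "m2", "m3"}, "chr19_hapB": {"m4", "m5", "m6"}}
tumor = {"tumor_chr19_h1": ["m1", "m2", "m3"], "tumor_chr19_h2": ["m4", "m5", "x", "m1"]}

owner = build_marker_owner(normal)
print(match_haplotypes(tumor, owner))
print(switch_points(tumor["tumor_chr19_h2"], owner))
```

Majority voting keeps the chromosome-level assignment robust to a few discordant markers, while the ordered scan exposes where along a tumor chromosome the marker origin changes, which is the kind of signal that would point to events such as the chr19 rearrangement mentioned above.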
Chunlin Xiao
Justin Wagner
Jennifer McDaniel
Cancer Research
National Institutes of Health
Material Measurement Laboratory
www.synapsesocial.com/papers/69e471ef010ef96374d8e33f — DOI: https://doi.org/10.1158/1538-7445.am2026-lb104