This repository contains the raw data and imputation panel files needed to replicate the results of the paper "Imputation and Maximum Likelihood Haplotype Refinement of Simulated Ancient Mitochondrial Genomes".

GenBank Accession IDs

Whole mtDNA genomes were retrieved from NCBI's GenBank nucleotide database with the Entrez Direct command-line utilities, using the following command:

    esearch -db nucleotide -query "(016500[SLEN]:016600[SLEN]) AND Homo[Organism] AND mitochondrion[FILT] AND complete genome AND ancient NOT (Homo sp. Altai OR Denisova hominin OR neanderthalensis OR heidelbergensis OR shotgun)" | efetch -format fasta > final.fasta

The two text files (46K_accessions.txt and 100_sampled_accessions.txt) contain the GenBank accession IDs for the 46K imputation panel and the control/simulated mtDNAs, respectively.

Imputation Panel Files

We uploaded the processed multiple-sequence alignment (MSA) file (mtDNA.panel.postfiltered.realigned.rmdup.FINAL.fasta.gz), which contains the 46,791 mtDNA genomes used to create the Minimac4 panel. Intermediary files are also included: the split VCF (Variant Call Format) file (46K.panel.split.rcrs.vcf.gz) and its supporting tabix index (46K.panel.split.rcrs.vcf.gz.tbi). The final Minimac4 panel in compressed MSAV format (46K.mtDNA.reference.panel.msav) can be used in the MAVEN pipeline for mtDNA imputation.

Data Files

Simulated FASTQ files, with ancient DNA patterns such as terminal deamination and fragment lengths generated using gargammel, are provided in two batches:

1) incremental depths of coverage from 1x to 15x (1_15fastq-20260221T200810Z-1-001.zip), and
2) 0.25x to 5x (ultra_low_0.25_5fastq-20260221T200727Z-1-001.zip).

Each coverage depth contains n=100 mtDNA genomes sub-sampled from the larger 46K panel (MSA in FASTA format).
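As a quick sanity check after downloading, the number of sequences in the gzipped MSA and the number of IDs in an accession list can be compared. The following is a minimal sketch, not part of the original workflow: the function names `count_fasta_records` and `count_accessions` are hypothetical helpers, and it assumes each accession list holds one ID per line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking downloaded files; not part of the repository.

# Count FASTA records (header lines starting with '>') in a gzip-compressed file.
count_fasta_records() {
    gzip -dc "$1" | grep -c '^>'
}

# Count non-blank lines in a plain-text accession list (assumes one ID per line).
count_accessions() {
    grep -c '[^[:space:]]' "$1"
}

# Example usage (file names as given in this README):
#   count_fasta_records mtDNA.panel.postfiltered.realigned.rmdup.FINAL.fasta.gz
#   count_accessions 46K_accessions.txt
```

If the two counts for the panel files differ, or differ from the 46,791 genomes stated above, the download may be incomplete or the list may use a different layout than assumed here.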
Plummer et al. (Thu,) studied this question.