What question did this study set out to answer?

The aim is to identify and categorize data biases in genomic datasets and their effects on machine learning.

April 10, 2026 · Open Access

Data biases in genomics

Key Points

Reviewed different categories of data biases in genomics
Examined the impact of biases on model performance
Provided examples from databases like NCBI ClinVar and gnomAD
Identified several types of data biases present in genomic databases
Showed that flawed data can lead to decreased model accuracy
Demonstrated that biases can distort representation in genomic studies

Abstract

Machine learning (ML) is developing into an inherent part of genomic research due to the ever-increasing amounts of genomic data. However, data-driven algorithms are strongly dependent on good-quality and representative data, which can be problematic in genomics for various reasons. One of these reasons is data biases: flawed or incomplete data, often containing systematic errors, that compromise its representativeness. In this review, we examine different categories of data biases in genomics and translate them into the framework of general ML. We give examples of different types of biases present in widely used databases such as NCBI ClinVar and gnomAD and illustrate how data biases can influence model performance in assorted studies.

Authors

Lusiné Nazaretyan

Martin Kircher

Journals

Trends in Genetics

Institutions

Charité - Universitätsmedizin Berlin

University of Lübeck

Berlin Institute of Health at Charité - Universitätsmedizin Berlin
