• PIBAdb, one of the largest public datasets of colorectal polyp images and videos, is presented.
• PIBAdb comprises over 3,000 annotated video segments totalling more than 11 hours.
• PIBAdb is the only dataset featuring a relational database with all the relevant meta-information.
• PIBAdb has been used for the development of the PolyDeep CAD system, evaluated in clinical trials.
• PIBAdb provides comprehensive per-polyp clinical metadata and diverse imaging modalities, including NBI, WL, and non-polyp samples with defined levels of cleanness.

Colorectal cancer is the third most common cancer worldwide and has a high mortality rate. Colonoscopy is the gold standard for screening, as it can reduce both the incidence and the mortality of the disease. Deep Learning techniques have become the state of the art in lesion detection and classification, and several Deep-Learning-based Computer-Aided Diagnosis systems are already undergoing clinical evaluation or commercialization. However, the development of reliable models requires large, high-quality datasets, which are costly and time-consuming to create. Thus, the availability of public datasets is critical for the scientific community to develop artificial intelligence models. This work aims to contribute to the available resources by presenting PIBAdb, a new multimodal public cohort of colorectal videos and images.

The PIBAdb cohort contains polyp data derived from routine colonoscopies conducted between January 2018 and May 2021 at Hospital Universitario de Ourense, under the PolyDeep project. Each polyp was resected, histologically analysed, morphologically classified, and annotated by expert clinicians with bounding boxes in images and temporal segments in videos. The main characteristics of PIBAdb were compared with those of 25 other public datasets. The utility of PIBAdb was evaluated in polyp detection and classification scenarios using Deep Learning models.
PIBAdb includes detailed clinical and histological metadata from 1,176 polyps, 31,946 manually annotated polyp images, 14,124 non-polyp images, nearly 7 hours of annotated video segments showing polyps, and over 4 hours of annotated video segments without polyps. It comprises both the raw database and several curated image datasets, each accompanied by metadata and documentation. PIBAdb is publicly available upon request for non-profit purposes.

PIBAdb is one of the largest and most complete multimodal public datasets for colorectal polyp research. It is characterized by its rich per-polyp metadata (histology and PARIS/NICE classifications), inclusion of NBI and WL images, and non-polyp images at multiple levels of cleanness. While the image datasets included are practical for developing classification or detection models, the full database enables more complex video-based research and custom dataset creation using the PIBA management tool, supported by a queryable relational database. Its availability is expected to support the development of Deep Learning models and to foster future contributions from the research community.
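To illustrate how a queryable relational database can support custom dataset creation, the following is a minimal sketch in Python using the standard `sqlite3` module. The schema is purely hypothetical: the table names (`polyp`, `image`), the column names, and the sample rows are assumptions made for illustration and do not reflect the actual PIBAdb schema or the PIBA management tool. The sketch selects the bounding-box annotations of images acquired in a given modality for polyps with a given histology.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# NOT the actual PIBAdb schema.
SCHEMA = """
CREATE TABLE polyp (
    id INTEGER PRIMARY KEY,
    histology TEXT,
    paris TEXT,
    nice TEXT
);
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    polyp_id INTEGER REFERENCES polyp(id),
    modality TEXT CHECK (modality IN ('WL', 'NBI')),
    x INTEGER, y INTEGER, width INTEGER, height INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database filled with synthetic example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Synthetic polyps (id, histology, Paris class, NICE class).
    conn.executemany(
        "INSERT INTO polyp VALUES (?, ?, ?, ?)",
        [
            (1, "adenoma", "0-Is", "2"),
            (2, "hyperplastic", "0-IIa", "1"),
        ],
    )
    # Synthetic images (id, polyp id, modality, bounding box x/y/width/height).
    conn.executemany(
        "INSERT INTO image VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (10, 1, "NBI", 40, 52, 120, 96),
            (11, 1, "WL", 35, 60, 110, 90),
            (12, 2, "NBI", 200, 180, 64, 70),
        ],
    )
    return conn


def select_images(conn, modality, histology):
    """Return (image id, x, y, width, height) rows matching both filters."""
    query = """
        SELECT image.id, image.x, image.y, image.width, image.height
        FROM image
        JOIN polyp ON polyp.id = image.polyp_id
        WHERE image.modality = ? AND polyp.histology = ?
        ORDER BY image.id
    """
    return conn.execute(query, (modality, histology)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in select_images(conn, "NBI", "adenoma"):
        print(row)
```

Joining per-image annotations to per-polyp metadata in this way is what lets a single query define a task-specific subset, for example NBI-only images for a classification experiment, without copying or re-annotating files.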
Nogueira-Rodríguez et al. (Sun,) studied this question.