Historical documents hold immense cultural value, yet working with them poses numerous challenges for information access and analysis. This thesis addresses two such tasks in the context of historical document images: sub-image retrieval and pattern spotting. While sub-image retrieval involves retrieving images that contain a given query image, pattern spotting extends this further by localizing occurrences of that query image within the retrieved images. These tasks present two major challenges: i) search queries are arbitrary and not limited to a predefined set of patterns, requiring the proposed approach to handle previously unseen queries; ii) most modern deep learning methods rely on labeled training data, which is scarce or nonexistent in the domain of historical documents.

Due to these constraints, prior work on these tasks has been limited to learning-free approaches, relying exclusively on off-the-shelf pre-trained networks. In this thesis, we propose the first learning-based approach to address these tasks. This involves the challenge of developing a learning-based solution in a setting with no available training data and no fixed set of patterns to detect or retrieve. Our aim is to open a new direction for tackling these problems, one that we believe is more scalable and future-proof, because learning task-specific and domain-specific representations should enable more flexible and adaptable solutions.

To this end, we develop a novel model for pattern spotting, dubbed OS-DETR. This model adapts the competitive transformer-based DETR architecture, originally designed for object detection, to address the tasks of sub-image retrieval and pattern spotting.

To overcome the scarcity of labeled data, we propose a simple technique for generating annotated synthetic data tailored to these tasks. This synthetic data is then used to train our OS-DETR model, and we investigate various design choices and their impact.

We then introduce a set of generalization techniques that aim to improve the performance of the model beyond the source domain. These techniques span multiple aspects of the pipeline, from adjustments to the model architecture and training schedule to improvements in synthetic data generation and post-processing strategies.

We show the impact of these techniques and the validity of our approach via numerous experiments, both on a synthetically generated test set and on a publicly available benchmark dataset for historical document images.

Finally, we present preliminary experiments exploring an alternative approach to training data generation, which opens promising avenues for future research.
Joseph Assaker
Joseph Assaker (Fri,) studied this question.