March 3, 2026 | Open Access

Rethinking patent retrieval with language models: Toward scalable and efficient search

Key Points

Embedding-based semantic search improves patent retrieval effectiveness, with significant MAP gains.
The best configurations yield a 14.81% absolute improvement in MAP over state-of-the-art baselines.
Embedding quantization allows for up to 30× faster retrieval and 32× lower memory usage.
General-purpose language models outperform patent-specific embeddings by at least 28.95% in MAP.
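The key points above are stated in terms of mean average precision (MAP). The following is a minimal Python sketch of how MAP is commonly computed. It assumes binary relevance judgments and a full ranked list per query. This page does not state the exact evaluation protocol (for example, whether a rank cutoff is applied), so the function names and toy patent IDs are hypothetical illustrations, not the authors' evaluation code.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def average_precision(ranked: Sequence[str], relevant: set[str]) -> float:
    """AP for one query: mean of precision@k at every rank k holding a relevant document.

    Relevant documents that never appear in `ranked` contribute 0, so the sum
    is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision@k at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[tuple[Sequence[str], set[str]]]) -> float:
    """MAP: arithmetic mean of per-query AP over all (ranking, relevant set) pairs."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy run: two query patents with made-up candidate rankings.
runs = [
    (["P1", "P2", "P3", "P4"], {"P1", "P3"}),  # AP = (1/1 + 2/3) / 2
    (["P5", "P6", "P7"], {"P7", "P9"}),        # AP = (1/3) / 2; P9 never retrieved
]
print(f"MAP = {mean_average_precision(runs):.4f}")  # MAP = 0.5000
```

Under the usual reading, a "14.81% absolute improvement" means the MAP value itself rises by 0.1481 (14.81 percentage points). It does not mean a rise of 14.81% relative to the baseline score.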

Abstract

Semantic search with embedding models offers an alternative to traditional keyword-based patent retrieval, but it often struggles with computational cost and efficiency in real-time scenarios compared with methods such as BM25. Meanwhile, the rapid advancement of language models raises the question of whether domain-specific models are still necessary or whether general-purpose ones are viable. This work presents a comprehensive evaluation of embedding-based patent search on the CLEF-IP 2011 dataset. We assess 10 configurations that employ language models as retrievers, re-rankers, or hybrids, across 9 models, both patent-specific and general-purpose, tested in 105 experimental setups. Our best configurations deliver a 14.81% absolute MAP improvement over state-of-the-art baselines and outperform patent-specific embeddings by at least 28.95% in MAP. We further show that embedding quantization enables large-scale patent search with up to 30× faster retrieval and 32× lower memory usage. These results provide practical guidance for integrating embedding models into patent prior art search while addressing performance and scalability constraints.

• Analyzed 10 configurations across 105 setups using 9 diverse language models.
• Achieved a 14.81% improvement in MAP over state-of-the-art baselines in patent semantic search.
• General-purpose models can outperform patent-specific embeddings by at least 28.95% in MAP.
• Embedding quantization reduces search time by 30× and memory usage by 32×.
• Abstracts are preferable to claims for natural-text-based patent search.
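The abstract attributes the 32× memory reduction to embedding quantization but does not name the scheme on this page. One common scheme consistent with that figure is sign-based binary quantization. It stores each 32-bit float dimension as a single bit and compares documents by Hamming distance. The sketch below illustrates that idea under this assumption. All function names, the embedding dimension, and the random corpus are hypothetical; it is not the authors' implementation.

```python
from __future__ import annotations

import math
import random


def binary_quantize(vec: list[float]) -> bytes:
    """Sign-based binary quantization: 1 bit per dimension (1 if value > 0), packed into bytes."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits.to_bytes(math.ceil(len(vec) / 8), "little")


def hamming(a: bytes, b: bytes) -> int:
    """Count of differing bits between two packed codes (XOR, then popcount)."""
    return bin(int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).count("1")


def hamming_search(query_code: bytes, codes: list[bytes], top_k: int) -> list[int]:
    """Brute-force nearest neighbours by Hamming distance; returns corpus indices."""
    return sorted(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))[:top_k]


def float32_bytes(dim: int) -> int:
    return 4 * dim  # 32 bits per dimension


def binary_bytes(dim: int) -> int:
    return math.ceil(dim / 8)  # 1 bit per dimension, rounded up to whole bytes


if __name__ == "__main__":
    random.seed(0)
    dim, n_docs = 1024, 200  # hypothetical sizes, not taken from the paper
    corpus = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_docs)]
    codes = [binary_quantize(v) for v in corpus]
    # Query: a slightly perturbed copy of document 42.
    query = [x + random.gauss(0.0, 0.1) for x in corpus[42]]
    print(hamming_search(binary_quantize(query), codes, top_k=3))
    print(float32_bytes(dim) / binary_bytes(dim))  # 32.0
```

The 32× ratio follows directly from 32 bits versus 1 bit per dimension. The reported 30× speedup is the authors' measurement, and this pure-Python loop is not meant to reproduce it. Binary codes are often used as a fast first-pass filter whose shortlist is then rescored at higher precision. Whether the paper does this is not stated here.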

Cite This Study

Chikkamath et al. (2026). Rethinking patent retrieval with language models: Toward scalable and efficient search.

synapsesocial.com/papers/69a75c0ac6e9836116a2469e https://doi.org/10.1016/j.wpi.2026.102433