Abstract

A key step in sequence similarity search is to identify shared seeds between a query and a reference sequence. A well-known tradeoff is that longer seeds make searches faster but reduce sensitivity in variable regions. We introduce multi-context seeds (MCS), which allow seeds of different lengths to be stored in the same index structure, thus retaining the advantages of both short and long seeds. We demonstrate the applicability of MCS by implementing them in strobealign. Strobealign with MCS is substantially more accurate than the previous version, with little cost in runtime and no memory overhead.
Tolstoganov et al. (Sat,) studied this question.
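
To make the idea of storing seeds of different lengths in one index concrete, the following is a minimal sketch and not strobealign's actual implementation. It assumes a seed built from two k-mers separated by a fixed gap, and a toy "hash" that is simply the 2-bit encoding of the bases, with the first k-mer placed in the high bits. Under these assumptions a sorted index can be queried for the full seed and, if that fails, for the shorter seed formed by the first k-mer alone. All names (`encode`, `seed_hash`, `build_index`, `lookup`) and the fixed gap are hypothetical choices made for illustration.

```python
import bisect

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer: str) -> int:
    """Toy 'hash': exact 2-bit encoding of a k-mer (an assumption of this sketch)."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def seed_hash(s1: str, s2: str) -> int:
    """Put the first k-mer in the high bits and the second in the low bits."""
    return (encode(s1) << (2 * len(s2))) | encode(s2)


def seeds(seq: str, k: int, gap: int):
    """Yield (position, first k-mer, second k-mer) with a fixed gap between them."""
    for i in range(len(seq) - 2 * k - gap + 1):
        yield i, seq[i:i + k], seq[i + k + gap:i + 2 * k + gap]


def build_index(ref: str, k: int, gap: int):
    """One sorted list of (hash, position) serves both seed lengths."""
    return sorted((seed_hash(s1, s2), pos) for pos, s1, s2 in seeds(ref, k, gap))


def lookup(index, s1: str, s2: str):
    """Try the long seed first, then fall back to the first k-mer only."""
    h = seed_hash(s1, s2)
    lo = bisect.bisect_left(index, (h, -1))
    hi = bisect.bisect_left(index, (h + 1, -1))
    if lo < hi:
        return "full", [pos for _, pos in index[lo:hi]]
    # All hashes that share the high bits form one contiguous range
    # in the sorted index, so the short seed needs no second index.
    suffix_bits = 2 * len(s2)
    prefix_lo = (h >> suffix_bits) << suffix_bits
    prefix_hi = prefix_lo + (1 << suffix_bits)
    lo = bisect.bisect_left(index, (prefix_lo, -1))
    hi = bisect.bisect_left(index, (prefix_hi, -1))
    if lo < hi:
        return "partial", [pos for _, pos in index[lo:hi]]
    return "none", []


index = build_index("ACGTTGCAAGGCTTAACCGGATCG", k=4, gap=2)
print(lookup(index, "ACGT", "CAAG"))  # long seed is present in the reference
print(lookup(index, "ACGT", "TTTT"))  # second k-mer differs, first still matches
```

Because the entries are sorted by a hash whose high bits depend only on the first k-mer, the fallback lookup is a range query on the same array, which is one way a design of this kind could avoid extra memory for the shorter seeds.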