Abstract
We present MARVEL (https://ligogpt.mit.edu/marvel), a locally deployable, open-source framework for domain-aware question answering and assisted scientific research. The system is designed to work with technical data from scientific groups and to provide citation-backed responses while operating in authenticated computing environments. For complex queries, MARVEL uses DeepSearch, a mode that integrates retrieval-augmented generation with Monte Carlo Tree Search. DeepSearch breaks a query into related sub-questions and spends more compute on branches that appear useful, while tracking sources through a global evidence ledger during drafting. We apply the framework to gravitational-wave (GW) research, using material related to the Laser Interferometer Gravitational-Wave Observatory (LIGO). Answers are grounded in a curated semantic index of research literature, doctoral theses, LIGO documents, and long-running detector electronic logbooks, with targeted web searches when appropriate. Because commercial large language models cannot be directly benchmarked on private data, we evaluate MARVEL on two publicly available datasets chosen to resemble the semantic and technical characteristics of our target domain. On these datasets, results for literature-style queries are comparable between MARVEL and a GPT-4o mini baseline. We see improved results for queries related to detector operations, where the effects of domain-specific retrieval and multi-step reasoning are more apparent. The code and evaluation datasets are released with this work.
Mukund et al. studied this question.
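The abstract describes DeepSearch only at a high level, so the following is a minimal sketch of the general idea rather than MARVEL's implementation. It assumes a flat UCB1 bandit over sub-questions (UCB1 is a selection rule commonly used inside Monte Carlo Tree Search; a full tree search would also expand sub-questions recursively). The names `Branch`, `EvidenceLedger`, `deep_search`, and `toy_retrieve`, the reward values, and the source identifiers are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Branch:
    """One sub-question explored by the search (hypothetical structure)."""
    question: str
    visits: int = 0
    total_reward: float = 0.0

    def ucb1(self, step: int, c: float = 1.4) -> float:
        # Unvisited branches are always tried first.
        if self.visits == 0:
            return math.inf
        mean = self.total_reward / self.visits
        # Exploration bonus shrinks as this branch accumulates visits.
        return mean + c * math.sqrt(math.log(step) / self.visits)


@dataclass
class EvidenceLedger:
    """Global record of which sources supported which sub-questions."""
    entries: dict = field(default_factory=dict)

    def record(self, source_id: str, question: str) -> None:
        self.entries.setdefault(source_id, set()).add(question)


def deep_search(branches, retrieve, budget):
    """Spend `budget` retrieval calls, favouring branches with high UCB1 scores."""
    ledger = EvidenceLedger()
    for step in range(1, budget + 1):
        branch = max(branches, key=lambda b: b.ucb1(step))
        reward, sources = retrieve(branch.question)
        branch.visits += 1
        branch.total_reward += reward
        for source_id in sources:
            ledger.record(source_id, branch.question)
    return ledger


def toy_retrieve(question):
    # Assumption: a stand-in retriever returning a fixed usefulness score
    # and a list of source ids; a real system would query an index here.
    table = {
        "What is the detector state?": (0.9, ["logbook-entry-1"]),
        "What does the literature say?": (0.2, ["paper-1"]),
    }
    return table[question]


branches = [
    Branch("What is the detector state?"),
    Branch("What does the literature say?"),
]
ledger = deep_search(branches, toy_retrieve, budget=20)
for b in branches:
    print(b.question, b.visits)
print(ledger.entries)
```

With these toy rewards, the higher-scoring sub-question receives most of the retrieval budget, while the ledger still records every source that was consulted, so a drafted answer can cite them.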