April 10, 2026

An Evaluation of the Identifying Reliability of Nitrogen-Containing Compounds That Are Absent from Mass Spectral Databases Using a Combination of Gas Chromatography–Mass Spectrometry and Deep Learning

Key Points

The study assesses how reliably nitrogen-containing compounds absent from mass spectral databases can be identified when several state-of-the-art deep learning models are used together.
Identification combines GC–MS with deep learning.
The models used are AIRI for retention index prediction, neims-pytorch for mass spectrum prediction, and EI2FP for predicting molecular fingerprints from the mass spectrum.
Isomers matching each molecular formula were extracted from the PubChem database; those unlikely to be present in typical samples, or whose predicted retention index differed significantly from the observed one, were excluded.
The remaining isomers were sorted by the similarity of the observed and predicted mass spectra and by the similarity of the fingerprint predicted from the spectrum to the one calculated for the candidate structure.
Using both ranking approaches simultaneously gave the correct compound structure in 8 out of 12 cases.
In the remaining cases, the correct structure was among the top 5 candidates.
Either approach alone gave lower accuracy, and satisfactory identification was not possible without the retention index.

Abstract

In recent years, numerous approaches have emerged for identifying compounds that are absent from databases using a combination of GC–MS and deep learning methods. Despite significant progress in this area, studies that assess the reliability of GC–MS identification using several state-of-the-art models simultaneously are virtually nonexistent. Such an assessment requires reference mass spectra and retention indices for compounds not represented in the NIST database used to train the models. The assessment is valid only if none of the models used has "seen" the test molecules during training. In this work, such an assessment was performed for 12 nitrogen-containing compounds that are absent from the NIST 23 mass spectral database: 2-methyl-1-pyrroline, N',N'-dimethylformohydrazide, 1-ethylpyrazole, 3,4-dimethyl-1,2-oxazole, 1,4-dimethyl-1,2,3-triazole, 1-ethyl-1,2,4-triazole, 2-amino-5-methylpyrazine, and others. The following models were used: the AIRI model for predicting retention indices, the neims-pytorch model for predicting mass spectra, and the EI2FP model for predicting molecular fingerprints (the presence or absence of certain substructures) from the mass spectrum. For each molecule, the isomer structures corresponding to the molecular formula were extracted from the PubChem database. Isomers with a low probability of being present in typical samples, and isomers whose predicted retention index differed significantly from the observed one, were excluded. The remaining isomers were sorted according to two criteria: the similarity of the observed and predicted mass spectra, and the similarity of the molecular fingerprint predicted from the mass spectrum to the fingerprint calculated for the candidate structure. Using both approaches simultaneously gives the correct compound structure in 8 out of 12 cases; in the remaining cases, the correct structure is among the top 5 candidates, which is a fairly high result for non-targeted screening. Using either approach alone yields lower accuracy, and satisfactory identification is not possible without the retention index. Outdated models for predicting mass spectra and retention indices (CFM, SVEKLA) also fail to achieve such results.
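
The filtering and ranking steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not state which similarity measures, retention index tolerance, or score combination were used, so the cosine similarity for spectra, the Tanimoto similarity for fingerprints, the equal-weight average, and the 50-unit tolerance are all assumptions. The Candidate class and every function name are hypothetical, and the predictions from AIRI, neims-pytorch, and EI2FP are taken as precomputed inputs rather than produced by calling those models.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    """Hypothetical container for one PubChem isomer and its predictions."""
    name: str
    predicted_ri: float        # retention index from an RI model (assumed precomputed)
    predicted_spectrum: dict   # m/z -> intensity from a spectrum model (assumed precomputed)
    fingerprint: set           # indices of substructure bits calculated for the structure


def cosine_similarity(a, b):
    """Cosine similarity between two sparse spectra (m/z -> intensity)."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(candidates, observed_ri, observed_spectrum,
                    fingerprint_from_spectrum, ri_tolerance=50.0):
    """Drop candidates with an implausible retention index, then sort the rest.

    The score is an assumed equal-weight average of the spectral similarity
    and the fingerprint similarity; higher scores rank first.
    """
    kept = [c for c in candidates
            if abs(c.predicted_ri - observed_ri) <= ri_tolerance]

    def score(c):
        spectral = cosine_similarity(observed_spectrum, c.predicted_spectrum)
        substructure = tanimoto(fingerprint_from_spectrum, c.fingerprint)
        return (spectral + substructure) / 2

    return sorted(kept, key=score, reverse=True)
```

In this sketch the retention index acts as a hard filter applied before ranking, while the two similarity measures only reorder the surviving candidates. That mirrors the order of steps in the abstract, but the actual thresholds and weighting in the study may differ.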

Authors

D. D. Matyushin

M. D. Khrisanfov

S. A. Borovikova

Journals

Journal of Analytical Chemistry

Institutions

Lomonosov Moscow State University

Frumkin Institute of Physical Chemistry and Electrochemistry
