We curate a small, manually annotated dataset of 500 spectrograms containing a range of feature classes in the 1 to 50 Hz frequency band, recorded by a single ocean bottom seismometer (OBS) of the UPFLOW array in the mid-Atlantic region. We explore several machine learning (ML) training techniques that are specialised for low-data regimes, and compare their performance for two feature classes (instrument resonances and blue whale calls). We find that a synthetic pre-training step significantly improves performance relative to semi-supervised approaches and to fine-tuning an off-the-shelf model, with a ~5% improvement for well-represented features and an improvement of over 90% for rare features. Despite the small dataset, our method can be used to accurately and efficiently segment spectrogram data across 43 OBSs with high-quality data in the large-scale UPFLOW array, as well as data from earlier OBS deployments. We next investigate a range of applications for the trained segmentation models. We demonstrate that our ML algorithm identifies current-induced instrument resonances accurately enough to extract a tidal signal. In addition, it reliably detects blue whale calls across the entire UPFLOW array, and it even enables automated tracking of individual whales detected simultaneously at multiple OBSs.
Saoulis et al. studied this question.
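The idea of pre-training a segmentation model on synthetic spectrograms can be illustrated with a minimal sketch. It is not the authors' generator or evaluation code: the function names (`make_synthetic_example`, `iou`), the narrowband "resonance-like" feature, the noise model, and the use of intersection-over-union as a score are all assumptions made for illustration, since the abstract does not specify them. Nested lists are used only to keep the sketch dependency-free; an array library would normally be used in practice.

```python
import random


def make_synthetic_example(n_freq=50, n_time=64, band_center=20,
                           half_width=2, amplitude=3.0, seed=0):
    """Return a toy (spectrogram, mask) pair indexed as [frequency][time].

    Hypothetical generator: a constant narrowband feature on top of
    non-negative noise, with a pixel-wise label mask for the feature.
    """
    rng = random.Random(seed)
    # Background: non-negative noise standing in for ambient energy.
    spec = [[abs(rng.gauss(0.0, 1.0)) for _ in range(n_time)]
            for _ in range(n_freq)]
    mask = [[0] * n_time for _ in range(n_freq)]
    lo = max(0, band_center - half_width)
    hi = min(n_freq, band_center + half_width + 1)
    for f in range(lo, hi):
        for t in range(n_time):
            spec[f][t] += amplitude  # add the synthetic feature
            mask[f][t] = 1           # label the same pixels
    return spec, mask


def iou(pred, target):
    """Pixel-wise intersection-over-union of two binary masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0


spec, mask = make_synthetic_example()
# A naive amplitude threshold stands in for a trained model's prediction.
pred = [[1 if value >= 3.0 else 0 for value in row] for row in spec]
score = iou(pred, mask)
```

Because the labels come for free with each generated example, arbitrarily many such pairs could be produced for pre-training before fine-tuning on a small, manually annotated set like the 500 spectrograms described above.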