Various objective measures have been developed to improve the quality of service in speech communication. Among these, non-intrusive evaluation measures play a crucial role, because in many cases the clean reference signal is not available at the receiver end. However, many non-intrusive evaluation methods feed standard speech features to deep learning models, and these features do not always perform well in noisy conditions. To address this problem, this paper presents a fundamental analysis of how the presence of silent portions in a signal affects speech quality and intelligibility. It is observed that the percentage of the signal lying below a specific threshold level, referred to as silent cues, can be a simple yet very effective tool for developing a non-intrusive speech evaluation measure. The silent cues are combined with statistical features and a stacked ensemble regression model for non-intrusive speech quality and intelligibility evaluation. The proposed model is compared with baseline models and state-of-the-art features on three standard datasets. The simulation results are analyzed under various noisy conditions, and the proposed model is observed to consistently provide superior performance, with an improvement of at least 8-10% over the existing models for both quality and intelligibility prediction.
Dash et al. studied this question.
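To make the notion of silent cues concrete, the following minimal sketch computes the percentage of samples whose magnitude falls below a threshold. It is an illustration only and not the authors' implementation: the abstract does not say how the threshold is chosen, so the function name `silent_cue_percentage`, the parameter `rel_threshold`, and the choice of a threshold relative to the peak amplitude are all assumptions.

```python
def silent_cue_percentage(samples, rel_threshold=0.05):
    """Return the percentage of samples whose magnitude is below a threshold.

    Assumption (not from the paper): the threshold is expressed as a
    fraction of the signal's peak absolute amplitude.
    """
    values = [float(s) for s in samples]
    if not values:
        raise ValueError("samples must contain at least one value")

    peak = max(abs(v) for v in values)
    if peak == 0.0:
        # An all-zero signal is treated as entirely silent.
        return 100.0

    limit = rel_threshold * peak
    n_silent = sum(1 for v in values if abs(v) < limit)
    return 100.0 * n_silent / len(values)


if __name__ == "__main__":
    # Hypothetical toy signal: five near-silent samples out of eight.
    toy = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 0.5, 0.01]
    print(silent_cue_percentage(toy))
```

In a full pipeline, a scalar like this would be one input feature alongside the statistical features passed to the regression model. A frame-energy threshold in decibels would be an equally plausible reading of "threshold level".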