This paper describes a dataset for deeper evaluation of machine learning models for handwritten character recognition. We build a dataset that, combined with the existing NIST databases, enables additional analysis of models built on these data. The paper summarizes the most popular publicly available machine learning models trained on the EMNIST-letters dataset. We discuss issues with how state-of-the-art results are evaluated, namely by comparing accuracy achieved on a test set built in a cross-validation setting. We propose additional evaluation on new, independently constructed data, unaffiliated with the NIST database authors. The dataset and source code are available through Most Wiedzy, the Gdansk Tech University repository.
Szymański et al. studied this question.