March 24, 2026

Generating Russian-Language Image Descriptions in Both Formal and Conversational Styles via a Neural-Network Ensemble

Key Points

The study aims to generate image descriptions in both formal and conversational Russian to support language learning.
The authors developed a multimodal encoder-decoder architecture with a pre-trained ResNet-152 encoder and an LSTM decoder.
A Bahdanau attention mechanism was incorporated to improve captioning accuracy.
A proprietary dataset was derived from MS COCO, then translated and stylistically adapted with the GigaChat large language model.
ruCLIPScore was used to select the most effective model configurations during ensemble construction.
The ensemble significantly outperformed its individual constituent models as measured by ruCLIPScore.
The generated captions showed stylistic variety across the formal and conversational registers.
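
The Bahdanau (additive) attention step named in the key points can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the toy dimensions, and the weight values below are all hypothetical, and in a real captioning model the region features would come from the CNN encoder and the matrices would be learned.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, region_features, W_query, W_key, v):
    """Additive attention: score_i = v . tanh(W_query @ state + W_key @ region_i).

    Returns (weights, context), where context is the attention-weighted
    sum of the region feature vectors.
    """
    projected_query = matvec(W_query, decoder_state)
    scores = []
    for region in region_features:
        projected_key = matvec(W_key, region)
        hidden = [math.tanh(q + k) for q, k in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy inputs (hypothetical): a 2-d decoder state and three 3-d image regions.
decoder_state = [0.5, -0.2]
region_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W_query = [[0.1, 0.2], [0.3, -0.1]]            # maps state to a 2-d attention space
W_key = [[1.0, 0.0, -1.0], [0.0, 0.5, 0.5]]    # maps a region to the same space
v = [1.0, 1.0]

weights, context = additive_attention(decoder_state, region_features, W_query, W_key, v)
```

At each decoding step the LSTM would receive `context` alongside the previous token, so the decoder can focus on different image regions for different words.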

Abstract

The paper proposes a method for generating Russian-language image captions in both formal and conversational registers. The study is motivated by the need for educational tools that help non-native speakers master colloquial Russian. The method uses a multimodal encoder–decoder ensemble architecture in which a pre-trained ResNet-152 convolutional neural network serves as the encoder and an LSTM network serves as the decoder. A Bahdanau attention mechanism further improves captioning performance. For training, the authors constructed a proprietary dataset derived from MS COCO, translated and stylistically adapted with the GigaChat large language model. During ensemble construction, ruCLIPScore is used to select the most effective model configurations. Experimental results indicate that the ensemble significantly outperforms its individual constituent models as measured by ruCLIPScore and produces stylistically diverse captions in both registers.
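
The abstract does not spell out how ruCLIPScore is computed or how configurations are chosen, so the following is only a sketch under stated assumptions. It assumes ruCLIPScore follows the standard CLIPScore form, w · max(cos(image, caption), 0) with w = 2.5, applied to embeddings from a Russian CLIP model, and that selection means keeping the configurations with the highest mean validation score. The configuration names and score values are hypothetical placeholders, not results from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def clip_score(image_embedding, caption_embedding, w=2.5):
    """CLIPScore-style metric: rescaled cosine similarity, clipped at zero.

    Assumption: the embeddings would come from a Russian CLIP model;
    here they are plain lists of floats.
    """
    return w * max(cosine(image_embedding, caption_embedding), 0.0)


def select_configurations(scores_by_config, keep):
    """Rank configurations by mean validation score and keep the best ones.

    Assumption: top-k by mean score stands in for the paper's selection rule,
    which the abstract does not describe.
    """
    means = {name: sum(s) / len(s) for name, s in scores_by_config.items()}
    return sorted(means, key=means.get, reverse=True)[:keep]


# Hypothetical per-image validation scores for three model configurations.
validation_scores = {
    "config_a": [0.61, 0.58, 0.65],
    "config_b": [0.70, 0.68, 0.72],
    "config_c": [0.55, 0.60, 0.57],
}
ensemble_members = select_configurations(validation_scores, keep=2)
```

Because the metric compares a caption directly with its image, it needs no reference captions, which makes it convenient for ranking configurations on data where only machine-adapted references exist.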

Authors

M. A. Privalov

A. S. Kozharinov

Journals

Doklady Mathematics

Institutions

National University of Science and Technology
