Driver distraction is one of the main contributing factors in road accidents, which underscores the importance of early detection and alerting mechanisms to mitigate the risk. To achieve this, it is crucial to identify both the distraction and its source. Existing distracted driver detection methods primarily analyze full images using visual features and often overlook fine-grained details within specific regions, which reduces their ability to distinguish between highly similar classes. Addressing this gap requires considering both global and local information to better capture the driver’s actions. Our proposed approach fuses two feature modalities: visual features of the global appearance and skeletal information. Additionally, we generate a key points attention map that represents the distribution of key points in the image and the regions where the local information is located, which directs the model’s attention to fine-grained details around the driver’s joints. Our model achieves accuracy competitive with state-of-the-art models: 95.08% on the AUCDD-v1 dataset and 99.84% on the SFDD dataset.
Boulahmar et al. studied this question.
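To make the idea of a key points attention map more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each key point contributes a Gaussian bump centered on its pixel location, and that the resulting map reweights a single-channel feature map as `f * (1 + a)`. The function names `keypoint_attention_map` and `apply_attention`, the `sigma` parameter, and the weighting rule are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math


def keypoint_attention_map(keypoints, height, width, sigma=2.0):
    """Build a height x width map with a Gaussian bump at each key point.

    keypoints: iterable of (x, y) pixel coordinates (hypothetical input,
    e.g. joints returned by any pose estimator). Values lie in [0, 1].
    The Gaussian form is an assumption, not taken from the paper.
    """
    attention = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                # Keep the strongest response where bumps overlap.
                if value > attention[y][x]:
                    attention[y][x] = value
    return attention


def apply_attention(feature_map, attention):
    """Reweight a single-channel feature map element-wise: f * (1 + a).

    The "1 +" term keeps the global features intact and only boosts
    regions near key points (an assumed design choice).
    """
    return [
        [f * (1.0 + a) for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(feature_map, attention)
    ]
```

Under these assumptions, regions far from every joint keep their original feature values almost unchanged, while regions around the joints are amplified by up to a factor of two. This is one simple way a model's attention could be steered toward local detail without discarding the global appearance.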