Large Language Models (LLMs) have achieved remarkable success in generating human-like text and are increasingly integrated into real-world applications. However, their deployment raises significant safety concerns, including the risk of generating harmful, biased, or culturally inappropriate content. While several safety benchmarks exist for English, non-English contexts such as Italian remain critically underexplored, despite the growing demand for localized and culturally sensitive AI technologies. In this paper, we introduce BeaverTails-IT, the first Italian safety benchmark for LLMs, created by machine-translating the original English BeaverTails dataset. We employ five state-of-the-art translation models, evaluate translation quality using automated metrics and human judgments, and provide guidelines for selecting high-quality safety prompts. Our benchmark enables preliminary evaluation of Italian LLMs across key safety dimensions such as toxicity, bias, and ethical compliance. Beyond presenting the translated dataset, we offer a detailed analysis of its limitations, highlighting the challenges of using translated content as a proxy for native benchmarks. Our findings demonstrate the need for a dedicated, culturally grounded Italian safety benchmark to ensure effective and contextually appropriate evaluations.
Giuseppe et al. (Wed,) studied this question.