Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Published in Findings of the Association for Computational Linguistics: ACL 2025, 2025
Finetuning on specialized datasets can mitigate harmful behavior, and doing this in English can transfer to other languages. In this work, we also observe this transfer and show that its extent can be predicted by the amount of data in a given language present in the model’s pretraining data. However, this transfer of bias and toxicity mitigation often comes at the expense of decreased language generation ability in non-English languages.
Vera Neplenbroek, Arianna Bisazza, and Raquel Fernández. 2025. Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 2805–2830, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.findings-acl.145