Publications

You can also find my publications on my Google Scholar profile.

Conference Papers

Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation

Published in Findings of the Association for Computational Linguistics: ACL 2025, 2025

Finetuning on specialized datasets can mitigate harmful behavior, and when this finetuning is done in English, its effects can transfer to other languages. In this work we also observe this transfer and show that its extent can be predicted by the amount of data in a given language in the model’s pretraining data. However, this transfer of bias and toxicity mitigation often comes at the expense of decreased language generation ability in non-English languages.

Vera Neplenbroek, Arianna Bisazza, and Raquel Fernández. 2025. Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 2805–2830, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.findings-acl.145

MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs

Published in COLM, 2024

MBBQ (Multilingual Bias Benchmark for Question-answering) is a carefully curated version of the English BBQ dataset extended to Dutch, Spanish, and Turkish, which measures stereotypes commonly held across these languages. Our results based on several open-source and proprietary LLMs confirm that models exhibit more bias in some non-English languages than in English, and that there are significant cross-lingual differences in bias behaviour for all except the most accurate models.

Neplenbroek, V., Bisazza, A. and Fernández, R., 2024. MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs. In the first Conference on Language Modeling (COLM) 2024. https://openreview.net/pdf?id=X9yV4lFHt4

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 2025

We provide JUDGE-BENCH, a collection of 20 NLP datasets with human annotations, and comprehensively evaluate 11 current LLMs, covering both open-weight and proprietary models, for their ability to replicate the annotations. Our evaluations show that each LLM exhibits a large variance across datasets in its correlation to human judgments. We conclude that LLMs are not yet ready to systematically replace human judges in NLP.

Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, Andre Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, and Alberto Testoni. 2025. LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 238–255, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.acl-short.20

Pre-prints

Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization

Published in arXiv, 2025

In this work we show that LLMs infer the user’s demographic attributes based on stereotypical signals in the conversation, and that for a number of groups this inference persists even when the user explicitly identifies with a different demographic group. We effectively mitigate this form of stereotype-driven implicit personalization by intervening on the model’s internal representations using a trained linear probe to steer them toward the explicitly stated identity.

Neplenbroek, V., Bisazza, A. and Fernández, R., 2025. Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization. arXiv preprint arXiv:2505.16467. https://arxiv.org/abs/2505.16467

Journal Articles

[Re] Replication study of ‘Data-Driven Methods for Balancing Fairness and Efficiency in Ride-Pooling’

Published in ReScience C, 2022

Replication study evaluating claims related to fairness-based objective functions for ride-pooling matching systems.

Neplenbroek, V., Perdijk, S., and Prins, V. 2022. [Re] Replication study of ‘Data-Driven Methods for Balancing Fairness and Efficiency in Ride-Pooling.’ ReScience C 8, 2, #29. https://rescience.github.io/bibliography/Neplenbroek_2022.html