Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
About me
This is a page not in the main menu.
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in ReScience C, 2022
Replication study evaluating claims related to fairness-based objective functions for ride-pooling matching systems.
Neplenbroek, V., Perdijk, S., and Prins, V. 2022. [Re] Replication study of ‘Data-Driven Methods for Balancing Fairness and Efficiency in Ride-Pooling.’ ReScience C 8, 2, #29. https://rescience.github.io/bibliography/Neplenbroek_2022.html
Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 2025
We provide JUDGE-BENCH, a collection of 20 NLP datasets with human annotations, and comprehensively evaluate 11 current LLMs, covering both open-weight and proprietary models, for their ability to replicate the annotations. Our evaluations show that each LLM exhibits a large variance across datasets in its correlation to human judgments. We conclude that LLMs are not yet ready to systematically replace human judges in NLP.
Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, Andre Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, and Alberto Testoni. 2025. LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 238–255, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.acl-short.20
Published in COLM, 2024
MBBQ (Multilingual Bias Benchmark for Question-answering) is a carefully curated version of the English BBQ dataset extended to Dutch, Spanish, and Turkish, which measures stereotypes commonly held across these languages. Our results based on several open-source and proprietary LLMs confirm that some non-English languages suffer from bias more than English, and that there are significant cross-lingual differences in bias behaviour for all except the most accurate models.
Neplenbroek, V., Bisazza, A. and Fernández, R., 2024. MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs. In the first Conference on Language Modeling (COLM) 2024. https://openreview.net/pdf?id=X9yV4lFHt4
Published in Findings of the Association for Computational Linguistics: ACL 2025, 2025
Finetuning on specialized datasets can mitigate harmful behavior, and doing this in English can transfer to other languages. In this work we also observe this transfer and show that the extent to which transfer takes place can be predicted by the amount of data in a given language present in the model’s pretraining data. However, this transfer of bias and toxicity mitigation often comes at the expense of decreased language generation ability in non-English languages.
Vera Neplenbroek, Arianna Bisazza, and Raquel Fernández. 2025. Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 2805–2830, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.findings-acl.145
Published in arXiv, 2025
In this work we show that LLMs infer the user’s demographic attributes based on stereotypical signals in the conversation, which for a number of groups even persists when the user explicitly identifies with a different demographic group. We effectively mitigate this form of stereotype-driven implicit personalization by intervening on the model’s internal representations using a trained linear probe to steer them toward the explicitly stated identity.
Neplenbroek, V., Bisazza, A. and Fernández, R., 2025. Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization. arXiv preprint arXiv:2505.16467. https://arxiv.org/abs/2505.16467
Published:
Presentation of “MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs”.
Published:
Large language models (LLMs) are used by vast numbers of speakers around the world and show remarkable performance in many non-English languages. However, they often receive safety fine-tuning only in English, if at all, and their performance is known to be inconsistent across languages. There is therefore a need to investigate to what extent LLMs exhibit harmful biases and toxic behaviors across languages, and how such behaviors can best be reduced. In this talk I will discuss my work, which shows that the stereotypical bias exhibited by LLMs differs significantly depending on the language they are prompted in. Furthermore, we show that mitigation of these stereotypical biases and toxic behaviors performed in English transfers to other languages, though often at the expense of decreased language generation ability in those non-English languages.
Published:
Poster presentation of “Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation”.
Published:
Presentation of “Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization”.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.