Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization
Published in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
In this work, we show that LLMs infer the user's demographic attributes from stereotypical signals in the conversation, and that for several groups this inference persists even when the user explicitly identifies with a different demographic group. We effectively mitigate this form of stereotype-driven implicit personalization by intervening on the model's internal representations, using a trained linear probe to steer them toward the explicitly stated identity.
Vera Neplenbroek, Arianna Bisazza, and Raquel Fernández. 2025. Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20378–20411, Suzhou, China. Association for Computational Linguistics. https://aclanthology.org/2025.emnlp-main.1029/
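The probe-based intervention can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the least-squares probe, and the single-direction steering rule are all simplifying assumptions. The idea shown: train a linear probe on hidden states to predict the attribute the model has inferred, then shift a hidden state along the probe direction until it encodes the user's explicitly stated identity.

```python
import numpy as np

def train_probe(states, labels):
    """Fit a least-squares linear probe (hypothetical stand-in for a
    trained classifier). labels are +1/-1 for two identity groups."""
    X = np.asarray(states, dtype=float)
    y = np.asarray(labels, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def steer(state, w, target_score, alpha=1.0):
    """Move a hidden state along the (normalized) probe direction so its
    projection matches target_score, the score of the stated identity.
    alpha=1.0 applies the full shift; smaller values interpolate."""
    w_unit = w / np.linalg.norm(w)
    current = state @ w_unit
    return state + alpha * (target_score - current) * w_unit
```

With `alpha=1.0`, the steered state's projection onto the probe direction equals `target_score` exactly, while components orthogonal to the probe direction are left untouched.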
