Debiasing Word Relationships in Word Embeddings


Abstract

Computer representations of words, called word embeddings, attempt to capture the semantic meaning of words, but they often capture additional, unwanted information. Societal biases in training data, including gender and racial stereotypes, yield biased embeddings. Such embeddings are then used across a wide range of AI and machine learning applications, resulting in detrimental outcomes. Recent examples of bias in natural language processing include Google Translate translating the gender-neutral Turkish sentences “O bir hemşire. O bir doktor” as “She is a nurse. He is a doctor,” Google Search autocompleting the query “jews should” with “be wiped out,” and Facebook’s hate speech detection algorithms being more effective at identifying hate speech against white people than against other groups. Existing work has failed to substantially remove bias in word embeddings while retaining unbiased semantic information. We propose a novel method to address this issue. Instead of attempting to isolate and remove embedding bias, we consider how relationships between words change in the presence of a sensitive attribute. We can then compute embeddings as if this sensitive attribute had no effect. Initial experimental results indicate that this method can remove gender bias from word embeddings without destroying the unbiased meaning of words.
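To make the notion of embedding bias concrete, the sketch below measures and removes the component of a few word vectors along a “gender direction” (the difference between the “he” and “she” vectors), in the spirit of standard projection-based debiasing baselines; it is not the relationship-based method proposed in this project. The three-dimensional vectors are hypothetical toy values so the example runs without downloading a real embedding model such as word2vec or GloVe.

import numpy as np

# Hypothetical toy embeddings (3-dimensional for readability); with a real
# model these vectors would be looked up from word2vec, GloVe, etc.
emb = {
    "he":       np.array([ 1.0, 0.1, 0.2]),
    "she":      np.array([-1.0, 0.1, 0.2]),
    "doctor":   np.array([ 0.4, 0.9, 0.1]),
    "nurse":    np.array([-0.5, 0.8, 0.2]),
    "engineer": np.array([ 0.3, 0.2, 0.9]),
}

def unit(v):
    return v / np.linalg.norm(v)

# A simple "gender direction": the difference between the "he" and "she" vectors.
gender_dir = unit(emb["he"] - emb["she"])

def gender_bias(vec):
    """Scalar projection of a normalized word vector onto the gender direction."""
    return float(unit(vec) @ gender_dir)

def project_out_gender(vec):
    """Remove the component of a word vector along the gender direction
    (a hard-debiasing-style projection, shown only as a common baseline)."""
    return vec - (vec @ gender_dir) * gender_dir

for word in ["doctor", "nurse", "engineer"]:
    before = gender_bias(emb[word])
    after = gender_bias(project_out_gender(emb[word]))
    print(f"{word:>8}: bias before = {before:+.3f}, after projection = {after:+.3f}")

On these toy vectors, “doctor” and “nurse” start with opposite-signed projections onto the gender direction and end near zero after the projection. The project’s approach differs: rather than deleting a single bias direction, it models how relationships between words shift in the presence of the sensitive attribute and recomputes embeddings as if that attribute had no effect.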

Field

Mathematics / machine learning

Team

Deanna Needell, Erin George

Deanna Needell

Deanna Needell is a Professor of Mathematics at UCLA, the Dunn Family Endowed Chair in Data Theory, and Executive Director of the Institute for Digital Research and Education.

Erin George

Erin George is a graduate student in applied mathematics at UCLA studying bias and fairness in machine learning.