Dirk Hovy
Fri 10 Apr 2015, 11:00 - 12:30
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Diana Dalla Costa (ddallac)


The way we express ourselves is heavily influenced by our demographic background and our communicative goals. In NLP, however, we have mostly worked under the assumption that the main goal of language is to convey information, and that our data is representative of all demographics. As NLP is applied to more and more domains and text types, these assumptions are challenged. Sociolinguistics has long investigated the interplay of demographic factors and language use, and it seems likely that these factors also find their way into the data we use to train NLP systems. The result is reduced performance, which potentially disadvantages certain groups of speakers. This suggests that some of the problems we have addressed in domain adaptation might actually require demographic adaptation. In this talk, I will show how we can combine modern statistical NLP methods and sociolinguistic theories to the benefit of both fields. I present ongoing research into large-scale analysis of demographic language variation, how this variation affects the performance (and fairness) of NLP systems, and how we can address these problems by incorporating demographic information.



Dirk Hovy is a postdoc at the University of Copenhagen, working with Anders Søgaard. His interests include lexical semantics, non-standard language, and the interaction of extra-linguistic factors and language use. Dirk holds an MA in sociolinguistics from the University of Marburg, and received his PhD in NLP from the University of Southern California, where he worked on unsupervised relation extraction. He has authored multiple papers on word sense disambiguation (WSD), supersenses, NLP for social media, and annotation. He recently shared best paper awards at EACL 2014 and *SEM 2014 for work with his colleagues in Copenhagen. Outside of research, Dirk enjoys cooking, tango, and leather-crafting, as well as picking up heavy things and putting them back down.