University Seminar Site

Brendan O'Connor
Thu 17 Sep 2015, 11:00 - 12:30
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Diana Dalla Costa (ddallac)

Abstract:

Social media, SMS, and other genres of online conversational text look different than newspapers or academic papers. They feature tremendous lexical diversity, alternate spellings, and grammatical constructions not seen in standard English, with roots in sociodemographic diversity and widespread dialectal variation that has rarely been seen before in written form. This talk will present results from part-of-speech tagging and large-scale unsupervised word clustering on English-language Twitter, which reveal not just engineering but also sociolinguistic implications for how we approach natural language processing.

Bio:

Brendan O'Connor (http://brenocon.com/) is an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst. He researches natural language processing and computational social science through the analysis of social phenomena in news, social media, and other textual corpora. He received his PhD in 2014 from Carnegie Mellon University's Machine Learning Department, advised by Noah Smith. He has been a Visiting Fellow at the Harvard Institute for Quantitative Social Science, and an intern with the Facebook Data Science team. Before grad school, he worked on crowdsourcing at CrowdFlower / Dolores Labs, and natural language search at Powerset. His bachelor's and master's degrees are from Stanford's Symbolic Systems Program (which itself was modeled on Edinburgh's original School of Epistemics).

This talk is part of the Informatics: Institute for Language, Cognition and Computation/HCRC Seminar Series.

Are Minority Dialects "Noisy Text"?: Implications of Social and Linguistic Diversity for Social Media NLP