Caroline Sporleder
Fri 27 Feb 2015, 11:00 - 12:00
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Nicola Drago-Ferrante (ndferran)

ABSTRACT:

Figurative language poses a serious challenge to NLP systems. Idiomatic and metaphoric expressions are not only extremely widespread in natural language; many of them, idioms in particular, also behave idiosyncratically. These idiosyncrasies are not restricted to a non-compositional meaning but often extend to syntactic properties, selectional preferences, etc. To deal appropriately with such expressions, NLP tools need to detect figurative language and assign the correct analyses to non-literal expressions.

While there has been quite a bit of work on determining the general 'idiomaticity' of an expression (type-based approaches), this only solves part of the problem, as many expressions, such as "break the ice" or "play with fire", can also have a literal, perfectly compositional meaning (e.g. "break the ice on the duck pond"). Such expressions have to be disambiguated in context (token-based approaches), which have received increased attention recently. In this talk, I will present an unsupervised method for token-based idiom detection. The method exploits the fact that well-formed texts exhibit lexical cohesion, i.e., words tend to be semantically related to other words in their context.
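The lexical-cohesion intuition can be illustrated with a deliberately simplified sketch. It is not the method presented in the talk: it assumes hypothetical toy word vectors (`TOY_VECTORS`), scores an occurrence by the mean cosine similarity between the expression's component words and its context words, and labels it literal when that score reaches an arbitrary threshold (0.5). A real system would derive relatedness scores from corpus data rather than hand-made vectors.

```python
import math

# Hypothetical toy vectors (illustration only). The three dimensions loosely
# stand for [water/winter, social, physical action].
TOY_VECTORS = {
    "break": [0.3, 0.2, 1.0],
    "ice": [1.0, 0.1, 0.2],
    "duck": [0.8, 0.1, 0.1],
    "pond": [0.9, 0.0, 0.1],
    "party": [0.0, 1.0, 0.1],
    "guests": [0.1, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cohesion(components, context):
    """Mean similarity between component words and context words.

    Words missing from TOY_VECTORS are skipped; returns 0.0 if no pair is scored.
    """
    scores = [
        cosine(TOY_VECTORS[c], TOY_VECTORS[w])
        for c in components
        for w in context
        if c in TOY_VECTORS and w in TOY_VECTORS
    ]
    return sum(scores) / len(scores) if scores else 0.0


def classify(components, context, threshold=0.5):
    """Literal if the expression fits its context well, else figurative.

    The 0.5 threshold is an arbitrary assumption for this toy example.
    """
    return "literal" if cohesion(components, context) >= threshold else "figurative"


if __name__ == "__main__":
    print(classify(["break", "ice"], ["duck", "pond"]))     # literal
    print(classify(["break", "ice"], ["party", "guests"]))  # figurative
```

In the pond context, "ice" is strongly related to "duck" and "pond", so the expression coheres with its surroundings and is read literally; at a party, neither component relates well to the context, which signals a figurative use.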


BIOGRAPHY:

Caroline Sporleder is Associate Professor in the Department of Computational Linguistics and Digital Humanities at Trier University, Germany. She received her PhD in Informatics from the University of Edinburgh in 2004 and worked as a postdoctoral researcher at the Universities of Edinburgh, Tilburg and Saarbruecken. Her research interests lie in the areas of discourse processing, computational semantics and natural language processing for the Humanities.