Chloe Braud
Fri 28 Apr 2017, 11:00 - 12:30
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Diana Dalla Costa (ddallac)


Discourse structures describe the organization of documents in terms of discourse or rhetorical relations (such as "Explanation" or "Contrast") linking clauses and sentences. Discourse analysis has proven useful for various downstream applications, such as automatic summarization, question answering, or sentiment analysis. However, both the range of applications and their performance remain limited by the low accuracy of existing discourse parsers and by their focus on English.

Discourse parsing is known to be a hard task: it involves several complex and interacting factors, touching upon all layers of linguistic analysis, from syntax and semantics up to pragmatics. Consequently, annotation is also complex and time-consuming, and the available annotated corpora are therefore sparse and limited in size.

In this presentation, I will describe my work on tackling these issues using transfer-learning strategies. First, I will describe experiments on identifying implicit discourse relations (i.e., those lacking a discourse connective such as "but" or "because") in the Penn Discourse Treebank: I proposed strategies that transfer knowledge from the explicit examples to the implicit ones, either by augmenting the size of the training set or by building a task-tailored representation of the words.
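The training-set augmentation strategy mentioned above can be sketched as follows. This is a minimal illustration under my own assumptions, not the speaker's actual implementation: the example sentences, the connective-to-relation mapping, and the function names are all invented for the sketch. The idea is to take an explicit example, strip its connective, and keep the relation label, yielding an extra pseudo-implicit training instance.

```python
# Minimal sketch: create pseudo-implicit training examples from explicit
# ones by removing the discourse connective. All sentences, labels, and
# the connective-to-relation mapping below are invented for illustration.

CONNECTIVE_TO_RELATION = {
    "but": "Comparison",
    "because": "Contingency",
}

def augment_from_explicit(pairs):
    """Turn (arg1, arg2) pairs whose arg2 starts with a known connective
    into (arg1, arg2-without-connective, relation) training examples."""
    augmented = []
    for arg1, arg2 in pairs:
        first, _, rest = arg2.partition(" ")
        relation = CONNECTIVE_TO_RELATION.get(first.lower())
        if relation is None:
            continue  # skip pairs whose connective we cannot map
        # Dropping the connective makes the pair look implicit,
        # while the relation label is inherited from the connective.
        augmented.append((arg1, rest, relation))
    return augmented

explicit = [
    ("He was tired", "but he kept working"),
    ("She stayed home", "because it was raining"),
]
print(augment_from_explicit(explicit))
```

In practice, a mapping like this is noisy (many connectives are ambiguous between several relations), which is one reason purely augmentation-based transfer has limits.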

I will then present two RST discourse parsers. The first relies on multi-task learning to transfer information among several discourse-related tasks. The second combines all the RST corpora annotated for different languages, leading to improvements on English and to the first systems for Basque and Dutch developed without any training data in those languages.
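The cross-lingual corpus combination can be sketched as follows. This is a toy illustration under my own assumptions, not the speaker's system: the corpora, labels, and the "multilingual cluster" lookup are invented. The key point is that if tokens from different languages are mapped into a shared, language-independent feature space, the annotated corpora can simply be pooled, and a parser trained on the union can be applied to a language with no training data of its own.

```python
# Toy sketch: pool RST examples from several languages by mapping tokens
# to shared, language-independent feature IDs. The corpora, relation
# labels, and the cluster lookup below are all invented for illustration.

def to_shared_features(tokens, cluster_map):
    """Replace each token with a language-independent cluster ID."""
    return [cluster_map.get(t.lower(), "<unk>") for t in tokens]

# Hypothetical multilingual clusters: translation pairs share an ID.
cluster_map = {
    "however": "C17", "echter": "C17",
    "it": "C3", "het": "C3",
    "rained": "C88", "regende": "C88",
}

corpora = {
    "en": [(["However", "it", "rained"], "Contrast")],
    "nl": [(["Echter", "het", "regende"], "Contrast")],
}

# Pool all languages into one training set over shared features.
combined = []
for lang, examples in corpora.items():
    for tokens, label in examples:
        combined.append((to_shared_features(tokens, cluster_map), label))

print(combined)
```

With this kind of representation, the English and Dutch examples above become identical feature sequences, which is what lets annotations in one language stand in for missing annotations in another.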