Detmar Meurers
Fri 04 Sep 2015, 14:00 - 15:30
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Diana Dalla Costa (ddallac)

Abstract:

The analysis of readability has traditionally relied on surface properties of language, such as average sentence and word lengths and specific word lists. At the same time, there is a long tradition of analyzing the Complexity, Accuracy, and Fluency (CAF) of language produced by language learners in second language acquisition (SLA) research. Reusing SLA measures of learner language complexity to analyze readability, Sowmya Vajjala and I explored which aspects of linguistic modeling can successfully be employed to predict the readability of a native language text. Using various machine learning setups and corpora, we show that a broad range of linguistic properties are highly indicative of the readability of documents, from graded readers to web pages and TV programs targeting different age groups. The readability model using the full linguistic feature set is currently the best non-commercial readability model available for English, as measured on the standard Common Core State Standards data.

The fact that readability is reflected in a wide range of linguistic aspects is also relevant for research on text simplification, where the model can in principle be used to identify which sentences are worth simplifying in which way and to evaluate one dimension of the success of automatic simplification. As a prerequisite for such applications, we show that our text readability models can successfully be applied to individual sentences.

The talk will trace the ideas sketched above, drawing on the joint work with Sowmya Vajjala listed below. The papers can be downloaded from: http://purl.org/dm/papers

Sowmya Vajjala (2015) "Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications". PhD thesis, Eberhard-Karls Universität Tübingen. http://hdl.handle.net/10900/64359

Sowmya Vajjala and Detmar Meurers (2015) "Readability Assessment for Text Simplification: From Analyzing Documents to Identifying Sentential Simplifications". International Journal of Applied Linguistics, Special Issue on Current Research in Readability and Text Simplification edited by Thomas François & Delphine Bernhard.

Sowmya Vajjala and Detmar Meurers (2014) "Assessing the relative reading level of sentence pairs for text simplification". Proceedings of EACL. Gothenburg, Sweden.

Sowmya Vajjala and Detmar Meurers (2014) "Exploring Measures of 'Readability' for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs". Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), EACL. Gothenburg, Sweden.

Sowmya Vajjala and Detmar Meurers (2013) "On the Applicability of Readability Models to Web Texts". Proceedings of the Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), ACL. Sofia, Bulgaria.

Julia Hancke, Sowmya Vajjala and Detmar Meurers (2012) "Readability Classification for German using lexical, syntactic, and morphological features". Proceedings of COLING, Mumbai, India.

Sowmya Vajjala and Detmar Meurers (2012) "On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition". Proceedings of BEA7, ACL. Montreal, Canada.



Bio:

Detmar Meurers is a professor of computational linguistics at the University of Tübingen, Germany, which he joined in 2008 after eight years as a faculty member of the Department of Linguistics at The Ohio State University. In 2013 he also became a member of the Department of Language and Linguistics at the Universitet i Tromsø, Norway, to collaborate with the Giellatekno group.

His research generally targets the use of linguistic modeling and insight in areas such as Intelligent Computer-Assisted Language Learning (TAGARELA, VIEW, WERTi), learner corpora, Second Language Acquisition and Testing (MERLIN, Kobalt-DaF), automatic comparison of meaning in authentic task contexts taking information structure into account (SFB 833/A4 CoMiC), and the use and correction of corpus annotation (DECCA). Much of the recent work is connected to the LEAD graduate school in empirical educational science, where he is responsible for the Language intersection.

More information and links to most papers can be found on http://purl.org/dm