Current work in computational linguistics and natural language processing shows great interest in automatically extending monolingual systems to several languages, driven by technological goals. These extensions (monolingual robustness and coverage of more and more languages) require discovering and formally describing some very basic cross-linguistic regularities. I report here on some recent and current work from our group, in which cross-linguistic computational modelling techniques are used to address theoretically driven linguistic questions. First, much like the comparative method in linguistics, cross-lingual corpus investigations take advantage of any corresponding annotation or linguistic knowledge across languages. We report on work that exploits differences across languages in the surface expression of meaning to show that, for the task of event duration prediction, complementary information about one language can be extracted from its translations in a second language. The second case study investigates the relationship between corpus data and typological data in the causative alternation. This study shows strong correlations between the two sources of data and proposes a model based on the notion of inner causation of an event. Finally, we ask which factors govern one of the most apparent sources of diversity across languages: the order of words. The availability of several large-scale treebanks allows us to answer this question in a novel way. In a large-scale computational study of Romance languages, we confirm a trend towards minimisation of the distance between words even in very short spans, raising issues about the role of efficiency and complexity in language use.


Paola Merlo is a professor in the Linguistics department of the University of Geneva. She is the head of the interdisciplinary research group Computational Learning and Computational Linguistics (CLCL). The group is concerned with interdisciplinary research combining linguistic modelling with machine learning techniques. The scope of her current research includes issues in the statistical nature of language, empirical evaluations of linguistic proposals about the lexical semantics of verbs and about language universals of word order, and statistical models of syntactic and semantic parsing. Prof. Merlo is the current editor of Computational Linguistics, the journal of the Association for Computational Linguistics. Prof. Merlo holds a doctorate in Computational Linguistics from the University of Maryland, USA. She has been an associate research fellow at the University of Pennsylvania and a visiting scholar at Rutgers, Edinburgh, and Stanford.