Hinrich Schuetze
Fri 03 Jul 2015, 11:00 - 12:30
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Diana Dalla Costa (ddallac)


Word embeddings are widely used. There are two main evaluation techniques for comparing different learning algorithms for word embeddings. First, performance on standard similarity and analogy datasets is taken as a proxy for embedding quality. Second, word embeddings are evaluated in the context of an application like sentiment analysis or part-of-speech tagging, and whichever embedding has the highest performance is deemed best. I will take a step back and ask what the properties of an ideal word embedding would be. I will present a series of context-free grammars that model different linguistic properties and investigate how two representative word representation learning algorithms fare on data generated by these grammars. My conclusion is that the current methodology for evaluating word embeddings is flawed and we need alternatives.


Hinrich Schuetze is a professor of computational linguistics at LMU Munich. He received his PhD in linguistics from Stanford University in 1995 and worked on natural language processing and information retrieval at Xerox PARC and three startups in Silicon Valley from 1995 to 2004. He was a professor of theoretical computational linguistics at the University of Stuttgart from 2004 to 2013. He is the coauthor of Foundations of Statistical Natural Language Processing (with Chris Manning) and Introduction to Information Retrieval (with Chris Manning and Prabhakar Raghavan). His current research is focused on deep learning.