Afra Alishahi
Fri 27 Oct 2017, 11:00 - 12:30
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Diana Dalla Costa (ddallac)


Humans learn to understand speech from weak and noisy supervision: they manage to extract structure and meaning from speech simply by being exposed to utterances situated and grounded in their daily sensory experience. Emulating this remarkable skill has been the goal of numerous studies; however, in the overwhelming majority of cases researchers have used severely simplified settings where either the language input or the extralinguistic sensory input, or both, are small-scale and symbolically represented.

We simulate this process in a visually grounded model of speech perception which projects spoken utterances and images into a joint semantic space. We use a recurrent highway network to model the temporal nature of speech, and examine how form- and meaning-based linguistic knowledge emerge from the input signal. We carry out an in-depth analysis of the representations used by different components of the trained model and show that the encoding of semantic aspects tends to become richer as we go up the hierarchy of layers, whereas the encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease. In particular, we examine the representation and encoding of phonemes in the model, and show that phoneme representations are most salient in the lower layers. A hierarchical clustering of the phoneme representations learned by the network shows an organizational structure of phonemes similar to those proposed in linguistics. This is joint work with Grzegorz Chrupała, Lieke Gelderloos and Marie Barking.
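To make the setup concrete, the core idea of such a visually grounded model can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: a toy recurrent encoder stands in for the recurrent highway network, image features are projected linearly into the joint space, and a margin-based ranking loss pushes matching utterance/image pairs closer together than mismatched ones. All dimensions, weight matrices, and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    # Normalize rows to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def encode_speech(frames, W_h, W_x):
    # Toy recurrent encoder: a single tanh recurrence over acoustic frames,
    # standing in for the recurrent highway network described in the talk.
    h = np.zeros(W_h.shape[0])
    for x in frames:
        h = np.tanh(W_h @ h + W_x @ x)
    return l2norm(h)

def encode_image(feats, W_i):
    # Linear projection of (hypothetical) image features into the joint space.
    return l2norm(W_i @ feats)

def contrastive_loss(S, I, margin=0.2):
    # Margin-based ranking loss over a batch: the matched utterance/image pair
    # (the diagonal) should outscore mismatched pairs by at least `margin`.
    sims = S @ I.T                     # cosine similarity matrix
    pos = np.diag(sims)                # similarities of matched pairs
    cost_s = np.maximum(0, margin + sims - pos[:, None])  # wrong image for an utterance
    cost_i = np.maximum(0, margin + sims - pos[None, :])  # wrong utterance for an image
    np.fill_diagonal(cost_s, 0)
    np.fill_diagonal(cost_i, 0)
    return (cost_s.sum() + cost_i.sum()) / len(S)

# Toy batch: 4 utterances (10 frames of 13-dim features) and 4 image feature vectors.
d = 32
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, 13)) * 0.1
W_i = rng.normal(size=(d, 64)) * 0.1
S = np.stack([encode_speech(rng.normal(size=(10, 13)), W_h, W_x) for _ in range(4)])
I = np.stack([encode_image(rng.normal(size=64), W_i) for _ in range(4)])
print(contrastive_loss(S, I))
```

Training would update the weight matrices to minimize this loss; the layer-wise analyses in the talk then probe what the intermediate representations of such an encoder capture about form and meaning.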


Afra Alishahi is an associate professor at the Tilburg Center for Cognition and Communication (TiCC), Tilburg University, the Netherlands. She received her PhD in Computer Science from the University of Toronto and was a post-doctoral fellow in the Computational Psycholinguistics group at Saarland University. Her main research interest is developing computational models for studying the process of human language learning. She received an NWO Aspasia award in 2012, and is the Co-PI of an NWO project on the role of non-verbal cues in child language learning.