University Seminar Site

Simao Eduardo and Chris Williams
Tue 12 Feb 2019, 11:00 - 12:00
IF 4.31/4.33

If you have a question about this talk, please contact: Gareth Beedham (gbeedham)

Chris Williams

Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case

Latent variable models can be used to probabilistically “fill in” missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a “recognition” or “encoder” network that infers the latent variables given the data variables. However, it is not clear how to handle missing data variables in this network. The factor analysis (FA) model is a basic autoencoder, using linear encoder and decoder networks. We show how to calculate exactly the latent posterior distribution for the FA model in the presence of missing data, and note that this solution exhibits a non-trivial dependence on the pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of various approaches to imputing the missing data.

Joint work with Charlie Nash and Alfredo Nazabal.
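
For readers unfamiliar with the FA setting, the following is a minimal sketch of the standard linear-Gaussian conditioning result, not necessarily the formulation presented in the talk. It assumes the model x = W z + mu + noise, with z ~ N(0, I) and diagonal noise variances psi, and marks missing entries with NaN. The function names and the NaN convention are illustrative assumptions.

```python
import numpy as np


def fa_posterior_missing(x, W, mu, psi):
    """Posterior N(m, S) over latents z given only the observed entries of x.

    x   : (D,) float array, np.nan marks a missing entry (assumed convention)
    W   : (D, K) factor loading matrix
    mu  : (D,) data mean
    psi : (D,) diagonal noise variances
    """
    obs = ~np.isnan(x)                 # boolean mask of observed dimensions
    W_o = W[obs]                       # loadings restricted to observed rows
    psi_o = psi[obs]
    resid = x[obs] - mu[obs]
    K = W.shape[1]
    # Posterior precision: I + W_o^T Psi_o^{-1} W_o
    precision = np.eye(K) + W_o.T @ (W_o / psi_o[:, None])
    S = np.linalg.inv(precision)
    # Posterior mean: S W_o^T Psi_o^{-1} (x_o - mu_o)
    m = S @ (W_o.T @ (resid / psi_o))
    return m, S


def impute(x, W, mu, psi):
    """Fill missing entries with the decoder mean at the posterior mean of z."""
    m, _ = fa_posterior_missing(x, W, mu, psi)
    x_filled = x.copy()
    miss = np.isnan(x)
    x_filled[miss] = (W @ m + mu)[miss]
    return x_filled
```

In this sketch both the posterior mean and covariance are built from the rows of W selected by the observation mask, so a different pattern of missing entries gives a different linear map from data to latents. With every entry missing, the function returns the prior, N(0, I).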

Simao Eduardo

Self-Cleaning VAE: Robust Variational Autoencoders for Mixed-Type Data

Variational Autoencoders (VAEs) have been successfully applied to datasets that span from images to tabular data, e.g. those in the UCI repository. However, there is a plethora of real-world datasets that are corrupted by noise, making them unsuitable for certain tasks like model training. In addition, sometimes the objective is to obtain a clean dataset (repair) or to remove the outliers from it (detection). Available models for this task may have one of several drawbacks: they need a clean subset of the data for training; they need dirty-clean pairs for training; their hyper-parameters are not easily understood; they detect outliers at the instance level rather than the feature level; or they do not model mixed types, e.g. categorical and real features.

In this ongoing project, our aim is to provide a fully unsupervised generative model that focuses on modelling the inliers of the dataset (robust), training directly on dirty instances. We provide a probabilistic framework for mixed-type datasets, which also enables cell-wise (feature) outlier detection and repair. Our robust VAE (RVAE) outperforms the standard VAE in several corruption scenarios.

Joint work with Alfredo Nazabal, Chris Williams and Charles Sutton.

This talk is part of the Informatics: Institute for Adaptive and Neural Computation (ANC) Workshops and ANC/DTC Seminars series

ANC Workshop: Simao Eduardo and Chris Williams, Chair: David Sterratt