Simao Eduardo and Chris Williams
Tue 12 Feb 2019, 11:00 - 12:00
IF 4.31/4.33

If you have a question about this talk, please contact: Gareth Beedham (gbeedham)

Chris Williams

 

Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case

 

Latent variable models can be used to probabilistically “fill in” missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a “recognition” or “encoder” network that infers the latent variables given the data variables. However, it is not clear how to handle missing data variables in this network. The factor analysis (FA) model is a basic autoencoder, using linear encoder and decoder networks. We show how to calculate exactly the latent posterior distribution for the FA model in the presence of missing data, and note that this solution exhibits a non-trivial dependence on the pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of various approaches to imputing the missing data.
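As a rough illustration of the exact FA posterior described above, the sketch below uses the standard Gaussian conditioning result restricted to the observed dimensions. It is not the speakers' code: the function names, the use of NaN to mark missing entries, and the parameter names `W` (loadings), `psi` (diagonal noise variances) and `mean` are all assumptions made for this example.

```python
import numpy as np

def fa_posterior_missing(x, W, psi, mean):
    """Posterior N(mu, Sigma) over latents z given the observed entries of x.

    Assumed model: x = W z + mean + eps, z ~ N(0, I), eps ~ N(0, diag(psi)).
    Missing entries of x are marked with NaN (a convention chosen here).
    """
    observed = ~np.isnan(x)
    W_o = W[observed]                    # loadings for observed dimensions only
    psi_o = psi[observed]                # noise variances for observed dimensions
    resid = x[observed] - mean[observed]
    k = W.shape[1]
    # Posterior precision: I + W_o^T Psi_o^{-1} W_o. It depends on which
    # rows of W are kept, i.e. on the pattern of missingness.
    precision = np.eye(k) + W_o.T @ (W_o / psi_o[:, None])
    Sigma = np.linalg.inv(precision)
    mu = Sigma @ (W_o.T @ (resid / psi_o))
    return mu, Sigma

def impute(x, W, psi, mean):
    """Fill missing entries with the decoder mean at the posterior mean of z."""
    mu, _ = fa_posterior_missing(x, W, psi, mean)
    x_hat = W @ mu + mean
    return np.where(np.isnan(x), x_hat, x)
```

Because the rows of `W` that enter the precision matrix change with the set of observed dimensions, the linear map from data to posterior mean differs for each missingness pattern, which is one way to read the "non-trivial dependence" mentioned in the abstract.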

 

Joint work with Charlie Nash and Alfredo Nazabal.

 

 

Simao Eduardo

 

Self-Cleaning VAE: Robust Variational Autoencoders for Mixed-Type Data

 

Variational Autoencoders (VAEs) have been successfully applied to datasets ranging from images to tabular data, e.g. those in the UCI repository. However, many real-world datasets are corrupted by noise, making them unsuitable for tasks such as model training. In addition, sometimes the objective is to obtain a clean dataset (repair) or to remove the outliers from it (detection). Available models for this task may have one or more of the following drawbacks: they need a clean subset of the data for training; they need dirty-clean pairs for training; their hyper-parameters are not easily understood; their outlier detection operates at instance level rather than feature level; or they do not model mixed-type data, e.g. categorical and real-valued features.

 

In this ongoing project, our aim is to provide a fully unsupervised generative model that focuses on modelling the inliers of the dataset (robust), training directly on dirty instances. We provide a probabilistic framework for mixed-type datasets, which also enables cell-wise (feature-level) outlier detection and repair. Our robust VAE (RVAE) outperforms a standard VAE in several corruption scenarios.

 

Joint work with Alfredo Nazabal, Chris Williams and Charles Sutton.