Marc Deisenroth, Research Fellow, Imperial College, Dept of Computing
Statistical Machine Learning
Chair: Amos Storkey
Tue 31 Mar 2015, 11:00 - 12:00
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Mary-Clare Mackay (mmackay3)

Abstract: Gaussian processes (GPs) are the method of choice for probabilistic nonlinear regression. A strength of the GP is that it is a fairly reliable black-box function approximator, i.e., it produces reasonable predictions without manual parameter tuning. A practical limitation of the GP is its computational demand: training and prediction scale in O(N^3) and O(N^2), respectively, where N is the size of the training data set. To scale GPs to data sets beyond 10^4 points, we often use sparse approximations, which implicitly (or explicitly) operate on a subset of the data. Modern sparse approximations scale GPs up to 10^6 data points, but training these methods is non-trivial. In this talk, I will introduce a generalised version of Tresp's Bayesian Committee Machine to address the large-data problem of GPs via distributed computing. This generalised Bayesian Committee Machine (gBCM) is a practical and scalable hierarchical GP model for large-scale distributed non-parametric regression. The gBCM is a family of product-of-experts models that hierarchically recombines independent computations to form an approximation of a full Gaussian process. It includes classical product-of-experts models and the Bayesian Committee Machine as special cases, while addressing their respective shortcomings, such as under-estimation of variances or a (more or less) complete breakdown in the presence of weak experts. Closed-form computations allow for efficient and straightforward parallelisation and distributed computing with a small memory footprint, without requiring an explicit sparse approximation. Since training and prediction are independent of the computational graph, the model can be used on heterogeneous computing infrastructures, ranging from laptops to large clusters. We provide strong experimental evidence that the gBCM works well on large data sets.
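To make the closed-form recombination concrete, here is a minimal NumPy sketch of a BCM-style combination of independent GP expert predictions at a single test point. The function name `gbcm_combine` and the weighting scheme (per-expert weights `beta_k`, with a prior-precision correction term) are illustrative assumptions based on the abstract's description of the model family, not the paper's exact implementation; setting all `beta_k = 1` recovers the classical Bayesian Committee Machine combination rule.

```python
import numpy as np

def gbcm_combine(means, variances, betas, prior_var):
    """Combine independent GP expert predictions at one test input.

    Hedged sketch of a generalised-BCM-style rule (names and weighting
    are illustrative assumptions, not the paper's exact formulation):

        precision = sum_k beta_k / var_k + (1 - sum_k beta_k) / prior_var
        mean      = (1 / precision) * sum_k beta_k * mean_k / var_k

    With all beta_k = 1 this is the classical Bayesian Committee
    Machine; without the prior-correction term it reduces to a
    (generalised) product-of-experts combination.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    betas = np.asarray(betas, dtype=float)

    # Weighted sum of expert precisions, corrected so the GP prior
    # (assumed zero-mean, variance prior_var) is not counted K times.
    precision = np.sum(betas / variances) + (1.0 - np.sum(betas)) / prior_var
    var = 1.0 / precision
    mean = var * np.sum(betas * means / variances)
    return mean, var

# Example: two experts that agree, combined under the BCM rule.
m, v = gbcm_combine(means=[1.0, 1.0], variances=[0.5, 0.5],
                    betas=[1.0, 1.0], prior_var=1.0)
```

Because each expert's prediction is a Gaussian with closed-form mean and variance, this recombination needs only the per-expert summaries, which is what makes the hierarchical, distributed computation of the abstract straightforward: experts can run on separate machines and ship back two numbers per test point.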

Link to the corresponding working paper:

Marc Deisenroth's web link


Lunch will be provided after this seminar.