Dr Thomas Berrett (University of Cambridge)

Fri 25 Jan 2019, 15:05 - 16:00

JCMB 5323

If you have a question about this talk, please contact: Tim Cannings (tcannin2)

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this talk I will first describe new entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. These estimators are constructed as weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the k-nearest neighbour distances of a sample of n independent and identically distributed random vectors in d dimensions. A careful choice of weights enables us to obtain an efficient estimator for arbitrary d, given sufficient smoothness, while the original unweighted estimator is typically only efficient for d up to 3.
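The unweighted Kozachenko–Leonenko estimator mentioned above can be sketched in a few lines. This is a minimal illustration, not the speaker's code: it computes the classical estimate from the k-nearest-neighbour distances of the sample, and the function name and choice of k are my own.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=1):
    """Kozachenko-Leonenko k-NN entropy estimate (in nats) for an
    (n, d) array of i.i.d. sample points."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Distance from each point to its k-th nearest neighbour,
    # excluding the point itself (hence k + 1 in the query).
    rho = cKDTree(x).query(x, k=k + 1)[0][:, k]
    # Log-volume of the unit ball in d dimensions.
    log_vd = (d / 2) * np.log(np.pi) - gammaln(1 + d / 2)
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(rho))
```

The weighted estimators discussed in the talk replace this single estimate by a weighted average over several values of k; the sketch above corresponds to the original unweighted version.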

In the next part of the talk I will discuss the problem of independence testing. Our previous work on entropy estimation will allow us to propose a test of independence of two multivariate random vectors, given a sample from the underlying population. The approach, which we call MINT, is based on the estimation of mutual information, which we may decompose into joint and marginal entropies. The proposed critical values, obtained by simulation when an approximation to one marginal is available and by permutations of the data otherwise, facilitate size guarantees, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide a new goodness-of-fit test for normal linear models, based on assessing the independence of the vector of covariates and an appropriately defined notion of an error vector.
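The permutation-calibrated version of this idea can be illustrated as follows. This is only a sketch of the general recipe (estimate I(X;Y) = H(X) + H(Y) - H(X,Y) via entropy estimates, then calibrate by permuting one sample), not the MINT procedure as published; the function names, the use of the unweighted Kozachenko–Leonenko estimator, and the default parameters are all my own choices.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN entropy estimate (in nats)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    rho = cKDTree(x).query(x, k=k + 1)[0][:, k]
    log_vd = (d / 2) * np.log(np.pi) - gammaln(1 + d / 2)
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(rho))

def mi_permutation_test(x, y, k=3, n_perm=200, rng=None):
    """Independence test in the spirit of MINT: estimate the mutual
    information I(X;Y) = H(X) + H(Y) - H(X,Y) and calibrate the
    critical value by permuting the y-sample."""
    rng = np.random.default_rng() if rng is None else rng

    def mi(a, b):
        return kl_entropy(a, k) + kl_entropy(b, k) - kl_entropy(np.hstack([a, b]), k)

    stat = mi(x, y)
    # Permuting y breaks any dependence while preserving both marginals.
    perm = np.array([mi(x, y[rng.permutation(len(y))]) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= stat)) / (1 + n_perm)
    return stat, p_value
```

Under strong dependence the observed mutual-information estimate exceeds essentially all permuted values, so the p-value is close to 1/(n_perm + 1); under independence the statistic is exchangeable with the permuted ones, which is what gives the permutation test its size guarantee.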