
Tutorial Speakers
Nonparametric Bayesian uncertainty quantification
Aad van der Vaart, Leiden University
In Bayesian nonparametrics a functional parameter (density, regression function) is equipped with a prior distribution, and a posterior distribution is obtained in the standard manner. The center of the posterior distribution can be used as a point estimator. It has been documented in fair generality that reasonable priors give good reconstructions of an unknown function. In particular, priors that come with a bandwidth or sparsity parameter that is tuned to the data by a hierarchical or empirical Bayes method typically lead to posterior distributions that contract at optimal rates to the true function. However, at the core of the Bayesian method is uncertainty quantification through the full spread of the posterior distribution. For instance, one would hope that the area covered by a plot of a sample of draws (functions) from the posterior distribution can be interpreted as a confidence set. The purpose of the three talks is to investigate to what extent this is justified. A full and general answer is currently not available, but we discuss results for special models, which are thought to extend to other models as well. The talks will assume no prior knowledge of Bayesian nonparametrics; we shall start with examples of priors and contraction rates.
Statistical inference with high-dimensional data
Cun-Hui Zhang, Rutgers University
We consider a semi low-dimensional approach to statistical inference with high-dimensional data. The approach is best described with the following model statement:
model = low-dimensional component + high-dimensional component.
The main objective of this approach is to develop asymptotically efficient statistical inference procedures for the low-dimensional component, such as p-values and confidence regions. Just as in semiparametric inference, a sufficiently accurate estimate of the high-dimensional component is required in order to carry out the inference for the low-dimensional component. The feasibility of estimating the high-dimensional component at the required accuracy depends on the model complexity and ill-posedness, signal strength, the type of low-dimensional inference problem under consideration, and sometimes availability of certain ancillary information. We will consider linear regression and the Gaussian graphical model as primary examples. We will describe concave penalized methods which take advantage of partial signal strength, strategies and algorithms for debiasing the Lasso and concave penalized estimators, the sample size requirement for the debiasing methods to work, and the contributions of unlabeled data in semi-supervised regression.
Confidence regions in high-dimensional and nonparametric statistical models
Richard Nickl, Cambridge University
In high-dimensional and nonparametric statistical models, optimal (adaptive) estimators typically require a model selection, dimension reduction or regularisation step, and as a consequence using them for inference is a non-obvious task. In particular, 'uncertainty quantification', that is, the construction of adaptive 'honest' confidence regions that are valid uniformly in the parameter space, may not be straightforward or may even be impossible. We will explain the main ideas of a decision-theoretic framework (that has emerged in the last 10 years or so) that gives general information-theoretic conditions which allow one to check whether honest confidence sets exist or not in a given statistical model, and, when the answer is negative, which 'signal strength' conditions are required to make adaptive inference. These conditions involve the minimax solution of certain composite high-dimensional testing problems, somewhat related to the minimax 'signal detection' problem. I will show how the general theory can be applied to several examples, such as sparse or nonparametric regression, density estimation, low-rank matrix recovery and matrix completion. We will also describe some concrete uncertainty quantification procedures, Bayesian and non-Bayesian, that can be used in such models.
Variational Inference: Foundations and Innovations
David Blei, Columbia University
One of the core problems of modern statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in probabilistic modeling, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this tutorial I review and discuss variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and tends to be faster than more traditional methods, such as Markov chain Monte Carlo sampling.
Variational inference was brought into machine learning in the 1990s; recent advances in fidelity and simplified implementation have renewed interest in, and application of, this class of methods. This tutorial aims to provide an introduction to VI, a modern view of the field, and an overview of the role that probabilistic inference plays in many of the central areas of machine learning.
First, I will provide a broad review of variational inference. This serves as an introduction to (or review of) its central concepts. Second, I develop and connect some of the pivotal tools for VI that have been developed in the last few years, tools like Monte Carlo gradient estimation, black box variational inference, stochastic variational inference, and variational autoencoders. These methods have led to a resurgence of research and applications of VI. Finally, I discuss some of the unsolved problems in VI and point to promising research directions.
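As a brief illustration of "approximation through optimization", the standard identity behind VI can be written as follows. The notation is an assumption made here for illustration and is not taken from the tutorial: x denotes the observations, z the unknown quantities, and q a candidate approximating distribution.

```latex
% Assumed notation: x = observed data, z = latent variables, q = variational distribution.
\log p(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x, z)\right] - \mathbb{E}_{q(z)}\!\left[\log q(z)\right]}_{\mathrm{ELBO}(q)}
\;+\; \mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right)
```

Since log p(x) does not depend on q, maximizing the ELBO over a family of distributions q is equivalent to minimizing the Kullback-Leibler divergence from q to the posterior p(z | x).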
Contributed Speakers
Posterior Contraction and Credible Sets for Multivariate Regression Mode with Two-stage Improvements
William Weimin Yoo, Leiden University
Locating the maximum of a function and its size in the presence of noise is an important problem. The optimal rates for estimating them are respectively the same as those of estimating the function and all its first-order partial derivatives, if one is allowed to sample in one shot only. It has been recently observed that substantial improvements are possible when one can obtain samples in two stages: a pilot estimate obtained in the first stage guides the choice of optimal sampling locations for the second stage. If the second stage design points are chosen appropriately, the second stage rate can match the optimal sequential rate. In the Bayesian paradigm, one can naturally update uncertainty quantification based on past information and hence the two-stage method fits very naturally within the Bayesian framework. Nevertheless, Bayesian two-stage procedures for mode-hunting have not been studied in the literature. In this talk, we provide posterior contraction rates and Bayesian credible sets with guaranteed frequentist coverage, which will allow us to quantify the uncertainty in the process. We consider anisotropic functions where function smoothness varies by direction. We use a random series prior based on tensor product B-splines with normal basis coefficients for the underlying function, and the error variance is either estimated using empirical Bayes or is further endowed with a conjugate inverse-gamma prior. The credible set obtained in the first stage is used to mark the sampling area for second stage sampling. We show that the second stage estimation achieves the optimal sequential rate and avoids the curse of dimensionality. This research is joint work with Dr. Subhashis Ghosal of North Carolina State University.
Efficient semiparametric estimation and model selection for multidimensional mixtures
Elodie Vernet, Cambridge University
Obtaining theoretical guarantees (such as uncertainty quantification) in the context of parameter estimation may be challenging in mixture models. Note that identifiability is already not trivial in these models. In this presentation, I will discuss efficiency in the context of nonparametric mixture models. More precisely, we consider mixture models where the i.i.d. observations have at least three components which are independent given the population of the observation. We do not assume a parametric model for the emission distributions, that is, the distributions of the observation given its population, and we are interested in the semiparametric estimation of the proportion of each population. Using a discretisation of the problem via projection of the densities onto histograms, we obtain an asymptotically efficient estimator. In the Bayesian setting, using a sequence of prior distributions defined on more and more complex sets as the number of observations increases, we show that the associated sequence of posterior distributions satisfies a Bernstein-von Mises type theorem with efficient Fisher information for the semiparametric problem as variance. These two asymptotic results hold provided the complexity of the approximating models does not increase too fast compared to the number of observations. We then propose a cross-validation-like procedure to select the complexity of the model in a finite horizon. This proposed procedure satisfies an oracle inequality.
These results are part of joint work with Elisabeth Gassiat and Judith Rousseau. Reference: https://arxiv.org/abs/1607.05430.
Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics
Julyen Arbel, Inria Grenoble Rhône-Alpes
Given a sample of size n from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability D_n(l) that the (n+1)th draw coincides with a species with frequency l in the sample, for any l=0,1,...,n. We explore in this talk a Bayesian nonparametric viewpoint for inference of D_n(l). Specifically, under the general framework of Gibbs-type priors we show how to derive credible intervals for the Bayesian nonparametric estimator of D_n(l), and we investigate the large n asymptotic behavior of such an estimator. We also compare this estimator to the classical Good–Turing estimator (joint work with Stefano Favaro (Collegio Carlo Alberto & University of Torino), Bernardo Nipoti (Trinity College Dublin) and Yee Whye Teh (Oxford University)).
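For readers unfamiliar with the classical comparator, the following is a minimal sketch of the Good–Turing estimate of D_n(l) in its usual form (l+1) m_{l+1} / n, where m_{l+1} is the number of species observed exactly l+1 times. The function name and the toy sample are illustrative assumptions; this is not the Bayesian nonparametric estimator discussed in the talk.

```python
from collections import Counter


def good_turing_discovery(sample, l):
    """Classical Good-Turing estimate of D_n(l): (l + 1) * m_{l+1} / n.

    `sample` is a sequence of species labels (hypothetical input format);
    m_{l+1} is the number of distinct species seen exactly l + 1 times.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if l < 0:
        raise ValueError("l must be non-negative")
    species_counts = Counter(sample)                  # species -> frequency
    freq_of_freq = Counter(species_counts.values())   # frequency -> number of species
    return (l + 1) * freq_of_freq.get(l + 1, 0) / n


# Toy data (assumed for illustration): frequencies a:2, b:1, c:3, d:1, so n = 7.
toy_sample = ["a", "a", "b", "c", "c", "c", "d"]
estimates = [good_turing_discovery(toy_sample, l) for l in range(len(toy_sample) + 1)]
```

With l = 0 the estimate reduces to the proportion of singletons, which is the Good–Turing estimate of the probability of discovering a new species at the next draw.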
Hierarchical hazard rates for partially exchangeable survival times
Federico Camerlenghi, Bocconi University
Survival analysis represents one of the first areas of application of Bayesian nonparametric techniques. A large amount of literature has been developed to model prior distributions of hazard rates for exchangeable, and possibly censored, survival times. Exchangeability corresponds to assuming homogeneity among the data, which is quite restrictive in a large variety of applied problems where data are generated by different experiments. Even if these experiments may be related, they represent a source of heterogeneity that cannot be accommodated by the exchangeability assumption. Hence, one needs to resort to more general dependence structures. In such situations partial exchangeability is a more suitable assumption. Here we define a novel class of dependent random hazard rates, which work as prior distributions in the presence of partially exchangeable survival times. They are expressed as mixtures of kernels with respect to a vector of hierarchical completely random measures, which has the advantage of enabling dependence across the diverse groups of observations. We characterize the posterior distribution of the hierarchical completely random measures, which is the key tool to estimate the survival functions through a Markov chain Monte Carlo algorithm. In addition, we are able to obtain reliable credible intervals for the estimated quantities by developing a novel and efficient Ferguson & Klass-type algorithm that avoids marginalizing out the infinite-dimensional random elements of the model. Finally we concentrate on some illustrative examples, both real and simulated, to show the benefits of the whole construction (joint work with Antonio Lijoi and Igor Prünster).
Resampling-Based Inference for the Mann-Whitney Effect for Right-Censored and Tied Data
Dennis Dobler, Ulm University
In a two-sample survival setting with independent survival variables $T_1$ and $T_2$ and independent right-censoring, the Mann-Whitney effect $p = P(T_1 > T_2) + \frac12 P(T_1 = T_2)$ is an intuitive measure for discriminating two survival distributions. Comparing two treatments, the case $p > 1/2$ suggests the superiority of the first. Nonparametric maximum likelihood estimators based on normalized Kaplan-Meier estimators naturally handle tied data, which are omnipresent in practical applications. Studentizations allow for asymptotically accurate inference for $p$. For small samples, however, coverage probabilities of confidence intervals are considerably enhanced by means of bootstrap and permutation techniques. The latter even yields finitely exact procedures in the situation of exchangeable data. Simulation results support all theoretical properties under various censoring and distribution setups.
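To make the definition of the Mann-Whitney effect concrete, here is a minimal sketch of its plug-in estimate in the simplest possible case. It assumes fully observed (uncensored) survival times, so it is not the Kaplan-Meier-based estimator of the talk; the function name and toy data are illustrative assumptions.

```python
from itertools import product


def mann_whitney_effect(sample1, sample2):
    """Plug-in estimate of p = P(T1 > T2) + 0.5 * P(T1 = T2).

    Assumes uncensored data: every pair (t1, t2) contributes 1 if t1 > t2,
    0.5 if the two values are tied, and 0 otherwise.
    """
    if not sample1 or not sample2:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    for t1, t2 in product(sample1, sample2):
        if t1 > t2:
            score += 1.0
        elif t1 == t2:
            score += 0.5  # ties count half, as in the definition of p
    return score / (len(sample1) * len(sample2))


# Toy data with ties (assumed for illustration).
p_hat = mann_whitney_effect([3, 5, 5], [1, 5])
```

Counting ties with weight one half is what makes the estimate equal to 1/2 when the two samples coincide, which matches the interpretation of $p > 1/2$ as favouring the first treatment.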
Pseudo-Marginal Monte Carlo for the Bayesian Gaussian Process Latent Variable Model
Charles Gadd, University of Warwick
Gaussian process latent variable models (GPLVMs) can be viewed as a nonlinear extension to the dual of probabilistic principal component analysis, where in the dual we instead optimize the latent variables and marginalize the transformation matrix. In recent years these models have emerged as a powerful tool for modelling multidimensional data. One variant is the Bayesian GPLVM (BGPLVM) which allows for the additional marginalisation of latent variables using variational Bayes and variational sparse GP regression. We focus on a further generalization, the dynamic BGPLVM for supervised learning, which incorporates general input information through a GP prior. In GP models we choose to parameterize our kernels with a set of hyperparameters to allow for a degree of flexibility. Having marginalized over the latent space it is common to optimise the variational parameters and hyperparameters simultaneously through maximum likelihood. However, a fully Bayesian model would both infer all parameters and latent variables and integrate over them with respect to their posterior distributions to account for their uncertainty when making predictions. Unfortunately it is not possible to obtain these analytically. We may choose to perform this inference using stochastic approximations based on MCMC, but find that the strong coupling between the latent variables and hyperparameters a posteriori provides a challenge when sampling and results in poor mixing. To break this correlation when sampling we propose the use of pseudo-marginal Monte Carlo, approximately integrating out the latent variables while retaining the exact posterior distribution over hyperparameters as the invariant distribution of our Markov chain and ergodicity properties. This work shows the ability of a fully Bayesian treatment to better quantify uncertainty when compared to the maximum likelihood or other optimization-based approaches (joint work with Sara Wade and Akeel Shah).
Needles and Straw in a Haystack: Robust Empirical Bayes Confidence for Possibly Sparse Sequences
Nurzhan Nurushev, VU Amsterdam
In the signal+noise model (the noise is not necessarily independent normal) we construct an empirical Bayes posterior which we then use for uncertainty quantification for the unknown, possibly sparse, signal. We introduce a novel excessive bias restriction (EBR) condition, which gives rise to a new slicing of the entire space that is suitable for uncertainty quantification. Under EBR and some mild conditions on the noise, we establish the local (oracle) confidence optimality of the empirical Bayes credible ball. In passing, we also get the local optimal results for estimation and posterior contraction problems. Adaptive minimax results (also for the estimation and posterior contraction problems) over sparsity classes follow from our local results.
Estimation of a two-component mixture model with applications to multiple testing
Rohit Patra, University of Florida
We consider estimation and inference in a two-component mixture model where the distribution of one component is completely unknown. We develop methods for estimating the mixing proportion and the unknown distribution nonparametrically, given i.i.d. data from the mixture model. We use ideas from shape-restricted function estimation and develop "tuning parameter free" estimators that are easily implementable and have good finite-sample performance. We establish the consistency of our procedures. Distribution-free finite-sample lower confidence bounds are developed for the mixing proportion. We discuss the connection with the problem of multiple testing and compare our procedure with some of the existing methods in that area through simulation studies.

