# European Institute for Statistics, Probability, Stochastic Operations Research and their Applications


# Abstracts of poster session

Reversible Markov processes and their linear functions
Martial Longla, University of Mississippi

We propose some backward-forward martingale decompositions for functions of reversible Markov chains. These decompositions are used to prove the functional central limit theorem for reversible Markov chains with asymptotically linear variance of partial sums. We also provide a proof of the equivalence between asymptotic linearity of the variance and convergence of the integral of $1/(1-t)$ with respect to the associated spectral measure $\rho$. We show a result on uniform integrability of the supremum of the average sum of squares of martingale differences. We also study the asymptotic behavior of linear processes having as innovations mean zero square integrable functions of stationary reversible Markov chains. We include in our study the long range dependence case. We apply this study to several cases of reversible stationary Markov chains that arise in regression estimation.
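The link between linear variance growth and the integral of $1/(1-t)$ can be checked by hand in the simplest reversible setting. The sketch below is only an illustration under assumptions that are not part of the poster: a two-state chain with hypothetical transition probabilities `a` and `b`, and a hypothetical centred function `f`. For such a chain every centred function is an eigenfunction with eigenvalue $\lambda = 1-a-b$, so the spectral measure is a point mass at $\lambda$ and the variance of the partial sums, divided by $n$, should approach $\sigma^2(1+\lambda)/(1-\lambda)$.

```python
def apply_kernel(P, g):
    """Return the vector P g, i.e. x -> E[g(X_1) | X_0 = x]."""
    return [sum(P[i][j] * g[j] for j in range(len(g))) for i in range(len(P))]


def partial_sum_variance(P, pi, f, n):
    """Exact Var(f(X_1) + ... + f(X_n)) for a stationary chain and pi-centred f."""
    total = n * sum(p * x * x for p, x in zip(pi, f))
    g = list(f)
    for k in range(1, n):
        g = apply_kernel(P, g)  # g now holds P^k f
        cov_k = sum(p * x * y for p, x, y in zip(pi, f, g))
        total += 2.0 * (n - k) * cov_k
    return total


# Hypothetical two-state chain (always reversible) and centred function.
a, b = 0.2, 0.3
P = [[1.0 - a, a], [b, 1.0 - b]]
pi = [b / (a + b), a / (a + b)]          # stationary law (0.6, 0.4)
f = [1.0, -1.5]                          # mean zero under pi
lam = 1.0 - a - b                        # second eigenvalue of P
sigma2 = sum(p * x * x for p, x in zip(pi, f))
limit = sigma2 * (1.0 + lam) / (1.0 - lam)

n = 2000
print(partial_sum_variance(P, pi, f, n) / n, limit)
```

Here the integral of $1/(1-t)$ against the spectral measure is simply $\sigma^2/(1-\lambda)$, which is finite because $\lambda < 1$.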

Wasserstein distances between discretely observed Lévy processes
Ester Mariucci, Humboldt University

We present some upper bounds for the Wasserstein distance of order $p$ between the product measures associated with the increments of two independent Lévy processes with possibly infinite Lévy measures. As an application, we deduce an upper bound for the total variation distance between the marginals of two independent Lévy processes with possibly infinite Lévy measures and non-zero Gaussian components. Also, a lower bound for the Wasserstein distance of order $p$ between the marginals of two independent Lévy processes is discussed.
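For readers less familiar with the Wasserstein distance of order $p$, the following minimal sketch computes it between the empirical measures of two one-dimensional samples of equal size, where the optimal coupling simply matches sorted values. It is not the bound from the poster, and the function name `empirical_wasserstein` is a hypothetical choice made for this illustration.

```python
def empirical_wasserstein(xs, ys, p=1):
    """Wasserstein-p distance between two equal-size 1-D empirical measures.

    In one dimension the optimal transport plan pairs the k-th smallest
    point of one sample with the k-th smallest point of the other.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("the order p must be at least 1")
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / len(xs)
    return cost ** (1.0 / p)


# Shifting a sample by a constant moves it by exactly that constant.
sample = [0.3, -1.2, 2.5, 0.0]
shifted = [x + 1.0 for x in sample]
print(empirical_wasserstein(sample, shifted, p=2))
```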

Supervised classification for functional data with a reject option
Diego Andres Perez Ruiz, The University of Manchester

This talk considers the problem of supervised classification with a reject option applied to functional data. Classifiers for functional data pose a challenge because a probability density function for functional data does not exist. It is therefore common to construct classifiers based on projections of the data. Classifiers with rejection are common in real-world applications where the presence and cost of errors are detrimental to performance. We start by discussing the binary case for plug-in estimators and empirical risk minimisers. The case where the costs associated with the two possible errors differ is also discussed.
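A plug-in classifier with a reject option can be sketched with the classical thresholding rule: abstain whenever the estimated posterior probability is too close to 1/2. The code below assumes that an estimate `eta` of $P(Y=1 \mid X=x)$ is already available (for functional data it would typically come from projections of the curves) and that rejection has a cost `d` between 0 and 1/2. These names are assumptions for illustration, not the notation of the talk.

```python
REJECT = -1  # label used here to signal "no decision"


def classify_with_reject(eta, d):
    """Binary plug-in rule with rejection.

    eta : estimated posterior probability of class 1 at the point of interest
    d   : cost of rejecting, relative to a misclassification cost of 1
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be a probability")
    if not 0.0 <= d <= 0.5:
        raise ValueError("d must lie in [0, 1/2]")
    # The conditional error of the best guess is min(eta, 1 - eta);
    # abstain when this error exceeds the price of rejecting.
    if min(eta, 1.0 - eta) > d:
        return REJECT
    return 1 if eta >= 0.5 else 0


for eta in (0.9, 0.55, 0.1):
    print(eta, classify_with_reject(eta, d=0.2))
```

With `d = 0.5` the rule never rejects and reduces to the usual plug-in classifier.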

Spatial quantile prediction for elliptical random fields
Antonio Usseglio-Carleve, Université de Lyon

In this work, we consider elliptical random fields. We propose some spatial quantile predictions at one site given observations at some other locations. To this aim, we first give exact expressions for conditional quantiles, and discuss problems that occur when computing these values. A first affine regression quantile predictor is detailed, an explicit formula is obtained, and its distribution is given. Direct simple expressions are derived for some particular elliptical random fields. The performance of this regression quantile is shown to be very poor for extremal quantile levels, so that a second predictor is proposed. We prove that this new extremal prediction is asymptotically equivalent to the true conditional quantile. Through numerical illustrations, the study shows that quantile regression may perform poorly when one leaves the usual Gaussian random field setting, justifying the use of the proposed extremal quantile predictions. To conclude, we give some similar results for spatial expectile prediction, another risk measure (joint work with V. Maume-Deschamps, D. Rulliere).
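In the Gaussian special case the conditional quantile has a closed form, which gives a useful baseline before moving to general elliptical fields. The sketch below assumes a standardised Gaussian field observed at a single neighbouring site, with a hypothetical correlation `correlation` between the two sites. It does not implement the predictors of the poster.

```python
from math import sqrt
from statistics import NormalDist


def gaussian_conditional_quantile(x_obs, correlation, alpha):
    """Quantile of level alpha of Y given X = x_obs.

    (X, Y) is assumed bivariate Gaussian with zero means, unit variances
    and the given correlation, so Y | X = x is N(r * x, 1 - r^2).
    """
    if not -1.0 < correlation < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    conditional_mean = correlation * x_obs
    conditional_sd = sqrt(1.0 - correlation ** 2)
    return conditional_mean + conditional_sd * NormalDist().inv_cdf(alpha)


for level in (0.5, 0.95, 0.999):
    print(level, gaussian_conditional_quantile(1.0, 0.5, level))
```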

Rank-based permutation approaches for nonparametric factorial designs
Maria Umlauft, Ulm University

In our work, we develop an exact rank-based permutation test for factorial designs. Typical restrictions of standard methods for analysing such data are equal variances across groups, balanced sample sizes or normally distributed error terms. However, these assumptions are often not met in real data. One alternative is given by the well-known Wald-type statistic (WTS) in semiparametric models, which also leads to asymptotically exact results in cases of heteroscedasticity and no particular error distribution. Recently, a modified permutation test based on the WTS has been developed to improve its small sample properties, see Pauly et al. (2015). However, if ordinal or ordered categorical data are present, the classical parametric and semiparametric models show their limits, since calculated means are neither meaningful nor suitable. Consequently, the aim of our work is to extend the Wald-type permutation statistic (WTPS) to a rank-based WTPS and therefore to nonparametric factorial designs allowing for all kinds of observations such as discrete, ordinal or count data. Despite the difficulty of dependencies, it is shown that this rank-based approach is also approximately exact and consistent. The small sample behaviour of the nonparametric Wald-type permutation statistic is compared to other well-known techniques in an extensive simulation study (joint work with Frank Konietschke and Markus Pauly).
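The general idea of permuting ranks can be illustrated in the simplest one-factor layout with two groups. The sketch below is a deliberately simplified stand-in: it uses the absolute difference of mean mid-ranks as the statistic, not the Wald-type permutation statistic of the poster, and it enumerates all group assignments, which is only feasible for small samples. All function names are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Mid-ranks of the values: tied observations share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def exact_rank_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value based on mean mid-ranks."""
    ranks = midranks(list(group_a) + list(group_b))
    n, n_a = len(ranks), len(group_a)
    rank_total = sum(ranks)

    def statistic(indices_a):
        sum_a = sum(ranks[i] for i in indices_a)
        return abs(sum_a / n_a - (rank_total - sum_a) / (n - n_a))

    observed = statistic(range(n_a))
    hits = total = 0
    for indices in combinations(range(n), n_a):
        total += 1
        if statistic(indices) >= observed - 1e-12:
            hits += 1
    return hits / total


# Ordinal scores with ties, as they might arise from a rating scale.
print(exact_rank_permutation_pvalue([1, 2, 2, 3], [3, 4, 4, 5]))
```

Because only ranks enter the statistic, the p-value is unchanged under any strictly increasing transformation of the data, which is what makes this kind of procedure suitable for ordinal observations.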

Permuting longitudinal data in spite of the dependencies
Sarah Friedrich, Institute of Statistics, Ulm University

In many experiments in the life, social or psychological sciences, the experimental units are observed at different occasions, e.g. different time points. This leads to certain dependencies between observations from the same unit and results in a more complicated statistical analysis. Classical repeated measures models assume that the observation vectors are independent with normally distributed error terms and a common covariance matrix for all groups. As demonstrated by a practical data set from biometric research, however, these two assumptions are often not met and may inflate the type-I error rates of the corresponding procedures. We present a different approach working under covariance heterogeneity and without postulating any specific underlying distribution. For such general models only a few inference procedures are known, which typically do not possess good finite sample properties [1]. For example, the asymptotically pivotal Wald-type test statistic (WTS) is known to be very liberal for small sample sizes and repeated measures. We improve its small sample behavior under the null hypothesis by a studentized permutation technique generalizing the results of [3]. The behavior of the permutation procedure is analyzed in extensive simulation studies with different simulation designs and compared to various competitors. Despite all the dependencies in the repeated measures design, the permutation procedure leads to astonishingly successful results. Moreover, the theoretical properties of the method are analyzed and it is applied to a practical data set (joint work with Edgar Brunner, University Medical Center Göttingen, Germany, and Markus Pauly).

The importance of quantifying measurement error and uncertainty in biological data
Erica Ponzi, Institute of Evolutionary Biology, University of Zurich

We analyzed the effect of error on models and parameter estimations in evolutionary biology. In this context, it is not trivial to collect precise information, and measurement error or missing data occur quite often. This is very likely to affect the estimation of the parameters of interest and add more uncertainty to the models. In particular, we analyzed its effect on the estimation of heritability and inbreeding depression, which are widely used in the context of evolution and response to selection. We identified three different sources of error that can affect such estimates, namely the error in the phenotypic measurements, the error in covariates and the pedigree error. We saw that the presence of randomly added phenotypic errors caused an underestimation and a considerable increase of uncertainty in the heritability of phenotypic traits in a population of song sparrows, and we obtained the same results in a second dataset on a population of snow voles. By simulating errors in covariates such as sex and age, we observed the same underestimation and increased uncertainty in the estimates of heritability in both datasets.
Finally, we analyzed pedigree errors in the song sparrow dataset, where some paternities are wrongly assigned, and observed a bias in both inbreeding depression and heritability and an increase in the uncertainty of such estimates. In all the analyzed cases, the bias and the increased uncertainty led to inaccurate or even wrong conclusions, failing to detect some real effects or overestimating some others. Our recommendation is therefore to always take this possibility into account and try to handle it correctly. A first improvement can be obtained by taking measurements as accurately as possible and, if feasible, repeating them multiple times. Even in this case, we suggest accounting for this phenomenon and correcting for the bias using measurement error models and techniques. For this purpose, knowledge of the quantity of initial error and of the possible sources that may produce it is advisable and can lead to better and more precise estimates (joint work with Stefanie Muff, Lukas Keller).

Proportion of people living alone by age, gender, area type and municipality in South Ostrobothnia region
Urszula Zimoch, Social Sciences of the University of Helsinki

The number of people living alone in Finland, as well as worldwide, has risen over the past few decades and is forecast to keep growing. People living alone can be seen as a vulnerable group with higher risks of socio-economic problems such as social exclusion, poverty and lower wellbeing. Finnish municipalities are obligated to provide a wide range of social and health care services to all their citizens. Among people living alone, gender matters, as men and women may have different skills and therefore different service needs. Age-wise, the younger population tends to have higher mobility, better health and better social skills than older citizens. Rural areas, including sparsely populated rural areas, dominate the Finnish landscape and raise the issue of distances between citizens and service centres. Introduced in 2013 by the Finnish Environment Institute, the new urban-rural classification of land areas defines seven different area classes and gives a unique opportunity to analyse the structure of people living alone in different areas. The research is based on official municipal and urban-rural population statistics for 2014 (Statistics Finland), as well as the land area classification (SYKE). The Iterative Proportional Fitting Procedure (IPFP; Bishop et al., 1975) was used to create a set of gender- and age-specific two-way tables presenting the municipal and urban-rural distribution of people living alone. The data are currently being analysed with logistic regression in order to answer the question of what effect age, gender, area type and municipality have on the proportion of people living alone in Finland. In the context of the YES VIII workshop, the most important questions are the convergence of the IPFP and the estimation of the regression parameters.
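The iterative proportional fitting step can be sketched as follows. This is a minimal two-way version under assumed inputs: the seed table, the margins and the function name `ipf` are hypothetical and are not taken from the Finnish data. Rows and columns are rescaled in turn until both sets of margins are matched, and the number of sweeps is returned so that convergence can be monitored.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Fit a two-way table to given row and column totals.

    seed        : initial table (list of rows) carrying the association structure
    row_targets : desired row sums
    col_targets : desired column sums (must add up to the same grand total)
    Returns the fitted table and the number of sweeps used.
    """
    table = [[float(v) for v in row] for row in seed]
    for sweep in range(1, max_iter + 1):
        # Rescale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        # Rescale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows match as well.
        error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if error < tol:
            return table, sweep
    raise RuntimeError("IPF did not converge within max_iter sweeps")


# Hypothetical example: municipalities in rows, area classes in columns.
fitted, sweeps = ipf([[5, 1], [2, 4]], row_targets=[30, 70], col_targets=[40, 60])
print(fitted, sweeps)
```

Zero cells in the seed stay zero throughout, which is one of the situations in which convergence can fail and is worth checking on real tables.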

Bayesian Singular Spectrum Analysis Forecasting using State Dependent Parameters
Donya Rahmani, University of Bournemouth

The Multivariate Singular Spectrum Analysis (MSSA) forecasting algorithm has an underlying assumption that a time series is governed by a linear recurrent continuation. However, in the presence of a structural break the multiple series will be transferred from one homogeneous state to another over a comparatively short time. Therefore, the linear recurrent formula (LRF) does not coincide with a recurrent continuation of the series before being perturbed. In this paper, we propose a state dependent model to incorporate the movement of states in the LRF called multivariate Bayesian SSA (MBSSA).
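To make the notion of a linear recurrent continuation concrete, the sketch below forecasts a series by repeatedly applying a linear recurrent formula. It is an illustration only: in SSA the coefficients are estimated from the data, whereas here they are taken as known for a pure sine wave, which satisfies $x_n = 2\cos(\omega)\,x_{n-1} - x_{n-2}$. The function name `lrf_forecast` is hypothetical.

```python
import math


def lrf_forecast(history, coefficients, steps):
    """Continue a series with x_n = sum_k coefficients[k] * x_{n-1-k}.

    coefficients[0] multiplies the most recent value, coefficients[1]
    the one before it, and so on.
    """
    if len(history) < len(coefficients):
        raise ValueError("history is shorter than the recurrence order")
    series = list(history)
    for _ in range(steps):
        recent = reversed(series[-len(coefficients):])
        series.append(sum(c * x for c, x in zip(coefficients, recent)))
    return series[len(history):]


omega = 0.3
history = [math.sin(omega * k) for k in range(10)]
forecast = lrf_forecast(history, [2.0 * math.cos(omega), -1.0], steps=5)
print(forecast)
```

After a structural break the series no longer follows the recurrence fitted before the break, which is the situation the abstract addresses.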

The performance of the proposed model is assessed using both synthetic and real data (industrial production series) including a structural break. Compared with the three other methods considered (MSSA, VAR and VECM), MBSSA was the most accurate for forecasting horizons up to a year. We were also interested to see how the model would perform in the presence of cointegration between all series (eight series for each country), given the promising results of the bivariate model. Our results showed that the model performed poorly compared to its bivariate counterpart. An obstacle to the performance of this model is a lack of strong separability in SSA. Therefore, we applied ICA-MBSSA to overcome this issue, which also increased forecast accuracy.

In practical terms, it seems sensible to incorporate state transitions into SSA forecasting for a time series with a structural break. However, the choice of the smoothing factors and $\sigma^2_\epsilon$ in nonlinear models is open to debate: they can be initialised at fixed values or allowed to vary from one point to the next, so that the algorithm learns the most appropriate values. Moreover, the proposed model needs to be more general and flexible regardless of the parameter values for regime switching models. The latter property is probably the most interesting one to investigate further in the context of forecasting series such as electrical demands that present a weekly regime change.

Permuting Incomplete Paired Data: A Novel Exact and Asymptotic Correct Randomization Test
Lubna Amro, Institute of Statistics, University of Ulm

Various statistical tests have been developed for testing the equality of means in matched pairs with missing values. However, most existing methods are commonly based on certain distributional assumptions such as normality, 0-symmetry or homoscedasticity of the data. The aim of this paper is to develop a statistical test that is robust against deviations from such assumptions and also leads to valid inference in case of heteroscedasticity or skewed distributions. This is achieved by applying a clever randomization approach to handle missing data. The resulting test procedure is not only shown to be asymptotically correct but is also finitely exact if the distribution of the data is invariant with respect to the considered randomization group. Its small sample performance is further studied in an extensive simulation study and compared to existing methods. Finally, an illustrative data example is analyzed (joint work with Markus Pauly).
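The role of invariance under a randomization group can be seen in the classical sign-flipping test for complete pairs. The sketch below is not the procedure of the paper, since it ignores incomplete pairs altogether. It only shows the mechanism: if the paired differences are symmetric about zero under the null hypothesis, flipping their signs leaves their joint distribution unchanged, so enumerating all sign patterns gives an exact p-value. The function name is hypothetical.

```python
from itertools import product


def sign_flip_pvalue(differences):
    """Two-sided exact randomization p-value for H0: differences symmetric about 0."""
    if not differences:
        raise ValueError("at least one paired difference is required")
    observed = abs(sum(differences))
    hits = total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, differences)))
        if flipped >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical paired differences (after minus before) for six complete pairs.
print(sign_flip_pvalue([0.8, 1.1, -0.2, 0.9, 1.4, 0.3]))
```

Full enumeration visits $2^n$ sign patterns, so for larger samples one would draw random sign patterns instead.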

High dimensional inference for a conditional distribution by greedy algorithms
Minh-Lien Jeanne Nguyen, Université Paris Dauphine

A major issue in population genetics is to infer phylogenetic trees from the gene pool of the studied populations. In statistics, one approach consists in estimating the parameters of the tree given the observed genome. For this purpose, in practice, statisticians have recently developed ABC methods (for Approximate Bayesian Computation). We propose here a new perspective on this problem, interpreting it as the estimation of a conditional distribution in high dimension. For this purpose, we consider new kernel estimators adapted to conditional density (Bertin, Lacour and Rivoirard, Adaptive pointwise estimation of conditional density function, (2015) Ann. IHP) and a greedy algorithm for variable selection. We have addressed several issues: an automatic calibration of the procedure leading to an easy use for practitioners, a low computational time when the dimension of the conditional distributions is of order a few tens, and the theoretical validation of the adaptive implemented procedures via oracle or minimax approaches. Furthermore, in the case of a sparse density, the estimator avoids the curse of dimensionality with a rate depending on the number of relevant components. More precisely, if the density is a $\beta$-Hölder function depending on only $r$ of its $d$ components, the mean squared error of our estimator converges at rate $n^{-2\beta/(2\beta+r)}$ up to a logarithmic factor (joint work with Vincent Rivoirard (CEREMADE) and Claire Lacour (LM-Orsay)).
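A short calculation shows how much the sparse rate helps. The sketch below evaluates the exponent $2\beta/(2\beta+r)$ from the abstract and, as a benchmark, the same expression with the full dimension $d$ in place of $r$. The benchmark form and the numerical values of $\beta$, $r$, $d$ and $n$ are assumptions chosen for illustration, and the logarithmic factor is ignored.

```python
def rate_exponent(beta, dimension):
    """Exponent e such that the mean squared error decays like n ** (-e)."""
    if beta <= 0 or dimension <= 0:
        raise ValueError("beta and dimension must be positive")
    return 2.0 * beta / (2.0 * beta + dimension)


beta, r, d, n = 2.0, 3, 30, 10 ** 6
sparse = rate_exponent(beta, r)   # only the r relevant components count
full = rate_exponent(beta, d)     # benchmark using all d components
print("sparse exponent:", sparse, "error order:", n ** (-sparse))
print("full exponent:  ", full, "error order:", n ** (-full))
```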

Estimating occupation time functionals
Randolf Altmeyer, Humboldt-Universität zu Berlin

An occupation time functional is a time integral $\int_{0}^{T}f(X_t)dt$ for a function f and a continuous-time stochastic process X. Given discrete-time observations of the process we approximate the occupation time functional by a Riemann-sum estimator and study the rate of convergence. For Sobolev-smooth functions f we establish surprising upper and lower bounds on the approximation error for many important processes such as Markov processes, semimartingales, and Gaussian processes, e.g. fractional Brownian motion. We also provide a generalized Itô formula for continuous semimartingales, which is of independent interest, and apply it to prove stable central limit theorems.
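A Riemann-sum estimator of this kind can be written in a few lines. The sketch below assumes equally spaced observations $X_0, X_{T/n}, \dots, X_T$ and uses the left endpoint of each interval. The simulated Brownian path, the seed and the function names are illustrative assumptions, not the setting of the poster.

```python
import math
import random


def riemann_occupation_estimate(observations, f, horizon):
    """Left-endpoint Riemann sum for the integral of f(X_t) over [0, horizon].

    observations holds X at n + 1 equally spaced times 0, horizon/n, ..., horizon.
    """
    n = len(observations) - 1
    if n < 1:
        raise ValueError("at least two observations are required")
    step = horizon / n
    return step * sum(f(x) for x in observations[:-1])


def simulate_brownian_path(n, horizon, seed=0):
    """Standard Brownian motion sampled at n + 1 equally spaced times."""
    rng = random.Random(seed)
    step_sd = math.sqrt(horizon / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


path = simulate_brownian_path(n=1000, horizon=1.0)
# Estimated time spent above zero, i.e. f is the indicator of (0, infinity).
estimate = riemann_occupation_estimate(path, lambda x: 1.0 if x > 0 else 0.0, 1.0)
print(estimate)
```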

 Last change: Wed Sep-06-17 15:00:30