STOC 2020 Workshop: Recent advances in Discrepancy and Applications

 

Organizers:  Nikhil Bansal (bansal@gmail.com)

                          Aleksandar Nikolov (anikolov@cs.toronto.edu)

 

Description

 

Discrepancy theory is a branch of combinatorics with several applications to other areas of pure and applied mathematics, and to TCS. Classically, discrepancy theory has been used to construct uniformly distributed point sets, for example epsilon-nets in computational geometry. In recent years there has been a lot of progress on several aspects of discrepancy, resulting in new algorithmic techniques and various connections to convex geometry and probability. There have also been surprising applications to topics such as rounding in approximation algorithms, reconstruction attacks in differential privacy, coreset design for kernel density estimation in machine learning, randomized experiment design, and so on. The aim of the workshop is to give an overview of these developments, with a focus on recent progress and open directions.

 

Besides introducing non-experts to the area, one of the main goals of the workshop is to present the latest algorithmic developments in discrepancy theory and to highlight open questions and connections to other areas, hopefully leading to interesting new research.

Schedule

Date: Tuesday, June 23, 2020  
Time:  noon – 3pm CDT  (1–4pm EDT, 10am–1pm PDT)
Talks will take place online via Zoom.
The livestream link

12:00 – 12:45    Nikhil Bansal          (slides)
12:45 – 13:30    Aleksandar Nikolov     (slides)
13:30 – 14:00    Zohar Karnin           (slides)
14:00 – 14:30    Peng Zhang             (slides)
14:30 – 15:00    Aditya Potukuchi

Titles and Abstracts:

Nikhil Bansal (CWI)

Title: Algorithmic discrepancy, geometry and applications

 

Abstract: We will start with the basic concepts in discrepancy theory and describe some classic applications in areas such as approximation and sparsification. We then consider discrepancy from a convex-geometric and vector-balancing point of view. This view has been extremely useful in the algorithmic progress on discrepancy in recent years, often matching or even improving upon previously known non-constructive bounds. I will give an overview of these recent results and the key algorithmic ideas behind them.
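As a quick, illustrative aside (not part of the abstract): the objects in the talk can be made concrete in a few lines of Python. A set system is an incidence matrix, a coloring is a sign vector, and discrepancy is the infinity norm of their product; the helper names below are ours.

```python
import itertools
import numpy as np

# A set system on n points with m sets is its m x n incidence matrix A
# (A[j, i] = 1 iff point i lies in set j).  The discrepancy of a coloring
# x in {-1, +1}^n is ||A x||_inf; disc(A) minimizes this over all colorings.
# Equivalently (the vector-balancing view): choose signs for the columns of A
# so that the signed sum of these column vectors is short.

def disc_brute_force(A):
    """Exact discrepancy by enumerating all colorings (feasible only for tiny n)."""
    n = A.shape[1]
    return min(np.max(np.abs(A @ np.array(x)))
               for x in itertools.product([-1, 1], repeat=n))

def disc_random(A, trials=100, seed=0):
    """Best of several uniformly random colorings: O(sqrt(n log m)) w.h.p.,
    the baseline that the algorithmic results in the talk improve upon."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    return min(np.max(np.abs(A @ rng.choice([-1, 1], size=n)))
               for _ in range(trials))
```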

 

Aleksandar Nikolov (Toronto)

Title: Hereditary Discrepancy and Factorization Norms

Abstract: Hereditary discrepancy is a robust version of combinatorial discrepancy, equal to the maximum discrepancy of any restriction of the original set system. Hereditary discrepancy is stable under simple operations: for example, it behaves nicely under taking restrictions, unions or duals of the set system. We will discuss a connection between hereditary discrepancy and factorization norms of matrices, which implies these stability properties, gives an efficient approximation algorithm for hereditary discrepancy, and gives a powerful tool for proving upper and lower bounds on discrepancy. We will also sketch applications of hereditary discrepancy to rounding algorithms, to construction of uniformly distributed point sets, and to differential privacy.
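For concreteness (an illustration added here, not part of the abstract): the relevant factorization norm is the gamma_2 norm, which by a result of Matousek, Nikolov, and Talwar determines hereditary discrepancy up to polylogarithmic factors, and which is computable by a small semidefinite program. A minimal sketch, assuming cvxpy is available:

```python
import cvxpy as cp
import numpy as np

def gamma_2(A):
    """gamma_2 factorization norm via its standard SDP characterization:
    minimize t such that some PSD matrix W has A as its off-diagonal block
    and every diagonal entry of W is at most t."""
    m, n = A.shape
    W = cp.Variable((m + n, m + n), PSD=True)
    t = cp.Variable()
    problem = cp.Problem(cp.Minimize(t), [W[:m, m:] == A, cp.diag(W) <= t])
    problem.solve()
    return t.value

# Small example: the 8 x 8 lower-triangular all-ones ("prefix sums") matrix.
print(gamma_2(np.tril(np.ones((8, 8)))))
```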

 

Zohar Karnin (Amazon)
Title: Discrepancy, Coresets, and Sketches in Machine Learning


Abstract: In this talk we discuss applications of discrepancy to coresets, machine learning, and streaming algorithms. The discrepancy literature provides methods to partition a set of vectors or a class of functions in a highly balanced manner, surpassing the guarantees of a random partition. A coreset is defined in the context of an optimization task over a set of points and consists of a subset that represents the whole, typically better than a random subset. We review the connection between low-discrepancy partitions and coresets, and how an efficient solution to one can be applied to the other. We show how this connection is also helpful in the context of streaming algorithms, and in machine learning, where discrepancy methods can be used to provide meaningful generalization bounds for coresets.
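To make the coreset connection concrete (an added illustration, not from the abstract): given a low-discrepancy coloring of the data, keeping only the points colored +1 and doubling their weight preserves linear statistics up to roughly discrepancy/n, and repeated halving yields a coreset. The sketch below uses a simple greedy signing as a stand-in for a proper low-discrepancy algorithm; all names are ours.

```python
import numpy as np

def greedy_signs(X):
    """Stand-in for a low-discrepancy coloring: sign each row so the running
    signed sum stays short (a greedy baseline, not an optimal method)."""
    signs, run = np.empty(len(X), dtype=int), np.zeros(X.shape[1])
    for i, x in enumerate(X):
        signs[i] = -1 if run @ x > 0 else 1
        run += signs[i] * x
    return signs

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
# Also balance a constant feature, so the two halves have (almost) equal size.
signs = greedy_signs(np.hstack([np.ones((len(X), 1)), X]))
half = X[signs == 1]                     # keep one side, implicitly weight by 2
print(np.linalg.norm(X.mean(axis=0) - half.mean(axis=0)))   # small coreset error
```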

 

Peng Zhang (Yale)

Title: Balancing covariates in randomized experiments using the Gram–Schmidt walk 

 

Abstract: In randomized experiments, such as medical trials, we randomly assign the treatment, such as a drug or a placebo, that each experimental subject receives. Randomization helps us accurately estimate the difference in treatment effects with high probability. We also want the treatment and control groups to be similar: ideally, the two groups would agree on every statistic we can measure beforehand. Recent advances in algorithmic discrepancy theory allow us to divide subjects into groups with similar statistics. By exploiting the recent Gram-Schmidt Walk algorithm of Bansal, Dadush, Garg, and Lovett, we can obtain random assignments of low discrepancy. These allow us to obtain more accurate estimates of treatment effects when the information we measure about the subjects is predictive, while also bounding the worst-case behavior when it is not.

 

We will explain the experimental design problem we address, how we use and analyze the Gram-Schmidt walk algorithm, and remaining open problems. This is joint work with Chris Harshaw, Fredrik Sävje and Daniel Spielman.

 

Paper: https://arxiv.org/abs/1911.03071

Code: https://github.com/crharshaw/GSWDesign.jl
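The authors' implementation is the Julia package linked above. As a rough aid to intuition only, here is a stripped-down Python rendering of the Gram-Schmidt walk rounding step (our own sketch, not the authors' code; it omits the covariate augmentation and reweighting used in the paper's design):

```python
import numpy as np

def gram_schmidt_walk(B, rng=None):
    """Round 0 to a random z in {-1, +1}^n with small ||B z||, in the spirit of
    the Gram-Schmidt walk (Bansal, Dadush, Garg, Lovett).  The columns of the
    m x n matrix B should have norm at most 1 for the guarantees to apply."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = B.shape
    x = np.zeros(n)                                  # fractional coloring
    pivot = None
    alive = np.where(np.abs(x) < 1 - 1e-9)[0]
    while alive.size > 0:
        if pivot is None or pivot not in alive:
            pivot = rng.choice(alive)                # fresh pivot among live coords
        others = alive[alive != pivot]
        u = np.zeros(n)
        u[pivot] = 1.0
        if others.size > 0:
            # move the other live coordinates so as to best cancel the pivot column
            u[others] = np.linalg.lstsq(B[:, others], -B[:, pivot], rcond=None)[0]
        # largest steps in the +u and -u directions keeping x in the cube [-1, 1]^n
        with np.errstate(divide="ignore", invalid="ignore"):
            up = np.minimum(np.where(u > 0, (1 - x) / u, np.inf),
                            np.where(u < 0, (-1 - x) / u, np.inf)).min()
            dn = np.minimum(np.where(u > 0, (1 + x) / u, np.inf),
                            np.where(u < 0, (x - 1) / u, np.inf)).min()
        # randomize the step direction so x stays a martingale (unbiased assignment)
        x = x + up * u if rng.random() < dn / (up + dn) else x - dn * u
        alive = np.where(np.abs(x) < 1 - 1e-9)[0]
    return np.sign(x).astype(int)
```

In the experimental-design application, the columns of B combine an identity block with scaled covariate vectors, which is what trades off robustness against covariate balance; see the paper and the GSWDesign.jl package above for the full construction.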

Aditya Potukuchi (Rutgers)

Title: Hypergraph discrepancy in random and pseudorandom settings

 

Abstract: The Beck-Fiala conjecture states that for a t-regular hypergraph (every vertex is in t hyperedges), the discrepancy is at most O(sqrt{t}). The seeming difficulty in making any progress towards resolving this has led to the study of the discrepancy of random t-regular hypergraphs with n vertices and m hyperedges. Here, Ezra and Lovett (RANDOM 2015) showed that the discrepancy is almost surely O(sqrt{t log t}) as t grows and n = O(m). Later, Bansal and Meka (SODA 2019) showed that the discrepancy is almost surely O(sqrt{t}) as n grows, provided t = Omega(log^2 log m).

 

We will see a proof that for every t > 0, the discrepancy of a random t-regular hypergraph is almost surely O(sqrt{t}) as n grows and n = O(m). The proof is algorithmic and works more generally because it is based on a spectral "pseudorandomness" condition on the incidence matrix of the hypergraph, which holds with high probability.
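To make the setting concrete (an added sketch, not from the abstract), here is one common way to sample a random t-regular hypergraph and measure the discrepancy of a uniformly random coloring, the baseline that the result above improves on; the exact random model in the talk may differ in details:

```python
import numpy as np

rng = np.random.default_rng(1)
n = m = 1000
t = 20

# Each vertex joins t hyperedges chosen uniformly without replacement,
# giving an m x n incidence matrix M in which every column has t ones.
M = np.zeros((m, n), dtype=int)
for v in range(n):
    M[rng.choice(m, size=t, replace=False), v] = 1

x = rng.choice([-1, 1], size=n)          # a uniformly random coloring
print(np.max(np.abs(M @ x)))             # ~ sqrt(t log m); the talk's result
                                         # brings this down to O(sqrt(t))
```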

 

If time permits, we will also discuss the discrepancy of random hypergraphs in which every point-edge incidence is an i.i.d. Bernoulli random variable.