The goal of Machine Learning Theory is to understand fundamental principles and capabilities of learning from data, as well as designing and analyzing machine learning algorithms. We invite you to the Machine Learning Theory Session of CSL student conference if you are curious about when, how, and why machine learning algorithms work.

The session consists of a keynote speech followed by several student talks in which students present their current research. Besides the theoretical aspects of machine learning, this session covers topics including (but not limited to) statistical inference, algorithms, graphical models, signal processing, etc.

## Confirmed Speakers

**Keynote Speaker – Prof. Avrim Blum, Toyota Technological Institute at Chicago**

*Learning about Agents and Mechanisms from Opaque Transactions
*

*February 23, 14:30-15:20, CSL B02*

In this talk I will discuss the problem of trying to learn the requirements and preferences of economic agents by observing the outcomes of an allocation mechanism whose rules you also don’t initially know. As an example, consider observing web pages where the agents are advertisers and the winners are those whose ads show up on the given page. We know these ads are placed based on bids and other constraints given to some auction mechanism, but we do not get to see these bids and constraints. What we would like to do is from repeated observations of this type to learn what the requirements and preferences of the agents are. Or consider observing the input-output behavior of some scheduling service, where the input consists of a set of agents requesting service, and the output tells us which actually received service and which did not. In this case, we assume the agents who did not receive service were not served due to overlap of their resource needs with higher-priority requests. From such input-output behavior, we would like to learn the underlying structure. Our goal will be from observing a series of such interactions to try to learn both the needs and preferences of the agents and perhaps also the rules of the allocation mechanism.

This talk is based on work joint with Yishay Mansour and Jamie Morgenstern, as well as work joint with Michael Liang.

**Bio**

Avrim Blum received his BS, MS, and PhD from MIT in 1987, 1989, and 1991 respectively. He then served on the faculty in the Computer Science Department at Carnegie Mellon University from 1992 to 2017. In 2017 he joined the Toyota Technological Institute at Chicago as Chief Academic Officer.

Prof. Blum’s main research interests are in Theoretical Computer Science and Machine Learning, including Machine Learning Theory, Approximation Algorithms, Algorithmic Game Theory, and Database Privacy, as well as connections among them. Some current specific interests include multi-agent learning, multi-task learning, semi-supervised learning, and the design of incentive systems. He is also known for his past work in AI Planning. Prof. Blum has served as Program Chair for the IEEE Symposium on Foundations of Computer Science (FOCS) and the Conference on Learning Theory (COLT). He has served as Chair of the ACM SIGACT Committee for the Advancement of Theoretical Computer Science and on the SIGACT Executive Committee. Prof. Blum is recipient of the AI Journal Classic Paper Award, the ICML/COLT 10-Year Best Paper Award, the Sloan Fellowship, the NSF National Young Investigator Award, and the Herbert Simon Teaching Award, and he is a Fellow of the ACM.

### Invited Student Speaker

Abhiram Natarajan, Purdue University

*Communication-Efficient Distributed Learning of Discrete Probability Distributions
*

*February 23, 15:30-15:50, CSL B02*

We initiate a systematic investigation of distribution learning (density estimation) when sample data is distributed across multiple servers. The servers must communicate with a referee and the goal is to estimate the underlying distribution with as few bits of communication as possible. We focus on non-parametric density estimation of discrete distributions with respect to the l1 and l2 norms. We provide the first non-trivial upper and lower bounds on the communication complexity of this basic estimation task in various settings of interest.

Our distributed learning algorithms run in near-linear time and are robust to model mis-specification. Our results provide insights on the interplay between structure and communication efficiency for a range of fundamental distribution estimation tasks.

**Bio**

Abhiram Natarajan is a PhD student at Purdue University working in Real-Algebraic Geometry. He is advised by Prof. Saugata Basu and Prof. Elena Grigorescu. He has a masters degree from Brown University. His research interests are centered in algebraic geometry, including applications of AG to various areas.

### UIUC Speakers

Weihao Gao

*Estimating Mutual Information for Discrete-Continuous Mixtures
*

*February 23, 15:50-16:10, CSL B02*

Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks. While mutual information is a well-defined quantity in general probability spaces, existing estimators can only handle two special cases of purely discrete or purely continuous pairs of random variables. In this paper, we design a novel estimator for mutual information of discrete-continuous mixtures. We prove that the proposed estimator is consistent. We provide numerical experiments suggesting superiority of the proposed estimator compared to other heuristics. This significantly widens the applicability of mutual information estimation in real-world applications, where some variables are discrete, some continuous, and others are a mixture between continuous and discrete components.

**Bio**

Weihao Gao is a Ph.D. candidate at the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign (UIUC). He is working under the supervision of Prof. Sewoong Oh and Prof. Pramod Viswanath. He received his B.E. in Computer Science from Institute for Interdisciplinary Information Sciences at Tsinghua University in 2014, and M.S. from the Depratment of Electrical and Computer Engineering at University of Illinois at Urbana-Champaign in 2016. His research interest contains machine learning, information theory, statistics and deep learning.

Ashish Khetan

*PacGAN: The power of two samples in generative adversarial networks
*

*February 23, 16:10-16:30, CSL B02*

Generative adversarial networks (GANs) are innovative techniques for learning generative models of complex data distributions from samples. Despite remarkable recent improvements in generating realistic images, one of their major shortcomings is the fact that in practice, they tend to produce samples with little diversity, even when trained on diverse datasets. This phenomenon, known as mode collapse, has been the main focus of several recent advances in GANs. We propose a principled approach to handling mode collapse, which we call packing. We borrow analysis tools from binary hypothesis testing—in particular, the seminal result of Blackwell [Bla53]—to prove a fundamental connection between packing and mode collapse. We show that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during the training process. Numerical experiments on benchmark datasets suggest that packing provides significant improvements in practice as well.

**Bio**

Ashish Khetan is a final year PhD student at UIUC. His research interests are in the domain of theoretical machine learning. He has worked on problems in rank-aggregation, crowdsourcing, matrix completion, deep learning, and GANs.

Ravi Kiran Raman

*Unsupervised Image Clustering and Registration using Multivariate Information Measures
*

*February 23, 16:30-16:50, CSL B02*

The task of image clustering and registration consider the task of grouping similar images into clusters and aligning the intra-cluster images to the same orientation, for instance in medical imaging and diagnostics. Several feature-based and value-based methods have been defined, one of which is the max mutual information method. We define universal joint clustering and registration algorithms using multivariate information functionals. We first prove the asymptotic optimality of the algorithm in registering two images using the method of Markov types to compare the error exponent with that of the maximum likelihood method. We then show the shortcomings of pairwise registration in multi-image registration, and design an asymptotically optimal algorithm using multi-information. Finally, we use novel multivariate information functionals to perform joint clustering and registration of images, and prove exponential consistency of the algorithm.

**Bio**

Ravi Kiran Raman is a fourth year PhD candidate at the Department of Electrical and Computer Engineering at UIUC. He received the B.Tech. and M.Tech. degrees in Electrical Engineering and Communication Systems from the Indian Institute of Technology (IIT), Madras, in 2014. Ravi is primarily interested in the machine learning, information theory, human-based noisy computation systems, and blockchain systems.

Cesar A. Uribe

*Optimal Algorithms for Distributed Optimization
*

*February 23, 16:50-17:10, CSL B02*

In this paper, we study the optimal convergence rates for distributed convex optimization problems in networks, where the objective is to minimize the sum $\sum_{i=1}^{m}f_i(z)$ of local agent functions. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different cases, namely: the case when each function $f_i$ is strongly convex and smooth, the cases when it is either strongly convex or smooth, and the case when it is convex but neither strongly convex nor smooth. Our approach is based on the dual of an appropriately formulated primal problem which includes the underlying static graph that models the communication restrictions. Our results show that Nesterov’s accelerated gradient method for the dual problem can be executed in a distributed manner and that it enjoys the same optimal rates as in the centralized version of the problem (up to a constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Also, we discuss some extensions of the proposed setup such as for proximal friendly functions and time-varying graphs, and some directions for improvements of the condition numbers.

**Bio**

Cesar A. Uribe received the B.S. degree in electronic engineering from the University of Antioquia, Medell´ın, Colombia, in 2010, and the M.S. degrees in systems and control from Delft University of Technology, Delft, The Netherlands, and in applied mathematics from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 2013 and 2016, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering at the University of Illinois at Urbana-Champaign. His research interests include distributed learning and optimization, decentralized control, optimization theory, and algorithm analysis.

SaveSave