Optimization and Reinforcement Learning

Keynote Speaker – Prof. Yuxin Chen, University of Pennsylvania

Time: 2:00 – 3:00 PM

“Breaking the Sample Size Barrier in Reinforcement Learning”

Abstract: Emerging reinforcement learning (RL) applications necessitate the design of sample-efficient solutions in order to accommodate the explosive growth of problem dimensionality. Despite substantial empirical success, however, our understanding of the statistical limits of RL remains highly incomplete. In this talk, I will present recent progress towards settling the sample complexity limits of RL. The first scenario concerns RL with a generative model, which allows one to query arbitrary state-action pairs and draw independent samples. We prove that a model-based algorithm (a.k.a. the plug-in approach) achieves minimax-optimal sample complexity without any burn-in cost. The second scenario concerns online RL, where an agent learns via real-time interactions with an unknown environment. We develop the first algorithm (an optimistic model-based algorithm) that achieves minimax-optimal regret over the entire range of sample sizes. Time permitting, we will also discuss the effectiveness of model-based paradigms in offline RL and multi-agent RL. Our results emphasize the prolific interplay between high-dimensional statistics, online learning, and game theory.
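As a purely illustrative sketch of the generative-model setting described above, the toy example below builds an empirical transition model from independent samples at every state-action pair (the plug-in idea) and then plans in the empirical MDP by value iteration. The MDP, the sample budget `n_per_pair`, and all function names are assumptions for illustration, not the algorithm or analysis presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy tabular MDP: S states, A actions, discount gamma.
S, A, gamma = 5, 3, 0.9
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # true transition kernel, shape (S, A, S)
R = rng.uniform(0, 1, size=(S, A))                # known reward table

def sample_model(n_per_pair):
    """Plug-in estimator: query the generative model n times per (s, a)."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            next_states = rng.choice(S, size=n_per_pair, p=P_true[s, a])
            P_hat[s, a] = np.bincount(next_states, minlength=S) / n_per_pair
    return P_hat

def value_iteration(P, n_iter=500):
    """Plan in the (empirical) MDP by standard value iteration."""
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = R + gamma * P @ V        # (S, A) Bellman backup
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

V_hat, pi_hat = value_iteration(sample_model(n_per_pair=2000))
V_star, _ = value_iteration(P_true)
print(np.max(np.abs(V_hat - V_star)))  # shrinks as n_per_pair grows
```

The point of the sketch is the separation the abstract alludes to: estimation (building `P_hat` from samples) and planning (value iteration) are decoupled, and the statistical question is how large `n_per_pair` must be for the plug-in plan to be near-optimal.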

The first part is based on joint work with Gen Li, Yuting Wei and Yuejie Chi, and the second part is based on joint work with Zihan Zhang, Jason Lee and Simon Du.

Paper 1: https://arxiv.org/abs/2005.12900
Paper 2: https://yuxinchen2020.github.io/publications/Optimal-OnlineRL.pdf
Paper 3: https://arxiv.org/abs/2204.05275

Biography: Yuxin Chen is currently a professor of statistics and data science and of electrical and systems engineering at the University of Pennsylvania. Before joining UPenn, he was an assistant professor of electrical and computer engineering at Princeton University. He completed his Ph.D. in Electrical Engineering at Stanford University and was a postdoctoral scholar at Stanford Statistics. His current research interests include machine learning theory, statistics, and optimization. He has received the Alfred P. Sloan Research Fellowship, the Leo Breiman Junior Award, the SIAM Activity Group on Imaging Science Best Paper Prize, and the ICCM Best Paper Award (gold medal); he was a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization and has served as an Information Theory Society Distinguished Lecturer. He has also received the Princeton Graduate Mentoring Award.

Invited Student Speaker – Shaan Ul Haque, Georgia Tech

Time: 3:00 – 3:45 PM

“Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem”

Abstract: Motivated by engineering applications such as resource allocation in networks and inventory systems, we consider average-reward Reinforcement Learning with unbounded state space and reward function. Recent work has studied this problem in the actor-critic framework and established finite-sample bounds assuming access to a critic satisfying certain error guarantees. We complement that work by studying Temporal Difference (TD) learning with linear function approximation and establishing finite-time bounds with the optimal sample complexity. These results are obtained using the following general-purpose theorem for non-linear Stochastic Approximation (SA).

Suppose that one constructs a Lyapunov function for a non-linear SA satisfying a certain drift condition. Then, our theorem establishes finite-time bounds when this SA is driven by unbounded Markovian noise under suitable conditions. It serves as a black-box tool to generalize sample guarantees on SA from the i.i.d. or martingale-difference case to potentially unbounded Markovian noise. The generality and the mild assumptions of the setup enable broad applicability of our theorem. We illustrate its power by studying three more systems: (i) We analyze the least mean squares algorithm to solve regression problems for generalized linear models with unbounded and uncountable Markovian noise. (ii) We improve upon the finite-time bounds of Q-learning by tightening the error bounds and also allowing for a larger class of behavior policies. (iii) We establish the first-ever finite-time bounds for distributed stochastic optimization of high-dimensional smooth strongly convex functions using cyclic block coordinate descent.
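For readers unfamiliar with TD learning as a stochastic-approximation scheme, here is a minimal sketch of TD(0) with linear function approximation driven by Markovian (chain-generated, non-i.i.d.) samples. It uses a finite toy chain with discounted rewards; the talk's average-reward, unbounded-state setting is substantially more general, and all names (`td0`, `Phi`, the step size) are illustrative assumptions rather than the talk's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: a finite ergodic Markov chain standing in for
# the (possibly unbounded) state space studied in the talk.
S, d, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=S)     # transition matrix of the chain
r = rng.uniform(0, 1, size=S)             # reward r(s)
Phi = rng.standard_normal((S, d))         # feature map phi(s) in R^d

def td0(num_steps=100_000, alpha=0.01):
    """TD(0) with linear function approximation, Markovian samples."""
    theta = np.zeros(d)
    s = 0
    for _ in range(num_steps):
        s_next = rng.choice(S, p=P[s])
        # Temporal-difference error for V_theta(s) = phi(s)^T theta
        delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta += alpha * delta * Phi[s]   # stochastic-approximation update
        s = s_next                        # noise comes from the chain itself
    return theta

theta_hat = td0()
V_true = np.linalg.solve(np.eye(S) - gamma * P, r)   # exact values
print(np.abs(Phi @ theta_hat - V_true).max())        # approx. + SA error
```

Note how the iterate update has exactly the SA form the theorem addresses: the "noise" at each step is the next chain state, which is correlated with the current one, so i.i.d. or martingale-difference analyses do not directly apply.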

Biography: Shaan Ul Haque is a fourth-year PhD student in the Machine Learning program within the H. Milton Stewart School of Industrial & Systems Engineering at Georgia Tech in Atlanta, where he is advised by Professor Siva Theja Maguluri. Shaan earned his bachelor's degree in electrical engineering with honors, with a minor in computer science and engineering, from the Indian Institute of Technology (IIT), Bombay. Currently, his research focuses on the theoretical analysis of Stochastic Approximation algorithms, with a primary emphasis on their applications in Reinforcement Learning and Machine Learning.

Student Presentations

Time: 4:00 – 5:00 PM

Alex Zhang: “Deep Reinforcement Learning for Rapid Spacecraft Science Operations Scheduling to Enhance Science Return”

Arda Guclu: “A Unified Framework for Regret Analysis of Linear Bandits over Smooth Convex Action Sets”

Argyrios Gerogiannis: “DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandits”

Alisina Bayati: “Integrating Control Theory and the Maximum Entropy Principle for a Class of Constrained Static and Dynamic Resource Allocation Problems”

CSL Student Conference 2026
Email: omarb3@illinois.edu