
Value Function and Policy Learning

How can we estimate and optimize the long-term reward of taking an action in a given state using a compact representation? This question lies at the heart of modern reinforcement learning (RL) and is a key component of many prominent RL applications. The problem becomes particularly challenging when function approximation is used, especially in the presence of off-policy data.
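To illustrate what a compact representation of long-term reward can look like, here is a minimal sketch. It evaluates the action values Q^π(s, a) of a fixed policy with a linear function approximator, using the expected (model-based) form of the TD(0) update. Everything in it is a hypothetical illustration, not a method from the papers listed below. The two-state, two-action MDP, the reward, the uniform policy, the two-dimensional features φ(s, a) = [1, a], the step size, and the uniform weighting over state-action pairs are all assumptions.

```python
"""Expected (model-based) linear TD(0) for action values Q^pi(s, a).

Illustrative sketch only: the MDP, features, policy, and step size below
are hypothetical choices, not taken from any listed paper.
"""
from typing import List, Tuple

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]


def features(s: int, a: int) -> List[float]:
    # Hypothetical compact representation: 2 weights cover 4 (s, a) pairs.
    return [1.0, float(a)]


def reward(s: int, a: int) -> float:
    # Hypothetical reward: action 1 pays 1, action 0 pays nothing.
    return 1.0 if a == 1 else 0.0


def next_state_probs(s: int, a: int) -> List[Tuple[int, float]]:
    # Hypothetical deterministic dynamics: action a moves the agent to state a.
    return [(a, 1.0)]


def policy(s: int) -> List[Tuple[int, float]]:
    # Fixed evaluation policy: uniform over actions in every state.
    return [(a, 1.0 / len(ACTIONS)) for a in ACTIONS]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def expected_td_q(alpha: float = 0.5, iters: int = 2000) -> List[float]:
    pairs = [(s, a) for s in STATES for a in ACTIONS]
    # Uniform weighting over pairs; for this toy MDP it is the on-policy distribution.
    weight = 1.0 / len(pairs)
    w = [0.0] * len(features(0, 0))
    for _ in range(iters):
        update = [0.0] * len(w)
        for s, a in pairs:
            phi = features(s, a)
            # Bootstrapped target: r(s, a) + gamma * E[Q(s', a')] under the policy.
            next_q = sum(
                p * pi * dot(w, features(s2, a2))
                for s2, p in next_state_probs(s, a)
                for a2, pi in policy(s2)
            )
            td_error = reward(s, a) + GAMMA * next_q - dot(w, phi)
            # Semi-gradient direction: TD error times the current features.
            for i, x in enumerate(phi):
                update[i] += weight * td_error * x
        w = [wi + alpha * ui for wi, ui in zip(w, update)]
    return w


if __name__ == "__main__":
    w = expected_td_q()
    for s in STATES:
        for a in ACTIONS:
            print(f"Q({s}, {a}) ~= {dot(w, features(s, a)):.3f}")
```

In this toy MDP the true Q^π (4.5 for action 0 and 5.5 for action 1 in either state, with γ = 0.9) happens to lie in the span of the two features, so the iteration converges to w = [4.5, 1.0]. The uniform pair weighting is the on-policy distribution here. If the weights were drawn from a different (off-policy) distribution, linear TD updates like this one are not guaranteed to converge in general.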

J. Mei, C. Xiao, B. Dai, L. Li, Cs. Szepesvari, D. Schuurmans: Escaping the gravitational pull of softmax. In Advances in Neural Information Processing Systems 33 (NeurIPS), oral, 2020.
Y. Feng, L. Li, and Q. Liu: A kernel loss for solving the Bellman equation. In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019. [link, arXiv]
B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song: SBEED: Convergent reinforcement learning with nonlinear function approximation. In the 35th International Conference on Machine Learning (ICML), 2018. [link, arXiv, slides]
Y. Chen, L. Li, and M. Wang: Bilinear π learning using state and action features. In the 35th International Conference on Machine Learning (ICML), 2018. [link, arXiv]
B. Dai, A. Shaw, N. He, L. Li, and L. Song: Boosting the actor with dual critic. In the 6th International Conference on Learning Representations (ICLR), 2018. [arXiv]
J. Chen, C. Wang, L. Xiao, J. He, L. Li, and L. Deng: Q-LDA: Uncovering latent patterns in text-based sequential decision processes. In Advances in Neural Information Processing Systems 30 (NIPS), 2017. [link]
S. Du, J. Chen, L. Li, L. Xiao, and D. Zhou: Stochastic variance reduction methods for policy evaluation. In the 34th International Conference on Machine Learning (ICML), 2017. [link]
L. Li: A worst-case comparison between temporal difference and residual gradient. In the 25th International Conference on Machine Learning (ICML), 2008.
R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M.L. Littman: An analysis of linear models, linear value function approximation, and feature selection for reinforcement learning. In the 25th International Conference on Machine Learning (ICML), 2008.
L. Li and M.L. Littman: Efficient value-function approximation via online linear regression. In the 10th International Symposium on Artificial Intelligence and Mathematics (AI&Math), 2008.
R. Parr, C. Painter-Wakefield, L. Li, and M.L. Littman: Analyzing feature generation for value-function approximation. In the 24th International Conference on Machine Learning (ICML), 2007.
L. Li, T.J. Walsh, and M.L. Littman: Towards a unified theory of state abstraction for MDPs. In the 9th International Symposium on Artificial Intelligence and Mathematics (AI&Math), 2006.
L. Li and M.L. Littman: Lazy approximation for solving continuous finite-horizon MDPs. In the 20th National Conference on Artificial Intelligence (AAAI), 2005.