Lihong Li

Research Scientist
Google Inc.

lihongli.cs@gmail.com (for general academic work)
lihong@google.com (for Google-related work)
747 Sixth Street South, Kirkland, WA, USA 98033




Value Function Approximation (VFA) and Generalization

How can we estimate the long-term reward of taking an action in a given state, using a compact representation? This question lies at the heart of modern reinforcement learning and is a key component of many prominent RL applications. The problem is notoriously challenging (the "deadly triad"): many classic algorithms are known to diverge, and divergence is frequently observed in practice. Our recent work (SBEED) provides the first provably convergent VFA algorithm in the controlled case, with nonlinear function classes and off-policy data. (A minimal sketch of the basic setting follows the reference list below.)
  • B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song: SBEED: Convergent reinforcement learning with nonlinear function approximation. In the 35th International Conference on Machine Learning (ICML), 2018. [PDF, arXiv, slides]
  • Y. Chen, L. Li, and M. Wang: Bilinear π learning using state and action features. In the 35th International Conference on Machine Learning (ICML), 2018. [PDF, arXiv]
  • B. Dai, A. Shaw, N. He, L. Li, and L. Song: Boosting the actor with dual critic. In the 6th International Conference on Learning Representations (ICLR), 2018. [arXiv]
  • J. Chen, C. Wang, L. Xiao, J. He, L. Li, and L. Deng: Q-LDA: Uncovering latent patterns in text-based sequential decision processes. In Advances in Neural Information Processing Systems 30 (NIPS), 2017. [link]
  • S. Du, J. Chen, L. Li, L. Xiao, and D. Zhou: Stochastic variance reduction methods for policy evaluation. In the 34th International Conference on Machine Learning (ICML), 2017. [link]
  • L. Li: A worst-case comparison between temporal difference and residual gradient. In the 25th International Conference on Machine Learning (ICML), 2008.
  • R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M.L. Littman: An analysis of linear models, linear value function approximation, and feature selection for reinforcement learning. In the 25th International Conference on Machine Learning (ICML), 2008.
  • L. Li and M.L. Littman: Efficient value-function approximation via online linear regression. In the 10th International Symposium on Artificial Intelligence and Mathematics (AI&Math), 2008.
  • R. Parr, C. Painter-Wakefield, L. Li, and M.L. Littman: Analyzing feature generation for value-function approximation. In the 24th International Conference on Machine Learning (ICML), 2007.
  • L. Li, T.J. Walsh, and M.L. Littman: Towards a unified theory of state abstraction for MDPs. In the 9th International Symposium on Artificial Intelligence and Mathematics (AI&Math), 2006.
  • L. Li and M.L. Littman: Lazy approximation for solving continuous finite-horizon MDPs. In the 20th National Conference on Artificial Intelligence (AAAI), 2005.
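
For concreteness, the sketch below shows semi-gradient TD(0) with linear value function approximation, the textbook baseline whose off-policy and nonlinear variants exhibit the divergence mentioned above. It is illustrative only, not code from any of the papers; the feature map phi, the step size alpha, the discount gamma, and the toy chain environment are all assumptions made for the example.

    import numpy as np

    # Semi-gradient TD(0) for policy evaluation with a linear
    # approximator V(s) ~= w . phi(s). This on-policy linear setting
    # converges; combining function approximation, bootstrapping, and
    # off-policy data (the "deadly triad") is what breaks convergence.

    def td0_linear(transitions, phi, num_features, alpha=0.05, gamma=0.99):
        """Estimate weights w from (state, reward, next_state) transitions."""
        w = np.zeros(num_features)
        for s, r, s_next in transitions:
            v, v_next = w @ phi(s), w @ phi(s_next)
            td_error = r + gamma * v_next - v   # one-step TD error
            w += alpha * td_error * phi(s)      # semi-gradient update
        return w

    # Toy usage: a 3-state cycle with one-hot features and a single
    # rewarding transition, replayed to simulate a long trajectory.
    phi = lambda s: np.eye(3)[s]
    transitions = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 0)] * 200
    w = td0_linear(transitions, phi, num_features=3)
    print("Learned state values:", w)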