Lihong Li

Research Scientist
Google Inc.

lihongli.cs@gmail.com (for general academic work)
lihong@google.com (for Google-related work)
747 Sixth Street South, Kirkland, WA, USA 98033




Exploration

How can we build an RL agent that learns to act optimally through efficient trial-and-error experimentation? This problem, known as efficient exploration or the exploration-exploitation tradeoff, is a challenge unique to RL: the goal is to reduce the sample complexity of reinforcement learning by collecting the right data. Research in this area has led to algorithms widely used in industrial applications such as recommendation and online advertising.
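One widely used exploration algorithm in this line of work is Thompson sampling (evaluated empirically in the Chapelle & Li NIPS 2011 paper below). The sketch below is a minimal illustration for Bernoulli-reward arms, not an implementation from any of the listed papers; the function name and parameters are hypothetical, and it uses only Python's standard library:

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative sketch).

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    unknown reward probability. At every step we draw one sample from each
    posterior, pull the arm with the largest draw, and update its counts.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Posterior sampling: exploration and exploitation in one step --
        # uncertain arms sometimes produce large draws and get tried,
        # while arms with good posteriors win most of the time.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

On a two-armed instance such as `thompson_sampling([0.2, 0.8], 2000)`, the pull counts concentrate on the better arm as its posterior sharpens, which is the sample-efficiency behavior the exploration literature above analyzes formally.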
  • K.-S. Jun, L. Li, Y. Ma, and J. Zhu: Adversarial attacks on stochastic bandits. In Advances in Neural Information Processing Systems 31 (NIPS), 2018. [PDF]
  • Z. Lipton, X. Li, J. Gao, L. Li, F. Ahmed, and L. Deng: Efficient dialogue policy learning with BBQ-networks. In the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018. [link]
  • L. Li, Y. Lu, and D. Zhou: Provably optimal algorithms for generalized linear contextual bandits. In the 34th International Conference on Machine Learning (ICML), 2017. [link]
  • C.-Y. Liu and L. Li: On the prior sensitivity of Thompson sampling. In the 27th International Conference on Algorithmic Learning Theory (ALT), 2016. [link]
  • S. Agrawal, N. R. Devanur, and L. Li: An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In the 29th Annual Conference on Learning Theory (COLT), 2016. [link]
  • E. Brunskill and L. Li: The online discovery problem and its application to lifelong reinforcement learning. CoRR abs/1506.03379, June 2015.
  • A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li, and R.E. Schapire: Taming the monster: A fast and simple algorithm for contextual bandits. In the 31st International Conference on Machine Learning (ICML), 2014.
  • E. Brunskill and L. Li: PAC-inspired option discovery in lifelong reinforcement learning. In the 31st International Conference on Machine Learning (ICML), 2014.
  • E. Brunskill and L. Li: Sample complexity of multi-task reinforcement learning. In the 29th Conference on Uncertainty in Artificial Intelligence (UAI), 2013.
  • L. Li: Sample complexity bounds of exploration. In Marco Wiering and Martijn van Otterlo, editors, Reinforcement Learning: State of the Art, Springer Verlag, 2012.
  • O. Chapelle and L. Li: An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems 24 (NIPS), 2011.
  • A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R.E. Schapire: Contextual bandit algorithms with supervised learning guarantees. In the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  • W. Chu, L. Li, L. Reyzin, and R. Schapire: Linear contextual bandit problems. In the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  • L. Li, M.L. Littman, T.J. Walsh, and A.L. Strehl: Knows what it knows: A framework for self-aware learning. In Machine Learning, 82(3):399--443, 2011.
  • L. Li, W. Chu, J. Langford, and R.E. Schapire: A contextual-bandit approach to personalized news article recommendation. In the 19th International Conference on World Wide Web (WWW), 2010.
  • L. Li and M.L. Littman: Reducing reinforcement learning to KWIK online regression. In the Annals of Mathematics and Artificial Intelligence, 58(3--4):217--237, 2010.
  • J. Langford, L. Li, J. Wortman, and Y. Vorobeychik: Maintaining equilibria during exploration in sponsored search auctions. In Algorithmica, 58(4):990--1021, 2010.
  • J. Asmuth, L. Li, M.L. Littman, A. Nouri, and D. Wingate: A Bayesian sampling approach to exploration in reinforcement learning. In the 25th International Conference on Uncertainty in Artificial Intelligence (UAI), 2009.
  • L. Li, M.L. Littman, and C.R. Mansley: Online exploration in least-squares policy iteration. In the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
  • A.L. Strehl, L. Li, and M.L. Littman: Reinforcement learning in finite MDPs: PAC analysis. In the Journal of Machine Learning Research, 10:2413--2444, 2009.
  • E. Brunskill, B.R. Leffler, L. Li, M.L. Littman, and N. Roy: Provably efficient learning with typed parametric models. In the Journal of Machine Learning Research, 10:1955--1988, 2009.
  • T.J. Walsh, A. Nouri, L. Li, and M.L. Littman: Planning and learning in environments with delayed feedback. In the Journal of Autonomous Agents and Multi-Agent Systems, 18(1):83--105, 2009.
  • L. Li: A unifying framework for computational reinforcement learning theory. Doctoral dissertation, Department of Computer Science, Rutgers University, New Brunswick, NJ, USA, May, 2009.
  • L. Li, M.L. Littman, and T.J. Walsh: Knows what it knows: A framework for self-aware learning. In the 25th International Conference on Machine Learning (ICML), 2008.
  • E. Brunskill, B.R. Leffler, L. Li, M.L. Littman, and N. Roy: CORL: A continuous-state offset-dynamics reinforcement learner. In the 24th Conference on Uncertainty in Artificial Intelligence (UAI), 2008.
  • J. Wortman, Y. Vorobeychik, L. Li, and J. Langford: Maintaining equilibria during exploration in sponsored search auctions. In the 3rd International Workshop on Internet and Network Economics (WINE), LNCS 4858, 2007.
  • T.J. Walsh, A. Nouri, L. Li, and M.L. Littman: Planning and learning in environments with delayed feedback. In the 18th European Conference on Machine Learning (ECML), LNCS 4701, 2007.
  • A.L. Strehl, L. Li, E. Wiewiora, J. Langford, and M.L. Littman: PAC model-free reinforcement learning. In the 23rd International Conference on Machine Learning (ICML), 2006.
  • A.L. Strehl, L. Li, and M.L. Littman: Incremental model-based learners with formal learning-time guarantees. In the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006.
  • A.L. Strehl, L. Li, and M.L. Littman: PAC reinforcement learning bounds for RTDP and Rand-RTDP. AAAI technical report WS-06-11, pages 50--56, July 2006.