
Publications

Preprints

  • O. Nachum, B. Dai, I. Kostrikov, Y. Chow, L. Li, and D. Schuurmans: AlgaeDICE: Policy gradient from arbitrary experience. [arXiv]

Conference

  • C. Xiao, Y. Wu, T. Lattimore, B. Dai, J. Mei, L. Li, Cs. Szepesvari, and D. Schuurmans: On the optimality of batch policy optimization algorithms. In the 38th International Conference on Machine Learning (ICML), 2021. [arXiv]
  • X. Chen, J. Hu, C. Jin, L. Li, and L. Wang: Near-optimal representation learning for linear bandits and linear RL. In the 38th International Conference on Machine Learning (ICML), 2021. [arXiv]
  • A. Bennett, N. Kallus, L. Li, and A. Mousavi: Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders. In the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021. [arXiv]
  • X. Chen, J. Hu, L. Li, and L. Wang: Efficient reinforcement learning in factored MDPs with application to constrained RL. In the 9th International Conference on Learning Representations (ICLR), 2021.
  • W. Zhang, D. Zhou, L. Li, and Q. Gu: Neural Thompson sampling. In the 9th International Conference on Learning Representations (ICLR), 2021. [arXiv]
  • J. Mei, C. Xiao, B. Dai, L. Li, Cs. Szepesvari, and D. Schuurmans: Escaping the gravitational pull of softmax. In Advances in Neural Information Processing Systems 33 (NeurIPS), oral, 2020.
  • B. Dai, O. Nachum, Y. Chow, L. Li, Cs. Szepesvari, and D. Schuurmans: CoinDICE: Off-policy confidence interval estimation. In Advances in Neural Information Processing Systems 33 (NeurIPS), spotlight, 2020.
  • M. Yang, O. Nachum, B. Dai, L. Li, and D. Schuurmans: Off-policy evaluation via the regularized Lagrangian. In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020. [arXiv]
  • J. Wen, B. Dai, L. Li, and D. Schuurmans: Batch stationary distribution estimation. In the 37th International Conference on Machine Learning (ICML), 2020. [arXiv]
  • D. Zhou, L. Li, and Q. Gu: Neural contextual bandits with UCB-based exploration. In the 37th International Conference on Machine Learning (ICML), 2020. [arXiv]
  • B. Kveton, M. Zaheer, Cs. Szepesvari, L. Li, M. Ghavamzadeh, and C. Boutilier: Randomized exploration in generalized linear bandits. In the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. [arXiv]
  • R. Zhang, B. Dai, L. Li, and D. Schuurmans: GenDICE: Generalized offline estimation of stationary values. In the 8th International Conference on Learning Representations (ICLR), 2020. [link, arXiv]
  • Z. Tang, Y. Feng, L. Li, D. Zhou, and Q. Liu: Doubly robust bias reduction in infinite horizon off-policy estimation. In the 8th International Conference on Learning Representations (ICLR), 2020. [link]
  • A. Mousavi, L. Li, Q. Liu, and D. Zhou: Black-box off-policy estimation for infinite-horizon reinforcement learning. In the 8th International Conference on Learning Representations (ICLR), 2020. [link, arXiv]
  • O. Nachum, Y. Chow, B. Dai, and L. Li: DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections. In Advances in Neural Information Processing Systems 32 (NeurIPS), spotlight, 2019. [arXiv]
  • Y. Feng, L. Li, and Q. Liu: A kernel loss for solving the Bellman equation. In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019. [arXiv]
  • C. Dann, L. Li, W. Wei, and E. Brunskill: Policy certificates: Towards accountable reinforcement learning. In the 36th International Conference on Machine Learning (ICML), 2019. [link]
  • H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou: Neural logic machines. In the 7th International Conference on Learning Representations (ICLR), 2019. [arXiv]
  • Q. Liu, L. Li, Z. Tang, and D. Zhou: Breaking the curse of horizon: Infinite-horizon off-policy estimation. In Advances in Neural Information Processing Systems 31 (NeurIPS), spotlight, 2018. [link]
  • K.-S. Jun, L. Li, Y. Ma, and J. Zhu: Adversarial attacks on stochastic bandits. In Advances in Neural Information Processing Systems 31 (NeurIPS), 2018. [link]
  • Y. Ma, K.-S. Jun, L. Li, and J. Zhu: Data poisoning attacks in contextual bandits. In the 9th Conference on Decision and Game Theory for Security (GameSec), 2018. [arXiv]
  • D. Tang, X. Li, J. Gao, C. Wang, L. Li, and T. Jebara: Subgoal discovery for hierarchical dialogue policy learning. In the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [link]
  • B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song: SBEED: Convergent reinforcement learning with nonlinear function approximation. In the 35th International Conference on Machine Learning (ICML), 2018. [link, arXiv, slides]
  • Y. Chen, L. Li, and M. Wang: Bilinear π learning using state and action features. In the 35th International Conference on Machine Learning (ICML), 2018. [link, arXiv]
  • B. Dai, A. Shaw, N. He, L. Li, and L. Song: Boosting the actor with dual critic. In the 6th International Conference on Learning Representations (ICLR), 2018. [arXiv]
  • Z. Lipton, X. Li, J. Gao, L. Li, F. Ahmed, and L. Deng: Efficient dialogue policy learning with BBQ-networks. In the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018. [link]
  • J. Chen, C. Wang, L. Xiao, J. He, L. Li, and L. Deng: Q-LDA: Uncovering latent patterns in text-based sequential decision processes. In Advances in Neural Information Processing Systems 30 (NIPS), 2017. [link]
  • B. Peng, X. Li, L. Li, J. Gao, A. Celikyilmaz, S. Lee, and K.-F. Wong: Composite task-completion dialogue system via hierarchical deep reinforcement learning. In the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017. [link]
  • L. Li, Y. Lu, and D. Zhou: Provably optimal algorithms for generalized linear contextual bandits. In the 34th International Conference on Machine Learning (ICML), 2017. [link]
  • S. Du, J. Chen, L. Li, L. Xiao, and D. Zhou: Stochastic variance reduction methods for policy evaluation. In the 34th International Conference on Machine Learning (ICML), 2017. [link]
  • B. Dhingra, L. Li, X. Li, J. Gao, Y.-N. Chen, F. Ahmed, and L. Deng: Towards end-to-end reinforcement learning of dialogue agents for information access. In the 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017. [link]
  • E. Parisotto, A. Mohamed, R. Singh, L. Li, D. Zhou, and P. Kohli: Neuro-symbolic program synthesis. In the 5th International Conference on Learning Representations (ICLR), 2017. [link]
  • T.K. Huang, L. Li, A. Vartanian, S. Amershi, and J. Zhu: Active learning with oracle epiphany. In Advances in Neural Information Processing Systems 29 (NIPS), 2016. [link]
  • J. He, M. Ostendorf, X. He, J. Chen, J. Gao, L. Li, and L. Deng: Deep reinforcement learning with a combinatorial action space for predicting and tracking popular discussion threads. In the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016. [link]
  • C.-Y. Liu and L. Li: On the prior sensitivity of Thompson sampling. In the 27th International Conference on Algorithmic Learning Theory (ALT), 2016. [link]
  • J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf: Deep reinforcement learning with a natural language action space. In the 54th Annual Meeting of the Association for Computational Linguistics (ACL), 2016. [link]
  • N. Jiang and L. Li: Doubly robust off-policy value evaluation for reinforcement learning. In the 33rd International Conference on Machine Learning (ICML), 2016. [link]
  • S. Agrawal, N. R. Devanur, and L. Li: An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In the 29th Annual Conference on Learning Theory (COLT), 2016. [link]
  • M. Zoghi, T. Tunys, L. Li, D. Jose, J. Chen, C.-M. Chin, and M. de Rijke: Click-based hot fixes for underperforming torso queries. In the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2016. [link]
  • J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf: Deep reinforcement learning with an unbounded action space. In the International Conference on Learning Representations (ICLR), Workshop Track, 2016.
  • L. Li, R. Munos, and Cs. Szepesvari: Toward minimax off-policy value estimation. In the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015. [link]
  • L. Li, S. Chen, J. Kleban, and A. Gupta: Counterfactual estimation and optimization of click metrics in search engines: A case study. In the 24th International Conference on World Wide Web (WWW), Companion, 2015. [link]
  • L. Li, J. Kim, and I. Zitouni: Toward predicting the outcome of an A/B experiment for search relevance. In the 8th ACM International Conference on Web Search and Data Mining (WSDM), 2015. [link]
  • L. Li, H. He, and J.D. Williams: Temporal supervised learning for inferring a dialog policy from example conversations. In the IEEE Spoken Language Technology Workshop (SLT), 2014.
  • A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li, and R.E. Schapire: Taming the monster: A fast and simple algorithm for contextual bandits. In the 31st International Conference on Machine Learning (ICML), 2014.
  • E. Brunskill and L. Li: PAC-inspired option discovery in lifelong reinforcement learning. In the 31st International Conference on Machine Learning (ICML), 2014.
  • E. Brunskill and L. Li: Sample complexity of multi-task reinforcement learning. In the 29th Conference on Uncertainty in Artificial Intelligence (UAI), 2013.
  • M. Dudik, D. Erhan, J. Langford, and L. Li: Sample-efficient nonstationary-policy evaluation for contextual bandits. In the 28th Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
  • L. Li, W. Chu, J. Langford, T. Moon, and X. Wang: An unbiased offline evaluation of contextual bandit algorithms with generalized linear models. In Journal of Machine Learning Research - Workshop and Conference Proceedings 26: On-line Trading of Exploration and Exploitation 2, 2012.
  • V. Navalpakkam, R. Kumar, L. Li, and D. Sivakumar: Attention and selection in online choice tasks. In the 20th International Conference on User Modeling, Adaptation and Personalization (UMAP), 2012.
  • H. Wang, A. Dong, L. Li, Y. Chang, and E. Gabrilovich: Joint relevance and freshness learning from clickthroughs for news search. In the 21st International Conference on World Wide Web (WWW), 2012.
  • O. Chapelle and L. Li: An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems 24 (NIPS), 2011.
  • M. Dudik, J. Langford, and L. Li: Doubly robust policy evaluation and learning. In the 28th International Conference on Machine Learning (ICML), 2011.
  • W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng: Unbiased online active learning in data streams. In the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2011.
  • D. Agarwal, L. Li, and A.J. Smola: Linear-time algorithms for propensity scores. In the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  • A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R.E. Schapire: Contextual bandit algorithms with supervised learning guarantees. In the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011. Co-winner of the Notable Paper Award.
  • W. Chu, L. Li, L. Reyzin, and R. Schapire: Linear contextual bandit problems. In the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  • L. Li, W. Chu, J. Langford, and X. Wang: Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In the 4th ACM International Conference on Web Search and Data Mining (WSDM), 2011. Winner of the Best Paper Award.
  • A.L. Strehl, J. Langford, L. Li, and S. Kakade: Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems 23 (NIPS), spotlight, 2010.
  • M. Zinkevich, M. Weimer, A.J. Smola, and L. Li: Parallelized stochastic gradient descent. In Advances in Neural Information Processing Systems 23 (NIPS), 2010.
  • T. Moon, L. Li, W. Chu, C. Liao, Z. Zheng, and Y. Chang: Online learning for recency search ranking using real-time user feedback (short paper). In the 19th ACM Conference on Information and Knowledge Management (CIKM), 2010.
  • L. Li, W. Chu, J. Langford, and R.E. Schapire: A contextual-bandit approach to personalized news article recommendation. In the 19th International Conference on World Wide Web (WWW), 2010.
  • L. Li, J.D. Williams, and S. Balakrishnan: Reinforcement learning for spoken dialog management using least-squares policy iteration and fast feature selection. In the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH), 2009.
  • C. Diuk, L. Li, and B.R. Leffler: The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning. In the 26th International Conference on Machine Learning (ICML), 2009.
  • J. Asmuth, L. Li, M.L. Littman, A. Nouri, and D. Wingate: A Bayesian sampling approach to exploration in reinforcement learning. In the 25th Conference on Uncertainty in Artificial Intelligence (UAI), 2009.
  • L. Li, M.L. Littman, and C.R. Mansley: Online exploration in least-squares policy iteration. In the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
  • J. Langford, L. Li, and T. Zhang: Sparse online learning via truncated gradient. In Advances in Neural Information Processing Systems 21 (NIPS), spotlight, 2008.
  • L. Li: A worst-case comparison between temporal difference and residual gradient. In the 25th International Conference on Machine Learning (ICML), 2008.
  • L. Li, M.L. Littman, and T.J. Walsh: Knows what it knows: A framework for self-aware learning. In the 25th International Conference on Machine Learning (ICML), 2008. Co-winner of the Best Student Paper Award. A Google Student Award winner at the New York Academy of Sciences Symposium on Machine Learning, 2008.
  • R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M.L. Littman: An analysis of linear models, linear value function approximation, and feature selection for reinforcement learning. In the 25th International Conference on Machine Learning (ICML), 2008.
  • E. Brunskill, B.R. Leffler, L. Li, M.L. Littman, and N. Roy: CORL: A continuous-state offset-dynamics reinforcement learner. In the 24th Conference on Uncertainty in Artificial Intelligence (UAI), 2008.
  • L. Li and M.L. Littman: Efficient value-function approximation via online linear regression. In the 10th International Symposium on Artificial Intelligence and Mathematics (AI&Math), 2008.
  • J. Wortman, Y. Vorobeychik, L. Li, and J. Langford: Maintaining equilibria during exploration in sponsored search auctions. In the 3rd International Workshop on Internet and Network Economics (WINE), LNCS 4858, 2007.
  • T.J. Walsh, A. Nouri, L. Li, and M.L. Littman: Planning and learning in environments with delayed feedback. In the 18th European Conference on Machine Learning (ECML), LNCS 4701, 2007.
  • R. Parr, C. Painter-Wakefield, L. Li, and M.L. Littman: Analyzing feature generation for value-function approximation. In the 24th International Conference on Machine Learning (ICML), 2007.
  • A.L. Strehl, L. Li, E. Wiewiora, J. Langford, and M.L. Littman: PAC model-free reinforcement learning. In the 23rd International Conference on Machine Learning (ICML), 2006. Best Student Poster Award winner at the New York Academy of Sciences Symposium on Machine Learning, 2006.
  • A.L. Strehl, L. Li, and M.L. Littman: Incremental model-based learners with formal learning-time guarantees. In the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006.
  • L. Li, T.J. Walsh, and M.L. Littman: Towards a unified theory of state abstraction for MDPs. In the 9th International Symposium on Artificial Intelligence and Mathematics (AI&Math), 2006.
  • L. Li and M.L. Littman: Lazy approximation for solving continuous finite-horizon MDPs. In the 20th National Conference on Artificial Intelligence (AAAI), 2005.
  • L. Li, V. Bulitko, and R. Greiner: Batch reinforcement learning with state importance (extended abstract). In the 15th European Conference on Machine Learning (ECML), LNCS 3201, 2004.
  • V. Bulitko, L. Li, R. Greiner, and I. Levner: Lookahead pathologies for single agent search (poster paper). In the 18th International Joint Conference on Artificial Intelligence (IJCAI), 2003.
  • I. Levner, V. Bulitko, L. Li, G. Lee, and R. Greiner: Towards automated creation of image interpretation systems. In the 16th Australian Joint Conference on Artificial Intelligence, LNCS 2903, 2003.
  • L. Li, V. Bulitko, R. Greiner, and I. Levner: Improving an adaptive image interpretation system by leveraging. In the 8th Australian and New Zealand Intelligent Information Systems Conference, 2003.

Journal

  • L. Li: A perspective on off-policy evaluation in reinforcement learning (Invited Paper). Frontiers of Computer Science, 13(5):911--912, 2019. [link, PDF]
  • M. Dudik, D. Erhan, J. Langford, and L. Li: Doubly robust policy evaluation and optimization. In Statistical Science, 29(4):485--511, 2014.
  • J. Bian, B. Long, L. Li, T. Moon, A. Dong, and Y. Chang: Exploiting user preference for online learning in Web content optimization systems. In ACM Transactions on Intelligent Systems and Technology, 5(2), 2014.
  • T. Moon, W. Chu, L. Li, Z. Zheng, and Y. Chang: Refining recency search results with user click feedback. In ACM Transactions on Information Systems, 30(4), 2012.
  • J. Langford, L. Li, P. McAfee, and K. Papineni: Cloud control: Voluntary admission control for Intranet traffic management. In Information Systems and e-Business Management, 10(3):295--308, 2012.
  • L. Li, M.L. Littman, T.J. Walsh, and A.L. Strehl: Knows what it knows: A framework for self-aware learning. In Machine Learning, 82(3):399--443, 2011.
  • L. Li and M.L. Littman: Reducing reinforcement learning to KWIK online regression. In the Annals of Mathematics and Artificial Intelligence, 58(3--4):217--237, 2010.
  • J. Langford, L. Li, J. Wortman, and Y. Vorobeychik: Maintaining equilibria during exploration in sponsored search auctions. In Algorithmica, 58(4):990--1021, 2010.
  • A.L. Strehl, L. Li, and M.L. Littman: Reinforcement learning in finite MDPs: PAC analysis. In the Journal of Machine Learning Research, 10:2413--2444, 2009.
  • E. Brunskill, B.R. Leffler, L. Li, M.L. Littman, and N. Roy: Provably efficient learning with typed parametric models. In the Journal of Machine Learning Research, 10:1955--1988, 2009.
  • J. Langford, L. Li, and T. Zhang: Sparse online learning via truncated gradient. In the Journal of Machine Learning Research, 10:777--801, 2009.
  • T.J. Walsh, A. Nouri, L. Li, and M.L. Littman: Planning and learning in environments with delayed feedback. In the Journal of Autonomous Agents and Multi-Agent Systems, 18(1):83--105, 2009.
  • L. Li, V. Bulitko, and R. Greiner: Focus of attention in reinforcement learning. In the Journal of Universal Computer Science, 13(9):1246--1269, 2007.

Theses, Surveys, Books, and Chapters

  • J. Gao, M. Galley, and L. Li: Neural approaches to Conversational AI: Question answering, task-oriented dialogues and social chatbots. Foundations and Trends in Information Retrieval, 13(2-3):127--298, 2019. ISBN 978-1-68083-552-6. [link, arXiv]
  • K. Hofmann, L. Li, and F. Radlinski: Online Evaluation for Information Retrieval. Foundations and Trends in Information Retrieval, 10(1):1--107, 2016. ISBN 978-1-68083-163-4. [link, PDF]
  • L. Li: Sample complexity bounds of exploration. In Marco Wiering and Martijn van Otterlo, editors, Reinforcement Learning: State of the Art, Springer Verlag, 2012. ISBN 978-3642276446.
  • L. Li: A unifying framework for computational reinforcement learning theory. Doctoral dissertation, Department of Computer Science, Rutgers University, New Brunswick, NJ, USA, May, 2009. [link]
  • L. Li: Focus of attention in reinforcement learning. MSc thesis, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, July, 2004.

Others

  • X. Li, Z.C. Lipton, B. Dhingra, L. Li, J. Gao, Y.-N. Chen: A user simulator for task-completion dialogues. MSR technical report, December 2016.
  • E. Brunskill and L. Li: The online discovery problem and its application to lifelong reinforcement learning. CoRR abs/1506.03379, June 2015.
  • D. Yankov, P. Berkhin, and L. Li: Evaluation of explore-exploit policies in multi-result ranking systems. Microsoft Journal on Applied Research, volume 3, pages 54--60, 2015. Also available as Microsoft Research Technical Report MSR-TR-2015-34, May 2015.
  • Z. Qin, V. Petricek, N. Karampatziakis, L. Li, and J. Langford: Efficient online bootstrapping for large scale learning. NIPS Workshop on Big Data, December, 2013. Also available as Microsoft Research Technical Report MSR-TR-2013-132.
  • L. Li and O. Chapelle: Regret bounds for Thompson sampling (Open Problems). In the 25th Annual Conference on Learning Theory (COLT), 2012.
  • L. Li and M.L. Littman: Prioritized sweeping converges to the optimal value function. Technical report DCS-TR-631, Department of Computer Science, Rutgers University, May 2008.
  • A.L. Strehl, L. Li, and M.L. Littman: PAC reinforcement learning bounds for RTDP and Rand-RTDP. AAAI technical report WS-06-11, pages 50--56, July 2006.
  • L. Li and M.L. Littman: Lazy approximation: A new approach for solving continuous finite-horizon MDPs. Technical report DCS-TR-577, Department of Computer Science, Rutgers University, May 2005.