Lihong Li

Research Scientist
Google Inc.

lihongli.cs@gmail.com (for general academic work)
lihong@google.com (for Google-related work)
747 Sixth Street South, Kirkland, WA, USA 98033




Off-policy Learning

How can we evaluate the quality of a new policy using data collected by another policy? Answering this question has a wide range of applications in industry, since it alleviates the need for frequent online experimentation, which can be costly, time-consuming, and risky. The problem is closely related to covariate shift and causal effect estimation. A minimal sketch of the basic importance-sampling estimator is given below; a doubly robust refinement is sketched after the publication list.
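Below is a minimal sketch of the inverse-propensity-scoring (IPS) estimator that underlies much of this line of work. It assumes a deterministic target policy and known logging propensities; the function and argument names are illustrative, not taken from any of the papers listed.

    import numpy as np

    def ips_estimate(contexts, actions, rewards, propensities, target_policy):
        """Inverse-propensity-scoring estimate of a new policy's value.

        contexts      -- logged contexts x_i
        actions       -- actions a_i chosen by the logging policy
        rewards       -- observed rewards r_i
        propensities  -- logging probabilities p_i = mu(a_i | x_i)
        target_policy -- maps a context to the action the new policy takes
        """
        actions = np.asarray(actions)
        rewards = np.asarray(rewards, dtype=float)
        propensities = np.asarray(propensities, dtype=float)
        chosen = np.array([target_policy(x) for x in contexts])
        # A logged reward counts only when the new policy agrees with the
        # logged action, reweighted by the inverse logging propensity.
        return float(np.mean((chosen == actions) * rewards / propensities))

When the logging policy is uniformly random over K actions, every p_i equals 1/K, and the estimator amounts to averaging rewards on the records where the policies agree, which is closely related to the replay-style evaluation in the WSDM 2011 paper listed below.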
  • Q. Liu, L. Li, Z. Tang, and D. Zhou: Breaking the curse of horizon: Infinite-horizon off-policy estimation. In Advances in Neural Information Processing Systems 31 (NIPS), 2018. [PDF]
  • N. Jiang and L. Li: Doubly robust off-policy value evaluation for reinforcement learning. In the 33rd International Conference on Machine Learning (ICML), 2016. [link]
  • M. Zoghi, T. Tunys, L. Li, D. Jose, J. Chen, C.-M. Chin, and M. de Rijke: Click-based hot fixes for underperforming torso queries. In the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2016. [link]
  • K. Hofmann, L. Li, and F. Radlinski: Online Evaluation for Information Retrieval. Foundations and Trends in Information Retrieval, 10(1):1--107, 2016. ISBN 978-1-68083-163-4.
  • L. Li, R. Munos, and Cs. Szepesvari: Toward minimax off-policy value estimation. In the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015. [link]
  • L. Li, S. Chen, J. Kleban, and A. Gupta: Counterfactual estimation and optimization of click metrics in search engines: A case study. In the 24th International Conference on World Wide Web (WWW), Companion, 2015. [link]
  • L. Li, J. Kim, and I. Zitouni: Toward predicting the outcome of an A/B experiment for search relevance. In the 8th International Conference on Web Search and Data Mining (WSDM), 2015. [link]
  • D. Yankov, P. Berkhin, and L. Li: Evaluation of explore-exploit policies in multi-result ranking systems. Microsoft Journal on Applied Research, volume 3, pages 54--60, 2015. Also available as Microsoft Research Technical Report MSR-TR-2015-34, May 2015.
  • M. Dudik, D. Erhan, J. Langford, and L. Li: Doubly robust policy evaluation and optimization. Statistical Science, 29(4):485--511, 2014.
  • M. Dudik, D. Erhan, J. Langford, and L. Li: Sample-efficient nonstationary-policy evaluation for contextual bandits. In the 28th Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
  • L. Li, W. Chu, J. Langford, T. Moon, and X. Wang: An unbiased offline evaluation of contextual bandit algorithms with generalized linear models. Journal of Machine Learning Research -- Workshop and Conference Proceedings 26: On-line Trading of Exploration and Exploitation 2, 2012.
  • M. Dudik, J. Langford, and L. Li: Doubly robust policy evaluation and learning. In the 28th International Conference on Machine Learning (ICML), 2011.
  • D. Agarwal, L. Li, and A.J. Smola: Linear-time algorithms for propensity scores. In the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
  • L. Li, W. Chu, J. Langford, and X. Wang: Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In the 4th ACM International Conference on Web Search and Data Mining (WSDM), 2011.
  • A.L. Strehl, J. Langford, L. Li, and S. Kakade: Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems 23 (NIPS), 2010.
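Several entries above (Dudik et al., ICML 2011 and Statistical Science 2014; Jiang and Li, ICML 2016) develop doubly robust estimators, which combine a learned reward model with an importance-weighted correction so that the estimate remains accurate if either the propensities or the reward model is correct. A minimal sketch for the contextual-bandit case, under the same illustrative assumptions as the IPS snippet above:

    import numpy as np

    def dr_estimate(contexts, actions, rewards, propensities,
                    target_policy, reward_model):
        """Doubly robust estimate of a deterministic target policy's value.

        reward_model(x, a) -- learned prediction of the expected reward.
        """
        values = []
        for x, a, r, p in zip(contexts, actions, rewards, propensities):
            pi_a = target_policy(x)
            v = reward_model(x, pi_a)  # direct-method baseline
            if a == pi_a:
                # Importance-weighted correction on matching actions.
                v += (r - reward_model(x, a)) / p
            values.append(v)
        return float(np.mean(values))

The direct-method term lowers variance relative to plain IPS, while the correction term removes the bias introduced by an imperfect reward model.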