Applications to the Web

How can we optimize an online system that automates its decision making to maximize metrics of interest? RL applies naturally to many problems on the Web, such as recommendation, advertising, and search. Many RL techniques have been deployed successfully, with substantial improvements over non-RL approaches.

Z. Tang, Y. Duan, S. Zhu, S. Zhang, and L. Li: Estimating long-term effects from experimental data. In the 16th ACM Conference on Recommender Systems (RecSys), Industry Track, 2022.
K. Hofmann, L. Li, and F. Radlinski: Online Evaluation for Information Retrieval. Foundations and Trends in Information Retrieval, 10(1):1--107, 2016. ISBN 978-1-68083-163-4. [link, PDF]
M. Zoghi, T. Tunys, L. Li, D. Jose, J. Chen, C.-M. Chin, and M. de Rijke: Click-based hot fixes for underperforming torso queries. In the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2016. [link]
L. Li, S. Chen, J. Kleban, and A. Gupta: Counterfactual estimation and optimization of click metrics in search engines: A case study. In the 24th International Conference on World Wide Web (WWW), Companion, 2015. [link]
L. Li, J. Kim, and I. Zitouni: Toward predicting the outcome of an A/B experiment for search relevance. In the 8th International Conference on Web Search and Data Mining (WSDM), 2015. [link]
D. Yankov, P. Berkhin, and L. Li: Evaluation of explore-exploit policies in multi-result ranking systems. Microsoft Journal on Applied Research, volume 3, pages 54--60, 2015. Also available as Microsoft Research Technical Report MSR-TR-2015-34, May 2015.
J. Bian, B. Long, L. Li, T. Moon, A. Dong, and Y. Chang: Exploiting user preference for online learning in Web content optimization systems. In ACM Transactions on Intelligent Systems and Technology, 5(2), 2014.
L. Li, W. Chu, J. Langford, T. Moon, and X. Wang: An unbiased offline evaluation of contextual bandit algorithms with generalized linear models. In Journal of Machine Learning Research - Workshop and Conference Proceedings 26: On-line Trading of Exploration and Exploitation 2, 2012.
H. Wang, A. Dong, L. Li, Y. Chang, and E. Gabrilovich: Joint relevance and freshness learning from clickthroughs for news search. In the 21st International Conference on World Wide Web (WWW), 2012.
T. Moon, W. Chu, L. Li, Z. Zheng, and Y. Chang: Refining recency search results with user click feedback. In ACM Transactions on Information Systems, 30(4), 2012.
O. Chapelle and L. Li: An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems 24 (NIPS), 2011.
L. Li, W. Chu, J. Langford, and X. Wang: Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In the 4th ACM International Conference on Web Search and Data Mining (WSDM), 2011.
A.L. Strehl, J. Langford, L. Li, and S. Kakade: Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems 23 (NIPS), 2011.
T. Moon, L. Li, W. Chu, C. Liao, Z. Zheng, and Y. Chang: Online learning for recency search ranking using real-time user feedback (short paper). In the 19th ACM Conference on Information and Knowledge Management (CIKM), 2010.
L. Li, W. Chu, J. Langford, and R.E. Schapire: A contextual-bandit approach to personalized news article recommendation. In the 19th International Conference on World Wide Web (WWW), 2010.
J. Langford, L. Li, J. Wortman, and Y. Vorobeychik: Maintaining equilibria during exploration in sponsored search auctions. In Algorithmica, 58(4):990--1021, 2010.
J. Wortman, Y. Vorobeychik, L. Li, and J. Langford: Maintaining equilibria during exploration in sponsored search auctions. In the 3rd International Workshop on Internet and Network Economics (WINE), LNCS 4858, 2007.