Applications to Natural Language Processing (NLP)

How can we introduce decision-making abilities to natural language-based systems? Some NLP applications, such as conversational systems and, more recently, large language model post-training, can naturally be modeled in a decision-theoretic framework and optimized by reinforcement learning.

Z. Wei, W. Yao, Y. Liu, W. Zhang, Q. Lu, L. Qiu, C. Yu, P. Xu, C. Zhang, B. Yin, H. Yun, and L. Li: WebAgent-R1: Training web agents via end-to-end multi-turn reinforcement learning, submitted, 2025.
Q. Zhang, L. Qiu, I. Hong, Z. Xu, T. Liu, S. Li, R. Zhang, Z. Li, L. Li, B. Yin, C. Zhang, J. Chen, H. Jiang, and T. Zhao: Self-rewarding PPO: Aligning large language models with demonstrations only. In the 2nd Conference on Language Modeling (COLM), 2025.
J. Gao, M. Galley, and L. Li: Neural approaches to Conversational AI: Question answering, task-oriented dialogues and social chatbots. Foundations and Trends in Information Retrieval, 13(2-3):127-298, 2019. ISBN 978-1-68083-552-6. [link, arXiv]
D. Tang, X. Li, J. Gao, C. Wang, L. Li, and T. Jebara: Subgoal discovery for hierarchical dialogue policy learning. In the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [link]
Z. Lipton, X. Li, J. Gao, L. Li, F. Ahmed, and L. Deng: Efficient dialogue policy learning with BBQ-networks. In the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018. [link]
J. Chen, C. Wang, L. Xiao, J. He, L. Li, and L. Deng: Q-LDA: Uncovering latent patterns in text-based sequential decision processes. In Advances in Neural Information Processing Systems 30 (NIPS), 2017. [link]
B. Peng, X. Li, L. Li, J. Gao, A. Celikyilmaz, S. Lee, and K.-F. Wong: Composite task-completion dialogue system via hierarchical deep reinforcement learning. In the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017. [link]
B. Dhingra, L. Li, X. Li, J. Gao, Y.-N. Chen, F. Ahmed, and L. Deng: Towards end-to-end reinforcement learning of dialogue agents for information access. In the 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017. [link]
X. Li, Z.C. Lipton, B. Dhingra, L. Li, J. Gao, and Y.-N. Chen: A user simulator for task-completion dialogues. MSR technical report, December 2016.
J. He, M. Ostendorf, X. He, J. Chen, J. Gao, L. Li, and L. Deng: Deep reinforcement learning with a combinatorial action space for predicting and tracking popular discussion threads. In the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016. [link]
J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf: Deep reinforcement learning with a natural language action space. In the 54th Annual Meeting of the Association for Computational Linguistics (ACL), 2016. [link]
J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf: Deep reinforcement learning with an unbounded action space. In the International Conference on Learning Representations (ICLR), Workshop Track, 2016.
L. Li, H. He, and J.D. Williams: Temporal supervised learning for inferring a dialog policy from example conversations. In the IEEE Spoken Language Technology Workshop (SLT), 2014.
L. Li, J.D. Williams, and S. Balakrishnan: Reinforcement learning for spoken dialog management using least-squares policy iteration and fast feature selection. In the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH), 2009.