COMP4901Z: Reinforcement Learning
Fall 2024, Dept. of Computer Science and Engineering (CSE), The Hong Kong University of Science and Technology (HKUST)
Instructor: Long Chen
Class Time & Location: Monday & Wednesday 10:30AM - 11:50AM (LG3009, Lift 10 - 12)
Email: longchen@ust.hk
(For course-related queries, please start the email subject with [COMP4901Z])
Office Hour: Monday 17:00 - 18:00 (or via Zoom through Canvas)
Teaching Assistants: Yanghao Wang (ywangtg@connect.ust.hk) and Wei Chen (wchendb@connect.ust.hk)
TA Office Hour: Wednesday 15:15 - 16:15 (via [ZOOM Link])
If you are enrolled in COMP4901Z and would like the recorded videos for classes you missed, you can email the TAs directly.
Course Description: Reinforcement learning (RL) is a computational learning approach in which an agent tries to maximize the total reward it receives while interacting with a complex and uncertain environment. RL has shown strong performance in many games (such as Go) and has become an essential technique in many of today’s real-world applications (such as LLM training and embodied AI). This course teaches both the fundamentals and the advanced topics of RL. The content covers basic RL elements (including MDPs, dynamic programming, and policy iteration), value-based approaches (DQN), policy-based approaches (policy gradient), model-based RL, multi-agent RL, other advanced topics, and applications of RL techniques in today’s computer vision and AI systems. To support understanding, the course also includes some Python/PyTorch implementations.
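As a rough illustration of the agent-environment interaction described above, the following minimal sketch runs one episode and sums the rewards. The `CorridorEnv` class, its reward values, and the two policies are hypothetical toy constructs made up for this example, not course material or a library API.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the leftmost cell.
        self.position = 0
        return self.position

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped at the walls).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        # Assumed rewards: +10 for reaching the goal, -1 for every other step.
        reward = 10 if done else -1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the policy act until the episode ends; return the total reward."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    return 1


rng = random.Random(0)


def random_policy(state):
    return rng.choice([0, 1])


print(run_episode(CorridorEnv(), always_right))   # 7  (three -1 steps, then +10)
print(run_episode(CorridorEnv(), random_policy))  # at most 7
```

In this toy setting the always-right policy is optimal, so no other policy can collect a higher total reward; finding such a policy without being told the environment's rules is the problem RL addresses.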
Pre-requisite:
Math: You should have some background in Linear Algebra and Probability.
Machine Learning: Basic machine learning knowledge (e.g., gradient backpropagation) and deep learning knowledge (e.g., MLP) as needed.
Programming: Python; PyTorch experience is a plus.
Grading scheme:
- In-class Quiz: 20%
- Assignment: 30%
- Final Exam: 50%
Reference books/materials:
Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Second Edition. [pdf]
Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun. Reinforcement Learning: Theory and Algorithms. [pdf]
Csaba Szepesvari. Algorithms for Reinforcement Learning. [pdf]
Dimitri P. Bertsekas. Reinforcement Learning and Optimal Control. [pdf]
Content Coverage
- Markov Decision Processes
- Dynamic Programming
- Monte Carlo and Temporal Difference Learning
- Q-Learning
- DQN and advanced techniques
- Policy Gradient
- Actor Critic
- Advanced Policy Gradient
- Continuous Controls
- Imitation Learning
- Model-based RL
- Multi-Agent RL
- RL in CV/NLP (e.g., RLHF)
Syllabus / Schedule
Course overview
Basic RL concepts
Comparisons with other ML methods
Exploration vs. Exploitation
Greedy vs. \(\epsilon\)-greedy
Upper Confidence Bound (UCB)
Bayesian Bandits
Markov process, Markov reward process
Markov decision process
Optimal policies and value functions
Policy evaluation
Policy improvement
Policy iteration vs. Value iteration
Monte-Carlo Learning
Temporal Difference Learning
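To give a flavor of the exploration vs. exploitation topics listed in the schedule, here is a minimal sketch of \(\epsilon\)-greedy action selection on a multi-armed bandit. The function names, the Gaussian reward model, and the default parameters are assumptions chosen for this illustration, not the course's reference implementation.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, epsilon, steps=2000, seed=0):
    """Play a bandit with unit-variance Gaussian rewards (an assumed model)."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)      # running estimate of each arm's mean
    counts = [0] * len(true_means)   # number of pulls per arm
    total = 0.0
    for _ in range(steps):
        arm = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the estimate.
        q[arm] += (reward - q[arm]) / counts[arm]
        total += reward
    return q, counts, total


q, counts, total = run_bandit([0.2, 0.5, 1.0], epsilon=0.1)
print(counts)  # pulls per arm
```

Setting `epsilon=0.0` gives the purely greedy strategy, which can lock onto a suboptimal arm after a few unlucky rewards; a small positive `epsilon` keeps sampling every arm occasionally.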
Acknowledgements
This course was inspired by and/or uses resources from the following courses:
Reinforcement Learning by David Silver, DeepMind, 2015.
CS285: Deep Reinforcement Learning by Sergey Levine, UC Berkeley, 2023.
CS234: Reinforcement Learning by Emma Brunskill, Stanford University, 2024.
10-403: Deep Reinforcement Learning by Katerina Fragkiadaki, Carnegie Mellon University, 2024.
Special Topics in AI: Foundations of Reinforcement Learning by Yuejie Chi, Carnegie Mellon University, 2023.
CS 6789: Foundations of Reinforcement Learning by Wen Sun and Sham Kakade, Cornell University.
CS224R: Deep Reinforcement Learning by Chelsea Finn, Stanford University, 2023.
DeepMind x UCL RL Lecture Series by Hado van Hasselt, DeepMind, 2021.