COMP4901Z: Reinforcement Learning

Fall 2024, Dept. of Computer Science and Engineering (CSE), The Hong Kong University of Science and Technology (HKUST)

Instructor: Long Chen
Class Time & Location: Monday & Wednesday 10:30AM - 11:50AM (LG3009, Lift 10 - 12)

Email: longchen@ust.hk (For course-related queries, please use a subject line starting with [COMP4901Z])
Office Hour: Monday 17:00 - 18:00 (or via Zoom through Canvas)

Teaching Assistants: Yanghao Wang (ywangtg@connect.ust.hk) and Wei Chen (wchendb@connect.ust.hk)
TA Office Hour: Wednesday 15:15 - 16:15 (by [ZOOM Link])
If you are enrolled in COMP4901Z and want the recorded videos for classes you missed, you can email the TAs directly.


Course Description: Reinforcement learning (RL) is a computational learning approach in which an agent tries to maximize the total reward it receives while interacting with a complex and uncertain environment. RL not only shows strong performance in many games (such as Go), but has also become an essential technique in many of today's real-world applications (such as LLM training and embodied AI). This course teaches both the fundamentals and advanced topics of RL. The content includes basic RL elements (MDPs, dynamic programming, policy iteration), value-based approaches (DQN), policy-based approaches (policy gradient), model-based RL, multi-agent RL, other advanced topics, and applications of RL techniques in today's computer vision and AI systems. To reinforce understanding, the course also includes Python/PyTorch implementations.


(Updated) Group-based Presentation Details
  • Presentation Time: Nov 20th & Nov 25th
  • Each group has 2 ~ 4 members
  • Each group presents for 8 ~ 10 minutes
  • Deadline for finalizing group members: Nov 4th
  • Deadline for sending presentation slides: Nov 19th
  • Deadline for sending the survey report: Nov 29th
  • Presentation and report: 2 ~ 4 papers (one per group member) published/released in the last 24 months, as long as the topics are related to RL.
  • Write a very short summary (half a page per person) of the presented papers, using the NeurIPS conference LaTeX template.

Pre-requisite:

Math: You should have some background in Linear Algebra and Probability.
Machine Learning: Basic machine learning knowledge (e.g., gradient backpropagation) and deep learning knowledge (e.g., MLP) as needed.
Programming: Python; PyTorch preferred.

Grading scheme:
  • In-class Quiz: 20%
  • Assignment: 30% (Code assignment: 20%, Project presentation: 10%)
  • Final Exam: 50%
Reference books/materials:

Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Second Edition. [pdf]
Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun. Reinforcement Learning: Theory and Algorithms. [pdf]
Csaba Szepesvari. Algorithms for Reinforcement Learning. [pdf]
Dimitri P. Bertsekas. Reinforcement Learning and Optimal Control. [pdf]


Content Coverage
  • Markov Decision Processes
  • Dynamic Programming
  • Monte Carlo and Temporal Difference Learning
  • Q-Learning
  • DQN and advanced techniques
  • Policy Gradient
  • Actor Critic
  • Advanced Policy Gradient
  • Continuous Control
  • Imitation Learning
  • Model-based RL
  • Multi-Agent RL
  • RL in CV/NLP (e.g., RLHF)

Syllabus / Schedule
Lecture
Date
Reading Materials

1.1 Course and RL Introduction
Course overview
Basic RL concepts
Comparisons with other ML methods
Sep 02 & 04
1.2 Multi-armed Bandit
Exploration vs. Exploitation
Greedy vs. \(\epsilon\)-greedy
Upper Confidence Bound (UCB)
Bayesian Bandits
Sep 04 & 09
Book (S. & B.) Chapter 2
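To preview the \(\epsilon\)-greedy idea covered in this lecture, here is a minimal sketch on a hypothetical Bernoulli bandit (illustrative only, not course-provided code; the function name and arm probabilities are made up):

```python
import random

def epsilon_greedy_bandit(true_probs, epsilon=0.1, steps=10000, seed=0):
    """Run epsilon-greedy on a hypothetical Bernoulli multi-armed bandit.

    With probability epsilon we explore (pick a random arm); otherwise
    we exploit the arm with the highest running value estimate Q[a].
    """
    rng = random.Random(seed)
    k = len(true_probs)
    Q = [0.0] * k        # value estimates per arm
    N = [0] * k          # pull counts per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore
        else:
            a = max(range(k), key=lambda i: Q[i])   # exploit
        r = 1.0 if rng.random() < true_probs[a] else 0.0
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                   # incremental mean update
    return Q, N

Q, N = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough steps, the best arm (here the last one) accumulates most of the pulls and its estimate approaches its true success probability.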
1.3 Markov Decision Processes
Markov process, Markov reward process
Markov decision process
Optimal policies and value functions
Sep 11 & 16
Book (S. & B.) Chapter 3
1.4 Planning by Dynamic Programming
Policy evaluation
Policy improvement
Policy iteration vs. Value iteration
Sep 16 & 23
Book (S. & B.) Chapter 4
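As a taste of the dynamic-programming material, here is a minimal value iteration sketch on a hypothetical two-state MDP (illustrative only; the transition/reward tables are made up for the example):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.

    P[s][a] is a dict {s_next: prob}; R[s][a] is the expected reward
    for taking action a in state s.  Returns the optimal value function.
    """
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [
            max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                for a in range(len(P[s])))
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Hypothetical 2-state MDP: action 0 stays put (reward 0); action 1
# moves to the other state (reward 1 from state 0, reward 0 from state 1).
P = [[{0: 1.0}, {1: 1.0}], [{1: 1.0}, {0: 1.0}]]
R = [[0.0, 1.0], [0.0, 0.0]]
V = value_iteration(P, R)
```

For this MDP the fixed point can be solved by hand: V(0) = 1/(1 - 0.81) and V(1) = 0.9 V(0), which the iteration recovers.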

2.1 Model-free Prediction
Monte-Carlo Learning
Temporal Difference Learning
Sep 23 & 25
Book (S. & B.) Chapter 5 & 6
2.2 Model-free Control
On-policy Monte-Carlo Control
Off-policy Monte-Carlo Control
SARSA
Q-Learning
Oct 7 & 8 (2)
Book (S. & B.) Chapter 5 & 6
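The tabular Q-learning update covered here can be sketched on a hypothetical deterministic chain environment (illustrative only, not course-provided code; the environment and function name are made up):

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a deterministic chain: states 0..n-1,
    actions {0: left, 1: right}; reward 1 only on reaching the right end."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy behavior policy
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 1 if Q[s][1] >= Q[s][0] else 0
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: off-policy max over next-state actions
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
```

After training, the greedy policy moves right in every state, and Q[3][1] converges to 1 (the immediate reward followed by the terminal state).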
2.3 Value Function Approximation
Classes of function approximation
Gradient-based algorithm
Convergence and divergence
Deep Q-Learning
Oct 9 & 14
Playing Atari with Deep Reinforcement Learning. NIPS workshop'13.
Human-level Control through Deep Reinforcement Learning. Nature'15.
2.4 Advanced Tricks for DQNs
Experience Replay
Target Network
Double DQN
Dueling Network
Noisy Network
Oct 14 & 16
Deep Reinforcement Learning with Double Q-learning. AAAI'16.
Prioritized Experience Replay. ICLR'16.
Dueling Network Architectures for Deep Reinforcement Learning. ICML'16.
A Distributional Perspective on Reinforcement Learning. ICML'17.
Noisy Networks for Exploration. ICLR'18.
Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI'18.
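Of the DQN tricks listed above, experience replay is the simplest to sketch. A minimal buffer might look like this (illustrative only; the class name and interface are made up, not from any course codebase):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer: store transitions and sample
    uniform minibatches to break the temporal correlation in the
    agent's stream of experience."""

    def __init__(self, capacity, seed=0):
        self.buf = deque(maxlen=capacity)   # oldest transitions drop off
        self.rng = random.Random(seed)

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        # uniform sampling without replacement within a batch
        return self.rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

buf = ReplayBuffer(capacity=64)
for i in range(100):
    buf.push(i, 0, 0.0, i + 1, False)
batch = buf.sample(32)
```

Prioritized replay (ICLR'16 above) replaces the uniform `sample` with sampling proportional to TD error.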

3.1 Policy Gradient
REINFORCE
Policy gradient with baseline
Off-Policy policy gradient
Oct 21 & 23
Book (S. & B.) Chapter 13
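The REINFORCE update with a baseline, previewed on a hypothetical Bernoulli bandit with a softmax policy (illustrative sketch only; the function name and setup are made up). For a softmax policy, \(\nabla_{\theta_i} \log \pi(a) = \mathbf{1}[i = a] - \pi_i\), which gives the score-function update below:

```python
import math
import random

def reinforce_bandit(true_probs, lr=0.1, steps=5000, seed=0):
    """REINFORCE with a running-average baseline on a Bernoulli bandit."""
    rng = random.Random(seed)
    k = len(true_probs)
    theta = [0.0] * k
    baseline = 0.0
    for t in range(1, steps + 1):
        # softmax policy over arms
        z = [math.exp(x) for x in theta]
        total = sum(z)
        pi = [x / total for x in z]
        # sample an action from the policy
        u, a, acc = rng.random(), 0, 0.0
        for i, p in enumerate(pi):
            acc += p
            if u < acc:
                a = i
                break
        r = 1.0 if rng.random() < true_probs[a] else 0.0
        baseline += (r - baseline) / t   # running-average baseline
        for i in range(k):
            grad = (1.0 if i == a else 0.0) - pi[i]   # score function
            theta[i] += lr * (r - baseline) * grad
    return theta

theta = reinforce_bandit([0.2, 0.8])
```

The baseline does not change the expected gradient but reduces its variance, which is exactly the "policy gradient with baseline" topic above.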
3.2 Actor Critic
Actor critic
Advantage actor critic (A2C)
Oct 28
Book (S. & B.) Chapter 13
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML'18.
3.3 Advanced Policy Gradients
Natural policy gradient
Trust region policy optimization (TRPO)
Oct 30 & Nov 4
A Natural Policy Gradient. NIPS'01.
Natural Actor-Critic. ECML'05.
Trust Region Policy Optimization. ICML'15.
Proximal Policy Optimization Algorithms. arXiv'17.
3.4 Continuous Control
Deterministic policy gradient
TD3
Nov 4 & 6
Deterministic Policy Gradient Algorithms. ICML'14.
Continuous Control with Deep Reinforcement Learning. ICLR'16.
Addressing Function Approximation Error in Actor-Critic Methods. ICML'18.

4.1 Imitation Learning
Behavior cloning
Dataset Aggregation (DAgger)
Inverse RL
Nov 6 & 11
4.2 Model-based RL (1)
Open-loop planning
Monte Carlo Tree Search (MCTS)
Nov 11
4.3 Model-based RL (2)
Dyna & Dyna-Q
Model-based Policy Learning
Model-free RL with a Model
Nov 13 & 18

5.1 RL in Multimodal Understanding
Nov 18
Student Presentation Session (1)
Nov 20
Student Presentation Session (2)
Nov 25
Course Summary and Review
Nov 27

Acknowledgements

This course was inspired by and/or uses resources from the following courses:

Reinforcement Learning by David Silver, DeepMind, 2015.
CS285: Deep Reinforcement Learning by Sergey Levine, UC Berkeley, 2023.
CS234: Reinforcement Learning by Emma Brunskill, Stanford University, 2024.
10-403: Deep Reinforcement Learning by Katerina Fragkiadaki, Carnegie Mellon University, 2024.
Special Topics in AI: Foundations of Reinforcement Learning by Yuejie Chi, Carnegie Mellon University, 2023.
CS 6789: Foundations of Reinforcement Learning by Wen Sun and Sham Kakade, Cornell University.
CS224R: Deep Reinforcement Learning by Chelsea Finn, Stanford University, 2023.
DeepMind x UCL RL Lecture Series by Hado van Hasselt, DeepMind, 2021.