Section 1: Welcome

Lecture 1 Introduction 3:01
Lecture 2 Course Outline and Big Picture 7:45
Lecture 3 Where to get the Code 4:27
Lecture 4 Anyone Can Succeed in this Course 11:46
Lecture 5 Warmup 15:26

Section 2: Return of the Multi-Armed Bandit

Lecture 6 Section Introduction: The Explore-Exploit Dilemma 10:08
Lecture 7 Applications of the Explore-Exploit Dilemma 7:51
Lecture 8 Epsilon-Greedy Theory 6:55
Lecture 9 Calculating a Sample Mean (pt 1) 5:46
Lecture 10 Epsilon-Greedy Beginner's Exercise Prompt
Lecture 11 Designing Your Bandit Program 3:59
Lecture 12 Epsilon-Greedy in Code 7:01
Lecture 13 Comparing Different Epsilons 5:53
Lecture 14 Optimistic Initial Values Theory 5:30
Lecture 15 Optimistic Initial Values Beginner's Exercise Prompt 2:17
Lecture 16 Optimistic Initial Values Code 4:08
Lecture 17 UCB1 Theory 14:23
Lecture 18 UCB1 Beginner's Exercise Prompt 2:03
Lecture 19 UCB1 Code 3:18
Lecture 20 Bayesian Bandits / Thompson Sampling Theory (pt 1) 12:33
Lecture 21 Bayesian Bandits / Thompson Sampling Theory (pt 2) 17:25
Lecture 22 Thompson Sampling Beginner's Exercise Prompt 2:40
Lecture 23 Thompson Sampling Code 4:54
Lecture 24 Thompson Sampling With Gaussian Reward Theory 11:14
Lecture 25 Thompson Sampling With Gaussian Reward Code 6:08
Lecture 26 Why don't we just use a library? 5:29
Lecture 27 Nonstationary Bandits 7:01
Lecture 28 Bandit Summary, Real Data, and Online Learning 6:20
Lecture 29 (Optional) Alternative Bandit Designs 9:52
Lecture 30 Suggestion Box 2:54
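
The bandit section above covers four algorithms. As a companion to Lectures 8-13, here is a minimal sketch of epsilon-greedy using the incremental sample-mean update from Lecture 9. The Gaussian reward model, the true means, and the epsilon value are illustrative assumptions, not the course's exact code.

    import numpy as np

    class Arm:
        """One bandit arm with an unknown true mean reward."""
        def __init__(self, true_mean):
            self.true_mean = true_mean
            self.estimate = 0.0   # running sample mean of observed rewards
            self.n = 0            # number of times this arm was pulled

        def pull(self):
            # Illustrative assumption: Gaussian reward centered on the true mean.
            return np.random.randn() + self.true_mean

        def update(self, x):
            # Incremental sample mean (Lecture 9): m_n = m_{n-1} + (x - m_{n-1}) / n
            self.n += 1
            self.estimate += (x - self.estimate) / self.n

    def run(true_means=(1.0, 2.0, 3.0), eps=0.1, trials=10000):
        arms = [Arm(m) for m in true_means]
        rewards = np.empty(trials)
        for t in range(trials):
            if np.random.random() < eps:
                j = np.random.randint(len(arms))                # explore
            else:
                j = int(np.argmax([a.estimate for a in arms]))  # exploit
            x = arms[j].pull()
            arms[j].update(x)
            rewards[t] = x
        return rewards.mean(), [a.estimate for a in arms]

    if __name__ == "__main__":
        avg, estimates = run(eps=0.1)
        print("average reward:", avg)
        print("estimated means:", estimates)

Lecture 13's comparison amounts to calling run() with several eps values (for example 0.01, 0.05, 0.1) and plotting the resulting reward curves against one another. UCB1 (Lecture 17) would replace the epsilon branch by always picking the argmax of estimate + sqrt(2 * ln(total_pulls) / n_j).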
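
For Lectures 20-23, a minimal sketch of Thompson sampling on Bernoulli (win/lose) arms with a Beta posterior; the Beta(1, 1) prior and the true win rates are illustrative assumptions.

    import numpy as np

    class BayesianArm:
        """Bernoulli arm with a Beta(a, b) posterior over its win rate."""
        def __init__(self, p):
            self.p = p             # true (unknown) win probability
            self.a, self.b = 1, 1  # Beta(1, 1) = uniform prior

        def sample(self):
            return np.random.beta(self.a, self.b)  # one draw from the posterior

        def pull(self):
            return float(np.random.random() < self.p)

        def update(self, x):
            self.a += x            # success count
            self.b += 1 - x        # failure count

    arms = [BayesianArm(p) for p in (0.2, 0.5, 0.75)]
    for _ in range(2000):
        # Thompson sampling: pull the arm whose posterior sample is largest.
        j = int(np.argmax([arm.sample() for arm in arms]))
        x = arms[j].pull()
        arms[j].update(x)

    print([(arm.a, arm.b) for arm in arms])  # the best arm should dominate the pulls

Each round, every arm's posterior is sampled once and the largest sample wins the pull, so uncertain arms still get explored while clearly bad ones fade out; this is the behavior the Lecture 20-21 theory derives.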

Section 3: High Level Overview of Reinforcement Learning

Lecture 31 What is Reinforcement Learning? 7:58
Lecture 32 On Unusual or Unexpected Strategies of RL 5:59
Lecture 33 From Bandits to Full Reinforcement Learning 8:33

Section 4: Markov Decision Processes

Lecture 34 MDP Section Introduction 6:11
Lecture 35 Gridworld 12:25
Lecture 36 Choosing Rewards 3:49
Lecture 37 The Markov Property 6:02
Lecture 38 Markov Decision Processes (MDPs) 14:33
Lecture 39 Future Rewards 9:24
Lecture 40 Value Functions 4:58
Lecture 41 The Bellman Equation (pt 1) 8:38
Lecture 42 The Bellman Equation (pt 2) 6:33
Lecture 43 The Bellman Equation (pt 3) 6:01
Lecture 44 Bellman Examples 22:25
Lecture 45 Optimal Policy and Optimal Value Function (pt 1) 9:08
Lecture 46 Optimal Policy and Optimal Value Function (pt 2) 4:00
Lecture 47 MDP Summary 2:49
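
Lectures 39-44 revolve around the discounted return G_t = r_{t+1} + gamma * G_{t+1} and the state-value function V(s) = E[G_t | S_t = s] that the Bellman equation characterizes. A minimal sketch of the backwards return computation, with gamma and the reward sequence as illustrative values:

    # Discounted return, computed backwards from the end of an episode:
    #   G_t = r_{t+1} + gamma * G_{t+1}, with G = 0 beyond the terminal step.
    gamma = 0.9
    rewards = [0.0, 0.0, 1.0]   # r_1, r_2, r_3 of a short three-step episode

    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    print(returns)   # ~[0.81, 0.9, 1.0]: each step's return under this episode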

Section 5: Dynamic Programming

Lecture 48 Intro to Dynamic Programming and Iterative Policy Evaluation 2:58
Lecture 49 Designing Your RL Program 4:50
Lecture 50 Gridworld in Code 11:28
Lecture 51 Iterative Policy Evaluation in Code 12:08
Lecture 52 Windy Gridworld in Code 7:39
Lecture 53 Iterative Policy Evaluation for Windy Gridworld in Code 7:04
Lecture 54 Policy Improvement 2:42
Lecture 55 Policy Iteration 1:51
Lecture 56 Policy Iteration in Code 8:18
Lecture 57 Policy Iteration in Windy Gridworld 8:41
Lecture 58 Value Iteration 3:49
Lecture 59 Value Iteration in Code 6:26
Lecture 60 Dynamic Programming Summary 5:05
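
As a companion to Lectures 58-59, a minimal sketch of value iteration, V(s) <- max_a [r + gamma * V(s')], on a tiny deterministic corridor world; the environment is an illustrative stand-in for the course's Gridworld.

    # Value iteration on a corridor: states 0..3, state 3 is terminal
    # and pays reward 1 on arrival; all transitions are deterministic.
    gamma = 0.9
    TERMINAL = 3
    MOVES = (-1, +1)             # left, right

    V = {s: 0.0 for s in range(4)}
    while True:
        delta = 0.0
        for s in range(4):
            if s == TERMINAL:
                continue
            candidates = []
            for move in MOVES:
                s2 = min(max(s + move, 0), 3)       # walls clamp the move
                r = 1.0 if s2 == TERMINAL else 0.0
                candidates.append(r + gamma * V[s2])
            best = max(candidates)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < 1e-6:
            break

    print(V)   # ~{0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0}

Iterative policy evaluation (Lectures 48-53) is the same sweep with the max over actions replaced by an expectation under the fixed policy, and policy iteration (Lectures 54-57) alternates that evaluation with greedy policy improvement.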

Section 6: Monte Carlo

Lecture 61 Monte Carlo Intro
Lecture 62 Monte Carlo Policy Evaluation 5:36
Lecture 63 Monte Carlo Policy Evaluation in Code
Lecture 64 Policy Evaluation in Windy Gridworld 3:29
Lecture 65 Monte Carlo Control 5:49
Lecture 66 Monte Carlo Control in Code 4:04
Lecture 67 Monte Carlo Control without Exploring Starts 2:50
Lecture 68 Monte Carlo Control without Exploring Starts in Code 2:51
Lecture 69 Monte Carlo Summary
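
For Lectures 62-64, a minimal sketch of first-visit Monte Carlo prediction on the same illustrative corridor world: play full episodes under a fixed policy, then estimate V(s) as the average of the sample returns observed from each first visit to s.

    import numpy as np

    gamma = 0.9
    TERMINAL = 3

    def play_episode():
        """One episode of the fixed policy 'always move right', random start."""
        s = int(np.random.randint(0, 3))
        states, rewards = [s], [0.0]  # rewards[t] = reward received entering states[t]
        while s != TERMINAL:
            s += 1
            states.append(s)
            rewards.append(1.0 if s == TERMINAL else 0.0)
        return states, rewards

    sample_returns = {s: [] for s in range(4)}
    for _ in range(1000):
        states, rewards = play_episode()
        G = 0.0
        # Walk the episode backwards: G_t = r_{t+1} + gamma * G_{t+1}.
        for t in reversed(range(len(states))):
            if t < len(states) - 1:
                G = rewards[t + 1] + gamma * G
            if states[t] not in states[:t]:      # first-visit check
                sample_returns[states[t]].append(G)

    V = {s: float(np.mean(g)) if g else 0.0 for s, g in sample_returns.items()}
    print(V)   # ~{0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0}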

Section 7: Temporal Difference Learning

Lecture 70 Temporal Difference Intro 1:33
Lecture 71 TD(0) Prediction 3:37
Lecture 72 TD(0) Prediction in Code 2:27
Lecture 73 SARSA 5:06
Lecture 74 SARSA in Code 3:38
Lecture 75 Q Learning 2:56
Lecture 76 Q Learning in Code 2:14
Lecture 77 TD Summary 2:24
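
As a companion to Lectures 73-76, a minimal sketch of the SARSA and Q-learning updates, trained on the illustrative corridor world; alpha, gamma, and epsilon are arbitrary illustrative choices.

    import numpy as np

    alpha, gamma, eps = 0.1, 0.9, 0.1
    n_states, n_actions = 4, 2          # corridor: states 0..3, actions left/right
    Q = np.zeros((n_states, n_actions))

    def epsilon_greedy(s):
        if np.random.random() < eps:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    def sarsa_update(s, a, r, s2, a2):
        # SARSA (on-policy): the target uses the action a2 actually taken next.
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])

    def q_learning_update(s, a, r, s2):
        # Q-learning (off-policy): the target uses the greedy value of s2.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

    # Train with Q-learning; state 3 is terminal and pays reward 1 on arrival.
    for _ in range(2000):
        s = 0
        while s != 3:
            a = epsilon_greedy(s)
            s2 = min(max(s + (1 if a == 1 else -1), 0), 3)
            r = 1.0 if s2 == 3 else 0.0
            q_learning_update(s, a, r, s2)
            s = s2

    print(np.round(Q, 2))   # the greedy action in states 0..2 should be 'right'

The only difference between the two updates is the bootstrap target: SARSA uses the action the policy actually takes next (on-policy), while Q-learning uses the greedy action regardless of what is taken (off-policy).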

Section 8: Approximation Methods

Lecture 78 Approximation Intro 4:03
Lecture 79 Linear Models for Reinforcement Learning 4:06
Lecture 80 Features 3:53
Lecture 81 Monte Carlo Prediction with Approximation 1:45
Lecture 82 Monte Carlo Prediction with Approximation in Code
Lecture 83 TD(0) Semi-Gradient Prediction 4:14
Lecture 84 Semi-Gradient SARSA 2:58
Lecture 85 Semi-Gradient SARSA in Code 4:08
Lecture 86 Course Summary and Next Steps 8:30
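
For Lectures 83-85, a minimal sketch of the semi-gradient SARSA weight update with a linear model, Q_hat(s, a) = w^T x(s, a); the one-hot feature map is an illustrative choice that makes the update coincide with the tabular one.

    import numpy as np

    alpha, gamma = 0.1, 0.9
    n_states, n_actions = 4, 2

    def features(s, a):
        # Illustrative one-hot encoding of the (state, action) pair; with
        # one-hot features the linear model reduces exactly to a table.
        x = np.zeros(n_states * n_actions)
        x[s * n_actions + a] = 1.0
        return x

    def q_hat(w, s, a):
        return w @ features(s, a)

    def semi_gradient_sarsa_update(w, s, a, r, s2, a2, done):
        x = features(s, a)
        target = r if done else r + gamma * q_hat(w, s2, a2)
        # "Semi-gradient": the target is treated as a constant, so only
        # q_hat(s, a) is differentiated and the gradient is simply x.
        return w + alpha * (target - w @ x) * x

    w = np.zeros(n_states * n_actions)
    w = semi_gradient_sarsa_update(w, s=2, a=1, r=1.0, s2=3, a2=0, done=True)
    print(w)   # only the weight for the (s=2, a=1) feature moved, by alpha * 1.0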

Section 9: Stock Trading Project with Reinforcement Learning

Lecture 87 Beginners, halt! Stop here if you skipped ahead 13:59
Lecture 88 Stock Trading Project Section Introduction 5:04
Lecture 89 Data and Environment 12:12
Lecture 90 How to Model Q for Q-Learning 9:27
Lecture 91 Design of the Program 6:35
Lecture 92 Code pt 1 7:49
Lecture 93 Code pt 2 9:30
Lecture 94 Code pt 3 4:18
Lecture 95 Code pt 4 7:08
Lecture 96 Stock Trading Project Discussion 3:27
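
A heavily hedged sketch of the kind of model Lecture 90 discusses: a linear Q function over a state vector of shares held, current prices, and cash, with one joint action per buy/hold/sell combination. Every name, dimension, and number below is an illustrative assumption; the course's actual data pipeline and features may differ.

    import numpy as np

    n_stocks = 3
    # One plausible state vector: shares held per stock, current price per
    # stock, and cash on hand (illustrative; the course's features may differ).
    state_dim = 2 * n_stocks + 1
    n_actions = 3 ** n_stocks            # buy / hold / sell for each stock

    W = np.random.randn(state_dim, n_actions) / np.sqrt(state_dim)
    b = np.zeros(n_actions)

    def predict_q(state):
        """All action-values at once from a linear model: Q(s, .) = s @ W + b."""
        return state @ W + b

    state = np.array([10.0, 0.0, 5.0,      # shares held
                      102.3, 54.1, 33.7,   # prices
                      2500.0])             # cash
    q_values = predict_q(state)
    print(q_values.shape)                  # (27,): one value per joint action
    print(int(np.argmax(q_values)))        # index of the greedy joint action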

Section 10: Setting Up Your Environment (FAQ by Student Request)

Lecture 97 Windows-Focused Environment Setup 2018 20:13
Lecture 98 How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow 17:33

Section 11: Extra Help With Python Coding for Beginners (FAQ by Student Request)

Lecture 99 How to Code by Yourself (part 1) 15:54
Lecture 100 How to Code by Yourself (part 2) 9:23
Lecture 101 Proof that using Jupyter Notebook is the same as not using it 12:29
Lecture 102 Python 2 vs Python 3 4:31

Section 12: Effective Learning Strategies for Machine Learning (FAQ by Student Request)

Lecture 103 How to Succeed in this Course (Long Version) 10:18
Lecture 104 Is this for Beginners or Experts? Academic or Practical? Fast or Slow-Paced? 21:58
Lecture 105 Machine Learning and AI Prerequisite Roadmap (pt 1) 11:13
Lecture 106 Machine Learning and AI Prerequisite Roadmap (pt 2) 16:07

Section 13: Appendix / FAQ Finale

Lecture 107 What is the Appendix? 2:42