Section 1 : Welcome

Lecture 1 Introduction 00:03:01 Duration
Lecture 2 Course Outline and Big Picture 00:07:45 Duration
Lecture 3 Where to get the Code 00:04:27 Duration
Lecture 4 Anyone Can Succeed in this Course 00:11:46 Duration
Lecture 5 Warmup 00:15:26 Duration

Section 2 : Return of the Multi-Armed Bandit

Lecture 1 Section Introduction: The Explore-Exploit Dilemma 00:10:08 Duration
Lecture 2 Applications of the Explore-Exploit Dilemma 00:07:51 Duration
Lecture 3 Epsilon-Greedy Theory 00:06:55 Duration
Lecture 4 Calculating a Sample Mean (pt 1) 00:05:46 Duration
Lecture 5 Epsilon-Greedy Beginner's Exercise Prompt
Lecture 6 Designing Your Bandit Program 00:03:59 Duration
Lecture 7 Epsilon-Greedy in Code 00:07:01 Duration
Lecture 8 Comparing Different Epsilons 00:05:53 Duration
Lecture 9 Optimistic Initial Values Theory 00:05:30 Duration
Lecture 10 Optimistic Initial Values Beginner's Exercise Prompt 00:02:17 Duration
Lecture 11 Optimistic Initial Values Code 00:04:08 Duration
Lecture 12 UCB1 Theory 00:14:23 Duration
Lecture 13 UCB1 Beginner's Exercise Prompt 00:02:03 Duration
Lecture 14 UCB1 Code 00:03:18 Duration
Lecture 15 Bayesian Bandits / Thompson Sampling Theory (pt 1) 00:12:33 Duration
Lecture 16 Bayesian Bandits / Thompson Sampling Theory (pt 2) 00:17:25 Duration
Lecture 17 Thompson Sampling Beginner's Exercise Prompt 00:02:40 Duration
Lecture 18 Thompson Sampling Code 00:04:54 Duration
Lecture 19 Thompson Sampling With Gaussian Reward Theory 00:11:14 Duration
Lecture 20 Thompson Sampling With Gaussian Reward Code 00:06:08 Duration
Lecture 21 Why don't we just use a library? 00:05:29 Duration
Lecture 22 Nonstationary Bandits 00:07:01 Duration
Lecture 23 Bandit Summary, Real Data, and Online Learning 00:06:20 Duration
Lecture 24 (Optional) Alternative Bandit Designs 00:09:52 Duration
Lecture 25 Suggestion Box 00:02:54 Duration
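
The lectures above build the section's bandit algorithms (epsilon-greedy, optimistic initial values, UCB1, Thompson sampling) step by step. As a rough preview only, and not the course's own code, a minimal epsilon-greedy loop with incremental sample means might look like the sketch below; the win rates, epsilon, and trial count are made up for illustration.

    import numpy as np

    BANDIT_PROBS = [0.2, 0.5, 0.75]   # illustrative true win rates, not from the course
    EPS = 0.1                         # exploration probability
    NUM_TRIALS = 10000

    estimates = np.zeros(len(BANDIT_PROBS))  # running sample-mean estimate per arm
    counts = np.zeros(len(BANDIT_PROBS))     # number of pulls per arm
    rng = np.random.default_rng(0)

    for _ in range(NUM_TRIALS):
        if rng.random() < EPS:
            arm = int(rng.integers(len(BANDIT_PROBS)))  # explore: random arm
        else:
            arm = int(np.argmax(estimates))             # exploit: current best arm
        reward = float(rng.random() < BANDIT_PROBS[arm])
        counts[arm] += 1
        # incremental sample mean: new_mean = old_mean + (x - old_mean) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    print("estimated win rates:", estimates)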

Section 3 : High Level Overview of Reinforcement Learning

Lecture 1 What is Reinforcement Learning? 00:07:58 Duration
Lecture 2 On Unusual or Unexpected Strategies of RL 00:05:59 Duration
Lecture 3 From Bandits to Full Reinforcement Learning 00:08:33 Duration

Section 4 : Markov Decision Processes

Lecture 1 MDP Section Introduction 00:06:11 Duration
Lecture 2 Gridworld 00:12:25 Duration
Lecture 3 Choosing Rewards 00:03:49 Duration
Lecture 4 The Markov Property 00:06:02 Duration
Lecture 5 Markov Decision Processes (MDPs) 00:14:33 Duration
Lecture 6 Future Rewards 00:09:24 Duration
Lecture 7 Value Functions 00:04:58 Duration
Lecture 8 The Bellman Equation (pt 1) 00:08:38 Duration
Lecture 9 The Bellman Equation (pt 2) 00:06:33 Duration
Lecture 10 The Bellman Equation (pt 3) 00:06:01 Duration
Lecture 11 Bellman Examples 00:22:25 Duration
Lecture 12 Optimal Policy and Optimal Value Function (pt 1) 00:09:08 Duration
Lecture 13 Optimal Policy and Optimal Value Function (pt 2) 00:04:00 Duration
Lecture 14 MDP Summary 00:02:49 Duration
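
For quick reference while watching the Bellman equation and optimal value function lectures above, the two central recursions can be written in standard MDP notation (the symbols here may differ slightly from the videos):

    V_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, V_\pi(s') \bigr]

    V_{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, V_{*}(s') \bigr]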

Section 5 : Dynamic Programming

Lecture 1 Intro to Dynamic Programming and Iterative Policy Evaluation 00:02:58 Duration
Lecture 2 Designing Your RL Program 00:04:50 Duration
Lecture 3 Gridworld in Code 00:11:28 Duration
Lecture 4 Iterative Policy Evaluation in Code 00:12:08 Duration
Lecture 5 Windy Gridworld in Code 00:07:39 Duration
Lecture 6 Iterative Policy Evaluation for Windy Gridworld in Code 00:07:04 Duration
Lecture 7 Policy Improvement 00:02:42 Duration
Lecture 8 Policy Iteration 00:01:51 Duration
Lecture 9 Policy Iteration in Code 00:08:18 Duration
Lecture 10 Policy Iteration in Windy Gridworld 00:08:41 Duration
Lecture 11 Value Iteration 00:03:49 Duration
Lecture 12 Value Iteration in Code 00:06:26 Duration
Lecture 13 Dynamic Programming Summary 00:05:05 Duration
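
To make the dynamic programming lectures above concrete, here is a small value iteration sketch on a toy deterministic gridworld invented for illustration (the course uses its own gridworld code); because transitions are deterministic, the Bellman optimality backup reduces to a max over actions.

    import numpy as np

    # Toy 1-D gridworld, made up for illustration: states 0..3, moving into
    # state 3 (terminal) gives reward +1, everything else gives 0.
    N_STATES, GAMMA, THRESH = 4, 0.9, 1e-6
    ACTIONS = {'left': -1, 'right': +1}

    def step(s, a):
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if (s2 == N_STATES - 1 and s != s2) else 0.0
        return s2, r

    V = np.zeros(N_STATES)
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # skip the terminal state
            # Bellman optimality backup: V(s) <- max_a [ r + gamma * V(s') ]
            best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < THRESH:
            break

    print("approximate V*:", V)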

Section 6 : Monte Carlo

Lecture 1 Monte Carlo Intro
Lecture 2 Monte Carlo Policy Evaluation 00:05:36 Duration
Lecture 3 Monte Carlo Policy Evaluation in Code
Lecture 4 Policy Evaluation in Windy Gridworld 00:03:29 Duration
Lecture 5 Monte Carlo Control 00:05:49 Duration
Lecture 6 Monte Carlo Control in Code 00:04:04 Duration
Lecture 7 Monte Carlo Control without Exploring Starts 00:02:50 Duration
Lecture 8 Monte Carlo Control without Exploring Starts in Code 00:02:51 Duration
Lecture 9 Monte Carlo Summary
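
As a preview of the Monte Carlo prediction idea covered above (estimate V(s) by averaging sampled returns), here is a first-visit Monte Carlo sketch on a made-up random-walk problem; the environment, policy, and constants are invented for illustration and are not the course's gridworld.

    import numpy as np

    rng = np.random.default_rng(0)
    GAMMA = 0.9

    def play_episode():
        # random walk on states 0..4; entering state 4 (terminal) gives reward +1
        s, traj = 2, []
        while s != 4:
            s2 = max(0, s + int(rng.choice([-1, 1])))
            r = 1.0 if s2 == 4 else 0.0
            traj.append((s, r))
            s = s2
        return traj

    returns = {s: [] for s in range(4)}
    for _ in range(2000):
        traj = play_episode()
        # compute the discounted return from every time step, working backwards
        G, G_at = 0.0, [0.0] * len(traj)
        for t in reversed(range(len(traj))):
            G = traj[t][1] + GAMMA * G
            G_at[t] = G
        seen = set()
        for t, (s, _) in enumerate(traj):  # first-visit: keep only the earliest visit
            if s not in seen:
                seen.add(s)
                returns[s].append(G_at[t])

    V = {s: float(np.mean(g)) if g else 0.0 for s, g in returns.items()}
    print("Monte Carlo estimate of V:", V)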

Section 7 : Temporal Difference Learning

Lecture 1 Temporal Difference Intro 00:01:33 Duration
Lecture 2 TD(0) Prediction 00:03:37 Duration
Lecture 3 TD(0) Prediction in Code 00:02:27 Duration
Lecture 4 SARSA 00:05:06 Duration
Lecture 5 SARSA in Code 00:03:38 Duration
Lecture 6 Q Learning 00:02:56 Duration
Lecture 7 Q Learning in Code 00:02:14 Duration
Lecture 8 TD Summary 00:02:24 Duration
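
The SARSA and Q-learning lectures above both revolve around a one-line tabular update; the sketch below shows the Q-learning version (table sizes and step sizes are made up for illustration), with a comment noting how SARSA differs.

    import numpy as np

    n_states, n_actions = 10, 4          # illustrative sizes
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.9

    def q_learning_update(s, a, r, s2, done):
        # Q-learning target uses the max over next actions (off-policy);
        # SARSA would instead use Q[s2, a2] for the action a2 actually taken next.
        target = r if done else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])

    # example transition, purely illustrative
    q_learning_update(s=0, a=1, r=0.0, s2=3, done=False)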

Section 8 : Approximation Methods

Lecture 1 Approximation Intro 00:04:03 Duration
Lecture 2 Linear Models for Reinforcement Learning 00:04:06 Duration
Lecture 3 Features 00:03:53 Duration
Lecture 4 Monte Carlo Prediction with Approximation 00:01:45 Duration
Lecture 5 Monte Carlo Prediction with Approximation in Code
Lecture 6 TD(0) Semi-Gradient Prediction 00:04:14 Duration
Lecture 7 Semi-Gradient SARSA 00:02:58 Duration
Lecture 8 Semi-Gradient SARSA in Code 00:04:08 Duration
Lecture 9 Course Summary and Next Steps 00:08:30 Duration
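
The approximation lectures above replace the value table with a parameterized function; with a linear model, the semi-gradient TD(0) prediction update can be sketched as below (feature dimension, step size, and names are made up for illustration; semi-gradient SARSA applies the same idea to an action-value model).

    import numpy as np

    D = 8                      # illustrative feature dimension
    w = np.zeros(D)            # weights of the linear model: v_hat(s) = w . x(s)
    alpha, gamma = 0.01, 0.9

    def td0_semigradient_update(x_s, r, x_s2, done):
        v_s = w @ x_s
        v_s2 = 0.0 if done else w @ x_s2
        # "semi-gradient": the target r + gamma * v_hat(s') is treated as a constant,
        # so the gradient of v_hat(s) with respect to w is just the feature vector x(s)
        w[:] += alpha * (r + gamma * v_s2 - v_s) * x_s

    # example update on random features, purely illustrative
    rng = np.random.default_rng(0)
    td0_semigradient_update(rng.normal(size=D), 1.0, rng.normal(size=D), done=False)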

Section 9 : Stock Trading Project with Reinforcement Learning

Lecture 1 Beginners, halt! Stop here if you skipped ahead 00:13:59 Duration
Lecture 2 Stock Trading Project Section Introduction 00:05:04 Duration
Lecture 3 Data and Environment 00:12:12 Duration
Lecture 4 How to Model Q for Q-Learning 00:09:27 Duration
Lecture 5 Design of the Program 00:06:35 Duration
Lecture 6 Code pt 1 00:07:49 Duration
Lecture 7 Code pt 2 00:09:30 Duration
Lecture 8 Code pt 3 00:04:18 Duration
Lecture 9 Code pt 4 00:07:08 Duration
Lecture 10 Stock Trading Project Discussion 00:03:27 Duration

Section 10 : Setting Up Your Environment (FAQ by Student Request)

Lecture 1 Windows-Focused Environment Setup 2018 00:20:13 Duration
Lecture 2 How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow 00:17:33 Duration

Section 11 : Extra Help With Python Coding for Beginners (FAQ by Student Request)

Lecture 1 How to Code by Yourself (part 1) 00:15:54 Duration
Lecture 2 How to Code by Yourself (part 2) 00:09:23 Duration
Lecture 3 Proof that using Jupyter Notebook is the same as not using it 00:12:29 Duration
Lecture 4 Python 2 vs Python 3 00:04:31 Duration

Section 12 : Effective Learning Strategies for Machine Learning (FAQ by Student Request)

Lecture 1 How to Succeed in this Course (Long Version) 00:10:18 Duration
Lecture 2 Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced? 00:21:58 Duration
Lecture 3 Machine Learning and AI Prerequisite Roadmap (pt 1) 00:11:13 Duration
Lecture 4 Machine Learning and AI Prerequisite Roadmap (pt 2) 00:16:07 Duration

Section 13 : Appendix FAQ Finale

Lecture 1 What is the Appendix? 00:02:42 Duration