Section 1 : Introduction

Lecture 1 INTRODUCTION TO BRAINMEASURES PROCTOR SYSTEM
Lecture 2 Using labs for preparation 00:08:55 Duration
Lecture 3 Setup Development Environment (Windows 10) - Introduction 00:02:26 Duration
Lecture 4 Setup Development Environment - Python and Spark - Pre-requisites 00:04:12 Duration
Lecture 5 Setup Development Environment - Python Setup on Windows 00:03:08 Duration
Lecture 6 Setup Development Environment - Configure Environment Variables 00:02:32 Duration
Lecture 7 Setup Development Environment - Setup PyCharm for developing Python applications 00:05:29 Duration
Lecture 8 Setup Development Environment - Pass run time arguments or parameters 00:02:32 Duration
Lecture 9 Setup Development Environment - Download Spark compressed tar ball 00:01:38 Duration
Lecture 10 Setup Development Environment - Install 7z to uncompress and untar on Windows 00:01:00 Duration
Lecture 11 Setup Development Environment - Setup Spark 00:02:27 Duration
Lecture 12 Setup Development Environment - Install JDK 00:06:05 Duration
Lecture 13 Setup Development Environment - Configure environment variables for Spark 00:03:47 Duration
Lecture 14 Setup Development Environment - Install WinUtils - integrate Windows and HDFS 00:06:31 Duration
Lecture 15 Setup Development Environment - Integrate PyCharm and Spark on Windows 10 00:07:06 Duration

Section 2 : Python Fundamentals

Lecture 1 Introduction and Setting up Python 00:09:34 Duration
Lecture 2 Basic Programming Constructs 00:13:16 Duration
Lecture 3 Functions in Python 00:14:05 Duration
Lecture 4 Python Collections 00:16:12 Duration
Lecture 5 Map Reduce operations on Python Collections 00:12:52 Duration
Lecture 6 Setting up Data Sets for Basic IO Operations 00:04:24 Duration
Lecture 7 Basic IO operations and processing data using Collections 00:16:36 Duration

Section 3 : Getting Started

Lecture 1 Get revenue for given order id - as application 00:12:21 Duration
Lecture 2 About Certification
Lecture 3 Setup Environment - Locally 00:02:04 Duration
Lecture 4 Setup Environment - using Cloudera Quickstart VM 00:07:23 Duration
Lecture 5 Using ITVersity platforms - Big Data Developer labs and forum 00:07:20 Duration
Lecture 6 Using ITVersity's big data labs 00:08:37 Duration
Lecture 7 Using Windows - Putty and WinSCP 00:10:34 Duration
Lecture 8 Using Windows - Cygwin 00:14:46 Duration
Lecture 9 HDFS Quick Preview 00:20:25 Duration
Lecture 10 YARN Quick Preview 00:09:53 Duration
Lecture 11 Setup Data Sets 00:07:54 Duration

Section 4 : Apache Spark 1

Lecture 1 Introduction 00:06:05 Duration
Lecture 2 Introduction to Spark 00:02:22 Duration
Lecture 3 Setup Spark on Windows 00:23:15 Duration
Lecture 4 Quick overview about Spark documentation 00:04:38 Duration
Lecture 5 Connecting to the environment 00:03:49 Duration
Lecture 6 Initializing Spark job using pyspark 00:04:54 Duration
Lecture 7 Create RDD from HDFS files 00:18:28 Duration
Lecture 8 Create RDD from collection - using parallelize 00:04:53 Duration
Lecture 9 Read data from different file formats - using sqlContext 00:08:06 Duration
Lecture 10 Row level transformations - String Manipulation 00:11:00 Duration
Lecture 11 Row Level Transformations - map 00:12:25 Duration
Lecture 12 Row Level Transformations - flatMap 00:05:50 Duration
Lecture 13 Filtering data using filter 00:10:09 Duration
Lecture 14 Joining Data Sets - Introduction 00:05:17 Duration
Lecture 15 Joining Data Sets - Inner Join 00:10:34 Duration
Lecture 16 Joining Data Sets - Outer Join 00:14:39 Duration
Lecture 17 Aggregations - Introduction 00:03:01 Duration
Lecture 18 Aggregations - count and reduce - Get revenue for order id 00:12:52 Duration
Lecture 19 Aggregations - reduce - Get order item with minimum subtotal for order id 00:05:47 Duration
Lecture 20 Aggregations - countByKey - Get order count by status 00:05:58 Duration
Lecture 21 Aggregations - understanding combiner 00:06:51 Duration
Lecture 22 Aggregations - groupByKey - Get revenue for each order id 00:08:17 Duration
Lecture 23 groupByKey - Get order items sorted by order_item_subtotal for each order id 00:11:59 Duration
Lecture 24 Aggregations - reduceByKey - Get revenue for each order id 00:10:26 Duration
Lecture 25 Aggregations - aggregateByKey - Get revenue and count of items for each order id 00:14:30 Duration
Lecture 26 Sorting - sortByKey - Sort data by product price 00:09:59 Duration
Lecture 27 Sorting - sortByKey - Sort data by category id and then by price descending 00:10:48 Duration
Lecture 28 Ranking - Introduction 00:01:18 Duration
Lecture 29 Ranking - Global Ranking using sortByKey and take 00:02:49 Duration
Lecture 30 Ranking - Global using takeOrdered or top 00:07:29 Duration
Lecture 31 Ranking - By Key - Get top N products by price per category - Introduction 00:03:54 Duration
Lecture 32 Ranking - By Key - Get top N products by price per category - Python collections 00:04:41 Duration
Lecture 33 Ranking - By Key - Get top N products by price per category - using flatMap 00:03:06 Duration
Lecture 34 Ranking - By Key - Get top N priced products - Introduction 00:03:00 Duration
Lecture 35 Ranking - By Key - Get top N priced products - using Python collections API 00:13:06 Duration
Lecture 36 Ranking - By Key - Get top N priced products - Create Function 00:05:03 Duration
Lecture 37 Ranking - By Key - Get top N priced products - integrate with flatMap 00:04:16 Duration
Lecture 38 Set Operations - Introduction 00:01:05 Duration
Lecture 39 Set Operations - Prepare data 00:08:22 Duration
Lecture 40 Set Operations - union and distinct 00:05:14 Duration
Lecture 41 Set Operations - intersect and minus 00:08:04 Duration
Lecture 42 Saving data into HDFS - text file format 00:11:46 Duration
Lecture 43 Saving data into HDFS - text file format with compression 00:05:52 Duration
Lecture 44 Saving data into HDFS using Data Frames - json 00:11:18 Duration

Section 5 : Apache Spark 1

Lecture 1 Problem Statement 00:01:54 Duration
Lecture 2 Launching pyspark 00:11:45 Duration
Lecture 3 Reading data from HDFS and filtering 00:08:14 Duration
Lecture 4 Joining orders and order_items 00:07:44 Duration
Lecture 5 Aggregate to get daily revenue per product id 00:06:53 Duration
Lecture 6 Load products and convert into RDD 00:10:01 Duration
Lecture 7 Join and sort the data 00:11:39 Duration
Lecture 8 Save to HDFS and validate in text file format 00:07:24 Duration
Lecture 9 Saving data in avro file format 00:11:58 Duration
Lecture 10 Get data to local file system using get or copyToLocal 00:04:51 Duration
Lecture 11 Develop as application to get daily revenue per product 00:07:27 Duration
Lecture 12 Run as application on the cluster 00:05:08 Duration

Section 6 : Apache Spark 1

Lecture 1 Different interfaces to run SQL - Hive, Spark SQL 00:09:26 Duration
Lecture 2 Create database and tables of text file format - orders and order_items 00:25:00 Duration
Lecture 3 Create database and tables of ORC file format - orders and order_items 00:10:19 Duration
Lecture 4 Running SQL/Hive Commands using pyspark 00:05:17 Duration
Lecture 5 Functions - Getting Started 00:05:11 Duration
Lecture 6 Functions - String Manipulation 00:22:23 Duration
Lecture 7 Functions - Date Manipulation 00:13:44 Duration
Lecture 8 Functions - Aggregate Functions in brief 00:05:49 Duration
Lecture 9 Functions - case and nvl 00:14:10 Duration
Lecture 10 Row level transformations 00:08:31 Duration
Lecture 11 Joining data between multiple tables 00:18:10 Duration
Lecture 12 Group by and aggregations 00:11:41 Duration
Lecture 13 Sorting the data 00:07:27 Duration
Lecture 14 Set operations - union and union all 00:05:39 Duration
Lecture 15 Analytics functions - aggregations 00:15:54 Duration
Lecture 16 Analytics functions - ranking 00:08:40 Duration
Lecture 17 Windowing functions 00:07:49 Duration
Lecture 18 Creating Data Frames and register as temp tables 00:18:46 Duration
Lecture 19 Write Spark Application - Processing Data using Spark SQL 00:09:14 Duration
Lecture 20 Write Spark Application - Saving Data Frame to Hive tables 00:09:35 Duration
Lecture 21 Data Frame Operations 00:13:42 Duration

Section 7 : Setup Hadoop and Spark Environment for Practice

Lecture 1 About Proctor Testing
Lecture 2 Overview of ITVersity Boxes GitHub Repository 00:03:11 Duration
Lecture 3 Creating Virtual Machine 00:10:31 Duration
Lecture 4 Starting HDFS and YARN 00:04:29 Duration
Lecture 5 Gracefully Stopping Virtual Machine 00:05:42 Duration
Lecture 6 Understanding Datasets provided in Virtual Machine 00:05:39 Duration
Lecture 7 Using GitHub Content for the practice 00:05:12 Duration
Lecture 8 Using Resources for Practice 00:03:55 Duration

Section 8 : Apache Spark 2

Lecture 1 Introduction 00:02:10 Duration
Lecture 2 Review of Setup Steps for Spark Environment 00:08:40 Duration
Lecture 3 Using ITVersity labs 00:03:20 Duration
Lecture 4 Apache Spark Official Documentation (Very Important) 00:07:21 Duration
Lecture 5 Quick Review of Spark APIs 00:12:30 Duration
Lecture 6 Spark Modules 00:05:02 Duration
Lecture 7 Spark Data Structures - RDDs and Data Frames 00:14:49 Duration
Lecture 8 Develop Simple Application 00:14:26 Duration
Lecture 9 Apache Spark - Framework 00:22:20 Duration

Section 9 : Apache Spark 2

Lecture 1 Introduction 00:01:43 Duration
Lecture 2 Data Frames - Overview 00:12:22 Duration
Lecture 3 Create Data Frames from Text Files 00:16:18 Duration
Lecture 4 Create Data Frames from Hive Tables 00:05:50 Duration
Lecture 5 Create Data Frames using JDBC 00:17:14 Duration
Lecture 6 Data Frame Operations - Overview
Lecture 7 Spark SQL - Overview 00:04:00 Duration
Lecture 8 Overview of Functions to manipulate data in Data Frame fields or columns 00:05:52 Duration

Section 10 : Apache Spark 2

Lecture 1 Define Problem Statement - Get Daily Product Revenue 00:06:54 Duration
Lecture 2 Selection or Projection of Data in Data Frames 00:10:27 Duration
Lecture 3 Filtering Data from Data Frames 00:16:33 Duration
Lecture 4 Joining multiple Data Frames
Lecture 5 Perform Aggregations using Data Frames 00:12:24 Duration
Lecture 6 Sorting Data in Data Frames 00:10:24 Duration
Lecture 7 Development Life Cycle using Data Frames 00:14:36 Duration
Lecture 8 Run applications using Spark Submit 00:08:58 Duration

Section 11 : Apache Spark 2

Lecture 1 Data Frame Operations - Window Functions - Overview 00:04:35 Duration
Lecture 2 Data Frames - Window Functions APIs - Overview 00:05:02 Duration
Lecture 3 Define Problem Statement - Get Top N Daily Products 00:02:58 Duration
Lecture 4 Data Frame Operations - Creating Window Spec 00:04:21 Duration
Lecture 5 Data Frame Operations - Performing Aggregations using sum, avg etc 00:11:41 Duration
Lecture 6 Data Frame Operations - Time Series Functions such as Lead, Lag etc 00:15:05 Duration
Lecture 7 Data Frame Operations - Ranking Functions - rank, dense_rank, row_number etc 00:08:45 Duration

Section 12 : Apache Spark using SQL - Getting Started

Lecture 1 Getting Started - Overview 00:02:01 Duration
Lecture 2 Overview of Spark Documentation 00:02:29 Duration
Lecture 3 Launching and using Spark SQL CLI 00:04:08 Duration
Lecture 4 Overview of Spark SQL Properties 00:08:51 Duration
Lecture 5 Running OS Commands using Spark SQL 00:03:19 Duration
Lecture 6 Understanding Warehouse Directory 00:04:13 Duration
Lecture 7 Managing Spark Metastore Databases 00:10:02 Duration
Lecture 8 Managing Spark Metastore Tables 00:03:21 Duration
Lecture 9 Retrieve Metadata of Tables 00:02:19 Duration
Lecture 10 Role of Spark Metastore or Hive Metastore 00:05:01 Duration
Lecture 11 Exercise - Getting Started with Spark SQL 00:08:57 Duration

Section 13 : Apache Spark using SQL - Basic Transformations using Spark SQL

Lecture 1 Basic Transformations using Spark SQL - Introduction 00:03:20 Duration
Lecture 2 Spark SQL - Overview 00:06:42 Duration
Lecture 3 Define Problem Statement 00:03:20 Duration
Lecture 4 Prepare Tables 00:05:06 Duration
Lecture 5 Projecting Data 00:04:01 Duration
Lecture 6 Filtering Data
Lecture 7 Joining Tables - Inner 00:07:30 Duration
Lecture 8 Joining Tables - Outer 00:07:22 Duration
Lecture 9 Aggregating Data 00:11:06 Duration
Lecture 10 Sorting Data 00:04:49 Duration
Lecture 11 Conclusion - Final Solution 00:04:22 Duration

Section 14 : Apache Spark using SQL - Basic DDL and DML

Lecture 1 Introduction 00:02:48 Duration
Lecture 2 Create Spark Metastore Tables 00:10:33 Duration
Lecture 3 Overview of Data Types 00:09:51 Duration
Lecture 4 Adding Comments 00:02:02 Duration
Lecture 5 Loading Data Into Tables - Local 00:04:17 Duration
Lecture 6 Loading Data Into Tables - HDFS 00:06:10 Duration
Lecture 7 Loading Data - Append and Overwrite 00:02:41 Duration
Lecture 8 Creating External Tables 00:03:06 Duration
Lecture 9 Managed Tables vs External Tables 00:04:39 Duration
Lecture 10 Overview of File Formats 00:08:01 Duration
Lecture 11 Drop Tables and Databases 00:04:17 Duration
Lecture 12 Truncating Tables 00:02:17 Duration
Lecture 13 Exercise - Managed Tables 00:07:10 Duration

Section 15 : Apache Spark using SQL - DML and Partitioning

Lecture 1 Introduction 00:03:27 Duration
Lecture 2 Introduction to Partitioning 00:01:22 Duration
Lecture 3 Creating Tables using Parquet 00:04:41 Duration
Lecture 4 Load vs 00:04:25 Duration
Lecture 5 Inserting Data using Stage Table 00:04:52 Duration
Lecture 6 Creating Partitioned Tables
Lecture 7 Adding Partitions to Tables 00:04:01 Duration
Lecture 8 Loading Data into Partitioned Tables 00:08:01 Duration
Lecture 9 Inserting Data into Partitions 00:03:19 Duration
Lecture 10 Using Dynamic Partition Mode 00:04:52 Duration
Lecture 11 Exercise - Partitioned Tables 00:03:34 Duration

Section 16 : Apache Spark using SQL - Pre-defined Functions

Lecture 1 Introduction - Overview of Spark SQL Functions 00:01:45 Duration
Lecture 2 Overview of Functions 00:02:48 Duration
Lecture 3 Validating Functions
Lecture 4 String Manipulation Functions 00:11:03 Duration
Lecture 5 Date Manipulation Functions 00:16:48 Duration
Lecture 6 Overview of Numeric Functions 00:09:24 Duration
Lecture 7 Data Type Conversion 00:04:02 Duration
Lecture 8 Dealing with Nulls 00:07:52 Duration
Lecture 9 Using CASE and WHEN 00:07:32 Duration
Lecture 10 Query Example - Word Count 00:07:16 Duration

Section 17 : Sample Scenarios with Solutions

Lecture 1 INTRODUCTION TO BRAINMEASURES PROCTOR SYSTEM
Lecture 2 Problem Statements - General Guidelines 00:05:53 Duration
Lecture 3 Initializing the job - General Guidelines 00:13:32 Duration
Lecture 4 Exercise 01 - Get Monthly Crime Count By Type - Understanding Problem Statement 00:03:41 Duration
Lecture 5 Exercise 01 - Get Monthly Crime Count By Type - Core APIs - Design 00:04:02 Duration
Lecture 6 Exercise 01 - Get Monthly Crime Count By Type - Core APIs - Read Data into RDD 00:08:45 Duration
Lecture 7 Exercise 01 - Get Monthly Crime Count By Type - Core APIs - Perform Aggregation 00:09:49 Duration
Lecture 8 Exercise 01 - Get Monthly Crime Count By Type - Core APIs - Sort and Save output 00:11:16 Duration