Big Data Masters Program

Enrolled: 2345 students
Duration: 6 Months
Lectures: 535
Level: Intermediate

Introduction to Big Data

1
Introduction to Big Data
2
Big Data System Requirements
3
Monolithic vs Distributed System
4
Distributed System Architecture
5
What is Hadoop And Evolution of Hadoop
6
Core Components of Hadoop
7
HDFS Architecture:
8
What is Node And What is Cluster
9
Data Block & Block Size
10
Slave Node, Master Node, Data Node & Name Node
11
Metadata And Replication Factor
12
Heart Beat & Fault Tolerance
13
Handling Namenode Failure
14
What is SPOF
15
FSimage & Edit Logs
16
Secondary Namenode
17
Name Node Recovery
18
Check Pointing
19
Understanding Replication Factor
20
What is Rack And Rack Failure
21
Rack Awareness Mechanism
22
Block Report
23
Namenode High Availability
24
Quorum Journal Manager & Quorum Journal Node
25
Understanding Linux File System
26
List & Parameters of List Command
27
Touch, Mkdir, Rmdir & Other Linux Commands
28
HDFS Commands:
29
List Files & Directories
30
How HDFS Commands Work
31
‘ls’ Command With Various Parameters
32
Create, Remove File/Directory
33
Copy & Get Files/Folders From Local to HDFS & Vice Versa
34
Move Files/Folders From HDFS to HDFS
35
Change Replication Factor Dynamically
36
View File Metadata Information

MapReduce - Distributed Computing Framework

1
Introduction to MapReduce
2
Stages in MapReduce
3
What is Key-Value
4
What is Map & What is Reduce
5
Example to Understand Map & Reduce
6
Word Count Example in MapReduce
7
Record Reader
8
Mapper Phase
9
Reducer Phase
10
MapReduce Shuffle & Sort
11
Inside Map & Reduce Phase
12
Wordcount Example in MapReduce
13
Typical MapReduce Flow
14
Blocks in MapReduce
15
Default Number of Mappers & Reducers
16
Understanding Number of Mappers/Reducers
17
MapReduce Framework Behind the Scenes
18
Role of Hash Function in MapReduce
19
Partitioning in MapReduce
20
How to Choose Number of Reducers
21
How Hash Function Works
22
Understanding Shuffle & Sort
23
Example: Calculating Max Temperature in a Day
24
Combiner Function in MapReduce
25
Advantages of Combiners
26
When to Use or Not to Use Combiner
27
Example1: Filtering Data using MapReduce
28
Example2: Finding Distinct Values
29
Example3: Finding Top 3 Most Influential users
30
Realtime Use Case: Google Web Search
31
MapReduce Programming
32
MR Code Explanation
33
How to Write Map Reduce Code
34
Mapper Code
35
Reducer Code
36
Main Code
37
Finding the Frequency of Each Word in a File
38
MapReduce Jars
39
MapReduce Practical Sessions
40
Word Count Program – Practical Session1
41
Jar Creation & Execution – Practical Session2:
42
How to Create a Jar
43
How to Execute the Jar
44
How to Track a Job
45
How to Track All Previous Jobs
46
MR Program Variations – Practical Session3:
47
How to Change Number of Reducers
48
Writing Custom Partitioner Logic
49
Changing Number of Reducers to Zero
50
Introducing Combiner
51
Writing Custom Combiner logic
52
Introduction to Partitioners
53
Partitioners Code Example

Apache Sqoop - Data Ingestion to Hadoop

1
Sqoop Fundamentals
2
Sqoop Basics
3
What is Sqoop
4
Sqoop Workflow
5
Key Features of Sqoop
6
Sqoop Import
7
Sqoop Export
8
Connecting to MySQL
9
Accessing MySQL Databases from Hadoop
10
Accessing MySQL Tables from Hadoop
11
Sqoop Import Practicals
12
Sqoop Export Practicals
13
Sqoop Job
14
Sqoop Incremental Load
15
Sqoop Default Import
16
Sqoop Free-Form Query Import
17
Sqoop Direct Import
18
Importing Data Into Hive
19
Importing Data Into HBase
20
Sqoop Validate
21
When a Sqoop Export May Fail

Apache Hive

1
Hive Overview:
2
Transactional System and Analytical System
3
Examples of Transactional Systems
4
Examples of Analytical Systems
5
What is Hive
6
Hive Query Language (HQL)
7
Understanding Hive Table
8
Introduction to Hive Metadata
9
Why Hive over traditional databases
10
Transactional and Analytical Processing
11
What is Data Warehouse
12
Hive Architecture
13
Hive on top of Hadoop
14
How Hive Works
15
Transactional vs Analytical Processing
16
Data Warehouse Concept
17
The Hive Metastore
18
Hive vs RDBMS
19
HQL vs SQL
20
Hive Subqueries, Views & Index
21
Transactional and Analytical Processing
22
What is Data Warehouse
23
Hive Architecture
24
Hive on Hadoop
25
Hive Metastore
26
Hive vs. RDBMS
27
Hive Complex Data Types
28
Hive Array, Map & Struct
29
Hive Built-in Functions
30
Hive UDF, UDAF & UDTF
31
Hive Lateral Views
32
Hive Subqueries
33
Hive Views
34
Hive Normalization vs Denormalization

Apache Hive Advanced

1
Hive Structure Level Optimizations:
2
Hive Partitioning
3
Hive Partitioning With 2 Columns
4
Hive Bucketing
5
Hive Partitioning With Bucketing
6
Hive Query Level Optimizations:
7
Hive Join Optimizations
8
Hive Bucket Map Join Optimizations
9
Hive Window Functions
10
Hive Ranking
11
Hive Sorting
12
Hive File Format
13
Row vs Column File Formats
14
Specialized File Formats
15
Internals of ORC File Formats
16
Internals of Parquet File Formats
17
ORC vs Parquet File Formats
18
Hive Compression Techniques
19
Hive Vectorization
20
Changing the Hive Engine
21
Hive Thrift Server

NoSQL Databases - HBase

1
HBase Basics
2
Key requirements of database
3
Limitations of Hadoop
4
Google Bigtable concept for quick searching
5
Implementation of Bigtable as HBase
6
Properties of HBase
7
What HBase can offer
8
Row based storage vs Columnar storage
9
Advantages of columnar storage
10
Normalization vs Denormalization
11
CRUD Operation
12
RDBMS vs HBase
13
HBase data model
14
4-Dimensional data model
15
CAP Theorem
16
HBase Architecture
17
HBase Region Server
18
Region, MemStore, WAL & Block Cache
19
HFile
20
ZooKeeper
21
HMaster & Meta Table
22
HBase Architecture components in detail
23
HBase Read/Write operations
24
Compaction
25
HBase Data Update
26
HBase Data Deletion
27
Handling Server Failures
28
HBase Practicals
29
Handling HBase Failure Services
30
Create & List Table
31
Insert Records in Table
32
Scan(view) & Get records from table
33
Delete a column
34
Describe a table
35
Check table exists or not
36
Drop table – Understanding how it works
37
Parameters of get command
38
Parameters of scan command
39
HBase file structure in HDFS
40
How to disable/enable a table
41
Various filters in HBase
42
Count Records

NoSQL Databases - Cassandra Overview

1
What is Cassandra
2
What a Cassandra Cluster Looks Like
3
Tunable read/write Consistency
4
HBase vs Cassandra
5
Integration with Hadoop (Mini Project)
6
HBase-Hive Integration

Learning Scala - A Guide to Functional Programming

1
Why Scala
2
Where to Run Scala Code
3
Scala Code Using IDE
4
Scala Basics
5
Var vs val
6
Type inference
7
Data types in Scala
8
String Interpolation
9
String Comparison
10
Flow control: If else
11
Match Case
12
For Loop
13
While loop
14
Scala Functional Programming
15
How to define a function
16
Higher order function
17
Anonymous function
18
Scala Collections
19
Array
20
List
21
Tuple
22
Range
23
Set
24
Map
25
Scala Functional Programming:
26
Why Scala
27
Modes of writing Scala code
28
What is functional programming
29
What is a function
30
What is a pure function?
31
First class function
32
Higher order function
33
Anonymous function
34
Immutability
35
Loop
36
Recursion
37
Tail recursion
38
Statement vs Expression
39
Closure
40
Scala type system
41
Scala operators
42
Anonymous function
43
Placeholder syntax
44
Partially applied functions
45
Function currying

Apache Spark - General Purpose Cluster Computing Framework

1
What is App class in Scala
2
Default args, named args & variable args
3
Difference between Nil, null, None & Nothing
4
What is Option in Scala
5
What is Unit in Scala
6
Dealing with nulls in Scala
7
What is yield
8
What is vector
9
Scala if guards & pattern guards
10
What are “for comprehensions”
11
Difference between “==” in Java and Scala
12
Difference between strict val vs lazy val
13
What are default packages in Scala
14
What is Scala apply method
15
What is a diamond problem in Scala
16
What is a trait
17
Why Scala is the top choice for big data
18
What is Apache Spark

Apache Spark Introduction

1
What is Apache Spark
2
Understanding Spark cluster
3
Is Spark a replacement to Hadoop
4
Why Spark is faster than MapReduce
5
How data is stored in Spark
6
What is RDD
7
What is DAG
8
RDD Lineage
9
Resiliency
10
Immutability
11
Transformation & Action
12
Lazy Evaluation
13
Word count program in Spark
14
Word count program in PySpark
15
Word count problem real-time example

Apache Spark - Advanced

1
Spark Real-Time Example
2
Broadcast Variable
3
Accumulators
4
How Spark Executes Program on the Cluster
5
Spark Driver and Executors
6
Client Mode, Cluster Mode and Local Mode: Analyzing Log Messages – Hands on
7
Narrow vs Wide Transformations
8
Stages in Spark
9
Difference Between reduceByKey & reduce
10
Difference Between groupByKey & reduceByKey
11
Pair RDD
12
Pair RDD vs Map
13
Understanding Default Parallelism
14
Difference Between repartition & coalesce
15
When to Increase/Decrease Partitions
16
Spark on YARN Architecture
17
YARN – Yet Another Resource Negotiator
18
Application Master
19
Containers

Apache Spark - Structured API Part-1

1
Cache vs Persist
2
Spark Storage Levels
3
Difference Between DAG & Lineage
4
How to Submit a Spark Job
5
Real-time example – Finding top movies based on ratings
6
Spark Ecosystem
7
Map vs Map Partitions
8
Introduction to Spark Structured API
9
Spark DataFrame
10
Understanding SparkSession
11
SparkSession vs SparkContext
12
Dataframe with Various Transformations
13
RDD vs DataFrame vs Datasets
14
Challenges with DataFrame
15
Spark Dataset API
16
Difference Between DataFrame and Dataset
17
Benefits of Dataset
18
Creating Dataframe/Datasets from Various File Formats
19
Read Modes & Schema
20
Ways to Define the Schema
21
Defining an Explicit Schema

Apache Spark - Structured API Part-2

1
Writing Output to Sink (spark.write)
2
Spark File Layout
3
Benefits of Repartitions
4
partitionBy & bucketBy
5
Saving Files in Various File Formats
6
Introduction to Spark SQL
7
Storing Data in Persistent Manner
8
Handling Spark Metadata
9
Low & High level Transformations
10
Referring to a Column in Dataframe/Dataset
11
Column String
12
Column Object
13
Column Expression
14
Spark UDF using Structured API
15
Adding Column in Dataframe
16
Dataframe to Dataset Using Case Class
17
Dataset to DataFrame Conversion
18
Spark Catalog
19
Registering UDF with Driver
20
Transformations Hands on Examples
21
Aggregate Transformations
22
Simple Aggregations
23
Grouping Aggregations
24
Window Aggregations
25
Joins on DataFrame
26
Simple Join (Shuffle Sort Merge Join)
27
Broadcast Join
28
Dealing With Ambiguous Column Names
29
Dealing With Nulls
30
Internals of Join Operations
31
When to Use Simple Join vs Broadcast Join
32
Grouping Aggregation Real-time Example
33
Inferring Data in Spark SQL
34
Quiz
35
Assignment
36
Assignment Solution

Apache Spark - Optimization Part-1

1
Level of Optimizations
2
Resource level optimizations
3
Application level optimizations
4
Cluster level optimizations
5
How to calculate no of Executors
6
Thin Executor
7
Fat Executor
8
How to calculate no of Executors
9
How to Calculate Memory Allocation
10
How to Calculate No of Cores
11
Heap Memory
12
Off-Heap Memory
13
Hands on With Real-time cluster
14
Understanding Cluster Configurations
15
Real-time Example: Moving Data to HDFS Using an Edge Node and Working Around It in a Real-time Cluster
16
Static Resource allocation
17
Dynamic Resource allocation
18
Understanding Memory Usage in Spark
19
Execution Memory
20
Storage Memory
21
Practical Demonstration: Cache & Persist
22
Java Serializer vs Kryo Serializer
23
Quiz
24
Assignment
25
Assignment Solution

Apache Spark - Optimization Part-2

1
Broadcast Join Practical Demonstrations
2
Broadcast Join Using RDD
3
When to Use Broadcast Join
4
Broadcast Join Using Dataframe
5
Visualizing Broadcast Join with Structured API
6
Practical Demo on Repartition vs Coalesce
7
Client Mode vs Cluster Mode When Using spark-submit
8
Spark Join Optimizations
9
Spark Advanced Optimizations: Sort Aggregate vs Hash Aggregate
10
Spark Catalyst Optimizer
11
Quiz
12
Assignment
13
Assignment Solution

Apache Spark - Streaming

1
What is Real-time Processing
2
The Importance of Real-time Processing
3
Batch processing vs Real-time Stream Processing
4
Spark Streaming Data
5
Spark discretized stream or DStream
6
Batch & Batch Interval
7
Is Spark a Real-time Streaming Engine
8
Stream Processing in Spark
9
Transformed DStream
10
Understanding Producer & Consumer
11
Practical on Real-time Processing
12
Stream Transformations
13
Stateless Transformations
14
Stateful Transformations
15
Window Operations
16
Batch Interval
17
Window Size
18
Sliding Interval
19
Practical on Stateless Transformation
20
Practical on Stateful Transformation
21
reduceByKey vs updateStateByKey
22
Working With Sliding Window
23
reduceByKeyAndWindow Transformation
24
reduceByWindow Transformation
25
countByWindow Transformation
26
Quiz
27
Assignment
28
Assignment Solution

Apache Spark - Streaming Part-2

1
What Is Structured Streaming
2
Requirement Of Structured Streaming
3
Limitations Of Spark Streaming
4
Benefits Of Spark Structured Streaming
5
Practical – Wordcount Example On Structured Streaming
6
Dynamically Setting The Shuffle Partitions
7
Data Stream Writer Output Modes
8
Datastream Output Modes – append, update & complete
9
Spark Streaming Graceful Shutdown
10
How Spark Streaming Code Executes Internally
11
How a Job Is Converted to Micro-batches
12
Trigger Point For Micro-batches
13
Types of Triggers – unspecified, time interval, one time, continuous
14
Types of Data Sources – Socket Source, Rate Source, File Source, Kafka Source
15
Limitations of socket source
16
Practical on File Data Source
17
Types of Spark Streaming Output Data Options
18
Fault Tolerance and Exactly Once Guarantee
19
Understanding Checkpoint Location
20
Stateful vs Stateless Transformations
21
Managed Stateful Operations vs Unmanaged Stateful Operations
22
Types of Aggregations – Continuous Aggregations vs Time Bound Aggregations
23
Window Transformations
24
updateStateByKey, reduceByKeyAndWindow, reduceByWindow, countByWindow
25
Types of windows – Tumbling Time Window, Sliding Time Window
26
Streaming Joins
27
Streaming Dataframe to static dataframe
28
Streaming Dataframe With Another Streaming Dataframe
29
Quiz
30
Assignment
31
Assignment Solution

Apache Kafka - Distributed Event Streaming Platform

1
Introduction To Kafka
2
Kafka Architecture
3
Kafka Key Concepts/Fundamentals
4
Overview Of Zookeeper And Its Role In Kafka Cluster
5
Cluster, Nodes, Brokers, Topics
6
Consumer, Producers, Logs, Partitions
7
Concept Of Consumer Groups
8
Leader & Follower Partition
9
Installing One Node Kafka Cluster On Local
10
Installing Multi Broker Kafka Cluster On Local
11
Command Line Producer And Consumer
12
Replication Concept For Fault Tolerance
13
How Data Is Stored In Brokers
14
Log Segments, Message Offsets, Message Index
15
ISR List / Minimum ISR
16
Committed Vs Uncommitted Messages
17
Writing A Kafka Producer In Java
18
Writing A Kafka Consumer In Java
19
Achieving Exactly Once Semantics
20
Integrating Kafka With Spark Structured Streaming
21
Quiz
22
Assignment
23
Assignment Solution

Big Data on Cloud

1
AWS EMR (Elastic MapReduce)
2
What is a VM (Virtual Machine)
3
On-Premise vs Cloud Setup
4
Major Vendors of Hadoop Distribution
5
Why Cloud & Big Data on Cloud
6
Major Cloud Providers of Big Data
7
What is EMR
8
HDFS vs S3
9
What Is S3
10
Important Instances in AWS
11
Kinds of Nodes in Cluster
12
Transient vs Long Running Cluster
13
Running Spark Code on EMR
14
How to Track Your Job
15
Copy File From S3 to Local
16
Zeppelin Notebook
17
Types of EC2 Instances
18
How to Create a VM
19
What is a Keypair
20
Elastic IP
21
AWS Storage, Networking & CLI
22
Instance Store
23
S3 & EBS
24
Public IP Vs Private IP
25
Network Switches
26
Security Group
27
AWS Command Line Interface
28
Launch an EMR Cluster Using Advanced Options

AWS Athena:

1
What is Athena
2
When do we require Athena
3
What Problem Athena Solves
4
How Athena Works
5
Athena Practical Demonstration
6
How to create a normal table manually on CSV data residing in S3
7
How to minimize data scanning in Athena
8
How to create a partitioned table on Parquet files
9
Inferring Schema automatically using AWS Glue
10
Glue Catalog
11
Quiz
12
Assignment
13
Assignment Solution

Final Project

1
One end-to-end pipeline project involving all major components like Sqoop, HDFS, Hive, HBase, Spark… etc.
2
Interview Preparation Tips:
3
Resume Building
4
15+ Mock Interview Recordings
5
Mock Interview
6
Interview Questions
7
How to Handle Managerial Round Qs