Hadoop and Spark Training & Placement

Introduction to Big Data & Hadoop

1. What is Big Data?
2. Sources of Big Data
3. Categories of Big Data
4. Characteristics of Big Data
5. Use cases of Big Data
6. Traditional RDBMS vs. Hadoop
7. What is Hadoop?
8. History of Hadoop
9. Understanding the Hadoop architecture
10. Fundamentals of HDFS (blocks, NameNode, DataNode, Secondary NameNode)
11. Block placement & rack awareness
12. HDFS read/write
13. Drawbacks of Hadoop 1.x
14. Introduction to Hadoop 2.x
15. High availability
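To make the HDFS block concept above concrete, here is a small illustrative Python sketch of how a file's size maps to a number of blocks and replicated copies. The 128 MB block size and replication factor of 3 are the Hadoop 2.x defaults; the function name is ours, not part of any Hadoop API.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS 2.x default block size: 128 MB
REPLICATION = 3                 # HDFS default replication factor

def split_into_blocks(file_size_bytes):
    """Return the number of HDFS blocks a file of the given size occupies."""
    if file_size_bytes == 0:
        return 0
    return math.ceil(file_size_bytes / BLOCK_SIZE)

# A 300 MB file spans 3 blocks (128 MB + 128 MB + 44 MB);
# with replication, the cluster stores 9 block copies in total.
blocks = split_into_blocks(300 * 1024 * 1024)
copies = blocks * REPLICATION
```

Note that the last block of a file is usually smaller than the block size; HDFS only consumes the actual bytes written, not a full block.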

Linux

1. Making/creating directories
2. Removing/deleting directories
3. Print working directory
4. Change directory
5. Manual pages
6. Help
7. Vi editor
8. Creating empty files
9. Creating file contents
10. Copying files
11. Renaming files
12. Removing files
13. Moving files
14. Listing files and directories
15. Displaying file contents

HDFS

1. Understanding Hadoop configuration files
2. Hadoop components: HDFS and MapReduce
3. Overview of Hadoop processes
4. Overview of the Hadoop Distributed File System
5. The building blocks of Hadoop
6. Hands-on exercise: using HDFS commands

MapReduce

1. Introduction to MapReduce
2. How does MapReduce work?
3. Communication between the JobTracker and TaskTrackers
4. Anatomy of a MapReduce job submission
5. Limitations of the MapReduce 1 architecture
6. YARN architecture
7. NodeManager & ResourceManager
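The map, shuffle, and reduce phases above can be sketched in plain Python with the classic word-count example. This models only the programming model; a real MapReduce job distributes the mapper and reducer tasks across the cluster.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle/sort phase: group values by key, as the framework does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: sum the counts for one word."""
    return (key, sum(values))

lines = ["hadoop spark", "spark streaming", "hadoop hadoop"]
pairs = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
```

The same three-phase shape (emit key/value pairs, group by key, aggregate per key) underlies every MapReduce job, whatever the actual computation.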

Hive

1. What is Hive?
2. Why Hive?
3. What Hive is not
4. The metastore database in Hive
5. Architecture of Hive
6. Internal tables
7. External tables
8. Hive operations
9. Static partitioning
10. Dynamic partitioning
11. Bucketing
12. Bucketing with sorting
13. File formats
14. Hive performance tuning

Sqoop

1. What is Sqoop?
2. Architecture of Sqoop
3. Listing databases
4. Listing tables
5. Different ways of setting the password
6. Using an options file
7. Sqoop eval
8. Sqoop import into a target directory
9. Sqoop import into the warehouse directory
10. Setting the number of mappers
11. Life cycle of a Sqoop import
12. The split-by clause
13. Importing into Hive tables
14. Exporting from Hive tables
15. Setting the number of mappers during an export
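The interaction between the number of mappers and the split-by clause works roughly like this: Sqoop finds the minimum and maximum of the split column and divides that range into one contiguous slice per mapper, each of which becomes an independent query. A hypothetical Python sketch of the arithmetic (Sqoop's real boundary logic lives in Java and handles data types and edge cases this toy version ignores):

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide [min_val, max_val] into num_mappers contiguous slices,
    roughly as Sqoop does for a numeric --split-by column."""
    step = (max_val - min_val + 1) / num_mappers
    ranges = []
    lo = min_val
    for i in range(num_mappers):
        hi = min_val + round(step * (i + 1)) - 1
        if i == num_mappers - 1:
            hi = max_val  # the last mapper takes any remainder
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# Primary keys 1..100 imported with 4 mappers -> 4 slices of 25 rows each.
slices = split_ranges(1, 100, 4)
```

This is why a skewed or non-uniform split column leads to unbalanced mappers: the slices are equal in key range, not in row count.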

Python Core

1. What is Python?
2. Why Python?
3. Installation of Python
4. Conditions
5. Loops
6. The break statement
7. The continue statement
8. The range function
9. Command-line arguments
Strings & Collections

1. String object basics
2. String methods
3. Splitting and joining strings
4. String format functions
5. List object basics
6. List methods
7. Tuples
8. Sets
9. Frozen sets
10. Dictionaries
11. Iterators
12. Generators
13. Decorators
14. List, set, and dictionary comprehensions
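Several of the topics above fit in one short sketch: splitting and joining strings, string formatting, the three comprehension forms, and a generator.

```python
# Splitting and joining strings
csv_line = "hadoop,spark,hive"
parts = csv_line.split(",")
joined = " | ".join(parts)

# A string format function
msg = "{} tools loaded".format(len(parts))

# List, set, and dictionary comprehensions
lengths = [len(p) for p in parts]
unique_initials = {p[0] for p in parts}
by_name = {p: len(p) for p in parts}

# A generator yields values lazily instead of building a list up front
def upper_names(names):
    for n in names:
        yield n.upper()

uppercased = list(upper_names(parts))
```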

Python Advanced Concepts

1. Creating classes and objects
2. Inheritance
3. Multiple inheritance
4. Working with files
5. Reading and writing files
6. Using standard modules
7. Creating custom modules
8. Exception handling with try/except
9. finally in exception handling
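A minimal sketch of classes, inheritance, and try/except/finally together; the class names are invented for illustration.

```python
class Engine:
    """A base class with one overridable method."""
    def name(self):
        return "generic engine"

class SparkEngine(Engine):          # inheritance: SparkEngine is-an Engine
    def name(self):
        return "spark"              # override the parent's behaviour

def safe_divide(a, b):
    """Demonstrate try/except/finally in one function."""
    try:
        result = a / b
    except ZeroDivisionError:
        result = float("inf")
    finally:
        # This clause runs whether or not an exception occurred,
        # which is why it is used for cleanup such as closing files.
        pass
    return result

engine = SparkEngine()
```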

Getting Started with Spark

1. What is Apache Spark & why Spark?
2. Spark history
3. Unification in Spark
4. Spark ecosystem vs. Hadoop
5. Spark with Hadoop
6. Introduction to Spark's Python and Scala shells
7. Spark standalone cluster architecture and its application flow

Programming with RDDs, DataFrames & Datasets

1. RDD basics and characteristics; creating RDDs
2. Lazy evaluation
3. Transformations
4. Actions
5. Persistence (caching)
6. Module: Advanced Spark programming
7. Accumulators and fault tolerance
8. Broadcast variables
9. Custom partitioning
10. Dealing with different file formats
11. Hadoop input and output formats
12. Connecting to diverse data sources
13. Module: Spark SQL
14. Linking with Spark SQL
15. Initializing Spark SQL
16. DataFrames & caching
17. Case classes and inferred schemas
18. Loading and saving data
19. Apache Hive
20. Data sources / Parquet
21. JSON
22. Spark SQL user-defined functions (UDFs)
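The distinction between lazy transformations and eager actions in the list above can be mimicked in plain Python: a transformation only records work, and nothing executes until an action is called. This toy class is purely illustrative and is not the PySpark API.

```python
class ToyRDD:
    """A tiny stand-in for an RDD: transformations are recorded lazily,
    and the pipeline only executes when an action runs."""
    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []

    def map(self, fn):                      # transformation: lazy
        return ToyRDD(self._data, self._ops + [("map", fn)])

    def filter(self, fn):                   # transformation: lazy
        return ToyRDD(self._data, self._ops + [("filter", fn)])

    def collect(self):                      # action: triggers evaluation
        out = list(self._data)
        for kind, fn in self._ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

    def count(self):                        # action built on collect
        return len(self.collect())

# Nothing runs when map/filter are called; collect() evaluates the chain.
rdd = ToyRDD(range(1, 6)).map(lambda x: x * 10).filter(lambda x: x > 20)
result = rdd.collect()
```

Note also that each transformation returns a new object instead of mutating the old one, mirroring the immutability of real RDDs.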

Kafka & Spark Streaming

1. Getting started with Kafka
2. Understanding the Kafka producer and consumer APIs
3. Deep dive into the producer and consumer APIs
4. Ingesting web server logs into Kafka
5. Getting started with Spark Streaming
6. Getting started with HBase
7. Integrating Kafka, Spark Streaming, and HBase
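Kafka's producer/consumer model above can be pictured in plain Python: producers append records to an ordered log, and each consumer group reads from that log at its own offset. This is only an analogy for the topic/offset idea, not the Kafka client API.

```python
class ToyTopic:
    """A log-like buffer: producers append records, and each consumer
    group tracks its own read offset independently."""
    def __init__(self):
        self.log = []
        self.offsets = {}

    def produce(self, record):
        self.log.append(record)

    def consume(self, group):
        """Return the next unread record for this group, or None."""
        offset = self.offsets.get(group, 0)
        if offset >= len(self.log):
            return None
        self.offsets[group] = offset + 1
        return self.log[offset]

topic = ToyTopic()
topic.produce("GET /index.html 200")    # e.g. web server log lines
topic.produce("GET /missing 404")

first = topic.consume("analytics")
second = topic.consume("analytics")
fresh = topic.consume("alerts")         # a new group starts from offset 0
```

The key point the analogy captures: consuming a record does not remove it from the log, so multiple groups can process the same stream independently.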

Spark on Amazon Web Services (AWS)

1. Introduction
2. Signing up for an AWS account
3. Setting up Cygwin on Windows
4. Quick preview of Cygwin
5. Understanding pricing
6. Creating the first EC2 instance
7. Connecting to an EC2 instance
8. Understanding the EC2 dashboard's left menu
9. Different EC2 instance states
10. Describing an EC2 instance
11. Using Elastic IPs to connect to an EC2 instance
12. Using security groups to secure an EC2 instance
13. Understanding the concept of a bastion server
14. Terminating an EC2 instance and releasing all its resources
15. Creating security credentials for an AWS account
16. Setting up the AWS CLI on Windows
17. Creating an S3 bucket
18. Deleting root access keys
19. Enabling MFA for the root account
20. Introduction to IAM users and customizing the sign-in link
21. Creating the first IAM user
22. Creating a group and adding a user
23. Configuring an IAM password policy
24. Understanding IAM best practices
25. AWS managed policies and creating custom policies
26. Assigning policies to entities (users and/or groups)
27. Creating a role for the EC2 trusted entity with permissions on S3
28. Assigning a role to an EC2 instance
29. Introduction to EMR
30. EMR concepts
31. Prerequisites before setting up an EMR cluster
32. Setting up data sets
33. Setting up an EMR cluster with Spark using quick options
34. Connecting to the EMR cluster
35. Submitting a Spark job on the EMR cluster
36. Validating the results
37. Terminating the EMR cluster

Airflow

1. What is Airflow?
2. Airflow terminology
3. Why Airflow?
4. What is the Airflow scheduler?
5. What is a DAG run?
6. Airflow operators
7. Creating a first DAG/workflow
8. Running a PySpark job with Airflow
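At its core, an Airflow DAG is just a set of tasks plus dependencies, and the scheduler runs each task only after its upstream tasks have finished. Python's standard graphlib module can sketch that ordering; real Airflow DAGs are declared with Operator objects rather than plain dicts, and the task names below are invented.

```python
from graphlib import TopologicalSorter

# A toy ETL workflow: transform depends on extract, load on transform.
dag = {
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields tasks so that every task appears
# after all of its dependencies, as a scheduler would run them.
order = list(TopologicalSorter(dag).static_order())
```

graphlib raises a CycleError if the graph contains a cycle, which mirrors why Airflow insists the workflow be a directed *acyclic* graph.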

Interview Preparation

1. Three real-time projects
2. Deployment on multiple platforms
3. Discussion of how to explain the projects in an interview
4. Data engineer roles and responsibilities
5. A data engineer's day-to-day work
6. One-on-one resume discussion covering projects, technology, and experience
7. A mock interview for every student
8. Real-time interview questions

Price: INR 20,000 (discounted to INR 17,000)