Email: [email protected] | Phone: 080-42041080 / +91 9611824441

HADOOP DEV + SPARK & SCALA


    One-time classroom registration: Rs. 1000/- (click here to register)

    Classroom training batch schedules:

    Location                Day/Duration   Date         Time       Type
    HSR Layout, Bangalore   Weekend        27/04/2019   11:00 AM   Demo Batch
    HSR Layout, Bangalore   Weekend        28/04/2019   11:30 AM   New Batch

    Hadoop Developer / Analyst / SPARK + SCALA / Hadoop (Java + Non-Java) Track

    HADOOP DEV + SPARK & SCALA + NoSQL + HDFS (Storage) + YARN (Hadoop Processing Framework) + MapReduce using Java (Processing Data) + Apache Hive + Apache Pig + HBASE (Real NoSQL) + Sqoop + Flume + Oozie + Kafka With ZooKeeper + Cassandra + MongoDB + Apache Splunk

    Best BigData Hadoop training with 2 real-time projects and a 1 TB dataset

    Duration of the training: 8 to 10 weekends

     

    Bigdata Hadoop Syllabus

    Who is Hadoop for?

    IT professionals who want to move into one of the most in-demand technologies, sought by clients across almost all domains, for the reasons below:

    •  Hadoop is open source (cost saving / cheaper)
    •  Hadoop solves Big Data problems that are very difficult or impossible to solve with expensive tools on the market
    •  It can process distributed data, with no need to store the entire dataset in centralized storage as other tools require
    •  Nowadays many existing tools and technologies are seeing job cuts, because clients are moving to a cheaper, more efficient solution: HADOOP
    •  There will be almost 4.4 million Hadoop-related jobs in the market by next year

    Please refer to the link below:

    http://www.computerworld.com/article/2494662/business-intelligence/hadoop-will-be-in-most-advanced-analytics-products-by-2015–gartner-says.html

    Can I Learn Hadoop If I Don’t know Java?

    Yes,

    It is a big myth that someone who doesn't know Java can't learn Hadoop. The truth is that only the MapReduce framework needs Java; all other components are based on familiar paradigms: Hive is similar to SQL, HBase is similar to an RDBMS, and Pig is script-based.

    Only MR requires Java, but many organizations have also started hiring for specific skill sets, such as HBase developers or Pig- and Hive-specific roles. Knowing MapReduce as well makes you an all-rounder in Hadoop, ready for any requirement. The SQL-style side, for instance, needs no Java at all, as the sketch below shows.
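
    A minimal sketch from the Scala spark-shell (the "orders" table and its columns are illustrative, not part of the course material):

        // Scala spark-shell: SQL-style analysis without writing any Java.
        // Assumes a Hive table "orders" with columns "city" and "amount" (illustrative).
        val totals = spark.sql(
          "SELECT city, SUM(amount) AS total FROM orders GROUP BY city ORDER BY total DESC")
        totals.show()   // top cities by total order amount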

    Why Hadoop?

    • Solution for Big Data problems
    • Open-source technology
    • Based on open-source platforms
    • Contains several tools for an entire ETL data-processing framework
    • It can process distributed data, with no need to store the entire dataset in centralized storage as SQL-based tools require

     

    Training Syllabus

    HADOOP DEV + SPARK & SCALA + NoSQL + HDFS (Storage) + YARN (Hadoop Processing Framework) + MapReduce using Java (Processing Data) + Apache Hive + Apache Pig + HBASE (Real NoSQL) + Sqoop + Flume + Oozie + Kafka With ZooKeeper + Cassandra + MongoDB + Apache Splunk

    Big Data

    Distributed computing

    Data management – Industry Challenges

    Overview of Big Data

    Characteristics of Big Data

    Types of data

    Sources of Big Data

    Big Data examples

    What is streaming data?

    Batch vs Streaming data processing

    Overview of Analytics

    Big data Hadoop opportunities

    Hadoop                                      

    Why we need Hadoop

    Data centers and Hadoop Cluster overview

    Overview of Hadoop Daemons

    Hadoop Cluster and Racks

    Learning Linux required for Hadoop

    Hadoop ecosystem tools overview

    Understanding the Hadoop configurations and Installation.

    HDFS (Storage)

    HDFS

    HDFS Daemons – Namenode, Datanode, Secondary Namenode

    Hadoop FS and Processing Environment’s UIs

    Fault Tolerance

    High Availability

    Block Replication

    How to read and write files

    Hadoop FS shell commands
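
    As a small illustration of reading and writing files, here is a minimal Scala sketch using the Hadoop FileSystem API (paths are illustrative; the shell equivalents are hdfs dfs -put and hdfs dfs -cat):

        // Minimal sketch: write and read an HDFS file with the FileSystem API.
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}
        import org.apache.hadoop.io.IOUtils

        val fs = FileSystem.get(new Configuration())   // reads core-site.xml / hdfs-site.xml

        val out = fs.create(new Path("/user/demo/hello.txt"))   // write
        out.write("hello hdfs\n".getBytes("UTF-8"))
        out.close()

        val in = fs.open(new Path("/user/demo/hello.txt"))      // read
        IOUtils.copyBytes(in, System.out, 4096, true)           // copy to stdout, then close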

    YARN (Hadoop Processing Framework)

    YARN

    YARN Daemons – Resource Manager, NodeManager etc.

    Job assignment & Execution flow

    MapReduce using Java (Processing Data)

    Introduction to MapReduce

    MapReduce Architecture

    Data flow in MapReduce

    Understand Difference Between Block and InputSplit

    Role of RecordReader

    Basic Configuration of MapReduce

    MapReduce life cycle

    How MapReduce Works

    Writing and Executing the Basic MapReduce Program using Java

    Submission & Initialization of MapReduce Job.

    File Input/Output Formats in MapReduce Jobs

    Text Input Format

    Key Value Input Format

    Sequence File Input Format

    NLine Input Format

    Joins

    Map-side Joins

    Reducer-side Joins

    Word Count example (or Election Vote Count)

    We will cover five to ten MapReduce examples with real-time data; a word-count sketch follows.
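
    For orientation, here is the word-count logic sketched in Scala on Spark RDDs (the course builds the same example as a Java MapReduce job; paths are illustrative):

        // Word count: flatMap/map mirror the map phase, reduceByKey the reduce phase.
        val counts = sc.textFile("hdfs:///user/demo/input")
          .flatMap(_.split("\\s+"))     // split lines into words
          .filter(_.nonEmpty)
          .map(word => (word, 1))       // emit (word, 1) pairs
          .reduceByKey(_ + _)           // sum the counts per word
        counts.saveAsTextFile("hdfs:///user/demo/output")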

     Apache Hive

    Data warehouse basics

    OLTP vs OLAP Concepts

    Hive

    Hive Architecture

    Metastore DB and Metastore Service

    Hive Query Language (HQL)

    Managed and External Tables

    Partitioning & Bucketing

    Query Optimization

    Hiveserver2 (Thrift server)

    JDBC , ODBC connection to Hive

    Hive Transactions

    Hive UDFs

    Working with Avro Schema and AVRO file format

    Hands on Multiple Real Time datasets. 
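
    As a small taste of HQL, the sketch below creates a partitioned managed table and queries a single partition. It is run here through the Scala spark-shell with Hive support enabled; table and column names are illustrative:

        // Partitioned managed Hive table, ORC storage (names illustrative).
        spark.sql("""
          CREATE TABLE IF NOT EXISTS sales (item STRING, amount DOUBLE)
          PARTITIONED BY (sale_date STRING)
          STORED AS ORC""")
        spark.sql("""
          INSERT INTO sales PARTITION (sale_date = '2019-04-27')
          VALUES ('book', 250.0), ('pen', 20.0)""")
        // Partition pruning: only the 2019-04-27 partition is scanned.
        spark.sql("SELECT item, amount FROM sales WHERE sale_date = '2019-04-27'").show()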

    Apache Pig

    Apache Pig

    Advantage of Pig over MapReduce

    Pig Latin (Scripting language for Pig)

    Schema and Schema-less data in Pig

    Structured , Semi-Structure data processing in Pig

    Pig UDFs

    HCatalog

    Pig vs Hive Use case

    Hands-on: two more daily-use-case data analysis examples (Google data), plus analysis of a date-time dataset

    HBASE (Real NoSQL )

    Introduction to HBASE

    Basic Configurations of HBASE

    Fundamentals of HBase

    What is NoSQL?

    HBase Data Model

    Table and Row.

    Column Family and Column Qualifier.

    Cell and its Versioning

    Categories of NoSQL Databases

    Key-Value Database

    Document Database

    Column Family Database

    HBASE Architecture

    HMaster

    Region Servers

    Regions

    MemStore

    Store

    SQL vs. NOSQL

    How HBase differs from an RDBMS

    HDFS vs. HBase

    Client-side buffering or bulk uploads

    HBase Designing Tables

    HBase Operations

    Get

    Scan

    Put

    Delete

    Live Dataset
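
    A minimal Scala sketch of the four operations using the HBase Java client (assumes an existing table "users" with column family "info"; all names are illustrative):

        import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
        import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put, Scan}
        import org.apache.hadoop.hbase.util.Bytes

        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("users"))

        val put = new Put(Bytes.toBytes("row1"))                 // Put: write one cell
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"))
        table.put(put)

        val result = table.get(new Get(Bytes.toBytes("row1")))   // Get: read one row
        println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))))

        val scanner = table.getScanner(new Scan())               // Scan: iterate over rows
        var row = scanner.next()
        while (row != null) { println(Bytes.toString(row.getRow)); row = scanner.next() }
        scanner.close()

        table.delete(new Delete(Bytes.toBytes("row1")))          // Delete: remove the row
        table.close(); conn.close()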

    Sqoop

    Sqoop commands

    Sqoop practical implementation 

    Importing data to HDFS

    Importing data to Hive

    Exporting data to RDBMS

    Sqoop connectors
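
    Since Sqoop is driven from the command line, typical invocations look like the following (connection string, credentials, and table/directory names are all illustrative):

        # import an RDBMS table into HDFS
        sqoop import --connect jdbc:mysql://dbhost/shop --username etl -P \
          --table orders --target-dir /user/demo/orders -m 1

        # import the same table directly into Hive
        sqoop import --connect jdbc:mysql://dbhost/shop --username etl -P \
          --table orders --hive-import --hive-table shop.orders

        # export HDFS data back to the RDBMS
        sqoop export --connect jdbc:mysql://dbhost/shop --username etl -P \
          --table order_totals --export-dir /user/demo/order_totals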

    Flume

    Flume commands

    Configuration of Source, Channel and Sink

    Fan-out flume agents

    How to load data into Hadoop from a web server or other storage

    How to load streaming Twitter data into HDFS using Flume

    Oozie

    Oozie

    Action Node and Control Flow node

    Designing workflow jobs

    How to schedule jobs using Oozie

    How to schedule jobs which are time based

    Oozie Conf file

    Scala

    Scala 

    Syntax, Datatypes, Variables

    Classes and Objects

    Basic Types and Operations

    Functional Objects

    Built-in Control Structures

    Functions and Closures

    Composition and Inheritance

    Scala’s Hierarchy

    Traits

    Packages and Imports

    Working with Lists, Collections

    Abstract Members

    Implicit Conversions and Parameters

    For Expressions Revisited

    The Scala Collections API

    Extractors

    Modular Programming Using Objects
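
    A small self-contained Scala sketch touching several of the topics above (classes and objects, traits, closures, and the collections API); everything here is illustrative:

        trait Greeter { def greet(name: String): String }      // trait: abstract behaviour

        class FriendlyGreeter(prefix: String) extends Greeter {
          def greet(name: String): String = s"$prefix, $name!"
        }

        object ScalaBasics {                                   // singleton object as entry point
          def main(args: Array[String]): Unit = {
            val greeter: Greeter = new FriendlyGreeter("Hello")
            println(greeter.greet("Hadoop"))

            val factor = 3                                     // closure captures `factor`
            val triple = (n: Int) => n * factor

            val tripledEvens = List(1, 2, 3, 4).filter(_ % 2 == 0).map(triple)
            println(tripledEvens)                              // List(6, 12)
          }
        }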

    Spark

    Spark

    Architecture and Spark APIs

    Spark components 

    Spark master

    Driver

    Executor

    Worker

    Significance of Spark context

    Concept of Resilient distributed datasets (RDDs)

    Properties of RDD

    Creating RDDs

    Transformations in RDD

    Actions in RDD

    Saving data through RDD

    Key-value pair RDD

    Invoking Spark shell

    Loading a file in shell

    Performing some basic operations on files in Spark shell
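
    A minimal spark-shell sketch of these RDD basics (the data is inline, so it runs as-is; only the concepts are tied to the syllabus):

        val nums    = sc.parallelize(1 to 10)        // creating an RDD
        val squares = nums.map(n => n * n)           // transformation: lazy, nothing runs yet
        val bigOnes = squares.filter(_ > 20)         // another lazy transformation
        println(bigOnes.count())                     // action: triggers the actual job
        println(bigOnes.collect().mkString(","))     // action: results back to the driver

        // key-value pair RDD: aggregate values per key
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
        pairs.reduceByKey(_ + _).collect().foreach(println)   // (a,4), (b,2)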

    Spark application overview

    Job scheduling process

    DAG scheduler

    RDD graph and lineage

    Life cycle of spark application

    How to choose between the different persistence levels for caching RDDs

    Submit in cluster mode

    Web UI – application monitoring

    Important spark configuration properties

    Spark SQL overview

    Spark SQL demo

    SchemaRDD and data frames

    Joining, Filtering and Sorting Dataset

    Spark SQL example program demo and code walk through
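
    A minimal DataFrame sketch covering join, filter, and sort (dataset and column names are illustrative):

        import spark.implicits._

        val users  = Seq((1, "asha"), (2, "ravi")).toDF("id", "name")
        val orders = Seq((1, 500.0), (1, 120.0), (2, 80.0)).toDF("user_id", "amount")

        val report = users
          .join(orders, users("id") === orders("user_id"))   // joining
          .filter($"amount" > 100.0)                         // filtering
          .orderBy($"amount".desc)                           // sorting
          .select($"name", $"amount")

        report.show()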

    Kafka With ZooKeeper

    What is Kafka

    Cluster architecture With Hands On

    Basic operation

    Integration with spark

    Integration with Camel

    Additional Configuration

    Security and Authentication

    Apache Kafka With Spring Boot Integration

    Running 

    Use case
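
    A minimal Scala sketch of producing messages to a topic with the Kafka Java client (broker address and topic name are illustrative):

        import java.util.Properties
        import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        producer.send(new ProducerRecord[String, String]("events", "key1", "hello kafka"))
        producer.flush()                    // make sure the record is actually sent
        producer.close()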

    Apache Splunk

    Introduction & Installing Splunk

    Play with Data and Feed the Data

    Searching & Reporting

    Visualizing Your Data

    Advanced Splunk Concepts 

    Cassandra + MongoDB 

    Introduction of NoSQL 

    What is NoSQL & NoSQL Data Types

    System Setup Process

    MongoDB Introduction

    MongoDB Installation 

    DataBase Creation in MongoDB

    ACID and the CAP Theorem

    What is JSON and what are its features?

    Differences between JSON and XML

    CRUD Operations – Create , Read, Update, Delete
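
    A minimal CRUD sketch from Scala using MongoDB's Java (sync) driver; database, collection, and field names are illustrative and assume a local mongod:

        import com.mongodb.client.MongoClients
        import com.mongodb.client.model.Filters
        import org.bson.Document

        val client = MongoClients.create("mongodb://localhost:27017")
        val users  = client.getDatabase("demo").getCollection("users")

        users.insertOne(new Document("name", "Asha").append("city", "Bangalore")) // Create
        println(users.find(Filters.eq("name", "Asha")).first())                   // Read
        users.updateOne(Filters.eq("name", "Asha"),
          new Document("$set", new Document("city", "Pune")))                     // Update
        users.deleteOne(Filters.eq("name", "Asha"))                               // Delete
        client.close()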

    Cassandra Introduction

    Cassandra – Different Data Supports 

    Cassandra – Architecture in Detail 

    Cassandra's SPOF (Single Point of Failure) & Replication Factor

    Cassandra – Installation & Different Data Types

    Database Creation in Cassandra 

    Tables Creation in Cassandra 

    Cassandra Database and Table Schema and Data 

    Update, Delete, Insert Data in Cassandra Table 

    Insert Data From File in Cassandra Table 

    Add & Delete Columns in Cassandra Table 

    Cassandra Collections
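
    A minimal Cassandra sketch from Scala using the DataStax Java driver (3.x API assumed); keyspace, table, and data are illustrative and assume a local node:

        import com.datastax.driver.core.Cluster

        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
        val session = cluster.connect()

        // keyspace with a replication factor of 1 (single-node dev setup)
        session.execute(
          "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = " +
          "{'class': 'SimpleStrategy', 'replication_factor': 1}")
        // table with a collection column (set<text>)
        session.execute(
          "CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text, tags set<text>)")
        session.execute(
          "INSERT INTO demo.users (id, name, tags) VALUES (1, 'Asha', {'spark', 'hive'})")

        val row = session.execute("SELECT name, tags FROM demo.users WHERE id = 1").one()
        println(row.getString("name"))
        cluster.close()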

    DataQubez University creates meaningful big data & Data Science certifications that are recognized in the industry as a confident measure of qualified, capable big data experts. How do we accomplish that mission? DataQubez certifications are exclusively hands-on, performance-based exams that require you to complete a set of tasks and demonstrate your expertise with the most sought-after technical skills. Big data success requires professionals who can prove their mastery of the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At DataQubez, we draw on our industry leadership and our corpus of real-world experience to address the big data & Data Science talent gap.

    How To Become Certified Big Data – Hadoop Developer

    Certification Code – DQCP – 502

    Certification Description – DataQubez Certified Professional Big Data – Hadoop Developer

    Exam Objectives
    Configuration :-

    Define and deploy a rack topology script, Change the configuration of a service using Apache Hadoop, Configure the Capacity Scheduler, Create a home directory for a user and configure permissions, Configure the include and exclude DataNode files

    Troubleshooting :-

    Restart a cluster service, View an application's log file, Configure and manage alerts, Troubleshoot a failed job

    High Availability :-

    Configure NameNode, Configure ResourceManager, Copy data between two clusters, Create a snapshot of an HDFS directory, Recover a snapshot, Configure HiveServer2

    Data Ingestion - with Sqoop & Flume :-

    Import data from a table in a relational database into HDFS, Import the results of a query from a relational database into HDFS, Import a table from a relational database into a new or existing Hive table, Insert or update data from HDFS into a table in a relational database, Given a Flume configuration file, start a Flume agent, Given a configured sink and source, configure a Flume memory channel with a specified capacity

    Data Transformation Using Pig :-

    Write and execute a Pig script, Load data into a Pig relation without a schema, Load data into a Pig relation with a schema, Load data from a Hive table into a Pig relation, Use Pig to transform data into a specified format, Transform data to match a given Hive schema, Group the data of one or more Pig relations, Use Pig to remove records with null values from a relation, Store the data from a Pig relation into a folder in HDFS, Store the data from a Pig relation into a Hive table, Sort the output of a Pig relation, Remove the duplicate tuples of a Pig relation, Specify the number of reduce tasks for a Pig MapReduce job, Join two datasets using Pig, Perform a replicated join using Pig

    Data Analysis Using Hive :-

    Write and execute a Hive query, Define a Hive-managed table, Define a Hive external table, Define a partitioned Hive table, Define a bucketed Hive table, Define a Hive table from a select query, Define a Hive table that uses the ORCFile format, Create a new ORCFile table from the data in an existing non-ORCFile Hive table, Specify the storage format of a Hive table, Specify the delimiter of a Hive table, Load data into a Hive table from a local directory, Load data into a Hive table from an HDFS directory, Load data into a Hive table as the result of a query, Load a compressed data file into a Hive table, Update a row in a Hive table, Delete a row from a Hive table, Insert a new row into a Hive table, Join two Hive tables, Set a Hadoop or Hive configuration property from within a Hive query.

    Data Processing through Spark & Spark SQL & Python :-

    Frame big data analysis problems as Apache Spark scripts, Optimize Spark jobs through partitioning, caching, and other techniques, Develop distributed code using the Scala programming language, Build, deploy, and run Spark scripts on Hadoop clusters, Transform structured data using SparkSQL and DataFrames

    For Exam Registration, Click here.

    The trainer has 17 years of experience in IT, including 10 years in data warehousing & ETL. For the past six years he has been working extensively with BigData ecosystem toolsets for several banking, retail, and manufacturing clients. He is a certified HDP Spark Developer and a Cloudera-certified HBase specialist. He has also delivered corporate sessions and seminars both in India and abroad; recently he was engaged by Pune University for 40 hours of sessions on BigData analytics for senior professors of Pune.

    All faculty at our organization currently work on these technologies in reputed organizations. The curriculum we impart is not just theory or a talk over some PPTs. We frame the forum so that lessons are delivered in easy language and the content is well absorbed by the candidates, and every session is backed by hands-on assignments. Because the faculty have industry experience, they showcase practical stories from real projects during the course.

    • How we are Different from Others: we cover each topic with real-time examples, 8 real-time projects, and 72+ assignments divided into basic, intermediate, and advanced levels. The trainer comes from real-time industry with 9 years of experience in DWH, working as a BI and Hadoop consultant with 3+ years in real-time BigData & Hadoop implementations and migrations.
      This is completely hands-on training, covering 90% practical and 10% theory. Here at Radical Technologies, we cover all prerequisites, such as Java and SQL, that are required to learn Hadoop developer and analytical skills. This way we accommodate technology beginners and technical experts in the same session, and at the end of the training they gain the confidence that they have up-skilled to a different level.
      • 8 domain-based projects with real-time data (two projects with one trainer; if you require more projects, you are free to attend other trainers' project orientation sessions)
      • 5 POCs
      • 72 assignments
      • 25 real-time scenarios on 16-node clusters (AWS cloud setup)
      • Basic Java
      • DWH concepts
      • Pig | Hive | MapReduce | NoSQL | HBase | ZooKeeper | Sqoop | Flume | Oozie | YARN | Hue | Spark | Scala
      42 hours of classroom sessions, 30 hours of assignments, 25 hours for one project and 50 hours for two projects (candidates should prepare with mentor support; the 50 hours mentioned is the total time spent on projects with each trainer). 350+ interview questions. Administration and manual installation of Hadoop, along with other domain-based projects, are covered on a regular basis apart from our normal batch schedule. We have projects from healthcare, financial, automotive, insurance, banking, retail, etc., which are given to our students as per their requirements.
      • Training by a real-time trainer with 14+ years of experience
      • A pool of 200+ real-time practical sessions on BigData Hadoop
      • Scenarios and assignments to make sure you compete with current industry standards
      • World-class training methods
      • Training until the candidate gets placed
      • Certification and placement support until you get certified and placed
      • All training at reasonable cost
      • 10000+ satisfied candidates
      • 5000+ placement records
      • Corporate and online training at reasonable cost
      • Complete end-to-end project with each course
      • World-class lab facility with i3/i5/i7 and Cisco UCS servers
      • Covers topics beyond the books, as required by the IT industry
      • Resume and interview preparation with 100% hands-on practical sessions
      • Doubt-clearing sessions any time after the course
      • Happy to help you any time after the course
    ML and GraphX, 'R' Language, Data Analytics / Science

    Cloudera Certified Professional (CCP)

    CCP Data Engineer

    STILL NOT SURE WHAT TO DO?

    We are glad you chose to contact us. Please fill in our short form and one of our friendly team members will contact you.
