<h3 id="heading-what-is-hadoop">What is Hadoop? 🧿</h3>
<p>Hadoop is an open-source Apache framework written in Java that enables distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application runs in an environment that provides distributed <em>storage</em> and <em>computation</em> across the cluster. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.</p>
<h3 id="heading-components-of-hadoop">Components of Hadoop 🧿</h3>
<ul>
<li><p>HDFS</p>
</li>
<li><p>MapReduce</p>
</li>
<li><p>YARN</p>
</li>
</ul>
<h2 id="heading-hadoop-distributed-file-system"><strong>Hadoop Distributed File System</strong></h2>
<p>The Hadoop Distributed File System (HDFS) is a distributed file system modeled on the Google File System (GFS) and designed to run on commodity hardware.</p>
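<p>To make the idea of distributed storage concrete, here is a minimal sketch of how a file could be cut into blocks and each block copied to several machines. It is not the HDFS implementation: the function <code>plan_blocks</code>, the node names, and the round-robin placement are all hypothetical, and the 128 MB block size and replication factor of 3 are assumed values for illustration (real HDFS chooses replica locations with its own placement policy).</p>

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size for this sketch
REPLICATION = 3      # assumed number of copies per block


def plan_blocks(file_size_mb, datanodes,
                block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a list of (block_index, size_mb, nodes_holding_a_copy).

    Placement is simple round-robin, which is a simplification and
    not how real HDFS picks nodes. Expects a non-empty node list.
    """
    # Never ask for more copies than there are machines.
    replicas = min(replication, len(datanodes))
    block_count = math.ceil(file_size_mb / block_size_mb)

    plan = []
    for i in range(block_count):
        # The last block may be smaller than the block size.
        size = min(block_size_mb, file_size_mb - i * block_size_mb)
        holders = [datanodes[(i + r) % len(datanodes)] for r in range(replicas)]
        plan.append((i, size, holders))
    return plan


if __name__ == "__main__":
    for block in plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

<p>In this sketch a 300 MB file becomes three blocks, and losing any single node still leaves two copies of every block, which is the intuition behind running on inexpensive commodity machines.</p>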
<h2 id="heading-mapreduce"><strong>MapReduce</strong></h2>
<p>MapReduce is a parallel programming model, devised at Google, for writing distributed applications that process large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.</p>
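<p>The programming model itself can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using the classic word-count example; it does not use the Hadoop API, and the function names (<code>map_phase</code>, <code>shuffle_phase</code>, <code>reduce_phase</code>, <code>word_count</code>) are hypothetical. On a real cluster the map and reduce calls would run in parallel on many nodes.</p>

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data on clusters"]))
```

<p>Because each call to <code>map_phase</code> looks at one line only and each call to <code>reduce_phase</code> looks at one key only, the work can be spread over many machines without the pieces needing to talk to each other.</p>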
<h2 id="heading-yarn"><strong>YARN</strong></h2>
<p>YARN (Yet Another Resource Negotiator) is Hadoop's resource-management layer: it allocates cluster resources to applications and schedules their work.</p>
<h2 id="heading-challenges-with-hadoop">Challenges with Hadoop 🧿</h2>
<ul>
<li><p>Small File Concerns</p>
</li>
<li><p>Slow Processing Speed</p>
</li>
<li><p>No Real-Time Processing</p>
</li>
<li><p>No Iterative Processing</p>
</li>
<li><p>Limited Ease of Use</p>
</li>
<li><p>Security Problem</p>
</li>
</ul>

