HDFS Architecture ๐Ÿ’ฅ

ยท

1 min read

Hadoop is an open-source framework for distributed storage and processing. It can be used to store large amounts of data in a reliable, scalable, and inexpensive manner.

HDFS is composed of master-slave architecture, which includes the following elements:

NameNode ๐Ÿ’ฃ

All the blocks on DataNodes are handled by NameNode, which is known as the master node.

Secondary NameNode ๐Ÿ’ฃ

When NameNode runs out of disk space, a secondary NameNode is activated to perform a checkpoint.

DataNode ๐Ÿ’ฃ

Every slave machine that contains data organsises a DataNode. DataNode stores data in ext3 or ext4 file format on DataNodes.

Checkpoint Node ๐Ÿ’ฃ

It establishes checkpoints at specified intervals to generate checkpoint nodes in FsImage and EditLogs from NameNode and joins them to produce a new image. Whenever you generate FsImage and EditLogs from NameNode and merge them to create a new image, checkpoint nodes in HDFS create a checkpoint and deliver it to the NameNode.

Backup Node ๐Ÿ’ฃ

Backup nodes are used to provide high availability of the data. In case one of the active NameNode or DataNodes goes down, the backup node can be promoted to active and the active node switched over to the backup node. Backup nodes are not used to recover from a failure of the active NameNode or DataNodes.

ย