Hadoop is an open-source framework for distributed storage and processing. It can be used to store large amounts of data in a reliable, scalable, and inexpensive manner.
HDFS is composed of master-slave architecture, which includes the following elements:
NameNode ๐ฃ
All the blocks on DataNodes are handled by NameNode, which is known as the master node.
Secondary NameNode ๐ฃ
When NameNode runs out of disk space, a secondary NameNode is activated to perform a checkpoint.
DataNode ๐ฃ
Every slave machine that contains data organsises a DataNode. DataNode stores data in ext3 or ext4 file format on DataNodes.
Checkpoint Node ๐ฃ
It establishes checkpoints at specified intervals to generate checkpoint nodes in FsImage and EditLogs from NameNode and joins them to produce a new image. Whenever you generate FsImage and EditLogs from NameNode and merge them to create a new image, checkpoint nodes in HDFS create a checkpoint and deliver it to the NameNode.
Backup Node ๐ฃ
Backup nodes are used to provide high availability of the data. In case one of the active NameNode or DataNodes goes down, the backup node can be promoted to active and the active node switched over to the backup node. Backup nodes are not used to recover from a failure of the active NameNode or DataNodes.