How does data locality work in Hadoop MapReduce?
What is data locality and how does it work?
- Suppose you have a cluster of 5 nodes, you store a file there, and you need to run a computation on it. With data locality, the scheduler tries to run the computation on the node(s) where the data is stored (rather than, for example, on the first node that has compute resources available). This reduces network load.
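The scheduling idea above can be sketched as a small simulation. This is a minimal illustration, not Hadoop's actual scheduler code: the node names, block IDs, and the `blockLocations` map are all hypothetical, and a real scheduler (e.g. YARN) considers many more factors.

```java
import java.util.*;

public class LocalityScheduler {
    // Hypothetical map from block ID to the nodes holding a replica of that block.
    static final Map<String, List<String>> blockLocations = Map.of(
            "block-1", List.of("node-2", "node-4"),
            "block-2", List.of("node-1", "node-5"));

    /** Prefer a free node that already stores the block; otherwise take any free node. */
    static String assignTask(String blockId, Set<String> freeNodes) {
        for (String node : blockLocations.getOrDefault(blockId, List.of())) {
            if (freeNodes.contains(node)) {
                return node; // data-local: no network transfer needed
            }
        }
        // Non-local fallback: the block must be read over the network.
        return freeNodes.iterator().next();
    }

    public static void main(String[] args) {
        Set<String> free = new TreeSet<>(List.of("node-2", "node-3"));
        // block-1 has a replica on node-2, which is free -> data-local assignment.
        System.out.println(assignTask("block-1", free));
        // block-2's replica holders (node-1, node-5) are busy -> non-local fallback.
        System.out.println(assignTask("block-2", free));
    }
}
```

Running this prints `node-2` twice: once as a data-local match, once as the fallback when no replica holder is free.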
What is data localization?
- Data locality refers to the practice of placing computation close to the data it processes, which improves throughput and speeds up job execution. Hadoop distinguishes between locality levels: 1. Data local: the map task runs on the node that stores the input block it processes. 2. Rack local: the task runs on a different node in the same rack as the block. 3. Off-rack: the task runs in a different rack, so the block must be transferred across racks.
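The three locality levels can be classified with a simple helper. This is an illustrative sketch, not a Hadoop API: the node and rack names, and the `rackOf` mapping, are assumptions standing in for the cluster's rack-awareness configuration.

```java
import java.util.List;
import java.util.Map;

public class LocalityLevel {
    enum Level { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    /** Classify where a task's node sits relative to the replicas of its input block. */
    static Level classify(String taskNode, List<String> replicaNodes,
                          Map<String, String> rackOf) {
        if (replicaNodes.contains(taskNode)) {
            return Level.DATA_LOCAL;            // block is on the task's own node
        }
        String rack = rackOf.get(taskNode);
        for (String replica : replicaNodes) {
            if (rack.equals(rackOf.get(replica))) {
                return Level.RACK_LOCAL;        // replica is in the same rack
            }
        }
        return Level.OFF_RACK;                  // cross-rack transfer required
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of(
                "node-1", "rack-A", "node-2", "rack-A", "node-3", "rack-B");
        System.out.println(classify("node-1", List.of("node-1"), rackOf)); // DATA_LOCAL
        System.out.println(classify("node-2", List.of("node-1"), rackOf)); // RACK_LOCAL
        System.out.println(classify("node-3", List.of("node-1"), rackOf)); // OFF_RACK
    }
}
```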
How is a file stored in Hadoop?
- HDFS stores a file by splitting it into blocks of 128 MB (the default block size, configurable via dfs.blocksize). Each block is replicated (three copies by default) and the replicas are stored on different nodes across the Hadoop cluster.