How the Hadoop Map and Reduce framework works

hanmayya

I have a 1 GB plain text file and a 3-node cluster. If I write a Java MapReduce program that counts the occurrences of each word, how many times will the Mapper's map() method and the Reducer's reduce() method be called?

vefthym

First of all, the size of the cluster is not important. Running on more nodes may result in some redundant calls, just for fault tolerance, but I guess this is not your question. So, whether you have a 1-node cluster or a 100-node cluster, the number of map and reduce tasks will be the same and the result will be the same.

Now, the number of map tasks depends on a few things, such as the block size and the input format. Within each map task, map() is called once per input record, which is one line of text with the default TextInputFormat. You can find details about the number of mappers in this article.
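As a rough illustration of how block size drives the number of map tasks, here is a minimal sketch in plain Java (no Hadoop dependency). It assumes a single splittable text file and one input split per HDFS block; the class and method names are hypothetical, and the real split computation in Hadoop has more detail than this ceiling division.

```java
// Rough estimate of the number of map tasks for one splittable file.
// Assumption: one input split (and so one map task) per HDFS block.
// SplitEstimate and estimateMapTasks are illustrative names, not Hadoop APIs.
class SplitEstimate {
    static long estimateMapTasks(long fileSizeBytes, long splitSizeBytes) {
        // Ceiling division: a partial final block still needs its own split.
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long oneGb = 1024 * mb;
        System.out.println(estimateMapTasks(oneGb, 128 * mb)); // prints 8
        System.out.println(estimateMapTasks(oneGb, 64 * mb));  // prints 16
    }
}
```

So for the 1 GB file in the question, a 128 MB block size would give about 8 map tasks and a 64 MB block size about 16, regardless of whether the cluster has 3 nodes or 100.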

The number of times the reduce method will be called is much easier to define. In a wordcount program, the output key of the mappers is a word, so each distinct word ends up in a separate invocation of the reduce method. In other words, the number of times the reduce method is called is equal to the number of distinct words in your dataset.
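To make the counting concrete, here is a small plain-Java simulation of wordcount that tallies the calls. It is only a sketch: it does not use the Hadoop API, the class and method names are hypothetical, a TreeMap stands in for the shuffle phase, and it assumes one input record per line, as with the default text input format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates wordcount in memory and counts map()/reduce() invocations.
// WordCountCallCounter is an illustrative name, not part of Hadoop.
class WordCountCallCounter {
    // Returns {mapCalls, reduceCalls} for the given input lines.
    static int[] countCalls(List<String> lines) {
        int mapCalls = 0;
        // Stand-in for the shuffle: group the emitted 1s by word.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            mapCalls++; // one map() call per input record (line)
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }
        int reduceCalls = 0;
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            reduceCalls++; // one reduce() call per distinct key (word)
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            System.out.println(entry.getKey() + "\t" + sum);
        }
        return new int[] {mapCalls, reduceCalls};
    }

    public static void main(String[] args) {
        int[] calls = countCalls(
                Arrays.asList("the quick fox", "the lazy dog", "the end"));
        System.out.println("map() calls: " + calls[0]);
        System.out.println("reduce() calls: " + calls[1]);
    }
}
```

With the three sample lines, the sketch makes 3 map() calls (one per line) and 6 reduce() calls (one per distinct word: the, quick, fox, lazy, dog, end), even though "the" appears three times.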
