
6 Best MapReduce Job Optimization Techniques

Performance tuning helps you get the best performance out of Hadoop. In this blog, we are going to discuss techniques for MapReduce job optimization.

In this MapReduce tutorial, we will give you 6 important tips for MapReduce job optimization, such as the proper configuration of your cluster, the use of LZO compression, and proper tuning of the number of MapReduce tasks.

Tips for MapReduce Job Optimization

Below are some MapReduce job optimization techniques that will help you improve MapReduce job performance.

1. Proper configuration of your cluster
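Cluster configuration covers everything from hardware health to Hadoop's resource settings, and no job-level trick can make up for a badly configured cluster. As one hedged illustration, the per-job memory properties below are common knobs to review; the values are purely illustrative assumptions that depend entirely on your hardware:

import org.apache.hadoop.conf.Configuration;

public class ClusterConfigSketch {
    // A minimal sketch of memory-related settings that commonly need review.
    // The values below are illustrative assumptions, not recommendations.
    public static Configuration tunedConf() {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.map.memory.mb", 2048);        // container size for map tasks
        conf.setInt("mapreduce.reduce.memory.mb", 4096);     // container size for reduce tasks
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");    // JVM heap inside the map container
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m"); // JVM heap inside the reduce container
        return conf;
    }
}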

2. LZO compression usage

For intermediate data, this is always a good idea. Every Hadoop job that generates a non-negligible amount of map output will benefit from intermediate data compression with LZO.

Although LZO adds a small amount of CPU overhead, it saves time by reducing the amount of disk I/O during the shuffle.

Set mapred.compress.map.output to true to enable LZO compression. (In newer Hadoop versions the same setting is named mapreduce.map.output.compress, and you must also point the map-output codec at LZO.)
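A minimal sketch with the newer property names, assuming the separate hadoop-lzo library (which provides the com.hadoop.compression.lzo.LzoCodec class) is installed on every node:

import org.apache.hadoop.conf.Configuration;

public class LzoMapOutputSketch {
    public static Configuration withLzoMapOutput() {
        Configuration conf = new Configuration();
        // Enable compression of intermediate (map-output) data.
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Point the map-output codec at LZO; this class ships with the
        // separate hadoop-lzo library, not with Hadoop itself.
        conf.set("mapreduce.map.output.compress.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
        return conf;
    }
}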

3. Proper tuning of the number of MapReduce tasks
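The number of map tasks is driven mostly by the input split size, while the number of reduce tasks is set explicitly per job. A hedged sketch using the standard Job API (the values are illustrative assumptions, not recommendations):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class TaskCountSketch {
    public static void tune(Job job) {
        // Larger splits mean fewer, longer-running map tasks.
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // 128 MB
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB
        // The reducer count is set directly; size it to your cluster's capacity.
        job.setNumReduceTasks(10);
    }
}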

4. Combiner between Mapper and Reducer

If your algorithm involves computing aggregates of any sort, you should use a combiner. The combiner performs some of the aggregation before the data reaches the reducer.

The Hadoop MapReduce framework runs the combiner intelligently to reduce the amount of data that has to be written to disk and transferred between the map and reduce stages of the computation.
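For a word-count job, for instance, the reducer can usually double as the combiner, since summing partial counts is associative. A minimal sketch (the mapper class is assumed and omitted here):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class CombinerSketch {
    // Sums partial counts; usable as both combiner and reducer because
    // addition is associative and commutative.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            result.set(sum);
            ctx.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        // job.setMapperClass(WordCountMapper.class); // your mapper emitting (word, 1) -- assumed
        job.setCombinerClass(IntSumReducer.class);    // pre-aggregates map output locally
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    }
}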

5. Usage of most appropriate and compact writable type for data

Big data users often use the Text writable type unnecessarily when switching from Hadoop Streaming to Java MapReduce. Text can be convenient, but converting numeric data to and from UTF-8 strings is inefficient and can actually make up a significant portion of CPU time.
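As a small illustration, emitting a count as an IntWritable instead of a Text string avoids the number-to-string round trip entirely (Hadoop also ships variable-length types such as VIntWritable and VLongWritable for an even more compact encoding):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WritableTypeSketch {
    public static void main(String[] args) {
        // Wasteful: the number is converted to a UTF-8 string before serialization.
        Text countAsText = new Text(Integer.toString(42));
        // Compact: a fixed 4-byte binary encoding, no string conversion at all.
        IntWritable countAsInt = new IntWritable(42);
        System.out.println(countAsText + " / " + countAsInt);
    }
}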

6. Reusage of Writables

One very common mistake MapReduce users make is to allocate a new Writable object for every output from a mapper or reducer. Consider, for example, the following word-count mapper implementation:

public void map(...) {
  ...
  for (String word : words) {
    // A brand-new Text and IntWritable are allocated on every iteration.
    output.collect(new Text(word), new IntWritable(1));
  }
}

This implementation causes the allocation of thousands of short-lived objects. While the Java garbage collector does a reasonable job of dealing with this, it is more efficient to write:

class MyMapper ... {
  // Allocated once and reused for every output record.
  Text wordText = new Text();
  IntWritable one = new IntWritable(1);

  public void map(...) {
    ...
    for (String word : words) {
      wordText.set(word);
      output.collect(wordText, one);
    }
  }
}
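Reusing the objects this way is safe because collect() serializes the key and value at the moment it is called, so the mapper is free to overwrite their contents on the next iteration.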

Conclusion

Hence, there are various MapReduce job optimization techniques that help you optimize a MapReduce job: using a combiner between the mapper and reducer, using LZO compression, properly tuning the number of MapReduce tasks, and reusing Writables.

If you find any other technique for MapReduce job optimization, do let us know in the comment section below.
