Undergraduate Student (Computer Engineering)
asuar054 (at) fiu.edu
Currently working on implementing I/O differentiation in Hadoop, with the goal of either maintaining a desired service level for a job in the presence of I/O contention or sharing I/O bandwidth proportionally when it is saturated. Maintaining service level agreements requires a performance model that can guide the allocation of I/O bandwidth to MapReduce jobs. Below are running times of the map phase of the WordCount benchmark as a function of the number of map waves for 1-node, 2-node, and 4-node clusters, respectively. Note that the data in each case fits a linear model.
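As a rough illustration of how such a linear model could be fitted and used, here is a minimal Python sketch. It assumes the number of map waves is the number of map tasks divided by the number of map slots, rounded up, and fits running time against waves by ordinary least squares. The function names (map_waves, fit_line, predict_map_time) are hypothetical, and the sample values in the usage note are made-up placeholders, not the measurements plotted on this page.

```python
import math


def map_waves(num_map_tasks, map_slots):
    """Number of waves needed to run all map tasks on the available slots."""
    return math.ceil(num_map_tasks / map_slots)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_map_time(num_map_tasks, map_slots, slope, intercept):
    """Predicted map-phase running time from the fitted line."""
    return slope * map_waves(num_map_tasks, map_slots) + intercept
```

For example, with placeholder points such as waves [1, 2, 3, 4] and times [3, 5, 7, 9], fit_line returns a slope and intercept that predict_map_time can then apply to a job of a given size. A separate line would be fitted for each cluster size, since the slope is expected to differ between the 1-node, 2-node, and 4-node cases.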
To demonstrate the effect of I/O contention on concurrent MapReduce jobs, two sets of benchmarks were run: one with a single job executed in isolation with 30 map and 5 reduce slots allocated to it, and one with two jobs run concurrently on the same cluster, using Hadoop's FairScheduler to allocate 30 map and 5 reduce slots to each job. Since each job receives the same slot allocation in both sets, the slowdown of the concurrent benchmarks is due mostly to I/O contention and to cache contention/thrashing. Below are bar graphs showing the running times for the various sets of benchmarks.
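One simple way to quantify the comparison is a slowdown ratio: the running time of a job when run concurrently divided by its running time in isolation. The sketch below is only an illustration of that arithmetic; the function name slowdown and its arguments are hypothetical, and no measured values from the benchmarks are included.

```python
def slowdown(isolated_seconds, concurrent_seconds):
    """Ratio of concurrent to isolated running time for the same job.

    A value of 1.0 means no slowdown; 2.0 means the job took twice as long
    when it shared the cluster with another job.
    """
    if isolated_seconds <= 0:
        raise ValueError("isolated running time must be positive")
    return concurrent_seconds / isolated_seconds
```

Because the slot allocation is held constant across both sets, a ratio above 1.0 would be attributed mostly to contention for shared resources rather than to a smaller share of slots.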