Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Subject Areas: Computer Networks and Distributed Systems
Avishan Sharafi 1 *, Ali Rezaee 2
1 - Department of Computer Engineering, Islamic Azad University, South Tehran Branch
2 - Department of Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
Keywords: Hadoop, Data placement, MapReduce, Heterogeneous, Resource-aware
Abstract:
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The rack-aware data placement strategy of the Hadoop Distributed File System (HDFS) targets homogeneous clusters: it assumes that every node has the same computing capacity and assigns the same workload to each node. Default Hadoop also ignores the load state of each node when distributing input data blocks. In heterogeneous environments, such a placement policy can cause unnecessary overhead, noticeably reduce MapReduce performance, and may increase energy dissipation. This paper proposes a resource-aware adaptive dynamic data placement (ADDP) algorithm. ADDP resolves the unbalanced node workload problem by dynamically adapting and balancing the data stored on each node according to node load status in a heterogeneous Hadoop cluster. Experimental results show that ADDP reduces data transfer overhead in comparison with the DDP algorithm and traditional Hadoop. Moreover, the proposed method decreases execution time and improves system throughput by increasing resource utilization.
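The abstract does not give the details of ADDP, so the following is only an illustrative sketch of the general idea of load-aware placement, not the authors' algorithm. It assumes each node is described by a hypothetical relative speed and a current load fraction in [0, 1], weights each node by its spare capacity, and splits a number of data blocks in proportion to those weights. The function name, the weighting formula, and the rounding rule (largest remainder) are all assumptions made for illustration.

```python
def allocate_blocks(total_blocks, nodes):
    """Split total_blocks across nodes in proportion to spare capacity.

    nodes maps a node name to (relative_speed, load_fraction).
    Both fields are hypothetical inputs chosen for this sketch;
    the paper's actual load metric is not stated in the abstract.
    """
    # Spare capacity: faster and less loaded nodes get a larger weight.
    weights = {
        name: speed * (1.0 - min(max(load, 0.0), 1.0))
        for name, (speed, load) in nodes.items()
    }
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("no node has spare capacity")

    # Ideal (fractional) share of blocks for each node.
    ideal = {n: total_blocks * w / total_weight for n, w in weights.items()}

    # Round down, then hand out leftover blocks by largest remainder
    # so the allocation always sums to total_blocks.
    alloc = {n: int(share) for n, share in ideal.items()}
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - alloc[n], reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


if __name__ == "__main__":
    cluster = {"fast": (2.0, 0.0), "slow": (1.0, 0.0)}
    print(allocate_blocks(9, cluster))
```

In this sketch an idle node that is twice as fast receives twice as many blocks, and a node that is half loaded counts as half its nominal speed. A dynamic scheme would re-run such an allocation as load reports change; how ADDP does this is not specified here.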