Abstract
With the rapid development of remote sensing techniques, the volume of acquired data is growing exponentially, which makes processing massive remote sensing data a major challenge. In this paper, an in-memory computing framework based on Spark is proposed to address this problem. Spark is an open-source distributed computing platform, used here with Hadoop YARN as the resource scheduler and HDFS as the cloud storage system. On the Spark-based platform, data loaded into memory in the first iteration can be reused in subsequent iterations. This mechanism makes Spark much better suited to multi-iteration algorithms than MapReduce, which has to load the data in every iteration. Experiments are carried out on massive remote sensing data using a multi-iteration singular value decomposition (SVD) algorithm. The results show that Spark-based SVD runs significantly faster than the MapReduce-based version, usually by one order of magnitude.
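The load-once, reuse-across-iterations idea described in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation and does not use the Spark API. The names `CountingSource` and `top_singular_value` are hypothetical, the "storage" is an in-process list, and power iteration on A^T A stands in for whichever multi-iteration SVD algorithm the paper actually uses.

```python
import math


class CountingSource:
    """Hypothetical stand-in for a slow read from distributed storage."""

    def __init__(self, rows):
        self._rows = rows
        self.loads = 0  # how many times the data was (re)loaded

    def load(self):
        self.loads += 1
        return [list(row) for row in self._rows]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _gram_step(rows, v):
    """Compute A^T (A v) one row at a time, without forming A^T A."""
    out = [0.0] * len(v)
    for row in rows:
        dot = sum(a * b for a, b in zip(row, v))
        for j, a in enumerate(row):
            out[j] += a * dot
    return out


def top_singular_value(source, n_cols, iterations=50, cache=True):
    """Estimate the largest singular value of A by power iteration.

    cache=True  : load the data once and reuse it (in-memory style).
    cache=False : reload the data in every pass (per-iteration loading).
    """
    cached = source.load() if cache else None
    v = _normalize([1.0] * n_cols)
    for _ in range(iterations):
        rows = cached if cache else source.load()
        v = _normalize(_gram_step(rows, v))
    rows = cached if cache else source.load()
    # sigma_max is approximately ||A v|| for the converged unit vector v
    return math.sqrt(sum(sum(a * b for a, b in zip(row, v)) ** 2 for row in rows))


matrix = [[3.0, 0.0], [0.0, 1.0]]  # toy data; singular values are 3 and 1
in_memory = CountingSource(matrix)
reloading = CountingSource(matrix)
sigma_cached = top_singular_value(in_memory, n_cols=2, cache=True)
sigma_reloaded = top_singular_value(reloading, n_cols=2, cache=False)
print(sigma_cached, in_memory.loads, reloading.loads)
```

Both variants compute the same estimate. They differ only in how often the data is read, which is the cost the abstract attributes to MapReduce-style iteration. In Spark itself, the analogous step would typically be persisting the dataset before the loop, for example with `RDD.cache()`.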
Acknowledgement
This work was supported in part by the Natural Science Foundation of China under contract 71331005, in part by the Shanghai Science and Technology Development Funds (13dz2260200, 13511504300), and in part by the Open Foundation of the Second Institute of Oceanography (SOA).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sun, Z., Chen, F., Chi, M., Zhu, Y. (2015). A Spark-Based Big Data Platform for Massive Remote Sensing Data Processing. In: Zhang, C., et al. Data Science. ICDS 2015. Lecture Notes in Computer Science, vol 9208. Springer, Cham. https://doi.org/10.1007/978-3-319-24474-7_17
DOI: https://doi.org/10.1007/978-3-319-24474-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24473-0
Online ISBN: 978-3-319-24474-7
eBook Packages: Computer Science (R0)