A Spark-Based Big Data Platform for Massive Remote Sensing Data Processing

Sun, Zhongyi; Chen, Fengke; Chi, Mingmin; Zhu, Yangyong

doi:10.1007/978-3-319-24474-7_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9208)

Included in the following conference series:

International Conference on Data Science

Abstract

With the rapid development of remote sensing techniques, the volume of acquired data grows exponentially, which makes processing massive remote sensing data a major challenge. In this paper, an in-memory computing framework is proposed to address this problem. The platform is built on Spark, an open-source distributed computing framework, with Hadoop YARN as the resource scheduler and HDFS as the distributed storage system. On the Spark-based platform, data loaded into memory in the first iteration can be reused in subsequent iterations. This mechanism makes Spark much better suited than MapReduce to multi-iteration algorithms, because MapReduce has to reload the data in every iteration. Experiments are carried out on massive remote sensing data using a multi-iteration singular value decomposition (SVD) algorithm. The results show that the Spark-based SVD runs significantly faster than its MapReduce counterpart, usually by about one order of magnitude.
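The data-reuse argument above can be illustrated with a small, self-contained sketch. It assumes a hypothetical `DataSource` class standing in for a matrix stored on HDFS, and it uses power iteration for the leading singular value as a simplified example of a multi-iteration SVD computation. The abstract does not specify which SVD algorithm the authors used, so this is not their implementation. The `cache` flag switches between loading the data once and reusing it (Spark-style in-memory reuse) and reloading it on every pass (the MapReduce pattern). The `reads` counter records how many times storage is accessed.

```python
import math


class DataSource:
    """Hypothetical stand-in for a matrix stored on HDFS.

    Every call to load() counts as one full read from storage.
    """

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]
        self.reads = 0

    def load(self):
        self.reads += 1
        return [list(r) for r in self._rows]


def mat_vec(rows, v):
    """Compute A v, one dot product per row."""
    return [sum(a * b for a, b in zip(row, v)) for row in rows]


def mat_t_vec(rows, u):
    """Compute A^T u by accumulating u_i * row_i over all rows."""
    out = [0.0] * len(rows[0])
    for row, ui in zip(rows, u):
        for j, a in enumerate(row):
            out[j] += a * ui
    return out


def normalize(v):
    """Return (v / ||v||, ||v||); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return v, 0.0
    return [x / norm for x in v], norm


def top_singular_value(source, n_cols, iterations=50, cache=True):
    """Estimate the largest singular value of A by power iteration on A^T A.

    cache=True loads A once and reuses it (in-memory reuse, as in Spark);
    cache=False reloads A on every pass (as in chained MapReduce jobs).
    """
    cached = source.load() if cache else None
    v, _ = normalize([1.0] * n_cols)
    sigma = 0.0
    for _ in range(iterations):
        rows = cached if cache else source.load()
        u = mat_vec(rows, v)                  # u = A v
        _, sigma = normalize(u)               # sigma = ||A v|| for unit v
        v, _ = normalize(mat_t_vec(rows, u))  # v <- A^T A v / ||A^T A v||
    return sigma


if __name__ == "__main__":
    a = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    for use_cache in (True, False):
        src = DataSource(a)
        s = top_singular_value(src, n_cols=2, iterations=50, cache=use_cache)
        print(f"cache={use_cache}: sigma={s:.6f}, storage reads={src.reads}")
```

Both runs converge to the same singular value (3.0 for this matrix). The cached run reads storage once, while the uncached run reads it once per iteration (50 times). The arithmetic is identical in both cases, so any difference in running time comes only from the avoided reloads. In actual Spark code, the analogous step is calling `cache()` or `persist()` on the RDD that holds the matrix rows before entering the iterative loop.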

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under contract 71331005, in part by the Shanghai Science and Technology Development Funds (13dz2260200, 13511504300), and in part by the Open Foundation of the Second Institute of Oceanography (SOA).

Author information

Authors and Affiliations

School of Computer Science, Shanghai Key Laboratory of Data Science, Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai, China
Zhongyi Sun, Fengke Chen, Mingmin Chi & Yangyong Zhu
State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography (SOA), Hangzhou, China
Mingmin Chi

Corresponding author

Correspondence to Mingmin Chi.

Editor information

Editors and Affiliations

University of Technology, Sydney, Australia
Chengqi Zhang
Xi'an Jiaotong University, Xi'an, China
Wei Huang
Chinese Academy of Sciences, Beijing, China
Yong Shi
University of Illinois Chicago, Chicago, Illinois, USA
Philip S. Yu
Fudan University, Shanghai, China
Yangyong Zhu
Research Center on Fictitious Econo, Chinese Academy of Sciences, Beijing, China
Yingjie Tian
University of Technology, Sydney, Australia
Peng Zhang
Victoria University, Melbourne, Victoria, Australia
Jing He

About this paper

Cite this paper

Sun, Z., Chen, F., Chi, M., Zhu, Y. (2015). A Spark-Based Big Data Platform for Massive Remote Sensing Data Processing. In: Zhang, C., et al. Data Science. ICDS 2015. Lecture Notes in Computer Science, vol 9208. Springer, Cham. https://doi.org/10.1007/978-3-319-24474-7_17

DOI: https://doi.org/10.1007/978-3-319-24474-7_17
Published: 14 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24473-0
Online ISBN: 978-3-319-24474-7
eBook Packages: Computer Science, Computer Science (R0)