Abstract
The many challenges and high cost of developing monitoring infrastructure are major reasons why data are sparse for the statutory government agencies tasked with studying pollution exposure in urban areas. To mitigate this problem, satellite aerosol optical depth (AOD) data combined with learning algorithms have recently become popular. This paper presents a novel four-stage approach that uses different machine learning, deep learning and statistical methods to develop a spatio-temporal hybrid model: data from existing monitoring stations are used for temporal forecasting, and satellite AOD data are used for spatial interpolation. Experiments on real-world data from the cities of Kolkata, Bengaluru and Mumbai show that no single method performs best across all cities and stages, except in spatial interpolation, where Random Forest Regression outperforms all other models used. For temporal forecasting within the hybrid method, a long short-term memory auto-encoder (LSTM Auto-Encoder) performs best in Mumbai, whereas a random forest regression-based method and a multi-layer perceptron-based method perform best in Kolkata and Bengaluru, respectively.
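The abstract reports that Random Forest Regression performed best for spatial interpolation but does not give the features or hyperparameters. The sketch below only illustrates the general idea under stated assumptions: fit a random forest on station PM2.5 values using latitude, longitude and satellite AOD as features, then predict PM2.5 at an unmonitored location. The feature set, the hyperparameters, the synthetic station data, the coordinates and the names `ToyRandomForest`, `_grow` and `_predict_tree` are all hypothetical; this is not the authors' implementation.

```python
import random
from statistics import mean

# Hypothetical toy random forest regressor (pure Python), used only to
# illustrate random-forest spatial interpolation; not the paper's code.


def _grow(rows, ys, depth, max_depth, min_leaf, n_feats, rng):
    """Grow one regression tree with greedy squared-error splits."""
    if depth >= max_depth or len(rows) < 2 * min_leaf or len(set(ys)) == 1:
        return mean(ys)  # leaf node: mean PM2.5 of the samples reaching it
    best = None  # (sse, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), n_feats):  # random feature subset per node
        for thr in sorted({r[f] for r in rows})[:-1]:  # skip max so both sides are non-empty
            left = [y for r, y in zip(rows, ys) if r[f] <= thr]
            right = [y for r, y in zip(rows, ys) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        return mean(ys)  # no valid split found: make this node a leaf
    _, f, thr = best
    go_left = [r[f] <= thr for r in rows]
    left = _grow([r for r, g in zip(rows, go_left) if g],
                 [y for y, g in zip(ys, go_left) if g],
                 depth + 1, max_depth, min_leaf, n_feats, rng)
    right = _grow([r for r, g in zip(rows, go_left) if not g],
                  [y for y, g in zip(ys, go_left) if not g],
                  depth + 1, max_depth, min_leaf, n_feats, rng)
    return (f, thr, left, right)  # internal node


def _predict_tree(node, x):
    while isinstance(node, tuple):  # descend until a leaf (a float) is reached
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


class ToyRandomForest:
    def __init__(self, n_trees=50, max_depth=4, min_leaf=2, n_feats=2, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.min_leaf, self.n_feats, self.seed = min_leaf, n_feats, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, k = len(X), min(self.n_feats, len(X[0]))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of stations
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    0, self.max_depth, self.min_leaf, k, rng))
        return self

    def predict(self, X):
        # Forest prediction = average of the individual tree predictions.
        return [mean(_predict_tree(t, x) for t in self.trees) for x in X]


# Synthetic "stations" at arbitrary coordinates: (lat, lon, AOD) -> PM2.5,
# with PM2.5 deliberately made to rise with AOD (assumption for illustration).
rng = random.Random(42)
stations = [(rng.uniform(22.4, 22.7), rng.uniform(88.2, 88.5), rng.uniform(0.0, 1.0))
            for _ in range(40)]
pm25 = [30 + 100 * aod + rng.gauss(0, 5) for _, _, aod in stations]

model = ToyRandomForest().fit(stations, pm25)
# Interpolate at one unmonitored location under low and high satellite AOD.
low, high = model.predict([(22.55, 88.35, 0.05), (22.55, 88.35, 0.95)])
print(round(low, 1), round(high, 1))
```

The forest is written in plain Python only to keep the sketch self-contained; in practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used. Because every leaf stores a mean of training targets, interpolated values always stay within the range of the station observations.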
Data and Code Availability
Upon publication, the analysis code and data used in this paper are available at https://github.com/nathzi1505/AOD-Hybrid-Paper.
Acknowledgements
The research work of Asif Iqbal Middya is partially supported by a UGC-NET Junior Research Fellowship (UGC-Ref. No.:3684 / (NET-JULY 2018)) provided by the University Grants Commission, Government of India. This research work is also supported by the project entitled “Participatory and Realtime Pollution Monitoring System For Smart City”, funded by Higher Education, Science & Technology and Biotechnology, Department of Science & Technology, Government of West Bengal, India.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nath, P., Roy, B., Saha, P. et al. Hybrid learning model for spatio-temporal forecasting of PM2.5 using aerosol optical depth. Neural Comput & Applic 34, 21367–21386 (2022). https://doi.org/10.1007/s00521-022-07616-4