Air Quality Prediction
Air, an essential natural resource, has been compromised in terms of quality by economic
activities. Considerable research has been devoted to predicting instances of poor air quality, but
most studies are limited by insufficient longitudinal data, making it difficult to account for
seasonal and other factors. Several prediction models have been developed using an 11-year
dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine
learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN),
random forest, stacking ensemble, and support vector machine (SVM), produce promising results
for air quality index (AQI) level predictions. A series of experiments using datasets for three
different regions, conducted to obtain the best prediction performance from the stacking ensemble,
AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior
performance in terms of R2 and RMSE, while AdaBoost provides the best results for MAE.
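A stacking ensemble trains several base learners and then fits a meta-learner on their cross-validated predictions. The sketch below shows one way to build such an ensemble with scikit-learn's `StackingRegressor`, using the four other algorithms named above (random forest, AdaBoost, SVM, ANN) as base learners. This is an assumption: this section does not specify which base learners or meta-learner the study's ensemble uses. The synthetic features, the ridge meta-learner, and all hyperparameters are illustrative choices, not the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for pollutant/weather features (4 columns) and an AQI-like
# target; this is NOT the EPA dataset described in the text.
X = rng.uniform(0.0, 1.0, size=(300, 4))
y = (50 + 40 * X[:, 0] + 20 * np.sin(6 * X[:, 1])
     + 10 * X[:, 2] * X[:, 3] + rng.normal(0.0, 2.0, 300))
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

# Base learners; SVR and the MLP get feature scaling inside their pipelines.
base_learners = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVR(C=10.0))),
    ("ann", make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                       random_state=0))),
]

# The meta-learner (ridge regression) is fit on 5-fold out-of-fold
# predictions of the base learners.
stack = StackingRegressor(estimators=base_learners, final_estimator=RidgeCV(), cv=5)
stack.fit(X_train, y_train)

preds = stack.predict(X_test)
test_r2 = stack.score(X_test, y_test)  # score() returns R2 for regressors
print(f"test R2 = {test_r2:.3f}")
```

Scaling sits inside the SVR and MLP pipelines because both are sensitive to feature scale, whereas the tree-based learners are not; wrapping it in a pipeline also keeps the scaler from seeing the held-out folds during stacking.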
Worldwide, air pollution is responsible for around 1.3 million deaths annually, according to the
World Health Organization (WHO). The depletion of air quality is just one of the harmful effects
of pollutants released into the air. Other detrimental consequences, such as acid rain, global
warming, aerosol formation, and photochemical smog, have also increased over the last several
decades.
The recent rapid spread of COVID-19 has prompted many researchers to investigate underlying
pollution-related conditions contributing to the COVID-19 pandemic in different countries. Several
lines of evidence have shown that air pollution is linked to significantly higher COVID-19 death
rates, and that patterns in COVID-19 death rates mimic patterns in both high population density
and high exposure areas. All of the above raises an urgent need to anticipate and plan for
pollution fluctuations to help communities and individuals better mitigate the negative impact of
air pollution. To this end, air quality evaluation plays a significant role in monitoring and
controlling air pollution. The Environmental Protection Agency (EPA) tracks the commonly known
criteria pollutants, i.e., ground-level ozone (O3), sulphur dioxide (SO2), particulate matter (PM10
and PM2.5), carbon monoxide (CO), and nitrogen dioxide (NO2). These substances are combined
into a common index, the Air Quality Index (AQI), which indicates how clean or polluted the air in
an area currently is or is forecast to become. As the AQI increases, a higher percentage of the
population is exposed to unhealthy air. Different countries have their own air quality indices,
corresponding to different air quality standards. In the United States, the US Environmental
Protection Agency monitors six pollutants at more than 4000 sites: O3, PM10, PM2.5, NO2, SO2,
and lead.
Rybarczyk and Zalakeviciute [4] reviewed a selection of the 46 most relevant journal papers and
found that more studies focused on O3, NO2, PM10, and PM2.5 than on an overall AQI.
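To make the idea of a common index concrete: in the US EPA scheme, each pollutant's concentration C is mapped to a sub-index by linear interpolation within the breakpoint segment that contains it, I = (I_hi − I_lo) / (C_hi − C_lo) × (C − C_lo) + I_lo, and the reported AQI is the largest sub-index. The minimal Python sketch below assumes the 24-hour PM2.5 breakpoints of the US EPA table in use before its 2024 revision, purely as an illustration; Taiwan's EPA and other agencies publish their own tables. The function names and the O3 sub-index in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Illustrative 24-h PM2.5 breakpoints (ug/m3) from the pre-2024 US EPA table.
# Each row is (C_lo, C_hi, I_lo, I_hi). Other agencies use different tables.
PM25_BREAKPOINTS: List[Tuple[float, float, int, int]] = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

# Upper AQI bound of each category and its US EPA label.
CATEGORIES: List[Tuple[int, str]] = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def sub_index(conc: float, breakpoints: List[Tuple[float, float, int, int]]) -> int:
    """Map one pollutant concentration onto the AQI scale by linear interpolation."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        # Use the first segment whose upper bound covers the value; this also
        # absorbs values falling in the small gaps between tabulated segments.
        if conc <= c_hi:
            value = (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
            return math.floor(value + 0.5)  # round half up to an integer
    raise ValueError("concentration is above the highest breakpoint")


def overall_aqi(sub_indices: Dict[str, int]) -> Tuple[str, int]:
    """Return the dominant pollutant and the overall AQI (the largest sub-index)."""
    pollutant = max(sub_indices, key=sub_indices.get)
    return pollutant, sub_indices[pollutant]


def category(aqi: int) -> str:
    """Translate an AQI value into its category label."""
    for upper, label in CATEGORIES:
        if aqi <= upper:
            return label
    return CATEGORIES[-1][1]  # values above 500 are still reported as hazardous


pm = sub_index(20.0, PM25_BREAKPOINTS)
pollutant, aqi = overall_aqi({"PM2.5": pm, "O3": 42})  # O3 value is hypothetical
print(pollutant, aqi, category(aqi))
```

With these breakpoints, a PM2.5 concentration of 20.0 ug/m3 yields a sub-index of 68, which falls in the "Moderate" category. Because the overall AQI is a maximum over pollutants, a single dominant pollutant determines the reported level.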
Recent research focuses more on advanced statistical learning algorithms for air quality
evaluation and air pollution prediction. Raimondo et al. [5], Garcia et al. [6], and Park et al.
have used neural networks to build models for predicting the prevalence of individual pollutants,
e.g., particulate matter 10 microns or less in diameter (PM10). Raimondo et al. used a support
vector machine (SVM) and an artificial neural network (ANN) to train models. Their best ANN
model attained a specificity of almost 79% with a false-positive rate of only 0.82%, while their
best SVM model achieved a specificity of 80% with a false-positive rate of only 0.13%. Yu et al.
[8] proposed a random forest approach, named RAQ, for AQI category prediction. Later, Yi et al.
[9] applied deep neural networks to AQI category prediction. Veljanovska and Dimoski tuned an
ANN under different settings and compared it with k-nearest neighbor (k-NN), decision tree, and
SVM models for predicting AQI levels. Their ANN model achieved an accuracy of 92.3%,
outperforming all other tested algorithms.
The work presented in this paper focuses on the development of AQI prediction models for acute
air pollution events 1, 8, and 24 h in advance. The following machine learning (ML) algorithms
are investigated for training the models: random forest, adaptive boosting (AdaBoost), support
vector machine, artificial neural network, and stacking ensemble methods. In addition, this
research observes how prediction performance decays over longer time frames. Performance is
measured with three commonly used metrics: two scale-dependent error indexes, mean absolute
error (MAE) and root mean squared error (RMSE), and the scale-independent coefficient of
determination, R-squared (R2).
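The paragraph above frames prediction at 1, 8, and 24 h ahead and evaluates it with MAE, RMSE, and R2. The minimal NumPy sketch below illustrates both steps under stated assumptions. Each horizon is treated as a separate (direct) supervised problem built from a sliding window of the previous 24 hourly values. The series is synthetic AR(1) data rather than the EPA dataset. A naive persistence forecast (repeat the last observed value) stands in for the trained models. The window length and the function names are hypothetical. The metrics follow their standard definitions: MAE = mean|y − ŷ|, RMSE = sqrt(mean (y − ŷ)²), and R2 = 1 − SS_res / SS_tot.

```python
import numpy as np


def make_supervised(series, n_lags, horizon):
    """Frame an hourly series for direct multi-step forecasting.

    X[i] holds n_lags consecutive hourly values; y[i] is the value observed
    `horizon` hours after the last value in X[i].
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - n_lags - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([series[i:i + n_lags] for i in range(n)])
    y = series[n_lags + horizon - 1:]
    return X, y


def mae(y, y_hat):
    """Mean absolute error, in the units of the target (scale-dependent)."""
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    """Root mean squared error, in the units of the target (scale-dependent)."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot (unitless)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic hourly AQI-like series: an AR(1) process around 50 (not real data).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 5000)
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + noise[t]
aqi = 50.0 + 5.0 * ar

results = {}
for horizon in (1, 8, 24):
    X, y = make_supervised(aqi, n_lags=24, horizon=horizon)
    y_hat = X[:, -1]  # persistence baseline: predict the last observed value
    results[horizon] = {"MAE": mae(y, y_hat), "RMSE": rmse(y, y_hat), "R2": r2(y, y_hat)}

for horizon, scores in results.items():
    print(horizon, {name: round(value, 3) for name, value in scores.items()})
```

Because this synthetic series is autocorrelated but not periodic, the persistence baseline's errors grow as the horizon lengthens, which mirrors the kind of decay the study examines. R2 can also become negative when a forecast does worse than simply predicting the mean of the target.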