Dsa Final Project Report

The project report outlines a comprehensive analysis of air quality data sourced from monitoring stations in India, focusing on pollutant levels in Kerala and Delhi. It details the processes of data transformation, normalization, and statistical analysis, including t-tests and ANOVA, to evaluate differences in pollution levels. The findings indicate significant differences in average pollutant levels between the two states, supported by various statistical tests and visualizations.

FOUNDATIONS OF DATA SCIENCE

DURING THE YEAR 2024


A COURSE PROJECT REPORT

Submitted by
BHUKYA INDU (B230880EC)
M.VIDYADHARI (B231051EC)

DEPARTMENT OF
ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
NOVEMBER 2024
TABLE OF CONTENTS
1. Description of Data
   1.1 Data Source
   1.2 Data Attributes
2. Data Transformation
3. Data Wrangling and Descriptive Statistics
   3.1 Data Loading and Initial Inspection
   3.2 Filtering Data
   3.3 Handling Missing Values
   3.4 Feature Engineering
   3.5 Descriptive Statistics
   3.6 Visualization
4. Data Normalisation
   4.1 First Normal Form (1NF)
   4.2 Second Normal Form (2NF)
   4.3 Third Normal Form (3NF)
5. Statistical Analysis
   5.1 T-Test for Pollutant Levels between Kerala and Delhi
   5.2 One-Way ANOVA for State Effect on Pollutant Levels
   5.3 Two-Way ANOVA for State and Pollutant Type
   5.4 Proportion Z-Test for High Pollution Levels
   5.5 Chi-Square Test for Independence between Pollutant Type and State
6. Data Visualisation
   6.1 Boxplot for Pollutant Levels Across Kerala and Delhi
   6.2 Histogram of Pollutant Levels for Kerala and Delhi
   6.3 Distribution Plot of Pollutant Levels in Kerala and Delhi
   6.4 Heatmap of the Correlation Matrix for Numeric Variables
   6.5 Bar Plot of High Pollution Proportion by State
   6.6 Andhra Pradesh State Data Analysis and Visualization

1) Description of Data
1.1 Data Source
The air quality data used in this report is sourced from the Real-Time Air
Quality Index catalog available at data.gov.in. This dataset provides real-
time pollutant concentration levels from monitoring stations across various
cities in India. It consists of 3,229 records with attributes such as station
location, pollutant type, and time-based updates.
1.2 Data Attributes
Sr No.  Attribute      Data type        Example
1       Country        Nominal Data     India
2       State          Nominal Data     Andhra Pradesh
3       City           Nominal Data     Amaravati
4       Station        Nominal Data     Secretariat, Amaravati - APPCB
5       last_update    Ordinal Data     16-09-2024 02:00:00
6       Latitude       Continuous Data  16.515083
7       longitude      Continuous Data  80.518167
8       pollutant_id   Nominal Data     PM10, SO2, NH3
9       pollutant_min  Continuous Data  40.0
10      pollutant_max  Continuous Data  75.0
11      pollutant_avg  Continuous Data  55.0
2) Data Transformation
Sr No.  Command             Purpose
1       .dropna()           Deletes rows with missing pollutant readings.
2       .fillna(value)      Fills missing values with a default or interpolated value.
3       .drop_duplicates()  Removes duplicate rows to maintain data consistency.
4       .rename(columns)    Renames columns for consistency (e.g., pollutant_id to Pollutant).
5       .astype()           Converts column data types (e.g., last_update to datetime).
6       .groupby()          Groups data by city or state to compute average pollution levels.
7       .reset_index()      Resets the index after grouping for a clean DataFrame.
8       .str.strip()        Removes leading and trailing whitespace (or specified characters) from string values.
9       .apply()            Applies custom functions to columns where required.

After transformation, the dataset was uniform and ready for analysis.
Specifically, the operations ensured that all timestamps were in a
consistent format, redundant records were removed, and missing pollutant
values were either filled or excluded. This cleaned and structured data
allowed for effective grouping and visualization, enabling meaningful
insights into air quality trends across various locations.
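The commands in the table above can be chained into a single cleaning step. The following is a minimal sketch on a small made-up sample, not the report's actual script: the rows are invented, and `pd.to_datetime` with an explicit day-first format is used here in place of `.astype()` as an assumption about how the `16-09-2024 02:00:00` timestamps are best parsed.

```python
import pandas as pd

# Tiny made-up sample with the same column names as the dataset (values are illustrative).
raw = pd.DataFrame({
    "state": [" Kerala", "Delhi ", "Delhi ", "Kerala"],
    "city": ["Kochi", "Delhi", "Delhi", "Kochi"],
    "last_update": ["16-09-2024 02:00:00"] * 4,
    "pollutant_id": ["PM10", "SO2", "SO2", "NH3"],
    "pollutant_avg": [55.0, 11.0, 11.0, None],
})

clean = (
    raw
    .assign(state=lambda d: d["state"].str.strip())   # trim leading/trailing whitespace
    .drop_duplicates()                                 # remove the repeated Delhi row
    .dropna(subset=["pollutant_avg"])                  # drop rows without a reading
    .rename(columns={"pollutant_id": "pollutant"})     # consistent column name
    .assign(last_update=lambda d: pd.to_datetime(
        d["last_update"], format="%d-%m-%Y %H:%M:%S"   # day-first timestamps
    ))
)

# Average pollution level per state, with a clean integer index.
state_avg = clean.groupby("state")["pollutant_avg"].mean().reset_index()
print(state_avg)
```

Stripping whitespace before `drop_duplicates()` matters: rows that differ only by stray spaces would otherwise survive as distinct records.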

3) Data Wrangling and Descriptive Statistics


3.1 Data Loading and Initial Inspection: The dataset was loaded
from air_quality.csv using pandas. We conducted an initial
inspection to check column names and data types, and to ensure there
were no major issues with data completeness.
3.2 Filtering Data: We focused on air quality data for selected states,
specifically Kerala and Delhi, to narrow down the analysis to these
locations. This filtered dataset allows a more detailed comparison
between the two states.
3.3 Handling Missing Values: For numeric columns, missing values
were replaced using mean imputation to maintain dataset integrity
without introducing significant biases. This step helps to ensure
complete data for statistical analysis.
3.4 Feature Engineering: We added a new feature, high_pollution,
which flags observations where the pollutant_avg exceeds a
threshold (e.g., 50). This binary indicator supports further analysis
on pollution severity.
3.5 Descriptive Statistics: Summary statistics, including mean, standard
deviation, and range, were computed for pollutant_avg in each
state. These statistics provide insight into the general level and
variability of pollutants, which is essential for understanding
pollution patterns.
3.6 Visualization: Boxplots, line plots, and distribution plots were
created to visually explore pollutant levels across the two states.
These plots help identify trends, outliers, and differences in
pollutant distributions between Kerala and Delhi.
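Steps 3.2 to 3.5 can be sketched as follows. This is an illustrative example on made-up values rather than the report's code; in particular, imputing after filtering (rather than before) is an assumption, since the order is not stated above.

```python
import pandas as pd

# Made-up readings; in the report these would come from air_quality.csv.
df = pd.DataFrame({
    "state": ["Kerala", "Kerala", "Delhi", "Delhi", "Goa"],
    "pollutant_avg": [20.0, None, 80.0, 40.0, 30.0],
})

# 3.2 Filtering: keep only the two states under comparison.
subset = df[df["state"].isin(["Kerala", "Delhi"])].copy()

# 3.3 Mean imputation (assumed here to use the mean of the filtered data).
subset["pollutant_avg"] = subset["pollutant_avg"].fillna(subset["pollutant_avg"].mean())

# 3.4 Feature engineering: binary flag for readings above the threshold of 50.
subset["high_pollution"] = (subset["pollutant_avg"] > 50).astype(int)

# 3.5 Descriptive statistics per state.
summary = subset.groupby("state")["pollutant_avg"].agg(["mean", "std", "min", "max"])
print(summary)
```

Note that mean imputation leaves the overall mean unchanged but shrinks the variance, which is worth keeping in mind when interpreting the tests in Section 5.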

Dataset Information:
Column         Non-Null Count  Dtype
country        3229            object
state          3229            object
city           3229            object
station        3229            object
last_update    3229            object
latitude       3229            float64
longitude      3229            float64
pollutant_id   3229            object
pollutant_min  2998            float64
pollutant_max  2998            float64
pollutant_avg  2998            float64

Descriptive Statistics:
Statistic  Latitude  Longitude  Pollutant Min  Pollutant Max  Pollutant Avg
Count      3229.000  3229.000   3229.000       3229.000       3229.000
Mean       22.7039   78.5719    16.494         45.770         28.109
Std Dev    5.4758    4.7900     16.305         51.671         26.885
Min        8.5149    70.9092    1.000          1.000          1.000
25%        19.0605   75.6381    5.000          14.000         9.000
50%        23.2648   77.3025    13.000         35.000         22.000
75%        27.1941   80.3230    22.000         57.000         37.000
Max        34.0662   94.6366    114.000        500.000        392.000

4) Data Normalisation
4.1 First Normal Form (1NF)
To ensure the dataset adheres to First Normal Form (1NF), the following
transformations were applied:
1. Atomic Values: Each field contains only a single value. For example,
pollutant_id holds one pollutant type per entry (e.g., PM10 or SO2).
2. No Repeating Groups: There are no multiple values within a single
cell.
3. Unique Records: Duplicate entries were removed using
.drop_duplicates() to ensure every row is unique.
4. Composite Primary Key: Since no single attribute uniquely identifies a
record, a composite primary key was created using the combination
of station, pollutant_id, and last_update. This ensures the uniqueness
of each entry.
Station                                         Pollutant  Last Update          Latitude   Longitude  Pollutant Min  Pollutant Max  Pollutant Avg
Secretariat, Amaravati - APPCB                  PM10       2024-09-16 02:00:00  16.515083  80.518167  40.0           75.0           55.0
Gulzarpet, Anantapur - APPCB                    SO2        2024-09-16 02:00:00  14.675886  77.593027  NaN            NaN            NaN
Gangineni Cheruvu, Chittoor - APPCB             PM10       2024-09-16 02:00:00  13.204880  79.097889  NaN            NaN            NaN
Anand Kala Kshetram, Rajamahendravaram - APPCB  SO2        2024-09-16 02:00:00  16.987287  81.736318  4.0            18.0           11.0
Tirumala, Tirupati - APPCB                      NH3        2024-09-16 02:00:00  13.670000  79.350000  1.0            2.0            1.0

This table complies with 1NF because:


• Each field holds a single, atomic value.
• There are no repeating groups or multivalued fields.
• The combination of station, pollutant_id, and last_update serves as a
composite primary key, ensuring the uniqueness of each row.
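
Whether a set of columns really works as a composite key can be checked directly. The sketch below uses a few illustrative rows (the SO2 reading at the Secretariat station is invented for the example) and tests uniqueness with `DataFrame.duplicated`.

```python
import pandas as pd

# Illustrative rows; the second reading is made up to show why station alone is not a key.
records = pd.DataFrame({
    "station": [
        "Secretariat, Amaravati - APPCB",
        "Secretariat, Amaravati - APPCB",
        "Tirumala, Tirupati - APPCB",
    ],
    "pollutant_id": ["PM10", "SO2", "NH3"],
    "last_update": ["2024-09-16 02:00:00"] * 3,
    "pollutant_avg": [55.0, 9.0, 1.0],
})

key = ["station", "pollutant_id", "last_update"]

# A candidate key is valid if no two rows share the same key values.
composite_is_unique = not records.duplicated(subset=key).any()
station_is_unique = not records.duplicated(subset=["station"]).any()

print(composite_is_unique, station_is_unique)
```

Here the composite key is unique while `station` on its own is not, because one station reports several pollutants at the same timestamp.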

4.2 Second Normal Form (2NF)


To achieve Second Normal Form (2NF), partial dependencies were
removed. In 1NF, some non-key attributes depended only on a part of the
composite primary key. For example:
• Latitude and Longitude depend only on the station.
• Pollutant Min, Max, and Avg depend only on the combination of
pollutant_id and last_update.
Table A: Main Data
Station                                         Pollutant  Last Update
Secretariat, Amaravati - APPCB                  PM10       2024-09-16 02:00:00
Gulzarpet, Anantapur - APPCB                    SO2        2024-09-16 02:00:00
Gangineni Cheruvu, Chittoor - APPCB             PM10       2024-09-16 02:00:00
Anand Kala Kshetram, Rajamahendravaram - APPCB  SO2        2024-09-16 02:00:00
Tirumala, Tirupati - APPCB                      NH3        2024-09-16 02:00:00

Table B: Pollutant Data
Pollutant  Last Update          Pollutant Min  Pollutant Max  Pollutant Avg
PM10       2024-09-16 02:00:00  40.0           75.0           55.0
SO2        2024-09-16 02:00:00  NaN            NaN            NaN
NH3        2024-09-16 02:00:00  1.0            2.0            1.0

Table C: Station Data
Station                                         City               Latitude   Longitude
Secretariat, Amaravati - APPCB                  Amaravati          16.515083  80.518167
Gulzarpet, Anantapur - APPCB                    Anantapur          14.675886  77.593027
Gangineni Cheruvu, Chittoor - APPCB             Chittoor           13.204880  79.097889
Anand Kala Kshetram, Rajamahendravaram - APPCB  Rajamahendravaram  16.987287  81.736318
Tirumala, Tirupati - APPCB                      Tirupati           13.670000  79.350000
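
Moving the station-only attributes into their own table can be sketched in pandas as below. This is a minimal illustration on a few rows, not the report's code; the merge at the end checks that the split is lossless, i.e. that joining the two tables on `station` reproduces the original rows.

```python
import pandas as pd

# Flat 1NF-style rows (a subset of columns, for illustration).
flat = pd.DataFrame({
    "station": [
        "Secretariat, Amaravati - APPCB",
        "Secretariat, Amaravati - APPCB",
        "Tirumala, Tirupati - APPCB",
    ],
    "pollutant_id": ["PM10", "SO2", "NH3"],
    "last_update": ["2024-09-16 02:00:00"] * 3,
    "city": ["Amaravati", "Amaravati", "Tirupati"],
    "latitude": [16.515083, 16.515083, 13.670000],
    "longitude": [80.518167, 80.518167, 79.350000],
})

# Attributes that depend only on the station go into a separate table.
station_tbl = (
    flat[["station", "city", "latitude", "longitude"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
main_tbl = flat[["station", "pollutant_id", "last_update"]]

# Joining back on the station key should reproduce the flat table exactly.
rebuilt = main_tbl.merge(station_tbl, on="station", how="left")
print(rebuilt[flat.columns].equals(flat))
```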

4.3 Third Normal Form (3NF)


To achieve Third Normal Form (3NF), we need to eliminate any transitive
dependencies. In the 2NF structure, some attributes were still indirectly
dependent on non-key attributes. For example:
• The city determines the state (i.e., city -> state).
• The state determines the country (i.e., state -> country).
Table A: Main Data
Station                                         Pollutant  Last Update
Secretariat, Amaravati - APPCB                  PM10       2024-09-16 02:00:00
Gulzarpet, Anantapur - APPCB                    SO2        2024-09-16 02:00:00
Gangineni Cheruvu, Chittoor - APPCB             PM10       2024-09-16 02:00:00
Anand Kala Kshetram, Rajamahendravaram - APPCB  SO2        2024-09-16 02:00:00
Tirumala, Tirupati - APPCB                      NH3        2024-09-16 02:00:00

Table B: Pollutant Data
Pollutant  Last Update          Pollutant Min  Pollutant Max  Pollutant Avg
PM10       2024-09-16 02:00:00  40.0           75.0           55.0
SO2        2024-09-16 02:00:00  NaN            NaN            NaN
NH3        2024-09-16 02:00:00  1.0            2.0            1.0

Table C: Station Data
Station                                         City
Secretariat, Amaravati - APPCB                  Amaravati
Gulzarpet, Anantapur - APPCB                    Anantapur
Gangineni Cheruvu, Chittoor - APPCB             Chittoor
Anand Kala Kshetram, Rajamahendravaram - APPCB  Rajamahendravaram
Tirumala, Tirupati - APPCB                      Tirupati

Table D: City Data
City               State
Amaravati          Andhra Pradesh
Anantapur          Andhra Pradesh
Chittoor           Andhra Pradesh
Rajamahendravaram  Andhra Pradesh
Tirupati           Andhra Pradesh

Table E: State Data
State           Country
Andhra Pradesh  India
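
The transitive dependencies city -> state and state -> country can be verified on the data before splitting. The helper `determines` below is a hypothetical name introduced for this sketch: it checks that every value of the left-hand column maps to exactly one value of the right-hand column.

```python
import pandas as pd

def determines(df: pd.DataFrame, lhs: str, rhs: str) -> bool:
    """Return True if each lhs value maps to a single rhs value (lhs -> rhs)."""
    return bool((df.groupby(lhs)[rhs].nunique() <= 1).all())

# Location columns for three of the stations shown above.
locations = pd.DataFrame({
    "city": ["Amaravati", "Anantapur", "Tirupati"],
    "state": ["Andhra Pradesh"] * 3,
    "country": ["India"] * 3,
})

print(determines(locations, "city", "state"))     # city -> state holds
print(determines(locations, "state", "country"))  # state -> country holds
print(determines(locations, "state", "city"))     # state -> city does not hold

# Each dependency becomes its own table with one row per determinant value.
city_tbl = locations[["city", "state"]].drop_duplicates()
state_tbl = locations[["state", "country"]].drop_duplicates()
```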
5) Statistical Analysis
5.1 T-Test for Pollutant Levels between Kerala and Delhi
• Objective: This test aims to determine if there is a significant
difference in average pollutant levels between Kerala and Delhi.
• Methodology: A two-sample t-test was performed on pollutant_avg
values from each state. The null hypothesis assumed no difference in
average pollution levels.
• Results: The t-test yielded a p-value of approximately 0.0042, which is
below the conventional 0.05 threshold. The null hypothesis is therefore
rejected: average pollutant levels differ significantly between the two states.
Output :
T-Test between Kerala and Delhi pollutant_avg p-value: 0.004206118877480918
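
A two-sample t-test of this kind can be run with `scipy.stats.ttest_ind`. The sketch below uses made-up readings, not the report's data. It sets `equal_var=True` (the pooled-variance test) as an assumption; this choice is consistent with the one-way ANOVA in Section 5.2 reporting the same p-value, since the two tests coincide for two groups.

```python
from scipy import stats

# Made-up pollutant_avg readings for illustration only.
kerala = [18.0, 22.0, 25.0, 30.0, 21.0, 27.0]
delhi = [45.0, 60.0, 38.0, 85.0, 52.0, 70.0]

# Pooled-variance two-sample t-test; H0: the two state means are equal.
t_stat, p_value = stats.ttest_ind(kerala, delhi, equal_var=True)

reject_null = p_value < 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {reject_null}")
```

If the two states have clearly different variances, as the plots in Section 6 suggest, Welch's test (`equal_var=False`) would be a reasonable alternative.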

5.2 One-Way ANOVA for State Effect on Pollutant Levels


• Objective: To assess whether the state (Kerala or Delhi) has a
significant effect on pollutant levels.
• Methodology: A one-way ANOVA was conducted using the formula
pollutant_avg ~ C(state).
• Results: The ANOVA yielded an F-statistic of 8.3338 and a p-value of
0.0042, indicating that state has a statistically significant effect on
pollutant levels at the 0.05 level.
One-Way ANOVA Results
Source    DF     Sum of Squares  Mean Square  F-value  P-value
C(state)  1.0    14,764.4354     14,764.4354  8.3338   0.0042
Residual  270.0  478,341.3249    1,771.6345   NaN      NaN
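
The entries of the one-way ANOVA table are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and the F-value is the ratio of the two mean squares. The short check below recomputes F and the p-value from the sums of squares reported above, using the F distribution from SciPy.

```python
from scipy import stats

# Values taken from the one-way ANOVA table.
ss_state, df_state = 14764.4354, 1
ss_resid, df_resid = 478341.3249, 270

ms_state = ss_state / df_state   # mean square for the state effect
ms_resid = ss_resid / df_resid   # residual mean square
f_value = ms_state / ms_resid    # F = MS_state / MS_residual

# Upper-tail probability of the F distribution with (1, 270) degrees of freedom.
p_value = stats.f.sf(f_value, df_state, df_resid)

print(f"MS_resid = {ms_resid:.4f}, F = {f_value:.4f}, p = {p_value:.4f}")
```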

5.3 Two-Way ANOVA for State and Pollutant Type


• Objective: To examine the combined effect of state and pollutant_id
on pollutant_avg.
• Methodology: A two-way ANOVA was conducted to test for both
main effects and interaction effects.
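• Results: Both main effects are significant (state: p = 1.956e-06;
pollutant type: p = 1.752e-56), and the interaction between state and
pollutant type is also significant (p = 6.787e-04), so the difference
between the states depends on the pollutant considered.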
Two-Way ANOVA Results
Source                     DF     Sum of Squares  Mean Square  F-value  P-value
C(state)                   1.0    14,764.4354     14,764.4354  23.7079  1.956e-06
C(pollutant_id)            6.0    302,549.2657    50,424.8776  80.9695  1.752e-56
C(state):C(pollutant_id)   6.0    15,119.0543     2,519.8424   4.0462   6.787e-04
Residual                   258.0  160,673.0050    622.7636     NaN      NaN

5.4 Proportion Z-Test for High Pollution Levels


• Objective: This test was conducted to assess the proportion of
high-pollution observations (pollutant_avg > 50) in the dataset.
• Methodology: A z-test for proportions was applied, comparing the
observed high pollution proportion against an expected value (e.g.,
0.5).
• Results: The z-test returned a p-value of approximately 2.06e-05, so the
observed high pollution rate is significantly different from the
assumed rate.
Output :
Proportion Z-Test for high pollution level (> 50) p-value: 2.059641982631519e-05
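
The one-sample proportion z-test can be written out by hand. The sketch below uses hypothetical counts (40 high-pollution readings out of 100) and the null-hypothesis proportion in the standard error; library routines may instead use the sample proportion there, which gives a slightly different statistic.

```python
import math

# Hypothetical counts for illustration; not the report's figures.
n_high, n_total = 40, 100
p0 = 0.5                      # proportion assumed under the null hypothesis

p_hat = n_high / n_total
std_err = math.sqrt(p0 * (1 - p0) / n_total)   # standard error under H0
z = (p_hat - p0) / std_err

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, p = {p_value:.4f}")
```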
5.5 Chi-Square Test for Independence between Pollutant Type and
State
• Objective: This test assesses whether pollutant types are
independent of states, examining if certain pollutants are more
prevalent in Kerala or Delhi.
• Methodology: A chi-square test was performed on a contingency
table of pollutant_id and state.
• Results: The chi-square test yielded a p-value of approximately 0.996,
so there is no evidence of an association between pollutant type
and state.
Output :
Chi-Square Test for independence between pollutant_id and state p-value: 0.9959679989232744
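
The chi-square test operates on a contingency table of counts, which `pd.crosstab` builds directly. In the made-up sample below both states report the same pollutants equally often, so the test finds no association; a very high p-value like the one above is what such a layout produces.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up records: each state reports each pollutant twice.
obs = pd.DataFrame({
    "state": ["Kerala"] * 6 + ["Delhi"] * 6,
    "pollutant_id": ["PM10", "PM10", "SO2", "SO2", "NH3", "NH3"] * 2,
})

# Rows: pollutant types, columns: states, cells: number of records.
table = pd.crosstab(obs["pollutant_id"], obs["state"])

chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```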

6) Data Visualisation
6.1) Boxplot for Pollutant Levels Across Kerala and Delhi
Objective: Visualize the distribution of average pollutant levels across
Kerala and Delhi.
• Number of Variables: 2 (state, pollutant_avg)
• Type of Relation: Categorical vs Numerical
• Type of Plot Selected: Box Plot
This box plot compares the average pollutant levels in Delhi and Kerala.
Here’s a breakdown of the information:
1. Delhi:
o The median pollutant level in Delhi is higher than in Kerala.
o There is a larger spread of data, with more variability in
pollutant levels.
o The box has a wider interquartile range (IQR), indicating more
variation in the middle 50% of pollutant levels.
o There are several outliers, with values significantly above the
rest of the data, reaching over 250.
2. Kerala:
o Kerala has a lower median pollutant level compared to Delhi.
o The data is less variable, as seen from a narrower IQR.
o There are no extreme outliers like in Delhi.
Inference: Pollution levels in Delhi are generally higher and more
variable than in Kerala, with some exceptionally high readings in Delhi.
This suggests that Delhi may experience more severe pollution events
or fluctuations in pollution levels compared to Kerala.
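
The quantities a box plot displays can be computed directly. The sketch below uses made-up readings and the common 1.5 x IQR rule for flagging outliers, which is the default whisker rule in most plotting libraries; that this rule matches the report's figure is an assumption.

```python
import pandas as pd

# Made-up readings with one extreme value.
readings = pd.Series([10, 12, 14, 16, 18, 20, 22, 24, 300], dtype=float)

q1 = readings.quantile(0.25)
q3 = readings.quantile(0.75)
iqr = q3 - q1                      # spread of the middle 50% of the data

# Points beyond 1.5 * IQR from the quartiles are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = readings[(readings < lower_fence) | (readings > upper_fence)]

print(f"median = {readings.median()}, IQR = {iqr}, outliers = {outliers.tolist()}")
```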

6.2) Histogram of Pollutant Levels for Kerala and Delhi


Objective: Compare the distribution of average pollutant levels between
Kerala and Delhi
• Number of Variables: 2 (state, pollutant_avg)
• Type of Relation: Categorical vs Numerical
• Type of Plot Selected: Histogram (stacked) with color distinction by
state
This histogram displays the frequency distribution of pollutant levels for
Kerala and Delhi, providing insights into the differences in pollution
levels between the two regions.
Observations:
1. Delhi:
o The green bars, representing Delhi, show a wide distribution of
pollutant levels, with a significant number of higher pollutant
readings compared to Kerala.
o Most of the Delhi data is concentrated below 100, with a
notable peak around the lower pollutant levels.
o There are also a few instances where the pollutant levels
exceed 150, even going above 250, which is consistent with
the outliers observed in the previous box plot.
2. Kerala:
o The orange bars, representing Kerala, are concentrated at the
lower end of the pollutant levels, primarily under 50.
o There are fewer high-pollutant readings in Kerala, and the
distribution is more compact, suggesting that extreme
pollution events are less frequent or severe compared to Delhi.
Inference:
This histogram confirms that pollutant levels in Delhi tend to be both
higher and more variable than in Kerala, with Delhi experiencing more
frequent and intense pollution episodes. In contrast, Kerala generally
maintains lower pollutant levels, with most readings concentrated below
50, indicating comparatively better air quality.

6.3) Distribution Plot of Pollutant Levels in Kerala and Delhi


Objective: Visualize the density of average pollutant levels for each state
• Number of Variables: 1 (pollutant_avg)
• Type of Relation: Numerical
• Type of Plot Selected: KDE plot (Kernel Density Estimation)
This graph compares the distribution of pollutant levels between Kerala
and Delhi. Here are the key insights:
1. Pollution Levels in Kerala vs. Delhi:
o Kerala (blue distribution) has lower pollutant levels compared
to Delhi (orange distribution).
o The peak (mode) for Kerala occurs at a much lower pollutant
level, around 20-30, whereas Delhi has a wider and flatter
distribution.
2. Spread and Skewness:
o The distribution of pollution in Delhi is more spread out,
indicating higher variability in pollutant levels. It has a long tail
extending towards higher pollution values, showing that Delhi
experiences extreme pollution levels more frequently than
Kerala.
o Kerala's pollutant levels are more concentrated, showing less
variation and generally lower pollution levels.
3. Overlap:
o There is some overlap between the two regions, suggesting
that Delhi and Kerala experience similar pollutant levels in a
certain range (roughly between 0 and 50), but beyond that,
Delhi's pollution levels extend significantly higher.
In summary, the graph suggests that Delhi experiences much higher and
more variable pollution compared to Kerala, with Kerala having more
moderate and less dispersed pollutant levels.

6.4) Heatmap of the Correlation Matrix for Numeric Variables


Objective: Understand the correlation between numeric variables in the
dataset
• Number of Variables: Multiple (all numeric columns)
• Type of Relation: Numerical vs Numerical
• Type of Plot Selected: Heatmap
This image shows a correlation matrix of numeric variables, with correlation
values color-coded. Here's an interpretation of the matrix:
1. Latitude and Longitude:
o Latitude and longitude have a positive correlation (0.76),
indicating that the two variables have some linear relationship,
though it is not perfect.
2. Pollutant Variables (min, max, avg):
o The pollutant-related variables (minimum, maximum, and
average pollutant levels) are highly correlated with each other:
▪ Pollutant min vs. pollutant max has a correlation of 0.82.
▪ Pollutant min vs. pollutant avg has a correlation of 0.90.
▪ Pollutant max vs. pollutant avg has a very high
correlation of 0.95.
o This indicates that areas experiencing higher minimum
pollutant levels also tend to have higher maximum and
average pollutant levels, showing strong internal consistency
among pollutant measures.
3. Latitude and Longitude vs. Pollutant Levels:
o Latitude and longitude show low correlations with pollutant
levels (correlation values range from 0.06 to 0.20), meaning
there is very little linear relationship between geographic
coordinates and pollution metrics in this dataset.
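
The matrix behind such a heatmap is the pairwise Pearson correlation of the numeric columns, which pandas computes with `DataFrame.corr()`. The values below are made up for illustration.

```python
import pandas as pd

# Made-up numeric columns with the dataset's column names.
numeric = pd.DataFrame({
    "latitude": [8.5, 19.1, 23.3, 28.6],
    "pollutant_min": [1.0, 5.0, 13.0, 22.0],
    "pollutant_max": [2.0, 14.0, 35.0, 57.0],
    "pollutant_avg": [1.0, 9.0, 22.0, 37.0],
})

# Pearson correlation between every pair of columns; the diagonal is 1.
corr = numeric.corr()
print(corr.round(2))
```

The resulting square, symmetric table is what a heatmap function colour-codes cell by cell.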

6.5) Bar Plot of High Pollution Proportion by State


Objective: Compare the proportion of high pollution levels across states
• Number of Variables: 2 (state, high_pollution)
• Type of Relation: Categorical vs Numerical
• Type of Plot Selected: Bar plot
This bar chart shows the proportion of high pollution levels (where the
pollutant average is greater than 50) for two states: Delhi and Kerala.
Here’s an analysis:
1. Delhi:
o A large proportion (over 40%) of the data points for Delhi
have high pollution levels (pollutant average > 50). This
highlights that Delhi frequently experiences poor air quality.
2. Kerala:
o In contrast, only a small proportion (less than 10%) of the data
points for Kerala have high pollution levels. This suggests that
Kerala experiences relatively better air quality, with significantly
fewer instances of high pollution.
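
The bar heights in this plot are the per-state means of the binary high_pollution flag. A minimal sketch with made-up readings:

```python
import pandas as pd

# Made-up readings, five per state.
obs = pd.DataFrame({
    "state": ["Delhi"] * 5 + ["Kerala"] * 5,
    "pollutant_avg": [80.0, 30.0, 65.0, 20.0, 45.0,
                      10.0, 25.0, 55.0, 15.0, 30.0],
})

# Flag readings above the threshold of 50, then average the 0/1 flag per state.
obs["high_pollution"] = (obs["pollutant_avg"] > 50).astype(int)
share_high = obs.groupby("state")["high_pollution"].mean()

print(share_high)
```

Because the flag is 0 or 1, its mean is exactly the proportion of high-pollution readings in each state.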
6.6) Andhra Pradesh State Data Analysis and Visualization
Objective: Distribution of Average Pollutant Levels
• Number of Variables: 1 (pollutant_avg)
• Type of Relation: Numerical
• Type of Plot Selected: Histogram with KDE (Kernel Density Estimation)

This histogram shows the distribution of average pollutant levels, with
frequency on the y-axis and average pollutant level on the x-axis. The
distribution appears to be right-skewed, meaning that most pollutant
levels are clustered toward lower values, with a long tail extending to
higher pollutant levels.
Key inferences include:
1. Concentration of Lower Levels: A significant number of observations
fall below 50 on the pollutant scale, indicating that lower pollutant
levels are much more common in the dataset.
2. Fewer High Pollutant Levels: As pollutant levels increase, their
frequency drops sharply, indicating that extremely high pollutant
levels are rare.

Objective: Average Pollutant Levels by Station


• Number of Variables: 2 (station, pollutant_avg)
• Type of Relation: Categorical–Numerical
• Type of Plot Selected: Box Plot
This plot displays the average pollutant levels recorded at various
stations, with each station represented on the x-axis and the pollutant
levels on the y-axis. Here are some key insights:
1. Wide Variation Across Stations: There is significant variability in
pollutant levels across different stations, with most values clustering
below 100.
2. High Outliers: A few stations have extreme pollutant levels reaching
as high as around 400, which stand out from the general
distribution.
3. Frequent Low Levels: Most pollutant levels are relatively low, consistent
with the histogram above. This reinforces the trend that
while high pollutant levels are observed, they are rare.
4. Dense Cluster at the Bottom: The majority of data points are
concentrated below 50, indicating that low pollutant levels are
typical across most stations.
Objective: Comparison of Average Pollutant Levels by Pollutant Type
• Number of Variables: 2 (pollutant_id, pollutant_avg)
• Type of Relation: Categorical–Numerical
• Type of Plot Selected: Bar Plot

This bar chart shows the average pollutant levels by pollutant type, with
pollutant types on the x-axis and their corresponding average levels on the
y-axis. Here are some observations:
1. PM10 and PM2.5 Have the Highest Levels: Particulate matter (PM10
and PM2.5) shows the highest average levels among the pollutants,
with PM10 being the highest. This suggests that particulate pollution
is a significant concern in this dataset.
2. Low Levels for SO2 and NH3: Sulfur dioxide (SO2) and ammonia (NH3)
have the lowest average levels, indicating that these pollutants are
less prevalent compared to others.
3. Moderate Levels for CO, Ozone, and NO2: Carbon monoxide (CO), ozone,
and nitrogen dioxide (NO2) show moderate average levels, falling
between the extremes observed for particulate matter and gases like
SO2 and NH3.
4. Variation in Pollutant Types: The variation in average levels
highlights the differing impacts and sources of each pollutant type,
with particulate matter likely originating from sources like
construction, traffic, and industrial activities, while gases may come
from combustion processes and other emissions.

Objective: Comparison of Min, Max, and Avg Pollutant Levels by Type


• Number of Variables: 4 (pollutant_id, pollutant_min, pollutant_max,
pollutant_avg)
• Type of Relation: Categorical–Numerical with Multiple Metrics
• Type of Plot Selected: Grouped Bar Plot

This bar chart displays the minimum, maximum, and average levels of
various pollutants. Here’s a breakdown of the key observations:
1. PM10 and PM2.5: These particulate pollutants have the highest
maximum levels among all pollutants, with PM10 reaching the
highest maximum level close to 90, while PM2.5 is also notably high.
The average and minimum values for these pollutants are also
relatively high, indicating frequent high concentrations.
2. CO: Carbon monoxide has a moderately high maximum level, though
its average and minimum values are lower compared to PM10 and
PM2.5, suggesting occasional spikes in concentration.
3. Ozone: Ozone shows a lower range overall compared to PM10,
PM2.5, and CO but still has a distinguishable peak at the maximum
level.
4. NO2 and SO2: These pollutants have lower levels across all metrics
(min, max, and average) compared to the others, implying generally
lower presence in the monitored data.
5. NH3: Ammonia has consistently low values across minimum,
maximum, and average levels, indicating it is the least present
pollutant in this data set.
Overall, PM10 and PM2.5 are the most prominent pollutants with
significant variability, while NH3 shows consistently low levels.

Objective: Average Pollutant Levels by Pollutant Type


• Number of Variables: 1 (pollutant_id)
• Type of Relation: Categorical
• Type of Plot Selected: Pie Chart
This pie chart illustrates the average pollutant levels by type, as a
percentage of the total pollutant levels. Here’s a summary of the key
points:
1. PM10: This pollutant contributes the most to the total average
levels, accounting for 31.0% of the overall pollutant presence. This
reinforces that PM10 is a major pollutant in this data set.
2. PM2.5: The second-highest contributor at 20.9%, PM2.5 is another
significant pollutant, indicating a substantial presence of particulate
matter in the atmosphere.
3. CO (Carbon Monoxide): With 18.0%, CO also contributes notably to
the overall pollutant levels, though less than PM10 and
PM2.5.
4. Ozone: Ozone accounts for 11.6%, showing a moderate contribution
compared to other pollutants.
5. NO2: Nitrogen dioxide contributes 9.7%, which is moderate but less
than CO and particulate matter pollutants.
6. SO2 (Sulfur Dioxide): At 6.6%, SO2 has a relatively minor
contribution compared to the primary pollutants.
7. NH3 (Ammonia): NH3 has the lowest average presence, at just 2.2%,
indicating it is the least prevalent pollutant in this data set.
