SAVITRIBAI PHULE PUNE UNIVERSITY

A
MINI-PROJECT REPORT

ON

“Performing Data Analytics Operations on a Dataset”


(A project to fulfill the requirements of DSBDA Lab)

BY

Prathmesh Digambar Somwanshi


Kaustubh Anil Shinde
Prajwal Maruti Shinde
Swaraj Prakash Farakate

Under the guidance of


(Prof. A. M. Karanjkar)

DEPARTMENT OF COMPUTER ENGINEERING

SINHGAD COLLEGE OF ENGINEERING, VADGAON, PUNE

CERTIFICATE
This is to certify that the final project work entitled “Performing Data Analytics Operations on a Dataset” was successfully carried out by
SWARAJ PRAKASH FARAKATE
ROLL NO: 305C055

in partial fulfilment of the DSBDA course during Semester II of the Third Year of Computer Engineering, as prescribed by SAVITRIBAI PHULE PUNE UNIVERSITY, PUNE.

Prof. A. M. Karanjkar
(Faculty Coordinator)
Department of Computer Engineering

Dr. M. P. Wankhade
(H.O.D)
Department of Computer Engineering

Dr. S. D. Lokhande
(Principal)
Sinhgad College of Engineering

ACKNOWLEDGEMENT

I take great pleasure in expressing my deepest gratitude and sincere thanks to my guide, Prof. A. M. Karanjkar, for valuable guidance during the project work, without which it would have been a very difficult task. I also sincerely thank all the staff members of the Department of Computer Engineering for their guidance, assistance, and cooperation. This acknowledgement would be incomplete without my special thanks to Dr. M. P. Wankhade, Head of the Department (Computer Engineering), for support during the work. I would also like to extend my heartfelt gratitude to the Principal, Dr. S. D. Lokhande, who provided a great deal of valuable support, much of it behind the scenes. Last but not least, I would like to thank all the teaching and non-teaching staff members of my department, my parents, and my colleagues who helped me, directly or indirectly, in completing this project successfully.

ABSTRACT

This project applies data analytics to a dataset on COVID-19 vaccinated individuals, covering both fully and partially vaccinated people. The primary objective is to understand the vaccination landscape across Indian states by analyzing the gender distribution of vaccinated individuals. Using data analytics techniques, the project looks for patterns and trends in vaccination rates among males and females at the state level. This information can inform public health strategies, identify areas that need targeted interventions, and help ensure equitable distribution of vaccines. This report documents the methodology employed, the key findings, and their implications. Through this work, we aim to contribute to the broader discussion on using data analytics to support evidence-based decision-making in public health crises such as the COVID-19 pandemic. Additionally, we explore how vaccination progressed over time across different demographic segments; this temporal analysis helps gauge the effectiveness of vaccination campaigns and identify regions requiring targeted interventions. We also use visualizations such as bar charts and heatmaps to present the findings in a form that is easy to interpret. Overall, the project underscores the role of data analytics in informing public health strategy during the COVID-19 pandemic: data-driven insights can help policymakers and healthcare authorities allocate resources and design targeted interventions that mitigate the impact of the pandemic and safeguard public health.

TABLE OF CONTENTS

Chapter 1: Introduction
1.1 Background Study
1.2 Objective and Purpose
Chapter 2: Problem Statement
Chapter 3: Motivation
Chapter 4: Methodology
4.1 Introduction to Dataset
4.2 Introduction to Jupyter Notebook
4.3 Set of Commands
Chapter 5: Result Analysis
Chapter 6: Conclusion
Chapter 7: References

CHAPTER 1: INTRODUCTION

1.1 BACKGROUND STUDY


This mini project focuses on performing data analytics on a dataset comprising information on COVID-19 vaccinated patients, both fully and partially vaccinated. Specifically, we aim to explore gender-based disparities in vaccination coverage across different states. Understanding the distribution of vaccinated males and females within each state is essential for identifying demographic-specific vaccination trends and addressing potential disparities. By applying analytical techniques such as data preprocessing, visualization, and statistical analysis, we seek to extract actionable insights from the dataset. Through this analysis, we aim to contribute to the growing body of knowledge on COVID-19 vaccination strategies and to support evidence-based decision-making in public health policy and practice.

1.2 OBJECTIVE AND PURPOSE


Through this analysis, we aim to achieve several key objectives:

1. Demographic Insights: Explore the distribution of vaccinated individuals by gender and assess any gender-based disparities in vaccination coverage.
2. Geographic Analysis: Investigate the spatial distribution of vaccination rates across
various states and regions, identifying areas with high or low vaccination uptake.
3. Trend Identification: Examine temporal trends in vaccination progress to gauge the
effectiveness of vaccination campaigns over time.
4. Data Visualization: Utilize visualizations such as bar charts, heatmaps, and graphs to
present our findings in a clear and concise manner, facilitating easier interpretation and
decision-making.
5. Policy Implications: Provide actionable insights to policymakers and healthcare
authorities to optimize resource allocation and formulate targeted interventions aimed at
improving vaccination coverage and addressing disparities.
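To make the first objective concrete, one simple way to measure the gender gap in a state is the difference between the male and female shares of vaccinated individuals, (M - F) / (M + F), where 0 means parity. The sketch below is a minimal illustration, not the project's code: it assumes state-wise totals are already available in columns named "Male" and "Female" (placeholder names), and the numbers are made up.

```python
import pandas as pd

# Made-up state-wise totals; "Male" and "Female" are placeholder column names.
totals = pd.DataFrame(
    {"Male": [150, 280, 90], "Female": [100, 220, 90]},
    index=pd.Index(["Delhi", "Rajasthan", "Goa"], name="State"),
)

vaccinated = totals["Male"] + totals["Female"]
totals["Female share"] = totals["Female"] / vaccinated
# Positive gap: more males than females vaccinated; 0 means parity.
totals["Gap"] = (totals["Male"] - totals["Female"]) / vaccinated

# States with the largest gap first, i.e. candidates for targeted outreach.
ranked = totals.sort_values("Gap", ascending=False)
print(ranked)
```

Normalizing by each state's total keeps large and small states comparable, which raw male and female counts would not.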

By fulfilling these objectives, this mini project endeavors to contribute to the broader
discourse on public health strategies amidst the COVID-19 pandemic, aiding in the
formulation of evidence-based policies to combat the spread of the virus and safeguard
public health.

CHAPTER 2:
PROBLEM STATEMENT

Use the following covid_vaccine_statewise.csv dataset and perform the following analytics on it:
https://www.kaggle.com/sudalairajkumar/covid19-in-india?select=covid_vaccine_statewise.csv

a. Describe the dataset.
b. Number of persons vaccinated with the first dose in India, state-wise.
c. Number of persons vaccinated with the second dose in India, state-wise.
d. Number of males vaccinated.
e. Number of females vaccinated.
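The five tasks map onto a handful of Pandas operations. The sketch below is a minimal illustration on a made-up miniature table, not the project's code: the column names ("Updated On", "First Dose", "Second Dose", "Male", "Female") are placeholders to be mapped onto the real headers, and it assumes each row holds a state's running (cumulative) total on a given date, so the latest row per state is used rather than a sum over all dates.

```python
import pandas as pd

# Made-up miniature of covid_vaccine_statewise.csv. All column names here are
# placeholders; map them to the real headers listed by df.columns.
df = pd.DataFrame({
    "Updated On": ["16/01/2021", "16/01/2021", "17/01/2021", "17/01/2021"],
    "State": ["Delhi", "Rajasthan", "Delhi", "Rajasthan"],
    "First Dose": [100, 200, 250, 500],
    "Second Dose": [0, 0, 50, 120],
    "Male": [60, 110, 150, 280],
    "Female": [40, 90, 100, 220],
})

# a. Describe the dataset: summary statistics of the numeric columns.
summary = df.describe()

# Assumption: each row is a state's running (cumulative) total on one date,
# so the most recent row per state carries that state's figures.
df["Updated On"] = pd.to_datetime(df["Updated On"], format="%d/%m/%Y")
latest = (
    df.sort_values("Updated On")
    .drop_duplicates(subset="State", keep="last")
    .set_index("State")
)

first_dose_by_state = latest["First Dose"]     # b. first dose, state-wise
second_dose_by_state = latest["Second Dose"]   # c. second dose, state-wise
males_vaccinated = latest["Male"].sum()        # d. males, all states
females_vaccinated = latest["Female"].sum()    # e. females, all states

print(summary)
print(first_dose_by_state)
print(second_dose_by_state)
print(males_vaccinated, females_vaccinated)    # 430 320
```

If the file also contains a country-level row (for example State == "India"), it should be dropped before adding up males and females; otherwise the national figure is counted twice.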

CHAPTER 3:
MOTIVATION
The COVID-19 pandemic has posed unprecedented challenges to global health systems,
economies, and societies at large. Amidst this crisis, vaccination emerges as a pivotal tool
in combating the spread of the virus and mitigating its impact. As vaccination campaigns
roll out worldwide, it becomes imperative to assess their efficacy and address potential
disparities in coverage. Data analytics offers a powerful lens through which to examine
vaccination trends, providing valuable insights into the distribution of vaccinated
individuals across different demographic segments and geographic regions. By analyzing
a dataset encompassing information on COVID-19 vaccinated patients, this project aims
to unravel patterns related to gender and vaccination status at the state level. Through this
endeavor, we aspire to contribute to the collective understanding of vaccination dynamics,
thereby informing evidence-based strategies to enhance public health outcomes and
navigate the path towards recovery from the ongoing pandemic.

CHAPTER 4:
METHODOLOGY
4.1 Introduction to Dataset:

The dataset under analysis covers COVID-19 vaccinated individuals, both fully and partially vaccinated. With the global fight against the COVID-19 pandemic at the forefront, vaccination campaigns have become a critical strategy for slowing the spread of the virus and mitigating its impact on public health. The dataset records information about vaccinated individuals, including demographic details such as age, gender, and geographic location (state). Using data analytics, we aim to extract insights from it, with particular emphasis on vaccination trends across states. Through a systematic exploration of the data, we look for patterns in the distribution of vaccinated individuals by gender and vaccination status within each state. The analysis also examines how vaccination rates evolved over time, showing how vaccination efforts unfolded across various demographic segments.
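Because the temporal analysis works from running totals, daily new vaccinations can be recovered by differencing each state's series. A minimal sketch, assuming the counts are cumulative and stored in a placeholder column "Total Individuals Vaccinated"; the dates and values are made up:

```python
import pandas as pd

# Made-up cumulative counts per state and date (placeholder column names).
df = pd.DataFrame({
    "Updated On": pd.to_datetime(["2021-01-16", "2021-01-17", "2021-01-18"] * 2),
    "State": ["Delhi"] * 3 + ["Rajasthan"] * 3,
    "Total Individuals Vaccinated": [100, 250, 400, 200, 500, 650],
})

df = df.sort_values(["State", "Updated On"])
running = df.groupby("State")["Total Individuals Vaccinated"]
# Day-over-day change in the running total = new individuals vaccinated that day.
# A state's first record has no predecessor; here we assume it is day one of the
# campaign and use the running total itself.
df["Daily New"] = running.diff().fillna(df["Total Individuals Vaccinated"]).astype(int)

# One column per state, one row per date: ready for a line or bar chart.
trend = df.pivot(index="Updated On", columns="State", values="Daily New")
print(trend)
```

Grouping by state before diff() matters: a plain diff() over the whole column would subtract one state's last value from the next state's first value.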

4.2 Introduction to Jupyter Notebook:

Jupyter Notebook is a well-suited environment for this analysis: it is an interactive platform that combines code execution, data visualization, and narrative documentation in a single document. We use the Pandas and NumPy libraries for data manipulation, analysis, and computation. Pandas provides high-level data structures (notably the DataFrame) and tools for data wrangling, which make it efficient to explore and manipulate the dataset. NumPy provides fast array operations and the mathematical functions needed for statistical analysis. Within the notebook, we apply a series of operations to extract actionable insights from the dataset, such as segregating vaccinated individuals by gender and vaccination status across states, which helps reveal trends and patterns that may inform public health strategies. By combining Jupyter Notebook, Pandas, and NumPy, we aim to contribute useful insights to the ongoing efforts against the COVID-19 pandemic. Some advantages of Jupyter Notebook are:

1. Interactive Computing Environment: Jupyter Notebook provides an interactive computing environment in which users write and execute code cell by cell, in a sequential manner. This interactivity lets users iteratively explore data, experiment with algorithms, and visualize results in real time.
2. Support for Multiple Programming Languages: Jupyter Notebook supports multiple programming languages through interchangeable kernels, including Python, R, and Julia. This versatility makes it a popular choice among data scientists, researchers, and educators working across different domains.
3. Mixing Code, Text, and Visualizations: Jupyter Notebook lets users combine code cells with Markdown cells containing formatted text, equations, and multimedia elements. This blend of code, text, and visualizations keeps the analysis process clearly documented, making it easier to communicate findings and insights.
4. Rich Output Formats: Jupyter Notebook supports a wide range of output formats, including HTML, LaTeX, PDF, and various image formats. This flexibility allows users to generate reports, presentations, and publications directly from their notebooks, improving reproducibility and ease of sharing.
5. Code Modularity and Reusability: Jupyter Notebook encourages code modularity and reusability through functions, classes, and modules. Users can organize their code into logical units, making it easier to maintain, debug, and extend in future projects.

4.3 Set of Commands:

1. import pandas as pd : Imports the Pandas library into the current notebook session under the alias pd, so the rest of the notebook can refer to it as pd.

2. import numpy as np : Imports the NumPy library under the conventional alias np.

3. df = pd.read_csv("C:\\Users\\Admin\\Downloads\\state.csv") : Reads the CSV file "state.csv" located in "C:\Users\Admin\Downloads\" into a Pandas DataFrame named df. The backslashes are doubled because "\" is an escape character in Python strings; a raw string such as r"C:\Users\Admin\Downloads\state.csv" works equally well.

4. df : Evaluating df as the last line of a cell displays the DataFrame in the notebook. For large tables, Jupyter shows a truncated view with the first and last rows.

5. df.describe() : Returns summary statistics for the numeric columns of the dataset: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum.

6. df.shape : Returns the number of rows and columns as a tuple (rows, columns). Note that shape is an attribute, not a method, so writing df.shape() raises a TypeError.

7. sum1 = df[df['State'] == 'Rajasthan']['Total-Vaccinated'].sum() : Selects the rows whose "State" is 'Rajasthan' and adds up their "Total-Vaccinated" values, giving the total number of vaccinated individuals recorded for Rajasthan in df.

8. condition = (df['State'] == 'Rajasthan') | (df['State'] == 'Delhi') : Creates a boolean mask that is True for rows whose "State" column is either "Rajasthan" or "Delhi". The parentheses are required because the | operator binds more tightly than ==. This mask can be used to filter rows of df.

9. filtered = df[(df['State'] == 'Rajasthan') | (df['State'] == 'Delhi')] : Applies the same combined boolean mask (equivalently, df[condition]) and keeps only the rows where it is True. The resulting DataFrame filtered contains only the rows whose "State" value is either 'Rajasthan' or 'Delhi'.

10. total = filtered['Total-Vaccinated'].sum() : Adds up the "Total-Vaccinated" column over the filtered rows, giving the combined number of vaccinated individuals for Rajasthan and Delhi. The result is stored in total rather than sum so that Python's built-in sum() function is not overwritten for the rest of the notebook.
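The commands above can be tried end to end on a small in-memory table. This is a minimal sketch, assuming a file shaped like the report's state.csv with "State" and "Total-Vaccinated" columns; the rows are made up, and the .loc, groupby() and isin() calls are alternatives we suggest rather than the commands used in the project.

```python
import io

import pandas as pd

# Made-up stand-in for state.csv so the sketch runs anywhere; with the real
# file, use pd.read_csv(r"C:\Users\Admin\Downloads\state.csv") instead.
csv_text = """State,Total-Vaccinated
Rajasthan,120
Delhi,80
Rajasthan,60
Goa,15
"""
df = pd.read_csv(io.StringIO(csv_text))

# Commands 5 and 6: summary statistics and dimensions.
stats = df.describe()        # numeric columns only; "State" is skipped
n_rows, n_cols = df.shape    # attribute, not a method call

# Command 7, written with .loc so row selection and column pick happen in one step.
sum1 = df.loc[df["State"] == "Rajasthan", "Total-Vaccinated"].sum()

# The same total for every state at once, instead of one filter per state.
by_state = df.groupby("State")["Total-Vaccinated"].sum()

# Commands 8 and 9: isin() builds the same mask as the | expression.
condition = df["State"].isin(["Rajasthan", "Delhi"])
filtered = df[condition]

# Command 10: combined total for the selected states.
total = filtered["Total-Vaccinated"].sum()

print(n_rows, n_cols)   # 4 2
print(sum1, total)      # 180 260
print(by_state)
```

isin() keeps the mask readable when more states are added, since each new state is one more list entry instead of another parenthesized comparison.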

CHAPTER 5:
RESULT ANALYSIS

Dataset:

Code and Results:
CHAPTER 6:
CONCLUSION

In conclusion, our exploration of COVID-19 vaccination data using data analytics techniques, with the Pandas and NumPy libraries in Jupyter Notebook, has yielded useful insights into the vaccination landscape across different states. Operations such as segregating vaccinated individuals by gender and vaccination status allowed us to identify patterns and disparities in vaccination coverage. Our analysis revealed variations in vaccination rates between males and females across states, underscoring the importance of targeted outreach to address gender-specific disparities in vaccine uptake. Additionally, classifying individuals by vaccination status (fully and partially vaccinated) gave an overview of the progress of vaccination campaigns within each state. Visualizing the findings through charts and graphs further improved the interpretability of the results, supporting informed decision-making by healthcare authorities and policymakers. Beyond these insights into the current state of COVID-19 vaccination efforts, this work lays the groundwork for future analyses and interventions aimed at optimizing vaccination strategies and improving public health outcomes.


CHAPTER 7:
REFERENCES

➢ https://www.kaggle.com

➢ https://www.kaggle.com/sudalairajkumar/covid19-in-india?select=covid_vaccine_statewise.csv

➢ https://jupyternotebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html
