My Dsbda Miniproject 1
A
MINI-PROJECT REPORT
ON
BY
CERTIFICATE
This is to certify that the final project work entitled “Performing Data Analytics Operations on a
Dataset” was successfully carried out by
SWARAJ PRAKASH FARAKATE
ROLL NO : 305C055
in partial fulfilment of the DSBDA course during Semester-II of the Third Year of Computer
Engineering prescribed by the SAVITRIBAI PHULE PUNE UNIVERSITY, PUNE.
Dr. S. D. Lokhande
(Principal)
Sinhgad College of Engineering
ACKNOWLEDGEMENT
I take great pleasure in expressing my deepest gratitude and sincere thanks to my
guide, Prof. A. M. Karanjkar, for their valuable guidance during the project work, without
which it would have been a very difficult task. I have no words to express my sincere thanks
for the valuable guidance, assistance and cooperation extended by all the staff
members of the Department of Computer Engineering. This acknowledgement would be
incomplete without expressing my special thanks to Dr. M. P. Wankhade, Head of the
Department (Computer Engineering), for their support during the work. I would also like to
extend my heartfelt gratitude to my Principal, Dr. S. D. Lokhande, who provided a lot of
valuable support, mostly from behind the veils of college bureaucracy. Last but not least, I
would like to thank all the teaching and non-teaching staff members of my department, my
parents and my colleagues who helped me directly or indirectly in completing this
project successfully.
ABSTRACT
This project delves into the realm of data analytics applied to a dataset containing
information on COVID-19 vaccinated individuals, both fully and partially vaccinated. The
primary objective is to glean insights into the vaccination landscape across different states
by analyzing gender distribution among vaccinated individuals. Leveraging data analytics
techniques, the project aims to uncover patterns and trends regarding vaccination rates
among males and females on a state-wise level. By dissecting this dataset, valuable
information can be extracted to inform public health strategies, identify areas for targeted
interventions, and ensure equitable distribution of vaccines. This report documents the
methodology employed, key findings, and implications derived from the data analysis
process. Through this work, we aim to contribute to the broader discourse on
leveraging data analytics to inform evidence-based decision-making in public health crises
like the COVID-19 pandemic. Additionally, we explore the impact of vaccination progress
over time, examining how vaccination rates have evolved across different demographic
segments. This temporal analysis enables us to gauge the efficacy of vaccination campaigns
and identify regions requiring targeted interventions. Moreover, we employ visualization
techniques such as bar charts and heatmaps to present our findings intuitively, facilitating
easier comprehension and interpretation. Overall, this project underscores the significance
of data analytics in informing public health strategies amidst the ongoing battle against
COVID-19. By leveraging data-driven insights, policymakers and healthcare authorities
can optimize resource allocation and devise targeted interventions, ultimately aiding in the
collective effort to mitigate the impact of the pandemic and safeguard public health.
TABLE OF CONTENTS
Chapter 1: Introduction
Chapter 2: Problem Statement
Chapter 3: Motivation
Chapter 4: Methodology
4.1 Introduction to Dataset
4.2 Introduction to Jupyter Notebook
4.3 Set of Commands
Chapter 5: Result Analysis
Chapter 6: Conclusion
Chapter 7: References
CHAPTER 1: INTRODUCTION
By fulfilling these objectives, this mini project endeavors to contribute to the broader
discourse on public health strategies amidst the COVID-19 pandemic, aiding in the
formulation of evidence-based policies to combat the spread of the virus and safeguard
public health.
CHAPTER 2:
PROBLEM STATEMENT
CHAPTER 3:
MOTIVATION
The COVID-19 pandemic has posed unprecedented challenges to global health systems,
economies, and societies at large. Amidst this crisis, vaccination emerges as a pivotal tool
in combating the spread of the virus and mitigating its impact. As vaccination campaigns
roll out worldwide, it becomes imperative to assess their efficacy and address potential
disparities in coverage. Data analytics offers a powerful lens through which to examine
vaccination trends, providing valuable insights into the distribution of vaccinated
individuals across different demographic segments and geographic regions. By analyzing
a dataset encompassing information on COVID-19 vaccinated patients, this project aims
to unravel patterns related to gender and vaccination status at the state level. Through this
endeavor, we aspire to contribute to the collective understanding of vaccination dynamics,
thereby informing evidence-based strategies to enhance public health outcomes and
navigate the path towards recovery from the ongoing pandemic.
3.2 SCHEMA DIAGRAM
CHAPTER 4:
METHODOLOGY
4.1 Dataset Introduction:
4.2 Introduction to Jupyter Notebook:
Jupyter Notebook serves as an ideal environment for our analytical pursuits, providing an
interactive platform that seamlessly integrates code execution, data visualization, and
narrative documentation. Using the Pandas and NumPy libraries, we
harness the robust functionalities they offer for data manipulation, analysis, and
computation. Pandas empowers us with high-level data structures and intuitive tools for
data wrangling, facilitating efficient exploration and manipulation of our dataset.
Meanwhile, NumPy enhances our computational capabilities, offering a plethora of
mathematical functions and operations essential for statistical analysis. As we navigate
through our Jupyter Notebook, we embark on a journey of discovery, employing a series
of operations to extract actionable insights from our dataset. Our analytical endeavors
encompass tasks such as segregating vaccinated individuals based on gender and
vaccination status across different states, enabling us to discern trends and patterns that
may inform public health strategies. By leveraging the combined power of Jupyter
Notebook, Pandas, and NumPy, we endeavor to contribute valuable insights to the
ongoing efforts aimed at combating the COVID-19 pandemic. Some advantages of Jupyter
Notebook are:
makes it a popular choice among data scientists, researchers, and educators working
across different domains.
3. Mixing Code, Text, and Visualizations: Jupyter Notebook allows users to seamlessly
integrate code cells with markdown cells containing formatted text, equations, and
multimedia elements. This blend of code, text, and visualizations facilitates clear
documentation of the analysis process, making it easier to communicate findings and
insights.
4. Rich Output Formats: Jupyter Notebook provides support for a wide range of output
formats, including HTML, LaTeX, PDF, and various image formats. This flexibility
allows users to generate reports, presentations, and publications directly from their
notebooks, enhancing reproducibility and sharing capabilities.
5. Code Modularity and Reusability: Jupyter Notebook encourages code modularity and
reusability through the creation of functions, classes, and modules. Users can organize
their code into logical units, making it easier to maintain, debug, and extend for future
projects.
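As an illustration of the reusable units described in point 5, the sketch below wraps a state-wise gender summary in a single function. The column names "Male" and "Female" and the sample figures are assumptions made up for this example; they are not taken from the project's dataset.

```python
import pandas as pd


def gender_totals_by_state(df: pd.DataFrame) -> pd.DataFrame:
    """Sum male and female vaccination counts per state.

    Assumes hypothetical columns "State", "Male" and "Female".
    """
    totals = df.groupby("State")[["Male", "Female"]].sum()
    # Share of female recipients among all recorded counts in each state.
    totals["Female_share"] = totals["Female"] / (totals["Male"] + totals["Female"])
    return totals


# Made-up sample rows, purely for illustration.
sample = pd.DataFrame(
    {
        "State": ["Delhi", "Delhi", "Rajasthan"],
        "Male": [100, 50, 200],
        "Female": [90, 60, 200],
    }
)
summary = gender_totals_by_state(sample)
print(summary)
```

Keeping the aggregation in one function means the same summary can be recomputed on a different file or a filtered subset without copying notebook cells.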
4.3 Set of Commands:
1. import pandas as pd: This command imports the pandas library into the current notebook
and binds it to the alias pd, which is used to refer to the library in the rest of the notebook.
5. df.describe(): This command returns summary statistics for the numeric columns of the
dataset, such as count, mean, standard deviation, minimum, quartiles and maximum.
6. df.shape: This attribute (written without parentheses) returns a tuple containing the
number of rows and the number of columns.
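The commands listed above can be tried on a small DataFrame. The sketch below uses made-up rows in place of the real dataset, and the "Total" column is a hypothetical name chosen for illustration. Note that in pandas, shape is an attribute, so it is read without parentheses.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
df = pd.DataFrame(
    {
        "State": ["Delhi", "Rajasthan", "Kerala"],
        "Total": [10, 20, 30],
    }
)

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
stats = df.describe()

# shape is an attribute holding (number of rows, number of columns).
n_rows, n_cols = df.shape

print(stats)
print(n_rows, n_cols)
```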
condition is True. The resulting DataFrame filtered contains only the rows where the value
in the "State" column is either 'Rajasthan' or 'Delhi'.
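One way to write such a filter, which may differ from the exact command used in the notebook, is to build a Boolean mask with Series.isin and index the DataFrame with it. The sample rows and the "Total" column below are made up for illustration; only the "State" column and the two state names come from the text.

```python
import pandas as pd

# Made-up rows; only the "State" column name is taken from the report.
df = pd.DataFrame(
    {
        "State": ["Delhi", "Kerala", "Rajasthan", "Goa"],
        "Total": [10, 20, 30, 40],
    }
)

# True for rows whose State is one of the listed values, False otherwise.
mask = df["State"].isin(["Rajasthan", "Delhi"])

# Indexing with the mask keeps only the rows where the condition is True.
filtered = df[mask]
print(filtered)
```

An equivalent form combines two comparisons with the | operator, as in (df["State"] == "Rajasthan") | (df["State"] == "Delhi"); isin is shorter when more states are added.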
CHAPTER 5:
RESULT ANALYSIS
Dataset :
Code and Results:
CHAPTER 6:
CONCLUSION
TABLES IN NORMALIZED FORM
1. STATION TABLE
Station_id    Station_name
1             Station A
2             Station B
3             Station C
4             Station D
2. TRAIN TABLE
Train_id    Train_name    Dep_time    Arl_time    Dep_stn_id    ArL_id
3. PASSENGER TABLE
Passanger_id    Passanger_name    age    Gender
1               Ramesh            33     Male
2               Ram               34     Female
3               xeviour           45     Male
4. TICKET TABLE
CHAPTER 7:
REFERENCES
➢ https://www.kaggle.com
➢ https://www.kaggle.com/sudalairajkumar/covid19-inndia?select=covid_vaccine_statewise.csv
➢ https://jupyternotebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html