Predicting Blood Donation Using Machine Learning

Presented by:
SATYAPRAJNA SAMAL (230301120218)
RAHUL SETHI (230301120222)
ADITYA ROSHAN PRADHAN (230301120240)
RUDRANARAYAN PANI
Contents
• Introduction to ML
• Problem
• About Dataset
• Joining Train and Test Data
• Feature analysis
• Cross Validation Models
• Libraries Used
• Conclusion
INTRODUCTION:
* In today’s world, despite enormous scientific advances and great developments in
medical science, an adequate supply of healthy blood remains one of the challenges and
concerns of the medical community worldwide. Blood donation plays a critical role in
preserving human health and life. Maintaining the volume of blood required in each
region's blood banks, across diverse blood groups and the compatibility relationships
between them, and given that some blood groups are rarer, makes prediction and planning
both more complicated and more important over time. Applying data mining to the
databases of hospitals and transfusion centres helps discover relationships in the data, so
that future predictions can be made from past information.

* The demand for blood is increasing day by day due to accidents, surgeries, etc.

* The motivation behind this research is to help predict donor behaviour, so that medical
professionals can anticipate future demand and supply, plan accordingly with blood banks,
and encourage voluntary donors to meet that demand.
Predicting Whether a Blood Donor Will Donate Within a Given Time Window
While working for the Rotaract Club of MSIT over the last 3 years, one of my main responsibilities was to
organise blood donation camps. It is an amazing event to organise because it gives you the feeling that
you are helping a good cause that saves lives.

Problem
One of the major problems while organising a blood donation camp was convincing people walking near
the camp to become donors. About 70% of them were not interested in donating, for reasons such as
having work to do or needing to be somewhere else. By contrast, once every year we organise a blood
donation camp at Adarsh Public School, New Delhi, on the day of the parent-teacher meeting. Parents are
told about the donation camp in advance, and almost 80-90% of them become donors.
So I thought that if we could reach out to the right people before organising the event, we would get
more donors and could save more lives. As part of our record keeping, we had been collecting data on
volunteers for the last 2 years, including their addresses, but it was not well organised.
So I searched online and found the data I needed on DrivenData.

Use information about each donor's history


•Months since Last Donation: this is the number of months since this donor's most recent donation.
•Number of Donations: this is the total number of donations that the donor has made.
•Total Volume Donated: this is the total amount of blood that the donor has donated in cubic centimeters.
Loading the Data
The data are pre-split into training and test sets, so we’ll read them in separately.
The good thing is that there are no missing values, and we have 576 rows and 6 columns.
The features are 'Months since Last Donation', 'Number of Donations', 'Total Volume Donated'
and 'Months since First Donation'.
In the class column there are two classes:
•class 1 : The donor donated blood in March 2007.
•class 0 : The donor did not donate blood in March 2007.
Note : I am assuming that 1 means donated and 0 means not donated.
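The loading step might look roughly like the sketch below. It is only an illustration: the inline `TRAIN_CSV` text, the `id` column and the snake_case column names are hypothetical stand-ins, and the rows are invented rather than taken from the real files. With the real data, `pd.read_csv` would be pointed at the downloaded training and test files instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the training CSV (invented rows, assumed column names).
# With the real files, pass the file path to pd.read_csv instead of a StringIO.
TRAIN_CSV = """id,months_since_last_donation,num_donations,total_volume_donated,months_since_first_donation,class
1,2,10,2500,40,1
2,0,3,750,12,1
3,14,1,250,14,0
4,4,6,1500,30,1
5,21,2,500,60,0
"""

train = pd.read_csv(io.StringIO(TRAIN_CSV), index_col="id")

# Basic sanity checks: table size, missing values per column, class balance.
n_rows, n_cols = train.shape
missing_per_column = train.isnull().sum()
class_counts = train["class"].value_counts()

print(f"{n_rows} rows, {n_cols} columns")
print(missing_per_column)
print(class_counts)
```

The same three checks (shape, `isnull().sum()`, `value_counts()`) are what back up statements such as "no missing values" and the number of rows per class.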
Joining Train and Test Data
Join the train and test datasets so that both end up with the same features after categorical
conversion. This also helps with feature engineering.
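One possible way to do the join, sketched with `pandas.concat` on two tiny made-up frames (the column names and values are assumptions for illustration, not the real data). The test rows have no label, so their `class` value is NaN after the join, and the two parts can be separated again by row position.

```python
import pandas as pd

# Hypothetical miniature train/test frames; the test set has no label column.
train = pd.DataFrame({
    "months_since_last_donation": [2, 0, 14],
    "num_donations": [10, 3, 1],
    "class": [1, 1, 0],
})
test = pd.DataFrame({
    "months_since_last_donation": [4, 21],
    "num_donations": [6, 2],
})

train_len = len(train)

# Stack the rows; the label is NaN for rows that came from the test set.
dataset = pd.concat([train, test], axis=0).reset_index(drop=True)

# ... shared feature engineering would be applied to `dataset` here ...

# Split back into the original pieces once the features are built.
train_again = dataset.iloc[:train_len]
test_again = dataset.iloc[train_len:].drop(columns=["class"])

print(dataset)
```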
Feature analysis:
Only months_since_first_donation seems to have a significant correlation
with the class probability.
This does not mean that the other features are not useful. Subpopulations
within features such as num_donations can still be correlated with the class.
To determine this, we need to explore these features in detail.
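A correlation check of this kind can be sketched as follows. The data here is synthetic and the label is generated by an invented rule purely so that there is something to correlate, so the numbers it prints say nothing about the real donors. The column names are assumed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the donor table (not the real data).
df = pd.DataFrame({
    "months_since_last_donation": rng.integers(0, 40, size=n),
    "num_donations": rng.integers(1, 30, size=n),
    "months_since_first_donation": rng.integers(2, 100, size=n),
})

# Invented label rule: class 1 is more likely when the last donation was recent.
p = 1.0 / (1.0 + np.exp(0.15 * (df["months_since_last_donation"] - 10)))
df["class"] = (rng.random(n) < p).astype(int)

# Pearson correlation of every column with every other column.
corr = df.corr()

# Correlation of each feature with the class label, strongest first.
corr_with_class = corr["class"].drop("class").sort_values(
    key=lambda s: s.abs(), ascending=False
)
print(corr_with_class)
```

A low linear correlation with the label only rules out a simple linear relationship, which is why the per-feature distributions are still worth examining.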
We notice that the num_donations distributions are not the same in the
class 1 and class 0 subpopulations. Indeed, there is a peak corresponding
to people who have donated only 0-1 times, who tend not to donate blood,
whereas those who have donated 2-3 times are likely to donate.
It seems that people who have donated more times are more likely
to donate blood.

We notice that the months_since_last_donation distributions are not the
same in the class 1 and class 0 subpopulations. Indeed, there is a peak
corresponding to people who have donated recently (in the last 1-2 months),
who tend to donate blood.
It seems that people who have donated recently are more likely to donate blood.
We notice that the months_since_first_donation distributions are not the
same in the class 1 and class 0 subpopulations.
Indeed, there is a peak corresponding to people who first donated only
recently (6-20 months ago), who tend not to donate blood.
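The comparisons above are read off density plots. As a rough numeric alternative (my own sketch, not the method used for the slides), a feature can be cut into ranges and the share of class 1 donors computed per range. The table below is made up and the bin edges are arbitrary.

```python
import pandas as pd

# Hypothetical miniature donor table (values invented for illustration).
df = pd.DataFrame({
    "months_since_last_donation": [1, 2, 2, 4, 9, 11, 14, 16, 21, 23],
    "class":                      [1, 1, 0, 1, 0, 0, 1, 0, 0, 0],
})

# Bucket recency into ranges, then compare the classes bucket by bucket.
bins = [0, 2, 6, 12, 24]
df["recency_bin"] = pd.cut(
    df["months_since_last_donation"], bins=bins, include_lowest=True
)

# For a 0/1 label, the mean per bucket is the share of donors in class 1.
summary = df.groupby("recency_bin", observed=False)["class"].agg(["count", "mean"])
summary = summary.rename(columns={"count": "donors", "mean": "donation_rate"})
print(summary)
```

The same pattern applies to num_donations or months_since_first_donation by swapping the column and the bin edges.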

Volume donated is also a good feature for telling whether the
donor will donate or not.

From the above 4 graphs we can see that frequency (number of donations)
and monetary value (total volume donated) are highly correlated, so we
can use only the frequency.
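A small sketch of why one of the two columns can be dropped. It assumes, for illustration only, that every donation has the same fixed volume (250 cc here), which makes total volume an exact multiple of the donation count. The 0.99 threshold and the column names are likewise my own choices.

```python
import pandas as pd

# Hypothetical miniature donor table. Assumption: every donation has the same
# fixed volume (250 cc here), so volume is an exact multiple of the count.
df = pd.DataFrame({"num_donations": [1, 2, 3, 5, 8, 13]})
df["total_volume_donated"] = df["num_donations"] * 250

# Pearson correlation between the two candidate features.
r = df["num_donations"].corr(df["total_volume_donated"])
print(f"correlation = {r:.3f}")

# Drop one column of a pair that is (almost) perfectly correlated,
# since it carries no extra information for the model.
if abs(r) > 0.99:
    df = df.drop(columns=["total_volume_donated"])

print(list(df.columns))
```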
Cross Validation Models
I compared 10 popular classifiers and evaluated the mean accuracy of each of them using a
stratified k-fold cross-validation procedure.
•SVC
•Decision Tree
•AdaBoost
•Random Forest
•Extra Trees
•Gradient Boosting
•Multi-layer perceptron (neural network)
•KNN
•Logistic regression
•Linear Discriminant Analysis

From the figure we can see that Random Forest, Extra Trees, Gradient Boosting
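The comparison of the ten classifiers could be set up along the lines below. This is a sketch under assumptions: the data comes from scikit-learn's `make_classification` rather than the donor dataset, the number of folds (5) and the seed are my own choices, and every model uses library defaults apart from iteration limits, so the scores it prints are not the ones shown in the figure.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 2  # arbitrary seed, only for reproducibility

# Synthetic stand-in data: 300 samples, 4 features, imbalanced classes.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, weights=[0.75, 0.25],
                           random_state=SEED)

classifiers = {
    "SVC": SVC(random_state=SEED),
    "Decision Tree": DecisionTreeClassifier(random_state=SEED),
    "AdaBoost": AdaBoostClassifier(random_state=SEED),
    "Random Forest": RandomForestClassifier(random_state=SEED),
    "Extra Trees": ExtraTreesClassifier(random_state=SEED),
    "Gradient Boosting": GradientBoostingClassifier(random_state=SEED),
    "Multi-layer perceptron": MLPClassifier(max_iter=1000, random_state=SEED),
    "KNN": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000, random_state=SEED),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
}

# Stratified folds keep the class ratio roughly equal in every split.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

results = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, scoring="accuracy", cv=kfold)
    results[name] = (scores.mean(), scores.std())

# Print the classifiers from best to worst mean accuracy.
for name, (mean, std) in sorted(results.items(),
                                key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:30s} {mean:.3f} +/- {std:.3f}")
```

Because the classes are imbalanced, accuracy alone can look good for a model that mostly predicts the majority class, so it may be worth repeating the loop with another `scoring` value.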
Libraries Used:
NumPy :- To perform a wide variety of mathematical operations on arrays.
Pandas :- Provides various data structures and operations for manipulating
numerical data.
Matplotlib & Seaborn :- For creating static, animated and interactive
visualizations.
Scikit-learn (sklearn) :- Tool for predictive data analysis. Features various
classification, regression and clustering algorithms.
Conclusion
From the results, after applying a decision tree we obtained an accuracy of 75.4 percent,
which we attribute to the very low correlation between the attributes of the dataset.
Now we can target the people who are interested in donating blood, which will result in
more volunteers and let us save more people.
For those interested, the Jupyter Notebook with all the code can be found in the GitHub
repository for this post.
Thank you
