E Book Mastering Ds For Interview
accredian
Mastering the
Data
Science
Interview
TABLE OF CONTENTS
Overview
Basic Data Science Interview Questions
www.accredian.com
MASTERING DATA SCIENCE INTERVIEW accredian
OVERVIEW
Over the years, data science has gained widespread
importance as organizations have come to depend on data.
Data is often called the new oil: when analyzed and
harnessed properly, it can be very beneficial to
stakeholders. Beyond this, a data scientist gets to work
in diverse domains, solving real-life practical problems
using current technologies.
MANVENDER SINGH
CEO | ACCREDIAN
Basic
Data Science
Interview
Questions
It involves :
Overfitting:
The model performs well only on the training sample.
When new data is given as input, its predictions are
poor, because it has learned the noise in the training
data along with the underlying pattern. This occurs when
the model has low bias and high variance. Decision trees
are particularly prone to overfitting.
Underfitting:
Here, the model is so simple that it cannot identify the
correct relationship in the data, so it performs poorly
even on the training data, and on the test data as well.
This happens when the model has high bias and low
variance. Linear regression is more prone to
underfitting, especially when the true relationship is
nonlinear.
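The contrast between the two failure modes can be sketched with a small experiment. The example below is illustrative only: the synthetic data, the mean-only model (standing in for an underfitting model) and the 1-nearest-neighbour model (standing in for an overfitting model) are assumptions chosen for brevity, not methods taken from this guide.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def make_data(n, rng):
    """Synthetic data (an assumption): y = x^2 plus Gaussian noise."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Underfitting stand-in: ignores x and always predicts the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_1nn(xs, ys):
    """Overfitting stand-in: memorises the training set (1-nearest neighbour)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


rng = random.Random(0)  # fixed seed so the run is repeatable
x_train, y_train = make_data(40, rng)
x_test, y_test = make_data(400, rng)

errors = {}
for name, fit in [("mean", fit_mean), ("1-NN", fit_1nn)]:
    model = fit(x_train, y_train)
    train_err = mse(y_train, [model(x) for x in x_train])
    test_err = mse(y_test, [model(x) for x in x_test])
    errors[name] = (train_err, test_err)
    print(f"{name:>5}: train MSE = {train_err:.2f}, test MSE = {test_err:.2f}")
```

The mean-only model has a large error on both sets (high bias), while the memorising model has zero training error but a clearly higher test error (high variance).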
Intermediate
Data Science
Interview
Questions
GRADIENT DESCENT
So, the closer the curve is to the upper left corner, the
better the model. In other words, the curve with the
greater area under it belongs to the better model.
You can see this in the graph below:
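The "area under the curve" comparison can also be made numerically. The sketch below assumes the curve in question is an ROC curve (false positive rate on the x-axis, true positive rate on the y-axis). The function names and the toy labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) by sweeping the threshold from high to low.

    labels: 1 for positive, 0 for negative; scores: higher means more positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
good_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # ranks most positives first
weak_scores = [0.4, 0.6, 0.5, 0.3, 0.7, 0.2]  # ranks positives poorly

auc_good = auc(roc_points(labels, good_scores))
auc_weak = auc(roc_points(labels, weak_scores))
print(f"good model AUC = {auc_good:.3f}, weak model AUC = {auc_weak:.3f}")
```

A model whose curve hugs the upper left corner has an AUC close to 1, while an AUC near 0.5 corresponds to random ranking.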
Database Design:
This is the process of producing a detailed data model
of a database.
Strictly speaking, database design covers the detailed
logical model of a database, but it can also include
physical design choices and storage parameters.
As you can see from the image, before the optimal point,
increasing the complexity of the model reduces the error
due to bias. Beyond the optimal point, further increases
in the complexity of the machine learning model mainly
increase the variance, so the total error rises again.
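The trade-off described above is often summarised by the standard decomposition of expected squared error. The notation is an assumption introduced here: the data follow $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, and $\hat{f}$ is the fitted model.

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Increasing model complexity typically shrinks the first term and grows the second. The optimal point is where their sum is smallest.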
In these formulas:
FP = false positive
FN = false negative
TP = true positive
TN = true negative
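The formulas these abbreviations belong to did not survive in this copy. As a hedged illustration, the sketch below computes the standard confusion-matrix metrics (precision, recall, accuracy and F1) from the four counts. Whether these are exactly the formulas the guide listed is an assumption, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives,
    false negatives, true negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The zero-denominator guards return 0.0 by convention. Other conventions, such as reporting the metric as undefined, are equally reasonable.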
Also,
RANDOM FOREST
In case the outliers are not that extreme, then we can try:
Once all the models are trained, it is time to make a
prediction. We make predictions using all the trained
models and then combine them: for regression, we average
the results; for classification, we choose the result
that the models generate most frequently.
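The two aggregation rules described above can be sketched in a few lines. The function names and the sample predictions are hypothetical, and the predictions are assumed to have already been collected from the individual trained models.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(predictions):
    """Average the numeric predictions of all models."""
    return mean(predictions)


def aggregate_classification(predictions):
    """Majority vote: return the label predicted most often.

    On a tie, Counter.most_common keeps first-seen order, so the label
    that appeared first among the tied ones wins (an arbitrary choice).
    """
    return Counter(predictions).most_common(1)[0][0]


# One prediction per trained model for a single input sample.
print(aggregate_regression([2.0, 3.0, 4.0]))
print(aggregate_classification(["spam", "ham", "spam"]))
```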
Advanced
Data Science
Interview
Questions
RMSE:
MSE:
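The formulas under these two labels are missing from this copy. A minimal sketch of the standard definitions follows: MSE is the mean of the squared differences between actual and predicted values, and RMSE is its square root. The function names and the sample values are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_i - yhat_i)^2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(MSE), in the same units as y."""
    return math.sqrt(mse(y_true, y_pred))


actual = [2, 4, 6, 8]
predicted = [1, 4, 6, 11]
print(f"MSE = {mse(actual, predicted):.3f}, RMSE = {rmse(actual, predicted):.3f}")
```

Because RMSE is expressed in the same units as the target, it is usually easier to interpret than MSE.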
Let us say that there is a wine seller who has his own
shop. He purchases wine from dealers at a low cost so
that he can sell it to customers at a high cost.

Now, let us say that the dealers are selling him fake
wine. They do this because fake wine costs far less than
the original, and the two are indistinguishable to a
normal consumer (the customer in this case).

The shop owner has some friends who are wine experts,
and he sends his wine to them every time before putting
the stock on sale in his shop. His friends give him
feedback that the wine is probably fake. Since the wine
seller has been purchasing from the same dealers for a
long time, he wants to make sure that this feedback is
right before he complains to the dealers about it.

Now, let us say that the dealers have also got a tip
from somewhere that the wine seller is suspicious of
them. In this situation, the dealers will try their best
to sell the fake wine, whereas the wine seller will try
his best to identify it. Let us see this with the help
of the diagram shown below:
The training data is split into several groups, and the
model is trained and validated against these groups in
rotation. Common methods include:
K-fold method
Leave-p-out method
Leave-one-out method
Holdout method
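The rotation idea behind the K-fold method can be sketched without any library. The helper below is a hypothetical illustration: it only produces index splits, does not shuffle, and leaves model training and scoring to the caller.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Fold sizes differ by at most one; each sample is used for
    validation exactly once across the k rounds.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for round_number, (train, validation) in enumerate(folds, start=1):
    print(f"round {round_number}: validate on {validation}, train on {train}")
```

Setting `k` equal to `n_samples` gives the leave-one-out method, while a single train/validation split corresponds to the holdout method.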
FILTER METHODS:
These methods rely only on intrinsic properties of the
features, measured with univariate statistics, rather
than on cross-validated model performance. They are
straightforward, generally faster, and require fewer
computational resources than wrapper methods.
Examples include:
Chi-Square test
Fisher’s Score method
Correlation Coefficient
Variance Threshold
Mean Absolute Difference (MAD) method
Dispersion Ratios
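Two of the listed filters, Variance Threshold and Correlation Coefficient, can be sketched together as follows. The function names, the thresholds and the toy columns are assumptions chosen for illustration. Note that no model is trained at any point, which is what makes this a filter method.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def filter_features(columns, target, min_variance=0.0, min_abs_corr=0.3):
    """Keep features that vary and correlate with the target.

    columns: dict mapping feature name -> list of values.
    Thresholds are illustrative defaults, not recommended settings.
    """
    selected = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # variance threshold: drop (near-)constant features
        if abs(pearson(values, target)) < min_abs_corr:
            continue  # correlation filter: drop weakly related features
        selected.append(name)
    return selected


columns = {
    "constant": [1, 1, 1, 1, 1],
    "signal": [1, 2, 3, 4, 5],
    "noise": [1, -1, 0, -1, 1],
}
target = [2, 4, 6, 8, 10]
selected = filter_features(columns, target)
print(selected)
```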
WRAPPER METHODS:
EMBEDDED METHODS
GRID SEARCH:
RANDOM SEARCH:
We know that,
Probability of sighting at least 1 shooting star in 15 min
= P(sighting in 15 min) = 30% = 0.3
Hence, probability of not sighting any
shooting star in 15 min = 1 - P(sighting in 15 min)
= 1 - 0.3
= 0.7
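The complement rule used above extends to longer observation windows. The sketch below assumes that consecutive 15-minute intervals are independent. The one-hour window (four intervals) is a hypothetical example, since the original question text is not present in this copy.

```python
def prob_at_least_one(p_interval, n_intervals):
    """P(at least one sighting) over n independent, identical intervals.

    Uses the complement: 1 - P(no sighting in any interval).
    """
    p_none = (1 - p_interval) ** n_intervals
    return 1 - p_none


# One 15-minute interval reproduces the given probability.
print(prob_at_least_one(0.3, 1))
# Hypothetical one-hour window: four independent 15-minute intervals,
# i.e. 1 - 0.7**4.
print(prob_at_least_one(0.3, 4))
```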
START YOUR
JOURNEY TO
BECOMING A
DATA SCIENCE
EXPERT
Now that you have a comprehensive overview of the field of Data
Science, the career opportunities that await you, and the skills you
need to get there, the next and most effective step towards achieving
your goal is to get certified and learn everything you need.
Professional Program
E&ICT IIT G - Executive Program in Data Science & AI - 12 months
E&ICT IIT G - Executive Program in Data Science & AI - 10 months
Advanced Program
E&ICT IIT G - Advanced Certification in Data Science & ML - 6 months
Basic Program
E&ICT IIT G - Certificate in Data Analytics - 3 months
accredian
INDIA
www.accredian.com