Everything Know About P Value From Scratch Data Science

This document provides an overview of p-values, how they are used and interpreted in statistics and data science. It explains that p-values represent the probability of obtaining sample results at least as extreme as the actual observed results, assuming the null hypothesis is true. Lower p-values provide stronger evidence against the null hypothesis. The document uses examples and diagrams to illustrate key concepts like statistical significance, the relationship between p-values and the threshold alpha value, and how to interpret p-values in the context of hypothesis testing. It aims to build intuition around p-values from first principles to help data scientists better explain and understand this important statistical concept.

Uploaded by

sifar


Everything you Should Know about p-value from Scratch for Data Science
INTERMEDIATE STATISTICS

Overview

What is p-value? Where is it used in data science? And how can we calculate it?
We answer all these questions and more in this article on learning p-value from scratch
This article looks at p-value from the statistics as well as the data science perspective

Introduction

Does the below scenario look familiar when you talk about p-value to aspiring data scientists?

I cannot tell you the number of times data scientists, even established ones, flounder when it comes to
explaining how to interpret a p-value. In fact, take a moment to answer these questions:

How do you interpret a p-value?

How much importance should we place on the p-value?
How will you explain the significance of p-value to a non-data science person (a stakeholder for
example)?

These are crucial questions that every data science professional should be able to answer. And in my
experience, most struggle to get past the first question. We cannot expect to convince our clients about
the result of a machine learning model if we can’t break it down for them, right?

The Wikipedia definition of p-value is daunting to anyone who is new to the world of statistics and data
science. This is how a typical conversation about p-value goes:

And you are left hanging with formulae and conventions about what to do but no clue on how to interpret
the p-value. So how do we learn p-value once and for all and indelibly ingrain it in our mind?

How we will Understand p-value from Scratch

In this article, we will start building the intuition for the p-value step-by-step from scratch and will also
debunk the traditional (mis)interpretations of the p-value. This is what we will cover:

1. What is p-value?
2. Statistical Significance
3. Example of p-value in Statistics
4. Example of p-value in Data Science
5. Some traditional (mis)interpretations of the p-value

So, let’s dive right into it.

What is p-value?

Let’s start with the absolute basics. What is p-value? To understand this question, we will pick up the
normal distribution:
We have the range of values on the x-axis and the frequency of occurrences of different values on the y-
axis. If you need a quick refresher on the concept of normal distributions, check out this article.

Now, let’s say we pick any random value from this distribution. The probability that we will pick values
close to the mean is highest as it has the highest peak (due to high occurrence values in that region). We
can clearly see that if we move away from the peak, the occurrence of the values decreases rapidly and so
does the corresponding probability, towards a very small value close to zero.

But this article is about p-values – so why are we looking at a normal distribution? Well, with respect to the
normal distribution we discussed above, consider the way we define the p-value.

p-value is the cumulative probability (area under the curve) of the values to the right of the red point in the figure above.

Or,

p-value corresponding to the red point tells us about the ‘total probability’ of getting any value to the right hand side of the red point,
when the values are picked randomly from the population distribution.
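
The definition above can be checked numerically. Here is a minimal sketch, assuming a standard normal population and a few illustrative positions for the red point (both are assumptions made for this example, not values read from the figure):

```python
from statistics import NormalDist

# Assumed for illustration: a standard normal population (mean 0, sd 1).
population = NormalDist(mu=0.0, sigma=1.0)


def right_tail_probability(point: float) -> float:
    """Area under the population curve to the right of `point`."""
    return 1.0 - population.cdf(point)


# The further the point sits from the mean, the less area lies to its right.
for red_point in (0.0, 1.5, 3.0):
    print(red_point, round(right_tail_probability(red_point), 4))
```

A point at the mean leaves half of the area to its right, and the area shrinks quickly as the point moves into the tail, which is the behaviour described for the curve above.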

Now, this might look like a very naive definition, but we will build on it as we go along.

A p-value does not mean much by itself; it only makes sense relative to the population distribution it was computed from. A large p-value implies that the sample score is more aligned with, or similar to, the population score. It is as simple as that.

Now, you might have come across the rule of thumb of comparing the p-value with the alpha value to draw
conclusions. So let’s look into the alpha value.

Statistical Significance of the p-value: Enter – Alpha value

I’ve mentioned the alpha value, also known as the significance level, a few times so far. This is a value that
we know to be 0.05 or 5% for some unknown reason.

We are also taught in statistics classes the convention that p-value being less than alpha means that the
results obtained are statistically significant. But what in the world is the alpha value?

So, let’s spend a moment to look at what the alpha value signifies.

Alpha value is nothing but a threshold p-value, which the group conducting the test/experiment decides upon before conducting a test of
similarity or significance (a Z-test or a T-test).

This means that if the probability of getting a sample score at least this extreme is less than alpha, the threshold p-value,
we consider it significantly different from the population, or even belonging to some new sample
distribution.
Consider the above normal distribution again. The red point in this distribution represents the alpha value
or the threshold p-value. Now, let’s say that the green and orange points represent different sample results
obtained after an experiment.

We can see in the plot that the leftmost green point has a p-value greater than the alpha. As a result, values
like this can be obtained with fairly high probability, and the sample results can be attributed to chance.

The point on the rightmost side (orange) has a p-value less than the alpha value (red). As a result, the
sample results are a rare outcome and unlikely to be due to chance alone. Therefore, they are considered
significantly different from the population.

The alpha value is decided depending on the test being performed. An alpha value of 0.05 is a common
convention if we are not sure of what value to use.

But this comes with an asterisk: the smaller the value of alpha we choose, the harder it is for the results
to count as significant. Keep in mind that the alpha value will vary from experiment to experiment and
there is no alpha value that works as a universal rule of thumb.
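
To see why a smaller alpha makes significance harder to reach, it helps to translate alpha into the position of the threshold point on the curve. The sketch below assumes a standard normal distribution and a one-sided (right-tail) test; the helper names `critical_z` and `is_significant` are made up for this illustration:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, sd 1 (assumed for illustration)


def critical_z(alpha: float) -> float:
    """Z-score whose right-tail area equals alpha (one-sided test)."""
    return standard_normal.inv_cdf(1.0 - alpha)


def is_significant(p_value: float, alpha: float) -> bool:
    """Decision rule: significant only when the p-value is below alpha."""
    return p_value < alpha


# A smaller alpha pushes the threshold point further into the right tail.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f} -> threshold z = {critical_z(alpha):.3f}")

# The same p-value can pass one threshold and fail a stricter one.
print(is_significant(0.03, alpha=0.05), is_significant(0.03, alpha=0.01))
```

The same result (here a p-value of 0.03, chosen arbitrarily) is significant at alpha = 0.05 but not at alpha = 0.01, which is why alpha should be fixed before the experiment.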

Let’s look at the relationship between the alpha value and the p-value closely.

p-value < alpha

Consider the following population distribution:

Here, the red point represents the alpha value. This is basically the threshold p-value. We can clearly see
that the area under the curve to the right of the threshold is very low.

The orange point represents the p-value calculated from the sample. In this case, we can clearly see that
the p-value is less than the alpha value (the area to the right of the red point is larger than the area to the
right of the orange point). This can be interpreted as:

The results obtained from the sample are an extremity of the population distribution (an extremely rare event), and hence there is a good
chance they may belong to some other distribution (as shown below).

Considering our definitions of alpha and the p-value, we consider the sample results obtained as
significantly different. We can clearly see that the p-value is far less than the alpha value.

p-value > alpha:

Right – I feel you should answer this question before reading further. Now that you know the other side of
this coin, you will be able to think of the outcome of this scenario.

A p-value greater than the alpha means that the results are consistent with the null hypothesis, and therefore we
fail to reject it. Such a result does not support the alternate hypothesis (that the obtained results are from another
distribution): the results are not significant and could simply be a matter of chance or luck.

Again, consider the same population distribution curve with the red point as alpha and the orange point as
the calculated p-value from the sample:
So, p-value > alpha (considering the area under the curve to the right-hand side of the red and the orange
points) can be interpreted as follows:

The sample results are a reasonably probable event under the population distribution and could plausibly have been obtained by luck.

We can clearly see that the area under the population curve to the right of the orange point is much larger
than the alpha value. This means that the obtained results are more likely to be part of the same
population distribution than being a part of some other distribution.

Now that we have understood the interpretation of the p-value and the alpha value, let’s look at a classic
example from the world of statistics.

Example of p-value in Statistics

In the National Academy of Archery, the head coach intends to improve the performance of the archers
ahead of an upcoming competition. What do you think is a good way to improve the performance of the
archers?

He proposed and implemented the idea that breathing exercises and meditation before the competition
could help. The statistics before and after experiments are below:
Interesting. The results favor the assumption that the overall score of the archers improved. But the coach
wants to make sure that these results are because of the improved ability of the archers and not by luck or
chance. So what do you think we should do?

This is a classic example of a similarity test (Z-test in this case) where we want to check whether the
sample is similar to the population or not. I will not go deep into the similarity test since that is out of the
scope of this article.

In order to solve this, we will follow a step-by-step approach:

1. Understand the information given and form the alternate and null hypothesis
2. Calculate the Z-score and find the area under the curve
3. Calculate the corresponding p-value
4. Compare the p-value and the alpha value
5. Interpret the final results

A solution to this Problem

Step 1: Understand the given information

Population Mean = 74
Population Standard Deviation = 8 (Historical data of the last 10 years is associated with the population)
Sample Mean = 78
Sample Size = 60 (Here, the sample is associated with the archers who practiced breathing exercises
and meditation)

We have the population mean and standard deviation with us and the sample size is over 30, which means
we will be using the Z-test.

According to the problem above, there can be two possible conditions:

1. The after-experiment results are a matter of luck, i.e. mean before and after experiment are similar.
This will be our “Null Hypothesis”
2. The after-experiment results are indeed very different from the pre-experiment ones. This will be our
“Alternate Hypothesis”

Step 2: Calculating the Z-Score

We will now calculate the Z-Score using the following formula:

Z = (M - X) / (Sigma / sqrt(n))

What do the symbols stand for, you ask? Well, here you go:

X = Population Mean
M = Sample Mean
Sigma = Population Standard Deviation
n = number of sample instances

On plugging in the corresponding values, the Z-Score comes out to be 3.87.
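
The arithmetic can be reproduced in a few lines. This is a small sketch using only the numbers given in Step 1; the variable names are chosen for this example:

```python
import math

population_mean = 74.0  # X
population_sd = 8.0     # Sigma
sample_mean = 78.0      # M
sample_size = 60        # n

# Standard error of the mean: how much a sample mean of size n typically varies.
standard_error = population_sd / math.sqrt(sample_size)

# Z-score: distance of the sample mean from the population mean, in standard errors.
z_score = (sample_mean - population_mean) / standard_error
print(round(z_score, 2))  # 3.87
```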

Step 3: Referring to the Z-table and finding the p-value:

If we look up the Z-table for 3.87, we get a value of ~0.999. This is the area under the curve or probability
under the population distribution. But this is the probability of what?
The probability that we obtained is to the left of the Z-score (Red Point) which we calculated. The value
0.999 represents the “total probability” of getting a result “less than the sample score 78”, with respect
to the population.

Here, the red point signifies where the sample mean lies with respect to the population distribution. But we
have studied earlier that p value is to the right-hand side of the red point, so what do we do?

For this, we will use the fact that the total area under the normal Z distribution is 1. Therefore the area to
the right of Z-score (or p-value represented by the unshaded region) can be calculated as:

p-value = 1 – 0.999

p-value = 0.001

0.001 (p-value) is the unshaded area to the right of the red point. The value 0.001 represents the “total
probability” of getting a result “greater than the sample score 78”, with respect to the population, assuming the null hypothesis is true.
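
The table lookup can also be done in code. The sketch below uses Python's standard library instead of a printed Z-table, so it works with the unrounded Z-score rather than the rounded table value of ~0.999:

```python
import math
from statistics import NormalDist

# Z-score from Step 2, recomputed from the given numbers.
z_score = (78.0 - 74.0) / (8.0 / math.sqrt(60))

# Area to the left of the Z-score under the standard normal curve.
area_left = NormalDist().cdf(z_score)

# The total area is 1, so the right-tail area (the p-value) is the remainder.
p_value = 1.0 - area_left

print(f"area to the left = {area_left:.5f}")
print(f"p-value (right)  = {p_value:.5f}")
```

Computed this way, the right-tail area is roughly 0.00005, smaller than the 0.001 obtained from the rounded table value. The conclusion in the next step is the same either way, since both numbers are far below 0.05.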

Step 4: Comparing p-value and alpha value:

We were not given any value for alpha, therefore we can consider alpha = 0.05. According to our
understanding, if the probability of obtaining the sample result (the p-value) is less than the alpha value, we
consider the sample results obtained as significantly different.

We can clearly see that the p-value is far less than the alpha value:

0.001 (red region) << 0.05 (orange region)

This says that obtaining a mean of 78 is a rare event with respect to the population
distribution. Therefore, it is reasonable to conclude that the increase in the performance of the archers in the
sample is unlikely to be the result of luck alone. The sample appears to belong to some other (better, in this
case) distribution.

Example of p-value in Data Science

Now, this is the section I’m sure you’ve been waiting for. Using p-value in statistics is understandable and
we’ve even heard of it plenty of times. But where does p-value fit in the data science spectrum?
Even though many aspiring data scientists understand what the p-value means, they do not know how to
use this knowledge in Data Science. As a result, they miss out on a significantly powerful method of
improving their models.

P-value is an important metric in the process of feature selection. In feature selection, we try to find out the best subset of the
independent variables to build the model.

Now you might ask, “Why not just throw in all the independent variables?”

Actually, throwing in redundant and non-contributing variables adds complexity to the model. Moreover,
they can reduce the model performance in terms of accuracy, runtime and even memory footprint.

Note: If you need a refresher on feature selection, refer to the below tutorial:

Feature selection methods with example.

Let’s look at an example. Consider that I have a dataset that contains information about different startups.
We have the below variables:

Our aim is to predict the profits earned by the startups based on the rest of the independent variables.
Now, your intuition might say – use all the independent variables available to build a linear regression
model.

After preprocessing and OneHotEncoding, the independent variables have the following mapping:

Next, we will build an OLS (ordinary least squares) model using the statsmodels library. Here’s what we get:

This table displays all the statistics regarding the independent variables. But right now, we are only
interested in looking at the column with the p-values in it. We can clearly see that “R&S Spend”,
“Administration” and “State_California” have a p-value over 0.50!
But the question is, what does this p-value mean in a regression model? For that, let’s understand what’s
the hypothesis for which these p-values are calculated:

Null Hypothesis: The independent variable has no significant effect on the target variable
Alternate Hypothesis: The independent variable has a significant effect on the target variable

Now, the above results show no significant effect of “R&S Spend”, “Administration” and “State_California”
on the “Profit” earned by the startups. So let’s start by removing these three variables from the
model.

The resultant mapping after removing those variables is:

On again building the OLS model using the statsmodels library, this is what we get:

We can see that there is now only one variable left over the value of 0.05 – “State_Florida”. So should we
remove it?

For starters, we never decided on an alpha value. If we were to take an alpha value of 0.05, the variable
“State_Florida” would have been eliminated. If I had selected an alpha of 0.10, the variable would
have survived the filtration process.

In this case, I will let it stay, considering that 0.05 is not a fixed rule for choosing the alpha value.

The most important thing to note in this model summary is that although we have removed independent variables, the
adjusted R-Square value went up.

This is a two-fold effect as we discussed previously. With the help of p-value, we not only made a simpler
model with fewer variables, but we also improved the model’s performance.

Before wrapping up this article, let’s look at different ways p-values are misinterpreted by a lot of data
science professionals and statisticians.

Some traditional (mis)interpretations of the p-value

There are many ways I have seen people misinterpreting the p-value. Here are just a few of the most
common mistakes:

1. The probability that we would reject the null hypothesis incorrectly: Although a low p-value promotes
the rejection of the null hypothesis, it says nothing about the probability that this rejection is a mistake
2. The level of statistical significance: We choose the significance level before we perform the
experiment. If the p-value satisfies our level of significance (p < alpha), only then can we make
conclusions
3. The magnitude of the effect of intervention: p-value by no means signifies the magnitude of the
intervention in the sample which was introduced during the experimentation
4. The probability that the null hypothesis is true: This comes close and might not cause much harm, but
it will still be a source of confusion. To talk about a null hypothesis being true using a frequentist
statistic is impossible. A high p-value means that our data is highly consistent with our null
hypothesis, nothing more
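
Point 3 in the list above is easy to demonstrate. The sketch below reuses the population from the archery example (mean 74, standard deviation 8) and assumes a hypothetical improvement of just one point; the sample sizes are made up for illustration:

```python
import math
from statistics import NormalDist


def one_sided_p_value(sample_mean, population_mean, population_sd, n):
    """Right-tail p-value of a one-sample Z-test."""
    z = (sample_mean - population_mean) / (population_sd / math.sqrt(n))
    return 1.0 - NormalDist().cdf(z)


# Same effect size (74 -> 75), two different sample sizes.
for n in (60, 1000):
    p = one_sided_p_value(75.0, 74.0, 8.0, n)
    print(f"n={n:4d}  effect=1.0 point  p-value={p:.5f}")
```

The effect is identical in both runs, yet the p-value is well above 0.05 for the small sample and far below it for the large one. The p-value reflects how surprising the data are under the null hypothesis, not how large the effect is.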

And there are many more! Keep these in mind and you’ll do well the next time you encounter p-value in your
work.

End Notes

In this article, we followed a step-by-step procedure to understand p-value thoroughly by introducing one
parameter at a time. P-value can be very intriguing to a new statistician or a data scientist. But having
worked through one example in statistics and one in data science, I believe we can now
explain p-value confidently to anyone, without depending on complex definitions or
conventions that we accepted only because no one ever explained them to us.

If you want to learn more, check out the following courses:

Introduction to Data Science

Applied Machine Learning: Beginner to Professional

Article Url - https://www.analyticsvidhya.com/blog/2019/09/everything-know-about-p-value-from-scratch-data-science/

Sharoon Saxena
Passionate about learning new things everyday, well versed with Machine Learning and Data Science
and an Avid Reader. Setting sights on Reinforcement Learning and Game Theory, I could see Artificial
General Intelligence on the Horizon.
