
Statistics

16 – Working with Data

Frequency density = frequency / class width
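As a quick illustration of this formula, here is a minimal Python sketch; the class boundaries and frequency used are made-up values, not from the notes.

```python
# Minimal sketch: frequency density for one grouped class.
# The class boundaries (20-30) and frequency (12) are illustrative values.
frequency = 12            # number of items in the class
class_width = 30 - 20     # upper boundary minus lower boundary
frequency_density = frequency / class_width
print(frequency_density)  # 1.2
```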

The range is the difference between the largest and smallest values in a data set, and the
interquartile range is the difference between the upper and lower quartiles. While both are
measures of spread, neither considers all the values in the data.

To counter this, we can use the standard deviation, usually given by the symbol σ.

Let’s consider the data 2, 5, 8, which has a mean (usually denoted by x̄) of 5.

We can look at the difference of each data point from the mean:

(x represents the data point)

x      x − x̄
2      −3
5      0
8      3

The mean of the differences will always be 0, as the negatives will cancel out the positives.
This means that it cannot be used as a measure of spread. Because of this, we can square
the differences to ensure that they are non-negative.
x      (x − x̄)²
2      9
5      0
8      9

The average is given by adding all the values of ( x - x̄ )² and dividing by n , the number of data
items. The symbol for adding up all the values is Σ.

In our case, the average would be (9 + 0 + 9)/3 = 18/3 = 6.
However, we need to undo the squaring to ensure the measure has the same units as
x. This means that the standard deviation for our data is √6.


Standard deviation: σ = √( Σ(x − x̄)² / n )

Standard deviation can also be thought of as:

σ = √( Σx² / n − x̄² )

Which can also be remembered in words as ‘the mean of the squares minus the square of the mean’ (this quantity is the variance; taking the square root gives σ):

σ² = mean(x²) − (mean(x))²

Variance (σ²) is the square of the standard deviation and has very useful mathematical
properties.
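To tie the two forms of the formula together, here is a short Python sketch using the example data 2, 5, 8 from above; both forms should give √6.

```python
# Sketch of the two standard deviation formulas from the notes,
# using the example data 2, 5, 8 (mean x̄ = 5, σ = √6).
from math import sqrt

data = [2, 5, 8]
n = len(data)
mean = sum(data) / n

# Form 1: square root of the mean squared deviation from the mean
sigma_1 = sqrt(sum((x - mean) ** 2 for x in data) / n)

# Form 2: square root of (mean of the squares minus the square of the mean)
sigma_2 = sqrt(sum(x ** 2 for x in data) / n - mean ** 2)

print(sigma_1, sigma_2, sqrt(6))  # all three agree: 2.449...
```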

Calculations from frequency tables:

x̄ = Σfx / n
Where f is the frequency of each x value and n is the total frequency

σ² = Σfx² / n − x̄²
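A hedged sketch of these frequency-table calculations in Python; the x values and frequencies below are invented purely for illustration.

```python
# Sketch: mean and variance from a frequency table.
# The x values and frequencies are illustrative, not from the notes.
from math import sqrt

xs = [1, 2, 3, 4]   # data values
fs = [3, 5, 7, 5]   # frequency of each x value
n = sum(fs)         # total frequency

mean = sum(f * x for x, f in zip(xs, fs)) / n                      # x̄ = Σfx / n
variance = sum(f * x ** 2 for x, f in zip(xs, fs)) / n - mean ** 2  # σ² = Σfx²/n − x̄²
sigma = sqrt(variance)

print(mean, variance, sigma)
```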

Now, let’s look at whether there is a relationship between two variables. Data that comes in
pairs in this fashion is said to be bivariate. When we have these two sets of data, there may
or may not be a relationship between them. We can describe the relationship between them
by investigating their correlation.

However, instead of describing the correlation with words, we can use a numerical value, the
correlation coefficient, r, which can only take values in the range −1 ≤ r ≤ 1.
 Strong positive correlation: as x increases, y generally increases; r ≈ 1

 Strong negative correlation: as x increases, y generally decreases; r ≈ −1

 No correlation: no clear linear relationship between x and y; r ≈ 0

If there is perfect correlation, r =±1


However, just because r ≈ 0 doesn't mean that there is no relationship between the two
variables – it just means that there is no linear relationship.
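The notes do not state a formula for r, but as a sketch, the standard Pearson definition, r = Σ(x − x̄)(y − ȳ) / √( Σ(x − x̄)² · Σ(y − ȳ)² ), can be computed directly; the bivariate data below is made up for illustration only.

```python
# Sketch: Pearson's correlation coefficient r for bivariate data,
# using the standard definition (an assumption; the notes give no formula).
from math import sqrt

xs = [1, 2, 3, 4, 5]   # illustrative x values
ys = [2, 4, 5, 4, 6]   # illustrative paired y values

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)
print(r)  # about 0.85 here: strong positive correlation
```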

Scatter diagrams can also reveal whether there are two separate groups within the data.

However, you must remember that correlation does not equal causation. Such correlation
may be due to a coincidence, or due to a third hidden variable. For example, there might be
a strong correlation between ice cream sales and number of swimmers at a beach. Clearly,
eating ice cream doesn’t make you want to swim; instead, the hidden variable of
temperature could cause both to rise.
When working with real-world data, there may be errors, missing data, or extreme values
that can distort results.

Often the most useful thing to do is to look at your data graphically; if the underlying
pattern is strong, outliers become obvious.

There are also some calculations you can do to check for outliers:

 An outlier is any number more than 1.5 interquartile ranges away from the nearest
quartile

 An outlier is any value more than 2 standard deviations away from the mean

Once an outlier has been spotted, you must then decide whether to include it in your
calculations. This often requires you to look at the data in context:

 If an outlier is clearly an error (e.g. wrong units or an impossible value) then it should be excluded from the data

 If there are several outliers, they might form a distinctly different group which should be
analysed separately.
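As a rough illustration of both outlier checks above, here is a Python sketch; the data set is invented, and the quartile convention used by Python’s statistics.quantiles may differ slightly from the textbook method.

```python
# Sketch: flagging outliers using the two rules from the notes.
from statistics import quantiles, mean, pstdev

data = [12, 14, 15, 15, 16, 17, 18, 45]  # illustrative data; 45 looks extreme

q1, _, q3 = quantiles(data, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1
mu = mean(data)
sigma = pstdev(data)              # population standard deviation, as in the notes

# Rule 1: more than 1.5 interquartile ranges beyond the nearest quartile
iqr_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Rule 2: more than 2 standard deviations away from the mean
sd_outliers = [x for x in data if abs(x - mu) > 2 * sigma]

print(iqr_outliers)  # [45]
print(sd_outliers)   # [45]
```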
17 – Probability

Events are mutually exclusive if they cannot both happen at the same time, e.g. rolling a 6
and a 5 on a die in one roll. If events A and B are mutually exclusive:

P(A ∧ B) = 0

P(A ∨ B) = P(A) + P(B)

If A and B are independent (one happening does not affect the other):

P(A ∧ B) = P(A) × P(B)

For any event A:

P(A) + P(not A) = 1
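A small Python sketch checking these rules with the die example from above; exact fractions are used for clarity.

```python
# Sketch: probability rules for the die example (A = "roll a 6", B = "roll a 5").
from fractions import Fraction

p_a = Fraction(1, 6)
p_b = Fraction(1, 6)

# Mutually exclusive: both cannot happen on the same roll
p_a_and_b = Fraction(0)
p_a_or_b = p_a + p_b       # addition rule: 1/3

# Independent events (e.g. two separate rolls): multiply probabilities
p_6_then_5 = p_a * p_b     # 1/36

# Complement rule
p_not_a = 1 - p_a          # 5/6

print(p_a_and_b, p_a_or_b, p_6_then_5, p_not_a)
```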

Binomial distribution
