BES Summary

Dillon Pretorius
August 23, 2016

1 Chapter 1: Introduction to Data
1.1 Terminology and Concepts
• Data - Observations
• Summary statistic - Single number summarising large amounts of data.

• Data matrix - A table with each specific case on a row and variables in
the columns.
• Types of variables
– Numerical - Can take wide range of numerical values (can add/subtract/take
averages)
∗ Discrete - Can only take specific values with jumps.
∗ Continuous - Can take any value
– Categorical - Non-numerical. Can only be specific values, called lev-
els. If levels have a natural ordering, they are ordinal.
• Associated/Dependent variables - Variables showing some connection
between them. Either positively or negatively associated.
• Independent variables - Variables not associated with each other.

• Population - All the cases in question. A sample is a subset of these
cases, usually a small fraction.
• Anecdotal Evidence - Data collected haphazardly and possibly represent-
ing isolated, extraordinary occurrences. Unreliable.

• Bias - Can occur because of non-response (i.e. when people selected
for a sample don’t respond) and convenience sampling (individuals who
are more easily accessible are more likely to be selected).
• Explanatory and Response variables - Explanatory variables are variables
which we suspect may affect the response variables.

• Observational Study - When data is collected in a way that doesn’t directly
interfere with how the data arises. Shows natural associations, but not
causal connections.
• Placebo - Fake treatment

• Blind Study - When patients do not know which treatment they are receiv-
ing. When the researcher also doesn’t know, it is called a double-blind
setup.
• Confounding Variable - A variable correlated with both explanatory and
response variables. Must be taken into consideration before making
statements of causality.
• Prospective and Retrospective studies - Observational studies done as
events unfold, or afterwards, respectively.

• Random Sampling
– Simple - Each case in population has equal chance of selection
– Stratified - Similar cases are grouped into strata then simple sam-
pling occurs within each stratum.
– Cluster - Observations grouped into clusters, then random entire clus-
ters are selected.
– Multistage - Same as cluster, but only a simple random sample is
taken from each cluster instead of the whole cluster.
• Experiment - Conducted with suspected explanatory and response vari-
ables. Checks for causal connection. Researchers assign treatments to
cases. Differences between groups are controlled. Cases are randomised
into treatment groups to even out uncontrollable variable differences. Large
samples of cases are preferable for replication, either in a single study or
by multiple groups of scientists. Cases may first be grouped by variables
that may influence results; this is called blocking.
• Scatter plot - Type of graph useful for relationships between variables.
• Dot plot - A one-variable scatter plot

• Histogram - Data organised into bins, drawn as bars of height representing
the number of cases in each bin. Shows data density.
• Data distribution - When the tail of the data trails to the left or the right,
it is left skewed or right skewed respectively. If data trails off equally
on each side it is symmetric.
• Mean - The average.
• Mode - Most commonly occurring value, the peak of a histogram. Either
unimodal, bimodal, or multimodal for one, two, or more peaks.

• Median - The middle value, when the data is sorted in order.


• Standard deviation - Roughly the typical distance of data from the mean
(the square root of the variance). Describes variability.
• Box plot - Middle line representing the median. Box representing the
IQR (also represents variability). Whiskers extend up to 1.5 ∗ IQR beyond
the box; observations beyond that reach are called outliers. Whiskers end
at the last point that is not an outlier.
• Interquartile Range (IQR) - Distance between 25th and 75th percentiles
(Q1 and Q3 ).

• Robust estimates - Median and IQR, extreme observations have little ef-
fect.
• Transformations - When some mathematical function is applied to a vari-
able.

• Intensity map - Shows geographical data with colours representing values
according to a key.

• Contingency table - Summarises data for two categorical variables.
Numbers represent counts of combinations. Row and column totals for each
variable are also present. Single variable tables are called frequency
tables.
• Bar plot - Plots a single categorical variable, either against the count, or
the proportion of the total.
• Segmented bar plot - Plots contingency table information. Can be
standardised so all bars are the same height.
• Mosaic plot - Plots contingency table information. Box area represents
number of observations.

• Pie charts - Don’t be that guy.


• Numerical data group comparisons - Either side-by-side box plots which
are box plots next to each other drawn on the same scale, or hollow his-
tograms, which are outlines of each group’s histograms on the same plot.
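
The box plot, IQR and outlier definitions above can be checked on a small data set. The following is a minimal sketch, assuming Python's built-in statistics module and its "inclusive" quartile method (other software may compute quartiles slightly differently). The function name box_plot_summary and the sample values are made up for illustration.

```python
import statistics

def box_plot_summary(data):
    """Return the quantities a box plot displays for a list of numbers."""
    # Q1, median and Q3 are the 25th, 50th and 75th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers may reach at most 1.5 * IQR beyond the box.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the last point that is not an outlier.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical sample with one extreme observation.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)
print(summary)
```

Note that the extreme value 30 is flagged as an outlier but leaves the median and IQR untouched, which is what makes them robust estimates.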

1.2 Formulae
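\[ \text{Proportion} = \frac{\text{Count}}{\text{Total}} \]

\[ \text{Mean} = \bar{x} = \frac{\text{sum of observations}}{\text{number of observations}} \]

\[ \text{Deviation} = x - \bar{x} \]

\[ \text{Variance} = s^2 = \frac{\text{sum of deviations squared}}{n - 1} \]

\[ \text{Standard deviation} = s = \sqrt{\text{variance}} \]

\[ \text{IQR} = Q_3 - Q_1 \]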
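
As a worked example of the formulae in this section, the sketch below computes each quantity step by step and compares the results against Python's statistics module. The observations are invented for illustration and all variable names are assumptions.

```python
import math
import statistics

# Hypothetical observations.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(observations)

# Proportion = count / total (here: the share of observations equal to 4).
proportion_of_fours = observations.count(4) / n

# Mean = sum of observations / number of observations.
mean = sum(observations) / n

# Deviation = x - mean, one per observation.
deviations = [x - mean for x in observations]

# Variance = sum of deviations squared / (n - 1).
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

print(f"proportion={proportion_of_fours}, mean={mean}, "
      f"variance={variance:.4f}, std_dev={std_dev:.4f}")
```

Dividing by n − 1 rather than n matches statistics.variance and statistics.stdev, which also compute the sample versions.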

2 Chapter 2: Foundation for Inference
2.1 Terminology and Concepts
• Point estimate - A single value given as an estimate of a population
parameter.
• Hypothesis test - Statistical technique used to evaluate opposing claims
using data. Should be set up before seeing the data, to avoid choosing
one- or two-sided incorrectly.
– Frame research question in terms of hypotheses (H0 and HA )
– Collect data (observational study or experiment)
– Analyse data (e.g. with p-value)
– Form conclusion (e.g. comparison of p-value to α)
• Null Hypothesis (H0 ) - A skeptical perspective of no-difference. Any rela-
tionship caused by chance. If this hypothesis strongly disagrees with the
data, we reject it in favour of the alternative hypothesis.
• Alternative Hypothesis (HA ) - The variables are not independent. Differ-
ence was not due to chance.
• Statistical Inference - Practice of making decisions and conclusions from
data in the context of uncertainty.
• Randomisation - Simulating the null hypothesis and calculating the prob-
ability of the observed difference occurring by chance.
• p-value - Probability of observing data at least as favourable to HA as
our current data set if the null hypothesis were true. It is computed from
a test statistic (e.g. a Z-score) but is not itself the test statistic.
• Statistical significance - If the p-value is lower than some significance
level, usually α = 0.05, it means the data provides strong enough evidence
against H0 that we reject it in favour of HA . We say it is statistically
significant.
• Decision errors - If we reject H0 when it is actually true, we have made
a Type 1 error. If we fail to reject H0 when HA is actually true, we have
made a Type 2 error. Depending on which error is more costly in a
practical implementation, the significance level can be adjusted. A
smaller α makes Type 1 errors less likely and a larger α makes Type 2
errors less likely.
• Confirmation bias - Looking for data that supports our ideas. Setting an
alternative hypothesis that agrees with our worldview.

• Two-sided hypothesis tests - HA is taken as any deviation from the norm,
positive or negative. This is a more rigorous and open-minded test and
should be the default. The p-value is the sum of the probabilities in both
tails of the distribution (just twice the one-tail probability when the
distribution is symmetric).
• Parameter - The "true" value of interest, usually estimated by a point
estimate.

• Null value - Reference value for H0 .
• Central limit theorem - If we look at a proportion (or difference in propor-
tions) and the scenario satisfies certain conditions (observations in sam-
ple are independent and the sample is large enough), then the sample
proportion will appear to follow a bell-shaped curve called the normal
distribution.
• Normal distribution - Symmetric, unimodal, bell curve. Also called normal
curve, or normal model. Its exact shape depends on two parameters, the
mean µ and the standard deviation σ. If µ = 0 and σ = 1 then it’s called the
standard normal distribution. A normal distribution can be written
in terms of these parameters: N (µ, σ)
• Z-Score - The number of standard deviations an observation is above or
below the mean. Can be used to identify how unusual an observation is.

• Percentile - The percentage of observations falling below a certain
value.
• Normal probability table - Maps Z-Scores to percentiles. (Given for tests
and exams. Included at the end of this document). Remember, the
area on the left = 1 − the area on the right. Always begin questions using
this by drawing a picture of the distribution and the area(s) in question.

• Normal approximation - How closely real data follows the normal distri-
bution can be seen by the shape of its histogram, or how close the points
are to a straight line in a normal probability plot.
• Standard Error (SE) - Standard deviation associated with the estimate
(ie. a point estimate). This will either be given, or a formula to calculate
it will be provided.

• Confidence interval - Plausible range of values for a population parameter.
(e.g. for a 95% confidence interval, we are 95% confident that this interval
contains the true value.)
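
The randomisation idea from the list above (simulate the null hypothesis, then see how often chance alone produces a difference at least as large as the observed one) can be sketched in a few lines. This is an illustrative sketch only: the treatment and control outcomes are invented, the function names are assumptions, and the test shown is one-sided.

```python
import random

def diff_in_proportions(group_a, group_b):
    """Difference in success proportions between two lists of 0/1 outcomes."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

def randomisation_p_value(group_a, group_b, simulations=10_000, seed=0):
    """Estimate a one-sided p-value by shuffling outcomes between the groups."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = diff_in_proportions(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(simulations):
        # Under H0 the group labels are irrelevant, so reassign them at random.
        rng.shuffle(pooled)
        sim_a = pooled[:len(group_a)]
        sim_b = pooled[len(group_a):]
        if diff_in_proportions(sim_a, sim_b) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / simulations

# Hypothetical experiment: 1 = success, 0 = failure.
treatment = [1] * 14 + [0] * 6   # 14 of 20 succeed
control = [1] * 6 + [0] * 14     # 6 of 20 succeed

p_value = randomisation_p_value(treatment, control)
print(f"estimated one-sided p-value: {p_value:.3f}")
```

If the estimated p-value falls below the chosen α, the observed difference would be called statistically significant.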

2.2 Formulae
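In General

\[ \text{Z-Score} = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} \]

In Context of Hypothesis Test

\[ \text{Z-Score} = \frac{\text{point estimate} - \text{null value}}{\text{standard error}} \]

\[ \text{confidence interval} = \text{point estimate} \pm \text{margin of error} \]

\[ \text{margin of error} = z^{\star} \times SE \]

Here z⋆ is the Z-Score matching the selected confidence level (1.96 for 95%).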
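
A short worked example of these formulae, using statistics.NormalDist from the Python standard library in place of the printed table. The point estimate, null value and standard error below are invented numbers, and the function names are assumptions made for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_score(point_estimate, null_value, standard_error):
    """Z-Score in the context of a hypothesis test."""
    return (point_estimate - null_value) / standard_error

def two_sided_p_value(z):
    """Sum of both tail areas; twice one tail because the curve is symmetric."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

def confidence_interval(point_estimate, standard_error, confidence=0.95):
    """point estimate +/- z* x SE, with z* taken from the normal model."""
    z_star = standard_normal.inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error

# Hypothetical values: sample proportion 0.56, H0 says 0.50, SE = 0.025.
point_estimate, null_value, standard_error = 0.56, 0.50, 0.025

z = z_score(point_estimate, null_value, standard_error)
p_value = two_sided_p_value(z)
low, high = confidence_interval(point_estimate, standard_error)

print(f"Z = {z:.2f}, two-sided p-value = {p_value:.4f}")
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

With these numbers the p-value is below α = 0.05 and the 95% interval excludes the null value 0.50, so both approaches point to the same conclusion.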

Formulas

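\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{var} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad s = \sqrt{\text{var}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

\[ Q_1 - 1.5 \times IQR \qquad Q_3 + 1.5 \times IQR \qquad Z = \frac{x - \mu}{\sigma} \]

Confidence interval: \( \bar{x} \pm z^{\star} \times SE \)    95% confidence interval: \( \bar{x} \pm 1.96 \times SE \)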

BES 220 6 of 8 Semester test 1 – 24 August 2016


Normal probability table

negative Z

Second decimal place of Z


0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00 Z
0.0002 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 −3.4
0.0003 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0005 0.0005 0.0005 −3.3
0.0005 0.0005 0.0005 0.0006 0.0006 0.0006 0.0006 0.0006 0.0007 0.0007 −3.2
0.0007 0.0007 0.0008 0.0008 0.0008 0.0008 0.0009 0.0009 0.0009 0.0010 −3.1
0.0010 0.0010 0.0011 0.0011 0.0011 0.0012 0.0012 0.0013 0.0013 0.0013 −3.0
0.0014 0.0014 0.0015 0.0015 0.0016 0.0016 0.0017 0.0018 0.0018 0.0019 −2.9
0.0019 0.0020 0.0021 0.0021 0.0022 0.0023 0.0023 0.0024 0.0025 0.0026 −2.8
0.0026 0.0027 0.0028 0.0029 0.0030 0.0031 0.0032 0.0033 0.0034 0.0035 −2.7
0.0036 0.0037 0.0038 0.0039 0.0040 0.0041 0.0043 0.0044 0.0045 0.0047 −2.6
0.0048 0.0049 0.0051 0.0052 0.0054 0.0055 0.0057 0.0059 0.0060 0.0062 −2.5
0.0064 0.0066 0.0068 0.0069 0.0071 0.0073 0.0075 0.0078 0.0080 0.0082 −2.4
0.0084 0.0087 0.0089 0.0091 0.0094 0.0096 0.0099 0.0102 0.0104 0.0107 −2.3
0.0110 0.0113 0.0116 0.0119 0.0122 0.0125 0.0129 0.0132 0.0136 0.0139 −2.2
0.0143 0.0146 0.0150 0.0154 0.0158 0.0162 0.0166 0.0170 0.0174 0.0179 −2.1
0.0183 0.0188 0.0192 0.0197 0.0202 0.0207 0.0212 0.0217 0.0222 0.0228 −2.0
0.0233 0.0239 0.0244 0.0250 0.0256 0.0262 0.0268 0.0274 0.0281 0.0287 −1.9
0.0294 0.0301 0.0307 0.0314 0.0322 0.0329 0.0336 0.0344 0.0351 0.0359 −1.8
0.0367 0.0375 0.0384 0.0392 0.0401 0.0409 0.0418 0.0427 0.0436 0.0446 −1.7
0.0455 0.0465 0.0475 0.0485 0.0495 0.0505 0.0516 0.0526 0.0537 0.0548 −1.6
0.0559 0.0571 0.0582 0.0594 0.0606 0.0618 0.0630 0.0643 0.0655 0.0668 −1.5
0.0681 0.0694 0.0708 0.0721 0.0735 0.0749 0.0764 0.0778 0.0793 0.0808 −1.4
0.0823 0.0838 0.0853 0.0869 0.0885 0.0901 0.0918 0.0934 0.0951 0.0968 −1.3
0.0985 0.1003 0.1020 0.1038 0.1056 0.1075 0.1093 0.1112 0.1131 0.1151 −1.2
0.1170 0.1190 0.1210 0.1230 0.1251 0.1271 0.1292 0.1314 0.1335 0.1357 −1.1
0.1379 0.1401 0.1423 0.1446 0.1469 0.1492 0.1515 0.1539 0.1562 0.1587 −1.0
0.1611 0.1635 0.1660 0.1685 0.1711 0.1736 0.1762 0.1788 0.1814 0.1841 −0.9
0.1867 0.1894 0.1922 0.1949 0.1977 0.2005 0.2033 0.2061 0.2090 0.2119 −0.8
0.2148 0.2177 0.2206 0.2236 0.2266 0.2296 0.2327 0.2358 0.2389 0.2420 −0.7
0.2451 0.2483 0.2514 0.2546 0.2578 0.2611 0.2643 0.2676 0.2709 0.2743 −0.6
0.2776 0.2810 0.2843 0.2877 0.2912 0.2946 0.2981 0.3015 0.3050 0.3085 −0.5
0.3121 0.3156 0.3192 0.3228 0.3264 0.3300 0.3336 0.3372 0.3409 0.3446 −0.4
0.3483 0.3520 0.3557 0.3594 0.3632 0.3669 0.3707 0.3745 0.3783 0.3821 −0.3
0.3859 0.3897 0.3936 0.3974 0.4013 0.4052 0.4090 0.4129 0.4168 0.4207 −0.2
0.4247 0.4286 0.4325 0.4364 0.4404 0.4443 0.4483 0.4522 0.4562 0.4602 −0.1
0.4641 0.4681 0.4721 0.4761 0.4801 0.4840 0.4880 0.4920 0.4960 0.5000 −0.0
* For Z ≤ −3.50, the probability is less than or equal to 0.0002.
Normal probability table

positive Z

Second decimal place of Z


Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
* For Z ≥ 3.50, the probability is greater than or equal to 0.9998.
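
The table entries are areas to the left of Z under the standard normal curve, so they can be reproduced (or looked up when no table is at hand) with statistics.NormalDist. A small sketch, where table_value is a helper name made up for this example:

```python
from statistics import NormalDist

def table_value(z):
    """Area to the left of z under the standard normal curve, to 4 decimals."""
    return round(NormalDist().cdf(z), 4)

# Row 1.0, column 0.00 of the positive table.
left_of_1 = table_value(1.00)

# Row -1.9, column 0.06 of the negative table.
left_of_neg_196 = table_value(-1.96)

# Area on the right = 1 - area on the left.
right_of_1 = round(1 - left_of_1, 4)

# Area between -1.96 and 1.96, the middle 95% used for confidence intervals.
between = round(table_value(1.96) - table_value(-1.96), 4)

print(left_of_1, left_of_neg_196, right_of_1, between)
```

By symmetry, the area to the right of 1.00 equals the table entry for Z = −1.00.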
Do not be anxious about anything, but in every situation, by prayer and
petition, with thanksgiving, present your requests to God. And the peace of
God, which transcends all understanding, will guard your hearts and your
minds in Christ Jesus. - Phil 4:6-7
