1 - BIOL2163 Intro To Statistics


BIOL 2163

Biostatistics
Email: Luke.Rostant@sta.uwi.edu
Lecture times: Wed, 12-1 pm, BBC
Thu, 12-1 pm, BBC
Office hours: online by appointment

For those without access to a computer off campus:
Please note the following guidelines concerning the loaning of laptops:
• Students in need of a laptop are asked to email deputy.principal@sta.uwi.edu.
• Students who request a laptop will be asked to sign a loan agreement.
• There is no cost involved.
• Only registered and financially cleared students will be allowed to borrow these devices.
• Loans will be for up to 28 days at a time and can be renewed.
• Loans will be arranged by The Alma Jordan Library (AJL), via the main office at Sir Arthur Lewis (SAL) Hall.

Look out for further updates. For more information about the Office of the Deputy Principal, please visit https://sta.uwi.edu/deputyprincipal/

Class representatives
• Alexei Sanguinette
• alexei.sanguinette@my.uwi.edu

• Karine Khan
• karine.khan@my.uwi.edu

BIOL 2163
Biostatistics
Myelearning will be the only medium for distributing material.
For students experiencing problems with Myelearning or their my.uwi email:
Please check MySecureArea in the Student Portal for any existing holds on your account that may prevent access to Myelearning. Please check the Registration Page for the list of holds and whom to contact for resolution.

Please contact the Myelearning Team on the CITS ServiceDesk for any further assistance: myelearning@sta.uwi.edu

Chapter 1
Introduction to Statistics

1-1 Overview
1-1 Supplemental - Statistical Thinking
1-2 Types of Data
1-2 Supplemental - Critical Thinking
1-3 Design of Experiments

Section 1-1
Overview

Preview
Polls, studies, surveys, and other data-collecting tools gather data from a small part of a larger group so that we can learn something about the larger group. This is a common and important goal of statistics:
Learn about a large group by examining data from some of its members.

Preview

In this context, the terms sample and population have special meaning. Formal definitions for these and other basic terms will be given here.

In this section we will look at some of the ways to describe data.

Data

• Data
collections of observations (such as measurements, genders, survey responses)

Statistics

• Statistics
is the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data

Population

• Population
the complete collection of all individuals (scores, people, measurements, and so on) to be studied; the collection is complete in the sense that it includes all of the individuals to be studied

Census versus Sample

• Census
collection of data from every member of a population
• Sample
subcollection of members selected from a population

Chapter Key Concepts

• Sample data must be collected in an appropriate way, such as through a process of random selection.
• If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.

Section 1-1 Supplemental
Statistical Thinking

Key Concept

This section introduces basic principles of statistical thinking used throughout this book. Whether conducting a statistical analysis of data that we have collected or analyzing a statistical analysis done by someone else, we should not rely on blind acceptance of mathematical calculations. We should consider these factors:

Key Concept (continued)

• Context of the data
• Source of the data
• Sampling method
• Conclusions
• Practical implications

Context

• What do the values represent?
• Where did the data come from?
• Why were they collected?
• An understanding of the context will directly affect the statistical procedure used.

Source of data

• Is the source objective?
• Is the source biased?
• Is there some incentive to distort or spin results to support some self-serving position?
• Is there something to gain or lose by distorting results?
• Be vigilant and skeptical of studies from sources that may be biased.

Sampling Method

• Does the method chosen greatly influence the validity of the conclusion?
• Voluntary response (or self-selected) samples often have bias (those with special interest are more likely to participate). Results from such samples are not necessarily valid.
• Other methods are more likely to produce good results.

Conclusions

• Make statements that are clear to those without an understanding of statistics and its terminology.
• Avoid making statements not justified by the statistical analysis.

Practical Implications

• State the practical implications of the results.
• A result may be statistically significant yet have NO practical significance.
• Common sense might suggest that the finding does not make enough of a difference to justify its use or to be practical.

Statistical Significance

• Consider the likelihood of getting the results by chance.
• If results could easily occur by chance, then they are not statistically significant.
• If the likelihood of getting the results by chance is very small, then the results are statistically significant.

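To make "the likelihood of getting the results by chance" concrete, here is a minimal Python sketch. It assumes a hypothetical experiment, tossing a supposedly fair coin 10 times, and uses the binomial formula to compute the exact probability of a result at least as extreme as the one observed, assuming only chance is at work. The function name `prob_at_least` is our own.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Chance of k or more successes in n independent trials when each
    trial succeeds with probability p (i.e. P(X >= k) for a binomial X)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical result A: 9 or more heads in 10 tosses of a supposedly fair coin.
print(f"{prob_at_least(9, 10):.4f}")  # 0.0107 -> unlikely to occur by chance
# Hypothetical result B: 6 or more heads in 10 tosses.
print(f"{prob_at_least(6, 10):.4f}")  # 0.3770 -> easily occurs by chance
```

Result A would usually be called statistically significant and result B would not. The exact cutoff is a convention (0.05 is a common choice), not something fixed by the slide.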
Section 1-2
Types of Data

Key Concept

The subject of statistics is largely about using sample data to make inferences (or generalizations) about an entire population. It is essential to know and understand the definitions that follow.

Parameter

• Parameter
a numerical measurement describing some characteristic of a population

(Diagram: population → parameter)
Statistic

• Statistic
a numerical measurement describing some characteristic of a sample

(Diagram: sample → statistic)
Parameters vs. Statistics
A freshwater pond is stocked with 500 tilapia. All 500 fish are weighed.
Parameter: Average weight of the 500 fish

Suppose 10 fish are randomly selected and weighed.
Statistic: Average weight of the 10 fish

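The pond example can be simulated in a few lines of Python. This is only an illustrative sketch. The 500 weights are randomly generated, with an assumed mean of 0.45 kg and standard deviation of 0.08 kg, and are not real data. The parameter is computed from every fish, and the statistic from a simple random sample of 10.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical weights (kg) of the 500 tilapia in the pond; simulated, not measured.
pond = [round(random.gauss(0.45, 0.08), 3) for _ in range(500)]

# Parameter: describes the whole population (all 500 fish weighed).
parameter = statistics.mean(pond)

# Statistic: describes a sample (10 fish selected at random).
sample = random.sample(pond, 10)
statistic = statistics.mean(sample)

print(f"Parameter (mean of 500 fish): {parameter:.3f} kg")
print(f"Statistic (mean of 10 fish):  {statistic:.3f} kg")
```

Drawing another sample of 10 from the same `pond` (calling `random.sample(pond, 10)` again) gives a different statistic, while the parameter stays the same.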
Quantitative Data

• Quantitative (or numerical) data
consist of numbers representing counts or measurements

Example: The heights of students
Example: The ages of respondents

Categorical Data

• Categorical (or qualitative or attribute) data
consist of names or labels (representing categories)

Example: The genders (male/female) of professional athletes
Example: Shirt numbers on professional athletes' uniforms (the numbers are substitutes for names)

Types of Quantitative Data

Quantitative data can be further described by distinguishing between discrete and continuous types.

Quantitative Data - Discrete Data

• Discrete data
result when the number of possible values is either a finite number or a ‘countable’ number (i.e. the number of possible values is 0, 1, 2, 3, ...)

Example: The number of eggs that a hen lays
Note that a hen cannot lay 1.2 eggs.

Quantitative Data - Continuous Data

• Continuous (numerical) data
result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps

Example: The amount of milk that a cow produces, e.g. 2.343115 gallons per day

Levels of Measurement

Another way to classify data is to use levels of measurement. Four of these levels are discussed in the following slides.

Nominal Level

• Nominal level of measurement
characterized by data that consist of names, labels, or categories only; the data cannot be arranged in an ordering scheme (such as low to high)

Example: Survey responses: yes, no, undecided
Gender: male, female
Colour of pea pods: green, yellow

Can be used for counts, but should not be used for calculations such as averages or totals, even if the categories are coded as numbers.

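A short Python sketch shows why counts are fine for nominal data but averages of numeric codes are not. The survey responses and both coding schemes are hypothetical. The point is that two equally arbitrary codings of the same answers give different "averages".

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses (nominal data).
responses = ["yes", "no", "undecided", "yes", "yes", "no"]

# Counting how often each category occurs is meaningful.
counts = Counter(responses)
print(counts.most_common())  # [('yes', 3), ('no', 2), ('undecided', 1)]

# Two equally arbitrary (hypothetical) ways of coding the categories as numbers.
coding_a = {"yes": 1, "no": 2, "undecided": 3}
coding_b = {"yes": 3, "no": 1, "undecided": 2}

# The "average response" depends entirely on which labels were chosen,
# so it carries no information about the data.
avg_a = mean(coding_a[r] for r in responses)
avg_b = mean(coding_b[r] for r in responses)
print(avg_a, avg_b)
```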
Ordinal Level
• Ordinal level of measurement
involves data that can be arranged in some order, but differences between data values either cannot be determined or are meaningless

Example: Course grades A, B, C, D, or F (differences in actual marks cannot be determined: what does A - B mean?)

Ranks: 1 = Strongly Agree, 2 = Somewhat Agree, etc.

Can be used for counts, but should not be used for averages, totals, etc.
Interval Level
• Interval level of measurement
like the ordinal level, with the additional property that the difference between any two data values is meaningful; however, there is no natural zero starting point (where none of the quantity is present)

Example: Years 1000, 2000, 1776, and 1492 (year zero does not mean time did not exist before it)

Temperature in Celsius (zero degrees does not mean no heat)

Differences and averages may be meaningful, but ratios are meaningless.
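The Celsius example can be checked numerically. The sketch below uses the standard conversion formulas to Fahrenheit (another interval scale) and to kelvins (a ratio scale whose zero is absolute zero). Neither conversion appears on the slide; they are added only for illustration.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit (another interval scale)."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert degrees Celsius to kelvins (a ratio scale with a natural zero)."""
    return c + 273.15

warm, cool = 20.0, 10.0

# On the Celsius scale, 20 degrees looks "twice as hot" as 10 degrees...
print(warm / cool)                   # 2.0
# ...but the same two temperatures in Fahrenheit give a different ratio,
# so the ratio depends on where zero happens to be and is meaningless.
print(c_to_f(warm) / c_to_f(cool))   # 1.36
# Differences stay consistent: a 10-degree Celsius gap is always an 18-degree Fahrenheit gap.
print(c_to_f(warm) - c_to_f(cool))   # 18.0
# With a natural zero (kelvins), the ratio is meaningful, and it is nowhere near 2.
print(round(c_to_k(warm) / c_to_k(cool), 3))  # 1.035
```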
Ratio Level
• Ratio level of measurement
the interval level with the additional property that there is also a natural zero starting point (where zero indicates that none of the quantity is present); for values at this level, differences and ratios are meaningful

Example: Prices of college textbooks ($0 represents no cost, and a $100 book costs twice as much as a $50 book)

Summary - Levels of Measurement

• Nominal: categories only; no order
• Ordinal: categories with some order; differences are meaningless
• Interval: order and differences, but no natural starting point; ratios are meaningless
• Ratio: order, differences, and a natural starting point; ratios are meaningful

Recap

In this section we have looked at:

• Basic definitions and terms describing data
• Parameters versus statistics
• Types of data (quantitative and qualitative)
• Types of quantitative data (discrete, continuous)
• Levels of measurement (nominal, ordinal, interval, ratio)

Section 1-2 Supplemental
Critical Thinking

Key Concepts
• Success in the introductory statistics course typically requires more common sense than mathematical expertise.
• Improve skills in interpreting information based on data.
• This section is designed to illustrate how common sense is used when we think critically about data and statistics.
• Think carefully about the context, source, method, conclusions, and practical implications.
Misuses of Statistics

1. Evil intent on the part of dishonest people
2. Unintentional errors on the part of people who don't know any better

We should learn to distinguish between statistical conclusions that are likely to be valid and those that are seriously flawed.

Graphs

To interpret a graph correctly, you must analyze the numerical information it gives so as not to be misled by its shape. READ the labels and units on the axes!

Pictographs

Part (b) is designed to exaggerate the difference by increasing each dimension of the picture in proportion to the actual amounts of oil consumption.

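The size of the exaggeration follows from simple geometry. Suppose, for illustration, that one amount of oil consumption is k times another. Suppose also that the pictograph scales the width w and height h of the picture by the same factor k, along with its apparent depth d if it is drawn in 3-D:

```latex
\frac{\text{area}_{\text{large}}}{\text{area}_{\text{small}}}
  = \frac{(k w)(k h)}{w h} = k^{2},
\qquad
\frac{\text{volume}_{\text{large}}}{\text{volume}_{\text{small}}}
  = \frac{(k w)(k h)(k d)}{w h d} = k^{3}
```

For example, with k = 2 the larger picture covers 2^2 = 4 times the area. If it is drawn to look three-dimensional, it suggests 2^3 = 8 times the volume, even though the actual amount is only twice as large.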
Bad Samples
Voluntary response sample (or self-selected sample)
one in which the respondents themselves decide whether to be included

In this case, valid conclusions can be made only about the specific group of people who agree to participate, not about the population.

Correlation and Causality

• A common misuse is concluding that one variable causes the other when in fact the variables are only linked.

Two variables, such as smoking and pulse rate, may seem linked; this relationship is called correlation. We cannot conclude that one causes the other.
Correlation does not imply causality.

Small Samples

Conclusions should not be based on samples that are far too small.
Example: Basing a school suspension rate on a sample of only three students

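A small simulation shows why three students are far too few. The school below is hypothetical: 1,000 students with an assumed true suspension rate of 5%. Each sample is drawn at random, so only the sample size differs between the two plans.

```python
import random

random.seed(7)  # fixed seed so the run is reproducible

# Hypothetical school: 1,000 students, 50 of whom were suspended (true rate 5%).
students = [1] * 50 + [0] * 950

def sample_rate(n):
    """Suspension rate estimated from a simple random sample of n students."""
    picked = random.sample(students, n)
    return sum(picked) / n

# Repeat each sampling plan 1,000 times to see how much the estimate varies.
small = [sample_rate(3) for _ in range(1000)]
large = [sample_rate(300) for _ in range(1000)]

# With n = 3 the estimate can only be 0, 1/3, 2/3 or 1, so it can never equal 5%.
print(sorted(set(small)))
print(f"n = 3:   estimates from {min(small):.2f} to {max(small):.2f}")
print(f"n = 300: estimates from {min(large):.2f} to {max(large):.2f}")
```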
Percentages

Misleading or unclear percentages are sometimes used. For example, if you take 100% of a quantity, you take all of it. If you have improved by 100%, are you then perfect?! And 110% of an effort does not make sense.

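The arithmetic behind the "100% improvement" puzzle is a percentage change measured relative to the starting value. Here is a minimal sketch with made-up scores. The function name `percent_change` is our own.

```python
def percent_change(old, new):
    """Percentage change from old to new, measured relative to old."""
    return (new - old) / old * 100

# Hypothetical scores: going from 40 to 80 is a 100% improvement,
# which means the score doubled, not that it became perfect.
print(percent_change(40, 80))   # 100.0
# A 100% decrease leaves nothing at all.
print(percent_change(80, 0))    # -100.0
```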
Loaded Questions

If survey questions are not worded carefully, the results of a study can be misleading. Survey questions can be “loaded”, that is, intentionally worded to elicit a desired response.
Example: “Too little money is being spent on welfare” versus “too little money is being spent on assistance to the poor.” Results: 19% versus 63%

Order of Questions

Questions can also be unintentionally loaded by such factors as the order of the items being considered.
Question: Would you say traffic contributes more or less to air pollution than industry? Results: traffic 45%; industry 27%
When the order was reversed: industry 57%; traffic 24%

Nonresponse

Occurs when someone either refuses to respond to a survey question or is unavailable.
People who refuse to talk to pollsters may have a view of the world around them that is markedly different from that of people who will let poll-takers into their homes.

Missing Data

Can dramatically affect results.
Subjects may drop out for reasons unrelated to the study.
People with low incomes are less likely to report their incomes.
The US Census suffers from missing people (who tend to be homeless or low income).

Self-Interest Study

Some parties with interests to promote will sponsor studies.
Be wary of a survey in which the sponsor can enjoy monetary gain from the results.
When assessing the validity of a study, always consider whether the sponsor might influence the results.

Precise Numbers vs. Accurate Numbers
Example: the percentage of voters who support Donald Trump
Precise figure: 43.15%
Imprecise figures: approximately 40%; between 40% and 45%
Because a figure is precise, many people incorrectly assume that it is also accurate. A precise number can be an estimate, and it should be referred to as one.

Deliberate Distortion

Some studies or surveys are distorted on purpose. The distortion can occur within the context of the data, the source of the data, the sampling method, or the conclusions.

Recap

In this section we have:

• Reviewed misuses of statistics
• Illustrated how common sense can play a big role in interpreting data and statistics

Section 1-3
Design of Experiments

Key Concept
• If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
• The method used to collect sample data influences the quality of the statistical analysis.
• Of particular importance is the simple random sample.
Bad Samples
Voluntary response sample (or self-selected sample)
one in which the respondents themselves decide whether to be included

In this case, valid conclusions can be made only about the specific group of people who agree to participate, not about the population.

Basics of Collecting Data

Statistical methods are driven by the data that we collect. We typically obtain data from two distinct sources: observational studies and experiments.

Observational Study

• Observational study
observing and measuring specific characteristics without attempting to modify the subjects being studied

Example: A Gallup poll of voters

Types of Observational Studies
• Cross-sectional study
data are observed, measured, and collected at one point in time

• Retrospective (or case-control) study
data are collected from the past by going back in time (examining records, interviews, …)

• Prospective (or longitudinal or cohort) study
data are collected in the future from groups sharing common factors (called cohorts)
Experiment

• Experiment
apply some treatment and then observe its effects on the subjects (subjects in experiments are called experimental units); i.e. we attempt to modify the subjects being studied

Example: A clinical trial. One group of patients is given a new drug (the treatment) and another (a control group) is given a placebo.
Experiments - Confounding

• Confounding
occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors

Example: Suppose all subjects in the control group were older patients. Is the effect due to the drug or to age?

Try to plan the experiment so that confounding does not occur.
Controlling Effects of Variables in Experiments
• An Example of Bad Experimental Design With Confounding

We want to test the effectiveness of a fertilizer on trees. We have two plots of land: one is dry and one is moist.

A bad design: Treat 6 trees planted in the moist soil with fertilizer. Plant 6 untreated trees in the dry soil. Are the differences between the two groups of trees due to the treatment (fertilizer) or to the soil type?

There is confounding between the treatment and the soil type.

Controlling Effects of Variables in Experiments
• An Example of Bad Experimental Design With Confounding

Controlling Effects of Variables in Experiments
• To Prevent Confounding: Use a Randomized Block Design

• Randomized Block Design
a block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment

Controlling Effects of Variables in Experiments
• Randomized Block Design

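In the fertilizer example, a randomized block design would treat each soil type as a block and randomly assign fertilizer to half of the trees within each block. Soil type can then no longer be confounded with the treatment. This is a sketch under assumed tree labels (M1 to M6 on moist soil, D1 to D6 on dry soil). The equal split within each block is our choice for illustration.

```python
import random

random.seed(3)  # fixed seed so the assignment is reproducible

# Hypothetical layout: 6 trees on moist soil and 6 on dry soil. Soil type is the
# variable suspected of affecting growth, so each soil type forms one block.
blocks = {
    "moist": [f"M{i}" for i in range(1, 7)],
    "dry":   [f"D{i}" for i in range(1, 7)],
}

assignment = {}
for soil, trees in blocks.items():
    shuffled = random.sample(trees, len(trees))  # random order within this block
    half = len(shuffled) // 2
    for tree in shuffled[:half]:
        assignment[tree] = "fertilizer"
    for tree in shuffled[half:]:
        assignment[tree] = "no fertilizer"

for soil, trees in blocks.items():
    treated = sorted(t for t in trees if assignment[t] == "fertilizer")
    print(f"{soil}: fertilizer -> {treated}")
```

Because both soil types contain treated and untreated trees, a difference between the treated and untreated groups cannot simply be a soil effect.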
Controlling Effects of Variables in Experiments

• Blinding
is a technique in which the subject doesn't know whether he or she is receiving a treatment or a placebo. Blinding allows us to determine whether the treatment effect is significantly different from a placebo effect, which occurs when an untreated subject reports an improvement in symptoms.

Controlling Effects of Variables in Experiments
• Double-Blind
Blinding occurs at two levels:
(1) The subject doesn't know whether he or she is receiving the treatment or a placebo.
(2) The experimenter does not know whether he or she is administering the treatment or a placebo.

Controlling Effects of Variables in Experiments
• Completely Randomized Experimental Design
assign subjects to different treatment groups through a process of random selection
• Randomized Block Design
a block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment
• Rigorously Controlled Design
carefully assign subjects to different treatment groups, so that those given each treatment are similar in ways that are important to the experiment
• Matched Pairs Design
compare exactly two treatment groups using subjects matched in pairs that are somehow related or have similar characteristics
Replication
• Replication
is the repetition of an experiment on more than one subject. Samples should be large enough so that the erratic behavior that is characteristic of very small samples will not disguise the true effects of different treatments. Replication is used effectively when there are enough subjects to recognize the differences resulting from different treatments.

Use a sample size that is large enough to let us see the true nature of any effects, and obtain the sample using an appropriate method, such as one based on randomness.
Randomization

• Randomization
is used when subjects are assigned to different groups through a process of random selection. The logic is to use chance as a way to create two groups that are similar.

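Random assignment is easy to carry out with a computer. This sketch shuffles a list of hypothetical patient IDs and splits it in half, giving a completely randomized design with a treatment group and a control group. The function name `randomize` and the IDs are our own.

```python
import random

def randomize(subjects, seed=None):
    """Completely randomized design: shuffle a copy of the subjects and split it
    into a treatment group and a control group of (nearly) equal size."""
    rng = random.Random(seed)   # independent generator; a seed makes it reproducible
    pool = list(subjects)       # copy so the caller's list is left unchanged
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Hypothetical patient IDs: patient01 ... patient20.
subjects = [f"patient{i:02d}" for i in range(1, 21)]
treatment, control = randomize(subjects, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

Chance alone decides each patient's group, so characteristics such as age tend to be spread similarly across the two groups.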
Summary

Three very important considerations in the design of experiments are the following:
1. Use randomization to assign subjects to different groups.
2. Use replication by repeating the experiment on enough subjects so that effects of treatment or other factors can be clearly seen.
3. Control the effects of variables by using such techniques as blinding and a completely randomized experimental design.

Sampling Strategies

• Why Sample?

Sampling Strategies
• Types of Samples

Sampling Strategies

• Probability Sample
selecting members from a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected

Sampling Strategies

• Random Sample
members from the population are selected in such a way that each individual member in the population has an equal chance of being selected

• Simple Random Sample (of n subjects)
selected in such a way that every possible sample of the same size n has the same chance of being chosen
Random Sampling
selection so that each individual member has an equal chance of being selected

Systematic Sampling
Select some starting point and then select every kth element in the population.

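Systematic sampling shows the difference between a random sample and a simple random sample clearly. In this sketch (a hypothetical population of 12 members with k = 3), every member has the same chance of selection, yet most subsets of size 4 can never be chosen. The function name `systematic_sample` is our own.

```python
import random
from math import comb

population = list(range(1, 13))  # 12 hypothetical members, labelled 1 to 12
k = 3                            # select every 3rd member, giving samples of size 4

def systematic_sample(members, k, rng=random):
    """Pick a random starting point among the first k members, then every kth one."""
    start = rng.randrange(k)
    return members[start::k]

print(systematic_sample(population, k))

# Only k different systematic samples are possible, one per starting point.
possible = [population[start::k] for start in range(k)]
print(possible)  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

# Each member lies in exactly one of them, so every member has the same 1/k chance:
# this is a random sample in the sense defined above...
print(all(sum(m in s for s in possible) == 1 for m in population))  # True

# ...but not a simple random sample: an SRS of size 4 would give every one of the
# C(12, 4) possible subsets the same chance, whereas here only 3 can ever occur.
print(comb(12, 4))  # 495
```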
Stratified Sampling
Subdivide the population into at least two different subgroups (or strata) so that members within the same subgroup share the same characteristics, then draw a sample from each subgroup (or stratum).

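Here is a minimal stratified-sampling sketch. The two strata and their sizes are hypothetical. Drawing the same fraction (here 10%) from each stratum is one common choice, known as proportional allocation; the slide only requires that a sample be drawn from each subgroup.

```python
import random

random.seed(11)  # fixed seed so the run is reproducible

# Hypothetical population split into two strata by a shared characteristic.
population = {
    "female": [f"F{i}" for i in range(1, 61)],   # 60 members
    "male":   [f"M{i}" for i in range(1, 41)],   # 40 members
}

def stratified_sample(strata, fraction):
    """Draw a simple random sample of the given fraction from every stratum."""
    sample = {}
    for name, members in strata.items():
        n = round(len(members) * fraction)
        sample[name] = random.sample(members, n)
    return sample

picked = stratified_sample(population, 0.10)
print({name: len(members) for name, members in picked.items()})  # {'female': 6, 'male': 4}
```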
Cluster Sampling
Divide the population area into sections (or clusters); randomly select some of those clusters; choose all members from the selected clusters.

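Cluster sampling differs from stratified sampling in that whole clusters are chosen at random and then every member of each chosen cluster is included. This sketch uses a hypothetical area of 8 blocks with 10 households each.

```python
import random

random.seed(5)  # fixed seed so the run is reproducible

# Hypothetical population area divided into 8 clusters (e.g. city blocks),
# each containing 10 households.
clusters = {
    f"block{c}": [f"block{c}-house{h}" for h in range(1, 11)]
    for c in range(1, 9)
}

# Step 1: randomly select some of the clusters (here 3 of the 8).
chosen = random.sample(sorted(clusters), 3)

# Step 2: include ALL members of every selected cluster.
sample = [member for c in chosen for member in clusters[c]]

print("Clusters chosen:", chosen)
print("Sample size:", len(sample))  # 30
```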
Multistage Sampling

Collect data by using some combination of the basic sampling methods.

In a multistage sample design, pollsters select a sample in different stages, and each stage might use different methods of sampling.

Sampling Strategies
• Non-Probability Samples

Convenience Sampling
Use results that are easy to get.

Methods of Sampling - Summary

Multistage Sampling: a combination of one or more of the above sampling methods in stages.

The methods that we study usually assume that we have a simple random sample.

Errors
No matter how well you plan and execute the sample collection process, there is likely to be some error in the results.
• Sampling error
the difference between a sample result and the true population result; such an error results from chance sample fluctuations

• Nonsampling error
sample data incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly)

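You can see sampling error by drawing several random samples from the same population and comparing each sample mean with the population mean. The population below is simulated (uniform values between 10 and 20) and every sample is drawn correctly at random. The differences shown are therefore pure sampling error, with no nonsampling error involved.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 measurements.
population = [random.uniform(10, 20) for _ in range(10_000)]
mu = statistics.mean(population)  # the true population result

# Sampling error = sample result - population result. Even with a correctly
# drawn random sample it varies, purely because of which members are chosen.
errors = []
for _ in range(5):
    sample = random.sample(population, 25)
    errors.append(statistics.mean(sample) - mu)

print(f"population mean: {mu:.3f}")
print("sampling errors:", [round(e, 3) for e in errors])
```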
Recap

In this section we have looked at:

• Types of studies and experiments
• Controlling the effects of variables
• Randomization
• Types of sampling
• Sampling and nonsampling errors
