Lecture 1 NOTES Variables and Distributions 2020-21

This document introduces key concepts for analyzing variables and distributions in studies. It defines explanatory and response variables, and different types of variables including binary, categorical, and quantitative. It discusses displaying and interpreting frequency tables and distributions for variables, including histograms. The purpose is to help students produce and interpret summaries of data to answer research questions.

Uploaded by

Hannah Matthews

LECTURE 1

VARIABLES AND DISTRIBUTIONS

Objectives
After completing this session, students should be able to:
1/ Identify the variables in a study, their types (binary, (ordered) categorical,
quantitative), and whether they are explanatory or response variables.
2/ Display and interpret distributions of variables in frequency and cumulative
frequency tables, histograms and cumulative frequency curves.
3/ Construct and interpret two-way frequency tables comparing distributions.
4/ Construct and interpret a summary of a distribution of a quantitative variable using
the mean, standard deviation, median, quartiles, and other centiles.
5/ Distinguish a skewed from a symmetric distribution; use logarithms to make more
symmetric a skewed distribution; calculate a geometric mean of such a distribution.

INTRODUCTION
We will look at a simple example first, then refer back to it to introduce some key concepts.

Example 1
A group of 2266 Canadian newborns is classified by birthweight and by whether the
mother had to lift as part of her job (yes or no). Interest is in whether babies born to
mothers who had to lift at work are likely to be smaller. The data from the study
might be held in a computer file looking something like this:

Table 1
Subject ID    Birthweight    Lifting at work
number        (g)            (0 = no, 1 = yes)

1             3300           0
2             4100           0
3             3970           1
4             3840           0
...           ...            ...
2265          4000           0
2266          3300           0

The purpose of this lecture is to introduce concepts and methods to help you produce and
interpret summaries of these and similar data to answer questions like that posed above.

SUBJECTS AND VARIABLES

In example 1, and in most health studies, information (data) was collected on the
characteristics of persons included in the study (study subjects). In example 1, these
characteristics are birthweight and mother lifting at work. Other examples of characteristics
are sex, environmental exposures, treatments, blood pressure, and experience of disease. We
call the characteristics variables, because the value a variable takes (for example birthweight
in grams, whether lifted at work) varies from person to person.
In Table 1, each row represents a subject, and each column a variable. This is the way data
are usually kept by computers.
In this example, and often, study subjects are persons. However sometimes rather than
measuring characteristics (variables) for persons we measure them for other “units”, for
example households, towns, hospitals, or mosquitoes. In statistical terminology, we still call
these units subjects. A value of each variable is needed for each subject.

Explanatory and response variables

A variable will usually be measured for one of two purposes:
• It is an outcome of interest. In Example 1 above, birthweight is the outcome.
  Outcome variables are also called response variables.
• It is a factor that influences (or might influence) the outcome. In the example, lifting
  is such a variable. These are often called explanatory variables.
More advanced point: In our example, we have suggested looking at lifting as the explanatory
variable and birthweight as the response variable. Note that in other studies, for example a
sociological one looking at the association between ethnicity and lifting, lifting could be a
response variable. In a study looking at the association of birthweight with diseases later on in
life, birthweight would be an explanatory variable. The distinction between explanatory and
response variable is context-specific, not an intrinsic attribute of the variable.

Types of variable
Variables can also be classified by the types of values they can take. The main ones are:
• Qualitative
  o Binary (dichotomous) variables, where the values are two different
    categories; for example, sex, or vaccinated status for a particular vaccine, or
    being of low birthweight or not.
  o Categorical variables, taking as values several different categories that are
    distinct from each other; for example marital status (never married, married,
    widowed, divorced), or blood group.
  o Ordered categorical variables, for which the different categories are
    ordered on some scale; for example, severity of disease (mild, moderate,
    severe) or a disability score.
• Quantitative (numerical) variables, where some quantity is measured on a well-defined
  scale with units; for example, weight, blood pressure, number of episodes of asthma in a
  fixed period.

Coding variable values

When the values of a binary or categorical variable are recorded, they are usually given
numerical codes for computer use. For example, the answers "Yes" and "No" for lifting at
work were coded 1 and 0 in Example 1. However, this does not make these variables quantitative.
Information is sometimes missing for some subjects, in which case there should also be a
special code for "missing value", to allow us to omit those subjects from analyses. This
applies to all types of variables.
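
As an illustration of coded values and a missing-value code, here is a minimal Python sketch. The code 9 for "missing" and the ten values are assumptions made up for this example; they do not come from the study file.

```python
# Hypothetical coding scheme: 0 = no, 1 = yes, 9 = missing (the 9 is an assumption).
MISSING = 9
LABELS = {0: "no", 1: "yes"}

def summarise_binary(codes, missing=MISSING):
    """Count each category, omitting subjects whose value is missing."""
    valid = [c for c in codes if c != missing]
    counts = {label: valid.count(code) for code, label in LABELS.items()}
    n_valid = len(valid)
    percents = {label: 100 * k / n_valid for label, k in counts.items()}
    return counts, percents, len(codes) - n_valid

# Made-up values for ten subjects, for illustration only.
lifting = [0, 0, 1, 0, 9, 1, 0, 1, 9, 0]
counts, percents, n_missing = summarise_binary(lifting)
print(counts, percents, n_missing)
```

The codes are used only as labels for counting. Averaging them without first removing the 9s would give a meaningless number, which is why the missing code is filtered out before any summary is made.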

Another example: A randomised controlled trial for a new drug for the treatment of
hypertension. The response (outcome) is blood pressure or change in blood pressure. The
principal explanatory variable is whether a subject is assigned to the new drug or control.
Question 1
What would be the response and explanatory variables in the following studies?
1a A questionnaire survey of whooping cough vaccination status in a sample of boys and
girls, aiming to identify if socio-economic or ethnic status determined whether children were
vaccinated.
1b A study of the occurrence of whooping cough in children, aiming to identify how effective
vaccination was at preventing whooping cough.
Question 2 Write down the types of the variables you identified in question 1.

FREQUENCY TABLES AND DISTRIBUTIONS

We can see how the values change from subject to subject just by eye-balling a table like
Table 1. But some way of summarising what we see is usually necessary.
The set of frequencies with which the different possible values of a variable occur in a
group of subjects is called the frequency distribution of the variable in the group. For a
very simple example, consider the frequency distribution of lifting at work in Example 1,
which is shown in Table 2. Notice that it was useful to include in the table not just the
counts of subjects lifting and not lifting (frequencies), but also the percentages (relative
frequencies). A table like Table 2 is called a frequency table.

Table 2
------------------------------------------------
Lifting      Number of deliveries
at work      Frequency    Relative frequency (%)
------------------------------------------------
NO           1,310        (58.8%)
YES            956        (41.2%)

All          2,266        (100%)
------------------------------------------------
For qualitative variables (such as lifting at work) frequency tables usually include one row for
each value of the variable (here NO and YES). For quantitative variables such tables are
seldom helpful, unless the number of observations is quite small. It is more useful to group
the values taken by the variable and to report the numbers (frequencies) and the percentages
(relative frequencies) of subjects in each group. We show a grouped frequency table for the
distribution of birthweight in babies born to women who did not lift during pregnancy:

Table 3. The distribution of birth weight in 1310 women who did not lift at work
-------------------------------------------------------
Birthweight    N. of women    Percentage
(g)            (frequency)    (relative frequency)
-------------------------------------------------------
 500- 999           3            0.2
1000-1499           9            0.7
1500-1999          16            1.2
2000-2499          68            5.2
2500-2999         301           23.0
3000-3499         484           37.0
3500-3999         327           25.0
4000-4499          94            7.2
4500-4999           8            0.6
-------------------------------------------------------
Total            1310          100.0
-------------------------------------------------------

Showing distributions graphically; histograms

For qualitative variables frequency distributions can be displayed as bar charts (see notes on
The display of results in this manual). For quantitative variables a grouped frequency
distribution (Table 3) can be displayed in a histogram – see the figure below. Notice:
[Figure: histogram of birth weight (horizontal axis, 0 to 5000) against relative frequency in % (vertical axis, 0 to 40)]

1. A histogram is made up of a rectangle for each group (row of the grouped frequency
table).
2. In histograms, the rectangles touch: there is no gap between them. This distinguishes
them from bar-charts showing distributions of categorical variables, in which the bars
generally do not touch.
3. We can tell the shape of a distribution from a histogram, in particular whether it
is symmetrical. In this example the distribution is not quite symmetrical. It is
“skewed to the left”.
4. In this example, and others where the widths of the groups are equal, the height of
each rectangle represents the frequency. By changing the scale of the y-axis it can
also represent the relative frequency.
5. [Optional advanced point.] If histograms have groups of unequal width, the area of
the rectangle rather than the height represents frequency or relative frequency.
6. [Optional advanced point.] Novices might be tempted to draw non-touching
rectangles: 500-999; 1000-1499; etc. These are what the recorded values were. A
(pedantic) case could also be made for touching rectangles with boundaries placed midway
between the recorded limits of adjacent groups. Both these would be a mistake. The point
is to give a simple visual picture of the distribution, not to display a lot of detail,
which would distract your readers.
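
To illustrate point 5, the sketch below works out bar heights when one group is wider than the others. It is an assumption-laden example: the three lowest rows of Table 3 are merged into a single group (3 + 9 + 16 = 28), nominal boundaries of 500, 2000, 2500 and so on are used, and heights are expressed as frequency per 500 g.

```python
# Groups as (lower boundary, upper boundary, frequency). The first group merges
# the three lowest rows of Table 3; the nominal boundaries are an assumption.
groups = [
    (500, 2000, 28),
    (2000, 2500, 68),
    (2500, 3000, 301),
    (3000, 3500, 484),
    (3500, 4000, 327),
    (4000, 4500, 94),
    (4500, 5000, 8),
]
UNIT = 500  # heights are expressed as frequency per 500 g

def bar_heights(groups, unit=UNIT):
    """Height = frequency / width (in units), so that area represents frequency."""
    return [freq * unit / (upper - lower) for lower, upper, freq in groups]

heights = bar_heights(groups)
for (lower, upper, freq), h in zip(groups, heights):
    print(f"{lower}-{upper}: frequency {freq}, height {h:.1f} per {UNIT} g")
```

The wide first group holds 28 babies but is drawn only about 9.3 units high, because its frequency is spread over three times the usual width. The groups of standard width keep a height equal to their frequency.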

Cumulative relative frequency tables and curves

An alternative to the histogram for quantitative variables is to display the cumulative
frequencies. These are calculated below for the birthweight data.
Table 4
---------------------------------------------------------------------------
Birthweight    N. of women    Percentage              Cumulative percentage
(g)            (frequency)    (relative frequency)    (cumulative relative
                                                      frequency)
---------------------------------------------------------------------------
 500- 999           3            0.2                     0.2
1000-1499           9            0.7                     0.9
1500-1999          16            1.2                     2.1
2000-2499          68            5.2                     7.3
2500-2999         301           23.0                    30.3
3000-3499         484           37.0                    67.3
3500-3999         327           25.0                    92.2
4000-4499          94            7.2                    99.4
4500-4999           8            0.6                   100.0
---------------------------------------------------------------------------
Total            1310          100.0                   100.0
---------------------------------------------------------------------------

The cumulative relative frequency (also called cumulative percentage) of babies whose birth
weight is below 1500g is 0.2+0.7=0.9%, the cumulative relative frequency below 2000g is
0.2+0.7+1.2=2.1%, and so on. In a cumulative frequency curve the cumulative percentage
frequencies are plotted against the right hand ends of their intervals:

[Figure: cumulative relative frequency (%) plotted against birth weight, horizontal axis from 0 to 5000]
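
The cumulative percentages can be reproduced from the frequencies with a short Python sketch (variable names are illustrative). It accumulates the counts first and converts to percentages afterwards, rather than adding up the already rounded percentages; this is an assumption about how the table was built, chosen because adding rounded percentages can drift by 0.1 in the last digit.

```python
from itertools import accumulate

# Upper ends of the birthweight groups and the corresponding frequencies.
upper_ends = [999, 1499, 1999, 2499, 2999, 3499, 3999, 4499, 4999]
counts = [3, 9, 16, 68, 301, 484, 327, 94, 8]

total = sum(counts)
cumulative_counts = list(accumulate(counts))
cumulative_percent = [round(100 * c / total, 1) for c in cumulative_counts]

for end, c, p in zip(upper_ends, cumulative_counts, cumulative_percent):
    print(f"up to {end} g: {c:>5} {p:>6.1f}%")
```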

It is not as easy to study the shape of a distribution from a cumulative curve as from a
histogram, but cumulative curves are easier to use when there are different grouping intervals
(they avoid the problems of calculating the heights of the bars to equalise the areas) and also
have other rather more specialised applications. They are useful for finding medians (see
below; briefly, the median birth weight is obtained by drawing a horizontal line through the
50% point on the vertical axis and noting the birthweight at the point where it cuts the curve).
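
Reading a value off the cumulative curve amounts to linear interpolation between the plotted points. The sketch below does this for the median using the cumulative percentages of Table 4. The function name is hypothetical, and treating the right-hand ends of the groups as the round boundaries 1000, 1500, and so on (with 0% at 500) is a simplifying assumption.

```python
# Points on the cumulative curve, taken from Table 4. Using the nominal
# boundaries 500, 1000, ... as the right-hand ends is a simplifying assumption.
boundaries = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
cum_percent = [0.0, 0.2, 0.9, 2.1, 7.3, 30.3, 67.3, 92.2, 99.4, 100.0]

def centile_from_curve(p, xs, cum):
    """Value at which the piecewise-linear cumulative curve reaches p percent."""
    for i in range(1, len(xs)):
        if cum[i - 1] <= p <= cum[i]:
            fraction = (p - cum[i - 1]) / (cum[i] - cum[i - 1])
            return xs[i - 1] + fraction * (xs[i] - xs[i - 1])
    raise ValueError("p must lie between 0 and 100")

median = centile_from_curve(50, boundaries, cum_percent)
print(f"median is roughly {median:.0f} g")
```

The result falls in the 3000-3499 group, as expected, since the cumulative percentage passes 50% between 3000 g and 3500 g. The same function can be called with 25 or 75 to read off the quartiles.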

Comparing distributions in two groups of subjects

Up to here we have shown a frequency table, histogram, and cumulative frequency curve,
which showed the distribution of a variable in one group of subjects. These “one-way”
presentations are purely descriptive; they give useful information about the values the
variable takes, but cannot show how it is related to anything else.
Two-way frequency tables allow two (or more) groups to be compared. Usually, the
interesting two-way table shows the distribution of the response variable (eg birthweight) in
groups defined by the explanatory variable (lifting at work):

Table 5
                      Lifting at work
Birthweight        no                 yes
(g)             N.       %         N.       %

<2000            28      2.1        25      2.6
2000-2499        68      5.2        49      5.1
2500-2999       301     23.0       244     25.5
3000-3499       484     37.0       345     36.1
3500-           429     32.7       293     30.6

Total          1310    100.0       956    100.0
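
The percentages in a two-way table like this are calculated within each group of the explanatory variable, so that each column adds to 100%. A minimal Python sketch using the counts from Table 5 (names are illustrative; the last digit may differ from the printed table because of rounding):

```python
groups = ["<2000", "2000-2499", "2500-2999", "3000-3499", "3500-"]
counts = {
    "no":  [28, 68, 301, 484, 429],   # mothers who did not lift at work
    "yes": [25, 49, 244, 345, 293],   # mothers who lifted at work
}

def column_percentages(column):
    """Express each count as a percentage of its own column total."""
    total = sum(column)
    return [100 * k / total for k in column]

percent = {name: column_percentages(col) for name, col in counts.items()}
for i, g in enumerate(groups):
    print(f"{g:<10} no {percent['no'][i]:5.1f}%   yes {percent['yes'][i]:5.1f}%")
```

Percentaging within columns is what makes the two groups comparable even though they contain different numbers of women.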


SUMMARY MEASURES FOR QUANTITATIVE VARIABLES

Distributions of categorical variables (including binary and ordered categorical variables) can
be expressed as percentages; these percentages can be presented as such or contrasted
between groups of subjects. There is little more that can be done to summarise these
distributions. The distribution of a binary variable can, however, be expressed as a single
percentage (with its accompanying total, for completeness). For example: “41.2% of mothers
lifted at work (n = 2,266)”.
For a quantitative variable, there are other measures that aim to summarise the information in
the values taken by the variable. Two summaries are usually given:
• one for the central value or "location" of the distribution
• one for spread: a measure that indicates how widely values are spread above and below
  the central value.

There are several summary measures of central value and of spread, but the most common are
the mean and standard deviation.
The mean
The most commonly used measure of the central value of a distribution is the arithmetic
mean, or the average. It is the sum of the observations divided by the number of
observations. The mean birthweight in Example 1 (non-lifting mothers) is 3240g. As is
usual, it lies quite centrally in the histogram. We shall now review the calculation of the mean
in a simpler example:

Example 2
The plasma volumes of 8 healthy men are
2.75 2.86 3.37 2.76 2.62 3.49 3.05 3.12 litres, respectively
The arithmetic mean plasma volume is
\[ \frac{2.75 + 2.86 + 3.37 + 2.76 + 2.62 + 3.49 + 3.05 + 3.12}{8} = \frac{24.02}{8} = 3.0 \text{ litres} \]
A formula that corresponds to this calculation is
\[ \text{mean} = \bar{x} = \frac{\sum x_i}{n} \]
where n denotes the number of observations (values of the variable) and the i'th
observation is denoted x_i, so that x_1 = 2.75, x_2 = 2.86, etc. The Σ (Greek capital sigma)
symbol indicates that the values of x_i must be added.

The standard deviation

This is the measure of spread used in conjunction with the mean. It is based on the deviations
of the observations from the mean, that is on the difference between each observation and the
mean.
These deviations are squared and added. The result is divided by (n-1). The result of this is
a kind of mean of squared deviations, and is called the variance. The standard deviation is
the square root of the variance. The abbreviation SD is often used for the standard deviation.
For those of you who find that formulae help your understanding:
\[ SD = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]
We do not actually calculate SDs using this procedure, which is described here to give a feel
for what the SD is: a kind of average of deviations about the mean. (The divisor n-1 is used
rather than n for technical reasons beyond the scope of this Unit.)
Both the SD and the mean can be obtained on your calculator by using the statistics mode.
The Appendix shows how this is done for most Casio and other makes of calculator. SD and
mean are also given by many computer programs. (SD is denoted σ_{n-1} by most calculators.)
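
As one example of a computer program giving these summaries, the Python sketch below calculates the mean and SD of the plasma volumes in Example 2. It first spells out the definition step by step, then checks the result against Python's standard `statistics` module, whose `stdev` function also uses the n-1 divisor. Variable names are illustrative.

```python
import math
import statistics

plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # litres, Example 2

n = len(plasma_volumes)
mean = sum(plasma_volumes) / n

# Spell out the definition: squared deviations, divided by n - 1, then square-rooted.
squared_deviations = [(x - mean) ** 2 for x in plasma_volumes]
variance = sum(squared_deviations) / (n - 1)
sd = math.sqrt(variance)

# The standard library gives the same results (stdev also uses the n - 1 divisor).
assert math.isclose(mean, statistics.mean(plasma_volumes))
assert math.isclose(sd, statistics.stdev(plasma_volumes))

print(f"mean = {mean:.2f} litres, SD = {sd:.2f} litres")
```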

The mean and SD are usually the best summary measures of symmetrical distributions. For
these, the interval one standard deviation either side of the mean includes roughly 70% of the
distribution and two standard deviations includes roughly 95%.
Tables of means and standard deviations are an alternative way of comparing the distribution
of a quantitative response variable in two or more groups. Example 1 again:
Table 6
─────────────────────────────────────────────────────
Lifting      Number of            Birthweight in g
at work      deliveries (%)       Mean        SD
─────────────────────────────────────────────────────
NO           1,310 (58.8%)        3239.5      559.0
YES            956 (41.2%)        3190.6      553.4

All          2,266 (100%)         3219.4      557.0
─────────────────────────────────────────────────────
Would you choose this or Table 5 to present a comparison of birthweight in babies born to
mothers lifting during pregnancy and those not lifting?

Non Symmetric Distributions

Example 3
The number of days spent in hospital by 17 subjects after an operation, arranged in
increasing size, were:
3 4 4 6 8 8 8 10 10 12 14 14 17 25 27 37 42
The distribution is not symmetric (asymmetric) because low values are closer together
and often repeated, compared with the string of high values. The mean is 14.6 days.
This is not in the centre of the distribution.

The median and quartiles of a distribution

The median is an alternative measure of central value that works better for such a skewed
distribution. It is the value which halves the distribution, with 50% of the observations
below it and 50% above.
The three values which divide the distribution into quarters are called the quartiles. The
middle quartile is the median, and the distance between the lower quartile and the upper
quartile, called the inter-quartile range, is used as a measure of spread.

For a distribution with a large number of observations the quartiles are most easily found
from the cumulative relative frequency distribution (such as in the example above), by
reading off the values that correspond to 25%, 50%, and 75%.
For a smaller number of observations the median can be found directly by arranging the
observations in order from the lowest to the highest value and striking off values at both ends
until only one or two remain. If one, this value is the median; if two the median is half way
between them. The median is then used to divide the data into two halves and the medians of
each of the halves found in the same way - these are the upper and lower quartiles. (If the
median is the single central value, include it in each half).
For example 3, the median stay in hospital is 10 days. The 1st and 3rd quartiles are 8 and 17
days. (Formally, there are a number of definitions of the quartiles; one computer package
even gives four different versions! The details of these are not important to us).
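
The "median of each half" procedure described above can be sketched in Python as follows. The function names are hypothetical, and this is only one of the several quartile definitions mentioned, so statistical packages may return slightly different values for other data sets.

```python
def median_of_sorted(values):
    """Middle value, or halfway between the two middle values."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def median_and_quartiles(data):
    """Quartiles as medians of the two halves; a single central value joins both halves."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2  # includes the central value when n is odd
    lower_half = values[:half]
    upper_half = values[n - half:]
    return (median_of_sorted(lower_half),
            median_of_sorted(values),
            median_of_sorted(upper_half))

days = [3, 4, 4, 6, 8, 8, 8, 10, 10, 12, 14, 14, 17, 25, 27, 37, 42]  # Example 3
q1, median, q3 = median_and_quartiles(days)
print(q1, median, q3)
```

For the 17 hospital stays this gives the quartiles and median quoted for Example 3 (8, 10 and 17 days), so the inter-quartile range is 17 - 8 = 9 days.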

Centiles
The quartiles are the values which correspond to the cumulative percentages 25, 50 and 75,
but there is no need to stick to these percentages. When using a distribution as a standard, for
example the distribution of weight standardised for age in young children, it is common
practice to report the values corresponding to the percentages, say, 5%, 10% and 25%. These are
known as the 5th, 10th and 25th centiles (or percentiles) of the distribution.

Overall summary of a distribution

The following five numbers give a general purpose summary of a distribution, and “work”
for both non-symmetric and symmetric distributions:
The smallest value
The lower quartile, Q25
The median (Q50)
The upper quartile, Q75
The largest value
These numbers are sometimes shown in a figure called a box plot. In this, a bar represents
the median, the box goes between the quartiles, “whiskers” include 95% of a well-behaved
distribution, and outlying values are shown as *s or “o”s.

[Figure: box plot for Example 3, vertical axis from 0 to 40]

Logarithms and Distributions

Another way to cope with skewed distributions, where (as with the days of stay in hospital) the
skew is "to the right", is by using the logarithms, or logs, of the data values for statistical
analysis, in place of the values themselves.
Before considering this procedure, recall a few points concerning logs.
There are different kinds of logarithm. Logarithms "to the base 10" were invented to do
multiplication and division by way of addition and subtraction, but they no longer play this role
in arithmetic because multiplication and division can be done with a calculator. We shall use
another kind of logarithm called the natural logarithm.

For every positive number x there corresponds its logarithm ln(x). The two main arithmetical
properties of natural logarithms (and any other kind of logarithm) are
ln(xy) = ln(x) + ln(y)
ln(x/y) = ln(x) - ln(y).
That is, they convert multiplication to addition and division to subtraction.
To convert a number to its natural log with your calculator use the ln key. To convert the log of
x back to x the anti-logarithm function is used; this is the key SHIFT ln (or e^x) on Casio and
many other calculators, sometimes written as exp(x) in text.
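
The same operations are available in most programming languages. As a small illustration in Python, `math.log` is the natural logarithm and `math.exp` is the anti-logarithm; the two numbers used here are arbitrary.

```python
import math

x, y = 8.0, 25.0  # any two positive numbers; these are arbitrary

# Multiplication becomes addition, division becomes subtraction.
assert math.isclose(math.log(x * y), math.log(x) + math.log(y))
assert math.isclose(math.log(x / y), math.log(x) - math.log(y))

# exp is the anti-logarithm: it converts ln(x) back to x.
round_trip = math.exp(math.log(x))
print(round_trip)
```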
We now consider the use of logarithms with skewed data.

Example 3 - continued
We return to the 17 observations of duration of stay in hospital. As has been said, the
distribution is skewed to the right with a few rather large observations. We recall that the
mean duration (14.6 days) is not usually a satisfactory measure of the centre and that the
median (10 days) is better for this purpose. The figure below repeats the point plot of
the observations, then shows the equivalent plot of their logarithms.

The distribution of the logs is more symmetric.

The mean log duration is 2.41 and is a satisfactory measure of the central value of the
distribution of log duration. The anti-logarithm of this mean is 11.13 and is known as
the geometric mean. It is usually a more useful measure of the central value of the
distribution of duration than the original mean and is close to the median for positively
skewed distributions.
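
The geometric mean calculation for Example 3 can be checked with a few lines of Python (variable names are illustrative): take natural logs, average them, and take the anti-log of that average.

```python
import math

days = [3, 4, 4, 6, 8, 8, 8, 10, 10, 12, 14, 14, 17, 25, 27, 37, 42]  # Example 3

logs = [math.log(d) for d in days]       # natural logs of the durations
mean_log = sum(logs) / len(logs)         # mean on the log scale
geometric_mean = math.exp(mean_log)      # anti-log of the mean log
arithmetic_mean = sum(days) / len(days)

print(f"mean log = {mean_log:.2f}")
print(f"geometric mean = {geometric_mean:.2f} days")
print(f"arithmetic mean = {arithmetic_mean:.1f} days")
```

The geometric mean comes out well below the arithmetic mean and close to the median of 10 days, because taking logs pulls in the few long stays that inflate the ordinary mean.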

Some General Rules for Reporting Summaries
• Always report the number of observations on which the summary (percentage, mean, etc) is
  based.
• If the central value of a quantitative distribution is measured using the mean, give the
  standard deviation as well.
• If the central value of a quantitative distribution is measured using the median, give the
  lower and upper quartiles as well.
• For binary responses (two possible values, A or B) report the percentage of A's or B's but
  not usually of both.

Answers to reader exercise questions

Answer to Questions 1 and 2:

Response or    Variable                                  Type (Question 2)
explanatory

Question 1a
Response       Whether or not vaccinated                 binary (yes, no)
Explanatory    Socio-economic status                     ordered categorical (usually)
               Ethnic status                             categorical (unordered)
               Sex (not mentioned as of interest,        binary
               but might be required to control
               confounding; see epidemiology)
               Other variables (not mentioned)           eg age (quantitative)
               might also be required to control
               confounding

Question 1b
Response       Whooping cough                            binary (yes, no)
Explanatory    Whether or not vaccinated                 binary (yes, no)
               Other variables (not mentioned)           eg age (quantitative)
               might also be required to control
               confounding

Note: These two questions illustrate that whether a variable is a response or explanatory
variable depends on the context – vaccination status was the response variable in 1a, but an
explanatory variable in 1b.
