
MAP 716 Lecture 4 Simple Linear Regression

Here are the answers to the tutorial questions:
1. The estimated regression line is: Cholesterol level = -2.134 + 0.044 × (Average TV time).
2. The estimate of the slope of the line is 0.044.
3. The test is a t-test. The null hypothesis is that the slope is 0 (no association) and the alternative is that the slope is not equal to 0 (an association exists).
4. The t-test result shows the slope is statistically significant. We can conclude there is evidence of an association between average TV time and cholesterol level.
5. The implication of the estimated slope is that for every 1 hour increase in average TV time, cholesterol …

Uploaded by josephnjenga142

MAP 716: BIOSTATISTICS II AND COMPUTING

Lecture 4: Simple Linear Regression

Dr Alice Lakati, PhD


Senior Lecturer
Amref International University
Simple Linear Regression
• Linear regression analyzes the relationship
between two variables, X and Y.
• For each subject (or experimental unit), you know
both X and Y and you want to find the best straight
line through the data.
• In some situations, the slope and/or intercept have
a scientific meaning.
• In other cases, you use the linear regression line as
a standard curve to find new values of X from Y, or
Y from X.
Simple Linear Regression
• The first step in investigating the relationship between two
continuous variables X and Y is to obtain a scatter plot.
• The shape of the plot suggests the form of the relationship:
• Linear
• Quadratic
• More complex
• The variable Y is called the response, outcome, or
dependent variable.
• The variable X is called the predictor, explanatory, or
independent variable (also a factor or covariate).
• The aim is to fit a straight line through the points in the
scatter plot in some optimal way.
The Regression Line

• Simple linear regression is a model that describes the
relationship between the mean $\mu_{y|x}$ of the variable y and
another variable x.
• The term linear is used because the mean $\mu_{y|x}$ is represented
as a straight-line (linear) function of x.
• The equation of the model is $\mu_{y|x} = \alpha + \beta x$.
• The goal of linear regression is to adjust the values of the slope
and intercept to find the line that best predicts Y from X.
• More precisely, the goal of regression is to minimize the sum
of the squares of the vertical distances of the points from the
line.
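The quantity being minimized can be written as a short function. This is a minimal Python sketch; the function name `sum_sq_vertical` and the four (x, y) points are hypothetical, chosen only to show that a line lying closer to the points gives a smaller total.

```python
def sum_sq_vertical(a: float, b: float, xs: list[float], ys: list[float]) -> float:
    """Sum of squared vertical distances from each point (x, y) to the line y' = a + b*x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustrative data (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

# Two candidate lines: least squares prefers the one with the smaller sum.
close_line = sum_sq_vertical(0.0, 2.0, xs, ys)  # y' = 2x
far_line = sum_sq_vertical(1.0, 1.5, xs, ys)    # y' = 1 + 1.5x
print(close_line, far_line)
```

Least squares estimation searches over all (a, b) pairs for the one that makes this sum smallest; for simple linear regression that minimum has a closed-form solution.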
Slope and intercept
SLR Model

• The SLR model is $Y_i = \alpha + \beta X_i + \varepsilon_i$, where $Y_i$ is the
value of the response variable in the ith trial.
• $X_i$ is the value of the predictor variable (a known constant) in the ith
trial.
• $\varepsilon_i$ is the unknown random error term, with $E(\varepsilon_i) = 0$ and
constant variance $\mathrm{Var}(\varepsilon_i) = \sigma^2$. We further assume the
errors are uncorrelated, i.e. $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$.
• The model is “simple” because only one X
variable is involved, and “linear” because no parameter
appears as an exponent or is multiplied or divided by
another parameter; note also that X appears only to the
first power.
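To make the model assumptions concrete, the sketch below generates responses from $Y_i = \alpha + \beta X_i + \varepsilon_i$. It assumes independent normal errors, which is stronger than the "mean 0, constant variance, uncorrelated" conditions listed above (normality is an extra assumption usually added for the t and F tests). The function `simulate_slr` and the parameter values in the usage line are hypothetical.

```python
import random
from typing import Optional


def simulate_slr(alpha: float, beta: float, sigma: float,
                 xs: list[float], seed: Optional[int] = None) -> list[float]:
    """Draw one response per x from Y_i = alpha + beta * x_i + e_i.

    Each error e_i is drawn independently from a normal distribution with
    mean 0 and the same standard deviation sigma (constant variance).
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = random.Random(seed)  # seeded generator so results are reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, for illustration only.
sample = simulate_slr(alpha=10.0, beta=0.8, sigma=3.0, xs=[49, 53, 66, 73], seed=1)
print(sample)
```

With sigma = 0 every point falls exactly on the line $\alpha + \beta x$; increasing sigma scatters the points vertically by the same amount at every x, which is what "constant variance" means.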
Regression line
• If we have observed the values of the two
variables, we can perform a regression of Y on X.
• The general equation of the regression line based
on a sample is
• $y' = a + bx$
• where y' is the estimated or fitted value of y,
• a is the intercept of the line (the value of y where
the line crosses the y-axis), and
• b is the slope of the line.
x     y (observed)   y' (fitted)
53    60             53
66    65             64
73    65             69
49    57             50
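The y' column agrees, after rounding to whole numbers, with the line quoted on the Prediction slide (a = 9.667, b = 0.816). The sketch below assumes that is the line behind the table and recomputes each fitted value and residual; the function name `fitted` is hypothetical.

```python
# Intercept and slope quoted on the Prediction slide (assumed to be the line behind y').
A = 9.667
B = 0.816

# (x, observed y) pairs from the table above.
rows = [(53, 60), (66, 65), (73, 65), (49, 57)]


def fitted(x: float) -> float:
    """Fitted value y' = a + b * x."""
    return A + B * x


for x, y in rows:
    y_hat = fitted(x)
    # residual = observed minus fitted
    print(f"x={x}  y={y}  y'={y_hat:.3f} (rounded {round(y_hat)})  residual={y - y_hat:.3f}")
```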
Least Squares Estimation
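• Least squares estimation (LSE) is a method of estimating
the regression equation, i.e. fitting the model to the data in an
optimal way.
• The sum of the squares of the vertical distances of the
observations from the line is minimized.
• Formulas for the LSE:
$b = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}$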
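The closed-form estimates can be computed directly. This is a minimal sketch; the function name `least_squares` is hypothetical. Applied to the four table rows alone it gives b ≈ 0.337 and a ≈ 41.45, not the a = 9.667 and b = 0.816 on the Prediction slide, so those slide values were not computed from these four rows alone.

```python
def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (a, b) minimising sum (y - a - b*x)^2 for simple linear regression."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum (x - x_bar)(y - y_bar)
    s_xx = sum((x - x_bar) ** 2 for x in xs)                        # sum (x - x_bar)^2
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares([53, 66, 73, 49], [60, 65, 65, 57])  # the four table rows
print(f"a = {a:.3f}, b = {b:.3f}")
```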
Correlation and Association
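• The correlation coefficient
between X and Y
can be estimated
by
$r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$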
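A minimal sketch of the estimate, assuming the usual sample (Pearson) correlation coefficient; the function name `pearson_r` is hypothetical. r lies between -1 and 1, and its sign always matches the sign of the least squares slope b because both share the numerator $\sum (x_i - \bar{x})(y_i - \bar{y})$.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r([53, 66, 73, 49], [60, 65, 65, 57])  # the four table rows
print(round(r, 3))
```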
Prediction
• a = 9.667
• b = 0.816
• r = 0.744
• R squared = 0.554

• State the regression equation:


• Y = 9.667 + 0.816x + e (fitted line: y' = 9.667 + 0.816x)

• If a student scores 78% in the CATs, estimate the final exam score.
• If a student scores 66%, the predicted exam score is 9.667 + 0.816 × 66 ≈ 63.52,
against an actual score of 61: the prediction is off by about 2.5.
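Both prediction questions reduce to evaluating the fitted line. This is a minimal sketch; `predict_exam` is a hypothetical name, and reading x as the CAT score and y as the final exam score follows the wording of the questions. The residual (observed minus predicted) for the 66% student is negative, about -2.5, because the prediction overshoots the actual score.

```python
# Fitted line from the Prediction slide: y' = 9.667 + 0.816x
A = 9.667
B = 0.816


def predict_exam(cat_score: float) -> float:
    """Predicted final exam score (%) for a given CAT score (%)."""
    return A + B * cat_score


pred_78 = predict_exam(78)
pred_66 = predict_exam(66)
residual_66 = 61 - pred_66  # observed minus predicted for the student who scored 66%
print(f"{pred_78:.2f} {pred_66:.2f} {residual_66:.2f}")
```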
Parameter estimation
• The estimates a and b of the parameters α and β are known
as regression coefficients.
• The estimate b of the population slope β is of most interest.
• b is interpreted as the change in the mean of y for each
one-unit increase in the value of x.
• The regression equation can be used to predict or forecast
the values of y.
• E.g. weight' = -37.6 + 0.635 × height
• For a height of 170 cm the predicted weight is -37.6 + 0.635 × 170 ≈ 70.4 kg.
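Evaluating the example equation also illustrates the slope interpretation: moving from 170 cm to 171 cm raises the predicted weight by b = 0.635 kg. This is a minimal sketch; the function name is hypothetical, and height in cm and weight in kg are assumed from the example.

```python
def predicted_weight_kg(height_cm: float) -> float:
    """Predicted mean weight (kg) from the example equation weight' = -37.6 + 0.635 * height."""
    return -37.6 + 0.635 * height_cm


w_170 = predicted_weight_kg(170)
# The slope is the change in predicted weight for a one-unit (1 cm) increase in height.
per_cm = predicted_weight_kg(171) - predicted_weight_kg(170)
print(round(w_170, 2), round(per_cm, 3))
```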
Confidence interval for b
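• Standard error of b: $\mathrm{se}(b) = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$, where $s = \sqrt{SS_{\text{residual}}/(n-2)}$

• So the 95% CI is $b \pm t_{n-2,\,0.975}\,\mathrm{se}(b)$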
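Standard-library Python has no t-distribution quantile function, so this sketch takes the critical value as an argument; with SciPy installed, `scipy.stats.t.ppf(0.975, n - 2)` returns it. The function name `slope_ci` is hypothetical, and the four table rows serve only as a small worked input (n - 2 = 2 degrees of freedom, $t_{2,\,0.975} \approx 4.303$).

```python
import math


def slope_ci(xs: list[float], ys: list[float], t_crit: float):
    """Return (b, se_b, (lower, upper)) for the least squares slope.

    se(b) = s / sqrt(S_xx), with s = sqrt(SS_residual / (n - 2)).
    t_crit should be the t_{n-2, 0.975} critical value for a 95% CI.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points so that n - 2 > 0")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_residual / (n - 2))  # residual standard deviation
    se_b = s / math.sqrt(s_xx)
    return b, se_b, (b - t_crit * se_b, b + t_crit * se_b)


b, se_b, (lower, upper) = slope_ci([53, 66, 73, 49], [60, 65, 65, 57], t_crit=4.303)
print(f"b = {b:.3f}, se(b) = {se_b:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```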


Coefficient of Determination
• The coefficient of determination is denoted by $R^2$.
• For SLR, $R^2 = r^2$, the square of the correlation coefficient.
• $R^2$ measures the fit of the model.
• It is interpreted as the proportion of the
variability among the observed values of y
explained by the linear regression of y on x.
• It measures the usefulness or predictive value of
the model.
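A quick numerical check of $R^2 = r^2$ for SLR, computing $R^2$ as 1 minus the residual sum of squares over the total sum of squares and comparing it with the squared correlation. This is a minimal sketch on the four table rows; the function name is hypothetical. The same relation holds for the Prediction slide's summary values (r = 0.744, R squared = 0.554).

```python
import math


def r_squared_and_r(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (R^2 from the fitted least squares line, Pearson r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - ss_residual / s_yy  # proportion of variation in y explained by the line
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r2, r


r2, r = r_squared_and_r([53, 66, 73, 49], [60, 65, 65, 57])
print(round(r2, 4), round(r ** 2, 4))
print(round(0.744 ** 2, 3))  # Prediction slide: r = 0.744
```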
SLR Model
• The main question is how much of the total variation among the y's
can be attributed to chance and how much can be
attributed to the relationship between the two variables.
• To answer it, we partition the total variation of the
observed y's into the different sources that give rise
to it (total = regression + residual):
• Total variation = $\sum (y - \bar{y})^2$
• Regression variation = $\sum (y' - \bar{y})^2$
• Residual or chance variation = $\sum (y - y')^2$
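The partition holds exactly for the least squares line with an intercept, so this sketch fits that line first and then computes the three sums of squares. It is a minimal illustration on the four table rows; the function name `variation_partition` is hypothetical.

```python
def variation_partition(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Return (total, regression, residual) sums of squares for the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]                           # y'
    total = sum((y - y_bar) ** 2 for y in ys)                  # sum (y - y_bar)^2
    regression = sum((f - y_bar) ** 2 for f in fitted)         # sum (y' - y_bar)^2
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # sum (y - y')^2
    return total, regression, residual


total, regression, residual = variation_partition([53, 66, 73, 49], [60, 65, 65, 57])
print(total, regression + residual)
```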
Model…
Interpretation of outputs
• F-statistic: Prob > F is the probability of obtaining an F statistic
at least this large by chance, on the assumption that the true model
has only a constant term and no explanatory
variable.
• R squared is the more important measure of model fit.
• Adjusted R squared also estimates goodness of fit, allowing for the
number of predictors; it is most useful in multiple linear regression (MLR).
• Coefficients in the model: the slope.
• The t-test tests the significance of each coefficient.
• Cholesterol level (Y) = -2.134 + 0.044 × (TV time) + e

• Slope = 0.044 (95% CI 0.02 – 0.06..)

• t = b/se(b), with n - 2 degrees of freedom
• H0: β = 0, H1: β ≠ 0
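In SLR the slope t-test and the overall F-test are equivalent: F equals the square of t, so both give the same p-value. The cholesterol data are not available in this text, so this minimal sketch uses the four table rows instead; the function name is hypothetical. With 2 degrees of freedom the two-sided 5% critical value is about 4.303, so |t| ≈ 4.49 would lead to rejecting H0: β = 0 for these four points.

```python
import math


def slope_t_and_f(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (t, F) for H0: beta = 0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_regression = sum((a + b * x - y_bar) ** 2 for x in xs)
    ms_residual = ss_residual / (n - 2)
    se_b = math.sqrt(ms_residual / s_xx)
    t = b / se_b                           # t = b / se(b), n - 2 degrees of freedom
    f = (ss_regression / 1) / ms_residual  # F with 1 and n - 2 degrees of freedom
    return t, f


t, f = slope_t_and_f([53, 66, 73, 49], [60, 65, 65, 57])
print(f"t = {t:.2f}, F = {f:.2f}, t^2 = {t * t:.2f}")
```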
Tutorial: SLR of cholesterol concentration (mmol/L) on average time spent watching TV

1. Write down the estimated regression line.
2. What is the estimate of the slope of the line?
3. What is the test that is associated with this estimate and
what are the hypotheses associated with it?
4. State the result of this test and your conclusion in terms of
the problem.
5. What is the implication of the estimated slope?
6. What is the implication of the intercept?
Tutorial…
7. What is the test associated with the overall
strength of the linear association? What are the
hypotheses associated with this test?

i. What conclusion do you reach with respect to the
existence of a linear association between average
time of watching TV and cholesterol level?
Example: SLR of cholesterol concentration (mmol/L) on average time spent watching TV
