A Brief Introduction To Linear Models in R
0. Introduction
Many bioinformatics applications involve repeatedly fitting linear models to data. Examples include differential gene expression analysis, genome-wide association studies (GWAS), and differential protein abundance analysis.
Scope

Covered here:
Basics of linear models
R model syntax
Understanding contrasts
Models with continuous covariates
Diagnostic plots
Data-driven model selection

Not covered:
Anything that doesn’t scale well when applied to 1000’s of genes/SNPs/proteins
1. Linear models
A linear model is a model for a continuous outcome Y of the form

Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₚXₚ + ϵ
The covariates X can be continuous (e.g., temperature) or categorical (e.g., treatment group).
2. Linear models in R
R uses the function lm to fit linear models.
Read in `lm_example_data.csv`:
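A minimal sketch, assuming the file is in the current working directory (the data frame name dat matches the calls that follow):

# Load the example data into a data frame called 'dat'
dat <- read.csv("lm_example_data.csv")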
Fit a linear model using expression as the outcome and treatment as a categorical covariate:
In R model syntax, the outcome is on the left side, with covariates (separated by +) following the ~
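For this example:

oneway.model <- lm(expression ~ treatment, data = dat)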
oneway.model
##
## Call:
## lm(formula = expression ~ treatment, data = dat)
##
## Coefficients:
## (Intercept) treatmentB treatmentC treatmentD treatmentE
## 1.1725 0.4455 0.9028 2.5537 7.4140
class(oneway.model)
## [1] "lm"
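The summary function gives the full coefficient table along with overall fit statistics:

summary(oneway.model)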
In the output:
“Coefficients” refer to the β’s
“Estimate” is the estimate of each coefficient
“Std. Error” is the standard error of the estimate
“t value” is the coefficient divided by its standard error
“Pr(>|t|)” is the p-value for the coefficient
The residual standard error is the estimate of the standard deviation of ϵ
Degrees of freedom is the sample size minus the number of coefficients estimated
R-squared is (roughly) the proportion of variance in the outcome explained by the model
The F-statistic compares the fit of the model as a whole to the null model (with no covariates)
coef(oneway.model)
## (Intercept) treatmentB treatmentC treatmentD treatmentE
## 1.1724940 0.4455249 0.9027755 2.5536669 7.4139642
With the default reference group coding, the intercept estimates the mean expression in the reference group (treatment A), and each treatment coefficient estimates the difference between that group’s mean and the reference group’s mean. What if you don’t want reference group coding? Another option is to fit a model without an intercept:
no.intercept.model <- lm(expression ~ 0 + treatment, data = dat) # '0' means 'no intercept' here
summary(no.intercept.model)
##
## Call:
## lm(formula = expression ~ 0 + treatment, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9310 -0.5353 0.1790 0.7725 3.6114
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## treatmentA 1.1725 0.7783 1.506 0.147594
## treatmentB 1.6180 0.7783 2.079 0.050717 .
## treatmentC 2.0753 0.7783 2.666 0.014831 *
## treatmentD 3.7262 0.7783 4.787 0.000112 ***
## treatmentE 8.5865 0.7783 11.032 5.92e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.74 on 20 degrees of freedom
## Multiple R-squared: 0.8878, Adjusted R-squared: 0.8598
## F-statistic: 31.66 on 5 and 20 DF, p-value: 7.605e-09
coef(no.intercept.model)
## treatmentA treatmentB treatmentC treatmentD treatmentE
## 1.172494 1.618019 2.075270 3.726161 8.586458
Without the intercept, the coefficients here estimate the mean in each level of treatment:
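One way to compute the per-group means directly (a sketch using tapply; assumes dat has columns expression and treatment):

# Mean expression within each level of treatment
treatmentmeans <- tapply(dat$expression, dat$treatment, mean)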
treatmentmeans
## A B C D E
## 1.172494 1.618019 2.075270 3.726161 8.586458
The no-intercept model is the SAME model as the reference group coded model, in the sense that it gives the same estimate for any comparison between groups:
Treatment B - treatment A, reference group coded model:
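coef(oneway.model)["treatmentB"]

Treatment B - treatment A, no-intercept model:

coef(no.intercept.model)["treatmentB"] - coef(no.intercept.model)["treatmentA"]

Both give 0.4455249 (from the coefficient tables above: 1.618019 - 1.172494 = 0.445525).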
Under the hood, lm turns the model formula into a design matrix. The design matrix X has one row for each observation and one column for each model coefficient.
Sound complicated? The good news is that the design matrix can be specified through the model.matrix function using the same syntax as for lm, just without a response:
Design matrix for reference group coded model:
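# Same formula as the lm call, but with no response to the left of the ~
model.matrix(~ treatment, data = dat)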
(Note that “contr.treatment”, or treatment contrasts, is how R refers to reference group coding)
The first column will always be 1 in every row if your model has an intercept
The column treatmentB is 1 if an observation has treatment B and 0 otherwise
The column treatmentC is 1 if an observation has treatment C and 0 otherwise
etc.
Exercises and Things to Think About
Use ?formula to explore specifying models in R.
Use ?lm.fit to see how lm uses the design matrix internally.
If the response y is log gene expression, the model coefficients are often referred to as log fold-changes. Why does this make sense? (Hint: log(x/y) = log(x) - log(y))
3. Models with Multiple Factors

For a model with more than one covariate, summary provides estimates and tests for each coefficient adjusted for all the other coefficients in the model.
The notation treatment*time refers to treatment, time, and the interaction effect of treatment by time. (This differs from the notation used by some other statistical software.)
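The two-way model used below can be fit as follows (a sketch; the covariate name time and its levels time1 and time2 are inferred from the coefficient names in the output):

# Main effects of treatment and time, plus their interaction
twoway.model <- lm(expression ~ treatment*time, data = dat)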
Interpretation of coefficients:
Each coefficient for treatment represents the difference between the indicated group and the reference group at the reference level for the other covariates
For example, “treatmentB” is the difference in expression between treatment B and treatment A at time 1
Similarly, “timetime2” is the difference in expression between time2 and time1 for treatment A
The interaction effects (coefficients with “:”) estimate the difference between treatment groups in the effect of time
The interaction effects ALSO estimate the difference between times in the effect of treatment
To estimate the difference between treatment B and treatment A at time 2, we need to include the interaction effect:
# B - A at time 2
coefs <- coef(twoway.model)
coefs["treatmentB"] + coefs["treatmentB:timetime2"]
## treatmentB
## 0.3109271
We can see from summary that one of the interaction effects is significant. Graphically, an interaction shows up as non-parallel lines when mean expression is plotted against time for each treatment (e.g., with interaction.plot).
Another way to parameterize this model is to combine treatment and time into a single covariate. Next, fit a one-way ANOVA model with the new covariate. Don’t include an intercept in the model.
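A sketch of this step; the covariate name tx.time and its levels (e.g., A.time1, B.time2) are taken from the coefficient names shown below:

# Combine treatment and time into one factor with levels like "A.time1"
dat$tx.time <- interaction(dat$treatment, dat$time)
# One-way ANOVA with no intercept: each coefficient is a group mean
other.2way.model <- lm(expression ~ 0 + tx.time, data = dat)

We get the same estimates for the effect of treatment B vs. A at time 1: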
c1 <- coef(twoway.model)
c1["treatmentB"]
## treatmentB
## 0.4063679
c2 <- coef(other.2way.model)
c2["tx.timeB.time1"] - c2["tx.timeA.time1"]
## tx.timeB.time1
## 0.4063679
We get the same estimates for the effect of treatment B vs. A at time 2:
c1 <- coef(twoway.model)
c1["treatmentB"] + c1["treatmentB:timetime2"]
## treatmentB
## 0.3109271
c2 <- coef(other.2way.model)
c2["tx.timeB.time2"] - c2["tx.timeA.time2"]
## tx.timeB.time2
## 0.3109271
And we get the same estimates for the interaction effect (remembering that an interaction effect here is a difference of differences):
c1 <- coef(twoway.model)
c1["treatmentB:timetime2"]
## treatmentB:timetime2
## -0.09544075
c2 <- coef(other.2way.model)
(c2["tx.timeB.time2"] - c2["tx.timeA.time2"]) - (c2["tx.timeB.time1"] - c2["tx.timeA.time1"])
## tx.timeB.time2
## -0.09544075
4. Continuous Covariates
Linear models with continuous covariates (“regression models”) are fitted in much the same way:
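# Fit expression as a linear function of the continuous covariate temperature
continuous.model <- lm(expression ~ temperature, data = dat)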
For the above model, the intercept is the expected expression at a temperature of 0, and the temperature coefficient is the slope: how much expression increases for each one-unit increase in temperature.
The slope from a linear regression model is related to but not identical to the Pearson correlation coefficient:
cor.test(dat$expression, dat$temperature)
##
## Pearson's product-moment correlation
##
## data: dat$expression and dat$temperature
## t = 14.063, df = 23, p-value = 8.768e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8807176 0.9764371
## sample estimates:
## cor
## 0.9464761
summary(continuous.model)
##
## Call:
## lm(formula = expression ~ temperature, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.87373 -0.67875 -0.07922 1.00672 1.89564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.40718 0.93724 -10.04 7.13e-10 ***
## temperature 0.97697 0.06947 14.06 8.77e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.054 on 23 degrees of freedom
## Multiple R-squared: 0.8958, Adjusted R-squared: 0.8913
## F-statistic: 197.8 on 1 and 23 DF, p-value: 8.768e-13
Notice that the p-values for the correlation and the regression slope are identical.
Scaling and centering both variables yields a regression slope equal to the correlation coefficient:
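A quick check, using scale to center and standardize both variables before refitting:

# After standardizing both variables, the slope equals the Pearson
# correlation reported above (0.9464761)
scaled.model <- lm(scale(expression) ~ scale(temperature), data = dat)
coef(scaled.model)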