H-409 Multivariate Analysis With R and Stata
Md. Mostakim
Session: 2018-19
Department of Statistics
University of Dhaka
Published Date:
Acknowledgements:
I would like to acknowledge our course teacher, Dr. Md. Belal Hossain Sir, for helping us learn
Multivariate Analysis. Thanks also to Amina Siddika for her lecture.
N.B. You may share this PDF book as much as you like, but do not use it for any unethical purpose.
For any kind of feedback, please contact mostakimbd2016@gmail.com. Your feedback will be very inspiring for me.
Problem 01: Testing Vector Means
a. Evaluate T² for testing the hypothesis Ho: µ = (7, 11) vs Ha: µ ≠ (7, 11) using the two columns
of the data matrix X.
b. Specify the distribution of T² for the situation in (a).
c. Using (a) and (b), test Ho at the level ⍺ = 0.05. What conclusion do you reach?
For each problem we provide the code, results, and interpretation using R. We provide the Stata
code starting with a dot (.).
Answer:
With R:
a.
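The computation for part (a) is reconstructed below as a sketch: the original data are not shown, so X is assumed to be the already-entered 4 × 2 data matrix (n = 4 follows from the F(2,2) degrees of freedom in the Stata output later).
> n <- nrow(X); p <- ncol(X)   # n = 4 observations, p = 2 variables (X assumed already entered)
xbar <- colMeans(X)            # sample mean vector
s <- cov(X)                    # sample covariance matrix
mu <- c(7, 11)                 # hypothesized mean vector
T2 <- n * t(xbar-mu) %*% solve(s) %*% (xbar-mu)   # Hotelling's T2
T2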
> ## [,1]
## [1,] 13.63636
b.
> ((n-1)*p)/(n-p)
> ## [1] 3
> c(df1=p,df2=n-p)
> ## df1 df2
##    2   2
c.
> ((n-1)*p)/(n-p)*qf(0.95, p, n-p)
> ## [1] 57
Since T² = 13.64 < 57, we fail to reject Ho at the ⍺ = 0.05 level. Therefore, we may conclude
that the means of column 1 and column 2 of X do not differ significantly from the values 7 and 11,
respectively, at the 5% level of significance.
> 1 - pf(T2*(n-p)/((n-1)*p), p, n-p)
> ## [,1]
## [1,] 0.1803279
Since the p-value = 0.18 > 0.05, we fail to reject Ho at the ⍺ = 0.05 level. Therefore, we may
conclude that the means of column 1 and column 2 of X do not differ significantly from the
values 7 and 11, respectively, at the 5% level of significance.
With Stata:
Type the data into the Data Editor in Stata.
Code:
. matrix mu = (7, 11)
. mvtest means X1 X2, equals(mu)
Hotelling T2 = 13.64
Hotelling F(2,2) = 4.55
Prob > F = 0.1803
Menu:
Statistics > Multivariate Analysis > Manova, Multivariate Regression and Related >
Multivariate test of means, covariances and matrices
Problem 02: Testing Vector Means
a. Evaluate T² for testing the hypothesis Ho: µ = (20, 200, 150, 3) vs Ha: µ ≠ (20, 200, 150,
3) using the variables (“mpg”, “disp”, “hp”, “wt”) from the data “mtcars”.
b. Specify the distribution of T² for the situation in (a).
c. Using (a) and (b), test Ho at the level ⍺ = 0.01. What conclusion do you reach?
Answer:
a.
# Create a matrix X with the variables "mpg", "disp", "hp", and "wt"
X <- matrix(c(mtcars$mpg, mtcars$disp, mtcars$hp, mtcars$wt), ncol = 4)
p <- ncol(X)
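n <- nrow(X)               # these steps are not shown in the original; reconstructed as a sketch
xbar <- colMeans(X)        # sample mean vector
s <- cov(X)                # sample covariance matrix
mu <- c(20, 200, 150, 3)   # hypothesized mean vector
T2 <- n * t(xbar-mu) %*% solve(s) %*% (xbar-mu)   # Hotelling's T2
T2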
> ## [,1]
## [1,] 10.78587
b.
> ((n-1)*p)/(n-p)
> ## [1] 4.428571
> c(df1=p,df2=n-p)
> ## df1 df2
##    4  28
c.
Since T² = 10.79 < 22.97, we fail to reject Ho at the ⍺ = 0.01 level. Therefore, we may conclude
that the means of “mpg”, “disp”, “hp”, and “wt” do not differ significantly from the values 20, 200,
150, and 3, respectively, at the 1% level of significance.
> 1 - pf(T2*(n-p)/((n-1)*p), p, n-p)
> ## [,1]
## [1,] 0.07058328
Since the p-value = 0.07 > 0.01, we fail to reject Ho at the ⍺ = 0.01 level. Therefore, we may
conclude that the means of “mpg”, “disp”, “hp”, and “wt” do not differ significantly from the values
20, 200, 150, and 3, respectively, at the 1% level of significance.
With Stata:
> library("foreign")   # export the data from R so Stata can read them
data("mtcars")
write.dta(mtcars, "C:/Users/Mostakim/Documents/mtcars.dta")
. matrix mu = (20, 200, 150, 3)
. mvtest means mpg disp hp wt, equals(mu)
Hotelling T2 = 10.79
Problem 03: Testing Vector Means
Let X1 = (measured radiation with door closed)^(1/4) and X2 = (measured radiation with door open)^(1/4).
a. Evaluate T² for testing the hypothesis Ho: µ = (0.562, 0.589) vs Ha: µ ≠ (0.562, 0.589)
using the variables X1 and X2.
b. Specify the distribution of T² for the situation in (a).
c. Using (a) and (b), test Ho at the level ⍺ = 0.05. What conclusion do you reach?
Answer:
a.
> radc<-
c(0.15,0.09,0.18,0.1,0.05,0.12,0.08,0.05,0.08,0.1,0.07,0.02,.01,0.1,0.1
,0.1,0.02,0.1,0.01,0.4,0.1,0.05,0.03,0.05,0.15,0.1,0.15,0.09,0.08,0.18,
0.1,0.2,0.11,0.3,0.02,0.2,0.2,0.3,0.3,0.4,0.3,0.05)
rado<-
c(0.30,0.09,0.30,0.10,0.10,0.12,0.09,0.10,0.09,0.10,0.07,0.05,0.01,0.45
,0.12,0.2,0.04,0.1,0.01,0.6,0.12,0.1,0.05,0.05,0.15,0.3,0.15,0.09,0.09,
0.28,0.1,0.1,0.1,0.3,0.12,0.25,0.2,0.4,0.33,0.32,0.12,0.12)
x1<-matrix(radc^(1/4))
x2<-matrix(rado^(1/4))
x<-cbind(x1,x2)
xbar <-colMeans(x)
s<-cov(x)
p <- 2
n <- 42
mu<-c(.562,.589)
T2<-n*t(xbar-mu)%*%solve(s)%*%(xbar-mu)
T2
> ## [,1]
## [1,] 1.2573
b.
> ((n-1)*p)/(n-p)
> ## [1] 2.05
> c(df1=p,df2=n-p)
> ## df1 df2
##    2  40
c.
> ((n-1)*p)/(n-p)*qf(0.95, p, n-p)
> ## [1] 6.62504
Since the value of T² = 1.26 < 6.62, we conclude that µ = (0.562, 0.589) lies inside the confidence
region. Equivalently, a test of Ho: µ = (0.562, 0.589) would not be rejected in favor of H1: µ ≠
(0.562, 0.589) at the ⍺ = 0.05 significance level. Therefore, the mean values of X1 and X2 do not
differ significantly from the values 0.562 and 0.589, respectively.
> 1 - pf(T2*(n-p)/((n-1)*p), p, n-p)
> ## [,1]
## [1,] 0.5465654
The p-value = 0.55 > 0.05, which indicates the same conclusion as above.
d.
> e <- eigen(s)   # eigenvalues and eigenvectors of the sample covariance matrix
e$vectors
> lambda1 <- e$values[1]
lambda2 <- e$values[2]
e.
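The half-lengths follow from the standard formula for the axes of the 95% confidence ellipse; the code below is a sketch reconstructed from the quantities (lambda1, lambda2, n, p) defined above.
> cval <- (p*(n-1))/(n*(n-p))*qf(0.95, p, n-p)   # scaling constant of the 95% confidence ellipse
v1 <- sqrt(lambda1)*sqrt(cval)   # half-length of the major axis
v2 <- sqrt(lambda2)*sqrt(cval)   # half-length of the minor axis
c(v1, v2)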
The half-length of the major axis is 0.06 and the half-length of the minor axis is 0.02.
f.
> v1/v2
The length of the major axis is 3.1 times the length of the minor axis.
Problem 04: Testing Vector Means
Improved anesthetics are often developed by first studying their effects on animals. In one study,
19 dogs were initially given the drug pentobarbital. Each dog was then administered carbon
dioxide (CO2) at each of two pressure levels. Next, halothane (H) was added, and the
administration of CO2 was repeated. The response, milliseconds between heartbeats, was
measured for the four treatment combinations: Sleep dog data link
Treatment 4 = low CO2 pressure with H
Answer:
With STATA:
. use "F:\Mostakim\4th Year\Stat H-409; Statistical computing VII;
Multivariate Analysis and Experimental Design\Data\sleepdog.dta", clear
. mvtest means T1 T2 T3 T4
Hotelling T2 = 116.02
Hotelling F(3,16) = 34.38
Prob > F = 0.0000
Since the p-value < 0.05, we may reject the null hypothesis and conclude that at least two of the
treatment means are not equal at the 5% level of significance.
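The same test can be reproduced in R (a sketch; it assumes the sleep-dog data have been read into a data frame named dog with columns T1–T4, where dog is a hypothetical name):
> X <- as.matrix(dog[, c("T1","T2","T3","T4")])
n <- nrow(X); q <- ncol(X)   # n = 19 dogs, q = 4 treatments
C <- matrix(c(-1, 1, 0, 0,
              0, -1, 1, 0,
              0, 0, -1, 1), nrow = 3, byrow = TRUE)   # contrasts of successive treatments
xbar <- colMeans(X); S <- cov(X)
T2 <- n * t(C %*% xbar) %*% solve(C %*% S %*% t(C)) %*% (C %*% xbar)   # Hotelling's T2 (116.02)
Fstat <- (n-q+1)/((n-1)*(q-1)) * T2   # ~ F(q-1, n-q+1) = F(3, 16) under Ho (34.38)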
Problem 05: Testing Vector Means
a. Evaluate T² for testing the hypothesis Ho: µ = (0.8, 2, 60, 30, 2, 200) vs Ha: µ ≠ (0.8, 2, 60,
30, 2, 200) using the following data.
b. Specify the distribution of T² for the situation in (a).
c. Using (a) and (b), test Ho at the level ⍺ = 0.01. What conclusion do you reach?
Problem 06: Testing Vector Means
a. Evaluate T² for testing the hypothesis Ho: µ = (4, 50, 10) vs Ha: µ ≠ (4, 50, 10) using the
following data.
Sweat data
b. Specify the distribution of T² for the situation in (a).
c. Using (a) and (b), test Ho at the level ⍺ = 0.01. What conclusion do you reach?
Problem 07: Generating Multivariate Normal Samples
Draw a sample from a multivariate normal distribution N(µ, Σ) where µ′ = (−3, 2, 1) is the mean
vector and Σ = (3 −1 0; −1 3 0; 0 0 3) is the covariance matrix. Then draw scatter plots and box
plots of the variables and discuss the aspects of multivariate data.
Answer:
With R:
> library("mvtnorm")   # provides rmvnorm() for multivariate normal sampling
mu <- c(-3, 2, 1)      # mean vector
sigma <- matrix(c(3, -1, 0,
                  -1, 3, 0,
                  0, 0, 3), ncol = 3, byrow = TRUE)   # covariance matrix
sample <- rmvnorm(500, mean = mu, sigma = sigma)   # sampling step reconstructed; n = 500 is assumed
# Scatter plots
pairs(sample, main = "Scatter Plots")
> # Boxplot
boxplot(sample, main="Box plot")
By examining the scatter plots, we can observe patterns and correlations between the variables. If
the points cluster tightly around a straight line, it suggests a strong linear relationship; if
the points are scattered with no apparent pattern, it indicates a weak or no linear relationship.
In our case most pairs show no clear relationship, although there appears to be a negative
relation between the first and second variables, consistent with the covariance of −1 specified in Σ.
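These visual impressions can be checked numerically (a quick sketch, reusing the sample drawn above; with the Σ above, the correlation between the first two variables should be near −1/3):
> round(cor(sample), 2)   # pairwise sample correlations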
The box plots provide information about the distribution of each variable, including measures
such as the median, quartiles, and potential outliers. They help us understand the central
tendency, variability, and skewness of the individual variables.
With Stata:
. matrix mu = (-3, 2, 1)
. matrix s = (3,-1,0\-1,3,0\0,0,3)
. drawnorm p q r, n(500) means(mu) cov(s) clear   // drawnorm call reconstructed; n(500) is an assumed sample size
. graph matrix p q r
. graph box p q r
Menu:
Data > Create or change data > Other variable-creation commands > Draw sample from
normal distribution
Problem 08: Generating Multivariate Normal Samples
Draw a sample of 500 observations from a multivariate normal distribution N(µ, Σ) where µ′ =
(5, −6, 0.5) is the mean vector and Σ = (9 5 2; 5 4 1; 2 1 1) is the covariance matrix. Then draw
scatter plots and box plots of the variables and discuss the aspects of multivariate data.
Problem 09: Generating Multivariate Normal Samples
Draw a sample of 1000 observations from a multivariate normal distribution N(µ, Σ) where µ′ =
(−3, 1, 4) is the mean vector and Σ = (1 −2 0; −2 5 0; 0 0 2) is the covariance matrix. Then draw
scatter plots and box plots of the variables and discuss the aspects of multivariate data.
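No worked answer is shown for Problems 08 and 09; a minimal R sketch for Problem 08, reusing rmvnorm() from the mvtnorm package as in Problem 07, would be:
> library("mvtnorm")
mu <- c(5, -6, 0.5)
sigma <- matrix(c(9, 5, 2,
                  5, 4, 1,
                  2, 1, 1), ncol = 3, byrow = TRUE)
x <- rmvnorm(500, mean = mu, sigma = sigma)   # 500 draws from N(mu, sigma)
pairs(x, main = "Scatter Plots")   # pairwise scatter plots
boxplot(x, main = "Box plot")      # box plot of each variable
Problem 09 is solved identically with 1000 observations, mu <- c(-3, 1, 4), and its covariance matrix.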
Problem 10: Test for Multivariate Normality
Construct a Q-Q plot for the “mtcars” data set for the variables “mpg”,“disp”,“hp”,“wt” and
carry out a test for normality. What do you conclude?
Answer:
With R:
> data("mtcars")
x <- mtcars[,c("mpg","disp","hp","wt")]
n <- nrow(x)
p <- ncol(x)
xbar <- colMeans(x)
Sx <- cov(x)
D2 <- mahalanobis(x, xbar, Sx)   # squared Mahalanobis distances
# Chi-square Q-Q plot of the distances (the plotting step is reconstructed)
qqplot(qchisq(ppoints(n), df = p), D2,
       xlab = "Chi-square quantiles", ylab = "Ordered Mahalanobis D2")
abline(0, 1)
We observe from the Q-Q plot that the majority of the points closely align with the reference
line, indicating a reasonable approximation to normality. However, there is one point that
deviates noticeably from the line, which introduces some uncertainty in our decision-making
process. To further assess the multivariate normality, we can perform the “mshapiro.test” to
obtain a formal statistical test.
> library("mvnormtest")
mshapiro.test(t(x)) #Shapiro-Wilk Multivariate Normality Test
> ##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.82883, p-value = 0.0001491
Since the p-value = 0.0001 < 0.01, we may reject the null hypothesis at the 1% level of significance
and conclude that the variables “mpg”, “disp”, “hp”, and “wt” may not follow a multivariate normal
distribution.
With Stata:
> library("foreign")   # export the data from R so Stata can read them
data("mtcars")
write.dta(mtcars, "C:/Users/Mostakim/Documents/mtcars.dta")
Test for multivariate normality:
. mvtest normality mpg disp hp wt
Menu:
File > Open > mtcars.dta;
Statistics > Multivariate Analysis > Manova, Multivariate Regression and Related >
Multivariate test of means, covariances and matrices
For the Q-Q plot we construct a separate Q-Q plot for each variable, as there is no straightforward
way to compute Mahalanobis distances for four variables in Stata.
Code:
. qnorm mpg
. qnorm disp
. qnorm hp
. qnorm wt
Menu:
Statistics > Summaries, tables, and tests > Distributional plots and tests > Normal quantile
plot
Problem 11: MANOVA
Construct a MANOVA for the gear and carb factors of the data “mtcars”, considering mpg, disp, and
wt as dependent variables, and comment on the results.
Answer:
H10: gear has no significant effect on the model vs H11: gear has a significant effect on the model.
H20: carb has no significant effect on the model vs H21: carb has a significant effect on the model.
H30: the interaction effect of gear and carb is not significant in the model vs H31: the interaction
effect of gear and carb is significant in the model.
With R:
> data(mtcars)
Y <- cbind(mtcars[,1], mtcars[,3], mtcars[,6])   # defining the dependent variables (mpg, disp, wt) as Y
gear<-mtcars[,10]
carb<-mtcars[,11]
fit <- manova(Y ~ gear*carb)
summary(fit, test="Pillai")
Comment:
The p-values for both gear and carb are less than 0.05, so we may reject the null hypotheses at the
5% level of significance and conclude that gear and carb each have a significant effect on the model.
But the p-value for the interaction effect of gear and carb is 0.4355 > 0.05, so we fail to reject
the null hypothesis of no interaction effect at the 5% level of significance and conclude that there
is no interaction effect of gear and carb on the model.
Therefore, the MANOVA results indicate that both gear and carb have significant main effects
on the dependent variables (mpg, disp, and wt). However, the interaction effect between gear and
carb is not statistically significant. These findings suggest that gear and carb independently
influence the dependent variables but do not interact significantly with each other.
With Stata:
Code:
. manova mpg disp wt = gear##carb
Menu:
Statistics > Multivariate Analysis > Manova, Multivariate Regression and Related > MANOVA
Problem 12: Multivariate Regression
Use the dataset “mtcars” to fit multivariate regression model. Use “mpg”, “disp”, “wt” as
dependent variables and “gear” and “carb” as independent variables.
Answer:
> data(mtcars)
y<-cbind(mtcars[,1],mtcars[,3],mtcars[,6])
gear<-mtcars[,10]
carb<-mtcars[,11]
fit <- lm(y ~ gear + carb)
summary(fit)
> ## Response Y1 :
##
## Call:
## lm(formula = Y1 ~ gear + carb)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3385 -2.5873 0.3211 1.3742 7.0758
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.2756 2.9465 2.469 0.0197 *
## gear 5.5756 0.8129 6.859 1.56e-07 ***
## carb -2.7537 0.3713 -7.416 3.59e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.211 on 29 degrees of freedom
## Multiple R-squared: 0.7344, Adjusted R-squared: 0.7161
## F-statistic: 40.09 on 2 and 29 DF, p-value: 4.481e-09
##
##
## Response Y2 :
##
## Call:
## lm(formula = Y2 ~ gear + carb)
##
## Residuals:
## Min 1Q Median 3Q Max
## -111.22 -46.33 -12.41 45.84 224.61
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 547.621 71.279 7.683 1.80e-08 ***
## gear -120.567 19.664 -6.131 1.11e-06 ***
## carb 45.402 8.982 5.055 2.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 77.69 on 29 degrees of freedom
## Multiple R-squared: 0.6325, Adjusted R-squared: 0.6071
## F-statistic: 24.95 on 2 and 29 DF, p-value: 4.977e-07
##
##
## Response Y3 :
##
## Call:
## lm(formula = Y3 ~ gear + carb)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97574 -0.33262 -0.03964 0.24969 1.05929
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.83881 0.49694 11.750 1.51e-12 ***
## gear -1.00441 0.13709 -7.326 4.53e-08 ***
## carb 0.38478 0.06262 6.144 1.07e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5416 on 29 degrees of freedom
## Multiple R-squared: 0.7134, Adjusted R-squared: 0.6936
## F-statistic: 36.09 on 2 and 29 DF, p-value: 1.352e-08
Coefficient of gear: For a one-unit increase in the number of forward gears (gear), holding the
other regressor constant, miles per gallon (mpg) increases by 5.57 units on average. The p-value
< 0.05 indicates that gear significantly affects mpg at the 5% level of significance.
Similarly, for a one-unit increase in the number of forward gears, holding the other regressor
constant, displacement (disp) decreases by 120.57 units on average. The p-value < 0.05 indicates
that gear significantly affects disp at the 5% level of significance.
Furthermore, for a one-unit increase in the number of forward gears, holding the other regressor
constant, weight (wt) decreases by 1.00 unit on average. The p-value < 0.05 indicates that gear
significantly affects wt at the 5% level of significance.
The other coefficients are interpreted similarly.
> model1 <- lm(mpg ~ gear + carb, data = mtcars)   # reconstructed: the diagnostics below use model1, assumed to be the univariate fit for mpg
library(olsrr)   # residual diagnostic plots, e.g., ols_plot_resid_fit(model1)
Homoscedasticity:
The residuals-vs-fitted plot shows a mild pattern, so it is unclear whether the constant-variance
assumption is met. We conduct the Breusch-Pagan test to check for homoscedasticity.
> library(lmtest)
> bptest(model1)
> ##
## studentized Breusch-Pagan test
##
## data: model1
## BP = 6.0304, df = 2, p-value = 0.04904
Since the Breusch-Pagan p-value (0.049) is less than 0.05, we may reject the null hypothesis and
conclude that heteroscedasticity is present in the regression model. Therefore, the
homoscedasticity assumption is not fulfilled.
Normality:
From the Q-Q plot we can see that most of the residuals lie along the reference line; however,
some of the residuals deviate from it, indicating that they may not come from a normal
distribution. We check this result with the Shapiro-Wilk normality test:
> shapiro.test(model1$residuals)
> ##
## Shapiro-Wilk normality test
##
## data: model1$residuals
## W = 0.94706, p-value = 0.1189
Since the p-value = 0.119 > 0.05, we fail to reject the null hypothesis and may conclude that the
residuals are approximately normally distributed, so the normality assumption is satisfied.
Independence:
The plot of residuals against time order does not exhibit any systematic pattern, indicating that
there is no autocorrelation among the residuals, i.e., the residuals are independent.
Therefore, the independence assumption is satisfied.
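A formal complement to this visual check is the Durbin-Watson test (a sketch using dwtest() from the lmtest package loaded above):
> dwtest(model1)   # Ho: no first-order autocorrelation in the residuals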
Stata:
. use "C:\Users\Mostakim\Documents\mtcars.dta", clear
. mvreg mpg disp wt = gear carb
. mvreg, notable noheader corr
Menu:
Statistics > Multivariate Analysis > Manova, Multivariate Regression and Related >
Multivariate regression;
Statistics > Linear models and related > Regression diagnostics
Satellite applications motivated the development of a silver-zinc battery. Table 7.5 contains
failure data collected to characterize the performance of the battery during its life cycle. Use
these data to answer the following: Battery life data link
(a) Find the estimated linear regression of y on an appropriate (“best”) subset of predictor
variables.
Answer:
> ##
## Call:
## lm(formula = y ~ z1 + z2 + z3 + z4 + z5, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -184.715 -30.446 2.968 26.375 147.850
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2937.7571 4040.6401 -0.727 0.47918
## z1 -33.7934 43.3653 -0.779 0.44879
## z2 -0.1798 13.9073 -0.013 0.98987
## z3 -1.7397 1.3414 -1.297 0.21564
## z4 7.0627 1.9728 3.580 0.00302 **
## z5 1529.2897 2020.2396 0.757 0.46161
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 84.49 on 14 degrees of freedom
## Multiple R-squared: 0.5201, Adjusted R-squared: 0.3487
## F-statistic: 3.034 on 5 and 14 DF, p-value: 0.04627
From the above table, we see that the p-value for z2 is the highest, so we remove z2 from
our model.
> model1<-lm(y~z1+z4+z3+z5,data=data)
summary(model1)
> ##
## Call:
## lm(formula = y ~ z1 + z4 + z3 + z5, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -184.735 -30.235 3.275 26.424 147.452
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2934.986 3898.158 -0.753 0.46315
## z1 -33.776 41.875 -0.807 0.43251
## z4 7.063 1.906 3.706 0.00211 **
## z3 -1.743 1.276 -1.365 0.19227
## z5 1527.713 1948.190 0.784 0.44515
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 81.62 on 15 degrees of freedom
## Multiple R-squared: 0.5201, Adjusted R-squared: 0.3921
## F-statistic: 4.064 on 4 and 15 DF, p-value: 0.01991
From the above table, we see that the p-value for z5 is the highest, so we remove z5 from
our model.
> model2<-lm(y~z1+z4+z3,data=data)
summary(model2)
> ##
## Call:
## lm(formula = y ~ z1 + z4 + z3, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -179.602 -26.148 -2.675 21.164 166.585
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 120.591 109.814 1.098 0.28840
## z1 -33.771 41.368 -0.816 0.42629
## z4 6.891 1.870 3.685 0.00201 **
## z3 -1.716 1.260 -1.361 0.19223
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 80.64 on 16 degrees of freedom
## Multiple R-squared: 0.5004, Adjusted R-squared: 0.4067
## F-statistic: 5.342 on 3 and 16 DF, p-value: 0.009653
From the above table, we see that the p-value for z1 is the highest, so we remove z1 from
our model.
> model3<-lm(y~z3+z4,data=data)
summary(model3)
> ##
## Call:
## lm(formula = y ~ z3 + z4, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -160.32 -26.94 -11.31 31.07 171.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.224 82.528 0.754 0.46118
## z3 -1.399 1.187 -1.178 0.25496
## z4 7.075 1.838 3.849 0.00129 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79.84 on 17 degrees of freedom
## Multiple R-squared: 0.4796, Adjusted R-squared: 0.4184
## F-statistic: 7.833 on 2 and 17 DF, p-value: 0.003881
From the above table, we see that the p-value for z3 is the highest, so we remove z3 from
our model.
> model4<-lm(y~z4,data=data)
summary(model4)
> ##
## Call:
## lm(formula = y ~ z4, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -153.37 -43.61 -11.61 31.08 200.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -22.842 40.401 -0.565 0.5788
## z4 6.930 1.854 3.739 0.0015 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 80.7 on 18 degrees of freedom
## Multiple R-squared: 0.4371, Adjusted R-squared: 0.4058
## F-statistic: 13.98 on 1 and 18 DF, p-value: 0.001504
Therefore, our final model with the best regressor is: ŷ = −22.84 + 6.93 z4.
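The same backward elimination can be automated (a sketch using step(); note that step() drops terms by AIC rather than by p-value, so it may retain a different subset):
> full <- lm(y ~ z1 + z2 + z3 + z4 + z5, data = data)
step(full, direction = "backward")   # AIC-based backward elimination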
Residual Diagnostics:
i. Plot the residuals against the predicted values: no systematic pattern indicates equal
variances and no dependence on ŷ, i.e., the model is good.
ii. Plot the residuals against a predictor variable: a systematic pattern suggests the need for
more terms in the model; no systematic pattern indicates the model is fine (see the sketch below).
iii. Construct a Q-Q plot of the residuals: if the residuals lie along the reference line, they are
normally distributed and the model is adequate (see the sketch below).
iv. Plot the residuals versus time: no systematic pattern indicates that the residuals are
independent and the model is adequate.
> plot(model4$fitted.values,model4$residuals); abline(h=0)
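Plots (ii) and (iii) from the list above are not shown; a minimal sketch to produce them:
> plot(data$z4, model4$residuals); abline(h=0)     # (ii) residuals vs the predictor z4
qqnorm(model4$residuals); qqline(model4$residuals) # (iii) normal Q-Q plot of the residuals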
> plot(1:20, model4$residuals); abline(h=0)   # (iv) residuals in observation order (proxy for time order)
We leave the interpretation of the above plots as an exercise for the reader!
Similarly, find the best predictors for the response variable Y2.
Z1 = Gender: 1 if female, 0 if male (GEN)
Answer:
> data<-read.csv("F:/Mostakim/4th Year/Stat H-409; Statistical computing
VII; Multivariate Analysis and Experimental
Design/Data/amitriptyline.csv", header=TRUE)
Y<-cbind(data$y1,data$y2)
model<-lm(Y~z1+z2+z3+z4+z5, data=data)
summary(model)
> ## Response Y1 :
##
## Call:
## lm(formula = Y1 ~ z1 + z2 + z3 + z4 + z5, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -399.2 -180.1 4.5 164.1 366.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.879e+03 8.933e+02 -3.224 0.008108 **
## z1 6.757e+02 1.621e+02 4.169 0.001565 **
## z2 2.848e-01 6.091e-02 4.677 0.000675 ***
## z3 1.027e+01 4.255e+00 2.414 0.034358 *
## z4 7.251e+00 3.225e+00 2.248 0.046026 *
## z5 7.598e+00 3.849e+00 1.974 0.074006 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 281.2 on 11 degrees of freedom
## Multiple R-squared: 0.8871, Adjusted R-squared: 0.8358
## F-statistic: 17.29 on 5 and 11 DF, p-value: 6.983e-05
##
##
## Response Y2 :
##
## Call:
## lm(formula = Y2 ~ z1 + z2 + z3 + z4 + z5, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -373.85 -247.29 -83.74 217.13 462.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.729e+03 9.288e+02 -2.938 0.013502 *
## z1 7.630e+02 1.685e+02 4.528 0.000861 ***
## z2 3.064e-01 6.334e-02 4.837 0.000521 ***
## z3 8.896e+00 4.424e+00 2.011 0.069515 .
## z4 7.206e+00 3.354e+00 2.149 0.054782 .
## z5 4.987e+00 4.002e+00 1.246 0.238622
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 292.4 on 11 degrees of freedom
## Multiple R-squared: 0.8764, Adjusted R-squared: 0.8202
## F-statistic: 15.6 on 5 and 11 DF, p-value: 0.0001132
For both response variables Y1 and Y2, z5 is not significant, so we drop z5 from the model.
> model2 <- lm(Y ~ z1 + z2 + z3 + z4, data = data)   # refit without z5 (code reconstructed)
summary(model2)
## z4 6.352e+00 3.358e+00 1.892 0.082896 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 299.1 on 12 degrees of freedom
## Multiple R-squared: 0.8589, Adjusted R-squared: 0.8119
## F-statistic: 18.27 on 4 and 12 DF, p-value: 4.847e-05
For both responses Y1 and Y2, z4 is not significant, so we drop z4 from the model.
> model3 <- lm(Y ~ z1 + z2 + z3, data = data)   # refit without z4 (code reconstructed)
summary(model3)
## z3 6.998e+00 4.809e+00 1.455 0.169336
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 327.4 on 13 degrees of freedom
## Multiple R-squared: 0.8169, Adjusted R-squared: 0.7746
## F-statistic: 19.33 on 3 and 13 DF, p-value: 4.498e-05
For both responses Y1 and Y2, z3 is not significant, so we drop z3 from the model.
> model4 <- lm(Y ~ z1 + z2, data = data)   # refit without z3 (code reconstructed)
summary(model4)
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 340.2 on 14 degrees of freedom
## Multiple R-squared: 0.787, Adjusted R-squared: 0.7566
## F-statistic: 25.87 on 2 and 14 DF, p-value: 1.986e-05
Both z1 and z2 are significant in our model. Therefore, the backward elimination terminates,
and our final model contains z1 and z2 as predictors.
Type the data into an Excel/SPSS file and read it into R/Stata to answer the following questions:
(a) Perform a regression analysis using only the first response Y1.
(i) Suggest and fit appropriate (with best predictors) linear regression models.
(iii) Construct a 95% prediction interval for NO2 corresponding to z1 = 10 and z2 = 80.
(b) Perform a multivariate multiple regression analysis using both responses Y1 and Y2.
(iii) Construct a 95% prediction ellipse for both NO2 and O3 for z1 = 10 and z2 = 80.
Compare this ellipse with the prediction interval in Part a (iii). Comment.
Answer:
Try yourself!
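As a starting hint for part a(iii), a sketch (the data-set name airdat and the variable names y1, z1, z2 are assumed placeholders):
> fit1 <- lm(y1 ~ z1 + z2, data = airdat)   # univariate model for NO2
predict(fit1, newdata = data.frame(z1 = 10, z2 = 80),
        interval = "prediction", level = 0.95)   # 95% prediction interval for NO2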
References and Further Reading