R Statistics

The document provides an overview of statistical analysis in R, focusing on functions for calculating mean, median, and mode, as well as linear and multiple regression techniques. It explains the syntax and parameters for relevant functions like mean(), median(), lm(), and glm(), along with examples demonstrating their application. Additionally, it covers the process of establishing relationships between variables and predicting outcomes using regression models.

R - Mean, Median and Mode


Statistical analysis in R is performed using many in-built functions. Most of these functions are part of the R base package. These functions take an R vector as input, along with additional arguments, and return the result.

The functions we are discussing in this chapter are mean, median and mode.

Mean
The mean is calculated by taking the sum of the values and dividing by the number of values in a data series.

The function mean() is used to calculate this in R.
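As a quick sanity check of this definition, the mean is simply the sum of the values divided by their count:

```r
# A small sample vector.
x <- c(12, 7, 3, 4.2, 18)

# The mean from first principles equals mean().
print(sum(x) / length(x))  # 8.84
print(mean(x))             # 8.84
```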

Syntax

The basic syntax for calculating mean in R is −

mean(x, trim = 0, na.rm = FALSE, ...)

Following is the description of the parameters used −

• x is the input vector.
• trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed.
• na.rm is used to remove the missing values from the input vector.

Example
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x)
print(result.mean)

When we execute the above code, it produces the following result


[1] 8.22

Applying Trim Option

When the trim parameter is supplied, the values in the vector are sorted and then the required number of observations is dropped from each end before calculating the mean.

When trim = 0.3, 3 values (0.3 × 10 observations) from each end will be dropped from the calculation of the mean.

In this case the sorted vector is (−21, −5, 2, 3, 4.2, 7, 8, 12, 18,
54) and the values removed from the vector for calculating mean
are (−21,−5,2) from left and (12,18,54) from right.

# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x,trim = 0.3)
print(result.mean)

When we execute the above code, it produces the following result


[1] 5.55

Applying NA Option
If there are missing values, then the mean function returns NA.

To drop the missing values from the calculation, use na.rm = TRUE, which means remove the NA values.

# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)
# Find mean.
result.mean <- mean(x)
print(result.mean)

# Find mean dropping NA values.
result.mean <- mean(x,na.rm = TRUE)
print(result.mean)

When we execute the above code, it produces the following result


[1] NA
[1] 8.22

Median
The middle-most value in a data series is called the median; when the series has an even number of values, it is the average of the two middle values.
The median() function is used in R to calculate this value.

Syntax

The basic syntax for calculating median in R is −

median(x, na.rm = FALSE)

Following is the description of the parameters used −

• x is the input vector.
• na.rm is used to remove the missing values from the input vector.

Example
# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find the median.
median.result <- median(x)
print(median.result)

When we execute the above code, it produces the following result



[1] 5.6

Mode
The mode is the value that has the highest number of occurrences in a set of data. Unlike the mean and median, the mode can be computed for both numeric and character data.

R does not have a standard in-built function to calculate the mode, so we create a user-defined function to calculate the mode of a data set in R. This function takes the vector as input and gives the mode value as output.

Example
# Create the function.
getmode <- function(v) {
   uniqv <- unique(v)   # the distinct values in v
   # Count the occurrences of each distinct value and return the most frequent one.
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)

# Calculate the mode using the user function.
result <- getmode(v)
print(result)

# Create the vector with characters.
charv <- c("o","it","the","it","it")

# Calculate the mode using the user function.
result <- getmode(charv)
print(result)

When we execute the above code, it produces the following result


[1] 2
[1] "it"
R - Linear Regression

Regression analysis is a very widely used statistical tool to establish a relationship model between two variables. One of these variables is called the predictor variable, whose value is gathered through experiments. The other variable is called the response variable, whose value is derived from the predictor variable.

In Linear Regression these two variables are related through an equation, where the exponent (power) of both these variables is 1. Mathematically a linear relationship represents a straight line when plotted as a graph. A non-linear relationship, where the exponent of any variable is not equal to 1, creates a curve.

The general mathematical equation for a linear regression is −

y = ax + b

Following is the description of the parameters used −

• y is the response variable.
• x is the predictor variable.
• a and b are constants which are called the coefficients.
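As a sketch of what lm() computes, the least-squares coefficients can also be obtained directly from the sample covariance and variance; using the height/weight data of this chapter, the values agree with the lm() output shown later:

```r
# Height (predictor) and weight (response) observations.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Least-squares slope and intercept from first principles.
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

print(round(slope, 4))      # 0.6746
print(round(intercept, 4))  # -38.4551
```

In the y = ax + b notation above, slope corresponds to a and intercept to b.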

Steps to Establish a Regression

A simple example of regression is predicting the weight of a person when his height is known. To do this we need to have the relationship between height and weight of a person.

The steps to create the relationship are −

• Carry out the experiment of gathering a sample of observed values of height and corresponding weight.
• Create a relationship model using the lm() function in R.
• Find the coefficients from the model created and create the mathematical equation using these coefficients.
• Get a summary of the relationship model to know the average error in prediction (the residuals).
• To predict the weight of new persons, use the predict() function in R.

Input Data

Below is the sample data representing the observations −

# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131

# Values of weight.
63, 81, 56, 91, 47, 57, 76, 72, 62, 48

lm() Function
This function creates the relationship model between the
predictor and the response variable.

Syntax

The basic syntax for lm() function in linear regression is −

lm(formula,data)

Following is the description of the parameters used −

• formula is a symbolic description of the relation between x and y, such as y ~ x.
• data is the data frame (or environment) in which the formula's variables are found.

Create Relationship Model & get the Coefficients

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

print(relation)

When we execute the above code, it produces the following result



Call:
lm(formula = y ~ x)

Coefficients:
(Intercept) x
-38.4551 0.6746

Get the Summary of the Relationship

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

print(summary(relation))

When we execute the above code, it produces the following result


Call:
lm(formula = y ~ x)

Residuals:
Min 1Q Median 3Q Max
-6.3002 -1.6629 0.0412 1.8944 3.9775

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509 8.04901 -4.778 0.00139 **
x 0.67461 0.05191 12.997 1.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared: 0.9548, Adjusted R-squared: 0.9491
F-statistic: 168.9 on 1 and 8 DF, p-value: 1.164e-06
predict() Function
Syntax

The basic syntax for predict() in linear regression is −

predict(object, newdata)

Following is the description of the parameters used −

• object is the model already created using the lm() function.
• newdata is the data frame containing the new values for the predictor variable.

Predict the weight of new persons

# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)

# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)

When we execute the above code, it produces the following result


1
76.22869

Visualize the Regression Graphically

# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)

# Give the chart file a name.
png(file = "linearregression.png")

# Plot the chart.
plot(y, x, col = "blue", main = "Height & Weight Regression",
   abline(lm(x~y)), cex = 1.3, pch = 16,
   xlab = "Weight in Kg", ylab = "Height in cm")

# Save the file.
dev.off()

When we execute the above code, it produces a scatter plot with the fitted regression line, saved as linearregression.png.
R - Multiple Regression

Multiple regression is an extension of linear regression to the relationship between more than two variables. In simple linear regression we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable.

The general mathematical equation for multiple regression is −

y = a + b1x1 + b2x2 + ... + bnxn

Following is the description of the parameters used −

• y is the response variable.
• a, b1, b2...bn are the coefficients.
• x1, x2, ...xn are the predictor variables.

We create the regression model using the lm() function in R. The model determines the value of the coefficients using the input data. Next we can predict the value of the response variable for a given set of predictor variables using these coefficients.

lm() Function
This function creates the relationship model between the
predictor and the response variable.

Syntax

The basic syntax for lm() function in multiple regression is −

lm(y ~ x1+x2+x3...,data)

Following is the description of the parameters used −

• formula is a symbolic description of the relation between the response variable and the predictor variables.
• data is the data frame on which the formula will be applied.
Example
Input Data

Consider the data set "mtcars" available in the R environment. It gives a comparison between different car models in terms of mileage per gallon ("mpg"), cylinder displacement ("disp"), horse power ("hp"), weight of the car ("wt") and some more parameters.

The goal of the model is to establish the relationship between "mpg" as a response variable with "disp", "hp" and "wt" as predictor variables. We create a subset of these variables from the mtcars data set for this purpose.

input <- mtcars[,c("mpg","disp","hp","wt")]
print(head(input))

When we execute the above code, it produces the following result


mpg disp hp wt
Mazda RX4 21.0 160 110 2.620
Mazda RX4 Wag 21.0 160 110 2.875
Datsun 710 22.8 108 93 2.320
Hornet 4 Drive 21.4 258 110 3.215
Hornet Sportabout 18.7 360 175 3.440
Valiant 18.1 225 105 3.460

Create Relationship Model & get the Coefficients

input <- mtcars[,c("mpg","disp","hp","wt")]

# Create the relationship model.
model <- lm(mpg~disp+hp+wt, data = input)

# Show the model.
print(model)

# Get the Intercept and coefficients as vector elements.
cat("# # # # The Coefficient Values # # # ","\n")
a <- coef(model)[1]
print(a)

Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]

print(Xdisp)
print(Xhp)
print(Xwt)

When we execute the above code, it produces the following result


Call:
lm(formula = mpg ~ disp + hp + wt, data = input)

Coefficients:
(Intercept) disp hp wt
37.105505 -0.000937 -0.031157 -3.800891

# # # # The Coefficient Values # # #
(Intercept)
37.10551
disp
-0.0009370091
hp
-0.03115655
wt
-3.800891

Create Equation for Regression Model

Based on the above intercept and coefficient values, we create the mathematical equation.

Y = a + Xdisp*x1 + Xhp*x2 + Xwt*x3
or
Y = 37.105 + (-0.000937)*x1 + (-0.0311)*x2 + (-3.8008)*x3
Apply Equation for predicting New Values

We can use the regression equation created above to predict the mileage when a new set of values for displacement, horse power and weight is provided.

For a car with disp = 221, hp = 102 and wt = 2.91 the predicted mileage is −

Y = 37.105 + (-0.000937)*221 + (-0.0311)*102 + (-3.8008)*2.91 = 22.6654
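Rather than rounding the coefficients by hand, the same prediction can be obtained from the fitted model with predict(); a minimal sketch using the model built above (the small difference from the hand calculation comes from coefficient rounding):

```r
# Rebuild the model from the mtcars subset.
input <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = input)

# Predict mileage for disp = 221, hp = 102, wt = 2.91.
newcar <- data.frame(disp = 221, hp = 102, wt = 2.91)
print(predict(model, newcar))  # approximately 22.66
```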

R - Logistic Regression

Logistic regression is a regression model in which the response variable (dependent variable) has categorical values such as True/False or 0/1. It actually measures the probability of a binary response as the value of the response variable based on the mathematical equation relating it with the predictor variables.

The general mathematical equation for logistic regression is −

y = 1/(1+e^-(a+b1x1+b2x2+b3x3+...))

Following is the description of the parameters used −

• y is the response variable.
• x1, x2, x3... are the predictor variables.
• a and b1, b2, b3... are the coefficients, which are numeric constants.
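The right-hand side of this equation is the logistic (sigmoid) function applied to a linear predictor, which squashes any real number into the interval (0, 1); a minimal sketch with made-up coefficient values chosen only for illustration:

```r
# The logistic (sigmoid) function.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients a, b1 and a single predictor value x1.
a <- -1.5
b1 <- 0.8
x1 <- 2
p <- sigmoid(a + b1 * x1)
print(p)  # the predicted probability of the "success" outcome

# R's built-in plogis() computes the same function.
print(plogis(a + b1 * x1))
```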

The function used to create the regression model is the glm() function.

Syntax

The basic syntax for glm() function in logistic regression is −

glm(formula,data,family)
Following is the description of the parameters used −

• formula is the symbol presenting the relationship between the variables.
• data is the data set giving the values of these variables.
• family is an R object specifying the details of the model. Its value is binomial for logistic regression.

Example

The in-built data set "mtcars" describes different models of a car with their various engine specifications. In the "mtcars" data set, the transmission mode (automatic or manual) is described by the column am, which is a binary value (0 or 1). We can create a logistic regression model between the column "am" and 3 other columns - hp, wt and cyl.

# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]

print(head(input))

When we execute the above code, it produces the following result


am cyl hp wt
Mazda RX4 1 6 110 2.620
Mazda RX4 Wag 1 6 110 2.875
Datsun 710 1 4 93 2.320
Hornet 4 Drive 0 6 110 3.215
Hornet Sportabout 0 8 175 3.440
Valiant 0 6 105 3.460

Create Regression Model

We use the glm() function to create the regression model and get its summary for analysis.

input <- mtcars[,c("am","cyl","hp","wt")]
am.data = glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)

print(summary(am.data))

print(summary(am.data))

When we execute the above code, it produces the following result


Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.17272 -0.14907 -0.01464 0.14116 1.27641

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288 8.11637 2.428 0.0152 *
cyl 0.48760 1.07162 0.455 0.6491
hp 0.03259 0.01886 1.728 0.0840 .
wt -9.14947 4.15332 -2.203 0.0276 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 43.2297 on 31 degrees of freedom
Residual deviance: 9.8415 on 28 degrees of freedom
AIC: 17.841
AIC: 17.841

Number of Fisher Scoring iterations: 8

Conclusion

In the summary, the p-value in the last column is more than 0.05 for the variables "cyl" and "hp", so we consider them to be insignificant in contributing to the value of the variable "am". Only weight (wt) has a significant impact on "am" in this regression model.
R - Normal Distribution

In a random collection of data from independent sources, it is generally observed that the distribution of data is normal. This means that, on plotting a graph with the value of the variable on the horizontal axis and the count of the values on the vertical axis, we get a bell-shaped curve. The center of the curve represents the mean of the data set: fifty percent of the values lie to the left of the mean and the other fifty percent lie to its right. This is referred to as the normal distribution in statistics.

R has four in-built functions to generate the normal distribution. They are described below.

dnorm(x, mean, sd)
pnorm(x, mean, sd)
qnorm(p, mean, sd)
rnorm(n, mean, sd)

Following is the description of the parameters used in the above functions −

• x is a vector of numbers.
• p is a vector of probabilities.
• n is the number of observations (sample size).
• mean is the mean value of the sample data. Its default value is zero.
• sd is the standard deviation. Its default value is 1.
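A quick sketch of how the four functions fit together: dnorm() is the density, pnorm() is its cumulative integral, qnorm() inverts pnorm(), and rnorm() draws random samples:

```r
# pnorm() and qnorm() are inverse functions of each other.
p <- pnorm(1.2)        # P(X <= 1.2) for a standard normal
print(qnorm(p))        # recovers 1.2

# The standard normal density at 0 is 1/sqrt(2*pi).
print(dnorm(0))        # 0.3989423

# rnorm() draws random values; set a seed for reproducible output.
set.seed(1)
print(rnorm(3))
```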

dnorm()
This function gives the height of the probability density at each point for a given mean and standard deviation.

# Create a sequence of numbers between -10 and 10 incrementing by 0.1.
x <- seq(-10, 10, by = .1)

# Choose the mean as 2.5 and standard deviation as 0.5.
y <- dnorm(x, mean = 2.5, sd = 0.5)

# Give the chart file a name.
png(file = "dnorm.png")

plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces a bell-shaped density curve saved as dnorm.png.
pnorm()
This function gives the probability of a normally distributed random number being less than the value of a given number. It is also called the "Cumulative Distribution Function".
# Create a sequence of numbers between -10 and 10 incrementing by 0.2.
x <- seq(-10,10,by = .2)

# Choose the mean as 2.5 and standard deviation as 2.
y <- pnorm(x, mean = 2.5, sd = 2)

# Give the chart file a name.
png(file = "pnorm.png")

# Plot the graph.
plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces an S-shaped cumulative distribution curve saved as pnorm.png.
qnorm()
This function takes the probability value and gives a number
whose cumulative value matches the probability value.

# Create a sequence of probability values incrementing by 0.02.
x <- seq(0, 1, by = 0.02)

# Choose the mean as 2 and standard deviation as 1.
y <- qnorm(x, mean = 2, sd = 1)

# Give the chart file a name.
png(file = "qnorm.png")

# Plot the graph.
plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces the quantile curve saved as qnorm.png.
rnorm()
This function is used to generate random numbers whose
distribution is normal. It takes the sample size as input and
generates that many random numbers. We draw a histogram to
show the distribution of the generated numbers.

# Create a sample of 50 numbers which are normally distributed.
y <- rnorm(50)

# Give the chart file a name.
png(file = "rnorm.png")

# Plot the histogram for this sample.
hist(y, main = "Normal Distribution")

# Save the file.
dev.off()
When we execute the above code, it produces a histogram of the generated sample saved as rnorm.png.
R - Binomial Distribution

The binomial distribution model deals with finding the probability of success of an event which has only two possible outcomes in a series of experiments. For example, tossing a coin always gives a head or a tail. The probability of finding exactly 3 heads when tossing a coin repeatedly 10 times can be computed using the binomial distribution.
R has four in-built functions to generate binomial distribution.
They are described below.

dbinom(x, size, prob)
pbinom(x, size, prob)
qbinom(p, size, prob)
rbinom(n, size, prob)

Following is the description of the parameters used −

• x is a vector of numbers (numbers of successes).
• p is a vector of probabilities.
• n is the number of observations.
• size is the number of trials.
• prob is the probability of success of each trial.
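As a sketch of the coin example above, the probability of exactly 3 heads in 10 tosses of a fair coin follows directly from dbinom():

```r
# P(exactly 3 heads in 10 tosses of a fair coin).
p3 <- dbinom(3, size = 10, prob = 0.5)
print(p3)  # 0.1171875

# The same value from the closed-form count: choose(10,3) / 2^10.
print(choose(10, 3) / 2^10)
```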

dbinom()
This function gives the probability density distribution at each
point.

# Create a sample of 50 numbers which are incremented by 1.
x <- seq(0,50,by = 1)

# Create the binomial distribution.
y <- dbinom(x,50,0.5)

# Give the chart file a name.
png(file = "dbinom.png")

# Plot the graph for this sample.
plot(x,y)

# Save the file.
dev.off()

When we execute the above code, it produces a bell-shaped plot of the probabilities saved as dbinom.png.
pbinom()
This function gives the cumulative probability of an event. It is a
single value representing the probability.

# Probability of getting 26 or fewer heads from 51 tosses of a coin.
x <- pbinom(26,51,0.5)

print(x)

When we execute the above code, it produces the following result


[1] 0.610116

qbinom()
This function takes the probability value and gives a number
whose cumulative value matches the probability value.

# How many heads correspond to a cumulative probability of 0.25
# when a coin is tossed 51 times?
x <- qbinom(0.25,51,1/2)

print(x)

When we execute the above code, it produces the following result


[1] 23

rbinom()
This function generates the required number of random values drawn from a binomial distribution with a given size and probability.

# Draw 8 random values from a binomial distribution with 150 trials
# and probability of success 0.4.
x <- rbinom(8,150,.4)

print(x)

When we execute the above code, it produces a result similar to the following (the values are random, so yours will differ) −

[1] 58 61 59 66 55 60 61 67

R - Random Forest

In the random forest approach, a large number of decision trees are created. Every observation is fed into every decision tree, and the most common outcome across the trees (a majority vote) is used as the final output for that observation.

An error estimate is made for the cases which were not used
while building the tree. That is called an OOB (Out-of-bag) error
estimate which is mentioned as a percentage.

The R package "randomForest" is used to create random forests.

Install R Package
Use the below command in R console to install the package. You
also have to install the dependent packages if any.

install.packages("randomForest")

The package "randomForest" has the function randomForest() which is used to create and analyze random forests.

Syntax

The basic syntax for creating a random forest in R is −

randomForest(formula, data)

Following is the description of the parameters used −

• formula is a formula describing the predictor and response variables.
• data is the name of the data set used.

Input Data

We will use the data set named readingSkills (available in the party package) to create the random forest. It describes a person's reading skills score along with the variables "age", "shoeSize" and whether the person is a native speaker.

Here is the sample data.

# Load the party package. It will automatically load other required packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))

When we execute the above code, it produces the following result and chart −

nativeSpeaker age shoeSize score
1 yes 5 24.83189 32.29385
2 yes 6 25.95238 36.63105
3 no 11 30.42170 49.60593
4 yes 7 28.66450 40.28456
5 yes 11 31.88207 55.46085
6 yes 10 30.07843 52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................

Example

We will use the randomForest() function to create the random forest and examine its results.

# Load the party package. It will automatically load other required packages.
library(party)
library(randomForest)

# Create the forest.
output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score,
           data = readingSkills)

# View the forest results.
print(output.forest)

# Importance of each predictor.
print(importance(output.forest, type = 2))
When we execute the above code, it produces the following result

Call:
randomForest(formula = nativeSpeaker ~ age + shoeSize + score,
data = readingSkills)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 1

OOB estimate of error rate: 1%
Confusion matrix:
no yes class.error
no 99 1 0.01
yes 1 99 0.01
MeanDecreaseGini
age 13.95406
shoeSize 18.91006
score 56.73051

Conclusion

From the random forest shown above we can conclude that shoeSize and score are the important factors deciding if someone is a native speaker or not. Also the model has an OOB error estimate of only 1%, which means we can predict with about 99% accuracy.

R - Chi Square Test

The Chi-Square test is a statistical method to determine if two categorical variables have a significant correlation between them. Both variables should be from the same population and they should be categorical, like Yes/No, Male/Female, Red/Green etc. For example, we can build a data set with observations on people's ice-cream buying pattern and try to correlate the gender of a person with the flavor of the ice-cream they prefer. If a correlation is found, we can plan an appropriate stock of flavors based on the gender distribution of the people visiting.

Syntax
The function used for performing chi-Square test is chisq.test().

The basic syntax for creating a chi-square test in R is −

chisq.test(data)

Following is the description of the parameters used −

• data is the data in the form of a table containing the count values of the variables in the observation.
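Before turning to a real data set, here is a minimal sketch on a hand-built contingency table; the counts below are made up purely for illustration:

```r
# Hypothetical counts: rows are gender, columns are preferred flavor.
flavors <- matrix(c(30, 10,
                    15, 25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Male", "Female"),
                                  c("Chocolate", "Vanilla")))

# chisq.test() accepts any table (or matrix) of counts.
print(chisq.test(flavors))
```

A small p-value here would suggest that flavor preference is not independent of gender in this made-up sample.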

Example
We will use the Cars93 data set from the "MASS" library, which contains data on 93 different car models on sale in the year 1993.

library("MASS")
print(str(Cars93))

When we execute the above code, it produces the following result


'data.frame': 93 obs. of 27 variables:


$ Manufacturer : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
$ Model : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
$ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
$ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
$ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
$ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
$ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ...
$ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ...
$ AirBags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
$ DriveTrain : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
$ Cylinders : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
$ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
$ Horsepower : int 140 200 172 172 208 110 170 180 170 200 ...
$ RPM : int 6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
$ Rev.per.mile : int 2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
$ Man.trans.avail : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
$ Fuel.tank.capacity: num 13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
$ Passengers : int 5 5 5 6 4 6 6 6 5 6 ...
$ Length : int 177 195 180 193 186 189 200 216 198 206 ...
$ Wheelbase : int 102 115 102 106 109 105 111 116 108 114 ...
$ Width : int 68 71 67 70 69 69 74 78 73 73 ...
$ Turn.circle : int 37 38 37 37 39 41 42 45 41 43 ...
$ Rear.seat.room : num 26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
$ Luggage.room : int 11 15 14 17 13 16 17 21 14 18 ...
$ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
$ Origin : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
$ Make : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...

The above result shows the dataset has many Factor variables
which can be considered as categorical variables. For our model
we will consider the variables "AirBags" and "Type". Here we aim
to find out any significant correlation between the types of car
sold and the type of Air bags it has. If correlation is observed we
can estimate which types of cars can sell better with what types
of air bags.

# Load the library.
library("MASS")

# Create a data frame from the main data set.
car.data <- data.frame(Cars93$AirBags, Cars93$Type)

# Create a table with the needed variables.
car.data = table(Cars93$AirBags, Cars93$Type)
print(car.data)

# Perform the Chi-Square test.
print(chisq.test(car.data))
When we execute the above code, it produces the following result

Compact Large Midsize Small Sporty Van
Driver & Passenger 2 4 7 0 3 0
Driver only 9 7 11 5 8 3
None 5 0 4 16 3 6

Pearson's Chi-squared test

data: car.data
X-squared = 33.001, df = 10, p-value = 0.0002723

Warning message:
In chisq.test(car.data) : Chi-squared approximation may be incorrect

Conclusion
The result shows a p-value of less than 0.05, which indicates a significant correlation between the type of car and the type of air bags it has. (The warning appears because some expected cell counts in the table are small.)
