
Jasika RM Lab

The p-value is 0.007 which is less than 0.05. Therefore, we reject the null hypothesis and accept the alternate hypothesis that the mean age of the population is greater than 40.

Uploaded by

Jessica Nigam

RESEARCH METHODOLOGY LAB

(Using MS Excel and R Studio)

PRACTICAL FILE

Submitted for partial fulfillment for the award of the Degree


of

BACHELOR OF BUSINESS ADMINISTRATION


{BBA (G) 2021 – 2024}

Under the guidance of

Dr. AANCHAL AGGARWAL

Submitted by

“JASIKA”
“111”

VIVEKANANDA SCHOOL OF BUSINESS STUDIES


VIVEKANANDA INSTITUTE OF PROFESSIONAL STUDIES
(Affiliated to Guru Gobind Singh Indraprastha University)
STUDENT UNDERTAKING

This is to certify that I have completed the Project titled “RESEARCH
METHODOLOGY FILE USING MS EXCEL AND R-STUDIO” under the
guidance of “DR. AANCHAL AGGARWAL” in partial fulfilment of the
requirement for the award of degree of Bachelor of Business Administration at
Vivekananda Institute of Professional Studies, Vivekananda School of Business
Studies, New Delhi. This is an original piece of work and has not been
submitted elsewhere.

STUDENT NAME

Jasika STUDENT SIGNATURE


CERTIFICATE

This is to certify that the Project titled “RESEARCH METHODOLOGY
FILE USING MS EXCEL AND R-STUDIO” is an academic work done by
“Jasika” submitted in the partial fulfilment of the requirement for the award of
the Degree of Course from Vivekananda Institute of Professional Studies.  It
has been completed under the guidance of Dr. AANCHAL AGGARWAL. The
authenticity of the project work will be examined by the viva examiner which
includes data verification, authenticity of information etc. and it may be rejected
due to non-fulfilment of quality standards set by the Institute.

Signature of the Faculty Guide


ACKNOWLEDGEMENT

Before we get into the project, I would like to add a few words of appreciation
for the people who have been a part of this project right from its inception. The
writing of this project has been one of the significant academic challenges I
have faced and without the support, patience, and guidance of the people
involved, this task would not have been completed. It is to them I owe my
deepest gratitude.
It gives me immense pleasure to present this project report on “Research
Methodology (using MS Excel and R Studio)”. The success of this project is the
result of sheer hard work and determination, put in by me with the help of my
project guide. I take this opportunity to add a special note of thanks for
DR. AANCHAL AGGARWAL, who undertook to act as my mentor despite
her many other academic and professional commitments. Her wisdom,
knowledge and commitment to the highest standards inspired and motivated me.
Without her insight, support and energy, this project would neither have started
nor reached fruition.
INDEX
TOPIC
Data Analysis
• Descriptive statistics
• Histogram frequency distribution
• Correlation (Positive, Negative, Zero)
HYPOTHESIS TESTING
• One sample t test using dummy (one-tailed)
• Two sample t test (two-tailed)
• Two sample t test (one-tailed)
• Paired Sample t test
• Two sample z test
• F test
• ANOVA – Single Factor
• ANOVA – Two Factor without replication
• ANOVA – Two Factor with replication
HYPOTHESIS TESTING in R Studio
• How to install R Studio
• Introduction to R studio
• Import of Data Sheet in R studio
• Descriptive statistics
• Correlation
• Hypothesis Testing: One sample t test (two tail)
• Hypothesis Testing: Two independent sample t test
• Hypothesis Testing: Paired Sample t test (alpha 10%, one tail)
• Hypothesis Testing: Paired Sample t test 2 (One tail)
• Hypothesis Testing: F test
• Hypothesis Testing: One-way ANOVA
DATA ANALYSIS
Descriptive statistics

AGE
5
25
15
45
26
45
48
59
15
48
47
77
16
28
25
84
75
59
25
48
13
20
30
69
AGE

Mean                 39.45833333
Standard Error       4.598353189
Median               37.5
Mode                 25
Standard Deviation   22.52723794
Sample Variance      507.4764493
Kurtosis             -0.838323041
Skewness             0.442649084
Range                79
Minimum              5
Maximum              84
Sum                  947
Count                24
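The Excel summary above can be cross-checked in R. The following is a minimal sketch using base R only; the vector name `age` and the other variable names are illustrative and not part of the original worksheet.

```r
# Ages from the AGE column above, typed in by hand (the name `age` is illustrative)
age <- c(5, 25, 15, 45, 26, 45, 48, 59, 15, 48, 47, 77,
         16, 28, 25, 84, 75, 59, 25, 48, 13, 20, 30, 69)

n        <- length(age)           # Count
age_sum  <- sum(age)              # Sum
age_mean <- mean(age)             # Mean
age_med  <- median(age)           # Median
age_sd   <- sd(age)               # Standard Deviation (sample, divides by n - 1)
age_var  <- var(age)              # Sample Variance
age_se   <- age_sd / sqrt(n)      # Standard Error of the mean
age_rng  <- max(age) - min(age)   # Range

print(c(count = n, sum = age_sum, mean = age_mean, median = age_med,
        sd = age_sd, variance = age_var, se = age_se, range = age_rng))
```

Base R has no built-in function for the statistical mode, kurtosis or skewness, so those three rows of the Excel table are not reproduced in this sketch.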
HISTOGRAM FREQUENCY DISTRIBUTION

STEPS:
1) Go to the Data tab –> Data Analysis option –> select the Histogram option and
click OK

2) Select the Input Range, tick Labels, select the Output Range, tick the Pareto,
Cumulative Percentage and Chart Output options, and click OK
3) The Output is Displayed
Sorted by bin                        Sorted by frequency (Pareto)
BINS  Frequency  Cumulative %        BINS  Frequency  Cumulative %
25    0          0.00%               85    47         21.27%
30    3          1.36%               90    37         38.01%
35    4          3.17%               95    35         53.85%
40    2          4.07%               80    19         62.44%
45    0          4.07%               70    15         69.23%
50    6          6.79%               65    14         75.57%
55    11         11.76%              55    11         80.54%
60    8          15.38%              75    11         85.52%
65    14         21.72%              100   9          89.59%
70    15         28.51%              60    8          93.21%
75    11         33.48%              50    6          95.93%
80    19         42.08%              35    4          97.74%
85    47         63.35%              30    3          99.10%
90    37         80.09%              40    2          100.00%
95    35         95.93%              25    0          100.00%
100   9          100.00%             45    0          100.00%
More  0          100.00%             More  0          100.00%

[Chart: Histogram showing Frequency (left axis) and Cumulative % (right axis) against BINS in Pareto order]
CORRELATION
The correlation coefficient (a value between -1 and +1) tells you how strongly two variables
are related to each other.
a. POSITIVE CORRELATION
What is the correlation between the advertisement of a product in a month and its sales in
crores?
Advertisement in month   Sales in crores
32 5
54 10
67 15
65 20
98 24
112 34
101 25
34 34
Result:
                         Advertisement in month   Sales in crores
Advertisement in month   1
Sales in crores          0.485149134              1

Inference:
Here r = +0.485, therefore there is a moderate positive correlation between advertisements and sales.
b. NEGATIVE CORRELATION
What is the correlation between the number of cigarettes smoked in a week and life
expectancy?
Cigarettes   Life expectancy
5 80
23 78
25 60
48 53
17 85
8 84
4 73
26 79
11 81
19 75
14 68
35 72
29 58
4 92
23 65
Inference:
Here r = -0.71, therefore there is a negative correlation between number of cigarettes in a
week and life expectancy.
c. ZERO CORRELATION

What is the correlation between shoe size and IQ level?

Shoe size   IQ level
1 4
2 5
3 4
4 5
5 4
6 5
7 4
Result:
            Shoe size   IQ level
Shoe size   1
IQ level    0           1

Inference:
Here r=0, therefore there is no correlation between shoe size and IQ level.
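The three coefficients quoted in this section can be recomputed with base R's `cor()`, which returns Pearson's r by default. This is a minimal sketch; the vector names are illustrative and the values are copied from the three tables above.

```r
# a. Positive correlation example
advertisement <- c(32, 54, 67, 65, 98, 112, 101, 34)
sales         <- c(5, 10, 15, 20, 24, 34, 25, 34)

# b. Negative correlation example
cigarettes      <- c(5, 23, 25, 48, 17, 8, 4, 26, 11, 19, 14, 35, 29, 4, 23)
life_expectancy <- c(80, 78, 60, 53, 85, 84, 73, 79, 81, 75, 68, 72, 58, 92, 65)

# c. Zero correlation example
shoe_size <- 1:7
iq_level  <- c(4, 5, 4, 5, 4, 5, 4)

# Pearson correlation coefficient for each pair of variables
r_positive <- cor(advertisement, sales)
r_negative <- cor(cigarettes, life_expectancy)
r_zero     <- cor(shoe_size, iq_level)

print(round(c(positive = r_positive, negative = r_negative, zero = r_zero), 4))
```

The sign of r gives the direction of the linear relationship and its absolute value gives the strength; r close to 0 only rules out a linear relationship, not every kind of relationship.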
HYPOTHESIS TESTING
ONE SAMPLE T-TEST USING A DUMMY (ONE-TAILED)
Problem: To determine whether the population mean age is greater than 40
at α = 0.05
Age Dummy
42 0
76 0
56 0
67  
65  
65  
89  
45  
45  
65  
78  
55  
44  
65  
76  
89  
54  
56  
56  
76  
45  

Hypothesis Testing:
Null hypothesis (H0): The mean age of the population is not greater than 40.
Alternate hypothesis (H1): The mean age of the population is greater than 40.
H0: µ ≤ 40
H1: µ > 40
Result:
t-Test: Two-Sample Assuming Equal Variances
     
  Age Dummy
Mean 62.33333 0
Variance 208.6333 0
Observations 21 3
Pooled Variance 189.6667  
Hypothesized Mean Difference 40  
df 22  
t Stat 2.627379  
P(T<=t) one-tail 0.007691  
t Critical one-tail 1.717144  
P(T<=t) two-tail 0.015382  
t Critical two-tail 2.073873  
Decision Rule:
If t-stat is greater than t-critical, reject Null Hypothesis.
If p(t) is less than α, reject Null Hypothesis

Inference:
Since t Stat (2.62) is greater than t critical (1.71), reject null hypothesis.
Since P (0.007) is less than α (0.05), reject null hypothesis.

Conclusion:
The population mean age is greater than 40 at α=0.05
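As a cross-check, the same one-tailed hypothesis can be tested directly on the Age column with R's `t.test()`, without a dummy column. This is a minimal sketch; the variable names are illustrative. It should give t of about 7.09 on 20 degrees of freedom, the same statistic reported in the R Studio section of this file. That differs from the t Stat of 2.63 on 22 degrees of freedom in the Excel output above, because the equal-variance tool pools the three dummy zeros as if they were a second sample. The decision (reject H0) is the same either way.

```r
# Age column from the table above (the dummy column is not needed here)
age <- c(42, 76, 56, 67, 65, 65, 89, 45, 45, 65, 78,
         55, 44, 65, 76, 89, 54, 56, 56, 76, 45)

# H0: mu <= 40 versus H1: mu > 40, tested at alpha = 0.05
alpha  <- 0.05
result <- t.test(age, mu = 40, alternative = "greater")

t_stat  <- unname(result$statistic)   # (mean - 40) / (sd / sqrt(n))
df      <- unname(result$parameter)   # n - 1
p_value <- result$p.value             # upper-tail p-value

print(result)

# Decision rule from the text: reject H0 when p is less than alpha
reject_h0 <- p_value < alpha
print(reject_h0)
```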
TWO SAMPLE T-TEST (TWO TAILED)
Problem: To analyse whether there is a significant difference between the marks
scored by class groups A & B in mathematics at α=10%
Group A Group B

76 95
87 97
98 87
78 89
76 87
78 45
76 76
88 56
78 76
87 87
87 76
87 76
76 45
89 88
65 76
78 66
89 78
87 56
87 77

Hypothesis Testing:
Null hypothesis (H0): There is no significant difference between the marks
scored by class groups A & B in mathematics at α=10%
Alternate hypothesis (H1): There is a significant difference between the marks
scored by class groups A & B in mathematics at α=10%
H0: µA = µB; µA - µB = 0
H1: µA ≠ µB; µA - µB ≠ 0
Result:
t-Test: Two-Sample Assuming Equal Variances

                               Group A    Group B
Mean                           82.47368   75.42105
Variance                       57.37427   238.8129
Observations                   19         19
Pooled Variance                148.0936
Hypothesized Mean Difference   0
df                             36
t Stat                         1.786261
P(T<=t) one-tail               0.041241
t Critical one-tail            1.305514
P(T<=t) two-tail               0.082482
t Critical two-tail            1.688298
Decision Rule:
If t-stat is greater than t-critical, reject Null Hypothesis.
If p(t) is less than α, reject Null Hypothesis
Inference:
Since t Stat (1.78) is greater than t critical (1.68), reject null hypothesis.
Since P (0.08) is less than α (0.1), reject null hypothesis.

Conclusion:
There is a significant difference between the marks scored by class groups A &
B in mathematics at α=10%.
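The pooled (equal variances) test above can be reproduced in R by passing `var.equal = TRUE` to `t.test()`; without that argument R runs the Welch test instead. This is a minimal sketch with illustrative variable names and the marks copied from the table above.

```r
group_a <- c(76, 87, 98, 78, 76, 78, 76, 88, 78, 87,
             87, 87, 76, 89, 65, 78, 89, 87, 87)
group_b <- c(95, 97, 87, 89, 87, 45, 76, 56, 76, 87,
             76, 76, 45, 88, 76, 66, 78, 56, 77)

# Two-tailed test at alpha = 10%, assuming equal population variances
alpha  <- 0.10
pooled <- t.test(group_a, group_b,
                 var.equal   = TRUE,
                 alternative = "two.sided",
                 conf.level  = 1 - alpha)

t_stat  <- unname(pooled$statistic)   # t Stat
df      <- unname(pooled$parameter)   # n1 + n2 - 2
p_value <- pooled$p.value             # two-tail p-value

print(pooled)

# Decision rule from the text: reject H0 when p is less than alpha
reject_h0 <- p_value < alpha
print(reject_h0)
```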
TWO SAMPLE T TEST (ONE TAILED)
Problem: To analyze whether the time spent by full-time students
studying statistics is more than the time spent by part-time students
at α = 0.05
Full time   Part time

3.2 3.1
1.5 3.4
6.5 4.6
0.2 2.8
3.7 2.3
3.3 1.5
1.7 3.8
3.6 9.5
3.8 4.3
5.3 2.7
6.9 1.6
3.6 1.6
1.7 3.2
1.2 4.2
7.2 3.9
3.9 1.2
1.9 0
5.3 0
Hypothesis Testing:
Null hypothesis (H0): Full-time students do not spend more time studying
statistics than part-time students.
Alternate hypothesis (H1): Full-time students spend more time studying
statistics than part-time students.
H0: µF ≤ µP
H1: µF > µP
Result:
t-Test: Two-Sample Assuming Unequal Variances

                               Full time     Part time
Mean                           3.583333333   2.983333
Variance                       4.133235294   4.566176
Observations                   18            18
Hypothesized Mean Difference   0
df                             34
t Stat                         0.86306312
P(T<=t) one-tail               0.19707508
t Critical one-tail            1.690924255
P(T<=t) two-tail               0.394150159
t Critical two-tail            2.032244509

DECISION RULE:
If T stat > T Critical, Reject Null Hypothesis
If P < Alpha, Reject Null Hypothesis
INFERENCE:
Since T stat (0.86) is less than t critical one-tail (1.69), accept Null
Hypothesis.
Since the one-tail P value (0.197) is greater than alpha (0.05), accept Null
Hypothesis.
CONCLUSION:
Therefore, the time spent by full-time students in studying statistics is not
significantly more than the time spent by part-time students at Alpha = 0.05
PAIRED SAMPLE T-TEST
Problem: To determine whether there is a significant difference
between the time to finish the race when race is completed with local
shoes and branded shoes.
Athlete   Local shoes   Branded shoes
1 3.2 3.1
2 1.5 3.4
3 6.5 4.6
4 0.2 2.8
5 3.7 2.3
6 3.3 1.5
7 1.7 3.8
8 3.6 9.5
9 3.8 4.3
10 5.3 2.7
11 6.9 1.6
12 3.6 1.6
13 1.7 3.2
14 1.2 4.2
15 7.2 3.9

Hypothesis Testing:
Null hypothesis (H0): There is no significant difference between the
time to finish the race when race is completed with local shoes and
branded shoes.
Alternate hypothesis (H1): There is a significant difference between
the time to finish the race when race is completed with local shoes
and branded shoes.
H0: µA = µB; µA - µB = 0 or tl = tb, tl – tb = 0
H1: µA ≠ µB; µA - µB ≠ 0 or tl ≠ tb, tl – tb ≠ 0
Result:
t-Test: Paired Two Sample for Means

                               Local shoes   Branded shoes
Mean                           3.56          3.5
Variance                       4.598286      3.76
Observations                   15            15
Pearson Correlation            -0.02216
Hypothesized Mean Difference   0
df                             14
t Stat                         0.079506
P(T<=t) one-tail               0.468878
t Critical one-tail            1.76131
P(T<=t) two-tail               0.937755
t Critical two-tail            2.144787
Decision Rule:
If t-stat is greater than t-critical, reject Null Hypothesis.
If p(t) is less than α, reject Null Hypothesis

Inference:
Since t Stat (0.079) is less than t critical (2.14), accept null
hypothesis.
Since P (0.93) is greater than α (0.05), accept null hypothesis.
Conclusion:
There is no significant difference between the time to finish the race
when race is completed with local shoes and branded shoes.
TWO SAMPLE Z TEST
PROBLEM- The net annual returns (the returns on investment after deducting
all relevant fees) in percentage are given. Can investors do better by buying
mutual funds directly from banks or other financial institutions than by
purchasing mutual funds through brokers. Can we conclude at the 5%
significance level that directly-purchased mutual funds outperform mutual funds
bought through brokers?

Direct   Broker

9.33 3.24
6.94 -6.76
16.17 12.8
16.97 11.1
5.94 2.73
12.61 -0.13
3.33 18.22
16.13 -0.8
11.2 -5.75
1.14 2.59
4.68 3.71
3.09 13.15
7.26 11.05
2.05 -3.12
13.07 8.94
0.59 2.74
13.57 4.07
0.35 5.6
2.69 -0.85
18.45 -0.28
4.23 16.4
10.28 6.39
7.1 -1.9
-3.09 9.49
5.6 6.7
5.27 0.19
8.09 12.39
15.05 6.54
13.21 10.92
1.72 -2.15
14.69 4.36
-2.97 -11.07
10.37 9.24
-0.63 -2.67
-0.15 8.97
0.27 1.87
4.59 -1.53
6.38 5.23
-0.24 6.87
10.32 -1.69
10.29 9.43
4.39 8.31
-2.06 -3.99
7.66 -4.44
10.83 8.63
14.48 7.06
4.8 1.57
13.12 -8.44
-6.54 -5.72
-1.06 6.95

HYPOTHESIS TESTING
Null Hypothesis: Directly purchased mutual funds do not outperform mutual
funds bought through brokers.
Alternate Hypothesis: Directly purchased mutual funds do outperform mutual
funds bought through brokers.

H0: µ0 ≤ µ1
H1 : µ0 > µ1
RESULT:

z-Test: Two Sample for Means

                               DIRECT     BROKER
Mean                           6.6312     3.7232
Known Variance                 37.4882    43.3393
Observations                   50         50
Hypothesized Mean Difference   0
z                              2.287177
P(Z<=z) one-tail               0.011093
z Critical one-tail            1.644854
P(Z<=z) two-tail               0.022185
z Critical two-tail            1.959964

DECISION RULE :
If z-stat is less than z-critical, accept null hypothesis
If p(z) is greater than α, accept null hypothesis

INFERENCE:
Since z-stat (2.28) is greater than z-critical (1.64), we will reject Null
hypothesis.
Since p(z) value (0.011) is less than α(0.05), we will reject Null hypothesis.

CONCLUSION:
Directly purchased mutual funds outperform funds bought through brokers.
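Base R has no built-in two-sample z test, so the statistic can be computed by hand from the summary figures. The sketch below is a minimal example that uses the means, known variances and sample sizes copied from the Excel output above rather than the raw returns; the variable names are illustrative.

```r
# Summary figures copied from the z-test output above
mean_direct <- 6.6312;  var_direct <- 37.4882;  n_direct <- 50
mean_broker <- 3.7232;  var_broker <- 43.3393;  n_broker <- 50
alpha <- 0.05

# z = (difference in sample means - hypothesised difference) / standard error
std_error <- sqrt(var_direct / n_direct + var_broker / n_broker)
z_stat    <- (mean_direct - mean_broker - 0) / std_error

# Upper-tail test because H1 says direct funds have the larger mean
p_one_tail <- pnorm(z_stat, lower.tail = FALSE)
z_critical <- qnorm(1 - alpha)

print(c(z = z_stat, p_one_tail = p_one_tail, z_critical = z_critical))

# Decision rule from the text: reject H0 when z exceeds the critical value
reject_h0 <- z_stat > z_critical
print(reject_h0)
```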

F TEST
Determine whether the variance of Class 1 is greater than the variance of Class 2 in
mathematics.

Class1   Class2
65       76
76       54
65       67
76       65
56       76
45       66
HYPOTHESIS TESTING
NULL HYPOTHESIS: Variance of class 1 is not greater than variance of class 2.
ALTERNATE HYPOTHESIS: Variance of class 1 is greater than variance of class 2.
H0: V1≤V2: V1-V2≤0
H1: V1>V2: V1-V2>0
Decision rule
If F stat is less than F critical, accept null hypothesis
If P(F) is greater than α, accept null hypothesis
INFERENCE
Since F stat = 2.13 is less than F critical = 5.05, accept null hypothesis
Since P = 0.21 is greater than α = 0.05, accept null hypothesis
CONCLUSION
Therefore, the variance of Class 1 is not greater than the variance of Class 2

ANOVA - SINGLE FACTOR


Problem: To test whether there is a significant difference between the mean marks in economics,
science and history.

Economics   Science   History
42 69 35
53 54 40
49 58 53
53 64 42
43 64 50
44 55 39
45 56 55
52   39
54   40

Hypothesis Testing:
Null hypothesis: There is no significant difference between the mean of population.
H0: μ1 = μ2 = μ3
Alternate hypothesis: There is a significant difference between the mean of population.
H1: at least one of the means is different.
Anova: Single Factor

SUMMARY
Groups      Count   Sum   Average    Variance
Economics   9       435   48.33333   23.5
Science     7       420   60         32.33333
History     9       393   43.66667   50.5

ANOVA
Source of Variation   SS        df   MS         F          P-value    F crit
Between Groups        1085.84   2    542.92     15.19623   7.16E-05   3.443357
Within Groups         786       22   35.72727

Total                 1871.84   24
INFERENCE
Since F stat (15.196) is greater than F critical (3.443), reject the null hypothesis.
Since the P value (0.0000716) is less than alpha (0.05), reject the null hypothesis.
CONCLUSION
Therefore, the mean marks of the students in economics, science and history are not all equal at α = 0.05.
ANOVA-TWO FACTOR WITHOUT REPLICATION

Problem Statement: To test whether or not marks of students differ with
respect to student and subjects both.

Students   Eco   Sci   History
A 42 69 35
B 53 54 40
C 49 58 53
D 53 64 42
E 43 64 50

COLUMN WISE
NULL HYPOTHESIS: THERE IS NO SIGNIFICANT DIFFERENCE IN MARKS SUBJECT WISE
ALTERNATIVE HYPOTHESIS: THERE IS A SIGNIFICANT DIFFERENCE IN MARKS SUBJECT WISE

ROW WISE
NULL HYPOTHESIS: THERE IS NO SIGNIFICANT DIFFERENCE IN MARKS STUDENT WISE
ALTERNATIVE HYPOTHESIS: THERE IS A SIGNIFICANT DIFFERENCE IN
MARKS STUDENT WISE

Anova: Two-Factor Without Replication

SUMMARY   Count   Sum   Average    Variance
A         3       146   48.66667   322.3333
B         3       147   49         61
C         3       160   53.33333   20.33333
D         3       159   53         121
E         3       157   52.33333   114.3333

Eco       5       240   48         28
Sci       5       309   61.8       34.2
History   5       220   44         54.5

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  60.93333   4    15.23333   0.300263   0.869889   3.837853
Columns               872.1333   2    436.0667   8.595269   0.010172   4.45897
Error                 405.8667   8    50.73333

Total                 1338.933   14

DECISION RULE:
If f is greater than f critical, reject null hypothesis.
If p(f) is less than α, reject null hypothesis.

INFERENCE:
ROW WISE
Since F (0.30) is less than F critical (3.84), we will accept null hypothesis.
Since p(F) value (0.87) is greater than α (0.05), we will accept null hypothesis.
COLUMN WISE
Since F (8.60) is greater than F critical (4.46), we will reject null hypothesis.
Since p(F) value (0.01) is less than α (0.05), we will reject null hypothesis.

CONCLUSION:
ROW WISE- there is not enough evidence of a significant difference between the marks
of the students.
COLUMN WISE- there is enough evidence that there is significant difference between
marks of three subjects – Economics, Science, History.
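The same two-factor analysis without replication can be sketched in R by putting the marks in long format (one row per student and subject) and fitting an additive model with `aov()`. This is a minimal example; the data frame and column names (`marks`, `student`, `subject`, `score`) are illustrative.

```r
# One row per (student, subject) pair, marks copied from the table above
marks <- data.frame(
  student = factor(rep(c("A", "B", "C", "D", "E"), times = 3)),
  subject = factor(rep(c("Eco", "Sci", "History"), each = 5)),
  score   = c(42, 53, 49, 53, 43,    # Eco
              69, 54, 58, 64, 64,    # Sci
              35, 40, 53, 42, 50)    # History
)

# Additive model: student (rows) and subject (columns), no interaction term
fit <- aov(score ~ student + subject, data = marks)
tab <- summary(fit)[[1]]
print(tab)

# Rows of the table follow the order of the terms in the formula
f_student <- tab[["F value"]][1]   # Rows in the Excel output
f_subject <- tab[["F value"]][2]   # Columns in the Excel output
p_student <- tab[["Pr(>F)"]][1]
p_subject <- tab[["Pr(>F)"]][2]
```

With one observation per cell there are no degrees of freedom left for an interaction term, which is why the model contains only the two main effects.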
ANOVA-TWO FACTOR WITH REPLICATION
Problem: To test whether or not marks of students differ school wise, subject wise
and school wise in conjunction with the subjects.

           economics   science   history
SCHOOL A   42          69        35
           53          54        40
           49          58        53
           53          64        42
           43          64        50
SCHOOL B   44          55        39
           45          56        55
           52          0         39
           54          0         40
           0           0         0
Hypothesis Testing:
Row Wise:
H0 : There is no significant difference between school A and School B
H1: There is a significant difference between school A and School B
Column Wise:
H02: There is no significant difference between economics, science and history
H2: There is a significant difference between economics, science and history
Interaction Wise:
H03: There is no significant difference between school A and School B subject-wise (in
conjunction with subjects)
H3: There is a significant difference between school A and School B subject-wise (in
conjunction with subjects)
Anova: Two-Factor With Replication

SUMMARY    science    history    Total

42
Count      5          5          10
Sum        309        220        529
Average    61.8       44         52.9
Variance   34.2       54.5       127.4333

44
Count      5          5          10
Sum        111        173        284
Average    22.2       34.6       28.4
Variance   924.2      420.3      640.2667

Total
Count      10         10
Sum        420        393
Average    42         39.3
Variance   861.5556   235.5667

ANOVA
Source of Variation   SS        df   MS        F          P-value    F crit
Sample                3001.25   1    3001.25   8.376361   0.010568   4.493998
Columns               36.45     1    36.45     0.10173    0.753889   4.493998
Interaction           1140.05   1    1140.05   3.181831   0.093437   4.493998
Within                5732.8    16   358.3

Total                 9910.55   19
Note: In this output the economics column has been read as the row labels (42 and 44), so only the science and history columns are analysed, and the zeros entered for School B are treated as marks.

DECISION RULE:
If f stat is greater than f critical, reject null hypothesis.
If p value is less than α, reject null hypothesis.

INFERENCE:

Row wise:
Here, f stat (8.376) is greater than f-critical (4.493), we will reject Null hypothesis.
Here, p value (0.01) is less than α (0.05), we will reject Null hypothesis.

Column wise:
Here, f stat (0.101) is less than f-critical (4.493), we will accept Null hypothesis.
Here, p value (0.753) is greater than α (0.05), we will accept Null hypothesis.

Interaction wise:
Here, f stat (3.181) is less than f-critical (4.493), we will accept Null hypothesis.
Here, p value (0.093) is greater than α (0.05), we will accept Null hypothesis.

CONCLUSION:

Row wise:
There is enough evidence that marks of students differ significantly school wise.

Column wise:
There is enough evidence that there is no difference between the marks of the three
subjects. i.e., Economics, Science and History.

Interaction:
There is no significant difference between the marks of School A and School B subject-wise
(in conjunction with the subjects).
HYPOTHESIS TESTING IN R STUDIO
How to install R studio
In order to install R Studio, we first need to install R. Following are the steps how to install
R:

1. Go to CRAN, click Download R for Windows, click Base, and download the installer for the
latest R version.
2. Right-click the installer file and select Run as Administrator from the pop-up menu.
3. Select the language to be used during installation.
This doesn’t change the language used by R; all messages and Help files remain in English.
4. Follow the instructions of the installer.
You can safely use the default settings and just keep clicking Next until R starts installing.

After installing R, we can set up R Studio. The steps are as follows:

1. Install R. Leave all default settings in the installation options.
2. Open RStudio.
3. Go to the “Packages” tab and click on “Install Packages”. ...
4. Start typing “Rcmdr” until you see it appear in a list. ...
5. Wait while all the parts of the R Commander package are installed.
Introduction to R studio

Chapter 1 R and RStudio


R is a programming language used for statistical computing while RStudio uses the R
language to develop statistical programs. In R, you can write a program and run the code
independently of any other computer program. RStudio however, must be used alongside R
in order to properly function. Often referred to as an IDE, or integrated development
environment, RStudio allows users to develop and edit programs in R by supporting a large
number of statistical packages, higher quality graphics, and the ability to manage your
workspace.

R and RStudio are not separate versions of the same program, and cannot be substituted for
one another. R may be used without RStudio, but RStudio may not be used without R.

Chapter 2 The Advantages of RStudio

1) RStudio is designed to make it easy to write scripts.

As soon as you create a new script, the windows within your RStudio session adjust
automatically so you can see both your script and the results in your console when you run
your syntax.

Even better is the ability to call up potential syntax options while you are writing just by
using the tab key.

For example, suppose I am trying to access a variable in a data set called “teachers”, but I
haven’t memorized the variable names:

 
2) RStudio makes it convenient to view and interact with the objects stored in your
environment.

In the basic R GUI, you can always list the objects you have stored in your environment. But
RStudio has a very useful “Environment” window available.

This shows all of the objects that you have stored, including data; scalars, vectors, and
matrices; model outputs; etc., along with a summary of the information that is stored in those
objects.

You can even click on your data sets directly to open them and view them as spreadsheets.

3) RStudio makes it easy to set your working directory and access files on your computer.

Especially if you are working in Windows, one of the most tedious parts of programming in
R is setting your working directory to access your files.

With RStudio, you can navigate to folders on your computer in the “Files” window, view any
files you have in that folder, and set that folder as the working directory.

Working directory is a folder where R reads and saves files.


Command :

setwd("c:/Documents/my/working/directory")

Set a default working directory

A default working directory is a folder where RStudio goes, every time you open it. You can
change the default working directory from RStudio menu under: Tools –> Global options –>
click on “Browse” to select the default working directory you want.
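The working directory can also be inspected and changed from the console. The sketch below is a minimal example using base R; `tempdir()` and the file name `example.csv` are used only so that the example runs on any machine and should be replaced with your own folder and files.

```r
# Remember the current working directory so it can be restored afterwards
old_dir <- getwd()

# tempdir() is a stand-in here; substitute the folder that holds your data
setwd(tempdir())
current_dir <- getwd()
print(current_dir)

# Relative file names are now resolved against the new working directory
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
file_found <- file.exists("example.csv")
print(list.files(pattern = "example"))

# Clean up and go back to where we started
file.remove("example.csv")
setwd(old_dir)
```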

4) RStudio makes graphics much more accessible for a casual user.

The basic R GUI requires you to go to some lengths to save graphics as you go. But RStudio
has a window that does exactly that.

You can easily click back and forth between plots, change the sizes of your plot without
rerunning the code, and export or copy plots to include in other documents.
 

Chapter 3 Four Panes in RStudio


RStudio is a four-pane workspace for 1) creating files containing R script, 2) typing R
commands, 3) viewing command histories, 4) viewing plots and more.

Top-left panel: Code editor allowing you to create and open a file containing R script. The R
script is where you keep a record of your work. An R script can be created as follows: File –>
New –> R Script.

Bottom-left panel: R console for typing R commands

Top-right panel:

Workspace tab: shows the list of R objects you created during your R session

History tab: shows the history of all previous commands

Bottom-right panel:

Files tab: shows files in your working directory

Plots tab: shows the history of plots you created. From this tab, you can export a plot to a PDF
or an image file

Packages tab: shows external R packages available on your system. If checked, the package is
loaded in R.
IMPORT OF DATA SHEET IN R STUDIO

1. In the File menu, click on Import Dataset, then click From Excel

2. Click on Browse and select the file

3. Select the sheet and click on Import.

4. Output
DESCRIPTIVE STATISTICS
Coding
For Summary Statistics
summary(Des_Statistics$Age)
For Standard Deviation
sd(Des_Statistics$Age)
For Variance
var(Des_Statistics$Age)

Result
For Summary Statistics
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.00 22.00 42.50 42.05 58.75 98.00
For Standard Deviation
[1] 27.31006
For Variance
[1] 745.8395
CORRELATION
Coding
cor.test(Correlation$`Advertisement in month`,Correlation$`Sales in crores`)
Result
Pearson's product-moment correlation

data: Correlation$`Advertisement in month` and Correlation$`Sales in crores`


t = 1.359, df = 6, p-value = 0.223
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3335576 0.8866886
sample estimates:
cor
0.4851491
Inference
There is a positive correlation (r = 0.485) between advertisement in a month and sales in
crores, but it is not statistically significant at α = 0.05 (p-value = 0.223).
ONE SAMPLE T-TEST USING DUMMY

Problem: To determine whether there is a significant difference between the calculated mean
age of the population and the estimated mean age of the population (40)
Age Dummy
42 0
76 0
56 0
67
65
65
89
45
45
65
78
55
44
65
76
89
54
56
56
76
45

Null Hypothesis: There is no significant difference between the calculated mean age
of the population and the estimated mean age of the population (40).
Alternate Hypothesis: There is a significant difference between the calculated mean age
of the population and the estimated mean age of the population (40).
H0 : µ = 40
H1 : µ ≠ 40
Coding
t.test(one_sample_t_test_2$Age, mu = 40)
Output:
One Sample t-test

data: one_sample_t_test_2$Age
t = 7.0855, df = 20, p-value = 7.209e-07
alternative hypothesis: true mean is not equal to 40
95 percent confidence interval:
55.75844 68.90823
sample estimates:
mean of x
62.33333
Decision Rule:
If p value is less than α, reject Null Hypothesis
Inference:
Since P (7.209e-07) is less than α (0.05), reject null hypothesis.
Conclusion:
There is a significant difference between the calculated mean age of the population and
the estimated mean age of the population (40)
HYPOTHESIS TESTING: TWO SAMPLE T TEST (TWO TAILED)

Problem: To analyse whether there is a significant difference between the marks scored by class
groups A & B in mathematics at α=10%
Group A Group B
76 95
87 97
98 87
78 89
76 87
78 45
76 76
88 56
78 76
87 87
87 76
87 76
76 45
89 88
65 76
78 66
89 78
87 56
87 77

Alternate Hypothesis: There is a significant difference between the marks scored by class
groups A & B in mathematics at α=10%
Null Hypothesis: There is no significant difference between the marks scored by class
groups A & B in mathematics at α=10%
H0: µa = µb, µa - µb = 0
H1: µa ≠ µb, µa - µb ≠ 0
Coding
t.test(twosample_t_test2$`Group A`, twosample_t_test2$`Group B`)
Output
Welch Two Sample t-test
data: twosample_t_test2$`Group A` and twosample_t_test2$`Group B`
t = 1.7863, df = 26.177, p-value = 0.08565
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.060474 15.165737
sample estimates:
mean of x mean of y
82.47368 75.42105
Decision Rule
If p value is less than α, reject Null Hypothesis
Inference:
Since P (0.08) is less than α (0.10), reject null hypothesis.
Conclusion:
There is a significant difference between the marks scored by class groups A & B in
mathematics at α=10%
HYPOTHESIS TESTING: TWO INDEPENDENT SAMPLE T TEST

Problem: To analyse whether the time spent by full time students in studying statistics is more
than the time spent by part time students at α=0.05.
Full time   Part time
3.2 3.1
1.5 3.4
6.5 4.6
0.2 2.8
3.7 2.3
3.3 1.5
1.7 3.8
3.6 9.5
3.8 4.3
5.3 2.7
6.9 1.6
3.6 1.6
1.7 3.2
1.2 4.2
7.2 3.9
3.9 1.2
1.9 0
5.3 0
Coding
t.test(two_sample_t_test1$`Full time`,two_sample_t_test1$`Part time`)
Output
Welch Two Sample t-test

data: two_sample_t_test1$`Full time` and two_sample_t_test1$`Part time`


t = 0.86306, df = 33.916, p-value = 0.3942
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.8129418 2.0129418
sample estimates:
mean of x mean of y
3.583333 2.983333
Decision Rule
If p value is less than α, reject Null Hypothesis
Inference:
Since p (0.3942) is greater than alpha (0.05), we will accept Null Hypothesis. The
one-tailed p value (0.3942 / 2 = 0.1971) is also greater than alpha.

Conclusion:
There is no evidence that full time students spend more time studying statistics than
part time students at α=0.05.
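The `t.test()` call above runs a two-sided test, whereas the problem statement is one-tailed. A minimal sketch of the one-tailed version, assuming the same data typed into two illustrative vectors, adds `alternative = "greater"` so that the reported p-value is already the upper-tail value:

```r
full_time <- c(3.2, 1.5, 6.5, 0.2, 3.7, 3.3, 1.7, 3.6, 3.8,
               5.3, 6.9, 3.6, 1.7, 1.2, 7.2, 3.9, 1.9, 5.3)
part_time <- c(3.1, 3.4, 4.6, 2.8, 2.3, 1.5, 3.8, 9.5, 4.3,
               2.7, 1.6, 1.6, 3.2, 4.2, 3.9, 1.2, 0, 0)

# H0: mu_full <= mu_part versus H1: mu_full > mu_part, at alpha = 0.05
# (Welch test, i.e. equal variances are not assumed, as in the output above)
alpha     <- 0.05
one_sided <- t.test(full_time, part_time, alternative = "greater")

t_stat  <- unname(one_sided$statistic)
p_value <- one_sided$p.value   # upper-tail p-value

print(one_sided)

# Decision rule from the text: reject H0 only when p is less than alpha
reject_h0 <- p_value < alpha
print(reject_h0)
```

The t statistic is unchanged by the choice of alternative; only the p-value and the confidence interval differ from the two-sided output.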
HYPOTHESIS TESTING: PAIRED SAMPLE T TEST

Problem: To determine whether there is a significant difference between the time to finish the
race when race is completed with local shoes and branded shoes.
Athlete   Local shoes   Branded shoes
1 3.2 3.1
2 1.5 3.4
3 6.5 4.6
4 0.2 2.8
5 3.7 2.3
6 3.3 1.5
7 1.7 3.8
8 3.6 9.5
9 3.8 4.3
10 5.3 2.7
11 6.9 1.6
12 3.6 1.6
13 1.7 3.2
14 1.2 4.2
15 7.2 3.9
Hypothesis Testing:
Null hypothesis (H0): There is no significant difference between the time to finish the race
when race is completed with local shoes and branded shoes.
Alternate hypothesis (H1): There is a significant difference between the time to finish the
race when race is completed with local shoes and branded shoes.
H0: µA = µB; µA - µB = 0 or tl = tb, tl – tb = 0
H1: µA ≠ µB; µA - µB ≠ 0 or tl ≠ tb, tl – tb ≠ 0
Coding
t.test(`Local shoes`, `Branded shoes`, mu = 0, alternative = "two.sided", paired = TRUE, conf.level = 0.95)
Output
Paired t-test
data: Local shoes and Branded shoes
t = 0.079506, df = 14, p-value = 0.9378
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-1.558575 1.678575
sample estimates:
mean difference
           0.06
Decision Rule
If p value is less than α, reject Null Hypothesis
Inference:
Since p (0.9378) is greater than the alpha (0.10), we will accept null hypothesis.
Conclusion:
There is no significant difference between the time to finish the race when race is completed
with local shoes and branded shoes.
HYPOTHESIS TESTING: PAIRED SAMPLE T TEST 2 (ONE TAIL)

NULL HYPOTHESIS: µcm ≤ µcd
ALTERNATE HYPOTHESIS: µcm > µcd
CODING
t.test(PT_TEST2$`Chocolate Milk`, PT_TEST2$`Carbohydrate Replacement Drink`, paired = TRUE, alternative = "greater", conf.level = 0.90)
RESULT
Paired t-test
data: PT_TEST2$`Chocolate Milk` and PT_TEST2$`Carbohydrate Replacement Drink`
t = 1.9793, df = 8, p-value = 0.04157
alternative hypothesis: true difference in means is greater than 0
90 percent confidence interval:
2.455942 Inf
sample estimates:
mean of the differences
8.345556
DECISION RULE
If p is less than α reject null hypothesis
INFERENCE
Since p=0.04157 is less than α=0.1 we reject null hypothesis
CONCLUSION
The mean time to exhaustion is greater after chocolate milk than after
carbohydrate replacement drink.
HYPOTHESIS TESTING: F TEST

Determine whether group 1 variance is greater than group 2 variance.

Null hypothesis:
Group 1 variance is equal to group 2 variance.
Alternate hypothesis:
Group 1 variance is greater than group 2 variance.

H0: var1 = var2
H1: var1 > var2
Coding
var.test(ftest_1_$`Group 1`, ftest_1_$`Group 2`, alternative = "greater")
Output
F test to compare two variances
data: ftest_1_$`Group 1` and ftest_1_$`Group 2`
F = 2.1317, num df = 5, denom df = 5, p-value = 0.2129
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
0.4220932 Inf
sample estimates:
ratio of variances
2.13171
Decision Rule
If p value is greater than alpha (0.05), accept null hypothesis
Inference
Since p (0.2129) is greater than alpha (0.05), accept null hypothesis
Conclusion
There is no evidence that the variance of Group 1 is greater than the variance of Group 2
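HYPOTHESIS TESTING: ONE WAY ANOVA

RESEARCH PROBLEM: To test whether there is a significant difference between the mean marks in
economics, science and history.

Economics   Science   History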
42 69 35
53 54 40
49 58 53
53 64 42
43 64 50
44 55 39
45 56 55
52   39
54   40
HYPOTHESIS

H0-- μ1 = μ2 = μ3

H1- at least one of the means is different.


CODING:-

x=aov(MARKS~SUBJECT)

summary(x)

RESULT:-

          Df Sum Sq Mean Sq F value   Pr(>F)
subject    2   1086   542.9    15.2 7.16e-05 ***
Residuals 22    786    35.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2 observations deleted due to missingness

DECISION RULE:

If p< Alpha, Reject Null Hypothesis.

INFERENCE:

Since P (7.16e-05) is less than alpha (0.05), we will reject null hypothesis

CONCLUSION:

Mean of all the three groups is not equal.
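`aov()` expects the data in long format, with one column of marks and one column naming the subject, rather than the three side-by-side columns shown in the table. The sketch below is a minimal example of building that layout by hand and fitting the model. The column names `MARKS` and `SUBJECT` follow the call above, while the data frame name `long` and the three vector names are illustrative. Typing only the observed marks also avoids the blank Science cells that led to the "2 observations deleted due to missingness" message.

```r
# Marks copied from the table above; Science has only seven observations
economics <- c(42, 53, 49, 53, 43, 44, 45, 52, 54)
science   <- c(69, 54, 58, 64, 64, 55, 56)
history   <- c(35, 40, 53, 42, 50, 39, 55, 39, 40)

# Long format: one row per mark, with a factor identifying the subject
long <- data.frame(
  MARKS   = c(economics, science, history),
  SUBJECT = factor(rep(c("Economics", "Science", "History"),
                       times = c(length(economics),
                                 length(science),
                                 length(history))))
)

# One-way ANOVA: H0 is that all three subject means are equal
fit <- aov(MARKS ~ SUBJECT, data = long)
tab <- summary(fit)[[1]]
print(tab)

f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# Decision rule from the text: reject H0 when p is less than alpha
alpha     <- 0.05
reject_h0 <- p_value < alpha
```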
