DevRes wk1-2

This document provides an introduction to performing basic statistical analysis in R. It demonstrates how to import data, examine variables, calculate summary statistics, and perform t-tests and chi-square tests. The document contains code examples for reading in data, recoding variables, plotting distributions and comparing means between groups.

Uploaded by

Faustina Prima

# Welcome to Development Research!

# You will now learn how to write some code in R and to perform some basic statistics!

###################################################### Quantitative Practical 1 [START] ##################################################

############################################## Part 1 Importing and looking at your data #########################################

# Note that using a # allows you to add comments to annotate your code. It's easy to write lots of lines of
# code and to lose track of why you did certain things. Keeping notes is generally a good idea.

# R works using commands. This means that you can write a line of code that tells R what to do.
# for example
1+2
# this tells R to calculate 1 + 2 (Yes! R will also do calculations like a calculator and this is a very useful feature)

# First, you will want to read in your data. R can read and write many different file types.
# This includes files from Excel, data files commonly used in ArcGIS (e.g. .dbf), and files from other statistical
# packages, including SPSS and Stata. Some of these features are available in base R; others come from separate packages.
# We will use .csv files, which are easily transferable and which can be easily saved in and opened in Excel.

data <- read.csv(file.choose()) # this line of code combines two functions: read.csv(), which reads in the data,
# and file.choose(), which allows you to select your file using a file browser.
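
# A minimal sketch, not part of the original practical: read.csv() also accepts a file path, which saves clicking
# through the file browser every time you re-run your script. The file contents and the names csv_path and toy are
# made up for illustration; the file is written to a temporary location so that the example is self-contained.

```r
# Write a tiny example file to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("pokemon,weight (kg),height (m)",
             "Pidgey,1.8,0.3",
             "Rattata,3.5,0.3"),
           csv_path)

# Read the file by giving read.csv() the path directly
toy <- read.csv(csv_path)
head(toy)
```

# With your own data you would replace csv_path with the path to your .csv file.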

# This command allows you to see the first few lines of data in your dataset
head(data)

# Look at the variable "weight..kg.". R doesn't allow brackets () or spaces in column names, so read.csv() has replaced them with dots.
# Let's fix this and rename the weight and height variables
colnames(data)[colnames(data) == "weight..kg."] <- c("weight")
colnames(data)[colnames(data) == "height..m."] <- c("height")
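
# A small sketch of where the dots come from, assuming (hypothetically) that the original column headers in the
# .csv file were "weight (kg)" and "height (m)". By default read.csv() passes the headers through make.names(),
# which replaces characters that are not valid in R names with dots. The toy data frame below is made up.

```r
# Hypothetical headers as they might appear in the spreadsheet
original_names <- c("pokemon", "weight (kg)", "height (m)")

# make.names() is what read.csv() applies when check.names = TRUE (the default)
cleaned_names <- make.names(original_names)
cleaned_names

# A toy data frame carrying those cleaned names, renamed the same way as above
toy <- data.frame(a = c("Pidgey", "Rattata"), b = c(1.8, 3.5), c = c(0.3, 0.3))
names(toy) <- cleaned_names
names(toy)[names(toy) == "weight..kg."] <- "weight"
names(toy)[names(toy) == "height..m."] <- "height"
names(toy)
```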

# Let's try and get some summary statistics

summary(data)

# note that the categorical variables have been imported as "characters"
# we need to change these to factors so that R can identify them as categorical variables
# let's re-assign them
data$pokemon <- as.factor(data$pokemon)
data$type <- as.factor(data$type)
data$sex <- as.factor(data$sex)
data$surface <- as.factor(data$surface)
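
# A hedged alternative sketch: when there are many categorical columns, you can convert them in one step with
# lapply() instead of writing one line per column. The toy data frame and the name cat_cols are made up for
# illustration; with the real data you would list your own column names.

```r
toy <- data.frame(pokemon = c("Pidgey", "Rattata", "Weedle"),
                  type    = c("normal", "normal", "bug"),
                  weight  = c(1.8, 3.5, 3.2),
                  stringsAsFactors = FALSE)

# Names of the columns that hold categories
cat_cols <- c("pokemon", "type")

# Apply as.factor() to each of those columns and put the results back
toy[cat_cols] <- lapply(toy[cat_cols], as.factor)

str(toy)          # the two columns are now factors, weight is still numeric
levels(toy$type)  # the categories R has identified
```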

# Let's run the summary function again to see what has happened
summary(data)
# let's also check that the column names we changed earlier look right
names(data)

# now let's calculate the mean and the standard deviation for Pokemon height and weight
# this command lets you calculate the mean of Pokemon weight in our sample (note that we specify the dataset [data]
# and the variable [weight], and use $ to join them).
mean(data$weight)

# this command lets you calculate the standard deviation

sd(data$weight)

# let's calculate the standard error, which you'll remember is the standard deviation divided by the square root of the sample size
# Note that nrow() returns the number of rows of the data frame given in brackets. This is your sample size, or n
sd(data$weight)/sqrt(nrow(data))

# we can also create an object and assign it a value or a calculation (this will also appear in the Environment panel)
weight_sd <- sd(data$weight)

# if you run that command you'll get the value in the console
weight_sd

# we can also create an object for the square root of the sample size
sqrt_n <- sqrt(nrow(data))

# and we can now divide the standard deviation by the square root of n
weight_sd/sqrt_n
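
# The same calculation can be wrapped in a small function so that it can be reused. This is only a sketch: the
# function name standard_error and the example numbers are made up. It also drops missing values first, because
# sd() returns NA if any value is missing and nrow() counts rows with missing values too.

```r
standard_error <- function(x) {
  x <- x[!is.na(x)]          # keep only the observed values
  sd(x) / sqrt(length(x))    # standard deviation divided by the square root of n
}

w <- c(2, 4, 4, 4, 5, 5, 7, 9)
standard_error(w)

# A missing value does not change the result
standard_error(c(w, NA))
```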

# We can also use a package with an in-built function

# first we need to install the package (you only need to do this once on each computer)
install.packages("plotrix")

# Now we need to load (attach) the package

library(plotrix)

# now we can use the std.error() function

std.error(data$weight)

# now let's calculate the mean weight for Pokemons caught on natural surfaces (e.g. in the park)
# note that the first part of the command is the same as above - the part in square brackets selects the surface
mean(data$weight[which(data$surface == "natural")])

# now let's calculate the mean weight for Pokemons caught on natural surfaces that are also bug types
mean(data$weight[which(data$surface == "natural" & data$type == "bug")])
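
# A short sketch of why which() is useful when selecting rows. The vectors below are made up; one surface is
# deliberately missing (NA). Without which(), the missing value is carried into the selection and the mean is NA.

```r
weight  <- c(3, 4, 2, 6)
surface <- c("natural", NA, "natural", "built")

# The comparison is NA wherever the surface is missing
surface == "natural"

# Selecting with the comparison alone keeps an NA in the result
weight[surface == "natural"]
mean(weight[surface == "natural"])

# which() returns only the positions where the comparison is TRUE
which(surface == "natural")
mean(weight[which(surface == "natural")])
```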

# What is the mean for bug types on built surfaces?

# Generating a table to calculate all these sub classifications in individual steps is very time consuming
# Again, we can speed up the process by using a package
# Let's install another package
install.packages("doBy")

# and load the package

library(doBy)

# let's use the summaryBy() function to calculate the mean weight and height of Pokemons by surface and Pokemon type
# note that FUN = mean is telling summaryBy() to calculate the mean; you could equally ask it
# to calculate the standard deviation (FUN = sd)
summaryBy(weight + height ~ surface + type, FUN = mean, data = data)
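
# For comparison, a sketch of a similar grouped mean in base R using aggregate(), which needs no extra package.
# This is an alternative, not what the practical uses; the toy data frame is made up and is grouped by surface only.

```r
toy <- data.frame(surface = c("built", "built", "natural", "natural", "natural"),
                  weight  = c(3, 4, 2, 4, 6),
                  height  = c(0.3, 0.5, 0.2, 0.4, 0.6))

# cbind() lists the variables to summarise; the right-hand side gives the grouping variable
by_surface <- aggregate(cbind(weight, height) ~ surface, data = toy, FUN = mean)
by_surface
```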

# we can also get summaryBy() to run several calculations at once by passing it a function

# note that here the function returns the mean, the standard deviation and the sum for each group
summaryBy(weight + height ~ surface,
          data = data,
          FUN = function(x) { c(mean = mean(x), sd = sd(x), sum = sum(x)) }
)

# we can also get summaryBy() to calculate several statistics for each combination of surface and Pokemon type
# this time we are also going to assign the result to an object
# note that here we are also using std.error() from the package plotrix
# you could read the formula as: summarise weight and height as a function of surface and Pokemon type
summary.table <- summaryBy(weight + height ~ surface + type,
                           data = data,
                           FUN = function(x) { c(mean = mean(x), sd = sd(x), se = std.error(x)) }
)
summary.table

# How come you are getting NA in some rows?

# You can check your data by double clicking on the data icon (the one with the little blue arrow) in the "Environment" tab in the upper
# right hand corner. How many poison types did we find in built areas?
# R cannot calculate standard deviations if there is only one row of data.
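
# A quick sketch of this behaviour with made-up numbers: sd() needs at least two values, so a group with a single
# row gives NA. Counting the rows per group with table() shows which groups are affected.

```r
# One value: no spread can be estimated
sd(35)

# Two values: the standard deviation can be calculated
sd(c(35, 45))

# A toy dataset in which the poison type has only one row
toy <- data.frame(type = c("bug", "bug", "poison"), weight = c(3, 4, 5))
table(toy$type)

group_sd <- tapply(toy$weight, toy$type, sd)
group_sd
```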

# now let's save this table so that we can use it later

# note that we first used read.csv(); now we will use write.csv()
# (change the path below to a folder that exists on your own computer)
write.csv(summary.table, "/Users/user/Desktop/PokemonTable.csv")
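
# A self-contained sketch of saving a table and reading it back. The table contents, the file name and the
# object names are made up; the file goes into R's temporary folder so that the example runs on any computer.
# row.names = FALSE stops write.csv() from adding an extra column of row numbers.

```r
toy_table <- data.frame(surface = c("built", "natural"),
                        weight.mean = c(3.5, 4.0))

# Build a path inside the temporary folder for this R session
out_path <- file.path(tempdir(), "PokemonTable_example.csv")

write.csv(toy_table, out_path, row.names = FALSE)

# Check that the file exists and read it back in
file.exists(out_path)
read_back <- read.csv(out_path)
read_back
```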

# We might also want to get some statistics that describe the range of values
# First we will look at the range of values for weight in our data
# Notice that this function gives us the minimum and the maximum as well as the mean, the median and the quartiles
summary(data$weight)

# Now let's plot a frequency distribution for HP

# Note that hist() generates a histogram, which shows how many Pokemons fall into each range of HP values
hist(data$HP)
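
# A sketch of what a histogram counts, using made-up HP values. With plot = FALSE, hist() returns the bin
# boundaries and the number of values in each bin instead of drawing the plot. The breaks are chosen by hand here.

```r
hp <- c(10, 12, 15, 22, 25, 31, 38, 41, 45, 58)

# Three bins: 0 to 20, 20 to 40 and 40 to 60
h <- hist(hp, breaks = seq(0, 60, by = 20), plot = FALSE)

h$breaks  # the bin boundaries
h$counts  # how many values fall into each bin
```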

# Now let's try and compare the frequency distributions of HP between the two surfaces
# We are going to use a new package called "ggplot2", which is one of the most widely used packages for data visualisation
# Let's install and load the package first
install.packages("ggplot2")
library(ggplot2)

# now let's graph the distribution for HP

ggplot(data, aes(HP)) + # this part tells ggplot the data and variables you want to plot; note that here you are only plotting one variable, on the x axis
  geom_histogram(bins = 30) # this part tells ggplot what kind of plot you want to make and sets the number of bars (bins) you want to draw

# now let's graph the distribution of HP for each of the surfaces that we have
# note that in this instance we have divided the dataset and are drawing two separate histograms
# we can colour them and also change their transparency. Here we make the green bars more transparent so that you can see the grey bars behind them
# we can also add axis labels
ggplot() +
  geom_histogram(data = data[which(data$surface == "built"),], aes(HP), bins = 6, fill = "grey") +
  geom_histogram(data = data[which(data$surface == "natural"),], aes(HP), bins = 7, fill = "green", alpha = 0.3) +
  xlab("Pokemon HP") +
  ylab("Number of Pokemons") +
  theme_bw()

# another way of graphing distributions is to use density plots

# let's compare the distribution of CP between male and female Pokemons
ggplot() +
  geom_density(data = data[which(data$sex == "male"),], aes(CP), colour = "grey") +
  geom_density(data = data[which(data$sex == "female"),], aes(CP), colour = "green") +
  xlab("Pokemon CP") +
  ylab("Density") +
  theme_classic() # we can change the "theme" of graphs too

############################################## Part 2 Comparing means (t test) #########################################

# let's calculate the t-statistic

# first we calculate the mean HP for each of the two surfaces
m_built <- mean(data$HP[which(data$surface == "built")])
m_natural <- mean(data$HP[which(data$surface == "natural")])

# and now the variances

s_built <- var(data$HP[which(data$surface == "built")])
s_natural <- var(data$HP[which(data$surface == "natural")])

# we can use the summary function to provide us with some summary statistics, including the number of data points in each group
# how many data points do we have in each group?
summary(data)

# so according to the formula for the t-statistic (here 18 and 40 are the numbers of data points on built and natural surfaces)

(m_built - m_natural) / sqrt((s_built/18) + (s_natural/40))

# now let's compare that with the built-in t.test() function in {stats}
t.test(data$HP~data$surface)
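
# A worked sketch with made-up HP values showing that the hand calculation and t.test() agree. By default
# t.test() does not assume equal variances (Welch's t-test), which matches the formula used above. The vectors
# built and natural and the other object names are invented for illustration.

```r
built   <- c(30, 35, 40, 45, 50)
natural <- c(38, 42, 47, 51, 55, 60)

# Sample sizes taken from the data rather than typed in by hand
n_built   <- length(built)
n_natural <- length(natural)

# t-statistic: difference in means divided by its standard error
t_manual <- (mean(built) - mean(natural)) /
  sqrt(var(built) / n_built + var(natural) / n_natural)
t_manual

# The same test using the built-in function
res <- t.test(built, natural)
unname(res$statistic)
```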

# is there a statistically significant difference in Pokemon HP between surfaces?

# can you run another t-test to compare other groups?

############################################## Part 3 Compare frequencies (Chi Square test) #########################################

### Chi Square Analysis

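# let's try and see whether there's a difference in the frequency of male and female Pokemons between surfaces
# let's generate a summary table for our analysis
# (we use table() here: it counts the rows in each surface-by-sex combination, whereas summing the factor "type" would give an error)
table(data$surface, data$sex)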

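# Now let's generate the contingency table

# the table is what we call a matrix
#              f    m
C <- matrix(c(85,  90,    # built
              141, 209),  # natural
            nrow = 2, byrow = TRUE) # byrow = TRUE fills the matrix row by row, so the layout matches the labels

# what does our table look like?
C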

# Now with a pen and paper let's calculate the chi-square statistic; you can use R as a calculator if you wish

# And now, let's run the Chi Square test using R's built-in function
chisq.test(C, correct = FALSE)
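
# A sketch of the pen-and-paper calculation in R, using the counts from the contingency table above with surfaces
# as rows and sexes as columns. The expected count in each cell is (row total x column total) / grand total.
# The object names expected, chi_sq, df and p_value are made up for illustration.

```r
C <- matrix(c(85,  90,
              141, 209),
            nrow = 2, byrow = TRUE,
            dimnames = list(surface = c("built", "natural"), sex = c("f", "m")))

# Expected counts if sex and surface were independent
expected <- outer(rowSums(C), colSums(C)) / sum(C)
expected

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chi_sq <- sum((C - expected)^2 / expected)

# Degrees of freedom and p-value
df <- (nrow(C) - 1) * (ncol(C) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
c(chi_sq = chi_sq, df = df, p_value = p_value)

# Compare with R's function (without the continuity correction)
res <- chisq.test(C, correct = FALSE)
res
```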

###################################################### Quantitative Practical 1 [END] ##################################################
