Beginner Guide To R and R Studio V1

This document provides an introduction to using R and RStudio. It discusses downloading and installing R and RStudio, and describes the basic RStudio interface. It then demonstrates how to perform arithmetic operations in R, create variables, load data from files, and calculate basic statistics like mean, median, and standard deviation. Key functions covered include ls(), mean(), median(), and read.csv() for importing data.

Uploaded by

broto_waseso

Beginner Guide to R and R Studio

Getting Started with R and RStudio

Introduction

We will be using R and RStudio for many of our statistical
calculations. We assume that you have never used R or RStudio
before, so we will start from scratch. R is a free statistical
software package that you can download onto your computer; it is
used all over the world at universities and workplaces. RStudio
provides the front end through which you use R. Together, R and
RStudio are becoming the go-to tools for data science.

Installing R and RStudio

You can either work locally (on your own machine) by
downloading and installing R and RStudio, or work in the cloud
by accessing RStudio from a web browser.

RStudio on cloud: https://rstudio.cloud/

Download on your computer: Go to http://cran.us.r-project.org/
to download R, and then to
https://www.rstudio.com/products/rstudio/ to download RStudio.

Using RStudio for the first time

Once logged into the RStudio server you will see the following.
Try not to be overwhelmed. Although RStudio is very powerful,
for our intro class we will use it mainly as a fancy graphing
calculator. In more advanced classes, we explore the software
further.

The RStudio window is divided into several windows:

• Workspace and history windows: show all variables and data
loaded in.
• Console window: where you type in commands.
• Plots and help window: where graphs are plotted and help topics
are shown.

To exit RStudio, either click on your login name in the upper
right of the window or click the power button icon in the upper
right-hand corner of the window.

Making a New R Script

Although we can run all our code in the console, it is
considered good practice to keep all the code in one place.
A script file containing all the code comes in handy for
troubleshooting and for reusing the code in pipelines.

To make a new script, go to File > New File > R Script, or
click on the New File icon and choose R Script.

Now we have 4 windows: clockwise from the top left, the unsaved R
Script file, the Environment and History panel, the Files and
Plots panel, and the console.

To save, click the floppy disk icon. After saving the script
file, we can see the newly saved file in the Files panel in the
lower right corner.

Part One: Arithmetic and Variables
Type 8+3 and press return. It doesn’t matter whether there are
spaces between the values or not.
>8 + 3
[1] 11

The answer is printed in the console as above. We’ll come to
what the [1] means later, in the example on loading data from a
file.
>27 / 5
[1] 5.4

R does everything that a graphing calculator would do, including
trigonometric functions such as sine and cosine. Here we ask
for the cosine of -𝜋:

# calculate the cosine of -pi (pi is approximately 3.141592...)
>cos(-pi)
[1] -1

The abs() command returns the absolute value. R evaluates -2^3 as
-(2^3), which is -8, so without abs() the result would be -8.

# absolute value of -2 to the power 3
>abs(-2^3)
[1] 8
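
The order of operations in an expression like -2^3 can be checked directly in the console. The lines below are a small sketch using only base R; the parenthesised variants are not in the original guide and are added here for comparison.

```r
# ^ is applied before the unary minus, so -2^3 is read as -(2^3)
-2^3
(-2)^3    # same value, because an odd power keeps the sign

# with an even power the difference becomes visible
-2^2      # the power is taken first, then the result is negated
(-2)^2    # parentheses force the sign to be applied first
```

When in doubt about how R will read an expression, adding parentheses is the safest choice.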

Finally, to take square roots we use the sqrt() function.

# square root of 4068289
>sqrt(4068289)
[1] 2017

These calculations have only produced output in the console;
no values have been saved.

To save a value, assign it to a named data structure (a
variable). This guide mostly uses “=” as the assignment operator;
“<-” does the same job and is the form you will see most often
in other R code. For now we’ll use x, y and z as names of data
structures, though more informative names can be used.

# assigning the result of 8 + 3 to the variable x
>x = 8 + 3

If R has performed the command successfully, you will not see
any output, as the value of 8+3 has been saved to the data
structure called x. You can access and use this data structure at
any time, and you can print the value of x to the console.
>x
[1] 11

Create another data structure called y.


# assign number 3 to variable y
>y = 3

Now that values have been assigned to x and y they can be


used in calculations.
# adding x and y
>x+y
[1] 14

# multiplying x and y
>x*y
[1] 33

# assigning the result of x multiplied by y to variable z


>z = x * y
>z
[1] 33
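
The two assignment operators mentioned earlier can be tried side by side. This is a small sketch; the names a and b are chosen here only for illustration and do not appear elsewhere in the guide.

```r
a = 8 + 3     # assignment with =
b <- 8 + 3    # assignment with <-

# both names now hold the same value
identical(a, b)

# either one can be used in further calculations
a * 3
b * 3
```

Whichever operator you choose, using it consistently within a script makes the code easier to read.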

Important: R is case sensitive, so z and Z are not the same. If you
try to print the value of Z to the console, an error will be
returned, as Z has not been defined so far in this section.

# we can see what's inside z by looking in the Values pane of the
# Environment tab, or just by calling z
>z

# names are case-sensitive: we have defined the variable z,
# but Z (uppercase) is not defined
>Z
Error: object 'Z' not found

To check what data structures you have created, enter ls() into
the console or look at the Environment tab in RStudio (the upper
right window).

# list the objects in the environment, e.g. data, variables,
# functions, etc.
>ls()
[1] "x" "y" "z"

We can remove an object from the environment with the rm()
function.
>rm(y)

# now variable/object y is not defined, i.e. it has been deleted
>ls()
[1] "x" "z"

To clear all the objects from the environment at once, we can
clear the workspace by going to Session > Clear Workspace... >
check the include objects box > Yes. And done!
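
The same clean-up can also be done from the console. The sketch below uses base R's rm() with its list argument; the three variables are re-created first only so that the example stands alone.

```r
# create a few objects so there is something to remove
x <- 11
y <- 3
z <- x * y
ls()

# remove every object in the current environment in one step
rm(list = ls())
ls()   # nothing is left
```

Like the menu option, this cannot be undone, so it is worth saving any script you need before running it.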

Caution: If you use the same data structure name as one that
you have previously used, then R will overwrite the previous
information with the new information without warning the
user.

Example Two: Data Vector, Loading Packages, and Basic Stats

Suppose we have a small dataset for which we want to find the
summary statistics, i.e. the mean, median, standard deviation,
variance and quartiles.

Let’s say this is our data: -3 2 0 1.5 4 1 3 8

Since it is a very small set of data, we can type it in using the “c”
command, short for combine or concatenate:

# make a variable called "ourdata" containing a vector of numbers
>ourdata = c(-3,2,0,1.5,4,1,3,8)

#what is the length of our data?


>length(ourdata)
[1] 8

# call the fifth element in ourdata


>ourdata[5]
[1] 4

# calculate mean
>mean(ourdata)
[1] 2.0625

# calculate median
>median(ourdata)
[1] 1.75

# calculate range
>range(ourdata)
[1] -3 8

# calculate standard deviation


>sd(ourdata)
[1] 3.189688

# calculate variance
>var(ourdata)
[1] 10.17411

# using summary function to get basic stats


>summary(ourdata)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.000 0.750 1.750 2.062 3.250 8.000
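
To see what var() and sd() are computing, the sketch below rebuilds them from the sample formulas (dividing by n - 1) and compares the results with R's built-in functions. The names n, xbar, var_by_hand and sd_by_hand are introduced here only for illustration.

```r
ourdata <- c(-3, 2, 0, 1.5, 4, 1, 3, 8)

n    <- length(ourdata)          # number of observations
xbar <- sum(ourdata) / n         # mean: sum divided by count

# sample variance: squared deviations from the mean, divided by n - 1
var_by_hand <- sum((ourdata - xbar)^2) / (n - 1)
sd_by_hand  <- sqrt(var_by_hand) # standard deviation is its square root

# compare with the built-in functions
all.equal(var_by_hand, var(ourdata))
all.equal(sd_by_hand, sd(ourdata))

# the quartiles reported by summary() can also be requested directly
quantile(ourdata, c(0.25, 0.5, 0.75))
```

Both comparisons should agree, which confirms that R's var() and sd() use the n - 1 (sample) versions of the formulas.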

The functionality of R can be extended by many different
packages. As an example, the describe() function, which is part of
the “psych” package, gives more extensive descriptive statistics.
A package must be loaded with the library() function before its
functions can be used. The first time, you may also need to
install the package using install.packages().
# install (only needed once) and load the psych package
>install.packages("psych")
>library(psych)
>describe(ourdata)
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 8 2.06 3.19   1.75    2.06 2.22  -3   8    11  0.3    -0.66 1.13
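
Since install.packages() only needs to run once per machine, it can help to check whether a package is already installed before trying to use it. The sketch below uses base R's requireNamespace() for that check; the variable name has_psych is an assumption made here for illustration, and nothing is installed by this code.

```r
# TRUE when the psych package is installed, FALSE otherwise
has_psych <- requireNamespace("psych", quietly = TRUE)

if (has_psych) {
  library(psych)
  print(describe(c(-3, 2, 0, 1.5, 4, 1, 3, 8)))
} else {
  # tell the user what to do instead of failing with an error
  message('psych is not installed; run install.packages("psych") once, then try again')
}
```

The same pattern works for any other package name, such as "BHH2".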

Example Three: Load Data from a File, attach()

Sometimes the dataset is much larger and we prefer to read it
directly into RStudio, rather than type it in by hand as in
Example Two.

We have an old data set on cars from 1978, accessible at
http://www.datadescant.com/stat104.cars10.csv

This data set may be read into RStudio as follows (and this is
how we will read in all data sets for this class).

# loading the data with the read.csv() command and assigning it to mydata
>mydata=read.csv("http://www.datadescant.com/stat104.cars10.csv")

# getting the names of the columns
>names(mydata)
 [1] "make"         "price"        "mpg"          "headroom"     "trunk"
 [6] "weight"       "length"       "turn"         "displacement" "gear_ratio"
[11] "foreign"

The dim() command tells us the dimensions of our data set; we
have 74 rows of data on 11 variables.

# getting the dimension of the dataset


>dim(mydata)
[1] 74 11
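
Besides names() and dim(), a few other base R functions help when first inspecting a data frame. The sketch below uses a tiny made-up table named toy (its three rows are invented for illustration and are not taken from the cars file) so that it runs without an internet connection; read.csv() accepts inline text through its text argument.

```r
# a small invented data set, read the same way a file would be
toy <- read.csv(text = "make,price,mpg
Car A,4000,22
Car B,4700,17
Car C,3800,25")

dim(toy)      # rows and columns
nrow(toy)     # number of rows only
ncol(toy)     # number of columns only
head(toy, 2)  # first two rows
str(toy)      # column types at a glance
```

With the real data, the same calls would be made on mydata, for example head(mydata).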

Sometimes it helps to visualize the data in a tabular, Excel-like
format. To do so, we can use the function View() (note the capital
V) or simply click on mydata in the Environment window.

Accessing data or a variable

We can access our data in several ways. The obvious but
cumbersome approach is as follows.

Use the $ operator to choose a column: dataframename$colname.
The command below returns the vector of values in the "mpg"
column of the "mydata" data frame. The numbers in brackets show
the index of the first value on each line, just to make life easier.
>mydata$mpg
 [1] 22 17 22 20 15 18 26 20 16 19 14 14 21 29 16 22 22 24 19 30 18 16 17 28 21
[26] 12 12 14 22 14 15 18 14 20 21 19 19 18 19 24 16 28 34 25 26 18 18 18 19 19
[51] 19 24 17 23 25 23 35 24 21 21 25 28 30 14 26 35 18 31 18 23 41 25 25 17

Now it seems appropriate to talk about the meaning of the
numbers in the brackets. From the dim() function we learned
that the car data, which we have saved in the mydata variable,
has 74 rows and 11 columns. When we use the “$” operator, R
returns a vector of the mpg values, and the number in brackets
at the start of each output line is the position of the first
value on that line. For instance, the last line begins with [51],
so the 51st miles per gallon (mpg) value is 19 and the 52nd is 24;
counting along, the 73rd is 25 and the 74th is 17. It is just an R
printing convention and is not part of the data. You might want
to verify this with View(mydata) by looking at the 73rd and 74th
data points in the mpg column.
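
The bracketed numbers are easiest to understand on a vector whose values make their positions obvious. In this sketch the vector v and the reduced console width are choices made here for illustration.

```r
old <- options(width = 40)   # narrow console so the output wraps
v <- 101:130                 # 30 values: 101, 102, ..., 130
print(v)                     # each line starts with the index of its first value
options(old)                 # restore the previous width

# square brackets pick out a value by its position
v[1]
v[30]
```

The same square-bracket indexing works on a data frame column, for example mydata$mpg[73].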

Now that we have learned to extract a column of data from a
data frame, we can call a function on the selected column.
For example, if we want to know the median miles per gallon of
the cars, we run the following:

# in case you wanted to confirm the index numbers :)
>View(mydata)

# median of MPGs
>median(mydata$mpg)
[1] 20

Alternatively, we can “attach” the data set to R’s search path
and work directly with the column names as follows:

# using attach() and detach()
>attach(mydata)

# median of MPGs, using the column name from the attached data frame
>median(mpg)
[1] 20

# standard deviation of MPGs


>sd(mpg)
[1] 5.785503

It is very important to detach() the data frame when you are
done, especially while working with multiple data frames.
>detach(mydata)

# now R will complain that object mpg is not found
>sd(mpg)
Error in is.data.frame(x) : object 'mpg' not found

# we can still use $ to access the mpg column
>sd(mydata$mpg)
[1] 5.785503
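
Because a forgotten detach() can cause confusing name clashes, one alternative is base R's with() function, which looks up column names inside a data frame for a single expression only. This sketch uses a small made-up data frame named cars_toy (invented values, not the real cars data).

```r
cars_toy <- data.frame(mpg   = c(22, 17, 25, 20),
                       price = c(4000, 4700, 3800, 4500))

# evaluate the expression using the columns of cars_toy; nothing is attached
with(cars_toy, median(mpg))
with(cars_toy, sd(price))

# mpg is still not a free-standing object afterwards
exists("mpg", inherits = FALSE)
```

With the real data the equivalent call would be with(mydata, median(mpg)), and no detach() is needed afterwards.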

Eventually, we will learn shortcuts such as the following to look
at the descriptive statistics for three variables at once.

# calling the describe() function on the mpg, price, and length columns
>describe(mydata[,c("mpg","price", "length")])
       vars  n    mean      sd median trimmed     mad  min   max range  skew kurtosis     se
mpg       1 74   21.30    5.79   20.0   20.77    5.19   12    41    29  0.93     0.87   0.67
price     2 74 6165.26 2949.50 5006.5 5614.17 1358.06 3291 15906 12615  1.62     1.69 342.87
length    3 74  187.93   22.27  192.5  188.03   28.17  142   233    91 -0.04    -1.01   2.59
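
If the psych package is not available, base R can produce a similar side-by-side summary by applying a function to each selected column with sapply(). The data frame cars_toy below is made up for illustration; with the real data you would pass mydata[, c("mpg", "price", "length")] instead.

```r
cars_toy <- data.frame(mpg    = c(22, 17, 25, 20),
                       price  = c(4000, 4700, 3800, 4500),
                       length = c(186, 173, 168, 196))

cols <- c("mpg", "price", "length")   # columns of interest

col_means <- sapply(cars_toy[, cols], mean)   # one mean per column
col_means
sapply(cars_toy[, cols], median)              # one median per column
summary(cars_toy[, cols])                     # quartiles for every column at once
```

sapply() returns a named vector, so a single value can be picked out by name, for example col_means["mpg"].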

PART II: MAKING GRAPHS!

R commands for drawing graphs:

• plot()
• hist()
• dotPlot()
• boxplot()
• par(mfrow=c( , )). Escape with dev.off()
• general arguments:
o col= (color)
o pch= (point symbol)
o cex= (point size)

Plotting Data Points: plot()

Let’s assign values to x and y.
>x <- c(1,2,3)
>y <- c(1,4,9)

Here we give the plot() function two sets of data points, and it
plots them against each other. After a simple first plot, we
introduce some of the options we can use to make our plots more
detailed, informative, and representative.

# making a simple plot
>plot(x,y)

I encourage you to change the values of pch, cex, main, sub,
xlab, and ylab:
>plot(x,y, xlab = "x", ylab="y", pch = 19, cex=0.8, col = "blue",
xlim = c(0,4), ylim=c(0,10), main="Our First Plot!", sub = "STAT 100")
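
Two further base R options are worth trying once the basic call works: the type argument of plot(), and points(), which adds to an existing plot instead of starting a new one. The sketch below is only an illustration; the symbol, size and color values are arbitrary choices made here.

```r
x <- c(1, 2, 3)
y <- c(1, 4, 9)

# type = "b" draws both the points and the line segments connecting them
plot(x, y, type = "b", pch = 17, cex = 1.5, col = "red",
     main = "Points and lines", xlab = "x", ylab = "y")

# add a second series (y equal to x) to the same plot
points(x, x, pch = 1, col = "blue")
```

Changing pch to other whole numbers between 0 and 25 is a quick way to see the available point symbols.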

Now, let’s make a scatterplot of the weight and price of the cars.
Only weight and price are essential arguments to the plot
function; the rest just make the plot prettier. The pch argument
defines the shape of the dots.

# attaching mydata to the search path
>attach(mydata)

# plot price against weight
>plot(weight,price,main="Price of a car versus Weight", pch = 19,
cex= 0.8,xlab= "Weight (lbs)", ylab="Price ($)")

Making Histograms

One can also create new variables in R, such as transformations
of existing variables. Consider the variable price. Here is a
histogram of this variable.
>hist(price, col = "lightgreen")

We are going to see if we can find a transformation that makes
it more symmetric. The par() command (as in parameter),
combined with the mfrow argument (a grid of plots filled row by
row), lets us plot several graphs at once on the same page and
compare them side by side.

# Transforming the price data


>invprice=1/price
>logprice=log(price) #simple log by default is the natural log
>sqrtprice=sqrt(price)

# ask for a 2 x 2 matrix pattern for plots


>par(mfrow=c(2,2))

# make histogram of price and 3 transformed price data points


>hist(price,main="Price")
>hist(invprice,main="1/Price")
>hist(sqrtprice,main="sqrt(Price)")
>hist(logprice,main="log(Price)")

Note: if you make another plot, R still assumes we would like to
continue with the par(mfrow=c(2,2)) pattern. To escape and make
independent plots again, we need the dev.off() command.

# escape from par(mfrow=c(2,2))
>dev.off()
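
dev.off() closes the whole plotting window. A gentler alternative, sketched below, is to keep the old settings that par() returns and restore them afterwards. The random numbers are only placeholder data for the four panels, and the names values and old_par are chosen here for illustration.

```r
set.seed(1)                      # make the placeholder data reproducible
values <- rexp(100)              # 100 right-skewed positive random numbers

old_par <- par(mfrow = c(2, 2))  # par() returns the previous settings
hist(values,       main = "values")
hist(1 / values,   main = "1/values")
hist(sqrt(values), main = "sqrt(values)")
hist(log(values),  main = "log(values)")
par(old_par)                     # back to one plot per page

par("mfrow")                     # check the current layout
```

After par(old_par), the next plot fills the whole page again and the existing plots stay visible.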

Dot Plots: dotPlot()

An alternative way to visualize the distribution of data points is
the dotPlot() function. This function resides in the BHH2
package, which we load with the library() function.

# making dot plots ~ stacked scatter plots similar to stem and
# leaf plots
# loading the BHH2 package, where the dotPlot() function resides
>install.packages("BHH2")
>library(BHH2)

# Ask for a 2 x 2 matrix pattern in Parameters


>par(mfrow=c(2,2))

# making dot plots of price and the 3 transformed price variables
>dotPlot(price,main="Price")
>dotPlot(invprice,main="1/Price")
>dotPlot(sqrtprice,main="sqrt(Price)")
>dotPlot(logprice,main="log(Price)")

# escape from par(mfrow=c(2,2))
>dev.off()

Making Boxplots

>boxplot(mpg, main="Boxplot of MPGs")

# I would encourage you to change horizontal = TRUE to
# horizontal = FALSE and replot!
>boxplot(mpg[foreign=="Foreign"],mpg[foreign=="Domestic"],
horizontal=TRUE,names=c("Foreign","Domestic"),
main="MPG by Origin of Car", col = c("blue","red"))
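
boxplot() also accepts a formula of the form value ~ group, which avoids subsetting each group by hand. The sketch uses an invented data frame named cars_toy; with the cars data the call would be boxplot(mpg ~ foreign, data = mydata).

```r
cars_toy <- data.frame(
  mpg     = c(22, 17, 25, 20, 28, 30, 24, 35),
  foreign = c("Domestic", "Domestic", "Domestic", "Domestic",
              "Foreign", "Foreign", "Foreign", "Foreign")
)

# one box per group, drawn in alphabetical order of the group names
boxplot(mpg ~ foreign, data = cars_toy, horizontal = TRUE,
        main = "MPG by Origin of Car", col = c("red", "blue"))

# the same grouping works for numeric summaries
group_medians <- tapply(cars_toy$mpg, cars_toy$foreign, median)
group_medians
```

Because the groups are ordered alphabetically, the colors are listed as red then blue so that Domestic stays red and Foreign stays blue.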

Command      Function                                                      Example
read.csv()   Reads a csv file into R                                       read.csv("file_name.csv")
mean()       Computes the mean                                             mean(x)
median()     Computes the median                                           median(y)
var()        Computes the variance                                         var(1:10)
sd()         Computes the standard deviation                               sd(c(1,5,6,10,-7))
summary()    Gives a basic statistical summary                             summary(ourdata)
min()        Returns the minimum value                                     min(ourdata)
max()        Returns the maximum value                                     max(ourdata)
range()      Returns the range, i.e. min and max                           range(ourdata)
table()      Makes a frequency table                                       table(mydata$mpg)
plot()       Plots a 2D graph from one or two sets of data points          plot(x,y)
boxplot()    Makes a boxplot                                               boxplot(mydata$mpg)
hist()       Makes a histogram of a chosen data series                     hist(mydata$mpg)
dotPlot()    Makes a stacked scatter plot similar to stem and leaf plots   dotPlot(mydata$mpg)
