An Introduction To R: Biostatistics 615/815
Biostatistics 615/815
Functional language
Programs are a collection of functions
for,
Homework Notes
Due on Wednesday (by end of the day)
Source code
Indented and commented, if appropriate
This Week
The R programming language
The R Project
Environment for statistical computing and graphics
Free software
Associated with simple programming language
The R Project
Versions of R exist for Windows, MacOS, Linux and various other Unix flavors
R was originally written by Ross Ihaka and Robert Gentleman, at the University of Auckland
It is an implementation of the S language, which was principally developed by John Chambers
Compiled C vs Interpreted R
C requires a complete program to run
R Function Libraries
Implement many common statistical procedures
Provide excellent graphics functionality
A convenient starting point for many data analysis projects
R Programming Language
Interpreted language
To start, we will review
Interactive R
R defaults to an interactive mode
A prompt > is presented to users
Each input expression is evaluated and a result returned
R as a Calculator
> 1 + 1            # Simple Arithmetic
[1] 2
> 2 + 3 * 4        # Operator precedence
[1] 14
> 3 ^ 2            # Exponentiation
[1] 9
> exp(1)           # Basic mathematical functions are available
[1] 2.718282
> sqrt(10)
[1] 3.162278
> pi
[1] 3.141593
> 2*pi*6378
[1] 40074.16
Variables in R
Numeric
Sequences of characters
Type determined automatically when variable is created with "<-" operator
R as a Smart Calculator
> x <- 1             # Can define variables
> y <- 3             # using "<-" operator to set values
> z <- 4
> x * y * z
[1] 12
> X * Y * Z          # Variable names are case sensitive
Error: Object "X" not found
> This.Year <- 2004  # Variable names can include period
> This.Year
[1] 2004
R Vectors
A series of numbers
Created with
c() to concatenate elements or sub-vectors
rep() to repeat elements or patterns
seq() or m:n to generate sequences
Most mathematical functions and operators can be applied to vectors
Without loops!
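As a small illustration of applying an operator to a whole vector, the sketch below computes the same squares once with an explicit loop and once with a single vectorized expression. The vector v and the names squares_loop and squares_vec are made up for this example.

```r
# Hypothetical input vector, chosen only for illustration
v <- c(1, 2, 3, 4, 5)

# Loop version: fill the result one element at a time
squares_loop <- numeric(length(v))
for (i in seq_along(v)) {
  squares_loop[i] <- v[i] ^ 2
}

# Vectorized version: the operator is applied to every element at once
squares_vec <- v ^ 2

print(squares_vec)
print(all(squares_loop == squares_vec))
```

Both versions give the same values; the vectorized form is shorter and usually faster.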
Defining Vectors
> rep(1,10)          # repeats the number 1, 10 times
[1] 1 1 1 1 1 1 1 1 1 1
> seq(2,6)           # sequence of integers between 2 and 6
[1] 2 3 4 5 6        # equivalent to 2:6
> seq(4,20,by=4)     # Every 4th integer between 4 and 20
[1]  4  8 12 16 20
> x <- c(2,0,0,4)    # Creates vector with elements 2,0,0,4
> y <- c(1,9,9,9)
> x + y              # Sums elements of two vectors
[1]  3  9  9 13
> x * 4              # Multiplies elements
[1]  8  0  0 16
> sqrt(x)            # Function applies to each element
[1] 1.414214 0.000000 0.000000 2.000000        # Returns vector
Alternative:
Use vector of T and F values to select subset of elements
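A minimal sketch of element selection, assuming the vector x from the earlier examples. It shows selection by position and by a vector of logical values; the names keep and nonzero are introduced only for illustration.

```r
# Vector of values, as in the earlier examples
x <- c(2, 0, 0, 4)

# Select by position
first_elem <- x[1]          # first element
ends <- x[c(1, 4)]          # first and fourth elements

# Select with a vector of logical values
keep <- c(TRUE, FALSE, FALSE, TRUE)
kept <- x[keep]

# A comparison produces such a logical vector automatically
nonzero <- x[x > 0]

print(kept)
print(nonzero)
```

T and F are ordinary variables that can be reassigned, so spelling out TRUE and FALSE is the safer habit in scripts.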
Data Frames
Group a collection of related vectors
Most of the time, when data is loaded, it will be organized in a data frame
Let's look at an example
Parameters header, sep, and na.strings control useful options
read.csv() and read.delim() have useful defaults for comma or tab delimited files
bp[,-2]
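The sketch below shows how the header, sep, and na.strings parameters might be used when loading a data frame. The data are invented, and they are passed inline through the text argument of read.table() so the example does not depend on a file; the name bp_demo is hypothetical.

```r
# Hypothetical comma-delimited data, supplied inline instead of from a file
raw <- "id,sex,sbp
1,male,120
2,female,NA
3,male,135"

# header:     first line holds the column names
# sep:        field delimiter
# na.strings: strings to be treated as missing values
bp_demo <- read.table(text = raw, header = TRUE, sep = ",",
                      na.strings = "NA")

print(bp_demo)
print(bp_demo$sbp)       # one column, returned as a vector
print(bp_demo[, -2])     # all columns except the second
```

A negative column index drops that column, so the last line keeps only id and sbp.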
Lists
Collections of related variables
Similar to records in C
Created with list function
point <- list(x = 1, y = 1)
Access to components follows similar rules as for data frames; the following all retrieve x:
point$x;
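For reference, a short sketch of standard R ways to reach a list component. These are common forms in the language and are not necessarily the exact set shown on the original slide; the variable names by_dollar, by_name, by_pos and sub are made up.

```r
point <- list(x = 1, y = 1)

# Three common ways to retrieve the component named x
by_dollar <- point$x        # by name, with the $ operator
by_name   <- point[["x"]]   # by name, with double brackets
by_pos    <- point[[1]]     # by position

print(c(by_dollar, by_name, by_pos))

# Single brackets return a sub-list rather than the component itself
sub <- point["x"]
print(class(sub))
```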
c(),
seq(), read.table(),
Next
More detail on the R language, with a focus on managing code execution
Programming Constructs
Grouped Expressions
Control statements
if
else
Grouped Expressions
{expr_1; expr_2; }
Valid wherever single expression could be used
Return the result of last expression evaluated
Relatively similar to compound statements in C
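A minimal sketch of the point that a grouped expression returns the value of its last expression. The names result, a and b are chosen only for this example.

```r
# The value of a grouped expression is the value of its last expression
result <- {
  a <- 2
  b <- 3
  a * b
}
print(result)
```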
if else
if (expr_1) expr_2 else expr_3
The first expression should return a single logical value
Example: if else
# Standardize observation i
if (sex[i] == "male") {
   z[i] <- (observed[i] - males.mean) / males.sd
} else {
   z[i] <- (observed[i] - females.mean) / females.sd
}
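The standardization step can be tried end to end with a small runnable sketch. The data below are invented, and the female summary names (females.mean, females.sd) and the vectorized variant z_vec are assumptions made for illustration.

```r
# Hypothetical data, invented only for illustration
sex      <- c("male", "female", "male", "female")
observed <- c(180, 165, 170, 155)

# Group means and standard deviations
males.mean   <- mean(observed[sex == "male"])
males.sd     <- sd(observed[sex == "male"])
females.mean <- mean(observed[sex == "female"])
females.sd   <- sd(observed[sex == "female"])

# Loop version: standardize each observation with if / else
z <- numeric(length(observed))
for (i in seq_along(observed)) {
  if (sex[i] == "male") {
    z[i] <- (observed[i] - males.mean) / males.sd
  } else {
    z[i] <- (observed[i] - females.mean) / females.sd
  }
}

# Vectorized alternative using ifelse()
z_vec <- ifelse(sex == "male",
                (observed - males.mean) / males.sd,
                (observed - females.mean) / females.sd)

print(z)
print(z_vec)
```

if requires a single logical value, which is why the loop tests one element at a time; ifelse() accepts a whole logical vector.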
for
for (name in expr_1) expr_2
Name is the loop variable
expr_1 is often a sequence
20, by = 2)
Example: for
# Sample M random pairings in a set of N objects
for (i in 1:M) {
   # As shown, the sample function returns a single
   # element in the interval 1:N
   p = sample(N, 1)
   q = sample(N, 1)
   # Additional processing as needed
   ProcessPair(p, q);
}
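The loop above depends on N, M and ProcessPair() being defined elsewhere. A self-contained sketch, with hypothetical values for N and M and a matrix named pairs standing in for ProcessPair(), might look like this:

```r
set.seed(1)      # fixed seed so the run is repeatable
N <- 10          # hypothetical number of objects
M <- 5           # hypothetical number of pairings

# Hypothetical stand-in for ProcessPair(): record each pair in a matrix
pairs <- matrix(0, nrow = M, ncol = 2)

for (i in 1:M) {
  p <- sample(N, 1)      # single element from 1:N
  q <- sample(N, 1)
  pairs[i, ] <- c(p, q)
}

print(pairs)
```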
repeat
repeat expr
Continually evaluate expression
Loop must be terminated with break statement
Example: repeat
# Sample with replacement from a set of N objects
# until the number 615 is sampled twice
M <- matches <- 0
repeat {
   # Keep track of total connections sampled
   M <- M + 1
   # Sample a new connection
   p = sample(N, 1)
   # Increment matches whenever we sample 615
   if (p == 615) matches <- matches + 1;
   # Stop after 2 matches
   if (matches == 2) break;
}
while
while (expr_1) expr_2
While expr_1 is true, repeatedly evaluate expr_2
break and next statements can be used within the loop
Example: while
# Sample with replacement from a set of N objects
# until 615 and 815 are sampled consecutively
match <- FALSE
while (!match) {
   # Sample a new element
   p <- sample(N, 1)
   # If not 615, then go to next iteration
   if (p != 615) next
   # Sample another element
   q <- sample(N, 1)
   # Check if we are done
   if (q == 815) match <- TRUE
}
Functions in R
Easy to create your own functions in R
As tasks become complex, it is a good idea to organize code into functions that perform defined tasks
In R, it is good practice to give default values to function arguments
Function definitions
name <- function(arg1, arg2, ...) expression
Arguments can be assigned default values: arg_name = expression
Return value is the last evaluated expression or can be set explicitly with return()
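A small sketch of these rules, using a hypothetical function rect_area() that is not part of the lecture. It has default arguments (one of them an expression involving another argument), an explicit return(), and an implicit return of the last expression.

```r
# Hypothetical function: area of a rectangle, with default side lengths
rect_area <- function(width = 1, height = width) {
  if (width < 0 || height < 0) {
    return(NA)            # explicit early return
  }
  width * height          # value of the last expression is returned
}

print(rect_area())                 # both defaults used
print(rect_area(3))                # height defaults to width
print(rect_area(2, height = 5))    # arguments matched by position and by name
print(rect_area(-1))               # negative side length gives NA
```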
Defining Functions
> square <- function(x = 10) x * x
> square()
[1] 100
> square(2)
[1] 4
> intsum <- function(from=1, to=10)
  {
  sum <- 0
  for (i in from:to)
    sum <- sum + i
  sum
  }
> intsum(3)          # Evaluates sum from 3 to 10
[1] 52
> intsum(to = 3)     # Evaluates sum from 1 to 3
[1] 6
Debugging Functions
Toggle debugging for a function with debug()/undebug() command
With debugging enabled, R steps through function line by line
Use print() to inspect variables along the way
Press <enter> to proceed to next line
So far
Different types of variables
Useful R Functions
Online Help
Random Generation
Input / Output
Data Summaries
Exiting R
Random Generation in R
In contrast to many C implementations, R generates pretty good random numbers
set.seed(seed) can be used to select a specific sequence of random numbers
sample(x, size, replace = FALSE) generates a sample of size elements from x
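A brief sketch of how set.seed() and sample() behave together. The seed value and the variable names first, second, perm and draws are arbitrary choices for this example.

```r
# Same seed, same sequence of random draws
set.seed(615)
first <- sample(1:100, 5)

set.seed(615)
second <- sample(1:100, 5)

print(identical(first, second))

# Without replacement (the default) every element appears at most once
perm <- sample(1:10, 10)
print(perm)

# With replacement, repeats are allowed and size may exceed length(x)
draws <- sample(1:3, 10, replace = TRUE)
print(draws)
```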
Random Generation
runif(n, min = 0, max = 1)     Samples from Uniform distribution
rbinom(n, size, prob)          Samples from Binomial distribution
rnorm(n, mean = 0, sd = 1)     Samples from Normal distribution
rexp(n, rate = 1)              Samples from Exponential distribution
rt(n, df)                      Samples from T-distribution
And others!
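The generators listed above can be tried with a sketch like the following. The seed, sample size, and distribution parameters are arbitrary choices made for illustration.

```r
set.seed(815)    # arbitrary seed, used only to make the run repeatable

u  <- runif(1000)                          # Uniform on [0, 1]
b  <- rbinom(1000, size = 10, prob = 0.5)  # Binomial(10, 0.5)
z  <- rnorm(1000, mean = 0, sd = 1)        # standard Normal
e  <- rexp(1000, rate = 1)                 # Exponential with rate 1
t5 <- rt(1000, df = 5)                     # t with 5 degrees of freedom

# Sample means should be close to the theoretical values 0.5, 5, 0, 1, 0
print(c(mean(u), mean(b), mean(z), mean(e), mean(t5)))
```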
R Help System
R has a built-in help system with useful information and examples
help() provides general help
help(plot) will explain the plot function
help.search("histogram") will search for topics that include the word histogram
example(plot) will provide examples for the plot function
Input / Output
Use sink(file) to redirect output to a file
Use sink() to restore screen output
Use print() or cat() to generate output inside functions
Use source(file) to read input from a file
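A minimal sketch of these input and output functions working together. Temporary files are used so nothing is left behind; the names out_file, code_file, lines_written and sourced_value are hypothetical.

```r
# Temporary file, so the example leaves nothing behind
out_file <- tempfile(fileext = ".txt")

sink(out_file)                 # redirect output to the file
cat("Mean of 1:10 is", mean(1:10), "\n")
print(summary(c(1, 2, 3)))
sink()                         # restore screen output

# Read back what was written
lines_written <- readLines(out_file)
print(lines_written)

# source() evaluates R code stored in a file
code_file <- tempfile(fileext = ".R")
writeLines("sourced_value <- 6 * 7", code_file)
source(code_file)
print(sourced_value)
```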
Managing Workspaces
As you generate functions and variables, these are added to your current workspace
Use ls() to list workspace contents and rm() to delete variables or functions
When you quit, with the q() function, you can save the current workspace for later use
> x <- rnorm(1000)
> y <- rnorm(1000) + x
> summary(y)
     Min.  1st Qu.   Median     Mean     Max.
 -4.54800 -1.11000 -0.06909 -0.09652  4.83200
> var(y)
[1] 2.079305
> hist(x, col="lightblue")
> plot(x,y)
A good book to browse is Data Analysis and Graphics Using R by Maindonald and Braun
# check if connected
i = a[p]; while (a[i] != i) i <- a[i]
j = a[q]; while (a[j] != j) j <- a[j]
if (i == j) next

# update connectivity array
a[i] = j
}
}

# FIND
i = a[p]; while (a[i] != i) i <- a[i]
j = a[q]; while (a[j] != j) j <- a[j]
if (i == j) next

# UNION
if (weight[i] < weight[j]) {
   a[i] = j; weight[j] <- weight[j] + weight[i]
} else {
   a[j] = i; weight[i] <- weight[i] + weight[j]
}
}
}
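The fragments above show only the inner FIND and UNION steps. One way they could be wrapped into a complete function is sketched below. The function name weighted_union, its arguments (a two-column matrix conn of connections and the number of objects N), and the helper find_root are all assumptions made for illustration, since the enclosing code is not shown.

```r
# Hypothetical wrapper: conn is a two-column matrix of connections (p, q),
# N is the number of objects; returns the parent array a
weighted_union <- function(conn, N) {
  a <- 1:N                 # each object starts as its own root
  weight <- rep(1, N)      # size of the tree rooted at each object

  for (k in seq_len(nrow(conn))) {
    p <- conn[k, 1]
    q <- conn[k, 2]

    # FIND: follow links until reaching the root of each tree
    i <- a[p]; while (a[i] != i) i <- a[i]
    j <- a[q]; while (a[j] != j) j <- a[j]
    if (i == j) next       # already connected

    # UNION: attach the smaller tree below the larger one
    if (weight[i] < weight[j]) {
      a[i] <- j
      weight[j] <- weight[j] + weight[i]
    } else {
      a[j] <- i
      weight[i] <- weight[i] + weight[j]
    }
  }
  a
}

# Helper, also hypothetical: root of object p in parent array a
find_root <- function(a, p) {
  while (a[p] != p) p <- a[p]
  p
}

conn <- rbind(c(1, 2), c(3, 4), c(2, 4), c(5, 6))
parents <- weighted_union(conn, 6)
print(find_root(parents, 1) == find_root(parents, 3))   # 1 and 3 are connected
print(find_root(parents, 1) == find_root(parents, 5))   # 1 and 5 are not
```

Attaching the smaller tree to the larger one keeps the trees shallow, which shortens the FIND loops.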
Benchmarking a function
To conduct empirical studies of a function's performance, we don't always need a stopwatch.
Relevant functions
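The list of relevant functions is not shown here. As an assumption, two standard R tools for this purpose are system.time() and Sys.time(); the sketch below applies both to a hypothetical function slow_sum().

```r
# Hypothetical function to benchmark: sum of integers with an explicit loop
slow_sum <- function(n) {
  total <- 0
  for (i in 1:n) total <- total + i
  total
}

# system.time() reports user, system and elapsed time for one expression
timing <- system.time(result <- slow_sum(100000))
print(timing)

# Sys.time() can bracket a longer stretch of code
start <- Sys.time()
invisible(slow_sum(100000))
elapsed <- Sys.time() - start
print(elapsed)
```

Timings vary from run to run and between machines, so repeating a measurement several times gives a more reliable picture.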