DR - Pierpaolo-Delser - Introduction R
Gitlab
27/11/2018
• Data analysis in R
Why R? What can R do?
R vs Excel
• Automating tasks will pay off within the time frame of a PhD and thereafter
R is more efficient
Flexibility:
• Compute analysis;
• Generate reports;
Advantages:
• Good graphics (MATLAB and Python)
Disadvantages:
• Working with large datasets is limited by RAM
Advantages:
• Functions can be integrated in R packages
t <- read.table("/home/pier/data/input_stat")
dim(t) # number of rows and columns
# I want to calculate the sum of each column and,
# if it is > 10, print out "Column X has a sum greater than 10"
for (i in 1:ncol(t)) {
  f <- sum(t[, i])
  if (f > 10) {
    print(paste("Column ", i, " has a sum greater than 10", sep = ""))
  }
}
# The same task wrapped in a function. The slide does not show the body;
# it is assumed here to be the loop above.
sum_10 <- function(table) {
  for (i in 1:ncol(table)) {
    f <- sum(table[, i])
    if (f > 10) {
      print(paste("Column ", i, " has a sum greater than 10", sep = ""))
    }
  }
}
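The column-sum loop above can also be written without an explicit loop. The following is only a sketch: the data frame `tab` is a made-up stand-in for the input file, and `report_large_sums` and its `threshold` argument are hypothetical names, not part of the slides.

```r
# Hypothetical stand-in for the input table: three numeric columns.
tab <- data.frame(a = c(1, 2, 3), b = c(5, 6, 7), c = c(0, 1, 2))

# Report every column whose sum exceeds a threshold (10 on the slide).
report_large_sums <- function(table, threshold = 10) {
  sums <- colSums(table)                   # one sum per column
  hits <- unname(which(sums > threshold))  # positions of qualifying columns
  # sprintf returns character(0) when no column qualifies
  sprintf("Column %d has a sum greater than %g", hits, threshold)
}

msgs <- report_large_sums(tab)
print(msgs)
```

Returning the messages instead of printing them inside the function makes the result easier to reuse, for example in a report.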
Advantages:
R packages for:
• Statistical analysis;
• Plotting;
• Graphs;
• Managing calendar dates;
• Selecting colour palette;
• Machine learning;
• Population genetics;
• …
Outline:
• Data analysis in R
Variables:
Assignment "<-":
> t<-3
> k<-"hello"
> a<-1.435275289
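As a small illustration of the assignments above, the sketch below stores the same three values and inspects them with `class()`, which reports the type R inferred for each variable.

```r
# Assign the three values from the slide.
t <- 3
k <- "hello"
a <- 1.435275289

# Inspect the type of each variable.
class(t)   # "numeric"
class(k)   # "character"

# Variables can be reused in later expressions.
b <- round(a, 2)
b          # 1.44
```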
Basic commands and operations
Arithmetic operators:
• Addition: +
• Subtraction: -
• Division: /
• Multiplication: *
• Exponentiation: ^
> (3+2)^2
[1] 25
> (7-5)/2
[1] 1
> 1*2*3*4
[1] 24
Basic commands and operations
Data type:
Data structure:
• Vector: an ordered collection of data. The c function concatenates values and vectors to create longer vectors.
> length(t)
[1] 4
• Matrix:
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> dim(t)
[1] 3 2
           factor_1 factor_2
Variable_1        1        4
Variable_2        2        5
Variable_3        3        6
[Array output: only the slice headers ", , land" and ", , sea" are recoverable]
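The structures listed above can be built as in the following sketch. The matrix reproduces the printed values; the array contents (`1:12`) are made up, since only its slice names "land" and "sea" are visible on the slide.

```r
# Vector: c() concatenates values and vectors into a longer vector.
v  <- c(1, 2, 3)
v2 <- c(v, 4, c(5, 6))
length(v2)   # 6

# Matrix: filled column by column, giving rows 1 4 / 2 5 / 3 6.
m <- matrix(1:6, nrow = 3, ncol = 2)
dimnames(m) <- list(c("Variable_1", "Variable_2", "Variable_3"),
                    c("factor_1", "factor_2"))
dim(m)       # 3 2

# Array: a 3 x 2 matrix for each of two named slices (values hypothetical).
arr <- array(1:12, dim = c(3, 2, 2),
             dimnames = list(NULL, NULL, c("land", "sea")))
dim(arr)     # 3 2 2
```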
Data structure:
• Data frame: a matrix-like structure whose columns can be of different data types (e.g. numerical and character)
  weight gender
1      1      M
2     23      F
3      4      F
4     56      M
5     32      F
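The data frame shown above could be created as follows. This is a sketch using the values from the slide; the name `df` and the summary at the end are illustrative additions.

```r
# One numeric column and one character column of equal length.
df <- data.frame(weight = c(1, 23, 4, 56, 32),
                 gender = c("M", "F", "F", "M", "F"))

str(df)       # shows the type of each column
df$weight     # access a column by name

# Mean weight of the rows where gender is "F": (23 + 4 + 32) / 3
mean_f <- mean(df$weight[df$gender == "F"])
mean_f
```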
Basic commands and operations
• Vector:
> length(t)
[1] 4
> t[3]
[1] 67
> t[c(2,4)]
[1] 5.6748 5
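The indexing calls above can be tried on a vector like the one below. The slide does not show how `t` was defined, so the first element (12) is a made-up value; the other three match the printed output.

```r
# Elements 2 to 4 follow the slide; the first element is hypothetical.
t <- c(12, 5.6748, 67, 5)

length(t)            # number of elements
third <- t[3]        # single element by position
pair  <- t[c(2, 4)]  # several elements by a vector of positions
large <- t[t > 10]   # elements selected by a logical condition
```

Positions in R start at 1, and a logical condition inside the brackets keeps only the elements for which it is TRUE.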
Basic commands and operations
• Matrix:
> dim(t)
[1] 3 2
           factor_1 factor_2
Variable_1        1        4
Variable_2        2        5
Variable_3        3        6
[Array output: only the slice headers ", , land" and ", , sea" are recoverable]
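Matrix elements are addressed with two indices, row first and column second. The sketch below rebuilds the named matrix printed above (here called `m`, an assumed name) and shows indexing by position and by name.

```r
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("Variable_1", "Variable_2", "Variable_3"),
                            c("factor_1", "factor_2")))

cell     <- m[2, 1]            # row 2, column 1
col2     <- m[, "factor_2"]    # a whole column, selected by name
row3     <- m["Variable_3", ]  # a whole row, selected by name
top_rows <- m[1:2, ]           # a sub-matrix with the first two rows
```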
Deleting:
> t
[1] 10 11 12 13 14 15 16 17 18 19 20
> t1<-t[-2]
> t1
[1] 10 12 13 14 15 16 17 18 19 20
Basic commands and operations
Deleting:
> t
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Selecting by condition (t is again the vector 10, 11, ..., 20):
> which(t>15)
[1]  7  8  9 10 11
> t[which(t>15)]
[1] 16 17 18 19 20
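Negative indices drop elements, and the same idea extends to the rows and columns of a matrix. The sketch below uses the values shown above; the matrix deletion calls are illustrative, since the slide only shows the matrix itself.

```r
# Vector: drop the second element.
t  <- 10:20
t1 <- t[-2]

# Matrix: drop a column or a row (drop = FALSE keeps the matrix shape).
m         <- matrix(1:6, nrow = 2)
m_no_col2 <- m[, -2]
m_no_row1 <- m[-1, , drop = FALSE]

# which() returns the positions where a condition holds.
idx  <- which(t > 15)
vals <- t[idx]
```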
Basic commands and operations
Loops and conditional execution
Syntax:
for (variable in sequence) {
    statements
}
> for (i in 1:10) {
+ print(i)
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
> for (i in 1:10) {
+ print(paste("hello to our customer number ", i, sep=""))
+ }
[1] "hello to our customer number 1"
[1] "hello to our customer number 2"
[1] "hello to our customer number 3"
[1] "hello to our customer number 4"
[1] "hello to our customer number 5"
[1] "hello to our customer number 6"
[1] "hello to our customer number 7"
[1] "hello to our customer number 8"
[1] "hello to our customer number 9"
[1] "hello to our customer number 10"
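Beyond printing, a loop is often used to accumulate a result. The following sketch, with made-up variable names, sums the numbers 1 to 10 and fills a vector of squares.

```r
# Accumulate a running total.
total <- 0
for (i in 1:10) {
  total <- total + i
}
total   # 55

# Fill a pre-allocated vector element by element.
squares <- numeric(10)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
squares
```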
Basic commands and operations
Comparison operators
equal: ==
not equal: !=
greater: >
less than: <
greater or equal: >=
less than or equal: <=
Logical operators
and: &
or: |
not: !
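The comparison and logical operators above work element by element on vectors. A short sketch with a made-up vector `x`:

```r
x <- c(3, 8, 12, 20)

both    <- x > 5 & x < 15      # FALSE TRUE TRUE FALSE
either  <- x < 5 | x > 15      # TRUE FALSE FALSE TRUE
negated <- !(x == 8)           # TRUE FALSE TRUE TRUE

# A logical vector can be used directly to select elements.
picked <- x[x >= 8 & x != 20]  # 8 12
```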
Basic commands and operations
Syntax:
if (condition) {
    statements
} else {
    alternative
}
[Start of the example not recoverable]
+ print("Zero")
+ } else {
+ print("Negative number")
+ }
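A complete version of a sign check in this style might look like the sketch below. Only the "Zero" and "Negative number" branches are visible on the slide; the "Positive number" branch, the conditions, and the function name `sign_label` are assumptions.

```r
# Classify a single number by its sign.
sign_label <- function(x) {
  if (x > 0) {
    "Positive number"   # assumed first branch
  } else if (x == 0) {
    "Zero"
  } else {
    "Negative number"
  }
}

print(sign_label(5))
print(sign_label(0))
print(sign_label(-2))
```

Note that `else` has to sit on the same line as the closing brace of the preceding block, as in the syntax shown above.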
User-defined functions
myFunction(arg1=…, arg2=…)
> a
[1] -0.9379583 0.6599282 0.6204624 0.4395611 1.0989696 2.4148308
> var(a)
[1] 1.171392
> myvar(a)
[1] 1.171392
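The definition of `myvar` is not shown. One possible definition that reproduces the result of `var()` is the sample variance below; this is a sketch under that assumption, not necessarily the function used in the slides.

```r
# Sample variance: sum of squared deviations divided by n - 1.
myvar <- function(x) {
  n <- length(x)
  sum((x - mean(x))^2) / (n - 1)
}

# The six values printed on the slide.
a <- c(-0.9379583, 0.6599282, 0.6204624, 0.4395611, 1.0989696, 2.4148308)

myvar(a)
all.equal(myvar(a), var(a))   # TRUE
```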
Outline:
• Data analysis in R
• Descriptive Statistics
• Statistical Modeling
• Regressions: Linear and Logistic;
• Time Series;
• …
• Multivariate Functions
• Bayesian statistics
• Machine learning
Parallel boxplots
> set.seed(12345)
> weight<-round(c(rnorm(10,0,1),rnorm(10,2,1)),3)
> group<-rep(c("ctrl","case"),each=10)
> mydata<-data.frame(weight,group)
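The plotting call itself is not shown. Assuming the parallel boxplots were drawn with base R, a minimal sketch is given below; the axis labels and the `group_means` summary are illustrative additions.

```r
# Simulate the data as on the slide.
set.seed(12345)
weight <- round(c(rnorm(10, 0, 1), rnorm(10, 2, 1)), 3)
group  <- rep(c("ctrl", "case"), each = 10)
mydata <- data.frame(weight, group)

# One box per group, side by side.
boxplot(weight ~ group, data = mydata, xlab = "group", ylab = "weight")

# Group means as a numeric companion to the plot.
group_means <- tapply(mydata$weight, mydata$group, mean)
group_means
```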
T-test
Call:
lm(formula = weight ~ group, data = mydata)
Residuals:
    Min      1Q  Median      3Q     Max
-1.6852 -0.4560  0.0184  0.7238  1.5310
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.2860     0.2618   8.733 6.89e-08 ***
groupctrl    -2.4188     0.3702  -6.534 3.85e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
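The linear model output above is closely related to a two-sample t-test: with a two-level grouping variable, the p-value of the group coefficient equals that of a t-test assuming equal variances. The sketch below illustrates this on the simulated data; the object names `fit`, `tt`, `lm_p` and `tt_p` are made up.

```r
# Simulate the data as on the slides.
set.seed(12345)
weight <- round(c(rnorm(10, 0, 1), rnorm(10, 2, 1)), 3)
group  <- rep(c("ctrl", "case"), each = 10)
mydata <- data.frame(weight, group)

# Linear model with the group as the only predictor.
fit  <- lm(weight ~ group, data = mydata)
lm_p <- summary(fit)$coefficients[2, "Pr(>|t|)"]

# Two-sample t-test with pooled variance (var.equal = TRUE).
tt   <- t.test(weight ~ group, data = mydata, var.equal = TRUE)
tt_p <- tt$p.value

# The two p-values are expected to agree.
all.equal(lm_p, tt_p)
```

Without `var.equal = TRUE`, `t.test` runs the Welch test, whose p-value generally differs slightly from the linear model result.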
• Logistic regression;
• ANOVA;
• …
Outline:
• Data analysis in R
GitHub and GitLab
• GitHub: “GitHub is a development platform inspired by the way you work. From open
source to business, you can host and review code, manage projects, and build software
alongside 31 million developers.”
[Diagram: Coding, Share, Review, Improve, Updated code]
• Working together (with other users) helps to improve and grow faster;
Conclusions: