
R-Unit 4

Uploaded by

sanjayyalla4661

R Programming( 20MCA2PERR)

UNIT – 4

Histogram
• A histogram shows how often the values of a numeric variable fall into each of a set of ranges (bins).
• A histogram resembles a bar chart, but it groups the values into contiguous ranges instead of plotting separate categories.
• The height of each bar is the number of values that fall in that range.

Syntax:
hist(v,main,xlab,xlim,ylim,breaks,col,border) #ylab is optional

Following is the description of the parameters used −


• v is a vector containing numeric values used in histogram.
• main indicates title of the chart.
• xlab is used to give description of x-axis.
• xlim is used to specify the range of values on the x-axis.
• ylim is used to specify the range of values on the y-axis.
• breaks sets the bins: either a vector of break points or a single number suggesting how many bins to use.
• col is used to set color of the bars.
• border is used to set border color of each bar.
If plots do not appear in RStudio, check the following settings:

• Tools > Global Options > R Markdown: in "Show output preview in", select "Window", then click Apply.
• Tools > Global Options > Pane Layout: make sure "Plots" is checked.
• Update RStudio.

Temperature <- airquality$Temp
hist(Temperature)

colors() # lists the colour names supported by R

# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.


png(file = "histogram.png") #saved in current working directory:getwd()

# Create the histogram.


hist(v, xlab = "Weight",col = "pink",border = "red")

# Save the file.


dev.off()

# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.


png(file = "histogram_lim_breaks.png")

# Create the histogram.


hist(v,xlab = "Weight",col = "green",border = "red", xlim = c(0,40), ylim = c(0,5), breaks = 5)

# Save the file.


dev.off()

# Exercise: try the freq argument (freq = FALSE plots densities instead of counts)

Histogram with non-uniform width

hist(Temperature,
main="Maximum daily temperature",
xlab="Temperature in degrees Fahrenheit",
xlim=c(50,100),
col="chocolate",
border="brown",
breaks=c(55,60,70,75,80,100)
)

# Because the breaks are not equally spaced, hist() plots density instead of
# frequency on the y-axis; the bar heights depend on the breaks vector.

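The effect of the breaks vector can be inspected without drawing anything. The following is a minimal sketch, assuming the built-in airquality data; the object names h_equal and h_unequal are made up for illustration.

```r
# Temperature readings from the built-in airquality data set
Temperature <- airquality$Temp

# Equal-width bins: compute the histogram without drawing it
h_equal <- hist(Temperature, breaks = seq(55, 100, by = 5), plot = FALSE)

# Unequal-width bins, as in the non-uniform example
h_unequal <- hist(Temperature, breaks = c(55, 60, 70, 75, 80, 100), plot = FALSE)

h_equal$equidist     # are the bins equally wide?
h_unequal$equidist

# For any histogram, density times bin width adds up to 1
sum(h_unequal$density * diff(h_unequal$breaks))
```

When equidist is FALSE, hist() defaults to freq = FALSE, which is why the y-axis switches to density.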
Barplot
• A barplot visualizes the distribution of a qualitative (categorical) variable: it draws one bar per category, and the height of the bar gives that category's value or count.

• Syntax to create a bar-chart


barplot(H,xlab,ylab,main, names.arg,col)

• Following is the description of the parameters used :


• H is a vector or matrix containing numeric values used in bar chart.
• xlab is the label for x axis.
• ylab is the label for y axis.
• main is the title of the bar chart.
• names.arg is a vector of names appearing under each bar.
• col is used to give colors to the bars in the graph.

Bar Chart Labels, Title and Colors
# Create the data for the chart
R <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")

# Give the chart file a name


png(file = "barchart_months_revenue.png")

# Plot the bar chart


barplot(R,names.arg=M, xlab="Month",ylab="Revenue",col="blue",main="Revenue chart",border="red")

# Save the file


dev.off()

Grouped Bar Chart and Stacked Bar Chart
# Create the input vectors.
colors = c("green","orange","brown")
months <- c("Mar","Apr","May","Jun","Jul")
regions <- c("East","West","North")

# Create the matrix of the values.


Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11), nrow = 3, ncol = 5, byrow = TRUE)

# Give the chart file a name


png(file = "barchart_stacked.png")

# Create the bar chart


barplot(Values, main = "total revenue", names.arg = months, xlab = "month", ylab = "revenue", col = colors)

# Add the legend to the chart


legend("topleft", regions, cex = 1.3, fill = colors)

# Save the file


dev.off()

# Create the input vectors.
colors = c("green","orange","brown")
months <- c("Mar","Apr","May","Jun","Jul")
regions <- c("East","West","North")
# Create the matrix of the values.
Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11), nrow = 3, ncol = 5, byrow = TRUE)
barplot(Values, main = "total revenue", names.arg = months, xlab = "month", ylab = "revenue", col = colors, beside=TRUE)
# Add the legend to the chart
legend("topleft", regions, cex = 1.3, fill = colors)

Try the following attributes in Barplot

width=c( )
space=n
axes=FALSE
legend.text=c()
horiz=TRUE

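As a starting point for trying these attributes, here is a small sketch that reuses the revenue data from the earlier bar chart. The particular widths, spacing and colour are arbitrary choices for illustration.

```r
revenue <- c(7, 12, 28, 3, 41)
months  <- c("Mar", "Apr", "May", "Jun", "Jul")

# barplot() returns the bar midpoints, which are useful for adding labels later
mids <- barplot(revenue,
                names.arg = months,
                width = c(1, 2, 1, 2, 1),  # relative bar widths
                space = 0.5,               # gap before each bar, as a fraction of the average bar width
                horiz = TRUE,              # draw horizontal bars
                axes = FALSE,              # suppress the numeric axis
                col = "steelblue",
                main = "Revenue chart")
mids
```

The legend.text argument is only useful when the heights are given as a matrix, so it is easier to try on a grouped or stacked chart.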
         Team A  Team B  Team C  Team D  Team E
Round 1      34      56      12      89      67
Round 2      12      56      78      45      90
Round 3      14      23      45      25      89
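The slide gives only the table; one plausible exercise is to draw it as a grouped bar chart. The sketch below assumes that reading, and the object name scores, the axis labels and the title are invented for illustration.

```r
# Scores by round (rows) and team (columns), taken from the table
scores <- matrix(c(34, 56, 12, 89, 67,
                   12, 56, 78, 45, 90,
                   14, 23, 45, 25, 89),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Round 1", "Round 2", "Round 3"),
                                 c("Team A", "Team B", "Team C", "Team D", "Team E")))

# One group of bars per team, one bar per round; legend.text = TRUE uses the row names
mids <- barplot(scores, beside = TRUE,
                col = c("green", "orange", "brown"),
                legend.text = TRUE,
                args.legend = list(x = "topleft"),
                xlab = "Team", ylab = "Score",
                main = "Scores by team and round")

colSums(scores)   # total score per team
```

Dropping beside = TRUE gives the stacked version of the same chart.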

Boxplots
• A boxplot summarizes how the values in a data set are distributed. The three quartiles split the data into four parts of equal size.
• The plot shows the minimum, first quartile, median, third quartile and maximum of the data set.
• It is also useful for comparing distributions across data sets by drawing a boxplot for each of them.
• Syntax
boxplot(x, data, notch, varwidth, names, main)

Following is the description of the parameters used:


• x is a vector or a formula.
• data is the data frame.
• notch is a logical value. Set as TRUE to draw a notch.
• varwidth is a logical value. Set as TRUE to draw the width of each box proportional to the square root of the sample size.
• names are the group labels which will be printed under each boxplot.
• main is used to give a title to the graph.


x <- c(8, 5, 14, 12, 3, 9, 7, 4, 4, 6, 8, 12, 2, 0, 5, 3)

boxplot(x)                      # vertical boxplot (default)
boxplot(x, horizontal = TRUE)   # horizontal boxplot
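The numbers behind the box can be checked directly. This is a minimal sketch using the same vector x; fivenum() and boxplot.stats() are base R functions, and the variable name b is arbitrary.

```r
x <- c(8, 5, 14, 12, 3, 9, 7, 4, 4, 6, 8, 12, 2, 0, 5, 3)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(x)
# [1]  0.0  3.5  5.5  8.5 14.0

# boxplot.stats() gives the whisker ends and hinges in $stats and any outliers in $out
b <- boxplot.stats(x)
b$stats
b$out
```

Here no point lies more than 1.5 times the box length beyond the hinges, so the whiskers reach the minimum and maximum and b$out is empty.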

In this example, the built-in data set "mtcars" is used to create a basic boxplot of mileage grouped by number of cylinders.

boxplot(mpg ~ cyl,data= mtcars,


xlab = "Number of Cylinders",
ylab = "Miles Per Gallon",
main = "Mileage Data",col="pink")

Boxplot with Notch
boxplot(mpg ~ cyl, data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon",
main = "Mileage Data",
notch = TRUE,
varwidth = TRUE,
col = c("green", "red", "blue"),
names = c("High", "Medium", "Low")
)

Use the mtcars dataset available in R by default and produce the following plot.

boxplot(mpg ~ cyl, data = mtcars, notch = TRUE, varwidth = TRUE,
        col = "pink", boxcol = "blue", medcol = "red", medlwd = 1,
        whiskcol = "green", staplecol = "purple",
        xlab = "Number of cylinders", ylab = "Miles per Gallon")

PlantGrowth is a built-in R dataset with 30 rows and two columns. The “weight” column represents the dry
biomass of each plant in grams, while the “group” column describes the experimental treatment that each
plant was given.
Its structure, as shown by str(PlantGrowth):
'data.frame': 30 obs. of 2 variables:
$ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
$ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...

Create a horizontal boxplot of weight as a function of group.


Provide the x axis label as “Treatment group” and Y axis label as “Dried Biomass Weight”.
Set the color of box as red and change the box border color.
Change the line type of the whisker.
Change the color and width of staple and whisker.

boxplot(weight ~ group,
data = PlantGrowth,
main = "PlantGrowth data",
xlab = "Treatment Group",
ylab = "Dried Biomass Weight",
col = "red",
boxlty = 0,
border="green",
whisklty = 3,
whisklwd = 1.5,
whiskcol="purple",
staplelwd = 1.5,
staplecol="pink",
horizontal = TRUE)

Contingency Table

• A contingency table, sometimes called a two-way frequency table, is a table with at least two rows and two columns, used in statistics to present categorical data as frequency counts.
• More precisely, an r×c contingency table shows the observed joint frequencies of two categorical variables, arranged into r rows and c columns.

gender   cup  cone  sundae  sandwich  other
male     592   300     204        24     80
female   410   335     180        20     55
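The table above can be entered in R as a matrix with named dimensions. This is a sketch only: the object name icecream and the dimension name choice are assumptions, since the slide does not name the column variable.

```r
# 2 x 5 contingency table: gender (rows) by choice (columns)
icecream <- matrix(c(592, 300, 204, 24, 80,
                     410, 335, 180, 20, 55),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(gender = c("male", "female"),
                                   choice = c("cup", "cone", "sundae", "sandwich", "other")))
icecream <- as.table(icecream)

addmargins(icecream)                         # counts with row and column totals
round(prop.table(icecream, margin = 1), 3)   # proportions within each row
```

The row totals are 1200 and 1000, so the table covers 2200 observations in all.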

The following table of values shows a sample of 2300 music listeners classified by age,
education and whether they listen to classical music.

This is a 2×2×2 contingency table.

Mosaic Plots
• A mosaic plot gives a graphical representation of a contingency table by splitting a rectangle successively, one variable at a time.
• Counts are represented by rectangles.
• At each stage of plot creation, the rectangles are split parallel to one of the two axes.

A mosaic plot is an area-proportional visualisation of observed frequencies, composed of tiles (corresponding to the cells in the contingency table) created by recursive vertical and horizontal splits of a rectangle. The area of each tile is proportional to the corresponding cell entry.

[Figures: mosaic plots labelled "Old Versus Young" and "Education Level Music Listening"]

The following table gives the details of number of participants for an event.
Plot this data using Mosaic plot

          Age
Sex       Adult  Child
Female      425     45
Male       1667     64

participationdata <- c(425, 1667, 45, 64)
dim(participationdata) <- c(2, 2)
dimnames(participationdata) <- list(Sex = c("Female", "Male"), Age = c("Adult", "Child"))
participationdetails <- as.table(participationdata)
print(participationdetails)
# The formula sets the split order: Age first, then Sex (use ~ Sex + Age for the reverse)
mosaicplot(~ Age + Sex, data = participationdetails, col = "lightyellow", main = "Participation Details")

Plot the following Vaccination data using Mosaic plot

                  Age
Sex     Survived  Adult  Child
Female  No          109     17
        Yes         316     28
Male    No         1329     35
        Yes         338     29
vaccination = c(109,316,17,28,
1329,338,35,29)
dim(vaccination) = c(2, 2, 2)
dimnames(vaccination) =list(Survived = c("No", "Yes"),
Age = c("Adult", "child"),
Sex = c("Female", "Male"))
print(vaccination)
#mosaicplot(~ Sex + Age + Survived, data = vaccination)
mosaicplot(vaccination,col="blue",cex.lab=1.1,cex.axis=1,cex.main =2, main="vaccination Survival details")
, , Sex = Female

        Age
Survived Adult child
     No    109    17
     Yes   316    28

, , Sex = Male

        Age
Survived Adult child
     No   1329    35
     Yes   338    29

music = c(210, 194, 190, 406,
170, 110, 730, 290)
dim(music) = c(2, 2, 2)
dimnames(music) =list(Age = c("Old", "Young"),
Listen = c("Yes", "No"),
Education = c("High", "Low"))
print(music)
mosaicplot(music, col = "steelblue1", main = "Classical Music Listening")
, , Education = High

       Listen
Age     Yes  No
  Old   210 190
  Young 194 406

, , Education = Low

       Listen
Age     Yes  No
  Old   170 730
  Young 110 290

The Titanic dataset is available in the base installation. It describes the number of passengers who survived or died, cross-classified by class (1st, 2nd, 3rd, Crew), sex (Male, Female), and age (Child, Adult). Create a mosaic plot to represent the Titanic data.

mosaicplot(~ Class+Sex+Age+Survived, data=Titanic, col='steelblue' , main="Titanic Survival details")

Produce a histogram for the gear field of the mtcars dataset.

hist(mtcars$gear)

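Binomial Distribution
• The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of trials.
• Each trial has only two outcomes, i.e. success or failure.
• All trials are independent: the probability of success remains the same and the previous outcome does not affect the next outcome.

Formula: P(X = x) = C(n, x) p^x (1 - p)^(n - x), for x = 0, 1, ..., n, where n is the number of trials, p is the probability of success on each trial and C(n, x) = n! / (x! (n - x)!).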

Functions for Binomial Distribution
There are four functions for handling the binomial distribution:
Probability Mass Function: dbinom(x, size, prob)
Cumulative Distribution Function: pbinom(q, size, prob, lower.tail = TRUE)
Quantile Function: qbinom(p, size, prob)
Generating random numbers: rbinom(n, size, prob)

where,
x, q is a vector of quantiles (numbers of successes).
p is a vector of probabilities.
n is number of observations.
size is the number of trials.
prob is the probability of success of each trial.
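To connect these functions with the binomial formula, the following sketch computes one probability by hand with choose() and compares it with dbinom(). The values n = 25, p = 0.5 and x = 19 are chosen only for illustration.

```r
n <- 25     # number of trials
p <- 0.5    # probability of success on each trial
x <- 19     # number of successes

# P(X = x) from the formula C(n, x) * p^x * (1 - p)^(n - x)
manual <- choose(n, x) * p^x * (1 - p)^(n - x)
manual
dbinom(x, size = n, prob = p)            # same value

# The probabilities over all possible outcomes add up to 1
sum(dbinom(0:n, size = n, prob = p))

# pbinom() is the running total of dbinom()
sum(dbinom(0:10, size = n, prob = p))
pbinom(10, size = n, prob = p)           # same value
```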

dbinom(): This function gives the probability of exactly x successes (the probability mass function).

Example: Find the probability of winning exactly 19 times out of 25 coin tosses

dbinom(19, 25, 0.5) # dbinom(x, size, prob)
0.005277991

pbinom(): This function gives the cumulative probability P(X <= q). It returns a single value for each q.
Examples
Find the probability of getting 10 or fewer heads from 25 tosses of a coin.
pbinom(10, 25, 0.5) # pbinom(q, size, prob)
0.2121781

Find the probability of more than 10 heads from 25 tosses of a coin.

pbinom(10, 25, 0.5, lower.tail = FALSE)
0.7878219
qbinom(): This function takes a probability p and returns the smallest value x for which the cumulative probability P(X <= x) is at least p.
Example:
Find the 25th percentile (0.25 quantile) of a binomial distribution with 25 trials and probability of success
on each trial = 0.5
qbinom(0.25, 25, 0.5)
11
This says that, in a binomial experiment with 25 trials and success probability of 0.5, the probability that
the number of successes is < 11 is below 25%, while the probability that it is <= 11 is at least 25%.

# binomial quantile for the upper-tail probability 0.25, i.e. the 0.75 quantile
qbinom(0.25, 25, 0.5, lower.tail = FALSE)
14

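The relationship between qbinom() and pbinom() can be verified numerically. This is a small sketch for the 0.25 quantile used above; the variable names are arbitrary.

```r
size <- 25
prob <- 0.5
p    <- 0.25

q <- qbinom(p, size, prob)
q                            # 11

pbinom(q - 1, size, prob)    # cumulative probability just below q: less than p
pbinom(q, size, prob)        # cumulative probability at q: at least p

# The same quantile found by searching the cumulative probabilities directly
min(which(pbinom(0:size, size, prob) >= p)) - 1
```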
rbinom(): This function generates n random values from a binomial distribution with the given number of
trials (size) and probability of success (prob).

Example:
#to draw n random observations from a binomial distribution
rbinom(10,25,0.5)
8 14 10 12 10 14 16 7 13 12

# Binomial distribution for tossing a coin
cat("probability of winning exactly 19 times out of 25 tosses", dbinom(19, 25, 0.5), "\n")
cat("probability of getting 10 or less heads from 25 tosses", pbinom(10, 25, 0.5), "\n") # P(X <= x)
# P(X > x)
cat("probability of more than 10 heads from 25 tosses", pbinom(10, 25, 0.5, lower.tail = FALSE), "\n")
cat("binomial quantile for the probability 0.25", qbinom(0.25, 25, 0.5), "\n")
cat("binomial quantile for the probability 1-0.25", qbinom(0.25, 25, 0.5, lower.tail = FALSE), "\n")
# to draw n random observations from a binomial distribution
cat("10 random observations", rbinom(10, 25, 0.5), "\n")

Output (the random observations differ from run to run):
probability of winning exactly 19 times out of 25 tosses 0.005277991
probability of getting 10 or less heads from 25 tosses 0.2121781
probability of more than 10 heads from 25 tosses 0.7878219
binomial quantile for the probability 0.25 11
binomial quantile for the probability 1-0.25 14
10 random observations 8 14 10 12 10 14 16 7 13 12
Binomial distribution problems
1. Bob makes 60% of his free-throw attempts. If he shoots 12 free throws, what is the
probability that he makes exactly 10?

#find the probability of 10 successes during 12 trials where the probability of


#success on each trial is 0.6
dbinom(x=10, size=12, prob=.6)
# [1] 0.06385228

2. Subha flips a fair coin 20 times. What is the probability that the coin lands on heads exactly 7
times?
#find the probability of 7 successes during 20 trials where the probability of
#success on each trial is 0.5
dbinom(x=7, size=20, prob=.5)
# [1] 0.07392883

3. Raju flips a fair coin 5 times. What is the probability that the coin lands on heads more
than 2 times?

#find the probability of more than 2 successes during 5 trials where the
#probability of success on each trial is 0.5
pbinom(2, size=5, prob=.5, lower.tail=FALSE)
# [1] 0.5

4. Suppose a bowler scores a strike on 30% of his attempts when he bowls. If he bowls
10 times, what is the probability that he scores 4 or fewer strikes?

#find the probability of 4 or fewer successes during 10 trials where the


#probability of success on each trial is 0.3
pbinom(4, size=10, prob=0.3)
# [1] 0.8497317

Examples:
Find the 10th percentile (0.10 quantile) of a binomial distribution with 10 trials and probability of success
on each trial = 0.4

qbinom(0.10, size=10, prob=0.4)


# [1] 2

Find the 40th percentile (0.40 quantile) of a binomial distribution with 30 trials and probability of success
on each trial = 0.25

qbinom(0.40, size=30, prob=0.25)


# [1] 7
This says that the probability that the number of successes is < 7 is 40% or less in a
binomial experiment with 30 trials and success probability of 0.25.

Generate a vector that shows the number of successes of 10 binomial experiments with 100 trials
where the probability of success on each trial is 0.3.
results <- rbinom(10, size=100, prob=.3)
results
# [1] 31 29 28 30 35 30 27 39 30 28

Find the mean number of successes in the above 10 experiments

mean(results)
# [1] 30.7

Generate a vector that shows the number of successes of 1000 binomial experiments with 100 trials
where the probability of success on each trial is 0.3.
results <- rbinom(1000, size=100, prob=0.3)

Find mean number of successes in these 1000 experiments


mean(results)
# [1] 30.105
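The two results above suggest that the sample mean settles near size * prob = 30 as the number of experiments grows. The sketch below repeats the simulation for increasing n; the seed value is arbitrary and only makes the run reproducible.

```r
set.seed(42)              # arbitrary seed, only for reproducibility
size <- 100
prob <- 0.3
expected <- size * prob   # theoretical mean of the binomial distribution

for (n in c(10, 1000, 100000)) {
  results <- rbinom(n, size = size, prob = prob)
  cat("n =", n, " sample mean =", mean(results), " expected =", expected, "\n")
}
```

With only 10 experiments the sample mean can easily be off by one or more, while with 100000 experiments it typically agrees with 30 to within a few hundredths.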

Suppose there are twelve multiple choice questions in an English class quiz. Each question has
five possible answers, and only one of them is correct. Find the probability of having exactly 4
correct answers by random attempts as follows.
#Since only one out of five possible answers is correct, the probability of answering a question
#correctly by random is 1/5=0.2.
dbinom(4, size=12, prob=0.2)
[1] 0.1329

Find the probability of having four or fewer correct answers if a student attempts to answer every
question at random.

> pbinom(4, size=12, prob=0.2)
[1] 0.92744

Equivalently, by adding up the individual probabilities:

> dbinom(0, size=12, prob=0.2) +
+ dbinom(1, size=12, prob=0.2) +
+ dbinom(2, size=12, prob=0.2) +
+ dbinom(3, size=12, prob=0.2) +
+ dbinom(4, size=12, prob=0.2)
[1] 0.9274

A hospital database shows that 65% of patients suffering from cancer recover.
What is the probability that exactly 3 of 5 randomly chosen patients will recover?

dbinom(3, size=5, prob=0.65)

A bowler takes a wicket on 20% of his attempts. If he bowls 5 times, what
is the probability that he takes 4 or fewer wickets?

pbinom(4, size=5, prob=.2)

Suppose you have a large population of students that’s 50% female. If students are
assigned to classrooms at random, and you visit 100 classrooms with 20 students each,
then how many girls might you expect to see in each classroom?

rbinom(100,20,0.5)

Bernoulli Distribution

The Bernoulli distribution is a special case of the binomial distribution in which only a single trial is
performed (n = 1). It is a discrete probability distribution for a Bernoulli trial (a trial that has only
two outcomes, i.e. either success or failure).
The base installation of R does not provide dedicated Bernoulli distribution functions. For that
reason, we install and load the Rlab add-on package first.

install.packages("Rlab") # Install Rlab package


library("Rlab") # Load Rlab package

dbern()
The dbern() function computes the probability mass function of the Bernoulli distribution.

Syntax: dbern(x, prob, log = FALSE)

Parameters:

x: vector of quantiles (0 or 1)
prob: probability of success on each trial
log: logical; if TRUE, probabilities p are given as log(p)

Note: load the Rlab library first.

library(Rlab)
Example:
dbern(1, prob=.5)                      # P(X = 1)
pbern(0, prob=.5, lower.tail=TRUE)     # P(X <= 0)
qbern(0.25, prob=.5, lower.tail=TRUE)  # 0.25 quantile
rbern(10, prob=.5)                     # 10 random 0/1 values
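Because a Bernoulli trial is a binomial experiment with a single trial, the same quantities can also be obtained in base R by setting size = 1. This is an alternative sketch, not the Rlab approach described above; the seed and variable names are arbitrary.

```r
p <- 0.5   # probability of success

dbinom(1, size = 1, prob = p)      # P(X = 1)
dbinom(0, size = 1, prob = p)      # P(X = 0)
pbinom(0, size = 1, prob = p)      # P(X <= 0)
qbinom(0.25, size = 1, prob = p)   # 0.25 quantile

set.seed(1)                               # arbitrary seed, only for reproducibility
draws <- rbinom(10, size = 1, prob = p)   # ten random Bernoulli outcomes (0 or 1)
draws
```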
Geometric Distribution
The geometric distribution is a discrete probability distribution that gives the probability of
observing a given number of failures before the first success in a sequence of Bernoulli trials.
It is based on three important assumptions:
• The trials being conducted are independent.
• There can only be two outcomes of each trial - success or failure.
• The success probability, denoted by p, is the same for each trial.

There are four functions for handling the geometric distribution:

dgeom: returns the value of the geometric probability mass function.
pgeom: returns the value of the geometric cumulative distribution function.
qgeom: returns the value of the inverse geometric cumulative distribution function (quantile function).
rgeom: generates a vector of geometrically distributed random values.

The dgeom function finds the probability of experiencing a given number of failures
before the first success in a series of Bernoulli trials, using the following
syntax:
dgeom(x, prob)
where:
• x: number of failures before first success
• prob: probability of success on a given trial

Example:
A researcher is waiting outside of a library to ask people if they support a certain law. The
probability that a given person supports the law is p = 0.2. What is the probability that the
fourth person the researcher talks to is the first person to support the law?
Solution:
dgeom(x=3, prob=.2)

#0.1024
The probability that the researcher experiences 3 “failures” before the first success is 0.1024.

The pgeom function finds the probability of experiencing a given number of failures or
fewer before the first success in a series of Bernoulli trials, using the following
syntax:

pgeom(q, prob)
where:
q: number of failures before first success
prob: probability of success on a given trial

Example 1:
A researcher is waiting outside of a library to ask people if they support a certain law.
The probability that a given person supports the law is p = 0.2. What is the probability
that the researcher experiences 3 or fewer failures (people who do not support the law)
before finding someone who supports it?
pgeom(q=3, prob=.2)

#0.5904

Example 2:
A researcher is waiting outside of a library to ask people if they support a certain law.
The probability that a given person supports the law is p = 0.2. What is the probability
that the researcher experiences more than 5 failures before finding someone who
supports the law?
1 - pgeom(q=5, prob=.2)
pgeom(q=5, prob=.2, lower.tail=FALSE) # equivalent

#0.262144
qgeom
The qgeom function finds the number of failures that corresponds to a certain
percentile, using the following syntax:
qgeom(p, prob)
where:
p: percentile
prob: probability of success on a given trial

Example:
A researcher is waiting outside of a library to ask people if they support a certain law. The
probability that a given person supports the law is p = 0.2. We will consider a “failure” to
mean that a person does not support the law. How many “failures” would the researcher
need to experience to be at the 90th percentile for number of failures before the first
success?
qgeom(p=.90, prob=0.2)

#10
rgeom
The rgeom function generates a vector of random values, each representing the number of
failures before the first success, using the following syntax:
rgeom(n, prob)
where:
n: number of values to generate
prob: probability of success on a given trial

Example:
A researcher is waiting outside of a library to ask people if they support a certain law. The
probability that a given person supports the law is p = 0.2. We will consider a “failure” to
mean that a person does not support the law. Simulate 10 scenarios for how many
“failures” the researcher will experience until she finds someone who supports the law.

set.seed(0) #make this example reproducible


rgeom(n=10, prob=.2)

# 1 2 1 10 7 4 1 7 4 1
• During the first simulation, the researcher experienced 1 failure before finding someone who supported the law.
• During the second simulation, the researcher experienced 2 failures before finding someone who supported the law.
• During the third simulation, the researcher experienced 1 failure before finding someone who supported the law.
• During the fourth simulation, the researcher experienced 10 failures before finding someone who supported the law.

A sports marketer randomly selects persons on the street until he encounters someone
who attended a game last season. What is the probability the marketer encounters 3
people who did not attend a game before the first success when p = 0.20 of the
population attended a game?

dgeom(x = 3, prob = 0.20)

Simulate the above scenario for 10 times to find how many “failures” the marketer will
experience until he finds someone who attended a game last season.

rgeom(n = 10, prob = 0.20)

What is the probability that the marketer meets at most 5 people who did not attend a game
before finding someone who did, when the population probability is p = 0.20?

pgeom(q = 5, prob = 0.20, lower.tail = TRUE)

What is the probability that the marketer meets more than 5 people who did not attend a game
before finding someone who did, when the population probability is p = 0.20?

pgeom(q = 5, prob = 0.20, lower.tail = FALSE)

Consider a production line with a 35% defective rate. Let X denote the number of non-
defective products before the first defective product.
(a) Find the probability that there will be 3 non-defective products before the first
defective.

(b) Find the probability that there will be at most 3 non-defective products before first
defective.

(c) Find the probability that there will be at least 3 non-defective products before first
defective.

(d) What is the probability that 3 to 5 (inclusive) non-defective products before first
defective product?

(e) What is the smallest value of c for which P(X≤c)≥0.60? (or) Find the 60th percentile of the given
geometric distribution.

(f) Simulate 100 geometrically distributed random values with prob=0.35.

Solutions:
a) The probability that there will be 3 non-defective products before the first defective is
dgeom(3, 0.35)

b) The probability that there will be at most 3 non-defective products before the first
defective [P(X≤3)] is
pgeom(3, 0.35) (or) sum(dgeom(0:3, 0.35))

c) The probability that there will be at least 3 non-defective products before the first defective
[P(X≥3)] is
pgeom(2, 0.35, lower.tail=FALSE)

d) The probability of 3 to 5 (inclusive) non-defective products before the first defective
product [P(3≤X≤5)] is
sum(dgeom(3:5, 0.35)) (or) pgeom(5, 0.35) - pgeom(2, 0.35)

e) We need to find the smallest value of c such that P(X≤c)≥0.60, that is, the 60th
percentile of the given geometric distribution:
qgeom(0.60, 0.35)

f) Simulating 100 geometrically distributed random values with prob=0.35:
rgeom(100, 0.35)

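The answers to parts a) to e) can be cross-checked against the closed-form geometric probabilities P(X = x) = (1 - p)^x p and P(X ≤ x) = 1 - (1 - p)^(x + 1). This is a minimal sketch with prob = 0.35 as in the solutions; the variable names are arbitrary.

```r
p <- 0.35   # probability of a defective product on each inspection

a     <- dgeom(3, p)                        # P(X = 3)
b     <- pgeom(3, p)                        # P(X <= 3)
c_val <- pgeom(2, p, lower.tail = FALSE)    # P(X >= 3) = P(X > 2)
d     <- pgeom(5, p) - pgeom(2, p)          # P(3 <= X <= 5)
e     <- qgeom(0.60, p)                     # smallest c with P(X <= c) >= 0.60

c(a = a, b = b, c = c_val, d = d, e = e)

# Closed-form checks
all.equal(a, (1 - p)^3 * p)
all.equal(b, 1 - (1 - p)^4)
all.equal(c_val, (1 - p)^3)
all.equal(d, sum(dgeom(3:5, p)))
```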
Products produced by a machine have a 3% defective rate.
What is the probability that the first defective occurs in the fifth item inspected?
What is the probability that the first defective occurs within the first five inspections?
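The slide leaves these two questions unanswered. One way to answer them with the geometric functions is sketched below, treating a defective item as the "success", so that a first defective at the fifth item means 4 non-defective items come first. The variable names are made up for illustration.

```r
p <- 0.03   # probability that an item is defective

# First defective is exactly the fifth item: 4 non-defective items, then a defective
first_on_fifth <- dgeom(4, prob = p)

# First defective is among the first five items: at most 4 non-defective items first
within_five <- pgeom(4, prob = p)

first_on_fifth
within_five
```

These agree with the closed forms 0.97^4 * 0.03 and 1 - 0.97^5.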

It is known that 20% of products on a production line are defective. Products are inspected until the first defective is
encountered. Let X = number of inspections needed to obtain the first defective.
What is the minimum number of inspections necessary so that the probability of observing a
defective is at least 75%?
Choose the smallest k so that P(X ≤ k) ≥ .75. Because qgeom counts the failures before the first success, add 1 to convert to the number of inspections:
qgeom(.75, .2) + 1
# 7

Poisson Distribution
The Poisson distribution gives the probability of a given number of events occurring in a fixed
interval of time or space, when the events occur independently at a constant mean rate λ.

The following functions are used with the Poisson distribution:

• dpois() function
• ppois() function
• qpois() function
• rpois() function

The Poisson probability mass function with mean λ can be calculated with the R dpois function
for any value of x. The syntax of the function is:

dpois(x, lambda)

Example:
It is known that a certain website makes 10 sales per hour. In a given hour, what is the
probability that the site makes exactly 8 sales?
dpois(x=8, lambda=10)

Output
0.112599

The probability that a variable X following a Poisson distribution takes a value lower than or
equal to x can be calculated with the ppois function, whose arguments are described
below:
Syntax:
ppois(q, lambda, lower.tail = TRUE) # If TRUE, probabilities are P(X <= x), otherwise P(X > x)

Example:
It is known that a certain website makes 10 sales per hour. In a given hour, what is the
probability that the site makes 8 sales or less?
ppois(8, lambda = 10)

# 0.3328197

It is known that a certain website makes 10 sales per hour. In a given hour, what is the
probability that the site makes more than 8 sales?

1 - ppois(q=8, lambda=10)
ppois(8, lambda=10, lower.tail = FALSE) # equivalent

#0.6671803

The qpois function
The R qpois function allows obtaining the corresponding Poisson quantiles for a set of
probabilities. The qpois function finds the number of successes that corresponds to a
certain percentile based on an average rate of success,

qpois syntax
qpois(p, lambda, lower.tail = TRUE)

Example:
It is known that a certain website makes 10 sales per hour. How many sales would the
site need to make to be at the 90th percentile for sales in an hour?
qpois(p=.90, lambda=10)
#14

The rpois function

To draw n observations from a Poisson distribution the rpois function can be used. The
following block of code summarizes the arguments of the function.

rpois syntax
rpois(n, lambda)

Example:
Generate a list of 15 random variables that follow a Poisson distribution with a rate
of success equal to 10.
rpois(n=15, lambda=10)

Data from the maternity ward in a certain hospital shows that there is a historical average
of 4.5 babies born in this hospital every day. What is the probability that 6 babies will be
born in this hospital tomorrow?

dpois(6, 4.5)

Try simulating births in this hospital for a year.

rpois(365, 4.5)

What about the probability of more than 6 babies being born?

ppois(6, 4.5, lower.tail = FALSE)

Consider that the number of visits on a web page is known to follow a Poisson distribution
with mean 15 visits per hour.
Find the probability of getting
a) 10 or fewer visits per hour, P(X ≤ 10)
b) more than 20 visits per hour, P(X > 20)
c) fewer than 15 visits per hour, P(X < 15)
d) more than 10 and at most 20 visits per hour, P(10 < X ≤ 20)

a) ppois(10, lambda = 15) # 0.1184644

b) ppois(20, lambda = 15, lower.tail = FALSE) # 0.08297091

c) ppois(14, lambda = 15)

d) ppois(20, lambda = 15) - ppois(10, lambda = 15) (or) sum(dpois(11:20, lambda = 15)) # Equivalent

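The two forms given for part d) can be checked against each other, and the same pattern shows what changes if exactly 10 visits should also be counted. A minimal sketch for the web-page example with mean 15; the variable names are arbitrary.

```r
lambda <- 15   # mean number of visits per hour

d1 <- ppois(20, lambda) - ppois(10, lambda)   # P(10 < X <= 20)
d2 <- sum(dpois(11:20, lambda))               # the same probability, term by term
all.equal(d1, d2)

# If exactly 10 visits should be included, subtract P(X <= 9) instead
d_incl <- ppois(20, lambda) - ppois(9, lambda)   # P(10 <= X <= 20)
d_incl - d1                                      # the difference is P(X = 10)
dpois(10, lambda)

# Lower and upper tails at the same point add up to 1
ppois(20, lambda) + ppois(20, lambda, lower.tail = FALSE)
```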
Problem
If there are twelve cars crossing a bridge per minute on average, find the probability of
having seventeen or more cars crossing the bridge in a particular minute.

> ppois(16, lambda=12, lower.tail=FALSE) # upper tail
[1] 0.10129 # 10.1%

Exponential distribution
The exponential distribution is a continuous probability distribution used to model the time
or space between events in a Poisson process, where the events occur continuously and
independently at a constant rate λ. In other words, it models the time we must wait until a
certain event occurs.

Function  Description
dexp      Exponential density (probability density function)
pexp      Exponential distribution (cumulative distribution function)
qexp      Quantile function of the exponential distribution
rexp      Exponential random number generation


The dexp function
The function in R to calculate the density function for any rate λ is the dexp function,
described below:

dexp syntax
dexp(x,            # X-axis values (>= 0)
     rate = 1,     # Vector of rates (lambdas)
     log = FALSE)  # If TRUE, densities are given as log
Example:
To calculate the exponential density function of rate 2 for a grid of values in R you can type:
# Grid of values
x <- seq(from = 0, to = 8, by = 0.01)

# Exponential PDF of rate 2
dexp(x, rate = 2)
Example:
To work with an exponential distribution with mean 10, first calculate the corresponding
rate:

# Exponential density function of mean 10


dexp(x, rate = 0.1) # E(X) = 1/lambda = 1/0.1 = 10

The pexp function

The R function that calculates the probability of a random variable X taking
values lower than or equal to x is the pexp function, which has the following syntax:

pexp syntax
pexp(q,
rate = 1,
lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
log.p = FALSE)

Example:
The probability of the variable (of rate 1) taking a value lower or equal to 2 is 0.8646647:

pexp(2) # 0.8646647
Examples:
The time spent on a certain web page is known to have an exponential distribution
with an average of 5 minutes per visit. Consequently, as E(X) = 1/λ, we have 5 = 1/λ, so λ = 0.2.

i) To calculate the probability of a visitor spending up to 3 minutes on the site :


pexp(3, rate = 0.2) # 0.4511884 or 45.12%
1 - pexp(3, rate = 0.2, lower.tail = FALSE) # Equivalent

ii)To calculate the probability of a visitor spending more than 10 minutes on the site you can
type:
pexp(10, rate = 0.2, lower.tail = FALSE) # 0.1353353 or 13.53%

iii) To calculate the probability of a visitor spending between 2 and 6 minutes is:
pexp(6, rate = 0.2) - pexp(2, rate = 0.2) # 0.3691258 or 36.91%

The qexp function

The qexp function allows you to calculate the corresponding quantile (percentile) for
any probability p:

qexp syntax
qexp(p, # Vector of probabilities
rate = 1,
lower.tail = TRUE) # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise

Example:
To calculate the quantile for the probability 0.8646647 (Q(0.86)) :

qexp(0.8646647) # 2
qexp(1 - 0.8646647, lower.tail = FALSE) # Equivalent
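
qexp also accepts a vector of probabilities. As a sketch, the quartiles of the time spent on the web page from the earlier example (rate 0.2) can be computed in one call, and the median can be checked against its closed form log(2)/λ. The variable names are hypothetical.

```r
# Quartiles of the time on the page (rate = 0.2, i.e. mean 5 minutes)
p <- c(0.25, 0.50, 0.75)
quartiles <- qexp(p, rate = 0.2)
quartiles

# The median of an exponential distribution is log(2) / rate
median_manual <- log(2) / 0.2
all.equal(quartiles[2], median_manual)
```

The median (about 3.47 minutes) is smaller than the mean of 5 minutes because the distribution is skewed to the right.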

The rexp function

The rexp function allows you to draw n observations from an exponential distribution.
The syntax of the function is as follows:

rexp syntax
rexp(n, # Number of observations to be generated
rate = 1)

Example:
To draw ten observations from an exponential distribution of rate 1 :

rexp(10)

0.7551818 1.1816428 0.1457067 0.1397953 0.4360686


2.8949685 1.2295621 0.5396828 0.9565675 0.1470460
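
Because rexp draws random values, the ten numbers above will differ on every run. A minimal sketch of how to make a simulation reproducible with set.seed and compare it with the theoretical values follows; the seed, the sample size and the rate of 0.2 are assumptions made for illustration.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
sims <- rexp(10000, rate = 0.2)   # 10000 simulated waiting times

sample_mean <- mean(sims)         # should be close to 1/0.2 = 5
empirical_p <- mean(sims <= 3)    # share of simulated values <= 3
theoretical_p <- pexp(3, rate = 0.2)

c(sample_mean = sample_mean,
  empirical_p = empirical_p,
  theoretical_p = theoretical_p)
```

With a large sample, the simulated mean and proportion should land close to the theoretical ones, though never exactly on them.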
Suppose the mean checkout time of a supermarket cashier is three minutes. Find the
probability of a customer checkout being completed by the cashier in less than or equal
to two minutes.

Solution
The checkout processing rate equals one divided by the mean checkout
completion time. Hence the processing rate is 1/3 checkouts per minute. We then
apply the function pexp of the exponential distribution with rate=1/3.

> pexp(2, rate=1/3)

[1] 0.48658

The time (in hours) required to repair a machine is an exponentially distributed random
variable with parameter λ=1/2.

(a) Find the value of the density function at x=2.5.


(b) Find the probability that a repair time takes at most 3 hours.
(c) Find the probability that a repair time exceeds 4 hours.
(d) Find the probability that a repair time takes between 2 to 4 hours.
(e) Plot the graph of cumulative Exponential probabilities.
(f) Find the 50th percentile of the given exponential distribution (or) What is the value of c, if
P(X≤c)≥0.50?
(g) Simulate 1000 exponentially distributed random variables with λ=1/2.

Let X denote the time (in hours) required to repair a machine. Given that X∼Exp(λ=1/2)

a) dexp(2.5,rate=0.5)

b) The probability that a repair time takes at most 3 hours is pexp(3,rate=0.5)

c) The probability that a repair time exceeds 4 hours is pexp(4,rate=0.5,lower.tail=FALSE)

d) The probability that a repair time takes between 2 to 4 hours can be written as P(2<X<4).

pexp(4,rate=0.5) - pexp(2,rate=0.5)

f) The 50th percentile (median) of the given exponential distribution is qexp(0.50,rate=0.5).

g) 1000 random numbers from the exponential distribution with rate=0.5: rexp(1000,0.5)
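
The exercise also asks for a plot of the cumulative exponential probabilities. One possible sketch is given here; the grid from 0 to 10 hours, the colour and the labels are assumptions, since the slides do not specify them.

```r
# Grid of repair times in hours (the 0-10 range is an assumed choice)
x <- seq(from = 0, to = 10, by = 0.1)

# Cumulative probabilities P(X <= x) for rate 0.5
cdf <- pexp(x, rate = 0.5)

plot(x, cdf, type = "l", col = "blue",
     main = "Exponential CDF (rate = 0.5)",
     xlab = "Repair time (hours)", ylab = "P(X <= x)")
```

The curve starts at 0 and rises towards 1, which matches the idea that longer repair times become increasingly unlikely.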

Normal distribution
The normal distribution is a continuous probability distribution that describes how data values are
distributed symmetrically around their mean.
It is the most important probability distribution in statistics because many real-world
quantities follow it at least approximately.
Examples include the height of a population, shoe size, IQ level, and many more.

R has four built-in functions for working with the normal distribution. They are described below.
dnorm(x, mean, sd)
pnorm(x, mean, sd)
qnorm(p, mean, sd)
rnorm(n, mean, sd)

where,
x is a vector of numbers.
p is a vector of probabilities.
n is the number of observations (sample size).
mean is the mean of the distribution. Its default value is 0.
sd is the standard deviation. Its default value is 1.
dnorm
The function dnorm returns the value of the probability density function (pdf) of the normal
distribution at a given value x, for a population mean μ and population standard
deviation σ. The syntax for using dnorm is as follows:

dnorm(x, mean, sd)


Example:
#find the value of the standard normal distribution pdf at x=0
dnorm(x=0, mean=0, sd=1)
# [1] 0.3989423

#by default, R uses mean=0 and sd=1


dnorm(x=0)
# [1] 0.3989423

#find the value of the normal distribution pdf at x=10 with mean=20 and sd=5
dnorm(x=10, mean=20, sd=5)
# [1] 0.01079819
pnorm
The function pnorm returns the value of the cumulative distribution function (cdf) of the
normal distribution at a given value q, for a population mean μ and
population standard deviation σ. The syntax for using pnorm is as follows:
pnorm(q, mean, sd)

Example:
Suppose the height of males at a certain school is normally distributed with a mean of
μ=70 inches and a standard deviation of σ = 2 inches. Approximately what percentage of
males at this school are taller than 74 inches?

#find percentage of males that are taller than 74 inches in a population with
#mean = 70 and sd = 2
pnorm(74, mean=70, sd=2, lower.tail=FALSE)

# [1] 0.02275013
Suppose the weight of a certain species of otters is normally distributed with a mean of μ=30 lbs
and a standard deviation of σ = 5 lbs. Approximately what percentage of this species of otters
weigh less than 22 lbs?

#find percentage of otters that weigh less than 22 lbs in a population with
#mean = 30 and sd = 5
pnorm(22, mean=30, sd=5)

# [1] 0.05479929
Suppose the height of plants in a certain region is normally distributed with a mean of μ=13 inches
and a standard deviation of σ = 2 inches. Approximately what percentage of plants in this region
are between 10 and 14 inches tall?

#find percentage of plants that are less than 14 inches tall, then subtract the
#percentage of plants that are less than 10 inches tall, based on a population
#with mean = 13 and sd = 2
pnorm(14, mean=13, sd=2) - pnorm(10, mean=13, sd=2)

# [1] 0.6246553
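
The same probabilities can be obtained by first converting the value to a z-score, z = (x − μ)/σ, and then using the standard normal distribution. The sketch below repeats the otter and height examples this way; the variable names are introduced here only for illustration.

```r
# Otter example: P(X < 22) with mean 30 and sd 5
z <- (22 - 30) / 5                          # z-score of 22 lbs
p_direct   <- pnorm(22, mean = 30, sd = 5)  # using mean and sd directly
p_standard <- pnorm(z)                      # using the standard normal
all.equal(p_direct, p_standard)

# Height example: the upper tail is the complement of the lower tail
p_upper      <- pnorm(74, mean = 70, sd = 2, lower.tail = FALSE)
p_complement <- 1 - pnorm(74, mean = 70, sd = 2)
all.equal(p_upper, p_complement)
```

Both comparisons should return TRUE, which shows that the mean and sd arguments only rescale the standard normal curve.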
qnorm
The function qnorm returns the value of the inverse cumulative distribution function of the normal
distribution for a given probability p, a population mean μ and population standard deviation σ.
The syntax for using qnorm is as follows:

qnorm(p, mean, sd)


#find the Z-score of the 99th percentile of the standard normal distribution
qnorm(.99, mean=0, sd=1)
# [1] 2.326348

#by default, R uses mean=0 and sd=1
qnorm(.99)
# [1] 2.326348

#find the Z-score of the 95th percentile of the standard normal distribution
qnorm(.95)
# [1] 1.644854

#find the Z-score of the 10th percentile of the standard normal distribution
qnorm(.10)
# [1] -1.281552
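
qnorm also works for a normal distribution that is not standard. As a sketch, the 90th percentile of the male heights from the pnorm example (mean 70, sd 2) can be computed directly or as mean + sd × z, and pnorm then returns the original probability. The choice of the 90th percentile is an assumption made for illustration.

```r
p <- 0.90

# 90th percentile of heights with mean 70 and sd 2
q_direct <- qnorm(p, mean = 70, sd = 2)

# Same result from the standard normal quantile: mean + sd * z
q_manual <- 70 + 2 * qnorm(p)
all.equal(q_direct, q_manual)

# pnorm is the inverse of qnorm, so we get p back
all.equal(pnorm(q_direct, mean = 70, sd = 2), p)
```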
rnorm
The function rnorm generates a vector of normally distributed random variables given a vector length n, a
population mean μ and population standard deviation σ. The syntax for using rnorm is as follows:
rnorm(n, mean, sd)
Example:
#generate a vector of 5 normally distributed random variables with mean=10 and sd=2
five <- rnorm(5, mean = 10, sd = 2)
five
# [1] 10.658117 8.613495 10.561760 11.123492 10.802768

#generate a vector of 1000 normally distributed random variables with mean=50 and sd=15
narrowDistribution <- rnorm(1000, mean = 50, sd = 15)

#generate a vector of 1000 normally distributed random variables with mean=50 and sd=25
wideDistribution <- rnorm(1000, mean = 50, sd = 25)
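
To compare the two simulated vectors, they can be drawn as histograms side by side. This is a minimal sketch: the seed, the common x-axis range and the colours are assumptions, and set.seed is used only so that the result can be reproduced.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
narrowDistribution <- rnorm(1000, mean = 50, sd = 15)
wideDistribution   <- rnorm(1000, mean = 50, sd = 25)

# Sample standard deviations should be near 15 and 25
c(narrow = sd(narrowDistribution), wide = sd(wideDistribution))

# Two histograms in one row, on the same x-axis range for a fair comparison
par(mfrow = c(1, 2))
hist(narrowDistribution, xlim = c(-50, 150), col = "pink", main = "sd = 15")
hist(wideDistribution,   xlim = c(-50, 150), col = "lightblue", main = "sd = 25")
par(mfrow = c(1, 1))   # restore the default layout
```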

Suppose that you have a machine that packages rice inside boxes. The process follows a
Normal distribution and it is known that the mean of the weight of each box is 1000
grams and the standard deviation is 10 grams.

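What is the value of the density function for a box weighing exactly 950 grams? (Weight is
continuous, so the probability of any single exact value is zero; dnorm returns the density.)

dnorm(950, 1000, 10)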
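
For a continuous variable, a probability only makes sense over an interval. As a sketch, assuming "950 grams" is read as "within half a gram of 950" (an interpretation introduced here, not stated in the slides), the probability can be computed with pnorm and compared with the density:

```r
mu <- 1000
sigma <- 10

# Density at 950 grams (a height of the curve, not a probability)
density_950 <- dnorm(950, mean = mu, sd = sigma)

# Probability of a weight between 949.5 and 950.5 grams
p_near_950 <- pnorm(950.5, mean = mu, sd = sigma) -
  pnorm(949.5, mean = mu, sd = sigma)

c(density = density_950, probability = p_near_950)
```

Because the interval is one gram wide, the two numbers are nearly equal (both about 1.5e-07), but only the second one is a probability.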

What is the probability of a box weighing more than 980 grams?


pnorm(980,1000,10,lower.tail=FALSE)

Calculate the quantile for probability 0.5 for the above scenario.
qnorm(0.5,1000,10)

Simulate the above scenario for 10 observations.


rnorm(10,1000,10)
Uniform distribution

A uniform distribution, also called a rectangular distribution, is a probability
distribution in which every value between the minimum and the maximum is equally likely.
This distribution is defined by two parameters, a and b:
• a is the minimum.
• b is the maximum.

Example: When you roll a fair die, the outcomes are 1 to 6. Each of these
outcomes is equally likely, and that is the basis of a uniform distribution.

Following are the functions provided by R to handle uniform distribution:


dunif() function
runif() function
qunif() function
punif() function
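
Note that these four functions describe the continuous uniform distribution, whereas a die is a discrete example. A rough sketch of how die rolls could still be simulated with runif is shown here; the seed and the number of rolls are assumptions, and floor() is used to turn the continuous values into the integers 1 to 6.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
n <- 6000       # number of simulated rolls (assumed)

# Continuous values between 1 and 7, rounded down to integers 1..6
rolls <- floor(runif(n, min = 1, max = 7))

# Share of each face; every value should be near 1/6
proportions <- table(rolls) / n
proportions
```

In practice sample(1:6, n, replace = TRUE) is the more direct way to roll a die in R; the version above is only meant to connect the example to runif.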
In order to calculate the uniform density function in R in the interval (a, b) for any
value of x you can make use of the dunif function, which has the following syntax:

dunif syntax
dunif(x, # X-axis values (grid of values)
min = 0, # Lower limit of the distribution (a)
max = 1) # Upper limit of the distribution (b)

Example:
Consider that you want to calculate the uniform probability density function in the
interval (1, 3) for a grid of values. For that purpose you can type:
x <- 0:4 # Grid
dunif(x, min = 1, max = 3)
Output
0.0 0.5 0.5 0.5 0.0
The punif function

The punif function calculates the uniform cumulative distribution function, that is, the
probability of a variable X taking a value lower than or equal to x. This function has the following
syntax:

punif syntax
punif(q, # Vector of quantiles
min = 0, # Lower limit of the distribution (a)
max = 1, # Upper limit of the distribution (b)
lower.tail = TRUE) # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise

Example:
The probability of a uniform variable on the interval (0, 1) taking a value
lower than or equal to 0.6 is:

punif(0.6) # 0.6
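
Inside the interval (a, b), the uniform density is 1/(b − a) and the cumulative probability is (x − a)/(b − a). The sketch below checks dunif and punif against these formulas on the interval (1, 3); the evaluation points are arbitrary values chosen for illustration.

```r
a <- 1
b <- 3
x <- c(1.5, 2, 2.5)   # points inside the interval (assumed values)

# Density is constant: 1 / (b - a)
dens_ok <- isTRUE(all.equal(dunif(x, min = a, max = b),
                            rep(1 / (b - a), length(x))))

# Cumulative probability grows linearly: (x - a) / (b - a)
cdf_ok <- isTRUE(all.equal(punif(x, min = a, max = b),
                           (x - a) / (b - a)))

print(c(dens_ok, cdf_ok))
```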
The qunif function
To calculate the quantile for any probability (p) of a uniform distribution, use the qunif function:

qunif syntax
qunif(p, # Vector of probabilities
min = 0, # Lower limit of the distribution (a)
max = 1, # Upper limit of the distribution (b)
lower.tail = TRUE) # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise

Example:
To calculate the quantile for the probability 0.5 of a uniform distribution on the interval (0,
60)

qunif(0.5, min = 0, max = 60)


# 30
The runif function

The R runif function allows drawing n random observations from a uniform distribution.

runif syntax
runif(n, # Number of observations to be generated
min = 0, # Lower limit of the distribution (a)
max = 1) # Upper limit of the distribution (b)

Example:
To draw ten observations from a uniform distribution on the interval (-1, 1)

runif(n = 10, min = -1, max = 1)

-0.20757312 -0.46819001 -0.80643735 -0.92675885 0.80520074


0.39716130 -0.39939392 0.78837145 0.28130687 0.09807602
The daily amount of coffee, in liters, dispensed by a machine located in an airport lobby is a
random variable X having a continuous uniform distribution with a=7 and b=10.

(a)Find the value of the density function at x=7.6.

(b) Find the probability that on a given day the amount of coffee dispensed by the machine
will be at most 8.8 liters.

(c) Find the probability that on a given day the amount of coffee dispensed by the machine
will be at least 8.5 liters.

(d) What is the value of c, if P(X≤c)≥0.60?

(e) Simulate 1000 Uniform distributed random variables with a=7 and b=10.

(a) The value of the density function at x=7.6
dunif(7.6,min=7,max=10)

(b) The probability that on a given day the amount of coffee dispensed by the machine
will be at most 8.8 liters.
punif(8.8,7,10)

(c) The probability that on a given day the amount of coffee dispensed by the machine
will be at least 8.5 liters.
punif(8.5,7,10,lower.tail=FALSE)

(d)The value of c, if P(X≤c)≥0.60?


qunif(0.6,7,10)

(e) Simulate 1000 Uniform distributed random variables with a=7 and b=10.
runif(1000,7,10)
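
The slides do not show the numeric results for this exercise. As a sketch, parts (a) to (d) can be checked against the uniform formulas worked out by hand; the variable names are hypothetical, and part (c) uses 8.5 liters as stated in the question.

```r
a <- 7
b <- 10

part_a <- dunif(7.6, min = a, max = b)                      # 1 / (b - a) = 1/3
part_b <- punif(8.8, min = a, max = b)                      # (8.8 - 7) / 3 = 0.6
part_c <- punif(8.5, min = a, max = b, lower.tail = FALSE)  # (10 - 8.5) / 3 = 0.5
part_d <- qunif(0.6, min = a, max = b)                      # 7 + 0.6 * 3 = 8.8

round(c(part_a, part_b, part_c, part_d), 4)
```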
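X is the time (in minutes) that a person has to wait in order to take a flight. If a flight
takes off every hour, then X∼U(0,60).

Find the value of the density function for a wait of exactly 20 minutes. (X is continuous, so
the probability of any single exact value is zero; dunif returns the density.)
dunif(20,0,60)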

Find the probability that a person has to wait up to 15 minutes?


punif(15,0,60)

Calculate the quantile for the probability 0.5 of a uniform distribution on the interval (0, 60)
qunif(0.5,0,60)
