R Visualization ADA

The document discusses exploratory data analysis and visualizing data in R. It provides examples of using the plot() function to create basic graphs and customize aspects like markers, lines, labels, and titles. It also discusses other related functions like abline() and par() as well as the grammar of graphics framework for building complex graphs from layers of data.

Uploaded by

HARSHITA RATHORE

Exploratory Data Analysis

Visualizing Data

s.patra@iimkashipur.ac.in

Indian Institute of Management Kashipur

plot() Function

x <- 1:10
y <- log(x)
plot(x,y)

[Figure: scatter plot of y = log(x) against x for x = 1 to 10]

plot() Options
The shape of the markers: By default, plot markers are small, empty circles. The marker
shape is called the plot character and is set with pch. Change the shape by varying pch
from 0 to 25 (0 is a square, 1 is a circle, 2 is a triangle, 3 is a plus sign, 4 is a cross,
and so on).
Size of the plot markers: Controlled by the cex parameter, which scales markers relative
to the default size of 1. Set cex to 0.5 for markers 50% smaller and to 1.5 for markers
50% larger.
Color of the plot markers: The col parameter assigns one or many colors to the symbols.
Color names can be picked from the list returned by the colors() function.
Connecting the points with lines: The type argument of plot() controls how points are
joined. "p" draws only points and "l" only lines; "b" draws both points and lines, with the
lines stopping short of the markers, and "o" overplots the points on the lines. "h" gives
histogram-like vertical lines and "s" gives stair steps.
Varying the lines: The line type is set with the lty parameter (values 0 to 6) and the
line width with the lwd parameter.
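
As a rough base-R illustration of the pch and cex options above, the sketch below draws all 26 plot characters on a small grid. The variable names (shapes, grid_x, grid_y) and the use of a null PDF device are assumptions made here so the example runs without a display; they are not from the slides.

```r
# Open a PDF device that writes no file, so the sketch runs headless (assumption).
pdf(NULL)

shapes <- 0:25                 # all valid pch codes
grid_x <- shapes %% 6          # column position, 0 to 5
grid_y <- 4 - shapes %/% 6     # row position, top row first

# One marker per pch value; bg only affects the fillable shapes 21 to 25.
plot(grid_x, grid_y, pch = shapes, cex = 1.5, col = "blue", bg = "yellow",
     xlim = c(-0.5, 5.5), ylim = c(-0.5, 4.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch values 0 to 25")

# Print the pch code underneath each marker.
text(grid_x, grid_y, labels = shapes, pos = 1, cex = 0.7)

invisible(dev.off())
```

Dropping the pdf(NULL) and dev.off() lines draws the same chart in an interactive session.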

R Plot pch Symbols

ggpubr::show_point_shapes()

[Figure: "Point shapes available in R", showing the symbols for pch values 0 to 25]

plot() Options

plot(x,y,pch = c(0,18),cex = 1.5,col = c('red','blue'),type='o',lty = 3,lwd = 2)

[Figure: y = log(x) against x, drawn with alternating red square and blue diamond markers joined by a dotted line]

Adding Labels & Title
The main title is added using the main option of the plot function. Its font, color, and
size can be customized using font.main, col.main and cex.main respectively.
The titles for the axes are provided using the xlab and ylab arguments. These can be
customized using font.lab, col.lab and cex.lab in the same way.
You can also add extra text inside the plot using the text() function, specifying the
text to use and the coordinates at which to display it.
The text() function can also be used to label the data points. The text, in this case, is a
vector of labels instead of a single string.
A legend can be added to a graph using R's legend() function. legend() takes as
input the position, the text and the symbols to be explained.

labelset <- c('one','two','three','four','five','six','seven','eight','nine','ten')
plot(x, y, pch = c(0,18), cex = 1.5, col = c('red','blue'), type = 'o', lty = 3, lwd = 2,
     main = "Graph of y = log(x) vs Graph of y = x-1", col.main = "purple",
     xlab = "X Values", ylab = "Y Values")
# label each point, shifted one unit to the right
text(x+1, y, labelset, col = 'red')
# overlay y = x - 1; only the part inside the existing y-range is visible
lines(x, x-1, col = 'green', lty = 4, lwd = 2)
# legend line types match the plotted lines (lty = 3 and lty = 4)
legend('bottomright', inset = 0.05, c("Log","minus 1"),
       lty = c(3,4), col = c("red","green"))
# horizontal reference lines (outside the y-range unless ylim is widened)
abline(h = c(4,6), col = "orange", lty = 2)

plot() Options

[Figure: "Graph of y = log(x) vs Graph of y = x−1"; Y Values against X Values, points labelled one to ten, legend entries Log(x) and x−1]

Other Related Functions
abline(v = 10) draws a vertical line at x = 10.
abline(h = 10) draws a horizontal line at y = 10.
The par() function sets the margins through two arguments: mar for the figure margin and oma
for the outer margin area.
Both arguments take four values giving the space, in lines of text, at the bottom, left, top
and right of the chart respectively. For instance, par(mar=c(4,0,0,0)) leaves a margin of
4 lines at the bottom of the chart only.
par(mfrow = c(2, 2)): creates a 2 x 2 plotting matrix, filled row by row.

dev.off(): closes the specified graphics device (by default the current device).
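
The functions above can be combined roughly as follows. This is only a sketch: the margin values, the four example curves and the null PDF device are choices made here for illustration, not part of the slides.

```r
# Open a PDF device that writes no file, so the sketch runs headless (assumption).
pdf(NULL)

# par() returns the previous settings, which lets us restore them afterwards.
old_par <- par(mfrow = c(2, 2),          # 2 x 2 grid, filled row by row
               mar = c(4, 4, 2, 1),      # bottom, left, top, right (in lines)
               oma = c(0, 0, 2, 0))      # outer margin: 2 lines at the top only

x <- 1:10
plot(x, log(x), main = "log(x)")
abline(h = 1, lty = 2)                   # horizontal line at y = 1

plot(x, sqrt(x), type = "l", main = "sqrt(x)")
abline(v = 5, col = "orange")            # vertical line at x = 5

plot(x, x^2, type = "b", main = "x^2")
plot(x, 1 / x, type = "h", main = "1/x")

layout_used <- par("mfrow")              # layout currently in effect
par(old_par)                             # restore the previous settings
invisible(dev.off())                     # close the device
```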

The Grammar Of Graphics
Grammar: “the fundamental principles or rules of an art or science”
A good grammar will allow us to gain insight into the composition of complicated graphics, and
reveal unexpected connections between seemingly different graphics.
The most important modern work in graphical grammars is “The Grammar of Graphics” by
Wilkinson, Anand, and Grossman (2005). Building on it, the layered grammar of graphics used by
ggplot2 proposes an alternative parameterization of the grammar, based around the idea of
building up a graphic from multiple layers of data.
The basic idea: independently specify plot building blocks and combine them to create just
about any kind of graphical display you want. Building blocks of a graph include:

Components of the Layered Grammar
The data that you want to visualise and a set of aesthetic mappings describing how
variables in the data are mapped to aesthetic attributes that you can perceive.
Geometric objects, geoms for short, represent what you actually see on the plot: points,
lines, polygons, etc.
Statistical transformations, stats for short, summarize data in many useful ways. For
example, binning and counting observations to create a histogram, or summarising a 2d
relationship with a linear model. Stats are optional, but very useful.
The scales map values in the data space to values in an aesthetic space, whether it be
colour, or size, or shape. Scales draw a legend or axes, which provide an inverse mapping
to make it possible to read the original data values from the graph.
A coordinate system, coord for short, describes how data coordinates are mapped to the
plane of the graphic. It also provides axes and gridlines to make it possible to read the
graph. We normally use a Cartesian coordinate system, but a number of others are
available, including polar coordinates and map projections.
A faceting specification describes how to break up the data into subsets and how to
display those subsets as small multiples. This is also known as conditioning or
latticing/trellising.
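
To make the components concrete, here is a minimal sketch that uses each of them once. It assumes the built-in mtcars data set instead of the marketing data used on the following slides, and the axis name is chosen here for illustration.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +            # data + aesthetic mappings
  geom_point() +                                              # geom: what is drawn
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # stat: linear-model summary
  scale_x_continuous(name = "Weight (1000 lbs)") +            # scale: data space to axis
  coord_cartesian(ylim = c(10, 35)) +                         # coord: Cartesian, zoomed in y
  facet_wrap(~cyl) +                                          # facets: one panel per cyl value
  theme_classic()                                             # theme: non-data appearance

# Data computed for the first layer (the points), one row per car.
point_data <- layer_data(p, 1)
```

Printing p in an interactive session renders the plot; each `+` adds one independent building block.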

Data Layer
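library(ggplot2)
library(dplyr)      # %>%, filter(), group_by(), summarise(), mutate()
library(tidyr)      # drop_na()
library(lubridate)  # dmy(), year()
# read.delim() expects a tab-separated file
df <- read.delim("datasets/marketing_campaign.csv") %>%
  drop_na()
ggplot(data = df)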

Aesthetic Layer (aes)
The aesthetic layer maps variables in our data onto scales in our graphical visualization, such as
the x and y coordinates.

ggplot(data = df, aes(x = Income, y = MntSweetProducts))

[Figure: empty plot panel with Income on the x-axis and MntSweetProducts on the y-axis]

Geometries Layer (geom_)

ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
  geom_point()

[Figure: scatter plot of MntSweetProducts against Income]

Facets Layer

ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
  geom_point() +
  facet_wrap(~Education)

[Figure: scatter plots of MntSweetProducts against Income, one panel per Education level (2n Cycle, Basic, Graduation, Master, PhD)]

Statistics Layer
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE)

## `geom_smooth()` using formula = 'y ~ x'

[Figure: scatter plot of MntSweetProducts against Income with a fitted linear regression line]

Statistics Layer
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
geom_point() +
facet_wrap(~Education) +
stat_smooth(method = "lm", se = FALSE)

## `geom_smooth()` using formula = 'y ~ x'

[Figure: scatter plots of MntSweetProducts against Income with a fitted linear regression line, one panel per Education level (2n Cycle, Basic, Graduation, Master, PhD)]

Coordinates Layer
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE) +
coord_cartesian(xlim = c(0, 115000), ylim = c(0, 200))

## `geom_smooth()` using formula = 'y ~ x'

[Figure: scatter plot of MntSweetProducts against Income with a linear fit, zoomed to Income 0 to 115000 and MntSweetProducts 0 to 200]

Themes Layer
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE) +
coord_cartesian(xlim = c(0, 115000), ylim = c(0, 200)) +
theme_classic()

## `geom_smooth()` using formula = 'y ~ x'

[Figure: the same zoomed scatter plot with linear fit, drawn with the classic theme]

More on Aesthetic Mappings
Adding colour to the chart
While you can do data manipulation inside aes(), e.g. aes(log(Income),
log(MntSweetProducts)), it is best to only do simple calculations there.
ggplot(data = df,
aes(x=log(Income), y=MntSweetProducts, col=factor(Teenhome), size = 2)) +
geom_point()

[Figure: scatter plot of MntSweetProducts against log(Income), coloured by factor(Teenhome) (0, 1, 2), with a size legend showing the single value 2]

More on Aesthetic Mappings
Aesthetic mappings can be supplied in the initial ggplot() call, in individual layers, or in some
combination of both. All of these calls create the same plot specification:

## Specification 1
ggplot(data = df, aes(x = Income,
y = MntSweetProducts,
col=factor(Teenhome),
size = 2)) +
geom_point()
## Specification 2
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
geom_point(aes(col=factor(Teenhome), size = 2))
## Specification 3
ggplot(data = df, aes(x = Income)) +
geom_point(aes(y = MntSweetProducts, col=factor(Teenhome), size = 2))
## Specification 4
ggplot(data = df) +
geom_point(aes(x = Income,
y = MntSweetProducts,
col=factor(Teenhome),
size = 2))

More on Aesthetic Mappings
## Specification 1
ggplot(data = df, aes(x = Income, y = MntSweetProducts, col=factor(Teenhome))) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
coord_cartesian(xlim = c(0, 115000), ylim = c(0, 200))

[Figure: scatter plot of MntSweetProducts against Income coloured by factor(Teenhome) (0, 1, 2), with one linear fit per Teenhome group]

More on Aesthetic Mappings
## Specification 2
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
geom_point(aes(col=factor(Teenhome))) +
geom_smooth(method = "lm", se = FALSE) +
coord_cartesian(xlim = c(0, 115000), ylim = c(0, 200))

[Figure: scatter plot of MntSweetProducts against Income coloured by factor(Teenhome) (0, 1, 2), with a single linear fit for all points]

More on Aesthetic Mappings
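Setting vs. mapping colours: Instead of mapping an aesthetic property to a variable, you can
set it to a single value by specifying it in the layer parameters. We map an aesthetic to a
variable (e.g., aes(colour = factor(Teenhome))) or set it to a constant (e.g., colour = "red").
Putting the constant inside aes(), as in the code below, maps the string "red" as if it were a
variable, so a legend appears and the colour is chosen by the colour scale.
## Mapping a constant inside aes()
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
  geom_point(aes(col = "red")) +   # to set the colour instead: geom_point(colour = "red")
  geom_smooth(method = "lm", se = FALSE) +
  coord_cartesian(xlim = c(0, 115000), ylim = c(0, 200))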
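
The difference between setting and mapping can be checked directly. The sketch below is an illustration on the built-in mtcars data (an assumption, chosen so the example is self-contained); it builds both variants and inspects the colours ggplot2 actually assigns to the points.

```r
library(ggplot2)

# Setting: colour is a layer parameter, so every point is drawn in red.
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "red")

# Mapping: "red" is treated as a one-level variable and passed through the
# default colour scale, which also produces a legend.
p_map <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = "red"))

# layer_data() returns the computed aesthetics for a layer.
set_colours <- unique(layer_data(p_set)$colour)
map_colours <- unique(layer_data(p_map)$colour)

print(set_colours)   # the literal colour that was set
print(map_colours)   # the colour chosen by the scale for the level "red"
```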

[Figure: scatter plot of MntSweetProducts against Income with a linear fit; a legend titled "colour" shows the single entry "red"]

More on Aesthetic Mappings
## Specification 2
ggplot(data = df, aes(x = Income, y = MntSweetProducts)) +
geom_point() +
geom_smooth(aes(colour = "loess"), method = "loess", se = TRUE) +
geom_smooth(aes(colour = "lm"), method = "lm", se = TRUE) +
coord_cartesian(xlim = c(0, 115000), ylim = c(0, 200)) +
theme_classic()

[Figure: scatter plot of MntSweetProducts against Income with loess and lm smooths and their confidence bands, distinguished by a "colour" legend (lm, loess)]

Visualizing Amounts: Bar Plot

df %>%
ggplot(aes(x = Education, y = Income)) +
geom_bar(stat = "identity")

[Figure: bar chart of total Income by Education (2n Cycle, Basic, Graduation, Master, PhD)]

Visualizing Distribution: Histogram

df %>%
filter(Income < 20000) %>%
ggplot(aes(x=Income)) +
geom_histogram(binwidth=2000, fill="red", color="blue", alpha=0.9)

[Figure: histogram of Income (values below 20000) with count on the y-axis]

Visualizing Distribution: Density Plot

df %>%
filter(Income < 20000) %>%
ggplot(aes(x=Income)) +
geom_density(fill="green", color="#e9ecef", alpha=0.8)

[Figure: density plot of Income (values below 20000)]

Visualizing Distribution: Box Plot

df %>%
filter(Income < 20000) %>%
ggplot(aes(x=Education, y=Income)) +
geom_boxplot() +
geom_jitter(color="black", size=0.4, alpha=0.9)

[Figure: box plots of Income (values below 20000) by Education, with jittered data points overlaid]

Visualizing Distribution: Cumulative Distribution

df %>%
filter(Income < 20000) %>%
ggplot(aes(x = Income, colour = factor(Teenhome))) +
stat_ecdf()

[Figure: empirical cumulative distribution (ecdf) of Income (values below 20000), one curve per factor(Teenhome) level (0, 1, 2)]

Visualizing Trends: Line Plot
df %>%
  group_by(Dt_Customer) %>%
  summarise(No_Customer = n()) %>%      # number of customers per date
  mutate(Date = dmy(Dt_Customer)) %>%   # dmy() and year() come from lubridate
  filter(year(Date) == 2014) %>%
  ggplot(aes(x = Date, y = No_Customer)) +
  geom_line() +
  geom_point()

[Figure: line plot with points of No_Customer against Date for 2014, with the x-axis running from Jan to Jul]

More on Facets: facet_grid()

df %>%
filter(Income < 20000) %>%
ggplot(aes(x=Education, y=Income)) +
geom_boxplot() +
geom_jitter(color="black", size=0.4, alpha=0.9) +
facet_grid(factor(Teenhome) ~ factor(Kidhome))

[Figure: box plots of Income by Education with jittered points, in a grid of panels with factor(Teenhome) (0, 1, 2) as rows and factor(Kidhome) (0, 1, 2) as columns]

Position Adjustments: Stacked Bar Chart
Each geom also has a default position adjustment which specifies a set of “rules” as to how
different components should be positioned relative to each other.

df %>%
  ggplot(aes(x = Education, y = Income, fill = factor(Teenhome))) +
  geom_bar(stat = "identity")

[Figure: stacked bar chart of total Income by Education, with each bar split by factor(Teenhome) (0, 1, 2)]

Position Adjustments: Grouped Bar Chart

df %>%
  ggplot(aes(x = Education, y = Income, fill = factor(Teenhome))) +
  geom_bar(stat = "identity", position = "dodge")

[Figure: grouped bar chart of Income by Education, with bars placed side by side for each factor(Teenhome) level (0, 1, 2)]

Position Adjustments: Percentage Chart

df %>%
  ggplot(aes(x = Education, y = Income, fill = factor(Teenhome))) +
  geom_bar(stat = "identity", position = "fill")

[Figure: bar chart of Income by Education with every bar scaled to 1.00, showing the share contributed by each factor(Teenhome) level (0, 1, 2)]

More on Coordinates

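ggplot(df) +
  aes(x=Income, y=MntSweetProducts) +
  geom_point(aes(col=factor(Teenhome)), size=2) +
  # limits go inside scale_x_continuous(): a separate xlim() call would replace this scale
  scale_x_continuous(breaks=seq(0, 150000, 25000), labels = seq(0,150,25),
                     limits = c(0, 115000)) +
  ylim(c(0, 200))   # unlike coord_cartesian(), scale limits drop points outside the range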

[Figure: scatter plot of MntSweetProducts against Income, coloured by factor(Teenhome) (0, 1, 2)]

Axis Transformation
Built-in functions for axis transformations are:

scale_x_log10(), scale_y_log10(): log10 transformation
scale_x_sqrt(), scale_y_sqrt(): square-root transformation
scale_x_reverse(), scale_y_reverse(): reverse the axis direction
coord_trans(x = "log10", y = "log10"): possible values for x and y include "log2", "log10",
"sqrt", ...
scale_x_continuous(trans = "log2"),
scale_y_continuous(trans = "log2"): another allowed value for the
argument trans is "log10"
coord_flip(): swaps the x and y axes

A continuous scale handles numeric data (where there is a continuous set of numbers),
whereas a discrete scale (scale_x_discrete()) handles categorical data such as factor levels.
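
A small sketch of what a scale transformation does to the data: scale transformations are applied before any statistics are computed, so the values stored for the layer are already on the log10 scale. The four-row data frame d is invented here purely for illustration.

```r
library(ggplot2)

# Hypothetical data spanning several orders of magnitude.
d <- data.frame(x = c(1, 10, 100, 1000), y = c(2, 4, 6, 8))

p_raw <- ggplot(d, aes(x, y)) + geom_point()
p_log <- ggplot(d, aes(x, y)) + geom_point() + scale_x_log10()

# layer_data() exposes the x positions that the geom receives.
raw_x <- layer_data(p_raw)$x   # 1, 10, 100, 1000
log_x <- layer_data(p_log)$x   # 0, 1, 2, 3

print(raw_x)
print(log_x)
```

The axis labels of p_log still show the original values, because the scale applies the inverse transformation when it draws the axis.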

Labels & Annotations
Textual labels and annotations (on the plot, axes, geometry, and legend) are an important part
of making a plot understandable and communicating information.

ggplot(df) +
  aes(x=Income, y=MntSweetProducts) +
  geom_point(aes(col=factor(Teenhome)), size=2) +
  # limits go inside scale_x_continuous(): a separate xlim() call would replace this scale
  scale_x_continuous(breaks=seq(0, 150000, 25000), labels = seq(0,150,25),
                     limits = c(0, 115000)) +
  ylim(c(0, 200)) +
  labs(title="Income vs Amount of Sweet Products Bought",
       subtitle="Customer dataset",
       y="Amount of sweet products",
       x="Income (in thousand units)",
       color = "Teens at home",
       caption="Customer Purchase Behaviour")

Labels & Annotations
[Figure: scatter plot titled "Income vs Amount of Sweet Products Bought", subtitle "Customer dataset"; x-axis "Income (in thousand units)", y-axis "Amount of sweet products", legend "Teens at home" (0, 1, 2), caption "Customer Purchase Behaviour"]

Dealing with Colours
ggplot2 lets you customize shape colours through its fill and color arguments. It is
important to understand the difference between the two: color applies to points, lines and
outlines, while fill applies to the interior of shapes such as bars. Note that color and colour
always have the same effect.
Methods to call a colour

Name: R offers 657 color names. You can list all of them using colors().
rgb(red, green, blue, alpha): The rgb() function builds a color from a
quantity of red, green and blue. An additional parameter (alpha) sets the
transparency. By default, all parameters range from 0 to 1.
Number: It is also possible to call a color by its number. For instance, if you need
color number 143, use colors()[143].
Hex code: All colors can be defined by their hex code. A hex code looks like this:
#69b3a2. A colour picker can be used to find the hex code of a colour.
Colour Libraries: RColorBrewer, paletteer etc.
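
The first four ways of calling a colour can be tried in base R without any plotting. The variable names in this sketch are illustrative only.

```r
# Name: colors() returns every built-in colour name.
n_names <- length(colors())

# Number: pick a colour name by its position in that list.
by_number <- colors()[143]

# rgb(): build a colour from red, green, blue and alpha, each between 0 and 1.
half_red <- rgb(1, 0, 0, alpha = 0.5)   # semi-transparent red as a hex string

# Hex code: col2rgb() converts a hex code back to 0-255 components.
hex_rgb <- col2rgb("#69b3a2")

print(n_names)
print(by_number)
print(half_red)
print(hex_rgb)
```

Any of these values can be passed to col in base graphics or to colour and fill in ggplot2.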

Dealing with Colours: RColorBrewer Package
There are 3 types of palettes: sequential, qualitative and diverging.

Sequential palettes: Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd
Qualitative palettes: Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3
Diverging palettes: BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn, Spectral

Dealing with Colours: RColorBrewer Package

library(RColorBrewer)
ggplot(df) +
  aes(x=Income, y=MntSweetProducts) +
  geom_point(aes(col=factor(Teenhome)), size=2) +
  scale_colour_brewer(palette = "Set1") +
  # limits go inside scale_x_continuous(): a separate xlim() call would replace this scale
  scale_x_continuous(breaks=seq(0, 150000, 25000), labels = seq(0,150,25),
                     limits = c(0, 115000)) +
  ylim(c(0, 200)) +
  labs(title="Income vs Amount of Sweet Products Bought",
       subtitle="Customer dataset",
       y="Amount of sweet products",
       x="Income (in thousand units)",
       color = "Teens at home",
       caption="Customer Purchase Behaviour") +
  theme_classic()

Dealing with Colours: RColorBrewer Package
[Figure: the labelled scatter plot of "Amount of sweet products" against "Income (in thousand units)", with the "Teens at home" groups (0, 1, 2) coloured using the Set1 palette and the classic theme]

Draw a Vertical Line

ggplot(df) +
  aes(x=Income, y=MntSweetProducts) +
  geom_point(aes(col=factor(Teenhome)), size=2) +
  scale_colour_brewer(palette = "Set1") +
  # limits go inside scale_x_continuous(): a separate xlim() call would replace this scale
  scale_x_continuous(breaks=seq(0, 150000, 25000), labels = seq(0,150,25),
                     limits = c(0, 115000)) +
  ylim(c(0, 200)) +
  geom_vline(xintercept = c(35000,88000), # geom_hline for horizontal
             linetype="dotted",
             color = "green",
             size=1.5) +                  # named linewidth in ggplot2 3.4.0 and later
  labs(title="Income vs Amount of Sweet Products Bought",
       subtitle="Customer dataset",
       y="Amount of sweet products",
       x="Income (in thousand units)",
       color = "Teens at home",
       caption="Customer Purchase Behaviour") +
  theme_classic()

Draw a Vertical Line
[Figure: the labelled scatter plot with two dotted green vertical lines at Income = 35000 and Income = 88000]

Add a text annotation at a particular coordinate

ggplot(df) +
  aes(x=Income, y=MntSweetProducts) +
  geom_point(aes(col=factor(Teenhome)), size=2) +
  scale_colour_brewer(palette = "Set1") +
  # limits go inside scale_x_continuous(): a separate xlim() call would replace this scale
  scale_x_continuous(breaks=seq(0, 150000, 25000), labels = seq(0,150,25),
                     limits = c(0, 115000)) +
  ylim(c(0, 200)) +
  geom_vline(xintercept = c(35000,88000), # geom_hline for horizontal
             linetype="dotted",
             color = "green",
             size=1.5) +                  # named linewidth in ggplot2 3.4.0 and later
  # annotate() draws the label once; geom_text() would repeat it for every row of df
  annotate("text", x=5000, y=175, label="Scatter plot") +
  labs(title="Income vs Amount of Sweet Products Bought",
       subtitle="Customer dataset",
       y="Amount of sweet products",
       x="Income (in thousand units)",
       color = "Teens at home",
       caption="Customer Purchase Behaviour") +
  theme_classic()

Add a text annotation at a particular coordinate
[Figure: the labelled scatter plot with the two dotted vertical lines and the text "Scatter plot" placed near the top-left of the panel]

Themes Options
There are three types of elements within the Themes Layer: text, line, and rectangle. Together
these three elements control all the non-data ink in the graph.

Figure 1: Theme Elements
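
As an illustration of the three element types, the sketch below customizes one text, one line and one rectangle element with ggplot2's element_text(), element_line() and element_rect(). The data set (mtcars), the title and the colours are arbitrary choices made for this example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy vs weight") +
  theme(
    # text element: the plot title
    plot.title = element_text(face = "bold", colour = "purple"),
    # line element: the major grid lines
    panel.grid.major = element_line(colour = "grey80", linetype = "dotted"),
    # rectangle element: the panel background and its border
    panel.background = element_rect(fill = "white", colour = "black")
  )
```

An element can also be removed altogether by setting it to element_blank().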


