Dav Exp8

The document outlines an experiment focused on exploring the ggplot2 library in R for data visualization. It includes objectives such as creating various plots, customizing aesthetics, and analyzing a dataset (hflights) to draw conclusions. The conclusion emphasizes the importance of effective data representation for insights and decision-making.

Uploaded by

aayushdumbre93

SIES Graduate School of Technology

Sri Chandrasekarendra Saraswati Vidyapuram


Sector V, Nerul, Navi Mumbai - 400706

Name of the Student: AAYUSH


Roll No.:02
Class:TE AI&DS
Batch: B1

EXPERIMENT NO. 8
Data Visualization: ggplot Library.

Aim:
1. Write a program in R to explore the ggplot library.
2. Write a program to create plots and subplots and to change the color and line style of a
plot.
3. Write a program to visualize data using a scatterplot, boxplot, histogram, line graph,
bar chart, violin plot, and regression plot, and to draw appropriate conclusions.

System Requirements: R.

Theory:
R is a language designed for statistical computing, graphical data analysis, and scientific
research. It is often preferred for data visualization because its packages offer flexibility
and require little code. Data visualization is the graphical representation of information
and data. Visualization techniques provide an important suite of tools for gaining a
qualitative understanding of a dataset: they help in exploring the data and in identifying
patterns, corrupt data, outliers, and much more.
ggplot() function: initializes a ggplot object. It can be used to declare the input data frame
for a graphic and to specify the set of plot aesthetics intended to be common throughout all
subsequent layers unless specifically overridden. The data argument is the default dataset to
use for the plot. If it is not already a data.frame, it will be converted to one by fortify().
If it is not specified, data must be supplied in each layer added to the plot. The mapping
argument is the default list of aesthetic mappings to use for the plot. If it is not specified,
mappings must be supplied in each layer added to the plot.
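The behaviour of these defaults can be sketched as follows. This is a minimal illustration that assumes the ggplot2 package is installed; the data frame `sales` and its columns are made up for the example and are not part of the experiment.

```r
library(ggplot2)

# Hypothetical data, for illustration only
sales <- data.frame(month = 1:4, units = c(4.2, 10, 29.5, 15))

# Data and mapping declared once in ggplot() are inherited by every layer
p1 <- ggplot(data = sales, mapping = aes(x = month, y = units)) +
  geom_line() +
  geom_point()

# With an empty ggplot(), each layer has to supply its own data and mapping
p2 <- ggplot() +
  geom_line(data = sales, mapping = aes(x = month, y = units)) +
  geom_point(data = sales, mapping = aes(x = month, y = units))

# A layer can override an inherited default, here the y aesthetic
p3 <- ggplot(sales, aes(x = month, y = units)) +
  geom_point() +
  geom_point(aes(y = units / 2), color = "red")

print(p1)
```

p1 and p2 draw the same picture; the first form is shorter whenever all layers share the same data and mappings.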
ggplot2 package: an implementation of the Grammar of Graphics. It is a free, open-source, and
easy-to-use visualization package widely used in the R programming language. It was written by
Hadley Wickham and is one of the most popular visualization packages for R.
Building blocks (layers) of the grammar of graphics:
o Data: the dataset itself
o Aesthetics: the attributes onto which the data is mapped, such as x-axis, y-axis, color,
fill, size, labels, alpha, shape, line width, line type
o Geometries: how the data is displayed, using point, line, histogram, bar, boxplot
o Facets: display subsets of the data in columns and rows
o Statistics: binning, smoothing, descriptive, intermediate
o Coordinates: the mapping between data and display, using Cartesian, fixed, polar, limits
o Themes: non-data ink
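The building blocks listed above can be combined in a single plot. The sketch below is only an illustration; it assumes ggplot2 is installed and uses R's built-in mtcars data set rather than the data used in this experiment.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +          # data and aesthetics
  geom_point(aes(color = factor(cyl)), size = 2) +   # geometry
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "black") +         # statistics (linear fit)
  facet_wrap(~ am) +                                 # facets: one panel per value of am
  coord_cartesian(ylim = c(10, 35)) +                # coordinates: zoom without dropping rows
  labs(color = "Cylinders") +
  theme_minimal()                                    # theme: non-data appearance

print(p)
```

Each `+` adds one component, so a plot can be built up and inspected step by step.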

hflights: A data frame with 227,496 rows and 21 columns. The variables are described below.
o Year, Month, DayofMonth: date of departure
o DayOfWeek: day of week of departure (useful for removing weekend effects)
o DepTime, ArrTime: departure and arrival times (in local time, hhmm)
o UniqueCarrier: unique abbreviation for a carrier
o FlightNum: flight number
o TailNum: airplane tail number
o ActualElapsedTime: elapsed time of flight, in minutes
o AirTime: flight time, in minutes
o ArrDelay, DepDelay: arrival and departure delays, in minutes
o Origin, Dest: origin and destination airport codes
o Distance: distance of flight, in miles
o TaxiIn, TaxiOut: taxi in and out times in minutes
o Cancelled: cancelled indicator: 1 = Yes, 0 = No
o CancellationCode: reason for cancellation: A = carrier, B = weather, C = national air
system, D = security
o Diverted: diverted indicator: 1 = Yes, 0 = No
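Before plotting, it can help to check the data set against this description. The following is a small sketch that assumes the hflights package has been installed from CRAN; the guard simply skips the checks when it is not available.

```r
# Assumption: install.packages("hflights") has been run beforehand
if (requireNamespace("hflights", quietly = TRUE)) {
  data("hflights", package = "hflights")

  # Number of rows and columns
  print(dim(hflights))

  # Types of a few of the variables described above
  str(hflights[, c("DepDelay", "ArrDelay", "UniqueCarrier", "Cancelled")])

  # The delay columns may contain missing values, which plots silently drop
  print(sum(is.na(hflights$DepDelay)))

  # How many flights were cancelled (1) or not (0)
  print(table(hflights$Cancelled))
}
```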

Program:
library(ggplot2)
library(vcd)
library(hflights)
library(dplyr)

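# Months is stored as a factor so the axis keeps calendar order instead of alphabetical order
df = data.frame(Months = factor(c("Jan","Feb","Mar","Apr"), levels = c("Jan","Feb","Mar","Apr")),
                Sales = c(4.2,10,29.5,15))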


ggplot(data=df, aes(x=Months,y=Sales)) + geom_point()

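# bar chart: outline colored by Sales, value labels placed at mid-height of each bar
ggplot(data=df,aes(x=Months,y=Sales)) +
  geom_bar(stat = "identity", aes(color=Sales)) +
  geom_text(aes(y=Sales/2,label=Sales),color="white",size = 3.5)

# horizontal bar chart filled by month
p=ggplot(data=df,aes(x=Months,y=Sales)) +
  geom_bar(stat = "identity", aes(fill=Months)) + coord_flip()
p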

# bar plots with labels

# labels outside (just above) the bars
ggplot(data=df, aes(x=Months,y=Sales)) +
  geom_bar(stat = "identity", fill="steelblue") +
  geom_text(aes(label=Sales),vjust=-0.3,size=3.5)

# change barplot line colors by groups
# fill colors are written as six-digit hex codes, which every R version accepts
p=ggplot(data=df,aes(x=Months,y=Sales,color=Months))+
  geom_bar(stat="identity", fill=c("#999190","#EE6600","#5566BB","#880000"))
p

#change legend position


p+theme(legend.position="top")
p+theme(legend.position="bottom")

#remove legend
p + theme(legend.position="none")

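# add trend line
p=ggplot(df,aes(x=Months,y=Sales,fill=Months)) +
  geom_bar(stat="identity")

# group=1 joins all months with a single line; the text size must be positive
# linewidth needs ggplot2 3.4.0 or later (older versions use size=2 instead)
p+geom_line(aes(y=Sales),group=1,linewidth=2,color="pink")+
  geom_text(aes(label=Sales),vjust=-.5,size=3.5)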

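# dataset hflights
# flights that arrived more than 400 minutes late, colored by day of week
f_sub=filter(hflights,ArrDelay>400)
p=ggplot(f_sub,aes(x=ArrTime,y=ArrDelay,color=factor(DayOfWeek)))
p+geom_point()

# flights with DayOfWeek 6 or 7: departure delay against departure time
f_sub=filter(hflights,DayOfWeek>5)
p=ggplot(f_sub,aes(x=DepTime,y=DepDelay,color=factor(Cancelled)))
p+geom_point()+geom_smooth()+geom_line()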

# single numeric variable
# keep departure delays under 200 minutes (filter also drops missing delays)
f_sub=filter(hflights,DepDelay<200)

# histogram
ggplot(f_sub,aes(x=DepDelay)) + geom_histogram(bins=30)
# density
ggplot(f_sub,aes(x=DepDelay)) + geom_density()
# boxplot
ggplot(f_sub,aes(x="DepDelay",y=DepDelay))+ geom_boxplot()
# violin plot
ggplot(f_sub,aes(x="DepDelay",y=DepDelay)) + geom_violin(color="purple")

# comparing density with a normal curve that has the same mean and standard deviation
ggplot(f_sub,aes(x=DepDelay)) + geom_density(color="blue") +
  stat_function(fun = dnorm,
                args = list(mean=mean(f_sub$DepDelay),sd=sd(f_sub$DepDelay)),
                color="green")

# single categorical variable
f_sub=hflights%>%filter(UniqueCarrier %in% c("UA","AA","XE"))
# Cancelled is a 0/1 indicator, so it is treated as a factor for a discrete legend
p=ggplot(f_sub,aes(x=UniqueCarrier,y=DepDelay,color=factor(Cancelled)))
p+geom_point()

ggplot(f_sub,aes(x=UniqueCarrier))+
geom_bar(color="purple",fill="#800000",width=0.5)+
coord_polar(theta='y')+
xlab("Carrier Name")+
ylab("Frequency")+
ggtitle("Carrier Frequencies")
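# numeric variable: number of flights per departure time (delays under 50 minutes)
f_sub=filter(hflights,DepDelay<50)
ggplot(f_sub,aes(x=DepTime))+
  geom_bar(color='orange',width=0.5)+
  coord_flip()
# two categorical variables: flights per origin airport, stacked by carrier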
ggplot(hflights,aes(x=Origin,fill=UniqueCarrier))+
geom_bar()
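The aim also lists subplots, line styles, and a regression plot, which the program above does not show explicitly. One possible way to produce them is sketched below. It assumes only that ggplot2 is installed, uses R's built-in mtcars data instead of hflights, and the small data frame `line_df` is made up for illustration.

```r
library(ggplot2)

# Regression plot: scatter plot with a fitted straight line (dashed line style)
reg_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", formula = y ~ x,
              color = "darkred", linetype = "dashed")

# Subplots: the same plot split into one panel per number of cylinders
facet_plot <- reg_plot + facet_wrap(~ cyl, nrow = 1)

# Line styles: one line type per series (hypothetical data)
line_df <- data.frame(
  x = rep(1:5, times = 2),
  y = c(1:5, (1:5)^2),
  series = rep(c("linear", "quadratic"), each = 5)
)
line_plot <- ggplot(line_df, aes(x = x, y = y, linetype = series, color = series)) +
  geom_line() +
  scale_linetype_manual(values = c(linear = "solid", quadratic = "dotted"))

print(reg_plot)
print(facet_plot)
print(line_plot)
```

Here `facet_wrap()` plays the role of subplots: every panel shares the same axes, which makes the groups easy to compare.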

Output:

Conclusion:
In this experiment, we explored the ggplot2 library in R to create various types of
visualizations such as scatter plots, box plots, histograms, and bar charts. By customizing
colors, line styles, and other elements, we learned how to effectively represent and interpret
data, enabling better insights and decision-making.

Department of Artificial Intelligence and Data Science CSL601: Data Analytics and Visualization Lab
