Dav Exp8
EXPERIMENT NO. 8
Data Visualization: ggplot Library.
System Requirements: R.
Theory:
R is a language designed for statistical computing, graphical data analysis, and scientific
research. It is often preferred for data visualization because its packages offer flexibility
with very little code. Data visualization is the graphical representation of information and
data. Its techniques provide an important set of tools for gaining a qualitative understanding
of a dataset: they help in exploring the data and in identifying patterns, corrupt data,
outliers, and much more.
ggplot() function: initializes a ggplot object. It can be used to declare the input data frame for a
graphic and to specify the set of plot aesthetics intended to be common throughout all
subsequent layers unless specifically overridden. The data argument is the default dataset to use
for the plot. If it is not already a data.frame, it will be converted to one by fortify(). If not
specified, the data must be supplied in each layer added to the plot. The mapping argument is the
default list of aesthetic mappings to use for the plot. If not specified, a mapping must be
supplied in each layer added to the plot.
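The inheritance rule described above can be sketched as follows. This is an illustrative example only: it assumes the built-in mtcars data frame as a stand-in for any dataset, and the object names p and q are arbitrary.

```r
library(ggplot2)

# mtcars ships with R; it stands in for any data frame.
# Defaults declared in ggplot() are inherited by every layer.
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +                                # uses the default data and mapping
  geom_point(data = subset(mtcars, cyl == 8),   # layer-level data overrides the default
             colour = "red", size = 3)

# No defaults: each layer must then carry its own data and mapping.
q <- ggplot() +
  geom_point(data = mtcars, mapping = aes(x = wt, y = mpg))

print(p)
print(q)
```

In p, the second layer still inherits the x and y mapping from ggplot() and replaces only the data, so the eight-cylinder cars are drawn again in red on top of the full scatter plot.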
ggplot2 package: a free, open-source, and easy-to-use visualization package widely used in the
R programming language. It implements the Grammar of Graphics and was written by Hadley
Wickham.
Building blocks (layers) of the grammar of graphics
o Data: the data set itself
o Aesthetics: the mapping of the data onto aesthetic attributes such as x-axis, y-axis,
color, fill, size, labels, alpha, shape, line width, line type
o Geometries: how the data is displayed, using point, line, histogram, bar, boxplot
o Facets: display subsets of the data in columns and rows
o Statistics: binning, smoothing, descriptive, intermediate
o Coordinates: the space between data and display, using Cartesian, fixed, polar, limits
o Themes: all non-data ink
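The seven building blocks listed above can be combined in a single plot. The following is a minimal sketch, assuming the built-in mtcars data frame; the choice of variables and of theme_minimal() is illustrative, not taken from the experiment.

```r
library(ggplot2)

p <- ggplot(mtcars,                              # Data
            aes(x = wt, y = mpg,                 # Aesthetics
                colour = factor(cyl))) +
  geom_point() +                                 # Geometries
  facet_wrap(~ am) +                             # Facets: one panel per value of am
  stat_smooth(method = "lm", se = FALSE,         # Statistics: fitted line per group
              formula = y ~ x) +
  coord_cartesian(ylim = c(10, 35)) +            # Coordinates
  theme_minimal() +                              # Themes
  labs(colour = "Cylinders")

print(p)
```

Each block is added with +, so any one of them can be swapped (for example coord_polar() in place of coord_cartesian()) without touching the others.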
hflights: A data frame with 227,496 rows and 21 columns, provided by the hflights package
(flights that departed Houston in 2011). The variables are described as follows:
o Year, Month, DayofMonth: date of departure
Department of Artificial Intelligence and Data Science CSL601: Data Analytics and Visualization Lab
SIES Graduate School of Technology
Sri Chandrasekarendra Saraswati Vidyapuram
Sector V, Nerul, Navi Mumbai - 400706
o DayOfWeek: day of week of departure (useful for removing weekend effects)
o DepTime, ArrTime: departure and arrival times (in local time, hhmm)
o UniqueCarrier: unique abbreviation for a carrier
o FlightNum: flight number
o TailNum: airplane tail number
o ActualElapsedTime: elapsed time of flight, in minutes
o AirTime: flight time, in minutes
o ArrDelay, DepDelay: arrival and departure delays, in minutes
o Origin, Dest: origin and destination airport codes
o Distance: distance of flight, in miles
o TaxiIn, TaxiOut: taxi in and out times in minutes
o Cancelled: cancelled indicator: 1 = Yes, 0 = No
o CancellationCode: reason for cancellation: A = carrier, B = weather, C = national air
system, D = security
o Diverted: diverted indicator: 1 = Yes, 0 = No
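Because DepTime and ArrTime are stored in hhmm form, a value such as 1430 means 14:30 rather than 1,430 of anything, so plotting the raw values leaves gaps between xx59 and the next hour. A minimal sketch of a conversion to decimal hours follows; hhmm_to_hours is a hypothetical helper written for this illustration, and the sample values are made up.

```r
# Convert an hhmm-encoded clock time (e.g. 1430 for 14:30) to decimal hours.
# hhmm_to_hours is a hypothetical helper, not part of hflights or dplyr.
hhmm_to_hours <- function(hhmm) {
  hours   <- hhmm %/% 100   # integer division keeps the hour part
  minutes <- hhmm %% 100    # remainder keeps the minute part
  hours + minutes / 60
}

dep_times <- c(5, 930, 1430, 2359, NA)  # made-up values in hhmm form
hhmm_to_hours(dep_times)
```

Missing times (NA, as for cancelled flights) pass through unchanged, so the helper could be applied to a whole column before plotting.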
Program:
library(ggplot2)
library(vcd)
library(hflights)
library(dplyr)
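# df is assumed to be a data frame with columns Months and Sales;
# it is not defined in this listing.
# Labels inside bars
ggplot(data = df, aes(x = Months, y = Sales)) +
  geom_bar(stat = "identity", aes(color = Sales)) +
  geom_text(aes(label = Sales), vjust = 1.6, color = "white", size = 3.5)
# Horizontal bars, filled by month
p <- ggplot(data = df, aes(x = Months, y = Sales)) +
  geom_bar(stat = "identity", aes(fill = Months)) + coord_flip()
# Labels outside bars
ggplot(data = df, aes(x = Months, y = Sales)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = Sales), vjust = -0.3, size = 3.5)
# Legend position
p + theme(legend.position = "top")
p + theme(legend.position = "bottom")
# Remove legend
p + theme(legend.position = "none")
# Line and labels over the bars (text size must be positive;
# in ggplot2 >= 3.4.0 the line width is set with linewidth instead of size)
p + geom_line(aes(y = Sales), group = 1, size = 2, color = "pink") +
  geom_text(aes(label = Sales), vjust = -0.5, size = 3.5)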
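# Dataset: hflights
# Arrival delays over 400 minutes, colored by day of week
f_sub <- filter(hflights, ArrDelay > 400)
p <- ggplot(f_sub, aes(x = ArrTime, y = ArrDelay, color = factor(DayOfWeek)))
p + geom_point()
# Flights with DayOfWeek > 5, colored by cancellation status
f_sub <- filter(hflights, DayOfWeek > 5)
p <- ggplot(f_sub, aes(x = DepTime, y = DepDelay, color = factor(Cancelled)))
p + geom_point() + geom_smooth() + geom_line()
# Carrier frequencies in polar coordinates
ggplot(f_sub, aes(x = UniqueCarrier)) +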
  geom_bar(color = "purple", fill = "#800000", width = 0.5) +
  coord_polar(theta = "y") +
  xlab("Carrier Name") +
  ylab("Frequency") +
  ggtitle("Carrier Frequencies")
# Numeric variable: count of flights per departure time (DepDelay < 50)
f_sub <- filter(hflights, DepDelay < 50)
ggplot(f_sub, aes(x = DepTime)) +
  geom_bar(color = "orange", width = 0.5) +
  coord_flip()
# Two categorical variables: origin airport, stacked by carrier
ggplot(hflights, aes(x = Origin, fill = UniqueCarrier)) +
  geom_bar()
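The bar-chart part of the program refers to a data frame df with columns Months and Sales that the listing never defines. A minimal sketch of one way to make that part runnable is given here; the months and sales figures are hypothetical values made up for illustration and are not the data used in the experiment.

```r
library(ggplot2)

# Hypothetical data: these months and sales figures are made up for illustration.
df <- data.frame(
  Months = factor(c("Jan", "Feb", "Mar", "Apr"),
                  levels = c("Jan", "Feb", "Mar", "Apr")),  # keep calendar order on the axis
  Sales  = c(120, 95, 140, 110)
)

p <- ggplot(data = df, aes(x = Months, y = Sales)) +
  geom_bar(stat = "identity", aes(fill = Months)) +          # bar height taken from Sales
  geom_text(aes(label = Sales), vjust = -0.3, size = 3.5) +  # labels just above each bar
  theme(legend.position = "bottom")

print(p)
```

Storing Months as a factor with explicit levels keeps the bars in calendar order; a plain character column would be sorted alphabetically on the axis.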
Output:
Conclusion:
In this experiment, we explored the ggplot2 library in R to create several types of
visualizations, including bar charts, scatter plots with smoothing, line overlays, and
polar-coordinate charts. By customizing colors, labels, legends, and other elements, we learned
how to represent and interpret data effectively, enabling better insights and decision-making.