Fds Manual
Department of Computer Science and Business System
Laboratory Manual
Academic Year: 2024-25
Regulation: R21
Note: The programs are written with the aim of imparting basic knowledge to students.
Writing a program is similar to cooking a dish: you must know what you want to cook,
what ingredients are needed, and how to use them to create a delicious dish.
We hope you enjoy programming.
Department Vision
To become a centre of excellence with focused research and innovation, and to stand as an
exemplary institute for Computer Science and Business System by enabling students to
develop enthralling industrial and management skills.
DM-1: Provide a rigorous theoretical and practical framework, supported by state-of-the-art
infrastructure, with an emphasis on software development.
DM-2: Impart the skills necessary to amplify the pedagogy to grow technically and to meet
interdisciplinary needs with collaborations and innovative research abilities and societal
needs.
DM-3: To develop globally competent engineers with excellent managerial skills to become
leaders and entrepreneurs through quality pedagogy.
DM-4: To evolve as a centre of excellence in the field of interdisciplinary engineering
research and practice.
Program Outcomes:
Engineering graduates will be able to:
Course Outcomes:
1. Illustrate the use of various data structures.
2. Analyze and manipulate data using Pandas.
3. Create static, animated, and interactive visualizations using Matplotlib.
4. Understand the implementation procedures for machine learning algorithms.
5. Apply appropriate data sets to machine learning algorithms and identify
appropriate algorithms to solve real-world problems.
Fundamentals of Data Science Lab
Dept. of Computer Science and Business System Lab Manual 2024 – 25
LIST OF EXPERIMENTS
Program No.   Name of the Program
1             R AS CALCULATOR APPLICATION
              a. Using with and without R objects on console
              b. Using mathematical functions on console
              c. Write an R script to create R objects for a calculator application and
                 save them in a specified location on disk
2             DESCRIPTIVE STATISTICS IN R
              a. Write an R script to find basic descriptive statistics using summary
              b. Write an R script to find a subset of a dataset by using subset()
5             REGRESSION MODEL
              Import data from web storage. Name the dataset and perform
              logistic regression to find the relation between the variables that
              affect the admission of a student to an institute, based on his or her
              GRE score, GPA obtained and rank of the student. Also check whether the
              model is a good fit. require(foreign), require(MASS).
8             CLASSIFICATION MODEL
              a. Install relevant packages for classification.
              b. Choose a classifier for classification problems.
              c. Evaluate the performance of the classifier.
9             CLUSTERING MODEL
              a. Clustering algorithms for unsupervised classification.
              b. Plot the cluster data using R visualizations.
COs/POs   Program Outcomes: 1 2 3 4 5 6 7 8 9 10 11 12   Program Specific Outcomes: 1 2 3
CO1
CO2
CO3
CO4
CO5
LAB INSTRUCTIONS
Students should report to the concerned lab as per the timetable.
Students who turn up late to the lab will in no case be permitted to do the program
scheduled for the day.
After completing the program, students must get it certified in the observation book
by the staff member in charge.
Students should bring a notebook of 150 pages and enter the output/observations
in the notebook while performing the experiment.
The record of observations, along with the detailed experimental procedure, should be
submitted in the immediate next session and certified by the staff member in charge.
Students should be present in the lab for the total scheduled duration.
Students are required to thoroughly prepare the algorithm for the experiment
before coming to the laboratory.
System Requirements
Intel-based desktop PC with a 2.6 GHz or faster processor, at least 1 GB RAM,
40 GB of free disk space, and a LAN connection.
Operating system: any flavor of Windows or UNIX
Software: RStudio IDE and R
1. R AS CALCULATOR APPLICATION
Using with and without R objects on console
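As a minimal sketch of what "with and without R objects" can look like on the console, the lines below first evaluate expressions directly and then store values in objects before computing with them. The object names a, b, total and ratio are illustrative assumptions, not names taken from the manual.

```r
# Without R objects: expressions are typed and evaluated directly
5 + 3
10 / 4
sqrt(16)        # a built-in mathematical function

# With R objects: values are stored first and then reused
# (a, b, total and ratio are illustrative names)
a <- 5
b <- 3
total <- a + b
print(total)
ratio <- a / b
print(round(ratio, 2))
```

Storing intermediate values in objects makes it possible to reuse them in later steps of a script.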
# Program to make a simple calculator that can add, subtract, multiply and divide using functions
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}
# Take input from the user
print("Select operation.")
print("1.Add")
print("2.Subtract")
print("3.Multiply")
print("4.Divide")
choice <- as.integer(readline(prompt = "Enter choice[1/2/3/4]: "))
num1 <- as.integer(readline(prompt = "Enter first number: "))
num2 <- as.integer(readline(prompt = "Enter second number: "))
# Map the numeric choice (1 to 4) to its operator symbol and to the computed result
operator <- switch(choice, "+", "-", "*", "/")
result <- switch(choice, add(num1, num2), subtract(num1, num2), multiply(num1, num2),
                 divide(num1, num2))
print(paste(num1, operator, num2, "=", result))
Output
2. DESCRIPTIVE STATISTICS IN R
a. Write an R script to find basic descriptive statistics using summary
# Example vector: 1 2 3 4 5 6 7 8 9 10
Now we can use the summary() function to calculate summary statistics of our
vector:
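A minimal sketch of this step, assuming the vector holds the integers 1 to 10 as the comment line above suggests. The names x and s are illustrative.

```r
# Assumed example vector: the integers 1 to 10
x <- 1:10

# summary() reports the minimum, quartiles, median, mean and maximum
s <- summary(x)
print(s)
```

For a numeric vector, summary() returns six values: Min., 1st Qu., Median, Mean, 3rd Qu. and Max.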
# Read tabular data into R
read.table(file, header = FALSE, sep = "", dec = ".")
# Read "comma separated value" files (".csv")
read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
# Or use read.csv2: variant used in countries that
# use a comma as decimal point and a semicolon as field separator.
read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)
# Read TAB delimited files
read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
read.delim2(file, header = TRUE, sep = "\t", dec = ",", ...)
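The reading functions listed above can be tried without any external file. The sketch below writes a small, made-up data frame to a temporary CSV file and reads it back with read.csv(). The data frame students and the path csv_path are assumptions for illustration only.

```r
# Hypothetical data, for illustration only
students <- data.frame(name = c("A", "B", "C"), marks = c(72, 85, 90))

# Write the data to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(students, csv_path, row.names = FALSE)

# Read the file back; header = TRUE treats the first line as column names
loaded <- read.csv(csv_path, header = TRUE)
str(loaded)
summary(loaded$marks)

# Remove the temporary file
unlink(csv_path)
```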
VISUALIZATIONS
Syntax
The basic syntax to create a boxplot in R is −
boxplot(x, data, notch, varwidth, names, main)
Following is the description of the parameters used −
x is a vector or a formula.
data is the data frame.
notch is a logical value. Set to TRUE to draw a notch.
varwidth is a logical value. Set to TRUE to draw the width of each box in proportion
to the sample size.
names are the group labels which will be printed under each boxplot.
main is used to give a title to the graph.
Example
We use the data set "mtcars" available in the R environment to create a basic
boxplot. Let's look at the columns "mpg" and "cyl" in mtcars.
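A minimal sketch of this example, using the formula interface so that mpg is grouped by cyl. The axis labels and the title are assumptions chosen for illustration.

```r
# Look at the two columns used in the plot
input <- mtcars[, c("mpg", "cyl")]
print(head(input))

# One box per cylinder count; labels and title are illustrative
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Number of Cylinders",
              ylab = "Miles Per Gallon",
              main = "Mileage Data")
```

boxplot() also returns the statistics it used invisibly, so the value stored in bp can be inspected afterwards (for example bp$n gives the number of cars in each group).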
Scatterplots
Scatterplots show many points plotted in the Cartesian plane. Each point represents the values
of two variables. One variable is plotted on the horizontal axis and the other on the vertical axis.
A simple scatterplot is created using the plot() function.
Syntax
The basic syntax for creating a scatterplot in R is:
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Example
We use the data set "mtcars" available in the R environment to create a basic
scatterplot. Let's use the columns "wt" and "mpg" in mtcars.
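A minimal sketch of this example. The axis labels and the title are assumptions for illustration, and the axis limits are simply taken from the range of the data.

```r
# Select the two columns of interest
input <- mtcars[, c("wt", "mpg")]
print(head(input))

# Weight on the horizontal axis, mileage on the vertical axis
plot(x = input$wt, y = input$mpg,
     xlab = "Weight",
     ylab = "Mileage",
     xlim = range(input$wt),
     ylim = range(input$mpg),
     main = "Weight vs Mileage")
```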
OUTPUT
Scatterplot Matrices
When we have more than two variables and want to see how each variable relates to
the remaining ones, we use a scatterplot matrix. The pairs() function creates
matrices of scatterplots.
Syntax
The basic syntax for creating scatterplot matrices in R is −
pairs(formula, data)
Following is the description of the parameters used −
formula represents the series of variables used in pairs.
data represents the data set from which the variables will be taken.
Example
Each variable is paired up with each of the remaining variables. A scatterplot is
plotted for each pair.
pairs(~wt+mpg+disp+cyl, data = mtcars)
OUTPUT:
Outlier Analysis
The first step is to detect whether the dataset contains outliers.
The source example makes use of the Bike Rental Count Prediction dataset.
x <- data.frame(mtcars)
print(x)
Prior to outlier detection, we perform a missing value analysis to check for
the presence of any NA (missing) values. For this, we use the
sum(is.na(data)) expression.
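The two steps can be sketched as follows on the built-in mtcars data. The choice of the hp column and the use of boxplot.stats() (which flags points beyond 1.5 times the interquartile range from the box) are assumptions made for illustration; the manual does not say which method it uses.

```r
x <- data.frame(mtcars)

# Step 1: missing value analysis (counts NA entries in the whole data frame)
missing_count <- sum(is.na(x))
print(missing_count)

# Step 2: outlier detection on one column (hp is an illustrative choice)
# boxplot.stats() returns the points lying beyond the whiskers in $out
hp_outliers <- boxplot.stats(x$hp)$out
print(hp_outliers)

# The same points appear as individual dots in a boxplot
boxplot(x$hp, main = "Horsepower", ylab = "hp")
```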
Histograms
Simple Histogram
CODE: Draw a histogram on the mtcars dataset
hist(mtcars$mpg)
OUTPUT:
h <- hist(mtcars$mpg, breaks = 10, col = "red", xlab = "Miles Per Gallon", main = "Histogram")
Output
install.packages("ggplot2")
library(ggplot2)
CODE Explanation
You pass the dataset mtcars to ggplot().
Inside the aes() argument, you map the x-axis to a factor variable (cyl).
The + sign adds the next layer to the plot. Ending a line with + tells R to keep
reading, which lets you break the code across lines and makes it more readable.
Use geom_bar() for the geometric object.
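The explanation above corresponds to a call of roughly the following shape. This is a sketch under the assumption that the bar chart counts cars per cylinder group; the guard with requireNamespace() is an addition so that the script does not stop if ggplot2 is not installed.

```r
# Bar heights that geom_bar() will draw: number of cars per cylinder count
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Dataset -> aesthetic mapping -> geometric object
  p <- ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar()
  print(p)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}
```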
Output
CODE Explanation
The slices vector gives the weight assigned to each country.
The lbls vector holds the country names used as labels.
The pie chart is drawn using the pie() function.
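A minimal sketch matching this explanation. The numeric weights, the country names and the chart title are hypothetical values chosen for illustration; only the variable names slices and lbls come from the explanation.

```r
# Hypothetical weights and country labels, for illustration only
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")

# Each slice's angle is proportional to its share of sum(slices)
pie(slices, labels = lbls, main = "Pie Chart of Countries")
```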
PROBLEM DEFINITION:
a) How to find a correlation matrix and plot the correlation on the iris data set
R SOURCE CODE:
d <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
cor(d)
m <- cor(d)  # get correlations
library(corrplot)
corrplot(m, method = "square")
# Correlation between the columns of two matrices (5 rows each)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
COR <- cor(x, y)
COR
OUTPUT:
4.b) Draw a correlation plot that gives an overview of the relationships
among the variables in the iris data set.
R-Code
library(ggplot2)
library(tidyr)
library(datasets)
data("iris")
summary(iris)
Create a correlation matrix of the iris dataset using the plot_correlation()
function from the DataExplorer package. Include only continuous variables in the
correlation plot, because factor variables do not make sense in a correlation plot.
library(DataExplorer)
library(corrplot)
Output:
corrplot 0.92 loaded
Correlation plot
plot_correlation(iris, type = "continuous", title = "matrix_iris")
Output:
The correlation coefficient between Petal Length and Petal Width is 0.96. The correlation
coefficient between Sepal Length and Sepal Width is -0.12, which indicates that Sepal Length
and Sepal Width have a weak negative correlation. Because 0.96 is much larger in magnitude
than 0.12, Petal Length and Petal Width are far more strongly correlated than
Sepal Length and Sepal Width.
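The two coefficients quoted above can be checked with base R alone. The sketch below keeps only the numeric columns of iris and computes their correlation matrix; the names num_iris and m_iris are illustrative.

```r
data("iris")

# Keep only the continuous (numeric) columns; Species is a factor and is dropped
num_iris <- iris[, sapply(iris, is.numeric)]

# Pearson correlation matrix, rounded to two decimals for reading
m_iris <- cor(num_iris)
print(round(m_iris, 2))

# The two pairs discussed in the text
print(round(m_iris["Petal.Length", "Petal.Width"], 2))
print(round(m_iris["Sepal.Length", "Sepal.Width"], 2))
```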
# Data Frame
df <- as.data.frame(cbind(IQ, result))
Output:
IQ result
1 25.46872 0
2 26.72004 0
3 27.16163 0
4 27.55291 1
5 27.72577 0
6 28.00731 0
7 28.18095 0
8 28.28053 0
9 28.29086 0
10 28.34474 1
11 28.35581 1
12 28.40969 0
13 28.72583 0
14 28.81105 0
15 28.87337 1
16 29.00383 1
17 29.01762 0
18 29.03629 0
19 29.18109 1
20 29.39251 0
21 29.40852 0
22 29.78844 0
23 29.80456 1
24 29.81815 0
25 29.86478 0
26 29.91535 1
27 30.04204 1
28 30.09565 0
29 30.28495 1
30 30.39359 1
31 30.78886 1
32 30.79307 1
33 30.98601 1
34 31.14602 0
35 31.48225 1
36 31.74983 1
37 31.94705 1
38 31.94772 1
39 33.63058 0
40 35.35096 1
Call:
glm(formula = result ~ IQ, family = binomial, data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1451 -0.9742 -0.4950 1.0326 1.7283
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -16.8093 7.3368 -2.291 0.0220 *
IQ 0.5651 0.2482 2.276 0.0228 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
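The code that produced IQ and result is not shown in this manual, so the sketch below re-enters the 40 rows from the printed data frame and fits the same glm() formula that appears in the Call line. Because the printed values are rounded to five decimals, the estimates may differ slightly from those in the summary above. The object name fit and the prediction at IQ = 30 are illustrative additions.

```r
# Values copied from the printed data frame above (rounded as printed)
IQ <- c(25.46872, 26.72004, 27.16163, 27.55291, 27.72577, 28.00731, 28.18095,
        28.28053, 28.29086, 28.34474, 28.35581, 28.40969, 28.72583, 28.81105,
        28.87337, 29.00383, 29.01762, 29.03629, 29.18109, 29.39251, 29.40852,
        29.78844, 29.80456, 29.81815, 29.86478, 29.91535, 30.04204, 30.09565,
        30.28495, 30.39359, 30.78886, 30.79307, 30.98601, 31.14602, 31.48225,
        31.74983, 31.94705, 31.94772, 33.63058, 35.35096)
result <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
            1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
            0, 0, 1, 0, 0, 1, 1, 0, 1, 1,
            1, 1, 1, 0, 1, 1, 1, 1, 0, 1)

# Data frame, built the same way as in the manual
df <- as.data.frame(cbind(IQ, result))

# Logistic regression: probability of result = 1 as a function of IQ
fit <- glm(result ~ IQ, family = binomial, data = df)
print(summary(fit))

# Predicted probability at a hypothetical IQ value of 30
print(predict(fit, newdata = data.frame(IQ = 30), type = "response"))
```

A positive coefficient for IQ means that the estimated probability of result = 1 increases as IQ increases.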