Big Data Manual - Fall 2023

Uploaded by

rukshana

SKYLINE UNIVERSITY COLLEGE

School of Computing
University City of Sharjah

COURSE CODE SIT4112

COURSE NAME BIG DATA ANALYTICS
STUDENT ID 17084
STUDENT NAME MOUFI OWAIDA

BIG DATA ANALYTICS

Lab Manual
SIT4112

EXPERIMENT INDEX

COURSE LEARNING OUTCOMES

Upon the successful completion of the course, the student will be able to:
CLO1: Describe the tools and methods suitable for big data analysis.
CLO2: Analyze methods to identify, classify and extract information from
unstructured datasets.
CLO3: Evaluate the techniques to interpret data using visual information.

Experiment       CLO Covered
Experiment #1    CLO3
Experiment #2    CLO3
Experiment #3    CLO3
Experiment #4    CLO3
Experiment #5    CLO3
Experiment #6    CLO3
Experiment #7    CLO3
Experiment #8    CLO3


MARKS BREAKDOWN

Question                     CLO Covered    Max Marks    Marks Obtained

Lab Manual - Experiment-1    CLO3
Lab Manual - Experiment-2    CLO3
Lab Manual - Experiment-3    CLO3
Lab Manual - Experiment-4    CLO3
Lab Manual - Experiment-5    CLO3
Lab Manual - Experiment-6    CLO3
Lab Manual - Experiment-7    CLO3
Lab Manual - Experiment-8    CLO3

Total Marks

Instructor Sign


EXPERIMENT NO:1

Name of experiment: - Installing Hadoop

Installing Hadoop involves several steps. Below are the high-level steps for installing Hadoop on a
Unix-like system (e.g., Linux) or the Windows Subsystem for Linux (WSL) on Windows. Installing
Hadoop directly on Windows can be challenging, so using WSL is recommended for Windows users.

Prerequisites:

1. Java: Ensure you have Java (preferably Java 8 or later) installed on your system. You can
check by running java -version.

Installation Steps:

1. Installing Java

● Open the Oracle Java download page.
● Choose Java 8 for Windows.
● Download the installer.
● Create a folder named Java on the C drive and install Java into it.
● If the installer placed the JDK under Program Files, cut the jdk folder and paste it into C:\Java, so that the path contains no spaces.
● Edit the system environment variables.
● Create a NEW environment variable named JAVA_HOME with the value C:\Java\jdk1.8.0_281 (the JDK folder itself, not its bin subfolder).
● EDIT the Path variable and add the NEW entry C:\Java\jdk1.8.0_281\bin.
● Verify Installation:
Open cmd and check the Java version with java -version.

Download Hadoop:

Visit the Apache Hadoop website (https://hadoop.apache.org/) and download the Hadoop
distribution that suits your needs. Choose a stable version and download the tarball (e.g.,
hadoop-x.y.z.tar.gz).

2. Extract Hadoop and configure its environment:

Extract the downloaded tarball to your preferred directory (here, the C drive) and rename the
extracted folder to hadoop, so that the installation lives in C:\hadoop.

Navigate to the Hadoop installation directory and configure the Hadoop environment by editing
the etc/hadoop/hadoop-env.sh file (hadoop-env.cmd on Windows). Set the JAVA_HOME variable to
the JDK folder C:\Java\jdk1.8.0_281, not to its bin subfolder.

For example: JAVA_HOME=C:\Java\jdk1.8.0_281
(in hadoop-env.cmd: set JAVA_HOME=C:\Java\jdk1.8.0_281)

● Edit the system environment variables.
● Create a NEW environment variable named HADOOP_HOME with the value C:\hadoop (the Hadoop folder itself, not its bin subfolder).
● EDIT the Path variable and add the NEW entries C:\hadoop\bin and C:\hadoop\sbin.
● Verify Installation:
o Open cmd and check that Hadoop is found by running:
o hadoop version

3. Edit Hadoop Configuration Files:

Hadoop's configuration files are located in the etc/hadoop directory. You'll need to
modify these files to suit your cluster setup. In each file, the <property> elements shown
here go inside the file's <configuration> element. Key configuration files include:

● core-site.xml: Configure core properties such as the Hadoop distributed file system URI.

# For core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

● hdfs-site.xml: Configure HDFS-specific properties such as the block size and replication
factor.
Create a folder named data inside the Hadoop folder.
Inside the data folder, create two folders named namenode and datanode.

# For hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop\data\datanode</value>
</property>

● mapred-site.xml: Configure properties related to the MapReduce framework.

# For mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

● yarn-site.xml: Configure properties for YARN, the Hadoop resource manager.

# For yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

You can find templates for these files in the etc/hadoop directory. Copy them to the
same directory and make your changes.
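
As a rough illustration of how the <property> entries fit into a complete file, the following Python sketch renders a <configuration> document from a dictionary. The helper name build_site_xml is hypothetical and not part of Hadoop; only the fs.defaultFS value comes from this manual.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return a Hadoop-style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# The single property this manual sets in core-site.xml
core_site = build_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
print(core_site)
```

The printed text could be saved as etc/hadoop/core-site.xml; the same helper would work for the other three files by passing their property names and values.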

Fix of Hadoop 'bin' Folder

Next, replace the Hadoop bin folder with another bin folder that already contains all the
files properly configured. First, download the compressed file hadoop3_xFixedbin.rar. Then
delete the existing bin folder in the Hadoop directory and put the extracted bin folder in its place.

Initialize HDFS:

Before starting Hadoop services, you need to initialize the HDFS filesystem. Run the following
command:

hdfs namenode -format

Start Hadoop Services:

● Start the Hadoop services with the sbin/start-all.sh script (start-all.cmd on Windows). This starts
HDFS (NameNode and DataNode) and YARN (ResourceManager and NodeManager).

Verify Installation:

● Open a web browser and go to the Hadoop ResourceManager web interface at
http://localhost:8088 to verify that the ResourceManager is running.
● You can also use Hadoop shell commands (hadoop fs -ls /) to interact with HDFS
and ensure it's working as expected.


Manage Hadoop Services:

● You can stop all Hadoop services using the sbin/stop-all.sh script (stop-all.cmd on Windows).

Optional: If you plan to set up a multi-node cluster, you'll need to configure each node in a similar
manner. Ensure that all nodes can communicate with each other and update the configuration files
accordingly.

EXPERIMENT NO:2

Name of experiment: - HDFS Commands

HDFS is the primary storage component of the Hadoop ecosystem. It stores large
structured or unstructured data sets across multiple nodes and keeps the associated
metadata in the form of log files. To use the HDFS commands, first start the Hadoop
services using the following command:

sbin/start-all.sh

On Windows:

sbin\start-all.cmd

To check that the Hadoop services are up and running, use the following command:

jps

Commands:

1. ls: This command is used to list all the files.

Syntax:

bin/hdfs dfs -ls <path>

Example:

bin/hdfs dfs -ls /

It prints all the files and directories present at the given HDFS path. The bin directory contains the
executables, so bin/hdfs runs the hdfs executable and dfs selects its Distributed File System commands.

2. mkdir: To create a directory. In Hadoop dfs there is no home directory by default. So let's
first create it.

Syntax:

bin/hdfs dfs -mkdir <folder name>

Creating the home directory:

bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/username -> write the username of your computer

Example:
bin/hdfs dfs -mkdir /myfolder => '/' means absolute path

3. touchz: It creates an empty file.

Syntax:

bin/hdfs dfs -touchz <file_path>

Example:
bin/hdfs dfs -touchz /myfolder/myfile.txt

4. copyFromLocal (or) put: To copy files/folders from local file system to hdfs store. This is the
most important command. Local filesystem means the files present on the OS.

Syntax:

bin/hdfs dfs -copyFromLocal <local file path> <dest(present on hdfs)>

Example: Let’s suppose we have a file AI.txt on Desktop which we want to copy to folder geeks
present on hdfs.

bin/hdfs dfs -copyFromLocal ../Desktop/AI.txt /geeks

(OR)

bin/hdfs dfs -put ../Desktop/AI.txt /geeks


5. cat: To print file contents.

Syntax:

bin/hdfs dfs -cat <path>

Example:

// print the content of AI.txt present inside the geeks folder.

bin/hdfs dfs -cat /geeks/AI.txt

6. copyToLocal (or) get: To copy files/folders from hdfs store to local file system.

Syntax:

bin/hdfs dfs -copyToLocal <srcfile(on hdfs)> <local file dest>

Example:

bin/hdfs dfs -copyToLocal /geeks/myfile.txt ../Desktop/hero

(OR)

bin/hdfs dfs -get /geeks/myfile.txt ../Desktop/hero

myfile.txt from the geeks folder will be copied to the folder hero present on Desktop.

7. moveFromLocal: This command will move file from local to hdfs.

Syntax:

bin/hdfs dfs -moveFromLocal <local src> <dest(on hdfs)>

Example:

bin/hdfs dfs -moveFromLocal ../Desktop/cutAndPaste.txt /geeks


8. cp: This command is used to copy files within hdfs. Let's copy folder geeks to geeks_copied.

Syntax:

bin/hdfs dfs -cp <src(on hdfs)> <dest(on hdfs)>

Example:

bin/hdfs dfs -cp /geeks /geeks_copied

9. mv: This command is used to move files within hdfs. Let's cut-paste a file myfile.txt from geeks
folder to geeks_copied.

Syntax:
bin/hdfs dfs -mv <src(on hdfs)> <dest(on hdfs)>

Example:
bin/hdfs dfs -mv /geeks/myfile.txt /geeks_copied

10. rmr: This command deletes a file or directory from HDFS recursively. It is a very useful command when you
want to delete a non-empty directory. Recent Hadoop releases mark -rmr as deprecated in favor of -rm -r.

Syntax:

bin/hdfs dfs -rmr <filename/directoryName>

Example:

bin/hdfs dfs -rmr /geeks_copied -> It will delete all the content inside the
directory then the directory itself.


11. du: It will give the size of each file in directory.

Syntax:

bin/hdfs dfs -du <dirName>

Example:

bin/hdfs dfs -du /geeks

12. dus: This command will give the total size of directory/file. Recent Hadoop releases mark -dus as deprecated in favor of -du -s.

Syntax:

bin/hdfs dfs -dus <dirName>

Example:

bin/hdfs dfs -dus /geeks

13. stat: It will give the last modified time of directory or path. In short it will give stats of the
directory or file.

Syntax:

bin/hdfs dfs -stat <hdfs file>

Example:

bin/hdfs dfs -stat /geeks
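
The commands in this experiment can also be issued from a script. The sketch below assumes that an hdfs executable is on the PATH and that a local file named AI.txt exists; when hdfs cannot be found it only prints the command it would have run. The helper names hdfs_dfs and run are illustrative and not part of Hadoop.

```python
import shutil
import subprocess


def hdfs_dfs(*args):
    """Build the argument list for one `hdfs dfs` call, e.g. hdfs_dfs("-ls", "/")."""
    return ["hdfs", "dfs", *args]


def run(cmd):
    """Run the command if the executable is on PATH; otherwise just print it."""
    exe = shutil.which(cmd[0])
    if exe is None:
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run([exe, *cmd[1:]], capture_output=True, text=True)


# The same sequence as in the text: create a folder, upload a file, list it, print it
commands = [
    hdfs_dfs("-mkdir", "-p", "/geeks"),
    hdfs_dfs("-put", "AI.txt", "/geeks"),
    hdfs_dfs("-ls", "/geeks"),
    hdfs_dfs("-cat", "/geeks/AI.txt"),
]
for cmd in commands:
    run(cmd)
```

Keeping each command as a list of arguments, rather than one string, avoids quoting problems when a path contains spaces.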


EXPERIMENT NO. 3

Name of experiment: Running a MapReduce Job

To run a MapReduce job to find the word count.

MapReduce in Hadoop is a software framework that makes it easier to write applications that
process huge amounts of data. MapReduce distributes the workload among
various nodes. This reduces the processing time, because the data on which the computation needs to be
done is divided into small chunks that are processed individually. Through MapReduce you can
achieve parallel processing, resulting in faster execution of the job.

A MapReduce word count job splits the input data into chunks, which the map tasks process. The framework
sorts the map outputs and passes them as input to the reduce tasks. A file system stores the input and
output of jobs. The framework also schedules tasks, monitors them and re-executes failed tasks.

Figure below shows the architecture as well as working of MapReduce with an example:


Splitting: The input is split into records. The delimiter can be anything: a comma, a space, a new line or a semicolon.

Mapping: Each mapper processes one split and emits a key-value pair (word, 1) for every word it reads.

Shuffle/Intermediate splitting: The map outputs are grouped by key, usually in parallel across the cluster, so
that all the values with the same key arrive at the same reducer.

Reducing: Each reducer adds up the values for its keys to obtain the count of each word.

Final result: All the reducer outputs are combined to show the complete result.
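
The phases can be imitated on a single machine to see what each one produces. The following is a minimal Python sketch, not Hadoop code: the three input lines are made up, and the function names map_phase, shuffle_phase and reduce_phase are chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: add up the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


# Made-up input; each string plays the role of one split
lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]

mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(sorted(word_counts.items()))
# [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

In a real job the mappers and reducers run on different nodes and the shuffle moves data over the network; here everything happens in one process.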

First, we will import our dataset into the HDFS (Hadoop Distributed File System).

The dataset can be a simple txt file with some words or sentences written in it.

Please follow these steps to import the dataset:

Step 1. Download and extract the compressed file of the text-based dataset from a source and save
it as a txt file in your local storage.

Step 2. Create a directory in HDFS using the mkdir command from Experiment 2.

Step 3. Copy the txt file from the local directory into HDFS (copyFromLocal or put).

Step 4: Run MapReduceClient.jar, providing the input and output directories.

Step 5: Verify the content of the copied file.

EXPERIMENT NO. 4

Name of experiment: Understanding Big Data Analytics Types.

In this lab, students will import different files and perform the following tasks:

● Data Preparation

● Descriptive Statistics

● Correlation Matrix

● Data Visualization using Histogram, Scatter Plot, Barchart etc.

Code and Output
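
As a small sketch of the descriptive statistics and correlation matrix tasks, the code below uses only the Python standard library on a made-up two-column dataset. The column names spend and sales and their values are assumptions for illustration; with pandas, DataFrame.describe() and DataFrame.corr() give comparable summaries for an imported file.

```python
import math
import statistics

# Hypothetical data: advertising spend and sales for five stores
data = {
    "spend": [10.0, 20.0, 30.0, 40.0, 50.0],
    "sales": [25.0, 41.0, 62.0, 79.0, 103.0],
}


def describe(values):
    """Descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


for name, values in data.items():
    print(name, describe(values))

# Correlation matrix as a nested dictionary: corr_matrix[a][b]
columns = list(data)
corr_matrix = {a: {b: pearson(data[a], data[b]) for b in columns} for a in columns}
for a in columns:
    print(a, [round(corr_matrix[a][b], 3) for b in columns])
```

A value close to 1 off the diagonal means the two columns rise together, which is what a scatter plot of the same columns would show as points near a straight line.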


EXPERIMENT NO:5

Name of experiment: - CLUSTERING

Goal: Understanding the intuition of Clustering using K-means clustering

Theory: Imagine a dataset consisting of several points spread over an n-dimensional space. In order
to find patterns among data points in an n-dimensional space, we use unsupervised methods.

One of the most popular unsupervised methods is clustering. Clustering is the task of grouping
together a set of objects such that objects in the same cluster are more similar to each
other than to objects in different clusters.

Clustering algorithms can be categorized based on their cluster model, in other words on how they
form clusters or groups. Some of the prominent families of clustering algorithms are connectivity-based
clustering, centroid-based clustering, distribution-based clustering and density-based methods.

In this exercise, centroid-based clustering is implemented. In this type of clustering, clusters
are represented by a central vector or a centroid. This centroid might not necessarily be a
member of the dataset. This is an iterative clustering algorithm where the notion of similarity
is derived from how close the data point is to the center of the cluster. In this exercise, we will be
working on mall customers data.

Software Tools: Python - Jupyter Notebook


Procedure:

1. Import the required libraries and load dataset

2. Use scatter plot to visualize the expected clusters.

3. Apply KMeans to predict the clusters.

4. Pre-process the dataframe using min max scaler

5. Predict the clusters again and analyze the issues while preparing clusters.

6. Use Elbow technique to show number of clusters that can be formed.
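
The procedure can be sketched without any libraries to show what scaling, K-means and the elbow technique do. This is an illustrative sketch only: the nine (annual income, spending score) pairs are made up rather than taken from the mall customers file, and the farthest-point initialisation is an assumption chosen so the result is reproducible. In the notebook itself, scikit-learn's MinMaxScaler and KMeans would normally do this work.

```python
def min_max_scale(points):
    """Rescale every feature (column) to the range [0, 1]."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid dividing by zero
    return [tuple((v - lo) / span for v, lo, span in zip(p, lows, spans)) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point start (an assumption made for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the centroid to the mean of its members; keep it if the cluster is empty
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical (annual income, spending score) pairs
customers = [(15, 80), (16, 78), (18, 82), (60, 50), (62, 48), (58, 52), (100, 15), (105, 12), (98, 18)]
scaled = min_max_scale(customers)

centroids, labels, inertia = kmeans(scaled, k=3)
print("labels for k=3:", labels)

# Elbow technique: inertia drops sharply until k reaches the natural number of clusters
inertias = {k: kmeans(scaled, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 4))
```

Scaling matters because income and spending score are measured on different ranges; without it, the feature with the larger range dominates the distance.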

CODE + DISCUSSION


Output


EXPERIMENT NO: 6

Goal: Introduction to Market basket analysis using Apriori Algorithm.

Theory: Market Basket Analysis is one of the key techniques used by large retailers to uncover
associations between items. It works by looking for combinations of items that occur together
frequently in transactions. To put it another way, it allows retailers to identify relationships between
the items that people buy.

Software Tools: Python - Jupyter Notebook

Example: You are a data scientist (or becoming one!), and you get a client who runs a retail store.
Your client gives you data for all transactions, consisting of items bought in the store by several
customers over a period of time, and asks you to use that data to help boost their business. Your
client will use your findings not only to change/update/add items in inventory but also to
change the layout of the physical store or of an online store. To find results that will help your
client, you will use Market Basket Analysis (MBA), which uses Association Rule Mining on the given
transaction data.

Procedure:

Using Python

1. Read the data into a variable and check the descriptive information of the read data.
2. Start with item sets containing just a single item.
3. Determine the support for the item sets. Keep the item sets that meet your minimum support
threshold and remove item sets that do not.
4. Using the item sets you have kept from Step 3, generate all the possible larger itemset
configurations.
5. Repeat Steps 3 & 4 until there are no more new item sets.
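
The level-wise search in this procedure can be sketched in plain Python. This is a simplified illustration, not the mlxtend implementation used in the notebook: the five transactions are made up, the function name apriori is reused only for readability, and candidates are checked by their support alone, without the extra subset pruning of the full algorithm.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every item set that meets min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of the item set
        return sum(itemset <= basket for basket in baskets) / n

    # Start with candidate item sets containing just a single item
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    size = 1
    while candidates:
        # Keep the candidates that meet the threshold and drop the rest
        scored = {c: support(c) for c in candidates}
        kept = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(kept)
        # Join the kept item sets to form candidates that are one item larger
        size += 1
        candidates = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
    return frequent


# Hypothetical transactions
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["bread", "milk", "butter"],
]

frequent = apriori(transactions, min_support=0.5)
for itemset, value in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

With a threshold of 0.5 every single item and every pair survives, while the three-item set appears in only two of the five transactions and is dropped, which ends the loop.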

CODE & OUTPUT

!pip install squarify

-------------------------------------

import numpy as np

import pandas as pd

import squarify

import matplotlib.pyplot as plt

#importing the packages required for market basket analysis

from mlxtend.frequent_patterns import apriori

from mlxtend.frequent_patterns import association_rules

from mlxtend.preprocessing import TransactionEncoder

-------------------------------------

df = pd.read_csv('MBA.csv', header=None)

df.head()
-------------------------------------

df.columns
-------------------------------------

len(df.columns)
-------------------------------------

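# Count how often each item appears in every column, one row of counts per column

# (DataFrame.append was removed in pandas 2.0, so the frame is built in one step)

df_result = pd.DataFrame([df[i].value_counts() for i in range(len(df.columns))])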
-------------------------------------

df_result
-------------------------------------

df_sum = df_result.sum()

df_sum = df_sum.sort_values(ascending=False)

df_sum

-------------------------------------

plt.figure(figsize=(15,10))

count = 40

plt.title('Frequency Plot')

color = plt.cm.spring(np.linspace(0,1,count))


Output


EXPERIMENT NO: 7

Goal: Applying Linear Regression on Vehicles data.

Theory: Linear regression is based on least squares estimation, which says that the regression
coefficients (estimates) should be chosen so that they minimize the sum of the
squared distances of each observed response to its fitted value.

Software Tools: Python - Jupyter Notebook

Example: In this experiment, we will be working with a vehicle dataset to understand the relation
between labor cost (dependent variable) and other variables such as labor hours using
linear regression.

Procedure:

Using Python

1. Read the data into variable vehicle, and check the summary of read data.
2. Analyze the dependent and independent variables and create a scatter plot.
3. Apply the Linear Regression Model.
4. Create the Linear Regression Model for response variable.
5. Analyze the summary of result, and remove non-significant variable(s).
6. Re-run the model.
7. Predict the value using reduced model.
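
For a single independent variable, the least squares estimates have a closed form, which the sketch below computes directly. The five labor hours and labor cost values are invented for illustration and are not taken from the vehicle dataset; the function names fit_line and r_squared are likewise assumptions.

```python
def fit_line(x, y):
    """Least squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical observations: labor hours and the resulting labor cost
labor_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
labor_cost = [72.0, 128.0, 171.0, 229.0, 275.0]

intercept, slope = fit_line(labor_hours, labor_cost)
print("intercept:", round(intercept, 2), "slope:", round(slope, 2))
print("R squared:", round(r_squared(labor_hours, labor_cost, intercept, slope), 4))

# Predict the labor cost for a job of 6 hours
predicted = intercept + slope * 6.0
print("predicted cost for 6 hours:", round(predicted, 2))
```

With several independent variables the same idea applies, but the coefficients are found by solving a system of equations, which is what a regression library does and why its summary can flag non-significant variables.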


Code & Output


EXPERIMENT NO: 8

Goal: Decision Tree for Classifying Mushrooms as safe to eat or poisonous.

Theory: A decision tree is a decision support tool that uses a tree-like graph or model
of decisions and their possible consequences, including chance event outcomes, resource costs,
and utility. It is one way to display an algorithm that only contains conditional control
statements.

Software Tools: Python – Jupyter notebook

Example: This dataset originally comes from the UCI Machine Learning Repository. The main aim of
this dataset is to classify whether the mushrooms are edible (e) or poisonous (p) to consume. Each
species in the dataset is identified as definitely edible, definitely poisonous, or of unknown
edibility.

The detailed description of attributes mentioned in the dataset are listed below:

cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s

cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s

cap-color:brown=n,buff=b,cinnamon=c,gray=g,green=r, pink=p,purple=u,red=e,white=w,yellow=y

bruises?: bruises=t,no=f

odor: almond=a,anise=l,creosote=c,fishy=y,foul=f, musty=m,none=n,pungent=p,spicy=s

gill-attachment: attached=a,descending=d,free=f,notched=n

gill-spacing: close=c,crowded=w,distant=d

gill-size: broad=b,narrow=n

gill-color:black=k,brown=n,buff=b,chocolate=h,gray=g,green=r,orange=o,pink=p,purple=u,red=e,
white=w,yellow=y

stalk-shape: enlarging=e,tapering=t

stalk-root: bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,rooted=r,missing=?

stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s

stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s


stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y

stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y

veil-type: partial=p,universal=u

veil-color: brown=n,orange=o,white=w,yellow=y

ring-number: none=n,one=o,two=t

ring-type: cobwebby=c,evanescent=e,flaring=f,large=l, none=n,pendant=p,sheathing=s,zone=z

spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r, orange=o,purple=u,white=w,yellow=y

population: abundant=a,clustered=c,numerous=n, scattered=s,several=v,solitary=y

habitat: grasses=g,leaves=l,meadows=m,paths=p, urban=u,waste=w,woods=d

Here, classification is performed using 2 classes, i.e. edible or poisonous.

A Logistic Regression approach is used for classification. Initially, the data is converted to numerical
format using LabelEncoder. Then the data is split into training and testing sets. Applying
the logistic regression model produces a result with 95% accuracy. The
overall impact of every parameter on the classes is shown using a heatmap.

Target Variable = Class (e: edible, p: poisonous)
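
A classification tree, as named in the goal of this experiment, picks at each node the attribute whose split leaves the purest groups. The sketch below measures purity with Gini impurity on six made-up rows that reuse the odor and cap-color codes listed above. It uses one branch per attribute value for simplicity, whereas scikit-learn's DecisionTreeClassifier makes binary splits on encoded values; the function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(rows, feature, target="class"):
    """Weighted Gini impurity after splitting the rows on every value of one feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(group) / n * gini(group) for group in groups.values())


def best_feature(rows, features, target="class"):
    """Feature whose split gives the lowest weighted impurity."""
    return min(features, key=lambda f: split_gini(rows, f, target))


# Hypothetical sample rows (e: edible, p: poisonous)
rows = [
    {"odor": "n", "cap-color": "n", "class": "e"},
    {"odor": "a", "cap-color": "y", "class": "e"},
    {"odor": "n", "cap-color": "w", "class": "e"},
    {"odor": "f", "cap-color": "n", "class": "p"},
    {"odor": "p", "cap-color": "w", "class": "p"},
    {"odor": "f", "cap-color": "y", "class": "p"},
]

print("impurity before any split:", gini([row["class"] for row in rows]))
for feature in ("odor", "cap-color"):
    print(feature, round(split_gini(rows, feature), 3))
print("best first split:", best_feature(rows, ["odor", "cap-color"]))
```

In this toy sample odor separates the two classes completely while cap-color leaves every group evenly mixed, so a tree would test odor first.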

Procedure:

Using Python

● Read the data into a variable, and check the summary of the read data.
● Clean the dataset and remove irrelevant columns.
● Partition the data into training and test datasets.
● Create a classification tree from the training data.
● Use the predict function with the validation dataset to evaluate the learned model.
● Display the misclassification error for the validation and training datasets.


Code & Output


Output


EXPERIMENT NO: 9


Goal: Data Visualization

Python provides various libraries for visualizing data. These libraries come with different
features and support various types of graphs. In this tutorial, we will be discussing four
such libraries.

Matplotlib
Seaborn
Bokeh
Plotly

Scatter Plot

Line Chart


Histogram


Output
