
DWDM Complete Record 2

Uploaded by

Hemanth Kumar1
Copyright
© All Rights Reserved

Department of CSE DWDM Lab Record Roll Number: 22311A05FB

CLEMENTINE

1. Using the BASKETS1n dataset, select the data as given below.

a) Select customers with age < 35 and count those who buy dairy and veg products.

b) Find the average income of customers who buy at least 5 products.

c) Derive a field that flags records whose homeown is 'YES' and age > 30, sort the data w.r.t. income in ascending order, and
output only the item fields.

d) Find the mean value of salary w.r.t. age = {Young, Middle, Senior}.

The input dataset for all exercises in this problem statement is BASKETS1n.

III CSE-H Page No: 1



a) Customer age < 35 and count the customers who buy dairy and VEG products

SOLUTION 1a)

Expected output:

Output dataset:

Procedure:

1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.

2. Select the Var. File node from the Sources palette, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n to load
the baskets data.

3. From the Field Ops palette, add a Derive node of type Flag (named DnV_T, as used in step 4) with the condition
dairy = 'T' and cannedveg = 'T' and fruitveg = 'T', then click OK. The flag is true for the records that satisfy the condition.


4. From the Record Ops palette, add a Select node with the condition (age < 35) and DnV_T = 'T'.

5. Only the records for which both conditions are true pass through and are counted.

6. Add an Aggregate node to retrieve the sum and max of the selected records.


7. Add a Table node from the Output palette and connect it to the Aggregate node.

8. Right-click the Table node and execute it.

9. Click Run; the output is then displayed.
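
The stream above (Derive flag, Select, count) can be sketched outside Clementine in plain Python. This is only an illustration under stated assumptions, not Clementine code: the four rows are made-up sample data, and only the field names age, dairy, cannedveg, fruitveg and the flag name DnV_T are taken from the procedure.

```python
# Made-up sample rows; real data would come from the BASKETS1n file.
records = [
    {"age": 28, "dairy": "T", "cannedveg": "T", "fruitveg": "T"},
    {"age": 33, "dairy": "T", "cannedveg": "F", "fruitveg": "T"},
    {"age": 41, "dairy": "T", "cannedveg": "T", "fruitveg": "T"},
    {"age": 22, "dairy": "T", "cannedveg": "T", "fruitveg": "T"},
]


def derive_dnv_flag(row):
    """Derive step: 'T' when dairy, cannedveg and fruitveg are all 'T'."""
    all_true = all(row[f] == "T" for f in ("dairy", "cannedveg", "fruitveg"))
    return "T" if all_true else "F"


for row in records:
    row["DnV_T"] = derive_dnv_flag(row)

# Select step: (age < 35) and DnV_T = 'T'
selected = [r for r in records if r["age"] < 35 and r["DnV_T"] == "T"]

# Aggregate step: record count of the selected rows
count = len(selected)
print(count)
```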

b) Find the AVG income of customers who buy at least 5 products

SOLUTION 1b)

Expected output:

Output dataset:

Procedure:

1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.

2. Select the Var. File node from the Sources palette, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n to load
the baskets data.

3. From the Record Ops palette, add a Select node with the condition shown in the figure.

4. Only the records for which the condition is true pass through and are counted.

5. Add an Aggregate node to retrieve income_Mean and the record count.

6. Add a Table node from the Output palette and connect it to the Aggregate node.

7. Right-click the Table node and execute it.

8. Click Run; the output is then displayed.
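
The Select condition for this exercise is only given in the figure, so the following Python sketch shows one possible reading of "at least 5 products": count the item flags that are 'T' and keep rows with five or more. The item field list and the rows are assumptions made for illustration; income_Mean mirrors the Aggregate output named in step 5.

```python
from statistics import mean

# Assumed subset of item fields; the real file may contain more.
ITEM_FIELDS = ["fruitveg", "freshmeat", "dairy", "cannedveg", "cannedmeat", "frozenmeal"]

# Made-up rows: (income, item flags in ITEM_FIELDS order)
raw = [
    (27000, "TTTTTF"),  # 5 items bought
    (30000, "TFTFTF"),  # 3 items bought
    (13200, "TTTTTT"),  # 6 items bought
    (12200, "FFTFFF"),  # 1 item bought
]
records = [dict(zip(ITEM_FIELDS, flags), income=income) for income, flags in raw]


def items_bought(row):
    """Number of item fields flagged 'T' in one record."""
    return sum(1 for f in ITEM_FIELDS if row[f] == "T")


# Select step: customers who buy at least 5 products
selected = [r for r in records if items_bought(r) >= 5]

# Aggregate step: mean income and record count
income_mean = mean(r["income"] for r in selected)
record_count = len(selected)
print(income_mean, record_count)
```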


c) Derive the field whose homeown is 'YES' and Age > 30 and sort data w.r.t. income in Ascending order, and
output only the item fields

SOLUTION 1c)

Expected output:

Output dataset:

Procedure:

1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.

2. Select the Var. File node from the Sources palette, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n to load
the baskets data.

3. From the Field Ops palette, add a Derive node with the condition shown in the figure.


4. Connect the Derive node to a Select node with the condition shown in the figure.

5. Connect the Select node to a Sort node to sort income in ascending order.


6. Connect the Sort node to a Filter node to filter out all the non-item fields.

7. Add a Table node from the Output palette and connect it to the Filter node.

8. Right-click the Table node and execute it.

9. Click Run; the output is then displayed.
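
The Derive, Select, Sort and Filter chain above can be sketched in Python as follows. The rows, the two item fields and the flag name own_over30 are hypothetical and chosen only for illustration; the conditions homeown = 'YES' and age > 30 come from the problem statement.

```python
# Made-up rows with two assumed item fields (dairy, fruitveg).
records = [
    {"homeown": "YES", "age": 46, "income": 27000, "dairy": "T", "fruitveg": "F"},
    {"homeown": "NO", "age": 52, "income": 18000, "dairy": "F", "fruitveg": "T"},
    {"homeown": "YES", "age": 25, "income": 15000, "dairy": "T", "fruitveg": "T"},
    {"homeown": "YES", "age": 38, "income": 12000, "dairy": "F", "fruitveg": "F"},
]
ITEM_FIELDS = ["dairy", "fruitveg"]

# Derive step: hypothetical flag name own_over30
for r in records:
    r["own_over30"] = "T" if r["homeown"] == "YES" and r["age"] > 30 else "F"

# Select step: keep only the flagged records
selected = [r for r in records if r["own_over30"] == "T"]

# Sort step: income in ascending order
ordered = sorted(selected, key=lambda r: r["income"])

# Filter step: drop every non-item field
items_only = [{f: r[f] for f in ITEM_FIELDS} for r in ordered]
print(items_only)
```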


d) Find the mean value of salary w.r.t age={Young, Middle, Senior}.

SOLUTION 1d)

Expected output:

Output dataset:

Procedure:

1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.

2. Select the Var. File node from the Sources palette, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n to load
the baskets data.

3. From the Field Ops palette, add a Binning node with the settings shown in the figure.


4. Connect the Binning node to a Type node to read the data types and values, as shown in the figure.

5. Connect the Type node to a Reclassify node to reclassify the binned age values 1, 2, 3 as Young, Middle, Senior respectively.

6. Connect the Reclassify node to an Aggregate node to get income_Mean w.r.t. the different age categories.


7. Add a Table node from the Output palette and connect it to the Aggregate node.

8. Right-click the Table node and execute it.

9. Click Run; the output is then displayed.
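
The Binning, Reclassify and Aggregate steps can be illustrated with a short Python sketch. The binning settings are only given in the figure, so equal-width binning into three bins is an assumption here, and the (age, income) pairs are made up.

```python
from statistics import mean

# Made-up (age, income) pairs
records = [(18, 10000), (25, 14000), (33, 20000), (40, 22000), (48, 30000)]

ages = [age for age, _ in records]
lowest, highest = min(ages), max(ages)
width = (highest - lowest) / 3  # assumed: three equal-width bins


def age_bin(age):
    """Binning step: 1-based bin index; the maximum age falls in bin 3."""
    return min(int((age - lowest) // width) + 1, 3)


# Reclassify step: bins 1, 2, 3 become Young, Middle, Senior
LABELS = {1: "Young", 2: "Middle", 3: "Senior"}

groups = {}
for age, income in records:
    groups.setdefault(LABELS[age_bin(age)], []).append(income)

# Aggregate step: mean income per age category
income_mean = {label: mean(values) for label, values in groups.items()}
print(income_mean)
```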


2. Using the DRUG3n and DRUG4n datasets, select the data as given below.

a) Select 50% of the records of the drug type with the maximum number of records, with no restrictions on the remaining drugs,
and use a histogram graph of age w.r.t. BP.

b) Take an equal number of samples of each drug, calculate the Std. Dev. of age w.r.t. drug, compare it
with the Std. Dev. of age w.r.t. drug for the complete data, and give a conclusion statement.

c) List 5 strong associations of attribute values, and derive and display the data.

d) Append the DRUG2n dataset to the given datasets and consider distinct values of Age.

e) Using the above 3 datasets (DRUG2n, DRUG3n, DRUG4n), perform the following:

i) Young_Age <= 30, Middle_Age > 30 and <= 50, Senior_Age > 50

ii) Multiplot the above age categories with Na, K and drug.

The input datasets for all exercises in this problem statement are DRUG3n and DRUG4n (for exercises d
and e, DRUG2n is also used).


a) Select 50% of the records of the drug type with the maximum number of records, with no restrictions on the remaining drugs,
and use a histogram graph of age w.r.t. BP

SOLUTION 2a)

Expected output:

Output dataset/ graph:

Procedure: The following are the nodes used for this exercise with respective settings.

APPEND:


Aggregate: used to get the count of each Drug type, as shown in the following output table.

Table: from the following output we can see that ‘drugY’ has the maximum number of records compared with the
remaining Drug types.


Balance: for selecting 50% of records for ‘drugY’


Histogram: The following Histogram node gives the output of Age w.r.t. BP.

Table: This is the output of the records after selecting 50% of ‘drugY’, with no restrictions on the remaining drugs.
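
The Aggregate and Balance steps can be sketched in Python as below. This is a rough illustration, not a description of how the Balance node samples internally: the sketch keeps exactly half of the records of the most frequent drug, the rows are made up, and the id field is hypothetical.

```python
import random
from collections import Counter

# Made-up records: 6 x drugY, 3 x drugX, 2 x drugA
drugs = ["drugY"] * 6 + ["drugX"] * 3 + ["drugA"] * 2
records = [{"id": i, "Drug": d} for i, d in enumerate(drugs)]

# Aggregate step: record count per drug type
counts = Counter(r["Drug"] for r in records)
top_drug, top_count = counts.most_common(1)[0]

# Balance step (sketch): keep 50% of the most frequent drug, all other records
rng = random.Random(0)  # fixed seed so the sketch is repeatable
top_rows = [r for r in records if r["Drug"] == top_drug]
kept_top = rng.sample(top_rows, k=top_count // 2)
balanced = kept_top + [r for r in records if r["Drug"] != top_drug]

print(top_drug, len(kept_top), len(balanced))
```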


b) Take an equal number of samples of each drug, calculate the Std. Dev. of age w.r.t. drug, compare it
with the Std. Dev. of age w.r.t. drug for the complete data, and give a conclusion statement.

SOLUTION 2b)

Expected output:


Output dataset:

When the sample output above is compared with the complete-data output, the standard deviation of Age w.r.t. each
drug type is nearly the same. There is a slight difference in the standard deviation of Age for drugX, drugY and drugC
in the sample data, whereas in the complete data the standard deviation of Age for these drug types differs only
marginally.

Procedure: The following are the nodes used for this exercise with respective settings.


APPEND

SELECT and SAMPLE: The same procedure is followed for each of the remaining drug types, so that an equal sample of 20
records is selected for each drug type.

APPEND: Appending all samples


Sample Aggregate: Aggregate of Sample data

Complete Aggregate: Aggregate of Complete data
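
The comparison between the sample aggregate and the complete aggregate can be sketched in Python. The ages are made up, the sample size is 4 instead of the 20 used in the record so the toy data stays small, and `statistics.stdev` computes the sample standard deviation, which may or may not match the formula the Aggregate node uses.

```python
import random
from statistics import stdev

# Made-up ages per drug type
data = {
    "drugA": [23, 31, 45, 52, 60, 37],
    "drugY": [19, 28, 34, 47, 58, 66, 71, 25],
}

SAMPLE_SIZE = 4  # the lab record uses 20 per drug type
rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Select + Sample steps: equal-sized random sample for each drug type
sample = {drug: rng.sample(ages, SAMPLE_SIZE) for drug, ages in data.items()}

# Aggregate steps: Std. Dev. of age per drug, complete data vs sample
complete_sd = {drug: stdev(ages) for drug, ages in data.items()}
sample_sd = {drug: stdev(ages) for drug, ages in sample.items()}

for drug in data:
    print(drug, round(complete_sd[drug], 2), round(sample_sd[drug], 2))
```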


c) List 5 strong associations of attribute values, and derive and display the data.

SOLUTION 2c)

Expected output:

Output dataset: The following is the output for Sex = ‘M’ and Cholesterol = ‘High’


Procedure: The following are the nodes used for this exercise with respective settings.

APPEND


WEB: Plot the web for Sex, BP, Cholesterol and Drug to get the 5 strongest associations.

Once the web showing the 5 strongest links is created, generate a Derive node for each link by right-clicking the link and
choosing to generate a Derive node for it.


When a Derive node is generated from a link, it has the following configuration:
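
What the Web node and the generated Derive node do can be approximated in Python by counting how often pairs of attribute values occur together. This is a sketch under assumptions: the six rows are made up, and link strength is taken to be the raw co-occurrence count, which is only one way of measuring it.

```python
from collections import Counter
from itertools import combinations

FIELDS = ("Sex", "BP", "Cholesterol", "Drug")

# Made-up rows in FIELDS order
rows = [
    ("M", "HIGH", "HIGH", "drugA"),
    ("M", "HIGH", "HIGH", "drugA"),
    ("F", "LOW", "HIGH", "drugC"),
    ("M", "NORMAL", "HIGH", "drugX"),
    ("F", "NORMAL", "NORMAL", "drugX"),
    ("M", "LOW", "NORMAL", "drugY"),
]

# Web step: count co-occurrences of (field, value) pairs from different fields
link_counts = Counter()
for row in rows:
    for a, b in combinations(zip(FIELDS, row), 2):
        link_counts[(a, b)] += 1

strongest = link_counts.most_common(5)  # the 5 strongest links

# Derive step: flag records that match the strongest link
(f1, v1), (f2, v2) = strongest[0][0]
flags = [
    "T" if row[FIELDS.index(f1)] == v1 and row[FIELDS.index(f2)] == v2 else "F"
    for row in rows
]
print(strongest[0], flags)
```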


d) Append DRUG2n dataset to given datasets and consider distinct values of Age.

SOLUTION 2d)

Expected output:

Output dataset: The following output shows the records with distinct values of Age.


Procedure: The following are the nodes used for this exercise with respective settings.

Append:

The appended datasets are processed with respect to age and passed on to the output graphs.

Distinct Age:
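
The Append and Distinct steps can be sketched in Python as follows. The rows are made up, and keeping the first record seen for each Age is an assumption about how duplicates are resolved.

```python
# Made-up rows standing in for the three datasets
drug2n = [{"Age": 23, "Drug": "drugY"}, {"Age": 47, "Drug": "drugC"}]
drug3n = [{"Age": 47, "Drug": "drugX"}, {"Age": 28, "Drug": "drugX"}]
drug4n = [{"Age": 23, "Drug": "drugA"}, {"Age": 61, "Drug": "drugY"}]

# Append step: DRUG2n is appended to the given datasets
appended = drug3n + drug4n + drug2n

# Distinct step: keep the first record for each distinct Age
seen_ages = set()
distinct = []
for record in appended:
    if record["Age"] not in seen_ages:
        seen_ages.add(record["Age"])
        distinct.append(record)

print([r["Age"] for r in distinct])
```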

e) Using the above 3 datasets (DRUG2n, DRUG3n, DRUG4n) perform the following

i) Young_Age <=30, Middle_Age >30 and <=50, Senior_Age >50

ii) Multi plot the above Age categories with Na and K and drug

SOLUTION 2e)

Expected output:


Output dataset/graph:

Above: multiplot of the Young_Age category with Na, K and drug


Above: multiplot of the Middle_Age category with Na, K and drug

Above: multiplot of the Senior_Age category with Na, K and drug


Procedure: The following are the nodes used for this exercise with respective settings.

Append:

The appended datasets are processed with respect to age and passed on to the output graphs.

Select Age = Young:


Select Age = Middle:

Select Age = Senior:


For each age Select node, a Multiplot node is used, as shown in the output.
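
The three Select conditions can be written as a single categorisation function. The Python sketch below splits made-up (Age, Na, K, Drug) rows into the three age categories from the problem statement; the plotting itself is left out, since it is done by the Multiplot node.

```python
def age_category(age):
    """Young_Age <= 30, Middle_Age > 30 and <= 50, Senior_Age > 50."""
    if age <= 30:
        return "Young_Age"
    if age <= 50:
        return "Middle_Age"
    return "Senior_Age"


# Made-up rows: (Age, Na, K, Drug)
rows = [
    (23, 0.79, 0.03, "drugY"),
    (30, 0.74, 0.06, "drugX"),
    (47, 0.56, 0.07, "drugC"),
    (50, 0.68, 0.05, "drugA"),
    (61, 0.56, 0.03, "drugY"),
]

# One group per Select node; each group would feed its own multiplot
groups = {}
for age, na, k, drug in rows:
    groups.setdefault(age_category(age), []).append((na, k, drug))

for category, members in groups.items():
    print(category, len(members))
```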

Exercise 3

Using BASKETS1N

a) Find the association rules only for items using Apriori model with minimum support 3% and confidence
90%.


b) Compare GRI and Apriori using support 22% and confidence 90% (prepare a sample dataset in a
spreadsheet).

c) Determine the importance of Age, Cholesterol and BP for the Drug (Drug4n), and compare the C5.0 and Neural
Net models.

d) Determine the importance of the attributes using K-Means on the Drug3n and Drug4n datasets.

SOLUTION 3a)

Expected output:

Input data set:

Output dataset:

Procedure: The following are the nodes used for this exercise with respective settings.

Type: This node reads the values and type of each attribute. The non-item attributes are given the direction
None, and all item-based attributes are given the direction Both, so that they serve as both input and output to the Apriori model.

Model: The diagram below shows the settings for the Apriori model.

When the stream is executed, the Apriori model is built; once we browse the model, we can see the resulting rules as output.
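
To show what the minimum support (3%) and minimum confidence (90%) thresholds mean, the following Python sketch enumerates rules by brute force over made-up baskets. It is not the Apriori algorithm (there is no level-wise candidate pruning) and not Clementine's implementation; support is measured here on the antecedent, which is an assumption that should be checked against the tool's own definition.

```python
from itertools import combinations

# Made-up baskets; each set holds the items flagged 'T' for one customer
baskets = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg"},
    {"fish", "fruitveg"},
    {"wine", "confectionery"},
]
MIN_SUPPORT = 0.03
MIN_CONFIDENCE = 0.90


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


items = sorted(set().union(*baskets))
rules = []
for size in (1, 2):  # antecedents of one or two items
    for antecedent in combinations(items, size):
        a = set(antecedent)
        if support(a) < MIN_SUPPORT:
            continue  # also skips antecedents that never occur
        for consequent in items:
            if consequent in a:
                continue
            confidence = support(a | {consequent}) / support(a)
            if confidence >= MIN_CONFIDENCE:
                rules.append((antecedent, consequent, confidence))

for antecedent, consequent, confidence in rules:
    print(" & ".join(antecedent), "->", consequent, round(confidence, 2))
```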


SOLUTION 3b)

Expected output:

Input data set: An Excel file is prepared as shown below


Output dataset:

The outputs below show that there is no difference in the rules generated by Apriori and GRI (Generalized
Rule Induction), but the order differs. In Apriori, the rules for low-level frequent itemsets are generated first,
followed by the rules for the next level of frequent itemsets. In GRI, the rules are generated per item: first the
largest rule and then the smallest rule for one item, then the same for subsequent items.

Procedure: The following are the nodes used for this exercise with respective settings.

Settings for Apriori

Settings for GRI


SOLUTION 3c)

Expected output:

Input data set:


Output dataset:

Output of Neural Net Model


Output of C5.0 Model

Output of C5.0 Model viewer which is shown in tree format


Comparison: Attribute importance is as follows

Neural Net model: BP - 0.2928, Age - 0.1070, Cholesterol - 0.1023

C5.0: a 2-level decision tree is built with BP as the root attribute (level 1) and the Age and
Cholesterol attributes at level 2

** As a result, the Neural Net and C5.0 models give the same information.


Procedure: The following are the nodes used for this exercise with respective settings.

Settings for C5.0 Model


Settings for Neural Net Model


SOLUTION 3d)

Expected output:

Input data set:


Output dataset:

The following output shows which attributes are important. (The unimportant attributes are Sex and Age)


Procedure: The following are the nodes used for this exercise with respective settings.

Append


Setting for K-Means

Weka

A) To open the dataset, follow: Open file / Local Disk C / Program Files / weka3.8.6 / data / weather.nominal

B) Next, click Choose under Filter, then open unsupervised and select instance.
Choose RemoveWithValues and apply the settings shown in the following steps.

Then click OK, and then Apply.
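
The effect of the RemoveWithValues filter can be illustrated in Python: instances whose chosen attribute takes one of the listed values are dropped. The filter settings are only given in the screenshots, so the attribute (outlook) and value (sunny) used here are assumptions, and the four instances are made up rather than read from the data file.

```python
# Made-up instances in the style of the weather data
instances = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]


def remove_with_values(rows, attribute, values):
    """Drop every row whose attribute takes one of the given values."""
    return [r for r in rows if r[attribute] not in values]


# Assumed setting: remove instances where outlook is 'sunny'
remaining = remove_with_values(instances, "outlook", {"sunny"})
print(len(instances), "->", len(remaining))
```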


C) Open the iris dataset using the Open file option in WEKA.
