DWDM Complete Record 2
CLEMENTINE
a) For customers younger than 35, count the customers who buy dairy and VEG products.
c) Derive a field that marks records whose homeown is 'YES' and Age > 30, sort the data by income in ascending order, and
output only the item fields.
Input data set for all exercises in this problem statement: BASKETS1n
a) For customers younger than 35, count the customers who buy dairy and VEG products.
SOLUTION 1a)
Expected output:
Output dataset:
Procedure:
1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.
2. From the Sources palette, select the Var. File node, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n
to load the baskets data.
3. From the Field Ops palette, add a Derive node of type Flag and set its condition to dairy = 'T' and cannedveg = 'T' and fruitveg = 'T'.
Click OK; the derived flag shows the truth value of the condition for each record.
4. From the Record Ops palette, add a Select node and set the condition to (age < 35) and DnV_T = 'T'.
6. Add an Aggregate node to compute the sum and max of the selected records.
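The Derive, Select and Aggregate steps above can be mimicked in plain Python to show what each node computes. This is a minimal sketch under stated assumptions: records are dictionaries keyed by the BASKETS1n field names used in this exercise, flags hold 'T'/'F', the four sample records are invented for illustration, and the Aggregate step counts the selected customers and takes their maximum age because the fields aggregated in the original stream are not stated.

```python
# Invented sample records; field names follow the BASKETS1n fields used in 1a.
records = [
    {"age": 28, "dairy": "T", "cannedveg": "T", "fruitveg": "T"},
    {"age": 41, "dairy": "T", "cannedveg": "T", "fruitveg": "T"},
    {"age": 30, "dairy": "T", "cannedveg": "F", "fruitveg": "T"},
    {"age": 22, "dairy": "T", "cannedveg": "T", "fruitveg": "T"},
]

def derive_dnv(record):
    """Derive (Flag) node: 'T' when dairy, cannedveg and fruitveg are all 'T'."""
    bought_all = all(record[f] == "T" for f in ("dairy", "cannedveg", "fruitveg"))
    return {**record, "DnV_T": "T" if bought_all else "F"}

# Derive node: add the flag field to every record.
derived = [derive_dnv(r) for r in records]

# Select node: keep customers under 35 whose flag is true.
selected = [r for r in derived if r["age"] < 35 and r["DnV_T"] == "T"]

# Aggregate node (illustrative fields): count the customers and take the max age.
count = len(selected)
max_age = max(r["age"] for r in selected) if selected else None
print(count, max_age)  # 2 28
```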
SOLUTION 1b)
Expected output:
Output dataset:
Procedure:
1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.
2. From the Sources palette, select the Var. File node, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n
to load the baskets data.
3. From the Record Ops palette, add a Select node and enter the condition shown in the figure.
5. Add an Aggregate node to compute income_Mean and the record count.
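A comparable sketch for 1b: a Select node followed by an Aggregate node that produces income_Mean and a record count. The selection condition appears only in the figure, so the age < 35 predicate, the field name count and the sample records below are hypothetical placeholders.

```python
from statistics import mean

# Invented records; only age (for the placeholder condition) and income are used.
records = [
    {"age": 25, "income": 18000},
    {"age": 47, "income": 30000},
    {"age": 33, "income": 24000},
]

def select_and_aggregate(rows, condition):
    """Select node (condition) followed by an Aggregate node that
    returns income_Mean and the number of selected records."""
    kept = [r for r in rows if condition(r)]
    return {
        "income_Mean": mean(r["income"] for r in kept) if kept else None,
        "count": len(kept),
    }

# Hypothetical condition: the real one is the one shown in the figure.
result = select_and_aggregate(records, lambda r: r["age"] < 35)
print(result)
```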
c) Derive a field that marks records whose homeown is 'YES' and Age > 30, sort the data by income in ascending order, and
output only the item fields.
SOLUTION 1c)
Expected output:
Output dataset:
Procedure:
1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.
2. From the Sources palette, select the Var. File node, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n
to load the baskets data.
3. From the Field Ops palette, add a Derive node and enter the condition shown in the figure.
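The task in 1c (flag, sort by income ascending, output only the item fields) can be sketched as follows. Assumptions: the derived flag is given the hypothetical name own_over30, only flagged records are kept before sorting (the task wording leaves this open), ITEM_FIELDS lists just four of the BASKETS1n item flags to keep the example short, and the records are invented.

```python
# Hypothetical subset of the BASKETS1n item fields.
ITEM_FIELDS = ["fruitveg", "dairy", "beer", "wine"]

# Invented records.
records = [
    {"homeown": "YES", "age": 45, "income": 27000, "fruitveg": "T", "dairy": "F", "beer": "T", "wine": "F"},
    {"homeown": "NO", "age": 38, "income": 15000, "fruitveg": "F", "dairy": "T", "beer": "F", "wine": "T"},
    {"homeown": "YES", "age": 31, "income": 12000, "fruitveg": "T", "dairy": "T", "beer": "F", "wine": "F"},
    {"homeown": "YES", "age": 29, "income": 9000, "fruitveg": "F", "dairy": "F", "beer": "T", "wine": "T"},
]

# Derive (Flag) node: true when the customer owns a home and is over 30.
for r in records:
    r["own_over30"] = "T" if r["homeown"] == "YES" and r["age"] > 30 else "F"

# Keep the flagged records, then Sort node: income in ascending order.
flagged = sorted((r for r in records if r["own_over30"] == "T"), key=lambda r: r["income"])

# Filter node: output only the item fields.
output = [{f: r[f] for f in ITEM_FIELDS} for r in flagged]
for row in output:
    print(row)
```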
SOLUTION 1d)
Expected output:
Output dataset:
Procedure:
1. Specify the name of the file. You can enter a filename or click the ellipsis button (...) to select a file. The file path
is shown once you have selected a file, and its contents are displayed with delimiters in the panel below it.
2. From the Sources palette, select the Var. File node, then browse to C:\Program Files (x86)\SPSS Clementine\11.1\Demos\BASKETS1n
to load the baskets data.
3. From the Field Ops palette, add a Binning node and configure it as shown in the figure.
4. Connect the Binning node to a Type node to read the data types and values, as shown in the figure.
6. Connect the Reclassify node to an Aggregate node to get income_Mean for each age category.
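The Binning, Reclassify and Aggregate chain in 1d amounts to mapping each age to a category and averaging income per category. The bin edges and labels below are hypothetical stand-ins for the settings shown in the figures, and the (age, income) pairs are invented.

```python
from collections import defaultdict
from statistics import mean

# Invented (age, income) records.
records = [(23, 12000), (27, 14000), (36, 21000), (44, 25000), (52, 30000), (61, 28000)]

# Hypothetical [lower, upper) age ranges with a label for each category,
# standing in for the Binning + Reclassify settings.
BINS = [(0, 30, "Young"), (30, 50, "Middle"), (50, 200, "Senior")]

def age_category(age):
    """Binning + Reclassify: map an age to its category label."""
    for lower, upper, label in BINS:
        if lower <= age < upper:
            return label
    raise ValueError(f"age {age} is outside all bins")

# Aggregate node keyed on the age category: mean income per category.
incomes = defaultdict(list)
for age, income in records:
    incomes[age_category(age)].append(income)
income_mean = {cat: mean(vals) for cat, vals in incomes.items()}
print(income_mean)
```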
2. Using the DRUG3n and DRUG4n datasets, select the data as described below.
a) Select 50% of the records of the most frequent drug type, with no restriction on the remaining drugs,
and plot a histogram of age w.r.t. BP.
b) Take an equal number of samples of each drug, calculate the standard deviation of age for each drug, compare it
with the standard deviation of age for each drug on the complete data, and give a conclusion statement.
c) List five strong associations between attribute values, then derive and display the data.
d) Append the DRUG2n dataset to the given datasets and keep only distinct values of Age.
e) Using the above three datasets (DRUG2n, DRUG3n, DRUG4n), perform the following:
ii) Multiplot the above Age categories against Na, K and Drug.
Input data sets for all exercises in this problem statement: DRUG3n and DRUG4n (DRUG2n is also used for exercises d
and e).
a) Select 50% of the records of the most frequent drug type, with no restriction on the remaining drugs,
and plot a histogram of age w.r.t. BP.
SOLUTION 2a)
Expected output:
Procedure: The following nodes are used for this exercise, with their respective settings.
APPEND:
Aggregate: counts the records of each Drug type, as shown in the following output table.
Table: the following output shows that ‘drugY’ has the largest number of records of all the Drug types.
Histogram: the following Histogram node plots Age w.r.t. BP.
Table: the records output after selecting 50% of the ‘drugY’ records, with no restriction on the remaining drugs.
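The 2a stream (count per drug, sample half of the most frequent drug, keep every other record, then a histogram of Age by BP) is sketched below. Assumptions: the rows are invented (Age, BP, Drug) tuples standing in for the appended DRUG3n and DRUG4n data, Age is grouped into 10-year bands for the histogram, and random.Random(42).sample takes exactly half of the top drug's records, whereas a random-percentage Sample node may return only approximately 50%.

```python
import random
from collections import Counter

# Invented (Age, BP, Drug) rows standing in for the appended DRUG3n + DRUG4n data.
rows = [
    (23, "HIGH", "drugY"), (47, "LOW", "drugC"), (47, "LOW", "drugY"),
    (28, "NORMAL", "drugX"), (61, "LOW", "drugY"), (22, "NORMAL", "drugX"),
    (49, "NORMAL", "drugY"), (41, "LOW", "drugY"), (60, "NORMAL", "drugY"),
    (43, "LOW", "drugA"),
]

# Aggregate node: record count per drug type; the largest count is the top drug.
counts = Counter(drug for _, _, drug in rows)
top_drug, top_count = counts.most_common(1)[0]

# Select + Sample: keep 50% of the top drug's records (fixed seed for
# repeatability) plus every record of the remaining drugs.
rng = random.Random(42)
top_rows = [r for r in rows if r[2] == top_drug]
sampled_top = rng.sample(top_rows, k=len(top_rows) // 2)
result = sampled_top + [r for r in rows if r[2] != top_drug]

# Histogram of Age by BP: count records per (10-year age band, BP) pair.
histogram = Counter(((age // 10) * 10, bp) for age, bp, _ in result)
print(top_drug, top_count, len(result))
print(sorted(histogram.items()))
```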
b) Take an equal number of samples of each drug, calculate the standard deviation of age for each drug, compare it
with the standard deviation of age for each drug on the complete data, and give a conclusion statement.
SOLUTION 2b)
Expected output:
Output dataset:
Comparing the sample output above with the complete-data output, the standard deviation of Age for each drug type is
broadly similar. In the sampled data, however, the standard deviations of Age for drugX, drugY and drugC differ
somewhat, whereas in the complete data the differences for these drug types are only minor.
Procedure: The following nodes are used for this exercise, with their respective settings.
APPEND
SELECT and SAMPLE: the same procedure is followed for each of the remaining drug types, so that an equal sample of 20
records is selected for every drug type.
APPEND: appending all samples
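The equal-sample comparison in 2b can be sketched as a per-drug sample followed by a per-drug standard deviation. The complete data here is randomly generated for illustration (30 ages per drug), 20 records per drug mirrors the sample size mentioned above, and statistics.stdev gives the sample (n - 1) standard deviation; the denominator Clementine uses is an assumption not checked here.

```python
import random
from collections import defaultdict
from statistics import stdev

def stdev_by_drug(rows):
    """Sample standard deviation of Age for each Drug value."""
    ages = defaultdict(list)
    for age, drug in rows:
        ages[drug].append(age)
    return {drug: stdev(vals) for drug, vals in ages.items() if len(vals) > 1}

def equal_sample(rows, per_drug, seed=0):
    """Select + Sample per drug type, then Append: the same number of
    records from every drug (fewer if a drug has fewer records)."""
    rng = random.Random(seed)
    by_drug = defaultdict(list)
    for row in rows:
        by_drug[row[1]].append(row)
    sample = []
    for drug_rows in by_drug.values():
        sample.extend(rng.sample(drug_rows, k=min(per_drug, len(drug_rows))))
    return sample

# Randomly generated (Age, Drug) pairs, for illustration only.
gen = random.Random(1)
complete = [(gen.randint(15, 75), drug)
            for drug in ("drugA", "drugB", "drugC", "drugX", "drugY")
            for _ in range(30)]

sample = equal_sample(complete, per_drug=20)
print(stdev_by_drug(complete))
print(stdev_by_drug(sample))
```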
c) List five strong associations between attribute values, then derive and display the data.
SOLUTION 2c)
Expected output:
Output dataset: The following is the output for Sex = ‘M’ and Cholesterol = ‘High’
Procedure: The following nodes are used for this exercise, with their respective settings.
APPEND
WEB: plot a web graph of Sex, BP, Cholesterol and Drug to find five strong associations.
Once the web shows the five strong links, create a Derive node for each link by right-clicking the link and choosing
to generate a Derive node for it.
A Derive node generated from a link has the following configuration:
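A web graph link joins two values of different fields, and its strength reflects how often they occur together. The sketch below assumes link strength is the raw co-occurrence count, ranks the links and keeps the five strongest, then builds a flag equivalent to a Derive node generated from the Sex = 'M' / Cholesterol = 'HIGH' link. The records are invented.

```python
from collections import Counter
from itertools import combinations

FIELDS = ["Sex", "BP", "Cholesterol", "Drug"]

# Invented records with the four symbolic fields plotted in the web.
records = [
    {"Sex": "M", "BP": "HIGH", "Cholesterol": "HIGH", "Drug": "drugA"},
    {"Sex": "M", "BP": "LOW", "Cholesterol": "HIGH", "Drug": "drugC"},
    {"Sex": "F", "BP": "NORMAL", "Cholesterol": "NORMAL", "Drug": "drugX"},
    {"Sex": "M", "BP": "HIGH", "Cholesterol": "HIGH", "Drug": "drugY"},
    {"Sex": "F", "BP": "HIGH", "Cholesterol": "NORMAL", "Drug": "drugY"},
]

# Each link joins values of two different fields; its strength is the
# number of records in which both values occur together.
links = Counter()
for r in records:
    for f1, f2 in combinations(FIELDS, 2):
        links[(f"{f1}={r[f1]}", f"{f2}={r[f2]}")] += 1

strongest = links.most_common(5)
for (a, b), n in strongest:
    print(f"{a} <-> {b}: {n}")

# Derive (Flag) node for one link: Sex = 'M' and Cholesterol = 'HIGH'.
flags = ["T" if r["Sex"] == "M" and r["Cholesterol"] == "HIGH" else "F" for r in records]
```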
d) Append the DRUG2n dataset to the given datasets and keep only distinct values of Age.
SOLUTION 2d)
Expected output:
Output dataset: the following output shows the records with distinct values of Age.
Procedure: The following nodes are used for this exercise, with their respective settings.
Append:
The appended datasets are arranged according to Age and sent to the output graphs.
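Appending the three datasets and keeping distinct ages can be sketched as list concatenation followed by a first-occurrence filter on Age. This assumes the Distinct node is keyed on Age and keeps the first record of each key; the (Age, Drug) rows are invented.

```python
# Invented (Age, Drug) rows from each dataset.
drug3n = [(23, "drugY"), (47, "drugC")]
drug4n = [(47, "drugC"), (28, "drugX")]
drug2n = [(61, "drugY"), (23, "drugX")]

# Append node: stack the datasets one after another.
appended = drug3n + drug4n + drug2n

# Distinct node keyed on Age: keep the first record seen for each Age value.
seen = set()
distinct = []
for row in appended:
    if row[0] not in seen:
        seen.add(row[0])
        distinct.append(row)
print(distinct)  # [(23, 'drugY'), (47, 'drugC'), (28, 'drugX'), (61, 'drugY')]
```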
e) Using the above three datasets (DRUG2n, DRUG3n, DRUG4n), perform the following:
ii) Multiplot the above Age categories against Na, K and Drug.
SOLUTION 2e)
Expected output:
Output dataset/graph:
Above: multiplot of the Young_Age category against Na, K and Drug.
Above: multiplot of the Middle_Age category against Na, K and Drug.
Above: multiplot of the Senior_Age category against Na, K and Drug.
Procedure: The following nodes are used for this exercise, with their respective settings.
Append:
The appended datasets are arranged according to Age and sent to the output graphs.
Exercise 3
Using BASKETS1n:
a) Find association rules among the item fields only, using the Apriori model with minimum support 3% and minimum
confidence 90%.
b) Compare GRI and Apriori with support 22% and confidence 90% (prepare a sample dataset in a
spreadsheet).
c) Using DRUG4n, determine the importance of Age, Cholesterol and BP for Drug, and compare the C5.0 and Neural
Net models.
d) Determine the importance of the attributes using K-Means on the DRUG3n and DRUG4n datasets.
SOLUTION 3a)
Expected output:
Procedure: The following nodes are used for this exercise, with their respective settings.
Type: this node reads the values and type of each attribute. The non-item attributes are given direction None, and all
item attributes are given direction Both (input and output) for the Apriori model.
Executing the stream builds the Apriori model; browsing the generated model shows the resulting rules.
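To show what the Apriori node does with these settings, here is a small level-wise Apriori in plain Python. It uses the textbook definition of support (fraction of baskets containing the whole itemset) and emits single-consequent rules; Clementine's Apriori node may state its minimum support in terms of the rule antecedent instead, so its rule list can differ. Each basket is the set of item fields flagged 'T' for one customer; the baskets below are invented.

```python
from itertools import combinations

def apriori_rules(baskets, min_support, min_confidence):
    """Level-wise Apriori: find all frequent itemsets, then emit rules
    antecedent -> consequent with a single consequent item."""
    n = len(baskets)
    support = {}  # frozenset -> fraction of baskets that contain it

    # Level 1 candidates: every individual item.
    level = {frozenset([item]) for basket in baskets for item in basket}
    k = 1
    while level:
        # Keep the candidates that reach the minimum support.
        frequent = {}
        for itemset in level:
            s = sum(1 for basket in baskets if itemset <= basket) / n
            if s >= min_support:
                frequent[itemset] = s
        support.update(frequent)
        # Join frequent k-itemsets into (k+1)-itemsets and prune any candidate
        # with an infrequent k-subset (the Apriori property).
        k += 1
        level = {
            a | b
            for a in frequent
            for b in frequent
            if len(a | b) == k
            and all(frozenset(sub) in frequent for sub in combinations(a | b, k - 1))
        }

    rules = []
    for itemset, s in support.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = s / support[antecedent]
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), consequent, s, confidence))
    return rules

# Invented baskets: each set holds the item fields flagged 'T' for one customer.
baskets = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg", "frozenmeal"},
    {"fruitveg", "fish"},
    {"beer", "frozenmeal", "cannedveg"},
    {"wine", "confectionery"},
    {"fruitveg", "fish", "dairy"},
]
rules = apriori_rules(baskets, min_support=0.03, min_confidence=0.90)
for antecedent, consequent, sup, conf in rules:
    print(antecedent, "->", consequent, f"support={sup:.2f} confidence={conf:.2f}")
```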
SOLUTION 3b)
Expected output:
Output dataset:
The outputs below show that Apriori and GRI (Generalized Rule Induction) generate the same rules, but in a different
order. Apriori first generates the rules from the lower-level frequent itemsets and then those from the next-level
itemsets. GRI instead groups the rules by item: for one item it lists the largest rule first and then progressively
smaller rules, and then does the same for each subsequent item.
SOLUTION 3c)
Expected output:
Output dataset:
C5.0: a two-level decision tree is built with BP as the root attribute (level 1) and the Age and
Cholesterol attributes at level 2.
** As a result, the Neural Net and C5.0 models give the same information.
Procedure: The following nodes are used for this exercise, with their respective settings.
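C5.0, like C4.5, ranks candidate splits by information gain ratio, which is why one attribute (BP, above) ends up at the root. The sketch below computes the gain ratio of each attribute against Drug on invented records. It illustrates the criterion only and is not Clementine's C5.0 implementation: Age is pre-grouped into a hypothetical AgeBand field, whereas C5.0 would search for a numeric threshold itself.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a list of symbolic values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())

def gain_ratio(rows, attr, target="Drug"):
    """Information gain from splitting on attr, divided by the split
    information of attr (the gain-ratio criterion)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

# Invented records; AgeBand is a hypothetical pre-binned version of Age.
rows = [
    {"BP": "HIGH", "Cholesterol": "HIGH", "AgeBand": "<50", "Drug": "drugA"},
    {"BP": "HIGH", "Cholesterol": "NORMAL", "AgeBand": "<50", "Drug": "drugA"},
    {"BP": "HIGH", "Cholesterol": "HIGH", "AgeBand": ">=50", "Drug": "drugB"},
    {"BP": "LOW", "Cholesterol": "HIGH", "AgeBand": "<50", "Drug": "drugC"},
    {"BP": "LOW", "Cholesterol": "NORMAL", "AgeBand": ">=50", "Drug": "drugX"},
    {"BP": "NORMAL", "Cholesterol": "NORMAL", "AgeBand": "<50", "Drug": "drugX"},
    {"BP": "NORMAL", "Cholesterol": "HIGH", "AgeBand": ">=50", "Drug": "drugX"},
]
ranking = sorted(["BP", "Cholesterol", "AgeBand"], key=lambda a: gain_ratio(rows, a), reverse=True)
print(ranking)  # ['BP', 'AgeBand', 'Cholesterol']
```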
SOLUTION 3d)
Expected output:
Output dataset:
The following output shows which attributes are important. (The unimportant attributes are Sex and Age)
Procedure: The following nodes are used for this exercise, with their respective settings.
Append
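Plain K-Means produces clusters rather than an importance score, so one simple reading is how much of each attribute's variance is explained by the cluster means. The sketch below min-max normalizes the numeric fields, runs plain K-Means (Lloyd's algorithm, seeded with the first k points) and computes that ratio per attribute. The ratio is an illustrative heuristic rather than the measure Clementine displays, the (Age, Na, K) rows are invented, and symbolic fields such as Sex are left out because they would need indicator encoding first.

```python
from statistics import mean, pvariance

def min_max(points):
    """Scale every attribute to the 0-1 range so no field dominates the distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(x - lo) / s for x, lo, s in zip(p, lows, spans)] for p in points]

def kmeans(points, k, iterations=20):
    """Plain K-Means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return assignment, centroids

def attribute_importance(points, assignment, names):
    """Between-cluster variance / total variance for each attribute (heuristic)."""
    scores = {}
    for j, name in enumerate(names):
        column = [p[j] for p in points]
        total = pvariance(column)
        overall = mean(column)
        between = sum(
            sum(1 for a in assignment if a == c)
            * (mean(x for x, a in zip(column, assignment) if a == c) - overall) ** 2
            for c in set(assignment)
        ) / len(column)
        scores[name] = between / total if total > 0 else 0.0
    return scores

# Invented (Age, Na, K) rows standing in for the appended DRUG3n and DRUG4n data.
raw = [
    (30, 0.55, 0.030),
    (59, 0.86, 0.072),
    (60, 0.56, 0.032),
    (31, 0.85, 0.070),
    (45, 0.54, 0.031),
    (44, 0.84, 0.071),
]
points = min_max(raw)
assignment, centroids = kmeans(points, k=2)
scores = attribute_importance(points, assignment, ["Age", "Na", "K"])
print(assignment, scores)
```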