DWM Paper (S2022) Solution

Time : 2 Hours]                                        [Total Marks : 70
Instructions : 1) Attempt all questions.
2) Make suitable assumptions wherever necessary.
3) Figures to the right indicate full marks.
4) Simple and non-programmable scientific calculators are allowed.
Q.1 a) What is data warehousing ? Explain its features. (Refer section 1.2) [3]
b) With the help of a neat diagram explain the 3-tier architecture of data warehouse. (Refer section 1.7) [4]
OR
b) Do feature-wise comparison between OLAP and OLTP. (Refer section 1.1.1) [4]
c) Define the term "Data mining". Why is it called data mining rather than knowledge mining ? Explain the process of knowledge discovery from databases with the help of a suitable diagram. (Refer sections 2.1 and 2.8) [7]
OR
Q.4 a) Why is tree pruning useful in decision tree induction ? [3]
Ans. : When decision trees are built, many of the branches may reflect noise or outliers in the training data. Tree pruning methods address this problem of overfitting the data.
• Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.
• Decision trees can suffer from repetition and replication, making them overwhelming to interpret. Repetition occurs when an attribute is repeatedly tested along a given branch of the tree. In replication, duplicate sub-trees exist within the tree. These situations can impede the accuracy and comprehensibility of a decision tree.
• Pruned trees tend to be smaller and less complex and thus easier to comprehend. They are usually faster and better at correctly classifying independent test data than un-pruned trees. Pruned trees tend to be more compact than their un-pruned counterparts.
There are two common approaches to tree pruning :
Pre-pruning
" In the pre-pruning approach, a tree is "'pruned" by halting its construction
early (e.g. by deciding not to further split or partition the subset of training
tuples at a given node).
When constructing a tree, measures such as
information gain, Gini index and so on can be used to statistical significance,
assess the goodness of
a split.
lf partitioning the tuples at a node would result in a split that falls below a
pre specified threshold, then further partitioning of the given subset is
halted.
There are difficulties, however, in choosing an appropriate threshold.
Fligh thresholds could result in oversimplified trees, whereas low thresholds
could result in very little simplification
Post-pruning
• The second and more common approach is post-pruning, which removes subtrees from a "fully grown" tree.
• A subtree at a given node is pruned by removing its branches and replacing it with a leaf. The leaf is labeled with the most frequent class among the subtree being replaced.
• The cost complexity pruning algorithm used in CART is an example of the post-pruning approach. The basic idea is that the simplest solution is preferred.
• Unlike cost complexity pruning, pessimistic pruning (used in C4.5) does not require an independent set of tuples.
• Post-pruning leads to a more reliable tree.
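Both approaches can be tried quickly in code. Below is a minimal sketch, assuming scikit-learn is available; the synthetic dataset and every parameter value (max_depth, min_samples_split, the choice of ccp_alpha) are illustrative assumptions, not part of the paper's solution. Pre-pruning is approximated with depth and node-size thresholds, and post-pruning uses CART-style cost complexity pruning via ccp_alpha, as described above.

```python
# Sketch: un-pruned vs. pre-pruned vs. post-pruned decision trees.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Un-pruned: grow the tree fully (baseline for comparison).
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: halt construction early once a split falls below
# pre-specified thresholds (here depth and node-size limits).
pre = DecisionTreeClassifier(max_depth=4, min_samples_split=20,
                             random_state=0).fit(X_train, y_train)

# Post-pruning: remove subtrees from the fully grown tree using CART's
# cost complexity pruning; larger ccp_alpha yields a smaller tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # a mid-range alpha
post = DecisionTreeClassifier(ccp_alpha=alpha,
                              random_state=0).fit(X_train, y_train)

for name, clf in [("un-pruned", full), ("pre-pruned", pre),
                  ("post-pruned", post)]:
    print(name, "| leaves:", clf.get_n_leaves(),
          "| test accuracy:", round(clf.score(X_test, y_test), 3))
```

Comparing leaf counts and test accuracy typically mirrors the point above: the pruned trees are far smaller and often classify independent test data as well as or better than the un-pruned tree.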
b) Explain association rules with confidence and support giving an example.
(Refer sections 4.7 and 4.7.3)
Fig. 1
A common form of background knowledge is that an attribute may be generalized or specialized according to a hierarchy of concepts (Fig. 1).
Level 1 (min support = 5 %) : Milk (support = 10 %)
Level 2 (min support = 5 %) : 2 % Milk (support = 6 %), Skim milk (support = 4 %)
Example of uniform minimum support for all levels
Using reduced minimum support at lower levels : every level of abstraction has its own minimum support threshold, and the threshold is reduced at lower levels.
Level 1 (min support = 5 %) : Milk (support = 10 %)
Level 2 (min support = 3 %) : 2 % Milk (support = 6 %), Skim milk (support = 4 %)
Example of reduced minimum support at lower levels
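The effect of the two threshold schemes can be checked with a short sketch in standard-library Python; the support values are the ones in the figures above, and the two threshold dictionaries are the uniform (5 %, 5 %) and reduced (5 %, 3 %) schemes.

```python
# Sketch: level-wise minimum-support filtering for multilevel rules.
supports = {
    1: {"Milk": 0.10},                        # level-1 item supports
    2: {"2% Milk": 0.06, "Skim milk": 0.04},  # level-2 item supports
}

def frequent_items(min_sup_per_level):
    """Keep, per level, only the items meeting that level's threshold."""
    return {level: [item for item, sup in items.items()
                    if sup >= min_sup_per_level[level]]
            for level, items in supports.items()}

print("uniform:", frequent_items({1: 0.05, 2: 0.05}))  # Skim milk pruned
print("reduced:", frequent_items({1: 0.05, 2: 0.03}))  # Skim milk survives
```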
Transaction ID    Items
T1                Hot Dogs, Buns, Ketchup
T2                Hot Dogs, Buns
T3                Hot Dogs, Coke, Chips
T4                Chips, Coke
T5                Chips, Ketchup
T6                Hot Dogs, Coke, Chips

Ans. :
Minimum support count = (33.33 / 100) × 6 = 2
Candidate 2-itemsets (C2) :
Item set              Sup-count
Hot Dogs, Buns        2
Hot Dogs, Ketchup     1
Hot Dogs, Coke        2
Hot Dogs, Chips       2
Buns, Ketchup         1
Buns, Coke            0
Buns, Chips           0
Ketchup, Coke         0
Ketchup, Chips        1
Coke, Chips           3

Frequent 2-itemsets (L2) :
Item set              Sup-count
Hot Dogs, Buns        2
Hot Dogs, Coke        2
Hot Dogs, Chips       2
Coke, Chips           3

Candidate 3-itemsets (C3) :
Item set                  Sup-count
Hot Dogs, Buns, Coke      0
Hot Dogs, Buns, Chips     0
Hot Dogs, Coke, Chips     2

Frequent 3-itemsets (L3) :
Item set                  Sup-count
Hot Dogs, Coke, Chips     2

Fig. 2
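The counts in Fig. 2 can be reproduced with a brute-force sketch in standard-library Python. The six transactions follow the table above; the rule {Hot Dogs, Coke} → {Chips} at the end is an illustrative choice showing how confidence is computed from support counts, as asked in part b).

```python
# Sketch: brute-force support counting and a sample rule's confidence.
from itertools import combinations

transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]
min_count = 2  # 33.33 % of 6 transactions

def count(itemset):
    """Support count = number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

items = sorted(set().union(*transactions))
for k in (2, 3):
    frequent = {c: count(set(c)) for c in combinations(items, k)
                if count(set(c)) >= min_count}
    print(f"frequent {k}-itemsets:", frequent)

# Confidence({Hot Dogs, Coke} -> {Chips})
#   = sup({Hot Dogs, Coke, Chips}) / sup({Hot Dogs, Coke}) = 2 / 2 = 1.0
conf = count({"Hot Dogs", "Coke", "Chips"}) / count({"Hot Dogs", "Coke"})
print("confidence:", conf)
```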
Q.5 a) Difference between linear and logistic regression. (Refer sections 5.10.1 and 5.10.3) [3]
b) What is an outlier ? Discuss different methods for outlier detection. (Refer section 6.10) [4]
c) What is web mining ? Explain types of web mining. (Refer section 7.10) [7]
OR
a) Difference between spatial mining and temporal mining. (Refer section 7.2.1) [3]
b) Explain linear regression using a suitable example along with its pros and cons. (Refer section 5.10.1) [7]