Data mining is the process of extracting knowledge from large datasets, evolving from the need for effective data management and analysis. It involves multiple disciplines and follows several steps, including data cleaning, integration, selection, transformation, mining, evaluation, and presentation. Additionally, the document discusses the differences between databases and data warehouses, outlines decision tree construction, emphasizes the importance of evaluation criteria for classification methods, and suggests ways to improve classification accuracy.
Question 1: What is data mining?
In your answer, address the following:

Data mining refers to the process of extracting, or "mining," interesting knowledge or patterns from large amounts of data.

(a) Is it another hype?
Data mining is not another hype. The need for data mining has arisen from the wide availability of huge amounts of data and the need to turn such data into useful information and knowledge. Data mining can thus be viewed as the result of the natural evolution of information technology.

(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
No. Data mining is more than a simple transformation of technology developed from databases, statistics, and machine learning. It involves an integration, rather than a simple transformation, of techniques from multiple disciplines: database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis.

(c) Explain how the evolution of database technology led to data mining.
Database technology began with the development of data collection and database creation mechanisms, which led to effective mechanisms for data management, including data storage and retrieval as well as query and transaction processing. The large number of database systems offering query and transaction processing eventually and naturally created the need for data analysis and understanding, and data mining began its development out of this necessity.

(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps are as follows (a minimal pipeline sketch appears after the answer to Question 2 below):
• Data cleaning: removes or transforms noise and inconsistent data.
• Data integration: combines data from multiple sources.
• Data selection: retrieves the data relevant to the analysis task from the database.
• Data transformation: transforms or consolidates data into forms appropriate for mining.
• Data mining: the essential step in which intelligent and efficient methods are applied to extract patterns.
• Pattern evaluation: identifies the truly interesting patterns that represent knowledge, based on interestingness measures.
• Knowledge presentation: uses visualization and knowledge representation techniques to present the mined knowledge to the user.

Question 2: How is a database different from a data warehouse?
Differences: A data warehouse is a repository of information collected from multiple sources over a history of time, stored under a unified schema, and used for data analysis and decision support. A database, in contrast, is a collection of interrelated data that represents the current status of the stored data; there may be multiple heterogeneous databases, and the schema of one database may not agree with the schema of another. A database system supports ad hoc queries and on-line transaction processing.
Similarities: Both are repositories of information that store huge amounts of persistent data.
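To make the knowledge-discovery steps in Question 1(d) concrete, here is a minimal sketch of the cleaning, integration, selection, and transformation steps using pandas. The tables, column names, and min-max scaling are hypothetical illustrations, not part of the original answer; the mining, evaluation, and presentation steps are omitted.

```python
import pandas as pd

# Two hypothetical data sources to be integrated.
sales = pd.DataFrame({"customer_id": [1, 2, 3, 3],
                      "amount": [120.0, None, 75.0, 75.0]})
profiles = pd.DataFrame({"customer_id": [1, 2, 3],
                         "age": [34, 51, 28]})

# Data cleaning: drop noisy/inconsistent records (missing values, duplicates).
sales = sales.dropna().drop_duplicates()

# Data integration: combine the sources on a shared key.
data = sales.merge(profiles, on="customer_id")

# Data selection: keep only the attributes relevant to the analysis task.
data = data[["age", "amount"]]

# Data transformation: consolidate into a form appropriate for mining,
# here min-max scaling of each attribute into [0, 1].
data = (data - data.min()) / (data.max() - data.min())

print(data)  # ready for the mining step
```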
Question 3: Briefly explain the steps of making a decision tree.
Step 1: Determine the root of the tree.
Step 2: Calculate the entropy of the classes.
Step 3: Calculate the entropy after the split for each attribute.
Step 4: Calculate the information gain for each split.
Step 5: Perform the split.
Step 6: Perform further splits.
Step 7: Complete the decision tree.

The core algorithm for building decision trees is called ID3. ID3 uses entropy and information gain to construct the tree.
• Entropy controls how a decision tree decides to split the data; it affects where the tree draws its boundaries. For a set with class proportions p_i, the entropy is H = -sum_i p_i log2(p_i).
• Information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain.

In detail, ID3 proceeds as follows (a short sketch of the entropy and information-gain calculations appears after these steps):
Step 1: Calculate the entropy of the target/class variable.
Step 2: Split the dataset on each attribute. Calculate the entropy of each branch and add the branch entropies proportionally to get the total entropy of the split. Subtract this from the entropy before the split; the result is the information gain, or decrease in entropy.
Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.
Step 4a: A branch with entropy 0 is a leaf node.
Step 4b: A branch with entropy greater than 0 needs further splitting.
Step 5: Run the ID3 algorithm recursively on the non-leaf branches until all data are classified.
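The entropy and information-gain calculations above can be sketched in a few lines of Python. This is a minimal illustration, not a full ID3 implementation; the toy dataset (one "Outlook" attribute against a binary "Play" class) is a hypothetical example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        branch = [r[target] for r in rows if r[attribute] == value]
        after += (len(branch) / len(rows)) * entropy(branch)  # proportional sum
    return before - after

# Hypothetical toy dataset: one attribute and a binary class.
rows = [
    {"Outlook": "sunny", "Play": "no"},
    {"Outlook": "sunny", "Play": "no"},
    {"Outlook": "overcast", "Play": "yes"},
    {"Outlook": "rain", "Play": "yes"},
    {"Outlook": "rain", "Play": "no"},
]

print(entropy([r["Play"] for r in rows]))         # ~0.971, entropy of the class
print(information_gain(rows, "Outlook", "Play"))  # ~0.571, gain of this split
```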
Question 4: Explain the importance of evaluation criteria for classification methods.
Performance evaluation of a classification model is important for understanding the quality of the model, for refining it, and for choosing an adequate model. Evaluation metrics help us understand how a classifier performs; many are available, some with numerous tunable parameters. Evaluation criteria are also critical for judging reports by others: if a study presents only a single metric, one might question how the classifier would perform when evaluated using other metrics.
Classification metrics are calculated from true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs), all of which are tabulated in the so-called confusion matrix. For example, accuracy = (TP + TN) / (TP + FP + FN + TN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The relevance of each of these four quantities depends on the purpose of the classifier and motivates the choice of metric. For a medical test that determines whether patients receive a treatment that is cheap, safe, and effective, FPs would not be as important as FNs, which would represent patients who might suffer without adequate treatment. In contrast, if the treatment were an experimental drug, then a very conservative test with few FPs would be required, to avoid testing the drug on unaffected individuals.

Question 5: How can we improve the accuracy of classification?
Some general methods for improving classification accuracy are:
1. Cross-validation: separate the training dataset into groups, always holding one group out for prediction and changing the held-out group on each execution; this shows which data train a more accurate model (a minimal sketch appears at the end of this document).
2. Cross-dataset validation: the same idea as cross-validation, but using different datasets.
3. Model tuning: changing the parameters used to train the classification model.
4. Normalization: discovering which preprocessing techniques produce more consistent data for training.
5. Understanding the problem better: try other methods that solve the same problem; there is almost always more than one way to solve it, and the current approach may not be the best one.

Question 6: Solve by using k-nearest neighbors for the query point P1 = 3, P2 = 7, where k = 3:

P1  P2  Class
7   7   F
7   4   F
3   4   T
1   4   T

Solution (using Euclidean distance from the query point (3, 7)):
• (7, 7): sqrt((7-3)^2 + (7-7)^2) = 4
• (7, 4): sqrt((7-3)^2 + (4-7)^2) = 5
• (3, 4): sqrt((3-3)^2 + (4-7)^2) = 3
• (1, 4): sqrt((1-3)^2 + (4-7)^2) = sqrt(13) ≈ 3.61
The k = 3 nearest neighbors are (3, 4) → T, (1, 4) → T, and (7, 7) → F, so by majority vote the query point is classified as T. (The sketch below reproduces this calculation.)
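A minimal Python sketch of the Question 6 calculation, assuming Euclidean distance and an unweighted majority vote (the usual defaults when the question does not specify them):

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

# Training points from Question 6: ((P1, P2), class).
points = [((7, 7), "F"), ((7, 4), "F"), ((3, 4), "T"), ((1, 4), "T")]
query = (3, 7)
k = 3

# Sort the training points by distance to the query and keep the k nearest.
nearest = sorted(points, key=lambda p: dist(p[0], query))[:k]
print(nearest)  # [((3, 4), 'T'), ((1, 4), 'T'), ((7, 7), 'F')]

# Unweighted majority vote over the k nearest neighbors.
label = Counter(cls for _, cls in nearest).most_common(1)[0][0]
print(label)  # 'T'
```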
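Finally, for the cross-validation suggestion in Question 5, here is a minimal k-fold sketch. It assumes scikit-learn is available; the estimator and the built-in iris dataset are illustrative choices, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# cv=5: each of the 5 folds is held out once for prediction while the
# remaining folds train the model, as described in Question 5, item 1.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores)         # per-fold accuracy
print(scores.mean())  # average accuracy across folds
```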