2 Data Mining Terms & Concepts

Uploaded by

saharsh0812

DATA MINING TERMS & CONCEPTS
DBMS
• A database system is the traditional way of storing and retrieving data.
• The major task of a database system is query processing.
• These systems are generally referred to as online transaction processing (OLTP) systems.
• They are used for the day-to-day operations of an organization.
Data Warehouse
• A data warehouse is a place where a huge amount of data is stored.
• It is meant for users or knowledge workers whose role is data analysis and decision making.
• These systems are referred to as online analytical processing (OLAP) systems.
DBMS and Data Warehouse Difference
OLTP and OLAP
• OLTP covers transaction-oriented applications.
• It is mainly concerned with the entry, storage, and retrieval of data.
• It is designed for day-to-day operations such as purchasing, inventory, payroll, and accounting.
• It basically supports DML operations.
Users of OLTP
• Almost all industries, including airlines, supermarkets, banking, and insurance.
• Data captured in OLTP systems is usually stored in commercial relational databases.
• For example, the database of a supermarket store consists of the following tables, which store data about its transactions, products, inventory, employees, etc.:
• Transactions

• ProductName

• EmployeeDetails

• InventorySupplies

• Suppliers
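
The DML-style workload described above can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module stands in for a commercial relational database; the columns of Transactions and the sample values are hypothetical, since the slides give only the table names.

```python
import sqlite3

# In-memory database standing in for the store's OLTP system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions ("
    "  id INTEGER PRIMARY KEY,"
    "  product TEXT NOT NULL,"
    "  quantity INTEGER NOT NULL)"
)

# DML: INSERT a new sale, then UPDATE it.
with conn:  # commits on success, rolls back on error
    conn.execute(
        "INSERT INTO Transactions (product, quantity) VALUES (?, ?)",
        ("milk", 2),
    )
    conn.execute(
        "UPDATE Transactions SET quantity = ? WHERE product = ?",
        (3, "milk"),
    )

# Retrieval: read the row back with SELECT.
rows = conn.execute("SELECT product, quantity FROM Transactions").fetchall()

# DML: DELETE the row again.
with conn:
    conn.execute("DELETE FROM Transactions WHERE product = ?", ("milk",))

remaining = conn.execute("SELECT COUNT(*) FROM Transactions").fetchone()[0]
```

Each statement touches a single row and finishes quickly, which is the access pattern OLTP systems are tuned for.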
Advantages of OLTP
• Simplicity

• Efficiency

• Allows users to read, write, and delete data quickly
• Fast query processing
• Responds to user actions immediately and supports transaction processing on demand
Challenges
• Security

• It requires concurrency control (locking) and a recovery mechanism.
• The data content of an OLTP system is not suitable for decision making.
• A typical OLTP system manages the current data within the enterprise or organization. This data is too far removed from decision making.
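
The recovery requirement mentioned above can be illustrated with a transaction that is undone when it fails. This is only a sketch, assuming Python's sqlite3 module; the Inventory table, the sell function, and the stock figures are hypothetical, and the example shows rollback only, not locking between concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory (product TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO Inventory VALUES ('milk', 5)")
conn.commit()

def sell(product, quantity):
    """Decrease stock atomically; undo the change if stock would go negative."""
    try:
        with conn:  # one transaction: commit on success, rollback on exception
            conn.execute(
                "UPDATE Inventory SET stock = stock - ? WHERE product = ?",
                (quantity, product),
            )
            (stock,) = conn.execute(
                "SELECT stock FROM Inventory WHERE product = ?", (product,)
            ).fetchone()
            if stock < 0:
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False

first = sell("milk", 3)   # succeeds, stock becomes 2
second = sell("milk", 4)  # would leave -2, so the update is rolled back
(final_stock,) = conn.execute(
    "SELECT stock FROM Inventory WHERE product = 'milk'"
).fetchone()
```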
Answer
The supermarket store is deciding on introducing a new product. The key issues being debated are: “Which product should they introduce?” and “Should it be specific to a few customer segments?”

The supermarket store is looking at offering a discount on its year-end sale. The questions here are: “How much discount should they offer?” and “Should different discounts be given to different customer segments?”
Answer: OLAP

• OLAP differs from a traditional database in the way data is conceptualized and stored.
• OLAP data is held in dimensional form rather than relational form.
• The lifeblood of OLAP is the multidimensional data model.
• The multidimensional data model views data in the form of a data cube.
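
A data cube can be pictured as measures indexed by several dimensions at once. The sketch below is a minimal illustration in plain Python; the three dimensions, the sample facts, and the roll_up helper are assumptions made for the example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units sold).
sales = [
    ("milk", "north", "Q1", 10),
    ("milk", "south", "Q1", 4),
    ("bread", "north", "Q1", 7),
    ("milk", "north", "Q2", 6),
]

DIMENSIONS = ("product", "region", "quarter")

def roll_up(facts, keep):
    """Aggregate units over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *coords, units in facts:
        totals[tuple(coords[i] for i in idx)] += units
    return dict(totals)

by_product = roll_up(sales, ["product"])
by_product_quarter = roll_up(sales, ["product", "quarter"])
grand_total = roll_up(sales, [])
```

Keeping fewer dimensions gives a coarser view of the same cube, which is the kind of question ("units per product", "units per product per quarter") that the supermarket example calls for.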
Distributed Data Store (Distributed Database)
• A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion. The term usually refers specifically to a distributed database in which users store information on a number of nodes.
Multidimensional Schema
• A multidimensional schema is designed specifically to model data warehouse systems.
• The schemas address the unique needs of very large databases designed for analytical purposes (OLAP).
• The two main types of schema are:

• Star Schema

• Snowflake Schema
Star Schema
• In a star schema, the center of the star holds one fact table, which is surrounded by a number of associated dimension tables.
• It is known as a star schema because its structure resembles a star.
• The star schema is the simplest type of data warehouse schema.
Star schema
Star schema Example
Characteristics of star schema
• Every dimension in a star schema is represented by exactly one dimension table.
• Each dimension table contains a set of attributes.
• Each dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
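
The characteristics above can be sketched as a tiny star schema. This is an illustration only, assuming Python's sqlite3 module; the table names (fact_sales, dim_product, dim_store), columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- Fact table: one foreign key per dimension plus the numeric measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER
);
INSERT INTO dim_product VALUES (1, 'milk'), (2, 'bread');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 4), (2, 1, 7);
""")

# Each dimension is reached with a single join from the fact table.
units_by_city = conn.execute("""
    SELECT s.city, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```

Note that dim_product and dim_store never reference each other; they meet only through the fact table.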
Snowflake Schema
Characteristics of Snowflake Schema

• It uses less disk space.
• It is easier to implement when a dimension is added to the schema.
• Query performance is reduced because of the multiple tables.
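
The trade-off above can be seen in a small snowflake sketch, where a dimension is normalized into a further table. This assumes Python's sqlite3 module; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalized: category moves to its own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    units      INTEGER
);
INSERT INTO dim_category VALUES (1, 'dairy'), (2, 'bakery');
INSERT INTO dim_product  VALUES (1, 'milk', 1), (2, 'curd', 1), (3, 'bread', 2);
INSERT INTO fact_sales   VALUES (1, 10), (2, 5), (3, 7);
""")

# Two joins are needed to reach the category from the fact table.
units_by_category = conn.execute("""
    SELECT c.category, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
```

The category name is stored once per category rather than once per product, which saves space, while the extra join is the source of the reduced query performance.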

Difference
ETL
• ETL is a process in data warehousing; it stands for Extract, Transform, and Load.
• In this process, an ETL tool extracts data from various source systems, transforms it in the staging area, and finally loads it into the data warehouse system.
Extraction
• The first step of the ETL process is extraction.
• In this step, data is extracted from various source systems, which can be in formats such as relational databases, NoSQL, XML, and flat files, into the staging area.
• It is important to store the extracted data in the staging area first and not load it directly into the data warehouse, because the extracted data is in various formats and may be corrupted.
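
The extraction step above can be sketched with two sources in different formats feeding one staging area. This is a minimal illustration using only the Python standard library; the inline CSV and JSON payloads, the field names, and the shape of the staging list are assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical source payloads standing in for real source systems.
CSV_SOURCE = "id,country\n1,U.S.A\n2,United States\n"
JSON_SOURCE = '[{"id": 3, "country": "America"}, {"id": 4, "country": null}]'

def extract_csv(text):
    """Read rows from a flat-file source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_json(text):
    """Read rows from a JSON (NoSQL-style) source as dictionaries."""
    return json.loads(text)

# The staging area keeps raw rows plus their origin; nothing is cleaned yet.
staging = [
    {"source": "csv", "row": row} for row in extract_csv(CSV_SOURCE)
] + [
    {"source": "json", "row": row} for row in extract_json(JSON_SOURCE)
]
```

The staged rows are still inconsistent (string ids from CSV, integer ids from JSON, a missing country), which is why they are not written to the warehouse directly.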
Transformation
• In this step, a set of rules or functions is applied to the extracted data to convert it into a single standard format. It may involve the following tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling in NULL values with default values, mapping U.S.A, United States, and America to USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally the key attribute).
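
The five tasks above can be sketched in one small transform function. This is an illustration in plain Python; the staged rows, the field names, the UNKNOWN default, and the alias table are hypothetical choices, apart from the country spellings, which come from the cleaning example above.

```python
# Hypothetical staged rows; field names are illustrative only.
staged = [
    {"id": 2, "first": "Ada", "last": "Lovelace", "country": "United States",
     "joined": "2021-03-15", "notes": "x"},
    {"id": 1, "first": "Alan", "last": "Turing", "country": None,
     "joined": "2020-11-02", "notes": "y"},
]

COUNTRY_ALIASES = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

def transform(row):
    country = row["country"]
    year, month, _day = row["joined"].split("-")   # splitting
    return {
        "id": row["id"],                           # filtering: "notes" is dropped
        "name": f'{row["first"]} {row["last"]}',   # joining
        # cleaning: default for NULL, then map aliases to one spelling
        "country": "UNKNOWN" if country is None
                   else COUNTRY_ALIASES.get(country, country),
        "join_year": int(year),
        "join_month": int(month),
    }

# sorting: order the tuples by the key attribute
transformed = sorted((transform(r) for r in staged), key=lambda r: r["id"])
```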
Loading

• In this step, the transformed data is finally loaded into the data warehouse.
• Sometimes the data warehouse is updated very frequently, and sometimes loading is done at longer but regular intervals.
• The rate and period of loading depend solely on the requirements and vary from system to system.
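
Loading in batches can be sketched as follows. This is one possible approach, assuming Python's sqlite3 module as a stand-in warehouse; the customer table, the load function, and the choice of SQLite's INSERT OR REPLACE to overwrite existing rows are assumptions for illustration, not something the slides prescribe.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)

def load(rows):
    """Load one batch; rows whose id already exists are overwritten."""
    with warehouse:  # the whole batch commits or rolls back together
        warehouse.executemany(
            "INSERT OR REPLACE INTO customer (id, name, country) "
            "VALUES (:id, :name, :country)",
            rows,
        )

# First (full) load, then a later incremental batch with one update, one new row.
load([{"id": 1, "name": "Alan Turing", "country": "UNKNOWN"},
      {"id": 2, "name": "Ada Lovelace", "country": "USA"}])
load([{"id": 1, "name": "Alan Turing", "country": "UK"},
      {"id": 3, "name": "Grace Hopper", "country": "USA"}])

loaded = warehouse.execute(
    "SELECT id, country FROM customer ORDER BY id"
).fetchall()
```

How often load is called (every few minutes or once a night) is exactly the system-specific decision described above.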
Pipelining
Data mining
• Data mining has been defined as the non-trivial extraction
of implicit, previously unknown, and potentially useful
information from large data sets or databases.
Knowledge Discovery
• Knowledge discovery is the process of finding novel, interesting, and useful patterns in data.
• Data mining is one step of knowledge discovery, although the term is often used as a synonym for Knowledge Discovery in Databases (KDD).
Information Retrieval
• Information retrieval is the automatic retrieval of all relevant documents while retrieving as few non-relevant documents as possible.
• Its primary goals are indexing text and searching for useful documents in a collection.
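
The two goals above, indexing and searching, can be sketched with a small inverted index. This is a minimal illustration in plain Python; the three documents and the boolean-AND search function are assumptions for the example, and real IR systems add ranking, stemming, and more.

```python
from collections import defaultdict

# Hypothetical three-document collection.
docs = {
    1: "data mining finds patterns in data",
    2: "a data warehouse stores historical data",
    3: "star schema and snowflake schema",
}

# Indexing: map each term to the set of documents that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```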
Triplet
• Data is an expression of feedback; a statement (rightly or
wrongly so) about an observation.
• Information is contextualized data.
• Knowledge is a phenomenon that implies our ability to
use the information for reasoning and decision making,
i.e., it is the basis of what you can, will, would, should or
might do with information.
Information Extraction
• Information Extraction has the goal of transforming a
collection of documents, usually with the help of an IR
system, into information that is more readily digested and
analyzed.
Knowledge Representation
• Knowledge representation is the presentation of knowledge to the user for visualization in terms of trees, tables, rules, graphs, charts, matrices, etc.
Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
• Depending on the type of the ordering relation, we distinguish several types of concept hierarchies.
Set Group Hierarchy
• Concept hierarchies may also be defined by discretizing
or grouping values for a given dimension or attribute,
resulting in a set-grouping hierarchy.
Schema Hierarchy
• A concept hierarchy that is a total or partial order among
attributes in a database schema is called a schema
hierarchy.
Different User Viewpoints
• There may be more than one concept hierarchy for a given attribute or dimension, based on different user viewpoints.
• For instance, a user may prefer to organize price by defining ranges for inexpensive, moderately_priced, and expensive.
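
One user's view of price can be sketched as a mapping from values to range labels. The cut-off values below are purely hypothetical, since the slide names the ranges but not their bounds; another user could define different ones for the same attribute.

```python
# Hypothetical cut-offs; the slide names the ranges but not their bounds.
PRICE_RANGES = [
    (0, 50, "inexpensive"),
    (50, 200, "moderately_priced"),
    (200, float("inf"), "expensive"),
]

def price_level(price):
    """Map a price to the label of the half-open range [low, high) containing it."""
    for low, high, label in PRICE_RANGES:
        if low <= price < high:
            return label
    raise ValueError(f"price must be non-negative, got {price}")
```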
Schema hierarchy

• A schema hierarchy relates concepts by their generality.
• The ordering reflects the generality of the attribute values, e.g. street < city < state < country.
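
The ordering street < city < state < country can be used to generalize records to a coarser level. The sketch below is a plain-Python illustration; the address records and the helper names are hypothetical.

```python
# Schema hierarchy from the slide, ordered from most specific to most general.
LEVELS = ["street", "city", "state", "country"]

# Hypothetical address records.
addresses = [
    {"street": "MG Road", "city": "Pune", "state": "Maharashtra", "country": "India"},
    {"street": "FC Road", "city": "Pune", "state": "Maharashtra", "country": "India"},
    {"street": "Marine Drive", "city": "Mumbai", "state": "Maharashtra", "country": "India"},
]

def is_more_general(a, b):
    """True if attribute `a` sits above attribute `b` in the hierarchy."""
    return LEVELS.index(a) > LEVELS.index(b)

def generalize(records, level):
    """Count records after replacing each one by its value at `level`."""
    counts = {}
    for record in records:
        counts[record[level]] = counts.get(record[level], 0) + 1
    return counts
```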
Set-grouping hierarchy
• The ordering relation is the subset relation (⊆). Applies to
set values.

• Example:
• {13, ..., 39} = young; {13, ..., 19} = teenage;
• {13, ..., 19} ⊆ {13, ..., 39} ⇒ teenage < young
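
The age example above can be checked directly with Python sets. In this sketch the two groups come from the slide, while the helper names are hypothetical; the comparison uses a proper subset so that a group is never below itself.

```python
# Groups from the slide, expressed as sets of ages.
young = set(range(13, 40))    # {13, ..., 39}
teenage = set(range(13, 20))  # {13, ..., 19}

groups = {"young": young, "teenage": teenage}

def below(a, b):
    """a < b in the hierarchy when a's values are a proper subset of b's."""
    return groups[a] < groups[b]

def labels(age):
    """All group names whose set contains `age`, most specific first."""
    return sorted((g for g in groups if age in groups[g]),
                  key=lambda g: len(groups[g]))
```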
Operation-derived hierarchy
• An operation-derived hierarchy is produced by applying an operation (encoding, decoding, information extraction).
• For example, markovz@cs.ccsu.edu instantiates the hierarchy user-name < department < university < education.
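
The decoding operation in the e-mail example can be sketched as a small parser. This is an illustration only: the function name is hypothetical, and it assumes the address has exactly the form user@department.university.domain, raising ValueError for any other shape.

```python
def email_hierarchy(address):
    """Decode an address of the form user@department.university.domain."""
    user, _, host = address.partition("@")
    department, university, domain = host.split(".")
    # Ordered from most specific to most general level.
    return [
        ("user-name", user),
        ("department", department),
        ("university", university),
        ("education", domain),
    ]

levels = email_hierarchy("markovz@cs.ccsu.edu")
```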
Rule-based hierarchy

• A rule-based hierarchy uses rules to define the partial order.
• For example, the rule "if antecedent then consequent" defines the order antecedent < consequent.
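
A rule-based order can be sketched by storing each rule as an (antecedent, consequent) pair and chaining rules to compare two concepts. The rules below are hypothetical examples, and chaining relies on the transitivity that a partial order has.

```python
# Hypothetical rules of the form "if antecedent then consequent".
rules = [("sparrow", "bird"), ("bird", "animal"), ("cat", "animal")]

def is_below(a, b):
    """True if a < b follows from the rules, directly or by chaining them."""
    frontier, seen = [a], set()
    while frontier:
        current = frontier.pop()
        for antecedent, consequent in rules:
            if antecedent == current and consequent not in seen:
                if consequent == b:
                    return True
                seen.add(consequent)
                frontier.append(consequent)
    return False
```

Concepts that no chain of rules connects, such as cat and bird here, are simply incomparable, which is what makes the order partial rather than total.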

