
Data Mining

Set-01: (Introduction)

Q 1. What is Data Mining? Describe the origins of data mining.


Data Mining: Data mining is the process of sorting through large data sets to identify patterns and
establish relationships to solve problems through data analysis.
Example:
 E-commerce
 Crime agencies
 Information retrieval
 Science and Engineering
 Medical data mining

Origins of Data Mining:


 Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
 Traditional techniques may be unsuitable due to:
 Enormity of data
 High dimensionality of data
 Heterogeneous, distributed nature of data

Fig: Origins of data mining

Q 2. Write down some challenges in data mining.


 Scalability
 Dimensionality
 Complex and Heterogeneous Data
 Data Quality
 Data Ownership and Distribution
 Privacy Preservation
 Streaming Data
Q 3. Write down some applications of data mining.
 E-commerce
 Crime agencies
 Information retrieval
 Science and Engineering
 Medical data mining
 Market Basket Analysis
 Manufacturing Engineering
 Fraud Detection
 Corporate Surveillance
 Research Analysis
 Bio Informatics

Q 4. Distinguish between information extraction & data mining procedure.


Ans: Difference between information extraction and data mining procedure

| Information Extraction | Data Mining |
| --- | --- |
| 1. Information extraction is the task of automatically extracting structured information from unstructured documents. | 1. Data mining is the ability to retrieve information from one or more data sources in order to combine it, cluster it, visualize it, and discover patterns in the data. |
| 2. It returns relevant results. | 2. It discovers patterns in the data. |
| 3. Obtains required information from the sources you already have. | 3. Discovers useful hidden patterns in the data you have. |
| 4. Uses: extracting data from large databases. | 4. Uses: fraud detection, research analysis, bioinformatics, manufacturing engineering, etc. |
Set-02: (Data)

Q 1. What is Data? Describe different types of attributes with appropriate example.


Data: In computing, data is information that has been translated into a form that is efficient for
movement or processing. Data is information converted into binary digital form.

The different types of attributes are as follows:


1. Nominal Data Attributes: The values of a nominal attribute are just different names, nominal
values provide only enough information to distinguish one object from another.

Operation: (=, ≠ )

Examples: Zip Code, employee ID numbers, eye color, gender etc.

2. Ordinal Data Attributes: The values of an ordinal attribute provide enough information to
order objects. All Values have a meaningful order.

Operation: (<, >)

Example: Hardness of minerals, grades, street numbers etc.

3. Interval Data Attributes: For interval attributes, the differences between values are
meaningful, i.e., a unit of measurement exists.

Operation: (+,-)

Example: Calendar dates, temperature in Celsius or Fahrenheit

4. Ratio Data Attributes: For ratio attributes, both differences and ratios are meaningful.

Operation: (*, /)

Example: Temperature in Kelvin, Monetary quantities, counts, age, mass, length, electrical
current etc.

Q 2. Definition: Principal component analysis, Dimensionality reduction, Cosine similarity, Feature extraction and Feature creation.
1. Principal Component Analysis: Principal component analysis (PCA) is a statistical procedure
that uses an orthogonal transformation to convert a set of observations of possibly correlated
variables into a set of values of linearly uncorrelated variables called principal components.

2. Dimensionality reduction: Dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.

3. Cosine similarity: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space, computed as the cosine of the angle between them (a short code sketch of PCA and cosine similarity follows this list).

4. Feature extraction: The creation of a new set of features from the original raw data is known as feature extraction. For example, consider a set of photographs, where each photograph is to be classified according to whether or not it contains a human face.

5. Feature creation: It is frequently possible to create, from the original attributes, a new set of
attributes that captures the important information in a data set much more effectively.
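
To make these definitions concrete, here is a minimal NumPy sketch of PCA via eigendecomposition of the covariance matrix, and of cosine similarity; the toy matrix X and the choice of two components are illustrative assumptions, not part of the original text.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto its top principal components."""
    X_centered = X - X.mean(axis=0)          # PCA requires centered data
    cov = np.cov(X_centered, rowvar=False)   # covariance of the attributes
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components           # linearly uncorrelated scores

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

X = np.random.rand(100, 5)                   # toy data: 100 objects, 5 attributes
print(pca(X).shape)                          # (100, 2)
print(cosine_similarity(np.array([1, 0]), np.array([1, 1])))  # ~0.707
```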
Q 3. Describe different types of Data with appropriate example.
Different types of data are given below:
1. Numeric Data: Numeric data consists of the digits 0 to 9. It may also contain a decimal point “.”, a plus sign “+”, or a minus sign “-”. Numeric data may be either positive or negative. The use of “+” with positive numbers is optional.

Examples: 10, +5, -12, 13.7, -32.5 etc.

2. Text Data: Text data consists of words, sentences and paragraphs. Text processing refers to the
ability to manipulate words, lines and pages. Text is normally stored as ASCII code without
formatting.
Examples: Some examples of text data are Riaz Ameen, Pakistan, Islam etc.

3. Audio Data: Audio data is a representation of sound. It includes music, speech, or any other type of sound.

4. Video Data: Video is a set of full-motion images played at a high speed. Video is used to
display actions and movements.

5. Image Data: This type of data includes charts, graphs, pictures, and drawings. This form of data is more comprehensive. It can be transmitted as a set of bits, packed as bytes.
Set-03: (Data Visualization)
Q 1. What is Data Visualization?
Data Visualization: Data visualization is the display of information in a graphic or tabular format.
Successful visualization requires that the data be converted into a visual format so that the
characteristics of the data and the relationships among data items or attributes can be analyzed or
reported.

Q 2. Short note with figure: Histogram, Boxplot & Scatterplot.


 Histogram: A histogram is a graphical display of data using bars of different heights. It groups the numbers in the data set into ranges and represents an estimate of the probability distribution of a continuous variable. A histogram usually looks like this:

Fig: Histogram

 Boxplot: A boxplot is a graphical representation of groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes, indicating variability outside the upper and lower quartiles.

Fig: Boxplot

 Scatterplot: A scatterplot is a type of graph that plots values from two variables in a Cartesian plane. It is usually used to find the relationship between two variables.

Fig: Scatterplot
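
A minimal matplotlib sketch that produces all three plots side by side; the random sample data and figure layout are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(loc=0, scale=1, size=500)     # toy continuous variable
y = 2 * x + rng.normal(scale=0.5, size=500)  # variable correlated with x

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(x, bins=20)                     # histogram: counts per value range
axes[0].set_title("Histogram")
axes[1].boxplot(x)                           # boxplot: quartiles and outliers
axes[1].set_title("Boxplot")
axes[2].scatter(x, y, s=5)                   # scatterplot: relationship of x and y
axes[2].set_title("Scatterplot")
plt.tight_layout()
plt.show()
```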
Q 3. What is OLAP? Describe different types of OLAP.
OLAP: OLAP stands for Online Analytical Processing. OLAP is based on the multidimensional data model. It allows managers and analysts to gain insight into information through fast, consistent, and interactive access to it.

The main types of OLAP are as follows:


1. Relational OLAP: ROLAP servers are placed between a relational back-end server and client front-end tools. To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following −
 Implementation of aggregation navigation logic.
 Optimization for each DBMS back end.
 Additional tools and services.

2. Multidimensional OLAP: MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, the storage utilization may be low if the data set is sparse.

3. Hybrid OLAP: Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP and the faster computation of MOLAP. HOLAP servers allow large volumes of detailed data to be stored.

4. Specialized SQL Servers: Specialized SQL servers provide advanced query language and
query processing support for SQL queries over star and snowflake schemas in a read-only
environment.

Q 4. What is Data Cube?


Data Cube: A multidimensional representation of the data, together with all possible totals, is known as a data cube. Despite the name, the size of each dimension (the number of attribute values) does not need to be equal.

Fig: Data Cube
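
As a rough illustration, a pandas pivot table with margins=True computes a small two-dimensional slice of a cube with all row and column totals; the cities, products, and sales figures here are invented for the example.

```python
import pandas as pd

# Invented transactions: two dimensions (city, product) and a measure (sales)
df = pd.DataFrame({
    "city":    ["Dhaka", "Dhaka", "Khulna", "Khulna"],
    "product": ["Pen", "Book", "Pen", "Book"],
    "sales":   [10, 20, 5, 15],
})

# A 2-D slice of a data cube: every (city, product) total plus marginal totals
cube = df.pivot_table(values="sales", index="city", columns="product",
                      aggfunc="sum", margins=True, margins_name="Total")
print(cube)
```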


Set-04: Classification

Difference among supervised, semi-supervised, reinforcement & unsupervised learning techniques:

| Supervised Learning | Unsupervised Learning | Semi-Supervised Learning | Reinforcement Learning |
| --- | --- | --- | --- |
| 1. Linear regression | 1. Clustering | 1. Graph-based methods | 1. Markov decision processes |
| 2. Logistic regression | 2. K-means | 2. Generative models | 2. Monte Carlo methods |
| 3. K-nearest neighbors | 3. Dimensionality reduction | 3. Low-density separation | 3. Temporal difference learning |
| 4. Decision trees | 4. Principal component analysis | 4. Heuristic approaches | 4. Neuro-dynamic programming |
| 5. Input/output pairs | 5. Input only | 5. Input only | 5. Input & critic |

Decision tree: A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and
each leaf node holds a class label.

Entropy: Entropy can be defined as a measure of the average information content per source symbol. Claude Shannon, the “father of information theory”, provided a formula for it as −

H = −Σᵢ pᵢ log_b pᵢ
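
A small Python sketch of Shannon's formula, using base 2 so entropy is measured in bits; the class distributions are illustrative assumptions.

```python
import math

def entropy(probabilities, base=2):
    """H = -sum_i p_i * log_b(p_i); terms with p_i == 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))    # ~0.469 bits: a skewed split carries less information
print(entropy([1.0]))         # 0.0 bits: a pure class has no uncertainty
```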

Misclassification Error: Misclassification occurs when a record is assigned to the wrong class, for example because an attribute unsuitable for classification was selected. When all classes, groups, or categories of a variable have the same error rate or probability of being misclassified, the misclassification is said to be non-differential.

Overfitting: Overfitting is a modeling error which occurs when a function is too closely fit to a
limited set of data points. Overfitting the model generally takes the form of making an overly
complex model.

Underfitting: Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Specifically, underfitting occurs if the model or algorithm shows low variance but high bias.

Bias: In modeling, bias is the error introduced by the simplifying assumptions a model makes; a high-bias model systematically misses the true relationship regardless of the training data.

Variance: Variance is a measurement of the spread between numbers in a data set. The variance
measures how far each number in the set is from the mean.

Advantages of nearest neighbor classifiers:


a) Simple to implement
b) Flexible to feature / distance choices
c) Naturally handles multi-class cases
d) Can do well in practice with enough representative data
Limitations of nearest neighbor classifiers:
a) Require well-labeled training data
b) Can be sensitive to the value of k chosen
c) All attributes are used in classification, even ones that may be irrelevant
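
A minimal scikit-learn sketch of a nearest neighbor classifier; the Iris dataset and k=3 are illustrative choices, and the sensitivity to k mentioned in limitation (b) can be explored by varying n_neighbors.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# k is the number of neighbors consulted; the choice of k matters (limitation b)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)                      # "training" just stores the data
print("accuracy:", knn.score(X_test, y_test))  # naturally handles 3 classes
```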
Set-05: Regression

Difference between Classification & Regression techniques:


| Subject | Classification Technique | Regression Technique |
| --- | --- | --- |
| 1. Basic | The discovery of a model or function that maps objects into predefined classes. | A devised model that maps objects into continuous values. |
| 2. Involves prediction of | Discrete values | Continuous values |
| 3. Example algorithms | Decision tree, logistic regression, etc. | Regression tree (random forest), linear regression, etc. |
| 4. Nature of the predicted data | Unordered | Ordered |
| 5. Method of calculation | Measuring accuracy | Measuring root mean square error |

Linear Regression: Linear regression is a linear approach for modelling the relationship between a
scalar dependent variable y and one or more explanatory variables denoted X.
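
A minimal least-squares sketch with NumPy; the data is synthetic, and the true slope and intercept are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=50)                      # explanatory variable
y = 3.0 * X + 2.0 + rng.normal(scale=1.0, size=50)   # y ≈ 3x + 2 plus noise

# Fit y = w*x + b by ordinary least squares
A = np.column_stack([X, np.ones_like(X)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"slope ≈ {w:.2f}, intercept ≈ {b:.2f}")       # should be close to 3 and 2
```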

Logistic Regression: Logistic regression is a regression model in which the response variable has categorical values such as True/False or 0/1. It measures the probability of a binary response based on a mathematical equation relating it to the predictor variables.

Polynomial Regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x.

Optimization Cost Function: Cost functions are a way to help the data modeler solve a supervised learning problem, either classification or regression. The fit of the response surface to the available data is expressed as a cost, and in an optimization setting the goal is to minimize that cost.
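
To show what minimizing a cost function looks like in practice, here is a sketch of gradient descent on a mean-squared-error cost for a simple linear model; the synthetic data, learning rate, and iteration count are arbitrary assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=50)
y = 3.0 * X + 2.0 + rng.normal(scale=1.0, size=50)

w, b, lr = 0.0, 0.0, 0.01                    # initial parameters, learning rate
for _ in range(2000):
    error = (w * X + b) - y
    cost = np.mean(error ** 2)               # MSE cost to be minimized
    w -= lr * 2 * np.mean(error * X)         # gradient of cost w.r.t. w
    b -= lr * 2 * np.mean(error)             # gradient of cost w.r.t. b
print(f"w ≈ {w:.2f}, b ≈ {b:.2f}, final cost ≈ {cost:.3f}")
```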

Set-06: Model Evaluation + ANN

Precision: Precision is the percentage of retrieved documents that are in fact relevant to the query.
Precision can be defined as –
Precision= |{Relevant} ∩ {Retrieved}| / |{Retrieved}|

Recall: Recall is the percentage of documents that are relevant to the query and were in fact
retrieved. Recall is defined as –
Recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|

F-Measure: The F-measure (also known as F-score) is a commonly used trade-off measure: an information retrieval system often needs to trade recall for precision or vice versa. The F-score is defined as the harmonic mean of recall and precision −
F-Measure = 2 × (precision × recall) / (precision + recall)

Confusion Matrix: A confusion matrix is a table that is often used to describe the performance of
a classification model on a set of test data for which the true values are known.
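
A small sketch computing all four of these quantities for a binary problem with scikit-learn; the label vectors are invented for illustration.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # ground-truth relevance (invented)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # classifier output (invented)

print("precision:", precision_score(y_true, y_pred))  # relevant ∩ retrieved / retrieved
print("recall:   ", recall_score(y_true, y_pred))     # relevant ∩ retrieved / relevant
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
print(confusion_matrix(y_true, y_pred))               # rows: true, cols: predicted
```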

Root Mean Square Error (RMSE): The root-mean-square error (RMSE) is a frequently used
measure of the differences between values predicted by a model or an estimator and the values
actually observed.

Mean Absolute Error (MAE): Mean absolute error (MAE) is a measure of difference between
two continuous variables. Assume X and Y are variables of paired observations that express the
same phenomenon. Examples of Y versus X include comparisons of predicted versus observed,
subsequent time versus initial time, and one technique of measurement versus an alternative
technique of measurement.
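
A minimal NumPy sketch of both error measures on invented predicted/observed pairs.

```python
import numpy as np

observed  = np.array([3.0, 5.0, 2.5, 7.0])   # invented ground truth
predicted = np.array([2.5, 5.0, 4.0, 8.0])   # invented model output

rmse = np.sqrt(np.mean((predicted - observed) ** 2))  # penalizes large errors more
mae  = np.mean(np.abs(predicted - observed))          # average absolute difference
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```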

Artificial Neural Network: An Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from the analogy of biological neural networks. ANNs are also known as “artificial neural systems”.

Why we use non-linear activation function on ANN: Neural networks are used to implement
complex functions, and non-linear activation functions enable them to approximate arbitrarily complex
functions. Without the non-linearity introduced by the activation function, multiple layers of a neural
network are equivalent to a single layer neural network.

Let’s see a simple example to understand why, without non-linearity, it is impossible to approximate even simple functions like the XOR and XNOR gates. In the figure below, we graphically show an XOR gate. There are two classes in our dataset, represented by a cross and a circle. When the two input features are the same, the class label is a red cross; otherwise, it is a blue circle. The two red crosses have an output of 0 for the input values (0,0) and (1,1), and the two blue circles have an output of 1 for the input values (0,1) and (1,0).

Fig: Graphical Representation of XOR gate
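
A tiny NumPy sketch of the claim above: stacking two linear layers with no activation collapses to a single linear map, so no stack of purely linear layers can separate XOR. The weight shapes and random values are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(3)
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(1, 4))  # two "layers"

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]).T  # the four XOR inputs

# Without a non-linearity, layer composition is just one matrix product:
two_layers = W2 @ (W1 @ X)
one_layer  = (W2 @ W1) @ X
print(np.allclose(two_layers, one_layer))   # True: the extra layer added nothing

# With a non-linearity (e.g. ReLU) between the layers, the equality breaks,
# which is what lets deeper networks model functions like XOR:
relu = lambda z: np.maximum(z, 0)
print(np.allclose(W2 @ relu(W1 @ X), one_layer))  # False in general
```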


Difference between Forward & Back Propagation methods:

| Forward Propagation | Back Propagation |
| --- | --- |
| 1. A feedforward neural network is an artificial neural network in which connections between the units do not form a cycle. | 1. Backpropagation is a method used in artificial neural networks to calculate a gradient. |
| 2. Computes functional signals | 2. Computes error signals |
Set-07: (Clustering & Association Rule Mining)
Q 1. Difference between Clustering & Classification technique.
Ans: Difference between Clustering & Classification technique:
| Classification Technique | Clustering Technique |
| --- | --- |
| 1. A supervised learning technique | 1. An unsupervised learning technique |
| 2. Finite set of classes | 2. Finite set of clusters |
| 3. Goal of assigning new input to a class | 3. Goal of finding similarities within a given dataset |
| 4. Infinite set of input data | 4. Finite set of data |

Q 2. What is association rule mining? Define frequent itemset, support & confidence.
Association Rule Mining: Association rule mining is a procedure which is meant to find frequent patterns, correlations, associations, or causal structures from data sets found in various kinds of databases such as relational databases, transactional databases, and other forms of data repositories.

Frequent Itemset: An itemset whose support is greater than or equal to a user-specified minimum support threshold.

Support: The support of an itemset X is the fraction of transactions in the database that contain X.

Confidence: The confidence of a rule X → Y is the fraction of transactions containing X that also contain Y, i.e., confidence(X → Y) = support(X ∪ Y) / support(X).
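
A minimal sketch computing support and confidence over an invented transaction list; the items and the rule {bread} → {milk} are assumptions of the example.

```python
# Invented market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread"}))                    # 3/4 = 0.75
print(confidence({"bread"}, {"milk"}))       # 0.5 / 0.75 ≈ 0.667
```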

Q 3. Difference between Apriori & Eclat algorithm in association rule mining.


Ans: Difference between Apriori & Eclat algorithm:
| Apriori Algorithm | Eclat Algorithm |
| --- | --- |
| 1. The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules. | 1. The Eclat algorithm is used to perform itemset mining. |
| 2. Apriori is suited to large datasets. | 2. Eclat is suited to small and medium datasets. |
| 3. Apriori scans the original dataset. | 3. Eclat scans the currently generated dataset. |
| 4. Apriori is slower than Eclat. | 4. Eclat is faster than Apriori. |
| 5. Apriori uses the database in its usual (horizontal) layout. | 5. Eclat uses the database in a vertical layout. |
