Activity 1 PDF

Data mining is defined as the process of extracting useful information and patterns from large datasets. It involves analyzing massive amounts of data to discover insights that can help businesses solve problems or seize opportunities. Data mining results from the evolution of both database technology and machine learning research, combining disciplines like algorithms, statistics, and pattern recognition to extract knowledge from data in a more complex way than simple transformations. The key steps in data mining as a knowledge discovery process are data cleaning, integration, selection, transformation, mining patterns, and presenting knowledge. A data warehouse differs from a standard database in that it is a central repository that integrates data from multiple sources for analysis to guide management decisions.

Reyes, John Michael E.

Activity 1

1. What is data mining? In your answer, address the following:

- Data mining is defined as a process used to extract usable data from a larger set
of raw data. It involves analyzing patterns in large batches of data using
one or more software tools. Data mining is also known as Knowledge Discovery in
Data (KDD). It also refers to the process of extracting, or mining, interesting
knowledge or patterns from large amounts of data. It analyzes massive volumes
of data to discover insights that help businesses solve problems, mitigate risks,
or seize new opportunities.

a. Is it another hype?

- From what I have read and learned, it is not just another hype. Data mining
keeps growing because it is widely available to everyone. It became so vast
because there is so much data that can be turned into information or
knowledge. It is something we can expect to keep seeing in the future. We all
know that there will be changes and that new things will keep appearing, and
there is no reason to believe that this will stop.

b. Is it a simple transformation or application of technology developed from
databases, statistics, machine learning, and pattern recognition?

- I think it is not a simple transformation or application of technology developed
from databases, statistics, machine learning, and pattern recognition. It is more
than that, because it is a combination, or amalgamation, of disciplines, such as
the ones presented in our lecture earlier: algorithms, database technology,
statistics, machine learning, visualization, pattern recognition, and other
disciplines. I think it is one of the most complex applications of technology.
It can affect what will happen in the future, which shows how important it is.
It is too vast to be called just a simple transformation.

c. We have presented a view that data mining is the result of the evolution of
database technology. Do you think that data mining is also the result of the
evolution of machine learning research? Can you present such views based on
the historical progress of this discipline? Address the same for the fields of
statistics and pattern recognition.

- It all started with data collection, which led to the development of efficient
ways to use the data that had been stored. I can say data mining is partly a
result of the evolution of machine learning, because machine learning is also
capable of analyzing data and creating something new from it. The two fields are
related to each other. Data mining is used on an existing dataset to find
patterns. Machine learning, on the other hand, is trained on a "training" data
set, which teaches the computer how to make sense of data, and is then used to
make predictions about new data sets.
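
The difference described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `history` data, the `avg_hours` summary, and the `predict` helper are all made up for this example, and the "model" is a very simple nearest-average rule, not a method taken from the lecture.

```python
# Hypothetical labelled observations: (hours_studied, result). Assumed data.
history = [(1, "fail"), (2, "fail"), (3, "fail"),
           (6, "pass"), (7, "pass"), (8, "pass")]

# Data mining view: describe a pattern that already exists in the stored data.
avg_hours = {}
for label in ("fail", "pass"):
    hours = [h for h, result in history if result == label]
    avg_hours[label] = sum(hours) / len(hours)

# Machine learning view: treat the averages learned from the training data
# as a model, then use it to predict the label of a new, unseen observation.
def predict(hours, model):
    """Return the label whose average is closest to the given hours."""
    return min(model, key=lambda label: abs(model[label] - hours))

print(avg_hours)              # pattern found in the existing data
print(predict(5, avg_hours))  # prediction for new data
```

The first half only summarizes what is already stored, while the second half applies what was learned to a value that was never in the data set.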

d. Describe the steps involved in data mining when viewed as a process of
knowledge discovery.

- Data cleaning, a process of detecting and correcting inconsistent data.
- Data integration, where multiple data sources may be combined.
- Data selection, where data relevant to the analysis task are retrieved from the
database.
- Data transformation, where data are transformed or converted into forms
appropriate for mining or into one format.
- Data mining, where intelligent methods are applied to extract patterns from
the data.
- Pattern evaluation, a process that identifies the truly interesting patterns
representing knowledge based on some interestingness measures.
- Knowledge presentation, where visualization and knowledge representation
techniques are used to present the mined knowledge to the user.
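
The steps listed above can be sketched as a small pipeline, with one function per step. This is a minimal sketch under assumptions: the two data sources, the function names, and the 0.5 support threshold are hypothetical, and counting item pairs is just one simple stand-in for the mining step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources (assumed data, for illustration).
store_a = [
    {"id": 1, "items": "bread, milk"},
    {"id": 2, "items": "bread, butter, milk"},
    {"id": 3, "items": None},  # inconsistent record with a missing value
]
store_b = [
    {"id": 4, "items": "milk, bread"},
    {"id": 5, "items": "butter"},
]

def clean(records):
    """Data cleaning: drop records whose item list is missing."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the field relevant to the analysis."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Data transformation: turn each string into a set of item names."""
    return [frozenset(s.strip() for s in text.split(",")) for text in item_strings]

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: count / n_baskets
            for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}

def present(patterns):
    """Knowledge presentation: format the patterns as readable lines."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))
report = present(patterns)
print("\n".join(report))
```

Here "support" is the share of baskets that contain the pair, which serves as the interestingness measure in the pattern evaluation step.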

2. How is a data warehouse different from a database? How are they similar?

- A data warehouse is, I think, more of a repository of information than a
database is. It is a large store of data accumulated from a wide range of
sources within a company and used to guide management decisions. It is a
central repository of integrated data from one or more disparate sources. A
database, on the other hand, is an organized collection of data, generally
stored and accessed electronically from a computer system. It is a collection of
structured data kept so that it is easy to access, manage, and update. They are
similar in that both are data stores and both hold huge amounts of persistent
data.
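
The contrast above can be illustrated with Python's built-in `sqlite3` module. This is only a sketch under assumptions: the two operational databases, the table names, and the data are hypothetical, and a real data warehouse would be far larger and loaded by dedicated tools.

```python
import sqlite3

# Two hypothetical operational databases, each serving one application.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
sales_db.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("ana", 120.0), ("ben", 80.0), ("ana", 50.0)])

support_db = sqlite3.connect(":memory:")
support_db.execute("CREATE TABLE tickets (customer TEXT, issue TEXT)")
support_db.executemany("INSERT INTO tickets VALUES (?, ?)",
                       [("ana", "late delivery"), ("ben", "refund"),
                        ("ben", "damaged item")])

# The warehouse is a separate, central store that integrates both sources
# into one summarized table meant for analysis rather than daily operations.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer_facts "
                  "(customer TEXT PRIMARY KEY, total_spent REAL, ticket_count INTEGER)")

spent = dict(sales_db.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
tickets = dict(support_db.execute(
    "SELECT customer, COUNT(*) FROM tickets GROUP BY customer"))

for customer in sorted(set(spent) | set(tickets)):
    warehouse.execute("INSERT INTO customer_facts VALUES (?, ?, ?)",
                      (customer, spent.get(customer, 0.0), tickets.get(customer, 0)))

summary = warehouse.execute(
    "SELECT customer, total_spent, ticket_count "
    "FROM customer_facts ORDER BY customer").fetchall()
print(summary)
```

Each source database stays focused on its own records, while the warehouse table answers a management-style question that needs data from both.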

3. Define each of the following functionalities: characterization, discrimination,
association and correlation analysis, classification, regression, clustering and
outlier analysis.

- Characterization is a summarization of the general features of objects in a
target class, and it produces what are called characteristic rules. In other
words, it summarizes the data of the class under study.
- Discrimination refers to comparing a class against some predefined group or
class. It is a comparison of the general features of target class data objects
with the general features of objects from one or a set of contrasting classes.
- Association and correlation analysis. Correlation analysis explores the
association between two or more variables and makes inferences about the
strength of the relationship. Technically, association refers to any relationship
between two variables, whereas correlation is often used to refer only to a linear
relationship between two variables.
- Classification is the organization of data into given classes, and it predicts
the class of objects whose class label is unknown. Its objective is to find a
derived model that describes and distinguishes data classes or concepts.
- Regression is a technique used to fit an equation to a dataset. The result is a
function that predicts a numeric value.
- Clustering and outlier analysis. Clustering is similar to classification in that
it organizes data into classes. However, unlike classification, in clustering the
class labels are unknown and it is up to the clustering algorithm to discover
acceptable classes. It analyzes data objects without consulting a known class
label. The objects are clustered or grouped based on the principle of maximizing
the intraclass similarity and minimizing the interclass similarity. Each cluster
that is formed can be viewed as a class of objects. Outlier analysis deals with
outliers, which are data elements that cannot be grouped in a given class or
cluster. They are often very important to identify. In some applications outliers
are discarded as noise, but in others they can still be useful.
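
Four of the functionalities defined above (correlation analysis, regression, clustering, and outlier analysis) can be sketched with plain Python. This is a minimal sketch under assumptions: all data values, function names, and the z-score threshold of 2 are hypothetical choices for illustration, and the two-cluster routine is a simplified one-dimensional k-means, not a general implementation.

```python
import math

def correlation(xs, ys):
    """Correlation analysis: Pearson coefficient of a linear relationship."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def two_means(points, iterations=10):
    """Clustering: split 1-D points into two groups without class labels.

    Assumes the points contain at least two distinct values.
    """
    centers = [min(points), max(points)]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        centers = [sum(g) / len(g) for g in groups]
    return groups

def find_outliers(values, threshold=2.0):
    """Outlier analysis: flag values far from the mean (by z-score)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [v for v in values if abs(v - mean) / std > threshold]

# Assumed example data.
xs, ys = [1, 2, 3, 4, 5], [3, 5, 7, 9, 11]
r = correlation(xs, ys)
slope, intercept = fit_line(xs, ys)
groups = two_means([1, 2, 3, 10, 11, 12])
outliers = find_outliers([10, 11, 9, 10, 12, 10, 50])
print(r, slope, intercept, groups, outliers)
```

The regression function returns an equation that predicts a number, while the clustering function receives no labels and discovers the two groups on its own, matching the distinction drawn in the definitions.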
