Topic 4 - Data Mining Tools and Technique

Uploaded by

Arif Syazmi
DATA MINING TOOLS AND TECHNIQUES

ITC2263 INTRODUCTION TO DATA ANALYTICS
TOPIC 4
Data Mining tools

❑ Data mining tools are sets of techniques that use specific
algorithms, statistical analysis, artificial intelligence, and database
systems to analyze data from different dimensions and
perspectives.
❑ Their objective is to discover patterns, trends, and groupings
in large data sets and to transform the data into more refined
information.
Data Mining tools-cont’d

❑ We can run various algorithms, such as clustering or
classification, on a data set and visualize the results within
the same framework. This gives us better insight into the data
and the phenomenon that the data represents.
❑ Such a framework is called a data mining tool.
Tools-examples

❖ Orange Data Mining
❖ Orange is a machine learning and data mining software suite. It supports
visualization and is component-based software written in the Python programming
language.
– Because it is component-based, the components of Orange are called
"widgets." These widgets range from preprocessing and data visualization to the
assessment of algorithms and predictive modeling.
– Widgets deliver functionality such as:
– Displaying a data table and allowing feature selection
– Reading data
– Training predictors and comparing learning algorithms
– Visualizing data elements, etc.
Tools-examples

❖ SAS Data Mining
❖ SAS stands for Statistical Analysis System. It is a product of the SAS Institute, created for
analytics and data management.
❖ SAS can mine data, transform it, manage information from various sources, and perform
statistical analysis.
❖ It offers a graphical UI for non-technical users.
❖ The SAS data miner lets users analyze big data and provides insight for timely
decision-making.
❖ SAS has a distributed memory processing architecture that is highly scalable. It is
suitable for data mining, optimization, and text mining.
Tools-examples

❖ DataMelt Data Mining
❖ DataMelt is a computation and visualization environment that offers an
interactive framework for data analysis and visualization.
❖ It is primarily designed for students, engineers, and scientists. It is also known
as DMelt.
❖ DMelt is a multi-platform utility written in Java.
❖ It can run on any operating system that supports the JVM (Java Virtual
Machine).
Cont’d

– It consists of scientific and mathematical libraries.
– Scientific libraries:
Scientific libraries are used for drawing 2D/3D plots.
– Mathematical libraries:
Mathematical libraries are used for random number generation,
algorithms, curve fitting, etc.
Tools-examples

❖ Rattle
– Rattle is a GUI-based data mining tool. It uses the R statistical programming
language and exposes the statistical power of R by offering significant data
mining features.
– Rattle has a comprehensive and well-developed user interface, and it also has an
integrated log tab that records the equivalent R code for every GUI operation.
– The data set produced by Rattle can be viewed and edited. Rattle also lets
users review the code, reuse it for other purposes, and extend it
without restriction.
Tools-examples

❖ Rapid Miner:
– Rapid Miner is one of the most popular predictive analysis systems, created by
the company of the same name.
– It is written in the Java programming language. It offers an integrated environment
for text mining, deep learning, machine learning, and predictive analysis.
– The tool can be used for a wide range of applications, including business and
commercial applications, research, education, training, application
development, and machine learning.
Cont’d

– Rapid Miner offers its server on-site as well as in public or
private cloud infrastructure.
– It is based on a client/server model. Rapid Miner comes with
template-based frameworks that enable fast delivery with fewer
errors than are commonly expected when code is written
manually.
Data Mining Techniques

❖ Data mining involves using refined data analysis tools to find
previously unknown, valid patterns and relationships in huge data sets.
❖ These tools can incorporate statistical models, machine learning techniques,
and mathematical algorithms, such as neural networks or decision trees.
❖ Thus, data mining incorporates both analysis and prediction.
❖ In recent data mining projects, several major data mining techniques have been
developed and used (see Figure 1).
Figure 1: Data Mining Techniques
Classification:

❖ This technique is used to obtain important and relevant information about data
and metadata. It helps to classify data into different classes.
– Classification of data mining frameworks by the type of data source
mined:
This classification is based on the type of data handled, for example multimedia,
spatial data, text data, time-series data, the World Wide Web, and so on.
– Classification of data mining frameworks by the database involved:
This classification is based on the data model involved, for example
object-oriented databases, transactional databases, relational databases, and so on.
Classification-cont’d

– Classification of data mining frameworks by the kind of knowledge discovered:
This classification depends on the types of knowledge discovered or the data mining
functionalities, for example discrimination, classification, clustering, characterization, etc.
Some frameworks are comprehensive and offer several data mining functionalities
together.
– Classification of data mining frameworks by the data mining techniques used:
This classification is based on the data analysis approach used, such as neural networks,
machine learning, genetic algorithms, visualization, statistics, or data warehouse-oriented or
database-oriented approaches.
The classification can also take into account the level of user interaction involved in the data
mining procedure, such as query-driven systems, autonomous systems, or interactive
exploratory systems.
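As an illustration of classification as a technique (assigning data to classes), here is a minimal sketch of a nearest-neighbor classifier. It is not taken from the slides: the function name, the labels "low" and "high", and the sample points are all hypothetical.

```python
import math

def nearest_neighbor_classify(training, point):
    """Return the label of the training example closest to `point`.

    `training` is a list of (features, label) pairs. This is a 1-nearest-neighbor
    rule, one of the simplest classifiers; real tools offer many others.
    """
    best_label, best_dist = None, math.inf
    for features, label in training:
        dist = math.dist(features, point)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labeled examples: (feature vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]

print(nearest_neighbor_classify(training, (2.0, 1.5)))  # closest example is labeled "low"
print(nearest_neighbor_classify(training, (7.0, 9.0)))  # closest example is labeled "high"
```

The classes are known in advance here, which is what distinguishes classification from clustering.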
Clustering

❖ Clustering is the division of data into groups of similar objects. Describing the
data by a few clusters inevitably loses certain fine details but achieves
simplification. It models data by its clusters.
– From a historical point of view, data modeling roots clustering in statistics,
mathematics, and numerical analysis. From a machine learning point of view, clusters
correspond to hidden patterns, the search for clusters is unsupervised learning, and the
resulting framework represents a data concept.
– Clustering plays an important role in data mining applications, for example
scientific data exploration, text mining, information retrieval, spatial database
applications, CRM, Web analysis, computational biology, medical diagnostics, and
much more.
Clustering-cont’d

– Clustering analysis is a data mining technique for identifying similar
data.
– It helps to recognize the differences and similarities
between data items.
– Clustering is very similar to classification, but it groups
chunks of data together based on their similarities rather than
assigning them to predefined classes.
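Grouping by similarity can be sketched with k-means, one common clustering algorithm. The slides do not name a specific algorithm, so the choice of k-means, the naive initialization, and the sample points are assumptions made for illustration.

```python
import math

def k_means(points, k, iterations=10):
    """Group `points` (tuples of numbers) into `k` clusters.

    Assumption for simplicity: the first k points serve as initial centroids.
    Production implementations use better initialization and a convergence test.
    """
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical 2D data with two visually obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

No labels are supplied: the groups emerge from the distances alone, which is why clustering counts as unsupervised learning.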
Regression

❖ Regression analysis is a data mining process used to identify and analyze
the relationship between variables in the presence of other factors.
❖ It is used to estimate the likely value of a specific variable given the
others. Regression is primarily a form of planning and modeling.
❖ For example, we might use it to project certain costs based on other
factors such as availability, consumer demand, and competition.
❖ Primarily, it quantifies the relationship between two or more variables in the
given data set.
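A minimal sketch of simple linear regression by ordinary least squares follows. It fits a line to one predictor only, and the demand and cost figures are invented for illustration (they lie exactly on cost = 2 × demand + 5).

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: consumer demand (units) and observed cost.
demand = [10, 20, 30, 40]
cost = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(demand, cost)
projected_cost = slope * 50 + intercept  # project the cost at a demand of 50
print(slope, intercept, projected_cost)
```

Real data rarely falls exactly on a line, so in practice the fitted line is an estimate and the projection carries error.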
Association Rules:

❖ This data mining technique helps to discover a link between two or more items.
It finds hidden patterns in the data set.
– Association rules are if-then statements that show the probability of
interactions between data items within large data sets in different types of
databases. Association rule mining has several applications and is commonly
used to find sales correlations in transactional data or in medical data sets.
– The algorithm works on a collection of data, for example a list of grocery
items that you have been buying for the last six months. It calculates the
percentage of items that are purchased together.
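The "percentage of items purchased together" corresponds to the support of an item pair, and the strength of an if-then rule is usually measured by its confidence. The sketch below computes both for a handful of invented grocery baskets; the function names and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each pair a canonical order and ignores repeats.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for t in with_antecedent if consequent in t)
    return hits / len(with_antecedent)

# Hypothetical grocery baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

support = pair_support(baskets)
print(support[("bread", "milk")])              # bread and milk appear together in 2 of 4 baskets
print(confidence(baskets, "butter", "bread"))  # every basket with butter also has bread
```

Full association rule miners, such as Apriori, extend this idea to item sets of any size and prune candidates below a minimum support.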
Outlier detection

❖ This data mining technique concerns the observation of data items in the
data set that do not match an expected pattern or expected behavior.
❖ It may be used in various domains such as intrusion detection, fraud
detection, etc. It is also known as outlier analysis or outlier mining.
❖ An outlier is a data point that diverges too much from the rest of the data set.
❖ Most real-world data sets contain outliers. Outlier detection plays a
significant role in the data mining field.
❖ Outlier detection is valuable in numerous fields, such as network intrusion
identification, credit or debit card fraud detection, and detecting outlying
values in wireless sensor network data.
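One simple way to flag points that "diverge too much" is the z-score: the distance of a value from the mean, measured in standard deviations. The sketch below assumes a threshold of 2 and uses invented sensor readings; neither comes from the slides, and other methods (for example IQR-based rules) are equally common.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    The threshold of 2.0 is an assumption; 2.5 or 3 are also widely used.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(z_score_outliers(readings))
```

Because the outlier itself inflates the mean and standard deviation, z-scores can miss outliers in small samples; robust alternatives use the median instead.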
Sequential Patterns

❖ Sequential pattern mining is a data mining technique specialized for evaluating
sequential data to discover sequential patterns. It consists of finding
interesting subsequences in a set of sequences, where the value of a sequence
can be measured by criteria such as length, occurrence frequency,
etc.
– In other words, this technique helps to discover or recognize
similar patterns in transaction data over a period of time.
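The idea of frequent subsequences can be sketched by counting ordered item pairs ("A is later followed by B") across purchase histories and keeping those that reach a minimum occurrence frequency. This is a deliberately small illustration limited to subsequences of length two; the function name, the `min_support` parameter, and the data are hypothetical.

```python
from collections import Counter

def frequent_ordered_pairs(sequences, min_support):
    """Find pairs (a, b) where a occurs before b in at least `min_support` sequences.

    Each sequence contributes at most once per pair, so the count is the
    number of sequences that contain the pattern.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()
        for i, first in enumerate(seq):
            for second in seq[i + 1:]:
                seen.add((first, second))
        counts.update(seen)
    return {pair: c for pair, c in counts.items() if c >= min_support}

# Hypothetical purchase histories, each ordered by time.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]

patterns = frequent_ordered_pairs(histories, min_support=2)
print(patterns)
```

Unlike association rules, order matters here: ("phone", "case") and ("case", "phone") are different patterns.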
Prediction

❖ Prediction uses a combination of other data mining techniques,
such as trend analysis, clustering, classification, etc.
❖ It analyzes past events or instances in the right sequence to
predict a future event.
Case study on data mining applications

❖Case study 1
❖Case study 2
Read the given case studies and summarize the findings
from your point of view.
