Topic 4 - Data Mining Tools and Technique
TOOLS AND TECHNIQUES
❖ Rattle
– Rattle is a GUI-based data mining tool built on the R statistical programming
language. It exposes the statistical power of R by offering significant data
mining features.
– Alongside its comprehensive and well-developed user interface, Rattle has an
integrated Log tab that records the equivalent R code for every GUI operation.
– The data set produced by Rattle can be viewed and edited. Rattle also lets you
review the generated code, reuse it for other purposes, and extend it
without any restriction.
Tools-examples
❖ Rapid Miner:
– Rapid Miner is one of the most popular predictive analysis systems, created by
the company of the same name.
– It is written in the Java programming language. It offers an integrated environment
for text mining, deep learning, machine learning, and predictive analysis.
– The tool can be used for a wide range of applications, including business and
commercial applications, research, education, training, application
development, and machine learning.
Cont’d
❖ Data mining uses sophisticated data analysis tools to find
previously unknown, valid patterns and relationships in huge data sets.
❖ These tools can incorporate statistical models, machine learning techniques,
and mathematical algorithms, such as neural networks or decision trees.
❖ Thus, data mining incorporates both analysis and prediction.
❖ In recent data mining projects, various major data mining techniques have been
developed and used, as summarized in Figure 1.
Figure 1: Data Mining Techniques
Classification:
❖ This technique is used to obtain important and relevant information about data
and metadata. This data mining technique helps to classify data into different
classes.
– Classification of data mining frameworks as per the type of data sources
mined:
This classification is based on the type of data handled, for example multimedia,
spatial data, text data, time-series data, World Wide Web, and so on.
– Classification of data mining frameworks as per the database involved:
This classification is based on the data model involved, for example
object-oriented database, transactional database, relational database, and so on.
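As an illustration of classifying data into different classes, the sketch below assigns a new record to the class whose centroid (average point) is nearest. The nearest-centroid method, the feature names, and the sample values are all assumptions chosen for illustration; the slides do not prescribe a particular classifier.

```python
from math import dist

# Hypothetical labelled training records: (height_cm, weight_kg) -> class label
training = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((180, 85), "large"),
    ((185, 90), "large"),
]

def centroids(samples):
    """Return the mean feature vector of each class."""
    groups = {}
    for features, label in samples:
        groups.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*points))
        for label, points in groups.items()
    }

def classify(point, centers):
    """Assign the point to the class with the closest centroid."""
    return min(centers, key=lambda label: dist(point, centers[label]))

centers = centroids(training)
label = classify((158, 57), centers)
print(label)
```

Real classifiers (decision trees, neural networks) learn more flexible boundaries, but the idea is the same: learn from labelled examples, then predict the class of unseen records.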
Regression:
❖ Regression analysis is the data mining process used to identify and analyze
the relationship between variables in the presence of other factors.
❖ It is used to estimate the likely value of a specific variable, given the others.
Regression is primarily a form of planning and modeling.
❖ For example, we might use it to project certain costs, depending on other
factors such as availability, consumer demand, and competition.
❖ Primarily, it estimates the relationship between two or more variables in the
given data set.
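A minimal sketch of the cost-projection example, assuming a single predictor (consumer demand) and a straight-line relationship fitted by ordinary least squares. The data values and the function name `fit_line` are hypothetical, and the sample costs are constructed to lie exactly on a line so the result is easy to check by hand.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical observations: consumer demand (units) and the resulting cost.
# The costs follow cost = 5 + 2 * demand exactly, with no noise.
demand = [10, 20, 30, 40, 50]
cost = [25, 45, 65, 85, 105]

slope, intercept = fit_line(demand, cost)

# Project the cost for a demand level that was not observed
predicted = intercept + slope * 60
print(slope, intercept, predicted)
```

With real data the points will not fall exactly on the line, and the fitted slope and intercept are estimates of the relationship rather than exact values.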
Association Rules:
❖ This data mining technique helps to discover a link between two or more items.
It finds hidden patterns in the data set.
– Association rules are if-then statements that show the probability of
interactions between data items within large data sets in different types of
databases. Association rule mining has several applications and is commonly
used to find sales correlations in transactional data or patterns in medical data sets.
– The algorithm starts from a collection of records, for example, a list of the
grocery items that you have been buying for the last six months, and calculates
the percentage of purchases in which items are bought together.
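The grocery example can be sketched as follows, assuming each purchase is recorded as a set of items. The baskets are hypothetical. "Support" is the fraction of baskets containing an item set, and "confidence" is the probability of the then-part given the if-part. Both are standard association-rule measures, though the slides do not name them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule 'if antecedent then consequent'."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Count how often each pair of items is bought together
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Here bread and milk appear together in 3 of 5 baskets, and 3 of the 4 baskets containing bread also contain milk, which is what the rule "if bread then milk" summarizes.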
Outlier detection
❖ This type of data mining technique relates to the observation of data items in the
data set that do not match an expected pattern or expected behavior.
❖ This technique may be used in various domains such as intrusion detection and fraud
detection. It is also known as outlier analysis or outlier mining.
❖ An outlier is a data point that diverges too much from the rest of the data set.
❖ The majority of real-world data sets contain outliers. Outlier detection plays a
significant role in the data mining field.
❖ Outlier detection is valuable in numerous fields, such as network intrusion
identification, credit or debit card fraud detection, and detecting outlying values
in wireless sensor network data.
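One simple way to flag points that diverge too much from the rest is the interquartile range (IQR) rule, sketched below. The rule, the multiplier 1.5, and the transaction amounts are assumptions for illustration; the slides do not specify a detection method.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical card transaction amounts; one is far from the rest
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 500, 49]

outliers = iqr_outliers(amounts)
print(outliers)
```

A flagged value is only a candidate for fraud or a sensor fault. Deciding whether it is an error or a genuine but rare event still requires domain knowledge.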
Sequential Patterns
❖ Case study 1
❖ Case study 2
Read the given case studies and summarize the findings
from your point of view.