DM Trend Seminar

Uploaded by

writetoaris
DATA MINING TRENDS

71762133046 – SUDHARSHINI R
71762133048 – SUTHARSHANA S S
CONTENTS

Mining Sequence Data


Mining Other Kinds of Data
Visual and Audio Data Mining
MINING SEQUENCE DATA

A sequence is an ordered list of events.


Sequences may be categorized into three groups, based on the characteristics of the events they describe:
1. Time-series data
2. Symbolic sequence data
3. Biological sequences
Time-series data consist of long sequences of numeric values recorded at equal time
intervals (e.g., stock market data).
Symbolic sequence data consist of long sequences of event or nominal data, which typically are not
observed at equal time intervals. For many such sequences, the gaps between events do not matter
much (e.g., customer shopping sequences).
Biological sequences include DNA and protein sequences. Such sequences are typically very long, and
carry important, complicated, but hidden semantic meaning.
SIMILARITY SEARCH IN TIME-SERIES DATA
Time-series data consists of numeric values recorded over time intervals, used in various
fields like stock market analysis and scientific experiments.
Unlike typical database queries, similarity searches in time-series data aim to find
sequences resembling a given query sequence, often requiring subsequence matching.
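To perform similarity searches efficiently, data reduction techniques such as the discrete
Fourier transform (DFT), discrete wavelet transform (DWT), and singular value decomposition
(SVD) are applied, mapping data into transformed spaces where only the most significant
coefficients (features) are retained.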
Indices can be built on original or transformed data to expedite searches.
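As a rough illustration of the reduction step described above, the sketch below computes a DFT
by hand, keeps only the first few coefficients as features, and compares sequences in that reduced
space. The function names, the sample series, and the choice of three retained coefficients are all
assumptions made for the example; a real system would use an FFT library and an index structure.

```python
import cmath
import math


def dft(series):
    """Return the discrete Fourier transform of a real-valued series."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
        / math.sqrt(n)  # unitary scaling: distances in DFT space equal the originals
        for k in range(n)
    ]


def reduced_features(series, keep=3):
    """Keep only the first `keep` DFT coefficients as the feature vector."""
    return dft(series)[:keep]


def distance(a, b):
    """Euclidean distance between two equal-length vectors (real or complex)."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


# Made-up sequences: one close to the query, one very different from it.
query = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
candidates = {
    "similar": [1.1, 2.1, 2.9, 4.2, 3.1, 1.8, 1.0, 0.2],
    "different": [4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0],
}

q = reduced_features(query)
best = min(candidates, key=lambda name: distance(q, reduced_features(candidates[name])))
print(best)
```

Because the transform is scaled to preserve Euclidean distance, dropping coefficients can only
shrink the distance between two sequences. A search in the reduced space may therefore return
false candidates that need a final check, but it does not miss true matches.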
Query-based similarity search techniques include normalization, atomic matching, window
stitching, and subsequence ordering.
Many software packages are available for similarity search in time-series data.
Recent research suggests transforming time-series data into symbolic representations,
simplifying similarity search by matching subsequences.
Motifs, frequently occurring patterns, are identified for efficient search mechanisms.
Experiments show that this approach is simple and fast, with search quality comparable to
that of traditional methods such as DFT and DWT.
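One possible way to turn a numeric series into symbols is sketched below: normalize the series,
average it over a few equal segments, and map each average to a letter. This is a simplified,
SAX-style discretization chosen for illustration. The slides do not name a specific method, and
the breakpoints, alphabet, and function name are assumptions.

```python
import statistics


def to_symbols(series, segments=4, breakpoints=(-0.43, 0.43), alphabet="abc"):
    """Discretize a numeric series into a short symbolic string."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series) or 1.0  # guard against a constant series
    normalized = [(x - mean) / std for x in series]

    size = len(normalized) // segments  # assumes len(series) is a multiple of segments
    word = []
    for i in range(segments):
        avg = statistics.fmean(normalized[i * size:(i + 1) * size])
        # The symbol index is the number of breakpoints the segment average exceeds.
        level = sum(avg > b for b in breakpoints)
        word.append(alphabet[level])
    return "".join(word)


rising = [0, 0, 4, 4, 6, 6, 10, 10]
falling = [10, 10, 6, 6, 4, 4, 0, 0]
print(to_symbols(rising), to_symbols(falling))
```

Once series are strings, subsequence matching and motif discovery reduce to string operations,
which is where the simplicity and speed come from.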
REGRESSION AND TREND ANALYSIS IN TIME-SERIES DATA
Trend analysis builds an integrated model using the following four major components or
movements to characterize time-series data:
1. Trend or long-term movements: These indicate the general direction in which a
time-series graph is moving over time. Trend curves can be found with, for example,
least squares methods.
2. Cyclic movements: These are the long-term oscillations about a trend line or curve.
3. Seasonal variations: These are nearly identical patterns that a time series appears to follow
during corresponding seasons of successive years, such as holiday shopping seasons. For
effective trend analysis, the data often need to be “deseasonalized” based on a seasonal
index computed by autocorrelation.
4. Random movements: These characterize sporadic changes due to chance events such as
labor disputes or announced personnel changes within companies.

Trend analysis can also be used for time-series forecasting. ARIMA (auto-regressive
integrated moving average), long-memory time-series modeling, and autoregression are
popular methods for such analysis.
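The least squares trend line mentioned under long-term movements can be sketched in a few lines.
The example below fits a straight line to a short series and extrapolates one step ahead. The
data are made up, the function name is an assumption, and a straight line is only the simplest
possible trend curve (it is not ARIMA).

```python
def least_squares_trend(values):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope


sales = [10, 12, 11, 14, 15, 17, 16, 19]  # made-up monthly figures
intercept, slope = least_squares_trend(sales)
forecast_next = intercept + slope * len(sales)  # naive one-step extrapolation
print(round(slope, 3), round(forecast_next, 2))
```

Subtracting the fitted line from the series leaves the cyclic, seasonal, and random components
for separate analysis.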
SEQUENTIAL PATTERN MINING IN SYMBOLIC SEQUENCES
Symbolic sequences represent ordered sets of elements or events, found in various
applications like customer shopping sequences and biological sequences.
In bioinformatics, research predominantly focuses on the complex semantic meaning of
biological sequences.
Sequential pattern mining is extensively applied to symbolic sequences.
A sequential pattern is a subsequence that occurs frequently within a single sequence or
across a set of sequences.
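To make "frequent subsequence" concrete, the sketch below checks whether a pattern occurs in
order within a sequence and computes its support over a small database. It is a brute-force
illustration only: each element is a single item rather than an itemset, the shopping data are
invented, and scalable algorithms avoid enumerating every candidate as this code does.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)


def frequent_pairs(database, min_support):
    """Return every ordered item pair whose support reaches min_support."""
    items = sorted({item for seq in database for item in seq})
    result = {}
    for first in items:
        for second in items:
            value = support((first, second), database)
            if value >= min_support:
                result[(first, second)] = value
    return result


shopping = [
    ["laptop", "mouse", "bag"],
    ["laptop", "bag"],
    ["mouse", "laptop", "bag"],
    ["bag", "mouse"],
]
print(frequent_pairs(shopping, min_support=0.5))
```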
Algorithms for mining sequential patterns have been developed, including scalable
approaches and methods for mining closed sequential patterns.
User-specified constraints can reduce the search space in sequential pattern mining,
deriving only patterns of interest. This is referred to as constraint-based sequential
pattern mining.
Constraints can be relaxed or additional constraints enforced to derive different kinds of
patterns, such as enforcing gap constraints or deriving periodic sequential patterns.
Partial order patterns can also be mined by relaxing strict sequential ordering requirements.
Sequential pattern mining methodology can be extended to mining trees, lattices, episodes,
and other ordered patterns.
SEQUENCE CLASSIFICATION

Sequence classification methods can be organized into three categories:


(1) feature-based classification, which transforms a sequence into a feature vector and then
applies conventional classification methods;
(2) sequence distance-based classification;
(3) model-based classification, such as using a hidden Markov model (HMM).
The feature selection techniques for symbolic sequences cannot be easily applied to
time-series or other numeric-valued data without discretization.
However, discretization can cause information loss. A recently proposed time-series
shapelets method uses the time-series subsequences that can maximally represent a class as
the features.
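The core operation behind shapelet features is the distance from a short subsequence to its
best-matching window in a series. The sketch below shows only that operation. The shapelet,
threshold, and class labels are hand-picked assumptions, whereas the actual method searches for
the subsequences that best separate the classes.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any same-length window."""
    m = len(shapelet)
    return min(
        math.dist(series[i:i + m], shapelet)
        for i in range(len(series) - m + 1)
    )


def classify(series, shapelet, threshold):
    """Label a series by whether it contains a close match to the shapelet."""
    return "spike" if shapelet_distance(series, shapelet) <= threshold else "flat"


spike_shape = [0, 5, 0]  # hypothetical shapelet, chosen by hand
print(classify([0, 0, 0, 5, 0, 0], spike_shape, threshold=1.0))
print(classify([1, 1, 1, 1, 1, 1], spike_shape, threshold=1.0))
```

No discretization is involved, so the numeric shape of the series is used directly.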
ALIGNMENT OF BIOLOGICAL SEQUENCES
Biological sequence analysis is integral to bioinformatics and modern biology, involving
the comparison, alignment, indexing, and analysis of nucleotide or amino acid sequences.
Sequence alignment is fundamental, based on the evolutionary relationship between
organisms.
Aligning sequences helps identify similarities and homology, aiding in the construction of
phylogenetic trees.
Alignment problems can be pairwise or multiple, with sequences aligning based on
identical symbols (nucleotides) or similar symbols (amino acids).
Local alignments focus on specific regions, while global alignments cover entire
sequences.
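Insertions, deletions, and substitutions occur in nature with different probabilities.
Substitution matrices represent the substitution probabilities, while insertions and
deletions appear as gaps in the alignment.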
Scoring mechanisms evaluate alignment quality, with identical or similar symbols
receiving positive scores and gaps receiving negative scores.
The goal is to maximize alignment scores, although finding optimal alignments is
computationally expensive.
Dynamic programming is commonly used for sequence alignment, though heuristic
methods are often employed for efficiency.
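A minimal sketch of the dynamic programming approach for global pairwise alignment follows. It
uses the classic Needleman-Wunsch recurrence, which the slides do not name, and replaces a real
substitution matrix with flat match, mismatch, and gap scores. These values are assumptions
chosen only to keep the example small.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two
sequence lengths. That cost is why heuristic tools are preferred for database-scale searches.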
BLAST (Basic Local Alignment Search Tool) is a widely used tool in biosequence
analysis, facilitating various analyses and searches.
HIDDEN MARKOV MODEL FOR BIOLOGICAL SEQUENCE ANALYSIS
Biologists analyze biological sequences by constructing probabilistic models like Markov chains and
hidden Markov models (HMMs).
These models capture the structure and statistical regularities of sequence classes.
In both Markov chains and HMMs, the probability of a state depends only on the previous state, making
them well-suited for biological sequence analysis.
The most common algorithms for building and using HMMs are the forward algorithm, the Viterbi
algorithm, and the Baum-Welch algorithm.
• The forward algorithm calculates the probability of observing a sequence in the model.
• The Viterbi algorithm finds the most probable path through the model corresponding to the observed
sequence.
• The Baum-Welch algorithm learns or adjusts the model parameters to best explain a set of training
sequences.
These algorithms play crucial roles in analyzing biological sequences, aiding biologists in understanding
the underlying structure and patterns within the data.
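The forward and Viterbi algorithms can be sketched on a toy two-state HMM over nucleotides. The
states, probabilities, and names below are invented for illustration and are not drawn from any
real biological model. Baum-Welch training is omitted.

```python
# Hypothetical model: a GC-rich region and an AT-rich region.
states = ("gc_rich", "at_rich")
start = {"gc_rich": 0.5, "at_rich": 0.5}
trans = {
    "gc_rich": {"gc_rich": 0.8, "at_rich": 0.2},
    "at_rich": {"gc_rich": 0.2, "at_rich": 0.8},
}
emit = {
    "gc_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "at_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def forward(observed):
    """Probability of the observed sequence, summed over all state paths."""
    alpha = {s: start[s] * emit[s][observed[0]] for s in states}
    for symbol in observed[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(observed):
    """Most probable state path for the observed sequence."""
    # For each state, keep the best (probability, path) pair ending in that state.
    best = {s: (start[s] * emit[s][observed[0]], [s]) for s in states}
    for symbol in observed[1:]:
        best = {
            s: max(
                (prob * trans[p][s] * emit[s][symbol], path + [s])
                for p, (prob, path) in best.items()
            )
            for s in states
        }
    return max(best.values())[1]


print(forward("GGGAAA"))
print(viterbi("GGGAAA"))
```

Both functions multiply raw probabilities, which underflows on long sequences. Practical
implementations work with logarithms or rescale at each step.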
MINING OTHER KINDS OF DATA

Various kinds of semi-structured or unstructured data, such as spatiotemporal, multimedia, and
hypertext data, present interesting applications and call for specialized data mining methodologies.
Mining the following kinds of data is increasingly important:
1. Spatial data
2. Spatiotemporal data
3. Cyber-physical system data
4. Multimedia data
5. Text data
6. Web data
7. Data streams
MINING SPATIAL DATA

Spatial data mining discovers patterns and knowledge from spatial data. Spatial data, in
many cases, refer to geospace-related data stored in geospatial data repositories.
Recently, large geographic data warehouses have been constructed by integrating thematic
and geographically referenced data from multiple sources. From these, we can construct
spatial data cubes that contain spatial dimensions and measures, and support spatial OLAP
for multidimensional spatial data analysis.
Spatial data mining can be performed on spatial data warehouses, spatial databases, and
other geospatial data repositories.
MINING SPATIOTEMPORAL DATA AND MOVING OBJECTS
Spatiotemporal data mining is about extracting patterns and knowledge from data that spans both
space and time.
It's vital for understanding phenomena like urban evolution, weather patterns, and natural disasters.
With the rise of GPS, mobile devices, and digital mapping services, this field has grown
immensely.
Moving-object data, such as wildlife telemetry and vehicle GPS data, is a key focus.
This involves discovering relationships among moving objects, identifying movement patterns like
clusters and swarms, and analyzing periodic and trajectory patterns.
In essence, spatiotemporal data mining helps us comprehend complex spatial and temporal
dynamics across various fields, from ecology to transportation and beyond.
MINING CYBER-PHYSICAL SYSTEM DATA
Cyber-physical systems (CPS) consist of interconnected physical and information components,
forming heterogeneous cyber-physical networks.
Data in CPS are dynamic and noisy, and they contain rich spatiotemporal information that is
vital for real-time decision-making.
Mining cyber-physical data involves linking current situations with vast information bases,
performing real-time calculations, and providing prompt responses.
Research in this field focuses on rare-event detection, anomaly analysis, reliability,
trustworthiness, effective spatiotemporal data analysis, and integrating stream data mining with
real-time automated control processes.
CPS and networks are expected to be ubiquitous, playing critical roles in modern information
infrastructure.
MINING MULTIMEDIA DATA

Multimedia data mining involves discovering patterns from multimedia databases containing
images, videos, audio, sequences, and hypertext data.
It integrates disciplines like image processing, computer vision, data mining, and pattern
recognition.
Key issues in multimedia data mining include content-based retrieval, similarity search,
generalization, and multidimensional analysis.
Multimedia data cubes incorporate additional dimensions and measures for multimedia
information.
Other topics in multimedia mining include classification, prediction analysis, association mining,
and specific techniques for video and audio data mining.
It's an interdisciplinary field with applications across various domains, facilitating the extraction of
valuable insights from multimedia collections.
MINING TEXT DATA

Text mining is an interdisciplinary field drawing from information retrieval, data mining,
machine learning, statistics, and computational linguistics.
It aims to extract high-quality information from text sources like news articles, emails,
blogs, and web pages.
This is achieved through discovering patterns and trends using statistical pattern learning,
topic modeling, and language modeling.
Typical text mining tasks include categorization, clustering, concept extraction, sentiment
analysis, summarization, and entity-relation modeling.
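Many of these tasks start by turning documents into weighted term vectors. The sketch below uses
TF-IDF weighting and cosine similarity, two common choices that the slides do not name, to find
which of three made-up documents are closest. Tokenization is a bare whitespace split, which is
an assumption that real pipelines would replace.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


docs = [
    "stock prices fell sharply",
    "stock markets fell again",
    "protein sequence alignment tools",
]
vectors = tf_idf(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Vectors like these can feed categorization and clustering directly. Tasks such as sentiment
analysis or entity-relation modeling generally need richer linguistic features.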
Other areas include multilingual mining, contextual analysis, and trust analysis.
Text mining finds applications in security, biomedical literature analysis, media analysis,
and customer relationship management.
Various software and tools are available for text mining, often utilizing resources like
WordNet, Semantic Web, and Wikipedia to enhance understanding and analysis of text
data.
Overall, text mining plays a crucial role in extracting valuable insights from large volumes
of textual information across diverse domains.
MINING WEB DATA

Web mining is the application of data mining techniques to discover patterns, structures,
and knowledge from the Web.
According to analysis targets, web mining can be organized into three main areas:
1. Web content mining
2. Web structure mining
3. Web usage mining
WEB CONTENT MINING

Web content mining involves analyzing web content, including text, multimedia, and
structured data, to understand web pages' content and provide valuable information for web
search and analysis.
The surface web is indexed by typical search engines, while the deep web consists of
content not accessible through standard searches, often provided by underlying database
engines.
Extensive research has been conducted by academics, search engines, and web service
companies in web content mining.
However, concerns about privacy arise due to the potential disclosure of personal
information through web content mining. Privacy-preserving data mining techniques aim to
address these concerns by developing methods to protect individuals' privacy on the web.
WEB STRUCTURE MINING

Web structure mining is the process of using graph and network mining theory and
methods to analyze the nodes and connection structures on the Web.
It extracts patterns from hyperlinks, where a hyperlink is a structural component that
connects a web page to another location.
It can also mine the document structure within a page (e.g., analyze the tree-like structure
of a page to describe HTML or XML tag usage).
Both kinds of web structure mining help us understand web content.
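As one example of mining the hyperlink graph, the sketch below runs a basic PageRank power
iteration over a tiny invented site. PageRank is a well-known link analysis method that the
slides do not mention. The damping factor, iteration count, and page names are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages given `links`, a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


site = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(site)
print(max(ranks, key=ranks.get))
```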
WEB USAGE MINING

Web usage mining is the process of extracting useful information (e.g., user click streams)
from server logs.
It finds patterns related to general or particular groups of users; understands users’ search
patterns, trends, and associations; and predicts what users are looking for on the Internet.
It helps improve search efficiency and effectiveness, and it helps promote products or
related information to different groups of users at the right time.
Web search companies routinely conduct web usage mining to improve their quality of
service.
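A small sketch of extracting click streams from server logs is shown below. The log format (user
id, timestamp, path) is hypothetical and much simpler than real server logs, and the function
names are assumptions. It groups requests by user and counts page-to-page transitions.

```python
from collections import Counter, defaultdict

# Hypothetical log lines: "<user> <timestamp> <path>".
LOG_LINES = [
    "u1 1000 /home",
    "u1 1010 /search",
    "u2 1015 /home",
    "u1 1020 /product/42",
    "u2 1030 /search",
    "u2 1040 /product/42",
]


def click_streams(lines):
    """Group requests by user and return each user's pages in time order."""
    visits = defaultdict(list)
    for line in lines:
        user, timestamp, path = line.split()
        visits[user].append((int(timestamp), path))
    return {user: [path for _, path in sorted(hits)] for user, hits in visits.items()}


def transition_counts(streams):
    """Count how often one page is followed directly by another."""
    counts = Counter()
    for pages in streams.values():
        counts.update(zip(pages, pages[1:]))
    return counts


streams = click_streams(LOG_LINES)
print(transition_counts(streams).most_common(2))
```

Frequent transitions like these are the raw material for the navigation patterns and
recommendations described above.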
MINING DATA STREAMS

Stream data refers to continuously flowing data into a system, characterized by vast
volumes, dynamic changes, potential infinity, and multidimensional features.
Traditional database systems cannot store such data, and most systems can only read the
stream once sequentially, posing significant challenges for effective mining.
Techniques for handling stream data include using sliding windows or tilted time windows
to collect information, along with methods like microclustering, limited aggregation, and
approximation.
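The sliding-window idea can be illustrated with a one-pass sketch that keeps only the most recent
values in memory and flags readings far above the recent average. The window size, the threshold
factor, and the class and function names are assumptions. Tilted time windows and microclustering
are not shown.

```python
from collections import deque


class SlidingWindowMean:
    """Running mean over the most recent `size` values of a stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def add(self, value):
        # When the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


def flag_anomalies(stream, size=3, factor=2.0):
    """Yield values that exceed `factor` times the mean of the preceding window."""
    tracker = SlidingWindowMean(size)
    mean = None
    for value in stream:
        if mean is not None and value > factor * mean:
            yield value
        mean = tracker.add(value)


readings = [10, 11, 9, 10, 50, 10, 11]  # made-up traffic counts
print(list(flag_anomalies(readings)))
```

Each value is read once and memory use is bounded by the window size, which matches the
single-pass constraint on stream processing.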
Applications of stream data mining include real-time detection of anomalies in network
traffic and botnets, among others.
VISUAL AND AUDIO DATA MINING

Visual data mining leverages data and knowledge visualization techniques to extract
implicit and valuable insights from large datasets.
It harnesses the capabilities of the human visual system, which includes the eyes and the
brain's powerful processing and reasoning capabilities.
This approach effectively combines data visualization and data mining, integrating
techniques from computer graphics, multimedia systems, human-computer interaction,
pattern recognition, and high-performance computing.
In general, data visualization and data mining can be integrated in the following ways:
DATA VISUALIZATION

Data in a database or data warehouse can be viewed at different granularity or abstraction
levels, or as different combinations of attributes or dimensions.
Data can be presented in various visual forms, such as boxplots, 3-D cubes, and data
distribution charts.
Visual display can help give users a clear impression and overview of the data
characteristics in a large data set.
DATA MINING RESULT VISUALIZATION

Visualization of data mining results is the presentation of the results or knowledge obtained
from data mining in visual forms.
DATA MINING PROCESS VISUALIZATION
This type of visualization presents the various processes of data mining in visual forms so
that users can see how the data are extracted and from which database or data warehouse
they are extracted, as well as how the selected data are cleaned, integrated, preprocessed,
and mined.
Moreover, it may also show which method is selected for data mining, where the results
are stored, and how they may be viewed.
INTERACTIVE VISUAL DATA MINING

In (interactive) visual data mining, visualization tools can be used in the data mining
process to help users make smart data mining decisions.
AUDIO DATA MINING

Audio data mining uses audio signals to indicate the patterns of data or the features of data
mining results.
Although visual data mining may disclose interesting patterns using graphical displays, it
requires users to concentrate on watching patterns and identifying interesting or novel
features within them.
If patterns can be transformed into sound and music, then instead of watching pictures, we
can listen to pitches, rhythm, tune, and melody to identify anything interesting or unusual.
Audio data mining is an interesting complement to visual mining.
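One simple way to turn data into sound is to map each value to a musical pitch. The sketch below
scales a series onto a range of MIDI note numbers and converts each note to a frequency using
standard equal-temperament tuning (A4 = 440 Hz). The note range and function names are
assumptions, and actually playing the tones is left out.

```python
def to_midi_notes(values, low_note=60, high_note=84):
    """Scale values linearly onto MIDI note numbers (60 is middle C)."""
    lowest, highest = min(values), max(values)
    span = (highest - lowest) or 1  # avoid dividing by zero for a constant series
    return [
        round(low_note + (v - lowest) / span * (high_note - low_note))
        for v in values
    ]


def note_frequency(note):
    """Frequency in Hz of a MIDI note number under equal temperament."""
    return 440.0 * 2 ** ((note - 69) / 12)


data = [0, 5, 10, 5, 0]
notes = to_midi_notes(data)
print(notes)
print([round(note_frequency(n), 1) for n in notes])
```

With this mapping, an outlier in the data becomes an unusually high or low note that stands out
when the sequence is played back.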
