Unit 5 DWDM

Introduction to Data mining applications

• Data mining: a young discipline with broad and diverse applications
• Many tools have been developed for domain-specific applications
• Application areas include finance, the retail industry, and telecommunications
• Some application domains
– Data Mining for Financial Data Analysis
– Data Mining for the Retail Industry
– Data Mining for the Telecommunication Industry
– Data Mining for Biological Data
– Data Mining for Scientific Applications
– Data Mining for Intrusion Detection and Prevention
Data Mining for Financial Data Analysis (I)

1. Design and construction of data warehouses for multidimensional data analysis and data mining
2. Loan payment prediction/consumer credit policy analysis
3. Classification and clustering of customers for targeted marketing
4. Detection of money laundering and other financial crimes
Data Mining for Financial Data Analysis (I)
• Banks and financial institutions offer a wide range of banking services
• Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality
• A few typical cases of data mining are as follows
• Design and construction of data warehouses for multidimensional data analysis and data mining
– A data warehouse (DW) needs to be constructed
– Data analysis methods have to be applied
– Data characterization, class comparison, and outlier analysis play important roles
Data Mining for Financial Data Analysis (I)

• View the debt and revenue changes by month, by region, by sector, and by other factors
• Access statistical information such as max, min, total, average, trend, etc.
• Loan payment prediction/consumer credit policy analysis
– Feature selection and attribute relevance ranking
– Loan payment performance
– Consumer credit rating
– Credit history
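The multidimensional views mentioned above (debt by month, by region, with max, min, total, and average) can be sketched as a small roll-up. This is only an illustration: the records, the field names, and the `roll_up` helper are made up and are not part of any particular data warehouse product.

```python
from collections import defaultdict

# Hypothetical fact records: (month, region, sector, debt)
records = [
    ("2024-01", "North", "Retail", 120.0),
    ("2024-01", "South", "Retail", 80.0),
    ("2024-02", "North", "Energy", 200.0),
    ("2024-02", "North", "Retail", 100.0),
]
FIELDS = ("month", "region", "sector")

def roll_up(rows, by):
    """Group the debt measure by the given dimensions and summarize each group."""
    idx = [FIELDS.index(name) for name in by]
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in idx)
        groups[key].append(row[-1])  # the last field is the debt measure
    return {
        key: {
            "max": max(v),
            "min": min(v),
            "total": sum(v),
            "average": sum(v) / len(v),
        }
        for key, v in groups.items()
    }

by_region = roll_up(records, by=("region",))
by_month_region = roll_up(records, by=("month", "region"))
```

Changing the `by` argument switches the view, which is the basic idea behind drilling down and rolling up in a data cube.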
Data Mining for Financial Data Analysis (I)
• Classification and clustering of customers for targeted marketing
– Classification is used to identify the most crucial factors that influence customers' decision making
– Clustering is used to identify customer groups
– Multidimensional segmentation by nearest-neighbor methods, classification, and decision trees
– Associate a new customer with an appropriate customer group
– Facilitate targeted marketing
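Associating a new customer with an existing group can be illustrated with a nearest-centroid rule, which is one simple form of nearest-neighbor assignment. The group names, the two features, and the centroid values below are hypothetical.

```python
import math

# Hypothetical customer groups, each summarized by a centroid of
# (annual income in thousands, purchases per month).
group_centroids = {
    "budget": (30.0, 2.0),
    "regular": (60.0, 5.0),
    "premium": (120.0, 9.0),
}

def assign_group(customer, centroids):
    """Return the name of the group whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(customer, centroids[name]))

new_customer = (70.0, 4.0)
group = assign_group(new_customer, group_centroids)
```

In practice the features would usually be scaled to comparable ranges first, since otherwise income dominates the distance.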
Data Mining for Financial Data Analysis (II)
• Detection of money laundering and other financial crimes
– Integration of data from multiple DBs (e.g., bank transactions, federal/state crime history DBs)
– Tools: data visualization, linkage analysis, classification, clustering tools, outlier analysis, and sequential pattern analysis tools
– These tools are used to find unusual access sequences
– They identify important relationships and patterns of activities

Data Mining for Retail Industry
Retail is a major application area of data mining
Retail industry: huge amounts of data on sales, customer shopping history, e-commerce, etc.
Retail data mining can help to
• Identify buying patterns of customers
• Discover customers' shopping patterns
• Find associations among customer demographic characteristics
• Predict response to mailing campaigns
• Achieve better customer retention
• Achieve better customer satisfaction
• Reduce the cost of business
• Perform market basket analysis
• Enhance goods consumption ratios
• Design more effective goods transportation and distribution policies
Data Mining for Retail Industry

Data mining in the retail industry is outlined as follows

1. Design and construction of data warehouses
2. Multidimensional analysis of sales, customers, products, time, and region
3. Analysis of the effectiveness of sales campaigns
4. Customer retention: analysis of customer loyalty
5. Product recommendation and cross-reference of items
Data Mining for Retail Industry

• Design and construction of data warehouses
– Data mining guides the design and development of the DW
– It involves deciding which dimensions to include
– It involves deciding what preprocessing to perform in order to facilitate effective data mining
• Multidimensional analysis of sales, customers, products, time, and region
– It requires timely information regarding customer needs, sales, trends, fashion, quality, cost, and profit
– It provides powerful MD analysis
– It uses visualization tools
– It facilitates analysis of aggregates with complex conditions
Data Mining for Retail Industry

• Analysis of the effectiveness of sales campaigns
– Retailers conduct sales campaigns using coupons and various kinds of discounts
– Association analysis may disclose which items are likely to be purchased together with the items on sale
– MD analysis is used to analyse campaign results carefully
• Customer retention: analysis of customer loyalty
– Use customer loyalty card information to register sequences of purchases of particular customers
– Use sequential pattern mining to investigate changes in customer consumption or loyalty
– It helps to retain customers
– It attracts new customers

Data Mining for Retail Industry

• Product recommendation and cross-reference of items
– It uses data mining techniques such as association rule mining
– It makes personalized product recommendations
– It helps to improve customer service
– It helps customers in selecting items
– It increases sales
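As a rough illustration of how association rule mining can drive product recommendation, the sketch below mines rules between single items only and keeps those meeting minimum support and confidence. The baskets, the thresholds, and the `pair_rules` helper are hypothetical; full algorithms such as Apriori also handle larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def pair_rules(transactions, min_support, min_confidence):
    """Mine rules lhs -> rhs between single items, with support and confidence."""
    n = len(transactions)
    item_count = Counter(item for t in transactions for item in t)
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
recommend_for_butter = [rhs for lhs, rhs, _, _ in rules if lhs == "butter"]
```

A customer who puts butter in the basket would be shown the items in `recommend_for_butter`.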
Data mining in telecommunication industry

• The industry integrates telecommunication, computer networks, and the Internet
• This creates a great demand for data mining to help
– Understand the business involved
– Identify telecommunication patterns
– Catch fraudulent activities
– Make better use of resources
– Improve quality of service
Data mining in telecommunication industry

• A few scenarios in which data mining may improve the telecommunication industry
• MD analysis of telecommunication data
– OLAP tools are used
– Visualization tools are used
– Compare data traffic, system overload, resource usage, user group behaviour, and profit
• Fraudulent pattern analysis and identification of unusual patterns
– Identify potential fraudulent users
– Detect attempts to gain fraudulent access
– Discover unusual patterns
Data mining in telecommunication industry

• MD association and sequential pattern analysis
– Association rules help to promote telecommunication services
– Sequential pattern analysis also helps to promote them
• Mobile telecommunication services
– Data mining plays a major role in the design of adaptive solutions
• Usage of visualization tools in telecommunication data analysis
– Tools for OLAP visualization and outlier visualization are very useful
Data mining for biological data analysis

• Biological data mining has become an essential part of a new research field called bioinformatics
• Biological data mining helps to
– Characterize patient behaviour to predict office visits
– Identify successful medical therapies for different illnesses
– Develop effective genomic and proteomic data analysis
• A DNA sequence comprises 4 building blocks (nucleotides): adenine, cytosine, guanine, and thymine
• These 4 are combined to form long sequences or chains that resemble a twisted ladder
Data mining for biological data analysis

1. Semantic integration of heterogeneous, distributed genomic and protein databases
2. Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide/protein sequences
3. Discovery of structural patterns and analysis of genetic networks and protein pathways
4. Association and path analysis
5. Visualization tools in genetic data analysis
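Similarity search over nucleotide sequences (item 2) can be illustrated with plain edit distance. This is a simplifying assumption: real alignment tools use scoring matrices, gap penalties, and index structures. The sequences and helper names below are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def most_similar(query, database):
    """Return the database sequence with the smallest edit distance to query."""
    return min(database, key=lambda seq: edit_distance(query, seq))

# Hypothetical nucleotide sequences over the alphabet A, C, G, T
database = ["ACGTAC", "TTGACA", "ACGTTC"]
best = most_similar("ACGTTA", database)
```

The linear scan in `most_similar` is only workable for small collections, which is why indexing appears in the same list item.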
Data Mining in Science and Engineering
• Data warehouses and data preprocessing
– Resolving inconsistencies or incompatible data collected in diverse
environments and different periods (e.g. eco-system studies)
• Mining complex data types
– Spatiotemporal, biological, diverse semantics and relationships
• Graph-based and network-based mining
– Links, relationships, data flow, etc.
• Visualization tools and domain-specific knowledge
• Other issues
– Data mining in social sciences and social studies: text and social
media
– Data mining in computer science: monitoring systems, software
bugs, network intrusion
Data Mining for Intrusion Detection and Prevention
• Majority of intrusion detection and prevention systems use
– Signature-based detection: use signatures, attack patterns that are
preconfigured and predetermined by domain experts
– Anomaly-based detection: build profiles (models of normal behavior) and detect behaviors that deviate substantially from the profiles
• How data mining can help
– New data mining algorithms for intrusion detection
– Association, correlation, and discriminative pattern analysis help
select and build discriminative classifiers
– Analysis of stream data: outlier detection, clustering, model shifting
– Distributed data mining
– Visualization and querying tools
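The difference between signature-based and anomaly-based detection can be sketched as follows. The signature strings, the traffic numbers, and the mean-and-standard-deviation profile are assumptions chosen for illustration; production systems use much richer models.

```python
import statistics

# Hypothetical attack signatures prepared by domain experts
SIGNATURES = ("' OR 1=1", "../../etc/passwd")

def matches_signature(request):
    """Signature-based check: flag requests containing a known attack pattern."""
    return any(sig in request for sig in SIGNATURES)

def build_profile(normal_values):
    """Anomaly-based check, step 1: model normal behavior by mean and std dev."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, profile, threshold=3.0):
    """Step 2: flag values more than `threshold` std devs away from the mean."""
    mean, stdev = profile
    return abs(value - mean) > threshold * stdev

# Hypothetical logins per hour observed during normal operation
profile = build_profile([10, 12, 11, 9, 13, 10, 11, 12])
```

The signature check can only catch attacks that are already listed, while the profile check can flag previously unseen behavior at the cost of possible false alarms.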
Data Mining for Intrusion Detection and Prevention

• New data mining algorithms for intrusion detection
– They are used for misuse detection
– Anomaly detection models are built
– Models of normal behaviour are built automatically, and significant deviations from them are detected
1. Association and correlation analysis and aggregation to help select and build discriminating attributes
2. Analysis of stream data (it is crucial)
3. Distributed data mining (it helps to analyse network data from several locations)
4. Visualization and querying tools
Trends of Data Mining

• Application exploration: dealing with application-specific problems
• Scalable and interactive data mining methods
• Integration of data mining with Web search engines, database systems, data warehouse systems, and cloud computing systems
• Mining social and information networks
• Mining spatiotemporal, moving objects and cyber-physical systems
• Mining multimedia, text and web data
• Mining biological and biomedical data
• Data mining with software engineering and system engineering
• Visual and audio data mining
• Distributed data mining and real-time data stream mining
• Privacy protection and information security in data mining
Spatial Data Mining
• A spatial database stores a large amount of space-related data, such as maps, remote sensing data, or medical imaging data
• Spatial databases have many features distinguishing them from relational databases
• They carry topological and/or distance information
• Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases
• It discovers relationships between spatial and nonspatial data
• It has wide applications in geographic information systems, geomarketing, remote sensing, image database exploration, medical imaging, navigation, traffic control, and environmental studies
• A crucial challenge to spatial data mining is the exploration of efficient spatial data mining techniques
Spatial Data Mining: close interdependency

• For example, natural resources, climate, temperature, and economic situations are likely to be similar in geographically closely located regions.
• People consider this the first law of geography: “Everything is related to everything else, but nearby things are more related than distant things.”
Spatial Data Cube Construction and Spatial OLAP

• “Can we construct a spatial data warehouse?”
• Yes, as with relational data, we can construct a data warehouse that facilitates spatial data mining.
• A spatial data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of both spatial and nonspatial data in support of decision-making processes.
Several challenging issues regarding the construction and utilization of spatial data warehouses

• The first challenge is the integration of spatial data from heterogeneous sources and systems
• The second challenge is the realization of fast and flexible on-line analytical processing in spatial data warehouses
• In a spatial warehouse, both dimensions and measures may contain spatial components.
Three types of dimensions in a spatial data cube
• A nonspatial dimension: it contains only nonspatial data. Nonspatial dimensions such as temperature and precipitation can be constructed for the warehouse, and their generalizations are also nonspatial,
e.g., “hot” for temperature and “wet” for precipitation
• A spatial-to-nonspatial dimension: it is a dimension whose primitive-level data are spatial but whose generalization, starting at a certain high level, becomes nonspatial.
Example: the spatial dimension city relays geographic data for the U.S. map.
• A spatial-to-spatial dimension: it is a dimension whose primitive level and all of its high-level generalized data are spatial.
Example: the dimension equi_temperature_region contains spatial data, as do all of its generalizations, such as regions covering 0-5 degrees (Celsius), 5-10 degrees, and so on.
Two types of measures in a spatial data cube:
• A numerical measure: it contains only numerical data.
For example, one measure in a spatial data warehouse could be the monthly revenue of a region, so that a roll-up may compute the total revenue by year, by county, and so on.
• Numerical measures can be further classified into distributive, algebraic, and holistic
• A spatial measure: it contains a collection of pointers to spatial objects
• For example, in a roll-up, the regions with the same range of temperature and precipitation will be grouped into the same cell
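The three classes of numerical measures can be illustrated on partitioned revenue data. The county names and values are made up. The point is which results can be combined from per-partition summaries and which cannot.

```python
import statistics

# Hypothetical monthly revenue, partitioned by county
partitions = {
    "county_a": [10.0, 20.0, 30.0],
    "county_b": [5.0, 100.0],
}

# Distributive: the overall total is the sum of the partition totals.
sub_totals = {k: sum(v) for k, v in partitions.items()}
total = sum(sub_totals.values())

# Algebraic: the average follows from two distributive values (sum and count).
sub_counts = {k: len(v) for k, v in partitions.items()}
average = total / sum(sub_counts.values())

# Holistic: the median needs all values; combining partition medians is wrong.
all_values = [x for v in partitions.values() for x in v]
true_median = statistics.median(all_values)
median_of_medians = statistics.median(
    statistics.median(v) for v in partitions.values()
)
```

Here `true_median` and `median_of_medians` differ, which is why holistic measures are expensive to maintain in a precomputed cube.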
Computation of spatial measures in spatial data cube construction:
• There are three possible choices
• Collect and store the corresponding spatial object pointers but do not perform precomputation:
– Store, in the corresponding cube cell, a pointer to a collection of spatial object pointers, and invoke and perform the spatial merge when necessary
– This method is a good choice if only spatial display is required or if on-line spatial merge computation is fast
• Precompute and store a rough approximation of the spatial measures in the spatial data cube:
– This choice is good for a rough view or coarse estimation of spatial merge results
– It requires little storage space.
• Selectively precompute some spatial measures in the spatial data cube:
– This can be a smart choice.
– “Which portion of the cube should be selected for materialization?”
– The selection can be performed at the cuboid level
Mining Spatial Association and Co-location Patterns
• Similar to the mining of association rules in transactional and relational databases, spatial association rules can be mined in spatial databases.
• A spatial association rule is of the form A ⇒ B [s%, c%], where A and B are sets of spatial or nonspatial predicates,
• s% is the support of the rule, and c% is the confidence of the rule
• E.g.: is_a(X, “school”) ∧ close_to(X, “sports_center”) ⇒ close_to(X, “park”) [0.5%, 80%].
• This rule states that 80% of schools that are close to sports centers are also close to parks, and 0.5% of the data belongs to such a case.
• Since spatial association mining needs to evaluate multiple spatial
relationships among a large number of spatial objects, the process
could be quite costly.
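Support and confidence for the school, sports center, and park rule can be computed on a toy map as follows. The objects, the coordinates, and the 1 km radius used for `close_to` are assumptions made for illustration.

```python
import math

# Hypothetical map objects: (name, type, x, y) with coordinates in km
objects = [
    ("s1", "school", 0.0, 0.0),
    ("s2", "school", 10.0, 0.0),
    ("s3", "school", 20.0, 0.0),
    ("c1", "sports_center", 0.5, 0.0),
    ("c2", "sports_center", 10.5, 0.0),
    ("p1", "park", 0.0, 0.8),
    ("h1", "hospital", 30.0, 0.0),
]

def close_to(obj, kind, radius=1.0):
    """True if some other object of the given type lies within `radius` of obj."""
    return any(
        other[1] == kind and math.dist(obj[2:], other[2:]) <= radius
        for other in objects
        if other is not obj
    )

# Rule: is_a(X, school) AND close_to(X, sports_center) => close_to(X, park)
antecedent = [o for o in objects
              if o[1] == "school" and close_to(o, "sports_center")]
both = [o for o in antecedent if close_to(o, "park")]

support = len(both) / len(objects)        # share of all objects satisfying the rule
confidence = len(both) / len(antecedent)  # share of antecedent matches that hold
```

Every call to `close_to` scans all objects, which hints at why evaluating many spatial predicates over large maps is costly.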
Progressive refinement & spatial co-locations

• Progressive refinement: it can be adopted in spatial association analysis. The method first mines large data sets roughly using a fast algorithm and then improves the quality of mining in a pruned data set using a more expensive algorithm
• Spatial co-locations:
– One may like to identify groups of particular features that appear frequently close to each other in a geospatial map.
– This is essentially the problem of mining spatial co-locations.
– Finding spatial co-locations can be considered a special case of mining spatial associations.
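Progressive refinement can be sketched as a cheap filter followed by an exact test on the surviving candidates. The bounding-box filter, the point coordinates, and the radius are assumptions chosen for illustration. A real system would typically use a spatial index for the coarse step.

```python
import math

def coarse_candidates(points_a, points_b, radius):
    """Cheap filter: keep pairs whose x and y differences are both within radius.

    This bounding-box test never rejects a truly close pair, but it may
    accept pairs that are slightly too far apart.
    """
    return [
        (a, b)
        for a in points_a
        for b in points_b
        if abs(a[0] - b[0]) <= radius and abs(a[1] - b[1]) <= radius
    ]

def refine(candidates, radius):
    """More expensive step: keep only pairs within the exact Euclidean radius."""
    return [(a, b) for a, b in candidates if math.dist(a, b) <= radius]

# Hypothetical coordinates
schools = [(0.0, 0.0), (5.0, 5.0)]
parks = [(0.6, 0.6), (0.9, 0.9), (9.0, 9.0)]

candidates = coarse_candidates(schools, parks, radius=1.0)
close_pairs = refine(candidates, radius=1.0)
```

The coarse step must be a superset filter: it may let false positives through, but it must not drop any pair that the exact test would accept.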
Spatial Clustering Methods
• Spatial data clustering identifies clusters, or densely populated regions, according to some distance measurement in a large, multidimensional data set
• Spatial classification: for example, you may want to classify regions in a province into rich versus poor according to the average family income. In doing so, you would like to identify the important spatial-related factors that determine a region’s classification
• Spatial trend analysis: it deals with another issue, the detection of changes and trends along a spatial dimension. Typically, trend analysis detects changes with time, such as changes of temporal patterns in time-series data. Spatial trend analysis replaces time with space
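Replacing time with space in trend analysis can be illustrated by fitting a least-squares line of a value against distance. The observations below (income versus distance from a city center) are hypothetical, and a straight line is just the simplest possible trend model.

```python
def trend_slope(distances, values):
    """Least-squares slope of value against distance."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_v = sum(values) / n
    covariance = sum((d - mean_d) * (v - mean_v)
                     for d, v in zip(distances, values))
    variance = sum((d - mean_d) ** 2 for d in distances)
    return covariance / variance

# Hypothetical observations: distance from the city center (km) and
# average family income (thousands) of the region at that distance.
distance_km = [1.0, 2.0, 3.0, 4.0, 5.0]
income = [90.0, 80.0, 70.0, 60.0, 50.0]

slope = trend_slope(distance_km, income)
```

A negative slope here would indicate that income tends to fall as one moves away from the center.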
Mining Raster Databases
• Spatial database systems usually handle vector data that consist of
points, lines, polygons (regions), and their compositions, such as
networks or partitions.
• However, a huge amount of space-related data are in digital raster (image) forms, such as satellite images and remote sensing data
Multimedia Data Mining
• “What is a multimedia database?”
• A multimedia database system stores and manages a large collection of multimedia data, such as audio, video, image, graphics, speech, text, document, and hypertext data, which contain text, text markups, and linkages.
Similarity Search in Multimedia Data
• “When searching for similarities in multimedia data, can we search on either the data description or the data content?”
• We consider two main families of retrieval systems
• Description-based retrieval systems: build indices and perform object retrieval based on image descriptions, such as keywords, captions, size, and time of creation
• Content-based retrieval systems: support retrieval based on the image content, such as color histogram, texture, pattern, image topology, and the shape of objects and their layouts and locations within the image
• Image-sample-based queries: find all of the images that are similar to the given image sample. This search compares the signature extracted from the sample with the feature vectors of images that have already been extracted and indexed in the image database.
• Based on this comparison, images that are close to the sample image are returned.
• Image feature specification queries: specify or sketch image features like color, texture, or shape, which are translated into a feature vector to be matched with the feature vectors of the images in the database
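An image-sample-based query can be sketched as a comparison between the sample's feature vector and the indexed feature vectors. The example assumes the feature vector is a normalized color histogram and uses histogram intersection as the similarity measure. The file names and the 4-bin histograms are made up.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def search(sample_signature, index, top_k=2):
    """Rank indexed images by similarity to the sample's signature."""
    ranked = sorted(
        index,
        key=lambda name: histogram_intersection(sample_signature, index[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical pre-extracted 4-bin color histograms (each sums to 1)
image_index = {
    "sunset.jpg": [0.70, 0.20, 0.05, 0.05],
    "forest.jpg": [0.05, 0.15, 0.70, 0.10],
    "beach.jpg": [0.40, 0.30, 0.20, 0.10],
}
sample = [0.60, 0.25, 0.10, 0.05]
results = search(sample, image_index)
```

An image feature specification query would work the same way, except that `sample` is built from the user's sketch or stated features instead of being extracted from an image.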
Approaches proposed for similarity-based retrieval in image databases, based on image signature
• Color histogram–based signature:
– This signature does not contain any information about shape, image topology, or texture.
– Thus, two images with similar color composition but very different shapes or textures may be identified as similar
• Multifeature composed signature: in this approach, the signature of an image includes a composition of multiple features like color histogram, shape, image topology, and texture. The extracted image features are stored as metadata
Approaches proposed for similarity-based retrieval in image databases, based on image signature

• Wavelet-based signature: this approach uses the dominant wavelet coefficients of an image as its signature
– Wavelets capture shape, texture, and image topology information in a single unified framework.
– This improves efficiency
• Wavelet-based signature with region-based granularity: in this approach, the computation and comparison of signatures are at the granularity of regions, not the entire image.
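The idea of keeping only the dominant wavelet coefficients can be shown on a one-dimensional signal. The sketch assumes a Haar wavelet with plain averaging and differencing. The slides do not say which wavelet is used, and real image signatures work on two-dimensional data.

```python
def haar_transform(signal):
    """Full 1-D Haar decomposition using pairwise averages and differences."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = [float(x) for x in signal]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs

def wavelet_signature(signal, k):
    """Keep the k coefficients with the largest magnitude, keyed by position."""
    coeffs = haar_transform(signal)
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in sorted(ranked[:k])}

# Hypothetical row of pixel intensities
signature = wavelet_signature([4, 6, 10, 12], k=2)
```

Two signals can then be compared through their short signatures instead of their full coefficient lists.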
Multidimensional Analysis of Multimedia Data
• A multimedia data cube can contain additional dimensions and measures for multimedia information, such as color, texture, and shape.
• The MultiMediaMiner system is constructed as follows.
• Each image contains two descriptors: a feature descriptor and a layout descriptor.
• The original image is not stored directly in the database; only its descriptors are stored
• The feature descriptor is a set of vectors
– A color vector containing the color histogram quantized to 512 colors
– An MFC (Most Frequent Color) vector and an MFO (Most Frequent Orientation) vector. The MFC and MFO contain five color centroids and five edge orientation centroids for the five most frequent colors and five most frequent orientations, respectively.
– The edge orientations used are 0, 22.5, 45, 67.5, 90, and so on (in degrees)
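A color histogram quantized to 512 colors can be sketched by keeping 8 levels per RGB channel (8 × 8 × 8 = 512). That split is an assumption, as the slide gives only the total. The example also returns the most frequent bins, which approximates the idea of an MFC vector without computing true centroids. The pixel data is hypothetical.

```python
from collections import Counter

def quantize(pixel):
    """Map an 8-bit RGB pixel to one of 512 bins (8 levels per channel)."""
    r, g, b = (channel >> 5 for channel in pixel)  # keep the top 3 bits
    return (r << 6) | (g << 3) | b

def color_histogram(pixels):
    """Count pixels per quantized color bin."""
    return Counter(quantize(p) for p in pixels)

def most_frequent_colors(pixels, k=5):
    """Return the k most frequent quantized color bins."""
    return [bin_id for bin_id, _ in color_histogram(pixels).most_common(k)]

# Hypothetical image given as a flat list of RGB pixels
pixels = ([(255, 0, 0)] * 6 + [(250, 10, 5)] * 2
          + [(0, 0, 255)] * 3 + [(0, 255, 0)])
histogram = color_histogram(pixels)
top = most_frequent_colors(pixels, k=2)
```

Note that the two slightly different reds fall into the same bin, which is the purpose of quantization.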
Multimedia data cube dimensions
• Image Excavator: a component of MultiMediaMiner that uses image contextual information, like HTML tags in Web pages, to derive keywords
• A multimedia data cube can have many dimensions:
– the size of the image or video in bytes
– the width and height of the frames (or pictures)
– the date on which the image or video was created (or last modified)
– the format type of the image or video
– the frame sequence duration in seconds
– the image or video Internet domain
– the Internet domain of pages referencing the image or video (parent URL)
– the keywords
– a color dimension
– an edge-orientation dimension
Classification and Prediction Analysis of Multimedia Data
• Classification and predictive modeling have been used for mining multimedia data, especially in scientific research, such as astronomy, seismology, and geoscientific research.
• Data preprocessing is important when mining image data and can include data cleaning, data transformation, and feature extraction.
• Standard methods used in pattern recognition, such as edge detection, can also be applied
• The popular use of the World Wide Web has made the Web a rich and gigantic repository of multimedia data
Mining Associations in Multimedia Data
Three categories can be observed:
• Associations between image content and nonimage content
features: A rule like “If at least 50% of the upper part of the picture is
blue, then it is likely to represent sky” belongs to this category since it
links the image content to the keyword sky.
• Associations among image contents that are not related to spatial
relationships: A rule like “If a picture contains two blue squares, then it
is likely to contain one red circle as well” belongs to this category since
the associations are all regarding image contents.
• Associations among image contents related to spatial
relationships: A rule like “If a red triangle is between two yellow
squares, then it is likely a big oval-shaped object is underneath” belongs
to this category since it associates objects in the image with spatial
relationships.
Audio and Video Data Mining
• Besides still images, an incommensurable amount of audiovisual information is becoming available in digital form
• A set of standards exists for multimedia information description and compression.
• For example, MPEG-k (developed by MPEG: Moving Picture Experts Group) and JPEG are typical video and image compression schemes.
• The most recently released MPEG-7, formally named “Multimedia Content Description Interface,” is a standard for describing the multimedia content data.
• There are still a lot of open research issues
