Data mining involves techniques for extracting patterns and actionable information from large databases. Unlike traditional OLAP tools, which focus on historical data, it supports predictions about the future. Key operations include predictive modeling (classification and value prediction), database segmentation, link analysis, and deviation detection, which help in identifying behaviors, clusters, and associations within data. Applications range from stock market predictions to fraud detection and customer behavior analysis.
IDW Lecture 32-Data Mining Techniques
Data Mining
Techniques to identify patterns in data stored in a data warehouse. These patterns cannot be identified by simple query and reporting tools.
OLAP is used to analyze past, historical data. Data mining is used to make predictions about the future from data, such as predicting the future pattern of a share on the stock market or forecasting the weather.

Data Mining: Formal Definition
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.
The focus of data mining is to reveal information that is hidden and unexpected.

Data Mining Applications
Examples in this lecture include stock market and weather prediction, credit card fraud detection, identification of forged bank notes, and analysis of customer buying behavior.

Data Mining Operations and Associated Techniques
The operations covered below are predictive modeling, database segmentation, link analysis, and deviation detection.

Predictive Modeling
Similar to the human learning experience.
Analyzes a database to identify the characteristics of a data set; these characteristics form the model.
The model uses a supervised learning approach with two phases: training and testing.
Training uses a large sample of historical data, called the training data set.
Testing tries out the model on unseen data to check whether it is acceptable.

Techniques Associated with Predictive Modeling
Classification
Value prediction

Classification
Used to assign each record in the database a predetermined class from a finite set of classes.
Two specializations of classification:
Tree induction
Neural induction

Value Prediction
Estimates a continuous numeric value associated with a database record.
Uses linear regression.

Behavior that differs from the normal patterns in the data can also be identified. This may be used for activities such as credit card fraud detection.

Database Segmentation
Partitions a database into an unknown number of segments, or clusters, of similar records.
Used to identify clusters. Clusters are dense inside and sparse outside.
Scatter plots are used for database segmentation.
Example: identification of legal and forged bank notes. Clusters are identified on the basis of dimensional data such as color, features of the note, quality of ink, …
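The idea of segmenting records into an unknown number of dense clusters can be illustrated with a minimal sketch. The lecture does not name a clustering algorithm, so this example assumes a simple distance-threshold grouping: two records fall in the same cluster when a chain of close neighbours connects them. The function name `segment`, the `radius` parameter, and the bank-note measurements are all hypothetical and chosen only for illustration.

```python
from math import dist


def segment(points, radius):
    """Group points into clusters without fixing the number of clusters.

    Two points share a cluster when a chain of neighbours, each within
    `radius` of the next, connects them (an assumed, illustrative rule).
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Collect every unvisited point close enough to the current one.
            near = [i for i in unvisited
                    if dist(points[current], points[i]) <= radius]
            for i in near:
                unvisited.remove(i)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Made-up measurements per bank note: (ink quality score, feature score).
notes = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
         (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

clusters = segment(notes, radius=1.0)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

With this toy data the two groups are dense inside and far apart, so the sketch finds two clusters without being told how many to look for. Which cluster corresponds to forged notes would still need to be decided by inspecting the records.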
Link Analysis
Establishes links, or associations, between individual records or sets of records in a database.
Three specializations:
Associations discovery
Sequential pattern discovery
Similar time sequence discovery

Associations Discovery
Finds items that imply the presence of other items in the same event.
Example: "When a customer rents property for more than two years and is more than 25 years old, in 40% of cases the customer will buy a property. This association happens in 35% of all customers who rent properties."

Sequential Pattern Discovery
Finds patterns in which the presence of one item is followed by the presence of another item in a sequence.
Used for understanding long-term customer buying behavior.

Similar Time Sequence Discovery
Discovers links between two sets of data that are time dependent.
Example: within three months of buying a property, new home owners will purchase goods such as cookers, freezers, and washing machines.

Deviation Detection
A new technique.
Visualizes data in three dimensions. A 3D visualization gives a better view of deviations occurring in the data.
The difference between forged and legal currency notes can be identified if this information is shown in a three-dimensional format.
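The two percentages in the associations discovery example above can be computed from customer records, as in the following minimal sketch. It assumes the 40% figure is the rule's confidence (how often the condition leads to a purchase) and reads the 35% figure as the rule's support (the share of all renters for whom the whole rule holds); the lecture does not use these terms, so that reading is an assumption. The `Renter` class, the `rule_metrics` function, and the 40 synthetic records are hypothetical, constructed only so the result matches the quoted figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Renter:
    years_renting: float
    age: int
    bought_property: bool


def rule_metrics(records):
    """Return (support, confidence) for the rule
    'rents > 2 years AND age > 25  =>  buys a property'."""
    if not records:
        return 0.0, 0.0
    # Records that satisfy the condition part of the rule.
    matches_condition = [r for r in records
                         if r.years_renting > 2 and r.age > 25]
    # Records that satisfy the condition and also bought a property.
    matches_rule = [r for r in matches_condition if r.bought_property]
    support = len(matches_rule) / len(records)
    confidence = (len(matches_rule) / len(matches_condition)
                  if matches_condition else 0.0)
    return support, confidence


# Synthetic data (not from the lecture): 40 renters in total.
records = (
    [Renter(3, 30, True)] * 14      # condition holds, bought a property
    + [Renter(3, 30, False)] * 21   # condition holds, did not buy
    + [Renter(1, 22, False)] * 5    # condition does not hold
)

support, confidence = rule_metrics(records)
print(f"support={support:.2f} confidence={confidence:.2f}")
# support=0.35 confidence=0.40
```

A real associations discovery tool would search over many candidate rules and keep those whose support and confidence exceed chosen thresholds; this sketch only evaluates one rule that is already known.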