Updated_Predictive_Analytics_and_Data_Mining_Notes
Predictive Analytics
Definition
Predictive analytics involves using historical data, statistical modeling, data mining techniques,
and machine learning to make predictions about future outcomes. It helps identify relationships
between datasets and generates forecasts for business decision-making.
Framework
The predictive analytics process typically involves the following steps:
1. Problem Definition: Identify the outcome to be predicted and the business question it answers.
2. Data Collection and Preparation: Gather historical data and clean it for modeling.
3. Model Building: Train a statistical or machine learning model on the prepared data.
4. Validation: Evaluate the model's accuracy on data it has not seen.
5. Deployment and Monitoring: Put the model into use and track its predictions over time.
Techniques
Predictive analytics employs various techniques:
- Regression Models: Estimate relationships between variables (e.g., product features and
sales).
- Classification Models: Categorize data into predefined groups (e.g., fraud detection).
- Clustering Models: Group data by shared attributes (e.g., customer segmentation).
- Time-Series Models: Analyze data over time to predict trends (e.g., seasonal sales).
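The regression technique listed above can be sketched as a minimal ordinary least-squares fit in pure Python. The data here is hypothetical (advertising spend vs. units sold), purely for illustration:

```python
# Simple linear regression (ordinary least squares) from scratch.
# Hypothetical data: advertising spend (x) vs. units sold (y).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

spend = [1.0, 2.0, 3.0, 4.0]
sales = [2.1, 4.2, 6.1, 8.3]
slope, intercept = fit_line(spend, sales)
predicted = slope * 5.0 + intercept  # forecast sales at spend = 5.0
```

The fitted slope estimates how many additional units are sold per unit of spend, which is exactly the "relationship between variables" that regression models capture.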
Characteristics
- Utilizes historical data for training models.
- Involves statistical and machine learning algorithms.
- Focuses on future predictions.
- Supports strategic decision-making.
Privacy Considerations
Key ethical and privacy considerations in predictive analytics include:
- Informed Consent: Individuals should know how their data will be collected and used.
- Data Anonymization: Personal identifiers should be removed or masked before analysis.
- Bias and Fairness: Models trained on biased historical data can produce discriminatory predictions.
- Regulatory Compliance: Handling of personal data must follow applicable laws (e.g., GDPR).
Data Mining
Definition
Data mining is the process of discovering patterns and extracting valuable insights from large
datasets using statistical and machine learning techniques.
Processes
The data mining process includes the following steps:
1. Data Gathering: Collect relevant data from various sources like warehouses.
2. Data Preparation: Clean and transform data to ensure quality and consistency.
3. Data Mining: Apply algorithms to uncover patterns, correlations, and trends.
4. Data Analysis and Interpretation: Develop analytical models to inform decision-making.
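The four steps above can be illustrated with a toy end-to-end sketch. Everything here (the records, the cleaning rule, the "pattern" being mined) is hypothetical and deliberately minimal:

```python
from collections import Counter

# 1. Data gathering: records collected from a hypothetical source.
raw = [
    {"customer": "a", "item": "milk"},
    {"customer": "b", "item": None},   # a bad record with a missing value
    {"customer": "c", "item": "milk"},
    {"customer": "d", "item": "bread"},
]

# 2. Data preparation: drop records with missing values.
clean = [r for r in raw if r["item"] is not None]

# 3. Data mining: uncover a simple pattern - item purchase frequencies.
freq = Counter(r["item"] for r in clean)

# 4. Analysis and interpretation: the most common item informs a decision.
top_item, count = freq.most_common(1)[0]
```

Real pipelines add far more at each stage (multiple sources, schema validation, proper algorithms), but the shape of the process is the same.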
Techniques
Common data mining techniques include:
- Association Rules (Market Basket Analysis): Finds relationships between variables (e.g., co-purchased products).
- Classification: Groups data into predefined categories (e.g., product types).
- Clustering: Groups similar items based on shared attributes (e.g., demographics).
- Decision Trees: Predicts outcomes by structuring criteria hierarchically.
- K-Nearest Neighbor (KNN): Classifies data by proximity to other points.
- Neural Networks: Identifies complex patterns through interconnected nodes.
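Of the techniques above, KNN is simple enough to sketch in a few lines of pure Python. The 2-D points and labels below are hypothetical:

```python
import math
from collections import Counter

def knn_classify(points, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    dists = sorted(
        (math.dist(p, query), lbl) for p, lbl in zip(points, labels)
    )
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D data: two well-separated clusters labeled "A" and "B".
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
lbls = ["A", "A", "A", "B", "B", "B"]
result = knn_classify(pts, lbls, (0.5, 0.5), k=3)  # closest to the "A" cluster
```

KNN "classifies data by proximity" exactly as described: no model is trained; the labeled points themselves are the model.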
Importance
Data mining helps organizations understand trends, derive insights, and make informed strategic
decisions.
Introduction to Text Analytics and Text Mining
Text mining involves transforming natural language into a format that machines can manipulate,
store, and analyze. It uses natural language processing techniques to extract useful information
from unstructured text data.
Key text mining techniques include:
- Named Entity Recognition (NER): Identifies specific entities like names or dates.
- Sentiment Analysis: Determines the emotional tone in text (positive, negative, neutral).
- Tokenization: Breaks down text into individual words or tokens for analysis.
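Tokenization can be sketched with a simple regular expression. This is a deliberately simplistic scheme (lowercase word characters only); real tokenizers handle punctuation, contractions, and languages far more carefully:

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("Text mining turns unstructured text into data!")
```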
Sentiment Analysis
This process analyzes text to determine the emotional tone conveyed in messages. Companies use
sentiment analysis insights to improve customer service and brand reputation.
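A minimal lexicon-based sentiment scorer illustrates the idea. The word lists here are tiny, hypothetical lexicons; production systems use much larger resources or trained models:

```python
# Hypothetical sentiment lexicons - real ones contain thousands of entries.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(text):
    """Score text by counting positive vs. negative lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Even this crude approach shows how unstructured text is reduced to a signal (positive/negative/neutral) a company can aggregate across thousands of customer messages.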
Prescriptive Analytics
Prescriptive analytics builds upon descriptive and predictive analytics by recommending actions that address future risks and opportunities. It uses mathematical and statistical techniques for decision-making under uncertainty. Components include:
- Markov Models: Describe system behavior over time for long-term impact evaluation.
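A Markov model can be sketched as a transition matrix plus a sampling step. The two states ("active"/"churned") and the probabilities below are hypothetical, chosen only to show how long-term behavior is simulated:

```python
import random

# Hypothetical two-state Markov chain: P(next_state | current_state).
TRANSITIONS = {
    "active":  {"active": 0.8, "churned": 0.2},
    "churned": {"active": 0.1, "churned": 0.9},
}

def step(state, rng):
    """Sample the next state from the current state's transition row."""
    r = rng.random()
    cumulative = 0.0
    for nxt, p in TRANSITIONS[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return nxt  # guard against floating-point rounding

rng = random.Random(0)  # seeded for reproducibility
state = "active"
for _ in range(12):  # simulate 12 periods to observe long-term behavior
    state = step(state, rng)
```

Running many such simulations (or solving for the chain's stationary distribution) gives the long-term impact evaluation the notes mention, e.g. the expected share of customers who end up churned.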
- Enhanced Decision-Making: Both data mining and warehousing provide valuable insights that
support informed business decisions.
- Improved Customer Insights: Organizations can better understand customer behaviors and
preferences through analysis.
- Data Integration: Data warehousing consolidates data from various sources, making it easier
to analyze and mine for insights.
- Data mining specifically focuses on discovering patterns and extracting insights from large
datasets using statistical methods and machine learning techniques. In contrast, other analytical
tools may focus on descriptive analytics (summarizing historical data) or prescriptive analytics
(providing recommendations based on predictive models). Data mining is more about uncovering
hidden relationships within the data rather than just analyzing or visualizing it.
5. Why Do We Need Data Preprocessing and What Are the Main Tasks?
- Data preprocessing is essential because raw data often contains noise, inconsistencies, or
missing values that can adversely affect analysis outcomes. The main tasks in data preprocessing
include:
- Data Cleaning: Handling missing values, noise, and inconsistencies in the raw data.
- Data Transformation: Converting data into a suitable format or structure for analysis (e.g., normalization).
- Data Reduction: Reducing the volume of data while maintaining its integrity (e.g., feature selection).
- Data Integration: Combining data from multiple sources into a coherent dataset.
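The normalization mentioned under data transformation can be sketched as min-max scaling, which maps each value into [0, 1]. The ages below are hypothetical sample data:

```python
# Min-max normalization: one common data transformation step.
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero on constant data
    return [(v - lo) / (hi - lo) for v in values]

ages = [18, 30, 45, 60]
scaled = min_max_normalize(ages)  # each value mapped into [0, 1]
```

Putting features on a common scale like this prevents attributes with large raw ranges (e.g., income vs. age) from dominating distance-based algorithms such as KNN or clustering.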
- Predictive analytics focuses on forecasting future outcomes based on historical data using
statistical models and machine learning techniques. In contrast, prescriptive analytics goes a step
further by recommending actions to achieve desired outcomes based on predictions. While
predictive analytics answers "what might happen," prescriptive analytics answers "what should
we do about it?"
9. What Are the Consequences of Having a Loose Data Protection and Ethical Policy?
- A loose data protection and ethical policy can lead to severe consequences including:
- Legal Repercussions: Potential fines and penalties for non-compliance with regulations.
- Loss of Customer Trust: Erosion of customer confidence can lead to decreased business.
- Reputational Damage: Negative publicity resulting from mishandling of data can harm brand
image.