
MDI4001 Machine Learning for Data Science

Module 1: Preparing the data

Dr. Sunil Kumar


Assistant Professor

School of Computer Science and Engineering


Vellore Institute of Technology
Vellore, Tamil Nadu
India

January 10, 2024


Data Preprocessing

• Raw data is noisy, incomplete, and inconsistent. Data preprocessing is required to make sense of the data.
• Techniques:
• Data Cleaning
• Data Integration
• Data Transformation
• Normalization (Standardization)
• Aggregation
• Discretization
• Data Reduction
• Feature subset selection
• Distance/Similarity Calculation
• Dimensionality Reduction
• Sampling
Why is data preprocessing important?

▶ It improves accuracy and reliability.
▶ It makes data consistent.
▶ It reduces the risk of overfitting.
▶ It saves time and effort in modeling.
Data

Data → Data objects → Attribute → Attribute Values


Attribute Values

▶ Nominal/Categorical
Examples: ID numbers, eye color, zip codes
▶ Ordinal
Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in {tall, medium, short}
▶ Interval
Examples: calendar dates
▶ Ratio
Examples: length, time, counts
Discrete and Continuous Attributes

▶ Discrete Attribute
▶ Has only a finite or countably infinite set of values
▶ Examples: zip codes, counts, or the set of words in a collection of documents
▶ Often represented as integer variables
▶ Continuous Attribute
▶ Has real numbers as attribute values
▶ Examples: temperature, height, or weight
▶ Practically, real values can only be measured and represented using a finite number of digits
Data quality

Data quality includes many factors:


▶ accuracy
▶ completeness
▶ consistency
▶ timeliness
▶ believability/trust
▶ interpretability.
Data quality problems

▶ Noise and outliers
▶ Noise refers to the modification of original values
▶ Missing values
▶ Duplicate data
Data Quality: Missing Values

▶ Reasons for missing values
▶ Information is not collected
▶ Attributes may not be applicable to all cases (e.g., annual income is not applicable to children)
▶ Handling missing values (several strategies are sketched in the code below)
▶ Ignore the tuple
▶ Fill in the missing value manually
▶ Use a global constant to fill in the missing value
▶ Use a measure of central tendency for the attribute (e.g., the mean or median) to fill in the missing value
▶ Use the attribute mean or median for all samples belonging to the same class as the given tuple
▶ Use the most probable value to fill in the missing value
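
A minimal pandas sketch of several of these strategies; the data frame, column names, and values are illustrative assumptions, not from the slides:

import numpy as np
import pandas as pd

# Toy data with missing values (NaN plays the role of NULL)
df = pd.DataFrame({
    "marital_status": ["Single", "Married", np.nan, "Divorced"],
    "taxable_income": [125000.0, 100000.0, np.nan, 60000.0],
    "cheat": ["No", "No", "No", "Yes"],
})

# Ignore the tuple: drop rows containing any missing value
dropped = df.dropna()

# Global constant: fill categorical gaps with a sentinel label
df["marital_status"] = df["marital_status"].fillna("Unknown")

# Central tendency: fill a numeric attribute with its median
median_filled = df["taxable_income"].fillna(df["taxable_income"].median())

# Class-conditional fill: mean income within the same 'cheat' class
df["taxable_income"] = df["taxable_income"].fillna(
    df.groupby("cheat")["taxable_income"].transform("mean")
)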
Data Quality: Missing Values

Example data set with missing values (NULL), an outlier income (10000K), an inconsistent spelling (Maried), and a duplicate record (Tid 9):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Maried          100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        10000K          Yes
6    No      NULL            60K             No
7    Yes     Divorced        220K            NULL
8    No      Single          85K             Yes
9    No      Married         90K             No
9    No      Single          90K             No
Data Quality: Outliers

Outliers are data objects with characteristics that are considerably different from those of most other data objects in the data set.
Data Quality: Handle Noise

Data smoothing techniques:

▶ Binning
▶ smoothing by bin means
▶ smoothing by bin medians
▶ smoothing by bin boundaries
▶ Regression: smooth by fitting a regression function
▶ Clustering: detect and remove outliers
Data Quality: Handle Noise (Binning)

▶ Sorted data for price (in dollars): 4, 8, 15, 21, 21, 24, 25, 28, 34
▶ Partition into (equal-frequency) bins:
Bin 1: 4, 8, 15
Bin 2: 21, 21, 24
Bin 3: 25, 28, 34
▶ Smoothing by bin means:
Bin 1: 9, 9, 9
Bin 2: 22, 22, 22
Bin 3: 29, 29, 29
▶ Smoothing by bin boundaries:
Bin 1: 4, 4, 15
Bin 2: 21, 21, 24
Bin 3: 25, 25, 34
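
A short NumPy sketch of equal-frequency binning with smoothing by bin means and by bin boundaries, reproducing the price example above:

import numpy as np

prices = np.array([4, 8, 15, 21, 21, 24, 25, 28, 34])  # already sorted
bins = prices.reshape(3, 3)                             # 3 equal-frequency bins

# Smoothing by bin means: every value becomes its bin's mean
by_means = np.repeat(bins.mean(axis=1), 3).astype(int)

# Smoothing by bin boundaries: snap each value to the nearer bin edge
lo = bins.min(axis=1, keepdims=True)
hi = bins.max(axis=1, keepdims=True)
by_bounds = np.where(bins - lo <= hi - bins, lo, hi).ravel()

print(by_means)   # [ 9  9  9 22 22 22 29 29 29]
print(by_bounds)  # [ 4  4 15 21 21 24 25 25 34]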
Data Quality: Handle Noise (Regression)

▶ Replace noisy or missing values by predicted values
▶ Requires a model of attribute dependencies
▶ Can be used for data smoothing or for handling missing data
Data Transformation

Data transformation refers to the process of converting raw data into a format that is suitable for analysis and modeling.

▶ Smoothing
▶ Normalization
▶ Aggregation
▶ Discretization
▶ Sampling
▶ Generalization
Data Transformation: Normalization

Data normalization involves converting all data variables into a given range.

▶ Recalculating the values for better comparison
▶ Ensure consistent units (monetary, measurements, temperature):
▶ Metric, British, American weights and lengths
▶ Currency: use a common unit (Euro, USD)
▶ Currency adjusted for inflation: the value of money is not the same as 10 years ago
▶ Normalization gives equal weight/importance to each variable
▶ e.g., algorithms based on distance measures (Euclidean distance)
Data Transformation: Normalization

Techniques that are used for normalization:

▶ Min-Max Normalization:
▶ This transforms the original data linearly.
▶ Suppose that min_A is the minimum and max_A is the maximum of an attribute A.
▶ v_i is the value to be mapped, and v'_i is the new value obtained after normalization.
▶ Min-max normalization maps v_i to v'_i in a new, smaller range [new_min_A, new_max_A]:

v'_i = ((v_i − min_A) / (max_A − min_A)) × (new_max_A − new_min_A) + new_min_A
Min-Max Normalization:

Example (income): v = 73,600, min_A = 12,000, max_A = 98,000. Mapping to [0, 1]:

v' = (73,600 − 12,000) / (98,000 − 12,000) = 0.716

▶ Often the desired scale range is [0, 1]
▶ Works when you know the limits (minimum and maximum) of the original values
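
A one-function Python sketch of the formula, checked against the income example above:

def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    # Linearly map v from [min_a, max_a] onto [new_min, new_max]
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

print(round(min_max(73600, 12000, 98000), 3))  # 0.716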
Z-score Normalization:

▶ Values of an attribute A are normalized based on the mean of A and its standard deviation.
▶ Used when the minimum and maximum are not known (i.e., you expect more data in the future but want consistency):

v'_i = (v_i − Ā) / σ_A

▶ Here Ā and σ_A are the mean and standard deviation of attribute A.
▶ All values won't fall between -1 and 1, but most will.
▶ The average of the scaled values should be near 0.
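
A minimal NumPy sketch of z-score normalization; the income values are made up for illustration:

import numpy as np

def z_score(values):
    # Scale values to zero mean and unit standard deviation
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

incomes = np.array([12000, 54000, 73600, 98000])  # hypothetical
scaled = z_score(incomes)
print(round(scaled.mean(), 10))  # ~0, as the slide notes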
Normalization: Decimal Scaling:

▶ Normalizes the values of an attribute by moving the position of their decimal point:

v'_i = v_i / 10^j

where j is the smallest integer such that max(|v'_i|) < 1.
▶ Divide by a constant that brings all values into the acceptable range.
▶ The number of places the decimal point is moved is determined by the maximum absolute value of attribute A.
▶ e.g., for values in the range -40 to 120, decimal scaling divides by 10^3 = 1000. More general rescalings also work: dividing by 120 gives values between -1 and 1, and subtracting 40 then dividing by 80 maps -40 to -1 and 120 to 1.
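
A small Python sketch of decimal scaling under the definition above:

import numpy as np

def decimal_scale(values):
    # Divide by 10^j, with j the smallest integer making max(|v'|) < 1
    values = np.asarray(values, dtype=float)
    j = int(np.floor(np.log10(np.abs(values).max()))) + 1
    return values / 10 ** j, j

scaled, j = decimal_scale([-40, 120])
print(j, scaled)  # 3 [-0.04  0.12]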


Data Aggregation

Combining two or more attributes (or objects) into a single attribute (or object).

Purpose:
▶ Data reduction
▶ Reduce the number of attributes or objects
▶ Results in simpler models
▶ Faster computation of the models
▶ Change of scale
▶ Cities aggregated into regions, states, countries, etc.
▶ Days aggregated into weeks, months, or years
▶ More "stable" data
▶ Aggregated data tends to have less variability
Data Aggregation

▶ Temporal aggregation: summarizing data over intervals of time, such as hours, days, weeks, or months. It is useful for identifying trends and patterns in time series data.
▶ Spatial aggregation: summarizing data according to spatial criteria such as geographical area, postal code, or IP address. It is useful for analyzing location-based data, such as customer demographics, sales territories, or traffic patterns.
▶ Attribute aggregation: summarizing data based on specific attributes or categories, such as product category, customer segment, or user role. It is useful for identifying patterns and trends in categorical data.
▶ Hierarchical aggregation: summarizing data at different levels of a hierarchy, such as organization level, product hierarchy, or geographic hierarchy. It is useful for analyzing data that has a natural hierarchy.
▶ Statistical aggregation: summarizing data using a statistical measure such as mean, median, mode, standard deviation, or percentile. It is useful for analyzing numerical data and identifying outliers and anomalies.
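
A pandas sketch of temporal, attribute, and statistical aggregation; the sales data is synthetic and purely illustrative:

import numpy as np
import pandas as pd

# Hypothetical daily sales for one year
days = pd.date_range("2023-01-01", "2023-12-31", freq="D")
rng = np.random.default_rng(0)
daily = pd.Series(rng.normal(100, 15, len(days)), index=days)

# Temporal aggregation: daily values -> monthly totals
monthly = daily.resample("MS").sum()        # "MS" = month start

# Statistical aggregation: summary measures per month
stats = daily.resample("MS").agg(["mean", "median", "std"])

# Attribute aggregation: totals per product category
sales = pd.DataFrame({"category": ["A", "B", "A", "B"],
                      "amount": [10, 20, 30, 40]})
per_category = sales.groupby("category")["amount"].sum()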
Data Aggregation

Example: standard deviation of Australian precipitation

▶ The original slides compare two histograms: the standard deviation of average monthly precipitation (left) and of average yearly precipitation (right) for the same locations.
▶ The average yearly precipitation has less variability than the average monthly precipitation, illustrating that aggregated data tends to be more stable.
Data Discretization

Raw values of a numeric attribute are replaced by interval labels.

Purpose:
▶ Some ML algorithms only accept discrete attributes
▶ May improve understandability of patterns
▶ For example, the values of the age attribute can be replaced by interval labels such as (0-10, 11-20, ...) or (kid, youth, adult, senior).

Methods:
▶ Using binning
▶ Using histogram analysis
▶ Using cluster analysis
▶ Using decision trees
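
A pandas sketch of the age example; the cut points and labels are one possible choice, not prescribed by the slides:

import pandas as pd

ages = pd.Series([3, 9, 15, 27, 44, 67, 81])

# Binning into hand-chosen intervals with ordinal labels
labels = pd.cut(ages, bins=[0, 10, 20, 60, 100],
                labels=["kid", "youth", "adult", "senior"])

# Equal-frequency (quantile) binning as an alternative
quartiles = pd.qcut(ages, q=4)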
Data Discretization: Decision Tree

▶ Use a decision tree to identify the optimal splitting points that determine the bins or contiguous intervals:
▶ A decision tree evaluates all possible values of a feature and selects the cut-point that maximizes the class separation, using a performance metric like entropy or Gini impurity.
▶ It then repeats the process for each node of the first data split and for each node of the subsequent splits, until a stopping criterion is reached.
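
A scikit-learn sketch of this idea on synthetic data: fit a shallow tree on one feature, then read its split thresholds back as cut points. The feature and class labels are assumptions for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 300).reshape(-1, 1)            # one numeric feature
y = ((x.ravel() > 35) ^ (x.ravel() > 70)).astype(int)  # synthetic classes

# Shallow tree: each leaf corresponds to one interval of the feature
tree = DecisionTreeClassifier(max_depth=2, criterion="entropy").fit(x, y)

# Internal-node thresholds are the discretization cut points
# (leaves carry the sentinel threshold -2 and are skipped)
cuts = sorted(t for t in tree.tree_.threshold if t != -2)
print(cuts)  # two cut points, near 35 and 70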
Data Sampling:

▶ Data may be very large (Big Data)
▶ Sampling is a data reduction technique
▶ It allows a large data set to be represented by a much smaller random sample
Data Sampling:

The key principle for effective sampling:

▶ Using a sample will work almost as well as using the entire data set, if the sample is representative
▶ A sample is representative if it has approximately the same properties (of interest) as the original set of data
▶ Otherwise we say that the sample introduces some bias
▶ What happens if we take a sample from the university campus for the analysis?
Data Sampling:

Types of sampling:
▶ Sampling without replacement
▶ Sampling with replacement
▶ Stratified sampling
▶ Split the data into several partitions (strata), then draw random samples from each partition
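
A pandas sketch of the three sampling types on a made-up imbalanced data frame:

import pandas as pd

df = pd.DataFrame({"income": range(1000),
                   "cheat": ["Yes"] * 100 + ["No"] * 900})

# Sampling without replacement (each row drawn at most once)
no_repl = df.sample(n=100, replace=False, random_state=42)

# Sampling with replacement (rows may repeat)
with_repl = df.sample(n=100, replace=True, random_state=42)

# Stratified sampling: draw 10% from each 'cheat' stratum
stratified = df.groupby("cheat", group_keys=False).sample(frac=0.1,
                                                          random_state=42)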
Feature Subset Selection:

Redundant features
▶ Duplicate much or all of the information contained in one or more other attributes
▶ Example: purchase price of a product and the amount of sales tax paid

Irrelevant features
▶ Contain no information that is useful for the data mining task at hand
▶ Example: students' ID is often irrelevant to the task of predicting students' GPA
Heuristic Feature Selection Methods:

▶ There are 2^d possible feature subsets of d features
▶ Several heuristic feature selection methods:
▶ Best single features under the feature independence assumption: choose by significance tests (information gain, entropy)
▶ Step-wise forward selection (see the sketch after this list):
▶ The best single feature is picked first
▶ Then the next best feature conditioned on the first, and so on
▶ Step-wise backward elimination:
▶ Repeatedly eliminate the worst feature
▶ Combined forward selection and backward elimination
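
A scikit-learn sketch of step-wise forward selection (one of several tools implementing it); the iris data set and logistic-regression scorer are arbitrary choices for illustration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Greedily add the feature that most improves cross-validated score;
# direction="backward" gives step-wise backward elimination instead
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=2,
                                direction="forward").fit(X, y)
print(sfs.get_support())  # boolean mask over the 4 iris features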
Heuristic Feature Selection Methods:

Decision tree induction:

▶ Decision tree induction constructs a flowchart-like structure
▶ where each internal (nonleaf) node denotes a test on an attribute, each branch corresponds to an outcome of the test, and each external (leaf) node denotes a class prediction
▶ At each node, the algorithm chooses the "best" attribute to partition the data into individual classes
▶ A tree is constructed from the given data
▶ All attributes that do not appear in the tree are assumed to be irrelevant
Decision tree induction: the weather data example (figure from the original slides not reproduced).
Attribute Creation (Feature Generation):

▶ Create new attributes (features) that can capture the important information in a data set more effectively than the original ones
▶ Improve accuracy
▶ Improve understanding of the structure of high-dimensional data
▶ For example, add the attribute area based on the attributes height and width
Attribute Creation (Feature Generation):

Three general methodologies:

▶ Attribute extraction
▶ Domain-specific
▶ Example: extracting edges from images
▶ Mapping data to a new space
▶ E.g., Fourier transformation, wavelet transformation, manifold approaches
▶ Attribute construction
▶ Combining features
▶ Data discretization
▶ Example: dividing mass by volume to get density
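
A two-line pandas sketch of attribute construction using the slide's own examples (area from height and width, density from mass and volume); the numbers are placeholders:

import pandas as pd

df = pd.DataFrame({"height": [2.0, 3.0], "width": [4.0, 5.0],
                   "mass": [10.0, 24.0], "volume": [2.0, 4.0]})

df["area"] = df["height"] * df["width"]    # constructed attribute
df["density"] = df["mass"] / df["volume"]  # mass divided by volume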
Conclusion: Data processing

▶ Data Quality
▶ Data Quality Problems
▶ Data Cleaning
▶ Data Transformation
▶ Data Reduction
References

1. T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003.
2. https://builtin.com/data-science/step-step-explanation-principal-component-analysis
3. Ethem Alpaydin. Introduction to Machine Learning, Fourth Edition. MIT Press, 2020.
4. Hadley Wickham and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly, 2017.
5. J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011.
6. Carl Shan, Henry Wang, William Chen, and Max Song. The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists. 2016.
7. G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, 2013.
