DPIR_IA1
->
• Sources of data are:
• Various transactions done online (electricity bills, LIC premiums, online shopping, online selling)
• Data from Facebook, Instagram, WhatsApp, LinkedIn, Twitter
• Videos uploaded to the Internet and videos watched on various social media platforms
• Data generated through your mobile – in terms of the apps installed, apps used, and data generated through apps (gaming apps, pictures, e-commerce apps)
• IoT devices generating data.
• Approaches to collect data:
• Surveys/Polls – to gather data to answer specific questions (for example, a poll may be used to understand how a population of eligible voters will cast their vote in an upcoming election)
• Interviews – conducted over the phone, in person, or over the Internet (to elicit information on people's opinions, preferences, and behaviour)
• Experiments
• Case Studies (Uber, Amazon, Smart Toothbrush)
2. Attribute Construction:
->Attribute construction can help improve accuracy and aids a better understanding of the data set.
->For example, we may wish to add the attribute area based on the attributes height and width.
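A minimal sketch of this idea (assuming a pandas DataFrame with hypothetical height and width columns):

    import pandas as pd

    # Hypothetical data set with height and width measurements (in cm)
    df = pd.DataFrame({"height": [10, 20, 15], "width": [4, 5, 6]})

    # Attribute construction: derive a new attribute from existing ones
    df["area"] = df["height"] * df["width"]
    print(df)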
3. Aggregation:
-> Summary or aggregation operations are applied to the data.
->For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts.
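A small sketch of such aggregation (assuming a pandas DataFrame with hypothetical date and sales columns):

    import pandas as pd

    # Hypothetical daily sales data
    daily = pd.DataFrame({
        "date": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-02-03"]),
        "sales": [100, 250, 175],
    })

    # Aggregate daily figures into monthly and annual totals
    monthly = daily.groupby(daily["date"].dt.to_period("M"))["sales"].sum()
    annual = daily.groupby(daily["date"].dt.year)["sales"].sum()
    print(monthly, annual, sep="\n")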
4. Normalisation:
->The attribute data are scaled so as to fall within a smaller range.
-> Say, a range between -1.0 and +1.0 (Mean Normalisation).
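A minimal sketch of mean normalisation, which scales values into roughly the -1.0 to +1.0 range (the values below are illustrative):

    # Mean normalisation: x' = (x - mean) / (max - min)
    values = [50.0, 60.0, 75.0, 90.0]
    mean = sum(values) / len(values)
    spread = max(values) - min(values)

    normalised = [(v - mean) / spread for v in values]
    print(normalised)  # values now fall within roughly -1.0 to +1.0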
5. Discretisation:
->The raw values of a numeric attribute are replaced by interval labels or conceptual labels.
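A minimal sketch of discretisation using pandas.cut (the ages, bin edges, and labels here are hypothetical):

    import pandas as pd

    ages = pd.Series([5, 17, 25, 43, 68])

    # Discretise continuous ages into interval labels
    groups = pd.cut(ages, bins=[0, 18, 40, 65, 100],
                    labels=["child", "young", "middle-aged", "senior"])
    print(groups)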
->An attribute is said to be redundant if it can be derived from (is correlated with) another attribute or set of attributes.
->Example of positive correlation: as height increases, weight also increases.
->Example of negative correlation: time spent on mobile and performance in the exams.
->Redundancies can be detected by correlation analysis.
->There are tests that can be performed on different kinds of data.
->Chi-Squared Test: for measuring the correlation existing between nominal data.
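A minimal sketch of a chi-squared test on two nominal attributes (the contingency table below is made up, and scipy is one possible library choice):

    from scipy.stats import chi2_contingency

    # Hypothetical contingency table: rows = gender, columns = preferred reading
    observed = [[250, 200],
                [50, 1000]]

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(chi2, p_value)  # a small p-value suggests the attributes are correlated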
• Tuple Duplication:
->Duplication at the tuple level, i.e., where there is more than one identical tuple for a given unique data entry, is called Tuple Duplication. These identical tuples have to be eliminated.
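A minimal sketch of eliminating duplicate tuples (assuming a pandas DataFrame; the columns are hypothetical):

    import pandas as pd

    df = pd.DataFrame({"customer_id": [1, 1, 2], "city": ["Pune", "Pune", "Delhi"]})

    # Eliminate identical tuples, keeping the first occurrence
    deduplicated = df.drop_duplicates()
    print(deduplicated)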
• The discrete wavelet transform is a signal processing technique that, when applied to a data vector X, transforms it into a numerically different vector X' (the wavelet coefficients).
• Both vectors are of the same length.
• Its usefulness lies in the fact that the wavelet-transformed data can be truncated, storing only a small fraction of the strongest wavelet coefficients.
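A hedged sketch using the PyWavelets library (one possible choice; the data vector, wavelet name, and truncation threshold are illustrative):

    import pywt

    # Toy data vector X
    X = [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]

    # Apply a single-level Haar discrete wavelet transform
    approx, detail = pywt.dwt(X, "haar")

    # Keep only the strongest detail coefficients (truncate the rest to zero)
    compressed_detail = [c if abs(c) > 1.0 else 0.0 for c in detail]
    print(list(approx), compressed_detail)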
- Numerosity:
• Numerosity Reduction Techniques replace the original data volume by
alternative, smaller forms of data representation.
- Parametric method:
• A model is used to estimate the data, so that only the model parameters need to be stored, instead of the actual data. Ex: Regression Models
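A minimal sketch of the idea with a simple linear regression in numpy (the x/y values are made up): instead of storing all the points, only the fitted slope and intercept are kept.

    import numpy as np

    # Hypothetical data: store only the regression parameters, not the raw points
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2.1, 4.2, 5.9, 8.1, 10.0])

    slope, intercept = np.polyfit(x, y, deg=1)
    print(slope, intercept)  # two parameters summarise the whole data set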
- Non-Parametric Method:
• In statistics, a histogram is a graphical representation of the distribution of data.
• The histogram is represented by a set of rectangles, adjacent to each other, where each bar represents a group (bin) of data values.
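A minimal sketch of histogram-based reduction with numpy (the price values are made up): only the bin counts and bin edges are stored instead of every value.

    import numpy as np

    prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 15, 18, 20, 21, 25, 30]

    # Represent the data by bin counts instead of individual values
    counts, bin_edges = np.histogram(prices, bins=3)
    print(counts, bin_edges)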
- Sampling:
• Sampling is the selection of a subset of individuals from within a statistical
population to estimate characteristics of the whole population.
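A minimal sketch of simple random sampling without replacement, using Python's random module (the population and sample size are arbitrary):

    import random

    population = list(range(1, 1001))  # a hypothetical population of 1000 observations

    # Select a simple random sample (without replacement) to estimate population characteristics
    sample = random.sample(population, k=50)
    print(sum(sample) / len(sample))  # sample mean as an estimate of the population mean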
5. EXPLAIN!
->
• Data Table:
- Collection of measured data values represented as numbers or text. They are
raw before they are transformed.
• Data Value:
- Measurements of various details, expressed in different measures, for example:
- Distances (in cm/m)
- Categories (Telecom/Energy Industry)
- Weights (lb/kg)
• Observation:
- Each row in the Data Table contains information about a specific item.
• Variable:
- an attribute of a specified record
- Types of variables:
• Discrete Variable:
- If a variable contains a fixed number of values – be they numbers or categories – then such a variable is called a Discrete Variable.
- Ex: number of students, count of participants, different sectors (telecom, retail)
• Numerical Variable:
- If a variable contains a continuous numeric value (with infinite precision), then such a variable is called a Numerical Variable.
- Example: height, weight.
• Independent variable:
- a variable that stands alone and isn’t changed by the other variables.
• Dependent variable:
- the value which is dependent on the changes in the independent
variables.
• The Median:
- The median is the middle value of a variable's sorted values. For variables with an even number of values, the average of the two values closest to the middle is selected (sum the two values and divide by 2).
- The median can be calculated for variables measured on the ordinal, interval, and ratio scales and is often the best indication of central tendency for variables measured on the ordinal scale.
• The Mean:
- It is the most commonly used summary of central tendency for variables measured on the interval or ratio scales.
- It is the average of the given values.
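A minimal sketch using Python's statistics module (the values are illustrative):

    import statistics

    values = [3, 7, 8, 5, 12, 14, 21, 13, 18]

    print(statistics.median(values))  # middle value of the sorted data
    print(statistics.mean(values))    # sum of the values divided by their count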
• The distribution of the data can be understood by using simple data visualisations.
• There are three types of charts/visual representations that are most commonly used (a small plotting sketch is given after the Box Plots item below).
1. Bar Charts:
- For a variable measured on a nominal scale, a bar chart can be used to
display the relative frequencies for the different values.
- For nominal variables, the ordering of the x-axis is arbitrary; however,
they are often ordered alphabetically or based on the frequency value.
- The y-axis which measures frequency can also be replaced by values
representing the proportion or percentage of the overall number of
observations (replacing the frequency value).
- For variables measured on an ordinal scale containing a small number of
values, a bar chart can also be used to understand the relative
frequencies of the different values.
2. Frequency Histograms:
- The frequency histogram is useful for variables with an ordered scale—
ordinal, interval, or ratio—that contain a larger number of values.
- Each variable is divided into a series of groups based on the data values
and displayed as bars whose heights are proportional to the number of
observations within each group.
3. Box Plots :
- Box plots provide a summary of the overall frequency distribution of a
variable.
- Six values are usually displayed: the lowest value, the lower quartile (Q1), the median (Q2), the upper quartile (Q3), the highest value, and the mean. The box in the middle of the plot represents where the central 50% of observations lie.
- A vertical line inside the box shows the location of the median value, and a dot represents the location of the mean value.
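A minimal plotting sketch with matplotlib (one common choice; the category names and data values are made up), showing the three chart types side by side:

    import matplotlib.pyplot as plt

    categories = ["Telecom", "Retail", "Energy"]
    frequencies = [12, 30, 18]
    measurements = [4, 5, 5, 6, 7, 7, 7, 8, 9, 12, 15]

    fig, axes = plt.subplots(1, 3)
    axes[0].bar(categories, frequencies)           # bar chart of frequencies per category
    axes[1].hist(measurements, bins=4)             # frequency histogram of a numeric variable
    axes[2].boxplot(measurements, showmeans=True)  # box plot with median line and mean marker
    plt.show()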
• Standard Deviation:
- The standard deviation is the square root of the variance.
- The standard deviation is the most widely used measure of the deviation of a
variable.
- The higher the value, the more widely distributed the variable’s data values are
around the mean.
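A minimal sketch using the statistics module (the sample values are illustrative):

    import statistics

    values = [3, 7, 8, 5, 12, 14, 21, 13, 18]

    variance = statistics.variance(values)  # average squared deviation from the mean
    std_dev = statistics.stdev(values)      # square root of the variance
    print(variance, std_dev)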
- Replace all missing values of the attribute with the same constant, such as a label like "Unknown".
• Use a measure of the central tendency of the attribute (such as the mean or median) to fill in the missing value.
• Use the attribute mean or median for all samples belonging to the same class.
• Use the most probable value to fill in the missing value.
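A small sketch of the first two strategies with pandas (the column names and values are hypothetical):

    import pandas as pd

    df = pd.DataFrame({"city": ["Pune", None, "Delhi"],
                       "income": [40000.0, None, 52000.0]})

    # Strategy 1: replace missing values with a constant label
    df["city"] = df["city"].fillna("Unknown")

    # Strategy 2: replace missing values with a measure of central tendency
    df["income"] = df["income"].fillna(df["income"].mean())
    print(df)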
2. It groups data using a "top-down approach", since it starts with a predefined number of clusters.
3. It is computationally faster and can handle a greater number of observations than AHC (can be grouped under Advantages of K-Means).
5. When the data set contains many outliers, K-Means may not create an optimal grouping.
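A hedged sketch of K-Means with scikit-learn (one common library choice; the points and the choice of k=2 are illustrative):

    from sklearn.cluster import KMeans

    # Toy 2-D observations
    X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]

    # Top-down: the number of clusters is fixed in advance
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    print(labels, kmeans.cluster_centers_)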
2. It uses a "bottom-up approach" for clustering, as it starts with each observation and progressively merges them into clusters.
3. Its computational cost is higher, since it has to generate the hierarchical tree (can be grouped under Disadvantages of AH Clustering).
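A hedged sketch of agglomerative hierarchical clustering with scipy (one possible choice; the toy points are the same as in the K-Means sketch above):

    from scipy.cluster.hierarchy import linkage, fcluster

    X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]]

    # Bottom-up: start from single observations and progressively merge clusters
    tree = linkage(X, method="ward")

    # Cut the hierarchical tree into two clusters
    labels = fcluster(tree, t=2, criterion="maxclust")
    print(labels)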
• This method can generate large numbers of rules that must be prioritized and
interpreted.
- Three values used to generate the association rules:
• Support:
- Its value is the proportion of observations that the rule selects out of all observations in the data set.
• Confidence:
- The Confidence score is a measure of how predictable a rule is.
- The Confidence or Predictability value is calculated as the support for the entire rule (the IF- and THEN-parts together) divided by the support for the observations satisfying the IF-part of the rule.
• Lift:
- The Lift Score indicates the strength of the association.
- Lift = Confidence / Support(THEN-part)
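A minimal worked sketch of these three values on a toy transaction set (the items and the rule IF {bread} THEN {butter} are made up):

    # Toy transactions
    transactions = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "jam"},
        {"milk", "jam"},
        {"bread", "butter", "jam"},
    ]
    n = len(transactions)

    if_part, then_part = {"bread"}, {"butter"}

    support_rule = sum(1 for t in transactions if (if_part | then_part) <= t) / n
    support_if = sum(1 for t in transactions if if_part <= t) / n
    support_then = sum(1 for t in transactions if then_part <= t) / n

    confidence = support_rule / support_if  # support of whole rule / support of IF-part
    lift = confidence / support_then        # confidence / support of THEN-part
    print(support_rule, confidence, lift)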
- Advantages (of Decision Trees):
1. They are easy to understand and are useful in explaining how decisions are reached based on multiple criteria.
2. They can handle categorical and continuous variables, since they partition a data set into distinct regions based on ranges or specific values.
- Disadvantages:
1. Building decision trees can be computationally expensive, particularly when
analyzing a large data set with many continuous variables.
2. Generating a useful decision tree automatically can be challenging (since large and complex trees can be easily generated; trees that are too small may not capture enough information; and generating the "best" tree through optimization is difficult).
• Dichotomous:
• Variables with two values are the most straightforward to split, since each branch represents a specific value. For example, a variable Temperature may have only two values: "hot" and "cold".
• Nominal:
• Since nominal values are discrete values with no order, a two-way split is accomplished by one subset being composed of the observations that are equal to a certain value and the other being those observations that do not equal that value.
• Ordinal:
• In the case where a variable’s discrete values are ordered, the resulting subsets
may be made up of more than one value, as long as the ordering is retained.
• Continuous:
• For variables with continuous values to be split two ways, a specific cut-off value
needs to be determined so that observations with values less than the cut-off are
in the subset on the left and those with values greater than or equal to are in the
subset on the right.
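A minimal sketch of a two-way split on a continuous variable (the cut-off of 25 and the values are arbitrary):

    # Hypothetical continuous attribute values (e.g., age) and a chosen cut-off
    values = [12, 19, 25, 31, 47, 52]
    cutoff = 25

    left = [v for v in values if v < cutoff]    # observations below the cut-off
    right = [v for v in values if v >= cutoff]  # observations at or above the cut-off
    print(left, right)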