CHAPTER-4-Data-Management

REVIEWER

Uploaded by

brentogale59

CHAPTER 4

DATA MANAGEMENT

Data management is the practice of collecting, organizing, storing, and analyzing data
to derive meaningful insights. In today's digital age, data has become an invaluable asset,
driving innovation and decision-making across various industries. Statistics, a branch of
mathematics, plays a crucial role in data management by providing the tools and techniques
to extract valuable information from raw data.

Learning Outcomes

At the end of this chapter, you are expected to:

1. Utilize various data management tools to process and manage quantitative data;
2. Identify the types of data and their levels of measurement;
3. Calculate the measures of central tendency and measures of dispersion of a given set
of data;
4. Interpret data based on the results of computation; and
5. Appreciate the importance and application of measures of central tendency and
measures of dispersion in real-life situations.

Components of Data Management

A. Data Collection

a. Primary Data - Collected firsthand through surveys, experiments, or
observations.

b. Secondary Data - Obtained from existing sources like government reports,
academic papers, or public databases.

B. Data Cleaning and Preparation

a. Data Cleaning - Identifying and correcting errors, inconsistencies, and
missing values.

b. Data Preparation - Transforming raw data into a suitable format for analysis,
which may involve:

• Data normalization - Scaling data to a common range.

• Data imputation - Filling in missing values.
• Feature engineering - Creating new features from existing ones.
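
The first two preparation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the height values, the use of `None` to mark a missing value, and the choice of mean imputation and min-max scaling are all assumptions made for the example.

```python
def impute_mean(values):
    """Fill missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Scale values to the common range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical heights in centimeters; one measurement is missing.
heights = [150, None, 170, 160]

filled = impute_mean(heights)       # missing value replaced by 160.0
scaled = min_max_normalize(filled)  # smallest maps to 0.0, largest to 1.0

print(filled)
print(scaled)
```

Mean imputation and min-max scaling are only one option each; other fill rules (such as the median) and other scalings are equally valid choices depending on the data.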

C. Data Storage and Organization

a. Data Warehouses - Centralized repositories for storing large volumes of data.

b. Data Lakes - Storage systems that hold raw data in its native format, whether
structured, semi-structured, or unstructured.

c. Data Marts - Smaller, focused data warehouses.

D. Data Analysis and Interpretation

a. Descriptive Statistics - Summarizing data using measures like mean, median,
mode, standard deviation, and variance.

b. Inferential Statistics - Drawing conclusions about a population based on a
sample.

c. Hypothesis Testing - Making inferences about population parameters.

d. Regression Analysis - Modelling relationships between variables.

e. Machine Learning - Using algorithms to learn patterns from data.

What is Data?

Data refers to raw and unprocessed facts, figures, or values collected or observed from the
real world. It can take various forms, including numbers, text, images, or any other
representations. Data by itself lacks context and meaning until it is processed, organized, and
interpreted.

Examples:
1. Numeric Data - Numbers like age, height, weight, temperature, sales figures, etc.
2. Textual Data - Words, sentences, and paragraphs, such as articles, emails, social media
posts, etc.
3. Categorical Data - Red, blue, green; Yes or No; Categories like "High," "Medium," "Low"
4. Video Data - Movies, TV shows, video clips, etc.
5. Image Data - Pixel values in a digital image
6. Audio Data - Waveform values in digital audio

Types of Data
Data, the raw material of the information age, comes in various forms and can be categorized
based on different criteria.

1. Quantitative Data - These variables represent measurable quantities and can be either
discrete or continuous.
• Discrete Data - Takes on distinct, separate values with no intermediate values, often
whole numbers. Examples include the number of siblings or the number of cars in a
parking lot.

• Continuous Data - Can take on any value within a given range and has infinitely many
possible values. Examples include height, weight, and temperature.
2. Qualitative Data - These variables represent categories or groups and can be
either nominal or ordinal.
• Nominal Data - Categories with no inherent order or ranking. Examples include
gender, ethnicity, or types of fruits.
• Ordinal Data - Categories with a meaningful order or ranking but with inconsistent
intervals between them. Examples include education levels (e.g., high school,
college, graduate), satisfaction ratings (low, medium, high), and Likert scale responses.

Level of Data Measurement
Understanding the level of measurement of your data is crucial for selecting
appropriate statistical techniques and interpreting your findings accurately. There are four
primary levels of measurement, each with distinct characteristics and limitations.

1. Nominal Data - Nominal data represent categories or groups with no inherent order
or ranking.
Examples:
• Gender (categories: male, female)
• Eye color (categories: brown, blue, green)
• Marital status (single, married, divorced, widowed)
• Hair type (straight, wavy, curly, kinky)
• Car brands (Toyota, Ford, Honda, Chevrolet)
• Political affiliation (Democrat, Republican, Independent)

2. Ordinal Data - Ordinal data have ordered categories, but the intervals between
them are not consistent or meaningful.

Examples:
• Educational levels (categories: high school, college, graduate)
• Customer satisfaction ratings (categories: dissatisfied, neutral,
satisfied)
• Socio-economic status (lower class, working class, middle class,
upper-middle class, upper class)
• Likert scale for agreement (strongly disagree, disagree, neutral, agree,
strongly agree)
• Performance rating (below expectations, meets expectations,
exceeds expectations)

3. Interval Data - Interval data have ordered categories with consistent and meaningful
intervals between them, but they lack a true zero point.

Examples:
• Temperature in Celsius or Fahrenheit
• IQ scores
• pH level
• Longitude or latitude
• Standardized Test Scores

4. Ratio Data - Ratio data have all the properties of interval variables, but they also
have a true zero point, indicating the absence of the attribute.
Examples:
• Height in centimeters or inches

(Measuring the plant height)

• Income
• Weight
• Distance travelled
• Time (in seconds, minutes, hours)

Measure of Central Tendency
A measure of central tendency is a summary statistic that represents the center point
or typical value of a dataset. It is also referred to as the central location of a distribution.
There are three measures of central tendency: mean, median, and mode. Choosing the best
measure of central tendency depends on the type of data.

A. Mode

Mode is a statistical measure that represents the most frequently occurring value in a dataset.
It is the value with the greatest frequency. Mode is appropriate to use when the variable
measured is in the nominal scale.

Example 1.

Let's say we surveyed a group of people about their favorite color. Here are the results:
• Blue: 15 people
• Red: 10 people
• Green: 8 people
• Yellow: 7 people
In this case, blue is the mode because it is the most frequently chosen color.

Example 2

A teacher records the following scores for a class of 10 students on a recent test:

75, 82, 85, 85, 85, 90, 92, 95, 95, 100

Solution

To find the mode, we identify the score that appears most frequently. In this case, the score
85 appears three times, which is more frequent than any other score. Therefore, the mode of
the test scores is 85.
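
The two examples above can be checked with a short Python sketch. It uses `Counter` and `statistics.multimode` from the standard library (Python 3.8 or later); the variable names are illustrative only.

```python
from collections import Counter
from statistics import multimode

# Example 1: favorite-color survey (color -> number of people).
color_counts = Counter({"Blue": 15, "Red": 10, "Green": 8, "Yellow": 7})
modal_color, modal_count = color_counts.most_common(1)[0]

# Example 2: test scores of the 10 students.
scores = [75, 82, 85, 85, 85, 90, 92, 95, 95, 100]
score_modes = multimode(scores)  # list of every most-frequent value

print(modal_color, modal_count)  # Blue 15
print(score_modes)               # [85]
```

`multimode` returns a list rather than a single value, so a dataset with two equally frequent values would yield both of them.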

Real-world examples:

Fashion. A clothing store owner might notice that a particular style of jeans is selling more
than any other. The mode would be the most popular style.

Weather. A meteorologist might observe that the most common daily high temperature in a
particular city during a specific month is 25 degrees Celsius. This would be the modal
temperature.

Product Sales. A supermarket manager might identify the best-selling brand of cereal by
determining the brand that appears most frequently in sales records.

Quality Control. A manufacturer might inspect a batch of products and find that a certain
defect occurs most often. This would be the modal defect.

Characteristics:

• Multiple Modes. A dataset can have more than one mode. A dataset with two modes is
called bimodal, and one with more than two modes is called multimodal. For instance, if
"red" and "blue" were tied as the most popular colors in the survey example, the dataset
would be bimodal.
• No Mode. A dataset might not have a mode if all values occur with the same
frequency.
• Mode for Categorical Data. Mode is often used for categorical data, as it helps
identify the most common category.
• Mode for Numerical Data. While it can be used for numerical data, it's less common
than the mean or median, especially for large datasets.

When to Use Mode

• Identifying the Most Common Value - When you want to know the most frequent
occurrence.
• Categorical Data Analysis - When dealing with categorical data, mode is a useful
measure of central tendency.
• Non-Normal Distributions - In cases where the data is not normally distributed, the
mode can provide insights that might be missed by the mean or median.

B. Median

The median is the middle entry or term in a set of data arranged in either increasing or
decreasing order. The median is a positional measure: it depends on the order of the values
rather than on their magnitudes. It is affected by the number of measures and not by
the size of the extreme values. This measure is appropriate to use when the data are measured
on at least an ordinal scale, since ranking of the data is involved.

To find the median of a given set of data, take note of the following:
1. Arrange the data in either increasing or decreasing order.
2. Locate the middle value. If the number of cases is odd, the middle value is the
median. If the number of cases is even, take the arithmetic mean of the two
middle measures.

Example 1

The number of books borrowed in the library from Monday to Friday last week were 58, 60,
54, 35, and 97 respectively. Find the median.

Solution: Arrange the number of books borrowed in increasing order.

35, 54, 58, 60, 97

The median is 58.

Example 2

Cora’s quizzes for the second quarter are 8, 7, 6, 10, 9, 5, 9, 6, 10, and 7. Find the median.

Solution: Arrange the scores in increasing order.

5, 6, 6, 7, 7, 8, 9, 9, 10, 10

Since the number of measures is even (n = 10), the median is the average of the two
middle scores. These are the 5th and 6th scores, 7 and 8, so the median is (7 + 8) / 2 = 7.5.
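
The two-step procedure for the median can be sketched as a small Python function. The function name `median` and the variable names are illustrative; the data are the library and quiz values from Examples 1 and 2.

```python
def median(values):
    """Return the middle value of the data, or the mean of the two middle values."""
    ordered = sorted(values)          # Step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of cases: single middle value
        return ordered[mid]
    # even number of cases: average the two middle measures
    return (ordered[mid - 1] + ordered[mid]) / 2


books = [58, 60, 54, 35, 97]                 # Example 1
quizzes = [8, 7, 6, 10, 9, 5, 9, 6, 10, 7]   # Example 2

print(median(books))    # 58
print(median(quizzes))  # 7.5
```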

Characteristics of Median

• Less Affected by Outliers. Unlike the mean, the median is not significantly affected by
extremely large or small values.
• Quick and Easy to Calculate. It's relatively simple to find the median, especially for
smaller datasets.
• Represents the Middle Value. It provides a good measure of central tendency,
indicating the value that separates the lower half of the data from the upper half.

Real-world examples

Real Estate. When analyzing housing market trends, real estate agents often use the median
home price. This is because it's less affected by outliers like extremely expensive or
inexpensive homes.

Income. Economists and policymakers often use the median income to gauge the overall
economic health of a population. This is because it provides a better picture of typical income
levels, as it's less influenced by very high or very low incomes.

Demographics. Demographers use the median age to understand the age distribution of a
population. This can help in planning for future needs like healthcare, education, and social
services.

C. Mean

The mean (also known as the arithmetic mean) is the most commonly used measure
of central position. It is the sum of the measures divided by the number of measures in a
variable. It is symbolized as $\bar{x}$ (read as "x bar"). The mean is appropriate to use when
the data are measured on at least an interval scale.

To find the mean of ungrouped data, use the formula:

$$\bar{x} = \frac{\sum x}{n}$$

where $\sum x$ is the sum of all the measures and $n$ is the number of measures.

Example 1

The grades in Chemistry of 10 students are 87, 84, 85, 85, 86, 90, 79, 82, 78, 76. What is
the average grade of the 10 students?

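Solution: The sum of the ten grades is 832, so the mean is 832 / 10 = 83.2. The average
grade of the 10 students is 83.2.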

Example 2. Find the Average Salary of a Company

Suppose a company in the Philippines has 10 employees with the following annual salaries
in Philippine Pesos (PHP):

Php 150,000
Php 175,000
Php 200,000
Php 225,000
Php 250,000
Php 250,000
Php 275,000
Php 300,000
Php 350,000
Php 1,000,000

Solution: To find the average salary

1. Add up all the salaries: Php 150,000 + Php 175,000 + Php 200,000 + Php 225,000 +
Php 250,000 + Php 250,000 + Php 275,000 + Php 300,000 + Php 350,000 + Php
1,000,000 = Php 3,175,000
2. Divide the total salary by the number of employees: Php 3,175,000 / 10 = Php 317,500

Therefore, the average salary of the company is Php 317,500.
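
As a check on the arithmetic, the mean of both examples can be computed with a short Python sketch. The helper name `mean` is illustrative. The median of the salary list is printed alongside for comparison, because the single Php 1,000,000 salary pulls the mean well above what most employees earn.

```python
from statistics import median


def mean(values):
    """Arithmetic mean: sum of the measures divided by their count."""
    return sum(values) / len(values)


# Example 1: Chemistry grades of 10 students.
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]

# Example 2: annual salaries in Philippine pesos.
salaries = [150_000, 175_000, 200_000, 225_000, 250_000,
            250_000, 275_000, 300_000, 350_000, 1_000_000]

print(sum(grades), mean(grades))      # total and mean grade
print(sum(salaries), mean(salaries))  # total and mean salary
print(median(salaries))               # middle salary, for comparison
```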

Characteristics of Mean:

• Sensitivity to Outliers. The mean is sensitive to outliers. This means that extreme
values can significantly influence the mean, potentially skewing it. For example, if a few
very high salaries are included in a dataset of employee salaries, the mean salary will be
higher than the typical salary.

• Uses All Data Points. The mean takes into account every data point in the dataset. This
makes it a comprehensive measure of central tendency.

• Unique Value. A dataset can only have one mean.

Real-world examples

Academic Performance. Teachers often calculate the average score on a test to assess
class performance and identify areas where students may need additional support.
Additionally, a student's Grade Point Average (GPA) is calculated by averaging their grades
in different courses.

Finance. Investors track the average price of a stock over a specific period to assess its
performance. Moreover, investors calculate the average return on investment (ROI) to
evaluate their portfolio's performance.

Business. Businesses use average sales figures to track performance and set sales targets.

Weather. Meteorologists use the average temperature to predict weather patterns and climate
trends.

Weighted Mean

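A weighted mean is a type of average that assigns different weights to different data
points. This is useful when some data points are more important or reliable than others. The
formula for weighted mean is:

$$\bar{x}_w = \frac{\sum w x}{\sum w}$$

where $x$ is a data point and $w$ is the weight assigned to it.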

Example 1.

Below are Maria’s subjects and the corresponding number of units and grades she got for
the first grading period. Compute her grade point average.

Therefore, Maria has a GPA of 81.86 for the first grading period.
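
A weighted mean of this kind can be sketched in Python as follows. The units and grades below are hypothetical values chosen only to illustrate the computation; they are not Maria's actual subjects, and the function name `weighted_mean` is likewise an assumption.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical grading period: three subjects with different unit loads.
grades = [85, 80, 90]
units = [3, 3, 2]

gpa = weighted_mean(grades, units)
print(gpa)  # (3*85 + 3*80 + 2*90) / 8
```

Subjects with more units count more heavily, so the result differs from the simple average of the three grades.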

Measures of Dispersion
The measures that describe the degree of spread of the data are called “measures of
dispersion”, “measures of variability”, or “measures of spread”. They are used to
determine how scattered the values are in the distribution. In this topic, we will consider four
measures of dispersion, namely: range, average deviation, variance, and standard deviation.

Range for Ungrouped Data

The range is the simplest measure of variability. It is the difference between the largest and
smallest measurements. To determine the range of ungrouped data, the formula is:

$$R = \text{highest value} - \text{lowest value}$$

Example 1

Consider the four data sets presented below. Find the range of each data set.

Comparing the data sets, Data Set 1 has the least variation because it has the smallest
value of R. On the other hand, Data Set 3 has the most variation because it has the largest
value of R.
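
The range computation can be sketched in Python. Because the four data sets of this example are not reproduced in the text, the sketch reuses the library data from the median example (books borrowed from Monday to Friday); the function name `data_range` is illustrative.

```python
def data_range(values):
    """Range R: largest measurement minus smallest measurement."""
    return max(values) - min(values)


# Books borrowed from Monday to Friday (from the median example).
books = [58, 60, 54, 35, 97]

print(data_range(books))  # 97 - 35 = 62
```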

Average Deviation for Ungrouped Data

The Average Deviation (AD) is a measure of absolute dispersion that is affected by
every individual score. It is the mean of the absolute deviations of the individual scores from
the mean of all the scores.

A large average deviation would mean that a set of scores is widely dispersed about
the mean, while a small average deviation would imply that the set of scores is closer to the
mean.

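The formula of average deviation for ungrouped data is:

$$AD = \frac{\sum |x - \bar{x}|}{n - 1}$$

where $x$ is an individual score, $\bar{x}$ is the mean, and $n$ is the total number of scores.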

To be able to apply the formula, the following steps can be observed:

1. Compute the mean from the given scores.
2. Subtract the mean from the individual scores to get the deviation. That is, $x - \bar{x}$.
3. Get the absolute value of each deviation.
4. Get the sum of the absolute deviation and divide it by (n-1), where n is the total
number of scores. The quotient is the average deviation.

Example 1

The raw scores of eight students in Statistics are given as follows: 17, 17, 26, 28, 30,
30, 31, and 37. Compute the average deviation.

Example 2.

The scores of nine students in Psychology are given as follows: 15, 19, 20, 24, 28,
30, 32, 32, and 40. Calculate the average deviation.

The computed average deviation (A.D.) of the scores in Statistics is 6, while that of the test
scores in Psychology is 7.17. This means that the scores in Statistics are less dispersed, or
more closely distributed around the mean (homogeneous), while the scores in Psychology are
more dispersed away from the mean (heterogeneous).
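
The four steps and both results can be reproduced with a short Python sketch. It follows the chapter's step 4 and divides by n − 1, which is what yields 6 and 7.17; note that many textbooks divide the sum of absolute deviations by n instead. The function name `average_deviation` is illustrative.

```python
def average_deviation(scores):
    """Average deviation using this chapter's divisor of n - 1."""
    n = len(scores)
    mean = sum(scores) / n                               # Step 1: mean
    abs_deviations = [abs(x - mean) for x in scores]     # Steps 2 and 3
    return sum(abs_deviations) / (n - 1)                 # Step 4


statistics_scores = [17, 17, 26, 28, 30, 30, 31, 37]      # Example 1, mean 27
psychology_scores = [15, 19, 20, 24, 28, 30, 32, 32, 40]  # Example 2

print(round(average_deviation(statistics_scores), 2))  # 6.0
print(round(average_deviation(psychology_scores), 2))  # 7.17
```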

Variance for Ungrouped Data

Another way to avoid a sum of zero for the deviation scores is to square each deviation
score and get the average of all squared deviation scores. The resulting measure is called
“variance”, which is expressed in squared units. In symbol, $s^2$.

To compute the variance of ungrouped data, the following formula may be used:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n}$$

To be able to apply the formula, the following steps can be observed:

1. Arrange the values in a column from lowest to highest.
2. Compute the mean of the distribution.
3. Determine the deviation ($d = x - \bar{x}$).
4. Square the deviations.
5. Get the sum of the squared deviations.
6. Divide the sum by the total number of cases. The quotient is the variance.

Example 1. Consider the data set below. Compute the variance of each data set.
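
The six steps for the variance can be sketched in Python. Since the data sets of this example are not reproduced in the text, the sketch reuses the eight Statistics scores from the average deviation example. It divides by the total number of cases, as step 6 states; the standard library's `statistics.variance` divides by n − 1 instead, so `statistics.pvariance` is the matching library call. The function name `variance` is illustrative.

```python
from statistics import pvariance


def variance(values):
    """Variance with divisor n, following the six steps in the text."""
    ordered = sorted(values)                        # Step 1: lowest to highest
    n = len(ordered)
    mean = sum(ordered) / n                         # Step 2: mean
    deviations = [x - mean for x in ordered]        # Step 3: d = x - mean
    squared = [d ** 2 for d in deviations]          # Step 4: square each deviation
    return sum(squared) / n                         # Steps 5 and 6


# Statistics scores reused from the average deviation example (mean 27).
scores = [17, 17, 26, 28, 30, 30, 31, 37]

print(variance(scores))   # squared deviations sum to 336, and 336 / 8 = 42.0
print(pvariance(scores))  # library equivalent with the same divisor
```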

Standard Deviation for Ungrouped Data

Recall that, in the computation of the variance, the deviation was squared. This implies
that the variance is expressed in squared units. Extracting the square root of the value of the
variance will give the value of the standard deviation. In symbol, $s$.

To take the standard deviation of ungrouped data, extract the square root of the
variance. In mathematical formula,

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Example 1. Consider the data set below. Compute the standard deviation of each data set.

On the basis of the obtained standard deviation, we say that the scores in Data Set 1
deviate from the mean by 2.06 units, on the average. For Data Set 2, the scores deviate from
the mean by an average of 2.56 units.
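
The standard deviation can be sketched in Python as the square root of the variance. The data sets behind the 2.06 and 2.56 figures are not reproduced in the text, so the sketch reuses the eight Statistics scores from the average deviation example and divides by n, as the variance steps do. The function name `standard_deviation` is illustrative.

```python
import math


def standard_deviation(values):
    """Square root of the variance, with the variance computed using divisor n."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


# Statistics scores reused from the average deviation example (variance 42).
scores = [17, 17, 26, 28, 30, 30, 31, 37]

print(round(standard_deviation(scores), 2))  # 6.48
```

Unlike the variance, the result is in the same unit as the original scores, which is why it can be read as a typical distance from the mean.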
