
Reviewer: Ch 1 Data and Data Preparation

1. This document discusses data types, variables, and data preparation. It defines key terms like data, variables, and different data types.
2. It describes different types of data like cross-sectional, structured, unstructured, and big data. It notes the challenges of managing and analyzing big data.
3. The document outlines steps for data preparation including counting, sorting, handling missing values through omission or imputation, and subsetting data. It also discusses descriptive and inferential statistics and their purposes in data analysis.

CHAPTER 1
DATA AND DATA PREPARATION

Learning Objectives:
1. Various “Data” types
2. Variables and Types of Measurement Scales
3. Data subsetting

Key Words
Data - compilations of facts, figures, or other contents, both numerical and
non-numerical.
★ Purpose: decision making
★ Types:
   ○ Cross-sectional - data collected by recording a characteristic of many
     subjects at the same point in time, or without regard to differences in
     time.
   ○ Structured - numerical information in a pre-defined, row-column format
     that is objective and not open to interpretation, e.g. a “spreadsheet”
     or Data Matrix¹
     ¹ (Laurentina Paler-Calmorin, PhD, Research & Thesis Writing with
       Computer Application, Revised Edition 2016, p.151) A data matrix is
       the presentation of data in tabular or table form.
   ○ Unstructured - data that is not arranged in a pre-defined row-column
     model (any structure is only implied), e.g. content from Twitter,
     YouTube, Facebook, and blogs.
   ○ Big Data (BD) - a massive volume of structured and unstructured data.
     ■ Predicament of having BD
        1. Extremely difficult to manage, process, and analyze using
           traditional data processing tools.
        2. Inconvenient and computationally burdensome.
        3. Benefits may not justify costs.
     ■ BD’s characteristics:
        1. Volume: an immense amount of data compiled from a single source
           or multiple sources
        2. Velocity: data is generated at a rapid speed, so management is a
           critical issue
        3. Variety: all types, forms, and granularity, structured or
           unstructured
        4. Veracity: credibility, quality, and reliability of the data
        5. Value: a methodological plan for formulating questions, curating
           the right data, and unlocking hidden potential

Statistics (as a science)
➔ Language of Data
➔ Process²
   ² (Ibid., p.143) Data Processing: 1. Categorization; 2. Coding;
     3. Tabulation of Data
   a. Data collection
   b. Data Preparation
      ● Counting and sorting
         1. Among the very first tasks analysts perform.
         2. Help analysts gain a better understanding of, and insights into,
            the data.
         3. Help verify that the data set is complete, or determine whether
            there are missing values.
         4. Sorting allows us to review the range of values for each
            variable.
         5. Data can be sorted based on a single variable or multiple
            variables.
      ● Handling missing values
         1. Omission strategy - observations with missing values are
            excluded from subsequent analysis.
         2. Imputation strategy - the missing values are replaced with some
            reasonable imputed values.
            ◆ Numeric variables: replace with the average.
            ◆ Categorical variables: replace with the predominant category.
      ● Subsetting - the process of extracting a portion of the data set
        that is relevant for subsequent statistical analysis. Reasons to
        subset:
         1. To compare two subsets of the data, when that is the objective
            of the analysis.
         2. To eliminate observations that contain missing values,
            low-quality data, or outliers.
         3. To exclude variables that contain redundant information, or
            variables with excessive amounts of missing values.
         4. To select data based on data ranges.
   c. Data Analysis
   d. Data Interpretation
   e. Data Presentation
➔ Procedure
   ➔ First: find the right data and prepare it for the analysis.
   ➔ Second: use the appropriate statistical tool, which depends on the
     data.
   ➔ Third: clearly communicate information with actionable business
     insights.

READ THE ADD ONS³
   ³ (Ibid.) Stages of Data Processing
      1. Input - “responses” of the subject sample population
      2. Throughput - statistical procedures and techniques
      3. Output - presentation of the results/findings in matrix form

Branches of Statistics
1. Descriptive - summary of important aspects of a data set.
2. Inferential - drawing conclusions about a larger set of data (population)
   based on a smaller set of data (sample).

Sample - a subset of the population.
★ Purpose: to make inferences⁴ about the various characteristics of the
  population.
   ⁴ inference - a conclusion reached on the basis of evidence and reasoning

Assignment: Give 5 examples of Philippine government agencies or offices that
can serve as sources of data, and state what kind of data each can provide.
Include each agency's functions and powers as well.

Variable⁵ - a characteristic of interest that differs in kind or degree among
various observations (records).
   ⁵ (Ibid., p.17) variable - information that is susceptible or liable to
     fluctuation or change in value, level, degree, scale, or magnitude under
     different conditions, and which is represented by numerical values,
     groups, classes, kinds, or categories.
1. Categorical Variables - typically expressed in words but coded into
   numbers for purposes of data processing. We typically count the number of
   observations that fall into
   each category (or find percentages). Meaningful arithmetic operations
   cannot be performed on them.
   a. Also called qualitative⁶
      ⁶ (Ibid., p.151) qualitative - verbal interpretation
   b. Represent categories
   c. Labels or names identify distinguishing characteristics
   d. Can be defined by two or more categories
   e. Coded into numbers for data processing
   f. e.g. marital status, grade in a course, etc.
   g. Scales of Measurement
      i. Nominal Scale
      ii. Ordinal Scale

2. Numeric Variables
   a. Also called quantitative⁷
      ⁷ (Ibid., p.151) quantitative - refers to a figure or number
   b. Represent meaningful numbers
   c. Scales of Measurement
      i. Interval Scale
      ii. Ratio Scale
   d. Classification
      i. Discrete - assumes a countable number of values, which need not be
         whole numbers, e.g. number of children in a family
      ii. Continuous - assumes an uncountable number of values within an
          interval. In practice, it is often measured in discrete values,
          e.g. weight of a newborn baby (as it grows and ages).

Scales of Measurement
1. Nominal
   a. Least sophisticated
   b. Represents categories or groups
   c. Values differ by label or name
   d. e.g. marital status, brands or products. See: Demographics.
2. Ordinal
   a. Stronger level of measurement than nominal
   b. Categorizes and ranks data with respect to some characteristic
   c. The difference between the ranked values cannot be interpreted; the
      numbers are arbitrary
   d. e.g. reviews from 1 star (poor) to 5 stars (outstanding). See: Likert
      Scale.
3. Interval - has no true (absolute) zero and can represent values below
   zero (-1 and so on).
   a. Categorizes and ranks; differences are meaningful
   b. Zero value is arbitrary and does not reflect absence of the
      characteristic
   c. Ratios are not meaningful
   d. e.g. temperature in degrees Celsius, which can read 0 degrees or fall
      below it, such as -10 degrees; time
4. Ratio - has a true zero point. Height and weight are measured from 0
   upward and never fall below it.
   a. Strongest level of measurement
   b. A true zero point reflects absence of the characteristic
   c. Ratios are meaningful
   d. e.g. profits

READ THE ADD ONS: Interval scale vs. ratio scale: what is the difference?
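
The data preparation steps in this reviewer (counting, sorting, handling missing values by omission or imputation, and subsetting) can be illustrated with a minimal Python sketch. It is not taken from the source: the records, the variable names, and the helper functions `count_missing`, `omit_missing`, and `impute_missing` are all hypothetical, and `None` is assumed to mark a missing value.

```python
from collections import Counter
from statistics import mean

# Hypothetical data set; None marks a missing value (an assumption of this sketch).
records = [
    {"name": "Ana", "age": 25, "status": "single"},
    {"name": "Ben", "age": None, "status": "married"},
    {"name": "Cara", "age": 31, "status": None},
    {"name": "Dino", "age": 40, "status": "married"},
]


def count_missing(rows):
    """Counting: number of missing values per variable."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] += 1
    return dict(counts)


def omit_missing(rows):
    """Omission strategy: exclude observations that have any missing value."""
    return [row for row in rows if None not in row.values()]


def impute_missing(rows, numeric, categorical):
    """Imputation strategy: average for numeric variables,
    predominant category for categorical variables."""
    fills = {}
    for key in numeric:
        fills[key] = mean(r[key] for r in rows if r[key] is not None)
    for key in categorical:
        observed = [r[key] for r in rows if r[key] is not None]
        fills[key] = Counter(observed).most_common(1)[0][0]
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in rows
    ]


missing = count_missing(records)
complete = omit_missing(records)
imputed = impute_missing(records, numeric=["age"], categorical=["status"])

# Sorting on multiple variables: by status first, then by age.
by_status_age = sorted(imputed, key=lambda r: (r["status"], r["age"]))

# Subsetting based on a data range: keep observations aged 30 and above.
thirties_up = [r for r in imputed if r["age"] >= 30]

print(missing)
print([r["name"] for r in by_status_age])
print([r["name"] for r in thirties_up])
```

Omission keeps only the two complete records, while imputation keeps all four at the cost of replacing the missing age with the average and the missing status with the most frequent category. Which strategy is preferable depends on how much data is missing.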

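The four scales of measurement differ in which operations are meaningful. The following Python sketch is an illustration under stated assumptions, not part of the source: the sample values are hypothetical, and the Fahrenheit and pound conversions are brought in only to show what happens when the unit changes.

```python
from collections import Counter
from statistics import median

# Nominal: values differ only by label, so we count observations per category.
marital_status = ["single", "married", "married", "widowed"]  # hypothetical data
status_counts = Counter(marital_status)

# Ordinal: ranks are meaningful, so a middle rank (median) can be reported,
# but the gaps between star ratings are not interpretable.
star_reviews = [1, 3, 4, 5, 5]  # hypothetical data
median_stars = median(star_reviews)


def celsius_to_fahrenheit(celsius):
    """Standard conversion; Fahrenheit is used here only to change the unit."""
    return celsius * 9 / 5 + 32


# Interval: zero is arbitrary, so a ratio of two readings changes with the unit...
ratio_celsius = 20.0 / 10.0
ratio_fahrenheit = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)

# ...while a comparison of differences does not.
diff_ratio_c = (30.0 - 20.0) / (20.0 - 10.0)
diff_ratio_f = (celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)) / (
    celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
)

# Ratio: a true zero means ratios survive a change of unit.
KG_TO_LB = 2.20462  # approximate conversion factor (assumption of this sketch)
ratio_kg = 4.0 / 2.0
ratio_lb = (4.0 * KG_TO_LB) / (2.0 * KG_TO_LB)

print(status_counts.most_common(1), median_stars)
print(ratio_celsius, round(ratio_fahrenheit, 2))
print(diff_ratio_c, diff_ratio_f)
print(ratio_kg, ratio_lb)
```

The ratio of 20 to 10 degrees is 2.0 in Celsius but about 1.36 in Fahrenheit, so "twice as hot" is not a meaningful statement on an interval scale. The weight ratio stays at 2 in either unit, which is what a true zero point provides.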