Introduction To Statistics - c1
Definition of Statistics
• Statistics can be defined in two ways, covering two different concepts:
1. Statistics as numerical data
2. Statistics as a statistical method
• According to Prof. Horace Secrist, “It is the aggregate of facts affected to a marked extent by
the multiplicity of causes, numerically expressed, enumerated or estimated according to
a reasonable standard of accuracy, collected in a systematic manner for a
predetermined purpose and placed in relation to each other”.
Secrist’s definition of statistics is more complete. The vital points that
the definition covers are:
• Aggregate of facts
• Affected by the multiplicity of causes
• Numerically expressed
• Enumerated or estimated according to a reasonable standard of accuracy
• Systematic collection of data
• Data collected for a predetermined purpose
• Comparable
2. Statistics as Statistical Method
• Quantitative Data: Numerical data that can be measured and analyzed statistically.
Discrete Data: Countable values (e.g., number of students).
Continuous Data: Measurable values that can take on any number within a range
(e.g., height, weight).
• Qualitative Data: Descriptive data that can be observed but not measured
numerically.
Nominal Data: Categories without a specific order (e.g., colors, types of animals).
Ordinal Data: Categories with a meaningful order (e.g., rankings, satisfaction
levels).
Statistical Investigation
• Investigation carried out by any agency wherein relevant data are collected
and they are analysed with the help of different statistical methods.
• Quantitative Variables
Quantitative variables, also known as numerical variables, represent quantities and are
measured on a numerical scale. They can be further divided into two types:
Discrete Variables:
These variables take countable, distinct values (usually whole numbers), with no values in between.
Examples include:
• Number of students in a class
• Number of cars in a parking lot
Continuous Variables:
These variables are measured rather than counted and can take any of the infinitely many values
within a given range.
Examples include:
• Weight of individuals
• Temperature
• Qualitative Variables
Qualitative variables, also known as categorical variables, represent qualities or categories.
Nominal Variables:
These variables represent categories without a specific order (e.g., colors, types of animals).
Ordinal Variables:
These variables represent categories with a logical order or ranking, but the differences between the categories are
not measurable.
Examples include:
• Satisfaction rating (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied)
• Socioeconomic status (Low, Middle, High)
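A minimal Python sketch of what "ordered, but with no measurable differences" means in practice. It assumes the satisfaction levels above are stored as text and ranked by their position in a list; the names `SATISFACTION_ORDER` and `rank` and the sample responses are illustrative only.

```python
# Ordinal categories from the satisfaction example, listed from lowest to highest.
SATISFACTION_ORDER = [
    "Very Dissatisfied",
    "Dissatisfied",
    "Neutral",
    "Satisfied",
    "Very Satisfied",
]


def rank(level):
    """Return the position of a satisfaction level in the ordering (0 = lowest)."""
    return SATISFACTION_ORDER.index(level)


# Made-up responses, for illustration only.
responses = ["Satisfied", "Neutral", "Very Satisfied", "Dissatisfied", "Satisfied"]

# Order is meaningful for ordinal data, so sorting and taking the middle value make sense.
ordered = sorted(responses, key=rank)
median_response = ordered[len(ordered) // 2]

print(ordered)
print(median_response)
```

The rank numbers only record position. Subtracting them (for example, "Satisfied" minus "Neutral") would not give a measurable quantity, which is why a mean of such ranks is usually avoided.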
Examples
• Classify each of the following characteristics as a nominal, ordinal, discrete or
continuous variable.
1. Family size
2. Family Income
3. Number of books on a shelf
4. Height of students
5. Hours spent studying per week
6. Satisfaction with current study method
7. Preferred study method
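One possible set of answers, written as a small Python lookup. The classifications follow the usual textbook conventions and are an assumption here, not part of the original notes (for example, family income is treated as continuous even though it is recorded in whole currency units). The name `classification` is illustrative.

```python
# One reasonable classification of the seven items listed above (an assumption,
# based on the definitions of nominal, ordinal, discrete and continuous variables).
classification = {
    "Family size": "discrete",
    "Family income": "continuous",
    "Number of books on a shelf": "discrete",
    "Height of students": "continuous",
    "Hours spent studying per week": "continuous",
    "Satisfaction with current study method": "ordinal",
    "Preferred study method": "nominal",
}

for item, kind in classification.items():
    print(f"{item}: {kind}")
```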
Levels/Scales of Measurement
• Data can also be classified based on the level of measurement, which indicates
the amount of information contained within the data.
• They are classified into four types: nominal, ordinal, interval, and ratio.
• Each scale has specific properties that determine the types of statistical
analyses that can be performed.
Nominal Level:
➢ Data are categorized without any order or ranking (e.g., colors, types of animals).
Ordinal Level:
➢ Data are categorized with a logical order, but the differences between the categories
are not measurable.
Interval Level:
➢ Data are ordered with meaningful differences between values, but there is no true zero point.
➢ In the interval scale, difference between any two values is meaningful and consistent throughout the
scale.
➢ The zero point on an interval scale is arbitrary and does not indicate the absence of the quantity being
measured. It is a point of reference, not a true zero.
Examples:
➢ Temperature in Celsius or Fahrenheit: The difference between 20°C and 30°C is the same as the
difference between 30°C and 40°C. However, 0°C does not mean the absence of temperature.
➢ Calendar Years: The difference between the years 2000 and 2010 is the same as between 2010 and
2020, but year 0 does not represent the absence of time.
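A short Python check of the temperature example, as a sketch. It uses the standard Celsius-to-Fahrenheit conversion; the function and variable names are illustrative. Equal differences stay equal after a change of units, but the ratio "40 is twice 20" does not survive, because the zero point is arbitrary.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


a_c, b_c, c_c = 20.0, 30.0, 40.0
a_f, b_f, c_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, c_c))

# Differences are meaningful: the two 10-degree Celsius gaps are equal,
# and they remain equal (18 degrees each) in Fahrenheit.
equal_gaps_c = (b_c - a_c) == (c_c - b_c)
equal_gaps_f = (b_f - a_f) == (c_f - b_f)

# Ratios are not meaningful: the value depends on the unit chosen.
ratio_c = c_c / a_c  # 40 / 20
ratio_f = c_f / a_f  # 104 / 68

print(equal_gaps_c, equal_gaps_f)
print(ratio_c, round(ratio_f, 3))
```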
Ratio Level:
✓ Data are ordered with meaningful differences and a true zero point, allowing for the
calculation of ratios.
✓ Like interval scales, the difference between values is meaningful and consistent.
✓ The zero point on a ratio scale is meaningful and indicates the absence of the quantity
being measured.
Examples:
✓ Height and Weight: Height of 0 cm means no height, and weight of 0 kg means no
weight. Ratios like twice as tall or half as heavy are meaningful.
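To tie the four levels together, the sketch below pairs each level with a summary that is conventionally considered appropriate for it: mode for nominal, median for ordinal, mean for interval, and a ratio for ratio-level data. The pairing is the usual textbook convention rather than something stated in these notes, and all values are toy data invented for illustration.

```python
import statistics

# Toy values, invented for illustration only.
colors = ["red", "blue", "red", "green"]  # nominal: categories, no order
ratings = [1, 3, 2, 3, 5]                 # ordinal: ranks, 1 = lowest, 5 = highest
temps_c = [20.0, 30.0, 40.0]              # interval: equal gaps, arbitrary zero
heights_cm = [150.0, 180.0]               # ratio: equal gaps and a true zero

mode_color = statistics.mode(colors)          # most frequent category
median_rating = statistics.median(ratings)    # middle rank after sorting
mean_temp = statistics.mean(temps_c)          # average is meaningful on an interval scale
height_ratio = heights_cm[1] / heights_cm[0]  # "1.2 times as tall" is meaningful

print(mode_color, median_rating, mean_temp, height_ratio)
```

Each level also supports the summaries of the levels before it, so a mode or median can still be reported for ratio-level data.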
Collection of Data
• In statistics and research, data fall into two main categories according to how they are collected:
• Primary data and
• Secondary data
• Understanding the distinction between these two types is essential for effectively designing
research studies, collecting data, and analyzing the results.
1. Primary Data
• Primary data is original data collected directly by the researcher for a specific research
purpose or project. This data is gathered first-hand and is tailored to meet the specific
needs of the study.
Methods of Collection:
• Surveys and Questionnaires: Structured forms used to collect information from
respondents.
Examples: Customer satisfaction surveys and employee feedback surveys.
• Interviews: Direct, face-to-face, or virtual conversations to gather detailed information.
Examples: In-depth interviews, structured interviews, unstructured interviews.
• Observations: Watching and recording behaviours or events as they occur naturally.
Examples: Observing customer behaviour in a store and monitoring employee
performance.
• Experiments: Controlled studies where variables are manipulated to observe their effects.
Examples: Clinical trials and lab experiments.
• Focus Groups: Small group discussions guided by a moderator to explore specific topics.
Examples: Product development feedback and marketing strategy discussions.
Advantages:
• Specific and Relevant: Directly addresses the research question.
• Control Over Data Quality: The researcher can ensure data accuracy and
reliability.
• Up-to-Date Information: Reflects the current situation or behaviours.
Disadvantages:
• Time-Consuming: Data collection can be lengthy.
• Costly: Requires resources for data collection and analysis.
• Limited Scope: May only cover a specific population or area.
2. Secondary Data
• Secondary data is data that has already been collected, processed, and made
available by other sources.
• This data is typically used for purposes different from those for which it was
originally collected.
Sources of Secondary Data:
• Government Publications: Census data, economic reports, health statistics.
• Academic Journals and Books: Research studies, review articles, theoretical
papers.
• Business and Industry Reports: Market analysis, industry trends, and
financial reports.
• Online Databases: Databases like JSTOR, PubMed, and company databases.
• Internal Company Records: Sales records, customer databases, employee
records.
Advantages:
• Cost-Effective: Generally less expensive than collecting primary data.
• Time-Saving: Data is readily available, reducing the need for
time-consuming data collection.
• Broad Scope: Can cover large populations and long time periods.
Disadvantages:
• Less Specific: May not perfectly fit the current research needs.
• Quality Concerns: The researcher has no control over how the data was
collected.
• Potentially Outdated: Data might not reflect the most current conditions or
behaviours.