UNIT 5: Data Literacy and Levels of Measurement (Questions and Answers)
Levels of Measurement | Nominal, Ordinal, Interval and Ratio
Levels of measurement, also called scales of measurement, tell you how
precisely variables are recorded. In scientific research, a variable is anything that can take
on different values across your data set (e.g., height or test scores).
The level of measurement of a variable determines which analyses you can perform on your data. The levels form a hierarchy of complexity and precision, from low (nominal) to high (ratio).
Going from lowest to highest, the 4 levels of measurement are cumulative. This means
that they each take on the properties of lower levels and add new properties.
Nominal: You can categorize your data by labelling them in mutually exclusive groups, but there is no order between the categories.
Examples: city of birth, gender, ethnicity, car brands, marital status.

Ordinal: You can categorize and rank your data in an order, but you cannot say anything about the intervals between the rankings. Although you can rank the top 5 Olympic medallists, this scale does not tell you how close or far apart they are in number of wins.
Examples: top 5 Olympic medallists; language ability (e.g., beginner, intermediate, fluent); Likert-type questions (a Likert scale is a rating scale used to measure opinions, attitudes, or behaviors, e.g., very dissatisfied to very satisfied).

Interval: You can categorize, rank, and infer equal intervals between neighboring data points, but there is no true zero point.
Examples: test scores (e.g., IQ or exams).

Ratio: You can categorize, rank, and infer equal intervals between neighboring data points, and there is a true zero point. A true zero means there is an absence of the variable of interest; in ratio scales, zero really does mean an absolute lack of the variable.
Examples: height, age, weight, temperature in Kelvin.
MCQs:
1. What does data literacy mean?
A) The ability to read and write data
B) The ability to collect and store data securely
C) The ability to find and use data effectively
D) The ability to analyze data using AI
Answer: C
2. Which of the following is not a type of data?
A) Structured
B) Unstructured
C) Interpreted
D) Semi-structured
Answer: C
3. What is the main purpose of data collection?
A) To capture a record of past events
B) To delete unneeded information
C) To create false trends
D) To change information
Answer: A
4. Which of the following is a primary source of data collection?
A) Social media data tracking
B) Survey
C) Satellite data tracking
D) Web scraping
Answer: B
5. What does ordinal data represent?
A) Data with no order or rank
B) Categorical data with no difference between the data points
C) Data that can be ranked but not measured
D) Data with equal intervals and no true zero
Answer: C
6. Which level of data allows for meaningful ratios and has a true zero?
A) Nominal
B) Ordinal
C) Interval
D) Ratio
Answer: D
7. What does the “mean” represent in a data set?
A) The middle value
B) The most frequent value
C) The average of all values
D) The range of values
Answer: C
8. Which of the following is a common method of handling missing data?
A) Ignoring it
B) Deleting rows or columns with missing values
C) Converting all missing values to zero
D) Duplicating the data
Answer: B
9. What is the role of variance in a data set?
A) It measures the central value of the data
B) It shows the highest and lowest values
C) It measures the spread of the data points from the mean
D) It counts the number of data points
Answer: C
10.What is data preprocessing?
A) Cleaning and transforming data to prepare it for analysis
B) Storing data in multiple formats
C) Analyzing data using advanced AI techniques
D) Eliminating duplicates in data
Answer: A
11.Which graph is best for displaying trends over time?
A) Pie chart
B) Bar graph
C) Line graph
D) Scatter plot
Answer: C
12.What does a scatter plot represent?
A) The distribution of categorical data
B) The relationship between two variables
C) The proportion of parts to a whole
D) A summary of the central tendency
Answer: B
13.What is the primary function of Matplotlib in Python?
A) Cleaning data
B) Visualizing data through charts and graphs
C) Generating machine learning models
D) Storing data in a database
Answer: B
14.Which measure of central tendency is most affected by extreme values?
A) Median
B) Mode
C) Mean
D) Range
Answer: C
15.What is the purpose of feature selection in data preprocessing?
A) To create more features for better analysis
B) To reduce irrelevant data and improve model performance
C) To duplicate the data
D) To introduce missing values
Answer: B
16.Which of the following is a key method of data reduction?
A) Data normalization
B) Data cleaning
C) Dimensionality reduction
D) Feature transformation
Answer: C
17.In AI, which type of data source is Kaggle considered?
A) Primary source
B) Secondary source
C) Observational source
D) Experiment source
Answer: B
18.Which Python library is commonly used for statistical analysis?
A) NumPy
B) pandas
C) Matplotlib
D) statistics
Answer: D
19.What does data integration refer to?
A) Cleaning and transforming data
B) Merging data from multiple sources
C) Splitting data for machine learning models
D) Reducing the number of features in data
Answer: B
20.Why is diversity important in data collection for AI models?
A) It speeds up data processing
B) It helps the model cover more scenarios
C) It increases model accuracy in all situations
D) It reduces the volume of data needed
Answer: B
21.Which method helps identify relationships between variables in a data set?
A) Line graph
B) Histogram
C) Scatter plot
D) Bar graph
Answer: C
22.What is the difference between primary and secondary data?
A) Primary data is readily available, while secondary data must be collected
B) Primary data is new and collected for a specific purpose, while secondary data is
already existing
C) Secondary data is always structured, while primary data is not
D) Primary data is collected from social media, and secondary data from experiments
Answer: B
23.What is an outlier in data?
A) A data point that lies outside the expected range
B) A duplicate entry in the data set
C) The most frequent value in the data
D) A missing value
Answer: A
24.What is data normalization?
A) Changing data into structured format
B) Ensuring all features have a similar scale and distribution
C) Merging data from multiple sources
D) Removing inconsistencies in the data
Answer: B
25.What kind of graph would you use to display categorical data?
A) Pie chart
B) Line graph
C) Histogram
D) Scatter plot
Answer: A
26.What is the primary difference between nominal and ordinal data?
A) Nominal data can be ordered, while ordinal data cannot.
B) Ordinal data can be ordered, but nominal data cannot.
C) Both nominal and ordinal data can be ordered.
D) Nominal data represents numerical values, while ordinal data represents categories.
Answer: B
27.What does the median represent in a dataset?
A) The most frequent value
B) The highest value
C) The middle value when data is ordered
D) The difference between the highest and lowest values
Answer: C
28.Which of the following represents an example of interval data?
A) Temperature in Celsius
B) Grades in a class
C) Colors of cars
D) Number of students in a class
Answer: A
29.Which statement is true about a ratio scale?
A) It has no true zero
B) It allows for meaningful ratios between data points
C) It only applies to nominal data
D) It cannot be used for mathematical operations
Answer: B
30.What type of chart is best for showing parts of a whole?
A) Bar chart
B) Line graph
C) Pie chart
D) Scatter plot
Answer: C
31.What is one limitation of a histogram?
A) It can only display categorical data
B) It can only display one data distribution per axis
C) It cannot show frequencies of values
D) It cannot display continuous data
Answer: B
32.What does the standard deviation tell us about a dataset?
A) How spread out the data points are from the mean
B) The central value of the data
C) The most frequent value in the dataset
D) The relationship between two variables
Answer: A
33.In which situation would you use a bar graph?
A) To show how one variable changes over time
B) To compare different categories of data
C) To show the distribution of continuous data
D) To find the relationship between two numerical variables
Answer: B
34.What does “mean” represent in statistical analysis?
A) The highest number in a dataset
B) The difference between the highest and lowest numbers
C) The average of the dataset
D) The most frequent number in the dataset
Answer: C
35.Which type of data representation is best for visualizing the correlation between two
variables?
A) Line graph
B) Pie chart
C) Bar graph
D) Scatter plot
Answer: D
36.What is the purpose of a “train-test split” in data modeling?
A) To clean the data
B) To evaluate a model’s performance
C) To visualize the dataset
D) To increase the size of the dataset
Answer: B
37.Which of the following methods is used to handle outliers in data?
A) Ignoring them
B) Calculating the mode
C) Using robust statistical techniques
D) Replacing them with zero
Answer: C
38.Which technique ensures that the performance of a model is consistent across
different subsets of data?
A) Train-test split
B) Cross-validation
C) Mean calculation
D) Data augmentation
Answer: B
39.What is the goal of data preprocessing?
A) To make the dataset larger
B) To prepare data for analysis by cleaning, transforming, and reducing it
C) To train a machine learning model
D) To remove data that is not useful
Answer: B
40.Which of the following is a graphical representation of data distribution?
A) Bar graph
B) Histogram
C) Pie chart
D) Line graph
Answer: B
41.Which chart is best suited for comparing rainfall data over a year?
A) Pie chart
B) Line graph
C) Scatter plot
D) Histogram
Answer: B
42.Why is data diversity important in machine learning?
A) To reduce the complexity of models
B) To ensure the model generalizes to more scenarios
C) To simplify the data preprocessing process
D) To increase the model’s accuracy for a single scenario
Answer: B
43.What does the variance of a dataset represent?
A) The central value of the dataset
B) How far each data point is from the mean
C) The highest value in the dataset
D) The sum of all data points
Answer: B
44.Which Python library is commonly used to create visual data representations?
A) NumPy
B) pandas
C) Matplotlib
D) TensorFlow
Answer: C
45.What is a key characteristic of secondary data?
A) It is collected for a specific purpose
B) It requires interviews and surveys to gather
C) It is pre-existing data available for analysis
D) It is collected during experiments
Answer: C
46.Which chart is used to represent the distribution of heights in a class?
A) Pie chart
B) Scatter plot
C) Histogram
D) Line graph
Answer: C
47.In a bar chart, the length of each bar is proportional to:
A) The sum of all data points
B) The category it represents
C) The value it represents
D) The relationship between two variables
Answer: C
48.Which technique is used to convert categorical variables into numerical variables?
A) Data cleaning
B) Data transformation
C) Data reduction
D) Data normalization
Answer: B
49.What is the primary goal of data reduction?
A) To increase the size of the dataset
B) To reduce the number of features while retaining important information
C) To create more data points
D) To remove outliers from the dataset
Answer: B
50.Which type of data cannot be used for calculations and does not follow any order?
A) Nominal
B) Ordinal
C) Interval
D) Ratio
Answer: A
LONG-ANSWER QUESTIONS:
1. What is Data Literacy, and why is it important in the context of Artificial
Intelligence (AI)?
Answer: Data literacy refers to the ability to find, interpret, and use data effectively. In AI,
data literacy involves understanding how to collect, organize, analyze, and utilize data for
problem-solving and decision-making. AI relies heavily on data; thus, the ability to manage
and interpret large datasets is essential. Data literacy also includes skills like ensuring data
quality and using it ethically. It allows individuals to convert raw data into actionable
insights, a process crucial in fields such as AI where data-driven decision-making can lead
to innovation and efficiency.
2. What is data collection, and why is it significant in AI projects?
Answer: Data collection is the foundational step in AI projects, involving gathering data
from various sources—both online and offline—to train machine learning models. The
significance lies in the fact that the accuracy and diversity of the data collected directly
affect the quality of predictions made by AI models. Two main sources of data include
primary sources (e.g., surveys, interviews, experiments) and secondary sources (e.g.,
databases, social media, web scraping). Proper data collection ensures that the AI system
can generalize well to unseen scenarios, making the model robust and accurate.
3. Explain the four levels of measurement with suitable examples.
Answer:
Nominal Level: Data is categorized without any order. For example, car brands like BMW, Audi, and Mercedes are nominal.
Ordinal Level: Data is ordered but the difference between data points is not
meaningful. For example, restaurant ratings like “tasty” and “delicious.”
Interval Level: Data is ordered, and differences between points are meaningful, but
there is no true zero. An example is temperature in Celsius.
Ratio Level: Similar to interval data but with a true zero. Weight and height
measurements are examples.
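To make the distinction concrete, the short sketch below (an illustrative assumption, not part of the original notes) uses pandas to mark a language-ability variable as ordered (ordinal) and a car-brand variable as unordered (nominal); only the ordinal column can be meaningfully ranked.

import pandas as pd

# Ordinal: the categories have a meaningful order (beginner < intermediate < fluent)
ability = pd.Series(
    pd.Categorical(
        ["fluent", "beginner", "intermediate", "beginner"],
        categories=["beginner", "intermediate", "fluent"],
        ordered=True,
    )
)
print(ability.sort_values().tolist())       # ranked from beginner to fluent
print(ability.min(), "->", ability.max())   # min/max are meaningful for ordinal data

# Nominal: the categories are only labels, with no order between them
brands = pd.Series(pd.Categorical(["BMW", "Audi", "Mercedes"], ordered=False))
print(brands.value_counts())                # counting categories is fine; ranking them is not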
4. What are the measures of central tendency, and how are they calculated?
Answer:
Mean: The average of a dataset, calculated by summing all values and dividing by the total number of observations.
Median: The middle value of a dataset when the values are arranged in order (for an even number of observations, the average of the two middle values).
Mode: The most frequent value in a dataset.
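As a quick illustration (the sample values are assumed, not taken from the notes), Python's built-in statistics module computes all three measures directly:

import statistics

scores = [72, 85, 85, 90, 60, 78, 85]   # hypothetical test scores

print(statistics.mean(scores))    # average of all values
print(statistics.median(scores))  # middle value once the data are ordered
print(statistics.mode(scores))    # most frequent value (85)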
5. How is statistical data represented graphically, and what are the advantages of
graphical representation?
Answer: Statistical data can be represented using various graphical techniques such as bar graphs, histograms, pie charts, line graphs, and scatter plots. Graphical representation condenses large amounts of data into a form that can be read at a glance, makes patterns, trends, and comparisons immediately visible, and helps in spotting outliers that are hard to see in raw tables of numbers.
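A minimal sketch of two of these chart types with Matplotlib, the plotting library referred to in the MCQs above (the rainfall figures are made up for illustration):

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
rainfall = [12, 20, 35, 60, 90, 140]          # hypothetical rainfall figures

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.bar(months, rainfall)                     # bar graph: compare categories
ax1.set_title("Bar graph: rainfall by month")

ax2.plot(months, rainfall, marker="o")        # line graph: trend over time
ax2.set_title("Line graph: trend over time")

plt.tight_layout()
plt.show()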
6. Describe the role of matrices in Artificial Intelligence and give examples of their
applications.
Answer: Matrices are critical in AI, particularly in fields like computer vision, natural
language processing, and recommender systems. For example, in image processing, digital
images are represented as matrices where each pixel has a numerical value. In recommender
systems, matrices relate users to products they’ve viewed or purchased, allowing for
personalized recommendations. Matrices also represent vectors in natural language
processing, helping algorithms understand word distributions in a document.
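The toy sketch below (shapes and values are illustrative assumptions) shows the two uses mentioned above: a grayscale image stored as a matrix of pixel intensities, and a small user-item matrix of the kind used by recommender systems.

import numpy as np

# A tiny 3x3 grayscale "image": each entry is a pixel intensity (0-255)
image = np.array([
    [  0, 128, 255],
    [ 64, 190,  32],
    [255,  10, 100],
])
print(image.shape)        # (3, 3): rows x columns of pixels

# A user-item matrix: rows = users, columns = products, 1 = purchased/viewed
ratings = np.array([
    [1, 0, 1],
    [0, 1, 1],
])
# Multiplying the transpose by the matrix gives item-item co-occurrence counts,
# a basic building block for "people who bought X also bought Y" recommendations
co_occurrence = ratings.T @ ratings
print(co_occurrence)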
7. What is data preprocessing, and what are its key steps?
Answer: Data preprocessing is the process of preparing raw data for machine learning models by cleaning, transforming, and normalizing it. The key steps include:
1. Data Cleaning: Handling missing values, removing duplicates, and correcting errors in the data.
2. Data Integration: Merging data from multiple sources into a single dataset.
3. Data Transformation: Converting data into a suitable format, for example encoding categorical variables as numbers or normalizing features to a similar scale.
4. Data Reduction: Reducing the number of features or records while retaining the important information (e.g., dimensionality reduction).
5. Feature Selection: Identifying the most relevant features that contribute to the target variable.
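A minimal sketch of a few of these steps using pandas (the column names and values are made-up placeholders, not from the notes):

import pandas as pd

# Hypothetical raw data with a missing value and a duplicate row
raw = pd.DataFrame({
    "height_cm": [150, 160, None, 160, 172],
    "grade": ["A", "B", "B", "B", "C"],
})

# Data cleaning: drop duplicates, fill the missing height with the column mean
clean = raw.drop_duplicates()
clean = clean.fillna({"height_cm": clean["height_cm"].mean()})

# Data transformation: encode the categorical grade column as numbers
clean["grade_code"] = clean["grade"].astype("category").cat.codes

# Normalization: rescale height to the 0-1 range (min-max scaling)
h = clean["height_cm"]
clean["height_scaled"] = (h - h.min()) / (h.max() - h.min())

print(clean)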
8. Explain the significance of splitting data into training and testing sets in machine
learning.
Answer: In machine learning, data is split into training and testing sets to assess the
model’s performance. The training set is used to train the model, while the testing set
evaluates how well the model generalizes to unseen data. This helps avoid overfitting,
where a model performs well on training data but poorly on new, unseen data. Techniques
like cross-validation can also be applied to ensure consistent model performance across
different data subsets, improving the reliability of the model’s predictions.
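A minimal sketch of such a split using scikit-learn's train_test_split (the feature values here are invented; an 80/20 split is a common but not mandatory choice):

from sklearn.model_selection import train_test_split

# Hypothetical features (hours studied) and labels (passed the exam or not)
X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Hold out 20% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), "training samples,", len(X_test), "test samples")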
9. What are variance and standard deviation, and what do they tell us about a dataset?
Answer: Variance and standard deviation are measures of data dispersion. Variance
indicates how spread out the data points are from the mean, while standard deviation is the
square root of variance. A low variance or standard deviation means data points are
clustered closely around the mean, while high values indicate data points are widely spread.
These metrics are useful in understanding the variability within a dataset, helping to identify
whether the data has significant outliers or is uniformly distributed.
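A quick check of these definitions with the statistics module (sample values assumed for illustration); note that statistics.variance and statistics.stdev compute the sample versions, which divide by n - 1:

import statistics

marks = [4, 8, 6, 5, 7]               # hypothetical data points

mean = statistics.mean(marks)          # 6.0
var = statistics.variance(marks)       # spread of squared distances from the mean (sample variance)
std = statistics.stdev(marks)          # square root of the variance

print(mean, var, std)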
10. Discuss the importance of data visualization in AI and the tools commonly used for it.
Answer: Data visualization presents data graphically so that patterns, trends, relationships, and outliers can be understood at a glance, which is far harder to do from raw tables of numbers. In AI projects it supports exploration of the data before modelling, helps communicate results to non-technical audiences, and makes data-quality problems easier to spot. In Python, Matplotlib is the library commonly used to create charts such as bar graphs, line graphs, histograms, pie charts, and scatter plots.
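As one example (the numbers are invented for illustration), a scatter plot drawn with Matplotlib can reveal the relationship between two variables, such as hours studied and test scores:

import matplotlib.pyplot as plt

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]           # hypothetical values
test_scores = [52, 55, 61, 64, 70, 74, 79, 85]

plt.scatter(hours_studied, test_scores)
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.title("Scatter plot: relationship between two variables")
plt.show()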