Math144 M1reviewer
I. Ensure the project team has the right mix of domain experts, customers, analytic talent, and project
management to be effective.
II. Evaluate how much time is needed and if the team has the right breadth and depth of skills.
In creating a mechanism for performing ongoing monitoring of model accuracy under the operationalize
phase, which of the following statements is/are TRUE?
II. Design alerts to check for when the model is operating “out-of-bounds.”
ANS II only
Which of the following is/are TRUE about phase 5 of data analytics life cycle?
I. Here, the team considers how best to articulate the findings and outcomes to the various team members and
stakeholders, taking into account caveats, assumptions, and any limitations of the results.
II. It is critical to articulate the results properly and position the findings in a way that is appropriate for the
audience.
Which of the following is/are TRUE about testing and training datasets?
I. It is critical to ensure that the training and test datasets are sufficiently robust for the model and analytical
techniques.
II. A simple way to think of these datasets is to view the testing dataset for conducting the initial experiments
and the training datasets for validating an approach once the initial experiments and models have been run.
ANS I only
II. Although the modeling techniques and logic required to develop models can be highly complex, the
actual duration of this phase can be short compared to the time spent preparing the data and defining
the approaches.
I. As a general rule, the more executive the audience, the more succinct the presentation needs to be.
II. When presenting to audiences with more quantitative backgrounds, focus more time on the methodology and
findings.
ANS both I and II
In assessing whether the data science team has succeeded or failed in its objectives established in phase
I, the following statements are true EXCEPT
When the team has run the model, completed a thorough discovery phase, and learned a great deal about the
datasets, which of the following remaining tasks needs to be done?
1. This type of data repository enables flexible, high-performance analysis
in a nonproduction environment and can leverage in-database processing.
data marts
data warehouses
analytic sandbox
Spreadsheets
structured data
semi-structured data
quasi-structured data
unstructured data
3. Which of the following is/are TRUE about the current data analytic architecture?
I. At the beginning of the data workflow, analysts do get data provisioned for their downstream
analytics.
II. Although reports and dashboards are still important for organizations, most traditional data
architectures inhibit data exploration and more sophisticated analysis.
I only
II only
both I and II
neither I nor II
4. Which of the following is/are TRUE about the difference between business intelligence
and data science?
I. Business Intelligence typically involves structured data types and utilizes standard and ad hoc
reporting, dashboards, alerts, etc.
II. Data Science can handle both structured and nonstructured data types and utilizes
optimization, predictive modeling, forecasting, etc.
III. Data Science is more explanatory while Business Intelligence is more exploratory.
I and II only
I, II and III
social media
medical imaging
mobile sensors
textual data
CSV files
genetic mappings
multimedia files
7. Which of the following business drivers in data analytics involves customer churn and
fraud detection?
optimize business operations
I. Organizations can apply advanced analytical techniques to optimize processes and derive
more value from these common tasks in business processes.
II. Compliance with regulatory laws requires minimal complexity and data requirements.
I only
II only
both I and II
neither I nor II
10. Which of the following are considered problems in the traditional data architecture?
High-value data is hard to reach and leverage, and predictive analytics and data mining
activities are last in line for data.
Data moves in batches from EDW to local analytical tools.
Data Science projects will remain isolated and ad hoc, rather than centrally managed.
III. The challenges of the data deluge present the opportunity to transform business,
government, science, and everyday life.
I and II only
I, II and III
12. Which of the following is/are TRUE about the definition of BIG DATA?
I. According to McKinsey Global Report in 2011, big data is data whose scale, distribution,
diversity, and/or timeliness require the use of new technical architectures and analytics to
enable insights that unlock new sources of business value.
II. McKinsey’s definition of Big Data implies that organizations will need new data architectures
and analytic sandboxes, new tools, new analytical methods, and an integration of multiple skills
into the new role of the data scientist
I only
II only
both I and II
neither I nor II
QUIZ 2
1. This group of players makes sense of data collected from various
entities.
data devices
data collectors
data aggregators
4. During the decade when social media platforms exploded, the generated data volume was
measured on the ________ scale.
gigabytes
terabytes
petabytes
Exabytes
cellphone
loyalty cards
list brokers
7. This decade saw a proliferation of different kinds of data sources which are mainly
productivity and publishing tools.
1980s
1990s
2000s
2010s
8. Which of the following is/are TRUE about technology and data enablers?
I. This role requires skills related to computer engineering, programming, and database
administration.
II. To do their jobs, members need access to a robust analytic sandbox or workspace where
they can perform large-scale analytical data experiments.
I only
II only
both I and II
neither I nor II
Design, implement, and deploy statistical models and data mining techniques on Big Data
10. Which of the following is/are TRUE about big data ecosystem?
I. Organizations and data collectors are realizing that the data they can gather from individuals
contains intrinsic value.
II. As this new digital economy continues to evolve, the market sees the introduction of data
vendors and data cleaners that use crowdsourcing to test the outcomes of machine learning
techniques.
I only
II only
both I and II
neither I nor II
QUIZ 3
provides access to key databases or tables and ensures the appropriate security levels are in
place related to the data repositories.
executes the actual data extractions and performs substantial data manipulation to
facilitate the analytics.
provides subject matter expertise for analytical techniques, data modeling, and applying valid
analytical techniques to given business problems.
gives business domain expertise based on a deep understanding of the data, key performance
indicators (KPIs), key metrics, and business intelligence from a reporting perspective.
2. In this phase of the data analytics life cycle, the team assesses the resources available
to support the project in terms of people, technology, time, and data.
Discovery
Data Preparation
Model Building
Model Planning
3. This critical step involves the process of stating the analytics problem to be solved.
4. Which of the following persons provides the funding and gauges the degree of value from
the final outputs of the working team in a data analytics project?
Project Manager
Project Sponsor
Business User
I. A common mistake made in data science projects is rushing into data collection and analysis,
which precludes spending sufficient time to plan and scope the amount of work involved,
understanding requirements, or even framing the business problem properly.
II. Having a good data analytics process ensures a comprehensive and repeatable method for
conducting analysis and helps focus time and energy.
I only
II only
both I and II
neither I nor II
The team determines how much business or domain knowledge the data scientist needs to
develop models.
The team performs extract, load, and transform to get the data into the sandbox.
The team identifies the main objectives of the project, identifies what needs to be achieved in
business terms, and identifies what needs to be done to meet the needs.
The team identifies the key stakeholders and their interests in the project.
7. In this phase of the data analytics life cycle, the team delivers final reports, briefings,
code, and technical documents.
Model Building
Model Planning
Communicate Results
Operationalize
I. Ensure the project team has the right mix of domain experts, customers, analytic talent, and
project management to be effective.
II. Evaluate how much time is needed and if the team has the right breadth and depth of skills.
I only
II only
both I and II
neither I nor II
QUIZ 4
I. Spending time to learn the nuances of the datasets provides context to understand what
constitutes a reasonable value and expected output versus what is a surprising finding.
II. It is important to catalog the data sources that the team has access to and identify additional
data sources that the team can leverage but perhaps does not have access to today.
I only
II only
both I and II
neither I nor II
2. This refers to the process of cleaning data, normalizing datasets, and performing
transformations on the data.
Data Cleansing
Data Transformation
Data Conditioning
Data Visualizing
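As an illustration of the three activities named in this question (cleaning, transforming, and normalizing), here is a minimal sketch in plain Python. The function name `condition`, the choice of a log transform, and min-max scaling are assumptions made for the example; the source does not prescribe specific techniques.

```python
import math

def condition(values):
    """Clean, transform, and normalize a list of raw numeric readings."""
    # Cleaning: drop missing entries (None) and non-positive values,
    # which the log transform below cannot handle.
    cleaned = [v for v in values if v is not None and v > 0]
    if not cleaned:
        return []
    # Transformation: log-transform to reduce right skew (an assumed choice).
    logged = [math.log(v) for v in cleaned]
    # Normalization: rescale to the [0, 1] range (min-max scaling).
    lo, hi = min(logged), max(logged)
    if hi == lo:
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]

# Hypothetical raw readings with one missing and one invalid value.
raw = [10.0, None, 100.0, -5.0, 1000.0]
conditioned = condition(raw)
print(conditioned)
```

In practice the order and choice of steps depend on the dataset; this only shows how the three ideas fit together.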
3. In model planning, why is it important to check whether similar and existing approaches
have been done in relation to the data science problem?
Many times teams can get ideas from analogous problems that other people have solved in
different industry verticals or domain areas.
It is useful to research and understand how other analysts generally approach a specific kind of
problem.
Performing this sort of diligence gives the team ideas of how others have solved similar
problems and presents the team with a list of candidate models to try as part of the model
planning phase.
All of the Above
skewed distribution
unexpected values
5. The following activities are involved under the model planning phase EXCEPT
Ensure that the analytical techniques enable the team to meet the business objectives and
accept or reject the working hypotheses.
Evaluate whether similar, existing approaches are available or if the team will need to create
something new.
I. Under this phase, the team develops datasets for training, testing, and production purposes.
II. Data Exploration, Variable and Model selection characterize this phase.
I only
II only
both I and II
neither I nor II
7. Which of the following is TRUE about preparing the analytic sandbox?
I. An analytic sandbox enables the data science team to access data without interfering with the live
production databases.
II. When developing the analytic sandbox, it is a best practice to collect just the right amount of
data as prescribed by IT group.
I only
II only
both I and II
neither I nor II
8. This step in the data analytics life cycle tends to be the most labor intensive and most
iterative.
Discovery
data preparation
model planning
model building
9. In the model selection subphase, the team’s main goal is to __________ based on the
end goal of the project
do data exploration
Performing ETLT
1. Which of the following is TRUE about the final phase of data analytics life cycle?
I. In the final phase, the team communicates the benefits of the project more broadly and sets
up a pilot project to deploy the work in a controlled way before broadening the work to a full
enterprise or ecosystem of users.
II. Under this phase, the team reflects on the project and considers what obstacles were in the
project and what can be improved in the future, as well as makes recommendations for future
work or improvements to existing processes.
I only
II only
both I and II
neither I nor II
2. In creating a mechanism for performing ongoing monitoring of model accuracy under the
operationalize phase, which of the following statements is/are TRUE?
I. If accuracy improves, find ways to retrain the model.
II. Design alerts to check for when the model is operating “out-of-bounds.”
I only
II only
both I and II
neither I nor II
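The monitoring idea in this question can be sketched as a small check that runs on each batch of scored records. The function name `check_model_health` and the 0.80 threshold are hypothetical; an "out-of-bounds" alert is modelled here simply as accuracy falling below an agreed floor, which is typically the point at which retraining would be considered.

```python
def check_model_health(y_true, y_pred, min_accuracy=0.80):
    """Return (accuracy, alert) for one batch of scored records."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    # Accuracy: share of predictions that match the observed outcome.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # Alert when the model drops below the agreed floor (assumed threshold).
    alert = accuracy < min_accuracy
    return accuracy, alert

# Hypothetical batch: three of five predictions are correct.
accuracy, alert = check_model_health([1, 0, 1, 1, 0], [1, 0, 0, 0, 0])
print(accuracy, alert)
```

A production monitor would usually also track input ranges and drift, but the same pattern of "measure, compare to a bound, alert" applies.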
3. Which of the following are activities done under phase 5 of data analytics life cycle?
The team reflects on the implications of these findings and measures the business value.
The team records all the findings and then selects the three most significant ones that can be
shared with the stakeholders.
SPSS Modeler
Octave
Alpine Miner
I. The phases of model planning and model building can overlap quite a bit, and in practice one
can iterate back and forth between the two phases for a while before settling on a final model.
II. Although the modeling techniques and logic required to develop models can be highly
complex, the actual duration of this phase can be short compared to the time spent preparing
the data and defining the approaches.
I only
II only
both I and II
neither I nor II
I. As a general rule, the more executive the audience, the more succinct the presentation needs
to be.
II. When presenting to audiences with more quantitative backgrounds, focus more time on the
methodology and findings.
I only
II only
both I and II
neither I nor II
8. In assessing whether the data science team has succeeded or failed in its
objectives established in phase I, the following statements are true EXCEPT
Failure should not be considered as a true failure, but rather as a failure of the data to accept or
reject a given hypothesis adequately.
It is important to conduct very robust analysis to show specific and desired results.
When conducting this assessment, determine if the results are statistically significant and valid.
It is incorrect to only do a mere superficial analysis, which is not robust enough to accept or
reject a hypothesis.
9. Which of the following is/are TRUE about phase 5 of data analytics life cycle?
I. Here, the team considers how best to articulate the findings and outcomes to the various team
members and stakeholders, taking into account caveats, assumptions, and any limitations of the
results.
II. It is critical to articulate the results properly and position the findings in a way that is
appropriate for the audience.
I only
II only
both I and II
neither I nor II
10. When the team has run the model, completed a thorough discovery phase, and learned
a great deal about the datasets, which of the following remaining tasks needs to be
done?
Reflect on the project and consider what obstacles were in the project and what can be
improved in the future.
Consider what each of the team members and stakeholders needs to fulfill her responsibilities.
12. Which of the following is/are TRUE about testing and training datasets?
I. It is critical to ensure that the training and test datasets are sufficiently robust for the model
and analytical techniques.
II. A simple way to think of these datasets is to view the testing dataset for conducting the initial
experiments and the training datasets for validating an approach once the initial experiments
and models have been run.
I only
II only
both I and II
neither I nor II
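To make the two roles concrete, here is a minimal sketch of splitting one dataset into a training portion (used for the initial experiments) and a held-out test portion (used to validate the approach afterwards). The helper `train_test_split`, the 25% test fraction, and the fixed seed are assumptions for illustration only.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(rows)          # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled rows are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 20 record ids.
train, test = train_test_split(list(range(20)))
print(len(train), len(test))
```

Keeping the two sets disjoint is what lets the test set give an honest estimate of how the model behaves on unseen data.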
QUIZ 6
Estimate the predicted value of Houseprice if the floor area is 115 and the number of bedrooms
is 3.
297,415
322,525
356,875
409,275
House Prices as predicted by floor area, number of bedrooms, and location type (urban or
suburban)
It is applied when the dependent variable can be explained by two or more variables.
and represents the slope and intercept terms of the regression model, respectively.
outliers
unexpected pattern
changes in variability
6. The following summary output is obtained after applying regression analysis to the
relationship between advertising expense and sales revenue of a company:
Which of the following statements is/are TRUE?
I. For every unit increase in TV advertisement expense, there is 0.045 unit increase in sales.
II. 94.7% of the variation in sales revenue can be explained by the variation in predictor
variables.
III. When a company spends nothing on advertising, the company is predicted to have 2.921
units of sales.
I and II only
I, II and III
7. Which of the following connects the branches of Descriptive and Inferential Statistics?
Graph Theory
Automata
Calculus
Probability
Clustering Analysis
Regression Analysis
Discriminant Analysis
Canonical Analysis
9. Which of the following is/are TRUE about descriptive and inferential statistics?
I. Descriptive Statistics deals with the collection, organization and presentation of data.
II. Inferential Statistics is concerned with making predictions and drawing conclusions about a larger
group of data.
I only
II only
both I and II
neither I nor II
10. Which of the following is/are TRUE about 'best fitting regression' line?
I. The slope and intercept of the best fitting line are obtained by minimizing the sum of the
residuals.
II. The fitted values are made as 'close' as possible to the observed values.
I only
II only
both I and II
neither I nor II
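For reference when weighing these statements, ordinary least squares chooses the slope and intercept that minimize the sum of squared residuals, not the plain sum. A minimal sketch of the closed-form fit follows; the function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 1 + 2x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
# Sum of squared residuals: the quantity the fit minimizes.
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip([1, 2, 3, 4], [3, 5, 7, 9]))
print(slope, intercept, sse)
```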
QUIZ 7
I. When the outcome variable is categorical, logistic regression is applicable instead of linear
regression.
II. The predictor variables in logistic regression can be continuous, discrete or categorical
variables.
I only
II only
both I and II
neither I nor II
2. Which of the following is/are TRUE about the theoretical model on logistic regression?
I. Logistic regression is based on the logistic function f(y) whose values range from -1 to 1.
II. The values of y in the logistic function f(y) are not directly observed but instead, only the
values of f(y) in terms of success or failure are observed.
I only
II only
both I and II
neither I nor II
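A quick way to check a claim about the range of the logistic function is to evaluate it. The sketch below assumes the standard form f(y) = 1 / (1 + e^(-y)); the sample inputs are arbitrary.

```python
import math

def logistic(y):
    """Standard logistic function f(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

# Evaluate at a few arbitrary inputs, from strongly negative to strongly positive.
values = [logistic(y) for y in (-10, -1, 0, 1, 10)]
print(values)
```

Every output lies strictly between 0 and 1, which is why f(y) can be read as a probability.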
3. Based on the following results of logistic regression, what is the likelihood of churning
when Age = 40 and Churned_contacts = 5? (Note: Round coefficients up to 2 decimal
places)
0.714
0.623
0.357
0.269
House Prices as predicted by floor area, number of bedrooms, and location type (urban or
suburban)
5. Based on the following results of logistic regression, which of the following statements
is/are TRUE?
I. For every 1-unit increase in Age, the value of the logistic function increases by 0.16.
II. The regression coefficient for the Married variable is not significant.
I only
II only
both I and II
neither I nor II
6. The following are situations where logistic regression can be applied EXCEPT
loan approval as determined by a person's income, credit score and amount of loan
In this phase of the data analytics life cycle, the team delivers final reports, briefings, code, and technical
documents.
Answer: Operationalize
(Slide 12)
Which of the following persons provides the funding and gauges the degree of value from the final
outputs of the working team in a data analytics project?
Answer: Project Sponsor
(Slide 7)
Which of the following key questions are helpful to ask during the discovery phase when interviewing
the project sponsor?
Answer: All of the above
(Slide 21)
This refers to the process of cleaning data, normalizing datasets, and performing transformations on the
data.
Answer: Data Conditioning
(Slide 38)
In this phase of the data analytics life cycle, the team assesses the resources available to support the
project in terms of people, technology, time, and data.
Answer: Discovery
(Slide 10)
This study source was downloaded by 100000856859346 from CourseHero.com on 11-18-2022 18:53:58 GMT -06:00
https://www.coursehero.com/file/90125477/3pdf/
Quiz 1
These are centralized data containers in a purpose-built space that support business intelligence and
reporting but restrict robust analyses.
Data marts
Data warehouses
Analytic Sandbox
None of the Above
Which of the following are problems encountered in traditional data architecture?
High-value data is hard to reach and leverage, and predictive analytics and data mining activities
are last in line for data.
Data scientists are limited to performing in-memory analytics which will restrict the size of the
datasets they can use.
Data Science projects will remain isolated and ad hoc, rather than centrally managed.
All of the Above
II only
both I and II
neither I nor II
Which of the following is TRUE about the differences between Business Intelligence (BI) and Data Science?
I. Where Data Science problems tend to require highly structured data organized in rows and columns
for accurate reporting, BI projects tend to use many types of data sources, including large or
unconventional datasets.
II. Data Science tends to be more exploratory in nature and may use scenario optimization to deal with
more open-ended questions.
I only
II only
both I and II
neither I nor II
Among the business drivers that push businesses to become more analytical and data driven, this one
involves customer churn, fraud and default
Optimize Business Operations
Identify Business Risk
Predict New Business Opportunities
Comply with Regulatory Requirements
Which of the following is true about the current analytical architecture?
I. Data sources are first loaded into the data warehouse where data needs to be well understood,
structured, and normalized with the appropriate data type definitions. This kind of centralization enables
security, backup, and failover of highly critical data.
II. Once in the data warehouse, data is read by additional applications across the enterprise for BI and
reporting purposes. These are high-priority operational processes getting critical data feeds from the
data warehouses and repositories.
I only
II only
both I and II
neither I nor II
Quiz 2
Examples that fall under this group includes financial analysts, market research analysts, life scientists,
operations managers, and business and functional managers.
Data Savvy Professionals
Deep Analytical Talent
Technology and Data Enablers
None of the Above
Which of the following describes the decade beyond 2010 with regard to big data?
I. In this era, everyone and everything is leaving a digital footprint.
II. Data volumes in this decade are measured in terms of petabytes.
I only
II only
both I and II
neither I nor II
The following are recurring sets of activities that data scientists perform EXCEPT
Reframe business challenges as analytics challenges.
Design, implement, and deploy statistical models and data mining techniques on Big Data.
Provide technical expertise to support analytical projects such as provisioning and administrating
analytical sandboxes.
Develop insights that lead to actionable recommendations.
Which of the following groups of players in the data value chain makes sense of the data collected from
various entities?
Data Devices
Data Collectors
Data Aggregators
Data Users and Buyers
The data now is said to come from many sources including
Photos and video footage uploaded to the World Wide Web
Nontraditional IT devices, including the use of radio-frequency identification (RFID) readers, GPS
navigation systems, and seismic processing
Medical information, such as genomic sequencing and diagnostic imaging
All of the Above
Which of the following key roles in the new big data ecosystem has members who possess a combination
of skills to handle raw, unstructured data and to apply complex analytical techniques at massive scales?
Data Savvy Professionals
Deep Analytical Talent
Technology and Data Enablers
None of the Above
The following are the skillsets and behavioral characteristics a data scientist must possess EXCEPT
Qualitative skill
Curious and creative
Skeptical mindset and critical thinking
Communicative and collaborative
Quiz 3
This refers to the process of cleaning data, normalizing datasets, and performing transformations on the
data.
Data Preparation
Data Transformation
Data Conditioning
Data Visualizing
In this phase of the data analytics life cycle, the team assesses the resources available to support the
project in terms of people, technology, time, and data.
Discovery
Data Preparation
Model Building
Model Planning
The following activities are part of the discovery phase EXCEPT
The team determines how much business or domain knowledge the data scientist needs to
develop models.
The team catalogs the data sources that the team has access to and identifies additional data
sources that the team can leverage.
The team identifies the main objectives of the project, identifies what needs to be achieved in
business terms, and identifies what needs to be done to meet the needs.
The team identifies the key stakeholders and their interests in the project.
Which of the following describes the key role of the Data Engineer?
provides access to key databases or tables and ensures the appropriate security levels are in place
related to the data repositories.
executes the actual data extractions and performs substantial data manipulation to facilitate the
analytics.
provides subject matter expertise for analytical techniques, data modeling, and applying valid
analytical techniques to given business problems.
gives business domain expertise based on a deep understanding of the data, key performance
indicators (KPIs), key metrics, and business intelligence from a reporting perspective.
Which of the following activities is NOT involved in identifying potential data sources?
Capture aggregate data sources
Evaluate the data structures and tools needed
Perform extract, transform, load processes to data
Scope the sort of data infrastructure needed
In this phase of the data analytics life cycle, the team delivers final reports, briefings, code, and technical
documents.
Model Building
Model Planning
Communicate Results
Operationalize
I only
II only
both I and II
neither I nor II
Quiz 4
II only
both I and II
neither I nor II
Which of the following are free or open source tools available for data analytics practitioners?
SAS Enterprise Miner
SPSS Modeler
Octave
Alpine Miner
Which of the following is a deliverable under the operationalize phase?
Presentation for project sponsors
Presentation for analysts
Technical specifications of implementing the code
All of the Above
The following activities are involved under the model planning phase EXCEPT
Assess the structure of the datasets.
Ensure that the analytical techniques enable the team to meet the business objectives and accept
or reject the working hypotheses.
Evaluate whether similar, existing approaches are available or if the team will need to create
something new.
Assess the validity of the model and its results.
Which of the following is TRUE about model planning?
I. Under this phase, the team develops datasets for training, testing, and production purposes.
II. Data Exploration, Variable and Model selection characterize this phase.
I only
II only
both I and II
neither I nor II
Which of the following is TRUE about the final phase of data analytics life cycle?
I. In the final phase, the team communicates the benefits of the project more broadly and sets up a
pilot project to deploy the work in a controlled way before broadening the work to a full enterprise or
ecosystem of users.
II. Under this phase, the team reflects on the project and considers what obstacles were in the project
and what can be improved in the future, as well as makes recommendations for future work or
improvements to existing processes.
I only
II only
both I and II
neither I nor II
Quiz 5
Prior to any regression modelling, the data should always be inspected for the following EXCEPT
Data-entry errors
Expected pattern
Outliers
Missing values
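The kinds of checks this question refers to can be sketched with the standard library. The helper `inspect_column` and the 1.5 x IQR outlier rule are assumptions chosen for illustration; the source names what to look for but not how.

```python
import statistics

def inspect_column(values):
    """Count missing entries and flag outliers using the 1.5 * IQR rule."""
    missing = sum(1 for v in values if v is None)
    present = [v for v in values if v is not None]
    # Quartiles of the observed values (statistics.quantiles, Python 3.8+).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Values outside the fences may be genuine outliers or data-entry errors.
    outliers = [v for v in present if v < low or v > high]
    return missing, outliers

# Hypothetical column with one missing value and one suspicious entry.
missing, outliers = inspect_column([12, 14, None, 13, 15, 14, 120, 13])
print(missing, outliers)
```

A flagged value still needs a human decision: 120 might be a typo for 12 or a real extreme observation.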
Which of the following statements is/are ALWAYS TRUE?
I. Inferential statistics consists of Estimation and Hypothesis Testing
II. The link between inferential and descriptive statistics is probability
I only
II only
both I and II
neither I nor II
15.6
17.4
19.2
20.8
II only
both I and II
neither I nor II
II only
both I and II
neither I nor II
In predicting Sales Revenue using TV and Radio Ad Expenses, we have the
following regression results.
Estimate the predicted sales if TV and radio ad expenses are 200 and 50, respectively.
19.3
21.5
23.7
25.9
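The regression output for this question did not survive in the reviewer, so the calculation can only be sketched. A multiple regression prediction is the intercept plus each coefficient times its predictor. In the example below, the intercept (2.921) and TV coefficient (0.045) are borrowed from the statements in the earlier advertising question, on the assumption that they describe the same model; the radio coefficient is a purely hypothetical placeholder, so the printed number is not the quiz answer.

```python
def predict_sales(tv, radio, intercept, b_tv, b_radio):
    """Predicted sales = intercept + b_tv * tv + b_radio * radio."""
    return intercept + b_tv * tv + b_radio * radio

INTERCEPT = 2.921            # quoted in an earlier question; assumed to apply here
B_TV = 0.045                 # quoted in an earlier question; assumed to apply here
B_RADIO_HYPOTHETICAL = 0.19  # placeholder: the real value is not in the reviewer

predicted = predict_sales(200, 50, INTERCEPT, B_TV, B_RADIO_HYPOTHETICAL)
print(round(predicted, 3))
```

Substituting the actual coefficients from the regression table into the same function gives the intended answer.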
Quiz 6
Based on the following results of logistic regression, which of the following statements is/are TRUE?
I. For every 1-unit increase in Age, the value of the logistic function increases by 0.16.
II. The regression coefficient for the Married variable is not significant.
I only
II only
both I and II
neither I nor II
Based on the following results of logistic regression, what is the likelihood of churning when Age = 40
and Churned_contacts = 5? (Note: Round coefficients up to 2 decimal places)
0.714
0.623
0.357
0.269
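The coefficient table for this question is missing from the reviewer, so only the method can be shown. The churn probability is the logistic function applied to the linear predictor b0 + b_age * Age + b_contacts * Churned_contacts. All three coefficients below are hypothetical placeholders chosen to give a round result; they are not the values from the original output.

```python
import math

def churn_probability(age, churned_contacts, b0, b_age, b_contacts):
    """Probability of churn from a fitted logistic regression."""
    # Linear predictor y, then map it to (0, 1) with the logistic function.
    y = b0 + b_age * age + b_contacts * churned_contacts
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical coefficients (NOT from the missing regression output).
p = churn_probability(age=40, churned_contacts=5,
                      b0=1.0, b_age=-0.05, b_contacts=0.2)
print(p)
```

With the real coefficients rounded to 2 decimal places, as the question instructs, the same two steps produce one of the listed options.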
II only
both I and II
neither I nor II
II only
both I and II
neither I nor II
QUESTION 1
This refers to immense volumes of data, both unstructured and structured, that can be used to
analyze insights which can lead to better decisions and strategic business moves.
Huge Data
Large Data
Big Data
None of the Above
QUESTION 2
The following are included in the ten V’s of Big Data EXCEPT
Velocity
Visual
Veracity
Volume
QUESTION 3
Which of the following are common types of big data?
Machine to Machine data
Web and Social Media data
Biometrics data
All of the Above
QUESTION 4
Which of the following is/are ALWAYS TRUE about data science?
I. Deals with unstructured and structured data.
II. It’s a combination of statistics, mathematics, programming, problem-solving, capturing
data in ingenious ways.
I only
II only
both I and II
neither I nor II
QUESTION 5
In obtaining data in the data science process, the following questions are asked EXCEPT
Are there anomalies?
How were the data sampled?
Which data are relevant?
None of the Above
QUESTION 6
Which of the following are components of data analytics?
Data mining
Data visualization
Decision Analysis
All of the Above
QUESTION 7
The following is true about data analytics EXCEPT
the science of examining raw data with the purpose of drawing conclusions about that
information.
focus lies in describing the data in its entirety.
involves applying an algorithmic or mechanical process to derive insights.
used in a number of industries to allow the organizations and companies to make better
decisions as well as verify and disprove existing theories or models.