Math144 M1reviewer

The key role of a Data Engineer is to execute the actual data extractions and perform substantial data manipulation to facilitate the analytics. 2. Which of the following is/are TRUE about Data Engineer responsibilities? I. Design, develop, and maintain data integration, data warehousing, and data processing systems and pipelines. II. Work closely with data scientists and analysts to understand their data needs and ensure they have access to clean, integrated data to fuel their models and analyses. I only; II only; Both I and II; Neither I nor II

Uploaded by

ava ty

Which of the following is TRUE about the discovery phase?

I. Ensure the project team has the right mix of domain experts, customers, analytic talent, and project
management to be effective.

II. Evaluate how much time is needed and if the team has the right breadth and depth of skills.

ANS both I and II

In creating a mechanism for performing ongoing monitoring of model accuracy under the operationalize
phase, which of the following statements is/are TRUE?

I. If accuracy improves, find ways to retrain the model.

II. Design alerts to check for when the model is operating “out-of-bounds.”

ANS II only
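
As a rough illustration of statement II, an "out-of-bounds" alert can be as simple as checking each incoming input against the range seen during training. The sketch below is only an assumed example: the function name `out_of_bounds`, the field names, and the ranges are hypothetical and do not come from the course slides.

```python
def out_of_bounds(record, bounds):
    """Return the names of inputs whose values fall outside the trained range."""
    flagged = []
    for name, value in record.items():
        low, high = bounds[name]
        if value < low or value > high:
            flagged.append(name)
    return flagged


# Hypothetical ranges observed in the training data.
training_bounds = {"age": (18, 90), "monthly_income": (0, 50000)}

# A new record with an age below the trained range triggers an alert.
alerts = out_of_bounds({"age": 17, "monthly_income": 4200}, training_bounds)
print(alerts)  # ['age']
```

A non-empty list would be the signal to raise an alert and review the model's inputs.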

Which of the following is/are TRUE about phase 5 of data analytics life cycle?

I. Here, the team considers how best to articulate the findings and outcomes to the various team members and
stakeholders, taking into account caveats, assumptions, and any limitations of the results.

II. It is critical to articulate the results properly and position the findings in a way that is appropriate for the
audience.

ANS both I and II

Which of the following is/are TRUE about testing and training datasets?

I. It is critical to ensure that the training and test datasets are sufficiently robust for the model and analytical
techniques.

II. A simple way to think of these datasets is to view the testing dataset for conducting the initial experiments
and the training datasets for validating an approach once the initial experiments and models have been run.

ANS I only
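
Statement II reverses the two roles: the training dataset is used for the initial experiments, and the test dataset is held out to validate the approach afterwards. A minimal sketch of such a split follows, assuming a simple random hold-out; the helper name `train_test_split`, the 20% fraction, and the seed are illustrative choices, not something the reviewer prescribes.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled rows are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # hypothetical dataset of 10 records
train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2
```

The model is fitted on `train` only, and `test` is touched once the initial experiments are done.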

Which of the following is TRUE about model building?


I. The phases of model planning and model building can overlap quite a bit, and in practice one can iterate
back and forth between the two phases for a while before settling on a final model.

II. Although the modeling techniques and logic required to develop models can be highly complex, the
actual duration of this phase can be short compared to the time spent preparing the data and defining
the approaches.

ANS both I and II

Which of the following is/are TRUE regarding the presentation of results?

I. As a general rule, the more executive the audience, the more succinct the presentation needs to be.

II. When presenting to audiences with more quantitative backgrounds, focus more time on the methodology and
findings.
ANS both I and II

In assessing whether the data science team has succeeded or failed in its objectives established in phase
I, the following statements are true EXCEPT

ANS It is important to conduct very robust analysis to show specific and desired results.

When the team has run the model, completed a thorough discovery phase, and learned a great deal about the
datasets, which of the following remaining tasks needs to be done?

ANS All of the Above


QUIZ1

1. This type of data repository enables flexible, high-performance analysis
in a nonproduction environment and can leverage in-database processing.

data marts

data warehouses

analytic sandbox

Spreadsheets

2. Clickstream data is categorized under which type of data?

structured data

semi-structured data

quasi-structured data

unstructured data

3. Which of the following is/are TRUE about the current data analytic architecture?

I. At the beginning of the data workflow, analysts do get data provisioned for their downstream
analytics.

II. Although reports and dashboards are still important for organizations, most traditional data
architectures inhibit data exploration and more sophisticated analysis.

I only

II only

both I and II

neither I nor II

4. Which of the following is/are TRUE about the difference between business intelligence
and data science?
I. Business Intelligence typically involves structured data types and utilizes standard and ad hoc
reporting, dashboards, alerts, etc.

II. Data Science can handle both structured and nonstructured data types and utilizes
optimization, predictive modeling, forecasting, etc.

III. Data Science is more explanatory while Business Intelligence is more exploratory.

I and II only

I and III only

II and III only

I, II and III

5. The following are drivers of big data EXCEPT

social media

medical imaging

mobile sensors

none of the above

6. The following are examples of non-structured data types EXCEPT

textual data

CSV files

genetic mappings

multimedia files

7. Which of the following business drivers in data analytics involves customer churn and
fraud detection?
optimize business operations

identify business risk

predict new business opportunities

comply with regulatory requirements

8. Which of the following is/are TRUE about state of practice in analytics?

I. Organizations can apply advanced analytical techniques to optimize processes and derive
more value from these common tasks in business processes.

II. Compliance with regulatory laws requires minimal complexity and data requirements.

I only

II only

both I and II

neither I nor II

9. Which of the following characteristics stand out as defining Big Data?

Huge volume of Data

Complexity of data types and structures

Speed of new data creation and growth

All of the Above

10. Which of the following are considered problems in the traditional data architecture?

High-value data is hard to reach and leverage, and predictive analytics and data mining
activities are last in line for data.
Data moves in batches from EDW to local analytical tools.

Data Science projects will remain isolated and ad hoc, rather than centrally managed.

All of the Above

11. Which of the following is/are TRUE about Big Data?

I. Data is created constantly and comes at an ever increasing rate.

II. Data comes from multiple sources.

III. The challenges of the data deluge present the opportunity to transform business,
government, science, and everyday life.

I and II only

I and III only

II and III only

I, II and III

12. Which of the following is/are TRUE about the definition of BIG DATA?

I. According to McKinsey Global Report in 2011, big data is data whose scale, distribution,
diversity, and/or timeliness require the use of new technical architectures and analytics to
enable insights that unlock new sources of business value.

II. McKinsey’s definition of Big Data implies that organizations will need new data architectures
and analytic sandboxes, new tools, new analytical methods, and an integration of multiple skills
into the new role of the data scientist

I only

II only

both I and II

neither I nor II
QUIZ 2

1. This group of players makes sense of data collected from various
entities.

data devices

data collectors

data aggregators

data users and buyers

2. The following are considered ideal characteristics of data scientists EXCEPT

skeptical mindset and critical thinking

curious and creative

strict and punctual

communicative and collaborative

3. This group has advanced training in quantitative disciplines, such as mathematics,
statistics, and machine learning, and possesses a combination of skills to handle raw,
unstructured data.

deep analytical talent

data savvy professionals

technology and data enablers

none of the above

4. During the decade when social media platforms exploded, the generated data volumes were
measured in terms of ________ scale

gigabytes

terabytes
petabytes

Exabytes

5. Examples of _________ include financial analysts, market research analysts, life
scientists, operations managers, and business and functional managers.

deep analytical talent

data savvy professionals

technology and data enablers

all of the above

6. The following are considered devices that collect data EXCEPT

cellphone

loyalty cards

credit card reader

list brokers

7. This decade saw a proliferation of different kinds of data sources which are mainly
productivity and publishing tools.

1980s

1990s

2000s

2010s

8. Which of the following is/are TRUE about technology and data enablers?

I. This role requires skills related to computer engineering, programming, and database
administration.
II. To do their jobs, members need access to a robust analytic sandbox or workspace where
they can perform large-scale analytical data experiments.

I only

II only

both I and II

neither I nor II

9. Which of the following are considered recurring tasks of a data scientist?

Reframe business challenges as analytics challenges.

Design, implement, and deploy statistical models and data mining techniques on Big Data

Develop insights that lead to actionable recommendations

All of the Above

10. Which of the following is/are TRUE about big data ecosystem?

I. Organizations and data collectors are realizing that the data they can gather from individuals
contains intrinsic value.

II. As this new digital economy continues to evolve, the market sees the introduction of data
vendors and data cleaners that use crowdsourcing to test the outcomes of machine learning
techniques.

I only

II only

both I and II

neither I nor II
QUIZ 3

1. Which of the following describes the key role of a Data Engineer?

provides access to key databases or tables and ensures the appropriate security levels are in
place related to the data repositories.

executes the actual data extractions and performs substantial data manipulation to
facilitate the analytics.

provides subject matter expertise for analytical techniques, data modeling, and applying valid
analytical techniques to given business problems.

gives business domain expertise based on a deep understanding of the data, key performance
indicators (KPIs), key metrics, and business intelligence from a reporting perspective.

2. In this phase of the data analytics life cycle, the team assesses the resources available
to support the project in terms of people, technology, time, and data.

Discovery

Data Preparation

Model Building

Model Planning

3. This critical step involves the process of stating the analytics problem to be solved.

learning the business domain

framing the problem

interviewing the project sponsor

developing the initial hypotheses

4. Which of the following persons provides the funding and gauges the degree of value from
the final outputs of the working team in a data analytics project?
Project Manager

Project Sponsor

Business Intelligence Analyst

Business User

5. Which of the following is TRUE about data analytics life cycle?

I. A common mistake made in data science projects is rushing into data collection and analysis,
which precludes spending sufficient time to plan and scope the amount of work involved,
understanding requirements, or even framing the business problem properly.

II. Having a good data analytics process ensures a comprehensive and repeatable method for
conducting analysis and helps focus time and energy.

I only

II only

both I and II

neither I nor II

6. The following activities are part of the discovery phase EXCEPT

The team determines how much business or domain knowledge the data scientist needs to
develop models.

The team performs extract, load, and transform to get the data in the sandbox.

The team identifies the main objectives of the project, identifies what needs to be achieved in
business terms, and identifies what needs to be done to meet the needs.

The team identifies the key stakeholders and their interests in the project.

7. In this phase of the data analytics life cycle, the team delivers final reports, briefings,
code, and technical documents.
Model Building

Model Planning

Communicate Results

Operationalize

8. Which of the following is TRUE about the discovery phase?

I. Ensure the project team has the right mix of domain experts, customers, analytic talent, and
project management to be effective.

II. Evaluate how much time is needed and if the team has the right breadth and depth of skills.

I only

II only

both I and II

neither I nor II
QUIZ 4

1. Which of the following is TRUE in learning about the data?

I. Spending time to learn the nuances of the datasets provides context to understand what
constitutes a reasonable value and expected output versus what is a surprising finding.

II. It is important to catalog the data sources that the team has access to and identify additional
data sources that the team can leverage but perhaps does not have access to today.

I only

II only

both I and II

neither I nor II

2. This refers to the process of cleaning data, normalizing datasets, and performing
transformations on the data.
Data Cleansing

Data Transformation

Data Conditioning

Data Visualizing
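
The three activities named in this question (cleaning, transforming, normalizing) can be sketched in a few lines. This is only an assumed, minimal example: the function name `condition`, the choice of a log transform, and min-max scaling are illustrative, and real conditioning depends on the dataset.

```python
import math


def condition(values):
    """Clean, transform, and normalize a list of non-negative numbers."""
    # Cleaning: drop missing entries.
    cleaned = [v for v in values if v is not None]
    # Transformation: log(1 + v) to compress a skewed range.
    transformed = [math.log1p(v) for v in cleaned]
    # Normalization: rescale to [0, 1] (assumes at least two distinct values).
    low, high = min(transformed), max(transformed)
    return [(v - low) / (high - low) for v in transformed]


# Hypothetical raw column with one missing value.
scaled = condition([0, None, 9, 99])
print(scaled)  # approximately [0.0, 0.5, 1.0]
```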

3. In model planning, why is it important to check whether similar and existing approaches
have been done in relation to the data science problem?

Many times teams can get ideas from analogous problems that other people have solved in
different industry verticals or domain areas.

It is useful to research and understand how other analysts generally approach a specific kind of
problem.

Performing this sort of diligence gives the team ideas of how others have solved similar
problems and presents the team with a list of candidate models to try as part of the model
planning phase.
All of the Above

4. Which of the following can be detected by performing data visualization?

skewed distribution

unexpected values

high level patterns

all of the above

5. The following activities are involved under the model planning phase EXCEPT

Assess the structure of the datasets.

Ensure that the analytical techniques enable the team to meet the business objectives and
accept or reject the working hypotheses.

Evaluate whether similar, existing approaches are available or if the team will need to create
something new.

Assess the validity of the model and its results.

6. Which of the following is TRUE about model planning?

I. Under this phase, the team develops datasets for training, testing, and production purposes.

II. Data Exploration, Variable and Model selection characterize this phase.

I only

II only

both I and II

neither I nor II
7. Which of the following is TRUE about preparing the analytic sandbox?

I. An analytic sandbox enables the data science team to access data without interfering with the live
production databases.

II. When developing the analytic sandbox, it is a best practice to collect just the right amount of
data as prescribed by the IT group.

I only

II only

both I and II

neither I nor II

8. This step in the data analytics life cycle tends to be the most labor intensive and most
iterative.

Discovery

data preparation

model planning

model building

9. In the model selection subphase, the team’s main goal is to __________ based on the
end goal of the project

do data exploration

examine the relationship among different variables

choose an analytical technique or a short list of candidate techniques

capture the most essential predictors for the outcome variable


10. The following is part of the data preparation phase EXCEPT

Performing ETLT

Survey and Visualize

Developing Initial Hypothesis

Preparing the Analytic Sandbox


QUIZ 5

1. Which of the following is TRUE about the final phase of data analytics life cycle?

I. In the final phase, the team communicates the benefits of the project more broadly and sets
up a pilot project to deploy the work in a controlled way before broadening the work to a full
enterprise or ecosystem of users.

II. Under this phase, the team reflects on the project and considers what obstacles were in the
project and what can be improved in the future, as well as makes recommendations for future
work or improvements to existing processes.

I only

II only

both I and II

neither I nor II

2. In creating a mechanism for performing ongoing monitoring of model accuracy under the
operationalize phase, which of the following statements is/are TRUE?
I. If accuracy improves, find ways to retrain the model.
II. Design alerts to check for when the model is operating “out-of-bounds.”

I only

II only

both I and II

neither I nor II

3. Which of the following are activities done under phase 5 of data analytics life cycle?

The team determines if it succeeded or failed in its objectives.

The team reflects on the implications of these findings and measures the business value.

The team records all the findings and then selects the three most significant ones that can be
shared with the stakeholders.

All of the Above


4. Which of the following are free or open source tools available to data analytics
practitioners?

SAS Enterprise Miner

SPSS Modeler

Octave

Alpine Miner

5. Which of the following is TRUE about model building?

I. The phases of model planning and model building can overlap quite a bit, and in practice one
can iterate back and forth between the two phases for a while before settling on a final model.

II. Although the modeling techniques and logic required to develop models can be highly
complex, the actual duration of this phase can be short compared to the time spent preparing
the data and defining the approaches.

I only

II only

both I and II

neither I nor II

6. In creating robust models, the following questions need to be considered EXCEPT

Does the model avoid intolerable mistakes?

How consistent are the contents and files?

Do any of the inputs need to be transformed or eliminated?

Will the kind of model chosen support the runtime requirements?


7. Which of the following is/are TRUE regarding the presentation of results?

I. As a general rule, the more executive the audience, the more succinct the presentation needs
to be.
II. When presenting to audiences with more quantitative backgrounds, focus more time on the
methodology and findings.

I only

II only

both I and II

neither I nor II

8. In assessing whether the data science team has succeeded or failed in its
objectives established in phase I, the following statements are true EXCEPT

Failure should not be considered as a true failure, but rather as a failure of the data to accept or
reject a given hypothesis adequately.

It is important to conduct very robust analysis to show specific and desired results.

When conducting this assessment, determine if the results are statistically significant and valid.

It is incorrect to only do a mere superficial analysis, which is not robust enough to accept or
reject a hypothesis.

9. Which of the following is/are TRUE about phase 5 of data analytics life cycle?

I. Here, the team considers how best to articulate the findings and outcomes to the various team
members and stakeholders, taking into account caveats, assumptions, and any limitations of the
results.
II. It is critical to articulate the results properly and position the findings in a way that is
appropriate for the audience.
I only

II only

both I and II

neither I nor II
10. When the team has run the model, completed a thorough discovery phase, and learned
a great deal about the datasets, which of the following remaining tasks needs to be
done?

Reflect on the project and consider what obstacles were in the project and what can be
improved in the future.

Make recommendations for future work or improvements to existing processes.

Consider what each of the team members and stakeholders needs to fulfill her responsibilities.

All of the Above

11. Which of the following is a deliverable under the operationalize phase?

Presentation for project sponsors

Presentation for analysts

Technical specifications of implementing the code

All of the Above

12. Which of the following is/are TRUE about testing and training datasets?

I. It is critical to ensure that the training and test datasets are sufficiently robust for the model
and analytical techniques.

II. A simple way to think of these datasets is to view the testing dataset for conducting the initial
experiments and the training datasets for validating an approach once the initial experiments
and models have been run.

I only

II only

both I and II

neither I nor II
QUIZ 6

1. The following regression equation is obtained after applying regression analysis:

Estimate the predicted value of Houseprice if the floor area is 115 and the number of bedrooms
is 3.

297,415

322,525

356,875

409,275

2. Linear Regression can be utilized in the following situations EXCEPT

Income as predicted by educational attainment, number of years of experience and Industry type

House Prices as predicted by floor area, number of bedrooms, and location type (urban or
suburban)

Loan Approval as determined by Age, Credit Score and Monthly Income

Corporate Profit as determined by R&D spending and Salesforce size.

3. Which of the following is/are TRUE about multivariate regression?

There is a regression coefficient for each predictor variable.

It is applied when the dependent variable can be explained by two or more variables.

Multiple Regression can either be linear or nonlinear

All of the Above


4. Which of the following is TRUE about simple linear regression?

and represents the slope and intercept terms of the regression model, respectively.

The random error follows a standard normal distribution.

The dependent variable Y can take on continuous, discrete or categorical variables.

None of the Above

5. Which of the following can be inspected by exploratory data analysis?

outliers

unexpected pattern

changes in variability

All of the Above

6. The following summary output is obtained after applying regression analysis to
the relation between advertising expense and sales revenue of a company:
Which of the following statements is/are TRUE?

I. For every unit increase in TV advertisement expense, there is 0.045 unit increase in sales.
II. 94.7% of the variation in sales revenue can be explained by the variation in predictor
variables.
III. When a company does not spend on any advertising, the company is predicted to have 2.921
units of sales.

I and II only

I and III only

II and III only

I, II and III

7. Which of the following connects the branches of Descriptive and Inferential Statistics?

Graph Theory

Automata

Calculus

Probability

8. This is a statistical technique that is used most frequently to analyze the relationship
between two or more variables where at least two variables need to be continuous.

Clustering Analysis

Regression Analysis

Discriminant Analysis

Canonical Analysis
9. Which of the following is/are TRUE about descriptive and inferential statistics?

I. Descriptive Statistics deals with the collection, organization and presentation of data.

II. Inferential Statistics is concerned with making predictions and drawing conclusions for a larger
group of data.

I only

II only

both I and II

neither I nor II

10. Which of the following is/are TRUE about 'best fitting regression' line?

I. The slope and intercept of the best fitting line are obtained by minimizing the sum of the
residuals.

II. The fitted values are made as 'close' as possible to the observed values.

I only

II only

both I and II

neither I nor II
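
For reference when weighing statement I: ordinary least squares chooses the slope and intercept that minimize the sum of the squared residuals, not the plain sum of residuals. The sketch below uses the closed-form solution for one predictor; the function name `fit_line` and the data points are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Because these points lie exactly on a line, every residual is zero and the fitted values coincide with the observed values.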
QUIZ 7

1. Which of the following is/are TRUE about logistic regression?

I. When the outcome variable is categorical, logistic regression is applicable instead of linear
regression.
II. The predictor variables in logistic regression can be continuous, discrete or categorical
variables.

I only

II only

both I and II

neither I nor II

2. Which of the following is/are TRUE about the theoretical model on logistic regression?

I. Logistic regression is based on the logistic function f(y) whose values range from -1 to 1.
II. The values of y in the logistic function f(y) are not directly observed but instead, only the
values of f(y) in terms of success or failure are observed.

I only

II only

both I and II

neither I nor II
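
As a quick check on statement I, the standard logistic function is f(y) = 1 / (1 + e^(-y)), and its values lie strictly between 0 and 1. The short sketch below evaluates it at a few hypothetical inputs; the function name `logistic` is just an illustrative choice.

```python
import math


def logistic(y):
    """Standard logistic function: maps any real y into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


# Large negative inputs approach 0, large positive inputs approach 1.
for y in (-10, 0, 10):
    print(y, logistic(y))
```

This is why the output of a logistic regression model can be read as a probability of the outcome.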

3. Based on the following results of logistic regression, what is the likelihood of churning
when Age = 40 and Churned_contacts = 5? (Note: Round coefficients up to 2 decimal
places)
0.714

0.623

0.357

0.269

4. Which of the following is a case application of logistic regression?

Income as predicted by educational attainment, number of years of experience and Industry type

House Prices as predicted by floor area, number of bedrooms, and location type (urban or
suburban)

Loan Approval as determined by Age, Credit Score and Monthly Income

Corporate Profit as determined by R&D spending and Salesforce size.

5. Based on the following results of logistic regression, which of the following statements
is/are TRUE?

I. For every 1-unit increase in Age, the value of logistic function increases by 0.16.

II. The regression coefficient for the Married variable is not significant.

I only

II only

both I and II

neither I nor II
6. The following are situations where logistic regression can be applied EXCEPT

loan approval as determined by a person's income, credit score and amount of loan

survival or non survival of patient based on some medical diagnostics

income as determined by years of experience, highest educational attainment and industry type.

University admission of student based on academic credentials and admission test.


Which of the following activities is NOT involved in identifying potential data sources?
Answer: Perform extract, transform, load processes to data
Slide (23,24)

In this phase of the data analytics life cycle, the team delivers final reports, briefings, code, and technical
documents.
Answer: Operationalize
(Slide 12)

Which of the following person provides the funding and gauges the degree of value from the final
outputs of the working team in a data analytics project?
Answer: Project Sponsor
(Slide 7)

Which of the following key questions are helpful to ask during the discovery phase when interviewing
the project sponsor?
Answer: All of the above
(Slide 21)

This refers to the process of cleaning data, normalizing datasets, and performing transformations on the
data.
Answer: Data Conditioning
(Slide 38)

The following activities are part of the discovery phase EXCEPT

Answer: The team catalogs the data sources that the team has access to and identifies additional data
sources that the team can leverage.
(Slide 10-12)

The following is part of the data preparation phase EXCEPT


Answer: Developing Initial Hypothesis
(Slide 30- 42)

Which of the following describes the key role of a Data Engineer?

Answer: Executes the actual data extractions and performs substantial data manipulation to facilitate
the analytics

Which of the following is TRUE about data analytics life cycle?


I. A common mistake made in data science projects is rushing into data collection and analysis, which
precludes spending sufficient time to plan and scope the amount of work involved, understanding
requirements, or even framing the business problem properly. (Slide 4)
II. Having a good data analytics process ensures a comprehensive and repeatable method for conducting
analysis and helps focus time and energy. (Slide 3)
Answer: both I and II

In this phase of the data analytics life cycle, the team assesses the resources available to support the
project in terms of people, technology, time, and data.
Answer: Discovery
(Slide 10)

Quiz 1
These are centralized data containers in a purpose-built space that support business intelligence and
reporting but restrict robust analyses.
Data marts
Data warehouses
Analytic Sandbox
None of the Above
Which of the following are problems encountered in traditional data architecture?
High-value data is hard to reach and leverage, and predictive analytics and data mining activities
are last in line for data.
Data scientists are limited to performing in-memory analytics which will restrict the size of the
datasets they can use.
Data Science projects will remain isolated and ad hoc, rather than centrally managed.
All of the Above

Which of the following is always TRUE about Big Data?


I. Due to its size or structure, Big Data cannot be efficiently analyzed using only traditional databases
or methods.
II. Although the variety of Big Data tends to attract the most attention, generally the volume and
velocity of the data provide a more apt definition of Big Data.
I only

II only

both I and II

neither I nor II

Which of the following is TRUE about the differences between Business Intelligence (BI) and Data Science?
I. Where Data Science problems tend to require highly structured data organized in rows and columns
for accurate reporting, BI projects tend to use many types of data sources, including large or
unconventional datasets.
II. Data Science tends to be more exploratory in nature and may use scenario optimization to deal with
more open-ended questions.
I only

II only

both I and II

neither I nor II
Among the business drivers that push businesses to become more analytical and data driven, this one
involves customer churn, fraud and default
Optimize Business Operations
Identify Business Risk
Predict New Business Opportunities
Comply with Regulatory Requirements
Which of the following is true about the current analytical architecture?
I. Data sources are first loaded into the data warehouse where data needs to be well understood,
structured, and normalized with the appropriate data type definitions. This kind of centralization enables
security, backup, and failover of highly critical data.
II. Once in the data warehouse, data is read by additional applications across the enterprise for BI and
reporting purposes. These are high-priority operational processes getting critical data feeds from the
data warehouses and repositories.
I only

II only

both I and II

neither I nor II

Which of these attributes stand out as defining Big Data characteristics?


Huge volume of data
Complexity of data types and structures
Speed of new data creation and growth
All of the Above
This type of data has no inherent structure, which may include text documents, PDFs, images, and video.
Quasi-structured Data
Unstructured Data
Semi-structured Data
Structured Data

Quiz 2
Examples that fall under this group include financial analysts, market research analysts, life scientists,
operations managers, and business and functional managers.
Data Savvy Professionals
Deep Analytical Talent
Technology and Data Enablers
None of the Above
Which of the following describes the decade beyond 2010 with regard to big data?
I. In this era, everyone and everything is leaving a digital footprint.
II. Data volumes in this decade are measured in terms of petabytes.
I only

II only

both I and II

neither I nor II

The following are recurring sets of activities that data scientists perform EXCEPT
Reframe business challenges as analytics challenges.
Design, implement, and deploy statistical models and data mining techniques on Big Data.
Provide technical expertise to support analytical projects such as provisioning and administrating
analytical sandboxes.
Develop insights that lead to actionable recommendations.
Which of the following groups of players in the data value chain makes sense of the data collected from
various entities?
Data Devices
Data Collectors
Data Aggregators
Data Users and Buyers
The data now is said to come from many sources including
Photos and video footage uploaded to the World Wide Web
Nontraditional IT devices, including the use of radio-frequency identification (RFID) readers, GPS
navigation systems, and seismic processing
Medical information, such as genomic sequencing and diagnostic imaging
All of the Above

Which of the following key roles in the new big data ecosystem has members who possess a combination
of skills to handle raw, unstructured data and to apply complex analytical techniques at massive scales?
Data Savvy Professionals
Deep Analytical Talent
Technology and Data Enablers
None of the Above

The following are the skillsets and behavioral characteristics a data scientist must possess EXCEPT
Qualitative skill
Curious and creative
Skeptical mindset and critical thinking
Communicative and collaborative

Quiz 3

This refers to the process of cleaning data, normalizing datasets, and performing transformations on the
data.
Data Preparation
Data Transformation
Data Conditioning
Data Visualizing
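
The three steps named in the question above (cleaning, normalizing, and transforming) can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the sample readings, the use of min-max scaling, and the function name condition are hypothetical and do not come from the course material.

```python
def condition(values):
    """Clean, normalize, and transform a list of raw readings (illustrative only)."""
    # Clean: drop missing entries (None), e.g. gaps left by an extraction step.
    cleaned = [v for v in values if v is not None]
    # Normalize: rescale to the range [0, 1] with min-max scaling (an assumed choice).
    lo, hi = min(cleaned), max(cleaned)
    span = hi - lo
    normalized = [(v - lo) / span if span else 0.0 for v in cleaned]
    # Transform: round to two decimals so the values are easy to report.
    return [round(v, 2) for v in normalized]


raw = [10, None, 20, 40]  # hypothetical sample data
print(condition(raw))
```

Other choices, such as z-score scaling or imputing missing values instead of dropping them, fit the same three-step outline.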
In this phase of the data analytics life cycle, the team assesses the resources available to support the
project in terms of people, technology, time, and data.
Discovery
Data Preparation
Model Building
Model Planning
The following activities are part of the discovery phase EXCEPT
The team determines how much business or domain knowledge the data scientist needs to
develop models.
The team catalogs the data sources that it has access to and identifies additional data
sources that it can leverage.
The team identifies the main objectives of the project, what needs to be achieved in
business terms, and what needs to be done to meet the needs.
The team identifies the key stakeholders and their interests in the project.
Which of the following describes the key role of the Data Engineer?
provides access to key databases or tables and ensures the appropriate security levels are in place
related to the data repositories.
executes the actual data extractions and performs substantial data manipulation to facilitate the
analytics.
provides subject matter expertise for analytical techniques, data modeling, and applying valid
analytical techniques to given business problems.
gives business domain expertise based on a deep understanding of the data, key performance
indicators (KPIs), key metrics, and business intelligence from a reporting perspective.
Which of the following activities is NOT involved in identifying potential data sources?
Capture aggregate data sources
Evaluate the data structures and tools needed
Perform extract, transform, load processes to data
Scope the sort of data infrastructure needed
In this phase of the data analytics life cycle, the team delivers final reports, briefings, code, and technical
documents.
Model Building

Model Planning

Communicate Results

Operationalize

Which of the following is TRUE about the data analytics life cycle?


I. A common mistake made in data science projects is rushing into data collection and analysis, which
precludes spending sufficient time to plan and scope the amount of work involved, understanding
requirements, or even framing the business problem properly.
II. Having a good data analytics process ensures a comprehensive and repeatable method for
conducting analysis and helps focus time and energy.

I only

II only

both I and II

neither I nor II

The following are part of the data preparation phase EXCEPT


Performing ETLT
Survey and Visualize
Developing Initial Hypothesis
Preparing the Analytic Sandbox
Which of the following key questions are helpful to ask during the discovery phase when interviewing
the project sponsor?
What is the desired outcome of the project?
What data sources are available?
What industry issues may impact the analysis?
All of the Above
Which of the following persons provides the funding and gauges the degree of value from the final
outputs of the working team in a data analytics project?
Project Manager
Project Sponsor
Business Intelligence Analyst
Business User

Quiz 4

Which of the following is TRUE about model building?


I. The phases of model planning and model building can overlap quite a bit, and in practice one can
iterate back and forth between the two phases for a while before settling on a final model.
II. Although the modeling techniques and logic required to develop models can be highly complex, the
actual duration of this phase can be short compared to the time spent preparing the data and defining
the approaches.
I only

II only

both I and II

neither I nor II

Which of the following is a free or open source tool available to data analytics practitioners?
SAS Enterprise Miner
SPSS Modeler
Octave
Alpine Miner
Which of the following is a deliverable under the operationalize phase?
Presentation for project sponsors
Presentation for analysts
Technical specifications of implementing the code
All of the Above
The following activities are involved under the model planning phase EXCEPT
Assess the structure of the datasets.
Ensure that the analytical techniques enable the team to meet the business objectives and accept
or reject the working hypotheses.
Evaluate whether similar, existing approaches are available or if the team will need to create
something new.
Assess the validity of the model and its results.
Which of the following is TRUE about model planning?
I. Under this phase, the team develops datasets for training, testing, and production purposes.
II. Data Exploration, Variable and Model selection characterize this phase.
I only

II only
both I and II

neither I nor II

Which of the following is TRUE about the final phase of the data analytics life cycle?
I. In the final phase, the team communicates the benefits of the project more broadly and sets up a
pilot project to deploy the work in a controlled way before broadening the work to a full enterprise or
ecosystem of users.
II. Under this phase, the team reflects on the project and considers what obstacles were in the project
and what can be improved in the future, and makes recommendations for future work or
improvements to existing processes.
I only

II only

both I and II

neither I nor II

In creating robust models, the following questions need to be considered EXCEPT


Does the model avoid intolerable mistakes?
How consistent are the contents and files?
Do any of the inputs need to be transformed or eliminated?
Will the kind of model chosen support the runtime requirements?
Which of the following are activities done under phase 5 of the data analytics life cycle?
The team determines if it succeeded or failed in its objectives.
The team reflects on the implications of these findings and measures the business value.
The team records all the findings and then selects the three most significant ones that can be shared
with the stakeholders.
All of the Above

Quiz 5

Prior to any regression modelling, the data should always be inspected for the following EXCEPT
Data-entry errors
Expected pattern
Outliers
Missing values
Which of the following statements is/are ALWAYS TRUE?
I. Inferential statistics consists of Estimation and Hypothesis Testing
II. The link between inferential and descriptive statistics is probability
I only

II only

both I and II

neither I nor II

In predicting Sales Revenue using Newspaper Ads Expenses, we have the following regression results.

Estimate the predicted sales if newspaper ads expenses are 60 units.

15.6

17.4

19.2

20.8
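
A simple-regression prediction like the one asked for above is the fitted intercept plus the slope times the input value. The sketch below shows only the arithmetic. The coefficient values are hypothetical placeholders, not the regression results from the quiz, so the printed number is not one of the answer choices.

```python
def predict_simple(intercept, slope, x):
    """Predicted response of a simple linear regression: y_hat = intercept + slope * x."""
    return intercept + slope * x


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
# Substitute the intercept and slope reported in the actual regression output.
print(predict_simple(intercept=2.0, slope=0.5, x=60))  # 2.0 + 0.5 * 60 = 32.0
```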

The following characterizes inferential statistics EXCEPT


Draw conclusions for a larger group/data
Determine relationships
Present data
Make prediction
Which of the following is/are ALWAYS TRUE about simple regression?
I. Simple regression attempts to predict the dependent variable using more than one independent
variable.
II. Simple regression consists of one regression coefficient for each explanatory variable.
I only

II only

both I and II

neither I nor II

Which of the following is/are ALWAYS TRUE about regression analysis?


I. It’s the technique used most frequently to analyze the relationship between two or more variables.
II. Predictor variables could either be discrete or continuous.
I only

II only

both I and II

neither I nor II

In predicting Sales Revenue using TV and Radio Ads Expenses, we have the
following regression results

Estimate the predicted sales if TV and radio ads expenses are 200 and 50, respectively.
19.3

21.5

23.7

25.9
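
With two predictors, the prediction is the intercept plus each coefficient times its own input. The sketch below assumes a plain linear model with no interaction terms. The coefficients are hypothetical placeholders rather than the values from the quiz's regression results.

```python
def predict_multiple(intercept, coefficients, inputs):
    """Predicted response of a multiple linear regression:
    y_hat = intercept + b1 * x1 + b2 * x2 + ...
    """
    return intercept + sum(b * x for b, x in zip(coefficients, inputs))


# Hypothetical coefficients for (TV, Radio); replace with the fitted values.
tv_radio_coefficients = [0.5, 0.25]
print(predict_multiple(1.0, tv_radio_coefficients, [200, 50]))  # 1.0 + 100.0 + 12.5 = 113.5
```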

Quiz 6
Based on the following results of logistic regression, which of the following statements is/are TRUE?
I. For every 1-unit increase in Age, the value of the logistic function increases by 0.16.
II. The regression coefficient for the Married variable is not significant.

I only

II only

both I and II

neither I nor II

Based on the following results of logistic regression, what is the likelihood of churning when Age = 40
and Churned_contacts = 5? (Note: Round coefficients to 2 decimal places)

0.714

0.623

0.357

0.269
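
The likelihood asked for above is obtained in two steps: compute the linear predictor y from the fitted coefficients, then pass y through the logistic function f(y) = 1 / (1 + e^(-y)). The sketch below follows those steps. The coefficient values are hypothetical placeholders rather than the quiz's regression output, and the name churn_probability is made up for this illustration.

```python
import math


def logistic(y):
    """Logistic function f(y) = 1 / (1 + e^(-y)); maps any real y into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


def churn_probability(intercept, b_age, b_contacts, age, contacts):
    """Two-step calculation: linear predictor first, then the logistic function."""
    y = intercept + b_age * age + b_contacts * contacts
    return logistic(y)


# Hypothetical coefficients, chosen so that y works out to 0 and f(y) to 0.5.
# Replace them with the (rounded) coefficients from the actual regression table.
p = churn_probability(intercept=1.0, b_age=-0.05, b_contacts=0.2, age=40, contacts=5)
print(round(p, 3))
```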

Which of the following is TRUE about the logistic function?


I. As the value of y increases, the likelihood of the event f(y) also increases.
II. The values of y are not directly observed but rather, only the value of f(y) in terms of success or
failure is observed.
I only

II only

both I and II

neither I nor II
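
To see how f(y) behaves as y grows, the short check below evaluates the standard logistic function f(y) = 1 / (1 + e^(-y)) at a few arbitrarily chosen values of y. It is a numerical illustration only, assuming the usual definition of the function.

```python
import math


def logistic(y):
    """Standard logistic function f(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


ys = [-4, -2, 0, 2, 4]           # arbitrary sample points
fs = [logistic(y) for y in ys]

# Each value is larger than the one before it, and all stay strictly between 0 and 1.
is_increasing = all(a < b for a, b in zip(fs, fs[1:]))
print(is_increasing)
for y, f in zip(ys, fs):
    print(f"y = {y:>2}  f(y) = {f:.3f}")
```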

Which of the following is TRUE about logistic regression?


I. When the outcome variable is categorical in nature, logistic regression can be used to
predict the likelihood of an outcome based on the input variables.
II. Logistic regression can only be applied to an outcome variable with two values such as
true/false, pass/fail, or yes/no.
I only

II only

both I and II

neither I nor II

The following are examples of applications for logistic regression EXCEPT


A model on patient’s successful response to a specific medical treatment with variables including
age, weight, blood pressure, and cholesterol levels.
A churn model for a customer switching to a new network given age and number of contacts who
churned.
A model to determine the relationship of amount of income given age, education, number of years
working, and gender.
A model to determine the likelihood of a person buying a new automobile given age, income and
gender.
SQ1
Sunday, 30 May 2021 9:10 pm

QUESTION 1
This refers to immense volumes of data, both unstructured and structured, that can be used to
analyze insights which can lead to better decisions and strategic business moves.
Huge Data
Large Data
Big Data
None of the Above

QUESTION 2
The following are included in the ten V’s of Big Data EXCEPT
Velocity
Visual
Veracity
Volume

QUESTION 3
Which of the following are common types of big data?
Machine to Machine data
Web and Social Media data
Biometrics data
All of the Above

QUESTION 4
Which of the following is/are ALWAYS TRUE about data science?
I. Deals with unstructured and structured data.
II. It’s a combination of statistics, mathematics, programming, problem-solving, capturing
data in ingenious ways.

I only
II only
both I and II
neither I nor II

QUESTION 5
In obtaining data in the data science process, the following questions are asked EXCEPT
Are there anomalies?
How were the data sampled?
Which data are relevant?
None of the Above

QUESTION 6
Which of the following are components of data analytics?
Data mining
Data visualization
Decision Analysis
All of the Above

QUESTION 7
The following are true about data analytics EXCEPT
the science of examining raw data with the purpose of drawing conclusions about that
information.
focus lies in describing the data in its entirety.
involves applying an algorithmic or mechanical process to derive insights.
used in a number of industries to allow the organizations and companies to make better
decisions as well as verify and disprove existing theories or models.

This study source was downloaded by 100000820648477 from CourseHero.com on 11-12-2022 00:05:32 GMT -06:00

https://www.coursehero.com/file/97441363/SQ1pdf/