Data Science Questions

1. In linear regression, dummy variables are used to represent categorical values in regression analysis by coding each category as a separate predictor variable. 2. A Type II error occurs when a true alternative hypothesis is mistakenly rejected. 3. The numbers 1-4 representing geographical areas (South, East, North, West) are an example of categorical data.

Uploaded by

Poonam Sharma

1. In linear regression, a dummy variable is used:
a. To represent missing data in each sample
b. To include hypothetical data in the regression equation
c. To represent residual value
d. To include categorical value in regression equation
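As a minimal sketch of how a categorical value can enter a regression equation, the snippet below turns a category column into 0/1 indicator columns in plain Python, dropping one level as the reference category. The function name `dummy_encode` and the sample `regions` list are assumptions made for illustration only.

```python
def dummy_encode(values, drop_first=True):
    """Return (column_names, rows) of 0/1 indicators for a categorical list."""
    levels = sorted(set(values))
    # Drop one level so the indicators are not collinear with the intercept.
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical sample data
regions = ["South", "East", "North", "West", "East"]
columns, encoded = dummy_encode(regions)
print(columns)
for region, row in zip(regions, encoded):
    print(region, row)
```

Each remaining column can then be used as an ordinary numeric predictor; the dropped level ("East" here) is represented by all indicators being zero.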

2. A Type II error is committed when:
a. The sample size was too small
b. Not enough information has been available
c. A true alternative hypothesis was mistakenly rejected
d. A true null hypothesis was mistakenly rejected

3. A researcher is gathering data from four geographical areas designated South = 1,
East = 2, North = 3, and West = 4. The designated geographical area represents:
a. Categorical Data
b. Qualitative data
c. Quantitative data
d. Label data

4. The range of probability is:
a. Always smaller than zero
b. Always greater than zero
c. Zero to one
d. Between -1 and 1

5. Cluster analysis, or clustering, is the assignment of a set of observations into subsets so
that observations in the same cluster are similar in some sense. Hence, clustering is a
method and not a model.
a. True
b. False
6. Give an example where you can use a confusion matrix for result determination.
a. Yes and No
b. Response and No Response
c. Good Customer and Bad Customer
d. All Binary cases.
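For any of the binary cases listed above, a confusion matrix simply counts how predicted labels line up with actual labels. The following is a minimal sketch in plain Python; the function name `confusion_matrix`, the choice of "Yes" as the positive class, and the sample labels are all assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a binary outcome."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


# Hypothetical labels
actual = ["Yes", "No", "Yes", "Yes", "No", "No"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No"]
matrix = confusion_matrix(actual, predicted)
print(matrix)
```

The same function works for "Response"/"No Response" or "Good Customer"/"Bad Customer" by passing a different `positive` label.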

7. Do you think 50 small decision trees are better than one large tree? If yes, state the
reason why.
Ans: Yes.
- More robust model (an ensemble of weak learners that combine to make a strong learner)
- Better to improve a model by taking many small steps than fewer large steps
- If one tree is erroneous, it can be auto-corrected by the following trees
- Less prone to overfitting
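One way to see why many weak learners can form a strong one is to compute the accuracy of a simple majority vote. The sketch below is an illustration under a strong assumption: every learner is right with the same probability and errs independently of the others, which real trees trained on the same data do not fully satisfy. It models voting rather than the step-by-step correction mentioned in the answer, and the function name `majority_accuracy` and the 0.6 accuracy figure are hypothetical.

```python
from math import comb


def majority_accuracy(n_learners, p_correct):
    """Probability that a majority vote of independent learners is right.

    Ties (possible when n_learners is even) are counted as a coin flip.
    """
    total = 0.0
    for k in range(n_learners + 1):
        # Binomial probability that exactly k learners are correct.
        prob_k = comb(n_learners, k) * p_correct**k * (1 - p_correct) ** (n_learners - k)
        if 2 * k > n_learners:
            total += prob_k
        elif 2 * k == n_learners:
            total += 0.5 * prob_k
    return total


single = majority_accuracy(1, 0.6)
ensemble = majority_accuracy(50, 0.6)
print(f"single learner: {single:.3f}, 50 learners: {ensemble:.3f}")
```

Under these assumptions the vote of 50 learners is noticeably more accurate than any single one; correlated errors reduce that gain.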

8. Which of the following time series models is a combination of two different models
without any additional features?
a. ARMA
b. ARIMA
c. ARCH
d. GARCH

9. Which of the following is used to hide limitations of Java behind an API for Cascading?
a. Scalding
b. Cascalog
c. Hcatalog
d. Hcalding

10. Which of the following is included in the 5 V’s of Big Data?
a. Volume
b. Variety
c. Velocity
d. All of the above

11. After you have the data, which step do you proceed to next?
a. Data Wrangling
b. Data Modeling
c. Data Visualization
d. Data Mining
12. Which of the following statements describes how mobile devices, the use of computers
in more and more everyday interactions, and the ability to connect with other devices
almost anywhere are changing society?
I. People are able to use mobile devices for new applications such as finding
directions or finding restaurants
II. Data can be collected from thousands of sources and can be combined to
provide new services to individuals and companies
III. Buildings, cars, classrooms, and offices can now be engineered with sensors to
automate tasks like adjusting the thermostat or even driving
IV. Data that is collected can be used to identify social problems
a. III only
b. I and III
c. II and IV
d. I, II, III, and IV

13. Which data mining technique is more suitable for categorical data analysis?
a. Decision Tree
b. Neural Network
c. Association Rule
d. Linear Regression

14. Which of the following is an example of a time series problem?
1. Estimating the number of hotel bookings for the next 6 months.
2. Estimating the stock price of a company's share for the next 3 years.
3. Estimating the house rent in a particular area.
a. Only 3
b. Only 2
c. 1 and 2
d. 1, 2 and 3

15. When working on neural network models, the model training time depends on the size
of the network.
a. True
b. False

16. What do you understand by machine learning?
a. The autonomous acquisition of knowledge through the use of computer
programs
b. The autonomous acquisition of knowledge through the use of manual programs
c. The selective acquisition of knowledge through the use of computer programs
d. The selective acquisition of knowledge through the use of manual programs
17. An ANN is composed of a large number of highly interconnected processing elements
(neurons) working in unison to solve problems.
a. True
b. False

18. What will be the output of the following Python code?


Names = ["Ram", "Shayam", "Mohan", "Mahesh"]
print(name[-1][-1])
a. n
b. Shayam
c. All the elements in Name
d. Error
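The question above rests on two Python behaviours: negative indices count from the end of a sequence, and variable names are case-sensitive. A short sketch with a hypothetical `cities` list (not the list from the question) shows both:

```python
cities = ["Delhi", "Mumbai", "Pune"]  # hypothetical sample data

last_city = cities[-1]        # -1 selects the last list element
last_letter = cities[-1][-1]  # the second -1 selects the last character of that string
print(last_city, last_letter)

# Names are case-sensitive: `Cities` was never defined, so this raises NameError.
try:
    Cities[-1]
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
print(error_name)
```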

19. What will be the output of the following Python code?


class change:
    def __init__(self, x, y, z):
        self.a = x + y + z

x = change(1,2,3)
y = getattr(x, 'a')
setattr(x, 'a', y+1)
print(x.a)
a. 6
b. 7
c. 0
d. Error
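`getattr` and `setattr` are built-in functions that read and write an attribute whose name is given as a string. The sketch below uses a hypothetical `Point` class, separate from the class in the question, to show the equivalence with dot notation and the optional default of `getattr`.

```python
class Point:
    """Hypothetical class used only to illustrate getattr/setattr."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(2, 5)
value = getattr(p, "x")          # same as p.x
setattr(p, "x", value * 10)      # same as p.x = value * 10
missing = getattr(p, "z", None)  # a default avoids AttributeError for unknown names
print(p.x, missing)
```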

20. For the table given below write the following queries:

a. Find the total sales at the end of the year 2013 for all regions.
Select total_sales from table_name where year = '2013';
b. How to show schema of the table.
Describe table_name; (MySQL/Oracle syntax; other databases use their own command)

c. Delete last 3 rows of the table
Delete from table_name where S_Id = '108';
Delete from table_name where S_Id = '109';
Delete from table_name where S_Id = '110';

d. Select all the records where ORDERS is <1000
Select * from table_name where ORDERS < 1000;
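The table itself is not reproduced in the text, so the four queries can only be tried against an assumed schema. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `sales` table; the column types, the `region` column, and every row are invented for illustration, and only the column names `S_Id`, `year`, `total_sales`, and `ORDERS` come from the queries above. SQLite exposes the schema through `PRAGMA table_info`, which differs from the command used by other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(S_Id TEXT, region TEXT, year TEXT, total_sales REAL, ORDERS INTEGER)"
)
# Hypothetical rows; the original table is not available.
rows = [
    ("107", "South", "2013", 1200.0, 900),
    ("108", "East", "2013", 800.0, 1500),
    ("109", "North", "2014", 950.0, 400),
    ("110", "West", "2013", 500.0, 1000),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# a. Sum of total_sales over all regions for 2013.
total_2013 = conn.execute(
    "SELECT SUM(total_sales) FROM sales WHERE year = '2013'"
).fetchone()[0]

# b. Column names and declared types (SQLite-specific).
schema = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(sales)")]

# d. Records with fewer than 1000 orders.
small_orders = conn.execute(
    "SELECT S_Id FROM sales WHERE ORDERS < 1000 ORDER BY S_Id"
).fetchall()

# c. Delete the last three rows by their identifiers in one statement.
conn.execute("DELETE FROM sales WHERE S_Id IN ('108', '109', '110')")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(total_2013, schema, small_orders, remaining)
```

Relational tables have no inherent row order, so "last 3 rows" is only meaningful relative to a key such as `S_Id`; the delete above names the keys explicitly for that reason.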
