Machine Learning Lab Viva QA

The document contains a list of machine learning lab viva questions and answers, covering fundamental concepts such as data analysis, statistics, machine learning, and data visualization techniques. It differentiates between various statistical methods, types of data, and machine learning algorithms, including supervised and unsupervised learning. Additionally, it discusses practical applications of techniques like decision trees, clustering, and regression analysis.

Uploaded by

skshyam106
Copyright
© All Rights Reserved

Machine Learning Lab Viva Questions and Answers (BSCL606)


1. What is data analysis?

The process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making.

2. Why do we need to visualize data?

To understand patterns, trends, and outliers in data quickly and effectively.

3. Mention the differences between box plot and histograms.

- Box plot summarizes a distribution using quartiles (minimum, Q1, median, Q3, maximum) and highlights outliers.
- Histogram shows how many values fall into each interval (bin), revealing the shape of the distribution.

4. What is statistics?

The science of collecting, analyzing, interpreting, and presenting data.

5. What is descriptive statistics?

It summarizes and describes the features of a dataset (e.g., mean, median, mode).

6. What is inferential statistics?

It makes predictions or inferences about a population based on a sample.

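7. Differentiate between Machine learning and AI.

- AI is the broader field of building machines that perform tasks normally requiring human intelligence.
- ML is a subset of AI in which machines learn patterns from data instead of following explicitly programmed rules.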

8. What are the different types of data?

- Numerical (continuous, discrete)
- Categorical (nominal, ordinal)
- Time series
- Text

9. Illustrate the use of Histograms.

Histograms visualize the frequency distribution of continuous data, e.g., how many students scored within each range of marks.

10. Mention the use of Box plots.

To show the spread of a dataset and detect outliers using the minimum, Q1, median, Q3, and maximum.

11. What is IQR?

Interquartile Range = Q3 − Q1; it measures the spread of the middle 50% of the data.
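The formula above can be sketched with Python's standard library. The `scores` list is a made-up sample, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different quartiles for the same data.

```python
import statistics


def iqr(values):
    """Return (Q1, Q3, IQR) using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample
q1, q3, spread = iqr(scores)
print(f"Q1={q1}, Q3={q3}, IQR={spread}")
```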

12. How to handle missing values?

By removing the affected rows or columns, imputing the mean or median, forward/backward filling, or predicting the missing values with ML models.
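A minimal sketch of three of these strategies on a plain Python list, assuming missing entries are stored as `None`. The helper names and the `readings` data are illustrative only; in lab work a data-frame library would normally do this.

```python
import statistics


def drop_missing(values):
    """Removal: keep only the observed values."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Mean imputation: replace each missing entry with the mean of the observed ones."""
    mean = statistics.mean(drop_missing(values))
    return [mean if v is None else v for v in values]


def forward_fill(values):
    """Forward fill: carry the last observed value forward.

    A missing value at the very start stays None because nothing precedes it.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


readings = [10, None, 14, None, 18]  # hypothetical data
print(drop_missing(readings))
print(impute_mean(readings))
print(forward_fill(readings))
```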

13. What is EDA?

Exploratory Data Analysis is the initial step of summarizing and visualizing data to understand its structure, spot anomalies, and form hypotheses before modeling.

14. Define outliers and how to detect them.

Outliers are data points that lie far from the other observations. They can be detected using a box plot, the Z-score, or the IQR method.
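The two numeric methods could look roughly like this. The 1.5 × IQR fence and the Z-score cutoff are common conventions rather than fixed rules, and the `data` list is invented for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 13, 12, 95]  # hypothetical sample
print(iqr_outliers(data))
# With only 10 points a single extreme value cannot reach a Z-score above 3,
# so a lower cutoff is assumed here.
print(zscore_outliers(data, threshold=2.5))
```

The small-sample caveat in the last comment is one reason the IQR method is often preferred for short datasets.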

15. What is dimensionality reduction?

Reducing the number of input variables/features to simplify models (e.g., PCA).

16. Explain PCA.

Principal Component Analysis transforms the original features into a smaller set of uncorrelated components that retain most of the variance.
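As a rough illustration, the sketch below reduces two features to one component. It assumes exactly two features so the covariance matrix is 2×2 and its leading eigenvalue and direction have a closed form; real PCA on more features uses a general eigen-decomposition or SVD. The function name and the sample points are hypothetical.

```python
import math


def pca_2d_first_component(points):
    """Return (direction, explained_variance_ratio, projections) for 2-D points.

    Assumes the points are not all identical (total variance > 0).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix of the centered data.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue = variance captured by the first component.
    trace, det = sxx + syy, sxx * syy - sxy ** 2
    largest = trace / 2 + math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    # Angle of the major axis gives the first principal direction.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    projections = [(x - mx) * direction[0] + (y - my) * direction[1] for x, y in points]
    return direction, largest / trace, projections


points = [(1, 1), (2, 2), (3, 3), (4, 4)]  # hypothetical, perfectly correlated features
direction, explained, projections = pca_2d_first_component(points)
print(direction, explained, projections)
```

Because the two features here move together, one component should carry essentially all of the variance.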

17. Differentiate between supervised and unsupervised learning.

- Supervised: Labeled data (e.g., classification)
- Unsupervised: Unlabeled data (e.g., clustering)

18. Differentiate between regression and classification.

- Regression predicts continuous values.
- Classification predicts categories.

19. Give examples for binary and multi-class classification.

- Binary: Spam vs. Not spam
- Multi-class: Classifying fruits as apple, banana, or orange

20. Give examples for univariate, multivariate and bivariate data analysis.

- Univariate: Histogram of age
- Bivariate: Scatter plot of height vs. weight
- Multivariate: Analyzing age, income, and score together

21. Name the measures of central tendency.

Mean, Median, Mode.

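22. What is variance, bias and standard deviation?

- Variance: Average squared deviation from the mean; it measures the spread of the data.
- Bias: Error introduced by overly simple or wrong assumptions in a model.
- Standard deviation: Square root of the variance.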
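The descriptive measures named in the last two answers can be checked with the `statistics` module. The sample below is hypothetical, and the population forms (`pvariance`, `pstdev`) are assumed; use `variance` and `stdev` for the sample forms that divide by n − 1. Bias is a property of a model rather than of a dataset, so it is not computed here.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # square root of the population variance

print(mean, median, mode, variance, std_dev)
```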

23. What is a heatmap?

A graphical representation that shows values as colors, often used for correlation matrices.

24. Define correlation matrix.

A table showing the pairwise correlation coefficients between variables.
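A small sketch of such a table, assuming the Pearson coefficient and columns held in a plain dictionary. The column names and values are made up, and the columns must not be constant or the denominator becomes zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Build a nested dict: matrix[a][b] is the correlation of column a with column b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


columns = {  # hypothetical dataset
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 6, 8, 10],
    "errors": [10, 8, 6, 4, 2],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Here `hours` and `score` rise together while `errors` falls as `hours` rises, so the matrix should contain both a strong positive and a strong negative entry.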

25. Mention types of correlation.

Positive, Negative, and No correlation.

26. Explain the importance of pair plot.

It shows the pairwise relationships between variables as a grid of scatter plots, making correlations and clusters easy to spot.

27. Explain Find-S algorithm and its importance.

A concept learning algorithm that finds the most specific hypothesis consistent with the positive training examples.
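A minimal sketch of Find-S, assuming each example is a tuple of attribute values with a boolean label and that "?" stands for "any value". The weather-style rows are illustrative sample data, not taken from the lab manual.

```python
def find_s(examples):
    """Return the most specific hypothesis covering every positive example.

    `examples` is a list of (attributes, is_positive) pairs.
    Returns None if there are no positive examples.
    """
    hypothesis = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive example
        else:
            # Generalize only the attributes that disagree with this example.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


examples = [  # hypothetical training data
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(examples))
```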

28. What do you mean by non-parametric algorithms?

Algorithms that do not assume a fixed functional form or a specific data distribution; their complexity can grow with the amount of data (e.g., KNN).

29. Explain importance of KNN.

It is simple and effective for classification and regression; it predicts from the closest training points, measured by distance.
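A bare-bones KNN classifier might look like the following. It assumes Euclidean distance via `math.dist` (Python 3.8+), majority voting, and a tiny invented training set; ties are broken by whichever label was counted first.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _point, label in neighbors)
    return votes.most_common(1)[0][0]


train = [  # hypothetical labeled points
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
print(knn_predict(train, (2.0, 2.0)))
print(knn_predict(train, (6.0, 6.5)))
```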

30. How do you compute the distance between data points?

Using metrics like Euclidean, Manhattan, or Minkowski distance.
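These three metrics are related: Minkowski distance with p = 1 is Manhattan distance and with p = 2 is Euclidean distance. A short sketch, with the two points chosen arbitrarily:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    return minkowski(a, b, 1)


def euclidean(a, b):
    return minkowski(a, b, 2)


a, b = (0, 0), (3, 4)  # hypothetical points
print(manhattan(a, b), euclidean(a, b), minkowski(a, b, 3))
```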

31. Explain Decision Tree.

A tree-like model in which internal nodes test features, branches represent the test results, and leaves represent the predicted outcomes.

32. What is information gain?

The reduction in entropy achieved by splitting a dataset on an attribute.

33. What is entropy?

A measure of disorder or impurity in a dataset.
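Entropy and the information gain from the previous answer can be sketched together. The code assumes Shannon entropy in bits (log base 2) over class labels; the label lists and the two candidate splits are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5  # hypothetical labels
perfect_split = [["yes"] * 5, ["no"] * 5]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "yes", "no", "no", "no"]]

print(entropy(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each subset as mixed as the parent gains nothing.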

34. What is Gini index?

A measure of impurity used to choose splits in decision trees (CART).
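A short sketch of the usual definition, Gini = 1 − Σ pᵢ², where pᵢ is the fraction of samples in class i. The label lists are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, higher when classes are mixed."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


pure = ["a", "a", "a", "a"]
mixed = ["a", "a", "b", "b"]
print(gini(pure), gini(mixed))
```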

35. What is Bayesian learning?

Probabilistic learning based on Bayes' Theorem.
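As a worked illustration of Bayes' Theorem, P(H | E) = P(E | H) · P(H) / P(E), the sketch below updates a prior belief after observing evidence. The function name and all probabilities are hypothetical numbers chosen only to show the arithmetic.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H and observed evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(E)
    return likelihood * prior / evidence


# Hypothetical test: 1% prior, 90% detection rate, 5% false positives.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the posterior stays well below 50% here because the prior is so small.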

36. Differentiate between linear regression and polynomial regression.

- Linear: Fits a straight line to the data
- Polynomial: Fits nonlinear curves using higher-order terms of the input

37. Explain locally weighted regression.

A regression where points close to the query point are weighted more heavily.
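One way to sketch this for a single input feature: give every training point a Gaussian weight based on its distance to the query, then fit a weighted straight line and evaluate it at the query. The Gaussian kernel, the bandwidth name `tau`, and the sample data are assumptions for illustration; a very small `tau` can make the weighted fit numerically unstable.

```python
import math


def lwr_predict(xs, ys, query, tau=1.0):
    """Predict y at `query` with a weighted least-squares line fitted around it."""
    # Points near the query get weights close to 1, distant points close to 0.
    weights = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * query


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]  # hypothetical nonlinear data
print(lwr_predict(xs, ys, 2.5, tau=0.5))
print(lwr_predict(xs, ys, 2.5, tau=50.0))  # large tau behaves like ordinary linear regression
```

A new local line is fitted for every query, which is why the method can follow curves that a single global line cannot.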

38. Why do we use standard scaler?

To standardize features to mean = 0 and standard deviation = 1, so that features measured on larger scales do not dominate distance-based or gradient-based algorithms.
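The transformation itself is z = (x − mean) / std. A minimal sketch for one feature, assuming the population standard deviation and a made-up sample; it does not handle a constant feature, where the standard deviation is zero.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


feature = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical feature column
scaled = standardize(feature)
print(scaled)
print(statistics.fmean(scaled), statistics.pstdev(scaled))
```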

39. Mention the applications of polynomial regression.

Modeling growth curves, market trends, or any nonlinear patterns.

40. Mention the applications of clustering algorithms.

Customer segmentation, image compression, anomaly detection.

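41. What is elbow technique?

Used in K-Means to choose the number of clusters: plot the cost (within-cluster sum of squares) against k and pick the k where the curve bends and further increases give little improvement.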
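To make the idea concrete, the sketch below runs a simplified one-dimensional K-Means for several values of k and records the cost. The deterministic initialization, the fixed iteration count, and the sample points are assumptions made to keep the example short and reproducible; library implementations use random or k-means++ initialization.

```python
def kmeans_1d(points, k, iterations=20):
    """Simplified 1-D K-Means. Returns (centers, within-cluster sum of squares)."""
    pts = sorted(points)
    # Deterministic start: pick k evenly spaced values from the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    cost = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, cost


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]  # hypothetical data, three groups
costs = {k: kmeans_1d(points, k)[1] for k in range(1, 6)}
for k, cost in costs.items():
    print(k, round(cost, 2))
```

For this data the cost should drop sharply up to k = 3 and then flatten, which is the "elbow".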

42. What is normalization?

Scaling data to a fixed range (such as 0 to 1) so that all features are on a comparable scale.
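Min-max scaling is a common way to do this: x' = (x − min) / (max − min). A small sketch, with hypothetical values and an assumed convention of mapping a constant feature to the lower bound:

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    low, high = min(values), max(values)
    if high == low:
        return [new_min for _ in values]  # constant feature: avoid division by zero
    scale = (new_max - new_min) / (high - low)
    return [new_min + (v - low) * scale for v in values]


ages = [10, 20, 30, 40, 50]  # hypothetical feature column
print(min_max_scale(ages))
```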

43. List the applications of decision tree.

Credit scoring, medical diagnosis, loan approval, fraud detection.
