Technical Question Bank

The document provides an overview of Tableau and SQL, highlighting the advantages of Tableau over other BI tools, the use of parameters, joins, filters, and the differences between various visualization techniques. It also explains key concepts in SQL such as JOIN clauses, aggregate functions, and the differences between various SQL commands and functions. Additionally, it covers handling NULL values, LOD functions, and the distinctions between sets and groups in Tableau.

Mastering Tableau

1. Why is Tableau better than other BI tools? Give reasons.

Tableau is better than other BI tools because-


● It is easy to use.
● It integrates with Python, R and other scripting languages.
● It also provides high reliability and performance and can handle millions of data
records.
● It can ingest data from any source and create beautiful dashboards.

2. What is a Parameter in Tableau?

A parameter is a workbook variable in Tableau, such as a number, date, or string, that
can replace a constant value in a calculation, filter, or reference line. A parameter
allows the end user to change the content that appears in worksheets and
dashboards. It can help narrow down the focus of analysis by letting the user enter
conditions as parameter values.

3. Explain the different types of JOINS in Tableau.

There are four types of JOINS in Tableau:


● INNER JOIN- It is used to join two tables based on the matched common
values.
● LEFT JOIN- It is used to join two tables based on the matched common
values and all values in the left table.
● RIGHT JOIN- It is used to join two tables based on the matched common
values and all values in the right table.
● FULL OUTER JOIN- It is used to join two tables that contain all values from both
tables irrespective of the matched common values.

4. What are the advantages and disadvantages of Tableau?

Advantages of Tableau are:


● It has multiple data source connections.
● It is easy to use.
● It has high visualisation capabilities.
● It has high performance i.e. it can operate on a large amount of data.
● It is mobile friendly.
Disadvantages of Tableau are:
● It has a high cost.
● It needs manual effort.
● There is no automatic refreshing of reports.
● It requires knowledge of SQL.
● There is no version control.

5. What is a Filter? Explain different types of filters in Tableau.

A filter is used to restrict the data based on dimensions or measures and reduce the
number of records present in a dataset for faster processing.
There are six types of filters in Tableau:
● Extract Filters- They are used to filter the extracted data from the data sources.
These filters are used only if the user extracts the data from the data source. It
also helps to lower the queries to the data source.

● Context Filter- It creates datasets based on the original data sheet and the
presets chosen for compiling the data. It helps in applying a relevant, actionable
context to the entire data analysis in Tableau.

● Data Source Filter- It restricts the data that is available from a data source,
for example to hide sensitive data. It helps in minimising data feeds for faster
processing.

● Dimension Filter- Filters that are applied on dimensions are called dimension
filters. With the help of these filters we can select or deselect the values, or we
can perform wildcard selection or condition based selection where we can use
complex formulas or simple conditions to filter out data.

● Measure Filter- Filters that are applied on measurable or quantitative data are
called measure filters. The Measure filter has a range of values- At Least, At
Most and Special sub filters.

● Table Calculation Filter- This filter is applied after the view has been computed,
so it hides marks in the view without filtering the underlying data.

6. What is the difference between a Tree Map and a Heat Map?

Tree Map
● It is used to show a huge amount of hierarchical structured data.
● The levels in the hierarchy of the tree map are visualised as rectangles
containing other rectangles which represent a category in a column.
● A bigger rectangle represents a high frequency category in a column, while a
smaller rectangle represents a low frequency category.

Heat Map
● It is a graphical representation of data where values are depicted by colour.
● Heat maps make it easy to visualise complex data and understand it at a
glance.
● It uses colour to communicate relationships between data values, which is
much harder to understand if presented numerically in a spreadsheet.

7. Differentiate between Joining and Blending.

Joining
● It has LEFT JOIN, RIGHT JOIN, INNER JOIN and FULL OUTER JOIN.
● It is used when the data set is from the same source.
● Data cannot be available in different levels of granularity.
● It joins data at row-level.

Blending
● It behaves only like a LEFT JOIN.
● It is used when the data sets are from different sources.
● Data can be available in different levels of granularity.
● It is performed by sending separate queries to each data source and combining
the aggregated results.

8. Explain Rank function and Dense_rank function in Tableau.

The Rank function in Tableau accepts two arguments- aggregated measure and
ranking order. The ranking order can be ascending or descending. The ranking order
is optional and by default assigned as descending. For example- If the values are
3,5,6,7,7,9 then their corresponding ranks would be 1,2,3,4,4,6 in ascending order.

The Dense_rank function (RANK_DENSE in Tableau) works in a similar manner as the
Rank function except it won’t skip the next rank when assigning the same rank to
identical values. For example- If the values are 3,5,6,7,7,9 then their corresponding
dense ranks would be 1,2,3,4,4,5 in ascending order.

9. What is Rank_modified and Rank_unique function in Tableau?


The Tableau Rank_Modified function will assign the same rank to an identical value.
When we have a repeating number, we skip a number and assign the same rank to
repeating values. The Highest value will rank as 1, and the following two equal
amounts will rank as 3. For example, if we have 6,9,9,14 then the function will return
the ranks as 4, 3, 3, 1.

The Tableau Rank_UNIQUE function will assign unique ranks to identical values. For
example, if we have 3,5,6,7,7,9 then the function will return the ranks as 1,2,3,4,5,6
in ascending order.
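
The four ranking behaviours described in questions 8 and 9 can be sketched in plain Python. This is only an illustration of the ranking rules, not Tableau's implementation; the function names are hypothetical and all ranks are computed in ascending order.

```python
def rank_standard(values):
    # Ties share a rank and the following rank is skipped (1, 2, 3, 4, 4, 6).
    return [1 + sum(v < x for v in values) for x in values]


def rank_dense(values):
    # Ties share a rank and no rank is skipped (1, 2, 3, 4, 4, 5).
    distinct = sorted(set(values))
    return [distinct.index(x) + 1 for x in values]


def rank_modified(values):
    # Ties share the larger rank; the skipped rank comes before the tie.
    return [sum(v <= x for v in values) for x in values]


def rank_unique(values):
    # Every value gets its own rank; ties are broken by position.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


values = [3, 5, 6, 7, 7, 9]
print(rank_standard(values))  # [1, 2, 3, 4, 4, 6]
print(rank_dense(values))     # [1, 2, 3, 4, 4, 5]
print(rank_modified(values))  # [1, 2, 3, 5, 5, 6]
print(rank_unique(values))    # [1, 2, 3, 4, 5, 6]

# Descending order can be obtained by ranking the negated values.
descending_modified = rank_modified([-v for v in [6, 9, 9, 14]])
print(descending_modified)    # [4, 3, 3, 1]
```

The last call reproduces the 6,9,9,14 example above, where the highest value ranks as 1.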

10. Explain the Level of Detail (LOD) function.

Level of Detail (LOD) functions are used to run queries which are complex and involve
many dimensions at the data source level. They let us control the level of granularity
at which a value is computed.

The different types of LOD functions are:

● Fixed LOD- It computes values using only the specified dimensions, without
reference to any other dimensions in the view.
● Include LOD- It computes values using the specified dimensions in addition to
whatever dimensions are in the view.
● Exclude LOD- It computes values after subtracting the specified dimensions from
the view level of detail.

11. What is Measure and Dimension in Tableau?

In Tableau, Measures represent quantitative data such as integers and decimal numbers,
and are used and analysed by dimensions. Dimensions represent qualitative values that
define a particular category. Examples of dimensions are geographical data, product
details, countries etc.

12. How can we handle NULL values in Tableau?

We can handle the NULL values in Tableau in the following ways:


● Using ZN() function- It assigns 0 to NULL values.
● Using IFNULL() function- It returns the expression if it is not NULL, otherwise it
returns a replacement value that we supply.
● Using ISNULL() function- It tests an expression and returns ‘TRUE’ if the
expression doesn’t contain valid data (NULL).
● Using filter option- It excludes the NULL values from the view using a filter.
● Using hide NULL indicator- We can use hide NULL indicator by clicking on the
bar chart to hide NULL values from the figure.

13. What is Blended Axis and Dual Axis in Tableau?


Blended Axis is used when more than two measures are used in multi-line graphs or
charts. For example- Sales, Profit and Discount per Quarter.
Dual axis is used when two measures are used in dual lines of graphs or charts. Both
axes will be parallel to each other with a different range of values from the source data.
For example- Sales and Profit per Quarter.

14. What are the different types of connections that you can make with your
dataset?
The different connections in Tableau are:
● File Systems such as .csv, Excel, etc.
● Relational Systems such as Oracle, SQL Server, DB2, etc.
● Cloud Systems such as Windows Azure, Google BigQuery, etc.
● Other Sources using ODBC.

15. What is the difference between Sets and Groups in Tableau?

Sets
● A set can be dynamic, i.e. its members update when the underlying data changes.
● In sets, you can group data across multiple dimensions.
● It is used to form subsets of data based on the conditions chosen.
● You can choose “IN/OUT” or “Show Members in Set”.

Groups
● A group is static, i.e. its members do not update when the underlying data changes.
● In groups, you can group data only within one dimension.
● It puts dimension members together and creates a hierarchy of multiple dimension
levels.
● There is no such option. The only option available is group/ungroup.

SQL for Data Science

1. What is a JOIN clause? Explain different types of JOIN clauses.

A JOIN clause is used to combine rows from two or more tables, based on a related
column between them.
There are six types of JOIN clauses:
● INNER JOIN- It returns rows that have matching values in both tables.
● LEFT JOIN- It returns all the rows from the left table with corresponding rows
from the right table. If there are no matching rows, NULL is returned as a value
from the second table.
● RIGHT JOIN- It returns all the rows from the right table with corresponding rows
from the left table. If there are no matching rows, NULL is returned as a value
from the first (left) table.
● FULL OUTER JOIN- It returns all the rows from both the tables. If there are no
matching rows in the tables, NULL is returned.
● CROSS JOIN- It returns all the possible combinations of rows from both
tables.
● SELF JOIN- It will join the table with itself. Example- Finding employees
who are managers in the employee table.
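
Some of the JOIN types above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the employees and departments tables and their rows are made up for illustration, and RIGHT JOIN and FULL OUTER JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER, dept_id INTEGER
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employees VALUES
        (1, 'Asha', NULL, 1),
        (2, 'Ben', 1, 1),
        (3, 'Chen', 1, NULL);
""")

# INNER JOIN: only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e INNER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN: every employee; NULL (None) where no department matches.
left_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# CROSS JOIN: every combination of rows, 3 employees x 2 departments.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]

# SELF JOIN: employees who are managers of someone else.
manager_rows = conn.execute("""
    SELECT DISTINCT m.name
    FROM employees e JOIN employees m ON e.manager_id = m.id
""").fetchall()

print(inner_rows)    # [('Asha', 'Sales'), ('Ben', 'Sales')]
print(left_rows)     # [('Asha', 'Sales'), ('Ben', 'Sales'), ('Chen', None)]
print(cross_count)   # 6
print(manager_rows)  # [('Asha',)]
```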

2. What do you mean by Tables and Fields?

A table is a set of data that is organised in a model with columns and rows. In a table,
columns are placed vertically while rows are placed horizontally. A table has a
specified number of columns called fields, but it can have any number of rows, which
are called records.

3. Differentiate between Rank(), Dense_rank() and Row_number.

The Rank() function ranks within partitions with gaps and gives the same ranking for
tied values. Example- If the values are 3,5,6,7,7,9 then their corresponding ranks would
be 1,2,3,4,4,6 in ascending order.

The Dense_rank() function works in a similar manner as the rank function, except it
won’t skip the next rank when assigning the same rank to identical values. Example- if
the values are 3,5,6,7,7,9 then their corresponding dense ranks would be 1,2,3,4,4,5 in
ascending order.

Row_number provides unique numbers for each row within the partition, with different
numbers for tied values.
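
The three functions can be compared side by side on the example values. The sketch below uses Python's sqlite3 module and assumes a SQLite build with window function support (version 3.25 or later); the scores table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (value INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?)", [(v,) for v in (3, 5, 6, 7, 7, 9)]
)

# Each row: (value, RANK, DENSE_RANK, ROW_NUMBER) in ascending order of value.
rank_rows = conn.execute("""
    SELECT value,
           RANK()       OVER (ORDER BY value) AS rnk,
           DENSE_RANK() OVER (ORDER BY value) AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY value) AS row_num
    FROM scores
    ORDER BY value, row_num
""").fetchall()

for row in rank_rows:
    print(row)
```

For the tied value 7, RANK gives 4 and 4 and then jumps to 6, DENSE_RANK gives 4 and 4 and continues with 5, and ROW_NUMBER gives 4 and 5.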

4. What are Constraints?

Constraints are the rules enforced on the data columns of a table. These are used to
limit the type of data that can go into a table. This ensures the accuracy and reliability
of the data in the database. Constraints could be either on a column level or on a table
level. The column level constraints are applied only to one column, whereas the table
level constraints are applied to the whole table. Commonly used constraints are: NOT
NULL constraint, UNIQUE constraint, DEFAULT constraint, PRIMARY KEY constraint,
FOREIGN KEY constraint.
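
A short sketch of how constraints reject bad data, using Python's sqlite3 module. The users table, its columns, and the default value are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        country TEXT DEFAULT 'IN'
    )
""")

# DEFAULT fills in the country when it is not supplied.
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
default_country = conn.execute("SELECT country FROM users").fetchone()[0]

# NOT NULL rejects a missing email, UNIQUE rejects a repeated one.
rejected = []
for bad_email in (None, "a@example.com"):
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (bad_email,))
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

print(default_country)
print(rejected)
```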

5. What is the use of a View?

A view is a virtual table which consists of a subset of data contained in one or more
tables. Since a view stores only its defining query and not the data itself, it takes
less space to store. A view can have data of one or more tables combined, depending
on the query that defines it.

6. Explain the different types of Keys in SQL.

● Super Key- It is a set of one or more attributes that together identify tuples
in a table uniquely. It may contain extra attributes that are not needed for
unique identification.

● Candidate Key- A Candidate key is a minimal Super key, i.e. it is devoid of any
unnecessary attributes that are not important for uniquely identifying tuples. The
value for the Candidate key is unique and non-NULL for all tuples. And every
table has to have at least one Candidate key. But there can be more than one
Candidate key too.

● Primary Key- Primary key is the Candidate key selected by the database
administrator to uniquely identify tuples in a table. There can be only one
Primary key for a table.

● Alternate Key- There can be only one Primary key for a table. Therefore, all
the remaining Candidate keys are known as Alternate or Secondary keys.

● Foreign Key- Foreign key is an attribute which is a Primary key in its parent
table but is included as an attribute in another host table.

● Composite Key- A Composite key is a Candidate key or Primary key that
consists of more than one attribute.

7. What is Indexing?

Indexes are special lookup tables that the database search engine can use to speed
up data retrieval. In simple words, an index is a pointer to data in a table. An index in a
database is very similar to an index at the back of a book.

For example, if you want to find all the pages in a book that discuss a certain topic,
you first refer to the index, which lists all the topics alphabetically, and are then
referred to one or more specific page numbers.

8. Differentiate between DELETE, DROP and TRUNCATE commands.

Delete- It is a Data Manipulation Language (DML) command. It is used to delete one or
more tuples of a table. With the help of the “DELETE” command, we can either delete
all the rows in one go or delete only selected rows, i.e. we can use it as per the
requirement or the condition using the WHERE clause. It is comparatively slower than
the TRUNCATE command. The DELETE command does not remove the structure of the table.

Drop- It is a Data Definition Language (DDL) command. It is used to drop the whole
table. With the help of the “DROP” command we can drop (delete) the whole structure
in one go, i.e. it removes the named elements of the schema. By using this command,
the whole table ceases to exist.

Truncate- It is also a Data Definition Language (DDL) command. It is used to delete all
the rows of a relation (table) in one go. With the help of the “TRUNCATE” command, we
can’t delete a single row since the WHERE clause is not used here. By using this
command the existence of all the rows of the table is lost. It is comparatively faster than
the delete command as it deletes all the rows quickly.

9. What is the difference between WHERE and HAVING clauses?

The WHERE clause introduces a condition on individual rows, while the HAVING clause
introduces a condition on aggregations, i.e. results of selection where a single result,
such as count, average, min, max, or sum, has been produced from multiple rows.
A query that needs the second kind of condition (i.e. a condition on an aggregation)
must use HAVING. As a rule of thumb, use WHERE before GROUP BY and HAVING after
GROUP BY. It is a primitive rule, but it is useful in more than 90% of the cases.
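
The difference can be seen on a small example run through Python's sqlite3 module. The orders table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 50), ("A", 70), ("B", 20), ("B", 5), ("C", 200)],
)

# WHERE filters individual rows before any grouping happens.
where_rows = conn.execute("""
    SELECT customer, amount FROM orders
    WHERE amount >= 50
    ORDER BY amount
""").fetchall()

# HAVING filters groups after SUM() has been computed per customer.
having_rows = conn.execute("""
    SELECT customer, SUM(amount) AS total FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 100
    ORDER BY customer
""").fetchall()

print(where_rows)   # [('A', 50.0), ('A', 70.0), ('C', 200.0)]
print(having_rows)  # [('A', 120.0), ('C', 200.0)]
```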

10. Define the Aggregate function in SQL and give its uses.

An aggregate function in SQL returns one value after calculating multiple values of a
column. We often use aggregate functions with the GROUP BY and HAVING clauses of
the SELECT statement.
Types of Aggregate functions are: COUNT(), SUM(), AVG(), MIN(), MAX().

11. What is a Subquery and what are its types?

A subquery in SQL is a query which is nested inside another SQL query and
embedded in a SELECT, INSERT, UPDATE or DELETE statement along with the
various operators. The different types of subquery are-

● Single Value- It returns exactly one column and exactly one row. It can be used
with comparison operators such as =, <, >, <=, >=.

● Multiple Values- It returns multiple rows or columns. It can be used
with operators like IN, EXISTS, ALL or ANY.

● Correlated- It refers to the tables introduced in the outer query. It depends on
the outer query and cannot be run independently from the outer query.

● Non-Correlated- It does not depend on the outer query and can be run
independently from the outer query.
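
A correlated and a non-correlated subquery can be contrasted as follows. This sketch uses Python's sqlite3 module; the employees table and the salary figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 100), ("Ben", "Sales", 60),
     ("Chen", "Finance", 50), ("Dina", "Finance", 70)],
)

# Non-correlated: the inner query runs on its own (overall average is 70).
above_overall = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Correlated: the inner query refers to e.dept from the outer query,
# so it is evaluated per outer row (Sales average 80, Finance average 60).
above_dept = conn.execute("""
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept)
    ORDER BY name
""").fetchall()

print(above_overall)  # [('Asha',)]
print(above_dept)     # [('Asha',), ('Dina',)]
```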

12. What is the difference between Lead and Lag function?

Lead Function- It is used to access data from subsequent rows along with data from
the current row.

Lag Function- It is used to access data from previous rows along with data from the
current row.
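
A minimal sketch of both functions, run through Python's sqlite3 module. It assumes a SQLite build with window function support (version 3.25 or later), and the monthly_sales table is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)", [(1, 100), (2, 120), (3, 90)]
)

# Each row: (month, sales, previous month's sales, next month's sales).
lead_lag_rows = conn.execute("""
    SELECT month,
           sales,
           LAG(sales)  OVER (ORDER BY month) AS previous_sales,
           LEAD(sales) OVER (ORDER BY month) AS next_sales
    FROM monthly_sales
    ORDER BY month
""").fetchall()

for row in lead_lag_rows:
    print(row)
```

The first row has no previous row and the last row has no subsequent row, so LAG and LEAD return NULL (None in Python) there.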

13. What is the difference between Union and Union All?

The difference between Union and Union All is that Union returns only the distinct
rows from both the queries, while Union All returns all the rows including the
duplicates (repeated values) from both the queries.
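
A small sketch of the two operators using Python's sqlite3 module; the tables a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes duplicate rows from the combined result.
union_rows = [row[0] for row in conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
)]

# UNION ALL keeps every row from both queries.
union_all_rows = [row[0] for row in conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
)]

print(union_rows)      # [1, 2, 3]
print(union_all_rows)  # [1, 2, 2, 2, 3]
```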

14. What is Normalization?

Normalization is the process of minimising redundancy in a relation or set of
relations. Redundancy in a relation may cause insertion, deletion, and update
anomalies. Normal forms are used to eliminate or reduce redundancy in database tables.
There are different types of Normalization:
● 1NF: A relation is in 1NF if every attribute contains only atomic values.
● 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully
dependent on the whole primary key (no partial dependency).
● 3NF: A relation is in 3NF if it is in 2NF and no transitive dependency
exists.
● BCNF: A relation is in BCNF if it is in 3NF and the determinant of every
functional dependency is a super key.
● 4NF: A relation is in 4NF if it is in BCNF and has no multivalued
dependency.

15. Explain different types of relationships in SQL.

The term relation is sometimes used to refer to a table in a relational database.
However, it is more often used to describe the relationships that exist between the
tables in a relational database.

The types of relationships in SQL are:


● One-One Relationship
● One-Many Relationship
● Many-Many Relationship
● Many-One Relationship

Python for Data Science

1. What is Python? What are the benefits of using Python?

Python is a programming language with objects, modules, threads, exceptions, and
automatic memory management. The benefits of using Python are that it is simple and
easy, portable, extensible, has built-in data structures, and it is an open source
language.

2. What is pickling and unpickling?

The pickle module accepts a Python object, converts it into a byte stream representation,
and dumps it into a file by using the dump function. This entire process is called
pickling. On the other hand, unpickling is the process of retrieving original Python
objects from the stored byte stream representations.
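
A minimal round trip with the standard pickle module. To stay self-contained, this sketch pickles to an in-memory bytes object with dumps() and loads() instead of writing a file with dump() and load(); the sample dictionary is made up.

```python
import pickle

original = {"name": "model", "weights": [0.1, 0.2], "tags": ("a", "b")}

# Pickling: convert the object into a byte stream.
payload = pickle.dumps(original)

# Unpickling: rebuild an equal, but separate, object from the bytes.
restored = pickle.loads(payload)

print(type(payload))         # <class 'bytes'>
print(restored == original)  # True
print(restored is original)  # False
```

Only unpickle data from sources you trust, because unpickling can execute arbitrary code.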

3. How is Python interpreted?

Python is an interpreted language. The Python program runs directly from the source
code: in the standard CPython implementation, the interpreter first compiles the source
code written by the programmer into an intermediate form called bytecode, which is then
executed by the Python virtual machine.

4. What is a Python decorator?

A Python decorator is a function (or other callable) that takes a function and returns
a modified version of it. It is applied with the @ syntax above a function definition
and helps in adding new functionality to an existing object without changing its
structure.
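
A minimal sketch of a decorator that adds call counting to a function without changing the function's own code. The names count_calls and add are hypothetical.

```python
import functools


def count_calls(func):
    """Wrap func so that every call is counted."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    """Return the sum of a and b."""
    return a + b


result = add(2, 3)
print(result)        # 5
print(add.calls)     # 1
print(add.__name__)  # add
```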

5. What is the difference between a list and a tuple?

The difference between a list and a tuple is that a list is mutable while a tuple is
immutable.

Example: A tuple is hashable, so it can be used as a key for dictionaries.


coordinates = { (0,0) : 100, (1,1) : 200}
coordinates[(1,0)] = 150
coordinates[(0,1)] = 125

print(coordinates)
OUTPUT: {(0, 0): 100, (1, 1): 200, (1, 0): 150, (0, 1): 125}

In the above example, we used coordinate pairs (which are tuples) as keys for our
dictionary.

6. What are namespaces in Python?


In Python, every name that is introduced has a place where it lives and can be looked
up. This is known as a namespace. It is like a box where a variable name is mapped to
the object it refers to. Whenever the variable is looked up, this box is searched to
get the corresponding object.

7. What is lambda in Python?

Lambda is a single expression anonymous function that is often used as an inline
function.
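
Two typical inline uses of lambda, shown as a short sketch with made-up sample data:

```python
words = ["banana", "kiwi", "apple"]

# A lambda as the sort key: order the words by their length.
by_length = sorted(words, key=lambda word: len(word))

# A lambda passed to map(): square every number.
squares = list(map(lambda x: x * x, [1, 2, 3]))

print(by_length)  # ['kiwi', 'apple', 'banana']
print(squares)    # [1, 4, 9]
```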

8. Explain pass in Python.

Pass refers to a no-operation Python statement. In other words, it is a
placeholder for code you may want to write at a later stage in a compound
statement. With the use of pass, you can avoid getting an error where an
empty block of code is not allowed.

9. What do you mean by PEP 8?

PEP stands for Python Enhancement Proposal, and PEP 8 is the proposal that provides the
guidelines on how to write the Python code. It is a set of rules that specifies how to
format the Python code for maximum readability. It was written by Guido van Rossum,
Barry Warsaw, and Nick Coghlan in 2001.

10. What is zip() function in Python?

The zip() function in Python returns a zip object, which pairs up the elements at the
same index of multiple containers. It takes one or more iterables, aggregates their
elements position by position, and returns an iterator of tuples. The iterator stops
when the shortest iterable is exhausted.
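
A short sketch of zip() with two lists of different lengths; the sample names and scores are made up.

```python
names = ["Asha", "Ben", "Chen"]
scores = [90, 85]

# zip() stops at the shortest input, so 'Chen' has no partner and is dropped.
pairs = list(zip(names, scores))

# zip(*pairs) reverses the operation and splits the pairs apart again.
unzipped_names, unzipped_scores = zip(*pairs)

print(pairs)            # [('Asha', 90), ('Ben', 85)]
print(unzipped_names)   # ('Asha', 'Ben')
print(unzipped_scores)  # (90, 85)
```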

11. What are the different file processing modes that are supported by Python?

Python provides four modes to open files- read-only (r), write-only (w), read-write
(r+) and append mode (a).

● Read-only mode (r) - It is used to open a file in read-only mode for reading.
It is the default mode.
● Write-only mode (w) - It is used to open a file in the write-only mode for
writing. It overwrites the file if the file exists. If the file does not exist, it
creates a new file for writing. If the file being replaced contains some data,
the data would be lost.
● Read-Write mode (r+) - It is used to open an existing file for reading and
writing without erasing its contents. It can also be referred to as updating mode.
● Append mode (a) - It is used to open a file for writing. The file pointer is at
the end of the file if the file exists. If the file does not exist, it creates a
new file for writing.
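
The modes can be tried out on a temporary file. Python's open() spells the read-write mode as "r+". The file name demo.txt is an arbitrary choice for this sketch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")

    with open(path, "w") as f:   # creates the file
        f.write("first\n")
    with open(path, "a") as f:   # writes at the end of the file
        f.write("second\n")
    with open(path, "r") as f:   # read-only
        after_append = f.read()

    with open(path, "r+") as f:  # read and write, existing data is kept
        f.write("FIRST")         # overwrites the first five characters
    with open(path) as f:        # "r" is the default mode
        after_update = f.read()

    with open(path, "w") as f:   # the old contents are lost
        f.write("new\n")
    with open(path) as f:
        after_overwrite = f.read()

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_update))     # 'FIRST\nsecond\n'
print(repr(after_overwrite))  # 'new\n'
```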

12. What are iterators in Python?

In Python, iterators are used to iterate over a group of elements in a container, like
a list. Collections such as a list, a tuple, or a dictionary are iterables, and calling
iter() on them returns an iterator. A Python iterator implements the __iter__() and
__next__() methods to iterate over the stored elements. In Python, we generally use
loops to iterate over the collections (e.g. list and tuple). In simple words,
iterators are objects which can be traversed through or iterated upon.
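
A minimal sketch of the iterator protocol. The Countdown class is a hypothetical example; the second half shows the iterator that iter() returns for a built-in list.

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iteration is finished
        value = self.current
        self.current -= 1
        return value


countdown_values = list(Countdown(3))

letters = iter(["a", "b"])      # ask the list for an iterator
first = next(letters)
second = next(letters)
exhausted = next(letters, None)  # default is returned instead of StopIteration

print(countdown_values)          # [3, 2, 1]
print(first, second, exhausted)  # a b None
```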

13. Explain docstring in Python.

The Python docstring is a string literal that occurs as the first statement in a module,
function, class, or method definition and it provides a convenient way to associate the
documentation. String literals that occur immediately after a simple assignment at the
top are called ‘attribute docstrings’. String literals that occur immediately after another
docstring are called ‘additional docstrings’. By convention, triple quotes are used to
create docstrings even when the string fits on one line. The docstring phrase ends with
a period (.). It can include multiple lines and may consist of spaces and other special
characters.
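
A short sketch of a one-line docstring and how it is read back through the __doc__ attribute. The function area is a made-up example.

```python
def area(width, height):
    """Return the area of a rectangle."""
    return width * height


# The docstring is stored on the function object itself.
doc = area.__doc__
print(doc)  # Return the area of a rectangle.
```

The built-in help(area) displays the same text.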

14. What is ‘__init__’ in Python?

The __init__ method can be referred to as a constructor in Python. This method is
automatically called to initialise a new object/instance of a class right after it
is created. All classes have an __init__ method, either defined explicitly or inherited
from object.
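
A minimal sketch of a class with an `__init__` method; the Point class is a hypothetical example.

```python
class Point:
    def __init__(self, x, y=0):
        # Runs automatically when Point(...) is called.
        self.x = x
        self.y = y


p = Point(3)     # __init__ receives x=3 and the default y=0
print(p.x, p.y)  # 3 0
```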

15. How do we use enumerate() function in Python?

The enumerate() function is used to iterate through the sequence and retrieve the
index position and its corresponding value at the same time.

For example -
list_1 = ["A", "B", "C"]
s_1 = "Javatpoint"
# creating enumerate objects
object_1 = enumerate(list_1)
object_2 = enumerate(s_1)
print("Return type:", type(object_1))
print(list(enumerate(list_1)))
print(list(enumerate(s_1)))

Output:
Return type: <class 'enumerate'>
[(0, 'A'), (1, 'B'), (2, 'C')]
[(0, 'J'), (1, 'a'), (2, 'v'), (3, 'a'), (4, 't'), (5, 'p'), (6, 'o'), (7, 'i'), (8, 'n'), (9, 't')]

Exploratory Data Analysis and Machine Learning

1. What is the difference between decision trees and random forests?

Decision Trees
● Decision Trees are prone to overfitting.
● Decision Trees require low computation power.
● A single tree is trained on the whole training set.

Random Forests
● Random Forests reduce overfitting by combining multiple trees.
● Random Forest consumes more computation power.
● N decision trees are trained, each one on a subset of the original training set.

2. What are Covariance and Correlation?

Covariance describes a relationship between two variables. In covariance, mainly the
sign is interpreted, because the magnitude depends on the units of the variables. A
positive value shows that both variables vary in the same direction, and a negative
value shows that they vary in the opposite direction.
Cov(X,Y) = Σ(X - X’)(Y - Y’) / (n - 1) where X’ and Y’ are the means of X and Y data
points and n is the number of data points.

Whereas Correlation explains the change in one variable that leads to a
proportionate change in the second variable. Correlation varies between -1 and +1.
Corr(X,Y) = Cov(X,Y) / (SD(X) × SD(Y)) where SD refers to the standard deviation.
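
Both formulas can be checked with a few lines of Python. This is a sketch using the sample (n - 1) versions; the function names and the hours/marks data are made up for illustration.

```python
from statistics import mean, stdev


def covariance(xs, ys):
    # Sample covariance: sum of products of deviations, divided by n - 1.
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    # Covariance scaled by both sample standard deviations.
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


hours = [1, 2, 3, 4, 5]
marks = [2, 4, 6, 8, 10]

cov_value = covariance(hours, marks)
corr_value = correlation(hours, marks)

print(cov_value)             # 5.0
print(round(corr_value, 6))  # 1.0
```

Because marks is exactly twice hours, the correlation is 1, while the covariance of 5.0 would change if the units of either variable changed.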

3. What are the assumptions of linear regression?

The assumptions of linear regression are-

● Linearity - The relationship between the features and the target is linear.
● Homoscedasticity - The error term has a constant variance.
● No multicollinearity - The features are not highly correlated with each other.
● Independence - Observations are independent of each other.
● Normality - The error (residuals) follows a normal distribution.

4. Explain the central limit theorem.

According to the central limit theorem, if we draw many samples of a sufficiently
large size from a population with a finite variance, the distribution of the sample
means is approximately normal, and its mean is approximately the same as the mean of
the population.
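
The theorem can be illustrated with a small simulation. This sketch draws samples from a uniform distribution on [0, 1), whose mean is 0.5 and variance is 1/12; the sample size, the number of samples, and the seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so that the run is repeatable
SAMPLE_SIZE = 30
NUM_SAMPLES = 2000

# Mean of each sample drawn from the uniform population.
sample_means = [
    mean([rng.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = mean(sample_means)
spread_of_means = stdev(sample_means)

# Theory: the sample means scatter around 0.5 with standard deviation
# sqrt(variance / sample size).
expected_spread = ((1 / 12) / SAMPLE_SIZE) ** 0.5

print(round(mean_of_means, 3))
print(round(spread_of_means, 4), round(expected_spread, 4))
```

The mean of the sample means comes out close to the population mean of 0.5, and their spread is close to the theoretical value.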

5. What is the difference between bagging and boosting?

Bagging
● Bagging tries to solve the over-fitting problem.
● In bagging, each model is built independently.
● Bagging mainly reduces variance; it does little to reduce bias.
● Example - Random Forest.

Boosting
● Boosting tries to reduce bias.
● In boosting, the new models are influenced by the performance of
previously built models.
● Boosting mainly reduces bias, but it can increase variance (over-fitting).
● Example - GBM, XGBoost, LightGBM, and CatBoost.

6. What are the advantages and disadvantages of SVM?

Advantages
● It works well when there is a clear margin of separation between classes.
● It is more effective in high-dimensional spaces.
● It is effective in cases where the number of dimensions is greater than
the number of samples.
● It is relatively more memory efficient.

Disadvantages
● SVM algorithm is not suitable for large data sets.
● It does not perform very well when the data set has more noise, i.e. when target
classes are overlapping.
● In cases where the number of features for each data point greatly exceeds the
number of training data samples, the SVM can overfit unless the kernel and
regularisation are chosen carefully.
● The support vector classifier works by putting data points above and below
the classifying hyperplane, and there is no probabilistic explanation for the
classification.

7. What is the difference between correlation and regression?

Correlation
● It measures the strength or degree of relationship between the two variables.
● It doesn’t capture causality.

Regression
● It measures how one variable affects another variable. It is about model fitting.
● It tries to capture the causality and describes the cause and effect.

8. Explain one-tailed and two-tailed tests.

One-tailed test - A one-tailed test is a statistical test in which the critical area of a
distribution is one-sided so that it is either greater than or less than a certain value,
but not both. If the sample being tested falls into the one-sided critical area,
the alternative hypothesis will be accepted instead of the null hypothesis.

Two-tailed test - A two-tailed test is a statistical test in which the critical area of a
distribution is two-sided, and it tests whether a sample is greater than or less than a
certain range of values. If the sample being tested falls into either of the critical areas,
the alternative hypothesis is accepted instead of the null hypothesis.

9. What is the difference between univariate, bivariate, and multivariate analysis?

● Univariate Analysis - It is a descriptive statistical technique that involves
only one variable at a given instance of time.
● Bivariate Analysis - This analysis is used to find the relationship between two
variables at a time.
● Multivariate Analysis - The study of more than two variables is nothing but
multivariate analysis. This analysis is used to understand the effect of variables
on the responses.

10. How can we handle outliers?

Below are some methods of treating the outliers -

● By trimming or removing the outlier.
● With quantile-based flooring and capping.
● With mean or median imputation.

11. How can we treat missing values?

● By deleting the columns with the missing data.
● By deleting the rows with the missing data.
● By filling the missing data with an imputed value (such as the mean or median).
● By adding an additional indicator column that flags the imputed values.
● By predicting the missing values with a regression model.

12. What is a p-value?

The p-value is the probability of observing a value of the test statistic that is as
extreme as, or more extreme than, what was observed in the sample, assuming that the
null hypothesis is true. If the p-value is less than the chosen significance level
(commonly 0.05), the result is considered statistically significant and gives strong
evidence to reject the null hypothesis.

13. What are the things that should be kept in mind while choosing the value of K
in the KNN algorithm?

If K is small, then results might not be reliable because the noise will have a higher
influence on the result. If K is large, then we have to do a lot of processing and the
decision boundary becomes overly smooth, which may adversely impact the performance of
the algorithm.
So, the following things must be considered while choosing the value of K-
● As a rule of thumb, K should be close to the square root of n (number of data
points in the training dataset).
● K should be chosen as an odd value so that there are no ties. If the square root
is even, then add or subtract 1 to it. So, if the value is on the higher end we
make a subtraction and if it is on the lower end then we make an addition.
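
The rule of thumb can be sketched as a small helper. The function name suggest_k is hypothetical, and always adding 1 to an even value is a simplifying assumption; in practice K is usually tuned further with cross-validation.

```python
import math


def suggest_k(n_samples):
    """Suggest an odd K close to the square root of the training set size."""
    k = max(1, round(math.sqrt(n_samples)))
    if k % 2 == 0:
        k += 1  # assumption: an even value is always moved up to the next odd one
    return k


print(suggest_k(100))  # 11, because sqrt(100) = 10 is even
print(suggest_k(50))   # 7, because sqrt(50) rounds to 7, which is already odd
```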

14. Why is the odd value of K preferred over the even values in the KNN
algorithm?

The odd value of K is preferred over even values in order to ensure that there are no
ties in the voting. If the square root of a number of data points is even, then we add or
subtract 1 to it to make it odd.

15. Explain type 1 and type 2 errors.

Type 1 error or false positive occurs when we reject the null hypothesis when it is
actually true. For example- The jury decides a person is guilty even though the person
is innocent.

Type 2 error or false negative occurs when we fail to reject the null hypothesis when
it is actually false. For example- A test for a disease may report a negative result,
when the patient is, in fact, infected.
