Basics of ML
Sample
• A sample statistic is a piece of information you get from a fraction of a
population.
• A sample is just a part of a population.
• For example, let’s say your population was every American, and you
wanted to find out how much the average person earns. Time and
finances stop you from knocking on every door in America, so you
choose to ask 1,000 random people. These one thousand people are
your sample.
• Once you have your sample, you can compute a statistic from it. A statistic
is really just a piece of information about the sample; in this example,
average earnings.
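The earnings example can be sketched in a few lines of Python. This is only an illustration: the population below is randomly generated, made-up data (not real earnings figures), and the names `population` and `sample` are chosen here for the example.

```python
import random
import statistics

# Hypothetical population: 100,000 made-up annual earnings (not real data).
rng = random.Random(42)
population = [rng.lognormvariate(10.5, 0.6) for _ in range(100_000)]

# Ask 1,000 random people: a simple random sample without replacement.
sample = rng.sample(population, 1000)

# The population average is what we want to know but usually cannot measure.
population_mean = statistics.mean(population)
# The sample average is the statistic we can actually compute.
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
```

With a random sample of this size, the sample average typically lands close to the population average, which is why surveying 1,000 people can stand in for knocking on every door.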
Population
• A population is a whole: it’s every member
of a group.
• A population is the opposite of a sample,
which is a fraction or percentage of a group.
• Sometimes it’s possible to survey every
member of a group.
• A classic example is the Census, where it’s
the law that you have to respond. Note: if
you do manage to survey everyone, the survey
is actually called a census.
• If you go into a candy store, the owner might have samples of their
products on display.
• It wouldn’t be possible for you to sample everything in the store.
• Financially, the owner wouldn’t want you to taste everything for free,
and you probably wouldn’t want to eat a sample of candy from a
couple hundred jars, or you might get sick to your stomach.
• So, you might base your opinion of the entire store’s candy line
on the samples they have to offer.
• The same logic holds true for most surveys in stats: you’re only going
to want to take a sample of the whole population (“population” in
this example would be the entire candy line).
• The result is a statistic about that population.
• A statistic is a value computed from sample data and used to describe
the population.
Scalar Multiplication
N-dimensional space
• Where,
• n = number of dimensions
• p_i, q_i = coordinates of the two data points p and q in dimension i
Sample Points
Manhattan Distance
• Manhattan Distance is the sum of absolute differences between
points across all the dimensions.
• For points in a 2-dimensional space, we calculate the Manhattan
Distance by taking the sum of the absolute distances in both the x and
y directions.
• So, the Manhattan distance between two points p = (p_1, p_2) and
q = (q_1, q_2) in a 2-dimensional space is given as:
d(p, q) = |p_1 - q_1| + |p_2 - q_2|
And the generalized formula for an n-dimensional space is given as:
d(p, q) = \sum_{i=1}^{n} |p_i - q_i|
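A minimal sketch of the Manhattan Distance in Python follows. The function name `manhattan_distance` and the two example points are assumptions made for illustration; they are not taken from the slides.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical 2-dimensional points: |1 - 4| + |2 - 6| = 3 + 4
print(manhattan_distance((1, 2), (4, 6)))  # 7
```

The same function works unchanged for any number of dimensions, because it simply walks over the coordinates in pairs.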
Minkowski Distance
• Minkowski Distance is the generalized form of Euclidean and
Manhattan Distance.
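• The formula for Minkowski Distance is given as:
D(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
Here, x_i and y_i are the coordinates of the two data points, n is the number of
dimensions, and p represents the order of the norm. Let’s calculate the Minkowski Distance
of the order 3: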
When the order (p) is 1, the formula gives the Manhattan Distance, and when the order is 2,
it gives the Euclidean Distance.
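The effect of the order can be checked with a short Python sketch. The function name `minkowski_distance` and the example points `a` and `b` are hypothetical; the original sample points are not available here, so these values are chosen only for illustration.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between points x and y."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    if p < 1:
        raise ValueError("the order p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# Hypothetical points; the coordinate differences are 3 and 4.
a, b = (1, 2), (4, 6)

print(minkowski_distance(a, b, 1))  # 7.0, same as the Manhattan Distance
print(minkowski_distance(a, b, 2))  # 5.0, same as the Euclidean Distance
print(minkowski_distance(a, b, 3))  # (27 + 64) ** (1/3), roughly 4.498
```

As the order grows, the largest coordinate difference dominates the sum more and more, so the distance shrinks toward that largest difference.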
Hamming Distance
Hamming Distance measures how different two strings of the
same length are.
The Hamming Distance between two strings of the same length is the
number of positions at which the corresponding characters are
different.
• Let’s understand the concept using an example. Let’s say we have two
strings:
• “euclidean” and “manhattan”
Since the length of these strings is equal, we can calculate the
Hamming Distance. We will go character by character and compare the
strings. The first characters of the two strings (e and m respectively) are
different. Similarly, the second characters of the two strings (u and a) are
different, and so on.
• Look carefully: seven characters are different, whereas two
characters (the last two characters) are the same.
Hence, the Hamming Distance here will be 7. Note that the larger the Hamming Distance
between two strings, the more dissimilar those strings are (and vice versa).
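The character-by-character comparison can be sketched in Python as follows. The function name `hamming_distance` is an assumption made for this example.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    # Each mismatching pair of characters counts as 1.
    return sum(c1 != c2 for c1, c2 in zip(s, t))


print(hamming_distance("euclidean", "manhattan"))  # 7
```

The length check matters: the Hamming Distance is only defined for strings of the same length, so the sketch raises an error rather than silently comparing a prefix.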