Formulas at A Glance - IDS
Arithmetic Mean
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Where $n$ is the number of observations.
Weighted arithmetic mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
Where $w_i$ is the weight of $x_i$.
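The two means above can be sketched in a few lines of Python. The function names and the sample numbers are illustrative assumptions, not part of the formula sheet.

```python
def arithmetic_mean(xs):
    """Return (1/n) * sum(x_i) over all observations."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


scores = [70, 80, 90]   # hypothetical observations
weights = [1, 1, 2]     # hypothetical weights

print(arithmetic_mean(scores))         # 80.0
print(weighted_mean(scores, weights))  # 82.5
```

With equal weights the weighted mean reduces to the arithmetic mean.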
Mode
• Value that occurs most frequently in the data
• Unimodal, bimodal, trimodal
Empirical formula:
\[ \text{mean} - \text{mode} = 3\,(\text{mean} - \text{median}) \]
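As a rough illustration, the mode can be read off with `statistics.multimode` from the Python standard library (Python 3.8+), and the empirical formula can be rearranged to mode = 3 × median − 2 × mean. The sample data is made up, and the empirical relation is only an approximation for moderately skewed unimodal data, so the estimate need not equal the true mode.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 4, 5, 5, 5, 6, 9]   # hypothetical sample

modes = multimode(data)               # all values tied for the highest count
print(modes)                          # [5]  -> unimodal

# Empirical relation rearranged: mode = 3 * median - 2 * mean
estimated_mode = 3 * median(data) - 2 * mean(data)
print(estimated_mode)                 # an approximation, not the exact mode
```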
Variance
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \mu^2 \]
Standard Deviation
Square root of the variance, i.e. $\sigma = \sqrt{\sigma^2}$
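A small sketch can confirm that the two forms of the variance agree. It assumes the population version (division by N), as in the formula above; the function names and data are illustrative.

```python
import math


def population_variance(xs):
    """(1/N) * sum((x_i - mu)^2)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def population_variance_shortcut(xs):
    """(1/N) * sum(x_i^2) - mu^2."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical sample, mean 5

var = population_variance(data)
std = math.sqrt(var)                  # standard deviation = sqrt(variance)
print(var, std)                       # 4.0 2.0
print(population_variance_shortcut(data))  # 4.0
```

The shortcut form needs only one pass over the data, but it can lose precision when the mean is large relative to the spread.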
Skewness is a measure of 𝑁
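Supremum distance (Minkowski distance as $h \to \infty$)
\[ d(i,j) = \lim_{h \to \infty} \left( \sum_{f=1}^{p} |x_{if} - x_{jf}|^{h} \right)^{1/h} = \max_{f} |x_{if} - x_{jf}| \]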
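The limit above can be checked numerically: as h grows, the Minkowski distance approaches the largest per-attribute difference. This is a minimal sketch; the function names and the two points are assumptions made for illustration.

```python
def minkowski(x, y, h):
    """(sum over attributes f of |x_f - y_f|^h)^(1/h)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)


def supremum_distance(x, y):
    """max over attributes f of |x_f - y_f|."""
    return max(abs(a - b) for a, b in zip(x, y))


p1, p2 = (1, 2), (3, 5)               # hypothetical objects with p = 2 attributes

print(supremum_distance(p1, p2))      # 3
for h in (1, 2, 10, 50):
    # The printed values shrink toward 3 as h increases.
    print(h, minkowski(p1, p2, h))
```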
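Dissimilarity for attributes of mixed types
\[ d(i,j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}} \]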
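Ordinal attribute with rank $r_{if}$ among $M_f$ states, mapped onto [0, 1]:
\[ z_{if} = \frac{r_{if} - 1}{M_f - 1} \]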
Min-max normalization
\[ \hat{x} = \frac{x - \min(x)}{\max(x) - \min(x)} \, (new_{max} - new_{min}) + new_{min} \]
for the new range $[new_{min}, new_{max}]$
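A minimal sketch of min-max normalization follows. The function name, the default target range [0, 1], and the sample values are assumptions for illustration.

```python
def min_max_normalize(xs, new_min=0.0, new_max=1.0):
    """Linearly rescale xs so that min(xs) -> new_min and max(xs) -> new_max."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("all values are equal; the scale is undefined")
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min for x in xs]


values = [10, 20, 30, 50]                     # hypothetical attribute values

print(min_max_normalize(values))              # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize(values, -1.0, 1.0))   # [-1.0, -0.5, 0.0, 1.0]
```

A later value outside the original [min, max] would map outside the new range, so the bounds should be taken from the data the scaler is fitted on.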
z-score normalization
\[ \hat{x} = \frac{x - \mu(x)}{\sigma(x)} \]
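The z-score formula can be sketched as below. It assumes σ is the population standard deviation (division by N), which matches the variance formula on this sheet; the function name and data are illustrative.

```python
import math


def z_score_normalize(xs):
    """Return (x - mu) / sigma for each x, using the population sigma."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - mu) / sigma for x in xs]


data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical sample: mu = 5, sigma = 2

print(z_score_normalize(data))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```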
Normalization by decimal scaling
\[ \hat{x} = \frac{x}{10^{j}} \]
where $j$ is the smallest integer such that $\max(|\hat{x}|) < 1$.
New range is within [−1, +1].
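Decimal scaling can be sketched as a search for the exponent j. This version assumes j is a non-negative integer (it never scales small values up), which is an interpretation rather than something the sheet states; the function name and data are illustrative.

```python
def decimal_scale(xs):
    """Return (j, scaled values), with j the smallest j >= 0 giving max|x / 10^j| < 1."""
    max_abs = max(abs(x) for x in xs)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [x / 10 ** j for x in xs]


values = [-986, 917]                  # hypothetical attribute values

j, scaled = decimal_scale(values)
print(j)        # 3
print(scaled)   # [-0.986, 0.917]
```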
Correlation coefficient
\[ r_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n \sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{n \sigma_x \sigma_y} \]
Range: −1 to +1
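Both forms of the correlation coefficient can be compared with a short sketch. It assumes σ_x and σ_y are population standard deviations (division by n), consistent with the n in the denominator; the helper names and data are illustrative.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _pop_std(xs):
    mu = _mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


def pearson_r(xs, ys):
    """sum((x_i - mean_x) * (y_i - mean_y)) / (n * sigma_x * sigma_y)."""
    n = len(xs)
    mx, my = _mean(xs), _mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / (n * _pop_std(xs) * _pop_std(ys))


def pearson_r_shortcut(xs, ys):
    """(sum(x_i * y_i) - n * mean_x * mean_y) / (n * sigma_x * sigma_y)."""
    n = len(xs)
    num = sum(x * y for x, y in zip(xs, ys)) - n * _mean(xs) * _mean(ys)
    return num / (n * _pop_std(xs) * _pop_std(ys))


hours = [1, 2, 3, 4, 5]               # hypothetical x values
scores = [2, 4, 5, 4, 5]              # hypothetical y values

print(round(pearson_r(hours, scores), 4))           # 0.7746
print(round(pearson_r_shortcut(hours, scores), 4))  # 0.7746
```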
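Expected information (entropy) needed to classify a tuple in D:
\[ Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) \]
where $p_i$ is the probability that a tuple in $D$ belongs to class $i$ and $m$ is the number of classes.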
Information needed (after using A to split D into v partitions) to classify D:
\[ Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j) \]
Information gained by branching on attribute A:
\[ Gain(A) = Info(D) - Info_A(D) \]
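The three quantities above can be sketched together. Class probabilities are estimated as class frequencies in D; the function names, the attribute values, and the labels are assumptions made for illustration.

```python
import math
from collections import Counter


def info(labels):
    """Info(D) = -sum(p_i * log2(p_i)), with p_i taken as class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_after_split(attr_values, labels):
    """Info_A(D) = sum over partitions D_j of |D_j| / |D| * Info(D_j)."""
    partitions = {}
    for a, y in zip(attr_values, labels):
        partitions.setdefault(a, []).append(y)
    n = len(labels)
    return sum(len(part) / n * info(part) for part in partitions.values())


def gain(attr_values, labels):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_after_split(attr_values, labels)


labels = ["yes", "yes", "no", "no"]   # hypothetical class labels
perfect = ["a", "a", "b", "b"]        # attribute that separates the classes
useless = ["a", "b", "a", "b"]        # attribute that carries no information

print(info(labels))                   # 1.0
print(gain(perfect, labels))          # 1.0
print(gain(useless, labels))          # 0.0
```

An attribute whose partitions are all pure recovers the full entropy of D as gain, while one that leaves every partition as mixed as D itself gains nothing.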