Unit 2&3 - 250421 - 215911
y = a0 + a1x + ε
Here, a0 is the intercept, a1 is the slope (regression coefficient), and ε is the random error term.
The values of the x and y variables are the training dataset used to build the Linear Regression model.
Linear regression can be further divided into two types of the algorithm:
o Simple Linear Regression: If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear Regression: If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear
Regression.
Linear Regression Line
A linear line showing the relationship between the dependent and independent variables is called a regression
line. A regression line can show two types of relationship:
o Positive Linear Relationship : If the dependent variable increases on the Y-axis and independent
variable increases on X-axis, then such a relationship is termed as a Positive linear relationship.
o Negative Linear Relationship: If the dependent variable decreases on the Y-axis and independent
variable increases on the X-axis, then such a relationship is called a negative linear relationship.
Finding the best fit line: When working with linear regression, our main goal is to find the best fit line, which means the error between the predicted values and the actual values should be minimized. The best fit line will have the least error. Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line; to do this, we use a cost function.
Cost function-
o The different values for the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures how a linear regression model
is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input
variable to the output variable. This mapping function is also known as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For N observations it can be written as:
MSE = (1/N) Σi (yi − (a0 + a1xi))²
Where yi is the actual value and (a0 + a1xi) is the predicted value for the i-th observation.
Residuals: The distance between the actual value and the predicted value is called the residual. If the observed points are far from the regression line, the residuals will be large, and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small and hence the cost function will be low.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the cost
function.
o It works by randomly selecting initial values for the coefficients and then iteratively updating them to reach the minimum of the cost function (see the sketch below).
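To make the update rule concrete, here is a minimal NumPy sketch of batch gradient descent for simple linear regression; the toy data, learning rate, and iteration count are illustrative choices, not values from the text.

```python
import numpy as np

# Toy data (illustrative): y is roughly linear in x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

a0, a1 = 0.0, 0.0        # intercept and slope, start from arbitrary values
lr = 0.01                # learning rate
for _ in range(5000):
    y_pred = a0 + a1 * x
    error = y_pred - y
    # Gradients of the MSE cost with respect to a0 and a1
    grad_a0 = 2 * error.mean()
    grad_a1 = 2 * (error * x).mean()
    a0 -= lr * grad_a0
    a1 -= lr * grad_a1

print(a0, a1)            # approaches the least-squares fit
```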
Y = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bn*Xn + ε
The target variable here is Y; X1, X2, X3, ..., Xn are the independent variables; b0 is the intercept; b1, b2, b3, ..., bn are the coefficients; and ε represents the error term.
3. Bayes' theorem
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, and it determines the probability of an event given uncertain knowledge. In probability theory, it relates the conditional probabilities and marginal probabilities of two random events. Bayes' theorem is named after the British mathematician Thomas Bayes. Bayesian inference is an application of Bayes' theorem and is fundamental to Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by observing new information of the real
world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability of
cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
P(A|B) = P(B|A) P(A) / P(B) ..........(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
P(A) is called the prior probability, i.e., the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability, i.e., the probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) * P(B|Ai), hence Bayes' rule can be written as:
P(Ai|B) = P(Ai) * P(B|Ai) / Σk P(Ak) * P(B|Ak)
Where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in
cases where we have a good probability of these three terms and want to determine the fourth one. Suppose we
want to perceive the effect of some unknown cause, and want to compute that cause, then the Bayes' rule becomes:
Question 1: What is the probability that a patient has the disease meningitis, given that they have a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs 80% of the time. He
is also aware of some more facts, which are given as follows:
P(a|b) = 0.8 (probability of a stiff neck given meningitis)
P(b) = 1/30000 (prior probability of meningitis)
P(a) = 0.02 (prior probability of a stiff neck)
Applying Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133.
Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has meningitis.
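The same arithmetic can be checked in a few lines of Python; the variable names below are illustrative.

```python
# Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)
p_a_given_b = 0.8        # P(stiff neck | meningitis)
p_b = 1 / 30000          # P(meningitis)
p_a = 0.02               # P(stiff neck)

p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)       # ~0.00133, i.e. about 1 in 750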
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is
king is 4/52, then calculate posterior probability P(King|Face), which means the drawn face card is a king
card.
Solution:
We need P(King|Face) = P(Face|King) * P(King) / P(Face). Every king is a face card, so P(Face|King) = 1; P(King) = 4/52; there are 12 face cards, so P(Face) = 12/52. Therefore P(King|Face) = (1 × 4/52) / (12/52) = 1/3.
Application of Bayes' theorem in Artificial intelligence:
o It is used to calculate the next step of the robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
4. Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression; it finds the decision boundary (hyperplane) that best separates the classes. The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called the Non-linear SVM classifier.
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM. The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane. We always create the hyperplane that has the maximum margin, which means the maximum distance between the hyperplane and the nearest data points.
Support Vectors: The data points or vectors that are closest to the hyperplane and that affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has
two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can classify the
pair(x1, x2) of coordinates in either green or blue. Consider the below image:
As it is a 2-D space, we can easily separate these two classes by just using a straight line. But there can be multiple lines that can separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
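As an illustration, the following sketch trains a linear SVM with scikit-learn (an assumed library choice) on a small made-up two-feature dataset; the points and labels are purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])  # features x1, x2
y = np.array([0, 0, 0, 1, 1, 1])                                 # class labels (blue/green)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)        # the points closest to the hyperplane
print(clf.predict([[4, 4]]))       # classify a new (x1, x2) point
```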
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we cannot draw
a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used two dimensions
x and y, so for non-linear data, we will add a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way. Consider the below image:
Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we convert it back into 2-D space with z = 1, then it becomes:
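The lifting idea above can be sketched with scikit-learn (an assumed library choice): we can either add the extra dimension z = x² + y² by hand and fit a linear SVM, or let an RBF kernel perform a similar mapping implicitly. The ring-shaped toy data below is invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Non-linearly separable toy data: class 0 near the origin, class 1 further out
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 4).astype(int)

# Option 1: add the third dimension z = x^2 + y^2 explicitly, then use a linear SVM
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])
linear_in_3d = SVC(kernel="linear").fit(X3, y)

# Option 2: let an RBF kernel do the mapping implicitly
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

print(linear_in_3d.score(X3, y), rbf.score(X, y))
```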
5. SVM Kernel
A set of techniques known as kernel methods is used in machine learning to address classification, regression, and other prediction problems. They are built around the idea of kernels, which are functions that gauge how similar two data points are to one another in a high-dimensional feature space. The fundamental premise of kernel methods is to convert the input data into a high-dimensional feature space, which makes it simpler to distinguish between classes or generate predictions. Kernel methods employ a kernel function to implicitly map the data into the feature space, as opposed to manually computing the feature space. The most popular kind of kernel approach is the Support Vector Machine (SVM), a binary classifier that determines the hyperplane that most effectively divides the two groups. In order to efficiently locate the ideal hyperplane, SVMs map the input into a higher-dimensional space using a kernel function. Other examples of kernel methods include kernel ridge regression, kernel PCA, and Gaussian processes. Since they are powerful, adaptable, and computationally efficient, kernel approaches are frequently employed in machine learning. They are resilient to noise and outliers and can handle sophisticated data structures like strings and graphs.
Support Vector Machines (SVMs) use kernel methods to transform the input data into a higher-dimensional
feature space, which makes it simpler to distinguish between classes or generate predictions. Kernel approaches
in SVMs work on the fundamental principle of implicitly mapping input data into a higher-dimensional feature
space without directly computing the coordinates of the data points in that space.
The kernel function in SVMs is essential in determining the decision boundary that divides the various classes. In
order to calculate the degree of similarity between any two points in the feature space, the kernel function
computes their dot product.
The most commonly used kernel function in SVMs is the Gaussian or radial basis function (RBF) kernel. The
RBF kernel maps the input data into an infinite-dimensional feature space using a Gaussian function. This kernel
function is popular because it can capture complex nonlinear relationships in the data.
Other types of kernel functions that can be used in SVMs include the polynomial kernel, the sigmoid kernel, and
the Laplacian kernel. The choice of kernel function depends on the specific problem and the characteristics of the
data.
Basically, kernel methods in SVMs are a powerful technique for solving classification and regression problems,
and they are widely used in machine learning because they can handle complex data structures and are robust to
noise and outliers.
Kernel functions used in machine learning, including in SVMs (Support Vector Machines), have several important
characteristics, including:
o Mercer's condition: A kernel function must satisfy Mercer's condition to be valid. This condition ensures that the kernel function is positive semi-definite, which means that the kernel (Gram) matrix it produces always has non-negative eigenvalues.
o Positive definiteness: A kernel function is positive definite if it is always greater than zero except for
when the inputs are equal to each other.
o Non-negativity: A kernel function is non-negative, meaning that it produces non-negative values for all
inputs.
o Symmetry: A kernel function is symmetric, meaning that it produces the same value regardless of the
order in which the inputs are given.
o Reproducing property: A kernel function satisfies the reproducing property if it can be used to
reconstruct the input data in the feature space.
o Smoothness: A kernel function is said to be smooth if it produces a smooth transformation of the input
data into the feature space.
o Complexity: The complexity of a kernel function is an important consideration, as more complex kernel
functions may lead to overfitting and reduced generalization performance.
Basically, the choice of kernel function depends on the specific problem and the characteristics of the data, and
selecting an appropriate kernel function can significantly impact the performance of machine learning algorithms.
In Support Vector Machines (SVMs), there are several types of kernel functions that can be used to map the input
data into a higher-dimensional feature space. The choice of kernel function depends on the specific problem and
the characteristics of the data.
Linear Kernel
A linear kernel is a type of kernel function used in machine learning, including in SVMs (Support Vector
Machines). It is the simplest and most commonly used kernel function, and it defines the dot product between the
input vectors in the original feature space.
When using a linear kernel in an SVM, the decision boundary is a linear hyperplane that separates the different
classes in the feature space. This linear boundary can be useful when the data is already separable by a linear
decision boundary or when dealing with high-dimensional data, where the use of more complex kernel functions
may lead to overfitting.
Polynomial Kernel
A polynomial kernel is a particular kind of kernel function used in machine learning, such as in SVMs (Support Vector Machines). It is a nonlinear kernel function that uses polynomial functions to map the input data into a higher-dimensional feature space.
The polynomial kernel is defined as K(x, y) = (x · y + c)^d, where x and y are the input feature vectors, c is a constant term, and d is the degree of the polynomial. The constant term is added to the dot product of the input vectors, and the result is raised to the degree of the polynomial. The decision boundary of an SVM with a polynomial kernel is a nonlinear hyperplane, so it can capture more intricate relationships between the input features. The degree of nonlinearity in the decision boundary is determined by the degree of the polynomial.
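The kernels discussed above can be written directly as small functions; this sketch uses NumPy, and the parameter values (c, d, gamma) are illustrative defaults.

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = x . y
    return np.dot(x, y)

def polynomial_kernel(x, y, c=1.0, d=3):
    # K(x, y) = (x . y + c)^d
    return (np.dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.5])
print(linear_kernel(a, b), polynomial_kernel(a, b), rbf_kernel(a, b))
```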
Time Series Analysis
Time series data is a sequence of data points collected or recorded at specific time intervals. Unlike other data types, in which observations are independent of each other, time series data has an inherent temporal ordering. This makes it unique and requires special attention when analyzing it and forecasting future values. Understanding the trends and components of time series data is important for effective analysis and prediction.
Time series data consists of observations made sequentially over time, frequently at regular intervals such as daily, monthly, or yearly. These data points can represent diverse phenomena, such as stock prices, temperature readings, earnings figures, or even website traffic. The key feature of time series data is its chronological order, which must be maintained throughout the analysis to preserve the temporal relationships among observations.
Time series data can be decomposed into several key components that help us understand the underlying patterns:
Trend
Definition: A trend is the long-term movement or direction in the data over time. It represents the general tendency of the data to increase, decrease, or stay stable over an extended period.
Example: A steady upward trend in annual revenue over several years suggests consistent business growth.
Seasonality
Definition: Seasonality refers to periodic fluctuations or patterns that repeat at regular intervals, frequently driven by seasonal factors like weather, holidays, or economic cycles.
Example: Retail sales peaking during the holiday season each year is a classic instance of seasonality.
Cyclic Patterns
Definition: Cyclic patterns are fluctuations that occur over longer, irregular periods, unlike seasonality, which has a fixed periodicity. These cycles are frequently influenced by external economic or social factors.
Example: Business cycles, in which periods of economic expansion are followed by recessions, are an example of cyclic patterns.
Noise
Definition: Noise refers to random variations or fluctuations in the data that cannot be attributed to the trend, seasonality, or cyclic patterns. Noise is often considered the "error" or "residual" component of the time series.
Example: Sudden spikes in stock prices due to unexpected news or events constitute noise in financial time series data.
To capture these components, there are a number of popular time series modelling techniques. This section gives a brief introduction to each technique; we will discuss them in detail in the upcoming chapters −
Naïve Methods
These are simple estimation techniques, in which the predicted value is set equal to the mean of the preceding values of the time-dependent variable, or to the previous actual value. They are used as baselines for comparison with more sophisticated modelling techniques.
Auto Regression
Auto regression predicts the values of future time periods as a function of the values at previous time periods. Predictions of auto regression may fit the data better than those of naïve methods, but it may not be able to account for seasonality.
ARIMA Model
An auto-regressive integrated moving-average (ARIMA) model expresses the value of a variable as a linear function of previous values and residual errors at previous time steps of a stationary time series. However, real-world data may be non-stationary and have seasonality, so Seasonal ARIMA and Fractional ARIMA were developed. ARIMA works on univariate time series; to handle multiple variables, VARIMA was introduced.
Exponential Smoothing
It models the value of a variable as an exponentially weighted linear function of previous values. This statistical model can handle trend and seasonality as well.
LSTM
Long Short-Term Memory (LSTM) is a recurrent neural network used for time series analysis to account for long-term dependencies. It can be trained with large amounts of data to capture the trends in multivariate time series.
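As a quick illustration of two of these techniques, the sketch below fits an ARIMA model and Holt-Winters exponential smoothing using the statsmodels library (an assumed choice, not prescribed by the text) on a synthetic monthly series with trend and seasonality.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series: linear trend + yearly seasonality + noise
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
trend = np.linspace(10, 30, 48)
season = 5 * np.sin(2 * np.pi * np.arange(48) / 12)
series = pd.Series(trend + season + np.random.normal(0, 1, 48), index=idx)

arima_fc = ARIMA(series, order=(1, 1, 1)).fit().forecast(steps=6)
es_fc = ExponentialSmoothing(series, trend="add", seasonal="add",
                             seasonal_periods=12).fit().forecast(6)
print(arima_fc.round(2))
print(es_fc.round(2))
```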
Linear and Nonlinear Systems
A linear system is a mathematical model of a system based on the use of a linear operator. Linear systems typically exhibit features and properties that are much simpler than the nonlinear case. As a mathematical abstraction or idealization, linear systems find important applications in automatic control theory, signal processing, and telecommunications. For example, the propagation medium for wireless communication systems can often be modeled by linear systems.
A nonlinear system (or non-linear system) is a system in which the change of the output is not proportional to the change of the input. Nonlinear problems are of interest to engineers, biologists, physicists, mathematicians, and many other scientists, since most systems are inherently nonlinear in nature. Nonlinear dynamical systems, describing changes in variables over time, may appear chaotic, unpredictable, or counterintuitive, contrasting with much simpler linear systems.
8. RULE INDUCTION
Rule induction is a data mining process of deducing if-then rules from a data set. These symbolic decision rules
explain an inherent relationship between the attributes and class labels in the data set. Many real-life experiences
are based on intuitive rule induction. For example, we can proclaim a rule that states “if it is 8 a.m. on a weekday,
then highway traffic will be heavy” and “if it is 8 p.m. on a Sunday, then the traffic will be light.” These rules are
not necessarily right all the time. 8 a.m. weekday traffic may be light during a holiday season. But, in general,
these rules hold true and are deduced from real-life experience based on our everyday observations. Rule induction provides a powerful classification approach that can be easily understood by a general audience. Rule induction is a machine-learning technique that involves the discovery of patterns or rules in data. It aims to extract explicit if-then rules that can accurately predict or classify instances based on their features or attributes.
Data Preparation: The input data is prepared by organizing it into a structured format, such as a table or a matrix,
where each row represents an instance or observation, and each column represents a feature or attribute.
Rule Generation: The rule generation process involves finding patterns or associations in the data that can be expressed as if-then rules. Various algorithms and methods can be used for rule generation, such as decision tree algorithms (e.g., C4.5, CART), association rule mining algorithms (e.g., Apriori), and logical reasoning approaches.
Rule Evaluation: The generated rules are evaluated to assess their quality and usefulness. Evaluation metrics can include accuracy, coverage, support, confidence, lift, and other measures.
Rule Selection and Pruning: Depending on the complexity of the rule set and the specific requirements, rule selection and pruning techniques can be applied to refine the rule set. This process involves removing redundant, contradictory, or low-quality rules.
Rule Application: Once a set of high-quality rules is obtained, they can be applied to new, unseen instances for prediction or classification. Each instance is evaluated against the rules, and the applicable rule(s) with the highest confidence or support is used to make predictions or decisions. Rule induction has been widely used in various domains, such as data mining, machine learning, expert systems, and decision support systems. It provides interpretable and human-readable models, making it useful for generating understandable insights and explanations from data. While rule induction can be effective in capturing explicit patterns and associations in the data, it may struggle to capture complex or non-linear relationships. Additionally, rule induction algorithms may face challenges when dealing with large and high-dimensional datasets, as the search space of possible rules can grow very large.
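One simple, concrete way to see rule induction in practice is to fit a shallow decision tree and print its learned if-then rules; the sketch below uses scikit-learn and the Iris dataset purely as an illustration of one such approach.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# export_text prints the learned decision rules in readable if-then form
print(export_text(tree, feature_names=list(data.feature_names)))
```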
9. NEURAL NETWORKS
The term "Artificial Neural Network" is derived from Biological neural networks that develop the structure of a
human brain. Similar to the human brain that has neurons interconnected to one another, artificial neural networks
also have neurons that are interconnected to one another in various layers of the networks. These neurons are
known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites from the Biological Neural Network represent inputs in Artificial Neural Networks, the cell nucleus represents nodes, synapses represent weights, and the axon represents the output.
Biological Neural Network → Artificial Neural Network
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output
An Artificial Neural Network is a model in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up a human brain, so that computers can understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells. There are on the order of 100 billion neurons in the human brain, and each neuron has somewhere in the range of 1,000 to 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data when necessary from our memory in parallel. We can say that the human brain is made up of incredibly amazing parallel processors. We can understand the artificial neural network with an example: consider a digital logic gate that takes an input and gives an output. An "OR" gate takes two inputs; if one or both of the inputs are "On," then the output is "On," and if both inputs are "Off," then the output is "Off." Here the output depends only on the input. Our brain does not perform the same task: the output-to-input relationship keeps changing because the neurons in our brain are "learning."
To understand the architecture of an artificial neural network, we have to understand what a neural network consists of. A neural network consists of a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in output that is
conveyed using this layer.
The artificial neural network takes the inputs, computes the weighted sum of the inputs, and adds a bias. This computation is represented in the form of a transfer function, net = Σi wi·xi + b, whose result is then passed through an activation function.
A Feedforward Neural Network (FNN) is a type of artificial neural network where connections between the
nodes do not form cycles. This characteristic differentiates it from recurrent neural networks (RNNs). The
network consists of an input layer, one or more hidden layers, and an output layer. Information flows in one
direction—from input to output—hence the name "feedforward."
Activation Functions
Activation functions introduce non-linearity into the network, enabling it to learn and model complex data
patterns. Common activation functions include:
Sigmoid: σ(x) = 1 / (1 + e^(−x))
Tanh: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
ReLU (Rectified Linear Unit): ReLU(x) = max(0, x)
Leaky ReLU: LeakyReLU(x) = max(0.01x, x)
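These activation functions are straightforward to implement; here is a small NumPy sketch evaluating each of them on a few sample values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)              # (e^x - e^-x) / (e^x + e^-x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```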
Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial neural networks, particularly feed-forward networks. It works iteratively, minimizing the cost function by adjusting weights and biases. In each epoch, the model adapts these parameters, reducing the loss by following the error gradient. Backpropagation often utilizes optimization algorithms like gradient descent or stochastic gradient descent. The algorithm computes the gradient using the chain rule from calculus, allowing it to effectively navigate complex layers in the neural network to minimize the cost function.
Principal Component Analysis (PCA)
PCA tries to find a lower-dimensional surface onto which to project the high-dimensional data. PCA works by considering the variance of each attribute, because an attribute with high variance shows a good split between the classes, and hence it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it keeps the important variables and drops the least important variables.
o Dimensionality: It is the number of features or variables present in the given dataset. More easily, it is
the number of columns present in the dataset.
o Correlation: It signifies that how strongly two variables are related to each other. Such as if one changes,
the other variable also gets changed. The correlation value ranges from -1 to +1. Here, -1 occurs if
variables are inversely proportional to each other, and +1 indicates that variables are directly proportional
to each other.
o Orthogonal: It defines that variables are not correlated to each other, and hence the correlation between
the pair of variables is zero.
o Eigenvectors: If there is a square matrix M and a non-zero vector v is given, then v will be an eigenvector of M if Mv is a scalar multiple of v.
o Covariance Matrix: A matrix containing the covariance between the pair of variables is called the
Covariance Matrix.
Problem-01:
Given data = { 2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8 }.
Compute the principal component using PCA Algorithm.
OR
Consider the two dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
OR
CLASS 1
X=2,3,4
Y=1,5,3
CLASS 2
X=5,6,7
Y=6,7,8
Solution
Step-01:
Get data.
x1 = (2, 1)
x2 = (3, 5)
x3 = (4, 3)
x4 = (5, 6)
x5 = (6, 7)
x6 = (7, 8)
Step-02: Calculate the mean vector (µ).
Mean vector µ = ((2+3+4+5+6+7)/6, (1+5+3+6+7+8)/6) = (4.5, 5)
Step-03: Subtract the mean vector (µ) from each data point to obtain the deviation vectors.
Step-04: Calculate the covariance matrix as the average of the outer products of the deviation vectors:
Covariance matrix = (m1 + m2 + m3 + m4 + m5 + m6) / 6, where mi = (xi − µ)(xi − µ)ᵀ
This gives: Covariance matrix = [[2.92, 3.67], [3.67, 5.67]]
Calculate the eigen values and eigen vectors of the covariance matrix.
So, we have |M − λI| = 0:
(2.92 − λ)(5.67 − λ) − (3.67 × 3.67) = 0
λ² − 8.59λ + 3.09 = 0
Solving this quadratic equation gives λ1 = 8.22 and λ2 = 0.38.
Clearly, the second eigenvalue is very small compared to the first eigenvalue, so the second eigenvector can be left out. The eigenvector corresponding to the greatest eigenvalue is the principal component for the given data set. So, we find the eigenvector X corresponding to eigenvalue λ1 = 8.22 from MX = λX, where
M = covariance matrix
X = eigenvector
λ = eigenvalue
On simplification, we get:
5.3X1 = 3.67X2 .........(1)
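The worked example can be checked with NumPy; the sketch below recomputes the mean, the (population) covariance matrix, and the largest eigenvalue/eigenvector for the same six points.

```python
import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
mean = X.mean(axis=0)                  # (4.5, 5.0)
C = np.cov(X.T, bias=True)             # approx. [[2.92, 3.67], [3.67, 5.67]]

eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
principal_value = eigvals[-1]          # ~8.22
principal_component = eigvecs[:, -1]   # direction of maximum variance
print(mean, C.round(2), principal_value.round(2), principal_component.round(2))
```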
Fuzzy Logic
Following are the characteristics of Fuzzy Logic:
1. This concept is flexible, and we can easily understand and implement it.
2. It is used to help minimize the logic created by humans.
3. It is the best method for finding the solution to problems that are suitable for approximate or uncertain reasoning.
4. It always offers two values, which denote the two possible solutions for a problem or statement.
5. It allows users to build or create non-linear functions of arbitrary complexity.
6. In fuzzy logic, everything is a matter of degree.
7. In fuzzy logic, any system which is logical can be easily fuzzified.
8. It is based on natural language processing.
9. It is also used by quantitative analysts for improving their algorithms' execution.
10. It also allows users to integrate it with programming.
In the architecture of the Fuzzy Logic system, each component plays an important role. The architecture consists of the four different components given below.
1. Rule Base
2. Fuzzification
3. Inference Engine
4. Defuzzification
Following diagram shows the architecture or process of a Fuzzy Logic system:
1. Rule Base
Rule Base is a component used for storing the set of rules and the if-then conditions given by experts, which are used for controlling decision-making systems. Many updates have recently been made to fuzzy theory that offer effective methods for designing and tuning fuzzy controllers. These updates or developments decrease the number of fuzzy rules required.
2. Fuzzification
Fuzzification is a module or component for transforming the system inputs, i.e., it converts crisp numbers into fuzzy sets. The crisp numbers are the inputs measured by sensors; fuzzification then passes them into the control system for further processing. This component divides the input signal into the following five states in any Fuzzy Logic system:
o Large Positive (LP)
o Medium Positive (MP)
o Small (S)
o Medium Negative (MN)
o Large Negative (LN)
3. Inference Engine
This component is the main component of any Fuzzy Logic system (FLS), because all the information is processed in the inference engine. It allows users to find the degree of match between the current fuzzy input and the rules. Based on the degree of match, the system determines which rules are to be fired for the given input. When all rules are fired, they are combined to develop the control actions.
4. Defuzzification
Defuzzification is a module or component which takes the fuzzy set inputs generated by the inference engine and transforms them into a crisp value. It is the last step in the process of a fuzzy logic system. The crisp value is the type of value that is acceptable to the user. Various techniques are available to do this, but the user has to select the one that best reduces the error.
Membership Function
The membership function is a function which represents the graph of a fuzzy set and allows users to quantify linguistic terms. It is a graph which maps each element x to a value between 0 and 1. The membership function was introduced in the first paper on fuzzy sets by Zadeh. For a fuzzy set B, the membership function on X is defined as μB : X → [0, 1]. In this function, each element of X is mapped to a value between 0 and 1, which is called the degree of membership or membership value.
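As an illustration, here is a sketch of a triangular membership function, one common shape for such functions; the breakpoints and the "around 25 degrees" interpretation are assumptions made for the example, not taken from the text.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a fuzzy set that peaks at b and is 0 outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# e.g. a fuzzy set "around 25 degrees" for temperature
for t in (15, 20, 25, 30, 35):
    print(t, triangular(t, 20, 25, 30))
```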
To learn about classical and Fuzzy set theory, firstly you have to know about what is set.
Set
A set is a term, which is a collection of unordered or ordered elements. Following are the various examples of a
set:
Types of Set:
1. Finite
2. Empty
3. Infinite
4. Proper
5. Universal
6. Subset
7. Singleton
8. Equivalent Set
9. Disjoint Set
Classical Set
It is a type of set which collects the distinct objects in a group. The sets with the crisp boundaries are classical
sets. In any set, each single entity is called an element or member of that set.
Any set can be easily denoted in the following two different ways:
1. Roster Form: This is also called the tabular form. In this form, the set is represented in the following way:
Following are the two examples which describes the set in Roaster or Tabular form:
Example 1:
Set of Natural Numbers: N = {1, 2, 3, 4, 5, 6, 7, ......, n}.
Example 2:
Set of Prime Numbers less than 50: X={2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}.
2. Set Builder Form: Set Builder form defines a set with the common properties of an element in a set. In this
form, the set is represented in the following way:
A = {x:p(x)}
The following example describes the set in the builder form:
The set {2, 4, 6, 8, 10, 12, 14, 16, 18} is written as:
B = {x:2 ≤ x < 20 and (x%2) = 0}
Following are the various operations which are performed on the classical sets:
1. Union Operation
2. Intersection Operation
3. Difference Operation
4. Complement Operation
1. Union:
This operation is denoted by (A ∪ B). A ∪ B is the set of those elements which exist in either of the two sets A and B. This operation combines all the elements from both sets into a new set. It is also called a Logical OR operation.
A ∪ B = { x | x ∈ A OR x ∈ B }.
Example:
Set A = {10, 11, 12, 13}, Set B = {11, 12, 13, 14, 15}, then A ∪ B = {10, 11, 12, 13, 14, 15}
2. Intersection
This operation is denoted by (A ∩ B). A ∩ B is the set of those elements which are common to both set A and set B. It is also called a Logical AND operation.
A ∩ B = { x | x ∈ A AND x ∈ B }.
Example:
Set A = {10, 11, 12, 13}, Set B = {11, 12, 14} then A ∩ B = {11, 12}
3. Difference Operation
This operation is denoted by (A - B). A-B is the set of only those elements which exist only in set A but not in set
B.
A - B = { x | x ∈ A AND x ∉ B }.
4. Complement Operation: This operation is denoted by (A`). It is applied on a single set. A` is the set of
elements which do not exist in set A.
It can be described as:
A′ = {x|x ∉ A}.
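These classical operations map directly onto Python's built-in set type; the universal set U below is an assumption chosen for the complement example.

```python
A = {10, 11, 12, 13}
B = {11, 12, 13, 14, 15}
U = set(range(10, 20))        # assumed universal set for the complement

print(A | B)                  # union: {10, 11, 12, 13, 14, 15}
print(A & B)                  # intersection: {11, 12, 13}
print(A - B)                  # difference: {10}
print(U - A)                  # complement of A with respect to U
```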
There are following various properties which play an essential role for finding the solution of a fuzzy logic
problem.
1. Commutative Property:
This property provides the following two states which are obtained by two finite sets A and B:
A∪B=B∪A
A∩B=B∩A
2. Associative Property:
This property also provides the following two states but these are obtained by three different finite sets A, B, and
C:
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
3. Idempotency Property:
This property also provides the following two states but for a single finite set A:
A∪A=A
A∩A=A
4. Absorption Property
This property also provides the following two states for any two finite sets A and B:
A ∪ (A ∩ B) = A
A ∩ (A ∪ B) = A
5. Distributive Property:
This property also provides the following two states for any three finite sets A, B, and C:
A∪ (B ∩ C) = (A ∪ B)∩ (A ∪ C)
A∩ (B ∪ C) = (A∩B) ∪ (A∩C)
6. Identity Property:
This property provides the following four states for any finite set A and Universal set X:
A ∪ φ =A
A∩X=A
A∩φ=φ
A∪X=X
7. Transitive property
This property provides the following state for the finite sets A, B, and C:
If A ⊆ B ⊆ C, then A ⊆ C
8. Involution property
This property provides the following rule for any finite set A: (A′)′ = A. (De Morgan's laws, (A ∪ B)′ = A′ ∩ B′ and (A ∩ B)′ = A′ ∪ B′, provide the rules for proving contradictions and tautologies.)
Fuzzy Set
Classical set theory is a subset of fuzzy set theory. Fuzzy logic is based on fuzzy set theory, which is a generalisation of the classical theory of sets (i.e., crisp sets) introduced by Zadeh in 1965.
A fuzzy set is a collection of elements whose degrees of membership lie between 0 and 1. Fuzzy sets are denoted or represented by the tilde (~) character. Fuzzy set theory was introduced in 1965 by Lotfi A. Zadeh and Dieter Klaua. In a fuzzy set, partial membership is allowed. This theory was released as an extension of classical set theory.
Mathematically, a fuzzy set (Ã) is a pair (U, M), where U is the universe of discourse and M is the membership function, which takes on values in the interval [0, 1]. The universe of discourse (U) is also denoted by Ω or X.
Given à and B are the two fuzzy sets, and X be the universe of discourse with the following respective member
functions:
Example:
For X1
Example:
For X1
μĀ(x) = 1-μA(x),
Example:
For X1
μĀ(X1) = 1-μA(X1)
μĀ(X1) = 1 - 0.3
μĀ(X1) = 0.7
For X2
μĀ(X2) = 1-μA(X2)
μĀ(X2) = 1 - 0.8
μĀ(X2) = 0.2
For X3
μĀ(X3) = 1-μA(X3)
μĀ(X3) = 1 - 0.5
μĀ(X3) = 0.5
For X4
μĀ(X4) = 1-μA(X4)
μĀ(X4) = 1 - 0.1
μĀ(X4) = 0.9
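The same operations can be sketched in Python using dictionaries of membership values; the values for A are taken from the example above, while the values for B are assumed for illustration.

```python
A = {"X1": 0.3, "X2": 0.8, "X3": 0.5, "X4": 0.1}
B = {"X1": 0.6, "X2": 0.4, "X3": 0.9, "X4": 0.2}   # hypothetical values

union        = {x: max(A[x], B[x]) for x in A}      # μ(A∪B)(x) = max
intersection = {x: min(A[x], B[x]) for x in A}      # μ(A∩B)(x) = min
complement_A = {x: round(1 - A[x], 2) for x in A}   # μĀ(x) = 1 - μA(x)

print(union)
print(intersection)
print(complement_A)        # {'X1': 0.7, 'X2': 0.2, 'X3': 0.5, 'X4': 0.9}
```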
Classical Set Theory vs Fuzzy Set Theory:
1. Classical set theory is a class of sets having sharp boundaries; fuzzy set theory is a class of sets having un-sharp boundaries.
2. Classical set theory is defined by exact boundaries, only 0 and 1; fuzzy set theory is defined by ambiguous boundaries.
3. In classical set theory, there is no uncertainty about the location of a set's boundary; in fuzzy set theory, there always exists uncertainty about the location of a set's boundary.
4. Classical set theory is widely used in the design of digital systems; fuzzy set theory is mainly used for fuzzy controllers.
Following are the different application areas where the Fuzzy Logic concept is widely used:
Fuzzy Logic has various advantages or benefits. Some of them are as follows:
Fuzzy Logic has various disadvantages or limitations. Some of them are as follows:
1. The run time of fuzzy logic systems is slow and takes a long time to produce outputs.
2. Users can understand it easily only if the systems are kept simple.
3. The possibilities produced by the fuzzy logic system are not always accurate.
4. Many researchers give various ways for solving a given statement using this technique which leads to
ambiguity.
5. Fuzzy logics are not suitable for those problems that require high accuracy.
6. The systems of a Fuzzy logic need a lot of testing for verification and validation.
Fuzzy Decision Tree
Let us now discuss the steps involved in the decision making process −
Determining the Set of Alternatives − In this step, the alternatives from which the decision has to be
taken must be determined.
Evaluating Alternative − Here, the alternatives must be evaluated so that the decision can be taken
about one of the alternatives.
Comparison between Alternatives − In this step, a comparison between the evaluated alternatives is
done.
Types of Decision Making
Individual Decision Making
In this type of decision making, only a single person is responsible for taking decisions. The decision making model in this case can be characterized by −
a set of possible actions, a set of goals Gi (i ∈ Xn), and a set of constraints Cj (j ∈ Xm).
The goals and constraints stated above are expressed in terms of fuzzy sets.
Now consider a set A of alternatives. The fuzzy decision FD over this set is given by −
FD = min[ inf(i∈Xn) fGi(a), inf(j∈Xm) fCj(a) ]
Multi-person Decision Making
Decision making in this case includes several persons, so that the expert knowledge of various persons is utilized to make decisions.
Multi-objective Decision Making
Multi-objective decision making occurs when there are several objectives to be realized. There are the following two issues in this type of decision making −
To acquire proper information related to the satisfaction of the objectives by various alternatives.
To weigh the relative importance of each objective.
A = [a1, a2, ..., ai, ..., an]
Multi-attribute Decision Making
Multi-attribute decision making takes place when the evaluation of alternatives can be carried out based on several attributes of the object. The attributes can be numerical data, linguistic data, or qualitative data. Mathematically, the multi-attribute evaluation is carried out on the basis of the linear equation −
Y = A1X1 + A2X2 + ... + AiXi + ... + ArXr
UNIT 3
1. Data Stream in Data Analytics
A data stream refers to a continuous flow of data generated by various sources in real time. It plays a crucial role in modern technology, enabling applications to process and analyze information as it arrives, leading to timely insights and actions. In this section, we discuss the concept of the data stream in data analytics in detail: what data streams are, their importance, and how they are used in fields like finance, telecommunications, and IoT (Internet of Things). A data stream is a continuous, ordered (implicitly by arrival time or explicitly by timestamp) chain of items. It is infeasible to control the order in which items arrive, nor is it feasible to locally store the stream in its entirety. Data streams involve enormous volumes of data, and items arrive at a high rate.
Data stream –
A data stream is a (possibly unbounded) sequence of tuples. Each tuple comprises a set of attributes, similar to a row in a database table.
Transactional data stream –
It is a log of interactions between entities:
1. Credit card – purchases by consumers from merchants
2. Telecommunications – phone calls by callers to the dialed parties
3. Web – accesses by clients of information at servers
Typical applications include:
1. Fraud detection
2. Real-time trading of goods
3. Consumer marketing
4. Monitoring and reporting on internal IT systems
The Data Stream Management System manages continuous streams of data with very fast changes in real time. Unlike other databases, which may hold static data, its sources might include sensors or social media. It thus offers real-time insight and rapid decision-making in applications that need immediate data analysis and reporting.
3. DSMS Architecture
DSMS stands for Data Stream Management System. It is a software application, just like a DBMS (database management system), but it involves processing and management of a continuously flowing data stream rather than static data such as Excel, PDF, or other files. It is generally used to deal with data streams from various sources, which include sensor data, social media feeds, financial reports, etc. Just like a DBMS, a DSMS also provides a wide range of operations like storage, processing, analyzing, and integration, and also helps to generate visualizations and reports, but only for data streams. There is a wide range of DSMS applications available in the market, among them Apache Flink, Apache Kafka, Apache Storm, Amazon Kinesis, etc. A DSMS processes two types of queries: standard (continuous) queries and ad-hoc queries.
DSMS consists of various layers, each dedicated to performing a particular operation, which are as follows:
1. Data Source Layer
The first layer of DSMS is the data source layer. As its name suggests, it comprises all the data sources, which include sensors, social media feeds, financial markets, stock markets, etc. In this layer, capturing and parsing of the data stream happens. Basically, it is the collection layer which collects the data.
2. Data Ingestion Layer
You can consider this layer as a bridge between the data source layer and the processing layer. The main purpose of this layer is to handle the flow of data, i.e., data flow control, data buffering, and data routing.
3. Processing Layer
This layer is considered the heart of the DSMS architecture; it is the functional layer of DSMS applications. It processes the data streams in real time. To perform processing, it uses processing engines like Apache Flink or Apache Storm. The main function of this layer is to filter, transform, aggregate, and enrich the data stream, deriving insights and detecting patterns.
4. Storage Layer
Once the data is processed, we need to store the processed data in a storage unit. The storage layer consists of various storage options like NoSQL databases, distributed databases, etc. It helps to ensure data durability and availability of the data in case of system failure.
5. Querying Layer
As mentioned above, a DSMS supports two types of queries: ad-hoc queries and standard queries. This layer provides the tools which can be used for querying and analyzing the stored data stream. It also has SQL-like query languages or programming APIs. These queries can be questions like: how many entries have been made? which type of data has been inserted? etc.
6. Visualization and Reporting Layer
This layer provides tools for performing visualization, like charts, pie charts, histograms, etc. On the basis of this visual representation, it also helps to generate reports for analysis.
7. Integration Layer
This layer is responsible for integrating the DSMS application with traditional systems, business intelligence tools, data warehouses, ML applications, and NLP applications. It helps to improve already-running applications.
Together, these layers are responsible for the working of DSMS applications. They provide a scalable and fault-tolerant application which can handle huge volumes of streaming data. The layers can change according to business requirements: some implementations may include all layers, while others may exclude certain layers.
DBMS vs DSMS:
1. DBMS refers to Data Base Management System; DSMS refers to Data Stream Management System.
2. A DBMS deals with persistent data; a DSMS deals with stream data.
3. In a DBMS, random data access takes place; in a DSMS, sequential data access takes place.
4. A DBMS is based on a query-driven processing model (pull-based model); a DSMS is based on a data-driven processing model (push-based model).
5. The data update rate in a DBMS is relatively low; the data update rate in a DSMS is relatively high.
6. In a DBMS the queries are one-time queries; in a DSMS the queries are continuous.
7. A DBMS provides no real-time service; a DSMS provides real-time service.
8. A DBMS uses an unbounded disk store (unlimited secondary storage); a DSMS uses bounded main memory (limited main memory).
Data Sampling
There are mainly two types of data sampling techniques, which are further divided into 4 sub-categories each. They are as follows:
The probability data sampling technique involves selecting data points from a dataset in such a way that every data point has a known, non-zero chance of being chosen. Probability sampling techniques ensure that the sample is representative of the population from which it is drawn, making it possible to generalize the findings from the sample to the entire population with a known level of confidence.
1. Simple Random Sampling: In simple random sampling, every data point has an equal chance or probability of being selected. For example, in the selection of heads or tails, both outcomes of the event have an equal probability of being selected.
2. Systematic Sampling: In systematic sampling, a regular interval is chosen, and every data point falling at that interval is included in the sample. It is easier and more regular than the previous method of sampling, and it reduces inefficiency while improving speed. For example, in a series of 10 numbers, we sample every 2nd number; here we use the process of systematic sampling.
3. Stratified Sampling: In Stratified sampling, we follow the strategy of divide & conquer. We opt for the
strategy of dividing into groups on the basis of similar properties and then perform sampling. This ensures
better accuracy. For eg. In a workplace data, the total number of employees is divided among men and
women.
4. Cluster Sampling: Cluster sampling is more or less like stratified sampling. However in cluster sampling
we choose random data and form it in groups, whereas in stratified we use strata, or an orderly division
takes place in the latter. For eg. Picking up users of different networks from a total combination of users.
Non-probability data sampling means that the selection happens on a non-random basis, and it depends on the individual which data they want to pick. There is no random selection, and every selection is made with a thought and an idea behind it.
1. Convenience Sampling: As the name suggests, the data checker selects the data based on his/her
convenience. It may choose the data sets that would require lesser calculations, and save time while
bringing results at par with probability data sampling technique. For eg. Dataset involving recruitment of
people in IT Industry, where the convenience would be to choose the data which is the latest one, and the
one which encompasses youngsters more.
2. Voluntary Response Sampling: As the name suggests, this sampling method depends on the voluntary
response of the audience for the data. For eg. If a survey is being conducted on types of Blood groups
found in majority at a particular place, and the people who are willing to take part in this survey, and then
if the data sampling is conducted, it will be referred to as the voluntary response sampling.
3. Purposive Sampling: The Sampling method that involves a special purpose falls under purposive
sampling. For eg. If we need to tackle the need of education, we may conduct a survey in the rural areas
and then create a dataset based on people's responses. Such type of sampling is called Purposive Sampling.
4. Snowball Sampling: Snowball sampling technique takes place via contacts. For eg. If we wish to conduct
a survey on the people living in slum areas, and one person contacts us to the other and so on, it is called
a process of snowball sampling.
Find a Target Dataset: Identify the dataset that you want to analyze or draw conclusions about. This
dataset represents the larger population from which a sample will be drawn.
Select a Sample Size: Determine the size of the sample you will collect from the target dataset. The sample
size is the subset of the larger dataset on which the sampling process will be performed.
Decide the Sampling Technique: Choose a suitable sampling technique from options such as Simple
Random Sampling, Systematic Sampling, Cluster Sampling, Snowball Sampling, or Stratified Sampling.
The choice of technique depends on factors such as the nature of the dataset and the research objectives.
Perform Sampling: Apply the selected sampling technique to collect data from the target dataset. Ensure
that the sampling process is carried out systematically and according to the chosen method.
Draw Inferences for the Entire Dataset: Analyze the properties and characteristics of the sampled data
subset. Use statistical methods and analysis techniques to draw inferences and insights that are
representative of the entire dataset.
Extend Properties to the Entire Dataset: Extend the findings and conclusions derived from the sample
to the entire target dataset. This involves extrapolating the insights gained from the sample to make broader
statements or predictions about the larger population.
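The sketch below illustrates simple random, systematic, and stratified sampling with NumPy and pandas on an invented employee dataset; the column names, sizes, and fractions are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "id": range(1000),
    "gender": rng.choice(["M", "F"], size=1000),
})

# Simple random sampling: every row has an equal chance of selection
simple = df.sample(n=100, random_state=42)

# Systematic sampling: take every k-th row after a random start
k = len(df) // 100
start = rng.integers(0, k)
systematic = df.iloc[start::k]

# Stratified sampling: sample 10% from each gender group
stratified = df.groupby("gender").sample(frac=0.1, random_state=42)

print(len(simple), len(systematic), len(stratified))
```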
Data Sampling helps draw conclusions, or inferences of larger datasets using a smaller sample space, which
concerns the entire dataset.
It helps save time and is a quicker and faster approach.
It is a better way in terms of cost effectiveness as it reduces the cost for data analysis, observation and
collection. It is more of like gaining the data, applying sampling method & drawing the conclusion.
It is more accurate in terms of result and conclusion.
Sampling Error: It is the difference between the characteristics of the entire population and those of the smaller sample dataset. Differences in characteristics or properties between the two datasets reduce accuracy, and the sample set may then fail to represent the larger body of information. Sampling error mostly occurs by chance and is regarded as a random error.
It becomes difficult in a few data sampling methods, such as forming clusters of similar properties.
Sampling Bias: It is the process of choosing a sample set which does not represent the entire population
on a whole. It occurs mostly due to incorrect method of sampling usage and consists of errors as the given
dataset is not properly able to draw conclusions for the larger set of data.
7. FILTERING STREAMS
Bloom Filters
A Bloom Filter is a data structure that can do this job. It is mainly a space-optimized version of hashing in which we may have false positives. The idea is to not store the actual key but to store only hash values. It is mainly a probabilistic, space-optimized form of hashing in which fewer than 10 bits per key are required for a 1% false positive probability, independent of the size of the individual keys.
A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. For example, checking the availability of a username is a set membership problem, where the set is the list of all registered usernames. The price we pay for efficiency is that the filter is probabilistic in nature, which means there might be some false positive results. A false positive means the filter might tell us that a given username is already taken when actually it is not.
We need k number of hash functions to calculate the hashes for a given input. When we want to add an item in
the filter, the bits at k indices h1(x), h2(x), … hk(x) are set, where indices are calculated using hash functions.
Example – Suppose we want to enter “geeks” in the filter, we are using 3 hash functions and a bit array of length
10, all set to 0 initially. First we’ll calculate the hashes as follows:
h1(“geeks”) % 10 = 1
h2(“geeks”) % 10 = 4
h3(“geeks”) % 10 = 7
Note: These outputs are random for explanation only.
Now we will set the bits at indices 1, 4 and 7 to 1
Now if we want to check whether “geeks” is present in the filter or not, we do the same process: we calculate the respective hashes using h1, h2, and h3 and check whether all these indices are set to 1 in the bit array. If all the bits are set, then we can say that “geeks” is probably present. If any of the bits at these indices is 0, then “geeks” is definitely not present.
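A minimal Bloom filter can be sketched as a bit array plus k seeded hash functions; the use of Python's built-in hash with different seeds below is an illustrative shortcut, not a recommended production hash.

```python
class BloomFilter:
    def __init__(self, m=10, k=3):
        self.m = m                      # size of the bit array
        self.k = k                      # number of hash functions
        self.bits = [0] * m

    def _indices(self, item):
        # k hash values derived by mixing the item with different seeds
        return [hash((seed, item)) % self.m for seed in range(self.k)]

    def add(self, item):
        for i in self._indices(item):
            self.bits[i] = 1

    def might_contain(self, item):
        # True  -> item is *probably* present (false positives possible)
        # False -> item is definitely not present
        return all(self.bits[i] for i in self._indices(item))

bf = BloomFilter(m=10, k=3)
bf.add("geeks")
print(bf.might_contain("geeks"))   # True (probably present)
print(bf.might_contain("gfg"))     # most likely False (definitely absent if False)
```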
Probability of False positivity: Let m be the size of bit array, k be the number of hash functions and n be the
number of expected elements to be inserted in the filter, then the probability of false positive p can be calculated
as:
P = (1 − [1 − 1/m]^(kn))^k
Size of Bit Array: If expected number of elements n is known and desired false positive probability is p then
the size of bit array m can be calculated as :
m = −(n ln P) / (ln 2)²
Optimum number of hash functions: The number of hash functions k must be a positive integer. If m is size
of bit array and n is number of elements to be inserted, then k can be calculated as :
k = (m / n) ln 2
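The two sizing formulas above can be applied directly; the values of n and p below are illustrative.

```python
import math

n, p = 1_000_000, 0.01
m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))   # bits in the array
k = round((m / n) * math.log(2))                       # number of hash functions
print(m, k)   # roughly 9.6 million bits (~1.2 MB) and 7 hash functions
```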
Space Efficiency
If we want to store large list of items in a set for purpose of set membership, we can store it in hashmap, tries or
simple array or linked list. All these methods require storing item itself, which is not very memory efficient. For
example, if we want to store “geeks” in a hashmap, we have to store the actual string “geeks” as a key-value pair {some_key: ”geeks”}. Bloom filters do not store the data item at all. As we have seen, they use a bit array which allows hash collisions. Without hash collisions, it would not be compact.
The hash functions used in Bloom filters should be independent and uniformly distributed. They should be as fast as possible. Fast, simple, non-cryptographic hashes which are independent enough include murmur, the FNV series of hash functions, and Jenkins hashes. Generating hashes is the major operation in Bloom filters. Cryptographic hash functions provide stability and guarantees but are expensive to calculate. With an increase in the number of hash functions k, the Bloom filter becomes slow. Although non-cryptographic hash functions do not provide guarantees, they provide a major performance improvement.
Flajolet–Martin Algorithm (Counting Distinct Elements)
To estimate the number of different elements appearing in a stream, we can hash elements to integers interpreted as binary numbers. 2 raised to the power of the longest run of trailing 0's seen in the hash value of any stream element is an estimate of the number of different elements.
Eg. Stream: 4, 2, 5 ,9, 1, 6, 3, 7
Hash function, h(x) = (ax + b) mod 32
a) h(x) = 3x + 7 mod 32
b) h(x) = x + 6 mod 32
a) h(x) = 3x + 7 mod 32
h(4) = 3(4) + 7 mod 32 = 19 mod 32 = 19 = (10011)
h(2) = 3(2) + 7 mod 32 = 13 mod 32 = 13 = (01101)
h(5) = 3(5) + 7 mod 32 = 22 mod 32 = 22 = (10110)
h(9) = 3(9) + 7 mod 32 = 34 mod 32 = 2 = (00010)
h(1) = 3(1) + 7 mod 32 = 10 mod 32 = 10 = (01010)
h(6) = 3(6) + 7 mod 32 = 25 mod 32 = 25 = (11001)
h(3) = 3(3) + 7 mod 32 = 16 mod 32 = 16 = (10000)
h(7) = 3(7) + 7 mod 32 = 28 mod 32 = 28 = (11100)
Trailing zeros: {0, 0, 1, 1, 1, 0, 4, 2}
R = max [Trailing Zero] = 4
Output = 2^R = 2^4 = 16
b) h(x) = x + 6 mod 32
h(4) = (4) + 6 mod 32 = 10 mod 32 = 10 = (01010)
h(2) = (2) + 6 mod 32 = 8 mod 32 = 8 = (01000)
h(5) = (5) + 6 mod 32 = 11 mod 32 = 11 = (01011)
h(9) = (9) + 6 mod 32 = 15 mod 32 = 15 = (01111)
h(1) = (1) + 6 mod 32 = 7 mod 32 = 7 = (00111)
h(6) = (6) + 6 mod 32 = 12 mod 32 = 12 = (01100)
h(3) = (3) + 6 mod 32 = 9 mod 32 = 9 = (01001)
h(7) = (7) + 6 mod 32 = 13 mod 32 = 13 = (01101)
Trailing zeros: {1, 3, 0, 0, 0, 2, 0, 0}
R = max [Trailing Zero] = 3
Output = 2^R = 2^3 = 8
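The two worked parts above can be reproduced with a short Python sketch of this trailing-zero estimate; the stream and the two hash functions are exactly the ones used in the example.

def trailing_zeros(value, bits=5):
    # Count trailing 0's in the binary form of value (convention: an all-zero value -> bits).
    if value == 0:
        return bits
    count = 0
    while value % 2 == 0:
        value //= 2
        count += 1
    return count

def fm_estimate(stream, h):
    # Flajolet-Martin style estimate: 2^R, where R is the max number of trailing zeros.
    R = max(trailing_zeros(h(x)) for x in stream)
    return 2 ** R

stream = [4, 2, 5, 9, 1, 6, 3, 7]
print(fm_estimate(stream, lambda x: (3 * x + 7) % 32))  # 16, as in part (a)
print(fm_estimate(stream, lambda x: (x + 6) % 32))      # 8, as in part (b)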
The DGIM (Datar-Gionis-Indyk-Motwani) algorithm is designed to count the number of 1's in a data stream. It uses
O(log² N) bits to represent a window of N bits and allows us to estimate the number of 1's in the window with an
error of no more than 50%, so the algorithm gives an answer that is at most 50% off. In the DGIM algorithm, each bit
that arrives has a timestamp for the position at which it arrives: the first bit has timestamp 1, the second bit has
timestamp 2, and so on. Positions are kept relative to the window size N (the window size is usually taken as a
multiple of 2). The bits of the window are grouped into buckets according to the following rules:
1. The right end of a bucket should always be a 1 (if it ends with a 0, that 0 is not part of the bucket). E.g.
1001011 → a bucket of size 4, having four 1's and ending with a 1 at its right end.
2. Every bucket should have at least one 1, else no bucket can be formed.
3. Every bucket size should be a power of 2.
4. The bucket sizes cannot decrease as we move to the left (they are in non-decreasing order towards the left).
Let us take an example to understand the algorithm, estimating the number of 1's by maintaining buckets over the
data stream. Assume new bits arrive from the right and the window size is N = 24.
When the new bit is 0: after the new bit (0) arrives, say with timestamp 101, there is no change in the buckets.
When the new bit is 1: a new bucket of size 1 is created for it, with the current timestamp. If, after this, there
are at most two buckets of size 1, nothing more needs to be done. However, if there are now three buckets of size 1
(say the buckets with timestamps 100, 102 and 103), we fix the problem by combining the leftmost (earliest) two
buckets of size 1. To combine any two adjacent buckets of the same size, we replace them by one bucket of twice the
size; the timestamp of the new bucket is the timestamp of the more recent (rightmost) of the two buckets being combined.
Sometimes combining two buckets of size 1 creates a third bucket of size 2. If so, we combine the leftmost two
buckets of size 2 into a bucket of size 4, and this process may ripple through the larger bucket sizes. How long can
we continue doing this? We can continue as long as (current timestamp − timestamp of the leftmost bucket of the
window) < N (= 24 here). E.g. 103 − 87 = 16 < 24, so we continue; if the difference is greater than or equal to N,
we stop, because the leftmost bucket has fallen out of the window.
Finally, to answer the query "How many 1's are there in the last 20 bits?", we count the sizes of the buckets that
lie within the last 20 bits and say that there are 11 ones.
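As a rough illustration of the bucket bookkeeping described above, here is a simplified Python sketch. It assumes at most two buckets of any size are kept, stores each bucket as (timestamp of its most recent 1, size), and only answers the query for the full window of N bits; class and variable names are illustrative, not part of the original text.

class DGIM:
    def __init__(self, window_size):
        self.N = window_size
        self.t = 0            # current timestamp
        self.buckets = []     # (timestamp, size) pairs, oldest bucket first

    def add(self, bit):
        self.t += 1
        # Drop a bucket whose timestamp has slid out of the last N positions.
        while self.buckets and self.buckets[0][0] <= self.t - self.N:
            self.buckets.pop(0)
        if bit == 0:
            return            # a 0 never creates or changes buckets
        self.buckets.append((self.t, 1))      # new bucket of size 1 for this 1
        # If three buckets of some size exist, merge the two earliest of that size.
        size = 1
        while True:
            same = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(same) <= 2:
                break
            i, j = same[0], same[1]
            merged = (self.buckets[j][0], 2 * size)   # keep the later timestamp
            del self.buckets[j]
            del self.buckets[i]
            self.buckets.insert(i, merged)
            size *= 2

    def estimate_ones(self):
        # All buckets counted fully, except only half of the oldest bucket.
        if not self.buckets:
            return 0
        return sum(s for _, s in self.buckets) - self.buckets[0][1] / 2

dgim = DGIM(window_size=24)
for b in [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]:
    dgim.add(b)
print(dgim.estimate_ones())   # estimate of the number of 1's in the last 24 bits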
In real-time analytics, data is viewed, analysed, and understood as soon as it enters the system. Mathematical
reasoning and logic are applied to the data as it arrives, giving users an up-to-the-moment picture on which to base
decisions.
Real-time analytics allows organizations to gain awareness and actionable information immediately, or as soon as
the data has entered their systems. Responses are typically available within minutes or less: such systems can
process a huge amount of data in a short time, with high speed and a low response time. For instance, real-time
big-data analytics makes use of financial databases to inform traders' decisions. Analytics may be performed on
demand or continuously: on-demand analytics delivers results when the user asks for them, while continuous
analytics keeps the results updated as events occur. It can also be programmed to respond to specific circumstances
automatically; for instance, real-time web analytics could alert the administrator if page-load performance is not
within the preset boundaries.
Examples -
o Monitoring orders as they take place, so that they can be tracked better and demand for particular types of
products (for example, clothing) can be identified.
o Continuously updating views of customer interactions, such as the number of page views and shopping-cart
usage, to better understand user behaviour.
o Identifying customers as they shop in a store and influencing their purchase decisions in real time.
Real-time analytics tools can either push or pull data. Streaming requires the ability to push huge amounts of
fast-moving data; if streaming consumes too many resources or is not practical, data can instead be pulled at
intervals ranging from a couple of seconds to hours. The choice between the two depends on business requirements
and should be made so as not to interrupt the flow of data. The time to react for real-time analysis can vary from
nearly instantaneous to a few seconds or minutes. The key components of real-time analytics comprise the
following:
o Aggregator
o Broker
o Analytics engine
o Stream processor
Speed is the primary benefit of real-time data analysis. The shorter the time between the moment data arrives and
the moment it is processed, the sooner the business can use the resulting insights to make changes and act on
crucial decisions.
In the same way, real-time analytics tools allow companies to see how users interact with a product after its
release, so there is no delay in understanding user behaviour and making the necessary adjustments.
Advantages of Real-time Analytics:
Real-time analytics in Big Data provides the ability to extract useful insights quickly from massive
datasets. Real-time analytics stands at the forefront of this transformation, enabling organizations to analyze
data streams as they are generated, rather than relying on historical analysis alone. This capability not only
enhances decision-making processes but also empowers businesses to respond dynamically to changing market
conditions, customer behaviors, and operational challenges.
Real-time analytics involves a comprehensive and intricate process that encompasses several critical
components and steps. Here’s a more detailed breakdown of how it operates:
Data Ingestion
Continuous Data Collection: Real-time analytics systems continuously collect data from various
sources, such as sensors, IoT devices, social media feeds, transaction logs, and application databases.
This data can come in various formats, including structured, semi-structured, and unstructured data.
Stream Processing: Data is ingested as streams, meaning it is captured and processed in real-time as it
arrives. Technologies like Apache Kafka, RabbitMQ, and Amazon Kinesis are commonly used for data
ingestion due to their ability to handle high-throughput data streams reliably.
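As a hedged illustration of stream ingestion, the sketch below consumes JSON events with the kafka-python client; the topic name "events", the local broker address, and the message format are assumptions made for the example, not details from the text.

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                                   # hypothetical topic name
    bootstrap_servers="localhost:9092",         # assumed local broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="latest",                 # start from the newest records
)

for message in consumer:                        # blocks, yielding records as they arrive
    event = message.value
    print(event)                                # hand the event off to the processing engine here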
Data Processing Engines
Stream Processing Platforms: Once ingested, the data is processed by stream processing engines such
as Apache Flink, Apache Storm, or Spark Streaming. These platforms are designed to handle continuous
data flows and perform complex event processing, transformations, aggregations, and filtering in real-
time.
In-Memory Processing: To ensure low-latency processing, many real-time analytics solutions use in-
memory computing frameworks. This allows data to be processed directly in memory rather than being
written to disk, significantly speeding up the processing time.
Parallel Processing: Real-time analytics systems often employ parallel processing techniques,
distributing the workload across multiple nodes or processors to handle large volumes of data efficiently.
Real-Time Querying
Low-Latency Query Engines: Real-time query engines like Apache Druid, ClickHouse, and Amazon
Redshift Spectrum allow users to run queries on streaming data with minimal delay. These engines are
optimized for low-latency query execution, providing near-instantaneous results.
Complex Queries: Users can perform complex queries and analytical operations on streaming data, such
as joins, aggregations, window functions, and pattern matching, enabling sophisticated real-time analysis.
Data Storage
Time-Series Databases: Real-time analytics often involves storing data in time-series databases like
InfluxDB, TimescaleDB, or OpenTSDB. These databases are optimized for handling time-stamped data
and can efficiently store and retrieve real-time data points.
NoSQL Databases: For unstructured or semi-structured data, NoSQL databases like MongoDB,
Cassandra, and HBase provide flexible storage solutions that can scale horizontally to accommodate
large data volumes.
Visualization Tools
Dashboards and BI Tools: Real-time data is visualized using dashboards and business intelligence (BI)
tools like Tableau, Power BI, Grafana, and Kibana. These tools provide interactive and customizable
visualizations that allow users to monitor and analyze data in real-time.
Alerts and Notifications: Real-time analytics systems can be configured to trigger alerts and
notifications based on predefined conditions or thresholds. This enables proactive responses to critical
events, such as system failures, security breaches, or significant business metrics.
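A minimal sketch of this kind of threshold-based alerting is shown below; the metric stream and the threshold are purely illustrative, and in practice the alert would be routed to email, chat, or an incident-management tool rather than printed.

def monitor(readings, threshold=0.95):
    # Emit an alert whenever a reading crosses the predefined threshold.
    for timestamp, value in readings:
        if value > threshold:
            print(f"ALERT at {timestamp}: value {value:.2f} exceeds {threshold}")

readings = [("10:00:01", 0.72), ("10:00:02", 0.97), ("10:00:03", 0.81)]
monitor(readings)   # alerts only for the 0.97 reading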
Benefits and Advantages of using Real-Time Analytics
Immediate Insights
Faster Decision-Making: Real-time analytics provides instant access to data insights, allowing businesses
to make informed decisions quickly.
Proactive Problem Solving: By continuously monitoring data streams, organizations can identify and
address issues as they arise, preventing potential problems from escalating.
Enhanced Operational Efficiency
Optimized Processes: By analyzing data as it is generated, real-time analytics allows businesses to streamline
processes, reduce waste, and improve overall productivity.
Resource Allocation: Organizations can optimize the allocation of resources, such as labor, inventory, and
energy, based on real-time demand and usage patterns.
Improved Customer Experience
Personalized Interactions: Real-time analytics enables businesses to tailor their interactions with
customers based on current data.
Responsive Service: By analyzing customer behavior and feedback in real-time, businesses can quickly
address issues and adapt their services to meet customer expectations.
Competitive Advantage
Market Responsiveness: Businesses that leverage real-time analytics can quickly adapt to changing
market conditions and emerging trends. This allows them to stay ahead of competitors and capitalize on
new opportunities.
Innovation: Real-time data insights can drive innovation by highlighting emerging trends and customer
preferences. Businesses can use these insights to develop new products and services that meet market
demands.
Enhanced Risk Management
Fraud Detection: Real-time analytics is critical for identifying and preventing fraudulent activities. By
continuously monitoring transactions and behavior patterns, businesses can detect anomalies and take
immediate action to mitigate risks.
Operational Risk Management: Real-time monitoring of operations allows businesses to identify and
address potential risks before they cause significant disruptions.
Improved Financial Performance
Revenue Optimization: Real-time analytics can help businesses identify and capitalize on revenue
opportunities. For example, dynamic pricing models can adjust prices based on current demand and market
conditions, maximizing revenue.
Cost Reduction: By optimizing operations and resource allocation, real-time analytics can lead to
significant cost savings. Businesses can reduce waste, improve efficiency, and lower operational expenses.
Enhanced Collaboration and Communication
Data-Driven Culture: With real-time analytics, employees across different departments can access and
analyze real-time data, leading to more informed decisions and better collaboration.
Transparent Operations: Real-time data visualization tools, such as dashboards and reports, provide a
clear and transparent view of operations.
Regulatory Compliance
Real-Time Monitoring: For industries with stringent regulatory requirements, real-time analytics provides
continuous monitoring and reporting capabilities.
Audit Trails: Real-time analytics systems can maintain detailed audit trails of data access and
modifications, aiding in compliance and accountability.
Real-time analytics is a powerful tool that finds applications across various industries. Here are some key
applications:
Predictive Maintenance: Manufacturing, utilities, and transportation sectors utilize real-time analytics to
monitor equipment health and predict failures before they occur.
Fraud Detection: Financial services, e-commerce platforms, and insurance companies continuously monitor
transactions and user behavior to identify anomalies and take immediate action to mitigate fraud risks.
Customer Experience Management: Retailers, hospitality providers, and online services analyze customer
interactions and feedback in real time; this lets businesses personalize services, optimize marketing campaigns, and
promptly address customer issues, leading to higher satisfaction and loyalty.
Smart Cities: Urban planners and city administrations employ real-time analytics in traffic management,
public transportation optimization, and real-time monitoring of public safety and environmental conditions.
Healthcare: Healthcare providers use real-time analytics to monitor patient vitals, manage hospital resources,
and provide timely interventions. For instance, real-time analysis of patient data can alert medical staff to
potential emergencies, improving patient outcomes and operational efficiency.
Financial Trading: Financial institutions and traders rely on real-time analytics to make quick, informed
trading decisions. By analyzing market data as it happens, traders can identify trends, detect anomalies, and
execute trades at the optimal moment to maximize profits.
Supply Chain Management: Logistics and supply chain companies use real-time analytics to track shipments,
manage inventory, and optimize delivery routes. This ensures timely deliveries, reduces costs, and improves
overall supply chain efficiency.
Telecommunications: Telecom operators use real-time analytics to monitor network performance, detect
outages, and manage bandwidth. This helps in maintaining service quality, reducing downtime, and enhancing
customer satisfaction.
Energy Management: Utility companies and large enterprises employ real-time analytics for energy
consumption monitoring and optimization. By analyzing real-time data from smart meters and sensors,
businesses can optimize energy usage, reduce costs, and support sustainability initiatives.
Marketing and Advertising: Marketers and advertisers use real-time analytics to measure the effectiveness
of campaigns and adjust strategies on the fly. Real-time insights into customer behavior and engagement help
in creating targeted and impactful marketing efforts.
Retail and E-commerce: Retailers and e-commerce platforms leverage real-time analytics to manage
inventory, optimize pricing strategies, and enhance the shopping experience. Analyzing real-time sales data
and customer interactions helps in making informed decisions that drive sales and improve customer
satisfaction.
Real-time sentiment analysis is an important artificial-intelligence-driven process used by organizations for live
market research, brand experience, and customer experience analysis. Below we look at what real-time sentiment
analysis is and which features make for a good live social feed analysis tool.
Real-time Sentiment Analysis is a machine learning (ML) technique that automatically recognizes and extracts
the sentiment in a text whenever it occurs. It is most commonly used to analyze brand and product mentions in
live social comments and posts. An important thing to note is that real-time sentiment analysis can be done only
from social media platforms that share live feeds like Twitter does.
The real-time sentiment analysis process uses several ML tasks such as natural language
processing, text analysis, semantic clustering, etc to identify opinions expressed about brand
experiences in live feeds and extract business intelligence from them.
Real-time sentiment analysis has several applications for brand and customer analysis. These
include the following.
2. Real-time sentiment analysis of text feeds from platforms such as Twitter.
7. Up-to-date scanning of news websites for relevant news through keywords and hashtags, along with the
sentiment in the news.
Live sentiment analysis is done through machine learning algorithms that are trained to
recognize and analyze all data types from multiple data sources, across different languages,
for sentiment.
A real-time sentiment analysis platform needs to be first trained on a data set based on your
industry and needs. Once this is done, the platform performs live sentiment analysis of real-
time feeds effortlessly.
To extract sentiment from live feeds from social media or other online sources, we first need
to add live APIs of those specific platforms, such as Instagram or Facebook. In case of a
platform or online scenario that does not have a live API, such as can be the case of Skype or
Zoom, repeat, time-bound data pull requests are carried out. This gives the solution the ability
to constantly track relevant data based on your set criteria.
All the data from the various platforms thus gathered is now analyzed. All text data in
comments are cleaned up and processed for the next stage. All non-text data from live video
or audio feeds is transcribed and also added to the text pipeline. In this case, the platform
extracts semantic insights by first converting the audio, and the audio in the video data, to
text through speech-to-text software.
This transcript has timestamps for each word and is indexed section by section based on
pauses or changes in the speaker. A granular analysis of the audio content like this gives the
solution enough context to correctly identify entities, themes, and topics based on your
requirements. This time-bound mapping of the text also helps with semantic search.
Even though this may seem like a long drawn-out process, the algorithms complete this in
seconds.
All the data is now analyzed using native natural language processing (NLP), semantic
clustering, and aspect-based sentiment analysis. The platform derives sentiment from aspects
and themes it discovers from the live feed, giving you the sentiment score for each of them.
It can also give you an overall sentiment score in percentile form and tell you sentiment based
on language and data sources, thus giving you a break-up of audience opinions based
on various demographics.
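As a hedged illustration of scoring a live text feed for sentiment, the sketch below uses NLTK's VADER analyzer on a placeholder list of comments; the feed contents and the score thresholds are assumptions, and a commercial platform like the one described above would use its own trained models and aspect-based analysis.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)      # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

def score_stream(comments):
    # Yield (comment, compound score) pairs; the compound score lies in [-1, 1].
    for text in comments:
        yield text, analyzer.polarity_scores(text)["compound"]

live_feed = ["Love the new update!", "Support never replied to me."]   # placeholder feed
for text, compound in score_stream(live_feed):
    label = "positive" if compound > 0.05 else "negative" if compound < -0.05 else "neutral"
    print(f"{label:8s} {compound:+.2f}  {text}")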
All the intelligence derived from the real-time sentiment analysis in step 3 is now showcased
on a reporting dashboard in the form of statistics, graphs, and other visual elements. It is from
this sentiment analysis dashboard that you can set alerts for brand mentions and keywords in
live feeds as well.
What Are The Most Important Features Of A Real-Time Sentiment Analysis Platform?
A live feed sentiment analysis solution must have certain features that are necessary to extract
and determine real-time insights. These are:
Multiplatform
One of the most important features of a real-time sentiment analysis tool is its ability to
analyze multiple social media platforms. This multiplatform capability means that the tool is
robust enough to handle API calls from different platforms, which have different rules and
configurations so that you get accurate insights from live data.
This gives you the flexibility to choose whether you want to have a combination of platforms
for live feed analysis such as from a Ted talk, live seminar, and Twitter, or just a single
platform, say, live Youtube video analysis.
Multimedia
Being multi-platform also means that the solution needs to have the capability to process
multiple data types such as audio, video, and text. In this way, it allows you to discover brand
and customer sentiment through live TikTok social listening, real-time Instagram social
listening, or live Twitter feed analysis, effortlessly, regardless of the data format.
Multilingual
Another important feature is a multilingual capability. For this, the platform needs to have
part-of-speech taggers for each language that it is analyzing. Machine translations can lead to
a loss of meanings and nuances when translating non-Germanic languages such as Korean,
Chinese, or Arabic into English. This can lead to inaccurate insights from live conversations.
Web scraping
While metrics from a social media platform can tell you numerical data like the number of followers, posts,
likes, dislikes, etc, a real-time sentiment analysis platform can perform data scraping for more qualitative
insights. The tool’s in-built web scraper automatically extracts data from the social media platform you want
to extract sentiment from. It does so by sending HTTP requests to the different web pages it needs to target for
the desired information, downloads them, and then prepares them for analysis.
It parses the saved data and applies various ML tasks such as NLP, semantic classification, and sentiment
analysis. And in this way gives you customer insights beyond the numerical metrics that you are looking for.
Alerts
The sentiment analysis tool for live feeds must have the capability to track and simplify complex data sets as it
conducts repeat scans for brand mentions, keywords, and hashtags. These repeat scans, ultimately, give you live
updates based on comments, posts, and audio content on various channels. Through this feature, you can set
alerts for particular keywords or for when there is a spike in your mentions, and receive these notifications on your dashboard or by email.
Reporting
Another major feature of a real-time sentiment analysis platform is the reporting dashboard. The insights
visualization dashboard is needed to give you the insights that you require in a manner that is easily
understandable. Color-coded pie charts, bar graphs, word clouds, and other formats make it easy for you to
assess sentiment in topics, aspects, and the overall brand, while also giving you metrics in percentile form.
The user-friendly customer experience analysis solution, Repustate IQ, has a very comprehensive reporting
dashboard that gives numerous insights based on various aspects, topics, and sentiment combinations. In
addition, it is also available as an API that can be easily integrated with a dashboard such as Power BI or
Tableau that you are already using. This gives you the ability to leverage a high-precision sentiment analysis
API without having to invest in yet another end-to-end solution that has a fixed reporting dashboard.
The stock market is the collection of markets where stocks and other securities are bought and sold by
investors. Publicly traded companies offer shares of ownership to the public, and those shares can be
bought and sold on the stock market. Investors can make money by buying shares of a company at a low
price and selling them at a higher price. The stock market is a key component of the global economy,
providing businesses with funding for growth and expansion. It is also a popular way for individuals to invest and grow their wealth.
The importance of the stock market includes the following:
o Capital Formation: It provides a source of capital for companies to raise funds for growth and expansion.
o Investment Opportunities: Investors can potentially grow their wealth over time by investing in the stock market.
o Economic Indicators: The stock market can indicate the overall health of the economy.
o Job Creation: Publicly traded companies often create jobs and contribute to the economy’s growth.
o Corporate Governance: Shareholders can hold companies accountable for their actions and decision-making processes.
o Risk Management: Investors can use the stock market to manage their investment risk by diversifying their portfolio.
o Market Efficiency: The stock market helps allocate resources efficiently by directing investments to companies with promising prospects.
Let us see the data on which we will be working before we begin implementing the software to anticipate
stock market values. In this section, we will examine the stock price of Microsoft Corporation (MSFT) as
reported by the National Association of Securities Dealers Automated Quotations (NASDAQ). The stock
price data will be supplied as a Comma Separated File (.csv) that may be opened and analyzed in Excel or
a Spreadsheet.
MSFT’s stocks are listed on NASDAQ, and their value is updated every working day of the stock market.
It should be noted that the market does not allow trading on Saturdays and Sundays; therefore, there is a
gap between some dates. The Opening Value of the stock, the Highest and Lowest values of that stock
on the same day, as well as the Closing Value at the end of the day, are all indicated for each date. Analyzing
this data can be useful for stock market prediction using machine learning techniques. The Adjusted Close
Value reflects the stock’s value after dividends have been declared (too technical!). Furthermore, the total
volume of the stocks traded in the market is provided. With this information, it is up to the Machine
Learning/Data Scientist to look at the data and develop different algorithms that can extract patterns from it.
Stock Market Prediction Using the Long Short-Term Memory Method: We will use the Long Short-Term
Memory (LSTM) method to create a Machine Learning model to forecast Microsoft Corporation stock
values. LSTM is a deep learning, artificial recurrent neural network (RNN) architecture whose gates make
minor changes to the information flowing through the cell by multiplying and adding.
Among the steps in the workflow are scaling the data (Step 6) and creating a training set and a test set for stock
market prediction (Step 7).
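A minimal sketch of the scaling, train/test split, and LSTM steps mentioned above is given below, using pandas, scikit-learn, and Keras; the file name MSFT.csv, the 60-day lookback window, and the network size are illustrative assumptions rather than settings prescribed by the text.

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

df = pd.read_csv("MSFT.csv")                    # hypothetical file holding the NASDAQ data

def make_windows(series, lookback=60):
    # Turn a 1-D price series into (samples, lookback, 1) windows and next-day targets.
    X, y = [], []
    for i in range(lookback, len(series)):
        X.append(series[i - lookback:i])
        y.append(series[i])
    return np.array(X)[..., np.newaxis], np.array(y)

# Scaling: squash closing prices into [0, 1] so the LSTM trains stably.
scaler = MinMaxScaler()
close = scaler.fit_transform(df[["Close"]].values).ravel()

# Training/test split: chronological, with no shuffling, as required for time series.
split = int(len(close) * 0.8)
X_train, y_train = make_windows(close[:split])
X_test, y_test = make_windows(close[split:])

model = Sequential([
    LSTM(50, input_shape=(X_train.shape[1], 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)

predicted = scaler.inverse_transform(model.predict(X_test))   # back to price units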
The decaying window algorithm not only tracks the most recurring elements in an incoming data stream, but
also discounts any random spikes or spam requests that might have boosted an element’s frequency. In a
decaying window, you assign a score or weight to every element of the incoming data stream. Further, you need
to calculate the aggregate sum for each distinct element by adding all the weights assigned to that element. The
element with the highest total score is listed as trending or the most popular.
In a decaying window algorithm, you assign more weight to newer elements. When a new element arrives, you
first reduce the weight of all the existing elements by a constant factor (1 − c) and then assign the new element a
weight of 1. The aggregate sum of the decaying exponential weights can be calculated using the following
formula:
∑_{i=0}^{t−1} a_{t−i} (1 − c)^i
In a data stream consisting of various elements, you maintain a separate sum for each distinct element. For every
incoming element, you multiply the sum of all the existing elements by a value of (1−c). Further, you add the
weight of the incoming element to its corresponding aggregate sum.
A threshold can be kept so that elements whose weight falls below it are ignored.
Finally, the element with the highest aggregate score is listed as the most popular element.
Example (with c = 0.1; a separate running score is kept for each tag):
When the tag fifa arrives:
ipl: 0.9 * (1 − 0.1) + 0 = 0.81 (0 is added because the current tag is fifa, not ipl)
fifa: 0.81 * (1 − 0.1) + 1 = 1.729 (1 is added because the current tag is fifa)
When the tag ipl arrives:
fifa: 0 * (1 − 0.1) + 0 = 0
ipl: 0 * (1 − 0.1) + 1 = 1
fifa: 1 * (1 − 0.1) + 0 = 0.9 (0 is added because the current tag is ipl, not fifa)
At the end of the sequence, the score of fifa is 2.135 while the score of ipl is 3.7264, so ipl is trending more than
fifa. Even though both tags occurred the same number of times in the input, their scores are different because of
when they occurred.
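The scoring rule in this example can be expressed as a short Python sketch; the stream of hashtags below is made up for illustration, and c is kept at 0.1 as in the example.

def decaying_window_scores(stream, c=0.1):
    scores = {}
    for tag in stream:
        # Decay every existing score by (1 - c), then add 1 to the arriving tag's score.
        for key in scores:
            scores[key] *= (1 - c)
        scores[tag] = scores.get(tag, 0.0) + 1.0
    return scores

stream = ["fifa", "ipl", "ipl", "fifa", "ipl", "ipl", "fifa", "ipl"]   # hypothetical stream
scores = decaying_window_scores(stream)
print(max(scores, key=scores.get), scores)   # the tag with the highest score is the trending one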