Multivariate Polynomial Regression
Abstract— Data mining is the process of extracting previously unknown, useful information from a given set of data. There are two forms of data mining: predictive data mining and descriptive data mining. Predictive data mining is the process of estimating values based on a given data set, which can be achieved by regression analysis. In this paper, we describe the methods that can be used, in particular the Polynomial Regression technique, the various forms of implementing the analysis, its problems, and possible solutions.
Index Terms— Data Mining, Prediction, Regression, Polynomial Regression, Multivariate Polynomial Regression.
—————————— ——————————
1 INTRODUCTION
Data mining is considered an instrumental development in the analysis of data in various sectors like production, business, and market analysis. There are two forms of data mining, namely Predictive Data Mining and Descriptive Data Mining.
Descriptive data mining is the process of extracting features from a given set of values. Predictive data mining is the process of estimating or predicting future values from an available set of values. Predictive data mining uses the concept of regression for the estimation of future values.
2 REGRESSION
Regression is the method of estimating a relationship from the given data to depict the nature of the data set. This relationship can then be used for various computations, such as forecasting future values or determining whether a relation exists amongst the variables.[3]
Regression analysis is basically composed of four different stages:
1. Identification of dependent and independent variables.
2. Identification of the form of relationship among the variables (linear, parabolic, exponential, etc.) by means of a scatter diagram between the dependent and independent variables.
3. Computation of the regression equation for analysis.
4. Error analysis to understand how well the estimated model fits the actual data set.
————————————————
• Priyanka Sinha is currently pursuing a Masters degree program in Software Technology at Vellore Institute of Technology, India. PH-91-9629785836. E-mail: er.priyakasinha@gmail.com
————————————————

Fig. 1. Scatter plot on x and y values

There can be ‘n’ number of ways of joining the given points in a scatter graph. The idea of plotting the regression curve is that the deviation of the curve from the various points should be minimal. But if we simply require the total deviation $\sum_{i=1}^{n}(y_i - \hat{y}_i)$ to be minimal, we can again have n possibilities, as the negative and positive deviations cancel out. So just minimizing the total deviation is not appropriate.
There are mainly two methods for finding the best-fit curve, namely the Method of Least Squares and the Method of Least Absolute Value. In the Method of Least Square Value (LSV), the sum of the squares of the deviations is taken to be minimum. This sum is referred to as the Sum of Squares of Error:

$$\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \text{minimum} \qquad (1)$$

Another method of finding an approximate regression curve is the Method of Least Absolute Value (LAV). Here, it is assumed that $\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$ is minimum. This method has the drawback that finding a line satisfying this criterion is difficult. Furthermore, there may be no unique LAV regression curve.
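To make the two criteria concrete, the following minimal sketch (assuming only NumPy; the data points are illustrative) fits a line under both. It shows why the raw sum of deviations is not a usable criterion, and it approximates the LAV line by iteratively reweighted least squares, one common workaround for the difficulty noted above.

```python
import numpy as np

# Illustrative sample points (hypothetical data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

def fit_lsv(x, y):
    """Method of Least Square Value (LSV): minimize sum((y - yhat)**2), Eq. (1)."""
    b1, b0 = np.polyfit(x, y, 1)   # degree-1 least-squares fit
    return b0, b1

def fit_lav(x, y, iters=100):
    """Method of Least Absolute Value (LAV): minimize sum(|y - yhat|),
    approximated here by iteratively reweighted least squares."""
    A = np.column_stack([np.ones_like(x), x])
    b = np.array(fit_lsv(x, y))    # start from the LSV line
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - A @ b), 1e-8)  # down-weight large residuals
        b = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    return b

b0, b1 = fit_lsv(x, y)
resid = y - (b0 + b1 * x)
print("sum of raw deviations :", resid.sum())         # ~0: signs cancel out
print("sum of squared errors :", (resid ** 2).sum())  # the LSV criterion
print("LAV line (b0, b1)     :", fit_lav(x, y))
```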
There are various methods of Regression Analysis, like Simple Linear Regression, Multivariate Linear Regression, Polynomial Regression, Multivariate Polynomial Regression, etc.
In Linear Regression, a linear relationship exists between the variables. The linear relationship can be amongst one response variable and one regressor variable, called simple linear regression, or between one response variable and multiple regressor variables, called multivariate linear regression.
The linear regression equation is of the form:

$$y = \beta_0 + \beta_1 x + \varepsilon \quad \text{for simple linear regression} \qquad (2)$$

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \quad \text{for multivariate linear regression} \qquad (3)$$

The fitted linear regression curve is of the form:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad (4)$$

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k \qquad (5)$$
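As a brief illustration of Eq. (3) and Eq. (5), the following sketch (NumPy only; the data are simulated for the example) estimates a two-regressor linear model by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
# Simulated response: y = 2 + 1.5*x1 - 0.7*x2 + noise
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])         # columns: 1, x1, x2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate
y_hat = X @ beta_hat                              # fitted curve of Eq. (5)
print("estimated (b0, b1, b2):", np.round(beta_hat, 3))
```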
3 POLYNOMIAL REGRESSION
Polynomial Regression is a model used when the response variable is non-linear, i.e., the scatter plot gives a non-linear or curvilinear structure.[3]
The general equation for polynomial regression is of the form:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_k x^k + \varepsilon \qquad (6)$$

To solve the problem of polynomial regression, it can be converted to an equation of Multivariate Linear Regression with k regressor variables of the form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \qquad (7)$$

where

$$x_1 = x,\; x_2 = x^2,\; \ldots,\; x_k = x^k \qquad (8)$$

and ε is the error component, which follows a normal distribution, $\varepsilon_i \sim N(0, \sigma^2)$.
The equation can be expressed in matrix form as:

$$Y = X\beta + \varepsilon \qquad (9)$$

where X, Y, β, ε are the vector and matrix representations, which can be expanded as:
$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \text{ is the vector of observations,} \qquad (10)$$

$$X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{bmatrix} \text{ is the design matrix,} \qquad (11)$$

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \text{ is the vector of parameters, and} \qquad (12)$$

$$\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \text{ is the vector of errors.} \qquad (13)$$

The least squares estimate of β is then:

$$\hat{\beta} = (X^T X)^{-1} X^T Y \qquad (14)$$

The significance of the regression equation can be estimated by means of the Analysis of Variance table, called the ANOVA table:

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square
Regression (Reg)    | k − 1              | SSReg          | MSReg = SSReg/(k − 1)
Residual (Res)      | n − k              | SSRes          | MSRes = SSRes/(n − k)
Total (Tot)         | n − 1              | SSTot          | MSTot = SSTot/(n − 1)

Here, k is the number of estimated parameters, so the degrees of freedom of the residual are (n − k). For the significance test, the statistic F = MSReg/MSRes is to be computed.
This Multiple Linear Regression model can be used to compute the Polynomial Regression Equation.
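The conversion of Eqs. (6)–(8), the matrix solution of Eq. (14), and the ANOVA quantities above can be sketched as follows (NumPy only; the data and the degree are illustrative, and k here counts the fitted parameters, matching the table).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = np.linspace(-2, 2, n)
y = 1 - 2 * x + 0.5 * x**3 + rng.normal(0, 0.3, n)  # simulated cubic data

deg = 3
# Eq. (8): x_j = x**j converts polynomial regression into multivariate
# linear regression; X is the design matrix of Eq. (11).
X = np.column_stack([x**j for j in range(deg + 1)])
k = X.shape[1]                                  # number of estimated parameters

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # Eq. (14): normal equations
y_hat = X @ beta_hat

# ANOVA quantities from the table above.
ss_tot = np.sum((y - y.mean()) ** 2)            # SSTot, df = n - 1
ss_res = np.sum((y - y_hat) ** 2)               # SSRes, df = n - k
ss_reg = ss_tot - ss_res                        # SSReg, df = k - 1
F = (ss_reg / (k - 1)) / (ss_res / (n - k))     # F = MSReg / MSRes
print("beta_hat:", np.round(beta_hat, 3))
print("F statistic:", round(F, 2))
```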
4 MULTIVARIATE POLYNOMIAL REGRESSION
Polynomial Regression can be applied on a single regressor variable, called Simple Polynomial Regression, or it can be computed on multiple regressor variables, as Multiple Polynomial Regression [3], [4]. A Second Order Multiple Polynomial Regression can be expressed as:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon \qquad (19)$$

Here,
$\beta_1, \beta_2$ are called the linear effect parameters,
$\beta_{11}, \beta_{22}$ are called the quadratic effect parameters, and
$\beta_{12}$ is called the interaction effect parameter.
The regression function for this is given as:

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 \qquad (20)$$

This is also called the Response Surface.
This can again be represented in matrix form as:

$$Y = X\beta + \varepsilon \qquad (21)$$
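A brief sketch of fitting the second-order model of Eq. (19) through its matrix form, Eq. (21); the coefficients and data below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = (1.0 + 2.0 * x1 - 1.0 * x2        # linear effects (beta1, beta2)
     + 0.5 * x1**2 + 0.3 * x2**2      # quadratic effects (beta11, beta22)
     + 0.8 * x1 * x2                  # interaction effect (beta12)
     + rng.normal(0, 0.1, n))

# Design matrix of Eq. (21): columns 1, x1, x2, x1^2, x2^2, x1*x2.
X = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2, b11, b22, b12 =", np.round(beta_hat, 3))
```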
6 SOLUTIONS FOR MULTIVARIATE POLYNOMIAL REGRESSION PROBLEMS
Multicollinearity is a big issue with Multivariate Polynomial Regression, as it prevents proper estimation of the regression curve. To solve this issue, the polynomial equation can be mapped to a higher-order space of independent variables, called the feature space. There are various methods for this, like Sammon's Mapping, Curvilinear Distance Analysis, Curvilinear Component Analysis, Kernel Principal Component Analysis, etc. These methods transform the related regression variables into independent variables, which results in better estimation of the regression curve.
The problem of recomputing every parameter each time the order is increased can be solved by computation using an Orthogonal Polynomial Representation:

$$y_i = \alpha_0 P_0(x_i) + \alpha_1 P_1(x_i) + \dots + \alpha_k P_k(x_i) + \varepsilon_i \qquad (24)$$

where $P_j$ is an orthogonal polynomial of degree j.
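The snippet below sketches only the idea behind Eq. (24) (it does not implement Sammon's Mapping or Kernel PCA): a QR factorization of the polynomial design matrix yields columns that act as polynomials orthogonal over the sample points, so each coefficient is computed independently, and raising the order merely appends a term instead of re-estimating all parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 40)
y = 0.5 + x - 0.8 * x**2 + rng.normal(0, 0.05, 40)  # simulated data

V = np.column_stack([x**j for j in range(5)])  # ordinary polynomial basis
Q, _ = np.linalg.qr(V)       # Q's columns play the role of P_j(x) in Eq. (24)

alpha = Q.T @ y              # each alpha_j is estimated independently
for m in (1, 2, 3):
    # The order-m fit reuses the first m+1 coefficients unchanged.
    y_hat = Q[:, :m + 1] @ alpha[:m + 1]
    print(f"order {m}: residual sum of squares = {np.sum((y - y_hat)**2):.4f}")
```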
7 CONCLUSION
Data mining in real-time problems involves a variety of data sets with different properties. The prediction of values in such problems can be done by various forms of regression. Multivariate Polynomial Regression is used for value prediction when there are multiple variables that contribute to the estimation of values. These variables may be related to each other, and they can be converted to an independent variable set using feature reduction techniques, which can then be used for better regression estimation.
REFERENCES
[1] Han, Jiawei, and Kamber, Micheline. (2001). Data Mining: Concepts and Techniques. Elsevier.
[2] Fayyad, Usama, Piatetsky-Shapiro, Gregory, and Smyth, Padhraic. (1996). "Knowledge Discovery and Data Mining: Towards a Unifying Framework." KDD Proceedings, AAAI.
[3] Gatignon, Hubert. (2010). Statistical Analysis of Management Data, Second Edition. New York: Springer.
[4] Kleinbaum, David G., Kupper, Lawrence L., and Muller, Keith E. Applied Regression Analysis and Multivariable Methods, 4th Edition. California: Thomson.