
International Journal of Scientific & Engineering Research, Volume 4, Issue 12, December-2013 962

ISSN 2229-5518

Multivariate Polynomial Regression in Data Mining: Methodology, Problems and Solutions

Priyanka Sinha

Abstract— Data Mining is the process of extracting unknown, useful information from a given set of data. There are two forms of data mining: predictive data mining and descriptive data mining. Predictive data mining is the process of estimating values based on a given data set, which can be achieved by regression analysis. In this paper, we describe the methods that can be used, in particular the Polynomial Regression technique, the various forms of implementing the analysis, its problems, and possible solutions.

Index Terms— Data Mining, Prediction, Regression, Polynomial Regression, Multivariate Polynomial Regression.

——————————  ——————————

1 INTRODUCTION

With the increasing use of computers in our day-to-day life, there is a continuous growth in data. These data contain a large amount of known and unknown knowledge which can be utilized for various applications [1],[2] such as knowledge extraction, pattern analysis, data archaeology and data dredging. This extraction of unknown useful information is achieved by the process of Data Mining.

Data mining is considered an instrumental development in the analysis of data in various sectors such as production, business and market analysis. There are two forms of data mining, namely Predictive Data Mining and Descriptive Data Mining.

Descriptive data mining is the process of extracting the features of a given set of values. Predictive data mining is the process of estimating or predicting future values from an available set of values; it uses the concept of regression for the estimation of future values.

[Fig. 1. Scatter plot of x and y values]

2 REGRESSION

Regression is the method of estimating a relationship from the given data to depict the nature of the data set. This relationship can then be used for various computations, such as forecasting future values or determining whether a relation exists amongst the variables.[3]

Regression analysis is basically composed of four stages:
1. Identification of dependent and independent variables.
2. Identification of the form of relationship among the variables (linear, parabolic, exponential, etc.) by means of a scatter diagram between the dependent and independent variables.
3. Computation of the regression equation for analysis.
4. Error analysis to understand how well the estimated model fits the actual data set.

There can be 'n' number of ways of joining the given points in a scatter graph. The idea of plotting the regression curve is that the deviation of the curve from the various points should be minimal. But if we simply require the sum of the deviations, Σ (yi − ŷi), to be minimal, we can again have n possibilities, as the negative and positive deviations cancel out. So just minimizing the signed deviations is not appropriate.

There are mainly two methods for finding the best-fit curve, namely the Method of Least Squares and the Method of Least Absolute Value. In the Method of Least Square Value (LSV), the sum of the squares of the deviations is taken to be minimum. This sum is referred to as the Sum of Squares of Error:

SSE = Σ (yi − ŷi)^2 = minimum.   (1)

Another method of finding an approximate regression curve is the Method of Least Absolute Value (LAV). Here, Σ |yi − ŷi| is assumed to be minimum. This method has the drawback that finding a line satisfying this condition is difficult. Furthermore, there may be no unique LAV Regression Curve.
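The distinction drawn above between signed, squared and absolute deviations can be illustrated with a minimal sketch (Python with NumPy assumed; the data values are hypothetical):

```python
import numpy as np

# A small illustrative data set (hypothetical values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of a straight line y = b0 + b1*x.
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

residuals = y - y_hat
print("sum of signed deviations:", residuals.sum())       # ~0: positives and negatives cancel
print("sum of squared deviations (SSE):", (residuals**2).sum())
print("sum of absolute deviations (LAV):", np.abs(residuals).sum())
```

For a least-squares line the signed deviations sum to (numerically) zero, which is exactly why eq. (1) minimizes the squared deviations instead of the raw ones.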
There are various methods of Regression Analysis, such as Simple Linear Regression, Multivariate Linear Regression, Polynomial Regression, Multivariate Polynomial Regression, etc.

————————————————
• Priyanka Sinha is currently pursuing a Masters degree program in Software Technology at Vellore Institute of Technology, India. PH-91-9629785836. E-mail: er.priyakasinha@gmail.com

IJSER © 2013
http://www.ijser.org

In Linear Regression, a linear relationship exists between the variables. The linear relationship can be between one response variable and one regressor variable, called simple

linear regression, or between one response variable and multiple regressor variables, called multivariate linear regression.
The linear regression equation is of the form:

y = β0 + β1x + ε for simple linear regression   (2)

y = β0 + β1x1 + β2x2 + … + βkxk + ε for multivariate linear regression   (3)

The linear regression curve is of the form shown in Fig. 2.

[Fig. 2. Scatter plot with linear regression on x and y values]

Quadratic Regression is regression in which there is a quadratic relationship between the response variable and the regressor variable. The quadratic equation is a special case of polynomial regression in which the nature of the curve can be predicted. The equation is of the form:

y = β0 + β1x + β2x^2 + ε   (4)

and the regression curve is a parabolic curve.
In Polynomial Regression, the relationship between the response and the regressor variable is modelled as an nth-order polynomial equation. The nature of the regression graph cannot be predicted in this case. The form is:

y = β0 + β1x + β2x^2 + … + βnx^n + ε   (5)

[Fig. 3. Scatter plot with polynomial regression on x and y values]

3 POLYNOMIAL REGRESSION

Polynomial Regression is the model used when the response variable is non-linear, i.e., the scatter plot gives a non-linear or curvilinear structure.[3]
The general equation for polynomial regression is of the form:

y = β0 + β1x + β2x^2 + … + βkx^k + ε   (6)

To solve the problem of polynomial regression, it can be converted to an equation of Multivariate Linear Regression with k regressor variables of the form:

y = β0 + β1z1 + β2z2 + … + βkzk + ε   (7)

where

z1 = x, z2 = x^2, … , zk = x^k   (8)

and ε is the error component, which follows a normal distribution, εi ~ N(0, σ^2).
The equation can be expressed in matrix form as:

Y = Xβ + ε   (9)

where X, Y, β and ε are the vector/matrix representations, which can be expanded as:

Y = [y1, y2, … , yn]^T, the vector of observations   (10)

β = [β0, β1, … , βk]^T, the vector of parameters   (11)

ε = [ε1, ε2, … , εn]^T, the vector of errors   (12)

X, the n × (k+1) matrix of regressor variables, whose ith row is [1, zi1, zi2, … , zik]   (13)

Estimation of the parameters is done by the Least Square Method. Assuming the fitted regression equation is:

Ŷ = Xβ̂   (14)

then, by the Least Square Method, the minimum error is represented as:

SSE = Σ (yi − ŷi)^2   (15)

The matrix representation of the above equation is given as:

SSE = (Y − Xβ̂)^T (Y − Xβ̂)   (16)
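The conversion of eqs. (7)-(8) and the least-squares criterion of eq. (16) can be sketched as follows (Python with NumPy assumed; the data and the underlying coefficients are hypothetical):

```python
import numpy as np

# Hypothetical curvilinear data, roughly y = 1 + 2x - 0.5x^2 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.1, x.size)

# The substitution z1 = x, z2 = x^2 turns the polynomial model into a
# multivariate linear one; X is the matrix of regressor variables.
X = np.vander(x, 3, increasing=True)   # columns: 1, x, x^2

# Least-squares estimate: the beta minimizing (Y - Xb)^T (Y - Xb).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = float((y - X @ beta_hat) @ (y - X @ beta_hat))
print("beta_hat:", beta_hat)   # close to the true coefficients [1, 2, -0.5]
print("SSE:", sse)
```

`np.linalg.lstsq` numerically minimizes the same squared-error criterion; the closed-form solution via the normal equations is derived in the text that follows.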

By partial differentiation with respect to the regression



equation model parameters, we have k independent normal equations which can be solved for the parameters, represented as:

β̂ = (X^T X)^(−1) X^T Y   (17)

Substitution of β̂ gives the error as:

SSE = Y^T Y − β̂^T X^T Y   (18)

The significance of the regression equation can be estimated by means of the Analysis of Variance table, called the ANOVA table (Table 1). Here, the degree of freedom for the regression equation is (n − k).

TABLE 1
ANOVA TABLE FOR REGRESSION PARAMETERS

Source of Variation (Src) | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Square (MS)      | F-Statistic (F)
Regression (Reg)          | k − 1                   | SSReg               | MSReg = SSReg/(k − 1) | F = MSReg/MSRes
Residual (Res)            | n − k                   | SSRes               | MSRes = SSRes/(n − k) |
Total (Tot)               | n − 1                   | SSTot               | MSTot = SSTot/(n − 1) |

This Multiple Linear Regression model can be used to compute the Polynomial Regression Equation.

4 MULTIVARIATE POLYNOMIAL REGRESSION

Polynomial Regression can be applied to a single regressor variable, called Simple Polynomial Regression, or it can be computed on multiple regressor variables, as Multivariate Polynomial Regression [3],[4]. A second-order multivariate polynomial regression can be expressed as:

y = β0 + β1x1 + β2x2 + β11x1^2 + β22x2^2 + β12x1x2 + ε   (19)

Here,
β1, β2 are called the linear effect parameters,
β11, β22 are called the quadratic effect parameters,
β12 is called the interaction effect parameter.

The regression function for this is given as:

E(y) = β0 + β1x1 + β2x2 + β11x1^2 + β22x2^2 + β12x1x2   (20)

This is also called the Response Surface. It can again be represented in matrix form as:

Y = Xβ + ε   (21)

The parameters for the given equation can be computed as:

β̂ = (X^T X)^(−1) X^T Y   (22)

and the computed regression equation is represented as:

Ŷ = Xβ̂   (23)

5 PROBLEMS WITH MULTIVARIATE POLYNOMIAL REGRESSION

The major issue with Multivariate Polynomial Regression is the problem of multicollinearity. When there are multiple regression variables, there is a high chance that the variables are interdependent. In such cases, due to this relationship amongst the variables, the computed regression equation does not properly fit the original data.
Another problem with Multivariate Polynomial Regression is that the higher-degree terms in the equation do not contribute majorly to the regression equation, so they can be ignored. But if the required degree has to be estimated afresh each time, then all the parameters and equations need to be recomputed each time as well.

6 SOLUTIONS FOR MULTIVARIATE POLYNOMIAL REGRESSION PROBLEMS

Multicollinearity is a big issue with Multivariate Polynomial Regression, as it prevents proper estimation of the regression curve. To solve this issue, the polynomial equation can be mapped to a higher-order space of independent variables called the feature space. There are various methods for this, such as Sammon's Mapping, Curvilinear Distance Analysis, Curvilinear Component Analysis and Kernel Principal Component Analysis. These methods transform the related regression variables into independent variables, which results in better estimation of the regression curve.
The problem of recomputing the parameters each time the order is increased can be solved by computation using an Orthogonal Polynomial Representation:

y = α0P0(x) + α1P1(x) + … + αkPk(x) + ε,   (24)

where the polynomials Pj are orthogonal over the data points.

7 CONCLUSION

Data mining in real-world problems involves a variety of data sets with different properties. The prediction of values in such problems can be done by various forms of regression. Multivariate Polynomial Regression is used for value prediction when multiple variables contribute to the estimation. These variables may be related to each other and can be converted, using feature reduction techniques, to an independent variable set that gives better regression estimation.
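The multicollinearity problem of Section 5 and the benefit of an orthogonal representation as in eq. (24) can be illustrated by comparing the conditioning of a raw power basis with an orthogonal polynomial basis. This is a sketch assuming NumPy, with Legendre polynomials standing in for whatever orthogonal family is actually used:

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical setup: a degree-6 polynomial fit on [-1, 1].
x = np.linspace(-1.0, 1.0, 50)

# Raw power basis [1, x, ..., x^6]: the columns x^j are strongly
# correlated, which is the multicollinearity problem of Section 5.
X_raw = np.vander(x, 7, increasing=True)

# Orthogonal (Legendre) polynomial basis over the same points, in the
# spirit of the orthogonal representation of eq. (24).
X_orth = legendre.legvander(x, 6)

print("condition number, power basis:   ", np.linalg.cond(X_raw))
print("condition number, Legendre basis:", np.linalg.cond(X_orth))
```

The orthogonal basis is far better conditioned, so the normal equations can be solved reliably, and raising the degree adds a new column without disturbing the parameters already estimated.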

REFERENCES
[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Elsevier, 2001.
[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "Knowledge Discovery and Data Mining: Towards a Unifying Framework," Proc. KDD, AAAI, 1996.
[3] H. Gatignon, Statistical Analysis of Management Data, 2nd ed. New York: Springer, 2010.
[4] D. G. Kleinbaum, L. L. Kupper, and K. E. Muller, Applied Regression Analysis and Other Multivariable Methods, 4th ed. California: Thomson.

