Analysis of Stock Market Predictor Variables Using Linear Regression
R.Seethalakshmi
School of Humanities and Sciences, SASTRA Deemed to be University, India
Abstract
Technological advancement has increased the study of the stock and share market industry.
Decision making is enhanced by various statistical and machine learning algorithms. An enormous
amount of research has concentrated on predicting future stock prices based on
historical prices and volume. In this work, performance measures are analyzed for the S&P 500
Index using statistical methods in the R environment. The results obtained in this study are superior
to those of the existing methods. The conventional methods for financial market analysis are based on
linear regression. This paper focuses on the best independent variables for predicting the closing value of
the stock market. The study is used to determine the specific factors that have the most
impact on the prediction of the closing price.
Keywords: Stock market, Closing price, S&P 500 Index, Linear Regression, AIC
1. Introduction
History has shown that the prices of stocks and other assets are an essential part of the
dynamics of economic activity, and can influence or be an indicator of social mood. An
economy in which the stock market is on the rise is considered to be a flourishing economy.
The stock market is often considered the principal indicator of a country's economic strength and
development. Stock market research is required in order to make sound investment decisions, and it
is necessary if one wants to earn a significant return on stocks. Before putting
money in the stock market, one should be aware of the company and its return patterns. Stock
market research helps in deciding which industry to invest in.
Stock market forecasting is the act of trying to determine the future price of
a company stock or other financial instrument traded on an exchange. The successful forecast of
a stock's future value could yield significant profit. The efficient-market hypothesis suggests
that stock prices reflect all currently available information, and that any price changes not
based on newly revealed information are thus inherently unpredictable. Stock price prediction
can nevertheless be attempted with data mining algorithms. Data mining can be defined as “making better use of
data”. Everyone is increasingly faced with unmanageable amounts of data; hence,
data mining, or knowledge discovery, affects all of us. It is therefore recognized as one of the
key research areas. Ideally, we would like to develop techniques for “making better use
of any kind of data for any purpose”. However, we argue that this goal is still too
demanding. Over the last three decades, increasingly large amounts of historical data have
International Journal of Pure and Applied Mathematics Special Issue
been stored by electronic means, and this amount is likely to continue to grow significantly in
the future.
Data mining techniques have been shown to produce high accuracy in forecasting
the movement of stock prices. Nowadays, instead of a single method,
traders have to use various forecasting methods to obtain multiple signals and more information
about the market's future. Data mining methods have been introduced for forecasting the
direction of movement of the stock market index. By discovering hidden information in the data,
data mining techniques have performed better than conventional
statistical techniques in forecasting in various fields such as policy, economy and engineering.
Data mining is a systematic process designed to explore data (usually large amounts of
data, typically business or market related, also known as “Big Data”) in search of consistent
patterns and/or systematic relationships between variables, and then to validate the findings by applying
the detected patterns to new data. The stock market is very unpredictable in nature, and stock
prices change almost instantly. Financial analysts who purchase stocks are not aware of all the factors,
such as economic growth and inflation, that affect stock prices. They do not know which stocks
to buy and sell, and stock brokers can easily manipulate them. Stock prices also depend on news
appearing in news articles. It is not feasible for an average buyer to analyze such a large
amount of information. Data mining techniques can be used to deal with this problem. Data
mining can automatically extract significant information from the large amount of data that
affects stock prices. Stock prices can be predicted with an Artificial
Neural Network (ANN). The benefit of using an ANN is that it can deal with both linear and
nonlinear data when predicting stock prices. Prices move up and down, and the linear
regression channel also changes as old prices fall off and new prices appear.
Trade in the stock market means the transfer, for money, of a security or stock from a
seller to a buyer. This requires the two parties to agree on a price. Equities
(stocks or shares) confer an ownership interest in a specific company. Stock market participants
range from small individual stock investors to larger institutional investors, who can be based
anywhere in the world, and may include insurance companies, pension funds, banks and hedge
funds. Their buy or sell orders may be executed on their behalf by a stock exchange dealer. Stock
trading volume is the number of lots bought and sold, expressed on a daily basis. The higher the
trading volume of a stock, the more active the stock is. Trading volume complements
price patterns in technical analysis, and it is considered even more important than the stock price.
Stock market participation refers to the number of agents who buy and sell equity
backed securities either directly or indirectly in a financial exchange. Participants are normally
subdivided into three distinct sectors: households, institutions, and foreign traders. Direct
participation occurs when any of the above entities buys or sells securities on its own behalf on an
exchange. Indirect participation occurs when an institutional investor exchanges a stock on behalf
of an individual or household. Indirect investment takes the form of pooled investment
accounts, retirement accounts, and other managed financial accounts.
2. Literature Review
Box and Jenkins [1] used time series analysis for forecasting and control. White [2,3,4]
used neural networks for stock market forecasting of IBM daily stock returns. Following this, a
range of studies reported on the efficacy of different learning algorithms and forecasting
methods using ANN. Henry [5] used an ARIMA model to predict the daily close and morning open
prices. However, all these conventional methods had difficulties when nonlinearity exists in the time series.
Chiang et al. [6] used ANN to predict the end-of-year net asset value of mutual funds. Kim
and Han [7] found that the complex dimensionality and hidden noise of stock market data
make it difficult to re-estimate the ANN parameters. Romahi and Shen [8] also found that ANN
occasionally suffers from the overfitting problem. They developed an evolving rule-based expert system and
obtained a method that is used to predict financial market behaviour. Hybrid models have also
been used successfully to predict financial behaviour. Their disadvantage was the
prerequisite of expert knowledge.
3. Proposed Algorithm
[Figure: workflow of the proposed algorithm, in order: Data, Normalization, Features Selection, Prediction, Performance measure]
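
Normalization is listed as a stage of the workflow, but the paper does not state which scheme is applied. As a minimal sketch, assuming min-max scaling (one common choice, not confirmed by the paper), a price column could be rescaled to [0, 1] as follows. The paper reports using R; Python is used here only for illustration, and the function name and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative values only, not taken from the paper's data set.
sample_prices = [10.0, 20.0, 30.0]
scaled_prices = min_max_scale(sample_prices)
print(scaled_prices)
```

Scaling matters mainly when predictors differ by orders of magnitude, as volume and price do in the sample data row.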
Date      Open     High     Low      Close    Volume    Adj Close
6/9/1998  1115.72  1119.92  1111.31  1118.41  5.64E+08  1118.41
3.3 Features
The stock market close price is an important piece of information for
every short-term trader. Close prices matter especially to swing traders and
position traders, and they also have implications for practical day trading in many day trading systems.
The close price level provides important information about the general mood
of investors. It reveals a lot about the thinking of big investors who allocate large amounts of money
to the stock market for their asset management purposes.
3.4 Regression
If the goal is prediction, forecasting, or error reduction, linear regression can be used
to fit a predictive model to an observed data set of y and X values. After developing such a
model, if an additional value of X is given without its accompanying value of y, the fitted
model can be used to predict the value of y. Regression predicts a numerical value
[12]. Regression operates on a data set in which the target values are already
defined, and the result can be extended by adding new information [13]. The relations that
regression establishes between predictor and target values form a pattern. This pattern can
be applied to other data sets whose target values are not known. The data needed for
regression are therefore in two parts: one for defining the model and the other for testing it. In this
section we choose linear regression for our analysis. First, we divide the data into
training and testing parts. Then we use the training part to begin the analysis and define the
model.
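
The split-then-fit procedure described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the paper, and it makes several assumptions: a single predictor (open price), a chronological 75/25 split, and small synthetic values that are not taken from the S&P 500 data set. The function names are hypothetical.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Apply the fitted line to new predictor values."""
    return [b0 + b1 * xi for xi in x]


# Synthetic illustrative values (assumption, not the paper's data).
open_price = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0]
close_price = [101.0, 101.5, 104.0, 106.5, 106.0, 109.0, 111.5, 113.0]

# Chronological split: first 75% for training, the rest for testing (assumed ratio).
split = int(len(open_price) * 0.75)
x_train, x_test = open_price[:split], open_price[split:]
y_train, y_test = close_price[:split], close_price[split:]

# Define the model on the training part, then evaluate it on the testing part.
b0, b1 = fit_simple_ols(x_train, y_train)
y_pred = predict(b0, b1, x_test)
test_mse = mean((yp - yt) ** 2 for yp, yt in zip(y_pred, y_test))
print(f"intercept={b0:.4f}, slope={b1:.4f}, test MSE={test_mse:.4f}")
```

A chronological split is used here because shuffling time-ordered prices would let the model see observations that come after the ones it is tested on.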
Model 1: It includes all the available features. The features are described below.
Opening price
The opening price is the value that each share has when the stock exchange opens for
trading. The opening price gives a good indication of where the stock will move during the day.
Since the stock exchange can be likened to an auction market, i.e. buyers and sellers meet to
make deals with the highest bidder, the opening price does not have to be the same as the previous
day's closing price.
Adjusted closing price
An adjusted closing price is a stock's closing price on any given day of trading that has been
amended to include any distributions and corporate actions that occurred at any time prior to the
next day's open. The adjusted closing price is often used when examining historical returns or
performing a detailed analysis of historical returns.
Volume
Volume is one of the most basic and beneficial concepts to understand when trading stocks.
Volume is defined as, “the number of shares or contracts traded in a security or an entire
market during a given period of time.”
Model 2: In Model 2, two features, volume and adjusted close, are omitted.
The models are evaluated using standard performance measures, which are described in
Table 2.
Table 2. Performance measures and their descriptions

Adjusted R-Squared
Formula: $R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$
Description: A version of R-Squared that has been adjusted for the number of predictors in the model. R-Squared tends to overestimate the strength of the association, especially if the model has more than one independent variable.

F Test
Formula: $F = MSR / MSE$, the test statistic for ANOVA for regression, where MSR = Mean Square Regression and MSE = Mean Square Error. The null and alternative hypotheses for simple linear regression for the F-test statistic are $H_0: b_1 = 0$, where $b_1$ is the coefficient for x (i.e. the slope of x), and $H_a: b_1 \neq 0$.
Description: Can be used in simple linear regression to assess the overall fit of the model.

Akaike information criterion (AIC)
Formula: $AIC = 2k - 2\ln(L)$, where k is the number of estimated parameters and L is the maximum value of the likelihood function.
Description: A measure of the relative quality of statistical models for a given set of data.

Bayesian Information Criterion (BIC)
Formula: $BIC = k\ln(n) - 2\ln(L)$, where k is the number of estimated parameters in the model and n is the number of observations in the data set.
Description: Information-based criteria that assess model fit.
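
The four measures in the table can be computed directly from a fitted model's summary quantities. The following is a hedged sketch in Python (the paper itself reports using R), assuming a linear model with Gaussian errors so that the maximized log-likelihood has a closed form in the residual sum of squares. All function names and the numeric inputs are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(ss_reg, ss_err, n, k):
    """F = MSR / MSE for a regression with k predictors and an intercept."""
    msr = ss_reg / k
    mse = ss_err / (n - k - 1)
    return msr / mse


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood ln(L) of a linear model with Gaussian errors."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)


def aic(log_lik, num_params):
    """AIC = 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_lik


def bic(log_lik, num_params, n):
    """BIC = k ln(n) - 2 ln(L)."""
    return num_params * math.log(n) - 2 * log_lik


# Illustrative inputs only (assumptions, not results from the paper).
n_obs, n_pred = 100, 3
print(adjusted_r2(0.9, n_obs, n_pred))
print(f_statistic(90.0, 10.0, n_obs, n_pred))
log_lik = gaussian_log_likelihood(10.0, n_obs)
print(aic(log_lik, n_pred + 2), bic(log_lik, n_pred + 2, n_obs))
```

In this sketch the parameter count passed to `aic` and `bic` is the number of predictors plus two (intercept and error variance); that counting convention is a choice made here, and only differences between models fitted to the same data are meaningful.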
Six attributes of the stock data set are considered for Model 1. Model 2 is obtained by identifying the
least significant attributes and eliminating them from the data set.
This work compares Model 1 and Model 2 using AIC, BIC and R2 values. The model outputs
are tabulated in Table 3. Model 1 and Model 2 are shown in Figures 2 and 3 respectively.
Model 1 includes all attributes (open, low, high, volume, adjclose) and obtains an AIC value of
-215.9031. Model 2, with three attributes (open, low, high), has an AIC value of 3639.1538.
The AIC and BIC values of Model 1 are lower than those of Model 2. Hence this work concludes that Model 1 is the better
model, and that the volume and adjclose attributes should be included when predicting the close price.
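
The comparison above, where the model with the lower AIC is preferred, can be illustrated with a small self-contained sketch. It fits a full and a reduced linear model by the normal equations and compares their AIC values under an assumed Gaussian error model. This is Python for illustration only (the paper reports using R); the data are synthetic, and the predictor names `x1` and `x2` and all function names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... via the normal equations; returns (coefs, rss)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    coefs = solve_linear_system(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coefs, d))) ** 2
              for d, yi in zip(design, y))
    return coefs, rss


def aic_gaussian(rss, n, num_coefs):
    """AIC = 2k - 2 ln(L) with Gaussian errors; k counts coefficients plus variance."""
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)
    return 2 * (num_coefs + 1) - 2 * log_lik


# Synthetic data (assumption): y depends on both x1 and x2, plus small noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05]
y = [1.0 + 2.0 * a + 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

coefs_full, rss_full = fit_ols(list(zip(x1, x2)), y)
coefs_reduced, rss_reduced = fit_ols([[a] for a in x1], y)

aic_full = aic_gaussian(rss_full, len(y), len(coefs_full))
aic_reduced = aic_gaussian(rss_reduced, len(y), len(coefs_reduced))
better = "full" if aic_full < aic_reduced else "reduced"
print(f"AIC full={aic_full:.3f}, AIC reduced={aic_reduced:.3f}, preferred={better}")
```

Because AIC adds a penalty of 2 per estimated parameter, the larger model is preferred only when its improvement in fit outweighs the extra parameters.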
5. Conclusion
Model 1, with all features, is fitted with an R2 value of 0.997. This indicates that open, high, low,
volume and adj close are all useful for predicting the closing value accurately. Model 2, which uses open,
high and low to predict the close value, is fitted with an R2 value of 0.992. This indicates that the prediction of the close
value is not strongly affected by adj close. The study shows that open, high, low and volume are by themselves
sufficient for an approximate prediction of the close value.
References
[1] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control. San Francisco,
CA: Holden-Day, 1976.
to Bull and Bear Markets, President, Global Financial Data, Inc.
[2] Halbert White, “Economic prediction using neural networks: the case of IBM daily stock