Stock Market: Statistical Analysis of Its Indexes and Its Constituents
Abstract— The stock market changes constantly, so making a profit from it is hard and requires intensive planning. This makes stock market analysis the first priority for any financial investment. Stock prices tend to rise and fall unexpectedly, which leads to a volatile scenario. A thorough and consistent analysis is nevertheless the most popular and tested way to gain insight and extract the best returns. This paper aims to determine the top high-performing stocks with good returns under a given index that would be the safest and most beneficial for investment. Using historical data, we were able to obtain top stocks that are advisable for investment. We also verified our results by analyzing contemporary data in the same way and found that the performance and returns of these stocks were still high irrespective of volatility.

Keywords—Stock Market; Volatility; Historical Volatility; Stock Market Indexes; Nifty50; NSE (National Stock Exchange); Big Data; Hadoop; Hive

I. INTRODUCTION
A stock market (also known as a stock exchange) has two basic functions. The first is to facilitate the process by which companies can trade. The second is to organise and manage the venue where trade can properly take place. Investing in and profiting from the market has never been simple because of its uncertainty and highly volatile nature, i.e. shares/equities can rise and fall in value rapidly. Volatility is a statistical measure of the dispersion of returns for a given security or market index. Commonly, the higher the volatility, the riskier the security. Historical volatility, also called 'known volatility', is the volatility of the actual prices of the underlying stocks. Such stocks have proved to be the most challenging, yet rewarding and beneficial, for investment. To new traders and investors, the stock market seems to be a bewildering range of options. Understanding some basic information about how and where to invest can help in maximising the rate of return on the invested money. In order to find the safest stocks listed under a particular index, we collected 8 years (2009-16) of historical data for that index and all its stocks and then calculated the standard deviation of their closing prices. For analysis, we compared the respective results and then verified them. We used Apache Hive to process this big data, in addition to reaping the benefits of MapReduce and the parallel processing of Hadoop.

A considerable body of related work exists in this field and is discussed in the next section, followed by our proposed methodology and results. The paper closes with a conclusion.

II. LITERATURE REVIEW AND RELATED WORK
Volatility indicates the fluctuations of returns; it measures the risk associated with the stock [1]. Volatility has been of crucial importance for understanding and learning in financial markets. It is found to be an evolving, highly non-linear process [2]. The Garman-Klass estimate with the ARIMA time series forecasting technique was found to be more accurate for volatility forecasting than other combinations of popular volatility estimating methods with ARIMA, ARFIMA and feed-forward neural network time series forecasting techniques [1]. GARCH models are widely used by financial professionals for volatility estimation and stock analysis. Applying three models of the GARCH family, i.e. GARCH, EGARCH and APARCH, to NIFTY and BSE data showed that the Indian stock market has asymmetric volatility and is mainly affected by past negative shocks [3]. The GARCH effect is significantly strong compared with the other methods, which also indicates the persistence of volatility [3]. The dynamic conditional correlation model (DCC-GARCH) was found to fit well for estimating the conditional correlations and volatility between different markets, and to be better for optimal portfolio weights and hedge ratios than the vector autoregressive moving average (VARMA-GARCH) model [4]. The sign and significant change of index returns, or shocks to returns, can be significant in determining the intensity of the information to which investors pay attention, as measured by Google search probability for several security performance indexes in the category of investor attention and investment. It was also demonstrated that increased investor attention diminishes return predictability and, therefore, improves market efficiency
978-1-5386-0569-1 $31.00 ©2017 IEEE
[5]. With minute-by-minute data collected over a period of 1 year, the Toda-Yamamoto methodology was used to analyse Granger causality between the trading volumes and prices of around 50 NIFTY companies. The analysis showed that only 29 of the 50 companies had bi-directional (two-way) causality between volume and price, while 15 had a unidirectional relationship in which price caused volume but volume did not cause price. The remaining 6 companies had no causal relationship at all [6]. Eventually, Artificial Neural Network (ANN) models integrated with statistical models emerged as a better solution for financial data than single statistical models. They were found to be better for time series analysis and for accuracy in forecasting stock movements. Likewise, a performance analysis of the Indian stock market index using a neural network time series model was carried out, and suitable parameters such as epochs, momentum and learning rate for a forecast network were identified [7]. It was observed that a normality test can help in getting more precise and accurate predictions when combined with ANN. Later, dynamic and hybrid ANNs were proposed for better accuracy and results [8-9]. The results confirmed that the recurrent neural network produced near-accurate predictions and that the hybrid prediction model outperformed it [9].

Big Data Analytics is a technology that is trending nowadays and is becoming popular because of the various advantages it offers. The most significant is that it gives enterprises the advantage of both stored historical data and fresh data [10]. It has been shown to produce more accurate business predictions and hence to reduce the probability of loss. This emerging technology has slowly started to make its way into the finance sector, mainly the stock exchange market. To identify the right software environment for scientific data analysis, Hadoop was evaluated and modified to judge its performance, scalability and fault tolerance. As a result, Hadoop turned out to be more apt for scientific data analysis than typical SQL-based warehouses [11-13]. Also, results on the data model taken from the GroupLens Research Project revealed that Apache Hive, a data warehousing package built on top of Hadoop, is most appropriate in a low-cost hardware environment [13]. Many other methods and techniques have been implemented on the Hadoop platform to process and analyze stock data, with satisfying results [14-15]. Stock exchange data is typically available in bulk, and to process this data into meaningful information we used data analytics to arrive at profitable predictions. We applied Hive to process the aforesaid big data.

III. METHODOLOGY
Our purpose is to find high-performing stocks in order to reap benefits from the investment made. Historical volatility is essentially a way to tell how far the stock might move in the future based on how fast it has been moving in the recent past. The idea is to measure performance in relation to historical volatility. The standard deviation calculated from close prices not only indicates the performance but also gives some insight into the past volatility of that particular company. There are many different measures of historical volatility, which can use some or all of the open (O), high (H), low (L) and close (C) prices. These are the prices recorded under the given category for the day. For example, the opening price is the price at which the market and the stocks started trading that day. Similarly, the high, low and closing prices denote the highest price, the lowest price and the price at which the market closed, respectively. Closing prices, or close (C), were used for the calculations. The methodology is shown as a flowchart in Fig. 1.

Fig. 1. Methodology

Step I. Data Collection:
The historical 8-year (2009-2016) stock data was collected from the NSE website [16]. In this paper, the end-of-day trading price, or 'close', of the stocks was considered for the historical volatility calculation. The price performance of all securities on an equity index is based on prices at the present close compared with prices at the historical close.

Present data (Jan 2017 - April 2017) was also collected for the resultant stocks and the index, and similar calculations were performed in order to verify our results.

Step II. Data Acquisition/Presentation
The data is arranged date-wise and checked for missing and redundant values for each company. It is uploaded to the Hive warehouse (HDFS) for processing. Further, a quarter number and a year number are assigned to each row for the quarter-wise and 4 year-wise analyses respectively.

Step III. Historical Volatility Calculation of Stocks and Nifty 50 Index
The close-to-close measure was used for calculating volatility because it is the best method for a large dataset, and only marginal extra accuracy is gained for each additional sample above 20 [17]. Also, bias is directly proportional to sample size in the case of this method. It is the simplest yet most common type of calculation
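The calculation described in Steps II and III can be sketched in a few lines of Python. This is a minimal illustration, not the Hive implementation used in the paper: the function names, the use of calendar quarters, the choice of the population standard deviation and the sample prices are all assumptions made for the example. It follows the paper in taking the standard deviation of the closing prices themselves (the textbook close-to-close estimator is usually computed on log returns instead), and then compares each period of a stock with the same period of the index.

```python
from collections import defaultdict
from datetime import date
from statistics import pstdev


def quarter_of(day: date) -> int:
    """Return the calendar quarter (1-4) of a date (assumption: calendar quarters)."""
    return (day.month - 1) // 3 + 1


def close_std_by_period(rows, period="quarter"):
    """Group (date, close) rows by period and return the population
    standard deviation of the closing prices in each group."""
    if period not in ("quarter", "year"):
        raise ValueError("period must be 'quarter' or 'year'")
    groups = defaultdict(list)
    for day, close in rows:
        key = (day.year, quarter_of(day)) if period == "quarter" else (day.year,)
        groups[key].append(close)
    return {key: pstdev(closes) for key, closes in groups.items()}


def periods_above_index(stock_rows, index_rows, period="quarter"):
    """Return {period: stock SD - index SD} for the periods in which the
    stock's standard deviation exceeds the index's standard deviation."""
    stock_sd = close_std_by_period(stock_rows, period)
    index_sd = close_std_by_period(index_rows, period)
    return {
        key: stock_sd[key] - index_sd[key]
        for key in stock_sd
        if key in index_sd and stock_sd[key] > index_sd[key]
    }


# Hypothetical closing prices, for illustration only (not data from the paper).
sample_stock = [
    (date(2016, 1, 4), 100.0), (date(2016, 2, 1), 110.0), (date(2016, 3, 1), 120.0),
    (date(2016, 4, 1), 130.0), (date(2016, 5, 2), 130.0),
]
sample_index = [
    (date(2016, 1, 4), 50.0), (date(2016, 2, 1), 52.0), (date(2016, 3, 1), 54.0),
    (date(2016, 4, 1), 55.0), (date(2016, 5, 2), 57.0),
]

if __name__ == "__main__":
    print(periods_above_index(sample_stock, sample_index, period="quarter"))
```

In Hive, the same grouping would most naturally be a GROUP BY over the year and quarter columns assigned in Step II; the Python version only makes the arithmetic explicit.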
DURATION: PARAMETERS
Quarter-wise: Year; Quarter no.; X's standard deviation; Index's standard deviation
Year-wise: Year; X's standard deviation; Index's standard deviation
4 year-wise: Year; X's standard deviation; Index's standard deviation
8 year-wise: X's standard deviation; Index's standard deviation

The stocks from each category were selected on the following rule:
X's standard deviation > index's standard deviation.

Fig. 3. Result of 4 Year-wise Analysis.

Step III. Results and Analysis of duration: year-wise
From the year-wise analysis (Fig. 4), it was found that Eicher Motors, Bosch Ltd and SBIN are consistent in all the years. Asian Paints Ltd had its last good performance in 2013, which is the reason for its promotion in the 4-year analysis. This shows that it was not consistent, and therefore we drop it. As shown in Fig. 5, Maruti Suzuki performed well in 2015 and 2016, with fair differences of 63.75 and 227.66 respectively. Its performance in 2016 may have dominated the 4-year analysis results. Therefore, it remains a potential consideration. Dr. Reddy performed well in all the years (2011-15) and remained back by 343.5248