


International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438

Big Data Analytics for Gold Price Forecasting Based on Decision Tree Algorithm and Support Vector Regression (SVR)

K. Navin1, Dr. G. Vadivu2

1Department of Information Technology, Database Systems, SRM University, Katankulathur, Chennai, India
2Professor, Department of Information Technology, SRM University, Katankulathur, Chennai, India

Abstract: This work develops a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Investors put their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in India, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts the movement of the gold price. The most appropriate approaches to understanding gold prices are support vector regression and the decision tree model. The experimental results show which of these two algorithms (the decision tree algorithm and the support vector regression algorithm) performs better.

Keywords: R, RHadoop, SVR (Support Vector Regression), Decision tree, Gold price.

1. Introduction

Essentially, there are two types of stock market: the equity market and the commodity market. An equity market is the aggregation of producers and consumers of stocks, while trade in primary products other than manufactured goods takes place in the commodity market. Two types of commodity are present in the commodity market: soft commodities, which include wheat, coffee, cocoa and sugar, and hard commodities, which include gold, rubber and oil.

In India, gold (a hard commodity) plays a crucial role in the market. Gold is the most popular of all the metals as an investment, and investors buy gold to diversify risk. Especially through the use of futures, derivatives and contracts, the gold market is subject to speculation, as other markets are. Gold trades predominantly as a function of sentiment, and its price is less affected by the laws of supply and demand. Gold is storable, and many people invest their money in the gold market. For investing money, gold price prediction is a very important way to predict and forecast the value (price) of gold. There are many methods to predict and forecast the price, such as the linear regression method, the logistic regression method, the decision tree method and the support vector regression method. In this paper we describe two forecasting methods: the decision tree method and the support vector regression method. Both methods predict the value on the basis of the factors of gold. Forecasting is basically used to check the trend and to earn money. The price of gold depends upon supply and demand, just like other goods. Big data analytics is the process of gathering data from different sources, managing or organizing the data, and applying analytics to large datasets to find patterns and to check the tendency of the data. Big data can be any type of data, such as structured, semi-structured or unstructured data, or a mixture of these three. Big data analytics is useful for finding correlations, customer preferences and various trends in the data. The main purpose of big data analytics here is to determine the timing for entering and exiting the market when investing money. Various tools are available for analytics, such as BI tools, statistical tools and data visualization tools, but most of them cannot support a large amount of data, and any tool that does support large data tends to take a long time to process or analyse it. Big data analytics is used to perform data mining, forecasting, prediction and so on. For forecasting the gold price, four or five factors are present, such as the Open Price, Close Price, Lowest Price, Highest Price and the value of the gold. From these factors the percentage change of the price can be found.

We know that one of the reasons the gold price changes is external effects such as social problems, economic policies, environmental conditions and politics. One general assumption made in such cases is that the historical data incorporates all of those behaviors. As a result, historical data is the major input to the prediction process. Under this hypothesis the external effects are modeled as noise, and the phenomena they cause are considered accidental.
Before starting the analysis, the analyst performs data cleaning or data pre-processing, removing NULL values, duplicate records and ambiguous data from the datasets. The analyst can connect to a MySQL database to manipulate or clean the data using the RMySQL package and access all the database files through R. R can also connect to Hadoop to access files in HDFS (the Hadoop Distributed File System) and perform analysis on those datasets, and from a Linux terminal the analyst can access all the platforms, such as R, the database and Hadoop. Hadoop supports additional software packages (ecosystem tools) for analysis purposes; here the Apache Sqoop ecosystem tool is used to provide the connection between databases and Hadoop. Big data can be analyzed with many software tools commonly used as part of advanced analytics disciplines such as data mining, statistical analysis, predictive analytics and text analytics. Many BI tools support these analytics and visualization techniques, but relational databases cannot support unstructured data, and traditional data warehouses may not be able to handle sets of big data that need to be updated continually.

One of the best frameworks for supporting large datasets is Hadoop. Apache Hadoop is an open-source framework written in Java that supports large amounts of data through the MapReduce technique and Hadoop ecosystem tools (additional software packages) such as Apache Pig, Apache Hive and Apache Sqoop. The main part of Apache Hadoop is HDFS (the Hadoop Distributed File System), which can store real-time data on the performance of the gold price. Many organizations that gather, process and analyze big data have turned to a newer class of technologies that includes Hadoop and Hadoop ecosystem tools such as YARN, MapReduce, Spark, Hive, MongoDB, HBase and Pig, as well as NoSQL databases. Those technologies form the Hadoop framework, which supports the processing of large data sets across clustered hardware. HDFS is the storage part of Hadoop; to process the data, Hadoop uses the MapReduce technique. The main goal of Hadoop is data locality, which means using whole servers in a large cluster, each with its own internal disk drives. To provide higher performance, MapReduce assigns the total workload across these servers and then proceeds with the data analysis.

2. R

R is statistical and data-analysis software used to analyse data and to apply predictive modeling with data visualization. R is used for many graphical and statistical methods, such as time series analysis, classification, clustering, classical statistical tests, and linear and non-linear modeling, among others, through its libraries. R supports many languages, such as C, C++ and Python, for directly manipulating R objects. For any specific function, area or language, users can load packages into R, which makes it highly extensible. With its different packages, R provides better connectivity, better analytics and better visualization. For graphical representation (visualization), R offers many plots, such as the histogram, box plot, pie chart, line graph and bubble chart; to analyse gold price fluctuations the most valuable is the line graph. R supports visualization in both 2D and 3D. In R, visualization can express the results of the mining process, allowing the user to find the exact problem after a deep understanding of the data and, after analysing the data values, to recognize which algorithm is best for the analysis.
R has some important features, and it facilitates data manipulation and calculation. It also includes:
• A facility for data storage and data handling.
• A suite for calculations on arrays, in particular matrices.
• A large, integrated, coherent collection of intermediate tools for data analysis.
• Graphical facilities for data analysis, with results displayed either on screen (softcopy) or in hardcopy.
• A simple and very effective language which includes many constructs, such as conditionals, loops and user-defined recursive functions, as well as input and output facilities.
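The data handling and plotting described in this section can be illustrated with a short R sketch. It is not taken from the paper: the file gold_prices.csv and its columns (Date, Open, Close) are assumed, and the cleaning steps simply mirror the pre-processing mentioned in the Introduction.

```r
# Minimal sketch (not from the paper): load gold price data, pre-process it,
# compute the percentage change mentioned in the Introduction and draw a line graph.
# The file name and the columns Date, Open and Close are assumed for illustration.

gold <- read.csv("gold_prices.csv", stringsAsFactors = FALSE)

# Data cleaning: drop NULL (NA) values and duplicate rows
gold <- na.omit(gold)
gold <- gold[!duplicated(gold), ]
gold$Date <- as.Date(gold$Date)

# Percentage change of the price from the Open and Close factors
gold$PctChange <- (gold$Close - gold$Open) / gold$Open * 100

# Line graph of the closing price, the plot highlighted above for gold fluctuation
plot(gold$Date, gold$Close, type = "l",
     xlab = "Date", ylab = "Close price", main = "Gold price movement")
```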

3. RHadoop

R is statistical software that analyses data in the form of visualizations, and Hadoop is the Java open-source framework that has two core components: HDFS and MapReduce. To connect R with Hadoop, basically three packages are required, rhdfs, rmr2 and rhbase; the rJava package is a dependency of rhdfs, and rhbase is required for storage purposes. The rhdfs package provides basic connectivity to HDFS (the Hadoop Distributed File System) for accessing data: R programmers can browse, read, write and modify files stored in HDFS through R. The rhdfs package is installed only on the node that will run the R client. The rhbase package provides basic connectivity to HBase, which is used for storage, via the Thrift server; R programmers can access the tables stored in HBase through R. The rmr2 package supports the Hadoop MapReduce technique to perform statistical analysis on the Hadoop cluster, and this package is installed on every node present in the cluster.

Connecting R and Hadoop using Streaming is an easy task, but R must be installed on every datanode. RHadoop has many advantages, such as scalability, data integration, flexibility, functionality and transparency.
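As a rough illustration of how these packages fit together, the following sketch assumes a working Hadoop installation with rhdfs and rmr2 configured (HADOOP_CMD and HADOOP_STREAMING set); the input file and the monthly-average MapReduce job are hypothetical and not the authors' code.

```r
# Rough RHadoop sketch (assumes a working Hadoop cluster, the rhdfs and rmr2
# packages, and the HADOOP_CMD / HADOOP_STREAMING environment variables).
# The input file and the monthly-average job are hypothetical.
library(rhdfs)
library(rmr2)

hdfs.init()                                   # rhdfs: connect R to HDFS

gold  <- read.csv("gold_prices.csv")          # assumed columns: Date, Close
month <- format(as.Date(gold$Date), "%Y-%m")

gold.dfs <- to.dfs(keyval(month, gold$Close)) # push key/value pairs into HDFS

# rmr2: MapReduce job computing the average closing price per month
job <- mapreduce(
  input  = gold.dfs,
  map    = function(k, v) keyval(k, v),
  reduce = function(k, v) keyval(k, mean(v))
)

from.dfs(job)                                 # pull the result back into R
```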

There are many advantages of RHadoop, which are given below:
• It supports high fault tolerance in the presence of hardware failure.
• It is designed to be deployed on low-cost hardware.
• RHadoop supports SDA (Streaming Data Access).
• The combination of R and Hadoop supports analysis on LDS (Large Data Sets).
• Portability across software platforms and heterogeneous hardware.

4. Support Vector Regression

The Support Vector Machine (SVM) is based on statistical learning theory. For regression and classification tasks it is a very powerful and useful tool. Basically, support vector regression is used for time series problems and regression problems, and gold price data is the best example of time series data.

We now present the basic concept of support vector regression. Given a dataset

$D = \{(x_1, y_1), \dots, (x_P, y_P)\}$, where $x_i \in X$, $y_i \in \mathbb{R}$,

$P$ is the size of the training data and $X$ represents the space of the instances. The basic intention is to determine the function $L$ based on the dataset $D$. The value of the function $L$ is expressed with respect to the feature functions $\phi_m$ by the equation

$L(x, w) = \sum_{m=1}^{N} w_m \phi_m(x) + b, \qquad \text{(i)}$

where the functions $\phi_m(x)$ are called the features, and $w_m$ and $b$ are the coefficients which have to be determined from the dataset. The featured dimensionality $N$ can be finite or infinite. The unknown coefficients are determined by minimizing the functional $R(C)$,

$R(C) = \frac{C}{P} \sum_{i=1}^{P} \bigl| y_i - L(x_i, w) \bigr|_{\varepsilon} + \frac{1}{2} \lVert w \rVert^{2}, \qquad \text{(ii)}$

where the symbol $C$ is a constant and the robust ($\varepsilon$-insensitive) error function, which takes the absolute (mod) value of the deviation, is defined as

$\bigl| y - L(x, w) \bigr|_{\varepsilon} = \begin{cases} 0 & \text{if } \lvert y - L(x, w) \rvert < \varepsilon, \\ \lvert y - L(x, w) \rvert - \varepsilon & \text{otherwise.} \end{cases}$

The function that minimizes the functional in equation (ii) depends upon a finite number of parameters and has the given form

$L(x, \alpha, \alpha^{*}) = \sum_{i=1}^{P} (\alpha_i^{*} - \alpha_i)\, K(x_i, x) + b, \qquad \text{(iii)}$

where $\alpha_i, \alpha_i^{*} \ge 0$ with $\alpha_i \alpha_i^{*} = 0$, and

$K(x, y)$ is the kernel function; $K(x, y)$ gives the inner product in the feature space,

$K(x, y) = \sum_{m} \phi_m(x)\, \phi_m(y),$

which satisfies Mercer's condition and the symmetry condition. $K$ is the kernel function, and many kernels are available, such as the Gaussian kernel, trigonometric polynomial functions, tensor products, etc. The coefficients $\alpha_i$ and $\alpha_i^{*}$ are determined by the maximization, and the equation is

$W(\alpha, \alpha^{*}) = -\varepsilon \sum_{i=1}^{P} (\alpha_i^{*} + \alpha_i) + \sum_{i=1}^{P} y_i (\alpha_i^{*} - \alpha_i) - \frac{1}{2} \sum_{i=1}^{P} \sum_{j=1}^{P} (\alpha_i^{*} - \alpha_i)(\alpha_j^{*} - \alpha_j)\, K(x_i, x_j).$

With respect to this equation, the constraints are

$\sum_{i=1}^{P} \alpha_i = \sum_{i=1}^{P} \alpha_i^{*}, \qquad 0 \le \alpha_i \le C, \qquad 0 \le \alpha_i^{*} \le C, \qquad i = 1, \dots, P,$

and only a number of the coefficients $(\alpha_i^{*} - \alpha_i)$ will be different from the zero value; the corresponding data points are the support vectors.
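In practice these quantities do not have to be handled by hand; for example, the e1071 package in R exposes them as arguments of its svm() function. The sketch below only illustrates that mapping under assumed data (the file and column names are hypothetical), not the implementation used in this paper.

```r
# Illustrative mapping of the SVR quantities above onto e1071::svm() (assumed data).
library(e1071)

gold <- read.csv("gold_prices.csv")   # hypothetical columns: Open, Low, High, Close

fit <- svm(Close ~ Open + Low + High, data = gold,
           type    = "eps-regression",   # epsilon-insensitive (robust) error function
           kernel  = "radial",           # Gaussian kernel K(x, y)
           cost    = 10,                 # the constant C in the functional (ii)
           epsilon = 0.1)                # the width of the |.|_epsilon loss

# Points whose coefficients (alpha_i* - alpha_i) are non-zero are the support vectors
nrow(fit$SV)
```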

The support vector regression (SVR) is used as a learning algorithm to learn the pattern from the input and to predict the gold price as output based on that learning. This process is divided into two phases, the training phase and the testing phase, and all the stages of these two phases are given below (an illustrative R sketch follows the lists):

1) Training phase
• Stage 1: Read the randomly selected training dataset from the local repository.
• Stage 2: Apply a windowing operator to transform the data into a generic dataset. This step converts the last row of a window within the time series into a label or target variable; the last variable is treated as the label.
• Stage 3: Apply the cross validation process (CVP) to the labels produced by that operator in order to feed them as inputs into the support vector regression model.
• Stage 4: Select the type of kernel and the special parameters of support vector regression.
• Stage 5: Apply that model to the dataset and observe the performance or accuracy.
• Stage 6: If the accuracy is good, go to Stage 7; otherwise go back to Stage 4.
• Stage 7: Exit from the training phase and apply the trained model to the testing dataset.

2) Testing phase
• Stage 1: Read the randomly selected testing dataset from the local repository.
• Stage 2: Apply the trained model to the out-of-sample dataset for gold price prediction.
• Stage 3: Produce the predicted gold price trends.
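A condensed R sketch of the two phases, under stated assumptions, is given below. The window width, the 80/20 split and the file gold_prices.csv are illustrative choices, and e1071's tune.svm() is used here to stand in for the kernel and parameter selection of Stages 3-6.

```r
# Condensed sketch of the training and testing phases (window width, file name
# and the 80/20 split are assumptions; tune.svm stands in for Stages 3-6).
library(e1071)
set.seed(1)

gold  <- read.csv("gold_prices.csv")          # hypothetical file with a Close column
price <- gold$Close

# Training Stage 2: windowing - the w previous prices predict the next one (label)
w <- 5
X <- t(sapply(seq_len(length(price) - w), function(i) price[i:(i + w - 1)]))
colnames(X) <- paste0("lag", w:1)
dat <- data.frame(X, label = price[(w + 1):length(price)])

# Random split into training and testing datasets (Stage 1 of each phase)
idx   <- sample(nrow(dat), floor(0.8 * nrow(dat)))
train <- dat[idx, ]
test  <- dat[-idx, ]

# Stages 3-6: cross-validated choice of kernel parameters for eps-regression
tuned <- tune.svm(label ~ ., data = train, gamma = 10^(-2:0), cost = 10^(0:2))
model <- tuned$best.model

# Testing Stages 2-3: predict the gold price trend on the out-of-sample data
pred <- predict(model, newdata = test)
head(data.frame(actual = test$label, predicted = pred))
```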

5. Decision Tree

The decision tree is a visualization form that has a root node and leaf nodes; the leaf nodes contain the results. Two types of node are present in a decision tree: inner nodes and terminal nodes. Basically, two types of decision tree can be drawn for gold price forecasting: the classification tree and the regression tree. Classification tree analysis is used when the predicted result is the class to which the data belongs, and regression tree analysis is used when the predicted result can be considered a real number.

The decision tree is a method to find the target value and check the possible trends along its different branches. In the decision tree all instances are represented as attribute values, and it automatically performs complexity reduction and feature selection; regarding predictive analysis, its structure is very understandable and interpretable.

Evaluation starts from the root node and goes down, step by step, to a terminal node, where the result is interpreted. The decision tree is a strong approach for predicting the gold value and gives good results at the time of predicting the gold price. For each node we calculate the EMV (expected monetary value) and place it in the node to indicate that it is the expected value calculated over all branches emanating from that node. There are four key advantages of the decision tree (an illustrative R sketch follows the list):
• It implicitly performs feature selection or variable screening.
• It requires relatively little effort from users for data preparation and prediction.
• It can capture nonlinear relationships between parameters.
• It is easy to interpret and explain to executives.
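For the regression-tree variant described above, a common choice in R is the rpart package. The sketch below is illustrative only (the data frame and its columns are assumed) and is not the authors' exact setup.

```r
# Illustrative regression tree for the gold price (assumed data frame and columns).
library(rpart)

gold <- read.csv("gold_prices.csv")   # hypothetical columns: Open, Low, High, Volume, Close

# method = "anova" grows a regression tree: the predicted result is a real number
tree <- rpart(Close ~ Open + Low + High + Volume, data = gold, method = "anova")

printcp(tree)                          # complexity table: shows the implicit feature selection
plot(tree); text(tree, use.n = TRUE)   # from the root node down to the terminal (leaf) nodes

pred <- predict(tree, newdata = gold)  # predicted gold price at the leaves
```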

After performing all the experiments on the gold price data sets, we compute three errors: MSE, MAD and MAPE. The mean square error (MSE) is the average of the squares of the differences between the forecasted price and the actual price, the mean absolute deviation (MAD) is the average of the absolute values of the errors, and the mean absolute percentage error (MAPE) is the average of the absolute values of the percentage errors of a forecast. Based on these three errors we have checked which algorithm is better for gold forecasting. All the errors measure the performance of both the decision tree analysis and the support vector regression.
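These three measures can be computed directly in R from the actual and forecasted prices; the short sketch below is an illustration, with actual and predicted assumed to come from the experiments above.

```r
# Error measures used to compare the two models; `actual` and `predicted`
# are assumed to be numeric vectors coming from the experiments above.
err_mse  <- function(actual, predicted) mean((actual - predicted)^2)           # mean square error
err_mad  <- function(actual, predicted) mean(abs(actual - predicted))          # mean absolute deviation
err_mape <- function(actual, predicted) 100 * mean(abs((actual - predicted) / actual))  # mean absolute percentage error

# Example use: err_mse(test$label, pred) for each model's predictions
```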

6. Conclusion

There are five factors present in the gold data: the open value, close value, low value, high value and volume. Gold provides an effective and useful means of diversifying a portfolio. The way to achieve success with gold is to know your goals and risk profile before jumping in. The volatility of gold can be harnessed to accumulate wealth, but left unchecked it can also lead to ruin. Based on these attributes we have predicted the result with both methods. Decision trees are best for feature selection, and SVR is best for large datasets. But there are some problems with the SVM: it takes a long time to train on the dataset, whereas the decision tree takes less time to process the data. The decision tree has a smaller mean square error than the SVM.


