IRJET Price Prediction and Analysis of F
Abstract - Cryptocurrency is an entirely new market for trading, earning money, and making profits through a completely digital mode of transaction. In this paper we focus on the cryptocurrency Bitcoin. Price predictions are made in real time from news and tweets using an LSTM model. Our dataset consists of various Bitcoin-related features recorded daily over 7 years.

Keywords: Bitcoin, Sentiment, Cryptocurrency, LSTM model, Twitter.

1. INTRODUCTION

Accurate price prediction of any currency is always a difficult task. Various machine learning algorithms have been used for price prediction in the stock market, which makes it possible to attempt price prediction for highly volatile cryptocurrencies as well. Bitcoin was invented in 2008 by an unknown person or group of people using the name Satoshi Nakamoto, and it came into use in 2009 when its source code was released as open source. Bitcoins are created as a reward for the process of mining. Unlike fiat currency, Bitcoin is created, distributed, traded, and stored using a decentralized ledger system known as the blockchain. Because Bitcoin is highly volatile, its price depends on a very large number of variables, including people's opinions, buzz, and news around the world. Advances in technology have made it possible to process written or spoken language into an analyzable form. Sentiment analysis is a machine learning methodology within NLP: the procedure of 'computationally' determining whether a section of text is positive, negative, or neutral. It is also termed opinion mining, since it evaluates the opinion and attitude of the speaker.

1.1 Data Creation

The data is collected from Twitter and well-known news sources and consists of news articles and tweets that are directly or indirectly related to Bitcoin. The news is scraped using the Scrapy framework, which provides a complete package for scraping requirements. The tweets are collected using the Twint Python library. The scraped data is stored in CSV format in local storage and is then preprocessed for sentiment analysis. The sentiment analysis labels the data into three types: p for positive news, n for negative news, and neutral.

1.2 Data Preprocessing

Preprocessing is an important step after data gathering, because it is very hard and unwise to use raw data directly for machine learning. Preprocessing includes cleaning, integration, transformation, and reduction techniques. Data cleaning includes the handling of missing and noisy data. Data transformation includes normalization, attribute selection, and discretization. Data reduction comprises data cube aggregation, attribute subset selection, and dimensionality reduction. After applying these techniques, the data becomes usable for machine learning. The extracted data contained many features, of which only a few were selected. Stop words were removed from the tweets to improve the sentiment analysis.

1.3 Sentiment Analysis

Sentiment analysis is the method of 'computationally' determining whether a section of text is positive, negative, or neutral. It is also termed opinion mining, which consists of evaluating the opinion of the individual. The preprocessed data is fed to our model, which labels the data. Initially, the model is trained for labeling.

1.4 Machine Learning Training Model

Long Short-Term Memory (LSTM) networks are a kind of RNN (recurrent neural network) mostly used in sequence prediction problems. This behavior is needed in intricate problem domains such as machine translation and speech recognition. LSTMs are a complex area of deep learning (DL), and studying and implementing them is tedious work.
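To make the sequence-prediction behaviour of an LSTM concrete, the following is a minimal pure-Python sketch of one forward pass through a standard LSTM cell (input, forget, and output gates plus a candidate cell state), together with the sliding-window step that turns a daily series into training pairs. It is not the authors' implementation: the names `lstm_step`, `run_lstm`, `make_windows`, and `make_params` are hypothetical, and the constant weights stand in for values that a trained model (for example, one built with a deep learning library) would learn.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# gate name -> (input weights W, recurrent weights U, bias b)
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Params) -> Tuple[Vector, Vector]:
    """One LSTM time step: the gates decide what to forget, write, and expose."""
    def gate(name, activation):
        w, u, b = params[name]
        z = [a + r + bias for a, r, bias in zip(matvec(w, x), matvec(u, h_prev), b)]
        return [activation(v) for v in z]

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # candidate values for the cell state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_lstm(sequence: List[Vector], hidden_size: int, params: Params) -> Vector:
    """Feed a whole input window through the cell and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


def make_windows(series: List[float], look_back: int) -> List[Tuple[List[float], float]]:
    """Turn a daily series into (last `look_back` values, next value) pairs."""
    return [(series[k:k + look_back], series[k + look_back])
            for k in range(len(series) - look_back)]


def make_params(input_size: int, hidden_size: int, value: float = 0.1) -> Params:
    """Hypothetical constant weights for illustration; a real model learns these."""
    return {
        name: ([[value] * input_size for _ in range(hidden_size)],
               [[value] * hidden_size for _ in range(hidden_size)],
               [0.0] * hidden_size)
        for name in ("input", "forget", "output", "candidate")
    }


if __name__ == "__main__":
    params = make_params(input_size=1, hidden_size=2)
    for window, target in make_windows([0.1, 0.2, 0.3, 0.4, 0.5], look_back=3):
        h = run_lstm([[v] for v in window], hidden_size=2, params=params)
        print(window, "->", h, "(target:", target, ")")
```

In a full model, a dense output layer would map the final hidden state `h` to the predicted price, and training would adjust all weights against the window targets. The forget gate is what lets the cell carry information across many time steps.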
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 483
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 06 | June 2020 www.irjet.net p-ISSN: 2395-0072
2. LITERATURE SURVEY

The initial part of paper [1] is database collection. The Quandl and CoinMarketCap databases are used to retrieve Bitcoin values. The acquired time-series data, recorded daily for five years at different time instances, is normalized and smoothed. For this, the authors implemented different normalization techniques, such as log transformation, z-score normalization, and Box-Cox normalization. The data is then smoothed over the complete period. After feature selection, the sample inputs are fed to the model. The variation in Bitcoin values is denoted as a pattern, which consists of positive or negative variations compared to the previous day's data. After establishing the learning framework and completing the normalization, they use two methods, Bayesian Regression and GLM/Random Forest, and then choose the better method to solve the Bitcoin prediction problem. The accuracy of the different models is compared after the final prediction.

Pavitra Mohanty, Darshan Patel, Parth Patel, and Sudipta Roy present in paper [5] a way of predicting future fluctuations of cryptocurrencies. They use Apache Flume to gather users' comments from social media, while price data is collected from various exchanges. They use LSTM (Long Short-Term Memory) to forecast Bitcoin trends from Twitter data. A sentiment index derived from the data classifies sentiment as positive, negative, or neutral. They also use information from the blockchain as one of the major factors affecting Bitcoin market trends. More weight is given to the LSTM for predicting future Bitcoin prices. Due to the high volatility of the market, the model does not meet the accuracy requirements: the precision of their model is 60% and its accuracy is 50%.

Dibakar Raj Pant, Prasanga Neupane, Anuj Poudel, Anup Kumar Pokhrel, and Bishnu Kumar Lama propose in paper [6] an approach for predicting Bitcoin prices based on analysis of tweets gathered from news and Twitter. The collected data is classified into three categories: positive, negative, and neutral. Positive and negative tweets, together with historical data, are given as input to an RNN model for price prediction. In this paper, sentiment analysis carries a major execution weight in the workflow, and the RNN model is used for future price prediction using the historical data. The paper also shows a moderate correlation of 0.41 between the rise of negative opinions on Twitter related to Bitcoin and the subsequent fall in its price. Its use of positive and negative sentiment scores, with a reported accuracy of 77.62%, is another useful contribution.

Image source [6]

In paper [7], the prices of two cryptocurrencies, Bitcoin and Litecoin, are predicted based on sentiment analysis of tweets. Multiple linear regression is used, which forecasts the prices with R² scores of 44% and 59% respectively. From these scores, the authors infer that Bitcoin's price is less affected by tweet sentiment than Litecoin's price.

…predict the prices of Bitcoin using two deep learning methodologies. The web application is built on the Django web framework and has two pages, one for the CNN network and the other for the LSTM (Long Short-Term Memory) network.

Akhilesh P. Patil proposes in paper [4] the use of Long Short-Term Memory networks for predicting the future price of cryptocurrency through a time-series model. The major cryptocurrencies considered are Bitcoin, Ethereum, and Litecoin. In this paper, the authors compare various opinions on these cryptocurrencies.

The fluctuation of Bitcoin prices depends on various factors such as mining cost and economic factors. The price of the cryptocurrencies over a 2-hour period is predicted, and the dependency of cryptocurrency price on the number of positive tweets in this duration is returned. It is noted that social factors play a major role in deciding the price of a cryptocurrency. The proposed framework works in two phases: a training phase and a detection phase. The training phase is a one-time activity. To carry it out, the authors collected Twitter data along with the concurrent Bitcoin and Litecoin prices. The number of positive, neutral, and negative tweets present in each chunk is calculated. These counts of positive, neutral, and negative tweets are the features of the dataset, and the mapped average price is its label. The model is validated against the original labels of the dataset. If the validation result is acceptable, the model is used to predict future prices; if not, a new model has to be designed. In the detection phase, real-time tweets are fed to the model, and the model predicts the average price over two hours.

3. SYSTEM ARCHITECTURE

The above figure describes the system architecture. The training data for the model covers the past seven years. News is scraped from trusted news sites using the Scrapy framework, and Twitter data is collected with the Twint library, restricted to users with more than fifty thousand followers. All of this data is stored in CSV files as rows and columns. Next, we clean and preprocess the data. We use the NLTK library to remove stop words, which helps us obtain a more accurate sentiment score. In the next step, sentiment analysis is performed on the processed data using the VADER sentiment library, which helps determine trends by giving positive, negative, and neutral scores. Alongside this, we take the Bitcoin closing, opening, high, and low prices of each day from 2013 to 2020. We then map the sentiment index of a particular day to the above prices of the next day through the corresponding date manipulation. In this way, the training data is prepared.

Next comes the detection phase, in which real-time tweets are fed to the model and the model predicts the price for the interval, in hours, at which the scheduler runs. The scheduler runs the Twitter scraper via crontab on a remote server to get real-time tweets; the scraper is built using the Twint library. The collected tweets are passed to the sentiment analysis model, and the mean of all sentiment indices for that time period is fed to the machine learning model, which finally gives the predicted price for that time period.
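The cleaning and scoring steps described above can be sketched as follows. The system itself uses NLTK's stop-word list and the VADER library; to keep this example self-contained, it substitutes a tiny hypothetical stop-word set and a hypothetical mini-lexicon (`STOP_WORDS`, `LEXICON`, and all function names here are illustrative). It does borrow two VADER conventions: the summed valence is normalised into [-1, 1] as x / sqrt(x² + 15), and a compound score of at least 0.05 counts as positive while one of at most -0.05 counts as negative.

```python
import math
import re
from typing import List

# Hypothetical stand-ins: the real pipeline uses NLTK's English stop words
# and the VADER lexicon; these tiny sets exist only for illustration.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for", "it"}
LEXICON = {"surge": 2.0, "gain": 1.8, "bullish": 2.2, "good": 1.9,
           "crash": -2.6, "ban": -1.8, "bearish": -2.0, "loss": -1.9}


def remove_stop_words(text: str) -> List[str]:
    """Lower-case, tokenise on letters/apostrophes, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def compound_score(tokens: List[str], alpha: float = 15.0) -> float:
    """Sum token valences, then apply VADER-style normalisation into [-1, 1]."""
    raw = sum(LEXICON.get(t, 0.0) for t in tokens)
    return raw / math.sqrt(raw * raw + alpha)


def label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to the paper's labels: p, n, or neutral."""
    if compound >= threshold:
        return "p"
    if compound <= -threshold:
        return "n"
    return "neutral"


def classify(text: str) -> str:
    return label(compound_score(remove_stop_words(text)))


if __name__ == "__main__":
    for tweet in ["Bitcoin is on the surge", "Bitcoin crash after the ban", "Bitcoin update"]:
        print(tweet, "->", classify(tweet))
```

One design caution: NLTK's English stop-word list includes negations such as "not" and "no", which VADER uses for negation handling, so removing them before scoring may flip the polarity of some tweets.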
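The date manipulation that turns scored text into training rows can be sketched in a few lines: average the compound scores of everything published on a day, then pair that day's sentiment index with the next day's prices. This is an illustrative sketch rather than the authors' code. The function names are hypothetical, and the `(open, high, low, close)` tuple layout is an assumption about how the daily price CSV is loaded.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean
from typing import Dict, Iterable, List, Tuple

OHLC = Tuple[float, float, float, float]  # assumed (open, high, low, close) layout


def mean_sentiment_by_day(scored: Iterable[Tuple[date, float]]) -> Dict[date, float]:
    """Average the compound scores of all tweets/news items published on each day."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for day, score in scored:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


def build_training_rows(sentiment: Dict[date, float],
                        prices: Dict[date, OHLC]) -> List[Tuple[float, OHLC]]:
    """Pair the sentiment index of day d with the prices of day d + 1."""
    rows = []
    for day in sorted(sentiment):
        next_day = day + timedelta(days=1)
        if next_day in prices:  # skip days whose next-day price is missing
            rows.append((sentiment[day], prices[next_day]))
    return rows
```

For example, tweets dated 1 January contribute to one averaged sentiment value, and that value becomes a training row labelled with the prices of 2 January. In the detection phase, the same averaging would apply over the scheduler's interval instead of a full day.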
6. EXPERIMENTAL ASSESSMENT
Fig 9. Validation Graph

6.2 Prediction Output

8. CONCLUSION

We have successfully implemented the LSTM model for predicting Bitcoin's price in real time based on sentiment analysis of Twitter and news data. The real-time scheduler is live on a server, continuously extracting data and making predictions using the LSTM machine learning model. The predictions are shown in graph form on a webpage.

ACKNOWLEDGEMENT

It gives us great pleasure to present the survey paper on "Price Prediction and Analysis of Financial Markets based on News, Social Feed, and Sentiment Index using Machine Learning and Market Data." We are very grateful to our guide, Prof. A. D. Dhawale, for giving us all the help and guidance we needed. We are thankful for his kind support, and his valuable suggestions were very helpful.
REFERENCES