
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 07 Issue: 06 | June 2020 www.irjet.net p-ISSN: 2395-0072

Price Prediction and Analysis of Financial Markets based on News, Social Feed, and Sentiment Index using Machine Learning and Market Data
Tapan Mehta1, Ganesh Kolase2, Vivek Tekade3, Rahul Sathe4, Anand Dhawale5
1,2,3,4 Student, Dept. Of Computer Engineering, Modern Education Society’s College of Engineering, Pune, India
5Professor, Dept. Of Computer Engineering, Modern Education Society’s College of Engineering, Pune, India

---------------------------------------------------------------------***---------------------------------------------------------------
Abstract - Cryptocurrency is a new market for trading, earning money, and making profits through a completely digital mode of transaction. In this paper we focus on the cryptocurrency Bitcoin. Prices are predicted in real time from news and tweets using an LSTM model. Our dataset consists of various features related to Bitcoin, recorded daily over 7 years.

Keywords: Bitcoin, Sentiment, Cryptocurrency, LSTM model, Twitter.

1. INTRODUCTION

Accurate price prediction for any currency is always a tedious task. Various machine learning algorithms have been used for price prediction in the stock market. Hence, it is now possible to predict the price of highly volatile cryptocurrencies. Bitcoin was invented in 2008 by an unknown person or group of people using the name Satoshi Nakamoto and launched in 2009, when its source code was released as open source. Bitcoins are created as a reward for the mining process. Unlike fiat currency, Bitcoin is created, distributed, traded, and stored using a decentralized ledger system known as the blockchain. Being highly volatile, the price of Bitcoin depends on a very large number of variables, including people's opinions, buzz, and news around the world. Owing to advances in technology, it is possible to process text or spoken language into an analyzable form. Sentiment analysis is a machine learning technique within natural language processing (NLP). It is the procedure of computationally determining whether a section of text is positive, negative, or neutral. It is also termed opinion mining: evaluating the opinion and attitude of the speaker.

1.1 Data Creation

The data is collected from Twitter and well-known news sources. It consists of news articles and tweets that are directly or indirectly related to Bitcoin. The news is scraped using the Scrapy framework, which provides a complete set of packages for scraping requirements. The tweets are collected using the Twint Python library. The scraped data is stored in CSV file format in local storage. The data is then preprocessed for sentiment analysis. The sentiment analysis labels the data with one of three types: p for positive news, n for negative news, and neutral.

1.2 Data Preprocessing

Preprocessing is an important step after data gathering. It is very hard and unwise to use raw data directly for machine learning. Preprocessing includes cleaning, integration, transformation, and reduction techniques. Data cleaning includes handling of missing data and noisy data. Data transformation includes normalization, attribute selection, and discretization. Data reduction comprises data cube aggregation, attribute subset selection, and dimensionality reduction. After applying all the above techniques, the data becomes usable for machine learning. The extracted data contained many features, of which a few were selected. Stop words were removed from the tweets for better sentiment analysis.

1.3 Sentiment Analysis

Sentiment analysis is the method of computationally determining whether a section of text is positive, negative, or neutral. It is also termed opinion mining, which consists of evaluating the behavior of the individual. The preprocessed data is fed to our model, which labels the data. Initially, the model is trained for labeling.

1.4 Machine Learning Training Model

Long Short-Term Memory (LSTM) networks are a kind of recurrent neural network (RNN) mostly used in sequence prediction problems. This behavior is required in complex problem domains such as machine translation and speech recognition. LSTM is one of the more complex structures in deep learning, and studying and implementing it is tedious work.
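The stop-word removal of Section 1.2 and the three-way labeling of Section 1.1 can be illustrated with a minimal Python sketch. It is not the authors' code: the tiny stop-word set stands in for the much larger NLTK list, the function names are hypothetical, and the 0.05 cut-off on a compound score in [-1, 1] is an assumption borrowed from common VADER usage, since the paper does not state its thresholds.

```python
# Illustrative stop-word set; the paper relies on NLTK's list, which is much larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}


def remove_stop_words(text: str) -> str:
    """Drop stop words from whitespace-separated text, ignoring case."""
    kept = [word for word in text.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to 'p', 'n', or 'neutral'.

    The threshold is an assumption, not a value taken from the paper.
    """
    if compound >= threshold:
        return "p"
    if compound <= -threshold:
        return "n"
    return "neutral"


print(remove_stop_words("The price of Bitcoin is rising"))
print(label_sentiment(0.62), label_sentiment(-0.40), label_sentiment(0.01))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the labels are to the chosen cut-off.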

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 483

2. LITERATURE SURVEY

The initial part of the paper [1] is database collection. The Quandl and CoinMarketCap databases are used for retrieving Bitcoin values. After acquiring this time-series data, recorded daily for five years at different time instances, they normalized and smoothed it. For this, they implemented different normalization techniques, including log transformation, z-score normalization, and Box-Cox normalization. After this, the data is smoothed over the complete period. After feature selection, the sample inputs are fed to the model. The variation in the Bitcoin values is treated as a pattern, which consists of positive or negative variations compared to the previous day's data. After establishing the learning framework and completing the normalization, they intend to use two methods, Bayesian regression and GLM/Random Forest, and then choose the better method to solve the Bitcoin prediction problem. The accuracy of the different models is compared after the final prediction.

The aim of their work [2] was to measure the accuracy of Bitcoin prediction using different machine learning algorithms and compare the results. They collected the dataset from quandl.com and applied machine learning algorithms, viz. decision tree and regression, for prediction and price forecasting. Test outcomes are compared for the decision tree and regression models. The proposed learning method suggests the best algorithm to choose and adopt for the cryptocurrency prediction problem. The experimental results show that linear regression outperforms the decision tree, with higher accuracy on price prediction.

The goal of their project [3] is to show how a trained machine learning model forecasts the value of a cryptocurrency when provided with a sufficient quantity of data and computational power. They collected the historical data from poloniex.com using a REST API call. The API provides data from 2015 onward at time intervals of 5 minutes and 2 hours. The collected data is then placed into a DataFrame. Convolutional Neural Networks (CNN) are a deep learning methodology used for classification; here, however, the network is adapted for prediction. By setting up a one-dimensional network instead of a 2D or 3D one, they predict the output by feeding in a list of the close prices from their dataset. The neural networks in this project were built using the Keras library. Keras offers a neural network API which can run on TensorFlow or Theano and facilitates seamless prototyping. Like many Python libraries, Keras takes advantage of modularity, providing users with independent configurable modules. Since all the code is written purely in Python, Python developers do not find it hard to debug or run complex modified code. Predicting the future will always be at the top of the list of uses for machine learning algorithms. In this project they attempted to predict the price of Bitcoin using two deep learning methodologies. The web application is built on the Django web framework and has two pages, one for the CNN network and the other for the LSTM (Long Short-Term Memory) network.

Akhilesh P. Patil et al. have proposed in this paper [4] the use of Long Short-Term Memory networks for predicting the future price of cryptocurrency through a time series model. The major cryptocurrencies considered are Bitcoin, Ethereum, and Litecoin. In this paper they compare various opinions on the cryptocurrencies, on the basis of which they derive sentiment scores through natural language processing of the textual data. The features given to the model are these sentiment scores, which are used for future predictions. The output is presented as a time series graph using the Python library Plotly. They use an uncertainty quantification method which consists of calculating the mean absolute error, that is, the average absolute difference between actual and predicted values. This comparison between uncertainty quantification methods is done in the paper to capture current cryptocurrency trends using the opinion mining technique.

Image source [4]

Pavitra Mohanty, Darshan Patel, Parth Patel, and Sudipta Roy have presented in this paper [5] a way of predicting future fluctuations of cryptocurrencies. They use Apache Flume to gather users' comments from social media, and price data is collected from various exchanges. In this paper, they use LSTM (Long Short-Term Memory) for forecasting Bitcoin trends from Twitter data. A sentiment index is derived from the data, leading to positive, negative, or neutral sentiments. They also use information from the blockchain as one of the major considerations affecting Bitcoin market trends. More weight is given to LSTM for prediction of future Bitcoin prices. Due to high volatility in the market, the model does not meet the accuracy requirements. The precision given by their model is 60% and the accuracy is 50%.

Dibakar Raj Pant, Prasanga Neupane, Anuj Poudel, Anup Kumar Pokhrel, and Bishnu Kumar Lama have proposed in this paper [6] an approach for predicting Bitcoin prices based on analysis of tweets gathered from news and Twitter. The collected data is classified into 3 categories: positive, negative, and neutral. Positive and negative tweets, together with historical data, are given as input to the RNN model for price prediction. In this paper sentiment analysis carries a major execution weight in the workflow. The RNN model is used for future price prediction using the historical data. The paper also shows a moderate correlation of 0.41 between the rise of negative opinions about Bitcoin on Twitter and the subsequent fall in its price. The positive and negative sentiment scores are reported with an accuracy of 77.62%.

Image source [6]

In this paper [7], the prices of two cryptocurrencies, Bitcoin and Litecoin, are predicted based on sentiment analysis of tweets. Multiple linear regression is used, which forecasts the price with an R2 score of 44% and 59% respectively. From these scores, they infer that Bitcoin's price is less affected by the sentiment of tweets than Litecoin's price. The fluctuation of Bitcoin's price depends on various factors such as mining cost and economic factors. The price of the cryptocurrencies is predicted for 2 hours, and the dependency of the price on the number of positive tweets in this duration is returned. It is noted that social factors play a major role in deciding the price of a cryptocurrency. Their proposed framework works in two phases, a training phase and a detection phase. The training phase is a one-time activity. For the training phase, they collected Twitter data and the concurrent Bitcoin and Litecoin prices. The number of positive, neutral, and negative tweets present in one chunk is calculated. The counts of positive, neutral, and negative tweets are the features of the dataset, and the mapped average price is the label. The model is validated against the original labels of the given dataset. If the result of validation is acceptable, the model is used for prediction of future prices; if not, a new model is to be designed. In the detection phase, real-time tweets are input to the model, and the model predicts the average price for two hours.

3. SYSTEM ARCHITECTURE

Fig 3 describes the system architecture. The training data for the model covers the past seven years. News is scraped from trusted news sites using the Scrapy framework, and Twitter data is collected using the Twint library, restricted to users with more than fifty thousand followers. All of this data is stored in CSV files as rows and columns. Next we clean and preprocess the data. We use the NLTK library to remove stop words from the data, which helps us obtain a more accurate sentiment score. In the next step, sentiment analysis is performed on the processed data using the VADER sentiment library, which helps in determining trends by giving positive, negative, and neutral scores. Alongside, we take the Bitcoin closing, opening, high, and low prices of each day from 2013 to 2020. We then map the sentiment index of a particular day to the prices of the next day through the corresponding date manipulation. In this way the training data is prepared.

The detection phase then begins. In the detection phase, real-time tweets are input to the model, and the model predicts the price for the interval, in hours, at which the scheduler runs. The scheduler runs the Twitter scraper from crontab on a remote server to fetch real-time tweets; the scraper is built using the Twint library. The collected tweets are passed to the sentiment analysis model, and the mean of all sentiment indices for that time period is fed to the machine learning model, which finally gives the predicted price for that time period.


4. DATA EXTRACTION FOR TRAINING MODEL

4.1 Data Scraping

Fig 3. System Architecture

Fig 5. Data from Twint

Our project is based on sentiment analysis of tweets and news. Tweets are extracted using the Twint Python library, which helps to extract tweets based on required conditions. We searched for and extracted tweets containing the keyword ‘Bitcoin’ from people with more than 50000 followers, which makes the tweets more reliable for training and prediction purposes. We collected tweets of the past 7 years from Bitcoin influencers.
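The selection rule described here (keyword ‘Bitcoin’, more than 50000 followers) can be sketched in plain Python as a filter over already-downloaded tweets. This is only an illustration: the `Tweet` record and its field names are hypothetical and do not reflect Twint's actual output format, and the paper does not say whether the follower filter is applied during scraping or afterwards.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Tweet:
    # Hypothetical record layout, not the Twint schema.
    username: str
    followers: int
    text: str


def select_influencer_tweets(
    tweets: Iterable[Tweet],
    keyword: str = "bitcoin",
    min_followers: int = 50_000,
) -> List[Tweet]:
    """Keep tweets that mention the keyword and come from large accounts."""
    needle = keyword.lower()
    return [
        t for t in tweets
        if t.followers > min_followers and needle in t.text.lower()
    ]


sample = [
    Tweet("alice", 120_000, "Bitcoin breaks a new high"),
    Tweet("bob", 900, "Bitcoin to the moon"),
    Tweet("carol", 75_000, "Nice weather today"),
]
print([t.username for t in select_influencer_tweets(sample)])
```

The comparison is strict (`>`) to match the wording "more than 50000 followers".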

Fig 6. Bitcoin Prices


Fig 4. Real Time Scheduler
The Bitcoin prices for the past 7 years are retrieved from coindesk.com. The price dataset consists of 4 prices for each day, viz. the closing price, opening price, highest price, and lowest price of that particular day. This gives clear insight into the price variations occurring within a day.


4.2 Sentiment Analysis

Fig 7. Sentiment Analysis

Sentiment analysis in our module is done in 2 phases. Initially, as the tweets are scraped, the text of each tweet is passed through a process of stop-word removal. The stop-word removal is done with the NLTK library using ‘nlppreprocess’. After stop-word removal, the text is passed to VaderSentiment for sentiment analysis.
The output for each tweet has four parameters: Positive, Negative, Neutral, and Compound. These parameters are the individual weights depicting the behaviour of the user in the tweet.
Each day has multiple tweets, so we get multiple sentiment values for each day. We therefore calculate the mean sentiment values for each day, as shown in Fig 7.

4.3 Mapping of Tweets and Prices

Fig 8. Mapping

After the sentiment analysis is done, Bitcoin prices have to be mapped to the tweets to create the training dataset. Because variations in Bitcoin's price are attributed to these tweets, price mapping is an essential step. Variations in price trends follow from the previous day's tweets. Hence mapping is done as follows:

Today’s_tweets => price (tomorrow).

Using this mapping strategy helps in predicting tomorrow’s prices.

5. REAL TIME SCHEDULER

In this project we use a real-time scheduler for scraping data from the considered sources in real time, which helps in predicting the immediate price trends. For the real-time scheduler we use CronTab. It is a Linux-based scheduler that executes the code at the interval specified in the scheduler's configuration.

Fig 9. CronTab Scheduler Command

Each time the scheduler runs, the following steps are executed:

I. Data extraction using the Twint library, from the current system time back over the last ‘n’ hours.
II. Pre-processing of the extracted data.
III. Sentiment analysis, followed by computing its mean.
IV. Retrieval of the current Bitcoin price using the CoinDesk API, and mapping of it to the sentiment values.
V. The mean is given as input to the machine learning model, which gives the predicted price of the next day.

6. EXPERIMENTAL ASSESSMENT

In this project we have tried to predict Bitcoin prices using the sentiment values and the corresponding actual Bitcoin prices of each day in the past. The training data consists of past sentiments and their price trends, so it is time-series data. For dealing with time-series data, the LSTM (Long Short-Term Memory) machine learning algorithm is one of the most effective. LSTM can also deal with missing time/date frames and maintain the accuracy of the model. Hence LSTM is preferred over other ML algorithms when it comes to training on time-series data.
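Before training, the daily averaging of Section 4.2 and the mapping Today’s_tweets => price (tomorrow) of Section 4.3 produce the rows the model is trained on. The following minimal sketch shows one way to do this under simplifying assumptions: only the compound score is averaged (the paper also averages the positive, negative, and neutral scores), a single next-day price is used, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def daily_mean_sentiment(scored_tweets):
    """Average compound scores per day from (date, compound) pairs."""
    by_day = defaultdict(list)
    for day, compound in scored_tweets:
        by_day[day].append(compound)
    return {day: mean(scores) for day, scores in by_day.items()}


def map_to_next_day_price(daily_sentiment, prices):
    """Pair each day's mean sentiment with the following day's price.

    Days whose next-day price is missing are skipped.
    """
    rows = []
    for day in sorted(daily_sentiment):
        next_day = day + timedelta(days=1)
        if next_day in prices:
            rows.append((day, daily_sentiment[day], prices[next_day]))
    return rows


# Made-up values, for illustration only.
tweets = [
    (date(2020, 1, 1), 0.5),
    (date(2020, 1, 1), 0.25),
    (date(2020, 1, 2), -0.5),
]
prices = {date(2020, 1, 2): 7000.0}

print(map_to_next_day_price(daily_mean_sentiment(tweets), prices))
```

In this example 2 January has no row, because the sample contains no price for 3 January.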

6.1 Training Model:

The data on which the model is trained consists of 1559 tuples, and each tuple consists of Date, Positive, Negative, Neutral, and Compound sentiments with the Bitcoin price of the next day. Out of the 1559 tuples we used 70% for training purposes and the remainder for testing. The data is normalized to the range of 0 to 1.
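The preparation steps of this section (scaling to the range 0 to 1, a 70% training split, and framing the series as lag-shifted inputs) can be sketched as follows. This is a simplified stand-in under stated assumptions: the paper does not show its scaler or its series_to_supervised implementation, so `min_max_scale`, `chronological_split`, and `to_lagged_pairs` are hypothetical helpers, and the split is assumed to be chronological rather than shuffled.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no range; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def to_lagged_pairs(series, n_lags=1):
    """Frame a series as (previous n_lags values, current value) pairs."""
    return [
        (series[i - n_lags:i], series[i])
        for i in range(n_lags, len(series))
    ]


print(min_max_scale([10, 20, 30]))
print(to_lagged_pairs([1, 2, 3, 4], n_lags=2))

train, test = chronological_split(list(range(1559)))
print(len(train), len(test))
```

With the 1559 rows reported in the paper, this particular rounding gives 1091 training rows and 468 test rows; the authors' exact counts are not stated.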
For training purposes we used the TensorFlow backend. Our model performs multi-feature sentiment analysis, as it takes multiple sentiment values for each tweet and news item as input to the training model. The multi-feature model also provides better accuracy than the single-feature model.

The input sentiment values are then given to the series_to_supervised function, which converts the original values to a set of lag-shifted values that are further reframed and passed on to the LSTM model.

Fig 9. Validation Graph

6.2 Prediction Output

Fig 10. Predicted Prices

Fig 11. Predicted and Actual Prices

7. FUTURE SCOPE

As this project includes a real-time scheduler which extracts data and makes Bitcoin price predictions in real time, it can also be used in the future as a bot for real-time trading. The user would give some amount to the bot for trading, and the model would invest and sell at the right time to earn the best profit.

As this model focuses only on Bitcoin prediction, it can also be extended to predict other well-known cryptocurrencies in the market, viz. Ethereum, Litecoin, etc., so that profits can be earned by investing not only in Bitcoin but also in other cryptocurrencies.

8. CONCLUSION
We have successfully implemented the LSTM model for prediction of Bitcoin’s prices in real time based on sentiment analysis from Twitter and news. The real-time scheduler is live on a server, continuously extracting data and making predictions using the LSTM machine learning model. The predictions are shown in graph form on a webpage.

ACKNOWLEDGEMENT
It gives us great pleasure to present the Survey Paper on “Price Prediction and Analysis of Financial Markets based on News, Social Feed, and Sentiment Index using Machine Learning and Market Data.” We are very grateful to our guide Prof. A. D. Dhawale for giving us all the help and guidance we needed. We are really glad to have had his kind support. His valuable suggestions were very helpful.

REFERENCES

[1] Siddhi Velankar, Sakshi Valecha, Shreya Maji, “Bitcoin Price Prediction using Machine Learning,” International Conference on Advanced Communications Technology (ICACT), February 11-14, 2018.


[2] Karunya Rathan, Somarouthu Venkat Sai, Tubati Sai Manikanta, “CryptoCurrency price prediction using Decision Tree and Regression techniques,” Third International Conference on Trends in Electronics and Informatics (ICOEI 2019).
[3] S. Yogeshwaran, ManinderJeet Kaur, Piyush Maheshwari, “Project-Based Learning: Predicting Bitcoin Prices using Deep Learning,” 2019 IEEE Global Engineering Education Conference (EDUCON).
[4] Akhilesh P. Patil, Akarsh T. S, Parkavi A, “A Study of Opinion Mining and Data Mining Techniques to analyze the Cryptocurrency Market,” 3rd IEEE International Conference on Computational Systems and Information Technology for Sustainable Solutions, 2018.
[5] Pavitra Mohanty, Darshan Patel, Parth Patel, Sudipta Roy, “Predicting Fluctuations in Cryptocurrencies' Price using users' Comments and Real time Prices,” 2018 7th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO).
[6] Dibakar Raj Pant, Prasanga Neupane, Anuj Poudel, Anup Kumar Pokhrel, Bishnu Kumar Lama, “Recurrent Neural Network Based Bitcoin Price Prediction by Twitter Sentiment Analysis.”
[7] Arti Jain, Shashank Tripathi, Harsh Dhar Dwivedi, Pranav Saxena, “Forecasting Price of Cryptocurrencies using Tweets Sentiment Analysis.”
