Naan Muthalvan Project Report Stock Market Forecast 4310
Tech Saksham
Capstone Project Report
NM ID NAME
P.RAJA
Master Trainer
Stock market prediction has long been a critical area of interest for investors, researchers, and
analysts due to the complex, dynamic nature of financial markets. Traditional forecasting methods
often rely on historical data and basic assumptions, which may fail to capture the non-linear and
volatile patterns inherent in stock price movements. With the advancement of machine learning,
new techniques offer the potential to improve the accuracy and reliability of stock price
predictions. This project focuses on applying linear regression, a widely used machine learning
algorithm, to forecast the closing price of a stock based on historical price data and technical
indicators. The central objective is to investigate the use of the 10-day Exponential Moving
Average (EMA) as a predictor for stock price movements, utilizing it in conjunction with other
technical indicators to build a robust predictive model.
The methodology follows a structured approach that includes data collection, preprocessing, and
analysis. Historical stock price data, including open, high, low, close prices, and trading volume,
is collected from publicly available financial sources. The data undergoes a cleaning process to
handle missing values and outliers before moving on to descriptive and exploratory data analysis
(EDA). EDA helps uncover patterns and relationships between the stock prices and key technical
indicators, such as moving averages, that may influence future price movements. In the subsequent
phase, linear regression is employed to build a predictive model, using the 10-day EMA as one of
the key features. The model is trained, and its performance is evaluated using standard error
metrics, including Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), to assess
its accuracy and predictive power.
The results of this project demonstrate the effectiveness of integrating machine learning techniques
with traditional technical analysis. The model offers improved prediction accuracy compared to
conventional methods, providing more reliable forecasts of stock prices. The project also
highlights the scalability of the model, as it can be adapted for use with other stocks and financial
instruments, making it a versatile tool for investors across different markets. Furthermore, the
model generates actionable insights that can be directly applied to trading strategies, allowing
investors to make data-driven decisions rather than relying on intuition. By offering a more
quantitative and rigorous approach to stock market analysis, this project contributes to the growing
field of machine learning applications in finance and underscores the potential of data-driven
methods for enhancing investment strategies.
1 Chapter 1: Introduction
7 Conclusion
8 Future Scope
9 References
10 Links
Stock market forecasting remains a significant challenge due to the unpredictable nature of
financial markets. Stock prices are influenced by a multitude of factors, including company
performance, economic indicators, geopolitical events, and investor sentiment. Traditional
methods of forecasting often rely on historical price data and assumptions about future trends, but
these methods may not capture the dynamic, non-linear patterns inherent in the market. With the
advent of machine learning, new opportunities have emerged to build more accurate and robust
models for stock market predictions.
This project addresses the problem of stock price forecasting by applying machine learning
techniques, specifically linear regression, to predict the closing price of a stock based on historical
data and technical indicators. In particular, we focus on using the 10-day Exponential Moving
Average (EMA) as a key predictor to forecast future stock prices. By integrating machine learning
with traditional technical analysis, the goal is to enhance prediction accuracy and assist investors
in making more informed, data-driven decisions.
The primary goal of this project is to develop a predictive model using machine learning techniques
for forecasting stock prices, with a particular focus on leveraging technical indicators. A central
objective of the project is to apply linear regression to predict future stock prices by using historical
stock data along with key technical indicators, such as the 10-day Exponential Moving Average
(EMA). In addition, the project aims to evaluate the performance of the model by utilizing error
metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) to assess its
accuracy and compare the results with baseline methods. Another key objective is to explore the
relationship between stock prices and various technical indicators in order to identify which
indicators are most useful in predicting stock movements. By doing so, the project seeks to
generate actionable insights for investors, providing recommendations that can help them make
more informed decisions regarding stock trading. Furthermore, the project aims to demonstrate the
potential of data-driven methods for stock market prediction by applying machine learning
techniques such as linear regression to financial data, showcasing their value in improving
forecasting accuracy and enhancing investment strategies.
In this phase, historical stock price data is collected from publicly available sources like Yahoo
Finance, which provides daily data on stock prices, including the open, high, low, close prices, and
trading volume. For the scope of this project, data for a specific stock (e.g., Apple) is gathered
over a time period ranging from 2020 to 2024. The data is obtained in the form of time series and
serves as the foundation for all subsequent analysis.
Data pre-processing is a crucial step in preparing the dataset for analysis. Raw data typically
contains missing values, outliers, or inconsistencies that need to be handled. In this phase, missing
values are either removed or imputed using statistical methods. Outliers are detected using
methods such as z-scores, and the data is cleaned to ensure that it accurately represents the stock’s
performance over time. This phase also involves formatting the data into a structure suitable for
analysis and modeling.
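The cleaning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's own code: the helper name `clean_prices`, the choice of forward-fill imputation, and the z-score threshold of 3 are all hypothetical, and the toy prices are made up.

```python
import numpy as np
import pandas as pd

def clean_prices(df, column="Close", z_threshold=3.0):
    """Impute missing values and flag outliers by z-score (illustrative helper)."""
    out = df.copy()
    # Impute gaps with the last known value; drop rows that are still missing
    out[column] = out[column].ffill()
    out = out.dropna(subset=[column])
    # z-score: distance from the mean in units of standard deviation
    z = (out[column] - out[column].mean()) / out[column].std(ddof=0)
    out["is_outlier"] = z.abs() > z_threshold
    return out

# Toy data (made up for illustration): one gap and one implausible spike
values = [100.0] * 10 + [np.nan] + [101.0] * 10 + [500.0]
prices = pd.DataFrame({"Close": values})
cleaned = clean_prices(prices)
```

Flagging outliers in a separate column, rather than deleting them, leaves the decision about how to treat each flagged day to the analyst.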
Descriptive analysis is conducted to summarize the main features of the dataset. Key statistical
metrics such as mean, median, standard deviation, and range are calculated for the stock prices.
Additionally, the distribution of the stock’s closing price is examined to identify any significant
trends or patterns that could influence future price movements. This phase provides an initial
understanding of the stock’s historical behaviour, which helps in making decisions about which
variables to include in the predictive model.
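As a small illustration of these summary statistics, the sketch below computes them with pandas for the first five closing prices shown in the notebook output of this report. The dictionary name `summary` is an assumption made for the example.

```python
import pandas as pd

# First five closing prices from the notebook output in this report
close = pd.Series([25.59, 26.87, 26.64, 26.18, 25.73], name="Close")

summary = {
    "mean": close.mean(),
    "median": close.median(),
    "std": close.std(),                  # sample standard deviation (ddof=1)
    "range": close.max() - close.min(),
}
```

On a full dataset, `close.describe()` returns most of these values in a single call.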
Exploratory Data Analysis (EDA) is an essential step to visualize and better understand the
relationships between different variables in the dataset. Using graphical techniques such as scatter
plots, histograms, and correlation matrices, this phase aims to uncover patterns and correlations
between the stock price and other variables, such as trading volume and the 10-day Exponential
Moving Average (EMA). EDA also helps in identifying trends or anomalies that may not be
immediately obvious from the raw data alone.
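The 10-day EMA and its correlation with the closing price could be computed along the following lines. This is a sketch only: the price series is a synthetic random walk generated for the example, and the column name `EMA_10` is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (random walk), used only to illustrate the calls
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 250).cumsum(), name="Close")

# 10-day EMA with smoothing factor alpha = 2 / (span + 1) = 2/11.
# adjust=False gives the recursive form:
#   ema[t] = alpha * close[t] + (1 - alpha) * ema[t-1]
ema_10 = close.ewm(span=10, adjust=False).mean()

# Correlation matrix between the price and its EMA
frame = pd.DataFrame({"Close": close, "EMA_10": ema_10})
corr = frame.corr()
```

Because the same-day EMA already contains the same-day close, a feature intended to predict a future close would typically be shifted by one day (for example `ema_10.shift(1)`) so that it uses only information available beforehand.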
Statistical analysis is performed to assess the relationships between various variables in the dataset.
This involves correlation analysis to determine how strongly different technical indicators, such
as the EMA, are related to the stock's closing price. Additionally, linear regression is applied to
quantify these relationships and determine the effectiveness of using EMA as a predictor for future
stock prices. Statistical tests, such as hypothesis testing, are also conducted to validate the
significance of these relationships.
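One possible way to test the significance of such a relationship is a simple regression with `scipy.stats.linregress`, which reports the p-value for the null hypothesis that the slope is zero. The sketch assumes SciPy is installed; the data are synthetic and the 0.05 significance level is a conventional choice, not one stated in this report.

```python
import numpy as np
from scipy import stats

# Synthetic data: a predictor (standing in for the EMA) and a noisy linear response
rng = np.random.default_rng(42)
ema = np.linspace(20.0, 30.0, 100)
close = 2.0 * ema + 1.0 + rng.normal(0.0, 0.5, ema.size)

result = stats.linregress(ema, close)

# Reject the null hypothesis of zero slope at the 5% level (assumed threshold)
is_significant = result.pvalue < 0.05
```

The result object also exposes `result.rvalue`, the Pearson correlation coefficient, so the correlation analysis and the regression can be read from one call.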
Visualization is crucial in presenting the findings of the analysis in an easily interpretable manner.
Various charts and graphs are created to show the trends in stock prices, the relationship between
the stock price and technical indicators, and the performance of the model’s predictions. Common
visualizations include line charts for stock prices, scatter plots for correlations, and bar charts for
performance metrics. These visualizations are used in the report to illustrate key findings and
enhance the clarity of the analysis.
The final phase of the project involves generating actionable insights based on the results of the
analysis and the performance of the predictive model. This includes providing recommendations
for investors based on the stock’s predicted price movements. For example, if the model predicts
an upward trend in the stock’s price, the recommendation might be to buy, whereas a predicted
downward trend might suggest selling. Additionally, suggestions for improving the predictive
model, such as incorporating more technical indicators or using advanced machine learning
algorithms, are also provided.
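The buy and sell rule described above can be expressed as a small function. The function name `recommend`, the additional "hold" outcome, and the 1% tolerance band are assumptions added for illustration; the report itself only describes the buy and sell cases.

```python
def recommend(current_price, predicted_price, tolerance=0.01):
    """Return 'buy', 'sell', or 'hold' from a predicted price (illustrative rule)."""
    # Relative change implied by the prediction
    change = (predicted_price - current_price) / current_price
    if change > tolerance:
        return "buy"    # predicted upward move beyond the tolerance band
    if change < -tolerance:
        return "sell"   # predicted downward move beyond the tolerance band
    return "hold"       # predicted move too small to act on

signals = [recommend(100.0, 105.0), recommend(100.0, 95.0), recommend(100.0, 100.5)]
```

The tolerance band keeps very small predicted moves, which may be smaller than the model's own error, from triggering a trade.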
1.11 Advantages
This project offers several advantages, particularly in the context of stock market prediction. First,
Import Libraries
In [69]:
#import warnings
#warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import quandl
import datetime
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-darkgrid')
plt.rc('figure', figsize=(16,10))
plt.rc('lines', markersize=4)
Configure Quandl
In [36]:
# Set start and end date for stock prices
start_date = datetime.date(2009, 3,8)
end_date = datetime.date.today()
# Load data from Quandl
data = quandl.get('FSE/SAP_X', start_date=start_date, end_date=end_date)
# Save data to CSV file
data.to_csv('data/sap_stock.csv')
In [37]:
data.head()
Date        Open   High   Low    Close  Change  Traded Volume  Turnover     Last Price of the Day  ...
2009-03-09  25.16  25.82  24.48  25.59  NaN     5749357.0      145200289.0  None                   ...
2009-03-10  25.68  26.95  25.68  26.87  NaN     7507770.0      198480965.0  None                   ...
2009-03-11  26.50  26.95  26.26  26.64  NaN     5855095.0      155815439.0  None                   ...
2009-03-12  26.15  26.47  25.82  26.18  NaN     6294955.0      164489409.0  None                   ...
2009-03-13  26.01  26.24  25.65  25.73  NaN     6814568.0      176228331.0  None                   ...
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2623 entries, 2009-03-09 to 2019-06-25
Data columns (total 10 columns):
Open 2242 non-null float64
High 2616 non-null float64
Low 2616 non-null float64
Close 2623 non-null float64
Change 11 non-null float64
Traded Volume 2577 non-null float64
Turnover 2570 non-null float64
Last Price of the Day 0 non-null object
Daily Traded Units 0 non-null object
Daily Turnover 7 non-null float64
dtypes: float64(8), object(2)
memory usage: 225.4+ KB
Out[40]:
Index(['Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume', 'Turnover', 'Last Price
of the Day', 'Daily Traded Units', 'Daily Turnover'], dtype='object')
In [41]: # Create a new DataFrame with only closing price and date
df = pd.DataFrame(data, columns=['Close'])
# Reset index column so that we have integers to represent time for later analysis
df = df.reset_index()
In [42]: df.head()
   Date        Close
0  2009-03-09  25.59
1  2009-03-10  26.87
2  2009-03-11  26.64
3  2009-03-12  26.18
4  2009-03-13  25.73
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2623 entries, 0 to 2622
Data columns (total 2 columns):
Date 2623 non-null datetime64[ns]
Close 2623 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 41.1 KB
Out[44]: False
CHAPTER 4
EDA (EXPLORATORY DATA ANALYSIS)
Exploratory Data Analysis (EDA) is the initial phase of data analysis that helps us understand the data
better by examining patterns, trends, and relationships. In this section, we focus on visualizing the closing
stock price over time.
Using the Pandas DataFrame, we select only the 'Close' price and reset the index to make the data easier
to plot. We then use Matplotlib to create a time series plot of the closing stock price over the period from
March 2009 to the present.
# Show plot
plt.show()
Linear Regression
Our data contains only one independent variable (X), which represents the
date, and the dependent variable (Y) we are trying to predict is the stock
price. To fit a line to the data points, which then represents an estimated
relationship between X and Y, we can use simple linear regression:
Y = β0 + β1X
where β0 is the intercept (the value of Y at X = 0) and β1 is the slope
(the change in Y for a one-unit increase in X).
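In [47]: # train_test_split is not imported in the library cell above
from sklearn.model_selection import train_test_split
# Split data into train and test set: 80% / 20%
train, test = train_test_split(df, test_size=0.20)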
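The fit of Y = β0 + β1X can be sketched with the closed-form least-squares solution in NumPy. This is an illustration rather than the notebook's own fitting code, which is not shown in the text; the helper name `fit_line` and the toy data are assumptions.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Toy data (made up): integer dates 0..9 and prices lying exactly on a line
dates = np.arange(10)
prices = 25.0 + 0.03 * dates
b0, b1 = fit_line(dates, prices)
predicted = b0 + b1 * 10  # extrapolate one step beyond the data
```

For time-ordered data, a chronological split (for example `shuffle=False` in `train_test_split`) may be worth considering, so that the model is evaluated on dates later than those it was trained on.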
CHAPTER 6
TRAINING MODEL AND ACCURACY
Model Evaluation
In [51]:
Slope: 0.028327707465384017
Intercept: 25.156203076940393
The slope coefficient tells us that with a 1 unit increase in the integer
date the closing price increases by about 0.0283 $.
The intercept coefficient is the value of the fitted line at date zero, that
is, the model's estimate of the closing price at the start of the
measurement period (about 25.16 $).
Regression Evaluation
Let's have a look at how the predicted values compare with the actual values
on a random sample from our data set.
Out[55]: (2623, 2)
31 2009-04-21 29.23
# Set x label
plt.xlabel('Date', fontsize=14)
# Set y label
plt.ylabel('Stock Price in $', fontsize=14)
# Show plot
plt.show()
plt.xlabel('Integer Date')
plt.ylabel('Stock Price in $')
plt.show()
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.show()
Residual Histogram
The residuals are nearly normally distributed around zero, with a slight
skew to the right.
# And plot on the same axes that seaborn put the histogram
ax.plot(x, p, 'r', lw=2, label='Normal Distribution')
In [63]: df.head()
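$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$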
The MAE is 3% (of minimum) and 6% (of maximum) of the Closing Price.
The other two errors are larger because the errors are squared, so large
deviations have a greater influence on the result.
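The three error metrics can be computed directly from their definitions, as in the sketch below. The function name `regression_errors` and the toy values are assumptions for illustration.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))   # average absolute error
    mse = np.mean(residuals ** 2)      # average squared error
    rmse = np.sqrt(mse)                # back in the units of the price
    return mae, mse, rmse

# Toy values (made up): residuals are -1, 0 and 3
mae, mse, rmse = regression_errors([10.0, 12.0, 14.0], [11.0, 12.0, 11.0])
```

In this toy case the single residual of 3 dominates the squared metrics, which is why the RMSE (about 1.83) exceeds the MAE (about 1.33).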
Coefficient of determination
$$R^2 = 1 - \frac{RSS}{TSS}$$
with
$$RSS = \sum_{i=1}^{N} \epsilon_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
$$TSS = \sum_{i=1}^{N} (y_i - \bar{y})^2$$
R2: 0.931518054893835
Out[68]: 0.9315252948228882
The value of R2 shows that our model accounts for about 93% of the variance
in the actual closing prices.
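The coefficient of determination follows from the residual and total sums of squares, R2 = 1 - RSS/TSS. The sketch below is illustrative; the function name `r_squared` and the toy values are assumptions.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - rss / tss

# Toy values (made up): RSS = 0.10 and TSS = 5.0, so R2 = 0.98
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A value of 1 means the predictions match the data exactly, while a value of 0 means the model does no better than always predicting the mean.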
1. Incorporating Additional Features: One way to improve the model’s predictive power is by
including more features that could influence stock prices. These could include technical indicators
like moving averages, Relative Strength Index (RSI), and Bollinger Bands, which are widely used
in financial market analysis. Additionally, incorporating macroeconomic variables such as interest
rates, GDP growth, inflation, and geopolitical events could help the model account for broader
market movements and investor sentiment. By expanding the feature set, the model could capture
a more comprehensive view of the factors affecting stock price dynamics.
2. Advanced Modeling Techniques: The current linear regression model is quite simple and may not
fully capture the non-linear patterns in stock price data. More advanced models, such as Random
Forests, Gradient Boosting Machines (GBM), and Neural Networks, could be explored for
better predictive accuracy. These techniques can handle complex relationships and interactions
between variables, and have been shown to perform well in financial forecasting. For example,
Deep Learning models such as Recurrent Neural Networks (RNNs) or Long Short-Term
Memory (LSTM) networks could be particularly useful for time series data like stock prices, as
they are designed to capture temporal dependencies and patterns.
3. Sentiment Analysis: Another area for improvement is the integration of text data for sentiment
analysis. Stock prices are often influenced by public perception, media coverage, and social
sentiment. By scraping financial news articles, social media platforms (e.g., Twitter), or investor
sentiment indices, sentiment analysis could provide valuable insights into the potential movements
of stock prices. Techniques like Natural Language Processing (NLP) can be used to process
textual data and integrate it with numerical datasets to create more dynamic and responsive models.
4. Model Evaluation and Tuning: In the future, the model could be subjected to more rigorous cross-
validation techniques to assess its generalizability and avoid overfitting. Hyperparameter tuning
using techniques like Grid Search or Random Search could also help in fine-tuning the model for
better accuracy. Additionally, experimenting with different types of regression models, such as
Ridge Regression or Lasso Regression, could help prevent overfitting and provide more stable
predictions in case of multicollinearity among the features.
5. Real-time Prediction Systems: The ultimate goal of stock price prediction is to create a real-time
forecasting system that can provide up-to-the-minute predictions based on the latest data. Future
work could focus on building a real-time prediction engine that ingests live stock market data,
updates the model, and generates predictions on the fly. Such systems would require constant
monitoring of incoming data and would involve implementing data pipelines and streaming
services for continuous learning and forecasting.
6. Evaluation of Financial Market Anomalies: Finally, future research could explore the evaluation
of financial anomalies such as market bubbles, crashes, and volatility. Understanding these
patterns and incorporating them into the model could significantly enhance its performance,
especially during periods of high volatility. Anomalous behavior often cannot be captured by
traditional models, but by considering additional data sources (e.g., financial news, investor
REFERENCES
• Machine Learning with Scikit-Learn
o Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron -
Practical book covering machine learning algorithms, techniques, and applications.
o A Guide to Time Series Analysis in Python - Article introducing time series analysis methods for
forecasting.
o Text Mining and Sentiment Analysis with NLP in Python - Article covering sentiment analysis
techniques using NLP libraries in Python.
• Data Visualization
o Effective Data Visualization with Matplotlib and Seaborn - Guide on creating insightful data
visualizations in Python using popular visualization libraries.
o Real-Time Data Streaming and Analysis for E-commerce - Overview of real-time analytics and
streaming for business applications.
o Feature Engineering for Machine Learning - Guide on engineering features to improve model
accuracy and relevance.
o Stock Price Prediction Using Machine Learning Algorithms: A Survey by Chaudhary, A., &
Kumar, S. - A survey discussing different machine learning techniques for stock price prediction.
o Machine Learning Model Lifecycle Management with MLflow - Article on managing the entire
ML model lifecycle, from development to deployment.
LINKS
1. Scikit-Learn Documentation
https://scikit-learn.org/stable/documentation.html
2. Prophet for Time Series Forecasting
https://facebook.github.io/prophet/
3. Pandas Documentation
https://pandas.pydata.org/docs/
4. Matplotlib Documentation
https://matplotlib.org/stable/contents.html
5. Seaborn Documentation
https://seaborn.pydata.org/
6. NumPy Documentation
https://numpy.org/doc/
7. Linear Regression in Scikit-Learn
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
8. Machine Learning Model Lifecycle Management with MLflow
https://mlflow.org/
9. Google Colab
https://colab.research.google.com/
10. Jupyter Notebooks Documentation
https://jupyter.org/
© Edunet Foundation. All rights reserved | 29