Hedge Fund Secret: How Detecting Market Irregularities: Anomaly
Introduction
Anomaly detection in time-series data is a critical task in many domains, including finance, healthcare, and manufacturing. In the
financial sector, detecting anomalies can help identify unusual market behavior, such as sudden price spikes or drops, which may
indicate potential opportunities or risks. This tutorial focuses on using Python to detect anomalies in financial time-series data,
specifically stock prices. We will use the yfinance library to download real financial data and build an anomaly detection system
around the Isolation Forest algorithm from scikit-learn. By the end of this tutorial, you will know how to flag unusual points in a
price series and how to apply the same steps to real-world financial data.
import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mplfinance as mpf
from sklearn.ensemble import IsolationForest
import plotly.graph_objects as go
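# NOTE: tickers, start_date and end_date are used below but never defined in the
# original text. The values here are illustrative placeholders; replace them.
tickers = ['AAPL', 'MSFT', 'GOOGL']
start_date = '2020-01-01'
end_date = '2023-01-01'

# Download the price history for each ticker into a dictionary of DataFrames
data = {}
for ticker in tickers:
    ticker_data = yf.Ticker(ticker)
    df = ticker_data.history(start=start_date, end=end_date)
    df['Ticker'] = ticker
    data[ticker] = df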
Data Preprocessing
Before we dive into anomaly detection, let’s preprocess the data. We will focus on the closing prices and resample the data to a weekly
frequency to smooth out daily fluctuations.
def preprocess_data(data):
    processed_data = {}
    for ticker, df in data.items():
        df = df['Close'].resample('W').mean()
        processed_data[ticker] = df
    return processed_data

processed_data = preprocess_data(data)
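To see what the weekly resampling does, here is a minimal sketch on a small synthetic series. The dates and prices are made up for illustration and are not market data. It shows that `resample('W')` groups rows into weeks ending on Sunday and that `.mean()` averages the closes within each week.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes over ten business days (illustrative values only)
dates = pd.date_range("2023-01-02", periods=10, freq="B")  # starts on a Monday
daily_close = pd.Series(np.arange(100.0, 110.0), index=dates)

# 'W' bins the rows into weeks ending on Sunday; each bin is labeled with that Sunday
weekly_close = daily_close.resample("W").mean()
print(weekly_close)
```

Each weekly value is the average of that week's closes, so a single extreme day is dampened. A week with no trading days at all would come out as NaN, which is worth checking for before fitting a model.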
Visualizing the Data
Let’s visualize the closing prices of the selected stocks to get an initial understanding of the data.
def plot_data(processed_data):
    plt.figure(figsize=(14, 7))
    # Draw one line per ticker on a shared axis
    for ticker, df in processed_data.items():
        plt.plot(df, label=ticker)
    plt.title('Weekly Closing Prices')
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.legend()
    plt.show()

plot_data(processed_data)
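class AnomalyDetector:
    def __init__(self, contamination=0.05):
        self.contamination = contamination
        self.model = IsolationForest(contamination=self.contamination)
        self.data = None

    def fit(self, series):
        # fit() is called below but missing from the original listing;
        # this is a minimal reconstruction that stores the data and fits the model.
        self.data = series.dropna().to_frame(name='Close')
        self.model.fit(self.data[['Close']])
        return self

    def detect_anomalies(self):
        # predict() returns -1 for outliers and 1 for inliers; map to 1/0 flags
        labels = self.model.predict(self.data[['Close']])
        self.data['anomaly'] = (labels == -1).astype(int)
        return self.data

detectors = {}
for ticker, df in processed_data.items():
    detector = AnomalyDetector()
    detector.fit(df)
    detectors[ticker] = detector.detect_anomalies()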
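To get a feel for how Isolation Forest behaves before trusting it on real prices, it can help to run it on data where we know the answer. The sketch below is an illustration under stated assumptions: the price series is synthetic, the two extreme values are injected by hand, and `random_state=0` is added only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "weekly prices": noise around 100 with two hand-injected extremes
prices = pd.Series(100 + rng.normal(0, 1, 200))
prices.iloc[50] = 130.0
prices.iloc[150] = 70.0

X = prices.to_frame(name="Close")
model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(X)        # -1 = outlier, 1 = inlier
scores = model.decision_function(X)  # lower score = more anomalous

result = X.assign(anomaly=(labels == -1).astype(int), score=scores)
# Show the most anomalous rows first
print(result[result["anomaly"] == 1].sort_values("score").head())
```

Note that `contamination=0.05` asks the model to flag roughly 5% of the points, whether or not that many are truly unusual. Here only two values were injected, so most of the remaining flags are ordinary points near the edge of the noise. Sorting by `score` is one way to separate the strongest signals from the rest.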
Visualizing Anomalies
Let’s visualize the detected anomalies on the stock price charts.
def plot_anomalies(detectors):
    for ticker, df in detectors.items():
        plt.figure(figsize=(14, 7))
        # The first column holds the weekly closing price
        plt.plot(df.index, df.iloc[:, 0], label='Price')
        anomalies = df[df['anomaly'] == 1]
        plt.scatter(anomalies.index, anomalies.iloc[:, 0], color='red', label='Anomaly')
        plt.title(f'Anomalies in {ticker} Stock Prices')
        plt.xlabel('Date')
        plt.ylabel('Price')
        plt.legend()
        plt.show()

plot_anomalies(detectors)
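Charts are useful for a first look, but a table of flagged dates is easier to review or export. The sketch below uses a hypothetical helper, `summarize_anomalies`, which is not part of the original tutorial. It assumes `detectors` maps each ticker to a DataFrame whose first column is the price and which has an `anomaly` column of 0/1 flags. The example input is hand-made rather than real detector output.

```python
import pandas as pd

def summarize_anomalies(detectors):
    """Return one row per flagged week: ticker, date, and price."""
    rows = []
    for ticker, df in detectors.items():
        flagged = df[df["anomaly"] == 1]
        # The first column is assumed to hold the price
        for date, price in flagged.iloc[:, 0].items():
            rows.append({"Ticker": ticker, "Date": date, "Price": price})
    return pd.DataFrame(rows, columns=["Ticker", "Date", "Price"])

# Tiny hand-made example standing in for real detector output
idx = pd.date_range("2023-01-08", periods=4, freq="W")
example = {
    "AAA": pd.DataFrame({"Close": [10.0, 10.5, 25.0, 10.2],
                         "anomaly": [0, 0, 1, 0]}, index=idx),
    "BBB": pd.DataFrame({"Close": [50.0, 49.5, 50.5, 51.0],
                         "anomaly": [0, 0, 0, 0]}, index=idx),
}
summary = summarize_anomalies(example)
print(summary)
```

With the real results, calling `summarize_anomalies(detectors)` would give one table covering all tickers, which can then be sorted by date to see whether several stocks were flagged in the same week.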
Conclusion
In this tutorial, we explored the process of detecting anomalies in time-series data using Python. We used the yfinance library to
download real financial data and implemented an anomaly detection system using the Isolation Forest algorithm. By visualizing the
detected anomalies, we gained insights into unusual market behavior that could indicate potential opportunities or risks. Anomaly
detection is a powerful tool in the financial sector, helping investors and analysts make informed decisions. By leveraging Python and
machine learning algorithms, we can build robust systems to detect and analyze anomalies in financial data. We hope this tutorial has
provided you with a solid foundation for applying anomaly detection techniques to your own financial data analysis projects.