Stock Volatility Prediction Using OpenAI and Python | by Pranjal Saxen | March 2024

This journey is more than just a technical endeavor; it is a bridge connecting traditional financial wisdom to the frontier of technological innovation. As we dive into the realms of the EODHD API and OpenAI, we unlock new possibilities for financial analysis and pave the way for a future in which market strategies are driven by data-driven insights.

Market volatility embodies the rate at which stock prices rise or fall for a given set of returns and serves as a measure of the uncertainty or risk associated with a particular security or market. For traders and investors, understanding volatility is crucial as it affects decision making, risk assessment and return potential. Volatility is often calculated using the standard deviation or variance, two statistical measures that represent the dispersion of returns for a given security or market index.

Standard deviation and volatility

At its core, the standard deviation provides insight into the average distance of returns from their mean. A higher standard deviation indicates more volatility, meaning that the stock price can change significantly in a short period of time. Mathematically, it is expressed as follows:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where σ is the standard deviation, N is the number of observations, x_i is each individual observation and μ is the average of all observations.


Variance, on the other hand, squares the deviations before averaging them, which makes it the square of the standard deviation. It is calculated as:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

While both metrics offer insight into volatility, standard deviation is more commonly used in the financial industry because it is expressed in the same units as the returns themselves, which makes it directly comparable to the average rate of return.
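To make the two definitions concrete, here is a minimal sketch that computes them by hand and checks the result against NumPy. The return values are hypothetical, and the population form (dividing by N) is an assumption; pandas' `Series.std()` defaults to the sample form (dividing by N - 1), which gives a slightly larger number.

```python
import numpy as np

# Hypothetical daily returns, for illustration only
returns = np.array([0.01, -0.02, 0.015, 0.003, -0.007])

mu = returns.mean()                       # mean of all observations
variance = ((returns - mu) ** 2).mean()   # average squared deviation
sigma = np.sqrt(variance)                 # standard deviation

# NumPy's defaults (ddof=0) use the same population formulas
assert np.isclose(variance, np.var(returns))
assert np.isclose(sigma, np.std(returns))

# Sample form (ddof=1), which is what pandas uses by default
sample_sigma = np.std(returns, ddof=1)
print(f"population sigma={sigma:.5f}, sample sigma={sample_sigma:.5f}")
```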

The meaning of volatility

Volatility shapes the strategies of traders and investors. High volatility suggests potentially higher risk and return, which is attractive to risk-takers. Conversely, low volatility indicates stability, which is preferred by risk-averse individuals. Moreover, volatility is a cornerstone of portfolio management, option pricing, and risk assessment, so its prediction is critical to financial planning and decision-making.

In the following sections, we will dive into how we can use the EODHD API to obtain relevant data for volatility analysis and set the stage for building a sophisticated prediction model using OpenAI and deep learning techniques. This integration of technology and finance aims not only to improve the accuracy of volatility forecasts, but also to open up new avenues for data-driven investment strategies.

The EODHD API is the cornerstone for accessing comprehensive financial market data that is essential for volatility prediction. It offers historical stock prices, trading volumes and other market indicators necessary for in-depth analysis. A practical approach to harnessing this wealth of information involves using Python, the preferred tool among data scientists for analyzing financial data.

Loading relevant data

First, we load the necessary data using Python and the EODHD API. This process involves querying historical stock prices and volumes, as these metrics are key to calculating volatility. You can read more about the Historical Data API request and its parameters in the EODHD documentation. The Python code snippet below shows how to retrieve this data:

import requests
import pandas as pd
import numpy as np
from datetime import datetime

# API configuration
api_key = 'YOUR_EODHD_API_KEY'
symbol = 'AMD' # Example stock symbol
start_date = '2023-01-01'
end_date = '2024-02-29'
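# NOTE: the endpoint's scheme, host and path are missing from this line in the source;
# prepend the EODHD end-of-day endpoint from the official API documentation.
url = f'{symbol}?from={start_date}&to={end_date}&api_token={api_key}&fmt=json'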

# Fetching data
response = requests.get(url)
response.raise_for_status()  # Stop early on an HTTP error instead of parsing an error body
data = response.json()

# Parsing and organizing data into a DataFrame
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Calculating daily returns
df['daily_returns'] = df['close'].pct_change()

# Calculating historical volatility (standard deviation of daily returns)
window = 30  # 30-day historical volatility
df['hist_volatility_30d'] = df['daily_returns'].rolling(window=window).std() * np.sqrt(252)  # Annualizing

# Displaying the first and last few rows of the DataFrame to verify
print(df.head())
print(df.tail())

Data Head
Data Tail

Data quality assurance

Ensuring data quality is a crucial step before proceeding with model building, especially when analyzing financial data where accuracy is paramount. This process includes performing sanity checks to verify the completeness, accuracy and consistency of the data. Here are some Python code snippets that perform basic data quality checks on a DataFrame df we previously prepared:

1. Check for missing values

# Checking for missing values in the DataFrame
missing_values = df.isnull().sum()
print("Missing values in each column:\n", missing_values)

Encountering NaN values in the hist_volatility_30d column is expected for the first 30 rows of your data set: the first daily return is itself undefined, and the 30-day window then needs 30 valid returns before it can produce a value.
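A quick way to confirm how many warm-up rows are missing is to count them on a small series. The sketch below uses a synthetic, seeded random walk in place of real prices (an assumption made so the example runs without an API key) and repeats the same rolling calculation.

```python
import numpy as np
import pandas as pd

# Synthetic price series (a seeded random walk), not real market data
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=120)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates)))), index=dates)

daily_returns = close.pct_change()  # the first value is NaN
window = 30
hist_vol = daily_returns.rolling(window=window).std() * np.sqrt(252)

leading_nans = int(hist_vol.isna().sum())
print("NaN rows in the rolling volatility:", leading_nans)
print("First date with a value:", hist_vol.first_valid_index().date())
```

On this series the count equals the window length (30), one more than the window alone would suggest, because the first daily return is itself NaN.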

Maintaining the temporal integrity of your data is essential for volatility prediction models, especially those that use deep learning techniques. Forward filling (ffill) is often preferred because it reflects the assumption that the last observed volatility is the best estimate for the immediate future until new data becomes available. The missing values here sit at the very start of the series, where there is nothing earlier to carry forward, so the snippet below applies a backward fill first and then a forward fill for any remaining gaps.

Here’s how you can fill the hist_volatility_30d and daily_returns columns:

# .bfill()/.ffill() replace fillna(method=...), which is deprecated in recent pandas
df['hist_volatility_30d'] = df['hist_volatility_30d'].bfill().ffill()
df['daily_returns'] = df['daily_returns'].bfill().ffill()
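One caveat with back-filling is that it copies later values into earlier rows, which can leak future information into the start of the series. If that matters for your model, a simple alternative is to drop the warm-up rows instead. The sketch below is an assumption-laden illustration on synthetic data (the `df_demo` frame and its values are hypothetical), not the method used in the rest of this article.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the article's DataFrame (random walk, not real prices)
rng = np.random.default_rng(1)
dates = pd.bdate_range("2023-01-02", periods=90)
df_demo = pd.DataFrame(
    {"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates))))},
    index=dates,
)
df_demo["daily_returns"] = df_demo["close"].pct_change()
df_demo["hist_volatility_30d"] = (
    df_demo["daily_returns"].rolling(window=30).std() * np.sqrt(252)
)

# Drop the warm-up rows rather than back-filling them with later values
df_clean = df_demo.dropna(subset=["daily_returns", "hist_volatility_30d"])

print(len(df_demo), "rows before,", len(df_clean), "rows after")
```

The trade-off is a slightly shorter history in exchange for rows that contain only values that were actually observable on their date.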

2. Identification of outliers

Outliers can significantly skew your data and consequently the performance of your model. Identifying outliers typically involves tracking statistical metrics or data visualization:

# Simple statistical method to detect outliers using Z-score
from scipy import stats
import numpy as np

# Assuming 'close' is a column you want to check for outliers
z_scores = stats.zscore(df['close'])
abs_z_scores = np.abs(z_scores)
outliers = (abs_z_scores > 3).sum()  # Adjust threshold as necessary
print("Number of outliers detected:", outliers)

# Visual method with boxplot for 'close' prices
import matplotlib.pyplot as plt

plt.boxplot(df['close'], vert=False)
plt.title("Boxplot for detecting outliers in 'Close' Prices")
plt.show()

Output: Number of outliers detected: 0
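A Z-score on price levels can miss sudden one-day moves in a trending stock, because the trend itself widens the spread of prices. As a complementary check, the sketch below applies the interquartile-range (IQR) rule to daily returns. The data is synthetic with one injected jump, and the 1.5 × IQR threshold is the conventional default rather than a recommendation from the source.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns with one injected extreme move (illustrative only)
rng = np.random.default_rng(2)
returns = pd.Series(rng.normal(0, 0.01, 250))
returns.iloc[100] = 0.25  # hypothetical 25% one-day jump

q1, q3 = returns.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag every return that falls outside the IQR fences
outlier_mask = (returns < lower) | (returns > upper)
print("Number of IQR outliers:", int(outlier_mask.sum()))
print("Injected point flagged:", bool(outlier_mask.iloc[100]))
```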

3. Data consistency check

Data consistency can include ensuring that the data conforms to known standards or patterns, which is particularly important for time series data where chronological order matters:

# Ensuring chronological order
if not df.index.is_monotonic_increasing:
    df = df.sort_index()  # Sort by date when rows arrive out of order

# Check for duplicate dates
duplicate_dates = df.index.duplicated().sum()
print("Number of duplicate dates:", duplicate_dates)

Number of duplicate dates: 0
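If the check does report duplicates, one possible clean-up is to keep the first record for each date and then sort. The sketch below shows this on a tiny hand-made frame; the values and the choice of `keep="first"` are assumptions, and which duplicate to keep depends on your data source.

```python
import pandas as pd

# Tiny hand-made frame with one duplicated date and rows out of order
raw = pd.DataFrame(
    {"close": [102.0, 100.0, 101.0, 101.5]},
    index=pd.to_datetime(["2024-01-04", "2024-01-02", "2024-01-03", "2024-01-03"]),
)

# Keep the first record seen for each date, then restore chronological order
deduped = raw[~raw.index.duplicated(keep="first")].sort_index()

print(deduped)
```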

Next, we will prepare our prompt based on the data we have collected from EODHD, a live stock market data provider.

To make the financial market data compatible with OpenAI’s large language models (LLMs), we focus on creating a structured narrative that highlights the key information relevant to market volatility.

Conversion of numerical data to textual descriptions

Transform each row of your DataFrame into a structured narrative. This narrative should succinctly describe the stock’s performance, including date, open, high, low, close, volume and historical volatility.

def create_narrative_for_amd(row):
    # The date is the DataFrame index, so it is available as row.name
    date_str = row.name.strftime('%B %d, %Y')
    narrative = (f"On {date_str}, AMD opened at ${row['open']:.2f}, reached a high of ${row['high']:.2f}, "
                 f"a low of ${row['low']:.2f}, and closed at ${row['close']:.2f}. "
                 # ,.0f avoids a trailing ".0" when the row has been upcast to float
                 f"The trading volume was {row['volume']:,.0f}, with a daily return of {row['daily_returns']*100:.2f}% "
                 f"and a 30-day historical volatility of {row['hist_volatility_30d']*100:.2f}%.")
    return narrative

df['narrative'] = df.apply(create_narrative_for_amd, axis=1)
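To see what a single narrative looks like before running it over the whole DataFrame, here is a minimal, self-contained sketch. The function mirrors the one above under a hypothetical name (`narrative_for_row`), and every number in the sample row is made up purely to show the output format.

```python
import pandas as pd

def narrative_for_row(row):
    """Turn one row (indexed by date) into a one-sentence description."""
    date_str = row.name.strftime("%B %d, %Y")
    return (
        f"On {date_str}, AMD opened at ${row['open']:.2f}, "
        f"reached a high of ${row['high']:.2f}, a low of ${row['low']:.2f}, "
        f"and closed at ${row['close']:.2f}. "
        f"The trading volume was {row['volume']:,.0f}, "
        f"with a daily return of {row['daily_returns'] * 100:.2f}% "
        f"and a 30-day historical volatility of {row['hist_volatility_30d'] * 100:.2f}%."
    )

# Hypothetical values, chosen only to show the output format
sample = pd.DataFrame(
    {"open": [100.0], "high": [105.5], "low": [99.25], "close": [104.0],
     "volume": [1234567], "daily_returns": [0.04], "hist_volatility_30d": [0.45]},
    index=pd.to_datetime(["2024-01-02"]),
)

narrative = sample.apply(narrative_for_row, axis=1).iloc[0]
print(narrative)
```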

Summary stories for analysis

To stay within the LLM’s token limit and focus on the latest trends, we condense the data. Aiming to use a year’s worth of data (or more), we’ll create summaries for each month, capturing key trends and metrics, rather than detailed daily narratives. This method helps manage the LLM token limit while providing a comprehensive overview.

# Aggregate daily rows into one row per month
# ('M' = month end; pandas 2.2+ prefers the alias 'ME')
monthly_summary = df.resample('M').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
    'daily_returns': 'mean',
    'hist_volatility_30d': 'last'
})

# Generate monthly narratives (reusing the function defined above)
monthly_narratives = monthly_summary.apply(create_narrative_for_amd, axis=1).tolist()
yearly_summary = " ".join(monthly_narratives)
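To illustrate what the monthly aggregation does to each column, the sketch below runs it on two months of synthetic daily bars. It groups by calendar month with `to_period("M")` instead of `resample`, an assumption made so the example does not depend on which month-end alias your pandas version accepts; the aggregation rules are the same ones used above.

```python
import numpy as np
import pandas as pd

# Synthetic daily bars for roughly two months (not real prices)
dates = pd.bdate_range("2024-01-01", "2024-02-29")
n = len(dates)
daily = pd.DataFrame(
    {
        "open": np.linspace(100, 120, n),
        "high": np.linspace(101, 121, n),
        "low": np.linspace(99, 119, n),
        "close": np.linspace(100.5, 120.5, n),
        "volume": np.full(n, 1_000),
    },
    index=dates,
)

# Group by calendar month and apply one aggregation rule per column
monthly = daily.groupby(daily.index.to_period("M")).agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)

print(f"{n} daily rows -> {len(monthly)} monthly rows")
print(monthly)
```

Each month keeps its first open, highest high, lowest low, last close and total volume, so a year of daily rows shrinks to about a dozen lines of text for the prompt.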

Create a detailed prompt for a volatility forecast

Create a prompt that presents the model with the yearly summary and guides the LLM to use this data to forecast volatility.

final_prompt = (
    f"Given the summarized performance of AMD stock over the past year:\n\n{yearly_summary}\n\n"
    "Based on these trends and metrics, provide a month-by-month prediction for AMD's market volatility for the next 3 months. Format your predictions as follows:\n\n"
    "- Month YYYY: Predicted volatility XX.XX%\n"
    "- (Continue for each of the next 3 months)"
)

First, you will need to send the prepared prompt to the OpenAI API. This step assumes that you have an OpenAI API key and have selected the appropriate model for the task. While the specific API interactions may vary, here is a general example of how to use Python to query the API:

from openai import OpenAI

# With no arguments, the client reads the key from the OPENAI_API_KEY environment variable
client = OpenAI()

completion = client.chat.completions.create(
    # Model name as it appears (commented out) in the source; substitute the chat model you use
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a highly knowledgeable financial analyst providing insights on stock market volatility."},
        {"role": "user", "content": final_prompt}
    ],
    temperature=0.3,  # More deterministic
    max_tokens=250,  # Sufficient length for detailed monthly predictions
    top_p=1.0,  # High diversity
    frequency_penalty=0.0,  # Neutral towards repetition
    presence_penalty=0.0,  # Neutral towards new content
    # stop=["\n\n"]  # Stops at double newline, helping to ensure concise output
)

response = completion.choices[0].message.content.strip()

Let’s make it suitable for rendering:

import re

# Regular expression to extract predictions
predictions = re.findall(r'- (\w+ \d{4}): Predicted volatility ([\d.]+)%', response)

# Convert to DataFrame for plotting
df_predictions = pd.DataFrame(predictions, columns=['Month', 'Predicted Volatility'])
df_predictions['Predicted Volatility'] = df_predictions['Predicted Volatility'].astype(float)

# The model answers in percent, while hist_volatility_30d is stored as a fraction,
# so divide by 100 to put both series on the same scale
predicted_volatility = {
    'date': pd.date_range(start='2024-03-31', periods=len(df_predictions), freq='M'),  # 'ME' on pandas 2.2+
    'hist_volatility_30d': list(df_predictions['Predicted Volatility'] / 100)
}

df_predicted = pd.DataFrame(predicted_volatility)

df_combined = pd.concat([monthly_summary.reset_index(), df_predicted]).set_index('date')

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

plt.figure(figsize=(14, 7))

# Historical Volatility
plt.plot(df_combined.index[:len(monthly_summary)], df_combined['hist_volatility_30d'][:len(monthly_summary)], label='Historical Volatility', marker='o', linestyle='-', color='blue')

# Predicted Volatility
plt.plot(df_combined.index[len(monthly_summary):], df_combined['hist_volatility_30d'][len(monthly_summary):], label='Predicted Volatility', marker='o', linestyle='--', color='red')

# Formatting the plot
plt.title('Monthly Volatility Forecast for AMD Stock', fontsize=16)
plt.xlabel('Date', fontsize=14)
plt.ylabel('30-Day Historical Volatility', fontsize=14)
plt.xticks(df_combined.index, rotation=45, ha="right")
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.legend()
plt.tight_layout()

# Show plot
plt.show()

Predicted volatility using OpenAI
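The parsing step depends on the model following the requested format exactly, so it is worth testing the regular expression on a known string before trusting it on a live reply. In the sketch below the reply text and its numbers are hypothetical, and the length check is an added safeguard rather than part of the original workflow.

```python
import re

# Hypothetical model reply in the format requested by the prompt
sample_response = (
    "- March 2024: Predicted volatility 45.50%\n"
    "- April 2024: Predicted volatility 43.20%\n"
    "- May 2024: Predicted volatility 41.00%"
)

pattern = r"- (\w+ \d{4}): Predicted volatility ([\d.]+)%"
parsed = [(month, float(value)) for month, value in re.findall(pattern, sample_response)]

# Fail loudly if the reply did not contain the three expected lines
if len(parsed) != 3:
    raise ValueError(f"Expected 3 predictions, found {len(parsed)}")

print(parsed)
```

If the model adds extra prose or changes the wording, the list comes back shorter than expected and the check raises an error instead of silently plotting incomplete data.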

In this article, we delved into the complex realm of market volatility, a crucial metric for financial analysts and investors. Using comprehensive stock market data provided by the EODHD API, we extracted the information that forms the backbone of volatility prediction. We then took a bold step forward by integrating the predictive prowess of advanced OpenAI models and presented a new approach to predicting market fluctuations.

We started with a solid understanding of market volatility and its implications. We then learned where and how to get historical stock market data (thanks to the EODHD API). The data we retrieved was then refined to set the stage for accurate modeling.

Our innovation shone in the application of OpenAI language models, where we transformed structured financial data into a narrative format. This narrative was not only human-friendly, but also ready for AI consumption, allowing us to leverage the model’s deep learning capabilities to generate forward-looking insights.

The highlight of our exploration was the visualization of volatility forecasts. Historical data and AI-generated forecasts were displayed together in a graphical format that communicates past trends as well as future potential. This visual representation serves as a powerful tool for analysts, offering a clear perspective of expected market behavior and aiding strategic decision-making.

As we stand on the brink of a new era of financial analytics, it’s clear that the integration of artificial intelligence with traditional data analytics methodologies has transformative potential. Our article showed not only the feasibility of such integration, but also its practical benefits. By taking advantage of the strengths of the EODHD API and OpenAI models, analysts are equipped with a more nuanced and comprehensive understanding of market dynamics.

In conclusion, the synergy between artificial intelligence and financial data points to a promising future for the industry, with the potential for higher forecasting accuracy and richer analytical insights. As technology advances, we can expect even more sophisticated tools and methodologies to emerge to further empower financial professionals in their efforts to navigate the ever-evolving stock market environment.

