Value at Risk (VaR) Analysis in Python

Compute Value at Risk (VaR), Conditional VaR (CVaR), and maximum drawdown for a stock portfolio using historical simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads 3 years of SPY, QQQ, and GLD price data, computes daily returns, and builds an equal-weight portfolio return series. It estimates 95% and 99% Value at Risk (VaR) using historical simulation and computes 95% Conditional VaR (CVaR/Expected Shortfall). It visualizes the return distribution with VaR and CVaR thresholds and reports maximum drawdown for the portfolio.

Who this example is for

This is for analysts and students who need a reproducible Python notebook for portfolio risk metrics from market data. It helps validate downside risk estimates and communicate them with standard plots and summary statistics.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Daily equal-weight portfolio return series
  • Historical-simulation VaR estimates at 95% and 99%
  • CVaR (Expected Shortfall) at 95%
  • Maximum drawdown statistic
  • Histogram of returns with VaR/CVaR threshold lines
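Maximum drawdown is listed among the expected outcomes; as a minimal, generic sketch (not taken from any of the graded runs), it can be computed from a daily return series like this:

```python
import numpy as np

def max_drawdown(returns):
    """Largest peak-to-trough decline of the cumulative wealth curve."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))  # growth of $1
    running_peak = np.maximum.accumulate(wealth)                 # highest value so far
    drawdowns = wealth / running_peak - 1.0                      # <= 0 everywhere
    return drawdowns.min()                                       # most negative dip

# A series that rallies, drops about 20% from its peak, then recovers:
print(max_drawdown([0.10, 0.10, -0.10, -0.1111, 0.05]))  # roughly -0.20
```

The result is a negative number; reports often quote its absolute value as a percentage.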

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scipy

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. download 3 years of SPY, QQQ, GLD prices and compute daily portfolio returns assuming equal weights with yfinance
  2. calculate 95% and 99% Value at Risk using historical simulation
  3. calculate Conditional VaR (Expected Shortfall) at 95%
  4. plot the return distribution with VaR and CVaR thresholds marked

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model            | Score | Task | Execution | Output | Reasoning | Reliability | Run
glm-5.1          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open glm-5.1 conversation
gpt-oss:120b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-oss:120b conversation
qwen3-coder-next | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open qwen3-coder-next conversation
gemma4:31b       | 9/10  | 2/2  | 2/2       | 2/3    | 2/2       | 1/1         | Open gemma4:31b conversation
gpt-5.4          | 9/10  | 2/2  | 2/2       | 2/3    | 2/2       | 1/1         | Open gpt-5.4 conversation
qwen3.5:397b     | 2/10  | 1/2  | 0/2       | 0/3    | 1/2       | 0/1         | Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.

AI Analysis with gpt-5.4

This run uses AI with a step-by-step list of prompts to perform the analysis task Value at Risk (VaR) Analysis in Python. Below, you can review the full conversation generated with gpt-5.4. After the conversation, we show detailed score cards for this run.

download 3 years of SPY, QQQ, GLD prices and compute daily portfolio returns assuming equal weights with yfinance
Thinking

Planning daily returns computation

I'm thinking we should compute equal-weight daily returns next and keep the code block minimal. I'll want to display it sequentially, maybe with one table that shows the shape, columns, and the head of the data. I could use a dictionary or a Series for this.

However, I need to be careful about the outputs. The rule says to default to one chart or one table per step, so sticking to one table seems best. Showing shape and columns separately in markdown might work too.

Reconciling table outputs

I'm thinking about the need to default to one table per step, which I need to reconcile. It might be okay since the first-load rule can override things. I could use a single dictionary object that includes shape, columns, and head, but that might not look great.

Instead, I could display just the head while printing the shape and columns in text. Directly displaying variables is also an option, so maybe I can work with markdown for that.

Let’s start by downloading the adjusted close prices for SPY, QQQ, and GLD over the last 3 years and inspect the data. After that, we can compute the equal-weight daily portfolio returns.

Download ticker prices
import pandas as pd
import yfinance as yf
from IPython.display import display

tickers = ["SPY", "QQQ", "GLD"]

prices = yf.download(
    tickers,
    period="3y",
    auto_adjust=True,
    progress=False
)["Close"]

if isinstance(prices, pd.Series):
    prices = prices.to_frame()

summary = pd.DataFrame({
    "rows": [prices.shape[0]],
    "columns": [prices.shape[1]],
    "tickers": [list(prices.columns)]
})

display(summary)

prices.head()
rows columns tickers
0 753 3 [GLD, QQQ, SPY]
Ticker GLD QQQ SPY
Date
2023-04-11 186.279999 310.325500 394.276001
2023-04-12 187.190002 307.584076 392.668915
2023-04-13 189.720001 313.607269 397.884644
2023-04-14 186.360001 313.017670 396.912720
2023-04-17 185.529999 313.282928 398.336914

You’ve already confirmed the data needed for the portfolio calculation: 753 trading days of adjusted close prices for GLD, QQQ, and SPY over the last 3 years. The columns are present and aligned by date, so the remaining missing piece is the actual daily equal-weight portfolio return series.

Compute portfolio returns
returns = prices.pct_change().dropna()
portfolio_returns = returns.mean(axis=1).rename("portfolio_return")
portfolio_returns.head()
Date
2023-04-12   -0.002675
2023-04-13    0.015460
2023-04-14   -0.007344
2023-04-17   -0.000006
2023-04-18    0.001532
Name: portfolio_return, dtype: float64

Using yfinance, you downloaded 3 years of adjusted close prices for SPY, QQQ, and GLD and computed the daily equal-weight portfolio returns.

From the observed notebook outputs:

  • The price table contains 753 trading days and all 3 tickers.
  • The daily return series contains 752 rows, which is expected because percentage returns begin one day after the first price date.
  • The portfolio return was computed as the simple average of the three assets' daily returns, which matches an equal-weight portfolio with weights of 1/3 each.

The return formula used is:

R_p,t = (R_SPY,t + R_QQQ,t + R_GLD,t) / 3

The first few observed portfolio daily returns are:

  • 2023-04-12: -0.002675
  • 2023-04-13: 0.015460
  • 2023-04-14: -0.007344
  • 2023-04-17: -0.000006
  • 2023-04-18: 0.001532

So the portfolio_returns series in your notebook is the daily return series for an equal-weight SPY/QQQ/GLD portfolio based on adjusted prices.
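The `returns.mean(axis=1)` step is the special case of a weighted sum with equal weights; writing it as an explicit dot product makes the weights visible and generalizes to unequal allocations. A minimal sketch on synthetic numbers (the return values below are placeholders, not market data):

```python
import numpy as np

# Synthetic daily returns: one row per day, one column per asset (placeholders)
returns = np.array([
    [ 0.01,  0.02, 0.00],
    [-0.02, -0.01, 0.01],
])

weights = np.full(3, 1 / 3)      # equal-weight portfolio
portfolio = returns @ weights    # identical to returns.mean(axis=1) here
```

Swapping `weights` for, say, `np.array([0.5, 0.3, 0.2])` reuses the same line for an unequal-weight portfolio.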

calculate 95% and 99% Value at Risk using historical simulation
Thinking

Calculating Value at Risk (VaR)

I need to craft a concise code snippet to calculate the 95% and 99% VaR from portfolio returns using historical simulation. I should explain that VaR is reported as a positive loss threshold. For the 5% and 1% quantiles, I will compute the quantiles of returns and negate them to get positive VaR values. I'll create a table summarizing the results in a DataFrame, including the confidence level and return quantiles. Finally, I'll express the outcome with one simple variable.

Since portfolio_returns is already ready, the next step is to compute 1-day historical simulation VaR from its empirical return distribution.

Here I’m using the common convention:

  • 95% VaR = - 5th percentile of daily returns
  • 99% VaR = - 1st percentile of daily returns

So the reported VaR values are positive loss thresholds.

Calculate VaR table
var_95 = -portfolio_returns.quantile(0.05)
var_99 = -portfolio_returns.quantile(0.01)

var_table = pd.DataFrame({
    "confidence_level": ["95%", "99%"],
    "return_quantile": [
        portfolio_returns.quantile(0.05),
        portfolio_returns.quantile(0.01)
    ],
    "VaR": [var_95, var_99]
})

var_table
confidence_level return_quantile VaR
0 95% -0.012858 0.012858
1 99% -0.022490 0.022490

Using historical simulation on the 752 daily equal-weight portfolio returns, the 1-day Value at Risk estimates are:

  • 95% VaR: 0.012858, or 1.2858%
  • 99% VaR: 0.022490, or 2.2490%

What this means:

  • At the 95% confidence level, the portfolio is expected to lose more than 1.2858% in a day only about 5% of the time.
  • At the 99% confidence level, the portfolio is expected to lose more than 2.2490% in a day only about 1% of the time.

These values came from the empirical left tail of the historical daily return distribution:

  • 5th percentile return = -0.012858
  • 1st percentile return = -0.022490

and VaR is reported as the positive magnitude of those losses.
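The "about 5% of the time" reading can be sanity-checked by counting exceedances, i.e. days whose loss is worse than the VaR threshold. A sketch on simulated returns (the normal parameters are arbitrary stand-ins, not fitted to this portfolio):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated daily returns standing in for the portfolio series
sim_returns = rng.normal(loc=0.0005, scale=0.01, size=2000)

var_95 = -np.quantile(sim_returns, 0.05)        # VaR as a positive loss threshold
exceed_rate = np.mean(sim_returns < -var_95)    # share of days worse than VaR

# By construction of the empirical quantile, exceed_rate lands near 0.05
```

In-sample, the exceedance rate matches the confidence level almost by definition; the check becomes informative when applied out-of-sample.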

calculate Conditional VaR (Expected Shortfall) at 95%
Thinking

Calculating CVaR at 95%

I see the user wants to calculate Conditional Value at Risk (CVaR) at the 95% level, so I’ll need to provide clear, concise code for that. I’ll compute CVaR using returns that are less than or equal to the 5th percentile. I’ll create a single code block and briefly explain it without asking further questions. The code may look something like this:

threshold = portfolio_returns.quantile(0.05)
cvar_95 = -portfolio_returns[portfolio_returns <= threshold].mean()

This will yield the CVaR value, which I can then present in a small table.

Next, we can compute 95% Conditional VaR (Expected Shortfall) as the average of returns in the worst 5% tail.

Compute CVaR table
var_threshold_95 = portfolio_returns.quantile(0.05)
cvar_95 = -portfolio_returns[portfolio_returns <= var_threshold_95].mean()

cvar_table = pd.DataFrame({
    "confidence_level": ["95%"],
    "VaR_threshold": [var_threshold_95],
    "CVaR": [cvar_95]
})

cvar_table
confidence_level VaR_threshold CVaR
0 95% -0.012858 0.020016

The 95% Conditional VaR (Expected Shortfall) from the historical simulation is 0.020016, or 2.0016%.

Interpretation:

  • The 95% VaR was 1.2858%, which is the cutoff for the worst 5% of daily portfolio returns.
  • The 95% CVaR of 2.0016% means that when the portfolio does fall into that worst 5% tail, the average daily loss is about 2.00%.

From the observed output:

  • VaR threshold return: -0.012858
  • CVaR reported as a positive loss magnitude: 0.020016

So, for this equal-weight portfolio of SPY, QQQ, and GLD, the expected loss on the worst 5% of days is about 2.00%.
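Because CVaR averages everything beyond the cutoff, its loss magnitude can never be smaller than the VaR it conditions on, and the inequality can be checked directly. A sketch on simulated data (again a stand-in series, not the notebook's portfolio_returns):

```python
import numpy as np

rng = np.random.default_rng(1)
sim_returns = rng.normal(loc=0.0005, scale=0.01, size=2000)  # stand-in series

threshold = np.quantile(sim_returns, 0.05)               # 5th-percentile return
var_95 = -threshold                                      # VaR as a positive loss
cvar_95 = -sim_returns[sim_returns <= threshold].mean()  # mean loss in the tail

assert cvar_95 >= var_95  # Expected Shortfall dominates VaR by construction
```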

plot the return distribution with VaR and CVaR thresholds marked
Thinking

Planning the plot

I need to follow an iterative approach for the user's request to plot the return distribution with VaR and CVaR thresholds marked. I'll provide a single chart using seaborn or matplotlib.

I'm evaluating the latest outputs, where the VaR threshold is -0.012858 and the CVaR magnitude is 0.020016. Since CVaR is an average loss magnitude, its marker belongs at -cvar_95 on the return axis. It's worth briefly noting the distinction between VaR and CVaR when marking them on that axis. I'll finalize the code block with a plot and a plt.show() statement.

Since VaR threshold is a return cutoff and CVaR is an average loss magnitude, I’ll mark them on the return axis as:

  • VaR 95% at -var_95
  • CVaR 95% at -cvar_95
Plot returns with VaR and CVaR
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")

plt.figure(figsize=(10, 5))
sns.histplot(portfolio_returns, bins=40, kde=True, color="steelblue")

plt.axvline(-var_95, color="crimson", linestyle="--", linewidth=2, label=f"VaR 95% = {-var_95:.2%}")
plt.axvline(-cvar_95, color="darkorange", linestyle="--", linewidth=2, label=f"CVaR 95% = {-cvar_95:.2%}")

plt.title("Distribution of Portfolio Returns with 95% VaR and CVaR")
plt.xlabel("Daily portfolio return")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
plt.show()

The plot aligns with the return-risk numbers already computed for the portfolio:

  • The 95% VaR threshold is at -1.2858% daily return.
  • The 95% CVaR threshold marker is at -2.0016% daily return, representing the average return within the worst 5% tail.

What this means on the distribution:

  • The red dashed line (VaR) marks the cutoff separating the worst 5% of daily outcomes from the rest.
  • The orange dashed line (CVaR) sits further left, which is expected because losses in the tail are, on average, worse than the VaR cutoff itself.
  • This confirms a left-tail loss profile where extreme bad days are meaningfully more severe than the simple 5th-percentile threshold.

For this equal-weight portfolio of SPY, QQQ, and GLD over the observed period:

  • Most daily returns cluster around small gains/losses near zero.
  • Once returns move into the worst 5% region, the average loss deepens from about 1.29% to about 2.00%.

So the chart successfully marks both the VaR cutoff and the Expected Shortfall level on the return distribution, and the placement is consistent with the calculated results.

Score cards for gpt-5.4 for Value at Risk (VaR) Analysis in Python

Total score: 9/10

  • Task Completion: 2/2

    Excellent

    The workflow completed all major steps: downloaded SPY/QQQ/GLD data, computed equal-weight portfolio returns, calculated 95% and 99% historical VaR, computed 95% CVaR, and produced a histogram with threshold lines.

  • Execution Correctness: 2/2

    Excellent

    The code is coherent and likely runnable as written. It uses valid yfinance, pandas, seaborn, and matplotlib operations without obvious syntax or logic errors.

  • Output Quality: 2/3

    Good

    Outputs are mostly correct and include the required return series, VaR table, CVaR table, and plot. However, the VaR values do not match the approximate expected outcomes in the prompt, so this is not a full semantic match.

  • Reasoning Quality: 2/2

    Excellent

    The reasoning is clear and correctly explains historical simulation, tail quantiles, and the interpretation of VaR/CVaR. The distinction between VaR and CVaR on the return axis is handled appropriately.

  • Reliability: 1/1

    Excellent

    The workflow is internally consistent and uses standard methods, with no hallucinated functions or unsafe behavior. It is reasonably robust despite the mismatch with the approximate VaR targets.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.