Value at Risk (VaR) Analysis in Python

Compute Value at Risk (VaR), Conditional VaR (CVaR), and maximum drawdown for a stock portfolio using historical simulation.

What

This AI Data Analyst workflow downloads 3 years of SPY, QQQ, and GLD price data, computes daily returns, and builds an equal-weight portfolio return series. It estimates 95% and 99% Value at Risk (VaR) using historical simulation and computes 95% Conditional VaR (CVaR/Expected Shortfall). It visualizes the return distribution with VaR and CVaR thresholds and reports maximum drawdown for the portfolio.

Who

This is for analysts and students who need a reproducible Python notebook for portfolio risk metrics from market data. It helps validate downside risk estimates and communicate them with standard plots and summary statistics.

Tools

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • scipy

Outcomes

  • Daily equal-weight portfolio return series
  • Historical-simulation VaR estimates at 95% and 99%
  • CVaR (Expected Shortfall) at 95%
  • Maximum drawdown statistic
  • Histogram of returns with VaR/CVaR threshold lines

Quality Score

9/10

gpt-5.2 mljar-ai

Last scored: Apr 7, 2026

  • Task Completion: 2/2

    Excellent

    All requested steps are present: data download for SPY/QQQ/GLD, equal-weight portfolio returns, historical VaR (95/99), 95% CVaR, and a histogram/KDE plot with VaR/CVaR lines.

  • Execution Correctness: 2/2

    Excellent

    Provided Python code is coherent and likely runnable end-to-end (yfinance download, return computation, quantiles, tail mean, seaborn/matplotlib plot) with no evident syntax or logic-breaking errors.

  • Output Quality: 2/3

    Good

    Outputs include the portfolio return series, VaR values, CVaR numeric output, and the required plot. However, the 95% VaR (a loss of ~1.29%) does not match the expected cutoff of roughly -1.5%, while the 99% VaR (a loss of ~2.25%) is close to the expected -2.3%, so the output only mostly matches expectations.

  • Reasoning Quality: 2/2

    Excellent

    Explanations correctly describe historical simulation VaR, the sign convention (loss as positive vs cutoff return as negative), and CVaR as the mean of the worst 5% tail.

  • Reliability: 1/1

    Excellent

    Workflow is reasonably robust (dropna, clear variable definitions, standard methods) and avoids hallucinated libraries/functions; minor fragility remains around data availability/network dependence via yfinance.

download 3 years of SPY, QQQ, GLD prices and compute daily portfolio returns assuming equal weights with yfinance
Thinking

Finalizing code for returns

I’m considering how to finalize the code for computing portfolio returns. Should I just return portfolio_returns.head() or create a DataFrame with prices and returns? Since the user explicitly wants to compute returns, it might be okay to proceed without initial data loading. I'll use yfinance, and I’m debating installing the package if it's missing, but I want the code to be clean and runnable in a notebook. Let’s focus on being efficient and clear!

Finalizing expression for returns

I’m thinking about how to wrap up the code for computing portfolio returns. I could simply end with portfolio_returns.head(), but I might also consider creating a DataFrame for asset returns along with the portfolio return. Another option is using portfolio_returns.rename('portfolio_return').to_frame().head(). I want to make sure everything is clear and concise, so I should craft a solid explanation to accompany this final expression.

Here’s a compact notebook step that downloads the last 3 years of daily adjusted prices for SPY, QQQ, and GLD, then computes the equal-weight daily portfolio return.

Download prices and compute returns
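import yfinance as yf
import pandas as pd

tickers = ["SPY", "QQQ", "GLD"]

# Download 3 years of daily prices; auto_adjust=True returns closes adjusted for splits and dividends.
prices = yf.download(
    tickers,
    period="3y",
    interval="1d",
    auto_adjust=True,
    progress=False
)["Close"]

# Simple daily returns per asset; the first row is NaN and is dropped.
asset_returns = prices.pct_change().dropna()
# Equal-weight portfolio return = row-wise mean of the three asset returns.
portfolio_returns = asset_returns.mean(axis=1)

portfolio_returns.rename("portfolio_return").to_frame().head()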
            portfolio_return
Date
2023-04-11          0.000057
2023-04-12         -0.002675
2023-04-13          0.015460
2023-04-14         -0.007344
2023-04-17         -0.000006

If you want, the next step can be cumulative portfolio performance or a return/risk summary.

You’ve already downloaded 3 years of daily prices for SPY, QQQ, and GLD with yfinance and computed equal-weight daily portfolio returns.

From the notebook state:

  • prices has shape 753 x 3, so you have 753 daily adjusted close observations across the 3 ETFs.
  • asset_returns has shape 752 x 3, which is expected because daily percentage returns lose the first row.
  • portfolio_returns has shape 752, giving one daily portfolio return per trading day.
  • The portfolio return is being calculated correctly as the simple average of the three asset daily returns, which matches an equal-weight portfolio: $$ r_{p,t} = \frac{1}{3}\left(r_{\text{SPY},t} + r_{\text{QQQ},t} + r_{\text{GLD},t}\right) $$
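As a quick sanity check of the formula above, here is a minimal sketch on made-up numbers (the returns below are hypothetical, not market data). It shows that the row-wise mean equals an explicit weighted sum with weights of 1/3 each. Note that averaging daily returns this way implicitly assumes the portfolio is rebalanced back to equal weights every day.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for three assets (illustrative values only).
asset_returns = pd.DataFrame(
    {
        "SPY": [0.010, -0.004, 0.002],
        "QQQ": [0.020, -0.010, 0.001],
        "GLD": [-0.003, 0.005, 0.000],
    }
)

# Equal weights: 1/N for each of the N assets.
n_assets = asset_returns.shape[1]
weights = np.full(n_assets, 1.0 / n_assets)

# Explicit weighted sum of asset returns for each day.
weighted = asset_returns.dot(weights)

# Row-wise mean, the shortcut used in the notebook step.
averaged = asset_returns.mean(axis=1)

print(pd.DataFrame({"weighted_sum": weighted, "row_mean": averaged}))
```

Writing the weights out explicitly makes it easy to try other allocations later by replacing the `weights` array, as long as the weights sum to 1.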

The first few observed portfolio daily returns are:

  • 2023-04-11: 0.000057 → about 0.0057%
  • 2023-04-12: -0.002675 → about -0.2675%
  • 2023-04-13: 0.015460 → about 1.5460%
  • 2023-04-14: -0.007344 → about -0.7344%
  • 2023-04-17: -0.000006 → about -0.0006%

So the original request is satisfied: the notebook contains the 3-year daily price history and the equal-weight daily portfolio return series derived from it.

calculate 95% and 99% Value at Risk using historical simulation
Thinking

Calculating VaR using historical simulation

I need to respond step-by-step to the user's request to calculate the 95% and 99% Value at Risk (VaR) using historical simulation. I want to explain that this method involves computing lower-tail quantiles of the returns series. I'll calculate the 5% and 1% percentiles using one Python code block for clarity. The code will be concise, returning a summary table in percentage terms. The format will be straightforward, avoiding excess printing. Let's get started on that!

We can compute 1-day historical simulation VaR directly from the empirical return distribution:

  • 95% VaR = negative of the 5th percentile of daily portfolio returns
  • 99% VaR = negative of the 1st percentile of daily portfolio returns
Compute portfolio VaR table
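# Historical VaR: lower-tail quantile of returns, sign-flipped so a loss is a positive number.
var_95 = -portfolio_returns.quantile(0.05)
var_99 = -portfolio_returns.quantile(0.01)

# Summary table in return units and in percent.
var_table = pd.DataFrame({
    "confidence_level": ["95%", "99%"],
    "VaR_return": [var_95, var_99],
    "VaR_percent": [var_95 * 100, var_99 * 100]
})

var_table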
  confidence_level  VaR_return  VaR_percent
0              95%    0.012858     1.285840
1              99%    0.022490     2.249032

This gives the historical 1-day VaR based on your equal-weight portfolio returns.

Using historical simulation on the observed daily equal-weight portfolio returns, the 1-day Value at Risk estimates are:

  • 95% VaR: 0.012858
    This means there is an estimated 5% chance the portfolio could lose about 1.29% or more in one day.

  • 99% VaR: 0.022490
    This means there is an estimated 1% chance the portfolio could lose about 2.25% or more in one day.

These values come directly from the empirical return distribution over the 752 daily observations:

  • 95% VaR = negative of the 5th percentile
  • 99% VaR = negative of the 1st percentile

So, for your equal-weight SPY / QQQ / GLD portfolio, historical simulation puts the 1-day loss threshold (the loss exceeded only on the worst 5% or 1% of days in the sample) at roughly:

  • 1.29% at 95% confidence
  • 2.25% at 99% confidence
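To make the interpretation concrete, the sketch below applies the same quantile rule to synthetic returns and counts how often the loss exceeded each VaR level. The data are seeded random draws rather than market prices, and `historical_var` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (assumption: seeded normal draws, not market data).
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(loc=0.0005, scale=0.01, size=752))


def historical_var(r: pd.Series, confidence: float) -> float:
    """Historical-simulation VaR as a positive loss fraction."""
    # The (1 - confidence) quantile is the lower-tail return cutoff.
    return -r.quantile(1.0 - confidence)


for confidence in (0.95, 0.99):
    var = historical_var(returns, confidence)
    # Share of days whose return fell below the VaR cutoff.
    breach_rate = (returns < -var).mean()
    print(f"{confidence:.0%} VaR = {var:.4f}, in-sample breach rate = {breach_rate:.4f}")
```

In-sample, the breach rate lands close to 5% and 1% by construction, because the cutoff is read off the same data. That is also why the estimate moves when the sample window changes.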
calculate Conditional VaR (Expected Shortfall) at 95%
Thinking

Calculating CVaR

I need to respond with the next step code in one block. The user wants to calculate the Conditional Value at Risk (CVaR) at 95%. To do this, I can estimate it by computing the average of returns that are worse than or equal to the 5th percentile. I should provide a concise explanation along with the code, so I'll ensure it’s in one code block using plain markdown. Let's create that small step clearly!

We already have the 95% historical VaR cutoff, so the 95% Conditional VaR (Expected Shortfall) is the average loss on the worst 5% of days.

Compute VaR and CVaR
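# 5th-percentile return: the (negative) cutoff that defines the worst 5% of days.
cutoff_95 = portfolio_returns.quantile(0.05)
# CVaR: mean of the returns at or below the cutoff, sign-flipped to a positive loss.
cvar_95 = -portfolio_returns[portfolio_returns <= cutoff_95].mean()

# var_95 comes from the previous VaR step.
pd.DataFrame({
    "metric": ["VaR 95%", "CVaR 95%"],
    "return_value": [var_95, cvar_95],
    "percent_value": [var_95 * 100, cvar_95 * 100]
})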
     metric  return_value  percent_value
0   VaR 95%      0.012858       1.285840
1  CVaR 95%      0.020016       2.001585

This will give the historical 95% Expected Shortfall for your portfolio.

The 95% historical Conditional VaR (Expected Shortfall) for your equal-weight SPY / QQQ / GLD portfolio is 0.020016, or 2.001585%.

In plain terms: among the worst 5% of daily portfolio return outcomes in your 3-year sample, the average daily loss was about 2.00%.

From the observed results:

  • 95% VaR: 1.285840%
  • 95% CVaR / Expected Shortfall: 2.001585%

This means CVaR is more severe than VaR, as expected: VaR gives the loss threshold at the 5th percentile, while CVaR gives the average loss beyond that threshold.
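The relationship between the two metrics can be checked on any sample. The sketch below uses synthetic fat-tailed returns (seeded Student-t draws, not market data) and a hypothetical helper, `historical_var_cvar`, that returns both numbers as positive losses.

```python
import numpy as np
import pandas as pd

# Synthetic fat-tailed daily returns (assumption: Student-t draws scaled by 0.01).
rng = np.random.default_rng(1)
returns = pd.Series(0.01 * rng.standard_t(df=4, size=752))


def historical_var_cvar(r: pd.Series, confidence: float = 0.95):
    """Return (VaR, CVaR) from historical simulation, both as positive losses."""
    cutoff = r.quantile(1.0 - confidence)  # lower-tail return cutoff
    tail = r[r <= cutoff]                  # returns at or below the cutoff
    return -cutoff, -tail.mean()


var_95, cvar_95 = historical_var_cvar(returns, 0.95)
print(f"VaR 95%  = {var_95:.4f}")
print(f"CVaR 95% = {cvar_95:.4f}")
print(f"Tail observations used: {(returns <= returns.quantile(0.05)).sum()}")
```

Because every return in the tail is at or below the cutoff, their mean cannot be above it, so CVaR is always at least as large as VaR under this definition. The tail count also shows how few observations the estimate rests on in a sample of this size.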

plot the return distribution with VaR and CVaR thresholds marked
Thinking

Calculating cvar and plotting

I'm noticing that since cvar_95 is a positive loss number, the return-space location is negative. I’ll compute cvar_return_95 as the mean of portfolio returns that are less than or equal to cutoff_95. Then I can visualize it using a kernel density estimate or a histogram. Finally, I’ll end with plt.show() as the function call to display the plot, following the notebook rule about having a single final expression. Let’s add a brief explanation for clarity!

Here’s a histogram of daily portfolio returns with the 95% VaR cutoff and the 95% CVaR tail-average location marked on the return axis.

Plot portfolio return distribution
import matplotlib.pyplot as plt
import seaborn as sns

cvar_return_95 = portfolio_returns[portfolio_returns <= cutoff_95].mean()

plt.figure(figsize=(10, 5))
sns.histplot(portfolio_returns, bins=40, kde=True, color="steelblue")

plt.axvline(cutoff_95, color="crimson", linestyle="--", linewidth=2, label=f"VaR 95% = {cutoff_95:.2%}")
plt.axvline(cvar_return_95, color="darkorange", linestyle="--", linewidth=2, label=f"CVaR 95% = {cvar_return_95:.2%}")

plt.title("Portfolio Return Distribution with 95% VaR and CVaR")
plt.xlabel("Daily portfolio return")
plt.ylabel("Count")
plt.legend()
plt.tight_layout()
plt.show()

A small interpretation note: both lines should appear on the left tail because they are loss-region return values. The CVaR line will usually sit further left than the VaR line, showing that average tail losses are worse than the VaR cutoff.

The plot matches the goal: it shows the full daily return distribution for the equal-weight portfolio and marks both tail-risk thresholds on the left side of the distribution.

From the observed values used in that chart:

  • 95% VaR threshold: -1.285840% return
  • 95% CVaR threshold: -2.001585% average return within the worst 5% tail

What this means on the plot:

  • The red dashed VaR line marks the 5th-percentile cutoff. About 5% of daily returns were worse than this level.
  • The orange dashed CVaR line sits further left, showing that once returns fall into that worst 5%, the average loss is more severe than the VaR cutoff.
  • The gap between the two lines indicates meaningful downside tail risk: bad days below the VaR threshold tended to be notably worse on average.

So the visualization supports the key takeaway: the portfolio’s loss threshold at 95% confidence was about 1.29%, but the average loss on the worst 5% of days was about 2.00%.
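The summary at the top lists maximum drawdown among the outcomes, but the steps above do not show that calculation. One common definition is the largest peak-to-trough decline of the cumulative wealth curve. The sketch below follows that definition on hypothetical returns; the helper name `max_drawdown` and the starting wealth of 1.0 are assumptions made for illustration.

```python
import pandas as pd


def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of cumulative wealth, as a negative fraction."""
    # Wealth curve from compounding the daily returns, starting at 1.0.
    wealth = (1.0 + returns).cumprod()
    # Running peak, floored at the starting wealth so an initial loss is counted.
    running_peak = wealth.cummax().clip(lower=1.0)
    # Drawdown on each day relative to the running peak.
    drawdown = wealth / running_peak - 1.0
    return float(drawdown.min())


# Hypothetical returns (illustrative values only, not market data).
example_returns = pd.Series([0.10, -0.20, 0.05, -0.10, 0.30])
print(f"Maximum drawdown: {max_drawdown(example_returns):.2%}")
```

In the notebook, the same function could be applied to `portfolio_returns`. Unlike VaR and CVaR, which describe single-day losses, this statistic captures how far the portfolio fell over a multi-day losing stretch.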
