Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.
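The synthetic-data fallback mentioned above can be sketched with NumPy's multivariate normal sampler. Everything here is an illustrative assumption (drift, volatility, and a flat 0.4 pairwise correlation), not an estimate from real data:

```python
import numpy as np

# Hypothetical fallback: generate correlated synthetic daily returns when
# live price data is unavailable. All parameter values are assumptions.
rng = np.random.default_rng(0)
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]
n_days = 504  # roughly 2 years of trading days

daily_mu = np.full(5, 0.0005)     # ~12.6% annualized drift
daily_sigma = np.full(5, 0.015)   # ~24% annualized volatility
corr = np.full((5, 5), 0.4)       # moderate assumed pairwise correlation
np.fill_diagonal(corr, 1.0)
cov = np.outer(daily_sigma, daily_sigma) * corr  # covariance from corr and vols

# One row per day, one column per ticker
returns = rng.multivariate_normal(daily_mu, cov, size=n_days)
print(returns.shape)  # (504, 5)
```

The resulting array can be wrapped in a `pandas.DataFrame` with `tickers` as columns and fed into the same correlation and Monte Carlo steps as the downloaded data.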

Who this example is for

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Price DataFrame for the selected tickers over ~2 years
  • Daily returns and a 5x5 correlation heatmap
  • Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
  • Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
  • Portfolio weights visualization (pie chart)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
  2. compute daily returns and show the correlation matrix
  3. run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
  4. identify the max Sharpe ratio portfolio and print the optimal weights

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to review the full conversation and notebook results.

| Model | Score | Task | Execution | Output | Reasoning | Reliability | Run |
|---|---|---|---|---|---|---|---|
| glm-5.1 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open glm-5.1 conversation |
| gpt-5.4 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-5.4 conversation |
| gpt-oss:120b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-oss:120b conversation |
| qwen3-coder-next | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open qwen3-coder-next conversation |
| qwen3.5:397b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open qwen3.5:397b conversation |
| gemma4:31b | 5/10 | 1/2 | 1/2 | 2/3 | 1/2 | 0/1 | Open gemma4:31b conversation |

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
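The scoring arithmetic described above is a plain sum of the five dimensions; a minimal sketch (with hypothetical scores for one run):

```python
# Maximum per dimension: 2 + 2 + 3 + 2 + 1 = 10
scores = {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1}
total = sum(scores.values())
print(f"{total}/10")  # 10/10
```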

AI Analysis with glm-5.1

This run uses AI with a step-by-step list of prompts to perform the analysis task Portfolio Optimization in Python. Below, you can review the full conversation generated with glm-5.1. After the conversation, we show detailed score cards for this run.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
Thinking

The user wants to download 2 years of daily prices for 5 tickers using yfinance.

Download stock data
import yfinance as yf
import datetime

end = datetime.date.today()
start = end - datetime.timedelta(days=2*365)

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]
data = yf.download(tickers, start=start, end=end)

data.shape
[*********************100%***********************]  5 of 5 completed
(501, 25)

Run this and we'll inspect the downloaded data in the next step.

Thinking

The data has been downloaded successfully — 501 rows × 25 columns covering 5 tickers with 5 price fields each (Close, High, Low, Open, Volume). Let me inspect the data briefly to confirm it looks good.

The download completed successfully — 501 trading days across 5 tickers with Open, High, Low, Close, and Volume columns (5 × 5 = 25 columns). Let me do a quick sanity check on the date range and a peek at closing prices.

Print date range
print("Date range:", data.index.min().date(), "to", data.index.max().date())
data.tail()
Date range: 2024-04-11 to 2026-04-10
Price Close High ... Open Volume
Ticker AAPL AMZN GOOGL JPM MSFT AAPL AMZN GOOGL JPM MSFT ... AAPL AMZN GOOGL JPM MSFT AAPL AMZN GOOGL JPM MSFT
Date
2026-04-06 258.859985 212.789993 299.989990 295.450012 372.880005 262.160004 212.949997 300.619995 296.829987 373.730011 ... 256.510010 209.800003 295.869995 294.059998 373.489990 29329900 25385500 16945500 7142600 16146600
2026-04-07 253.500000 213.770004 305.459991 297.399994 372.290009 256.200012 213.970001 305.630005 298.179993 372.450012 ... 256.160004 211.240005 302.730011 293.140015 370.339996 62148000 28154600 23205400 7849100 21443300
2026-04-08 258.899994 221.250000 317.320007 307.970001 374.329987 259.750000 226.100006 322.079987 311.260010 385.000000 ... 258.450012 223.789993 320.450012 307.970001 384.980011 41032800 51092700 33547100 10590200 33064800
2026-04-09 260.489990 233.649994 318.489990 310.329987 373.070007 261.119995 233.800003 319.540009 311.260010 373.500000 ... 259.000000 225.229996 315.910004 306.950012 372.500000 28121600 65964200 23739200 6586600 30435300
2026-04-10 260.480011 238.380005 317.239990 309.869995 370.869995 262.190002 240.429993 321.829987 310.350006 375.640015 ... 259.980011 235.300003 320.019989 310.000000 372.980011 31259500 56842200 19127000 6067800 28038000

5 rows × 25 columns

Thinking

The user asked to download 2 years of daily prices for those 5 tickers. The download is complete with 501 trading days (roughly 2 years), covering 2024-04-11 to 2026-04-10, with all 5 tickers and OHLCV data (25 columns). The original goal is fully satisfied — the data has been downloaded and inspected.

The download is complete and fully covers the requested scope:

  • Tickers: AAPL, MSFT, GOOGL, AMZN, JPM — all 5 present
  • Date range: 2024-04-11 to 2026-04-10 (~2 years)
  • Rows: 501 trading days
  • Columns: Open, High, Low, Close, Volume for each ticker (25 columns total)

The DataFrame data is ready in the notebook with all requested daily prices. As a quick snapshot, the latest closing prices (2026-04-10) are: AAPL ≈ 260.48 USD, AMZN ≈ 238.38 USD, GOOGL ≈ 317.24 USD, JPM ≈ 309.87 USD, MSFT ≈ 370.87 USD.

compute daily returns and show the correlation matrix
Thinking

The user wants to compute daily returns from the closing prices and show the correlation matrix. I'll extract the Close prices, compute daily percentage returns, and then show the correlation matrix.

I'll extract the closing prices, compute daily percentage returns, and display the correlation matrix.

Plot returns correlation heatmap
import seaborn as sns
import matplotlib.pyplot as plt

# Extract closing prices and compute daily returns
close_prices = data['Close']
daily_returns = close_prices.pct_change().dropna()

# Correlation matrix
corr = daily_returns.corr()

# Plot heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of Daily Returns')
plt.tight_layout()
plt.show()

corr
Ticker AAPL AMZN GOOGL JPM MSFT
Ticker
AAPL 1.000000 0.474660 0.428464 0.344752 0.425048
AMZN 0.474660 1.000000 0.549132 0.420789 0.562355
GOOGL 0.428464 0.549132 1.000000 0.310548 0.421153
JPM 0.344752 0.420789 0.310548 1.000000 0.328329
MSFT 0.425048 0.562355 0.421153 0.328329 1.000000
Thinking

The user's goal was to compute daily returns and show the correlation matrix. The notebook outputs already contain both the heatmap visualization and the numeric correlation matrix. The output shows the correlation values clearly. This is sufficient to answer the goal.

Let me extract insights from the correlation matrix.

Key findings:

  • Highest correlation: AMZN-MSFT at 0.56, followed by AMZN-GOOGL at 0.55
  • Lowest correlation: JPM-GOOGL at 0.31, followed by JPM-MSFT at 0.33
  • JPM (the bank) has consistently lower correlations with the tech stocks, which makes sense sectorally
  • All correlations are positive, ranging from ~0.31 to ~0.56
  • The tech stocks (AAPL, AMZN, GOOGL, MSFT) are moderately correlated with each other (0.42-0.56)

The daily returns and correlation matrix are complete. Here are the key insights from the correlation matrix:

Strongest correlations:

  • AMZN ↔ MSFT: 0.56 — the most tightly linked pair
  • AMZN ↔ GOOGL: 0.55 — second strongest

Weakest correlations:

  • JPM ↔ GOOGL: 0.31 — the least correlated pair
  • JPM ↔ MSFT: 0.33 — nearly as weak

Takeaway: JPM (the only financial stock in the group) consistently shows lower correlation with the four tech stocks (0.31–0.42), which makes intuitive sense — banks and big tech are driven by different market forces. Among the tech names, correlations are moderate (0.42–0.56), meaning they tend to move together but still offer meaningful diversification. No pair exceeds 0.56, so even the most correlated duo (AMZN-MSFT) isn't so tightly bound that holding both offers no benefit.
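The diversification benefit of a low correlation can be made concrete with the two-asset volatility formula, σp = √(w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂). A minimal sketch, assuming two stocks with 25% annual volatility each at 50/50 weights (σ is an assumed value; the 0.31 mirrors the JPM-GOOGL correlation above):

```python
import numpy as np

sigma = 0.25              # assumed annual volatility for both assets
w = np.array([0.5, 0.5])  # equal weights

def portfolio_vol(rho):
    # Build the 2x2 covariance matrix for a given correlation rho
    cov = np.array([[sigma**2, rho * sigma**2],
                    [rho * sigma**2, sigma**2]])
    return float(np.sqrt(w @ cov @ w))

print(f"rho=1.00 -> vol={portfolio_vol(1.0):.1%}")   # 25.0% (no diversification)
print(f"rho=0.31 -> vol={portfolio_vol(0.31):.1%}")  # ~20.2%
```

Holding the JPM-GOOGL-like pair instead of two perfectly correlated stocks cuts portfolio volatility by almost five percentage points at the same weights.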

run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
Thinking

The user wants to run a Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier. I have the daily_returns DataFrame already computed. I need to:

  1. Calculate mean returns and covariance matrix
  2. Generate 5000 random portfolios with random weights
  3. For each portfolio, compute expected return and volatility
  4. Plot the efficient frontier (scatter plot of return vs volatility)

Let me write the code for this.

I'll use the already-computed daily_returns to calculate annualized returns and covariance, then simulate 5000 random portfolios and plot the efficient frontier.

Plot efficient frontier
import numpy as np

# Annualized mean returns and covariance matrix
mean_returns = daily_returns.mean() * 252
cov_matrix = daily_returns.cov() * 252
n_assets = len(mean_returns)

# Monte Carlo: 5000 random portfolios
num_portfolios = 5000
results = np.zeros((3, num_portfolios))  # row 0: return, row 1: volatility, row 2: Sharpe

np.random.seed(42)
for i in range(num_portfolios):
    weights = np.random.dirichlet(np.ones(n_assets))
    port_return = np.dot(weights, mean_returns)
    port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    results[0, i] = port_return
    results[1, i] = port_vol
    results[2, i] = port_return / port_vol  # Sharpe ratio (rf=0)

# Plot efficient frontier
plt.figure(figsize=(10, 7))
sc = plt.scatter(results[1, :], results[0, :], c=results[2, :], cmap='viridis', alpha=0.6, s=8)
plt.colorbar(sc, label='Sharpe Ratio')

# Highlight max Sharpe and min volatility portfolios
max_sharpe_idx = np.argmax(results[2, :])
min_vol_idx = np.argmin(results[1, :])
plt.scatter(results[1, max_sharpe_idx], results[0, max_sharpe_idx], color='red', marker='*', s=200, label='Max Sharpe')
plt.scatter(results[1, min_vol_idx], results[0, min_vol_idx], color='blue', marker='*', s=200, label='Min Volatility')

plt.xlabel('Annualized Volatility')
plt.ylabel('Annualized Return')
plt.title('Efficient Frontier — 5000 Random Portfolios')
plt.legend()
plt.tight_layout()
plt.show()

print(f"Max Sharpe  → Return: {results[0, max_sharpe_idx]:.2%}, Vol: {results[1, max_sharpe_idx]:.2%}, Sharpe: {results[2, max_sharpe_idx]:.3f}")
print(f"Min Vol     → Return: {results[0, min_vol_idx]:.2%}, Vol: {results[1, min_vol_idx]:.2%}, Sharpe: {results[2, min_vol_idx]:.3f}")
Max Sharpe  → Return: 33.08%, Vol: 22.21%, Sharpe: 1.489
Min Vol     → Return: 17.24%, Vol: 19.29%, Sharpe: 0.894
Thinking

The user asked to run a Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier. Looking at the notebook state, cell-4 has already been executed with outputs showing:

  1. A scatter plot of the efficient frontier with 5000 random portfolios
  2. Color-coded by Sharpe ratio
  3. Max Sharpe and Min Volatility portfolios highlighted
  4. Numeric results printed:
    • Max Sharpe → Return: 33.08%, Vol: 22.21%, Sharpe: 1.489
    • Min Vol → Return: 17.24%, Vol: 19.29%, Sharpe: 0.894

The original goal is fully answered - the Monte Carlo simulation was run, the efficient frontier was plotted, and key portfolio metrics were displayed. This is a Path A situation.

The Monte Carlo simulation with 5,000 random portfolios is complete, and the efficient frontier has been plotted. Here's what the results show:

Max Sharpe Ratio Portfolio (red star):

  • Annualized Return: 33.08%
  • Annualized Volatility: 22.21%
  • Sharpe Ratio: 1.489

Min Volatility Portfolio (blue star):

  • Annualized Return: 17.24%
  • Annualized Volatility: 19.29%
  • Sharpe Ratio: 0.894

The frontier shows a clear upward-sloping risk-return relationship. The best risk-adjusted return (Sharpe 1.489) achieves roughly double the return of the minimum-volatility portfolio (33% vs 17%) with only about 3 percentage points more volatility (22.2% vs 19.3%). This suggests that within this stock universe, taking on modestly more risk is well-rewarded — the Sharpe-optimal portfolio is much more efficient per unit of risk than the conservative minimum-volatility allocation.

The color gradient (Sharpe ratio) confirms that the upper-left region of the frontier offers the highest risk-adjusted returns, while portfolios in the lower-right (high volatility, low return) are clearly suboptimal.
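One caveat on the Sharpe figures: the simulation sets the risk-free rate to zero. With a nonzero rate, the Sharpe ratio is (return − rf) / volatility. A minimal sketch using the reported max-Sharpe figures and an assumed 4% risk-free rate:

```python
# Reported max-Sharpe portfolio metrics; rf = 4% is an assumed value
ret, vol, rf = 0.3308, 0.2221, 0.04
sharpe = (ret - rf) / vol
print(f"Sharpe with rf=4%: {sharpe:.3f}")  # 1.309
```

The ranking of portfolios can shift slightly once rf > 0, so the choice of risk-free rate is worth stating explicitly in a real analysis.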

identify the max Sharpe ratio portfolio and print the optimal weights
Thinking

The user wants to extract the weights of the max Sharpe ratio portfolio. I need to re-run the simulation (or recover the weights) since the current code only stores returns, volatility, and Sharpe in the results array but not the weights. I'll need to re-run and store the weights, then display the optimal allocation for the max Sharpe portfolio.

Let me modify the code to also store weights, then identify the max Sharpe portfolio and print its weights.

The current code didn't store the individual portfolio weights, only the return/vol/Sharpe metrics. I'll re-run the simulation this time saving all weights, then display the max Sharpe portfolio allocation.

Optimize portfolio weights
np.random.seed(42)

# Store weights alongside metrics
all_weights = np.zeros((num_portfolios, n_assets))

for i in range(num_portfolios):
    weights = np.random.dirichlet(np.ones(n_assets))
    all_weights[i] = weights
    port_return = np.dot(weights, mean_returns)
    port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    results[0, i] = port_return
    results[1, i] = port_vol
    results[2, i] = port_return / port_vol

max_sharpe_idx = np.argmax(results[2, :])
optimal_weights = all_weights[max_sharpe_idx]

# Build a clean DataFrame for display
import pandas as pd
optimal_df = pd.DataFrame({
    'Ticker': mean_returns.index,
    'Weight': optimal_weights
}).sort_values('Weight', ascending=False)

optimal_df
Ticker Weight
3 JPM 0.504740
2 GOOGL 0.464931
4 MSFT 0.017195
1 AMZN 0.007002
0 AAPL 0.006132
Thinking

The user's goal was to identify the max Sharpe ratio portfolio and print the optimal weights. The latest output (cell-5) already shows exactly this information - a DataFrame with the optimal weights sorted by weight. The output clearly shows:

  • JPM: 50.47%
  • GOOGL: 46.49%
  • MSFT: 1.72%
  • AMZN: 0.70%
  • AAPL: 0.61%

This fully answers the user's goal. Path A is appropriate.

The max Sharpe ratio portfolio (Sharpe = 1.489, Return = 33.08%, Vol = 22.21%) concentrates almost entirely in two stocks:

| Ticker | Weight |
|---|---|
| JPM | 50.47% |
| GOOGL | 46.49% |
| MSFT | 1.72% |
| AMZN | 0.70% |
| AAPL | 0.61% |

JPM and GOOGL together account for ~97% of the portfolio. This makes sense — JPM offers strong returns with relatively low correlation to the tech stocks, while GOOGL provides high returns with moderate volatility. The near-zero allocations to AAPL, AMZN, and MSFT indicate that, at the margin, shifting weight away from JPM or GOOGL toward these stocks reduces the Sharpe ratio given their correlations and return/vol profiles.
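The expected outcomes also list a pie chart of the weights. A minimal sketch, with the reported max-Sharpe weights hardcoded so the snippet stands alone (in the notebook you would pass `optimal_df['Weight']` instead):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend assumed; drop this line in a notebook
import matplotlib.pyplot as plt

# Max-Sharpe weights reported above, hardcoded for a self-contained example
weights = {"JPM": 0.5047, "GOOGL": 0.4649, "MSFT": 0.0172,
           "AMZN": 0.0070, "AAPL": 0.0061}

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(weights.values(), labels=weights.keys(), autopct="%.1f%%", startangle=90)
ax.set_title("Max Sharpe Portfolio Weights")
fig.savefig("optimal_weights_pie.png", dpi=150)
```

Because the tiny AAPL/AMZN/MSFT slices are hard to read in a pie, a horizontal bar chart of the same weights is often the clearer choice.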

Score cards for glm-5.1 for Portfolio Optimization in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were completed: data download, daily returns and correlation matrix, Monte Carlo efficient frontier, and max Sharpe weights.

  • Execution Correctness: 2/2

    Excellent

    The code shown is coherent and likely runnable, with outputs matching the intended computations and plots.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: a ~501x25 price dataframe, 5x5 correlation matrix/heatmap, efficient frontier scatter, and printed optimal metrics plus weights.

  • Reasoning Quality: 2/2

    Excellent

    The notebook provides clear, correct interpretations of the correlation structure and portfolio results without obvious logical errors.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and reproducible, using a fixed random seed and standard portfolio calculations.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.