Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.
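The synthetic-data fallback mentioned above can be sketched with NumPy's multivariate normal sampler. Everything here is an illustrative assumption (drift, volatility, and a flat 0.4 pairwise correlation), not an estimate from real data:

```python
import numpy as np

# Hypothetical fallback: generate correlated synthetic daily returns when
# live price data is unavailable. All parameter values are assumptions.
rng = np.random.default_rng(0)
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]
n_days = 504  # roughly 2 years of trading days

daily_mu = np.full(5, 0.0005)     # ~12.6% annualized drift
daily_sigma = np.full(5, 0.015)   # ~24% annualized volatility
corr = np.full((5, 5), 0.4)       # moderate assumed pairwise correlation
np.fill_diagonal(corr, 1.0)
cov = np.outer(daily_sigma, daily_sigma) * corr  # covariance from corr and vols

# One row per day, one column per ticker
returns = rng.multivariate_normal(daily_mu, cov, size=n_days)
print(returns.shape)  # (504, 5)
```

The resulting array can be wrapped in a `pandas.DataFrame` with `tickers` as columns and fed into the same correlation and Monte Carlo steps as the downloaded data.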

Who this example is for

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Price DataFrame for the selected tickers over ~2 years
  • Daily returns and a 5x5 correlation heatmap
  • Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
  • Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
  • Portfolio weights visualization (pie chart)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
  2. compute daily returns and show the correlation matrix
  3. run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
  4. identify the max Sharpe ratio portfolio and print the optimal weights

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to review the full conversation and notebook results.

| Model | Score | Task | Execution | Output | Reasoning | Reliability | Run |
|---|---|---|---|---|---|---|---|
| glm-5.1 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open glm-5.1 conversation |
| gpt-5.4 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-5.4 conversation |
| gpt-oss:120b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-oss:120b conversation |
| qwen3-coder-next | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open qwen3-coder-next conversation |
| qwen3.5:397b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open qwen3.5:397b conversation |
| gemma4:31b | 5/10 | 1/2 | 1/2 | 2/3 | 1/2 | 0/1 | Open gemma4:31b conversation |

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
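The scoring arithmetic described above is a plain sum of the five dimensions; a minimal sketch (with hypothetical scores for one run):

```python
# Maximum per dimension: 2 + 2 + 3 + 2 + 1 = 10
scores = {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1}
total = sum(scores.values())
print(f"{total}/10")  # 10/10
```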

AI Analysis with glm-5.1

This run uses AI with a step-by-step list of prompts to perform the analysis task Portfolio Optimization in Python. Below, you can review the full conversation generated with glm-5.1. After the conversation, we show detailed score cards for this run.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
Thinking

The user wants to download 2 years of daily prices for 5 tickers using yfinance.

Download stock data
import yfinance as yf
import datetime

end = datetime.date.today()
start = end - datetime.timedelta(days=2*365)

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]
data = yf.download(tickers, start=start, end=end)

data.shape
[*********************100%***********************]  5 of 5 completed
(501, 25)

Run this and we'll inspect the downloaded data in the next step.

Thinking

The data has been downloaded successfully — 501 rows × 25 columns covering 5 tickers with 5 price fields each (Close, High, Low, Open, Volume). Let me inspect the data briefly to confirm it looks good.

The download completed successfully — 501 trading days across 5 tickers with Open, High, Low, Close, and Volume columns (5 × 5 = 25 columns). Let me do a quick sanity check on the date range and a peek at closing prices.

Print date range
print("Date range:", data.index.min().date(), "to", data.index.max().date())
data.tail()
Date range: 2024-04-11 to 2026-04-10
Price Close High ... Open Volume
Ticker AAPL AMZN GOOGL JPM MSFT AAPL AMZN GOOGL JPM MSFT ... AAPL AMZN GOOGL JPM MSFT AAPL AMZN GOOGL JPM MSFT
Date
2026-04-06 258.859985 212.789993 299.989990 295.450012 372.880005 262.160004 212.949997 300.619995 296.829987 373.730011 ... 256.510010 209.800003 295.869995 294.059998 373.489990 29329900 25385500 16945500 7142600 16146600
2026-04-07 253.500000 213.770004 305.459991 297.399994 372.290009 256.200012 213.970001 305.630005 298.179993 372.450012 ... 256.160004 211.240005 302.730011 293.140015 370.339996 62148000 28154600 23205400 7849100 21443300
2026-04-08 258.899994 221.250000 317.320007 307.970001 374.329987 259.750000 226.100006 322.079987 311.260010 385.000000 ... 258.450012 223.789993 320.450012 307.970001 384.980011 41032800 51092700 33547100 10590200 33064800
2026-04-09 260.489990 233.649994 318.489990 310.329987 373.070007 261.119995 233.800003 319.540009 311.260010 373.500000 ... 259.000000 225.229996 315.910004 306.950012 372.500000 28121600 65964200 23739200 6586600 30435300
2026-04-10 260.480011 238.380005 317.239990 309.869995 370.869995 262.190002 240.429993 321.829987 310.350006 375.640015 ... 259.980011 235.300003 320.019989 310.000000 372.980011 31259500 56842200 19127000 6067800 28038000

5 rows × 25 columns

Thinking

The user asked to download 2 years of daily prices for those 5 tickers. The download is complete with 501 trading days (roughly 2 years), covering 2024-04-11 to 2026-04-10, with all 5 tickers and OHLCV data (25 columns). The original goal is fully satisfied — the data has been downloaded and inspected.

The download is complete and fully covers the requested scope:

  • Tickers: AAPL, MSFT, GOOGL, AMZN, JPM — all 5 present
  • Date range: 2024-04-11 to 2026-04-10 (~2 years)
  • Rows: 501 trading days
  • Columns: Open, High, Low, Close, Volume for each ticker (25 columns total)

The DataFrame data is ready in the notebook with all requested daily prices. As a quick snapshot, the latest closing prices (2026-04-10) are: AAPL ≈ 260.48 USD, AMZN ≈ 238.38 USD, GOOGL ≈ 317.24 USD, JPM ≈ 309.87 USD, MSFT ≈ 370.87 USD.

compute daily returns and show the correlation matrix
Thinking

The user wants to compute daily returns from the closing prices and show the correlation matrix. I'll extract the Close prices, compute daily percentage returns, and then show the correlation matrix.

I'll extract the closing prices, compute daily percentage returns, and display the correlation matrix.

Plot returns correlation heatmap
import seaborn as sns
import matplotlib.pyplot as plt

# Extract closing prices and compute daily returns
close_prices = data['Close']
daily_returns = close_prices.pct_change().dropna()

# Correlation matrix
corr = daily_returns.corr()

# Plot heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of Daily Returns')
plt.tight_layout()
plt.show()

corr
Ticker AAPL AMZN GOOGL JPM MSFT
Ticker
AAPL 1.000000 0.474660 0.428464 0.344752 0.425048
AMZN 0.474660 1.000000 0.549132 0.420789 0.562355
GOOGL 0.428464 0.549132 1.000000 0.310548 0.421153
JPM 0.344752 0.420789 0.310548 1.000000 0.328329
MSFT 0.425048 0.562355 0.421153 0.328329 1.000000
Thinking

The user's goal was to compute daily returns and show the correlation matrix. The notebook outputs already contain both the heatmap visualization and the numeric correlation matrix. The output shows the correlation values clearly. This is sufficient to answer the goal.

Let me extract insights from the correlation matrix.

Key findings:

  • Highest correlation: AMZN-MSFT at 0.56, followed by AMZN-GOOGL at 0.55
  • Lowest correlation: JPM-GOOGL at 0.31, followed by JPM-MSFT at 0.33
  • JPM (the bank) has consistently lower correlations with the tech stocks, which makes sense sectorally
  • All correlations are positive, ranging from ~0.31 to ~0.56
  • The tech stocks (AAPL, AMZN, GOOGL, MSFT) are moderately correlated with each other (0.42-0.56)

The daily returns and correlation matrix are complete. Here are the key insights from the correlation matrix:

Strongest correlations:

  • AMZN ↔ MSFT: 0.56 — the most tightly linked pair
  • AMZN ↔ GOOGL: 0.55 — second strongest

Weakest correlations:

  • JPM ↔ GOOGL: 0.31 — the least correlated pair
  • JPM ↔ MSFT: 0.33 — nearly as weak

Takeaway: JPM (the only financial stock in the group) consistently shows lower correlation with the four tech stocks (0.31–0.42), which makes intuitive sense — banks and big tech are driven by different market forces. Among the tech names, correlations are moderate (0.42–0.56), meaning they tend to move together but still offer meaningful diversification. No pair exceeds 0.56, so even the most correlated duo (AMZN-MSFT) isn't so tightly bound that holding both offers no benefit.
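The diversification benefit of a low correlation can be made concrete with the two-asset volatility formula, σp = √(w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂). A minimal sketch, assuming two stocks with 25% annual volatility each at 50/50 weights (σ is an assumed value; the 0.31 mirrors the JPM-GOOGL correlation above):

```python
import numpy as np

sigma = 0.25              # assumed annual volatility for both assets
w = np.array([0.5, 0.5])  # equal weights

def portfolio_vol(rho):
    # Build the 2x2 covariance matrix for a given correlation rho
    cov = np.array([[sigma**2, rho * sigma**2],
                    [rho * sigma**2, sigma**2]])
    return float(np.sqrt(w @ cov @ w))

print(f"rho=1.00 -> vol={portfolio_vol(1.0):.1%}")   # 25.0% (no diversification)
print(f"rho=0.31 -> vol={portfolio_vol(0.31):.1%}")  # ~20.2%
```

Holding the JPM-GOOGL-like pair instead of two perfectly correlated stocks cuts portfolio volatility by almost five percentage points at the same weights.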

run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
Thinking

The user wants to run a Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier. I have the daily_returns DataFrame already computed. I need to:

  1. Calculate mean returns and covariance matrix
  2. Generate 5000 random portfolios with random weights
  3. For each portfolio, compute expected return and volatility
  4. Plot the efficient frontier (scatter plot of return vs volatility)

Let me write the code for this.

I'll use the already-computed daily_returns to calculate annualized returns and covariance, then simulate 5000 random portfolios and plot the efficient frontier.

Plot efficient frontier
import numpy as np

# Annualized mean returns and covariance matrix
mean_returns = daily_returns.mean() * 252
cov_matrix = daily_returns.cov() * 252
n_assets = len(mean_returns)

# Monte Carlo: 5000 random portfolios
num_portfolios = 5000
results = np.zeros((3, num_portfolios))  # row 0: return, row 1: volatility, row 2: Sharpe

np.random.seed(42)
for i in range(num_portfolios):
    weights = np.random.dirichlet(np.ones(n_assets))
    port_return = np.dot(weights, mean_returns)
    port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    results[0, i] = port_return
    results[1, i] = port_vol
    results[2, i] = port_return / port_vol  # Sharpe ratio (rf=0)

# Plot efficient frontier
plt.figure(figsize=(10, 7))
sc = plt.scatter(results[1, :], results[0, :], c=results[2, :], cmap='viridis', alpha=0.6, s=8)
plt.colorbar(sc, label='Sharpe Ratio')

# Highlight max Sharpe and min volatility portfolios
max_sharpe_idx = np.argmax(results[2, :])
min_vol_idx = np.argmin(results[1, :])
plt.scatter(results[1, max_sharpe_idx], results[0, max_sharpe_idx], color='red', marker='*', s=200, label='Max Sharpe')
plt.scatter(results[1, min_vol_idx], results[0, min_vol_idx], color='blue', marker='*', s=200, label='Min Volatility')

plt.xlabel('Annualized Volatility')
plt.ylabel('Annualized Return')
plt.title('Efficient Frontier — 5000 Random Portfolios')
plt.legend()
plt.tight_layout()
plt.show()

print(f"Max Sharpe  → Return: {results[0, max_sharpe_idx]:.2%}, Vol: {results[1, max_sharpe_idx]:.2%}, Sharpe: {results[2, max_sharpe_idx]:.3f}")
print(f"Min Vol     → Return: {results[0, min_vol_idx]:.2%}, Vol: {results[1, min_vol_idx]:.2%}, Sharpe: {results[2, min_vol_idx]:.3f}")
Max Sharpe  → Return: 33.08%, Vol: 22.21%, Sharpe: 1.489
Min Vol     → Return: 17.24%, Vol: 19.29%, Sharpe: 0.894
Thinking

The user asked to run a Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier. Looking at the notebook state, cell-4 has already been executed with outputs showing:

  1. A scatter plot of the efficient frontier with 5000 random portfolios
  2. Color-coded by Sharpe ratio
  3. Max Sharpe and Min Volatility portfolios highlighted
  4. Numeric results printed:
    • Max Sharpe → Return: 33.08%, Vol: 22.21%, Sharpe: 1.489
    • Min Vol → Return: 17.24%, Vol: 19.29%, Sharpe: 0.894

The original goal is fully answered - the Monte Carlo simulation was run, the efficient frontier was plotted, and key portfolio metrics were displayed. This is a Path A situation.

The Monte Carlo simulation with 5,000 random portfolios is complete, and the efficient frontier has been plotted. Here's what the results show:

Max Sharpe Ratio Portfolio (red star):

  • Annualized Return: 33.08%
  • Annualized Volatility: 22.21%
  • Sharpe Ratio: 1.489

Min Volatility Portfolio (blue star):

  • Annualized Return: 17.24%
  • Annualized Volatility: 19.29%
  • Sharpe Ratio: 0.894

The frontier shows a clear upward-sloping risk-return relationship. The best risk-adjusted return (Sharpe 1.489) achieves roughly double the return of the minimum-volatility portfolio (33% vs 17%) with only about 3 percentage points more volatility (22.2% vs 19.3%). This suggests that within this stock universe, taking on modestly more risk is well-rewarded — the Sharpe-optimal portfolio is much more efficient per unit of risk than the conservative minimum-volatility allocation.

The color gradient (Sharpe ratio) confirms that the upper-left region of the frontier offers the highest risk-adjusted returns, while portfolios in the lower-right (high volatility, low return) are clearly suboptimal.
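One caveat on the Sharpe figures: the simulation sets the risk-free rate to zero. With a nonzero rate, the Sharpe ratio is (return − rf) / volatility. A minimal sketch using the reported max-Sharpe figures and an assumed 4% risk-free rate:

```python
# Reported max-Sharpe portfolio metrics; rf = 4% is an assumed value
ret, vol, rf = 0.3308, 0.2221, 0.04
sharpe = (ret - rf) / vol
print(f"Sharpe with rf=4%: {sharpe:.3f}")  # 1.309
```

The ranking of portfolios can shift slightly once rf > 0, so the choice of risk-free rate is worth stating explicitly in a real analysis.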

identify the max Sharpe ratio portfolio and print the optimal weights
Thinking

The user wants to extract the weights of the max Sharpe ratio portfolio. I need to re-run the simulation (or recover the weights) since the current code only stores returns, volatility, and Sharpe in the results array but not the weights. I'll need to re-run and store the weights, then display the optimal allocation for the max Sharpe portfolio.

Let me modify the code to also store weights, then identify the max Sharpe portfolio and print its weights.

The current code didn't store the individual portfolio weights, only the return/vol/Sharpe metrics. I'll re-run the simulation this time saving all weights, then display the max Sharpe portfolio allocation.

Optimize portfolio weights
np.random.seed(42)

# Store weights alongside metrics
all_weights = np.zeros((num_portfolios, n_assets))

for i in range(num_portfolios):
    weights = np.random.dirichlet(np.ones(n_assets))
    all_weights[i] = weights
    port_return = np.dot(weights, mean_returns)
    port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    results[0, i] = port_return
    results[1, i] = port_vol
    results[2, i] = port_return / port_vol

max_sharpe_idx = np.argmax(results[2, :])
optimal_weights = all_weights[max_sharpe_idx]

# Build a clean DataFrame for display
import pandas as pd
optimal_df = pd.DataFrame({
    'Ticker': mean_returns.index,
    'Weight': optimal_weights
}).sort_values('Weight', ascending=False)

optimal_df
Ticker Weight
3 JPM 0.504740
2 GOOGL 0.464931
4 MSFT 0.017195
1 AMZN 0.007002
0 AAPL 0.006132
Thinking

The user's goal was to identify the max Sharpe ratio portfolio and print the optimal weights. The latest output (cell-5) already shows exactly this information - a DataFrame with the optimal weights sorted by weight. The output clearly shows:

  • JPM: 50.47%
  • GOOGL: 46.49%
  • MSFT: 1.72%
  • AMZN: 0.70%
  • AAPL: 0.61%

This fully answers the user's goal. Path A is appropriate.

The max Sharpe ratio portfolio (Sharpe = 1.489, Return = 33.08%, Vol = 22.21%) concentrates almost entirely in two stocks:

| Ticker | Weight |
|---|---|
| JPM | 50.47% |
| GOOGL | 46.49% |
| MSFT | 1.72% |
| AMZN | 0.70% |
| AAPL | 0.61% |

JPM and GOOGL together account for ~97% of the portfolio. This makes sense — JPM offers strong returns with relatively low correlation to the tech stocks, while GOOGL provides high returns with moderate volatility. The near-zero allocations to AAPL, AMZN, and MSFT indicate that, at the margin, shifting weight away from JPM or GOOGL toward these stocks reduces the Sharpe ratio given their correlations and return/vol profiles.
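The expected outcomes also list a pie chart of the weights. A minimal sketch, with the reported max-Sharpe weights hardcoded so the snippet stands alone (in the notebook you would pass `optimal_df['Weight']` instead):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend assumed; drop this line in a notebook
import matplotlib.pyplot as plt

# Max-Sharpe weights reported above, hardcoded for a self-contained example
weights = {"JPM": 0.5047, "GOOGL": 0.4649, "MSFT": 0.0172,
           "AMZN": 0.0070, "AAPL": 0.0061}

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(weights.values(), labels=weights.keys(), autopct="%.1f%%", startangle=90)
ax.set_title("Max Sharpe Portfolio Weights")
fig.savefig("optimal_weights_pie.png", dpi=150)
```

Because the tiny AAPL/AMZN/MSFT slices are hard to read in a pie, a horizontal bar chart of the same weights is often the clearer choice.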

Score cards for glm-5.1 for Portfolio Optimization in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were completed: data download, daily returns and correlation matrix, Monte Carlo efficient frontier, and max Sharpe weights.

  • Execution Correctness: 2/2

    Excellent

    The code shown is coherent and likely runnable, with outputs matching the intended computations and plots.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: a ~501x25 price dataframe, 5x5 correlation matrix/heatmap, efficient frontier scatter, and printed optimal metrics plus weights.

  • Reasoning Quality: 2/2

    Excellent

    The notebook provides clear, correct interpretations of the correlation structure and portfolio results without obvious logical errors.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and reproducible, using a fixed random seed and standard portfolio calculations.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.