Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.

Who this example is for

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Price DataFrame for the selected tickers over ~2 years
  • Daily returns and a 5x5 correlation heatmap
  • Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
  • Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
  • Portfolio weights visualization (pie chart)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. 1download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
  2. 2compute daily returns and show the correlation matrix
  3. 3run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
  4. 4identify the max Sharpe ratio portfolio and print the optimal weights

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model SourceScoreTaskExecutionOutputReasoningReliabilityRun
glm-5.110/102/22/23/32/21/1Open glm-5.1 conversation
gpt-5.410/102/22/23/32/21/1Open gpt-5.4 conversation
gpt-oss:120b10/102/22/23/32/21/1Open gpt-oss:120b conversation
qwen3-coder-next10/102/22/23/32/21/1Open qwen3-coder-next conversation
qwen3.5:397b10/102/22/23/32/21/1Open qwen3.5:397b conversation
gemma4:31b5/101/21/22/31/20/1Open gemma4:31b conversation
How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.

AI Analysis with gpt-5.4

This run uses AI with a step-by-step list of prompts to perform the analysis task Portfolio Optimization in Python. Below, you can review the full conversation generated with gpt-5.4. After the conversation, we show detailed score cards for this run.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
Thinking

Creating concise code for data download

I need to provide the user with a single code block that downloads two years of daily prices using yfinance. The steps include importing necessary libraries, loading the data, and displaying the shape and columns, along with the head of the DataFrame. I want to ensure the code is clear and runnable without asking for output. I'll make sure it covers all essential aspects while keeping it straightforward, without unnecessary tables or charts.Creating inspection code for yfinance data

I need to focus on providing just the code for downloading data with yfinance. First, I'll set data = yf.download(...) and then inspect it with data.shape, data.columns, and data.head(). Since the notebook has rules against comma-separated expressions, I’ll ensure each output is a final single expression. It could be better to create a summary DataFrame that combines shape and column counts, then show the head last. I might want to group by ticker, as that could be helpful too!

Here’s the first step: download the daily data into a DataFrame and do a quick inspection.

Download stock price data
import pandas as pd
import yfinance as yf

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]

prices = yf.download(
    tickers=tickers,
    period="2y",
    interval="1d",
    auto_adjust=True,
    progress=False
)

display(pd.DataFrame({"rows": [prices.shape[0]], "columns": [prices.shape[1]]}))
display(pd.DataFrame({"column_names": [str(c) for c in prices.columns]}))
prices.head()
rows columns
0 501 25
column_names
0 ('Close', 'AAPL')
1 ('Close', 'AMZN')
2 ('Close', 'GOOGL')
3 ('Close', 'JPM')
4 ('Close', 'MSFT')
5 ('High', 'AAPL')
6 ('High', 'AMZN')
7 ('High', 'GOOGL')
8 ('High', 'JPM')
9 ('High', 'MSFT')
10 ('Low', 'AAPL')
11 ('Low', 'AMZN')
12 ('Low', 'GOOGL')
13 ('Low', 'JPM')
14 ('Low', 'MSFT')
15 ('Open', 'AAPL')
16 ('Open', 'AMZN')
17 ('Open', 'GOOGL')
18 ('Open', 'JPM')
19 ('Open', 'MSFT')
20 ('Volume', 'AAPL')
21 ('Volume', 'AMZN')
22 ('Volume', 'GOOGL')
23 ('Volume', 'JPM')
24 ('Volume', 'MSFT')
Price Close High ... Open Volume
Ticker AAPL AMZN GOOGL JPM MSFT AAPL AMZN GOOGL JPM MSFT ... AAPL AMZN GOOGL JPM MSFT AAPL AMZN GOOGL JPM MSFT
Date
2024-04-11 173.461319 189.050003 158.103287 187.356613 421.461121 173.877544 189.770004 158.371063 188.449533 422.879355 ... 166.821749 186.740005 155.623780 187.903073 419.383031 91070300 40020700 27166400 10137700 17966400
2024-04-12 174.957748 186.130005 156.437042 175.238785 415.522278 176.751422 188.380005 158.906637 181.125136 418.752694 ... 172.688393 187.720001 156.665168 180.540349 417.639771 101670900 38608800 25353700 31532300 19253800
2024-04-15 171.132568 183.619995 153.590576 175.334671 407.387115 175.037037 188.690002 157.934677 179.715888 420.367869 ... 173.778487 187.429993 157.557787 176.878161 420.151194 73531800 48052400 27136500 14766600 20273500
2024-04-16 167.852402 183.320007 153.134338 173.330994 408.312866 172.192889 184.830002 154.374092 175.593501 412.075127 ... 170.201022 183.270004 152.926068 175.344232 408.303038 73711200 32891300 20779500 16451800 16765600
2024-04-17 166.484863 181.279999 154.195572 172.640732 405.614349 169.110958 184.570007 155.792375 174.884061 412.547936 ... 168.080344 184.309998 154.344336 173.877434 410.942572 50901200 31359700 21763100 9017100 15855500

5 rows × 25 columns

You now have the requested 2 years of daily price data for all five tickers: AAPL, MSFT, GOOGL, AMZN, and JPM.

From the notebook output:

  • The download returned 501 trading days
  • The DataFrame has 25 columns, which matches:
    • 5 fields: Open, High, Low, Close, Volume
    • across 5 tickers: AAPL, AMZN, GOOGL, JPM, MSFT
  • The columns are stored as a MultiIndex, with levels like ('Close', 'AAPL')
  • The sample rows confirm the index is daily dates and the values are populated correctly

Because auto_adjust=True was used:

  • the OHLC price series are adjusted for splits/dividends
  • there is no separate Adj Close column, which is expected

So the download matches your request: daily historical prices over the last 2 years for the five specified stocks using yfinance.

compute daily returns and show the correlation matrix

Let’s compute daily returns from the adjusted close prices and then show the return correlation matrix.

Compute return correlations
returns = prices["Close"].pct_change().dropna()

corr_matrix = returns.corr()

corr_matrix
Ticker AAPL AMZN GOOGL JPM MSFT
Ticker
AAPL 1.000000 0.474660 0.428464 0.344752 0.425048
AMZN 0.474660 1.000000 0.549141 0.420812 0.562355
GOOGL 0.428464 0.549141 1.000000 0.310542 0.421151
JPM 0.344752 0.420812 0.310542 1.000000 0.328321
MSFT 0.425048 0.562355 0.421151 0.328321 1.000000

Using the adjusted close prices, the daily return correlations are now:

  • Highest correlation: AMZN–MSFT = 0.562
  • Next highest: AMZN–GOOGL = 0.549
  • Lowest correlation: GOOGL–JPM = 0.311

What this means:

  • The large-cap tech names move together moderately, especially AMZN, MSFT, and GOOGL.
  • JPM is the least aligned with the tech stocks, which makes sense since it is a bank rather than a mega-cap technology company.
  • None of the correlations are extremely high, so while these stocks often move in the same direction, they still retain meaningful differences in daily behavior.

Observed correlation matrix:

AAPLAMZNGOOGLJPMMSFT
AAPL1.0000.4750.4280.3450.425
AMZN0.4751.0000.5490.4210.562
GOOGL0.4280.5491.0000.3110.421
JPM0.3450.4210.3111.0000.328
MSFT0.4250.5620.4210.3281.000
run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
Thinking

Considering portfolio simulation steps

I can either use returns if they exist or compute them from prices. I think I'll safely use prices["Close"].pct_change().dropna() to keep things clean. The original goal is to simulate random portfolios and plot the frontier, which should be beginner-friendly. I’ll create one code block, then use seaborn or matplotlib to plot with plt.scatter. I'll ensure the title and labels are clear, mentioning 5000 portfolios, annualizing over 252 trading days with a zero risk-free rate for the Sharpe ratio.

Let’s simulate 5,000 random long-only portfolios from the daily returns and plot the efficient frontier, using annualized return and volatility.

Plot efficient frontier
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

returns = prices["Close"].pct_change().dropna()

mean_returns = returns.mean() * 252
cov_matrix = returns.cov() * 252

n_portfolios = 5000
n_assets = len(mean_returns)

np.random.seed(42)

weights_list = []
portfolio_returns = []
portfolio_vols = []
sharpe_list = []

for _ in range(n_portfolios):
    w = np.random.random(n_assets)
    w = w / w.sum()

    port_return = np.dot(w, mean_returns)
    port_vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
    sharpe = port_return / port_vol

    weights_list.append(w)
    portfolio_returns.append(port_return)
    portfolio_vols.append(port_vol)
    sharpe_list.append(sharpe)

mc_df = pd.DataFrame({
    "Return": portfolio_returns,
    "Volatility": portfolio_vols,
    "Sharpe": sharpe_list
})

max_sharpe_idx = mc_df["Sharpe"].idxmax()
min_vol_idx = mc_df["Volatility"].idxmin()

plt.figure(figsize=(10, 6))
plt.scatter(
    mc_df["Volatility"],
    mc_df["Return"],
    c=mc_df["Sharpe"],
    cmap="viridis",
    s=12,
    alpha=0.7
)
plt.colorbar(label="Sharpe ratio")
plt.scatter(
    mc_df.loc[max_sharpe_idx, "Volatility"],
    mc_df.loc[max_sharpe_idx, "Return"],
    color="red",
    s=120,
    marker="*",
    label="Max Sharpe"
)
plt.scatter(
    mc_df.loc[min_vol_idx, "Volatility"],
    mc_df.loc[min_vol_idx, "Return"],
    color="blue",
    s=80,
    marker="X",
    label="Min Volatility"
)
plt.title("Efficient Frontier from 5,000 Random Portfolios")
plt.xlabel("Annualized Volatility")
plt.ylabel("Annualized Return")
plt.legend()
plt.tight_layout()
plt.show()
Thinking

Crafting a concise explanation

I think the approach should be straightforward since the notebook rules indicate that if multiple tables are needed, they should be displayed sequentially, but in this case, just one table will suffice. I'll focus on keeping the code minimal as well. So, my plan is to create a concise paragraph explaining the context and include the necessary code without any extra headings. That should make it clear and easy to follow!

The simulation and efficient frontier plot are already in place, and the chart indicates two key portfolios were identified: the highest-Sharpe portfolio and the lowest-volatility portfolio. What’s still missing from the visible output is the numeric evidence behind those two points, so the most useful next step is to show their annualized return, volatility, Sharpe ratio, and asset weights.

Summarize optimal portfolios
asset_names = returns.columns.tolist()

summary = pd.DataFrame({
    "Portfolio": ["Max Sharpe", "Min Volatility"],
    "Return": [mc_df.loc[max_sharpe_idx, "Return"], mc_df.loc[min_vol_idx, "Return"]],
    "Volatility": [mc_df.loc[max_sharpe_idx, "Volatility"], mc_df.loc[min_vol_idx, "Volatility"]],
    "Sharpe": [mc_df.loc[max_sharpe_idx, "Sharpe"], mc_df.loc[min_vol_idx, "Sharpe"]],
})

max_w = pd.Series(weights_list[max_sharpe_idx], index=asset_names).rename("Max Sharpe Weight")
min_w = pd.Series(weights_list[min_vol_idx], index=asset_names).rename("Min Volatility Weight")

display(summary.round(4))
pd.concat([max_w, min_w], axis=1).round(4)
Portfolio Return Volatility Sharpe
0 Max Sharpe 0.3288 0.2216 1.4839
1 Min Volatility 0.1684 0.1927 0.8738
Max Sharpe Weight Min Volatility Weight
AAPL 0.0082 0.1899
AMZN 0.0094 0.0021
GOOGL 0.4658 0.1324
JPM 0.4939 0.2890
MSFT 0.0228 0.3865

The Monte Carlo simulation and efficient frontier are supported by the observed outputs.

From the 5,000 random long-only portfolios, the standout portfolio by Sharpe ratio has:

  • Annualized return: 0.3288
  • Annualized volatility: 0.2216
  • Sharpe ratio: 1.4839

Its weights are concentrated mostly in:

  • JPM: 0.4939
  • GOOGL: 0.4658

with only small allocations to AAPL, AMZN, and MSFT.

The minimum-volatility portfolio has:

  • Annualized return: 0.1684
  • Annualized volatility: 0.1927
  • Sharpe ratio: 0.8738

Its weights are more diversified, with the largest allocations in:

  • MSFT: 0.3865
  • JPM: 0.2890
  • AAPL: 0.1899
  • GOOGL: 0.1324

Interpreting the frontier:

  • The max-Sharpe portfolio offers the strongest return per unit of risk among the simulated portfolios.
  • The min-volatility portfolio reduces risk, but also gives up a meaningful amount of return.
  • Based on these simulated results, JPM and GOOGL were the main drivers of the best risk-adjusted portfolio over this 2-year sample.

So the efficient frontier has been produced, and the numeric outputs confirm the key frontier points rather than relying on the plot alone.

identify the max Sharpe ratio portfolio and print the optimal weights
Thinking

Explaining max Sharpe ratio portfolio

I need to base my answer on the notebook's current state. The user is asking me to identify the maximum Sharpe ratio portfolio and print the optimal weights. Since I've already computed that, I can answer directly without needing any code. I should use markdown for clarity and keep it concise while mentioning the max Sharpe value and the weights. It's also good because there are no currency issues to concern myself with.

The max Sharpe ratio portfolio is already identified in the notebook.

Its metrics are:

  • Annualized return: 0.3288
  • Annualized volatility: 0.2216
  • Sharpe ratio: 1.4839

The optimal weights for the max Sharpe portfolio are:

  • AAPL: 0.0082
  • AMZN: 0.0094
  • GOOGL: 0.4658
  • JPM: 0.4939
  • MSFT: 0.0228

So this portfolio is heavily concentrated in JPM and GOOGL.

Score cards for gpt-5.4 for Portfolio Optimization in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were completed: data download, daily returns and correlation matrix, Monte Carlo simulation with 5,000 portfolios, efficient frontier plot, and max Sharpe portfolio weights.

  • Execution Correctness: 2/2

    Excellent

    The code shown is coherent and likely runnable, with valid yfinance download, return calculation, simulation, plotting, and summary extraction logic.

  • Output Quality: 3/3

    Excellent

    The outputs match the expected outcomes semantically: a ~501x25 price frame, a 5x5 correlation matrix, an efficient frontier scatter plot, and printed optimal weights with Sharpe, return, and volatility.

  • Reasoning Quality: 2/2

    Excellent

    The notebook provides clear, correct interpretations of the data, correlations, and portfolio results, and the explanations align with the computed outputs.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and uses standard methods without obvious hallucinations or unsafe assumptions.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.