Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.
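
The synthetic fallback mentioned above could look roughly like the sketch below. It draws jointly normal daily returns with a common pairwise correlation. The volatility, drift, correlation level, seed, and start date are all illustrative assumptions, not values taken from any run on this page.

```python
import numpy as np
import pandas as pd

# Tickers come from the workflow; every numeric parameter below is an assumption.
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]
n_days = 504                      # roughly two years of trading days
daily_vol = 0.015                 # assumed daily volatility for every asset
daily_drift = 0.0005              # assumed mean daily return
rho = 0.4                         # assumed common pairwise correlation

# Build a valid correlation matrix and scale it into a covariance matrix.
corr = np.full((len(tickers), len(tickers)), rho)
np.fill_diagonal(corr, 1.0)
cov = corr * daily_vol**2

rng = np.random.default_rng(42)   # fixed seed so the sketch is reproducible
synthetic = rng.multivariate_normal(
    mean=np.full(len(tickers), daily_drift), cov=cov, size=n_days
)
dates = pd.bdate_range("2024-01-01", periods=n_days)
synthetic_returns = pd.DataFrame(synthetic, index=dates, columns=tickers)

print(synthetic_returns.corr().round(2))
```

A DataFrame shaped like this can stand in for the daily returns in the later steps when the download is unavailable.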

Who this example is for

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Price DataFrame for the selected tickers over ~2 years
  • Daily returns and a 5x5 correlation heatmap
  • Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
  • Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
  • Portfolio weights visualization (pie chart)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
  2. compute daily returns and show the correlation matrix
  3. run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
  4. identify the max Sharpe ratio portfolio and print the optimal weights

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source       Score   Task   Execution   Output   Reasoning   Reliability   Run
glm-5.1            10/10   2/2    2/2         3/3      2/2         1/1           Open glm-5.1 conversation
gpt-5.4            10/10   2/2    2/2         3/3      2/2         1/1           Open gpt-5.4 conversation
gpt-oss:120b       10/10   2/2    2/2         3/3      2/2         1/1           Open gpt-oss:120b conversation
qwen3-coder-next   10/10   2/2    2/2         3/3      2/2         1/1           Open qwen3-coder-next conversation
qwen3.5:397b       10/10   2/2    2/2         3/3      2/2         1/1           Open qwen3.5:397b conversation
gemma4:31b         5/10    1/2    1/2         2/3      1/2         0/1           Open gemma4:31b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
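
As a small worked example of that sum, the sketch below adds up the five dimension scores for the gemma4:31b row. The dictionary keys and the `total_score` helper are hypothetical names used only for illustration.

```python
# Maximum points per dimension, as listed above.
MAX_POINTS = {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1}

def total_score(scores: dict) -> int:
    """Sum the five dimension scores after checking each against its maximum."""
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} score {value} is outside 0-{MAX_POINTS[name]}")
    return sum(scores.values())

# Dimension scores for the gemma4:31b row of the comparison table.
gemma = {"task": 1, "execution": 1, "output": 2, "reasoning": 1, "reliability": 0}
print(total_score(gemma))  # 5
```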

AI Analysis with gpt-oss:120b

This run uses AI with a step-by-step list of prompts to perform the analysis task Portfolio Optimization in Python. Below, you can review the full conversation generated with gpt-oss:120b. After the conversation, we show detailed score cards for this run.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
Thinking

We need to download 2 years of daily prices for those tickers using yfinance. Provide code that loads data, shows shape, columns, head. Since we need to fetch, we can use yfinance download with start/end. We'll compute end = today, start = today - 2 years. Use pandas. Provide code block that does that and ends with displaying head, shape, columns. Use variable like df. Provide code.

Here’s a quick fetch of the last 2 years of daily OHLCV data for the five symbols. The code loads the data into a DataFrame prices, then shows its basic shape, column names, and the first few rows.

Download stock price data
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta

# Define tickers and date range (2 years back from today)
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]
end_date = datetime.today()
start_date = end_date - timedelta(days=2 * 365)

# Download daily price data
prices = yf.download(
    tickers,
    start=start_date.strftime("%Y-%m-%d"),
    end=end_date.strftime("%Y-%m-%d"),
    interval="1d",
    group_by="ticker",   # keep separate columns per ticker
    auto_adjust=False,   # keep raw OHLCV; set to True if you prefer adjusted close
)

# Quick overview
prices.shape, prices.columns[:10], prices.head()
[*********************100%***********************]  5 of 5 completed
((499, 30),
 MultiIndex([( 'AMZN',      'Open'),
             ( 'AMZN',      'High'),
             ( 'AMZN',       'Low'),
             ( 'AMZN',     'Close'),
             ( 'AMZN', 'Adj Close'),
             ( 'AMZN',    'Volume'),
             ('GOOGL',      'Open'),
             ('GOOGL',      'High'),
             ('GOOGL',       'Low'),
             ('GOOGL',     'Close')],
            names=['Ticker', 'Price']),
 Ticker            AMZN                                                  \
 Price             Open        High         Low       Close   Adj Close   
 Date                                                                     
 2024-04-15  187.429993  188.690002  183.000000  183.619995  183.619995   
 2024-04-16  183.270004  184.830002  182.259995  183.320007  183.320007   
 2024-04-17  184.309998  184.570007  179.820007  181.279999  181.279999   
 2024-04-18  181.470001  182.389999  178.649994  179.220001  179.220001   
 2024-04-19  178.740005  179.000000  173.440002  174.630005  174.630005   
 
 Ticker                     GOOGL                                      ...  \
 Price         Volume        Open        High         Low       Close  ...   
 Date                                                                  ...   
 2024-04-15  48052400  158.860001  159.240005  154.589996  154.860001  ...   
 2024-04-16  32891300  154.190002  155.649994  153.429993  154.399994  ...   
 2024-04-17  31359700  155.619995  157.080002  154.580002  155.470001  ...   
 2024-04-18  30723800  155.339996  156.940002  154.619995  156.009995  ...   
 2024-04-19  56000700  156.199997  156.360001  152.300003  154.089996  ...   
 
 Ticker             JPM                                          AAPL  \
 Price              Low       Close   Adj Close    Volume        Open   
 Date                                                                   
 2024-04-15  182.199997  182.889999  175.334671  14766600  175.360001   
 2024-04-16  179.649994  180.800003  173.331009  16451800  171.750000   
 2024-04-17  179.199997  180.080002  172.640732   9017100  169.610001   
 2024-04-18  179.970001  181.250000  173.762405   9557700  168.029999   
 2024-04-19  181.410004  185.800003  178.124451  13402300  166.210007   
 
 Ticker                                                                
 Price             High         Low       Close   Adj Close    Volume  
 Date                                                                  
 2024-04-15  176.630005  172.500000  172.690002  171.132538  73531800  
 2024-04-16  173.759995  168.270004  169.380005  167.852417  73711200  
 2024-04-17  170.649994  168.000000  168.000000  166.484833  50901200  
 2024-04-18  168.639999  166.550003  167.039993  165.533508  43122900  
 2024-04-19  166.399994  164.080002  165.000000  163.511902  68149400  
 
 [5 rows x 30 columns])
Thinking

We need to decide if the current outputs answer the original user request: "download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance". The user asked to download. The code executed and the output shows the shape (499,30), column MultiIndex preview, and head rows. That is sufficient to confirm that the download succeeded and provides the data.

Thus we can provide final answer summarizing that we have the DataFrame prices with 2 years of daily OHLCV data for the five tickers, shape, etc. No further code needed. Provide final answer.

The download succeeded. prices is a DataFrame with 499 rows (daily records for the past two years) and 30 columns (Open, High, Low, Close, Adj Close, and Volume for each of the five tickers). The MultiIndex columns are organized as (Ticker, Price), e.g., ('AAPL', 'Open'), ('MSFT', 'Close'), etc. The first few rows show the expected daily price values for each symbol. You now have the full two‑year daily price series for AAPL, MSFT, GOOGL, AMZN, and JPM ready for further analysis.
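
To illustrate how a (Ticker, Price) column MultiIndex like this one can be sliced, here is a minimal sketch on a small stand-in frame. The `demo` DataFrame and its values are made up. Only the column layout mirrors the `prices` frame described above.

```python
import numpy as np
import pandas as pd

# Small stand-in for `prices`: same (Ticker, Price) column layout, made-up values.
tickers = ["AAPL", "MSFT"]
fields = ["Open", "High", "Low", "Close", "Adj Close", "Volume"]
columns = pd.MultiIndex.from_product([tickers, fields], names=["Ticker", "Price"])
dates = pd.bdate_range("2024-04-15", periods=3, name="Date")
demo = pd.DataFrame(
    np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), -1),
    index=dates,
    columns=columns,
)

# All fields for one ticker: index the outer level.
aapl = demo["AAPL"]

# One field for every ticker: take a cross-section on the inner level.
adj_close = demo.xs("Adj Close", level="Price", axis=1)

# A single series: pass the full (Ticker, Price) tuple.
msft_close = demo[("MSFT", "Close")]

print(adj_close)
```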

compute daily returns and show the correlation matrix
Thinking

We need to compute daily returns for each ticker, likely using Adj Close. Compute pct_change per day for each ticker's Adj Close, then produce correlation matrix (5x5). Show as a DataFrame. Possibly also visualize heatmap. But per instruction, one step only: compute returns and show correlation matrix. Provide code that calculates returns and correlation, then display. End with expression that shows correlation matrix. Let's do that.

Here are the daily (percentage) returns based on the Adj Close price for each ticker, along with the resulting correlation matrix.

Compute correlation matrix
# Compute daily pct_change for Adj Close of each ticker
adj_close = prices.xs('Adj Close', level='Price', axis=1)      # shape: (dates, 5)
daily_ret = adj_close.pct_change().dropna()                  # remove first NaN row

# Correlation matrix across the five assets
corr_matrix = daily_ret.corr()

corr_matrix
Ticker AMZN GOOGL MSFT JPM AAPL
Ticker
AMZN 1.000000 0.548094 0.561130 0.421836 0.474951
GOOGL 0.548094 1.000000 0.418972 0.310945 0.427759
MSFT 0.561130 0.418972 1.000000 0.327022 0.424383
JPM 0.421836 0.310945 0.327022 1.000000 0.354816
AAPL 0.474951 0.427759 0.424383 0.354816 1.000000
Thinking

The user asked: compute daily returns and show the correlation matrix. The notebook already computed daily_ret and corr_matrix, and printed corr_matrix output. So the goal is satisfied. Need to provide final answer with insights, no code. Include final answer marker.

The daily percentage returns based on the adjusted‑close prices have been calculated for each of the five stocks.

The resulting correlation matrix (rounded to three decimals) is:

        AMZN    GOOGL   MSFT    JPM     AAPL
AMZN    1.000   0.548   0.561   0.422   0.475
GOOGL   0.548   1.000   0.419   0.311   0.428
MSFT    0.561   0.419   1.000   0.327   0.424
JPM     0.422   0.311   0.327   1.000   0.355
AAPL    0.475   0.428   0.424   0.355   1.000

Interpretation

  • All pairs have positive correlations, indicating that the stocks generally move in the same direction.
  • The strongest relationship is between AMZN and MSFT (≈ 0.56), while JPM shows the weakest links to the tech stocks (correlations around 0.31–0.42).
  • The tech giants (AAPL, MSFT, GOOGL, AMZN) are moderately correlated with each other (≈ 0.42–0.56), reflecting shared market dynamics.

These results fulfill the request to compute daily returns and present their correlation matrix.
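
The strongest and weakest pairs named in the interpretation can be checked programmatically. The sketch below re-enters the printed correlation values by hand and ranks each unique pair. It is one possible way to do this, not part of the original run.

```python
import numpy as np
import pandas as pd

tickers = ["AMZN", "GOOGL", "MSFT", "JPM", "AAPL"]
# Values copied from the correlation matrix printed above.
corr_matrix = pd.DataFrame(
    [
        [1.000000, 0.548094, 0.561130, 0.421836, 0.474951],
        [0.548094, 1.000000, 0.418972, 0.310945, 0.427759],
        [0.561130, 0.418972, 1.000000, 0.327022, 0.424383],
        [0.421836, 0.310945, 0.327022, 1.000000, 0.354816],
        [0.474951, 0.427759, 0.424383, 0.354816, 1.000000],
    ],
    index=tickers,
    columns=tickers,
)

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
upper = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
pairs = corr_matrix.where(upper).stack().dropna().sort_values(ascending=False)

print(pairs.head(1))   # most correlated pair
print(pairs.tail(1))   # least correlated pair
```

With these inputs the top pair is AMZN and MSFT, and the bottom pair is GOOGL and JPM.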

run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
Thinking

We need to run Monte Carlo simulation of random portfolios using daily returns data. Steps: compute mean returns, cov matrix, generate random weights for 5000 portfolios, calculate annualized return, volatility, Sharpe ratio (assuming risk-free rate 0). Then plot efficient frontier scatter of volatility vs return, color by Sharpe. Use seaborn/matplotlib.

We have daily_ret DataFrame (498 rows). Compute mean daily return, cov. Annualization factor = 252 trading days.

Let's propose code to generate portfolio metrics and plot.

We need to output code block only, final expression maybe display plot using matplotlib.

Will implement.

Below we’ll (1) generate 5 000 random weight combinations, (2) compute each portfolio’s annualised mean return, volatility and Sharpe ratio, and (3) plot the efficient‑frontier scatter (volatility ↔ return) coloured by Sharpe.

Plot efficient frontier
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# --- 1. Prepare inputs ----------------------------------------------------
mu_daily = daily_ret.mean()
cov_daily = daily_ret.cov()

trading_days = 252
mu_annual = mu_daily * trading_days
cov_annual = cov_daily * trading_days

tickers = mu_daily.index.tolist()
n_assets = len(tickers)

# --- 2. Monte‑Carlo simulation --------------------------------------------
n_portfolios = 5_000
rng = np.random.default_rng()

# Random weights that sum to 1
weights = rng.random((n_portfolios, n_assets))
weights = weights / weights.sum(axis=1, keepdims=True)

# Portfolio statistics
port_ret = weights @ mu_annual.values                     # expected annual return
# Volatility of each portfolio: sqrt(w @ cov @ w), evaluated row-wise via einsum
port_vol = np.sqrt(np.einsum('ij,ij->i', weights @ cov_annual.values, weights))
port_sharpe = port_ret / port_vol                         # risk‑free rate = 0

# --- 3. Plot efficient frontier --------------------------------------------
plt.figure(figsize=(10, 6))
scatter = plt.scatter(port_vol, port_ret, c=port_sharpe, cmap='viridis',
                     alpha=0.6, edgecolor='k')
plt.title('Monte‑Carlo Efficient Frontier (5 000 Random Portfolios)')
plt.xlabel('Annualised Volatility (σ)')
plt.ylabel('Annualised Expected Return (µ)')
cbar = plt.colorbar(scatter)
cbar.set_label('Sharpe Ratio')
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
Thinking

The user wants to "run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier". The notebook cell 3 already executed, producing a figure. So the goal is accomplished: simulation done, plot produced. Need to provide final answer summarizing insights. According to path decision, we can choose Path A because output includes the plot. Should include final answer with insights, without code. Provide concise interpretation.

The Monte‑Carlo simulation generated 5 000 random portfolios, each with a set of weights that sum to 1. For every portfolio we computed the annualised expected return, annualised volatility, and Sharpe ratio (assuming a risk‑free rate of 0).
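
For reference, the quantities named above can be written out as follows. These are the standard mean-variance definitions, with the notebook's choices of 252 trading days and a zero risk-free rate. The symbols $w$, $\mu$, $\Sigma$, and $r_f$ are notation introduced here and do not appear in the run.

```latex
\begin{aligned}
\mu &= 252\,\mu_{\text{daily}}, & \Sigma &= 252\,\Sigma_{\text{daily}} \\
\mu_p &= w^{\top}\mu, & \sigma_p &= \sqrt{w^{\top}\Sigma\,w} \\
S_p &= \frac{\mu_p - r_f}{\sigma_p}, & r_f &= 0, \quad \textstyle\sum_i w_i = 1,\; w_i \ge 0
\end{aligned}
```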

The scatter plot visualises the efficient frontier:

  • X‑axis (volatility σ) – portfolio risk, measured as the annualised standard deviation of returns.
  • Y‑axis (expected return µ) – annualised mean return of the portfolio.
  • Colour – Sharpe ratio; brighter colours indicate higher risk‑adjusted performance.

The cloud of points shows a clear trade‑off between risk and return: portfolios with higher volatility tend to achieve higher expected returns, while those in the lower‑left region offer lower risk but also lower returns. The colour gradient highlights the most efficient portfolios (high Sharpe) typically lying toward the upper‑left side of the frontier. This visualisation fulfills the requested Monte‑Carlo simulation and efficient‑frontier plot.
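
The row-wise einsum used for portfolio volatility in the cell above can be checked against a plain per-portfolio loop. The sketch below does this on a made-up covariance matrix. The seed, the matrix, and the portfolio count are assumptions chosen only to keep the example small and reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
n_portfolios, n_assets = 1_000, 5

# Hypothetical annualised covariance matrix: A @ A.T is symmetric positive semi-definite.
A = rng.normal(scale=0.1, size=(n_assets, n_assets))
cov_annual = A @ A.T

# Random long-only weights, normalised so each row sums to 1 (as in the run above).
weights = rng.random((n_portfolios, n_assets))
weights /= weights.sum(axis=1, keepdims=True)

# Row-wise quadratic form, as in the notebook cell.
vol_einsum = np.sqrt(np.einsum("ij,ij->i", weights @ cov_annual, weights))

# Reference: one portfolio at a time.
vol_loop = np.array([np.sqrt(w @ cov_annual @ w) for w in weights])

print(np.allclose(vol_einsum, vol_loop))  # True
```

Passing a fixed seed to `np.random.default_rng`, as done here, would also make the 5,000-portfolio simulation repeatable from run to run.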

identify the max Sharpe ratio portfolio and print the optimal weights
Thinking

We need to write code to find the portfolio with max Sharpe ratio, and print optimal weights. Use existing variables: port_sharpe, weights, port_ret, port_vol. Should identify index of max, then display weights with ticker names. Provide code block only.

Show best Sharpe portfolio
# Index of portfolio with the highest Sharpe ratio
max_idx = port_sharpe.argmax()

# Corresponding metrics
best_ret   = port_ret[max_idx]
best_vol   = port_vol[max_idx]
best_sharpe = port_sharpe[max_idx]

# Optimal weights (aligned with ticker symbols)
optimal_weights = pd.Series(weights[max_idx], index=tickers)

print("Maximum Sharpe Ratio Portfolio:")
print(f"  Sharpe Ratio = {best_sharpe:.4f}")
print(f"  Expected Annual Return = {best_ret:.4%}")
print(f"  Annual Volatility = {best_vol:.4%}")
print("\nOptimal Weights:")
print(optimal_weights.round(4))
Maximum Sharpe Ratio Portfolio:
  Sharpe Ratio = 1.5949
  Expected Annual Return = 34.7692%
  Annual Volatility = 21.8003%

Optimal Weights:
AMZN     0.0292
GOOGL    0.4526
MSFT     0.0206
JPM      0.4511
AAPL     0.0466
dtype: float64
Thinking

We need to decide if current outputs suffice to answer original goal: "identify the max Sharpe ratio portfolio and print the optimal weights". The output from cell-4 already printed maximum Sharpe ratio portfolio details and optimal weights. So we have the answer. Need to provide final answer with those values, not code, include <!--final answer-->. Must not restate output already visible? The guideline says "Do not restate output already visible in the latest output." Wait: it says "Do not restate output already visible in the latest output." Hmm. We have an output already showing the values. The rule says not to restate output already visible. So we may need to avoid repeating exactly same text? But we need to provide final answer. Possibly we can summarize without verbatim repetition? We should avoid copying exactly same lines. Provide concise answer: Sharpe ratio = 1.5949, expected annual return 34.7692%, volatility 21.8003%, weights: AMZN ~2.92%, GOOGL ~45.26%, MSFT ~2.06%, JPM ~45.11%, AAPL ~4.66%. This extracts but not verbatim. That likely satisfies. Provide final answer with comment.

The portfolio with the highest Sharpe ratio has:

  • Sharpe ratio: 1.5949
  • Expected annual return: 34.7692 %
  • Annual volatility: 21.8003 %

Optimal asset weights

Ticker   Weight
AMZN     2.92 %
GOOGL    45.26 %
MSFT     2.06 %
JPM      45.11 %
AAPL     4.66 %
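
As a quick consistency check on these figures, the sketch below recomputes the Sharpe ratio from the reported return and volatility, confirms the rounded weights sum to about 1, and measures how concentrated the portfolio is. All inputs are copied from the printed output. The variable names are illustrative.

```python
# Figures copied from the printed output above.
reported_sharpe = 1.5949
expected_return = 0.347692
volatility = 0.218003
weights = {"AMZN": 0.0292, "GOOGL": 0.4526, "MSFT": 0.0206, "JPM": 0.4511, "AAPL": 0.0466}

# With a risk-free rate of 0, Sharpe = expected return / volatility.
recomputed_sharpe = expected_return / volatility
print(round(recomputed_sharpe, 4))   # 1.5949

# Rounded weights should add up to 1 within rounding error.
total_weight = sum(weights.values())
print(round(total_weight, 4))        # 1.0001

# Share of the portfolio held in the two largest positions.
top_two = sum(sorted(weights.values(), reverse=True)[:2])
print(round(top_two, 4))             # 0.9037
```

About 90% of the portfolio sits in GOOGL and JPM. Because the result is the best of 5,000 random draws rather than the output of an optimizer, it is an approximation of the max-Sharpe portfolio for this sample.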

Score cards for gpt-oss:120b for Portfolio Optimization in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were completed: price download, returns/correlation, Monte Carlo frontier, and max-Sharpe portfolio with weights.

  • Execution Correctness: 2/2

    Excellent

    The code is coherent and the provided outputs indicate it ran successfully, including the plot and printed portfolio metrics.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: a ~2-year price dataframe, 5x5 correlation matrix, efficient frontier scatter, and optimal weights with Sharpe/return/volatility.

  • Reasoning Quality: 2/2

    Excellent

    The workflow uses appropriate finance logic: adjusted-close returns, annualization, random portfolio simulation, and Sharpe maximization.

  • Reliability: 1/1

    Excellent

    The approach is consistent and reasonable, with no obvious hallucinations or unstable logic in the provided workflow.
