Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.
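The workflow description mentions a synthetic fallback. In the run shown on this page the yfinance download succeeds, so no fallback code appears. Below is a minimal sketch of one way such a fallback could work. It assumes jointly normal daily returns: it builds a covariance matrix from an assumed correlation matrix and per-asset volatilities, then correlates independent standard normal draws using the Cholesky factor. The function name synthetic_returns and every parameter value are hypothetical.

```python
import numpy as np
import pandas as pd

def synthetic_returns(tickers, corr, daily_vol, daily_mean, n_days=500, seed=0):
    """Draw correlated daily returns from a multivariate normal distribution.

    Hypothetical helper: corr is an (n, n) correlation matrix, daily_vol and
    daily_mean are length-n arrays. All inputs are assumptions, not estimates.
    """
    corr = np.asarray(corr, dtype=float)
    vol = np.asarray(daily_vol, dtype=float)
    cov = corr * np.outer(vol, vol)      # covariance = diag(vol) @ corr @ diag(vol)
    chol = np.linalg.cholesky(cov)       # lower-triangular L with cov = L @ L.T
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_days, len(tickers)))  # independent N(0, 1) draws
    draws = z @ chol.T + np.asarray(daily_mean, dtype=float)
    return pd.DataFrame(draws, columns=tickers)

# Illustrative inputs: uniform 0.4 correlation, 1.5% daily volatility, 0.05% daily mean
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JPM']
corr = np.full((5, 5), 0.4) + 0.6 * np.eye(5)
rets = synthetic_returns(tickers, corr, daily_vol=[0.015] * 5, daily_mean=[0.0005] * 5)
print(rets.shape)
```

The Cholesky approach is a common choice because any valid (positive definite) correlation matrix can be reproduced exactly in expectation. Fat tails and volatility clustering in real returns are not captured.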

Who this example is for

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Price DataFrame for the selected tickers over ~2 years
  • Daily returns and a 5x5 correlation heatmap
  • Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
  • Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
  • Portfolio weights visualization (pie chart)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
  2. compute daily returns and show the correlation matrix
  3. run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
  4. identify the max Sharpe ratio portfolio and print the optimal weights

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model               Score   Task   Execution   Output   Reasoning   Reliability   Run
glm-5.1             10/10   2/2    2/2         3/3      2/2         1/1           Open glm-5.1 conversation
gpt-5.4             10/10   2/2    2/2         3/3      2/2         1/1           Open gpt-5.4 conversation
gpt-oss:120b        10/10   2/2    2/2         3/3      2/2         1/1           Open gpt-oss:120b conversation
qwen3-coder-next    10/10   2/2    2/2         3/3      2/2         1/1           Open qwen3-coder-next conversation
qwen3.5:397b        10/10   2/2    2/2         3/3      2/2         1/1           Open qwen3.5:397b conversation
gemma4:31b          5/10    1/2    1/2         2/3      1/2         0/1           Open gemma4:31b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so the total ranges from 0 to 10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
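The arithmetic is simple enough to check mechanically. Below is a minimal sketch that validates each score against the caps listed above and then sums them. The dimension keys and the function name total_score are hypothetical. The example uses the gemma4:31b row from the comparison table.

```python
# Maximum points per dimension, as listed above (hypothetical key names)
MAX_POINTS = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}

def total_score(scores):
    """Check that every dimension is present and within range, then sum them."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must contain exactly the five dimensions")
    for name, points in scores.items():
        if not 0 <= points <= MAX_POINTS[name]:
            raise ValueError(f"{name}={points} is outside 0-{MAX_POINTS[name]}")
    return sum(scores.values())

# gemma4:31b row from the comparison table
print(total_score({"task_completion": 1, "execution_correctness": 1,
                   "output_quality": 2, "reasoning_quality": 1, "reliability": 0}))
```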

AI Analysis with qwen3-coder-next

This run uses AI with a step-by-step list of prompts to perform the analysis task Portfolio Optimization in Python. Below, you can review the full conversation generated with qwen3-coder-next. After the conversation, we show detailed score cards for this run.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance

I'll download 2 years of daily price data for those 5 stocks using yfinance.

Download stock price data
import yfinance as yf
import pandas as pd

# Define tickers
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JPM']

# Download 2 years of daily price data
data = yf.download(tickers, period='2y', interval='1d')

# Display the shape and first few rows
data.shape, data.head()
[*********************100%***********************]  5 of 5 completed
((501, 25),
 Price            Close                                                  \
 Ticker            AAPL        AMZN       GOOGL         JPM        MSFT   
 Date                                                                     
 2024-04-11  173.461334  189.050003  158.103287  187.356598  421.461121   
 2024-04-12  174.957748  186.130005  156.437073  175.238785  415.522278   
 2024-04-15  171.132553  183.619995  153.590561  175.334656  407.387146   
 2024-04-16  167.852417  183.320007  153.134338  173.330994  408.312897   
 2024-04-17  166.484833  181.279999  154.195572  172.640717  405.614349   
 
 Price             High                                                  ...  \
 Ticker            AAPL        AMZN       GOOGL         JPM        MSFT  ...   
 Date                                                                    ...   
 2024-04-11  173.877560  189.770004  158.371063  188.449517  422.879355  ...   
 2024-04-12  176.751422  188.380005  158.906668  181.125136  418.752694  ...   
 2024-04-15  175.037022  188.690002  157.934661  179.715872  420.367901  ...   
 2024-04-16  172.192905  184.830002  154.374092  175.593501  412.075158  ...   
 2024-04-17  169.110927  184.570007  155.792375  174.884045  412.547936  ...   
 
 Price             Open                                                  \
 Ticker            AAPL        AMZN       GOOGL         JPM        MSFT   
 Date                                                                     
 2024-04-11  166.821764  186.740005  155.623780  187.903058  419.383031   
 2024-04-12  172.688393  187.720001  156.665198  180.540349  417.639771   
 2024-04-15  173.778471  187.429993  157.557772  176.878146  420.151225   
 2024-04-16  170.201038  183.270004  152.926068  175.344232  408.303068   
 2024-04-17  168.080313  184.309998  154.344336  173.877419  410.942572   
 
 Price          Volume                                          
 Ticker           AAPL      AMZN     GOOGL       JPM      MSFT  
 Date                                                           
 2024-04-11   91070300  40020700  27166400  10137700  17966400  
 2024-04-12  101670900  38608800  25353700  31532300  19253800  
 2024-04-15   73531800  48052400  27136500  14766600  20273500  
 2024-04-16   73711200  32891300  20779500  16451800  16765600  
 2024-04-17   50901200  31359700  21763100   9017100  15855500  
 
 [5 rows x 25 columns])

The output confirms a successful download of 2 years of daily price data for all 5 tickers (AAPL, MSFT, GOOGL, AMZN, JPM). The DataFrame shape is (501, 25). That is 501 trading days (about 2 years once weekends and market holidays are excluded) and 25 columns (5 fields, Close, High, Low, Open and Volume, for each of the 5 tickers). The head shows prices in USD and share volumes for the first 5 trading days, starting on 2024-04-11.

The data appears complete. It is indexed by Date on the rows and has a two-level MultiIndex (Price, Ticker) on the columns. This fulfills the user goal.

Successfully downloaded 2 years of daily data (501 rows) for AAPL, MSFT, GOOGL, AMZN, and JPM. The dataset includes Close, High, Low, Open, and Volume columns for each ticker.
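With the (Price, Ticker) column layout, one field or one ticker can be selected using ordinary pandas indexing. Below is a minimal sketch on a small hand-built frame, so it runs without a network connection. The values are copied from the Close and Volume rows above. The exact layout yfinance returns may vary between versions.

```python
import pandas as pd

# Hand-built stand-in for the yfinance result: two dates, two fields, two tickers
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL", "JPM"]],
                                     names=["Price", "Ticker"])
data = pd.DataFrame(
    [[173.461334, 187.356598, 91070300, 10137700],
     [174.957748, 175.238785, 101670900, 31532300]],
    index=pd.to_datetime(["2024-04-11", "2024-04-12"]).rename("Date"),
    columns=columns,
)

close = data["Close"]                           # one field, all tickers
aapl = data.xs("AAPL", axis=1, level="Ticker")  # all fields, one ticker
print(close.columns.tolist())
print(aapl.columns.tolist())
```

Selecting data['Close'] drops the Price level and leaves one column per ticker. This is the frame the returns step below operates on.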

compute daily returns and show the correlation matrix

I'll compute daily returns from the price data and show the correlation matrix.

Compute daily returns and correlation
# Compute simple daily returns (fractional change); drop the first row, which has no prior price
returns = data['Close'].pct_change().dropna()

# Display correlation matrix of daily returns
returns.corr()
Ticker      AAPL      AMZN     GOOGL       JPM      MSFT
Ticker
AAPL    1.000000  0.474660  0.428463  0.344752  0.425048
AMZN    0.474660  1.000000  0.549132  0.420789  0.562355
GOOGL   0.428463  0.549132  1.000000  0.310548  0.421153
JPM     0.344752  0.420789  0.310548  1.000000  0.328329
MSFT    0.425048  0.562355  0.421153  0.328329  1.000000

The daily returns correlation matrix has been computed and displayed successfully. The matrix shows pairwise correlations between the daily returns of the five stocks:

  • Strongest correlation: AMZN and MSFT (0.562), indicating that their daily returns move together more than any other pair.
  • Moderate correlations: AMZN-GOOGL (0.549), AAPL-AMZN (0.475), AAPL-GOOGL (0.428), AAPL-MSFT (0.425), and GOOGL-MSFT (0.421).
  • Weakest correlations: JPM with the tech stocks (JPM-GOOGL = 0.311, JPM-MSFT = 0.328, JPM-AAPL = 0.345), reflecting JPM's position in the financial sector rather than the technology sector.

This fulfills the original goal of computing daily returns and showing the correlation matrix.

Daily returns correlation matrix computed and displayed successfully. Key insights: AMZN and MSFT show the highest correlation (0.562), while JPM (financial sector) exhibits weaker correlations with tech stocks like AAPL (0.345) and GOOGL (0.311), consistent with sector-based return relationships.
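The pairwise rankings above can be checked mechanically. Below is a minimal sketch that re-enters the printed matrix by hand, at six decimals, and lists each unique pair once, from strongest to weakest. In the notebook, returns.corr() would replace the hand-entered frame.

```python
from itertools import combinations

import pandas as pd

tickers = ["AAPL", "AMZN", "GOOGL", "JPM", "MSFT"]
corr = pd.DataFrame(
    [[1.000000, 0.474660, 0.428463, 0.344752, 0.425048],
     [0.474660, 1.000000, 0.549132, 0.420789, 0.562355],
     [0.428463, 0.549132, 1.000000, 0.310548, 0.421153],
     [0.344752, 0.420789, 0.310548, 1.000000, 0.328329],
     [0.425048, 0.562355, 0.421153, 0.328329, 1.000000]],
    index=tickers, columns=tickers,
)

# Each unordered pair once (skips the diagonal and the mirrored lower triangle)
pairs = pd.Series(
    {f"{a}-{b}": corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
).sort_values(ascending=False)

print(pairs.head(1))  # strongest pair
print(pairs.tail(1))  # weakest pair
```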

run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier

I'll run a Monte Carlo simulation to generate 5000 random portfolios and plot the efficient frontier. For this, I'll need to compute portfolio returns and volatility for each random weight combination.

Let's proceed with:

  1. Define a function to simulate random portfolios
  2. Calculate expected returns and volatility for each portfolio
  3. Plot the efficient frontier
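Written out, the quantities computed for each random portfolio are listed below. Here w is the weight vector, r̄_i is the mean daily return of asset i, Σ is the daily covariance matrix, and 252 trading days per year is the annualization factor. The notebook code sets r_f = 0, so its Sharpe ratio reduces to μ_p / σ_p.

```latex
\mu_p = 252 \sum_{i=1}^{n} w_i \bar{r}_i, \qquad
\sigma_p = \sqrt{252\, \mathbf{w}^\top \Sigma\, \mathbf{w}}, \qquad
S_p = \frac{\mu_p - r_f}{\sigma_p}, \qquad
w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1
```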
Plot efficient frontier with max Sharpe and min volatility portfolios
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Define number of assets and simulation runs
num_assets = len(tickers)
num_portfolios = 5000

# Calculate mean daily returns and covariance matrix
mean_returns = returns.mean()
cov_matrix = returns.cov()

# Initialize arrays to store portfolio returns, volatility, and Sharpe ratio
results = np.zeros((3, num_portfolios))
weights_record = []

# Run Monte Carlo simulation
np.random.seed(42)  # for reproducibility
for i in range(num_portfolios):
    # Generate random weights and normalize to sum to 1
    weights = np.random.random(num_assets)
    weights /= np.sum(weights)
    weights_record.append(weights)
    
    # Portfolio return (annualized, assuming 252 trading days)
    portfolio_return = np.sum(mean_returns * weights) * 252
    # Portfolio volatility (annualized)
    portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix * 252, weights)))
    
    # Store results
    results[0, i] = portfolio_return
    results[1, i] = portfolio_std_dev
    results[2, i] = portfolio_return / portfolio_std_dev  # Sharpe ratio, risk-free rate assumed to be 0

# Find the portfolio with the maximum Sharpe ratio (tangency portfolio for a 0% risk-free rate)
max_sharpe_idx = np.argmax(results[2])
max_sharpe_return = results[0, max_sharpe_idx]
max_sharpe_vol = results[1, max_sharpe_idx]
max_sharpe_weights = weights_record[max_sharpe_idx]

# Find the minimum volatility portfolio
min_vol_idx = np.argmin(results[1])
min_vol_return = results[0, min_vol_idx]
min_vol_vol = results[1, min_vol_idx]
min_vol_weights = weights_record[min_vol_idx]

# Plot efficient frontier
plt.figure(figsize=(10, 7))
plt.scatter(results[1], results[0], c=results[2], cmap='viridis')
plt.colorbar(label='Sharpe Ratio')
plt.scatter(max_sharpe_vol, max_sharpe_return, c='red', s=100, label='Max Sharpe Ratio')
plt.scatter(min_vol_vol, min_vol_return, c='blue', s=100, label='Min Volatility')
plt.title('Efficient Frontier')
plt.xlabel('Volatility (Std Dev)')
plt.ylabel('Expected Return (Annualized)')
plt.legend()
plt.grid(True)
plt.show()

# Display key portfolio statistics
print(f"Max Sharpe Portfolio: Return={max_sharpe_return:.4f}, Volatility={max_sharpe_vol:.4f}")
print(f"Min Volatility Portfolio: Return={min_vol_return:.4f}, Volatility={min_vol_vol:.4f}")
Max Sharpe Portfolio: Return=0.3287, Volatility=0.2216
Min Volatility Portfolio: Return=0.1684, Volatility=0.1927
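The per-portfolio loop above can also be written without a Python loop. Below is a minimal sketch that takes the same daily mean and covariance inputs. It deliberately swaps in NumPy's Generator.dirichlet for the weights. A flat Dirichlet samples uniformly over the set of long-only weights that sum to 1, whereas normalizing independent uniform draws concentrates weights toward equal allocations. Results therefore will not match the loop draw for draw. The function name simulate_portfolios is hypothetical.

```python
import numpy as np

def simulate_portfolios(mean_daily, cov_daily, n_portfolios=5000, periods=252, seed=42):
    """Vectorized random-portfolio simulation (risk-free rate assumed 0)."""
    mu = np.asarray(mean_daily, dtype=float)
    cov = np.asarray(cov_daily, dtype=float)
    rng = np.random.default_rng(seed)
    # Flat Dirichlet: nonnegative weights summing to 1, shape (n_portfolios, n_assets)
    w = rng.dirichlet(np.ones(len(mu)), size=n_portfolios)
    rets = w @ mu * periods                                           # annualized returns
    vols = np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w) * periods)    # annualized volatility
    return w, rets, vols, rets / vols

# Toy two-asset example (illustrative numbers, not estimates)
w, rets, vols, sharpe = simulate_portfolios([0.001, 0.0005], [[0.0004, 0.0], [0.0, 0.0001]])
print(w[np.argmax(sharpe)].round(3))
```

In the notebook, simulate_portfolios(returns.mean(), returns.cov()) would take the same inputs as the loop. The einsum call computes wᵀΣw for every row of weights in one pass.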

The Monte Carlo simulation with 5000 random portfolios has been executed successfully, and the efficient frontier has been plotted. The output includes:

  • A scatter plot of the simulated portfolios, color-coded by Sharpe ratio, whose upper-left edge traces the efficient frontier.
  • Red dot marking the portfolio with the maximum Sharpe ratio: 32.87% annualized return with 22.16% volatility.
  • Blue dot marking the minimum volatility portfolio: 16.84% annualized return with 19.27% volatility.

The efficient frontier is visible as the upper-left boundary of the cloud, where a higher expected return comes only with higher volatility, consistent with modern portfolio theory. The red portfolio has the highest simulated Sharpe ratio (computed with a 0% risk-free rate), so it approximates the tangency portfolio. The blue portfolio is the lowest-volatility draw among the 5,000 and approximates the global minimum-variance portfolio.
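The notebook's Sharpe ratio uses a zero risk-free rate. Below is a minimal sketch of the same calculation with an explicit risk-free rate parameter, applied to the max Sharpe portfolio reported above. The 4% rate is purely hypothetical.

```python
def sharpe_ratio(annual_return, annual_vol, risk_free=0.0):
    """Annualized Sharpe ratio: excess return per unit of annualized volatility."""
    if annual_vol <= 0:
        raise ValueError("volatility must be positive")
    return (annual_return - risk_free) / annual_vol

# Max Sharpe portfolio from the output above (return 0.3287, volatility 0.2216)
print(round(sharpe_ratio(0.3287, 0.2216), 3))                  # r_f = 0, as in the notebook
print(round(sharpe_ratio(0.3287, 0.2216, risk_free=0.04), 3))  # hypothetical 4% risk-free rate
```

A nonzero risk-free rate can change which simulated portfolio ranks first, so in the notebook the argmax should be recomputed over all 5,000 portfolios rather than rescaling a single value.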

All of this satisfies the original goal: run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier.

Monte Carlo simulation with 5000 portfolios successfully completed. The efficient frontier plot shows the tradeoff between risk (volatility) and return, with the maximum Sharpe ratio portfolio (32.87% return, 22.16% volatility) and minimum volatility portfolio (16.84% return, 19.27% volatility) highlighted.
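The scatter plot shows the whole cloud of simulated portfolios, but the frontier itself is only its upper-left edge. Below is a minimal sketch of a simple Pareto filter that keeps each portfolio whose return beats every lower-volatility portfolio. The function name efficient_points is hypothetical.

```python
import numpy as np

def efficient_points(vols, rets):
    """Indices of simulated portfolios on the upper-left edge of the risk/return cloud.

    Scans portfolios from lowest to highest volatility and keeps one only if its
    return is higher than that of every portfolio already scanned.
    """
    vols = np.asarray(vols, dtype=float)
    rets = np.asarray(rets, dtype=float)
    order = np.argsort(vols, kind="stable")
    keep = []
    best_ret = -np.inf
    for idx in order:
        if rets[idx] > best_ret:
            keep.append(idx)
            best_ret = rets[idx]
    return np.array(keep, dtype=int)

# Toy example: the portfolio with volatility 0.20 is dominated by the one at 0.15
vols = np.array([0.10, 0.20, 0.15, 0.30])
rets = np.array([0.05, 0.04, 0.08, 0.12])
print(efficient_points(vols, rets))  # → [0 2 3]
```

In the notebook, idx = efficient_points(results[1], results[0]) followed by plt.plot(results[1, idx], results[0, idx], 'k-') would overlay the edge on the scatter. The first kept index is always the minimum-volatility draw.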

identify the max Sharpe ratio portfolio and print the optimal weights

Let's identify the portfolio with the maximum Sharpe ratio and print its optimal weights.

The notebook already computed the maximum Sharpe ratio portfolio, and we have its index (max_sharpe_idx) and weights (max_sharpe_weights). We can print the weights alongside the ticker names for clarity.

Here’s the code to display the optimal weights:

Display optimal portfolio weights
# Display optimal weights for the max Sharpe ratio portfolio
print("Optimal Weights for Max Sharpe Ratio Portfolio:")
# Label weights by returns.columns: mean_returns and cov_matrix (and so the weights) follow that order
for ticker, weight in zip(returns.columns, max_sharpe_weights):
    print(f"{ticker}: {weight:.4f} ({weight * 100:.2f}%)")
Optimal Weights for Max Sharpe Ratio Portfolio:
AAPL: 0.0082 (0.82%)
AMZN: 0.0094 (0.94%)
GOOGL: 0.4658 (46.58%)
JPM: 0.4939 (49.39%)
MSFT: 0.0228 (2.28%)

The latest executed notebook cell ([cell-4]) successfully printed the optimal weights for the maximum Sharpe ratio portfolio, directly satisfying the original goal.

From the output:

  • AAPL: 0.82%
  • AMZN: 0.94%
  • GOOGL: 46.58%
  • JPM: 49.39%
  • MSFT: 2.28%

These weights sum to 100% (up to rounding) and give the highest Sharpe ratio, computed with a zero risk-free rate, among the 5,000 simulated portfolios. The weights are labeled with the columns of returns, which are in alphabetical order, rather than with the tickers list, because the weight vector follows the column order of mean_returns and cov_matrix. The portfolio is heavily concentrated in JPM and GOOGL. JPM's comparatively low correlation with the four tech stocks (0.31 to 0.42) makes it a useful diversifier, which helps explain its large weight.

Optimal weights for the maximum Sharpe ratio portfolio:

  • AAPL: 0.82%
  • AMZN: 0.94%
  • GOOGL: 46.58%
  • JPM: 49.39%
  • MSFT: 2.28%
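The weight vector follows the column order of returns (alphabetical here, as the correlation matrix shows), not the order of the requested tickers list. Attaching labels through a pandas Series keyed by the columns keeps them aligned. Below is a minimal sketch using the weights exactly as printed by the notebook, in column order. In the notebook itself, the index would be returns.columns.

```python
import numpy as np
import pandas as pd

# Column order of `returns`, as shown in the correlation matrix (alphabetical)
columns = ["AAPL", "AMZN", "GOOGL", "JPM", "MSFT"]
# Order in which the tickers were requested from yfinance
requested = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]

# Max Sharpe weights in the same order as the columns used to build mean/cov
w = np.array([0.0082, 0.0094, 0.4658, 0.4939, 0.0228])

# Label by column, then reorder to the requested ticker order for display
weights = pd.Series(w, index=columns).reindex(requested)
print(weights.map("{:.2%}".format))
```

Because reindex matches on labels rather than positions, reordering for display cannot shuffle a weight onto the wrong ticker.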

Score cards for qwen3-coder-next for Portfolio Optimization in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major steps were completed: data download, returns/correlation matrix, Monte Carlo efficient frontier, max Sharpe identification, and optimal weights output.

  • Execution Correctness: 2/2

    Excellent

    The code is coherent and the outputs shown are consistent with the implemented workflow, including valid shapes, matrix values, and portfolio statistics.

  • Output Quality: 3/3

    Excellent

    The notebook produced the expected artifacts semantically: price dataframe, 5x5 correlation matrix, efficient frontier scatter plot, and printed optimal weights with Sharpe/return/volatility.

  • Reasoning Quality: 2/2

    Excellent

    The explanations correctly describe each step and the portfolio interpretation, with no major logical issues visible in the provided content.

  • Reliability: 1/1

    Excellent

    The workflow is reasonably robust and reproducible, using a fixed random seed and standard portfolio calculations.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.
