Finance

Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

What

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.
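The synthetic-returns fallback mentioned above is not shown in this run. The sketch below illustrates one common way to do it, under stated assumptions. Daily returns are drawn from a multivariate normal distribution. The helper `make_synthetic_returns` is hypothetical, and the daily mean (0.05%) and daily volatility (1.8%) are illustrative values. The target correlation matrix reuses the correlations printed later in this example.

```python
import numpy as np
import pandas as pd

tickers = ["AAPL", "AMZN", "GOOGL", "JPM", "MSFT"]

# Target correlation matrix (values copied from the correlation output in this example)
target_corr = np.array([
    [1.000000, 0.476646, 0.430903, 0.344002, 0.426318],
    [0.476646, 1.000000, 0.551856, 0.418373, 0.567576],
    [0.430903, 0.551856, 1.000000, 0.307970, 0.424181],
    [0.344002, 0.418373, 0.307970, 1.000000, 0.326947],
    [0.426318, 0.567576, 0.424181, 0.326947, 1.000000],
])

def make_synthetic_returns(tickers, corr, daily_mean, daily_vol, n_days=501, seed=42):
    """Hypothetical helper: correlated daily returns from a multivariate normal."""
    vol = np.asarray(daily_vol, dtype=float)
    # Covariance = diag(vol) @ corr @ diag(vol), written as an elementwise product
    cov = np.outer(vol, vol) * np.asarray(corr, dtype=float)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean=np.asarray(daily_mean, dtype=float), cov=cov, size=n_days)
    dates = pd.bdate_range("2024-01-01", periods=n_days)  # business days as a stand-in calendar
    return pd.DataFrame(draws, index=dates, columns=tickers)

# Illustrative (assumed) daily mean return and volatility for every asset
synthetic_returns = make_synthetic_returns(
    tickers, target_corr, daily_mean=[0.0005] * 5, daily_vol=[0.018] * 5
)
print(synthetic_returns.corr().round(2))
```

With 501 draws, the sample correlations land close to, but not exactly on, the target values.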

Who

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Tools

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn

Outcomes

  • Price DataFrame for the selected tickers over ~2 years
  • Daily returns and a 5x5 correlation heatmap
  • Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
  • Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
  • Portfolio weights visualization (pie chart)

Quality Score

7/10

gpt-5.2
mljar-ai

Last scored: Apr 7, 2026

  • Task Completion: 1/2

    Needs work

    Downloaded prices, computed returns/correlation, ran 5000-portfolio Monte Carlo, and identified max-Sharpe weights. However, the expected optimal-weights pie chart is not present, and the price dataframe is not in the expected ~(504, 5) form (it includes 30 OHLCV fields).

  • Execution Correctness: 2/2

    Excellent

    Code shown is coherent and likely runnable: yfinance download, adjusted-close extraction, returns/covariance, simulation loop, plotting, and summary extraction are syntactically correct.

  • Output Quality: 2/3

    Good

    Efficient frontier scatter (risk vs return colored by Sharpe) and a 5x5 correlation matrix are produced, and max-Sharpe metrics/weights are printed. But the correlation heatmap is not shown (only a table), and the weights pie chart is missing; prices output shape differs from the expected single-price 5-column dataframe.

  • Reasoning Quality: 1/2

    Needs work

    Reasoning about using adjusted close and annualization is correct, but several assistant messages claim outputs are already present rather than directly providing the requested artifacts (notably the missing heatmap/pie chart).

  • Reliability: 1/1

    Excellent

    Workflow uses standard, deterministic steps (seeded simulation) and avoids hallucinated APIs; outputs align with the code provided, though it omits some requested visualizations.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
Thinking

Creating a code response

I need to respond with a simple code snippet. The user wants to download two years of daily prices using yfinance, so I'll write code to do that.

Since the dataset isn't loaded yet, I should first load it and inspect its shape, columns, and head. The code needs to import yfinance and pandas, download the adjusted daily prices, perhaps keeping the adjusted close as standard. I want to ensure the final output is minimal and runnable. Let's get this code ready!

Designing the download code

I'm thinking about how to structure the code for downloading stock prices. Using the 'Close' price seems clean, so I’ll only focus on that to create a DataFrame of daily close prices. I can set the period to ‘2y’ and the interval to ‘1d’. After downloading, I should inspect the shape, columns, and the head of the DataFrame. I’ll assign the variables for tickers and prices, and I'll make sure everything is concise in one code block. Let's put this all together!

Let’s start by downloading the daily close prices for those 5 stocks over the last 2 years and inspecting the result.

Download stock closing prices
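import yfinance as yf
from IPython.display import display

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]

# Download ~2 years of daily bars for all tickers in one call.
# With auto_adjust=False, "Close" is not adjusted for dividends;
# the dividend-adjusted series is the separate "Adj Close" field.
prices = yf.download(
    tickers=tickers,
    period="2y",
    interval="1d",
    auto_adjust=False,
    progress=False
)["Close"]  # keep one price field -> one column per ticker

# Inspect the result: (rows, columns), ticker names, first rows
display(prices.shape)
display(prices.columns.tolist())
prices.head()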
(502, 5)
['AAPL', 'AMZN', 'GOOGL', 'JPM', 'MSFT']
Ticker            AAPL        AMZN       GOOGL         JPM        MSFT
Date
2024-04-09  169.669998  185.669998  156.600006  197.149994  426.279999
2024-04-10  167.779999  185.949997  156.139999  195.470001  423.260010
2024-04-11  175.039993  189.050003  159.410004  195.429993  427.929993
2024-04-12  176.550003  186.130005  157.729996  182.789993  421.899994
2024-04-15  172.690002  183.619995  154.860001  182.889999  413.640015

You now have the requested 2 years of daily price data in prices.

From the observed output:

  • The table has 502 trading days and 5 ticker columns.
  • The downloaded symbols are exactly AAPL, AMZN, GOOGL, JPM, MSFT.
  • The data is indexed by Date, so each row is one trading day.
  • The values shown are the daily Close prices from yfinance because the code selected ["Close"]; with auto_adjust=False these are not dividend-adjusted (that series is "Adj Close").

A sample of the first available dates confirms the structure is correct, with one closing price per stock per day starting in April 2024.
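Why does selecting ["Close"] matter for the shape? With auto_adjust=False, a multi-ticker yfinance download returns columns as a two-level MultiIndex of (price field, ticker). Six fields across five tickers give 30 columns, which matches the 30 OHLCV fields mentioned in the scoring notes. The sketch below uses a small mock frame that is assumed to mirror that layout. It needs no network access, and the prices are made up. It shows how picking one field collapses the frame to one column per ticker, and how "Adj Close" would be selected instead.

```python
import numpy as np
import pandas as pd

tickers = ["AAPL", "AMZN", "GOOGL", "JPM", "MSFT"]
fields = ["Adj Close", "Close", "High", "Low", "Open", "Volume"]
dates = pd.bdate_range("2024-04-09", periods=4, name="Date")

# Mock of a multi-ticker download: columns are (field, ticker) pairs
columns = pd.MultiIndex.from_product([fields, tickers], names=["Price", "Ticker"])
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.uniform(100, 400, size=(len(dates), len(columns))),
                   index=dates, columns=columns)

print(raw.shape)                           # (4, 30): every field for every ticker
close = raw["Close"]                       # one field -> one column per ticker
adj_close = raw["Adj Close"]               # dividend-adjusted field, if preferred
print(close.shape, close.columns.tolist()) # (4, 5) ['AAPL', 'AMZN', 'GOOGL', 'JPM', 'MSFT']
```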

compute daily returns and show the correlation matrix
Thinking

Creating returns and correlation matrix

I need to work through the process step by step. First, I’ll provide code to compute daily returns and show a correlation matrix, ideally in a table format. Since the user specifically asked for this, I think I should compute the returns variable and display the correlation matrix afterward. I might not need to show the head of returns unless it’s particularly helpful. So, I’ll focus on crafting a concise code block for these tasks.

I’ll compute simple daily returns with pct_change() and then show the return correlation matrix.

Compute return correlations
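# Simple daily returns: P_t / P_(t-1) - 1; the first row is NaN and is dropped
daily_returns = prices.pct_change().dropna()

# Pairwise Pearson correlation of daily returns across the five tickers
corr_matrix = daily_returns.corr()
corr_matrix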
Ticker      AAPL      AMZN     GOOGL       JPM      MSFT
Ticker
AAPL    1.000000  0.476646  0.430903  0.344002  0.426318
AMZN    0.476646  1.000000  0.551856  0.418373  0.567576
GOOGL   0.430903  0.551856  1.000000  0.307970  0.424181
JPM     0.344002  0.418373  0.307970  1.000000  0.326947
MSFT    0.426318  0.567576  0.424181  0.326947  1.000000

The daily returns were computed as simple percentage changes from one trading day to the next. The first day has no prior price and is dropped, leaving 501 return rows for the 5 stocks.
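As a small check of what pct_change() does, the sketch below reuses the first five AAPL closes printed above. Each return is today's price divided by yesterday's, minus one. The first row has no prior day, so dropna() removes exactly one row.

```python
import numpy as np
import pandas as pd

# First five AAPL closes from the download output above
aapl = pd.Series(
    [169.669998, 167.779999, 175.039993, 176.550003, 172.690002],
    index=pd.to_datetime(["2024-04-09", "2024-04-10", "2024-04-11", "2024-04-12", "2024-04-15"]),
    name="AAPL",
)

# Simple return r_t = P_t / P_(t-1) - 1; the first value is NaN and gets dropped
returns = aapl.pct_change().dropna()
manual = aapl.values[1:] / aapl.values[:-1] - 1  # same formula, written by hand

print(len(aapl), "prices ->", len(returns), "returns")  # 5 prices -> 4 returns
print(returns.round(4))
```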

The correlation matrix shows how similarly the stocks’ daily returns moved over the last 2 years:

  • Strongest relationship: AMZN and MSFT = 0.568
  • Next strongest: AMZN and GOOGL = 0.552
  • AAPL and AMZN = 0.477
  • AAPL and GOOGL = 0.431
  • AAPL and MSFT = 0.426

JPM is the least correlated with the large-cap tech names:

  • JPM and GOOGL = 0.308 (lowest)
  • JPM and MSFT = 0.327
  • JPM and AAPL = 0.344
  • JPM and AMZN = 0.418

What this means:

  • The four tech-related stocks tend to move together moderately, most strongly AMZN with MSFT and AMZN with GOOGL.
  • JPM is less synchronized with them, so it offers the most diversification in this small group.
  • None of the correlations are extremely high, so while these names are related, they are not moving in lockstep day to day.
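The Outcomes list calls for a 5x5 correlation heatmap, but this step only printed the matrix as a table. Below is a minimal matplotlib-only sketch. It rebuilds corr_matrix from the printed values so it runs on its own; in the notebook, the existing corr_matrix can be passed directly. seaborn's heatmap would work equally well. Plain imshow is used here only to keep dependencies minimal.

```python
import matplotlib.pyplot as plt
import pandas as pd

labels = ["AAPL", "AMZN", "GOOGL", "JPM", "MSFT"]
# Values copied from the correlation output above; in the notebook use corr_matrix directly
corr_matrix = pd.DataFrame(
    [[1.000000, 0.476646, 0.430903, 0.344002, 0.426318],
     [0.476646, 1.000000, 0.551856, 0.418373, 0.567576],
     [0.430903, 0.551856, 1.000000, 0.307970, 0.424181],
     [0.344002, 0.418373, 0.307970, 1.000000, 0.326947],
     [0.426318, 0.567576, 0.424181, 0.326947, 1.000000]],
    index=labels, columns=labels,
)

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color scale to [-1, 1] so colors are comparable across datasets
im = ax.imshow(corr_matrix.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)

# Annotate each cell with its correlation coefficient
for i in range(len(labels)):
    for j in range(len(labels)):
        ax.text(j, i, f"{corr_matrix.iat[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Correlation")
ax.set_title("Daily Return Correlations")
plt.tight_layout()
plt.show()
```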
run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier

We already have daily_returns, so the next step is to simulate 5,000 random long-only portfolios using annualized return and risk.
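The quantities computed in the next cell can be written as follows. Here w is the weight vector, \bar{r} is the vector of mean daily returns, and S is the daily covariance matrix. The code annualizes with 252 trading days and does not subtract a risk-free rate, so its Sharpe ratio is simply return divided by volatility. The symbol r_f appears below only to make that assumption explicit.

```latex
\begin{aligned}
\mu &= 252\,\bar{r}, \qquad \Sigma = 252\,S, \qquad w_i \ge 0,\ \textstyle\sum_i w_i = 1 \\
\mu_p &= w^\top \mu, \qquad \sigma_p = \sqrt{w^\top \Sigma\, w} \\
\text{Sharpe}_p &= \frac{\mu_p - r_f}{\sigma_p}, \qquad r_f = 0 \text{ in this notebook}
\end{aligned}
```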

Simulate portfolio returns
import numpy as np
import pandas as pd

np.random.seed(42)  # fix the random draws so the simulation is reproducible

# Annualize mean daily returns and the daily covariance matrix (252 trading days)
annual_returns = daily_returns.mean() * 252
annual_cov = daily_returns.cov() * 252

n_portfolios = 5000
symbols = daily_returns.columns.tolist()

results = []

for _ in range(n_portfolios):
    # Random long-only weights, normalized to sum to 1
    weights = np.random.random(len(symbols))
    weights = weights / weights.sum()

    # Expected return, volatility, and Sharpe ratio (risk-free rate taken as 0)
    port_return = np.dot(weights, annual_returns)
    port_vol = np.sqrt(weights @ annual_cov @ weights)
    sharpe = port_return / port_vol

    row = {
        "return": port_return,
        "volatility": port_vol,
        "sharpe": sharpe,
    }
    # Store each asset's weight in its own column, e.g. w_AAPL
    row.update({f"w_{s}": w for s, w in zip(symbols, weights)})
    results.append(row)

portfolios = pd.DataFrame(results)
portfolios.head()
     return  volatility    sharpe    w_AAPL    w_AMZN   w_GOOGL     w_JPM    w_MSFT
0  0.236918    0.225437  1.050929  0.133197  0.338101  0.260318  0.212900  0.055485
1  0.212930    0.203694  1.045342  0.065285  0.024308  0.362501  0.251571  0.296334
2  0.234236    0.248419  0.942906  0.009284  0.437468  0.375464  0.095773  0.082010
3  0.226770    0.209914  1.080301  0.105673  0.175297  0.302353  0.248877  0.167800
4  0.193561    0.202476  0.955969  0.327909  0.074759  0.156568  0.196343  0.244421
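The loop above evaluates one portfolio at a time. The same computation can be vectorized, as in the sketch below. It uses NumPy's Generator API instead of the legacy np.random.seed, so it will not reproduce the notebook's exact draws. The helper simulate_portfolios and the three-asset inputs are hypothetical, chosen so the block runs on its own; in the notebook you would pass annual_returns and annual_cov. Normalizing uniform draws, as the loop does, does not sample weights uniformly over the simplex. If uniform coverage matters, rng.dirichlet(np.ones(k), n) is the usual alternative.

```python
import numpy as np
import pandas as pd

def simulate_portfolios(annual_returns, annual_cov, n_portfolios=5000, seed=42):
    """Hypothetical helper: vectorized random long-only portfolios."""
    mu = np.asarray(annual_returns, dtype=float)
    cov = np.asarray(annual_cov, dtype=float)
    rng = np.random.default_rng(seed)

    # One row of weights per portfolio, each row normalized to sum to 1
    w = rng.random((n_portfolios, mu.size))
    w /= w.sum(axis=1, keepdims=True)

    port_return = w @ mu
    # Row-wise quadratic form w_i^T cov w_i for every portfolio at once
    port_vol = np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w))
    sharpe = port_return / port_vol  # risk-free rate taken as 0, as in the loop

    out = pd.DataFrame({"return": port_return, "volatility": port_vol, "sharpe": sharpe})
    names = getattr(annual_returns, "index", range(mu.size))
    for j, name in enumerate(names):
        out[f"w_{name}"] = w[:, j]
    return out

# Hypothetical inputs for three assets (illustrative numbers only)
mu_demo = pd.Series([0.20, 0.25, 0.15], index=["A", "B", "C"])
vols = np.array([0.25, 0.30, 0.20])
corr = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])
cov_demo = pd.DataFrame(np.outer(vols, vols) * corr, index=mu_demo.index, columns=mu_demo.index)

demo = simulate_portfolios(mu_demo, cov_demo, n_portfolios=1000)
print(demo.head())
```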
Thinking

Analyzing notebook outputs

I need to provide insights and a code snippet to analyze the latest notebook outputs against the original user goal. The response should feature exactly one concise code block. It has to follow notebook rules, focusing on minimal output and concluding with one expression. I’ll use matplotlib for a scatter plot, highlighting maximum Sharpe ratio and minimum volatility, ending with plt.show(). I’ll compose a brief response with observations and the code.

The simulation is in place: the sample portfolios already show annualized expected returns around 0.19 to 0.24, volatility around 0.20 to 0.25, and Sharpe ratios near 1.0 in the first few rows. The missing piece for your original goal is the efficient frontier plot.

Plot efficient frontier
import matplotlib.pyplot as plt
import seaborn as sns

# Locate the two reference portfolios in the simulated set
max_sharpe = portfolios.loc[portfolios["sharpe"].idxmax()]
min_vol = portfolios.loc[portfolios["volatility"].idxmin()]

plt.figure(figsize=(10, 6))
# One point per simulated portfolio, colored by Sharpe ratio
sns.scatterplot(
    data=portfolios,
    x="volatility",
    y="return",
    hue="sharpe",
    palette="viridis",
    s=20,
    edgecolor=None
)

# Highlight the max-Sharpe and min-volatility portfolios
plt.scatter(max_sharpe["volatility"], max_sharpe["return"], color="red", s=120, marker="*", label="Max Sharpe")
plt.scatter(min_vol["volatility"], min_vol["return"], color="blue", s=100, marker="X", label="Min Volatility")

plt.title("Efficient Frontier - 5000 Random Portfolios")
plt.xlabel("Annualized Volatility")
plt.ylabel("Annualized Return")
plt.legend()
plt.tight_layout()
plt.show()
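Strictly speaking, the random portfolios form a cloud, and the efficient frontier is that cloud's upper-left edge. One way to approximate the edge from simulated points is sketched below. Volatility is split into bins, and the highest-return portfolio in each bin is kept. Then only bins whose return exceeds every lower-volatility bin survive. The helper approx_frontier, the bin count, and the synthetic demo cloud are illustrative assumptions; with the notebook's data you would call approx_frontier(portfolios).

```python
import numpy as np
import pandas as pd

def approx_frontier(portfolios, n_bins=40):
    """Hypothetical helper: upper envelope of a (volatility, return) point cloud."""
    bins = pd.cut(portfolios["volatility"], bins=n_bins)
    # Highest-return portfolio inside each non-empty volatility bin
    best = portfolios.loc[portfolios.groupby(bins, observed=True)["return"].idxmax()]
    best = best.sort_values("volatility")
    # Keep only points whose return beats every lower-volatility point
    keep = best["return"] > best["return"].cummax().shift(fill_value=-np.inf)
    return best[keep]

# Demo on a synthetic cloud (made-up numbers); in the notebook: approx_frontier(portfolios)
rng = np.random.default_rng(1)
w = rng.random((2000, 3))
w /= w.sum(axis=1, keepdims=True)
mu = np.array([0.20, 0.25, 0.15])
cov = np.array([[0.0625, 0.0375, 0.015],
                [0.0375, 0.0900, 0.024],
                [0.0150, 0.0240, 0.040]])
cloud = pd.DataFrame({
    "return": w @ mu,
    "volatility": np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w)),
})
frontier = approx_frontier(cloud)
print(frontier.head())
```

The resulting points can be drawn over the scatter plot with plt.plot(frontier["volatility"], frontier["return"]).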
Thinking

Analyzing correlations

I'm noticing that the correlations are moderately positive, particularly between AMZN and MSFT (0.57) and AMZN and GOOGL (0.55). In contrast, JPM shows a lower correlation range (0.31-0.42), which can support diversification strategies. I should tie this back to the efficient frontier interpretation to provide a clearer picture. I want to ensure I'm concise in my explanations while capturing the significant relationships.

You now have the Monte Carlo simulation and the efficient frontier plot for 5,000 random portfolios built from AAPL, AMZN, GOOGL, JPM, and MSFT.

From the observed outputs:

  • The simulation used 5 assets over 502 daily price observations.
  • Asset correlations are all positive but well below 1, and that imperfect co-movement is what lets diversification lower portfolio risk.
    • The strongest co-movement is between AMZN and MSFT at about 0.57, and AMZN and GOOGL at about 0.55.
    • JPM is less correlated with the tech names, roughly 0.31 to 0.42, so it contributes diversification.
  • The simulated portfolios show annualized expected returns in the sample around 0.19 to 0.24, with annualized volatility around 0.20 to 0.25.
  • The sample Sharpe ratios are near 1.0, with one shown portfolio already around 1.08, so the portfolio set includes reasonably efficient risk-return combinations.

The plot you produced is the efficient frontier view of those 5,000 random portfolios:

  • Each point is one random portfolio.
  • The x-axis is annualized volatility.
  • The y-axis is annualized return.
  • Color represents Sharpe ratio.
  • The red star marks the maximum-Sharpe portfolio.
  • The blue X marks the minimum-volatility portfolio.

Interpretation:

  • Portfolios toward the upper-left area are more attractive because they deliver higher return for lower risk.
  • The maximum-Sharpe point is the best risk-adjusted portfolio among the simulated set.
  • The minimum-volatility point is the lowest-risk portfolio among the simulated set.
  • Because JPM is less correlated with the others, it likely helps pull some portfolios leftward on the chart by reducing overall volatility.
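The diversification effect behind that last point can be made concrete with the two-asset variance formula σ_p² = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂. Holding weights and volatilities fixed, a lower correlation ρ lowers portfolio volatility. The sketch compares a 50/50 mix at ρ = 0.57 (roughly AMZN/MSFT) with one at ρ = 0.31 (roughly JPM/GOOGL). The 30% volatility for both assets is an illustrative assumption, not a value from this analysis.

```python
import math

def two_asset_vol(w1, vol1, vol2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (w1 * vol1) ** 2 + (w2 * vol2) ** 2 + 2 * w1 * w2 * rho * vol1 * vol2
    return math.sqrt(variance)

# Assumed 30% annual volatility for both assets; only the correlation changes
for rho in (0.57, 0.31):
    print(f"rho={rho:.2f}  portfolio vol={two_asset_vol(0.5, 0.30, 0.30, rho):.4f}")
```

At ρ = 0.57 the mix comes out near 26.6% volatility, and at ρ = 0.31 near 24.3%, even though each asset alone sits at 30%.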

So, based on the notebook outputs, the original request has been satisfied: the 5,000-portfolio Monte Carlo simulation was run, and the efficient frontier was plotted successfully.

identify the max Sharpe ratio portfolio and print the optimal weights
Thinking

Finalizing expression display

I need to use display(metrics) and optimal_weights as the final expression. I want to ensure they aren’t comma-separated, and that’s good! I think it's also a good idea to provide a brief explanation before the code so that it’s clear what I'm presenting. That should help the user understand the context better! Now, let's get that explanation ready to go!

The max-Sharpe portfolio is already in portfolios, so this step just extracts it and lists the optimal asset weights clearly.

Select max Sharpe weights
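# Row of the simulated portfolio with the highest Sharpe ratio
max_sharpe = portfolios.loc[portfolios["sharpe"].idxmax()]

# Pull its weight columns, strip the "w_" prefix, and sort largest first
weight_cols = [c for c in portfolios.columns if c.startswith("w_")]
optimal_weights = (
    max_sharpe[weight_cols]
    .rename(lambda c: c.replace("w_", ""))
    .sort_values(ascending=False)
    .to_frame(name="weight")
)

# Show the summary metrics as a one-row table, then the weights
display(max_sharpe[["return", "volatility", "sharpe"]].to_frame().T)
optimal_weights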
        return  volatility    sharpe
3277  0.318802    0.225231  1.415443
         weight
GOOGL  0.524667
JPM    0.269374
AAPL   0.160410
AMZN   0.031821
MSFT   0.013728

The max Sharpe ratio portfolio in your simulation has:

  • Expected annual return: 0.318802
  • Annualized volatility: 0.225231
  • Sharpe ratio: 1.415443
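The reported Sharpe ratio is just expected return divided by volatility, because the simulation used a risk-free rate of zero. A short check with the printed values is below. The 4% risk-free rate in the second call is a hypothetical figure, used only to show how the ratio shrinks when a positive rate is subtracted.

```python
exp_return = 0.318802   # from the max-Sharpe output above
volatility = 0.225231

def sharpe_ratio(ret, vol, risk_free=0.0):
    """Excess return per unit of volatility."""
    return (ret - risk_free) / vol

print(round(sharpe_ratio(exp_return, volatility), 4))        # risk-free rate 0, as in the notebook
print(round(sharpe_ratio(exp_return, volatility, 0.04), 4))  # hypothetical 4% risk-free rate
```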

The optimal weights are concentrated mostly in GOOGL and JPM, with very small allocations to AMZN and MSFT:

  • GOOGL: 0.524667
  • JPM: 0.269374
  • AAPL: 0.160410
  • AMZN: 0.031821
  • MSFT: 0.013728

This means the highest risk-adjusted portfolio among the 5,000 random portfolios is driven primarily by GOOGL, with JPM adding diversification and AAPL as a meaningful secondary allocation.
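The Outcomes list also mentions a weights pie chart, which this run did not produce. A minimal matplotlib sketch follows. It rebuilds optimal_weights from the printed values so it runs on its own; in the notebook, the existing optimal_weights DataFrame can be used directly.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Weights copied from the max-Sharpe output above
optimal_weights = pd.DataFrame(
    {"weight": [0.524667, 0.269374, 0.160410, 0.031821, 0.013728]},
    index=["GOOGL", "JPM", "AAPL", "AMZN", "MSFT"],
)

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    optimal_weights["weight"],
    labels=optimal_weights.index,
    autopct="%1.1f%%",  # print each slice's share as a percentage
    startangle=90,
)
ax.set_title("Max Sharpe Portfolio Weights")
ax.axis("equal")  # draw the pie as a circle
plt.tight_layout()
plt.show()
```

Because AMZN and MSFT together hold under 5%, their slices will be thin. A sorted bar chart is an alternative if those labels overlap.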
