Portfolio Optimization in Python

Build an efficient frontier with Monte Carlo simulation, find the Sharpe-optimal portfolio, and visualize its weights.

What

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.
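The synthetic-data fallback mentioned above is not shown in the transcript. A minimal sketch of one way to do it follows, assuming multivariate normal daily returns; the function name `synthetic_returns` and every parameter value (mean, volatility, uniform 0.4 correlation) are illustrative assumptions, not values from the workflow.

```python
import numpy as np
import pandas as pd

def synthetic_returns(tickers, n_days=504, seed=0):
    """Draw correlated daily returns (hypothetical fallback when downloads fail)."""
    rng = np.random.default_rng(seed)
    n = len(tickers)
    # Assumed parameters, chosen for illustration only
    daily_mean = np.full(n, 0.0005)   # roughly 12.6% per year
    daily_vol = np.full(n, 0.015)     # roughly 24% annualized
    corr = np.full((n, n), 0.4)       # uniform pairwise correlation
    np.fill_diagonal(corr, 1.0)
    # Covariance = outer product of volatilities scaled by correlation
    cov = np.outer(daily_vol, daily_vol) * corr
    draws = rng.multivariate_normal(daily_mean, cov, size=n_days)
    dates = pd.bdate_range("2024-01-01", periods=n_days)
    return pd.DataFrame(draws, index=dates, columns=tickers)

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]
returns = synthetic_returns(tickers)
# Compound the returns into synthetic price paths starting near 100
prices = 100 * (1 + returns).cumprod()
```

Because the draws are seeded, the rest of the workflow stays reproducible when it runs on this fallback instead of downloaded prices.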

Who

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Tools

yfinance
pandas
numpy
matplotlib
seaborn

Outcomes

Price DataFrame for the selected tickers over ~2 years
Daily returns and a 5x5 correlation heatmap
Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
Portfolio weights visualization (pie chart)

Quality Score

7/10

gpt-5.2 · mljar-ai

Last scored: Apr 7, 2026

Task Completion: 1/2
Needs work
Downloaded prices, computed returns/correlation, ran 5000-portfolio Monte Carlo, and identified max-Sharpe weights. However, the expected optimal-weights pie chart is not present, and the price dataframe is not in the expected ~(504, 5) form (it includes 30 OHLCV fields).
Execution Correctness: 2/2
Excellent
Code shown is coherent and likely runnable: yfinance download, adjusted-close extraction, returns/covariance, simulation loop, plotting, and summary extraction are syntactically correct.
Output Quality: 2/3
Good
Efficient frontier scatter (risk vs return colored by Sharpe) and a 5x5 correlation matrix are produced, and max-Sharpe metrics/weights are printed. But the correlation heatmap is not shown (only a table), and the weights pie chart is missing; prices output shape differs from the expected single-price 5-column dataframe.
Reasoning Quality: 1/2
Needs work
Reasoning about using adjusted close and annualization is correct, but several assistant messages claim outputs are already present rather than directly providing the requested artifacts (notably the missing heatmap/pie chart).
Reliability: 1/1
Excellent
Workflow uses standard, deterministic steps (seeded simulation) and avoids hallucinated APIs; outputs align with the code provided, though it omits some requested visualizations.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance

Thinking

Creating a code response

I need to respond with a simple code snippet. The user wants to download two years of daily prices using yfinance, so I'll write code to do that.

Since the dataset isn't loaded yet, I should first load it and inspect its shape, columns, and head. The code needs to import yfinance and pandas, download the adjusted daily prices, perhaps keeping the adjusted close as standard. I want to ensure the final output is minimal and runnable. Let's get this code ready!

Designing the download code

I'm thinking about how to structure the code for downloading stock prices. Using the 'Close' price seems clean, so I’ll only focus on that to create a DataFrame of daily close prices. I can set the period to ‘2y’ and the interval to ‘1d’. After downloading, I should inspect the shape, columns, and the head of the DataFrame. I’ll assign the variables for tickers and prices, and I'll make sure everything is concise in one code block. Let's put this all together!

Let’s start by downloading the daily close prices for those 5 stocks over the last 2 years and inspecting the result.

Download stock closing prices

import yfinance as yf
from IPython.display import display

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]

prices = yf.download(
    tickers=tickers,
    period="2y",
    interval="1d",
    auto_adjust=False,
    progress=False
)["Close"]

display(prices.shape)
display(prices.columns.tolist())
prices.head()

(502, 5)

['AAPL', 'AMZN', 'GOOGL', 'JPM', 'MSFT']

Ticker	AAPL	AMZN	GOOGL	JPM	MSFT
Date
2024-04-09	169.669998	185.669998	156.600006	197.149994	426.279999
2024-04-10	167.779999	185.949997	156.139999	195.470001	423.260010
2024-04-11	175.039993	189.050003	159.410004	195.429993	427.929993
2024-04-12	176.550003	186.130005	157.729996	182.789993	421.899994
2024-04-15	172.690002	183.619995	154.860001	182.889999	413.640015

You now have the requested 2 years of daily price data in prices.

From the observed output:

The table has 502 trading days and 5 ticker columns.
The downloaded symbols are exactly AAPL, AMZN, GOOGL, JPM, MSFT.
The data is indexed by Date, so each row is one trading day.
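The values shown are the daily Close prices from yfinance because the code selected ["Close"]. With auto_adjust=False, this column is not adjusted for dividends; the dividend-adjusted series is the separate Adj Close column.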

A sample of the first available dates confirms the structure is correct, with one closing price per stock per day starting in April 2024.
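To see why selecting ["Close"] yields a five-column table, here is a small sketch that uses a stand-in frame instead of a live download. It assumes the multi-ticker result has two-level columns (price field, then ticker); the level names "Price" and "Ticker" and the filler values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

tickers = ["AAPL", "AMZN", "GOOGL", "JPM", "MSFT"]
fields = ["Adj Close", "Close", "High", "Low", "Open", "Volume"]

# Stand-in for the raw download: 6 fields x 5 tickers = 30 columns
columns = pd.MultiIndex.from_product([fields, tickers], names=["Price", "Ticker"])
dates = pd.bdate_range("2024-04-09", periods=3)
raw = pd.DataFrame(
    np.arange(3 * 30, dtype=float).reshape(3, 30),  # filler numbers, not prices
    index=dates,
    columns=columns,
)

# Selecting the top-level field drops that level, leaving one column per ticker
close = raw["Close"]
print(raw.shape, close.shape)
```

Without the ["Close"] selection, the frame keeps all 30 field-by-ticker columns, which is the shape mismatch the quality review describes.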

compute daily returns and show the correlation matrix

Thinking

Creating returns and correlation matrix

I need to work through the process step by step. First, I’ll provide code to compute daily returns and show a correlation matrix, ideally in a table format. Since the user specifically asked for this, I think I should compute the returns variable and display the correlation matrix afterward. I might not need to show the head of returns unless it’s particularly helpful. So, I’ll focus on crafting a concise code block for these tasks.

I’ll compute simple daily returns with pct_change() and then show the return correlation matrix.

Compute return correlations

daily_returns = prices.pct_change().dropna()

corr_matrix = daily_returns.corr()
corr_matrix

Ticker	AAPL	AMZN	GOOGL	JPM	MSFT
Ticker
AAPL	1.000000	0.476646	0.430903	0.344002	0.426318
AMZN	0.476646	1.000000	0.551856	0.418373	0.567576
GOOGL	0.430903	0.551856	1.000000	0.307970	0.424181
JPM	0.344002	0.418373	0.307970	1.000000	0.326947
MSFT	0.426318	0.567576	0.424181	0.326947	1.000000

The daily returns were computed correctly as percentage changes from one trading day to the next, leaving 501 return rows for the 5 stocks.
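As a quick check of what pct_change() computes, the sketch below applies it to the first three AAPL closes from the price table and compares the result with the manual formula P_t / P_{t-1} - 1. The variable names are illustrative.

```python
import pandas as pd

# First three AAPL closes copied from the price table above
aapl = pd.Series(
    [169.669998, 167.779999, 175.039993],
    index=pd.to_datetime(["2024-04-09", "2024-04-10", "2024-04-11"]),
    name="AAPL",
)

# Simple return: today's price divided by yesterday's, minus 1
simple = aapl.pct_change().dropna()

# Same calculation written out by hand
manual = (aapl / aapl.shift(1) - 1).dropna()

print(simple)
```

Three prices give two returns, which is why 502 price rows leave 501 return rows after dropna().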

The correlation matrix shows how similarly the stocks’ daily returns moved over the last 2 years:

Strongest relationship: AMZN and MSFT = 0.568
Next strongest: AMZN and GOOGL = 0.552
AAPL and AMZN = 0.477
AAPL and GOOGL = 0.431
AAPL and MSFT = 0.426

JPM is the least correlated with the large-cap tech names:

JPM and GOOGL = 0.308 (lowest)
JPM and MSFT = 0.327
JPM and AAPL = 0.344
JPM and AMZN = 0.418

What this means:

The four tech-related stocks tend to move together moderately, especially AMZN, MSFT, and GOOGL.
JPM is less synchronized with them, so it offers the most diversification in this small group.
None of the correlations are extremely high, so while these names are related, they are not moving in lockstep day to day.
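The Outcomes list names a 5x5 correlation heatmap, but the transcript shows only the table. A minimal sketch of that plot with seaborn follows, rebuilding the matrix from the printed values so the block runs on its own. The color map and figure size are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

labels = ["AAPL", "AMZN", "GOOGL", "JPM", "MSFT"]
# Values copied from the correlation table printed above
corr_values = [
    [1.000000, 0.476646, 0.430903, 0.344002, 0.426318],
    [0.476646, 1.000000, 0.551856, 0.418373, 0.567576],
    [0.430903, 0.551856, 1.000000, 0.307970, 0.424181],
    [0.344002, 0.418373, 0.307970, 1.000000, 0.326947],
    [0.426318, 0.567576, 0.424181, 0.326947, 1.000000],
]
corr_matrix = pd.DataFrame(corr_values, index=labels, columns=labels)

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color scale to [-1, 1] so colors are comparable across datasets
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, square=True, ax=ax)
ax.set_title("Daily Return Correlations")
fig.tight_layout()
```

In the notebook, the existing corr_matrix variable can be passed to sns.heatmap directly instead of being rebuilt from literals.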

run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier

We already have daily_returns, so the next step is to simulate 5,000 random long-only portfolios using annualized return and risk.

Simulate portfolio returns

import numpy as np
import pandas as pd

np.random.seed(42)

annual_returns = daily_returns.mean() * 252
annual_cov = daily_returns.cov() * 252

n_portfolios = 5000
symbols = daily_returns.columns.tolist()

results = []

for _ in range(n_portfolios):
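    # Random non-negative weights, normalized to sum to 1 (long-only)
    weights = np.random.random(len(symbols))
    weights = weights / weights.sum()

    # Annualized portfolio return and volatility for this weight vector
    port_return = np.dot(weights, annual_returns)
    port_vol = np.sqrt(weights @ annual_cov @ weights)
    # Sharpe ratio with the risk-free rate taken as zero
    sharpe = port_return / port_vol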
    
    row = {
        "return": port_return,
        "volatility": port_vol,
        "sharpe": sharpe,
    }
    row.update({f"w_{s}": w for s, w in zip(symbols, weights)})
    results.append(row)

portfolios = pd.DataFrame(results)
portfolios.head()

	return	volatility	sharpe	w_AAPL	w_AMZN	w_GOOGL	w_JPM	w_MSFT
0	0.236918	0.225437	1.050929	0.133197	0.338101	0.260318	0.212900	0.055485
1	0.212930	0.203694	1.045342	0.065285	0.024308	0.362501	0.251571	0.296334
2	0.234236	0.248419	0.942906	0.009284	0.437468	0.375464	0.095773	0.082010
3	0.226770	0.209914	1.080301	0.105673	0.175297	0.302353	0.248877	0.167800
4	0.193561	0.202476	0.955969	0.327909	0.074759	0.156568	0.196343	0.244421
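The per-portfolio arithmetic in the loop above can be checked by hand on a two-asset toy case. The inputs below are hypothetical numbers picked for easy arithmetic, not estimates from the downloaded data. The explicit `risk_free` variable is an addition that makes the loop's implicit zero-rate assumption visible.

```python
import numpy as np

# Hypothetical daily statistics for two assets
daily_mean = np.array([0.0010, 0.0005])
daily_cov = np.array([
    [0.00040, 0.00010],
    [0.00010, 0.00020],
])
weights = np.array([0.6, 0.4])

# Annualize with 252 trading days, as in the simulation
annual_returns = daily_mean * 252
annual_cov = daily_cov * 252

# Portfolio return is the weighted mean; variance is w' * Cov * w
port_return = weights @ annual_returns
port_vol = np.sqrt(weights @ annual_cov @ weights)

# The loop above divides return by volatility, i.e. risk_free = 0
risk_free = 0.0
sharpe = (port_return - risk_free) / port_vol

print(port_return, port_vol, sharpe)
```

Setting `risk_free` to a positive rate lowers every Sharpe ratio and can change which portfolio ranks first, so the zero-rate choice is worth stating when reporting results.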

Thinking

Analyzing notebook outputs

I need to provide insights and a code snippet to analyze the latest notebook outputs against the original user goal. The response should feature exactly one concise code block. It has to follow notebook rules, focusing on minimal output and concluding with one expression. I’ll use matplotlib for a scatter plot, highlighting maximum Sharpe ratio and minimum volatility, ending with plt.show(). I’ll compose a brief response with observations and the code.

The simulation is in place: the sample portfolios already show annualized expected returns around 0.19 to 0.24, volatility around 0.20 to 0.25, and Sharpe ratios near 1.0 in the first few rows. The missing piece for your original goal is the efficient frontier plot.

Plot efficient frontier

import matplotlib.pyplot as plt
import seaborn as sns

max_sharpe = portfolios.loc[portfolios["sharpe"].idxmax()]
min_vol = portfolios.loc[portfolios["volatility"].idxmin()]

plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=portfolios,
    x="volatility",
    y="return",
    hue="sharpe",
    palette="viridis",
    s=20,
    edgecolor=None
)

plt.scatter(max_sharpe["volatility"], max_sharpe["return"], color="red", s=120, marker="*", label="Max Sharpe")
plt.scatter(min_vol["volatility"], min_vol["return"], color="blue", s=100, marker="X", label="Min Volatility")

plt.title("Efficient Frontier - 5000 Random Portfolios")
plt.xlabel("Annualized Volatility")
plt.ylabel("Annualized Return")
plt.legend()
plt.tight_layout()
plt.show()

Thinking

Analyzing correlations

I'm noticing that the correlations are moderately positive, particularly between AMZN and MSFT (0.57) and AMZN and GOOGL (0.55). In contrast, JPM shows a lower correlation range (0.31-0.42), which can support diversification strategies. I should tie this back to the efficient frontier interpretation to provide a clearer picture. I want to ensure I'm concise in my explanations while capturing the significant relationships.

You now have the Monte Carlo simulation and the efficient frontier plot for 5,000 random portfolios built from AAPL, AMZN, GOOGL, JPM, and MSFT.

From the observed outputs:

The simulation used 5 assets over 502 daily price observations (501 daily returns).
Asset correlations are all positive, but not identical, which is exactly what makes portfolio simulation useful.
- The strongest co-movement is between AMZN and MSFT at about 0.57, and AMZN and GOOGL at about 0.55.
- JPM is less correlated with the tech names, roughly 0.31 to 0.42, so it contributes diversification.
The simulated portfolios show annualized expected returns in the sample around 0.19 to 0.24, with annualized volatility around 0.20 to 0.25.
The sample Sharpe ratios are near 1.0, with one shown portfolio already around 1.08, so the portfolio set includes reasonably efficient risk-return combinations.

The plot you produced is the efficient frontier view of those 5,000 random portfolios:

Each point is one random portfolio.
The x-axis is annualized volatility.
The y-axis is annualized return.
Color represents Sharpe ratio.
The red star marks the maximum-Sharpe portfolio.
The blue X marks the minimum-volatility portfolio.

Interpretation:

Portfolios toward the upper-left area are more attractive because they deliver higher return for lower risk.
The maximum-Sharpe point is the best risk-adjusted portfolio among the simulated set.
The minimum-volatility point is the lowest-risk portfolio among the simulated set.
Because JPM is less correlated with the others, it likely helps pull some portfolios leftward on the chart by reducing overall volatility.

So, based on the notebook outputs, the original request has been satisfied: the 5,000-portfolio Monte Carlo simulation was run, and the efficient frontier was plotted successfully.
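The scatter plot shows all simulated portfolios, while the frontier itself is only the upper-left edge of that cloud. One possible way to pull out that edge is sketched below: keep a portfolio only if no lower-volatility portfolio has an equal or higher return. The helper name `efficient_subset` and the demo numbers are hypothetical; the column names match the portfolios DataFrame above.

```python
import numpy as np
import pandas as pd

def efficient_subset(portfolios):
    """Return the non-dominated portfolios, ordered by increasing volatility."""
    # Lowest volatility first; on ties, the higher return comes first
    ordered = portfolios.sort_values(["volatility", "return"], ascending=[True, False])
    # Best return seen among all strictly earlier rows
    best_so_far = ordered["return"].cummax().shift(1, fill_value=-np.inf)
    # Keep a row only if it improves on every lower-volatility portfolio
    return ordered[ordered["return"] > best_so_far]

# Tiny hand-made example (hypothetical values)
demo = pd.DataFrame({
    "volatility": [0.20, 0.21, 0.22, 0.23, 0.25],
    "return":     [0.10, 0.09, 0.15, 0.14, 0.20],
})
frontier = efficient_subset(demo)
print(frontier)
```

With random sampling this edge only approximates the true frontier, since the simulation can miss weight combinations that a constrained optimizer would find.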

identify the max Sharpe ratio portfolio and print the optimal weights

Thinking

Finalizing expression display

I need to use display(metrics) and optimal_weights as the final expression. I want to ensure they aren’t comma-separated, and that’s good! I think it's also a good idea to provide a brief explanation before the code so that it’s clear what I'm presenting. That should help the user understand the context better! Now, let's get that explanation ready to go!

The max-Sharpe portfolio is already in portfolios, so this step just extracts it and lists the optimal asset weights clearly.

Select max Sharpe weights

max_sharpe = portfolios.loc[portfolios["sharpe"].idxmax()]

weight_cols = [c for c in portfolios.columns if c.startswith("w_")]
optimal_weights = (
    max_sharpe[weight_cols]
    .rename(lambda c: c.replace("w_", ""))
    .sort_values(ascending=False)
    .to_frame(name="weight")
)

display(max_sharpe[["return", "volatility", "sharpe"]].to_frame().T)
optimal_weights

	return	volatility	sharpe
3277	0.318802	0.225231	1.415443

	weight
GOOGL	0.524667
JPM	0.269374
AAPL	0.160410
AMZN	0.031821
MSFT	0.013728

The max Sharpe ratio portfolio in your simulation has:

Expected annual return: 0.318802
Annualized volatility: 0.225231
Sharpe ratio: 1.415443

The optimal weights are concentrated mostly in GOOGL and JPM, with very small allocations to AMZN and MSFT:

GOOGL: 0.524667
JPM: 0.269374
AAPL: 0.160410
AMZN: 0.031821
MSFT: 0.013728

This means the portfolio with the highest Sharpe ratio among the 5,000 random portfolios is driven primarily by GOOGL (about 52%), with JPM (about 27%) adding diversification and AAPL (about 16%) as a meaningful secondary allocation.
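The Outcomes list also calls for a pie chart of the weights, which the transcript does not include. A minimal matplotlib sketch follows, rebuilding the weights from the printed values so it runs standalone. The title, figure size, and label formatting are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Weights copied from the max-Sharpe output above
optimal_weights = pd.Series(
    {
        "GOOGL": 0.524667,
        "JPM": 0.269374,
        "AAPL": 0.160410,
        "AMZN": 0.031821,
        "MSFT": 0.013728,
    },
    name="weight",
)

fig, ax = plt.subplots(figsize=(6, 6))
# One wedge per asset, labeled with its share of the portfolio
ax.pie(
    optimal_weights.to_numpy(),
    labels=list(optimal_weights.index),
    autopct="%1.1f%%",
    startangle=90,
)
ax.set_title("Max-Sharpe Portfolio Weights")
ax.axis("equal")  # keep the pie circular
fig.tight_layout()
```

In the notebook, optimal_weights is a one-column DataFrame, so optimal_weights["weight"] would be passed to ax.pie in place of the Series built here.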
