Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.

Who this example is for

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

Price DataFrame for the selected tickers over ~2 years
Daily returns and a 5x5 correlation heatmap
Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
Portfolio weights visualization (pie chart)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

yfinance
pandas
numpy
matplotlib
seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

1download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
2compute daily returns and show the correlation matrix
3run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
4identify the max Sharpe ratio portfolio and print the optimal weights

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source	Score	Task	Execution	Output	Reasoning	Reliability	Run
glm-5.1	10/10	2/2	2/2	3/3	2/2	1/1	Open glm-5.1 conversation
gpt-5.4	10/10	2/2	2/2	3/3	2/2	1/1	Open gpt-5.4 conversation
gpt-oss:120b	10/10	2/2	2/2	3/3	2/2	1/1	Open gpt-oss:120b conversation
qwen3-coder-next	10/10	2/2	2/2	3/3	2/2	1/1	Open qwen3-coder-next conversation
qwen3.5:397b	10/10	2/2	2/2	3/3	2/2	1/1	Open qwen3.5:397b conversation
gemma4:31b	5/10	1/2	1/2	2/3	1/2	0/1	Open gemma4:31b conversation

How scoring works

Each run is graded on five dimensions:

Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.

AI Analysis with gpt-5.4

This run uses AI with a step-by-step list of prompts to perform the analysis task Portfolio Optimization in Python. Below, you can review the full conversation generated with gpt-5.4. After the conversation, we show detailed score cards for this run.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance

Thinking

Creating concise code for data download

I need to provide the user with a single code block that downloads two years of daily prices using yfinance. The steps include importing necessary libraries, loading the data, and displaying the shape and columns, along with the head of the DataFrame. I want to ensure the code is clear and runnable without asking for output. I'll make sure it covers all essential aspects while keeping it straightforward, without unnecessary tables or charts.Creating inspection code for yfinance data

I need to focus on providing just the code for downloading data with yfinance. First, I'll set data = yf.download(...) and then inspect it with data.shape, data.columns, and data.head(). Since the notebook has rules against comma-separated expressions, I’ll ensure each output is a final single expression. It could be better to create a summary DataFrame that combines shape and column counts, then show the head last. I might want to group by ticker, as that could be helpful too!

Here’s the first step: download the daily data into a DataFrame and do a quick inspection.

Download stock price data

import pandas as pd
import yfinance as yf

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "JPM"]

prices = yf.download(
    tickers=tickers,
    period="2y",
    interval="1d",
    auto_adjust=True,
    progress=False
)

display(pd.DataFrame({"rows": [prices.shape[0]], "columns": [prices.shape[1]]}))
display(pd.DataFrame({"column_names": [str(c) for c in prices.columns]}))
prices.head()

	rows	columns
0	501	25

	column_names
0	('Close', 'AAPL')
1	('Close', 'AMZN')
2	('Close', 'GOOGL')
3	('Close', 'JPM')
4	('Close', 'MSFT')
5	('High', 'AAPL')
6	('High', 'AMZN')
7	('High', 'GOOGL')
8	('High', 'JPM')
9	('High', 'MSFT')
10	('Low', 'AAPL')
11	('Low', 'AMZN')
12	('Low', 'GOOGL')
13	('Low', 'JPM')
14	('Low', 'MSFT')
15	('Open', 'AAPL')
16	('Open', 'AMZN')
17	('Open', 'GOOGL')
18	('Open', 'JPM')
19	('Open', 'MSFT')
20	('Volume', 'AAPL')
21	('Volume', 'AMZN')
22	('Volume', 'GOOGL')
23	('Volume', 'JPM')
24	('Volume', 'MSFT')

Price	Close					High					...	Open					Volume
Ticker	AAPL	AMZN	GOOGL	JPM	MSFT	AAPL	AMZN	GOOGL	JPM	MSFT	...	AAPL	AMZN	GOOGL	JPM	MSFT	AAPL	AMZN	GOOGL	JPM	MSFT
Date
2024-04-11	173.461319	189.050003	158.103287	187.356613	421.461121	173.877544	189.770004	158.371063	188.449533	422.879355	...	166.821749	186.740005	155.623780	187.903073	419.383031	91070300	40020700	27166400	10137700	17966400
2024-04-12	174.957748	186.130005	156.437042	175.238785	415.522278	176.751422	188.380005	158.906637	181.125136	418.752694	...	172.688393	187.720001	156.665168	180.540349	417.639771	101670900	38608800	25353700	31532300	19253800
2024-04-15	171.132568	183.619995	153.590576	175.334671	407.387115	175.037037	188.690002	157.934677	179.715888	420.367869	...	173.778487	187.429993	157.557787	176.878161	420.151194	73531800	48052400	27136500	14766600	20273500
2024-04-16	167.852402	183.320007	153.134338	173.330994	408.312866	172.192889	184.830002	154.374092	175.593501	412.075127	...	170.201022	183.270004	152.926068	175.344232	408.303038	73711200	32891300	20779500	16451800	16765600
2024-04-17	166.484863	181.279999	154.195572	172.640732	405.614349	169.110958	184.570007	155.792375	174.884061	412.547936	...	168.080344	184.309998	154.344336	173.877434	410.942572	50901200	31359700	21763100	9017100	15855500

5 rows × 25 columns

You now have the requested 2 years of daily price data for all five tickers: AAPL, MSFT, GOOGL, AMZN, and JPM.

From the notebook output:

The download returned 501 trading days
The DataFrame has 25 columns, which matches:
- 5 fields: Open, High, Low, Close, Volume
- across 5 tickers: AAPL, AMZN, GOOGL, JPM, MSFT
The columns are stored as a MultiIndex, with levels like ('Close', 'AAPL')
The sample rows confirm the index is daily dates and the values are populated correctly

Because auto_adjust=True was used:

the OHLC price series are adjusted for splits/dividends
there is no separate Adj Close column, which is expected

So the download matches your request: daily historical prices over the last 2 years for the five specified stocks using yfinance.

compute daily returns and show the correlation matrix

#10

Let’s compute daily returns from the adjusted close prices and then show the return correlation matrix.

#11

Compute return correlations

returns = prices["Close"].pct_change().dropna()

corr_matrix = returns.corr()

corr_matrix

#12

Ticker	AAPL	AMZN	GOOGL	JPM	MSFT
Ticker
AAPL	1.000000	0.474660	0.428464	0.344752	0.425048
AMZN	0.474660	1.000000	0.549141	0.420812	0.562355
GOOGL	0.428464	0.549141	1.000000	0.310542	0.421151
JPM	0.344752	0.420812	0.310542	1.000000	0.328321
MSFT	0.425048	0.562355	0.421151	0.328321	1.000000

#13

Using the adjusted close prices, the daily return correlations are now:

Highest correlation: AMZN–MSFT = 0.562
Next highest: AMZN–GOOGL = 0.549
Lowest correlation: GOOGL–JPM = 0.311

What this means:

The large-cap tech names move together moderately, especially AMZN, MSFT, and GOOGL.
JPM is the least aligned with the tech stocks, which makes sense since it is a bank rather than a mega-cap technology company.
None of the correlations are extremely high, so while these stocks often move in the same direction, they still retain meaningful differences in daily behavior.

Observed correlation matrix:

	AAPL	AMZN	GOOGL	JPM	MSFT
AAPL	1.000	0.475	0.428	0.345	0.425
AMZN	0.475	1.000	0.549	0.421	0.562
GOOGL	0.428	0.549	1.000	0.311	0.421
JPM	0.345	0.421	0.311	1.000	0.328
MSFT	0.425	0.562	0.421	0.328	1.000

#14

run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier

#15

Thinking

Considering portfolio simulation steps

I can either use returns if they exist or compute them from prices. I think I'll safely use prices["Close"].pct_change().dropna() to keep things clean. The original goal is to simulate random portfolios and plot the frontier, which should be beginner-friendly. I’ll create one code block, then use seaborn or matplotlib to plot with plt.scatter. I'll ensure the title and labels are clear, mentioning 5000 portfolios, annualizing over 252 trading days with a zero risk-free rate for the Sharpe ratio.

#16

Let’s simulate 5,000 random long-only portfolios from the daily returns and plot the efficient frontier, using annualized return and volatility.

#17

Plot efficient frontier

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

returns = prices["Close"].pct_change().dropna()

mean_returns = returns.mean() * 252
cov_matrix = returns.cov() * 252

n_portfolios = 5000
n_assets = len(mean_returns)

np.random.seed(42)

weights_list = []
portfolio_returns = []
portfolio_vols = []
sharpe_list = []

for _ in range(n_portfolios):
    w = np.random.random(n_assets)
    w = w / w.sum()

    port_return = np.dot(w, mean_returns)
    port_vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
    sharpe = port_return / port_vol

    weights_list.append(w)
    portfolio_returns.append(port_return)
    portfolio_vols.append(port_vol)
    sharpe_list.append(sharpe)

mc_df = pd.DataFrame({
    "Return": portfolio_returns,
    "Volatility": portfolio_vols,
    "Sharpe": sharpe_list
})

max_sharpe_idx = mc_df["Sharpe"].idxmax()
min_vol_idx = mc_df["Volatility"].idxmin()

plt.figure(figsize=(10, 6))
plt.scatter(
    mc_df["Volatility"],
    mc_df["Return"],
    c=mc_df["Sharpe"],
    cmap="viridis",
    s=12,
    alpha=0.7
)
plt.colorbar(label="Sharpe ratio")
plt.scatter(
    mc_df.loc[max_sharpe_idx, "Volatility"],
    mc_df.loc[max_sharpe_idx, "Return"],
    color="red",
    s=120,
    marker="*",
    label="Max Sharpe"
)
plt.scatter(
    mc_df.loc[min_vol_idx, "Volatility"],
    mc_df.loc[min_vol_idx, "Return"],
    color="blue",
    s=80,
    marker="X",
    label="Min Volatility"
)
plt.title("Efficient Frontier from 5,000 Random Portfolios")
plt.xlabel("Annualized Volatility")
plt.ylabel("Annualized Return")
plt.legend()
plt.tight_layout()
plt.show()

#18

#19

Thinking

Crafting a concise explanation

I think the approach should be straightforward since the notebook rules indicate that if multiple tables are needed, they should be displayed sequentially, but in this case, just one table will suffice. I'll focus on keeping the code minimal as well. So, my plan is to create a concise paragraph explaining the context and include the necessary code without any extra headings. That should make it clear and easy to follow!

#20

The simulation and efficient frontier plot are already in place, and the chart indicates two key portfolios were identified: the highest-Sharpe portfolio and the lowest-volatility portfolio. What’s still missing from the visible output is the numeric evidence behind those two points, so the most useful next step is to show their annualized return, volatility, Sharpe ratio, and asset weights.

#21

Summarize optimal portfolios

asset_names = returns.columns.tolist()

summary = pd.DataFrame({
    "Portfolio": ["Max Sharpe", "Min Volatility"],
    "Return": [mc_df.loc[max_sharpe_idx, "Return"], mc_df.loc[min_vol_idx, "Return"]],
    "Volatility": [mc_df.loc[max_sharpe_idx, "Volatility"], mc_df.loc[min_vol_idx, "Volatility"]],
    "Sharpe": [mc_df.loc[max_sharpe_idx, "Sharpe"], mc_df.loc[min_vol_idx, "Sharpe"]],
})

max_w = pd.Series(weights_list[max_sharpe_idx], index=asset_names).rename("Max Sharpe Weight")
min_w = pd.Series(weights_list[min_vol_idx], index=asset_names).rename("Min Volatility Weight")

display(summary.round(4))
pd.concat([max_w, min_w], axis=1).round(4)

#22

	Portfolio	Return	Volatility	Sharpe
0	Max Sharpe	0.3288	0.2216	1.4839
1	Min Volatility	0.1684	0.1927	0.8738

#23

	Max Sharpe Weight	Min Volatility Weight
AAPL	0.0082	0.1899
AMZN	0.0094	0.0021
GOOGL	0.4658	0.1324
JPM	0.4939	0.2890
MSFT	0.0228	0.3865

#24

The Monte Carlo simulation and efficient frontier are supported by the observed outputs.

From the 5,000 random long-only portfolios, the standout portfolio by Sharpe ratio has:

Annualized return: 0.3288
Annualized volatility: 0.2216
Sharpe ratio: 1.4839

Its weights are concentrated mostly in:

JPM: 0.4939
GOOGL: 0.4658

with only small allocations to AAPL, AMZN, and MSFT.

The minimum-volatility portfolio has:

Annualized return: 0.1684
Annualized volatility: 0.1927
Sharpe ratio: 0.8738

Its weights are more diversified, with the largest allocations in:

MSFT: 0.3865
JPM: 0.2890
AAPL: 0.1899
GOOGL: 0.1324

Interpreting the frontier:

The max-Sharpe portfolio offers the strongest return per unit of risk among the simulated portfolios.
The min-volatility portfolio reduces risk, but also gives up a meaningful amount of return.
Based on these simulated results, JPM and GOOGL were the main drivers of the best risk-adjusted portfolio over this 2-year sample.

So the efficient frontier has been produced, and the numeric outputs confirm the key frontier points rather than relying on the plot alone.

#25

identify the max Sharpe ratio portfolio and print the optimal weights

#26

Thinking

Explaining max Sharpe ratio portfolio

I need to base my answer on the notebook's current state. The user is asking me to identify the maximum Sharpe ratio portfolio and print the optimal weights. Since I've already computed that, I can answer directly without needing any code. I should use markdown for clarity and keep it concise while mentioning the max Sharpe value and the weights. It's also good because there are no currency issues to concern myself with.

#27

The max Sharpe ratio portfolio is already identified in the notebook.

Its metrics are:

Annualized return: 0.3288
Annualized volatility: 0.2216
Sharpe ratio: 1.4839

The optimal weights for the max Sharpe portfolio are:

AAPL: 0.0082
AMZN: 0.0094
GOOGL: 0.4658
JPM: 0.4939
MSFT: 0.0228

So this portfolio is heavily concentrated in JPM and GOOGL.

Score cards for gpt-5.4 for Portfolio Optimization in Python

Total score: 10/10

Task Completion: 2/2
Excellent
All major requested steps were completed: data download, daily returns and correlation matrix, Monte Carlo simulation with 5,000 portfolios, efficient frontier plot, and max Sharpe portfolio weights.
Execution Correctness: 2/2
Excellent
The code shown is coherent and likely runnable, with valid yfinance download, return calculation, simulation, plotting, and summary extraction logic.
Output Quality: 3/3
Excellent
The outputs match the expected outcomes semantically: a ~501x25 price frame, a 5x5 correlation matrix, an efficient frontier scatter plot, and printed optimal weights with Sharpe, return, and volatility.
Reasoning Quality: 2/2
Excellent
The notebook provides clear, correct interpretations of the data, correlations, and portfolio results, and the explanations align with the computed outputs.
Reliability: 1/1
Excellent
The workflow is consistent and uses standard methods without obvious hallucinations or unsafe assumptions.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Download Free Trial Learn about AI Data Analyst

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.

Portfolio Optimization in Python

What this AI workflow does

Who this example is for

Expected analysis outcomes

Tools and libraries used

Prompt sequence

Model Comparison

AI Analysis with gpt-5.4

Score cards for gpt-5.4 for Portfolio Optimization in Python

Try MLJAR Studio

Explore More AI Analysis Examples

Exploratory Data Analysis

Time Series

Data Analysis

Machine Learning

NLP

Finance

Statistics