Portfolio Optimization in Python

Build an efficient frontier, compute the Sharpe-optimal portfolio, and visualize portfolio weights using Monte Carlo simulation.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow downloads two years of daily prices for AAPL, MSFT, GOOGL, AMZN, and JPM (or generates correlated synthetic returns if needed) and computes daily returns. It calculates and visualizes the return correlation matrix across the five assets. It runs a Monte Carlo simulation of 5,000 random portfolios to build an efficient frontier, then identifies the maximum Sharpe ratio portfolio and visualizes the resulting weights.

Who this example is for

This is for analysts and students who want a reproducible, code-generating example of mean-variance portfolio optimization in Python. It helps users compare risk, return, and diversification effects across multiple stocks and extract an interpretable set of portfolio weights.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

Price DataFrame for the selected tickers over ~2 years
Daily returns and a 5x5 correlation heatmap
Efficient frontier scatter plot of risk vs return colored by Sharpe ratio
Maximum Sharpe portfolio metrics (Sharpe, expected return, volatility) and printed optimal weights
Portfolio weights visualization (pie chart)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

yfinance
pandas
numpy
matplotlib
seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

1download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance
2compute daily returns and show the correlation matrix
3run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier
4identify the max Sharpe ratio portfolio and print the optimal weights

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source	Score	Task	Execution	Output	Reasoning	Reliability	Run
glm-5.1	10/10	2/2	2/2	3/3	2/2	1/1	Open glm-5.1 conversation
gpt-5.4	10/10	2/2	2/2	3/3	2/2	1/1	Open gpt-5.4 conversation
gpt-oss:120b	10/10	2/2	2/2	3/3	2/2	1/1	Open gpt-oss:120b conversation
qwen3-coder-next	10/10	2/2	2/2	3/3	2/2	1/1	Open qwen3-coder-next conversation
qwen3.5:397b	10/10	2/2	2/2	3/3	2/2	1/1	Open qwen3.5:397b conversation
gemma4:31b	5/10	1/2	1/2	2/3	1/2	0/1	Open gemma4:31b conversation

How scoring works

Each run is graded on five dimensions:

Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.

AI Analysis with qwen3.5:397b

This run uses AI with a step-by-step list of prompts to perform the analysis task Portfolio Optimization in Python. Below, you can review the full conversation generated with qwen3.5:397b. After the conversation, we show detailed score cards for this run.

download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance

Thinking

The user wants to download 2 years of daily price data for 5 stocks (AAPL, MSFT, GOOGL, AMZN, JPM) using yfinance. I'll create a simple code block to fetch this data and display the first few rows to verify it loaded correctly.

I need to:

Import yfinance and pandas
Download data for the 5 tickers
Set the date range for 2 years
Display the data to verify

Let me write clean, minimal code for this step.

I'll download 2 years of daily price data for those 5 stocks using yfinance.

Download historical stock prices

import yfinance as yf
import pandas as pd

# Define tickers and date range
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JPM']
end_date = pd.Timestamp.today()
start_date = end_date - pd.DateOffset(years=2)

# Download data with multi-level index disabled to get flat columns
data = yf.download(tickers, start=start_date, end=end_date, progress=False, multi_level_index=False)

# Inspect column structure to handle dynamic naming
print(f"Column type: {type(data.columns[0])}")
print(f"Columns: {data.columns.tolist()[:5]}...") # Show first 5

# Extract prices robustly handling tuple or string columns
prices = pd.DataFrame()

for ticker in tickers:
    # Try to find the column corresponding to Adj Close for this ticker
    found = False
    for col in data.columns:
        # Handle tuple columns (e.g., ('Adj Close', 'AAPL'))
        if isinstance(col, tuple):
            if col[0] == 'Adj Close' and col[1] == ticker:
                prices[ticker] = data[col]
                found = True
                break
        # Handle string columns (e.g., 'Adj Close_AAPL' or 'AAPL_Adj Close')
        elif isinstance(col, str):
            if 'Adj Close' in col and ticker in col:
                prices[ticker] = data[col]
                found = True
                break
    
    # Fallback to 'Close' if 'Adj Close' is missing
    if not found:
        for col in data.columns:
            if isinstance(col, tuple):
                if col[0] == 'Close' and col[1] == ticker:
                    prices[ticker] = data[col]
                    break
            elif isinstance(col, str):
                if 'Close' in col and 'Adj' not in col and ticker in col:
                    prices[ticker] = data[col]
                    break

# Show first few rows
prices.head()

Column type: <class 'tuple'>
Columns: [('Close', 'AAPL'), ('Close', 'AMZN'), ('Close', 'GOOGL'), ('Close', 'JPM'), ('Close', 'MSFT')]...

	AAPL	MSFT	GOOGL	AMZN	JPM
Date
2024-04-11	173.461349	421.461060	158.103287	189.050003	187.356598
2024-04-12	174.957733	415.522247	156.437042	186.130005	175.238770
2024-04-15	171.132568	407.387207	153.590576	183.619995	175.334656
2024-04-16	167.852402	408.312927	153.134323	183.320007	173.330994
2024-04-17	166.484833	405.614319	154.195572	181.279999	172.640732

Thinking

The user's original goal is to download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance.

Looking at the latest output from cell-1:

The code successfully downloaded data for all 5 tickers
Column type is tuple (multi-level columns despite multi_level_index=False)
The output shows a DataFrame with Date index and 5 ticker columns (AAPL, MSFT, GOOGL, AMZN, JPM)
The data shows prices from 2024-04-11 onwards with numeric values

However, I notice the output shows "Close" prices, not "Adj Close" prices. The code fell back to Close prices since Adj Close wasn't found in the expected format.

The goal was to download 2 years of daily prices. The output shows:

Data was downloaded successfully for all 5 tickers
The prices DataFrame has 5 columns (one per ticker)
The data appears to be daily prices starting from April 2024

But wait - the goal says "2 years of daily prices" and the start_date was calculated as 2 years ago from today. The output only shows 5 rows (from head()), so I can't verify if we actually have 2 years of data.

Let me check if the goal is sufficiently met:

The user wanted to download 2 years of daily prices ✓ (code attempted this)
For 5 specific tickers ✓ (all 5 are in the output)
Using yfinance ✓

The output shows the data was downloaded and the prices DataFrame contains the price data. The head() shows valid price data. However, I should verify we actually have ~2 years of data (approximately 500+ trading days).

Actually, looking more carefully at the output - the prices.head() shows data starting from 2024-04-11, but if today is around late 2025/early 2026, we should have data going back to 2023-2024.

I need to confirm the full date range and row count to ensure we captured the complete 2-year period. I'll check the DataFrame shape and verify the start/end dates of the downloaded data.

The code successfully downloaded price data for all 5 tickers (AAPL, MSFT, GOOGL, AMZN, JPM). The output shows:

Columns are tuple-based (e.g., ('Close', 'AAPL')), which the code handled correctly
The prices DataFrame contains daily prices for all 5 tickers
Sample prices from April 2024 are visible (e.g., AAPL at 173.46 USD on 2024-04-11)

However, I need to verify we actually have the full 2-year range requested. Let me check the date range and row count to confirm the download is complete.

Verify price data quality

# Verify we have 2 years of data
print(f"Date range: {prices.index.min().date()} to {prices.index.max().date()}")
print(f"Total trading days: {len(prices)}")
print(f"Missing values per ticker:\n{prices.isna().sum()}")

#10

Date range: 2024-04-11 to 2026-04-10
Total trading days: 501
Missing values per ticker:
AAPL     0
MSFT     0
GOOGL    0
AMZN     0
JPM      0
dtype: int64

#11

Thinking

The user wants me to analyze whether the notebook outputs are sufficient to answer the original goal: download 2 years of daily prices for AAPL, MSFT, GOOGL, AMZN, JPM with yfinance.

Looking at the outputs:

Output-1 shows the data was downloaded successfully with 5 tickers
Output-2 shows:
- Date range: 2024-04-11 to 2026-04-10 (this is about 2 years)
- Total trading days: 501
- No missing values for any ticker

However, I notice something odd - the date range shows 2024-04-11 to 2026-04-10, which is actually in the future (2026). This seems like the system date might be set to 2026, or there's something unusual. But the key point is:

We have 501 trading days of data
All 5 tickers have complete data (no missing values)
The data spans approximately 2 years

The original goal was to download 2 years of daily prices for these 5 tickers. The outputs confirm:

All 5 tickers (AAPL, MSFT, GOOGL, AMZN, JPM) are present
We have 501 trading days (which is typical for ~2 years of market data)
No missing values
Price data is available

This appears to be sufficient to confirm the goal is achieved. I should provide a final answer summarizing what was accomplished based on the observed outputs.

Let me choose Path A since the outputs clearly show the goal has been met.

#12

The data download is complete and successful. Based on the executed outputs:

All 5 tickers downloaded: AAPL, MSFT, GOOGL, AMZN, JPM
Date range: 2024-04-11 to 2026-04-10 (approximately 2 years)
Total trading days: 501 days of price data
Data quality: Zero missing values across all tickers
Price type: Close prices extracted (yfinance returned 'Close' columns rather than 'Adj Close' for this download)

The prices DataFrame now contains clean, complete daily closing prices for all 5 stocks over the 2-year period, ready for further analysis.

#13

compute daily returns and show the correlation matrix

#14

Thinking

The user wants me to compute daily returns from the prices data and show the correlation matrix. This is a straightforward next step now that we have clean price data.

I'll:

Calculate daily returns using pct_change()
Drop the first row (which will be NaN)
Display the correlation matrix

This should be one code block with clear output.

#15

Now I'll compute daily returns from the prices and display the correlation matrix to see how these stocks move together.

#16

Compute returns and correlations

# Compute daily returns
returns = prices.pct_change().dropna()

# Display correlation matrix
returns.corr().round(3)

#17

	AAPL	MSFT	GOOGL	AMZN	JPM
AAPL	1.000	0.425	0.428	0.475	0.345
MSFT	0.425	1.000	0.421	0.562	0.328
GOOGL	0.428	0.421	1.000	0.549	0.311
AMZN	0.475	0.562	0.549	1.000	0.421
JPM	0.345	0.328	0.311	0.421	1.000

#18

Thinking

The user's original goal was to "compute daily returns and show the correlation matrix". Looking at the notebook outputs:

Cell-1: Downloaded price data for 5 tickers (AAPL, MSFT, GOOGL, AMZN, JPM)
Cell-2: Verified data quality (501 trading days, no missing values)
Cell-3: Computed daily returns and displayed the correlation matrix

The correlation matrix is now displayed in output-3, showing the correlation coefficients between all 5 stocks. This directly fulfills the user's goal.

Key insights from the correlation matrix:

All correlations are positive (stocks tend to move together)
AMZN-MSFT has the highest correlation (0.562)
JPM has the lowest correlations with tech stocks (0.311-0.421 range)
Tech stocks (AAPL, MSFT, GOOGL, AMZN) show moderate correlations with each other (0.42-0.56)
JPM (financial sector) is less correlated with the tech stocks

The goal is complete - daily returns were computed and the correlation matrix is displayed. I should provide Path A with a natural final answer summarizing the insights.

#19

The correlation matrix is now complete. Here are the key insights:

Correlation Patterns:

All correlations are positive, meaning these stocks generally move in the same direction
AMZN-MSFT shows the strongest correlation at 0.562, suggesting these two tech giants move most closely together
JPM has the weakest correlations with the tech stocks (0.311-0.421), which makes sense as it's in the financial sector rather than technology
Tech cluster (AAPL, MSFT, GOOGL, AMZN) shows moderate inter-correlations ranging from 0.42 to 0.56

Diversification Insight: JPM offers the best diversification benefit among these 5 stocks, as it has the lowest average correlation with the others. For a portfolio focused on reducing risk through diversification, adding JPM to a tech-heavy portfolio would provide more benefit than adding another tech stock.

#20

run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier

#21

Thinking

The user wants me to run a Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier. This is a classic portfolio optimization task.

I need to:

Generate 5000 random portfolio weight combinations
Calculate expected return, volatility, and Sharpe ratio for each
Plot the efficient frontier with volatility on x-axis and return on y-axis

Let me write the code for this step. I'll use the returns data that's already computed in the notebook.

#22

I'll run a Monte Carlo simulation with 5000 random portfolios to find the efficient frontier. This will help identify optimal risk-return combinations.

#23

Print optimal portfolio metrics

import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

# Number of simulations
n_portfolios = 5000

# Get expected returns and covariance matrix from historical data
mean_returns = returns.mean() * 252  # Annualized returns
cov_matrix = returns.cov() * 252  # Annualized covariance

# Store results
results = np.zeros((3, n_portfolios))
weights_record = []

for i in range(n_portfolios):
    # Generate random weights
    weights = np.random.random(len(tickers))
    weights /= np.sum(weights)  # Normalize to sum to 1
    weights_record.append(weights)
    
    # Calculate portfolio metrics
    portfolio_return = np.sum(mean_returns * weights)
    portfolio_volatility = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    sharpe_ratio = portfolio_return / portfolio_volatility
    
    # Store results
    results[0, i] = portfolio_volatility  # Risk
    results[1, i] = portfolio_return  # Return
    results[2, i] = sharpe_ratio  # Sharpe ratio

# Plot efficient frontier
plt.figure(figsize=(12, 7))
scatter = plt.scatter(results[0, :], results[1, :], c=results[2, :], 
                      cmap='viridis', marker='o', alpha=0.6, s=50)
plt.colorbar(scatter, label='Sharpe Ratio')
plt.xlabel('Annualized Volatility (Risk)')
plt.ylabel('Annualized Expected Return')
plt.title('Efficient Frontier - 5000 Random Portfolios')
plt.grid(True, alpha=0.3)
plt.show()

# Find best portfolios
max_sharpe_idx = np.argmax(results[2])
min_vol_idx = np.argmin(results[0])

print(f"Maximum Sharpe Ratio Portfolio:")
print(f"  Volatility: {results[0, max_sharpe_idx]:.2%}")
print(f"  Return: {results[1, max_sharpe_idx]:.2%}")
print(f"  Sharpe Ratio: {results[2, max_sharpe_idx]:.3f}")
print(f"  Weights: {dict(zip(tickers, weights_record[max_sharpe_idx].round(3)))}")

print(f"\nMinimum Volatility Portfolio:")
print(f"  Volatility: {results[0, min_vol_idx]:.2%}")
print(f"  Return: {results[1, min_vol_idx]:.2%}")
print(f"  Sharpe Ratio: {results[2, min_vol_idx]:.3f}")
print(f"  Weights: {dict(zip(tickers, weights_record[min_vol_idx].round(3)))}")

#24

#25

Maximum Sharpe Ratio Portfolio:
  Volatility: 22.01%
  Return: 33.07%
  Sharpe Ratio: 1.502
  Weights: {'AAPL': np.float64(0.095), 'MSFT': np.float64(0.015), 'GOOGL': np.float64(0.489), 'AMZN': np.float64(0.004), 'JPM': np.float64(0.396)}

Minimum Volatility Portfolio:
  Volatility: 19.23%
  Return: 17.42%
  Sharpe Ratio: 0.906
  Weights: {'AAPL': np.float64(0.151), 'MSFT': np.float64(0.369), 'GOOGL': np.float64(0.117), 'AMZN': np.float64(0.0), 'JPM': np.float64(0.363)}

#26

Thinking

The user's original goal was to "run Monte Carlo simulation with 5000 random portfolios and plot the efficient frontier".

Looking at the notebook state:

Cell-4 executed the Monte Carlo simulation with 5000 portfolios
It generated a scatter plot showing the efficient frontier
It printed the Maximum Sharpe Ratio Portfolio details (volatility, return, sharpe ratio, weights)
It printed the Minimum Volatility Portfolio details (volatility, return, sharpe ratio, weights)

The output shows:

A figure was generated (Figure size 1200x700 with 2 Axes)
Maximum Sharpe Ratio Portfolio: 22.01% volatility, 33.07% return, 1.502 Sharpe Ratio
Minimum Volatility Portfolio: 19.23% volatility, 17.42% return, 0.906 Sharpe Ratio
Weight allocations for both optimal portfolios

The original goal has been fully achieved:

✓ Monte Carlo simulation with 5000 random portfolios - DONE
✓ Plot the efficient frontier - DONE (scatter plot with Sharpe Ratio color mapping)
✓ Key portfolio metrics identified - DONE (max Sharpe and min volatility portfolios)

This is Path A - the goal is complete with sufficient outputs. I should provide a natural final answer summarizing the insights from the Monte Carlo simulation results.

#27

The Monte Carlo simulation is complete with all 5000 random portfolios generated and the efficient frontier plotted.

Key Findings:

Maximum Sharpe Ratio Portfolio (Optimal Risk-Adjusted Return):

Volatility: 22.01%
Expected Return: 33.07%
Sharpe Ratio: 1.502
Allocation: GOOGL 48.9%, JPM 39.6%, AAPL 9.5%, MSFT 1.5%, AMZN 0.4%

Minimum Volatility Portfolio (Lowest Risk):

Volatility: 19.23%
Expected Return: 17.42%
Sharpe Ratio: 0.906
Allocation: MSFT 36.9%, JPM 36.3%, AAPL 15.1%, GOOGL 11.7%, AMZN 0%

Insights:

The optimal portfolio (max Sharpe) heavily favors GOOGL and JPM, combining tech growth with financial sector diversification
AMZN receives minimal allocation in both optimal portfolios, suggesting it adds less value to the risk-return profile given its correlation with other tech stocks
The efficient frontier shows a clear risk-return tradeoff range from ~19% to ~35% volatility
JPM consistently appears as a key holding (36-40%), confirming its diversification benefit observed in the correlation matrix

#28

identify the max Sharpe ratio portfolio and print the optimal weights

#29

Thinking

The user wants me to identify and print the optimal weights for the maximum Sharpe ratio portfolio. Looking at the notebook output from cell-4, this information is already available from the Monte Carlo simulation that was just run.

From output-4, the Maximum Sharpe Ratio Portfolio details are:

Volatility: 22.01%
Return: 33.07%
Sharpe Ratio: 1.502
Weights: AAPL 9.5%, MSFT 1.5%, GOOGL 48.9%, AMZN 0.4%, JPM 39.6%

Since this information is already computed and displayed in the notebook output, I should present it clearly to the user. There's no need to run additional code - the data is already there from the Monte Carlo simulation.

Let me present this information in a clear, organized way.

#30

The maximum Sharpe ratio portfolio has already been identified from the Monte Carlo simulation. Here are the optimal weights:

Maximum Sharpe Ratio Portfolio

Metric	Value
Sharpe Ratio	1.502
Annualized Return	33.07%
Annualized Volatility	22.01%

Optimal Portfolio Weights:

Ticker	Weight	Allocation
GOOGL	48.9%	Nearly half the portfolio
JPM	39.6%	Significant financial exposure
AAPL	9.5%	Moderate tech allocation
MSFT	1.5%	Minimal allocation
AMZN	0.4%	Negligible allocation

Key Observations:

GOOGL dominates at 48.9% - suggests it offers the best risk-adjusted return among the five stocks
JPM provides strong diversification at 39.6% - confirms its low correlation with tech stocks (0.31-0.42 vs tech peers)
AMZN and MSFT receive minimal weights - despite being quality stocks, they add less value to the optimal mix given their correlations with GOOGL/AAPL
Concentrated portfolio - Top 2 holdings make up 88.5% of the allocation

This portfolio achieved a Sharpe ratio of 1.502, meaning it generated 1.502 units of return per unit of risk - an excellent risk-adjusted performance.

Score cards for qwen3.5:397b for Portfolio Optimization in Python

Total score: 10/10

Task Completion: 2/2
Excellent
The notebook completed all major requested steps: downloaded 2 years of prices, computed returns and correlations, ran a 5000-portfolio Monte Carlo simulation, and identified the max Sharpe portfolio.
Execution Correctness: 2/2
Excellent
The code appears runnable and produced the expected outputs, including the price table, correlation matrix, efficient frontier plot, and portfolio metrics.
Output Quality: 3/3
Excellent
Outputs match the expected outcomes semantically: a ~501x5 price dataframe, a 5x5 correlation matrix, a risk-return scatter plot colored by Sharpe ratio, and printed optimal portfolio statistics.
Reasoning Quality: 2/2
Excellent
The workflow uses appropriate finance analysis steps and correctly interprets the results, including the correlation structure and max Sharpe portfolio identification.
Reliability: 1/1
Excellent
The workflow is reasonably robust, with handling for yfinance column structure and a reproducible random seed.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Download Free Trial Learn about AI Data Analyst

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.

Portfolio Optimization in Python

What this AI workflow does

Who this example is for

Expected analysis outcomes

Tools and libraries used

Prompt sequence

Model Comparison

AI Analysis with qwen3.5:397b

Score cards for qwen3.5:397b for Portfolio Optimization in Python

Try MLJAR Studio

Explore More AI Analysis Examples

Exploratory Data Analysis

Time Series

Data Analysis

Machine Learning

NLP

Finance

Statistics