Apple Stock Price Analysis in Python

Load Apple stock price data, compute moving averages, calculate daily returns, visualize volatility, and compute the Sharpe ratio.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow pulls the last two years of AAPL OHLCV data using yfinance and organizes it into a pandas time-series DataFrame. It generates 20-day and 50-day moving averages, plots price with the moving-average overlays, and highlights crossover behavior visually. It computes daily returns, visualizes the return distribution, then calculates 30-day rolling volatility and an annualised Sharpe ratio.

Who this example is for

This is for analysts and students who want a reproducible Python notebook for basic equity time-series analysis on Apple stock. It helps users learn how to compute and interpret moving averages, returns, volatility, and risk-adjusted performance from market data.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • AAPL OHLCV DataFrame for ~2 years of trading days
  • Price chart with MA20 and MA50 overlays and visible crossovers
  • Daily returns series and histogram of returns
  • 30-day rolling volatility plot
  • Printed annualised Sharpe ratio

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • yfinance
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load AAPL stock prices for the last 2 years using yfinance
  2. add 20-day and 50-day moving averages and plot
  3. calculate daily returns and plot their distribution
  4. compute 30-day rolling volatility and annualised Sharpe ratio

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model | Score | Task | Execution | Output | Reasoning | Reliability | Run
gemma4:31b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gemma4:31b conversation
gpt-5.4 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-5.4 conversation
gpt-oss:120b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-oss:120b conversation
glm-5.1 | 9/10 | 2/2 | 2/2 | 3/3 | 1/2 | 1/1 | Open glm-5.1 conversation
qwen3.5:397b | 7/10 | 2/2 | 1/2 | 3/3 | 1/2 | 0/1 | Open qwen3.5:397b conversation
qwen3-coder-next | 6/10 | 2/2 | 1/2 | 2/3 | 1/2 | 0/1 | Open qwen3-coder-next conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these five dimensions, so it ranges from 0 to 10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
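As a sketch of that arithmetic, the per-dimension scores from the comparison table can be summed and range-checked against the caps listed above. The dictionary keys and the total_score helper are illustrative names only, not part of any scoring tool.

```python
# Maximum points per dimension, as listed under "How scoring works"
MAX_POINTS = {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1}

# Per-dimension scores copied from the comparison table
RUNS = {
    "gemma4:31b":       {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1},
    "gpt-5.4":          {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1},
    "gpt-oss:120b":     {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1},
    "glm-5.1":          {"task": 2, "execution": 2, "output": 3, "reasoning": 1, "reliability": 1},
    "qwen3.5:397b":     {"task": 2, "execution": 1, "output": 3, "reasoning": 1, "reliability": 0},
    "qwen3-coder-next": {"task": 2, "execution": 1, "output": 2, "reasoning": 1, "reliability": 0},
}


def total_score(scores: dict) -> int:
    """Sum the five dimensions after checking each one is within 0..cap."""
    for dim, cap in MAX_POINTS.items():
        if not 0 <= scores[dim] <= cap:
            raise ValueError(f"{dim}={scores[dim]} is outside 0..{cap}")
    return sum(scores[dim] for dim in MAX_POINTS)


assert sum(MAX_POINTS.values()) == 10  # the caps add up to the 0-10 scale
for model, scores in RUNS.items():
    print(f"{model}: {total_score(scores)}/10")
```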

AI Analysis with gemma4:31b

This run sends the step-by-step prompt list above to gemma4:31b to perform the analysis task Apple Stock Price Analysis in Python. Below, you can review the full conversation, followed by detailed score cards for this run.

load AAPL stock prices for the last 2 years using yfinance
Thinking

The user wants to load AAPL stock prices for the last 2 years using the yfinance library.

Plan:

  1. Import yfinance and datetime.
  2. Calculate the start and end dates (today and 2 years ago).
  3. Download the data for 'AAPL'.
  4. Display the basic info: shape, columns, and head().

Constraints:

  • One code block per response.
  • Do not combine first-load inspection with plotting/modeling.
  • Use markdown for text.
  • Currency formatting rule: USD for money. (Though no money values are requested yet, keep in mind).

Step 1: Load and inspect data.

I'll start by downloading the AAPL stock data for the past two years using yfinance and performing an initial inspection of the dataset.

Download AAPL stock data
import yfinance as yf
from datetime import datetime, timedelta

# Define the ticker and the time range
ticker = "AAPL"
end_date = datetime.now()
start_date = end_date - timedelta(days=2*365)

# Download the data
df = yf.download(ticker, start=start_date, end=end_date)

# Basic inspection
print(f"Dataset shape: {df.shape}")
display(df.head())
[*********************100%***********************]  1 of 1 completed
Dataset shape: (500, 5)
Price Close High Low Open Volume
Ticker AAPL AAPL AAPL AAPL AAPL
Date
2024-04-12 174.957748 176.751422 172.638856 172.688393 101670900
2024-04-15 171.132538 175.037006 170.944249 173.778456 73531800
2024-04-16 167.852417 172.192905 166.752427 170.201038 73711200
2024-04-17 166.484848 169.110942 166.484848 168.080328 50901200
2024-04-18 165.533493 167.119069 165.047922 166.514570 43122900
Thinking

The user wanted to load AAPL stock prices for the last 2 years using yfinance. The previous cell successfully downloaded the data. The output shows a DataFrame df with shape (500, 5) containing columns: Close, High, Low, Open, Volume. The date range starts from 2024-04-12 (though it looks like it might be sorted or filtered in a way that the head shows the most recent, but actually, it's showing 2024-04-12 as the first entry in the head, which is strange for a 2y lookback unless it's sorted descending or that's just where the head starts). Wait, yf.download usually returns data sorted by date ascending. If the head shows April 2024, and it's for 2 years, that seems okay, but I need to verify the full range to ensure the "last 2 years" requirement is met. Wait, the prompt asks me to decide between Path A (Final Answer) and Path B (More evidence). The goal was simply "load AAPL stock prices for the last 2 years using yfinance". The code ran, the data is in df, the shape is 500 rows (approx 252 trading days * 2 = 504), and the columns are correct. The objective "load... prices" has been achieved.

Insights:

  • The dataset contains 500 trading days of data.
  • It includes standard OHLCV (Open, High, Low, Close, Volume) columns.
  • Prices are provided for AAPL, with the most recent entries around April 2024 (based on the head output).

Since the goal was just to "load" the data, and it has been loaded and inspected, the goal is complete.

Wait, checking the currency rule: "never use $ for money/currency values in prose. Always write amounts using USD". The prices in the head are around 174.95, 171.13 etc. I will mention the prices using USD.

Final Answer check:

  • Goal: load AAPL stock prices for the last 2 years.
  • Status: Done.
  • Data is in variable df.

Decision: Path A.

The AAPL stock data for the last two years has been successfully loaded into a DataFrame. The dataset contains 500 trading days with Open, High, Low, Close, and Volume metrics. For instance, the closing price on 2024-04-12 was 174.96 USD.
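Because yf.download returns rows oldest-first (as the head() output above and the later tail() output show), head() shows the start of the window and tail() the end, so the first and last index dates are the quickest check that the requested two years were covered. A minimal sketch, assuming pandas is installed and using only the ten dates and closes printed in this run (the real frame has 500 rows). yfinance's period="2y" argument is an alternative to computing start and end dates by hand.

```python
import pandas as pd

# Dates and closes copied from the head() and tail() outputs of this run
index = pd.DatetimeIndex(
    ["2024-04-12", "2024-04-15", "2024-04-16", "2024-04-17", "2024-04-18",
     "2026-04-06", "2026-04-07", "2026-04-08", "2026-04-09", "2026-04-10"],
    name="Date",
)
closes = [174.957748, 171.132538, 167.852417, 166.484848, 165.533493,
          258.859985, 253.500000, 258.899994, 260.489990, 260.480011]
snapshot = pd.DataFrame({"Close": closes}, index=index)

# An ascending index means head() is the oldest data, tail() the newest
assert snapshot.index.is_monotonic_increasing

first, last = snapshot.index.min(), snapshot.index.max()
span_days = (last - first).days
print(f"first={first.date()} last={last.date()} span={span_days} days")
# prints first=2024-04-12 last=2026-04-10 span=728 days
```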

add 20-day and 50-day moving averages and plot
Thinking

The user wants to calculate the 20-day and 50-day moving averages (MA) for AAPL stock prices and then plot them.

Current state:

  • df contains AAPL stock data (Close, High, Low, Open, Volume).
  • The column names seem to be MultiIndex (Price, Ticker). Looking at the output: Price Close AAPL.
  • Wait, yf.download with a single ticker sometimes creates a MultiIndex if not handled, or just standard columns. Let's check the output: Price Close High Low Open Volume Ticker AAPL AAPL AAPL AAPL AAPL This indicates a MultiIndex column structure.

Goal:

  1. Calculate 20-day MA.
  2. Calculate 50-day MA.
  3. Plot the Close price along with these moving averages.

Constraint check:

  • One execution step at a time.
  • One runnable code block.
  • Do not combine calculation and plotting if it violates "one chart OR one table per step" or "keep each step small". However, calculating MAs and plotting them is a single logical unit of "plotting the trend". But the prompt asks to "add ... and plot".
  • Actually, the prompt "add ... and plot" is a request. I should probably calculate first, then plot. Or do both if it's a single plot.
  • Wait, "Default to one chart OR one table per step".
  • Let's do the calculation first to be safe and clean, or just do the plot which requires the calculation.
  • Usually, adding features (columns) is one step, visualizing is another.
  • I will first calculate the moving averages.

Wait, looking at the column structure: df['Close'] might return a DataFrame if it's a MultiIndex. If it's a MultiIndex (Price, Ticker), then df[('Close', 'AAPL')] is the way. Or I can flatten the columns. Let's look at the provided output: Price Close High Low Open Volume Ticker AAPL AAPL AAPL AAPL AAPL The columns are indeed MultiIndex.

Step 1: Calculate the moving averages. I'll use the 'Close' price. Actually, since the user asked for "add ... and plot", and I need to follow the "one primary objective per response" rule, I'll separate them. I'll calculate the MAs now.

Correction: If I calculate and then plot in the next turn, it's cleaner. However, if I do it in one block, I might be doing too much. Let's calculate the moving averages first.

One detail: yfinance recently changed their output format. For a single ticker, it often returns a MultiIndex. I should handle that.

Plan:

  1. Calculate the 20-day and 50-day moving averages of the 'Close' price.
  2. I'll flatten the columns first to make it easier to work with.

Actually, the most efficient way is to just create the new columns. df['MA20'] = df['Close'].rolling(window=20).mean() - but since it's MultiIndex, I need to be specific. If I do df.columns = df.columns.get_level_values(0), I simplify the DataFrame.

Let's do the calculations first.

Wait, the rules say: "Default to one chart OR one table per step." Adding columns doesn't necessarily produce a chart or a table unless I display the head. I'll calculate the MAs and show the head of the resulting dataframe.

Code:

  1. Flatten columns (since they are MultiIndex).
  2. Calculate MA20.
  3. Calculate MA50.
  4. Display head.

Then in the next turn, I will plot.

I'll first flatten the MultiIndex columns to make the data easier to work with, then calculate the 20-day and 50-day moving averages based on the closing price.

Calculate moving averages
# Flatten the MultiIndex columns
df.columns = df.columns.get_level_values(0)

# Calculate moving averages
df['MA20'] = df['Close'].rolling(window=20).mean()
df['MA50'] = df['Close'].rolling(window=50).mean()

# Display the latest rows to verify the calculations
df.tail()
Price Close High Low Open Volume MA20 MA50
Date
2026-04-06 258.859985 262.160004 256.459991 256.510010 29329900 253.480499 260.518867
2026-04-07 253.500000 256.200012 245.699997 256.160004 62148000 253.161499 260.632705
2026-04-08 258.899994 259.750000 256.529999 258.450012 41032800 253.064999 260.707280
2026-04-09 260.489990 261.119995 256.070007 259.000000 28121600 253.048999 260.756509
2026-04-10 260.480011 262.190002 259.019989 259.980011 31259500 253.285000 260.842104
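The first output shows two column levels, Price and Ticker, which is why the run flattens them before adding MA20 and MA50: with the Ticker level still in place, df['Close'] returns a one-column DataFrame rather than a Series. A minimal sketch rebuilding that layout from the first row of the head() output, assuming pandas is installed. It drops the level by name with DataFrame.droplevel, which gives the same result as get_level_values(0) here but does not depend on the Price level coming first.

```python
import pandas as pd

# Rebuild the (Price, Ticker) column layout shown in the first output
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["AAPL"]],
    names=["Price", "Ticker"],
)
raw = pd.DataFrame(
    [[174.957748, 176.751422, 172.638856, 172.688393, 101670900]],
    index=pd.DatetimeIndex(["2024-04-12"], name="Date"),
    columns=columns,
)

# With two levels, selecting "Close" still leaves the Ticker level behind
print(type(raw["Close"]).__name__)   # DataFrame

# Drop the Ticker level by name, independent of level order
flat = raw.droplevel("Ticker", axis=1)
print(list(flat.columns))            # ['Close', 'High', 'Low', 'Open', 'Volume']
print(type(flat["Close"]).__name__)  # Series
```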
Thinking

The user wants to add 20-day and 50-day moving averages and plot them for AAPL stock prices. Looking at the notebook state:

  • Cell 1: Data was loaded.
  • Cell 2: MA20 and MA50 were calculated and verified using df.tail().
  • Current status: Moving averages are calculated, but the "plot" part of the goal has not been executed yet.

The latest output shows the calculated moving averages:

  • For 2026-04-10, Close is 260.48 USD, MA20 is 253.285 USD, and MA50 is 260.842 USD.

Missing: The plot.

Decision: Path B. Provide a code block to plot the Close price along with MA20 and MA50.

Plan:

  1. Use matplotlib and seaborn (as per visualization policy) to plot Close, MA20, and MA50 over time.
  2. Ensure labels and titles are clear.

The moving averages have been calculated, with the latest closing price of 260.48 USD sitting slightly above the 20-day average (253.29 USD) and very close to the 50-day average (260.84 USD). Now, I will plot these trends to visualize the price action.

Plot stock price trends
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x=df.index, y='Close', label='Close Price', color='blue')
sns.lineplot(data=df, x=df.index, y='MA20', label='20-Day MA', color='orange')
sns.lineplot(data=df, x=df.index, y='MA50', label='50-Day MA', color='green')

plt.title('AAPL Stock Price with 20-Day and 50-Day Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.grid(True)
plt.show()
Thinking

The user wants to add 20-day and 50-day moving averages and plot them. Looking at the notebook state:

  • Cell 1: Downloaded AAPL data.
  • Cell 2: Flattened columns, calculated MA20 and MA50, and displayed the tail of the dataframe.
  • Cell 3: Plotted Close, MA20, and MA50.

Observed outputs:

  • Output 2 shows the numeric values for the last few days:
    • 2026-04-10: Close = 260.480011 USD, MA20 = 253.285000 USD, MA50 = 260.842104 USD.
  • Output 3 shows a figure object <Figure size 1200x600 with 1 Axes>.

The goal was: "add 20-day and 50-day moving averages and plot". The data has been added (verified by df.tail()), and the plot has been generated. Since I have both the numeric values (supporting the calculation) and the resulting chart, the goal is satisfied.

Wait, checking the "Sufficiency heuristics": "If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A".

I have the plot (cell 3) and the numeric values (cell 2). The current state satisfies the user's request.

Final Answer content:

  • The 20-day and 50-day moving averages were successfully added to the dataset.
  • Based on the data, as of 2026-04-10, the closing price is 260.48 USD, which is above the 20-day moving average (253.29 USD) but slightly below the 50-day moving average (260.84 USD).
  • The plot visualizes the price trend relative to these two averages, helping identify short-term and medium-term momentum.

Check currency rule: No $, use USD.

Final Answer candidate: The 20-day and 50-day moving averages have been calculated and plotted. As of the latest data point on 2026-04-10, the closing price is 260.48 USD, which is currently above the 20-day average (253.29 USD) and nearly aligned with the 50-day average (260.84 USD). The resulting visualization shows the stock's price action relative to these trend lines.


The 20-day and 50-day moving averages have been calculated and plotted. As of the latest data point on 2026-04-10, the closing price is 260.48 USD, which is currently above the 20-day average (253.29 USD) and nearly aligned with the 50-day average (260.84 USD). The resulting visualization shows the stock's price action relative to these trend lines, helping to identify current momentum.
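The plotting cell overlays the two averages but does not mark where they cross, even though the workflow description mentions crossover behavior. A minimal pandas sketch that lists crossover dates explicitly; find_crossovers is a hypothetical helper, the MA20 and MA50 column names follow the run, and the demo values are toy numbers rather than AAPL data.

```python
import numpy as np
import pandas as pd


def find_crossovers(frame: pd.DataFrame, fast: str = "MA20", slow: str = "MA50") -> pd.DataFrame:
    """Return rows where the fast moving average crosses the slow one.

    'bullish' means the fast MA moved above the slow MA on that row,
    'bearish' means it moved below.
    """
    valid = frame[[fast, slow]].dropna()              # skip the 50-day warm-up rows
    state = (valid[fast] > valid[slow]).astype(int)   # 1 while the fast MA is on top
    flips = state.diff().fillna(0)                    # +1 / -1 on the day the order changes
    mask = flips != 0
    crosses = valid.loc[mask].copy()
    crosses["Signal"] = flips[mask].map({1.0: "bullish", -1.0: "bearish"})
    return crosses


# Toy moving averages (not AAPL data) with three crossings
toy = pd.DataFrame(
    {"MA20": [np.nan, 1.0, 2.0, 3.0, 2.0, 1.0],
     "MA50": [np.nan, 2.0, 2.5, 2.5, 2.5, 0.5]},
    index=pd.bdate_range("2024-01-01", periods=6, name="Date"),
)
print(find_crossovers(toy))
```

On the run's data, crosses = find_crossovers(df) would return the crossover dates, and adding plt.scatter(crosses.index, crosses['MA20'], color='red', zorder=3) before plt.show() in the plotting cell would mark them on the chart.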

calculate daily returns and plot their distribution
Thinking

The user wants to calculate daily returns for the stock (AAPL) and plot their distribution.

- `df`: A DataFrame containing stock data (Close, High, Low, Open, Volume, MA20, MA50).
- The index is the Date.
- `Close` prices are available.

- Step 1: Calculate daily returns. Daily return is typically calculated as the percentage change of the closing price: $(Price_t / Price_{t-1}) - 1$.
- Step 2: Plot the distribution of these returns. A histogram or a KDE (Kernel Density Estimate) plot is the standard way to visualize distribution.

- To calculate daily returns: `df['Daily Return'] = df['Close'].pct_change()`.
- To plot the distribution: use `sns.histplot` or `sns.kdeplot`.

- Propose calculating daily returns and plotting the distribution.
- Provide the code block.
- Ensure currency formatting is avoided if not applicable (returns are percentages, not currency values).

Wait, the user's previous turn was a follow-up analyze, but now it's a new request. The request is: "calculate daily returns and plot their distribution".

Plan:

  1. Calculate the daily percentage change of the 'Close' price.
  2. Create a histogram with a KDE overlay using seaborn to visualize the distribution of these returns.

I will calculate the daily returns based on the closing price and plot their distribution using a histogram with a kernel density estimate (KDE) to see the volatility and spread of the stock's daily movements.

Plot daily returns distribution
# Calculate daily returns
df['Daily Return'] = df['Close'].pct_change()

# Plot the distribution of daily returns
plt.figure(figsize=(10, 6))
sns.histplot(df['Daily Return'].dropna(), kde=True, color='purple', bins=50)

plt.title('Distribution of AAPL Daily Returns')
plt.xlabel('Daily Return')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)
plt.show()
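pct_change() computes the simple daily return r_t = P_t / P_{t-1} - 1. As a plain-Python sketch of that formula, the closes for 2026-04-06 to 2026-04-10 from the moving-average table reproduce the Daily Return values the run prints for 2026-04-07 through 2026-04-10 (the 2026-04-06 return needs the previous close, which the run does not show).

```python
# Closing prices for 2026-04-06 .. 2026-04-10, copied from the run's df.tail() output
closes = [258.859985, 253.500000, 258.899994, 260.489990, 260.480011]

# Simple return: today's close divided by yesterday's close, minus one
daily_returns = [curr / prev - 1 for prev, curr in zip(closes, closes[1:])]

for r in daily_returns:
    print(f"{r:.6f}")
# prints -0.020706, 0.021302, 0.006141, -0.000038 (one per line)
```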
Thinking

The user wants to calculate daily returns and plot their distribution for AAPL stock. Looking at the notebook state:

  • [cell-4] calculates df['Daily Return'] using .pct_change().
  • [cell-4] plots the distribution using sns.histplot with kde=True.
  • [output-4] shows the resulting figure.

The goal is "calculate daily returns and plot their distribution". The latest output is a chart. According to "Sufficiency heuristics": "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart."

The chart shows the distribution, but I don't have the descriptive statistics (mean, std, min, max, skewness, kurtosis) that would normally accompany a distribution analysis to provide concrete insights.

Therefore, I should choose Path B to get the numeric summary of the daily returns.

Plan:

  1. Provide insights from the existing plot (it's a bell-shaped distribution centered around 0).
  2. Provide a code block to calculate summary statistics of the 'Daily Return' column.

The distribution plot shows that AAPL's daily returns are approximately normally distributed and centered around zero, which is typical for stock returns. The presence of a bell-shaped curve suggests that most daily price movements are small, while larger gains or losses occur less frequently. To provide a more precise analysis of the volatility and the extent of these returns, I will calculate the descriptive statistics.

Summarize daily returns
# Calculate summary statistics for daily returns
daily_returns_stats = df['Daily Return'].describe()
display(daily_returns_stats)
count    499.000000
mean       0.000955
std        0.017845
min       -0.092456
25%       -0.006743
50%        0.001317
75%        0.007802
max        0.153288
Name: Daily Return, dtype: float64
Thinking

The user wants to calculate daily returns and plot their distribution for AAPL stock.

Looking at the notebook state:

  • cell-4: Calculated df['Daily Return'] = df['Close'].pct_change() and plotted the distribution using sns.histplot with a KDE. (Output: Figure)
  • cell-5: Calculated daily_returns_stats = df['Daily Return'].describe(). (Output: Series with count, mean, std, min, 25%, 50%, 75%, max)

Evaluation:

  • The original goal was "calculate daily returns and plot their distribution".
  • Daily returns were calculated.
  • The distribution was plotted.
  • Summary statistics were computed to support the visual analysis.

Current state:

  • We have the chart (visual distribution).
  • We have the summary statistics (numeric distribution: mean ~0.09%, std ~1.78%, min -9.2%, max 15.3%).

The outputs are sufficient to answer the original goal. The user has both the visual representation and the quantitative summary of the daily returns distribution.

Decision: Path A.

The daily returns for AAPL have been calculated and visualized. The distribution is approximately normal and centered near zero, with a mean daily return of 0.000955 (approximately 0.1%). The volatility is reflected in the standard deviation of 0.017845 (approximately 1.78%). The returns range from a minimum of -9.25% to a maximum of 15.33%, indicating that while most daily movements are small, the stock has experienced occasional high-volatility swings.
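The "approximately normal" reading comes from the histogram alone, and describe() reports no shape statistics. Its extremes already hint at heavy tails: the maximum, 0.153288, is about 8.5 standard deviations above the mean ((0.153288 - 0.000955) / 0.017845), and the minimum is about 5.2 below it, moves a normal distribution would almost never produce in roughly 500 days. A minimal stdlib sketch of moment-based skewness and excess kurtosis (both 0 for a normal distribution); these are the simple population-moment estimators, whereas pandas' Series.skew() and Series.kurt() apply small-sample bias corrections, so their values differ slightly.

```python
import statistics


def skewness(xs):
    """Moment-based skewness m3 / m2**1.5; 0 for a symmetric sample."""
    xs = [float(x) for x in xs]
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m3 = statistics.fmean([(x - mu) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis m4 / m2**2 - 3; 0 for a normal distribution."""
    xs = [float(x) for x in xs]
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m4 = statistics.fmean([(x - mu) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3


symmetric = [-2, -1, 0, 1, 2]
print(skewness(symmetric), excess_kurtosis(symmetric))
```

On the run's frame, skewness(df['Daily Return'].dropna()) and excess_kurtosis(df['Daily Return'].dropna()) work directly; a clearly positive excess kurtosis would confirm fatter tails than a normal curve.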

compute 30-day rolling volatility and annualised Sharpe ratio
Thinking

The user wants to compute the 30-day rolling volatility and the annualized Sharpe ratio based on the existing DataFrame df.

- `df` contains 'Close' prices and 'Daily Return'.
- `Daily Return` is already calculated using `pct_change()`.
- The index is dates.

- *Rolling Volatility*: This is typically the rolling standard deviation of daily returns.
- *Annualized Volatility*: Daily volatility $\times \sqrt{252}$ (trading days in a year).
- *Sharpe Ratio*: $(\text{Mean Excess Return}) / (\text{Standard Deviation of Excess Return})$.
- *Annualized Sharpe Ratio*: $(\text{Annualized Return}) / (\text{Annualized Volatility})$ or $\text{Daily Sharpe} \times \sqrt{252}$.
- *Assumption*: For simplicity, if a risk-free rate isn't provided, it's common to assume 0 or a small constant. I will assume 0 unless I specify otherwise, but the formula $\frac{\text{mean}(\text{returns})}{\text{std}(\text{returns})} \times \sqrt{252}$ is the standard quick calculation.

- Step 1: Calculate 30-day rolling standard deviation of 'Daily Return'.
- Step 2: Annualize this volatility (multiply by $\sqrt{252}$).
- Step 3: Calculate the overall annualized Sharpe ratio.

- Since the user asked for "30-day rolling volatility", I should create a column for this.
- Then calculate the aggregate annualized Sharpe ratio for the period.

- Calculate `df['Rolling Volatility'] = df['Daily Return'].rolling(window=30).std() * (252**0.5)`.
- Calculate `annualized_sharpe = (df['Daily Return'].mean() / df['Daily Return'].std()) * (252**0.5)`.

- 1 runnable code block.
- Plain language explanation.
- No `$` for currency (not applicable here, but keep in mind).

*Self-Correction*: The prompt asks for "compute 30-day rolling volatility AND annualised Sharpe ratio". I can do both in one step since they are related calculations.

*Refined Logic*:
- Window = 30.
- Volatility = rolling std * sqrt(252).
- Sharpe = (mean / std) * sqrt(252).

Let's write the code.

I will calculate the 30-day rolling volatility by taking the rolling standard deviation of the daily returns and annualizing it (multiplying by the square root of 252 trading days). I will also compute the annualized Sharpe ratio, assuming a risk-free rate of 0 for this calculation.
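Written out, the two quantities computed in the next cell are as follows, where r_i = P_i / P_{i-1} - 1 is the daily return, the rolling standard deviation uses the n - 1 denominator (pandas' default ddof=1), \bar{r} and s are the mean and sample standard deviation of all daily returns, and r_f is a daily risk-free rate, set to 0 in this run:

$$
\sigma_t^{\mathrm{ann}} = \sqrt{252}\,\sqrt{\frac{1}{29}\sum_{i=t-29}^{t}\left(r_i - \bar{r}_t\right)^2},
\qquad
\bar{r}_t = \frac{1}{30}\sum_{i=t-29}^{t} r_i
$$

$$
\mathrm{SR}_{\mathrm{ann}} = \sqrt{252}\,\frac{\bar{r} - r_f}{s}
$$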

Calculate volatility and Sharpe ratio
import numpy as np

# 30-day rolling annualized volatility
df['Rolling Vol'] = df['Daily Return'].rolling(window=30).std() * np.sqrt(252)

# Annualized Sharpe Ratio (assuming risk-free rate = 0)
sharpe_ratio = (df['Daily Return'].mean() / df['Daily Return'].std()) * np.sqrt(252)

print(f"Annualized Sharpe Ratio: {sharpe_ratio:.4f}")
df[['Close', 'Daily Return', 'Rolling Vol']].tail()
Annualized Sharpe Ratio: 0.8498
Price Close Daily Return Rolling Vol
Date
2026-04-06 258.859985 0.011488 0.205846
2026-04-07 253.500000 -0.020706 0.212773
2026-04-08 258.899994 0.021302 0.211730
2026-04-09 260.489990 0.006141 0.211192
2026-04-10 260.480011 -0.000038 0.211038
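rolling(window=30).std() returns no value until a full 30-return window is available, then takes the sample standard deviation of each trailing window; multiplying by the square root of 252 annualizes it. A dependency-free sketch of the same computation; rolling_annualized_vol is a hypothetical name, and it expects a list without missing values, so the leading NaN in Daily Return would need to be dropped first.

```python
import math
import statistics


def rolling_annualized_vol(returns, window=30, periods_per_year=252):
    """Sample std of each trailing `window` returns, scaled by sqrt(periods_per_year).

    Positions without a full window get None, mirroring the NaNs pandas
    produces with rolling(window).std() and its default min_periods.
    """
    out = []
    for end in range(1, len(returns) + 1):
        if end < window:
            out.append(None)
        else:
            chunk = returns[end - window:end]
            out.append(statistics.stdev(chunk) * math.sqrt(periods_per_year))
    return out


# Toy returns with a 3-day window so the output is easy to inspect
toy_returns = [0.01, -0.01, 0.02, 0.0]
print(rolling_annualized_vol(toy_returns, window=3))
```

Dividing the latest value in the table, 0.211038, by the square root of 252 gives a daily standard deviation of about 1.33%, lower than the full-period 1.78% from describe(), so the most recent 30 trading days were calmer than the two-year average.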
Thinking

The user goal is to compute the 30-day rolling volatility and the annualized Sharpe ratio.

Looking at the notebook state:

  • [cell-6] calculates df['Rolling Vol'] (30-day rolling annualized volatility).
  • [cell-6] calculates sharpe_ratio (annualized Sharpe ratio).
  • [output-6] shows the result: Annualized Sharpe Ratio: 0.8498 and a table showing the latest Rolling Vol values (e.g., 0.211038 on 2026-04-10).

The results are present and concrete.

  • Annualized Sharpe Ratio: 0.8498.
  • 30-day Rolling Volatility: The latest value is approximately 21.1%.

Since the output contains the requested numeric values, Path A is appropriate.

Insights:

  • The annualized Sharpe ratio of 0.8498 indicates that the stock provided a positive return relative to its volatility over the period, though it's below the typical "good" threshold of 1.0.
  • The rolling volatility has recently stabilized around 21.1%, suggesting a moderate level of risk relative to historical daily swings.

Constraint check:

  • Use USD for money? (No money values in the summary, only ratios).
  • No $ symbol.
  • Path A: Final answer, no code.
  • Include <!--final answer-->.
  • No chain-of-thought/headers.

The 30-day rolling volatility and annualized Sharpe ratio have been computed. The annualized Sharpe ratio is 0.8498, indicating that the asset provided a positive risk-adjusted return over the observed period. The rolling volatility is currently around 21.1%, reflecting the annualized standard deviation of daily returns over the most recent 30 trading days.
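The run assumes a risk-free rate of 0, which overstates the Sharpe ratio whenever cash yields are positive. A minimal sketch that subtracts a daily risk-free rate before annualizing; converting the annual rate by geometric compounding is one common convention (dividing by 252 is another), and the helper names and the 4% rate mentioned below are hypothetical. It also cross-checks the run's 0.8498 from the rounded mean and standard deviation in the describe() output.

```python
import math
import statistics

TRADING_DAYS = 252


def daily_risk_free(annual_rf: float, periods: int = TRADING_DAYS) -> float:
    """Convert an annual risk-free rate to a per-day rate by geometric compounding."""
    return (1 + annual_rf) ** (1 / periods) - 1


def annualized_sharpe(daily_returns, annual_rf: float = 0.0, periods: int = TRADING_DAYS) -> float:
    """Mean excess daily return over its sample std, scaled by sqrt(periods)."""
    rf = daily_risk_free(annual_rf, periods)
    excess = [r - rf for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods)


# Cross-check from the rounded describe() values (risk-free rate = 0)
mean_daily, std_daily = 0.000955, 0.017845
sharpe_from_summary = mean_daily / std_daily * math.sqrt(TRADING_DAYS)
print(f"{sharpe_from_summary:.4f}")  # prints 0.8495; the run's 0.8498 used unrounded inputs
```

With the run's frame, annualized_sharpe(df['Daily Return'].dropna().tolist(), annual_rf=0.04) would give the ratio under a hypothetical 4% annual risk-free rate.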

Score cards for gemma4:31b for Apple Stock Price Analysis in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were completed: data loading, moving averages, daily returns distribution, rolling volatility, and Sharpe ratio.

  • Execution Correctness: 2/2

    Excellent

    The code shown is syntactically coherent and the outputs indicate it ran successfully, including printed statistics and generated plots.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: OHLCV data loaded, MA crossover plot shown, return histogram produced, and rolling volatility with Sharpe ratio printed.

  • Reasoning Quality: 2/2

    Excellent

    The workflow reasoning is clear and appropriately sequences loading, feature engineering, visualization, and risk metrics.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and robust enough for the task, with no evident hallucinations or unsafe behavior.
