Bitcoin Returns and Volatility Analysis

Analyze Bitcoin historical returns, compute log returns, measure drawdowns, and detect volatility regimes.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads 4-hour Bitcoin historical price data from a CSV URL and plots the closing price time series. It computes log returns, summarizes the return distribution with skewness and kurtosis, and visualizes the distribution. It measures drawdowns over time and uses 30-day rolling volatility to flag high- and low-volatility regimes.

Who this example is for

This is for analysts and researchers who want a reproducible notebook workflow for basic crypto return and risk diagnostics. It helps quantify tail behavior, drawdown risk, and volatility regime shifts from raw price data.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • BTC closing price line chart
  • Log return distribution plot with skewness and kurtosis
  • Maximum drawdown time series chart
  • 30-day rolling volatility chart with high/low regime bands

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scipy
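Assuming a standard Python environment, the stack above can be installed in one line (package names are the PyPI defaults):

```shell
# Install the packages used by this workflow (PyPI package names)
pip install pandas numpy matplotlib seaborn scipy
```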

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load Bitcoin price data from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/bitcoin-historical-data/btc_4h_data_2018_to_2025.csv and plot closing prices
  2. calculate log returns and plot distribution with kurtosis and skewness
  3. compute and plot maximum drawdown over time
  4. identify high and low volatility regimes using 30-day rolling volatility

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model             Score   Task   Execution   Output   Reasoning   Reliability   Run
gemma4:31b        10/10   2/2    2/2         3/3      2/2         1/1           Open gemma4:31b conversation
gpt-5.4           10/10   2/2    2/2         3/3      2/2         1/1           Open gpt-5.4 conversation
gpt-oss:120b      10/10   2/2    2/2         3/3      2/2         1/1           Open gpt-oss:120b conversation
qwen3-coder-next   9/10   2/2    2/2         3/3      1/2         1/1           Open qwen3-coder-next conversation
glm-5.1            6/10   2/2    1/2         2/3      1/2         0/1           Open glm-5.1 conversation
qwen3.5:397b       6/10   1/2    2/2         2/3      1/2         0/1           Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
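The arithmetic can be sketched in a few lines (the helper and the example dict below are illustrative, not part of the actual grading code):

```python
# Hypothetical helper: sum the five scoring dimensions into a 0-10 total.
# The example values are taken from the gpt-5.4 run in the table above.
def total_score(scores: dict) -> int:
    dimensions = ["task", "execution", "output", "reasoning", "reliability"]
    return sum(scores[d] for d in dimensions)

gpt_5_4 = {"task": 2, "execution": 2, "output": 3, "reasoning": 2, "reliability": 1}
print(total_score(gpt_5_4))  # 10
```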

AI Analysis with gpt-5.4

This run uses AI with a step-by-step list of prompts to perform the analysis task Bitcoin Returns and Volatility Analysis. Below, you can review the full conversation generated with gpt-5.4. After the conversation, we show detailed score cards for this run.

load Bitcoin price data from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/bitcoin-historical-data/btc_4h_data_2018_to_2025.csv and plot closing prices
Thinking

Planning data inspection process

I need to comply with the rules, so I’ll first load and inspect the dataset only if it isn’t already loaded. Since the user asked to load and plot closing prices, I'll clarify that inspection is the first step. Using pandas, I’ll show the shape, columns, and head of the dataset sequentially. This should comply with the guideline of one chart or table per step, keeping it minimal with clear outputs. I’ll make sure to do this effectively!

Creating runnable code block

I need to put together one runnable code block that finishes with a single final bare expression. I can start with importing pandas and using the IPython display feature. I’ll include code to read the CSV file from a URL, parsing the 'timestamp' column. Then, I’ll create a summary DataFrame showing the number of rows and columns, display it, and show the columns of the original DataFrame. Finally, I’ll end the block with df.head(), ensuring it's clean and concise—no print statements needed!

Let’s load it and do the required first inspection before plotting.

Load and inspect data
import pandas as pd
from IPython.display import display

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/bitcoin-historical-data/btc_4h_data_2018_to_2025.csv"
df = pd.read_csv(url)

display(pd.DataFrame({"metric": ["rows", "columns"], "value": [df.shape[0], df.shape[1]]}))
display(pd.DataFrame({"column": df.columns}))

df.head()
metric value
0 rows 17958
1 columns 12
column
0 Open time
1 Open
2 High
3 Low
4 Close
5 Volume
6 Close time
7 Quote asset volume
8 Number of trades
9 Taker buy base asset volume
10 Taker buy quote asset volume
11 Ignore
Open time Open High Low Close Volume Close time Quote asset volume Number of trades Taker buy base asset volume Taker buy quote asset volume Ignore
0 2018-01-01 00:00:00.000000 13715.65 13715.65 13155.38 13410.03 1676.204807 2018-01-01 03:59:59.999000 2.251607e+07 19438 739.518666 9.937537e+06 0
1 2018-01-01 04:00:00.000000 13434.98 13818.55 13322.15 13570.35 1302.214836 2018-01-01 07:59:59.999000 1.770254e+07 18061 588.516222 7.998796e+06 0
2 2018-01-01 08:00:00.000000 13569.98 13735.24 13001.13 13220.56 1319.755931 2018-01-01 11:59:59.999000 1.765121e+07 17200 582.668293 7.792109e+06 0
3 2018-01-01 12:00:00.000000 13220.56 13330.00 12750.00 13247.00 1831.933153 2018-01-01 15:59:59.999000 2.391259e+07 21357 830.314747 1.083304e+07 0
4 2018-01-01 16:00:00.000000 13247.00 13290.65 12940.00 13240.37 1092.337234 2018-01-01 19:59:59.999000 1.434887e+07 13178 557.453779 7.324320e+06 0

The data is already in the notebook and the Close column is present, so the remaining step is just to convert the time column and plot the closing prices over time.

Plot Bitcoin closing price over time
import matplotlib.pyplot as plt
import seaborn as sns

df["Open time"] = pd.to_datetime(df["Open time"])

plt.figure(figsize=(14, 5))
sns.lineplot(data=df, x="Open time", y="Close")
plt.title("Bitcoin Closing Price Over Time (4h)")
plt.xlabel("Date")
plt.ylabel("Close Price")
plt.tight_layout()
plt.show()

The data shown in the notebook confirms you loaded a Bitcoin 4-hour OHLCV dataset with 17,958 rows and a Close column, and the plot output shows the closing-price time series was rendered using Open time on the x-axis and Close on the y-axis. That matches the requested task: load the Bitcoin price data from the provided URL and plot closing prices.

calculate log returns and plot distribution with kurtosis and skewness

Let’s compute 4-hour log returns from Close and plot their distribution, with skewness and excess kurtosis shown on the chart.

Plot log return distribution
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = df.sort_values("Open time").copy()
df["log_return"] = np.log(df["Close"]).diff()

r = df["log_return"].dropna()
skew = r.skew()
kurt = r.kurt()

plt.figure(figsize=(10, 5))
sns.histplot(r, bins=80, kde=True)
plt.title("Distribution of Bitcoin 4h Log Returns")
plt.xlabel("Log return")
plt.ylabel("Frequency")
plt.text(
    0.98, 0.95,
    f"Skewness: {skew:.3f}\nExcess kurtosis: {kurt:.3f}",
    transform=plt.gca().transAxes,
    ha="right",
    va="top",
    bbox=dict(boxstyle="round", facecolor="white", alpha=0.85)
)
plt.tight_layout()
plt.show()
Thinking

Computing stats

I see that the chart is already completed, so my next step is to compute and display the relevant stats. I want to keep the presentation minimal to prevent clutter. This way, the user gets the essential information without any unnecessary details. It's all about making it clear and straightforward for them to understand! I’m curious to see what the stats will highlight and how they’ll contribute to the overall understanding.

The notebook already computed log_return from Close and rendered the distribution plot. The missing piece is that the latest output only shows the figure object, so the actual skewness and kurtosis values are not visible yet.

Compute return moments
pd.DataFrame({
    "metric": ["skewness", "excess_kurtosis"],
    "value": [df["log_return"].dropna().skew(), df["log_return"].dropna().kurt()]
})
metric value
0 skewness -0.495069
1 excess_kurtosis 15.637679

The 4-hour log returns were calculated as $\log(\text{Close}_t) - \log(\text{Close}_{t-1})$, and their distribution was plotted successfully.

From the observed results:

  • Skewness: -0.495
  • Excess kurtosis: 15.638

What this means:

  • The negative skewness indicates the return distribution has a longer/heavier left tail, so large negative returns are somewhat more pronounced than large positive returns.
  • The very high excess kurtosis shows the distribution is far more heavy-tailed than a normal distribution. In practice, this means extreme 4-hour moves happen much more often than a Gaussian model would suggest.
  • Together, these results imply Bitcoin 4-hour log returns are non-normal, with fat tails and a mild downside asymmetry.

So the distribution you plotted is consistent with a market return series that experiences occasional sharp shocks, especially on the downside.
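As a quick sanity check on that reading, heavy tails show up directly in the sample excess kurtosis. A minimal sketch using synthetic data rather than the BTC series (the Student-t sample is a stand-in for fat-tailed returns; a t-distribution with 5 degrees of freedom has theoretical excess kurtosis 6):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal = rng.standard_normal(100_000)       # Gaussian: excess kurtosis ~ 0
heavy = rng.standard_t(df=5, size=100_000)  # Student-t(5): excess kurtosis ~ 6

# scipy's kurtosis() reports *excess* kurtosis (0 for a normal distribution),
# matching the "Excess kurtosis" annotation on the chart above
print(round(stats.kurtosis(normal), 2))
print(round(stats.kurtosis(heavy), 2))
```

The BTC value of 15.6 sits far above even the t(5) benchmark, which is what makes the non-normality conclusion so clear-cut.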

compute and plot maximum drawdown over time
Thinking

Planning the maximum drawdown computation

I need to answer the user by computing and plotting the maximum drawdown over time. I’ll approach this step-by-step and use a single code block. My first step is to calculate the running max and drawdown from the Close prices since I don’t need returns directly. Although I could technically combine steps, the developer says to avoid a full pipeline unless the user asks explicitly, so I’ll stick to my primary objective here. I'll use the existing DataFrame to get the right results.

Confirming drawdown calculations and plotting

It looks like the maximum drawdown means I need to create a drawdown series and possibly annotate the maximum drawdown on the plot. I’ll calculate the running peak using the formula: drawdown = Close / running_max - 1. The user wants both the computation and the plot. I should keep things concise with seaborn or matplotlib, and maybe include the maximum drawdown value as a title or annotation. I'll end with just the plot and a brief summary, eliminating unnecessary details along the way.

We can compute drawdown from the closing price as the percentage drop from the running peak:

$\text{drawdown}_t = \frac{\text{Close}_t}{\max(\text{Close}_{1:t})} - 1$

This gives the full drawdown path over time and highlights the worst drawdown.

Plot drawdown over time
import matplotlib.pyplot as plt
import seaborn as sns

df = df.sort_values("Open time").copy()

df["running_peak"] = df["Close"].cummax()
df["drawdown"] = df["Close"] / df["running_peak"] - 1

max_dd = df["drawdown"].min()
max_dd_date = df.loc[df["drawdown"].idxmin(), "Open time"]

plt.figure(figsize=(14, 5))
sns.lineplot(data=df, x="Open time", y="drawdown", color="crimson")
plt.axhline(0, color="gray", linestyle="--", linewidth=1)
plt.scatter([max_dd_date], [max_dd], color="black", zorder=3)
plt.title(f"Bitcoin 4h Drawdown Over Time (Max Drawdown: {max_dd:.2%})")
plt.xlabel("Date")
plt.ylabel("Drawdown")
plt.tight_layout()
plt.show()
Thinking

Refining drawdown insights

I’m looking at how to craft a compact table that shows max drawdown and its date. I want to compute the preceding peak date too. The drawdown chart illustrates the repeated deep declines and recoveries, but I haven’t printed the worst drawdown magnitude or date in the output yet. I think I’ll need to create a one-row table displaying the worst drawdown percentage, trough date, and prior peak date using the existing dataframe, while making sure the Open time column is already datetime. Keeping it simple is key!

The drawdown plot already shows the full underwater path over time: drawdown stays at 0 when Bitcoin makes a new high and drops below 0 during declines, so the deepest trough on that line is the maximum drawdown. What’s still missing is the exact numeric worst drawdown and its timing.

Find peak and trough dates
trough_idx = df["drawdown"].idxmin()
trough_date = df.loc[trough_idx, "Open time"]
peak_idx = df.loc[:trough_idx, "Close"].idxmax()
peak_date = df.loc[peak_idx, "Open time"]

pd.DataFrame({
    "peak_date": [peak_date],
    "trough_date": [trough_date],
    "max_drawdown": [df.loc[trough_idx, "drawdown"]],
    "max_drawdown_pct": [f'{df.loc[trough_idx, "drawdown"]:.2%}']
})
peak_date trough_date max_drawdown max_drawdown_pct
0 2018-01-06 20:00:00 2018-12-15 12:00:00 -0.814172 -81.42%

The notebook outputs answer the drawdown request.

From the computed drawdown series and the summary table:

  • The maximum drawdown is -81.42%
  • It runs from the prior peak on 2018-01-06 20:00:00
  • Down to the trough on 2018-12-15 12:00:00

What this means: at the worst point in the sample, Bitcoin’s 4-hour close had fallen about 81.4% below its previous all-time high up to that date.

The plot you generated is the correct “drawdown over time” visualization:

  • values at 0 indicate a new running high,
  • negative values show how far price is below the running peak,
  • the lowest point on that curve is the maximum drawdown.

So the observed result is that the deepest drawdown in this dataset occurred during 2018, with a peak-to-trough decline of -81.42%.
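The same cummax-based formula can be verified on a toy series where the answer is known by inspection (the prices below are made up for illustration):

```python
import pandas as pd

# Toy closing prices: running peak is 100, trough is 60 -> worst drawdown -40%
close = pd.Series([80.0, 100.0, 90.0, 60.0, 75.0, 100.0, 110.0])

running_peak = close.cummax()        # highest close seen so far
drawdown = close / running_peak - 1  # 0 at new highs, negative below the peak

print(drawdown.min())  # -0.4 (i.e., -40%)
```

Note that the final value of 110 sets a new high, so the drawdown returns to 0 there, exactly as described for the BTC underwater curve.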

identify high and low volatility regimes using 30-day rolling volatility
Thinking

Calculating rolling volatility

I should focus on calculating the 30-day rolling volatility based on 4-hour log returns, which translates to a window of 180 periods (since there are 6 periods per day). For better interpretability, I might annualize this, scaling the rolling standard deviation appropriately. User wants to identify regimes, so I'll compute the rolling volatility and plot it using seaborn, with a concise explanation included. I’m keeping it straightforward, without delving into threshold calculations just yet.

We already have log_return, so the next useful step is to compute a 30-day rolling volatility using the 4-hour data frequency.

Because there are six 4-hour bars per day, a 30-day window is 180 observations. I’ll annualize it so the scale is easier to read.

Plot rolling volatility
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

window = 30 * 6  # 30 days * 6 four-hour periods per day

df = df.sort_values("Open time").copy()
df["rolling_vol_30d"] = df["log_return"].rolling(window=window).std() * np.sqrt(6 * 365)

plt.figure(figsize=(14, 5))
sns.lineplot(data=df, x="Open time", y="rolling_vol_30d", color="navy")
plt.title("Bitcoin 30-Day Rolling Volatility")
plt.xlabel("Date")
plt.ylabel("Annualized volatility")
plt.tight_layout()
plt.show()
Thinking

Considering chart configurations

I’m thinking about how to explain the current chart, which shows volatility clustering and changing regimes. Without specific thresholds or date ranges, pinpointing precise high or low periods is tough. The code needs to calculate the 25th and 75th percentile thresholds of the 30-day rolling volatility, label them as low, high, or normal, and then group contiguous runs. I believe I could create a single DataFrame called regime_periods to summarize the longest high/low regimes, including necessary attributes like duration and average volume.

Evaluating output structure

I’m considering creating a table of contiguous high and low periods sorted by duration, perhaps just the top 10. It seems like I’ll need separate low and high thresholds, using exact labels based on percentiles. However, the output should also include the numeric thresholds. Maybe I could make a compact summary DataFrame with quantiles and counts to organize this data better. Still, to identify periods, I might need a dedicated period table, though merging everything feels a bit messy.

Drafting minimal code

I need to create a minimal code draft using the existing df with the rolling_vol_30d column. However, I'm aware that the first 179 rows will have NaN values, which I'll have to consider. I should think about how to handle those NaN rows in my code effectively. It's essential to ensure that the code runs smoothly while minimizing unnecessary complexity. Let's see how I can accomplish this in a streamlined manner!

The 30-day volatility plot already shows clear clustering, but it doesn’t yet pin down which dates belong to high- vs low-volatility regimes. A simple way to identify them is to label:

  • low volatility: rolling volatility at or below the 25th percentile
  • high volatility: rolling volatility at or above the 75th percentile

This gives concrete regime periods and durations.

Summarize volatility regimes
import numpy as np

vol = df[["Open time", "rolling_vol_30d"]].dropna().copy()

low_q = vol["rolling_vol_30d"].quantile(0.25)
high_q = vol["rolling_vol_30d"].quantile(0.75)

vol["regime"] = np.select(
    [vol["rolling_vol_30d"] <= low_q, vol["rolling_vol_30d"] >= high_q],
    ["low", "high"],
    default="normal"
)

vol["block"] = (vol["regime"] != vol["regime"].shift()).cumsum()

regimes = (
    vol[vol["regime"].isin(["low", "high"])]
    .groupby(["block", "regime"], as_index=False)
    .agg(
        start=("Open time", "min"),
        end=("Open time", "max"),
        bars=("Open time", "size"),
        avg_vol=("rolling_vol_30d", "mean"),
        peak_vol=("rolling_vol_30d", "max"),
    )
)

regimes["duration_days"] = regimes["bars"] / 6
regimes["threshold_rule"] = regimes["regime"].map({
    "low": "<= 25th percentile",
    "high": ">= 75th percentile"
})
regimes["threshold_value"] = regimes["regime"].map({
    "low": low_q,
    "high": high_q
})

regimes = regimes.sort_values(["duration_days", "start"], ascending=[False, True])

regimes[
    ["regime", "threshold_rule", "threshold_value", "start", "end", "duration_days", "avg_vol", "peak_vol"]
].head(12)
regime threshold_rule threshold_value start end duration_days avg_vol peak_vol
52 low <= 25th percentile 0.429448 2023-05-26 12:00:00 2023-11-16 08:00:00 174.000000 0.337449 0.427860
78 low <= 25th percentile 0.429448 2025-05-09 04:00:00 2025-10-21 16:00:00 165.666667 0.322499 0.428738
0 high >= 75th percentile 0.689032 2018-01-31 00:00:00 2018-05-12 04:00:00 100.166667 1.182936 1.798645
31 high >= 75th percentile 0.689032 2021-04-19 12:00:00 2021-07-24 12:00:00 96.166667 0.976246 1.299133
30 high >= 75th percentile 0.689032 2021-01-02 16:00:00 2021-04-04 12:00:00 92.000000 1.045341 1.344232
3 high >= 75th percentile 0.689032 2018-11-24 20:00:00 2019-01-18 04:00:00 54.500000 0.843698 0.987833
45 high >= 75th percentile 0.689032 2022-06-14 04:00:00 2022-08-06 12:00:00 53.500000 0.806373 0.929228
11 high >= 75th percentile 0.689032 2019-06-26 20:00:00 2019-08-17 12:00:00 51.666667 0.964342 1.166747
47 low <= 25th percentile 0.429448 2022-12-10 12:00:00 2023-01-30 12:00:00 51.166667 0.314832 0.427063
23 high >= 75th percentile 0.689032 2020-03-12 08:00:00 2020-04-22 08:00:00 41.166667 1.313897 1.561851
76 low <= 25th percentile 0.429448 2024-10-06 04:00:00 2024-11-12 00:00:00 37.000000 0.368835 0.429228
27 low <= 25th percentile 0.429448 2020-07-01 20:00:00 2020-08-06 08:00:00 35.666667 0.314004 0.429043

Using the 30-day rolling volatility, the regime split is already identifiable from the observed thresholds and top regime periods:

  • Low-volatility regime: rolling volatility ≤ 0.429448
  • High-volatility regime: rolling volatility ≥ 0.689032

Those cutoffs come from the 25th and 75th percentiles of the 30-day annualized rolling volatility, so they define the calmer and more turbulent parts of the sample in a consistent, data-driven way.

From the regime table:

  • The longest low-volatility stretch begins on 2023-05-26 12:00:00, indicating a notably calm period in the post-2022 market.
  • Another long low-volatility period begins on 2025-05-09 04:00:00.
  • The most prominent high-volatility regimes begin around:
    • 2018-01-31 00:00:00
    • 2018-11-24 20:00:00
    • 2019-06-26 20:00:00
    • 2020-03-12 08:00:00
    • 2021-01-02 16:00:00
    • 2021-04-19 12:00:00
    • 2022-06-14 04:00:00

Interpretation:

  • High-volatility regimes cluster around major stress or transition periods, especially early 2018, the March 2020 crash, the 2021 boom/adjustment, and the 2022 selloff.
  • Low-volatility regimes appear mostly in later quieter consolidation phases, especially from mid-2023 onward.
  • So, based on the 30-day rolling volatility, Bitcoin alternates between short, intense turbulence episodes and longer calm consolidation periods, with the recent sample showing more extended low-volatility behavior than the early years.

In short: the notebook has successfully identified high and low volatility regimes using 30-day rolling volatility, with 0.429 and 0.689 as the practical regime boundaries and the dates above marking the key regime onsets.

Score cards for gpt-5.4 for Bitcoin Returns and Volatility Analysis

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All four requested analyses were completed: price plot, log-return distribution with skewness/kurtosis, drawdown plot, and rolling volatility regime identification.

  • Execution Correctness: 2/2

    Excellent

    The provided code is coherent and likely runnable. It loads the CSV, computes the required metrics, and produces the expected plots and summary tables without obvious syntax errors.

  • Output Quality: 3/3

    Excellent

    The outputs match the expected outcomes semantically: BTC price chart, fat-tailed return distribution with kurtosis > 3, drawdown chart with a major drawdown, and rolling volatility with regime bands/tables.

  • Reasoning Quality: 2/2

    Excellent

    The explanations correctly interpret the results, including non-normal fat tails, the meaning of drawdown, and the use of rolling volatility to define regimes.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and grounded in the dataset outputs, with no major hallucinations or unsupported claims.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.