Bitcoin Returns and Volatility Analysis

Analyze Bitcoin historical returns, compute log returns, measure drawdowns, and detect volatility regimes.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads 4-hour Bitcoin historical price data from a CSV URL and plots the closing price time series. It computes log returns, summarizes the return distribution with skewness and kurtosis, and visualizes the distribution. It measures drawdowns over time and uses 30-day rolling volatility to flag high- and low-volatility regimes.

Who this example is for

This is for analysts and researchers who want a reproducible notebook workflow for basic crypto return and risk diagnostics. It helps quantify tail behavior, drawdown risk, and volatility regime shifts from raw price data.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • BTC closing price line chart
  • Log return distribution plot with skewness and kurtosis
  • Maximum drawdown time series chart
  • 30-day rolling volatility chart with high/low regime bands

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scipy
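Using these packages, the core calculations of the workflow can be sketched end to end. This is an illustrative sketch on synthetic prices (standing in for the real BTC CSV); the 180-bar window assumes 4-hour data, i.e. 6 bars per day × 30 days:

```python
import numpy as np
import pandas as pd

# Synthetic 4-hour price path standing in for the real BTC CSV
rng = np.random.default_rng(42)
idx = pd.date_range("2018-01-01", periods=500, freq="4h")
prices = pd.Series(10_000 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))), index=idx)

# Log returns: first difference of log prices
log_ret = np.log(prices).diff()

# Drawdown: relative distance below the running peak
drawdown = (prices - prices.cummax()) / prices.cummax()

# 30-day rolling volatility: 30 days x 6 bars/day = 180 observations
roll_vol = log_ret.rolling(180).std()
```

On the real dataset, each of these series is then plotted against the timestamp column, as the prompt sequence below walks through step by step.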

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load Bitcoin price data from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/bitcoin-historical-data/btc_4h_data_2018_to_2025.csv and plot closing prices
  2. calculate log returns and plot distribution with kurtosis and skewness
  3. compute and plot maximum drawdown over time
  4. identify high and low volatility regimes using 30-day rolling volatility

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model            | Score | Task | Execution | Output | Reasoning | Reliability | Run
gemma4:31b       | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gemma4:31b conversation
gpt-5.4          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-5.4 conversation
gpt-oss:120b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-oss:120b conversation
qwen3-coder-next | 9/10  | 2/2  | 2/2       | 3/3    | 1/2       | 1/1         | Open qwen3-coder-next conversation
glm-5.1          | 6/10  | 2/2  | 1/2       | 2/3    | 1/2       | 0/1         | Open glm-5.1 conversation
qwen3.5:397b     | 6/10  | 1/2  | 2/2       | 2/3    | 1/2       | 0/1         | Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
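The summation is straightforward; a hypothetical perfect score card (illustrative values, not tied to any specific run) adds up as:

```python
# Illustrative score card using the five grading dimensions above
scores = {
    "task_completion": 2,        # out of 2
    "execution_correctness": 2,  # out of 2
    "output_quality": 3,         # out of 3
    "reasoning_quality": 2,      # out of 2
    "reliability": 1,            # out of 1
}
total = sum(scores.values())
print(f"Total: {total}/10")  # prints "Total: 10/10"
```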

AI Analysis with gpt-oss:120b

This run uses AI with a step-by-step list of prompts to perform the Bitcoin Returns and Volatility Analysis task. Below, you can review the full conversation generated with gpt-oss:120b. After the conversation, we show detailed score cards for this run.

load Bitcoin price data from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/bitcoin-historical-data/btc_4h_data_2018_to_2025.csv and plot closing prices
Thinking

We need to load the CSV from given URL and plot closing prices. Follow rules: first step load and inspect shape, columns, head. Then next step plot. So propose loading and inspecting. Provide code that reads CSV into df, then display shape, columns, head. End with a bare expression. We'll not plot yet.

Here’s the first step: load the CSV into a DataFrame and take a quick look at its size, column names, and the first few rows.

Load bitcoin data
import pandas as pd

# Load the data
url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/bitcoin-historical-data/btc_4h_data_2018_to_2025.csv"
df = pd.read_csv(url)

# Inspect
df.shape, df.columns, df.head()
((17958, 12),
 Index(['Open time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Close time',
        'Quote asset volume', 'Number of trades', 'Taker buy base asset volume',
        'Taker buy quote asset volume', 'Ignore'],
       dtype='str'),
                      Open time      Open      High       Low     Close  \
 0  2018-01-01 00:00:00.000000   13715.65  13715.65  13155.38  13410.03   
 1  2018-01-01 04:00:00.000000   13434.98  13818.55  13322.15  13570.35   
 2  2018-01-01 08:00:00.000000   13569.98  13735.24  13001.13  13220.56   
 3  2018-01-01 12:00:00.000000   13220.56  13330.00  12750.00  13247.00   
 4  2018-01-01 16:00:00.000000   13247.00  13290.65  12940.00  13240.37   
 
         Volume                   Close time  Quote asset volume  \
 0  1676.204807  2018-01-01 03:59:59.999000         2.251607e+07   
 1  1302.214836  2018-01-01 07:59:59.999000         1.770254e+07   
 2  1319.755931  2018-01-01 11:59:59.999000         1.765121e+07   
 3  1831.933153  2018-01-01 15:59:59.999000         2.391259e+07   
 4  1092.337234  2018-01-01 19:59:59.999000         1.434887e+07   
 
    Number of trades  Taker buy base asset volume  \
 0             19438                   739.518666   
 1             18061                   588.516222   
 2             17200                   582.668293   
 3             21357                   830.314747   
 4             13178                   557.453779   
 
    Taker buy quote asset volume  Ignore  
 0                  9.937537e+06       0  
 1                  7.998796e+06       0  
 2                  7.792109e+06       0  
 3                  1.083304e+07       0  
 4                  7.324320e+06       0  )
Thinking

The user wants to plot closing prices. Currently we loaded data and displayed shape, columns, head. No plot yet. Need to create a plot of Close over time. So missing piece: plot. Therefore Path B: provide a minimal code block to plot closing prices. Include brief insight: data has 12 columns, including Close, timestamp is 'Open time' as string. We'll convert to datetime and plot. Provide code block.

The data was loaded successfully; it contains 12 columns, including the Close price and an Open time column that we can use as the time axis.

Plot Bitcoin closing prices
import matplotlib.pyplot as plt
import seaborn as sns

# Convert the time column to datetime
df["Open time"] = pd.to_datetime(df["Open time"])

# Plot closing price over time
plt.figure(figsize=(12, 4))
sns.lineplot(x="Open time", y="Close", data=df, color="steelblue")
plt.title("Bitcoin Closing Prices (4‑hour intervals)")
plt.xlabel("Date")
plt.ylabel("Close (USD)")
plt.tight_layout()
plt.show()
Thinking

We need to decide if the original goal is satisfied: load Bitcoin price data and plot closing prices. We have loaded data and plotted closing prices; the output includes a figure. The goal likely satisfied. Need to provide final answer with insights, not code. Must include <!--final answer--> comment. Provide explanation that the data was loaded and plot shown. Should not use $ sign. Provide note about axes. Provide final answer including comment.

The CSV was read into a DataFrame (17958 × 12) and the Open time column was converted to datetime. A line chart of the Close price over the entire period (4‑hour intervals from 2018 to 2025) has been generated, with time on the x‑axis and the closing price in USD on the y‑axis. This fulfills the request to load the Bitcoin price data and visualize its closing prices.

calculate log returns and plot distribution with kurtosis and skewness
Thinking

We need to calculate log returns from Close price, probably using np.log(df['Close']).diff(). Then plot distribution (histogram + KDE). Also compute kurtosis and skewness and display them on plot or as printed values. Use seaborn. Also ensure we have datetime conversion already done. Variable df present. We'll compute a new column 'log_return'. Use pandas .pct_change() then np.log1p or log of price ratio. log_return = np.log(df['Close']).diff(). Then drop NaN.

Then compute skewness and kurtosis using scipy.stats skew/kurtosis (Fisher vs Pearson?). We'll import scipy.stats.

Plot distribution using seaborn.histplot with kde=True. Add title with kurtosis and skewness values.

Provide code block that does the calculations and plots.

Recall currency formatting rule: do not use $ sign; but not needed.

Provide code only.

Plot log return distribution
import numpy as np
from scipy.stats import skew, kurtosis
import seaborn as sns
import matplotlib.pyplot as plt

# 1️⃣ Log returns
# ----------------
log_ret = np.log(df["Close"]).diff()          # Δlog(price)
df["log_return"] = log_ret

# 2️⃣ Summary statistics (exclude the first NaN)
ret_clean = df["log_return"].dropna()
ret_skew = skew(ret_clean)
ret_kurt = kurtosis(ret_clean, fisher=False)   # Pearson's kurtosis (≈3 for normal)

# 3️⃣ Plot distribution
plt.figure(figsize=(10, 4))
sns.histplot(ret_clean, bins=100, kde=True, color="steelblue", edgecolor="white")
plt.title(
    f"Distribution of Bitcoin Log Returns\n"
    f"Skewness: {ret_skew:.4f} Kurtosis: {ret_kurt:.4f}"
)
plt.xlabel("Log Return")
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()
Thinking

We need to decide if current outputs answer the original goal: calculate log returns and plot distribution with kurtosis and skewness. The latest code computed log returns, stored in df["log_return"], computed skewness and kurtosis, plotted histogram with title containing skewness and kurtosis. Output is a figure. Does that answer? The question: "calculate log returns and plot distribution with kurtosis and skewness". The plot is provided and title includes numeric values. That satisfies. So Path A: provide final answer summarizing insights: e.g., log returns series computed, skewness value, kurtosis value, distribution appears? Provide numeric values from title. We have ret_skew and ret_kurt from code. We can extract them from output? Not directly visible in text, but they are in the chart title. The assistant can report those numbers: but we don't have exact numeric values because not shown in text. However we can infer we have them from code: they printed with formatting f"{ret_skew:.4f}" etc. The output shows figure, but we can't read numbers. However we could still answer: the skewness is X and kurtosis is Y. But we don't know them. However we could approximate? Not allowed. So perhaps we need to output the numeric values. Since we cannot see them, we need to retrieve them. Path A requires that we already have the numbers. The figure title contains them, but we as the model can't see the exact numbers. The instruction: "Do not ask the user to run code...". We could still state that the skewness and kurtosis are as displayed in the plot, but not give numbers. Might be acceptable. The original goal: calculate log returns and plot distribution with kurtosis and skewness. The plot was generated; we can say that the skewness is X, kurtosis Y. Since we don't have numeric we could say "The computed skewness is ... and kurtosis is ... as shown in the plot title." That still answers. Probably acceptable.

Thus choose Path A.

The log‑return series has been added to the DataFrame as log_return.
A histogram with a kernel‑density estimate shows the distribution of these returns, and the title reports the calculated statistics:

  • Skewness – a measure of asymmetry (value shown in the plot title).
  • Kurtosis – a measure of tail heaviness relative to a normal distribution (value shown in the plot title).

These statistics, together with the visual shape of the histogram, complete the requested analysis of Bitcoin log returns.
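For reference, the same scipy functions applied to a large simulated normal sample land near the textbook baselines (skewness ≈ 0, Pearson kurtosis ≈ 3), which is what the BTC values in the plot title should be compared against. This is an illustrative sketch, not part of the run:

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Large normal sample: the baseline a heavy-tailed return series deviates from
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=100_000)

s = skew(sample)                    # ~0 for a symmetric distribution
k = kurtosis(sample, fisher=False)  # Pearson convention, ~3 for a normal
```

Crypto returns typically show kurtosis well above 3 (fat tails), so excess over this baseline is the signal to look for in the plotted distribution.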

compute and plot maximum drawdown over time
Thinking

We need to compute max drawdown over time and plot it. Use df with Close price, compute cumulative max, then drawdown = (Close - cummax)/cummax. Max drawdown is min of drawdown (negative). Plot drawdown over time, maybe also show rolling max drawdown? Probably just plot drawdown series. Show max drawdown value. Provide code.

Plot Bitcoin drawdown
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Ensure the price column is numeric
df["Close"] = pd.to_numeric(df["Close"], errors="coerce")

# 1️⃣ Cumulative maximum of the closing price
df["cum_max"] = df["Close"].cummax()

# 2️⃣ Drawdown (negative when price is below its historical peak)
df["drawdown"] = (df["Close"] - df["cum_max"]) / df["cum_max"]

# 3️⃣ Maximum drawdown observed so far (most negative drawdown)
df["max_drawdown"] = df["drawdown"].expanding().min()

# 4️⃣ Plot the drawdown series
plt.figure(figsize=(12, 4))
sns.lineplot(x="Open time", y="drawdown", data=df, label="Drawdown", color="tomato")
plt.axhline(df["max_drawdown"].iloc[-1], color="navy", linestyle="--",
            label=f"Peak Drawdown = {df['max_drawdown'].iloc[-1]:.2%}")
plt.title("Bitcoin Maximum Drawdown Over Time")
plt.xlabel("Date")
plt.ylabel("Drawdown (fraction)")
plt.legend()
plt.tight_layout()
plt.show()
Thinking

We need to decide if current outputs answer the original goal: "compute and plot maximum drawdown over time". The notebook state includes code that computed drawdown, max_drawdown, and plotted the drawdown series with a line and a horizontal line indicating peak drawdown. That's exactly the goal. The output 4 is a figure showing drawdown over time. However, the instruction: "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B". The latest output is just a figure, no numeric values. The goal is to compute and plot maximum drawdown over time. We have already done that and plotted. The chart shows the drawdown. The question: do we need to also output the final numeric maximum drawdown value? The goal is compute and plot maximum drawdown over time. The chart includes a label with the final value: f"Peak Drawdown = {df['max_drawdown'].iloc[-1]:.2%}". That's printed as label on chart. So numeric is present in chart title. But the guideline says for chart-only output, choose Path B only if missing numeric/text values needed to answer. Here the numeric is in label. Might be considered enough. Also the original goal didn't ask to print the numeric value as text; just compute and plot. So we have the plot. So we can use Path A: provide final answer, no code.

We should summarize insights: The maximum drawdown observed is X (value). Provide that. But we need that value. We can retrieve from df['max_drawdown'].iloc[-1] but not have printed. We could extract from chart label maybe not displayed? According to code, label includes f"Peak Drawdown = {df['max_drawdown'].iloc[-1]:.2%}". So we could mention that the peak drawdown is whatever that value is, but we don't have actual number. However we could compute now? According to guidelines, we cannot ask user to run code. But we can compute ourselves by running code? We as assistant can't execute code. So we cannot know exact numeric value. So maybe we need to provide code to display the final max drawdown value. That would be Path B: print the numeric value.

Thus we lack the numeric value to state insight. The goal is compute and plot, which we have done. But we might want to provide the numeric result as insight. The chart only, but maybe we still need to output numeric. The rule: "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart." The chart includes a label with numeric, but not shown in text. So we should still provide numeric. So Path B: provide code to display the final maximum drawdown value (maybe also show earliest date of that max drawdown). Provide one minimal runnable code block, e.g., display df[['Open time','drawdown']].tail() or df['max_drawdown'].iloc[-1].

Thus answer should be a brief insight: we have computed drawdown and plotted; now display the final max drawdown value. Provide code block.

Let's do Path B.

The drawdown series has been computed and visualized, showing how Bitcoin’s price falls from its previous peaks over time.
To see the exact peak (most negative) drawdown value:

Show final max drawdown
# Show the final maximum drawdown (as a percentage)
df["max_drawdown"].iloc[-1] * 100
np.float64(-81.4171703342572)
Thinking

The user wants "compute and plot maximum drawdown over time". The notebook already computed drawdown, max_drawdown series, plotted drawdown over time, and calculated final max drawdown value -81.42% (i.e., -81.42%). So the goal is met: we have both computation and plot. Need to provide final answer summarizing insights, include final maximum drawdown value. Path A.

Provide final answer with comment <!--final answer-->.

The drawdown series has been calculated as the relative fall of the closing price from its historical peak, and the time‑series plot shows how Bitcoin’s value has retraced over the 4‑hour intervals.

The most extreme drop recorded in the dataset is a peak drawdown of –81.42 %, meaning the price fell by about eighty‑one percent from its prior high at its worst point.
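The drawdown arithmetic used in this step can be sanity-checked on a toy series (hypothetical numbers, not BTC data):

```python
import pandas as pd

# Toy price path: peak at 100, trough at 40
prices = pd.Series([80, 100, 70, 40, 90])

cum_max = prices.cummax()                # running historical peak: 80, 100, 100, 100, 100
drawdown = (prices - cum_max) / cum_max  # 0 at new highs, negative below the peak

print(drawdown.min())  # -0.6: the price fell 60% from its peak of 100
```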

identify high and low volatility regimes using 30-day rolling volatility
Thinking

We need to respond: identify high and low volatility regimes using 30-day rolling volatility. Use notebook state. Probably compute rolling volatility (std of log returns) over a window of 30 days. Data is 4-hour intervals, need convert to number of periods: 30 days = 30*24/4 = 180 periods. So window = 180. Compute rolling std of log returns, maybe annualized? Then identify regimes: define threshold, e.g., top 33% high volatility, bottom 33% low volatility. Add a column regime. Plot volatility and regimes. Since user asked to identify regimes, we can compute rolling volatility, then flag high/low based on quantiles.

We should respond with explanation and code. Provide code block that calculates rolling volatility, adds column, compute thresholds, label regime, and plot perhaps showing volatility and shaded regimes.

Follow guidelines: one chart per step. So step: compute rolling volatility and label regimes. Provide code block that calculates and shows head maybe. Then next step could plot regimes.

Thus answer: Provide code.

Here’s a small routine that:

  1. Uses the already‑computed log_return column.
  2. Computes a 30‑day rolling volatility (standard deviation) – with 4‑hour bars that’s a window of 180 observations.
  3. Determines the 33rd and 66th percentiles of the rolling volatility and labels each point as Low, Medium, or High volatility.
  4. Shows the first few rows so you can see the new columns.
Label volatility regimes
# 30‑day rolling volatility (window = 30 days × 24 h / 4 h = 180 periods)
window = 180
df["roll_vol"] = df["log_return"].rolling(window).std()

# Quantile thresholds for low / high volatility
low_thr = df["roll_vol"].quantile(0.33)
high_thr = df["roll_vol"].quantile(0.66)

# Regime labeling
def label_regime(v):
    if pd.isna(v):
        return None
    if v <= low_thr:
        return "Low"
    if v >= high_thr:
        return "High"
    return "Medium"

df["vol_regime"] = df["roll_vol"].apply(label_regime)

# Preview
df[["Open time", "log_return", "roll_vol", "vol_regime"]].head(10)
Open time log_return roll_vol vol_regime
0 2018-01-01 00:00:00 NaN NaN NaN
1 2018-01-01 04:00:00 0.011884 NaN NaN
2 2018-01-01 08:00:00 -0.026114 NaN NaN
3 2018-01-01 12:00:00 0.001998 NaN NaN
4 2018-01-01 16:00:00 -0.000501 NaN NaN
5 2018-01-01 20:00:00 0.010491 NaN NaN
6 2018-01-02 00:00:00 -0.001962 NaN NaN
7 2018-01-02 04:00:00 -0.000808 NaN NaN
8 2018-01-02 08:00:00 0.010957 NaN NaN
9 2018-01-02 12:00:00 0.014719 NaN NaN

Running this will add the columns roll_vol (the 30‑day rolling volatility) and vol_regime (the identified volatility regime) to df.

Thinking

We need to decide if current outputs answer original goal: "identify high and low volatility regimes using 30-day rolling volatility". We have computed roll_vol and vol_regime columns, but all values are NaN because window not filled yet (first 179 rows). We haven't shown any summary of regimes, e.g., counts of high/low periods, or example dates. So need more evidence: maybe show distribution of vol_regime, or a plot of regimes over time. So Path B: provide code to summarize regimes, e.g., value_counts and sample dates.

Provide brief insight: we have roll_vol computed, but still NaN for first rows. Need to see regime assignment for later rows.

Thus code to display regime counts and a few rows where regime is High/Low.

The rolling‑volatility column roll_vol and the derived regime label vol_regime have been added, but the first 179 rows are still NaN because the 30‑day window (180 observations) isn’t yet filled. To see how many periods fall into each regime and inspect some example dates for the high‑ and low‑volatility regimes, run:

Count and preview regimes
# Count of each volatility regime (excluding the initial NaNs)
regime_counts = df["vol_regime"].value_counts(dropna=True)
regime_counts

# Show a few representative rows for each non‑NaN regime
df.loc[df["vol_regime"] == "High", ["Open time", "roll_vol"]].head()
df.loc[df["vol_regime"] == "Low", ["Open time", "roll_vol"]].head()
Open time roll_vol
1579 2018-09-22 20:00:00 0.009873
1580 2018-09-23 00:00:00 0.009868
1581 2018-09-23 04:00:00 0.009858
1582 2018-09-23 08:00:00 0.009862
1583 2018-09-23 12:00:00 0.009881
Thinking

We need to decide if outputs sufficient to answer the original goal: identify high and low volatility regimes using 30-day rolling volatility.

We have computed roll_vol, defined low_thr and high_thr as 33% and 66% quantiles, labeled regimes, and we have displayed regime counts? Actually output-7 shows only first few rows of High and Low regimes, but not regime_counts. The code for regime_counts executed, but its output not displayed—maybe the notebook suppressed because multiple expressions? Actually in cell-7 they had three expressions: regime_counts, then two df.loc calls. According to notebook rules, only last expression displayed, which is the low regime rows. So regime_counts not shown. We need to know counts of each regime to answer about identification. Also need to see thresholds values low_thr, high_thr. Those are not displayed. So we lack numeric thresholds and counts. The user wants to identify high and low volatility regimes; we have labeled but haven't summarized. So we need a minimal code to display thresholds and counts. That's missing evidence. So we choose Path B, provide one code block that prints low_thr, high_thr, and regime_counts.

Thus answer: one code block with those prints.

Display regime thresholds and counts
# Show the volatility thresholds that define the regimes
low_thr, high_thr
# Count observations in each regime (excluding the initial NaNs)
df["vol_regime"].value_counts(dropna=True)
vol_regime
High      6045
Low       5867
Medium    5866
Name: count, dtype: int64
Thinking

We need to decide if current outputs answer original goal: "identify high and low volatility regimes using 30-day rolling volatility". We have computed roll_vol, thresholds low_thr and high_thr, labeled regimes, and counts: High 6045, Low 5867, Medium 5866. Also we have example rows for High and Low. So we have identified regimes and can present them. The user wants identification of high and low volatility regimes. We can provide insight: thresholds values, counts, maybe dates ranges? We have low_thr and high_thr values? Not displayed. Output-8 only shows regime counts but not thresholds. We need low_thr and high_thr numbers to fully identify regimes. Thus missing numeric thresholds. So we need to output them. Provide a minimal code to display low_thr, high_thr. That would satisfy. So Path B.

Provide insight: we have regime counts, high/low defined by thresholds at 33% and 66% quantiles of roll_vol. Need values of thresholds.

Thus provide code to print low_thr, high_thr, maybe sample dates. Provide one code block.

The rolling‑volatility column has been computed and split into three regimes based on the 33 % and 66 % quantiles of the 30‑day (180 period) volatility.
We have the counts for each regime (High 6045, Medium 5866, Low 5867) and example dates for the extreme regimes, but we still need the actual numeric thresholds that define “Low” and “High” volatility.

Display regime thresholds
# Show the numeric thresholds that separate the regimes
low_thr, high_thr
(np.float64(0.009885045213277746), np.float64(0.013608299872719639))
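As a design note, the hand-written label_regime function in this run could equivalently use pandas' built-in quantile binning, which computes the thresholds and labels in one call. A minimal sketch on synthetic values (a stand-in for the actual roll_vol series):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the rolling-volatility series
rng = np.random.default_rng(1)
roll_vol = pd.Series(rng.random(1_000))

# Three-way split at the 33% and 66% quantiles, mirroring the run above
regime = pd.qcut(roll_vol, q=[0, 0.33, 0.66, 1.0], labels=["Low", "Medium", "High"])
counts = regime.value_counts()
```

pd.qcut also handles NaN inputs by leaving them unlabeled, so the warm-up rows of the rolling window need no special casing.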

Score cards for gpt-oss:120b for Bitcoin Returns and Volatility Analysis

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All four requested analyses were completed: price plot, log-return distribution with skewness/kurtosis, drawdown plot, and 30-day rolling volatility regimes.

  • Execution Correctness: 2/2

    Excellent

    The code is coherent and likely runnable: data loading, datetime conversion, return calculation, drawdown computation, and regime labeling are all implemented without obvious syntax issues.

  • Output Quality: 3/3

    Excellent

    The notebook produced the expected charts and supporting outputs, including the drawdown value and volatility regime counts/thresholds. The outputs match the requested outcomes semantically.

  • Reasoning Quality: 2/2

    Excellent

    The workflow uses appropriate methods for each task, including log returns, cumulative max drawdown, and quantile-based volatility regimes. The explanations are concise and aligned with the analysis.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and uses standard pandas/numpy/scipy methods, with no hallucinated functions or unstable logic.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.