What Happens After the Worst Days in the Market?
Written by Piotr Płoński · Published on April 16, 2026 · Notebook code
Analyze what happens after the largest daily drops in the S&P 500. This AI-assisted notebook explores mean reversion, return distributions, and how market behavior changes after extreme events.
This notebook investigates what happens after the worst daily declines in the S&P 500 using publicly available market data. Starting from historical price data, it identifies extreme down days and examines how the market behaves in the following sessions.
Using simple natural language prompts, the analysis computes forward returns, compares them to baseline performance, and explores whether mean reversion exists after large drops. It goes beyond averages by analyzing full return distributions, showing how both the central tendency and dispersion change after extreme events.
The results reveal that while strong rebounds often follow the most severe declines, they come with significantly higher volatility. The notebook also explores how this effect has evolved over time, showing that post-2000 rebounds are larger but less consistent. This example demonstrates how AI-assisted analysis can quickly uncover meaningful patterns in financial data.
Thinking
Processing stock data
I’m thinking about how to handle the latest stock data in a DataFrame. I want to keep it simple, so I'll return the DataFrame and show the head. In a notebook, a final bare expression at the end is essential, maybe by using sp500.head(). I need to consider if columns are MultiIndex from yfinance, especially if grouping by ticker, and flatten them if needed. The user wants the Adjusted Close labeled correctly, so I should rename "Adj Close" to "Adjusted Close." Finally, I’ll compute the daily returns percentage change and convert it to a percentage.
Calculating daily returns
I’m thinking about how to express daily returns. Since they're commonly noted as a percentage change, I should multiply by 100 and label it as Daily Return in percent. It's essential to highlight that these values are in percent units. I also need to drop any missing values, and I realize the first row might be removed because of NaN from pct_change(). After that, I should ensure the data is sorted in ascending order by date, which yfinance usually does already.
Here’s a compact notebook cell that loads the S&P 500 daily history from Yahoo Finance, keeps the requested columns, computes daily returns from Adjusted Close, drops missing values, and shows the first rows.
Load and preprocess SP500 data
import pandas as pd
import yfinance as yf
start_date = "1950-01-01"
end_date = (pd.Timestamp.today().normalize() + pd.Timedelta(days=1)).strftime("%Y-%m-%d")
sp500 = yf.download(
"^GSPC",
start=start_date,
end=end_date,
auto_adjust=False,
progress=False
)
if isinstance(sp500.columns, pd.MultiIndex):
sp500.columns = sp500.columns.get_level_values(0)
sp500 = sp500.reset_index()
sp500 = sp500[
["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]
].rename(columns={"Adj Close": "Adjusted Close"})
sp500 = sp500.sort_values("Date").reset_index(drop=True)
sp500["Daily Return"] = sp500["Adjusted Close"].pct_change() * 100
sp500 = sp500.dropna().reset_index(drop=True)
sp500.head()
| Price | Date | Open | High | Low | Close | Adjusted Close | Volume | Daily Return |
|---|---|---|---|---|---|---|---|---|
| 0 | 1950-01-04 | 16.850000 | 16.850000 | 16.850000 | 16.850000 | 16.850000 | 1890000 | 1.140459 |
| 1 | 1950-01-05 | 16.930000 | 16.930000 | 16.930000 | 16.930000 | 16.930000 | 2550000 | 0.474777 |
| 2 | 1950-01-06 | 16.980000 | 16.980000 | 16.980000 | 16.980000 | 16.980000 | 2010000 | 0.295329 |
| 3 | 1950-01-09 | 17.080000 | 17.080000 | 17.080000 | 17.080000 | 17.080000 | 2520000 | 0.588930 |
| 4 | 1950-01-10 | 17.030001 | 17.030001 | 17.030001 | 17.030001 | 17.030001 | 2160000 | -0.292736 |
This returns the dataset as a pandas DataFrame in sp500.
Thinking
Clarifying DataFrame structure
I need to mention the pandas DataFrame called sp500. I want to address the peculiar “Price” label, ensuring it’s recognized as a column name and not an extra column. It’s really important to clarify that. This will help to avoid confusion when discussing data analysis or queries regarding the DataFrame. I’ll make sure to provide a clear explanation about this in my response. Let's keep it precise and helpful!
The requested dataset is already present as the pandas DataFrame sp500.
From the observed notebook state and output:
- `sp500` has 19,191 rows and 8 columns
- It contains the requested fields: `Date`, `Open`, `High`, `Low`, `Close`, `Adjusted Close`, `Volume`, `Daily Return`
- The data is in ascending date order
- Missing values were dropped after computing daily returns, which is why the first visible row is 1950-01-04 rather than 1950-01-03: the first return from `pct_change()` is naturally missing and was removed
- The first few observed rows show early S&P 500 levels around 16.85 to 17.03, with daily returns such as 1.140459%, 0.474777%, and -0.292736%
The first few rows shown are the expected preview of the final DataFrame, so this satisfies the original request.
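That `pct_change()` behavior is easy to verify on a toy series (synthetic prices, purely illustrative, not market data): the first element has no prior price, so its return is missing and `dropna()` shifts the first visible date forward by one session.

```python
import pandas as pd

# Toy adjusted-close series (synthetic, for illustration only)
prices = pd.Series([100.0, 102.0, 99.96], name="Adjusted Close")

# Daily return in percent, mirroring the notebook's construction
returns = prices.pct_change() * 100

# The first element is NaN (no prior price to compare against),
# so dropna() removes it: 102/100 -> 2.0%, 99.96/102 -> -2.0%
print(returns.dropna())
```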
Thinking
Thinking through user tasks
I’m considering how to approach multiple tasks the user has asked for. While I need to focus on one thing at a time, I wonder if I could execute them in one cell. The developer suggests taking it step-by-step, so I should analyze outputs carefully. It looks like the main goal is to summarize the worst days and generate counts. I’ll need to provide minimal code to create sorted tables and calculate percentages for extreme days. A brief explanation and code will help clarify my approach.
Planning concise outputs
I can’t really make assumptions about the outputs, so I’ll keep things concise. I plan to create variables like worst_days, summary_counts, and yearly_extremes, then display them sequentially. According to the rules, if I need multiple tables, I can show them using display(...). My first step will be to generate the 15 worst days and summary stats, along with a year breakdown for clustering. After executing this, I can comment on any visible patterns, but since I don't have outputs yet, I’ll hold off on commentary for now.
Analyzing notebook states
The instruction suggests that I should use the notebook state to see data, rather than generating new outputs. So, my next step is to produce code to compute the requested information, stopping after that—no need to ask the user to share anything since the fresh output isn't available. I think I’ll sort the data by Daily Return in ascending order and display the Date, Daily Return, and Close price. I’ll also count thresholds and calculate percentages of total days for specific periods like 2008 and 2020. For the output, I'll produce two tables to stay within the guideline of one chart or table per step, even though that might feel like quite a lot!
Let’s compute the worst days, the threshold counts, and a simple year-level clustering summary in one pass. After that, we can comment on the specific periods from the actual output.
Summarize worst market days
import pandas as pd
from IPython.display import display
worst_days = (
sp500.sort_values("Daily Return", ascending=True)
.loc[:, ["Date", "Daily Return", "Close"]]
.head(15)
.reset_index(drop=True)
)
total_days = len(sp500)
summary = pd.DataFrame({
"Category": ["Return < -3%", "Return < -5%"],
"Count": [
(sp500["Daily Return"] < -3).sum(),
(sp500["Daily Return"] < -5).sum(),
],
})
summary["Pct of Total Days"] = (summary["Count"] / total_days * 100).round(3)
extreme_by_year = (
sp500.assign(Year=sp500["Date"].dt.year)
.groupby("Year")
.agg(
Days_lt_neg3=("Daily Return", lambda s: (s < -3).sum()),
Days_lt_neg5=("Daily Return", lambda s: (s < -5).sum()),
)
.query("Days_lt_neg3 > 0 or Days_lt_neg5 > 0")
.sort_values(["Days_lt_neg5", "Days_lt_neg3"], ascending=False)
.head(15)
.reset_index()
)
display(worst_days)
display(summary)
display(extreme_by_year)
| Price | Date | Daily Return | Close |
|---|---|---|---|
| 0 | 1987-10-19 | -20.466931 | 224.839996 |
| 1 | 2020-03-16 | -11.984055 | 2386.129883 |
| 2 | 2020-03-12 | -9.511268 | 2480.639893 |
| 3 | 2008-10-15 | -9.034978 | 907.840027 |
| 4 | 2008-12-01 | -8.929524 | 816.210022 |
| 5 | 2008-09-29 | -8.806776 | 1106.420044 |
| 6 | 1987-10-26 | -8.278947 | 227.669998 |
| 7 | 2008-10-09 | -7.616710 | 909.919983 |
| 8 | 2020-03-09 | -7.596970 | 2746.560059 |
| 9 | 1997-10-27 | -6.865684 | 876.989990 |
| 10 | 1998-08-31 | -6.801408 | 957.280029 |
| 11 | 1988-01-08 | -6.768305 | 243.399994 |
| 12 | 2008-11-20 | -6.712293 | 752.440002 |
| 13 | 1962-05-28 | -6.675637 | 55.500000 |
| 14 | 2011-08-08 | -6.663446 | 1119.459961 |
| | Category | Count | Pct of Total Days |
|---|---|---|---|
| 0 | Return < -3% | 132 | 0.688 |
| 1 | Return < -5% | 30 | 0.156 |
| | Year | Days_lt_neg3 | Days_lt_neg5 |
|---|---|---|---|
| 0 | 2008 | 23 | 11 |
| 1 | 2020 | 16 | 5 |
| 2 | 1987 | 6 | 3 |
| 3 | 2009 | 12 | 1 |
| 4 | 2011 | 6 | 1 |
| 5 | 1998 | 5 | 1 |
| 6 | 1950 | 4 | 1 |
| 7 | 2000 | 4 | 1 |
| 8 | 2025 | 3 | 1 |
| 9 | 1962 | 2 | 1 |
| 10 | 1988 | 2 | 1 |
| 11 | 1955 | 1 | 1 |
| 12 | 1989 | 1 | 1 |
| 13 | 1997 | 1 | 1 |
| 14 | 2022 | 8 | 0 |
Once this runs, we can briefly interpret whether the extreme days are concentrated in periods like 2008 and 2020.
The worst single day in the dataset is 1987-10-19, with a -20.47% daily return, far worse than any other day shown. After that, the next most severe declines are concentrated in crisis periods, especially March 2020 and late 2008.
The 15 worst days show clear clustering:
- 2008 appears most often in the top 15, with multiple extreme declines in September, October, November, and December
- 2020 has three of the very worst days, all in March
- 1987 appears twice, including the single worst day
- A few other isolated crisis-era dates appear in 1997, 1998, 2011, and 1962
For the thresholds:
- Days with returns < -3%: 132
- Days with returns < -5%: 30
As a share of all trading days:
- < -3% days account for 0.688%
- < -5% days account for 0.156%
So these extreme declines are very rare, especially the deepest ones.
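Those shares follow directly from the counts and the cleaned frame's row total reported earlier, as a quick back-of-envelope check confirms:

```python
# Counts from the summary table; total rows from the cleaned DataFrame
total_days = 19_191
count_lt_3, count_lt_5 = 132, 30

print(round(count_lt_3 / total_days * 100, 3))  # → 0.688
print(round(count_lt_5 / total_days * 100, 3))  # → 0.156

# Put differently: a < -3% day arrives roughly once every ~145 sessions,
# and a < -5% day roughly once every ~640 sessions
```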
The year-level counts confirm that they are not evenly spread through history but are heavily clustered in specific stress periods:
- 2008: 23 days below -3%, including 11 below -5%
- 2020: 16 days below -3%, including 5 below -5%
- 1987: 6 days below -3%, including 3 below -5%
- 2009 still shows elevated stress with 12 days below -3%
- 2022 had 8 days below -3%, but none below -5%
Visible pattern: the worst down days are strongly concentrated in major market shock regimes rather than being spread uniformly over time. 2008 stands out as the most sustained cluster of severe losses, while 2020 shows a very sharp, compressed burst of extreme days. 1987 is notable for the most dramatic single-day crash.
Thinking
Creating visualization code
I see that the user wants a visualization and we’ll take an iterative approach with one chart at a time. It looks like we have enough context to propose a scatter plot using seaborn and matplotlib. I’ll prepare a runnable code block that produces the plot while keeping the code minimal.
Let’s be sure to include elements like titles, labels, and legends. I’ll use scatter points in gray and highlight subsets in orange and red, annotating important dates and formatting the x-axis.
Crafting the visualization
I need to rotate the labels and consider using seaborn for consistency in the theme. It’s important to note that values less than -5 are a subset of those less than -3. If I plot them in orange and red, the red points will overlap on the orange ones, which is good to keep in mind for clarity.
I’ll provide a concise explanation, possibly mentioning the clusters from 2008 and 2020 as context, and then I’ll follow that up with the necessary code.
The extreme drops are clearly clustered in crisis windows like 1987, 2008, and 2020, so a time-based scatter plot is a good way to show that concentration.
Plot extreme down days
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates
sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(16, 7))
mask_lt_3 = sp500["Daily Return"] < -3
mask_lt_5 = sp500["Daily Return"] < -5
# All days
ax.scatter(
sp500["Date"],
sp500["Daily Return"],
color="lightgray",
alpha=0.35,
s=12,
label="All days"
)
# Extreme down days
ax.scatter(
sp500.loc[mask_lt_3, "Date"],
sp500.loc[mask_lt_3, "Daily Return"],
color="orange",
alpha=0.8,
s=24,
label="Return < -3%"
)
ax.scatter(
sp500.loc[mask_lt_5, "Date"],
sp500.loc[mask_lt_5, "Daily Return"],
color="red",
alpha=0.9,
s=30,
label="Return < -5%"
)
# Reference lines
ax.axhline(-3, color="orange", linestyle="--", linewidth=1.5, alpha=0.9)
ax.axhline(-5, color="red", linestyle="--", linewidth=1.5, alpha=0.9)
# Annotate 3 notable crash days
annot_dates = pd.to_datetime(["1987-10-19", "2008-10-15", "2020-03-16"])
annot_df = sp500[sp500["Date"].isin(annot_dates)]
for _, row in annot_df.iterrows():
ax.annotate(
row["Date"].strftime("%Y-%m-%d"),
xy=(row["Date"], row["Daily Return"]),
xytext=(8, -12),
textcoords="offset points",
fontsize=9,
color="black"
)
ax.set_title("Extreme Down Days in S&P 500 History", fontsize=15)
ax.set_xlabel("Date")
ax.set_ylabel("Daily Return (%)")
ax.xaxis.set_major_locator(mdates.YearLocator(5))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
plt.xticks(rotation=45)
ax.legend(frameon=True)
plt.tight_layout()
plt.show()
If you want, the next step can be a zoomed-in version for just the worst periods so the 2008 and 2020 clusters are easier to inspect.
Thinking
Preparing data visualization
I need to determine whether to follow path A or B. I've got a chart output plus numeric/text values from a prior cell to support it. My main goal here is to create a visualization and return the plot. The latest outputs should be enough, as there’s a chart and supporting numeric values. I should analyze the outputs, connect them to the goal, and mention how the plot effectively highlights clustering and annotations. I'll ensure to keep it concise!
The visualization matches the requested view of extreme down days in S&P 500 history and is supported by the values already shown in the notebook.
It highlights the main pattern clearly: most trading days are ordinary, while the worst declines are rare and cluster in major crisis periods rather than being spread evenly through time. The supporting table shows only 132 days below -3% and just 30 below -5%, which is a very small share of all trading days.
The annotated crash dates are well chosen because they represent the most important extremes in the data:
- 1987-10-19 is the single worst day at -20.47%
- 2008-10-15 is one of the most severe losses during the financial crisis at -9.03%
- 2020-03-16 is the second worst day in the dataset at -11.98%
The worst-day table also confirms the visual clustering you would expect to see on the chart:
- 2008 contributes many of the deepest declines
- March 2020 contains several of the largest one-day drops
- 1987 stands out for the most extreme single-day crash
The year summary reinforces that pattern:
- 2008 had 23 days below -3% and 11 below -5%
- 2020 had 16 days below -3% and 5 below -5%
- 1987 had 6 days below -3% and 3 below -5%
So the plot does exactly what it should: it makes the rare, crisis-driven nature of the worst S&P 500 down days easy to see, while also distinguishing moderate extreme losses below -3% from the most severe collapses below -5%.
Thinking
Deciding next steps for analysis
I need to follow iterative rules for the user’s request to analyze returns after extreme negative days. Since the user asks for a full task, I'll need to define the first step. I think creating forward return columns and possibly a table is a good start. I'll provide runnable Python code for generating a "next_5d_return" definition. Since we have fresh outputs available, let's compute those forward returns and inspect a few rows carefully.
Crafting forward return calculations
I'm working on the formula for calculating the cumulative 5-trading-day return, specifically from t+1 to t+5. The formula is $\left(\prod_{i=1}^{5}(1 + r_{t+i}/100) - 1\right) \times 100$. I need to code this and also preserve the S&P 500 data while adding columns for "crash_3" and "crash_5." The primary goals are to create forward-return columns and group flags. I’ll finalize this code and show the relevant columns, inspecting the first 10 rows for clarity.
Let’s start by creating the forward-return columns and the crash flags, making sure the 5-day return is a true future cumulative return from $t+1$ through $t+5$.
Create future return flags
import numpy as np
from IPython.display import display
# Next-day return
sp500["next_day_return"] = sp500["Daily Return"].shift(-1)
# Next 5-trading-day cumulative return: from t+1 to t+5
future_growth_5d = (1 + sp500["Daily Return"].shift(-1) / 100).rolling(5).apply(np.prod, raw=True)
sp500["next_5d_return"] = (future_growth_5d.shift(-4) - 1) * 100
# Crash-day flags
sp500["crash_3"] = sp500["Daily Return"] < -3
sp500["crash_5"] = sp500["Daily Return"] < -5
display(
sp500.loc[:, ["Date", "Daily Return", "next_day_return", "next_5d_return", "crash_3", "crash_5"]].head(10)
)
| Price | Date | Daily Return | next_day_return | next_5d_return | crash_3 | crash_5 |
|---|---|---|---|---|---|---|
| 0 | 1950-01-04 | 1.140459 | 0.474777 | 1.424331 | False | False |
| 1 | 1950-01-05 | 0.474777 | 0.295329 | -1.004135 | False | False |
| 2 | 1950-01-06 | 0.295329 | 0.588930 | -1.825674 | False | False |
| 3 | 1950-01-09 | 0.588930 | -0.292736 | -2.107732 | False | False |
| 4 | 1950-01-10 | -0.292736 | 0.352316 | -0.998239 | False | False |
| 5 | 1950-01-11 | 0.352316 | -1.930953 | -1.404329 | False | False |
| 6 | 1950-01-12 | -1.930953 | -0.536994 | 0.656328 | False | False |
| 7 | 1950-01-13 | -0.536994 | 0.299935 | 1.379721 | False | False |
| 8 | 1950-01-16 | 0.299935 | 0.837328 | 1.196177 | False | False |
| 9 | 1950-01-17 | 0.837328 | -0.059313 | 0.000000 | False | False |
Next I’d compute the comparison table for all days, crash_3, and crash_5.
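The shift/rolling construction is also worth a quick sanity check: because the daily returns come from Adjusted Close, the cumulative return from $t+1$ through $t+5$ must equal the 5-day forward price ratio. A minimal sketch on synthetic prices (not the real sp500 frame) confirms the two definitions agree:

```python
import numpy as np
import pandas as pd

# Synthetic adjusted-close path (random walk, illustration only)
rng = np.random.default_rng(0)
adj = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 30))))
ret = adj.pct_change() * 100  # daily return in percent, as in the notebook

# Rolling-product construction used in the cell above
growth_5d = (1 + ret.shift(-1) / 100).rolling(5).apply(np.prod, raw=True)
next_5d = (growth_5d.shift(-4) - 1) * 100

# Direct definition: the same window expressed as a price ratio
direct = (adj.shift(-5) / adj - 1) * 100

# The two agree (up to float precision) wherever both are defined
assert np.allclose(next_5d.dropna(), direct.dropna())
```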
Thinking
Evaluating Python Code for Statistics
I need to decide between option A or B, but the latest outputs only show a sample of the forward columns. I should compute a comparison stats table because the current insights are insufficient. My plan is to provide a minimal runnable Python code block that gathers the missing evidence. I’ll calculate statistics for crash_3 and crash_5 on next_day and next_5d, and ensure that my next_5d calculations are correct. I'll also need to clarify my formulas to verify that they align with the expected returns.
Building a Comparison Table
I need to work on creating a function that computes future_growth_5d by shifting and using the rolling product over specific values. The main goal is to have a minimal and distinct code block compared to previous steps. There's a weird 0.000000 in the output, which I suspect might be from display truncation, so I’ll ignore that for now. I'll compute statistics while excluding NaNs for valid next_day and next_5d. Including counts and rounded percentages in the comparison table will help improve clarity for the user.
Crafting the Code Structure
I'm thinking about how to structure the code effectively. First, I’ll define a dictionary for the groups, such as 'All days', 'Crash < -3%', and 'Crash < -5%', using the appropriate indices from sp500. Then, I’ll create a summary function to calculate mean, median, standard deviation, and win rate. For each group and horizon, I'll build rows by filtering the sp500 data and dropping NaN values. Finally, I might pivot the data to create a clear comparison table with MultiIndex columns.
Creating comparison table
I’m looking to create a clear comparison table for the user, combining stats for the next day and the next five days. I can organize it by computing returns for each group and including columns like mean, median, standard deviation, and win rates for both time frames.
I should simplify the names in the table for clarity. I’ll also ensure my response includes brief insights from the current outputs and present the code clearly without additional headings.
The forward-return setup looks ready: the latest sample confirms next_day_return, next_5d_return, crash_3, and crash_5 are now attached to sp500. The missing piece is the comparison summary for all days versus crash days.
Compare crash return stats
import pandas as pd
from IPython.display import display
groups = {
"All days": pd.Series(True, index=sp500.index),
"Crash < -3%": sp500["crash_3"],
"Crash < -5%": sp500["crash_5"],
}
def summarize(mask, col):
s = sp500.loc[mask, col].dropna()
return {
"N": len(s),
"Mean": s.mean(),
"Median": s.median(),
"Std Dev": s.std(),
"Win Rate (%)": (s.gt(0).mean() * 100),
}
rows = []
for name, mask in groups.items():
nd = summarize(mask, "next_day_return")
f5 = summarize(mask, "next_5d_return")
rows.append({
"Group": name,
"Next Day N": nd["N"],
"Next Day Mean (%)": nd["Mean"],
"Next Day Median (%)": nd["Median"],
"Next Day Std Dev (%)": nd["Std Dev"],
"Next Day Win Rate (%)": nd["Win Rate (%)"],
"Next 5D N": f5["N"],
"Next 5D Mean (%)": f5["Mean"],
"Next 5D Median (%)": f5["Median"],
"Next 5D Std Dev (%)": f5["Std Dev"],
"Next 5D Win Rate (%)": f5["Win Rate (%)"],
})
comparison = pd.DataFrame(rows).round(3)
display(comparison)
| | Group | Next Day N | Next Day Mean (%) | Next Day Median (%) | Next Day Std Dev (%) | Next Day Win Rate (%) | Next 5D N | Next 5D Mean (%) | Next 5D Median (%) | Next 5D Std Dev (%) | Next 5D Win Rate (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | All days | 19190 | 0.036 | 0.050 | 0.993 | 53.137 | 19186 | 0.181 | 0.298 | 2.175 | 56.870 |
| 1 | Crash < -3% | 132 | 0.317 | 0.268 | 3.425 | 57.576 | 132 | 0.598 | 0.618 | 5.658 | 56.061 |
| 2 | Crash < -5% | 30 | 1.923 | 3.097 | 5.436 | 76.667 | 30 | 1.999 | 3.634 | 7.212 | 66.667 |
Yes—there is some evidence of mean reversion after extreme down days, especially after the most severe crashes, but the effect is noisy.
From the comparison table:
- Baseline (all days):
  - Next-day mean return: 0.036%
  - Next-day median: 0.050%
  - Next-day win rate: 53.1%
  - Next 5-day mean return: 0.181%
  - Next 5-day median: 0.298%
  - Next 5-day win rate: 56.9%
- After days worse than -3%:
  - Next-day mean return rises to 0.317%
  - Next-day median rises to 0.268%
  - Next-day win rate increases slightly to 57.6%
  - Next 5-day mean return is 0.598%
  - Next 5-day median is 0.618%
  - Next 5-day win rate is 56.1%
- After days worse than -5%:
  - Next-day mean return jumps to 1.923%
  - Next-day median is 3.097%
  - Next-day win rate rises sharply to 76.7%
  - Next 5-day mean return is 1.999%
  - Next 5-day median is 3.634%
  - Next 5-day win rate is 66.7%
What this means:
- The market tends to rebound more strongly after very large down days, particularly after -5% or worse sessions.
- The effect is much clearer for the next day than for the next 5 days.
- For the -3% group, the signal is only modest:
  - returns are better than baseline,
  - but the 5-day win rate is actually about the same as baseline,
  - so the effect is not especially strong.
- For the -5% group, the rebound signal is much stronger:
  - both mean and median forward returns are substantially above normal,
  - and win rates are clearly higher than baseline.
But it is also noisy:
- Volatility after crash days is much higher than normal:
  - Baseline next-day standard deviation: 0.993%
  - After < -3% days: 3.425%
  - After < -5% days: 5.436%
- The same pattern holds for next 5-day returns:
  - Baseline standard deviation: 2.175%
  - After < -3% days: 5.658%
  - After < -5% days: 7.212%
So the practical interpretation is:
- Yes, there is mean reversion, especially after the most extreme one-day drops.
- No, it is not clean or stable—the rebound comes with much larger dispersion, meaning outcomes vary a lot.
- The strongest apparent effect is in the small sample of 30 days with returns below -5%, so it looks meaningful but should be treated cautiously because rare-event estimates are less stable.
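One way to put numbers on that caution is a bootstrap confidence interval for the mean rebound in a 30-observation sample. A hedged sketch on synthetic data (the mean and spread are chosen to roughly match the reported ~1.9% and ~5.4%, not drawn from the real subsample):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the 30 next-day returns after < -5% days
crash_next_day = rng.normal(1.9, 5.4, size=30)

# Bootstrap the sample mean: resample with replacement, many times
boot_means = np.array([
    rng.choice(crash_next_day, size=len(crash_next_day), replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean {crash_next_day.mean():.2f}%, 95% CI [{lo:.2f}%, {hi:.2f}%]")
# With n = 30 and this much dispersion, the interval spans several
# percentage points, so the point estimate should be read loosely
```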
Thinking
Analyzing plotting requirements
I need to act as an AI notebook analyst and respond step by step. The user specifically asked for both histograms and boxplots from the latest notebook state. Although the rules suggest one chart or table per step, I think it's permissible to combine both plots in a single code block, possibly using subplots. I'll summarize briefly from observed outputs while making sure not to assume anything unseen, and then I’ll provide the required code.
Preparing plotting code and summary
I need to provide a concise explanation followed by the plotting code, stopping there without asking for output sharing. Although the user requested a brief summary, I want to avoid assuming too much about unseen plots. However, I can derive some insights from means, medians, and standard deviations to indicate potential skewness. For example, if the mean is less than the median slightly, it may suggest left skewness. I think a cautious conclusion about negative tails could work, but I still need to prioritize the code.
Creating plotting code
It seems I need to analyze the latest outputs against the original goal. If results are missing, I should provide code for the plots and a brief note on current summary implications. I plan to use seaborn and matplotlib, ensuring minimal and runnable code. I can create two subplots with a histogram/KDE on top and a boxplot underneath. Using a long-form DataFrame, I'll include mean lines and possibly clip the x-range. The user's request for a histogram or KDE on the same chart will guide the implementation.
The summary table already suggests two things before plotting:
- Dispersion rises a lot after crash days, especially for `< -5%`.
- For `< -5%`, the median next-day return (3.097%) is above the mean (1.923%), so the average does not look like it’s being driven only by a few huge rebounds.
Here’s the plotting step for the overlaid next-day distributions plus boxplots:
Plot next day returns
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
plot_df = pd.concat([
pd.DataFrame({
"Group": "All days",
"next_day_return": sp500["next_day_return"].dropna()
}),
pd.DataFrame({
"Group": "Daily Return < -3%",
"next_day_return": sp500.loc[sp500["crash_3"], "next_day_return"].dropna()
}),
pd.DataFrame({
"Group": "Daily Return < -5%",
"next_day_return": sp500.loc[sp500["crash_5"], "next_day_return"].dropna()
}),
], ignore_index=True)
order = ["All days", "Daily Return < -3%", "Daily Return < -5%"]
palette = {
"All days": "gray",
"Daily Return < -3%": "orange",
"Daily Return < -5%": "red",
}
means = (
plot_df.groupby("Group")["next_day_return"]
.mean()
.reindex(order)
)
fig, axes = plt.subplots(
2, 1, figsize=(12, 10),
gridspec_kw={"height_ratios": [3, 1]},
constrained_layout=True
)
# Overlaid KDEs
for group in order:
sns.kdeplot(
data=plot_df.loc[plot_df["Group"] == group],
x="next_day_return",
ax=axes[0],
label=f"{group} (mean={means[group]:.3f}%)",
color=palette[group],
linewidth=2,
fill=False,
bw_adjust=1.1
)
axes[0].axvline(0, color="black", linestyle="--", linewidth=1.5, label="0%")
for group in order:
axes[0].axvline(
means[group],
color=palette[group],
linestyle=":",
linewidth=2
)
axes[0].set_title("Distribution of Next-Day Returns After Market Drops", fontsize=14)
axes[0].set_xlabel("Next-Day Return (%)")
axes[0].set_ylabel("Density")
axes[0].legend()
# Boxplots
sns.boxplot(
data=plot_df,
x="next_day_return",
y="Group",
order=order,
palette=palette,
ax=axes[1],
orient="h"
)
axes[1].axvline(0, color="black", linestyle="--", linewidth=1.5)
axes[1].set_title("Boxplots of Next-Day Returns", fontsize=12)
axes[1].set_xlabel("Next-Day Return (%)")
axes[1].set_ylabel("")
plt.show()
If you want, the next step can be a brief interpretation of the actual plotted shapes (skew, tail behavior, and whether rebounds are concentrated in a few outliers).
Thinking
Deciding on analysis approach
I need to choose between Path A or B for my analysis. The latest outputs give a chart figure and a numeric comparison table. The goal requires me to summarize skew and extreme rebounds that are affecting averages and dispersion comparisons. I have enough data from previous outputs to go with Path A.
I should analyze the latest outputs and connect the insights without restating the visible outputs directly. It’s all about concise interpretation. Let’s put that together!
The plots support a clear pattern: next-day returns shift upward after extreme down days, but they also become much more dispersed.
- All days form the tightest distribution, centered close to zero with a slight positive tilt.
- After daily drops worse than -3%, the next-day distribution moves somewhat to the right, indicating a modest rebound tendency.
- After daily drops worse than -5%, the distribution shifts much more strongly to the right, consistent with a much larger average rebound.
On skewness:
- The crash-day groups appear less symmetric and more irregular than the full-sample distribution because they are based on far fewer observations and much higher volatility.
- For the < -5% group, the fact that the median next-day return is well above zero and even above the mean suggests the rebound effect is not just coming from a couple of huge outliers. The central tendency itself is strongly positive.
- For the < -3% group, the rebound is weaker and looks more mixed.
On whether extreme rebounds are driving the average:
- For < -3%, the mean is only modestly above the median, so large rebounds may help the average somewhat, but they do not appear to be the whole story.
- For < -5%, the boxplot and summary stats suggest broadly positive post-crash behavior, not merely a few exceptional bounce days.
On dispersion:
- Dispersion rises sharply as the prior-day drop gets more severe.
- The boxplots should show this clearly: the < -3% and especially < -5% groups have much wider spread and more extreme observations than the full sample.
- That matches the summary statistics, where next-day volatility after crash days is several times larger than normal.
So the main takeaway is:
- Yes, there is evidence of mean reversion after extreme negative days, especially after very large drops below -5%.
- But that rebound comes with substantially higher uncertainty, so the post-crash environment is not just more positive on average—it is also much more volatile.
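The skew and dispersion reads above can be made quantitative with pandas' moment helpers. A minimal sketch, shown on synthetic groups so it runs standalone (means and standard deviations are borrowed from the summary table; the real analysis would slice sp500 by the crash flags instead):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic stand-ins for the three next-day-return groups
groups = {
    "All days": pd.Series(rng.normal(0.04, 1.0, 19_000)),
    "< -3%": pd.Series(rng.normal(0.32, 3.4, 132)),
    "< -5%": pd.Series(rng.normal(1.92, 5.4, 30)),
}

# Skew, excess kurtosis, and interquartile range per group
shape_stats = pd.DataFrame({
    name: {
        "Skew": s.skew(),
        "Excess Kurtosis": s.kurt(),
        "IQR (%)": s.quantile(0.75) - s.quantile(0.25),
    }
    for name, s in groups.items()
}).T.round(3)
print(shape_stats)
# The widening IQR down the rows mirrors the rising std devs in the
# summary table: crash-day samples are far more dispersed
```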
The full-sample results showed mean reversion, especially for Daily Return < -5%, but that group only had 30 cases total, so the pre/post-2000 split may be a bit noisy.
A good next step is to compute the period-by-period summary table first:
Compare periods after crashes
import pandas as pd
from IPython.display import display
period_mask = sp500["Date"] < pd.Timestamp("2000-01-01")
sp500["Period"] = period_mask.map({True: "Pre-2000", False: "Post-2000"})
rows = []
for period in ["Pre-2000", "Post-2000"]:
for label, mask in {
"Daily Return < -3%": sp500["crash_3"],
"Daily Return < -5%": sp500["crash_5"],
}.items():
s = sp500.loc[(sp500["Period"] == period) & mask, "next_day_return"].dropna()
rows.append({
"Period": period,
"Group": label,
"N": len(s),
"Mean (%)": s.mean(),
"Median (%)": s.median(),
"Win Rate (%)": s.gt(0).mean() * 100,
})
period_comparison = pd.DataFrame(rows).round(3)
display(period_comparison)
| | Period | Group | N | Mean (%) | Median (%) | Win Rate (%) |
|---|---|---|---|---|---|---|
| 0 | Pre-2000 | Daily Return < -3% | 34 | 0.145 | 0.604 | 61.765 |
| 1 | Pre-2000 | Daily Return < -5% | 10 | 0.653 | 2.591 | 80.000 |
| 2 | Post-2000 | Daily Return < -3% | 98 | 0.377 | 0.238 | 56.122 |
| 3 | Post-2000 | Daily Return < -5% | 20 | 2.559 | 3.651 | 75.000 |
Thinking
Analyzing rebound trends
I'm looking at rebound trends before and after 2000. In the pre-2000 period with a smaller mean but a higher median, it suggests more consistent, milder bounces. For rebounds of less than -5%, the pre-2000 mean is 0.653 and the median is 2.591, showing an 80% win rate. Post-2000, the mean is 2.559 and median at 3.651 with a 75% win rate. This indicates stronger post-2000 rebounds in magnitude, but the samples are quite small. I need to mention that sample sizes are important, especially for rebounds of less than -5%.
The split suggests that next-day mean reversion is present both before and after 2000, but its shape has changed.
- After drops worse than -3%, both periods still show a positive next-day average, so the rebound effect did not disappear.
  - Pre-2000 looks more consistent: higher median and higher win rate.
  - Post-2000 has the larger mean, but a lower median and lower win rate. That points to rebounds that are less frequent or less uniform, with some bigger upside days lifting the average.
- After drops worse than -5%, the rebound effect appears stronger post-2000:
  - The mean is much higher post-2000.
  - The median is also higher post-2000, which is important because it suggests the stronger result is not only coming from a few extreme bounce days.
  - The win rate stays high in both periods, though it is slightly lower post-2000.
So the best overall reading is:
- Mean reversion remains stable in direction across both eras: extreme down days are followed, on average, by positive next-day returns both pre-2000 and post-2000.
- Its magnitude has generally strengthened after 2000, especially for the most severe down days.
- Its consistency is more mixed after 2000, particularly for the < -3% group, where the average rebound is stronger but the median and win rate are weaker.
One caution: the < -5% sample is very small in each split (N=10 pre-2000, N=20 post-2000), so that part of the comparison is directionally useful but should be treated as noisy.
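One way to quantify that noisiness is a percentile bootstrap on the small < -5% samples. The sketch below is illustrative only: it draws a hypothetical 20-observation sample (matching the post-2000 group size and roughly its mean, with post-crash levels of dispersion) rather than the actual notebook data.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical next-day returns after < -5% days: 20 draws, roughly matching
# the post-2000 group's mean and elevated post-crash volatility.
sample = rng.normal(loc=2.5, scale=5.0, size=20)

# Percentile bootstrap: resample with replacement, collect the resample means
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {sample.mean():.2f}%, 95% bootstrap CI ~ [{lo:.2f}%, {hi:.2f}%]")
```

With only 20 observations at this level of dispersion, the interval spans several percentage points, which is exactly why the pre/post-2000 comparison for this group should be read as directional rather than precise.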
Thinking
Adding disclaimer for advice
I need to make sure to include a disclaimer that this isn’t financial advice when answering the user's question. It's crucial to state that, without adding any code since I just need to modify text. I should mention the outputs clearly but keep it concise and natural, without headers. The currency rule isn't relevant here, so let's focus on crafting a brief summary that includes the necessary disclaimer.
Yes — here’s a clean version with the disclaimer included:
The period split shows that the next-day mean reversion effect remains positive both before and after 2000, but it looks stronger post-2000, especially after the most extreme down days.
- For Daily Return < -3%:
  - Pre-2000: mean 0.145%, median 0.604%, win rate 61.8% (N=34)
  - Post-2000: mean 0.377%, median 0.238%, win rate 56.1% (N=98)

  Interpretation: the rebound is still present in both periods. Post-2000 has a higher average bounce, but pre-2000 had a higher median and win rate, suggesting the earlier period may have had more consistent rebounds, while the later period may include a few larger upside reversals that lift the mean.
- For Daily Return < -5%:
  - Pre-2000: mean 0.653%, median 2.591%, win rate 80.0% (N=10)
  - Post-2000: mean 2.559%, median 3.651%, win rate 75.0% (N=20)

  Interpretation: the rebound effect appears materially stronger post-2000 for the most severe down days. Both the mean and median are higher, while the win rate stays high in both periods.
Overall, the evidence suggests:
- The direction of mean reversion is stable over time: extreme down days are followed by positive next-day returns on average in both eras.
- The magnitude appears to have increased after 2000, especially for very large selloffs.
- The consistency is less clearly improved, since medians and win rates do not uniformly rise.
One important caution: the < -5% group is based on a small sample (10 pre-2000, 20 post-2000), so those estimates are more sensitive to a few unusual episodes.
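The same caution applies to the win rates themselves. A quick normal-approximation standard error (a back-of-the-envelope check, not part of the notebook's analysis) shows that 80% on 10 observations and 75% on 20 are statistically indistinguishable:

```python
import math

def win_rate_se(p: float, n: int) -> float:
    """Normal-approximation standard error of a binomial proportion."""
    return math.sqrt(p * (1 - p) / n)

se_pre = win_rate_se(0.80, 10)   # pre-2000, < -5% group
se_post = win_rate_se(0.75, 20)  # post-2000, < -5% group
diff = 0.80 - 0.75
se_diff = math.sqrt(se_pre**2 + se_post**2)
print(f"win-rate difference = {diff:.2f}, SE of difference ~ {se_diff:.2f}")
```

The 5-point gap between the two win rates is only about a third of its standard error, so it carries essentially no statistical weight on its own.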
Disclaimer: This analysis is for educational and research purposes only and is not financial advice. Past market behavior does not guarantee future results, and these patterns should not be used on their own as an investment strategy.
Start using AI for Data Analysis
MLJAR Studio helps you analyze data with AI, run machine learning workflows, and build reproducible notebook-based results on your own computer.
Runs locally • Supports local LLMs