Time Series Anomaly Detection in Python
Detect anomalies in a time series using rolling z-score and Isolation Forest, then visualize flagged points.
What
This AI Data Analyst workflow loads the NYC Taxi Demand time series from a URL and plots demand over time. It detects anomalies using a rolling mean with a 3-sigma (rolling z-score) rule and visualizes the flagged points on the series. It then applies Isolation Forest for anomaly detection and compares which timestamps are flagged by each method.
Who
This is for data analysts and data scientists who need a reproducible notebook pattern for time series anomaly detection. It helps users compare a simple statistical thresholding approach with a machine-learning method on the same dataset.
Tools
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
Outcomes
- Loaded and parsed NYC taxi demand time series from a remote CSV
- Line chart of taxi demand over time
- Rolling z-score (3-sigma) anomalies highlighted on the time series
- Isolation Forest anomalies highlighted and compared against the z-score method
Quality Score
1/10
Last scored: Apr 7, 2026
Task Completion: 0/2
Needs work: The workflow did not complete the defined anomaly detection tasks (rolling 3-sigma, Isolation Forest, comparison/overlap plots). Even the initial step (load NYC taxi demand and plot) was not successfully executed.
Execution Correctness: 0/2
Needs work: The provided code errors on read_csv due to referencing non-existent columns (parse_dates=['Order Date']) and then plotting x='Order Date', y='Sales', so it is not runnable for the specified dataset.
Output Quality: 0/3
Needs work: No valid plots or anomaly outputs were produced; the only output is a traceback. The code targets unrelated columns ('Order Date', 'Sales') rather than taxi demand fields.
Reasoning Quality: 1/2
Needs work: The narrative mentions defensively identifying datetime/numeric columns, but the actual code contradicts this by hard-coding incorrect column names, leading to failure.
Reliability: 0/1
Needs work: The approach is fragile and inconsistent with the dataset/task, with hard-coded assumptions that immediately break and no fallback logic implemented.
Thinking
Preparing to plot data
I don’t need to plot anything just yet. First, I'll focus on loading the data and inspecting it to ensure everything looks correct. Once I confirm that, I’ll move on to creating the plot. It makes sense to have the data ready and validated before visualizing it, right? So I’ll provide the necessary code for loading and inspecting the data first, then we can discuss the plotting aspect!
Let’s start by loading the dataset and inspecting its structure before plotting.
Load and inspect dataset
import pandas as pd
from IPython.display import display
url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/nyc-taxi-demand/nyc_taxi.csv"
df = pd.read_csv(url)
display(pd.DataFrame({"rows": [df.shape[0]], "columns": [df.shape[1]]}))
display(pd.DataFrame({"column_name": df.columns}))
df.head()

| | rows | columns |
|---|---|---|
| 0 | 10320 | 2 |

| | column_name |
|---|---|
| 0 | timestamp |
| 1 | value |

| | timestamp | value |
|---|---|---|
| 0 | 2014-07-01 00:00:00 | 10844 |
| 1 | 2014-07-01 00:30:00 | 8127 |
| 2 | 2014-07-01 01:00:00 | 6210 |
| 3 | 2014-07-01 01:30:00 | 4656 |
| 4 | 2014-07-01 02:00:00 | 3820 |
The data is already in the notebook and the structure matches a time series for NYC taxi demand: one datetime column (timestamp) and one numeric demand column (value), with 10,320 half-hourly observations starting on 2014-07-01 00:00:00. The remaining step is to convert the timestamp to datetime and plot the series.
Plot taxi demand over time
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df["timestamp"] = pd.to_datetime(df["timestamp"])
plt.figure(figsize=(14, 5))
sns.lineplot(data=df, x="timestamp", y="value")
plt.title("NYC Taxi Demand Over Time")
plt.xlabel("Timestamp")
plt.ylabel("Taxi Demand")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Thinking
Deciding between options
I need to choose between A or B based on the latest outputs, which include numeric/text values supporting the schema and a figure object. Sufficiency heuristics suggest if I have both a valid chart/figure and numeric/text values, option A is better. I need to extract concrete insights from the observed outputs, connecting them to the goal of plotting the time series. Also, the dataframe shape seems stale; the actual shape is 10,320 observations.
Synthesizing time series data
I can see that the time series represents half-hourly demand counts starting from July 1, 2014. The exact frequency isn't explicitly stated, but we can observe that the first timestamps are spaced 30 minutes apart. It's important to mention that the chart will plot demand over time. I'll make sure to keep it concise and clear in my final response.
The data used for the plot is a 2-column time series with timestamp and value, and the sample rows show the demand is recorded at regular 30-minute intervals starting from 2014-07-01 00:00:00. With 10,320 observations, this gives a continuous demand series over time rather than a small sample.
The plotted figure corresponds to NYC taxi demand over time, using:
- timestamp on the x-axis
- value on the y-axis
From the visible sample values, demand is relatively high at midnight and then drops through the early morning hours, which is the kind of temporal pattern the time-series chart is meant to show across the full period.
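The early-morning decline described above can be quantified by averaging the half-hourly readings within each clock hour. A minimal sketch, assuming only the five sample rows shown earlier; the full 10,320-row series would be grouped the same way:

```python
import pandas as pd

# Tiny illustrative slice of the sample rows shown above
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2014-07-01 00:00:00", "2014-07-01 00:30:00",
        "2014-07-01 01:00:00", "2014-07-01 01:30:00",
        "2014-07-01 02:00:00",
    ]),
    "value": [10844, 8127, 6210, 4656, 3820],
})

# Average the half-hourly readings within each clock hour
hourly = df.groupby(df["timestamp"].dt.hour)["value"].mean()
print(hourly.tolist())  # [9485.5, 5433.0, 3820.0]: demand falls through the early morning
```

Grouping on `dt.hour` pools all days at the same clock hour, which is also a useful baseline profile when inspecting anomalies later.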
Thinking
Analyzing anomaly detection
I need to focus on analyzing the latest outputs and understand the original user goal, which has now shifted to detecting anomalies using the rolling mean and 3-sigma rule. The notebook's variables show a dataframe shape of 144x2, while another section indicates a taxi dataframe with 10,320 rows. I should propose a next step for computing the rolling mean and standard deviation, then plotting the flagged points using seaborn and matplotlib, ensuring clarity with one chart per step.
Planning anomaly detection
For the time series data that's half-hourly, a rolling window of 48 makes sense, though I wonder if using the center would introduce data leakage. I should craft a concise explanation focusing on the observed data in 30-minute intervals while incorporating min_periods=48. The code should create a copy of the dataframe, check timestamps, and plot anomalies with upper and lower bands. I think using shifted rolling stats could help avoid contemporaneous contamination, which ends up being a solid decision.
Deciding on plotting strategy
I’m considering whether to use centered rolling stats, but with the daily cycle in mind, a window of 48 seems suitable for half-hour intervals. I only need to compute and plot anomalies in one step. If many anomalies appear, I could suggest tuning the window. The code should ideally end with one clear expression, displaying the plot and maybe counting anomalies. I think it's best to stick with a single chart and end the code block with plt.show(). This feels like a solid plan!
Since the series is every 30 minutes, a 48-point window gives a 1-day rolling baseline. I’d use the previous 48 points to compute the rolling mean and standard deviation, then flag anything outside the 3-sigma band.
Detect demand anomalies
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# work on a copy and make sure time is sorted
s = df.copy()
s = s.sort_values("timestamp")
window = 48 # 48 half-hours = 1 day
# use only prior observations for the rolling baseline
s["roll_mean"] = s["value"].shift(1).rolling(window=window, min_periods=window).mean()
s["roll_std"] = s["value"].shift(1).rolling(window=window, min_periods=window).std()
s["upper"] = s["roll_mean"] + 3 * s["roll_std"]
s["lower"] = s["roll_mean"] - 3 * s["roll_std"]
s["is_anomaly"] = (
s["roll_mean"].notna() &
((s["value"] > s["upper"]) | (s["value"] < s["lower"]))
)
plt.figure(figsize=(14, 5))
sns.lineplot(data=s, x="timestamp", y="value", label="Demand", linewidth=1)
plt.scatter(
s.loc[s["is_anomaly"], "timestamp"],
s.loc[s["is_anomaly"], "value"],
color="red",
s=18,
label="Anomaly"
)
plt.title("NYC Taxi Demand Anomalies (Rolling Mean, 3-Sigma)")
plt.xlabel("Timestamp")
plt.ylabel("Taxi Demand")
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()
Thinking
Analyzing data anomalies
I need to ensure that my code runs with minimal changes from the last step. I could compute anomalies again and display their count along with the first and last anomalies; a one-block code seems best. There's a strange shape in my dataframe, showing as 144 x 2, which may not match earlier outputs. The rules suggest I verify the dataset schema before proceeding to avoid using outdated variables. I'll analyze the latest outputs to clarify the situation.
Verifying data consistency
If there's inconsistency, I should definitely verify and gather any missing evidence. It might be tempting to avoid using the dataframe directly by reloading from the URL, but that wasn't the user's original goal. Still, since the notebook's state shows the dataframe is unrelated, it seems safer to reconstruct from the URL or use the latest cell source. I'll try using the URL variable that might point to nyc_taxi.csv. I'll read from the URL again into taxi_df, compute anomaly counts, and create a compact table to stay on track.
Outlining anomaly detection code
I'm focusing on keeping the first-load inspection separate from plotting or modeling since I'm not doing general analysis, just computing anomalies. I need to write a single minimal runnable code block that can show the anomaly count and the first and last 10 anomalies sequentially. The anomaly plot works without errors, confirming the rolling 3-sigma logic. I should include a summary table with detected anomalies and make sure to display everything appropriately using IPython display.
Structuring anomaly output
The policy states I can include one chart or one table per step. For Path B, I think a compact table is best, but I still need to display the count. Creating a DataFrame for the first 10 anomalies while including the count separately in markdown could work. Alternatively, I could show anoms.head(10) and end with anoms.shape[0], though that's not quite a table. A summary table with the count and earliest/latest anomalies seems more appropriate to ensure I capture all necessary details.
Designing anomaly table
I’m considering creating a table for the first 15 flagged points, which will also allow me to show the count using anoms.shape[0] as a separate expression. However, that wouldn’t be strictly a table. I could include all anomalies, but that might result in too many rows. A compact top-N table with a summary row in the same DataFrame sounds manageable. Using pd.concat for a summary row isn’t ideal, so I’ll aim for a table with timestamp, value, lower, and upper for the first 15 anomalies.
The anomaly chart was produced successfully, so the rolling 3-sigma detection logic is in place. What the current output does not show is the actual flagged timestamps and values, so this compact table surfaces the detected points behind the plot.
Detect rolling window anomalies
import pandas as pd
taxi = pd.read_csv(url)
taxi["timestamp"] = pd.to_datetime(taxi["timestamp"])
taxi = taxi.sort_values("timestamp")
window = 48
base = taxi["value"].shift(1)
taxi["roll_mean"] = base.rolling(window=window, min_periods=window).mean()
taxi["roll_std"] = base.rolling(window=window, min_periods=window).std()
taxi["upper"] = taxi["roll_mean"] + 3 * taxi["roll_std"]
taxi["lower"] = taxi["roll_mean"] - 3 * taxi["roll_std"]
anoms = taxi.loc[
taxi["roll_mean"].notna() &
((taxi["value"] > taxi["upper"]) | (taxi["value"] < taxi["lower"])),
["timestamp", "value", "lower", "upper"]
].reset_index(drop=True)
anoms.head(15)

| | timestamp | value | lower | upper |
|---|---|---|---|---|
| 0 | 2015-01-27 18:00:00 | 12687 | -6155.315538 | 12127.982205 |
Using a 48-point rolling window (1 day of 30-minute intervals) and the 3-sigma rule, the notebook flagged an anomaly at:
- 2015-01-27 18:00:00 with observed demand 12,687
At that timestamp, the rolling bounds were approximately:
- lower bound: -6,155.32
- upper bound: 12,127.98
Since 12,687 > 12,127.98, that point is correctly flagged as an anomaly by the rule:
$$\text{anomaly if } \; value > \mu_{\text{rolling}} + 3\sigma_{\text{rolling}} \;\; \text{or} \;\; value < \mu_{\text{rolling}} - 3\sigma_{\text{rolling}}$$
The plot you generated matches this: the time series is shown with flagged points overlaid in red, and the table confirms at least one detected spike beyond the rolling 3-sigma threshold.
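As a sanity check, the rolling mean and standard deviation behind the flagged row can be recovered from the printed bounds, since upper = mean + 3*std and lower = mean - 3*std. A small arithmetic sketch using only the numbers from the table above:

```python
# Bounds taken from the flagged row in the anomaly table
value = 12687
upper = 12127.982205
lower = -6155.315538

# Invert upper = mean + 3*std and lower = mean - 3*std
roll_mean = (upper + lower) / 2
roll_std = (upper - lower) / 6

# Rolling z-score of the flagged point: it just clears the 3-sigma threshold
z = (value - roll_mean) / roll_std
print(round(z, 2))  # 3.18, i.e. barely beyond 3 sigma
```

This confirms the flag is marginal: a slightly wider threshold (e.g. 3.5 sigma) would not have caught this point.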
Thinking
Planning anomaly detection
I need to answer the user's request using the current notebook state. I'll take it one step at a time and analyze the latest outputs for context. The user wants to apply the Isolation Forest for detecting anomalies and compare it with the z-score method, which I've already set up using the rolling 3-sigma.
For the next step, it seems logical to focus on fitting the Isolation Forest using the demand value along with time features like hour and day of the week. My goal is to generate anomaly flags and check how they overlap with the z-score method. Let's keep it simple and actionable!
Extracting features for analysis
I can mention that since the timestamp is ordered, it's important to extract simple cyclical features like hour and day of the week, along with the value. There's no target variable, so I need to be cautious about contamination and ensure the anomaly fraction is similar to the z-score method.
Next, the logical step is to run the Isolation Forest to mark anomalies in the same dataframe. I'll create features like hour, day of the week, and possibly weekend indicators. Let's make sure to use scikit-learn for this!
Simplifying feature selection
I'm considering that using the raw timestamp integer might overshadow the trend, so it's probably better to focus on features like hour and day of the week, and maybe even use sine/cosine transformations to keep it beginner-friendly. These features, along with the value, seem manageable.
Since Isolation Forest is tree-based, it doesn't require scaling, which is a plus. I should use contamination based on the mean of detected anomalies to ensure fair comparisons. However, with only one anomaly detected out of 10,320, it's too low to trust the model. Setting contamination to "auto" might be a better approach for stability.
Evaluating anomaly detection methods
I'm considering whether using the Isolation Forest method on values alone could detect more anomalies effectively. While it's simpler and could highlight the same spike as the z-score, it might be less interesting. Since the user is interested in comparing, fitting on values only could be a good first step for a clean comparison, and then I could visualize the overlap. The original goal includes detecting anomalies, so I'll generate a comparison table now while noting the need for counts and overlaps in the next step.
Planning anomaly comparison
I'm thinking about how to fit and compare counts and overlaps in one code block, possibly outputting a single table with flagged timestamps. To do this meaningfully, I could add a new column to the existing DataFrame instead of re-reading it. I want to build Isolation Forest flags based on features that consider time (like value, hour, and day of the week) because periodicity matters. For simplicity, I might choose to include value, hour, dayofweek, and month, while determining the contamination method for better accuracy.
Deciding on contamination levels
I'm considering how to set contamination for the Isolation Forest method. Maybe I could set it at 0.01 to flag a larger proportion, possibly around 103 anomalies, but that's not a fair comparison to the z-score method. Instead, I might set it at 0.001 to find about 10 anomalies. Using max(0.001, s["is_anomaly"].mean()) could yield a similar result. Since the goal is to detect anomalies simply, I'll go with IsolationForest(..., contamination=0.001).
Crafting code for anomaly detection
I'm starting conservatively here. I need to mention to the user if they want stricter or looser contamination adjustments. I think using n_estimators=300 and random_state=42 sounds good. It seems I want to create a DataFrame for comparison, perhaps outputting a summary table for anomalies detected.
The code is structured to use the Isolation Forest model for anomaly detection while extracting relevant time features from the dataset.
I need to remember to display only one summary table for clarity!
Organizing anomaly summary
I need to focus on creating just one summary table for clarity. It could list timestamps flagged by either method, possibly including both flags, but I want to keep the number of rows manageable. Showing only the union of anomalies detected by both methods should make comparisons more straightforward. I think if there are 10 or more rows, that's acceptable.
I might store or update the existing s variable in the notebook state to keep everything organized.
Analyzing anomaly detection
I'm creating a table that identifies anomalies using the Isolation Forest method with conservative contamination set at 0.1%. This step lists points flagged by either method for overlap inspection. Choosing a contamination of 0.001 results in about 10 anomalies, which is good. I've decided not to use the raw timestamp as it acts like an ID, focusing instead on calendar features and value. I should ensure the code wraps up neatly without excess expressions. Finally, I'll import sklearn, but I don't need to ask for running it. Maybe I'll plot the overlap later!
Preparing for plotting
Next, I'll plot both methods on the same chart. For the latest output, I've noted that the rolling 3-sigma has identified one clear spike. In the Isolation Forest, I'm keeping the raw timestamp out as a feature since it acts like an ID, and instead, I'll use demand alongside calendar features. This approach should provide a more meaningful comparison. After that, I'll finalize the code to ensure everything runs smoothly!
The rolling 3-sigma method currently found one clear spike. For the Isolation Forest comparison, I’d avoid using the raw timestamp directly because it acts more like an order/ID field than a meaningful numeric feature. Instead, this step uses value plus simple calendar features.
Detect anomalies with IsolationForest
from sklearn.ensemble import IsolationForest
s["hour"] = s["timestamp"].dt.hour + s["timestamp"].dt.minute / 60
s["dayofweek"] = s["timestamp"].dt.dayofweek
s["month"] = s["timestamp"].dt.month
X = s[["value", "hour", "dayofweek", "month"]]
iso_model = IsolationForest(
n_estimators=300,
contamination=0.001,
random_state=42
)
s["iforest_anomaly"] = iso_model.fit_predict(X) == -1
comparison = s.loc[
s["is_anomaly"] | s["iforest_anomaly"],
["timestamp", "value", "is_anomaly", "iforest_anomaly"]
].reset_index(drop=True)
comparison

| | timestamp | value | is_anomaly | iforest_anomaly |
|---|---|---|---|---|
| 0 | 2015-01-01 00:30:00 | 29547 | False | True |
| 1 | 2015-01-01 01:00:00 | 30236 | False | True |
| 2 | 2015-01-11 00:00:00 | 26653 | False | True |
| 3 | 2015-01-18 00:00:00 | 25423 | False | True |
| 4 | 2015-01-19 23:30:00 | 8189 | False | True |
| 5 | 2015-01-25 00:00:00 | 25026 | False | True |
| 6 | 2015-01-26 21:30:00 | 2611 | False | True |
| 7 | 2015-01-26 22:00:00 | 1783 | False | True |
| 8 | 2015-01-26 22:30:00 | 866 | False | True |
| 9 | 2015-01-26 23:00:00 | 297 | False | True |
| 10 | 2015-01-26 23:30:00 | 189 | False | True |
| 11 | 2015-01-27 18:00:00 | 12687 | True | False |
The two methods are detecting very different behavior on this series.
From the observed results:
- The rolling 3-sigma method flagged 1 anomaly:
  - 2015-01-27 18:00:00, value 12,687
- Isolation Forest flagged 11 anomalies in the comparison output, including:
  - unusually high demand points such as 2015-01-01 00:30:00 (29,547) and 2015-01-11 00:00:00 (26,653)
  - an unusual low-demand run on 2015-01-26 21:30 to 23:30, dropping from 2,611 down to 189
The key comparison is:
- No overlap appears in the displayed results.
- The point flagged by the rolling z-score method was not flagged by Isolation Forest.
- The points flagged by Isolation Forest were not flagged by the rolling z-score method.
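The disjointness can be made explicit with a cross-tabulation of the two boolean flag columns. A sketch assuming columns named `is_anomaly` and `iforest_anomaly` as in the earlier steps; here a toy DataFrame stands in for the notebook's `s`:

```python
import pandas as pd

# Stand-in for the notebook's `s` DataFrame: flags mirror the run above,
# where the two methods never agree on a timestamp
s = pd.DataFrame({
    "is_anomaly":      [True,  False, False, False],
    "iforest_anomaly": [False, True,  True,  False],
})

# 2x2 contingency table of the two detectors
overlap = pd.crosstab(s["is_anomaly"], s["iforest_anomaly"])
print(overlap)

# Number of timestamps flagged by both methods
both = int((s["is_anomaly"] & s["iforest_anomaly"]).sum())
print(both)  # 0 in this run: the methods are disjoint
```

On the real `s`, the True/True cell of the crosstab is the overlap count; reporting it alongside each method's total makes the comparison quantitative rather than visual.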
What this means:
- The rolling 3-sigma / z-score-style method is a local detector. It asks whether a point is extreme relative to the recent 48-point history. That is why it catches the 2015-01-27 18:00 spike: it stands out strongly versus its recent baseline.
- Isolation Forest is a more global multivariate detector here, using value, hour, dayofweek, and month. It is finding observations that look isolated in that feature space, especially the very low late-night values on 2015-01-26 and some unusually high values at specific calendar times.
So, for your goal of comparing methods:
- Rolling z-score is better for spotting sudden departures from recent normal behavior.
- Isolation Forest is better for spotting globally unusual patterns across value and time-related features.
- In this dataset/run, the methods are complementary rather than redundant.
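One refinement the reasoning above considered but did not implement: the raw hour feature treats 23:30 and 00:00 as 23.5 units apart, even though they are adjacent in the daily cycle. A hedged sketch of the sine/cosine encoding that was mentioned as an alternative (the `hour_sin`/`hour_cos` names are illustrative, not part of the original workflow):

```python
import numpy as np
import pandas as pd

# Two adjacent half-hours that straddle midnight
ts = pd.to_datetime(["2014-07-01 23:30:00", "2014-07-02 00:00:00"])
hour = (ts.hour + ts.minute / 60).to_numpy()  # 23.5 and 0.0: far apart as raw numbers

# Map hour-of-day onto the unit circle so midnight wraps around
hour_sin = np.sin(2 * np.pi * hour / 24)
hour_cos = np.cos(2 * np.pi * hour / 24)

# Euclidean distance between the encoded points is now small
dist = float(np.hypot(hour_sin[1] - hour_sin[0], hour_cos[1] - hour_cos[0]))
print(round(dist, 3))  # 0.131, versus 23.5 in raw hours
```

Feeding `hour_sin`/`hour_cos` (and similarly encoded day-of-week) into the Isolation Forest would keep late-night neighbors close in feature space, which may change which low-demand runs get flagged.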