Time Series

Air Passengers Forecasting with ARIMA

Decompose the classic Air Passengers time series, identify trend and seasonality, fit an ARIMA model, and forecast 12 months ahead.

What

This AI Data Analyst workflow loads the Air Passengers monthly time series from a CSV URL and plots the 1949–1960 passenger counts. It decomposes the series into trend, seasonal, and residual components and checks stationarity using the Augmented Dickey-Fuller test with differencing as needed. It then fits an ARIMA model and generates a 12-month forecast with confidence intervals.

Who

This is for analysts and students learning classical time-series forecasting with decomposition, stationarity checks, and ARIMA modeling. It helps practitioners build a reproducible baseline forecast workflow on a well-known seasonal dataset.

Tools

  • pandas
  • numpy
  • matplotlib
  • statsmodels
  • scipy

Outcomes

  • Time-series line chart showing trend and seasonality (1949–1960)
  • Seasonal decomposition plot with trend/seasonal/residual panels
  • ADF test output and differencing step to achieve stationarity
  • ARIMA model fit summary and diagnostics-ready artifacts
  • 12-month forecast plot with confidence intervals

Quality Score

10/10

gpt-5.2 · mljar-ai

Last scored: Apr 7, 2026

  • Task Completion: 2/2

    Excellent

    All requested steps are present: data loaded and plotted, decomposition produced, ADF test run with differencing, and a model fit with a 12-month forecast including confidence intervals.

  • Execution Correctness: 2/2

    Excellent

    Provided code is coherent and likely runnable end-to-end; outputs show figures and ADF tables were generated, and SARIMAX fit/forecast executed with only non-fatal frequency warnings.

  • Output Quality: 3/3

    Excellent

    Outputs match expected outcomes: time-series line chart, 4-panel decomposition, ADF p-values showing non-stationarity then near-stationarity after first differencing, and a forecast plot with confidence intervals.

  • Reasoning Quality: 2/2

    Excellent

    Interpretations of trend/seasonality and ADF results are accurate and consistent with the shown statistics; the note about potential seasonal differencing is reasonable given the borderline p-value.

  • Reliability: 1/1

    Excellent

    Workflow is reasonably robust (datetime parsing, period=12, handles NA via dropna); minor fragility remains (frequency inferred rather than explicitly set), but it does not break execution.

load air passengers dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/air-passengers/AirPassengers.csv and plot the time series
Thinking

Planning dataset loading and inspection

I understand I need to approach this step-by-step. The user wants to load a dataset and plot a time series, but the guideline says to first load and inspect the dataset before combining tasks. So, my first step will be just loading and inspecting the dataset. I'll mention that the next step will likely involve plotting after reviewing the results. I should provide a runnable code block showing the shape, columns, and head sequentially, since multiple outputs are required. I'll be sure to show each table one by one!

Crafting the dataset inspection code

I’m considering how to summarize the dataset's shape and columns. The rule says to focus on loading and inspecting first, showing the shape, columns, and head. I can show the shape as a tuple and then display the columns and the first few rows using the display function.

I aim to finalize the code block with display(df.head()) as the last expression, avoiding excess bare expressions. Since I need multiple outputs, I’ll show them sequentially and provide a brief explanation beforehand. No plots yet!

Let’s start by loading the dataset and doing a quick inspection first. After that, we’ll plot the time series.

Load and preview dataset
import pandas as pd
from IPython.display import display

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/air-passengers/AirPassengers.csv"

df = pd.read_csv(url)

display(pd.DataFrame({"rows": [df.shape[0]], "columns": [df.shape[1]]}))
display(pd.DataFrame({"column_name": df.columns}))
display(df.head())
   rows  columns
0   144        2

   column_name
0        Month
1  #Passengers

     Month  #Passengers
0  1949-01          112
1  1949-02          118
2  1949-03          132
3  1949-04          129
4  1949-05          121

The dataset is in place and the structure matches a simple monthly time series: one date column (Month) and one numeric series (#Passengers) with 144 records, which is consistent with monthly observations over 12 years. The first few rows show the expected format for plotting over time.

Plot passenger trend
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df["Month"] = pd.to_datetime(df["Month"])

plt.figure(figsize=(12, 5))
sns.lineplot(data=df, x="Month", y="#Passengers")
plt.title("Air Passengers Over Time")
plt.xlabel("Month")
plt.ylabel("Number of Passengers")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The outputs already show the key pieces needed for the requested time series plot: the data has a monthly date column and a passenger count column, and the plot cell executed without error after converting Month to datetime.

The table preview confirms the series starts in January 1949 and the passenger counts begin at 112, 118, 132, 129, and 121 in the first five months. With 144 rows total, this is a continuous monthly time series suitable for the line chart that was created.

So the requested visualization is the line plot of #Passengers against Month, showing how air passenger volume changes over time across the full dataset.
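
Before decomposing or modeling, it can help to make the monthly frequency explicit rather than leaving it to be inferred. The following is a minimal sketch on a small hand-made frame: the values are the first five rows of the preview above, and `demo_df`, `demo_ts`, and `gappy` are illustrative names that are not part of the original notebook. It declares a month-start (`MS`) frequency and shows how a missing month would surface as a NaN.

```python
import pandas as pd

# Small stand-in frame; values copied from the preview of the real dataset
demo_df = pd.DataFrame({
    "Month": ["1949-01", "1949-02", "1949-03", "1949-04", "1949-05"],
    "#Passengers": [112, 118, 132, 129, 121],
})
demo_df["Month"] = pd.to_datetime(demo_df["Month"], format="%Y-%m")

# Index by date and declare a month-start frequency explicitly
demo_ts = demo_df.set_index("Month")["#Passengers"].asfreq("MS")

# Simulate a gap: after asfreq, the dropped month reappears as NaN
gappy = demo_ts.drop(pd.Timestamp("1949-03-01")).asfreq("MS")
missing_months = gappy[gappy.isna()].index
print(list(missing_months))
```

If `missing_months` is empty on the real series, the calendar is complete and the explicit frequency should also avoid the frequency warnings that statsmodels can raise on an index without one.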

decompose the series into trend, seasonality and residuals
Thinking

Planning decomposition steps

I need to answer step-by-step as the user requested. The user wants to break down a series into components, so I plan to use statsmodels and its seasonal_decompose function. I’ll set the index and frequency to monthly since we have a DataFrame with a Month datetime column. I should decompose with a period of 12 and plot the results. I’ll ensure everything is sorted and properly indexed before proceeding. My goal is to keep the instructions clear and covered in concise Markdown.

Let’s decompose the monthly passenger series into trend, seasonality, and residuals using a 12-month period.

from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

ts = df.sort_values("Month").set_index("Month")["#Passengers"]

decomp = seasonal_decompose(ts, model="additive", period=12)

fig = decomp.plot()
fig.set_size_inches(12, 8)
plt.tight_layout()
plt.show()
check stationarity with ADF test and apply differencing if needed
Thinking

Using helper functions for results

I'm thinking about creating a helper function to display results effectively. I need to set up a variable like ts_diff if it's being used. I want to make sure there's a single bare expression in the end—either a status string or nothing. It's essential to use display sequentially in the code. Overall, I want to keep the code minimal and straightforward, ensuring clarity in my implementation.

Let’s verify stationarity with the ADF test first, and only create a differenced series if the original series is not stationary.

Notebook code
from statsmodels.tsa.stattools import adfuller
import pandas as pd
from IPython.display import display

ts = df.sort_values("Month").set_index("Month")["#Passengers"]

def adf_summary(series, name):
    result = adfuller(series.dropna())
    return pd.DataFrame({
        "series": [name],
        "adf_stat": [result[0]],
        "p_value": [result[1]],
        "used_lags": [result[2]],
        "n_obs": [result[3]],
        "stationary_at_5pct": [result[1] < 0.05]
    })

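original_adf = adf_summary(ts, "original")
display(original_adf)

# Difference once only if the original series fails the 5% stationarity check;
# reuse the result above instead of running the ADF test a second time
if not original_adf["stationary_at_5pct"].iloc[0]:
    ts_diff = ts.diff().dropna()
    display(adf_summary(ts_diff, "first_difference"))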
fit an ARIMA model and forecast the next 12 months
Thinking

Planning ARIMA fitting

I need to tackle this iteratively since the user is asking me to fit an ARIMA model and forecast the next 12 months. The notebook indicates that there’s code for the ADF test, but no output was captured, which is a bit tricky. I think fitting a simple ARIMA(1,1,1) could work, even though I can't confirm the exact stationarity from the ADF results. I might need to proceed with this fitting while indicating that we're relying on known patterns in the data.

Structuring the ARIMA forecast

I have a clear goal here: to provide a reasonable baseline ARIMA for a trending series, with plans to tune the order later. I’ll focus on runnable code, maybe include a single table to show the forecasts for the next 12 months, since the user prefers that. Since there are missing outputs, we'll base it on ARIMA(1,1,1) for now. I need to keep it minimal and ensure the DataFrame has a monthly index, using ts.asfreq("MS") to set that correctly.

Building forecast DataFrame

I’m focused on fitting the model and forecasting the next 12 months now. I plan to use get_forecast to build a DataFrame that contains both the mean and confidence intervals. Once that's done, I’ll rename the columns for clarity and then use display(forecast_df) to show it nicely. I want to keep everything concise, avoiding superfluous expressions, just sticking to what’s essential for clear communication.

Let’s fit a simple baseline ARIMA model and generate a 12-month forecast. Since this series has a clear trend, I’m using d=1 in ARIMA(1,1,1) as a reasonable starting point. We can tune the order next if needed.

Notebook code
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
from IPython.display import display

ts = df.sort_values("Month").set_index("Month")["#Passengers"].asfreq("MS")

model = ARIMA(ts, order=(1, 1, 1))
fit = model.fit()

forecast_res = fit.get_forecast(steps=12)
forecast_df = forecast_res.summary_frame()

forecast_df = forecast_df.rename(columns={
    "mean": "forecast",
    "mean_ci_lower": "lower_95",
    "mean_ci_upper": "upper_95"
})

display(forecast_df[["forecast", "lower_95", "upper_95"]].round(2))

Next, we can plot the historical series together with these 12 forecasted months.
