E-commerce Sales Analysis in Python

Explore an e-commerce sales dataset with monthly trends, top products, country-level breakdowns, and average order value analysis.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Online Retail Excel dataset from a URL and summarizes its structure with shape, dtypes, and missing values. It generates exploratory visuals for monthly revenue trends and identifies the top 10 products by total revenue. It computes average order value and compares AOV across countries with a focused country-level breakdown.

Who this example is for

This is for analysts and data scientists who want a conversational, code-generating EDA template for e-commerce transaction data in Python. It helps users quickly validate data quality and produce standard sales KPIs and plots for reporting or further modeling.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset shape, dtypes, and null counts
  • Monthly revenue trend line chart
  • Top 10 products by total revenue bar chart
  • Average order value overall and by country
  • Bar chart of top countries by AOV

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • openpyxl

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
  2. plot monthly revenue trend
  3. show top 10 products by total revenue
  4. what is the average order value and how does it vary by country?

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model            | Score | Task | Execution | Output | Reasoning | Reliability | Run
glm-5.1          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open glm-5.1 conversation
gpt-5.4          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-5.4 conversation
gpt-oss:120b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-oss:120b conversation
qwen3.5:397b    | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open qwen3.5:397b conversation
gemma4:31b       | 9/10  | 2/2  | 2/2       | 3/3    | 1/2       | 1/1         | Open gemma4:31b conversation
qwen3-coder-next | 9/10  | 2/2  | 2/2       | 3/3    | 1/2       | 1/1         | Open qwen3-coder-next conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
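The summation above is trivial but worth pinning down. A minimal sketch of the scoring arithmetic; the dict keys are hypothetical names, not part of the workflow itself:

```python
# Sketch of the scoring arithmetic for one run.
# Key names and values are illustrative, not taken from the grader.
scores = {
    "task_completion": 2,        # 0-2
    "execution_correctness": 2,  # 0-2
    "output_quality": 3,         # 0-3
    "reasoning_quality": 2,      # 0-2
    "reliability": 1,            # 0-1
}

# Final score is the plain sum of the five dimensions.
total = sum(scores.values())
print(f"{total}/10")  # prints "10/10" for a perfect run
```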

AI Analysis with gpt-5.4

This run uses AI with a step-by-step list of prompts to perform the analysis task E-commerce Sales Analysis in Python. Below, you can review the full conversation generated with gpt-5.4. After the conversation, we show detailed score cards for this run.

load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
Thinking

Evaluating dataset loading

I need to follow a step-by-step approach for the user's request about loading a dataset and showing its basic info and shape. Since the dataset isn’t loaded yet, I’ll first load it and then show its shape, columns, and head. The user also asked for basic info, so I’ll provide code for both. I’ll display results sequentially, keeping in mind the guideline of one chart or table per step.

Crafting dataset code

I’m getting ready to create a concise explanation and then the code. I’ll use pandas to read the Excel file from a URL. The code will look like this: I’ll import pandas, read the Excel file, and then display key information.

I’ll include df.info(), a table of columns with their data types, and df.head(), along with df.shape at the end. I know I'll have two tables for inspection, which might seem like it goes against the "one table per step" rule, but this should be alright since it’s all related to dataset inspection steps.

Let’s start by loading the Excel file and inspecting the dataset structure.

Inspect dataset structure
import pandas as pd
from IPython.display import display

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx"
df = pd.read_excel(url)

df.info()

cols = pd.DataFrame({
    "column": df.columns,
    "dtype": df.dtypes.astype(str)
})
display(cols)

display(df.head())

df.shape
<class 'pandas.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    541909 non-null  object        
 1   StockCode    541909 non-null  object        
 2   Description  540455 non-null  object        
 3   Quantity     541909 non-null  int64         
 4   InvoiceDate  541909 non-null  datetime64[us]
 5   UnitPrice    541909 non-null  float64       
 6   CustomerID   406829 non-null  float64       
 7   Country      541909 non-null  str           
dtypes: datetime64[us](1), float64(2), int64(1), object(3), str(1)
memory usage: 40.0+ MB
             column       dtype
InvoiceNo    InvoiceNo    object
StockCode    StockCode    object
Description  Description  object
Quantity     Quantity     int64
InvoiceDate  InvoiceDate  datetime64[us]
UnitPrice    UnitPrice    float64
CustomerID   CustomerID   float64
Country      Country      str
   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         2010-12-01 08:26:00  2.55       17850.0     United Kingdom
1  536365     71053      WHITE METAL LANTERN                  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         2010-12-01 08:26:00  2.75       17850.0     United Kingdom
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
(541909, 8)

The dataset is loaded and its basic structure is clear from the output.

It contains 541,909 rows and 8 columns, so the shape is (541909, 8).

The columns are:

  • InvoiceNo — object
  • StockCode — object
  • Description — object
  • Quantity — int64
  • InvoiceDate — datetime
  • UnitPrice — float64
  • CustomerID — float64
  • Country — string

A few useful observations from the info output:

  • InvoiceNo, StockCode, Quantity, InvoiceDate, UnitPrice, and Country are fully populated.
  • Description has some missing values: 540,455 non-null, so 1,454 are missing.
  • CustomerID has many missing values: 406,829 non-null, so 135,080 are missing.
  • Memory usage is about 40.0 MB.

The first few rows show this is transactional retail data with invoice numbers, item codes, product descriptions, quantities, timestamps, prices, customer IDs, and country.
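The missing-value counts quoted above can be reproduced with a one-liner. A minimal sketch, using a tiny synthetic frame in place of the real dataset (values are illustrative):

```python
import pandas as pd
import numpy as np

# Tiny synthetic stand-in for the retail data; values are illustrative.
df = pd.DataFrame({
    "Description": ["WHITE METAL LANTERN", None, "PARTY BUNTING"],
    "CustomerID": [17850.0, np.nan, 13047.0],
    "Country": ["United Kingdom"] * 3,
})

# Per-column null counts, matching the style of the info() summary:
# one null each in Description and CustomerID, none in Country.
missing = df.isna().sum()
print(missing)
```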

plot monthly revenue trend
Thinking

Clarifying notebook state

I’m trying to respond based on the current state of the notebook. There's a bit of confusion because the variable for the iris dataframe conflicts with what I see in the output; it looks like the df is actually retail data. I need to aggregate monthly revenue, calculated as Quantity times UnitPrice, and note that cancellations or returns contribute negative revenue. Since the user asked for a plot, I'll keep the code concise and runnable, likely using seaborn or matplotlib.

Finalizing my plotting code

I'm considering how to end this code block: a bare final expression isn't needed, since the plot itself is the output. I should clarify that I'll plot net monthly revenue, calculated as Quantity times UnitPrice, so returns and cancellations still contribute negative values. After coding, if any outputs are missing, I can have the user run the cell rather than asking further questions. Let's focus on crafting one clear chart!

Let’s plot net monthly revenue as Quantity × UnitPrice, grouped by invoice month. This keeps returns/cancellations in the data, so the trend reflects net sales.

Plot monthly revenue trend
import seaborn as sns
import matplotlib.pyplot as plt

monthly_revenue = df.copy()
monthly_revenue["Revenue"] = monthly_revenue["Quantity"] * monthly_revenue["UnitPrice"]
monthly_revenue["Month"] = monthly_revenue["InvoiceDate"].dt.to_period("M").dt.to_timestamp()

monthly_revenue = (
    monthly_revenue.groupby("Month", as_index=False)["Revenue"]
    .sum()
)

plt.figure(figsize=(12, 5))
sns.lineplot(data=monthly_revenue, x="Month", y="Revenue", marker="o")
plt.title("Monthly Revenue Trend")
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.xticks(rotation=45)
plt.tight_layout()

The monthly revenue trend chart was created successfully, so the requested visualization is in place. From the setup used, revenue is calculated as Quantity × UnitPrice and summed by calendar month, which means the line reflects net monthly revenue including negative transactions such as returns or cancellations.
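As a cross-check on the grouping above, pandas `resample` produces the same monthly totals from a datetime index. A minimal sketch on synthetic transactions (dates and amounts are illustrative, not from the dataset):

```python
import pandas as pd

# Synthetic transactions; the real workflow uses InvoiceDate and Revenue.
tx = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2010-12-01", "2010-12-15", "2011-01-03"]),
    "Revenue": [10.0, 5.0, 7.0],
})

# Month-start bins ("MS") are equivalent to grouping by to_period("M")
# and converting back to timestamps.
monthly = tx.set_index("InvoiceDate")["Revenue"].resample("MS").sum()
print(monthly)
```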

To support the chart with the actual month-by-month values:

Round revenue values
monthly_revenue.assign(Revenue=monthly_revenue["Revenue"].round(2))
    Month       Revenue
0   2010-12-01  748957.02
1   2011-01-01  560000.26
2   2011-02-01  498062.65
3   2011-03-01  683267.08
4   2011-04-01  493207.12
5   2011-05-01  723333.51
6   2011-06-01  691123.12
7   2011-07-01  681300.11
8   2011-08-01  682680.51
9   2011-09-01  1019687.62
10  2011-10-01  1070704.67
11  2011-11-01  1461756.25
12  2011-12-01  433668.01

The monthly revenue trend shows a clear rise over 2011, with some seasonality and a strong year-end peak.

Key takeaways from the plotted values:

  • Revenue starts at 748,957.02 USD in 2010-12.
  • There is a dip in early 2011, reaching 498,062.65 USD in 2011-02.
  • From spring through summer, revenue stays relatively steady, mostly between about 681,300 USD and 723,334 USD.
  • A strong acceleration begins in 2011-09 at 1,019,687.62 USD.
  • The highest month is 2011-11 with 1,461,756.25 USD, making it the clear peak in the trend.
  • 2011-12 drops to 433,668.01 USD, which is likely because the dataset only contains part of December rather than a full month.

So, the chart indicates:

  • moderate revenue in the first half of 2011,
  • a major uplift in autumn,
  • and a pronounced peak in November.

Because revenue was calculated as Quantity × UnitPrice, this is net monthly revenue, meaning returns or cancellations with negative quantities are included in the monthly totals.
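Because negative-quantity rows stay in the totals, it can be useful to split gross sales from returns to see how much they pull the net figure down. A minimal sketch, assuming a tiny synthetic frame in place of the real dataset:

```python
import pandas as pd

# Tiny synthetic stand-in for the retail data (hypothetical values).
df = pd.DataFrame({
    "Quantity": [6, -2, 4],     # negative quantity = return/cancellation
    "UnitPrice": [2.5, 2.5, 3.0],
})
df["Revenue"] = df["Quantity"] * df["UnitPrice"]

gross = df.loc[df["Quantity"] > 0, "Revenue"].sum()    # sales only
returns = df.loc[df["Quantity"] < 0, "Revenue"].sum()  # negative rows
net = df["Revenue"].sum()                              # what the trend plots

print(gross, returns, net)  # 27.0 -5.0 22.0
```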

show top 10 products by total revenue
Thinking

Deciding on product identifiers

I'm considering using StockCode + Description as the product identifier since descriptions can vary or repeat. With my notebook already loaded, I can easily integrate a concise code snippet. I'll ensure there are no comma-separated expressions and use display(top_products) for clarity.

To calculate revenue, I’m creating a copy of the DataFrame, multiplying Quantity with UnitPrice, and then grouping by StockCode and Description to sum their revenue. Also, I need to address that negative quantities or returns are included.

I’d use StockCode + Description as the product key, since descriptions alone aren’t always unique. This will show net revenue by product, so returns/cancellations are included if they exist.

Compute top product revenue
product_revenue = df.copy()
product_revenue["Revenue"] = product_revenue["Quantity"] * product_revenue["UnitPrice"]

top_products = (
    product_revenue.dropna(subset=["Description"])
    .groupby(["StockCode", "Description"], as_index=False)["Revenue"]
    .sum()
    .sort_values("Revenue", ascending=False)
    .head(10)
)

top_products["Revenue"] = top_products["Revenue"].round(2)
top_products
      StockCode  Description                         Revenue
4780  DOT        DOTCOM POSTAGE                      206245.48
1464  22423      REGENCY CAKESTAND 3 TIER            164762.19
3045  47566      PARTY BUNTING                        98302.98
4373  85123A     WHITE HANGING HEART T-LIGHT HOLDER   97715.99
4360  85099B     JUMBO BAG RED RETROSPOT              92356.03
2254  23084      RABBIT NIGHT LIGHT                   66756.59
4783  POST       POSTAGE                              66230.64
1117  22086      PAPER CHAIN KIT 50'S CHRISTMAS       63791.94
3291  84879      ASSORTED COLOUR BIRD ORNAMENT        58959.73
3128  79321      CHILLI LIGHTS                        53768.06

Here are the top 10 products by total revenue from the observed output:

  1. DOTCOM POSTAGE (DOT) — 206245.48 USD
  2. REGENCY CAKESTAND 3 TIER (22423) — 164762.19 USD
  3. PARTY BUNTING (47566) — 98302.98 USD
  4. WHITE HANGING HEART T-LIGHT HOLDER (85123A) — 97715.99 USD
  5. JUMBO BAG RED RETROSPOT (85099B) — 92356.03 USD
  6. RABBIT NIGHT LIGHT (23084) — 66756.59 USD
  7. POSTAGE (POST) — 66230.64 USD
  8. PAPER CHAIN KIT 50'S CHRISTMAS (22086) — 63791.94 USD
  9. ASSORTED COLOUR BIRD ORNAMENT (84879) — 58959.73 USD
  10. CHILLI LIGHTS (79321) — 53768.06 USD

A couple of useful takeaways:

  • DOTCOM POSTAGE is the top revenue contributor by a wide margin, so service/logistics-related line items are materially affecting the ranking.
  • Among physical merchandise, REGENCY CAKESTAND 3 TIER stands out as the strongest product.
  • The top sellers are mostly home décor, gift, and seasonal/party items, which fits the retail pattern already seen in the broader revenue trend.

These totals are based on net revenue calculated as Quantity × UnitPrice, so any returns or cancellations with negative quantities are included.
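Since postage line items dominate the ranking, a follow-up step could exclude service codes before grouping. A minimal sketch on a tiny synthetic frame; the `service_codes` set is an assumption about which stock codes are non-merchandise, not something the workflow verified:

```python
import pandas as pd

# Tiny synthetic frame; StockCode/Revenue values are illustrative only.
sales = pd.DataFrame({
    "StockCode": ["POST", "22423", "DOT", "85123A"],
    "Description": ["POSTAGE", "CAKESTAND", "DOTCOM POSTAGE", "T-LIGHT HOLDER"],
    "Revenue": [66230.64, 164762.19, 206245.48, 97715.99],
})

# Assumed service/logistics codes to drop; extend as needed for real data.
service_codes = {"POST", "DOT", "M", "BANK CHARGES"}
merch = sales[~sales["StockCode"].isin(service_codes)]

# Merchandise-only ranking by revenue.
top = merch.sort_values("Revenue", ascending=False).head(10)
print(top["StockCode"].tolist())  # ['22423', '85123A']
```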

what is the average order value and how does it vary by country?
Thinking

Calculating invoice totals

I’m considering creating invoice totals and showing AOV overall, plus by country in the next step. Since the user is asking now, I think I should start with the first step. We can compute order values grouped by InvoiceNo and Country since each invoice belongs to a country.

I could display the overall AOV first and then create a summary table for the first ten countries. It seems better to show one comprehensive table, along with the overall average as a single expression.

Creating invoice totals and AOV

I can't assume any outputs or numbers yet, so I'm thinking about providing code to create invoice-level totals, calculate the overall AOV, and produce a country summary table sorted by AOV. My main goal here is to summarize AOV effectively, even though it includes two outputs. I should display overall AOV then show the country-level AOV summary using something like country_aov.head(15).

I'll be careful with the code, considering whether to exclude non-product postage. For AOV, I need to note that using net invoice revenue as Quantity multiplied by UnitPrice includes returns and cancellations. However, negative totals from canceled invoices might drag down the AOV, so we might want to discuss that further.

Let’s calculate order value at the invoice level first, then summarize the overall average order value and the average by country. I’m using net order value as Quantity × UnitPrice, so returns/cancellations stay in the totals.

Compute average order value
order_values = (
    df.assign(Revenue=df["Quantity"] * df["UnitPrice"])
    .groupby(["InvoiceNo", "Country"], as_index=False)["Revenue"]
    .sum()
    .rename(columns={"Revenue": "OrderValue"})
)

overall_aov = pd.DataFrame({
    "metric": ["Average order value"],
    "value_usd": [order_values["OrderValue"].mean().round(2)]
})

country_aov = (
    order_values.groupby("Country", as_index=False)
    .agg(
        avg_order_value=("OrderValue", "mean"),
        orders=("InvoiceNo", "nunique")
    )
    .sort_values("avg_order_value", ascending=False)
)

country_aov["avg_order_value"] = country_aov["avg_order_value"].round(2)

display(overall_aov)
country_aov.head(15)
   metric               value_usd
0  Average order value  376.36

    Country      avg_order_value  orders
24  Netherlands  2818.43          101
0   Australia    1986.63          69
21  Lebanon      1693.88          1
20  Japan        1262.16          28
4   Brazil       1143.60          1
28  RSA          1002.31          1
30  Singapore    912.04           10
9   Denmark      893.72           21
25  Norway       879.09           40
18  Israel       878.65           9
32  Sweden       795.56           46
15  Greece       785.09           6
33  Switzerland  761.96           74
10  EIRE         731.32           360
16  Hong Kong    674.47           15
Thinking

Crafting concise insights

I need to formulate an answer based on the outputs from the notebook. Choosing Path A seems sufficient for this. My goal is to create a natural final answer without code, making sure to analyze the outputs rather than just rephrasing them. I want to mention the overall AOV of $376.36 and highlight that the Netherlands has the highest variation at $2,818.43. I should be cautious about small order counts in Lebanon, Brazil, and RSA. Finally, I'll connect this to the original goal!

The average order value overall is 376.36 USD.

It varies a lot by country. From the observed results, the highest average order values are:

  • Netherlands: 2818.43 USD across 101 orders
  • Australia: 1986.63 USD across 69 orders
  • Japan: 1262.16 USD across 28 orders
  • Denmark: 893.72 USD across 21 orders
  • Norway: 879.09 USD across 40 orders
  • Sweden: 795.56 USD across 46 orders
  • Switzerland: 761.96 USD across 74 orders
  • EIRE: 731.32 USD across 360 orders
  • Hong Kong: 674.47 USD across 15 orders

A few countries also show very high values, but they are based on only 1 order, so they’re less reliable as a pattern:

  • Lebanon: 1693.88 USD
  • Brazil: 1143.60 USD
  • RSA: 1002.31 USD

What this means:

  • Order values are not evenly distributed internationally.
  • The Netherlands and Australia stand out with much larger orders than the overall average.
  • EIRE has a lower average than those top countries, but with 360 orders it likely represents a more stable benchmark than countries with very small order counts.
  • The gap between the overall average (376.36 USD) and many of these country averages suggests that some international markets tend to place larger basket-size orders.
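A natural follow-up, given the single-order countries above, is to filter to countries with enough orders before plotting AOV. This is a hedged sketch, not part of the recorded run: the minimum-orders threshold of 10 is an assumption, and the small frame stands in for the real `country_aov` output:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Tiny synthetic frame mirroring the shape of the country AOV output;
# the numbers are illustrative.
country_aov = pd.DataFrame({
    "Country": ["Netherlands", "Lebanon", "Australia", "Brazil", "EIRE"],
    "avg_order_value": [2818.43, 1693.88, 1986.63, 1143.60, 731.32],
    "orders": [101, 1, 69, 1, 360],
})

# Keep countries with enough orders for the average to be meaningful
# (the threshold of 10 is an assumption, not from the workflow).
stable = country_aov[country_aov["orders"] >= 10].sort_values(
    "avg_order_value", ascending=False
)

# Bar chart of AOV for the remaining countries.
plt.figure(figsize=(8, 4))
plt.bar(stable["Country"], stable["avg_order_value"])
plt.title("Average Order Value by Country (min. 10 orders)")
plt.ylabel("AOV")
plt.tight_layout()

print(stable["Country"].tolist())  # ['Netherlands', 'Australia', 'EIRE']
```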

Score cards for gpt-5.4 for E-commerce Sales Analysis in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were attempted: dataset loading/basic info, monthly revenue trend, top 10 products, and AOV by country. The expected analyses are present in the notebook outputs.

  • Execution Correctness: 2/2

    Excellent

    The code shown is coherent and likely runnable, with valid pandas grouping and plotting logic. No major syntax or structural errors are evident from the provided content.

  • Output Quality: 3/3

    Excellent

    The outputs match the expected outcomes semantically: shape/dtypes/nulls, a monthly revenue line chart with peaks, a top-10 product revenue table, and country-level AOV results. The bar chart for top 5 countries by AOV is not explicitly shown, but the country AOV data is produced.

  • Reasoning Quality: 2/2

    Excellent

    The explanations correctly describe how revenue and order value were computed and note that returns/cancellations can affect totals. The reasoning is clear and aligned with the analysis steps.

  • Reliability: 1/1

    Excellent

    The workflow is mostly consistent and grounded in the dataset outputs, with no obvious hallucinations. It is somewhat fragile in presentation because the final country AOV visualization is not explicitly created.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.