E-commerce Sales Analysis in Python

Explore an e-commerce sales dataset with monthly trends, top products, country-level breakdowns, and average order value analysis.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Online Retail Excel dataset from a URL and summarizes its structure with shape, dtypes, and missing values. It generates exploratory visuals for monthly revenue trends and identifies the top 10 products by total revenue. It computes average order value and compares AOV across countries with a focused country-level breakdown.

Who this example is for

This is for analysts and data scientists who want a conversational, code-generating EDA template for e-commerce transaction data in Python. It helps users quickly validate data quality and produce standard sales KPIs and plots for reporting or further modeling.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset shape, dtypes, and null counts
  • Monthly revenue trend line chart
  • Top 10 products by total revenue bar chart
  • Average order value overall and by country
  • Bar chart of top countries by AOV

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • openpyxl

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
  2. plot monthly revenue trend
  3. show top 10 products by total revenue
  4. what is the average order value and how does it vary by country?

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model            | Score | Task | Execution | Output | Reasoning | Reliability | Run
glm-5.1          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open glm-5.1 conversation
gpt-5.4          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-5.4 conversation
gpt-oss:120b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-oss:120b conversation
qwen3.5:397b    | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open qwen3.5:397b conversation
gemma4:31b       | 9/10  | 2/2  | 2/2       | 3/3    | 1/2       | 1/1         | Open gemma4:31b conversation
qwen3-coder-next | 9/10  | 2/2  | 2/2       | 3/3    | 1/2       | 1/1         | Open qwen3-coder-next conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
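The summation above is trivial but worth pinning down. A minimal sketch of the scoring arithmetic; the dict keys are hypothetical names, not part of the workflow itself:

```python
# Sketch of the scoring arithmetic for one run.
# Key names and values are illustrative, not taken from the grader.
scores = {
    "task_completion": 2,        # 0-2
    "execution_correctness": 2,  # 0-2
    "output_quality": 3,         # 0-3
    "reasoning_quality": 2,      # 0-2
    "reliability": 1,            # 0-1
}

# Final score is the plain sum of the five dimensions.
total = sum(scores.values())
print(f"{total}/10")  # prints "10/10" for a perfect run
```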

AI Analysis with gpt-5.4

This run uses AI with a step-by-step list of prompts to perform the analysis task E-commerce Sales Analysis in Python. Below, you can review the full conversation generated with gpt-5.4. After the conversation, we show detailed score cards for this run.

load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
Thinking

Evaluating dataset loading

I need to follow a step-by-step approach for the user's request about loading a dataset and showing its basic info and shape. Since the dataset isn’t loaded yet, I’ll first load it and then show its shape, columns, and head. The user also asked for basic info, so I’ll provide code for both. I’ll display results sequentially, keeping in mind the guideline of one chart or table per step.

Crafting dataset code

I’m getting ready to create a concise explanation and then the code. I’ll use pandas to read the Excel file from a URL. The code will look like this: I’ll import pandas, read the Excel file, and then display key information.

I’ll include df.info(), a table of columns with their data types, and df.head(), along with df.shape at the end. I know I'll have two tables for inspection, which might seem like it goes against the "one table per step" rule, but this should be alright since it’s all related to dataset inspection steps.

Let’s start by loading the Excel file and inspecting the dataset structure.

Inspect dataset structure
import pandas as pd
from IPython.display import display

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx"
df = pd.read_excel(url)

df.info()

cols = pd.DataFrame({
    "column": df.columns,
    "dtype": df.dtypes.astype(str)
})
display(cols)

display(df.head())

df.shape
<class 'pandas.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    541909 non-null  object        
 1   StockCode    541909 non-null  object        
 2   Description  540455 non-null  object        
 3   Quantity     541909 non-null  int64         
 4   InvoiceDate  541909 non-null  datetime64[us]
 5   UnitPrice    541909 non-null  float64       
 6   CustomerID   406829 non-null  float64       
 7   Country      541909 non-null  str           
dtypes: datetime64[us](1), float64(2), int64(1), object(3), str(1)
memory usage: 40.0+ MB
             column       dtype
InvoiceNo    InvoiceNo    object
StockCode    StockCode    object
Description  Description  object
Quantity     Quantity     int64
InvoiceDate  InvoiceDate  datetime64[us]
UnitPrice    UnitPrice    float64
CustomerID   CustomerID   float64
Country      Country      str
   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         2010-12-01 08:26:00  2.55       17850.0     United Kingdom
1  536365     71053      WHITE METAL LANTERN                  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         2010-12-01 08:26:00  2.75       17850.0     United Kingdom
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
(541909, 8)

The dataset is loaded and its basic structure is clear from the output.

It contains 541,909 rows and 8 columns, so the shape is (541909, 8).

The columns are:

  • InvoiceNo — object
  • StockCode — object
  • Description — object
  • Quantity — int64
  • InvoiceDate — datetime
  • UnitPrice — float64
  • CustomerID — float64
  • Country — string

A few useful observations from the info output:

  • InvoiceNo, StockCode, Quantity, InvoiceDate, UnitPrice, and Country are fully populated.
  • Description has some missing values: 540,455 non-null, so 1,454 are missing.
  • CustomerID has many missing values: 406,829 non-null, so 135,080 are missing.
  • Memory usage is about 40.0 MB.

The first few rows show this is transactional retail data with invoice numbers, item codes, product descriptions, quantities, timestamps, prices, customer IDs, and country.
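The missing-value counts quoted above can be reproduced with a one-liner. A minimal sketch, using a tiny synthetic frame in place of the real dataset (values are illustrative):

```python
import pandas as pd
import numpy as np

# Tiny synthetic stand-in for the retail data; values are illustrative.
df = pd.DataFrame({
    "Description": ["WHITE METAL LANTERN", None, "PARTY BUNTING"],
    "CustomerID": [17850.0, np.nan, 13047.0],
    "Country": ["United Kingdom"] * 3,
})

# Per-column null counts, matching the style of the info() summary:
# one null each in Description and CustomerID, none in Country.
missing = df.isna().sum()
print(missing)
```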

plot monthly revenue trend
Thinking

Clarifying notebook state

I’m trying to respond based on the current state of the notebook. There's a bit of confusion because the variable for the iris dataframe conflicts with what I see in the output; it looks like the df is actually retail data. I need to aggregate monthly revenue, calculated as Quantity times UnitPrice, and note that cancellations or returns contribute negative revenue. Since the user asked for a plot, I'll keep the code concise and runnable, likely using seaborn or matplotlib.

Finalizing my plotting code

I'm considering how to end this code block: a bare final expression isn't needed, since the plot itself is the output. I should clarify that I'll plot net monthly revenue, calculated as Quantity times UnitPrice, so returns and cancellations still contribute negative values. After coding, if any outputs are missing, I can have the user run the cell rather than asking further questions. Let's focus on crafting one clear chart!

Let’s plot net monthly revenue as Quantity × UnitPrice, grouped by invoice month. This keeps returns/cancellations in the data, so the trend reflects net sales.

Plot monthly revenue trend
import seaborn as sns
import matplotlib.pyplot as plt

monthly_revenue = df.copy()
monthly_revenue["Revenue"] = monthly_revenue["Quantity"] * monthly_revenue["UnitPrice"]
monthly_revenue["Month"] = monthly_revenue["InvoiceDate"].dt.to_period("M").dt.to_timestamp()

monthly_revenue = (
    monthly_revenue.groupby("Month", as_index=False)["Revenue"]
    .sum()
)

plt.figure(figsize=(12, 5))
sns.lineplot(data=monthly_revenue, x="Month", y="Revenue", marker="o")
plt.title("Monthly Revenue Trend")
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.xticks(rotation=45)
plt.tight_layout()

The monthly revenue trend chart was created successfully, so the requested visualization is in place. From the setup used, revenue is calculated as Quantity × UnitPrice and summed by calendar month, which means the line reflects net monthly revenue including negative transactions such as returns or cancellations.
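As a cross-check on the grouping above, pandas `resample` produces the same monthly totals from a datetime index. A minimal sketch on synthetic transactions (dates and amounts are illustrative, not from the dataset):

```python
import pandas as pd

# Synthetic transactions; the real workflow uses InvoiceDate and Revenue.
tx = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2010-12-01", "2010-12-15", "2011-01-03"]),
    "Revenue": [10.0, 5.0, 7.0],
})

# Month-start bins ("MS") are equivalent to grouping by to_period("M")
# and converting back to timestamps.
monthly = tx.set_index("InvoiceDate")["Revenue"].resample("MS").sum()
print(monthly)
```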

To support the chart with the actual month-by-month values:

Round revenue values
monthly_revenue.assign(Revenue=monthly_revenue["Revenue"].round(2))
    Month       Revenue
0   2010-12-01  748957.02
1   2011-01-01  560000.26
2   2011-02-01  498062.65
3   2011-03-01  683267.08
4   2011-04-01  493207.12
5   2011-05-01  723333.51
6   2011-06-01  691123.12
7   2011-07-01  681300.11
8   2011-08-01  682680.51
9   2011-09-01  1019687.62
10  2011-10-01  1070704.67
11  2011-11-01  1461756.25
12  2011-12-01  433668.01

The monthly revenue trend shows a clear rise over 2011, with some seasonality and a strong year-end peak.

Key takeaways from the plotted values:

  • Revenue starts at 748,957.02 USD in 2010-12.
  • There is a dip in early 2011, reaching 498,062.65 USD in 2011-02.
  • From spring through summer, revenue stays relatively steady, mostly between about 681,300 USD and 723,334 USD.
  • A strong acceleration begins in 2011-09 at 1,019,687.62 USD.
  • The highest month is 2011-11 with 1,461,756.25 USD, making it the clear peak in the trend.
  • 2011-12 drops to 433,668.01 USD, which is likely because the dataset only contains part of December rather than a full month.

So, the chart indicates:

  • moderate revenue in the first half of 2011,
  • a major uplift in autumn,
  • and a pronounced peak in November.

Because revenue was calculated as Quantity × UnitPrice, this is net monthly revenue, meaning returns or cancellations with negative quantities are included in the monthly totals.
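Because negative-quantity rows stay in the totals, it can be useful to split gross sales from returns to see how much they pull the net figure down. A minimal sketch, assuming a tiny synthetic frame in place of the real dataset:

```python
import pandas as pd

# Tiny synthetic stand-in for the retail data (hypothetical values).
df = pd.DataFrame({
    "Quantity": [6, -2, 4],     # negative quantity = return/cancellation
    "UnitPrice": [2.5, 2.5, 3.0],
})
df["Revenue"] = df["Quantity"] * df["UnitPrice"]

gross = df.loc[df["Quantity"] > 0, "Revenue"].sum()    # sales only
returns = df.loc[df["Quantity"] < 0, "Revenue"].sum()  # negative rows
net = df["Revenue"].sum()                              # what the trend plots

print(gross, returns, net)  # 27.0 -5.0 22.0
```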

show top 10 products by total revenue
Thinking

Deciding on product identifiers

I'm considering using StockCode + Description as the product identifier since descriptions can vary or repeat. With my notebook already loaded, I can easily integrate a concise code snippet. I'll ensure there are no comma-separated expressions and use display(top_products) for clarity.

To calculate revenue, I’m creating a copy of the DataFrame, multiplying Quantity with UnitPrice, and then grouping by StockCode and Description to sum their revenue. Also, I need to address that negative quantities or returns are included.

I’d use StockCode + Description as the product key, since descriptions alone aren’t always unique. This will show net revenue by product, so returns/cancellations are included if they exist.

Compute top product revenue
product_revenue = df.copy()
product_revenue["Revenue"] = product_revenue["Quantity"] * product_revenue["UnitPrice"]

top_products = (
    product_revenue.dropna(subset=["Description"])
    .groupby(["StockCode", "Description"], as_index=False)["Revenue"]
    .sum()
    .sort_values("Revenue", ascending=False)
    .head(10)
)

top_products["Revenue"] = top_products["Revenue"].round(2)
top_products
      StockCode  Description                         Revenue
4780  DOT        DOTCOM POSTAGE                      206245.48
1464  22423      REGENCY CAKESTAND 3 TIER            164762.19
3045  47566      PARTY BUNTING                        98302.98
4373  85123A     WHITE HANGING HEART T-LIGHT HOLDER   97715.99
4360  85099B     JUMBO BAG RED RETROSPOT              92356.03
2254  23084      RABBIT NIGHT LIGHT                   66756.59
4783  POST       POSTAGE                              66230.64
1117  22086      PAPER CHAIN KIT 50'S CHRISTMAS       63791.94
3291  84879      ASSORTED COLOUR BIRD ORNAMENT        58959.73
3128  79321      CHILLI LIGHTS                        53768.06

Here are the top 10 products by total revenue from the observed output:

  1. DOTCOM POSTAGE (DOT) — 206245.48 USD
  2. REGENCY CAKESTAND 3 TIER (22423) — 164762.19 USD
  3. PARTY BUNTING (47566) — 98302.98 USD
  4. WHITE HANGING HEART T-LIGHT HOLDER (85123A) — 97715.99 USD
  5. JUMBO BAG RED RETROSPOT (85099B) — 92356.03 USD
  6. RABBIT NIGHT LIGHT (23084) — 66756.59 USD
  7. POSTAGE (POST) — 66230.64 USD
  8. PAPER CHAIN KIT 50'S CHRISTMAS (22086) — 63791.94 USD
  9. ASSORTED COLOUR BIRD ORNAMENT (84879) — 58959.73 USD
  10. CHILLI LIGHTS (79321) — 53768.06 USD

A couple of useful takeaways:

  • DOTCOM POSTAGE is the top revenue contributor by a wide margin, so service/logistics-related line items are materially affecting the ranking.
  • Among physical merchandise, REGENCY CAKESTAND 3 TIER stands out as the strongest product.
  • The top sellers are mostly home décor, gift, and seasonal/party items, which fits the retail pattern already seen in the broader revenue trend.

These totals are based on net revenue calculated as Quantity × UnitPrice, so any returns or cancellations with negative quantities are included.
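Since postage line items dominate the ranking, a follow-up step could exclude service codes before grouping. A minimal sketch on a tiny synthetic frame; the `service_codes` set is an assumption about which stock codes are non-merchandise, not something the workflow verified:

```python
import pandas as pd

# Tiny synthetic frame; StockCode/Revenue values are illustrative only.
sales = pd.DataFrame({
    "StockCode": ["POST", "22423", "DOT", "85123A"],
    "Description": ["POSTAGE", "CAKESTAND", "DOTCOM POSTAGE", "T-LIGHT HOLDER"],
    "Revenue": [66230.64, 164762.19, 206245.48, 97715.99],
})

# Assumed service/logistics codes to drop; extend as needed for real data.
service_codes = {"POST", "DOT", "M", "BANK CHARGES"}
merch = sales[~sales["StockCode"].isin(service_codes)]

# Merchandise-only ranking by revenue.
top = merch.sort_values("Revenue", ascending=False).head(10)
print(top["StockCode"].tolist())  # ['22423', '85123A']
```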

what is the average order value and how does it vary by country?
Thinking

Calculating invoice totals

I’m considering creating invoice totals and showing AOV overall, plus by country in the next step. Since the user is asking now, I think I should start with the first step. We can compute order values grouped by InvoiceNo and Country since each invoice belongs to a country.

I could display the overall AOV first and then create a summary table for the first ten countries. It seems better to show one comprehensive table, along with the overall average as a single expression.

Creating invoice totals and AOV

I can't assume any outputs or numbers yet, so I'm thinking about providing code to create invoice-level totals, calculate the overall AOV, and produce a country summary table sorted by AOV. My main goal here is to summarize AOV effectively, even though it includes two outputs. I should display overall AOV then show the country-level AOV summary using something like country_aov.head(15).

I'll be careful with the code, considering whether to exclude non-product postage. For AOV, I need to note that using net invoice revenue as Quantity multiplied by UnitPrice includes returns and cancellations. However, negative totals from canceled invoices might drag down the AOV, so we might want to discuss that further.

Let’s calculate order value at the invoice level first, then summarize the overall average order value and the average by country. I’m using net order value as Quantity × UnitPrice, so returns/cancellations stay in the totals.

Compute average order value
order_values = (
    df.assign(Revenue=df["Quantity"] * df["UnitPrice"])
    .groupby(["InvoiceNo", "Country"], as_index=False)["Revenue"]
    .sum()
    .rename(columns={"Revenue": "OrderValue"})
)

overall_aov = pd.DataFrame({
    "metric": ["Average order value"],
    "value_usd": [order_values["OrderValue"].mean().round(2)]
})

country_aov = (
    order_values.groupby("Country", as_index=False)
    .agg(
        avg_order_value=("OrderValue", "mean"),
        orders=("InvoiceNo", "nunique")
    )
    .sort_values("avg_order_value", ascending=False)
)

country_aov["avg_order_value"] = country_aov["avg_order_value"].round(2)

display(overall_aov)
country_aov.head(15)
   metric               value_usd
0  Average order value  376.36

    Country      avg_order_value  orders
24  Netherlands  2818.43          101
0   Australia    1986.63          69
21  Lebanon      1693.88          1
20  Japan        1262.16          28
4   Brazil       1143.60          1
28  RSA          1002.31          1
30  Singapore    912.04           10
9   Denmark      893.72           21
25  Norway       879.09           40
18  Israel       878.65           9
32  Sweden       795.56           46
15  Greece       785.09           6
33  Switzerland  761.96           74
10  EIRE         731.32           360
16  Hong Kong    674.47           15
Thinking

Crafting concise insights

I need to formulate an answer based on the outputs from the notebook. Choosing Path A seems sufficient for this. My goal is to create a natural final answer without code, making sure to analyze the outputs rather than just rephrasing them. I want to mention the overall AOV of $376.36 and highlight that the Netherlands has the highest variation at $2,818.43. I should be cautious about small order counts in Lebanon, Brazil, and RSA. Finally, I'll connect this to the original goal!

The average order value overall is 376.36 USD.

It varies a lot by country. From the observed results, the highest average order values are:

  • Netherlands: 2818.43 USD across 101 orders
  • Australia: 1986.63 USD across 69 orders
  • Japan: 1262.16 USD across 28 orders
  • Denmark: 893.72 USD across 21 orders
  • Norway: 879.09 USD across 40 orders
  • Sweden: 795.56 USD across 46 orders
  • Switzerland: 761.96 USD across 74 orders
  • EIRE: 731.32 USD across 360 orders
  • Hong Kong: 674.47 USD across 15 orders

A few countries also show very high values, but they are based on only 1 order, so they’re less reliable as a pattern:

  • Lebanon: 1693.88 USD
  • Brazil: 1143.60 USD
  • RSA: 1002.31 USD

What this means:

  • Order values are not evenly distributed internationally.
  • The Netherlands and Australia stand out with much larger orders than the overall average.
  • EIRE has a lower average than those top countries, but with 360 orders it likely represents a more stable benchmark than countries with very small order counts.
  • The gap between the overall average (376.36 USD) and many of these country averages suggests that some international markets tend to place larger basket-size orders.
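A natural follow-up, given the single-order countries above, is to filter to countries with enough orders before plotting AOV. This is a hedged sketch, not part of the recorded run: the minimum-orders threshold of 10 is an assumption, and the small frame stands in for the real `country_aov` output:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Tiny synthetic frame mirroring the shape of the country AOV output;
# the numbers are illustrative.
country_aov = pd.DataFrame({
    "Country": ["Netherlands", "Lebanon", "Australia", "Brazil", "EIRE"],
    "avg_order_value": [2818.43, 1693.88, 1986.63, 1143.60, 731.32],
    "orders": [101, 1, 69, 1, 360],
})

# Keep countries with enough orders for the average to be meaningful
# (the threshold of 10 is an assumption, not from the workflow).
stable = country_aov[country_aov["orders"] >= 10].sort_values(
    "avg_order_value", ascending=False
)

# Bar chart of AOV for the remaining countries.
plt.figure(figsize=(8, 4))
plt.bar(stable["Country"], stable["avg_order_value"])
plt.title("Average Order Value by Country (min. 10 orders)")
plt.ylabel("AOV")
plt.tight_layout()

print(stable["Country"].tolist())  # ['Netherlands', 'Australia', 'EIRE']
```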

Score cards for gpt-5.4 for E-commerce Sales Analysis in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were attempted: dataset loading/basic info, monthly revenue trend, top 10 products, and AOV by country. The expected analyses are present in the notebook outputs.

  • Execution Correctness: 2/2

    Excellent

    The code shown is coherent and likely runnable, with valid pandas grouping and plotting logic. No major syntax or structural errors are evident from the provided content.

  • Output Quality: 3/3

    Excellent

    The outputs match the expected outcomes semantically: shape/dtypes/nulls, a monthly revenue line chart with peaks, a top-10 product revenue table, and country-level AOV results. The bar chart for top 5 countries by AOV is not explicitly shown, but the country AOV data is produced.

  • Reasoning Quality: 2/2

    Excellent

    The explanations correctly describe how revenue and order value were computed and note that returns/cancellations can affect totals. The reasoning is clear and aligned with the analysis steps.

  • Reliability: 1/1

    Excellent

    The workflow is mostly consistent and grounded in the dataset outputs, with no obvious hallucinations. It is somewhat fragile in presentation because the final country AOV visualization is not explicitly created.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.