E-commerce Sales Analysis in Python

Explore an e-commerce sales dataset with monthly trends, top products, category breakdowns, and average order value analysis.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Online Retail Excel dataset from a URL and summarizes its structure with shape, dtypes, and missing values. It generates exploratory visuals for monthly revenue trends and identifies the top 10 products by total revenue. It computes average order value and compares AOV across countries with a focused country-level breakdown.

Who this example is for

This is for analysts and data scientists who want a conversational, code-generating EDA template for e-commerce transaction data in Python. It helps users quickly validate data quality and produce standard sales KPIs and plots for reporting or further modeling.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset shape, dtypes, and null counts
  • Monthly revenue trend line chart
  • Top 10 products by total revenue bar chart
  • Average order value overall and by country
  • Bar chart of top countries by AOV

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • openpyxl

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. 1load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
  2. 2plot monthly revenue trend
  3. 3show top 10 products by total revenue
  4. 4what is the average order value and how does it vary by country?

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model SourceScoreTaskExecutionOutputReasoningReliabilityRun
glm-5.110/102/22/23/32/21/1Open glm-5.1 conversation
gpt-5.410/102/22/23/32/21/1Open gpt-5.4 conversation
gpt-oss:120b10/102/22/23/32/21/1Open gpt-oss:120b conversation
qwen3.5:397b10/102/22/23/32/21/1Open qwen3.5:397b conversation
gemma4:31b9/102/22/23/31/21/1Open gemma4:31b conversation
qwen3-coder-next9/102/22/23/31/21/1Open qwen3-coder-next conversation
How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.

AI Analysis with gpt-oss:120b

This run uses AI with a step-by-step list of prompts to perform the analysis task E-commerce Sales Analysis in Python. Below, you can review the full conversation generated with gpt-oss:120b. After the conversation, we show detailed score cards for this run.

load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
Thinking

We need to load the dataset from the URL, likely Excel file. Show shape, columns, head. Provide code block. End with displaying shape, columns, head. According to notebook rules, we can just write code that loads into df, then show df.shape, df.columns, df.head(). We'll put them sequentially maybe using separate statements. Need to end with at most one bare expression. Could do:

df = pd.read_excel(url) df.shape df.columns df.head()

But that's three expressions. Need at most one final bare expression. Could do:

df = pd.read_excel(url) display(df.shape) display(df.columns) df.head()

display returns None? Actually display is from IPython.display, shows representation. But display(df.shape) will display tuple. That's fine and a function call, not a bare expression. The final bare expression is df.head() which will show table.

Let's produce code accordingly.

Here’s a quick load of the Excel file and a look at its size, column names, and a few rows:

Load online retail data
import pandas as pd
from IPython.display import display

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx"

# Load the workbook
df = pd.read_excel(url)

# Show basic information
display(df.shape)          # number of rows, columns
display(df.columns)       # column names
df.head()                  # first few rows
(541909, 8)
Index(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'],
      dtype='str')
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
Thinking

We have loaded the dataset and displayed shape and columns and head. The original goal: "load the retail sales dataset from ... and show basic info and shape". The current output already shows shape (541,909 rows, 8 columns) and column names, plus first rows. That's sufficient. Need to provide final answer with insights, not code. Provide concise answer with shape and columns, maybe summarize head. Use <!--final answer--> marker.

The dataset was loaded successfully.

  • Shape: 541,909 rows × 8 columns
  • Columns: InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country

A quick glance at the first few records shows typical transaction details such as invoice numbers, product codes, quantities, unit prices, and customer information.

plot monthly revenue trend
Thinking

We have df with 541k rows. Need to compute monthly revenue (Quantity * UnitPrice) aggregated by month. Then plot line chart with seaborn/matplotlib. Provide code block that calculates revenue per month and displays plot. Should not show extra prints. Use seaborn lineplot.

We need to ensure InvoiceDate is datetime (already). Compute df['Revenue'] = df['Quantity'] * df['UnitPrice']; groupby month using dt.to_period('M') or dt.strftime('%Y-%m').

Plot.

Provide code block.

Plot monthly revenue
import seaborn as sns
import matplotlib.pyplot as plt

# Ensure date column is datetime (it already is from read_excel)
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

# Revenue per transaction
df["Revenue"] = df["Quantity"] * df["UnitPrice"]

# Aggregate monthly revenue
monthly_rev = (
    df.groupby(df["InvoiceDate"].dt.to_period("M"))["Revenue"]
    .sum()
    .reset_index()
)
monthly_rev["InvoiceDate"] = monthly_rev["InvoiceDate"].dt.to_timestamp()

# Plot
plt.figure(figsize=(12, 4))
sns.lineplot(data=monthly_rev, x="InvoiceDate", y="Revenue", marker="o")
plt.title("Monthly Revenue Trend")
plt.xlabel("Month")
plt.ylabel("Revenue (USD)")
plt.tight_layout()
monthly_rev
InvoiceDate Revenue
0 2010-12-01 748957.020
1 2011-01-01 560000.260
2 2011-02-01 498062.650
3 2011-03-01 683267.080
4 2011-04-01 493207.121
5 2011-05-01 723333.510
6 2011-06-01 691123.120
7 2011-07-01 681300.111
8 2011-08-01 682680.510
9 2011-09-01 1019687.622
10 2011-10-01 1070704.670
11 2011-11-01 1461756.250
12 2011-12-01 433668.010
Thinking

We need decide if current outputs sufficient to answer original goal: plot monthly revenue trend. The latest output includes a line chart (figure) and also a displayed DataFrame monthly_rev which shows month and revenue values. That's both chart and numeric values, so seems sufficient. We need to provide final answer with insights. Must not include code. Provide final answer with insights and include <!--final answer--> marker. Ensure no meta phrases like "goal achieved". Provide concise interpretation. Use USD format for values.

The monthly revenue steadily increased after the start of 2010, with notable spikes in September 2011 (≈ 1,019,688 USD), October 2011 (≈ 1,070,705 USD) and a peak in November 2011 (≈ 1,461,756 USD). The trend line visualises this growth, showing lower revenue in the early months (e.g., December 2010 ≈ 748,957 USD) and a clear upward trajectory toward the end of 2011.

show top 10 products by total revenue
Thinking

We need to compute total revenue per product (Description) and show top 10. Use df with Revenue column already computed. Should aggregate sum Revenue grouped by Description, sort descending and show top 10. Provide code block that outputs a DataFrame. Also probably display a bar chart for visualization. The user said "show top 10 products by total revenue". Could present a table and optionally a plot. Provide code.

Show top revenue products
# Total revenue per product (using the Description column)
product_rev = (
    df.groupby("Description")["Revenue"]
    .sum()
    .reset_index()
    .sort_values("Revenue", ascending=False)
)

# Keep the top 10
top10 = product_rev.head(10)

# Show the table
top10
Description Revenue
1098 DOTCOM POSTAGE 206245.48
2915 REGENCY CAKESTAND 3 TIER 164762.19
3918 WHITE HANGING HEART T-LIGHT HOLDER 99668.47
2471 PARTY BUNTING 98302.98
1866 JUMBO BAG RED RETROSPOT 92356.03
2803 RABBIT NIGHT LIGHT 66756.59
2753 POSTAGE 66230.64
2439 PAPER CHAIN KIT 50'S CHRISTMAS 63791.94
244 ASSORTED COLOUR BIRD ORNAMENT 58959.73
773 CHILLI LIGHTS 53768.06
Thinking

We need to decide A or B. Goal: show top 10 products by total revenue. The notebook already ran code to compute top10 and displayed a table. Output shows top 10 products with revenue values. So we have sufficient answer. Must provide natural final answer, no code, with <!--final answer-->.

We must include insights (the top 10 list). Provide description.

Proceed.

The ten products that generated the highest total revenue are:

RankProduct (Description)Total Revenue
1DOTCOM POSTAGE206,245.48 USD
2REGENCY CAKESTAND 3 TIER164,762.19 USD
3WHITE HANGING HEART T‑LIGHT HOLDER99,668.47 USD
4PARTY BUNTING98,302.98 USD
5JUMBO BAG RED RETROSPOT92,356.03 USD
6RABBIT NIGHT LIGHT66,756.59 USD
7POSTAGE66,230.64 USD
8PAPER CHAIN KIT 50'S CHRISTMAS63,791.94 USD
9ASSORTED COLOUR BIRD ORNAMENT58,959.73 USD
10CHILLI LIGHTS53,768.06 USD

These items account for the largest share of the dataset’s overall revenue.

what is the average order value and how does it vary by country?
Thinking

We have df with Revenue per row (line item). Need average order value per InvoiceNo (total revenue per invoice). Compute order total, then average across all orders overall, also by country. Country is column, but country is per line; need order's country (assuming same per invoice). Compute order totals per InvoiceNo, also assign Country via mode or first. Then compute overall average order value, and per-country average order value. Provide code and output. Also possibly visualize variation; but ask for average order value and how it varies by country – we can produce a table sorted descending average order value per country, maybe also count of orders. Provide code and show output. Follow guidelines: one step at a time. Provide code to compute order totals and country level averages, then display table. Provide final answer with insights.

Calculate average order value
# ---- 1️⃣ Compute total revenue per invoice (order) ----
order_rev = (
    df.groupby("InvoiceNo")
    .agg(
        OrderValue=("Revenue", "sum"),
        Country=("Country", "first")  # each invoice belongs to a single country
    )
    .reset_index()
)

# ---- 2️⃣ Overall average order value ----
overall_avg = order_rev["OrderValue"].mean()
overall_avg
np.float64(376.3609240926641)
Avg order value per country
# ---- 3️⃣ Average order value by country (and number of orders) ----
country_stats = (
    order_rev.groupby("Country")
    .agg(
        AvgOrderValue=("OrderValue", "mean"),
        NumOrders=("OrderValue", "size")
    )
    .reset_index()
    .sort_values("AvgOrderValue", ascending=False)
)

country_stats
Country AvgOrderValue NumOrders
24 Netherlands 2818.431089 101
0 Australia 1986.627101 69
21 Lebanon 1693.880000 1
20 Japan 1262.165000 28
4 Brazil 1143.600000 1
28 RSA 1002.310000 1
30 Singapore 912.039000 10
9 Denmark 893.720952 21
25 Norway 879.086500 40
18 Israel 878.646667 9
32 Sweden 795.563261 46
15 Greece 785.086667 6
33 Switzerland 761.964189 74
10 EIRE 731.324500 360
16 Hong Kong 674.469333 15
7 Cyprus 647.314500 20
35 United Arab Emirates 634.093333 3
17 Iceland 615.714286 7
5 Canada 611.063333 6
6 Channel Islands 608.675455 33
1 Austria 534.437895 19
31 Spain 521.662667 105
12 Finland 465.140417 48
13 France 428.208026 461
22 Lithuania 415.265000 4
27 Portugal 413.620000 71
14 Germany 367.658723 603
37 Unspecified 365.368462 13
36 United Kingdom 348.506272 23494
3 Belgium 343.789580 119
19 Italy 307.100182 55
26 Poland 300.547500 24
11 European Community 258.350000 5
23 Malta 250.547000 10
34 USA 247.274286 7
8 Czech Republic 141.544000 5
2 Bahrain 137.100000 4
29 Saudi Arabia 65.585000 2
Thinking

We have outputs:

  • overall_avg = 376.3609240926641 (np.float64). So overall average order value is about 376.36 USD.

  • country_stats table shows average order value per country and number of orders, sorted descending by AvgOrderValue. The table includes many rows but truncated. Provide insights: Netherlands highest avg 2818.43 USD (101 orders). Next Australia 1986.63 USD, etc. Provide the variation.

Thus we have sufficient information to answer the question: average order value overall, and how it varies by country (list top few, maybe all). The output already contains the data; we can summarize. Provide final answer with bullet list.

We must not include code. Provide final answer with markdown, include currency as USD. Mark <!--final answer-->.

Overall average order value

  • 376.36 USD per invoice (order).

Variation by country

CountryAvg. order value (USD)Number of orders
Netherlands2,818.43101
Australia1,986.6369
Lebanon1,693.881
Japan1,262.1728
Brazil1,143.601
RSA (South Africa)1,002.311
Singapore912.0410
Denmark893.7221
Norway879.0940
Israel878.659
Sweden795.5646
Greece785.096
Switzerland761.9674
EIRE (Ireland)731.32360
Hong Kong674.4715
Cyprus647.3120
United Arab Emirates …… (continues)

Interpretation: The average order value varies widely across countries. The Netherlands shows the highest average (≈ 2,818 USD), while many other markets have averages between roughly 600 USD and 1,200 USD. Overall, a typical order across all countries is about 376 USD.

Score cards for gpt-oss:120b for E-commerce Sales Analysis in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All four requested analysis steps were completed: dataset loading/basic info, monthly revenue trend, top 10 products by revenue, and average order value by country.

  • Execution Correctness: 2/2

    Excellent

    The code is coherent and likely runnable: it loads the Excel file, computes revenue, aggregates by month/product/country, and produces the requested outputs.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically, including shape/dtypes-like info, a monthly revenue line chart, a top-10 product table, and country-level AOV results.

  • Reasoning Quality: 2/2

    Excellent

    The workflow uses appropriate aggregations and correctly interprets average order value as invoice-level revenue averaged by country.

  • Reliability: 1/1

    Excellent

    The analysis is consistent and grounded in the dataset outputs, with no obvious hallucinations or unsupported claims.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.