E-commerce Sales Analysis in Python

Explore an e-commerce sales dataset with monthly trends, top products, category breakdowns, and average order value analysis.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Online Retail Excel dataset from a URL and summarizes its structure with shape, dtypes, and missing values. It generates exploratory visuals for monthly revenue trends and identifies the top 10 products by total revenue. It computes average order value and compares AOV across countries with a focused country-level breakdown.

Who this example is for

This is for analysts and data scientists who want a conversational, code-generating EDA template for e-commerce transaction data in Python. It helps users quickly validate data quality and produce standard sales KPIs and plots for reporting or further modeling.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset shape, dtypes, and null counts
  • Monthly revenue trend line chart
  • Top 10 products by total revenue bar chart
  • Average order value overall and by country
  • Bar chart of top countries by AOV

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • openpyxl

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. 1load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
  2. 2plot monthly revenue trend
  3. 3show top 10 products by total revenue
  4. 4what is the average order value and how does it vary by country?

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model SourceScoreTaskExecutionOutputReasoningReliabilityRun
glm-5.110/102/22/23/32/21/1Open glm-5.1 conversation
gpt-5.410/102/22/23/32/21/1Open gpt-5.4 conversation
gpt-oss:120b10/102/22/23/32/21/1Open gpt-oss:120b conversation
qwen3.5:397b10/102/22/23/32/21/1Open qwen3.5:397b conversation
gemma4:31b9/102/22/23/31/21/1Open gemma4:31b conversation
qwen3-coder-next9/102/22/23/31/21/1Open qwen3-coder-next conversation
How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.

AI Analysis with qwen3.5:397b

This run uses AI with a step-by-step list of prompts to perform the analysis task E-commerce Sales Analysis in Python. Below, you can review the full conversation generated with qwen3.5:397b. After the conversation, we show detailed score cards for this run.

load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
Thinking

The user wants me to load a retail sales dataset from an Excel file URL and show basic info and shape. I'll need to:

  1. Load the Excel file using pandas
  2. Display the shape and basic info

Let me write a single code block to do this.

I'll load the retail sales dataset from the provided URL and display its basic information.

Load online retail data
import pandas as pd

# Load the Excel file from URL
url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx"
df = pd.read_excel(url)

# Display shape and basic info
df.shape
df.info()
df.head()
<class 'pandas.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    541909 non-null  object        
 1   StockCode    541909 non-null  object        
 2   Description  540455 non-null  object        
 3   Quantity     541909 non-null  int64         
 4   InvoiceDate  541909 non-null  datetime64[us]
 5   UnitPrice    541909 non-null  float64       
 6   CustomerID   406829 non-null  float64       
 7   Country      541909 non-null  str           
dtypes: datetime64[us](1), float64(2), int64(1), object(3), str(1)
memory usage: 40.0+ MB
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
Thinking

The user's original goal was to load the retail sales dataset and show basic info and shape. Looking at the latest notebook output, I can see:

  1. The dataset was successfully loaded from the URL
  2. Shape: 541,909 rows × 8 columns
  3. Column info shows:
    • InvoiceNo (object)
    • StockCode (object)
    • Description (object, 540,455 non-null - some missing)
    • Quantity (int64)
    • InvoiceDate (datetime64)
    • UnitPrice (float64)
    • CustomerID (float64, 406,829 non-null - some missing)
    • Country (str)
  4. First 5 rows are displayed showing sample data

This output fully satisfies the original user goal of loading the dataset and showing basic info and shape. I should provide a final answer summarizing what was found.

The dataset has been successfully loaded. Here's what we found:

Dataset Shape: 541,909 rows × 8 columns

Columns:

ColumnTypeNon-Null Count
InvoiceNoobject541,909 (100%)
StockCodeobject541,909 (100%)
Descriptionobject540,455 (99.7%)
Quantityint64541,909 (100%)
InvoiceDatedatetime64541,909 (100%)
UnitPricefloat64541,909 (100%)
CustomerIDfloat64406,829 (75.1%)
Countrystr541,909 (100%)

Key Observations:

  • Data spans from December 2010 (based on InvoiceDate in head)
  • Two columns have missing values: Description (1,454 missing) and CustomerID (135,080 missing)
  • CustomerID is stored as float64, likely due to NaN values
  • The dataset contains transaction-level retail data with product details, quantities, prices, and customer information
plot monthly revenue trend
Thinking

The user wants to plot monthly revenue trend from the retail sales dataset. I need to:

  1. Calculate revenue (Quantity × UnitPrice) for each transaction
  2. Group by month (from InvoiceDate)
  3. Sum the revenue per month
  4. Plot the trend using seaborn/matplotlib

Let me create a single code block that does this.

I'll create a monthly revenue trend plot. First, I'll calculate revenue per transaction (Quantity × UnitPrice), then aggregate by month and visualize the trend.

Plot monthly revenue trend
import matplotlib.pyplot as plt
import seaborn as sns

# Calculate revenue for each transaction
df['Revenue'] = df['Quantity'] * df['UnitPrice']

# Extract year-month for grouping
df['YearMonth'] = df['InvoiceDate'].dt.to_period('M')

# Aggregate revenue by month
monthly_revenue = df.groupby('YearMonth')['Revenue'].sum().reset_index()
monthly_revenue['YearMonth'] = monthly_revenue['YearMonth'].astype(str)

# Plot monthly revenue trend
plt.figure(figsize=(12, 5))
sns.lineplot(data=monthly_revenue, x='YearMonth', y='Revenue', marker='o', linewidth=2)
plt.title('Monthly Revenue Trend', fontsize=14)
plt.xlabel('Month', fontsize=12)
plt.ylabel('Revenue (USD)', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

monthly_revenue
YearMonth Revenue
0 2010-12 748957.020
1 2011-01 560000.260
2 2011-02 498062.650
3 2011-03 683267.080
4 2011-04 493207.121
5 2011-05 723333.510
6 2011-06 691123.120
7 2011-07 681300.111
8 2011-08 682680.510
9 2011-09 1019687.622
10 2011-10 1070704.670
11 2011-11 1461756.250
12 2011-12 433668.010
Thinking

The user's goal was to "plot monthly revenue trend". Looking at the notebook state:

  1. Cell-1: Successfully loaded the dataset (541,909 rows × 8 columns)
  2. Cell-2: Created the monthly revenue trend plot and displayed the underlying data

The output shows:

  • A figure was generated (Figure size 1200x500 with 1 Axes)
  • The monthly_revenue table is displayed with 13 months of data (2010-12 through 2011-12)
  • Revenue values range from about 433,668 USD to 1,461,756 USD

The original goal was to "plot monthly revenue trend" - this has been accomplished. We have:

  1. A visualization (the line plot)
  2. The underlying numeric data showing monthly revenue values

This satisfies the sufficiency heuristic: "If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A unless there is an explicit error."

I should provide a natural final answer summarizing the insights from the monthly revenue trend, include <!--final answer-->, and not include any Python code block.

Key insights from the data:

  • Data spans December 2010 to December 2011 (13 months)
  • Highest revenue: November 2011 with 1,461,756.25 USD
  • Lowest revenue: December 2011 with 433,668.01 USD (partial month likely)
  • Strong Q4 2011 performance (Sep-Nov showing growth)
  • December 2010 started at 748,957.02 USD
  • There's variability month-to-month but overall the business shows growth through late 2011

I'll write a concise summary of these findings.

The monthly revenue trend has been successfully plotted and analyzed. Here are the key insights:

Revenue Overview (Dec 2010 - Dec 2011):

PeriodRevenue (USD)
HighestNovember 2011: 1,461,756.25
LowestDecember 2011: 433,668.01
Average~798,000

Key Patterns:

  1. Strong Q4 2011: Revenue peaked in November 2011 at 1,461,756.25 USD, nearly double the average month. September and October 2011 also showed strong performance (over 1,000,000 USD each).

  2. Seasonal Growth: The business showed clear growth through 2011, with the second half of the year (especially Q4) outperforming the first half.

  3. December 2011 Anomaly: The sharp drop to 433,668.01 USD in December 2011 likely indicates incomplete data for that month (data may not cover the full month).

  4. Volatility: Month-to-month revenue varies considerably, ranging from ~430,000 to ~1,460,000 USD, which is typical for retail businesses with seasonal patterns.

The visualization shows an overall upward trend through 2011 with notable peaks in the holiday shopping season (Q4).

show top 10 products by total revenue
Thinking

The user wants to see the top 10 products by total revenue. I need to group by product (Description or StockCode) and sum the Revenue column, then sort and display the top 10.

I'll use the Description column to identify products, as it's more readable. I should also handle any potential issues with missing descriptions.

I'll calculate total revenue by product and show the top 10.

Notebook code
# Group by product description and sum revenue
product_revenue = df.groupby('Description')['Revenue'].sum().reset_index()
product_revenue = product_revenue.sort_values('Revenue', ascending=False)

# Display top 10 products
top_10_products = product_revenue.head(10)
top_10_products
Description Revenue
1098 DOTCOM POSTAGE 206245.48
2915 REGENCY CAKESTAND 3 TIER 164762.19
3918 WHITE HANGING HEART T-LIGHT HOLDER 99668.47
2471 PARTY BUNTING 98302.98
1866 JUMBO BAG RED RETROSPOT 92356.03
2803 RABBIT NIGHT LIGHT 66756.59
2753 POSTAGE 66230.64
2439 PAPER CHAIN KIT 50'S CHRISTMAS 63791.94
244 ASSORTED COLOUR BIRD ORNAMENT 58959.73
773 CHILLI LIGHTS 53768.06
Thinking

The user wants to see the top 10 products by total revenue. Looking at the notebook state, cell-3 has already executed and produced output-3 which shows exactly the top 10 products by total revenue with their revenue values.

The output shows:

  1. DOTCOM POSTAGE: 206245.48 USD
  2. REGENCY CAKESTAND 3 TIER: 164762.19 USD
  3. WHITE HANGING HEART T-LIGHT HOLDER: 99668.47 USD
  4. PARTY BUNTING: 98302.98 USD
  5. JUMBO BAG RED RETROSPOT: 92356.03 USD
  6. RABBIT NIGHT LIGHT: 66756.59 USD
  7. POSTAGE: 66230.64 USD
  8. PAPER CHAIN KIT 50'S CHRISTMAS: 63791.94 USD
  9. ASSORTED COLOUR BIRD ORNAMENT: 58959.73 USD
  10. CHILLI LIGHTS: 53768.06 USD

This directly answers the original user goal. The data is complete with product descriptions and revenue values. I should provide Path A - a final answer with insights from the observed outputs.

Note: I need to format currency values without $ symbol, using USD notation instead (e.g., 206245.48 USD).

The top 10 products by total revenue have been calculated. Here are the results:

RankProductRevenue (USD)
1DOTCOM POSTAGE206,245.48
2REGENCY CAKESTAND 3 TIER164,762.19
3WHITE HANGING HEART T-LIGHT HOLDER99,668.47
4PARTY BUNTING98,302.98
5JUMBO BAG RED RETROSPOT92,356.03
6RABBIT NIGHT LIGHT66,756.59
7POSTAGE66,230.64
8PAPER CHAIN KIT 50'S CHRISTMAS63,791.94
9ASSORTED COLOUR BIRD ORNAMENT58,959.73
10CHILLI LIGHTS53,768.06

Key Insights:

  1. Postage dominates: "DOTCOM POSTAGE" leads by a significant margin at 206,245.48 USD, followed by "POSTAGE" at 66,230.64 USD. Combined, postage-related charges account for over 272,000 USD.

  2. Home decor items perform well: Products like the REGENCY CAKESTAND, WHITE HANGING HEART T-LIGHT HOLDER, and RABBIT NIGHT LIGHT are among the top revenue generators, indicating strong demand for decorative home items.

  3. Seasonal products: PARTY BUNTING and PAPER CHAIN KIT 50'S CHRISTMAS suggest party and holiday-themed items contribute significantly to revenue.

  4. Revenue gap: There's a notable drop after the top 2 products, with the #3 product earning about 40% less than #2.

what is the average order value and how does it vary by country?
Thinking

The user wants to know:

  1. Average order value (AOV) overall
  2. How AOV varies by country

I need to:

  1. Calculate order value per invoice (group by InvoiceNo)
  2. Calculate the average across all orders
  3. Group by country and calculate AOV per country

Let me write code to do this analysis.

I'll calculate the average order value overall and break it down by country.

Calculate average order value
# Calculate order value per invoice
order_value = df.groupby('InvoiceNo')['Revenue'].sum().reset_index()
order_value.columns = ['InvoiceNo', 'OrderValue']

# Merge with country information (take first country per invoice)
invoice_country = df.groupby('InvoiceNo')['Country'].first().reset_index()
order_value = order_value.merge(invoice_country, on='InvoiceNo')

# Calculate overall average order value
overall_aov = order_value['OrderValue'].mean()

# Calculate average order value by country
country_aov = order_value.groupby('Country')['OrderValue'].mean().reset_index()
country_aov = country_aov.sort_values('OrderValue', ascending=False)

# Display results
print(f'Overall Average Order Value: {overall_aov:.2f} USD')
print(f'\nNumber of Orders: {len(order_value)}')
print(f'\nAverage Order Value by Country (Top 15):')
country_aov.head(15)
Overall Average Order Value: 376.36 USD

Number of Orders: 25900

Average Order Value by Country (Top 15):
Country OrderValue
24 Netherlands 2818.431089
0 Australia 1986.627101
21 Lebanon 1693.880000
20 Japan 1262.165000
4 Brazil 1143.600000
28 RSA 1002.310000
30 Singapore 912.039000
9 Denmark 893.720952
25 Norway 879.086500
18 Israel 878.646667
32 Sweden 795.563261
15 Greece 785.086667
33 Switzerland 761.964189
10 EIRE 731.324500
16 Hong Kong 674.469333
Thinking

The user asked about average order value and how it varies by country. Looking at the latest output (cell-4), I have:

  1. Overall Average Order Value: 376.36 USD
  2. Number of Orders: 25,900
  3. Average Order Value by Country (Top 15 countries shown)

This data directly answers the user's question. The output shows:

  • The overall AOV
  • How it varies by country (with Netherlands leading at 2,818.43 USD, followed by Australia at 1,986.63 USD, etc.)

This is sufficient to provide a final answer. I should choose Path A and provide a natural final answer with the <!--final answer--> tag, without any Python code block.

Let me format this properly with insights from the observed outputs.

Based on the analysis of 25,900 orders, here are the findings on average order value:

Overall Average Order Value

376.36 USD per order across all countries.

Average Order Value by Country

The AOV varies dramatically by country. Here are the top 15 countries:

RankCountryAvg Order Value (USD)
1Netherlands2,818.43
2Australia1,986.63
3Lebanon1,693.88
4Japan1,262.17
5Brazil1,143.60
6RSA1,002.31
7Singapore912.04
8Denmark893.72
9Norway879.09
10Israel878.65
11Sweden795.56
12Greece785.09
13Switzerland761.96
14EIRE731.32
15Hong Kong674.47

Key Insights

  1. Netherlands dominates: With an AOV of 2,818.43 USD, Netherlands is 7.5x higher than the overall average. This suggests either wholesale/B2B customers or bulk purchasing behavior.

  2. Geographic pattern: Non-UK European countries (Netherlands, Denmark, Norway, Sweden, Switzerland) and distant markets (Australia, Japan, Brazil) show significantly higher AOVs, likely due to:

    • Higher shipping costs encouraging larger orders
    • Fewer but more committed customers
    • Potential distributor/wholesale accounts
  3. UK not in top 15: The United Kingdom (the home market) doesn't appear in the top 15, suggesting many smaller retail purchases that bring down the overall average to 376.36 USD.

  4. 7.5x variation: The gap between Netherlands (2,818.43 USD) and the overall average (376.36 USD) indicates highly heterogeneous customer segments across countries.

Score cards for qwen3.5:397b for E-commerce Sales Analysis in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All four requested analysis steps were attempted and the notebook includes outputs for dataset info, monthly revenue, top products, and AOV by country.

  • Execution Correctness: 2/2

    Excellent

    The provided code is coherent and likely runnable: it loads the dataset, computes revenue, aggregates by month/product/country, and produces the requested outputs.

  • Output Quality: 3/3

    Excellent

    The outputs match the expected outcomes semantically: shape/dtypes/nulls are shown, a monthly revenue line chart is produced, top 10 products are listed, and AOV by country is computed with country breakdown.

  • Reasoning Quality: 2/2

    Excellent

    The workflow uses appropriate EDA logic and correctly explains the steps for each analysis task. The reasoning is clear and aligned with the requested questions.

  • Reliability: 1/1

    Excellent

    The workflow is reasonably robust and consistent, with no obvious hallucinated functions or contradictory logic in the provided content.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.