E-commerce Sales Analysis in Python

Explore an e-commerce sales dataset with monthly trends, top products, category breakdowns, and average order value analysis.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Online Retail Excel dataset from a URL and summarizes its structure with shape, dtypes, and missing values. It generates exploratory visuals for monthly revenue trends and identifies the top 10 products by total revenue. It computes average order value and compares AOV across countries with a focused country-level breakdown.

Who this example is for

This is for analysts and data scientists who want a conversational, code-generating EDA template for e-commerce transaction data in Python. It helps users quickly validate data quality and produce standard sales KPIs and plots for reporting or further modeling.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset shape, dtypes, and null counts
  • Monthly revenue trend line chart
  • Top 10 products by total revenue bar chart
  • Average order value overall and by country
  • Bar chart of top countries by AOV
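
The outcomes above boil down to a handful of pandas aggregations. Here is a minimal, self-contained sketch of those KPIs on a tiny made-up transactions frame (the column names mirror the Online Retail dataset; the values are illustrative, not from the real data):

```python
import pandas as pd

# Toy transactions standing in for the Online Retail columns (hypothetical values)
df = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003"],
    "Description": ["MUG", "PLATE", "MUG", "BOWL"],
    "Quantity": [2, 1, 3, 5],
    "UnitPrice": [2.5, 4.0, 2.5, 1.2],
    "InvoiceDate": pd.to_datetime(["2011-01-05", "2011-01-05", "2011-02-10", "2011-02-11"]),
    "Country": ["United Kingdom", "United Kingdom", "France", "France"],
})

# Line-item revenue, then the three core KPIs
df["Revenue"] = df["Quantity"] * df["UnitPrice"]
monthly = df.groupby(df["InvoiceDate"].dt.to_period("M"))["Revenue"].sum()
top_products = df.groupby("Description")["Revenue"].sum().nlargest(10)
aov = df.groupby("InvoiceNo")["Revenue"].sum().mean()  # average order value

print(monthly)
print(top_products)
print(round(aov, 2))
```

On the real dataset the same three expressions produce the monthly trend series, the top-10 products table, and the overall AOV that the workflow is expected to report.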

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • openpyxl

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
  2. plot monthly revenue trend
  3. show top 10 products by total revenue
  4. what is the average order value and how does it vary by country?

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to review the full conversation and notebook results.

Model             Score  Task  Execution  Output  Reasoning  Reliability  Run
glm-5.1           10/10  2/2   2/2        3/3     2/2        1/1          Open glm-5.1 conversation
gpt-5.4           10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-5.4 conversation
gpt-oss:120b      10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-oss:120b conversation
qwen3.5:397b     10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3.5:397b conversation
gemma4:31b        9/10   2/2   2/2        3/3     1/2        1/1          Open gemma4:31b conversation
qwen3-coder-next  9/10   2/2   2/2        3/3     1/2        1/1          Open qwen3-coder-next conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
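
The summation is trivial but worth making explicit; a minimal sketch with a hypothetical perfect score card (the dimension names and ranges match the list above):

```python
# Hypothetical score card mirroring the five grading dimensions
scores = {
    "task_completion": 2,  # 0-2
    "execution_correctness": 2,  # 0-2
    "output_quality": 3,  # 0-3
    "reasoning_quality": 2,  # 0-2
    "reliability": 1,  # 0-1
}

# Final score is the plain sum of the dimensions, on a 0-10 scale
total = sum(scores.values())
print(f"{total}/10")
```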

AI Analysis with glm-5.1

This run uses AI with a step-by-step list of prompts to perform the analysis task E-commerce Sales Analysis in Python. Below, you can review the full conversation generated with glm-5.1. After the conversation, we show detailed score cards for this run.

load the retail sales dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx and show basic info and shape
Thinking

The user wants to load a retail sales dataset from an Excel file URL and show basic info and shape. Let me provide code to load it and inspect.

Let's load the dataset and take a first look at it.

Load online retail data
import pandas as pd

df = pd.read_excel("https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/online-retail/Online-Retail.xlsx")
df.shape
(541909, 8)

Run this and we'll inspect the shape and basic info next.

Thinking

The dataset is loaded with 541,909 rows and 8 columns. The shape is visible but the user also asked for "basic info" which typically includes column types, non-null counts, and a sample of the data. We have the shape but not the info. Let me get the basic info and a quick look at the data.

The dataset loaded successfully — 541,909 rows and 8 columns. We have the shape, but we still need basic info (data types, non-null counts) and a preview. Let me grab those now.

Show dataframe info
df.info()
<class 'pandas.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    541909 non-null  object        
 1   StockCode    541909 non-null  object        
 2   Description  540455 non-null  object        
 3   Quantity     541909 non-null  int64         
 4   InvoiceDate  541909 non-null  datetime64[us]
 5   UnitPrice    541909 non-null  float64       
 6   CustomerID   406829 non-null  float64       
 7   Country      541909 non-null  str           
dtypes: datetime64[us](1), float64(2), int64(1), object(3), str(1)
memory usage: 40.0+ MB
Thinking

The user's original goal was to load the dataset and show basic info and shape. We now have both:

  1. Shape: (541909, 8)
  2. df.info() output showing column names, non-null counts, and dtypes

Let me summarize the key insights from these outputs as a final answer.

Here's a summary of the dataset:

  • Shape: 541,909 rows × 8 columns
  • Columns: InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country
  • Data types: 1 datetime, 2 float64, 1 int64, 3 object, 1 string
  • Memory usage: ~40 MB

Key observations from the info:

  • Missing values: Description is missing 1,454 entries (540,455 / 541,909), and CustomerID is missing a significant 135,080 entries (406,829 / 541,909) — roughly 25% of transactions lack a customer ID. This is important to address before any customer-level analysis.
  • InvoiceDate was correctly parsed as datetime during load.
  • CustomerID is stored as float64, which is typical when pandas encounters NaNs in an integer column — you may want to convert it to a string/integer type after handling missing values.
  • UnitPrice is float64 and Quantity is int64, both ready for numerical computations (e.g., creating a Revenue column as Quantity × UnitPrice).
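
The float64-to-integer conversion mentioned for CustomerID can be done with pandas' nullable integer dtype, which preserves missing values as <NA> instead of forcing the column back to float. A minimal sketch with made-up IDs (not values from the dataset):

```python
import pandas as pd

# Hypothetical CustomerID column as pandas loads it: float64 because of NaNs
s = pd.Series([17850.0, None, 13047.0])

# Nullable Int64 keeps integer IDs while still representing missing values
ids = s.astype("Int64")
print(ids.dtype)
print(ids.isna().sum())
```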
plot monthly revenue trend
Thinking

The user wants to plot monthly revenue trend. I need to:

  1. Create a Revenue column (Quantity * UnitPrice)
  2. Extract month from InvoiceDate
  3. Group by month and sum revenue
  4. Plot the trend

Let me write this step by step.

To plot monthly revenue, I'll first create a Revenue column and then aggregate by month.

Plot monthly revenue
import seaborn as sns
import matplotlib.pyplot as plt

# Create revenue column
df['Revenue'] = df['Quantity'] * df['UnitPrice']

# Extract month period
df['Month'] = df['InvoiceDate'].dt.to_period('M')

# Monthly revenue
monthly_rev = df.groupby('Month')['Revenue'].sum().reset_index()
monthly_rev['Month'] = monthly_rev['Month'].astype(str)

# Plot
plt.figure(figsize=(12, 5))
sns.lineplot(data=monthly_rev, x='Month', y='Revenue', marker='o')
plt.xticks(rotation=45, ha='right')
plt.title('Monthly Revenue Trend')
plt.ylabel('Revenue (USD)')
plt.xlabel('Month')
plt.tight_layout()
plt.show()
Thinking

The user goal was to plot the monthly revenue trend. The chart has been produced (Figure size 1200x500 with 1 Axes). However, I don't have the actual visual of the chart or the numeric values. Let me check the notebook state - monthly_rev DataFrame has shape 13 x 2, so there are 13 months of data.

The chart was successfully generated. But I don't have the actual numeric values to provide concrete insights about the trend. Let me check - the output just says "<Figure size 1200x500 with 1 Axes>" which means the chart was created but I can't see the actual values.

The goal was to "plot monthly revenue trend" - the plot has been created. But to provide meaningful insights, I should show the underlying data values. Let me choose Path B to display the monthly revenue values so I can extract concrete insights.

Actually, the sufficiency heuristic says: "If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A unless there is an explicit error."

The output contains a valid chart but no explicit numeric/text values. So per the heuristic, I should choose Path B to print the values.

But wait - the goal was specifically to "plot" the trend, which has been accomplished. The chart is the output. However, providing insights from observed outputs requires knowing the actual numbers. Since I only see "<Figure size 1200x500 with 1 Axes>" and not the actual chart content or numeric values, I should display the monthly_rev values to provide meaningful insights.

Let me go with Path B - display the monthly revenue values to provide concrete insights.

The chart was generated successfully, but to give concrete insights I need to see the actual monthly revenue values behind it. Let me display those numbers.

Sort monthly revenue
monthly_rev.sort_values('Month')
Month Revenue
0 2010-12 748957.020
1 2011-01 560000.260
2 2011-02 498062.650
3 2011-03 683267.080
4 2011-04 493207.121
5 2011-05 723333.510
6 2011-06 691123.120
7 2011-07 681300.111
8 2011-08 682680.510
9 2011-09 1019687.622
10 2011-10 1070704.670
11 2011-11 1461756.250
12 2011-12 433668.010
Thinking

The user asked to plot monthly revenue trend. The chart has been generated and the numeric values are now available. Let me analyze the data:

  • Dec 2010: 748,957 USD
  • Jan 2011: 560,000 USD
  • Feb 2011: 498,063 USD
  • Mar 2011: 683,267 USD
  • Apr 2011: 493,207 USD
  • May 2011: 723,334 USD
  • Jun 2011: 691,123 USD
  • Jul 2011: 681,300 USD
  • Aug 2011: 682,681 USD
  • Sep 2011: 1,019,688 USD
  • Oct 2011: 1,070,705 USD
  • Nov 2011: 1,461,756 USD (peak)
  • Dec 2011: 433,668 USD (note: likely incomplete month)

Key insights:

  • Revenue was relatively flat from Jan-Aug 2011, fluctuating between ~493K-723K USD
  • Strong growth starting Sep 2011, peaking in Nov 2011 at ~1.46M USD
  • Dec 2011 appears artificially low (likely incomplete data - only partial December)
  • The overall trend from Sep-Nov shows a significant holiday season uplift

This is sufficient to answer the original goal. Path A.

The monthly revenue trend reveals several clear patterns:

  • Flat mid-range period (Jan–Aug 2011): Revenue oscillated between roughly 493,000 USD and 723,000 USD, with no strong directional trend. April 2011 was the weakest month in this range at ~493,207 USD.

  • Strong holiday uplift (Sep–Nov 2011): Revenue surged from ~1,019,688 USD in September to a peak of ~1,461,756 USD in November, more than double the typical summer months (~682,000 USD). This clearly reflects the holiday shopping season driving sales.

  • December 2010 vs December 2011: Dec 2010 logged ~748,957 USD while Dec 2011 shows only ~433,668 USD. The Dec 2011 figure is almost certainly incomplete (the dataset likely cuts off mid-month), so it shouldn't be interpreted as a real decline.

  • Overall trend: Excluding the partial Dec 2011, the data shows a clear upward trajectory in the second half of the year, with Q4 2011 revenue dramatically outperforming the rest of the year.
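
One common way to handle the partial-final-month caveat above is to drop the trailing period before analyzing the trend. A minimal sketch on a toy monthly series (hypothetical values, with an incomplete last month):

```python
import pandas as pd

# Toy monthly revenue where the final month is only partially recorded
monthly_rev = pd.Series(
    [748957.02, 1461756.25, 433668.01],
    index=pd.period_range("2011-10", periods=3, freq="M"),
)

# Exclude the trailing partial month so it doesn't read as a real decline
complete = monthly_rev.iloc[:-1]
print(complete.index.max())
```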

show top 10 products by total revenue
Thinking

The user wants to see the top 10 products by total revenue. I'll group by product (Description) and sum Revenue, then sort and display the top 10.

Calculate top 10 products
top_products = df.groupby('Description')['Revenue'].sum().reset_index().sort_values('Revenue', ascending=False).head(10)
top_products
Description Revenue
1098 DOTCOM POSTAGE 206245.48
2915 REGENCY CAKESTAND 3 TIER 164762.19
3918 WHITE HANGING HEART T-LIGHT HOLDER 99668.47
2471 PARTY BUNTING 98302.98
1866 JUMBO BAG RED RETROSPOT 92356.03
2803 RABBIT NIGHT LIGHT 66756.59
2753 POSTAGE 66230.64
2439 PAPER CHAIN KIT 50'S CHRISTMAS 63791.94
244 ASSORTED COLOUR BIRD ORNAMENT 58959.73
773 CHILLI LIGHTS 53768.06
Thinking

The user goal is to show top 10 products by total revenue. The latest output already contains a table showing exactly this — the top 10 products ranked by total revenue with their dollar amounts. This fully answers the goal.

Here are the top 10 products by total revenue:

Rank  Product                             Revenue
1     DOTCOM POSTAGE                      206,245.48 USD
2     REGENCY CAKESTAND 3 TIER            164,762.19 USD
3     WHITE HANGING HEART T-LIGHT HOLDER  99,668.47 USD
4     PARTY BUNTING                       98,302.98 USD
5     JUMBO BAG RED RETROSPOT             92,356.03 USD
6     RABBIT NIGHT LIGHT                  66,756.59 USD
7     POSTAGE                             66,230.64 USD
8     PAPER CHAIN KIT 50'S CHRISTMAS      63,791.94 USD
9     ASSORTED COLOUR BIRD ORNAMENT       58,959.73 USD
10    CHILLI LIGHTS                       53,768.06 USD

A couple of observations worth noting:

  • DOTCOM POSTAGE dominates at ~206,245 USD, which is nearly 25% more than the second-place product. This likely represents shipping/handling charges rather than a physical product, so it may be worth excluding depending on your analysis focus.
  • POSTAGE (rank 7) similarly appears to be a shipping-related line item rather than a sellable product.
  • Among actual merchandise, REGENCY CAKESTAND 3 TIER leads at ~164,762 USD, followed by WHITE HANGING HEART T-LIGHT HOLDER and PARTY BUNTING — all gift/decor items consistent with the holiday revenue spike seen earlier.
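
If you do decide to exclude the shipping line items noted above, a simple filter before ranking is enough. A minimal sketch on toy rows (the revenue figures are copied from the table purely for illustration):

```python
import pandas as pd

# Toy data mixing merchandise with shipping/handling line items
df = pd.DataFrame({
    "Description": ["DOTCOM POSTAGE", "REGENCY CAKESTAND 3 TIER",
                    "POSTAGE", "PARTY BUNTING"],
    "Revenue": [206245.48, 164762.19, 66230.64, 98302.98],
})

# Exclude non-product line items before computing the product ranking
non_product = {"POSTAGE", "DOTCOM POSTAGE"}
merch = df[~df["Description"].isin(non_product)]
top = merch.groupby("Description")["Revenue"].sum().nlargest(10)
print(top.index[0])
```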
what is the average order value and how does it vary by country?
Thinking

The user wants to know the average order value and how it varies by country. I need to calculate this from the dataset.

An "order" is typically defined by InvoiceNo. So average order value = total revenue / number of unique invoices.

To break it down by country, I need to group by InvoiceNo and Country first to get order-level totals, then average by country.

Let me compute this step by step:

  1. Calculate order-level revenue (group by InvoiceNo, Country)
  2. Then compute average order value overall and by country

I'll start with computing the overall average order value and then the breakdown by country.

I'll calculate average order value at the invoice level, then break it down by country. An "order" here is defined by InvoiceNo.

Calculate average order value
# Calculate order-level totals (one row per invoice)
order_totals = df.groupby(['InvoiceNo', 'Country'])['Revenue'].sum().reset_index()
order_totals.columns = ['InvoiceNo', 'Country', 'OrderRevenue']

# Overall average order value
overall_aov = order_totals['OrderRevenue'].mean()

# Average order value by country
aov_by_country = order_totals.groupby('Country')['OrderRevenue'].mean().reset_index().sort_values('OrderRevenue', ascending=False)
aov_by_country.columns = ['Country', 'AvgOrderValue']

overall_aov, aov_by_country.head(20)
(np.float64(376.3609240926641),
                  Country  AvgOrderValue
 24           Netherlands    2818.431089
 0              Australia    1986.627101
 21               Lebanon    1693.880000
 20                 Japan    1262.165000
 4                 Brazil    1143.600000
 28                   RSA    1002.310000
 30             Singapore     912.039000
 9                Denmark     893.720952
 25                Norway     879.086500
 18                Israel     878.646667
 32                Sweden     795.563261
 15                Greece     785.086667
 33           Switzerland     761.964189
 10                  EIRE     731.324500
 16             Hong Kong     674.469333
 7                 Cyprus     647.314500
 35  United Arab Emirates     634.093333
 17               Iceland     615.714286
 5                 Canada     611.063333
 6        Channel Islands     608.675455)
Thinking

The user asked about average order value and how it varies by country. The latest output already provides both the overall AOV and a country-level breakdown. Let me analyze and provide a final answer.

Overall AOV: 376.36 USD

Top countries by AOV:

  • Netherlands: 2,818.43 USD
  • Australia: 1,986.63 USD
  • Lebanon: 1,693.88 USD
  • Japan: 1,262.17 USD
  • Brazil: 1,143.60 USD

The output already has the full answer. I'll go with Path A.

The overall average order value is 376.36 USD, but there's massive variation by country:

Highest AOV countries:

  • Netherlands leads dramatically at 2,818.43 USD — nearly 7.5× the global average
  • Australia follows at 1,986.63 USD
  • Lebanon at 1,693.88 USD
  • Japan at 1,262.17 USD
  • Brazil at 1,143.60 USD

Key observations:

  • The top 5 countries by AOV all exceed 1,000 USD per order, roughly 3× the overall average — these are likely smaller markets with fewer but larger wholesale/bulk orders.
  • Netherlands stands out as an extreme outlier; its AOV is ~42% higher than the next country (Australia), suggesting a handful of very large orders are pulling the average up.
  • Many of the high-AOV countries have relatively few orders, so the averages are volatile. The UK (the dominant market by volume) likely sits closer to the overall average, pulling it down toward 376 USD.

The spread from top (~2,818 USD) to bottom is very wide, indicating that a one-size-fits-all pricing or marketing strategy wouldn't work across geographies — high-AOV countries may warrant dedicated account management, while lower-AOV markets could benefit from upselling campaigns.
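
Since the high-AOV averages may rest on very few orders, it helps to report order counts alongside the mean. A minimal sketch on toy order-level totals (hypothetical numbers chosen to mimic one large Netherlands order versus many small UK orders):

```python
import pandas as pd

# Toy order-level totals: one big wholesale order vs several small ones
order_totals = pd.DataFrame({
    "Country": ["Netherlands", "United Kingdom", "United Kingdom", "United Kingdom"],
    "OrderRevenue": [2800.0, 350.0, 400.0, 300.0],
})

# Mean AOV together with the number of orders behind it
aov_stats = (
    order_totals.groupby("Country")["OrderRevenue"]
    .agg(AvgOrderValue="mean", Orders="count")
    .sort_values("AvgOrderValue", ascending=False)
)
print(aov_stats)
```

A country averaging thousands of dollars per order on a handful of invoices is a very different signal from one doing so across thousands of invoices, which is why the count column matters here.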

Score cards for glm-5.1 for E-commerce Sales Analysis in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested analyses were completed: dataset loading/basic info, monthly revenue trend, top 10 products, and AOV by country.

  • Execution Correctness: 2/2

    Excellent

    The code shown is coherent and likely runnable, with valid pandas grouping and plotting steps and no obvious syntax errors.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically, including shape/dtypes/null counts, a monthly revenue chart, top products table, and country-level AOV results.

  • Reasoning Quality: 2/2

    Excellent

    The analysis correctly interprets the outputs and provides sensible insights, such as the holiday-season revenue spike and the meaning of AOV by country.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and grounded in the displayed outputs, with no major hallucinations or unsafe steps.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.