AI Data Analysis Benchmarks for Exploratory Data Analysis

We defined practical analysis workflows from multiple domains, then ran them with AI Data Analyst using different LLM engines. On this page you can browse each workflow, open the full notebook conversations, and compare model quality in shared score tables. In this category, four of the six models tested averaged a perfect 10/10 across five workflows, and the lowest average was 9.6/10, so modern LLMs handle these structured data analysis tasks very well.

Exploratory Data Analysis Workflow Examples

Browse reproducible AI data analysis workflows in the Exploratory Data Analysis category. Open any example to review prompts, conversation steps, generated code, outputs, and model-level quality scores.

Boston Housing Prices EDA in Python

Explore the Boston Housing dataset with price distributions, feature correlations, and outlier detection using an AI data analyst.

E-commerce Sales Analysis in Python

Explore an e-commerce sales dataset with monthly trends, top products, category breakdowns, and average order value analysis.

HR Employee Attrition Analysis in Python

Explore the IBM HR Analytics dataset to uncover attrition patterns by department, age, salary, and job satisfaction.

Iris Species Classification with Decision Tree

Train a decision tree classifier on the Iris dataset, evaluate accuracy, and visualize the decision boundaries using an AI data analyst.

Titanic Survival Analysis in Python

Explore the Titanic dataset with survival rates by class, sex, and age, handle missing values, and visualize patterns using an AI data analyst.

Model Comparison for Exploratory Data Analysis

Compare LLM performance across workflows in this category. Open any score chip to jump directly to that model run and inspect the full conversation and notebook output.

Model	Average score (0-10)	Scored workflows (n)
gemma4:31b	10.00	5
glm-5.1	10.00	5
gpt-oss:120b	10.00	5
qwen3-coder-next	10.00	5
qwen3.5:397b	9.80	5
gpt-5.4	9.60	5

Detailed Workflow Comparison Table for Exploratory Data Analysis

This table compares model scores for each workflow in Exploratory Data Analysis. Open any score chip to jump directly to the selected model conversation and review full prompts, code, outputs, and score cards.

Workflow (slug)	gemma4:31b	glm-5.1	gpt-5.4	gpt-oss:120b	qwen3-coder-next	qwen3.5:397b
Boston Housing Prices EDA in Python (housing-prices-eda)	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10
E-commerce Sales Analysis in Python (ecommerce-sales-eda)	10.0/10	10.0/10	8.0/10	10.0/10	10.0/10	9.0/10
HR Employee Attrition Analysis in Python (hr-attrition-analysis)	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10
Iris Species Classification with Decision Tree (iris-classification)	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10
Titanic Survival Analysis in Python (titanic-survival-analysis)	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10	10.0/10
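The page does not state how the per-model averages are computed, but the published numbers are consistent with an unweighted arithmetic mean of the five per-workflow scores (for example, gpt-5.4: (10 + 8 + 10 + 10 + 10) / 5 = 9.6). The sketch below reproduces the average scores and ranking under that assumption; the `scores` dictionary is simply the table above transcribed by hand, and the variable names are illustrative.

```python
from statistics import mean

# Per-workflow scores (0-10) transcribed from the comparison table.
# Assumed workflow order: housing-prices-eda, ecommerce-sales-eda,
# hr-attrition-analysis, iris-classification, titanic-survival-analysis
scores = {
    "gemma4:31b": [10.0, 10.0, 10.0, 10.0, 10.0],
    "glm-5.1": [10.0, 10.0, 10.0, 10.0, 10.0],
    "gpt-5.4": [10.0, 8.0, 10.0, 10.0, 10.0],
    "gpt-oss:120b": [10.0, 10.0, 10.0, 10.0, 10.0],
    "qwen3-coder-next": [10.0, 10.0, 10.0, 10.0, 10.0],
    "qwen3.5:397b": [10.0, 9.0, 10.0, 10.0, 10.0],
}

# Assumption: the average is an unweighted mean over scored workflows.
averages = {model: round(mean(s), 2) for model, s in scores.items()}

# Rank by average, highest first. sorted() is stable, so tied models
# keep their insertion order from the dictionary above.
ranking = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

for model, avg in ranking:
    print(f"{model:<18} {avg:5.2f}  n={len(scores[model])}")
```

Under this assumption, only the E-commerce Sales workflow separates the models: every other workflow scored 10.0 for all six.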

What This Benchmark Shows

We ran the same step-by-step data analysis workflows with multiple LLMs and compared the results using a shared scoring rubric. In Exploratory Data Analysis, every model averaged at least 9.6/10, and the only scores below 10 came on the E-commerce Sales workflow (gpt-5.4 at 8.0 and qwen3.5:397b at 9.0). The notebooks show high task completion and useful analytical reasoning. Use these examples as a reference for prompt design, model selection, and workflow quality before running similar analyses on your own data in MLJAR Studio.

Start using AI for Exploratory Data Analysis

MLJAR Studio helps you analyze data with AI, run machine learning workflows, and build reproducible notebook-based results on your own computer.

Runs locally • Supports local LLMs