AI Data Analysis Benchmarks

We defined practical analysis workflows from multiple domains, then ran them with AI Data Analyst using different LLM engines. On this page you can browse each workflow, open full notebook conversations, and compare model quality in shared score tables. Across the 24 scored workflows, average scores range from 8.46 to 9.75 out of 10, so every tested model handles structured data analysis tasks well.

Global model comparison

The summary below compares all scored model runs across all published analysis workflows.

| Model | Average score (0-10) | Scored workflows (n) |
|---|---|---|
| gpt-oss:120b | 9.75 | 24 |
| gpt-5.4 | 9.58 | 24 |
| glm-5.1 | 9.38 | 24 |
| gemma4:31b | 8.96 | 24 |
| qwen3-coder-next | 8.58 | 24 |
| qwen3.5:397b | 8.46 | 24 |

Detailed Workflow Comparison Table

This table compares model scores for each workflow. Open any score chip to jump directly to the selected model conversation and review full prompts, code, outputs, and score cards.

| Workflow | Slug | gemma4:31b | glm-5.1 | gpt-5.4 | gpt-oss:120b | qwen3-coder-next | qwen3.5:397b |
|---|---|---|---|---|---|---|---|
| Air Passengers Forecasting with ARIMA | air-passengers-forecast | 10.0/10 | 10.0/10 | 9.0/10 | 10.0/10 | 8.0/10 | 10.0/10 |
| Apple Stock Price Analysis in Python | stock-price-analysis | 10.0/10 | 9.0/10 | 10.0/10 | 10.0/10 | 6.0/10 | 7.0/10 |
| Bitcoin Returns and Volatility Analysis | crypto-returns-analysis | 10.0/10 | 6.0/10 | 10.0/10 | 10.0/10 | 9.0/10 | 6.0/10 |
| Boston Housing Prices EDA in Python | housing-prices-eda | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Breast Cancer Diagnosis with SVM in Python | breast-cancer-diagnosis | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| E-commerce Sales Analysis in Python | ecommerce-sales-eda | 9.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 9.0/10 | 10.0/10 |
| Energy Consumption Forecasting with Prophet | energy-consumption-forecast | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 3.0/10 | 10.0/10 |
| Exploratory Data Analysis (EDA) in Python | exploratory-data-analysis-python | 8.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 9.0/10 |
| How to Analyze a CSV File in Python | analyze-csv-python | 10.0/10 | 9.0/10 | 9.0/10 | 10.0/10 | 9.0/10 | 4.0/10 |
| HR Employee Attrition Analysis in Python | hr-attrition-analysis | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Hypothesis Testing in Python (t-test, ANOVA) | hypothesis-testing-python | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Iris Feature Analysis and Visualization in Python | iris-feature-analysis | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Iris Species Classification with Decision Tree | iris-classification | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Linear Regression Analysis in Python | regression-analysis-python | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Portfolio Optimization in Python | portfolio-optimization | 5.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Red Wine Quality EDA in Python | wine-quality-eda | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Sentiment Analysis of Amazon Reviews | sentiment-analysis-python | 10.0/10 | 8.0/10 | 8.0/10 | 10.0/10 | 9.0/10 | 4.0/10 |
| Telco Customer Churn Prediction in Python | customer-churn-prediction | 6.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 6.0/10 |
| Text Data EDA in Python | text-eda-python | 8.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 6.0/10 | 10.0/10 |
| Time Series Anomaly Detection in Python | anomaly-detection | 6.0/10 | 10.0/10 | 8.0/10 | 8.0/10 | 7.0/10 | 10.0/10 |
| Titanic Survival Analysis in Python | titanic-survival-analysis | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 | 10.0/10 |
| Topic Modeling with LDA in Python | topic-modeling-python | 9.0/10 | 6.0/10 | 9.0/10 | 10.0/10 | 7.0/10 | 10.0/10 |
| Value at Risk (VaR) Analysis in Python | risk-metrics-var | 9.0/10 | 10.0/10 | 9.0/10 | 10.0/10 | 10.0/10 | 2.0/10 |
| Data Cleaning with Pandas in Python | pandas-data-cleaning | 5.0/10 | 7.0/10 | 8.0/10 | 6.0/10 | 3.0/10 | 5.0/10 |
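
The per-model averages in the global comparison can be cross-checked against the per-workflow scores above. The sketch below assumes the average is a plain unweighted mean over the 24 workflows; the page does not state the aggregation method, so treat that as an assumption. The names `MODELS`, `SCORES`, `average_scores`, and `ranking` are illustrative and not part of MLJAR Studio.

```python
# Column order matches the workflow comparison table.
MODELS = [
    "gemma4:31b", "glm-5.1", "gpt-5.4",
    "gpt-oss:120b", "qwen3-coder-next", "qwen3.5:397b",
]

# Scores copied from the table, one row per workflow slug.
SCORES = {
    "air-passengers-forecast":          [10, 10, 9, 10, 8, 10],
    "stock-price-analysis":             [10, 9, 10, 10, 6, 7],
    "crypto-returns-analysis":          [10, 6, 10, 10, 9, 6],
    "housing-prices-eda":               [10, 10, 10, 10, 10, 10],
    "breast-cancer-diagnosis":          [10, 10, 10, 10, 10, 10],
    "ecommerce-sales-eda":              [9, 10, 10, 10, 9, 10],
    "energy-consumption-forecast":      [10, 10, 10, 10, 3, 10],
    "exploratory-data-analysis-python": [8, 10, 10, 10, 10, 9],
    "analyze-csv-python":               [10, 9, 9, 10, 9, 4],
    "hr-attrition-analysis":            [10, 10, 10, 10, 10, 10],
    "hypothesis-testing-python":        [10, 10, 10, 10, 10, 10],
    "iris-feature-analysis":            [10, 10, 10, 10, 10, 10],
    "iris-classification":              [10, 10, 10, 10, 10, 10],
    "regression-analysis-python":       [10, 10, 10, 10, 10, 10],
    "portfolio-optimization":           [5, 10, 10, 10, 10, 10],
    "wine-quality-eda":                 [10, 10, 10, 10, 10, 10],
    "sentiment-analysis-python":        [10, 8, 8, 10, 9, 4],
    "customer-churn-prediction":        [6, 10, 10, 10, 10, 6],
    "text-eda-python":                  [8, 10, 10, 10, 6, 10],
    "anomaly-detection":                [6, 10, 8, 8, 7, 10],
    "titanic-survival-analysis":        [10, 10, 10, 10, 10, 10],
    "topic-modeling-python":            [9, 6, 9, 10, 7, 10],
    "risk-metrics-var":                 [9, 10, 9, 10, 10, 2],
    "pandas-data-cleaning":             [5, 7, 8, 6, 3, 5],
}


def average_scores(scores, models):
    """Return {model: unweighted mean score} across all workflows."""
    columns = zip(*scores.values())  # transpose rows into per-model columns
    return {model: sum(col) / len(col) for model, col in zip(models, columns)}


def ranking(averages):
    """Return (model, average) pairs sorted from highest to lowest average."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for model, avg in ranking(average_scores(SCORES, MODELS)):
        print(f"{model:<18} {avg:.2f}  n={len(SCORES)}")
```

Under that assumption, the computed means agree with the published averages to two decimal places, which suggests the global figures are simple means of the table rows.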

What This Benchmark Shows

We tested the same step-by-step data analysis workflows across six LLMs and compared the results using a shared scoring rubric. Across domains, most models deliver strong notebook outputs with high task completion and useful analytical reasoning: nine of the 24 workflows received 10.0/10 from every model. The hardest workflow was Data Cleaning with Pandas, where scores ranged from 3.0 to 8.0 and no model exceeded 8.0/10. Use these examples as a reference for prompt design, model selection, and workflow quality before running similar analyses on your own data in MLJAR Studio.

Start using AI for Data Analysis

MLJAR Studio helps you analyze data with AI, run machine learning workflows, and build reproducible notebook-based results on your own computer.

Runs locally • Supports local LLMs