AI Data Analysis Benchmarks for Machine Learning

We defined practical analysis workflows from multiple domains, then ran them with AI Data Analyst using different LLM engines. On this page you can browse each workflow, open full notebook conversations, and compare model quality in shared score tables. In this category, average scores range from 8.33 to 10.00 out of 10 across six models, showing that modern LLMs handle structured data analysis tasks well.

Machine Learning Workflow Examples

Browse reproducible AI data analysis workflows in Machine Learning. Open any example to review prompts, conversation steps, generated code, outputs, and model-level quality scores.

Breast Cancer Diagnosis with SVM in Python

Load the Breast Cancer Wisconsin dataset, train an SVM classifier, and visualize PCA-reduced decision regions using an AI data analyst.

Open analysis →
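
For reference, a workflow along these lines typically reduces to code like the sketch below. It is a minimal standalone version using scikit-learn's bundled copy of the Breast Cancer Wisconsin dataset, not the exact notebook output; the RBF kernel and hyperparameters are illustrative choices.

    # Minimal sketch, not the generated notebook code: scikit-learn's bundled
    # dataset, an RBF SVM, and decision regions drawn over the PCA plane.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Standardize, then project onto 2 principal components so the
    # decision regions can be plotted in a plane.
    X, y = load_breast_cancer(return_X_y=True)
    X2 = make_pipeline(StandardScaler(), PCA(n_components=2)).fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(
        X2, y, test_size=0.2, random_state=42, stratify=y)

    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
    print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")

    # Predict over a mesh grid to shade the decision regions.
    xx, yy = np.meshgrid(
        np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 300),
        np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 300))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap="coolwarm")
    plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap="coolwarm", s=10, edgecolors="k")
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.title("SVM decision regions on PCA-reduced features")
    plt.show()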

Red Wine Quality EDA in Python

Explore the Wine Quality dataset with distribution plots, correlation analysis, and quality score breakdown using an AI data analyst.

Open analysis →
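
A condensed version of this EDA is sketched below. It assumes the red wine CSV hosted by the UCI Machine Learning Repository, which is semicolon-separated; the plots and the quality breakdown mirror the steps described above, not the notebook's exact code.

    # Minimal EDA sketch; assumes the semicolon-separated red wine CSV
    # from the UCI Machine Learning Repository.
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
           "wine-quality/winequality-red.csv")
    df = pd.read_csv(url, sep=";")

    # Distribution of every physicochemical feature.
    df.hist(bins=30, figsize=(12, 8))
    plt.tight_layout()
    plt.show()

    # Correlation heatmap to spot features that track quality.
    plt.figure(figsize=(9, 7))
    sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="vlag", center=0)
    plt.title("Feature correlations")
    plt.show()

    # Quality score breakdown: counts per rating and mean alcohol by rating.
    print(df["quality"].value_counts().sort_index())
    print(df.groupby("quality")["alcohol"].mean())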

Telco Customer Churn Prediction in Python

Analyze the Telco Customer Churn dataset, engineer features, train a random forest classifier, and identify top churn drivers.

Open analysis →
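
The sketch below shows the general shape of such a pipeline. The file name telco_churn.csv is a placeholder for the IBM Telco Customer Churn export, and the cleanup steps (coercing TotalCharges, dropping customerID) follow that file's known quirks; the actual notebook code may differ.

    # Minimal sketch; "telco_churn.csv" is a placeholder for the IBM Telco
    # Customer Churn export, whose TotalCharges column contains blank strings.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("telco_churn.csv")

    # Light feature engineering: fix the numeric-as-text column, drop the ID.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna(subset=["TotalCharges"]).drop(columns=["customerID"])

    # One-hot encode categoricals; the target is the Yes/No Churn column.
    y = (df.pop("Churn") == "Yes").astype(int)
    X = pd.get_dummies(df, drop_first=True)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
    clf = RandomForestClassifier(n_estimators=300, random_state=42)
    clf.fit(X_train, y_train)
    print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")

    # Top churn drivers by impurity-based feature importance.
    importances = pd.Series(clf.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False).head(10))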

Model Comparison for Machine Learning

Compare LLM performance across workflows in this category. Open any score chip to jump directly to that model run and inspect the full conversation and notebook output.

Model              Average score (0-10)   Scored workflows
gpt-5.4            10.00                  3
qwen3-coder-next   10.00                  3
gpt-oss:120b        9.67                  3
glm-5.1             9.00                  3
gemma4:31b          8.67                  3
qwen3.5:397b        8.33                  3

Detailed Workflow Comparison Table for Machine Learning

This table compares model scores for each workflow in Machine Learning. Open any score chip to jump directly to the selected model conversation and review full prompts, code, outputs, and score cards.

What This Benchmark Shows

We tested the same step-by-step data analysis workflows across multiple LLM models and compared results using a shared scoring rubric. In Machine Learning, most models produce strong notebook outputs with high task completion and useful analytical reasoning. Use these examples as a reference for prompt design, model selection, and workflow quality before running similar analyses on your own data in MLJAR Studio.
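
For clarity, the averages in the tables above are plain means over per-workflow rubric scores. The snippet below illustrates the arithmetic with hypothetical per-workflow scores and generic model names; it is not the benchmark's actual scoring code.

    # Illustrative only: the per-workflow scores here are hypothetical, but the
    # averaging matches how the table values are produced (mean over n workflows).
    per_workflow_scores = {
        "model-a": [10, 10, 9],  # averages to 9.67
        "model-b": [9, 8, 8],    # averages to 8.33
    }
    for model, scores in per_workflow_scores.items():
        print(f"{model}: avg {sum(scores) / len(scores):.2f} (n={len(scores)})")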

Start using AI for Machine Learning

MLJAR Studio helps you analyze data with AI, run machine learning workflows, and build reproducible notebook-based results on your own computer.

Runs locally • Supports local LLMs