Data Analysis

Exploratory Data Analysis (EDA) in Python

Complete EDA workflow: load data, check quality, compute correlations, detect outliers, and visualize distributions — all with an AI data analyst.

What

This AI Data Analyst workflow loads the scikit-learn diabetes dataset and summarizes its structure, feature names, and basic statistics. It generates correlation analysis across all features, including a full correlation matrix visualization. It detects per-feature outliers using the IQR method and identifies which features relate most strongly to the target with supporting plots and correlation coefficients.

Who

This is for analysts and students who want a repeatable EDA template for a small, well-known regression dataset. It helps you practice data quality checks, correlation inspection, and outlier detection in a conversational, code-generating notebook.

Tools

scikit-learn
pandas
numpy
matplotlib
seaborn

Outcomes

Dataset shape (442, 10) with feature names and summary stats
10x10 correlation heatmap of all features
Outlier counts per feature using the IQR method
Scatter plots of the top 3 features vs. the target, with correlation coefficients

Quality Score

7/10

gpt-5.2 · mljar-ai

Last scored: Apr 7, 2026

Task Completion: 1/2
Needs work
Loaded the sklearn diabetes dataset, produced a 10x10 feature correlation heatmap, computed IQR outlier counts per feature, and identified strongest feature-target correlations. However, it did not produce the required scatter plots of the top 3 features vs target with correlation coefficients.
Execution Correctness: 2/2
Excellent
All provided code blocks are syntactically correct and consistent (data loading, correlation heatmap, IQR outlier table, and target correlation ranking) and are likely runnable as-is.
Output Quality: 2/3
Good
Outputs match most expected outcomes: dataset shape info (442 rows, 10 features), a 10x10 correlation heatmap, and a clear outlier summary table plus feature-target correlations. Missing the scatter plots requirement prevents a full match.
Reasoning Quality: 1/2
Needs work
Interpretations of correlations and outlier distribution are generally aligned with the computed outputs, but some narrative claims (e.g., specific correlation patterns in the heatmap) are not directly evidenced by numeric values shown.
Reliability: 1/1
Excellent
Workflow is consistent and uses standard, robust methods (pandas corr, seaborn heatmap, IQR rule) without hallucinated APIs; results are reproducible from the code.

load the diabetes dataset (from sklearn) and show basic info
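
The generated response for this prompt is not preserved on the page. As a minimal sketch of what such a step might look like (not the code the workflow actually produced), the dataset can be loaded as pandas objects and summarized. The variable names `X` and `y` are illustrative choices.

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects instead of NumPy arrays.
data = load_diabetes(as_frame=True)
X = data.data      # DataFrame with the 10 feature columns
y = data.target    # Series with the regression target

print("Shape:", X.shape)
print("Features:", list(X.columns))
print("Missing values:", int(X.isna().sum().sum()))

# Transposed summary so each feature is one row.
print(X.describe().T[["mean", "std", "min", "max"]])
```

The features in this dataset ship already mean-centered and scaled by scikit-learn, so the summary statistics are close to zero-mean rather than in original clinical units.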


plot a correlation matrix of all features
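
The response for this prompt is also missing. One plausible way to produce the 10x10 heatmap described in the outcomes is sketched below, assuming Pearson correlation (the pandas default) and a seaborn heatmap. The output filename is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes

X = load_diabetes(as_frame=True).data

# Pairwise Pearson correlation between all features (10x10 DataFrame).
corr = X.corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(
    corr,
    annot=True,        # print the coefficient in each cell
    fmt=".2f",
    cmap="coolwarm",
    vmin=-1, vmax=1,   # fix the color scale to the full correlation range
    square=True,
    ax=ax,
)
ax.set_title("Feature correlation matrix (diabetes dataset)")
fig.tight_layout()
fig.savefig("diabetes_corr_heatmap.png", dpi=150)  # hypothetical filename
```

Annotating each cell with its coefficient keeps any narrative claims about the heatmap checkable against the numbers.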


detect outliers in each feature using the IQR method
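
No response is shown for this prompt either. A minimal sketch of per-feature IQR outlier counting follows, assuming the conventional 1.5 × IQR fences. The helper name `iqr_outlier_counts` and its parameter `k` are hypothetical, introduced only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_diabetes


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Compare each column against its own fences.
    mask = df.lt(lower, axis="columns") | df.gt(upper, axis="columns")
    return pd.DataFrame({"lower": lower, "upper": upper, "n_outliers": mask.sum()})


X = load_diabetes(as_frame=True).data
summary = iqr_outlier_counts(X)
print(summary.sort_values("n_outliers", ascending=False))
```

The IQR rule flags candidates for inspection; it does not by itself say that a flagged value is an error.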


which features have the strongest relationship with the target variable?
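
The scorecard notes that the scatter plots of the top 3 features against the target were not produced. One way this step could be done is sketched below, assuming features are ranked by absolute Pearson correlation with the target. This is an illustration, not the workflow's own code, and the output filename is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

data = load_diabetes(as_frame=True)
X, y = data.data, data.target

# Pearson correlation of each feature with the target, ordered by absolute strength.
target_corr = X.corrwith(y)
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked)

top3 = list(ranked.index[:3])

# One scatter plot per top feature, with the coefficient in the title.
fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, name in zip(axes, top3):
    ax.scatter(X[name], y, alpha=0.5, s=15)
    ax.set_title(f"{name} (r = {ranked[name]:.2f})")
    ax.set_xlabel(name)
axes[0].set_ylabel("target")
fig.tight_layout()
fig.savefig("diabetes_top3_scatter.png", dpi=150)  # hypothetical filename
```

Ranking by absolute value keeps strongly negative relationships in contention, and the sign is still visible in the printed coefficients and plot titles.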

