Data Analysis

Exploratory Data Analysis (EDA) in Python

Complete EDA workflow: load data, check quality, compute correlations, detect outliers, and visualize distributions, all with an AI data analyst.

What

This AI Data Analyst workflow loads the scikit-learn diabetes dataset and summarizes its structure, feature names, and basic statistics. It computes pairwise correlations across all features and visualizes the full correlation matrix. It counts per-feature outliers using the IQR method, then identifies which features relate most strongly to the target, with supporting plots and correlation coefficients.

Who

This is for analysts and students who want a repeatable EDA template for a small, well-known regression dataset. It helps you practice data quality checks, correlation inspection, and outlier detection in a conversational, code-generating notebook.

Tools

  • scikit-learn
  • pandas
  • numpy
  • matplotlib
  • seaborn

Outcomes

  • Dataset shape (442, 10) with feature names and summary stats
  • 10x10 correlation heatmap of all features
  • Outlier counts per feature using the IQR method
  • Scatter plots of the top 3 features vs. target, with correlation coefficients

Quality Score

7/10

gpt-5.2 · mljar-ai

Last scored: Apr 7, 2026

  • Task Completion: 1/2

    Needs work

    Loaded the sklearn diabetes dataset, produced a 10x10 feature correlation heatmap, computed IQR outlier counts per feature, and identified strongest feature-target correlations. However, it did not produce the required scatter plots of the top 3 features vs target with correlation coefficients.

  • Execution Correctness: 2/2

    Excellent

    All provided code blocks are syntactically correct and consistent (data loading, correlation heatmap, IQR outlier table, and target correlation ranking) and are likely runnable as-is.

  • Output Quality: 2/3

    Good

    Outputs match most expected outcomes: dataset shape info (442 rows, 10 features), a 10x10 correlation heatmap, and a clear outlier summary table plus feature-target correlations. Missing the scatter plots requirement prevents a full match.

  • Reasoning Quality: 1/2

    Needs work

    Interpretations of correlations and outlier distribution are generally aligned with the computed outputs, but some narrative claims (e.g., specific correlation patterns in the heatmap) are not directly evidenced by numeric values shown.

  • Reliability: 1/1

    Excellent

    Workflow is consistent and uses standard, robust methods (pandas corr, seaborn heatmap, IQR rule) without hallucinated APIs; results are reproducible from the code.

load the diabetes dataset (from sklearn) and show basic info
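
The generated response for this prompt is not preserved on the page. As a minimal sketch of what this step could look like (not the code the analyst actually produced), the following loads the dataset as pandas objects and prints the shape, feature names, a missing-value count, and summary statistics. The variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects instead of NumPy arrays.
diabetes = load_diabetes(as_frame=True)
X = diabetes.data      # feature columns only
y = diabetes.target    # quantitative measure of disease progression
df = diabetes.frame    # features plus a "target" column

print("Feature matrix shape:", X.shape)
print("Feature names:", list(X.columns))
print("Total missing values:", int(df.isna().sum().sum()))
print("Duplicate rows:", int(df.duplicated().sum()))

# Transposed summary so each feature is one row.
print(df.describe().T[["mean", "std", "min", "max"]])
```

Note that scikit-learn ships these features already mean-centered and scaled, so the summary statistics describe standardized values rather than raw clinical measurements.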

plot a correlation matrix of all features
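
One possible way to produce the 10x10 heatmap, sketched here under the assumption that Pearson correlation (the pandas default) is what is wanted. Annotating each cell with its coefficient keeps any narrative about correlation patterns tied to visible numbers. The color map and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes

X = load_diabetes(as_frame=True).data

# Pairwise Pearson correlation between all feature columns.
corr = X.corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(
    corr,
    annot=True,       # print the coefficient in every cell
    fmt=".2f",
    cmap="coolwarm",
    vmin=-1, vmax=1,  # fix the color scale to the full correlation range
    center=0,
    square=True,
    ax=ax,
)
ax.set_title("Feature correlation matrix (Pearson)")
fig.tight_layout()
# In a notebook the figure renders inline; in a script, call plt.show().
```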

detect outliers in each feature using the IQR method
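
A minimal sketch of the IQR rule, assuming the conventional 1.5 x IQR fences: a value counts as an outlier when it falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR for its feature. The summary table layout is illustrative, not the table from the original run.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

X = load_diabetes(as_frame=True).data

# Per-feature quartiles and interquartile range.
q1 = X.quantile(0.25)
q3 = X.quantile(0.75)
iqr = q3 - q1

# Conventional fences; the 1.5 multiplier is an assumption, not a requirement.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask with the same shape as X: True where a value is outside the fences.
is_outlier = (X < lower) | (X > upper)

summary = pd.DataFrame({
    "lower_fence": lower,
    "upper_fence": upper,
    "n_outliers": is_outlier.sum(),
    "pct_outliers": (is_outlier.mean() * 100).round(2),
}).sort_values("n_outliers", ascending=False)

print(summary)
```

The mask `is_outlier` can also be used to inspect the flagged rows directly, for example `X[is_outlier.any(axis=1)]`.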

which features have the strongest relationship with the target variable?
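
The scorecard above notes that the run ranked feature-target correlations but did not produce the scatter plots. The following sketch shows one way to do both, assuming "strongest relationship" means largest absolute Pearson correlation. That is a linear measure, so it can understate nonlinear relationships.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

data = load_diabetes(as_frame=True)
X, y = data.data, data.target

# Pearson correlation of each feature with the target.
target_corr = X.corrwith(y)

# Rank by absolute value but keep the sign for reporting.
order = target_corr.abs().sort_values(ascending=False).index
ranked = target_corr.reindex(order)
print(ranked)

top3 = ranked.head(3)

# One scatter plot per top feature, with r in the panel title.
fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, (feature, r) in zip(axes, top3.items()):
    ax.scatter(X[feature], y, alpha=0.5, s=15)
    ax.set_title(f"{feature} vs target (r = {r:.2f})")
    ax.set_xlabel(feature)
axes[0].set_ylabel("target")
fig.tight_layout()
# In a notebook the figure renders inline; in a script, call plt.show().
```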
