Exploratory Data Analysis (EDA) in Python

Complete EDA workflow: load data, check quality, compute correlations, detect outliers, and visualize distributions — all with an AI data analyst.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the scikit-learn diabetes dataset and summarizes its structure, feature names, and basic statistics. It generates correlation analysis across all features, including a full correlation matrix visualization. It detects per-feature outliers using the IQR method and identifies which features relate most strongly to the target with supporting plots and correlation coefficients.

Who this example is for

This is for analysts and students who want a repeatable EDA template for a small, well-known regression dataset. It helps you practice data quality checks, correlation inspection, and outlier detection in a conversational, code-generating notebook.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset shape (442, 10), i.e. 442 samples and 10 features, with feature names and summary stats
  • 10x10 correlation heatmap of all features
  • Outlier counts per feature using the IQR method
  • Top 3 feature vs target scatter plots with correlation coefficients
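As a quick reference, the four workflow steps above can be condensed into one compact sketch (plots omitted for brevity; assumes scikit-learn and pandas are installed):

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Step 1: load the dataset into a DataFrame
diabetes = load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df["target"] = diabetes.target
print(df.shape)  # (442, 11): 10 features + target

# Step 2: correlation matrix (render it with seaborn's heatmap if desired)
corr = df.corr()

# Step 3: IQR outlier counts per column
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
outliers = ((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).sum()

# Step 4: features ranked by absolute correlation with the target
ranked = corr["target"].drop("target").abs().sort_values(ascending=False)
```

This is the skeleton the AI models below are expected to reproduce, each in its own conversational notebook.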

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • scikit-learn
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load the diabetes dataset (from sklearn) and show basic info
  2. plot a correlation matrix of all features
  3. detect outliers in each feature using the IQR method
  4. which features have the strongest relationship with the target variable?

Model Comparison

We compared several LLM models on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source       Score  Task  Execution  Output  Reasoning  Reliability  Run
glm-5.1            10/10  2/2   2/2        3/3     2/2        1/1          Open glm-5.1 conversation
gpt-5.4            10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-5.4 conversation
gpt-oss:120b       10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-oss:120b conversation
qwen3-coder-next   10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3-coder-next conversation
qwen3.5:397b       9/10   2/2   2/2        2/3     2/2        1/1          Open qwen3.5:397b conversation
gemma4:31b         8/10   2/2   1/2        2/3     2/2        1/1          Open gemma4:31b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
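As a minimal illustration of the totaling, with hypothetical per-dimension scores:

```python
# Hypothetical score card: dimension -> (score, max)
scores = {
    "Task Completion": (2, 2),
    "Execution Correctness": (2, 2),
    "Output Quality": (3, 3),
    "Reasoning Quality": (2, 2),
    "Reliability": (1, 1),
}

# Total score is simply the sum across the five dimensions
total = sum(s for s, _ in scores.values())
max_total = sum(m for _, m in scores.values())
print(f"{total}/{max_total}")  # 10/10
```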

AI Analysis with qwen3-coder-next

This run uses AI with a step-by-step list of prompts to perform the analysis task Exploratory Data Analysis (EDA) in Python. Below, you can review the full conversation generated with qwen3-coder-next. After the conversation, we show detailed score cards for this run.

load the diabetes dataset (from sklearn) and show basic info
Load diabetes data
from sklearn.datasets import load_diabetes
import pandas as pd

# Load the diabetes dataset
diabetes = load_diabetes()

# Create a DataFrame for better readability
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df['target'] = diabetes.target

# Show basic info: shape and first few rows
df.shape, df.head()
((442, 11),
         age       sex       bmi        bp        s1        s2        s3  \
 0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401   
 1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412   
 2  0.085299  0.050680  0.044451 -0.005670 -0.045599 -0.034194 -0.032356   
 3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038   
 4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142   
 
          s4        s5        s6  target  
 0 -0.002592  0.019907 -0.017646   151.0  
 1 -0.039493 -0.068332 -0.092204    75.0  
 2 -0.002592  0.002861 -0.025930   141.0  
 3  0.034309  0.022688 -0.009362   206.0  
 4 -0.002592 -0.031988 -0.046641   135.0  )

The original goal was to load the diabetes dataset from sklearn and show basic info. The latest notebook output confirms:

  • Dataset shape: 442 rows (samples) × 11 columns (10 features + 1 target)
  • Feature columns: age, sex, bmi, bp, s1, s2, s3, s4, s5, s6 (all numeric, likely standardized)
  • Target column: target (continuous, representing disease progression)
  • First 5 rows show realistic values: e.g., row 0 has age=0.038, target=151.0

This satisfies the goal — the dataset is loaded, structured in a DataFrame, and basic info (shape, preview) is displayed.

The diabetes dataset has been successfully loaded with 442 samples and 11 columns (10 features + 1 target). The features are numerical and appear to be standardized (mean ~0, small magnitude), and the target (disease progression) ranges from low (75.0) to high (206.0) in the first few rows.

plot a correlation matrix of all features

I'll compute the correlation matrix for all features (including the target) and plot it as a heatmap using seaborn.

Plot correlation heatmap
import seaborn as sns
import matplotlib.pyplot as plt

# Compute correlation matrix (all columns)
corr = df.corr()

# Plot heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=False, cmap='coolwarm', center=0, square=True)
plt.title('Correlation Matrix of Diabetes Dataset Features')
plt.show()

The heatmap for the correlation matrix was created successfully. Since the notebook output shows <Figure size 1000x800 with 2 Axes>, the plot was rendered — but to interpret it, we need the actual correlation values. Let's print the correlation matrix to identify the strongest relationships.

Display correlation matrix heatmap
corr.style.format("{:.2f}").background_gradient(cmap='coolwarm')
  age sex bmi bp s1 s2 s3 s4 s5 s6 target
age 1.00 0.17 0.19 0.34 0.26 0.22 -0.08 0.20 0.27 0.30 0.19
sex 0.17 1.00 0.09 0.24 0.04 0.14 -0.38 0.33 0.15 0.21 0.04
bmi 0.19 0.09 1.00 0.40 0.25 0.26 -0.37 0.41 0.45 0.39 0.59
bp 0.34 0.24 0.40 1.00 0.24 0.19 -0.18 0.26 0.39 0.39 0.44
s1 0.26 0.04 0.25 0.24 1.00 0.90 0.05 0.54 0.52 0.33 0.21
s2 0.22 0.14 0.26 0.19 0.90 1.00 -0.20 0.66 0.32 0.29 0.17
s3 -0.08 -0.38 -0.37 -0.18 0.05 -0.20 1.00 -0.74 -0.40 -0.27 -0.39
s4 0.20 0.33 0.41 0.26 0.54 0.66 -0.74 1.00 0.62 0.42 0.43
s5 0.27 0.15 0.45 0.39 0.52 0.32 -0.40 0.62 1.00 0.46 0.57
s6 0.30 0.21 0.39 0.39 0.33 0.29 -0.27 0.42 0.46 1.00 0.38
target 0.19 0.04 0.59 0.44 0.21 0.17 -0.39 0.43 0.57 0.38 1.00

The correlation heatmap was rendered successfully, and the styled table above shows the numeric values with a color gradient. For a plain, copyable view of the same numbers (the user goal was to "plot a correlation matrix of all features"), let's also display the rounded matrix.

Round correlation values
corr.round(2)
age sex bmi bp s1 s2 s3 s4 s5 s6 target
age 1.00 0.17 0.19 0.34 0.26 0.22 -0.08 0.20 0.27 0.30 0.19
sex 0.17 1.00 0.09 0.24 0.04 0.14 -0.38 0.33 0.15 0.21 0.04
bmi 0.19 0.09 1.00 0.40 0.25 0.26 -0.37 0.41 0.45 0.39 0.59
bp 0.34 0.24 0.40 1.00 0.24 0.19 -0.18 0.26 0.39 0.39 0.44
s1 0.26 0.04 0.25 0.24 1.00 0.90 0.05 0.54 0.52 0.33 0.21
s2 0.22 0.14 0.26 0.19 0.90 1.00 -0.20 0.66 0.32 0.29 0.17
s3 -0.08 -0.38 -0.37 -0.18 0.05 -0.20 1.00 -0.74 -0.40 -0.27 -0.39
s4 0.20 0.33 0.41 0.26 0.54 0.66 -0.74 1.00 0.62 0.42 0.43
s5 0.27 0.15 0.45 0.39 0.52 0.32 -0.40 0.62 1.00 0.46 0.57
s6 0.30 0.21 0.39 0.39 0.33 0.29 -0.27 0.42 0.46 1.00 0.38
target 0.19 0.04 0.59 0.44 0.21 0.17 -0.39 0.43 0.57 0.38 1.00

The correlation matrix has been plotted as a heatmap and the full numeric correlation matrix has been displayed:

  • Strongest positive correlations with the target include bmi (0.59), s5 (0.57), bp (0.44), and s4 (0.43).
  • Strongest negative correlation with target is s3 (-0.39).
  • Feature s1 and s2 are highly correlated (0.90), indicating potential multicollinearity.
  • The heatmap and numeric matrix together fulfill the original goal of plotting and interpreting the correlation matrix of all features.
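The s1/s2 multicollinearity flagged above can also be confirmed programmatically. A small sketch that lists feature pairs whose absolute correlation exceeds a (hypothetical) 0.8 threshold, rebuilding `df` for self-containment:

```python
import pandas as pd
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

corr = df.corr()
# Walk the upper triangle so each pair is reported once
pairs = [
    (a, b, round(corr.loc[a, b], 2))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > 0.8
]
print(pairs)  # [('s1', 's2', 0.9)]
```

Surfacing such pairs early matters because highly collinear features can destabilize coefficient estimates in downstream linear models.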
detect outliers in each feature using the IQR method

To detect outliers using the IQR method, we'll compute for each feature:

  • Q1 (25th percentile), Q3 (75th percentile), and IQR = Q3 − Q1
  • Outlier bounds: values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR

Then count how many outliers exist per feature.

Let's compute and display a clean summary: the number of outliers per column. The target column could be excluded, but since the prompt says "each feature", we include all columns.

Count outliers per column
outlier_counts = {}
for col in df.columns:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR
    upper = Q3 + 1.5 * IQR
    count = ((df[col] < lower) | (df[col] > upper)).sum()
    outlier_counts[col] = count

# Convert to Series for tidy output
outlier_series = pd.Series(outlier_counts, name='outliers')
outlier_series.sort_values(ascending=False)
s6        9
s1        8
s2        7
s3        7
s5        4
bmi       3
s4        2
age       0
sex       0
bp        0
target    0
Name: outliers, dtype: int64

The IQR-based outlier detection has been executed and the results show the count of outliers per feature:

  • s6: 9 outliers
  • s1: 8 outliers
  • s2: 7 outliers
  • s3: 7 outliers
  • s5: 4 outliers
  • bmi: 3 outliers
  • s4: 2 outliers
  • age, sex, bp, target: 0 outliers

This directly answers the original goal: outliers per feature using the IQR method.
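Beyond the counts, it can help to inspect the flagged rows themselves. A sketch for a single column (s6 is used here as an example, since it had the most outliers), rebuilding `df` for self-containment:

```python
import pandas as pd
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df["target"] = diabetes.target

col = "s6"
q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Rows where s6 falls outside the IQR fences
flagged = df[(df[col] < lower) | (df[col] > upper)]
print(len(flagged))  # 9 rows, matching the count reported above
```

Looking at the full rows (not just the offending values) helps decide whether the points are data errors or genuine extremes worth keeping.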

which features have the strongest relationship with the target variable?

From the correlation matrix (corr), the features with the strongest linear relationships with target (highest absolute correlation values) are:

  • bmi: 0.59
  • s5: 0.57
  • bp: 0.44
  • s4: 0.43
  • s3: -0.39
  • s6: 0.38
  • s1, s2, age, sex: weaker (|r| ≤ 0.21)

Let’s extract and sort these to make it clearer.

Sort correlation with target
target_corr = corr['target'].drop('target').sort_values(ascending=False)
target_corr
bmi    0.586450
s5     0.565883
bp     0.441482
s4     0.430453
s6     0.382483
s1     0.212022
age    0.187889
s2     0.174054
sex    0.043062
s3    -0.394789
Name: target, dtype: float64

The latest output (target_corr) directly answers the original goal: it ranks all features by their correlation with target, showing the strongest linear relationships.

Top 5 features with the strongest relationship to target (ranked by absolute correlation):

  • bmi (0.59)
  • s5 (0.57)
  • bp (0.44)
  • s4 (0.43)
  • s3 (-0.39, the strongest negative relationship)
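The expected outcomes also call for scatter plots of the top features against the target. A sketch with matplotlib (using the non-interactive Agg backend so it runs in scripts), rebuilding `df` for self-containment:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df["target"] = diabetes.target

# Rank features by absolute correlation with the target, keep the top 3
target_corr = df.corr()["target"].drop("target")
top3 = target_corr.abs().sort_values(ascending=False).head(3).index

# One scatter panel per feature, annotated with its correlation coefficient
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, feat in zip(axes, top3):
    ax.scatter(df[feat], df["target"], s=10, alpha=0.6)
    ax.set_title(f"{feat} vs target (r = {target_corr[feat]:.2f})")
    ax.set_xlabel(feat)
    ax.set_ylabel("target")
fig.tight_layout()
```

This produces the "top 3 feature vs target scatter plots with correlation coefficients" listed under the expected analysis outcomes.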

Score cards for qwen3-coder-next for Exploratory Data Analysis (EDA) in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    The workflow covers loading the diabetes dataset, correlation analysis, IQR outlier detection, and identifying strongest target relationships. All major requested steps are present.

  • Execution Correctness: 2/2

    Excellent

    The code shown is syntactically valid and the outputs are consistent with the diabetes dataset. The notebook evidence indicates the cells ran successfully.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: dataset shape/info, a full correlation heatmap (all 10 features plus the target), outlier counts per feature, and ranked correlations with the target. The strongest features are correctly identified.

  • Reasoning Quality: 2/2

    Excellent

    The explanations are clear and logically connect the computations to the user prompts. The correlation and outlier interpretations are appropriate and supported by the outputs.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and uses standard methods without obvious hallucinations or fragile logic. Results are reproducible from the provided code.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.