Iris Feature Analysis and Visualization in Python

Load the Iris dataset from scikit-learn, create a seaborn feature pairplot, and explore species separation patterns using an AI data analyst.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Iris dataset from scikit-learn into a pandas DataFrame and reports basic statistics and dataset shape. It generates a seaborn pairplot of the four numeric features, colored by species, to visualize pairwise relationships. It then compares feature separation patterns across species and summarizes which measurements most clearly distinguish the classes.

Who this example is for

This is for learners and analysts who want a compact, conversational example of exploratory data analysis on a classic multiclass dataset. It helps you practice interpreting pairplots and translating visual separation into a feature-level summary.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Iris DataFrame with shape (150, 5) and df.describe() summary
  • Seaborn pairplot (4x4) colored by setosa, versicolor, and virginica
  • Identification of petal length and petal width as the strongest separators
  • Observation that setosa is fully isolated in petal feature space
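The last expected outcome can be checked numerically as well as visually. The sketch below is not part of the recorded workflow; it assumes a `species` column built from `iris.target_names` (a name introduced here for illustration) and compares the petal measurement ranges of setosa against the other two species.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Hypothetical helper column: readable species names instead of 0/1/2 codes.
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

petal_cols = ["petal length (cm)", "petal width (cm)"]

# Per-species min and max of the two petal features.
ranges = df.groupby("species", observed=True)[petal_cols].agg(["min", "max"])
print(ranges)

# Setosa is isolated if its largest petal values are still smaller than
# the smallest petal values of versicolor and virginica.
setosa_max = df.loc[df["species"] == "setosa", petal_cols].max()
others_min = df.loc[df["species"] != "setosa", petal_cols].min()
setosa_isolated = bool((setosa_max < others_min).all())
print("setosa isolated in petal space:", setosa_isolated)
```

A gap between the setosa maximum and the minimum of the other species on both petal features is what "fully isolated" means in this context.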

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • scikit-learn
  • pandas
  • seaborn
  • matplotlib
  • numpy

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load iris dataset from scikit-learn and show basic statistics
  2. create a feature pairplot colored by species
  3. which features best separate the three species?

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model              Score  Task  Execution  Output  Reasoning  Reliability  Run
gemma4:31b         10/10  2/2   2/2        3/3     2/2        1/1          Open gemma4:31b conversation
glm-5.1            10/10  2/2   2/2        3/3     2/2        1/1          Open glm-5.1 conversation
gpt-5.4            10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-5.4 conversation
gpt-oss:120b       10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-oss:120b conversation
qwen3-coder-next   10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3-coder-next conversation
qwen3.5:397b       10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
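The arithmetic behind the total can be written out in a few lines. This is only an illustrative sketch of the rubric as described above, not the grader's actual implementation; the dictionary keys and the `total_score` function are hypothetical names.

```python
# Maximum points per dimension, taken from the rubric described above.
# Key names are hypothetical labels chosen for this sketch.
MAX_POINTS = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}


def total_score(scores: dict[str, int]) -> int:
    """Sum the five dimension scores after checking each against its range."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the five rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(scores.values())


# A run with full marks on every dimension, as in the comparison table.
example_run = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}
print(total_score(example_run), "out of", sum(MAX_POINTS.values()))
```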

AI Analysis with qwen3-coder-next

This run sends the step-by-step prompt list for the analysis task "Iris Feature Analysis and Visualization in Python" to qwen3-coder-next. Below, you can review the full conversation the model generated. After the conversation, we show detailed score cards for this run.

load iris dataset from scikit-learn and show basic statistics
Load iris data and show statistics
from sklearn.datasets import load_iris
import pandas as pd

# Load iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Show basic statistics
df.describe()
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000

The notebook output shows the basic statistics for the iris dataset, including count, mean, std, min, 25%, 50%, 75%, and max for all four features (sepal length, sepal width, petal length, petal width) and the target variable. The results confirm the iris dataset has 150 samples, with feature values in centimeters, and target values ranging from 0 to 2 (representing three iris species: setosa, versicolor, and virginica). This fully satisfies the original goal.

The iris dataset contains 150 samples with the following basic statistics:

  • Sepal length: mean = 5.84 cm, range = [4.3, 7.9] cm
  • Sepal width: mean = 3.06 cm, range = [2.0, 4.4] cm
  • Petal length: mean = 3.76 cm, range = [1.0, 6.9] cm
  • Petal width: mean = 1.20 cm, range = [0.1, 2.5] cm
  • Target: mean = 1.0 (encoding three species: 0, 1, 2)
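The run above reports `df.describe()` but does not print the DataFrame shape or the class balance, both of which are part of the expected outcomes. A minimal sketch of how that could be checked, assuming a scikit-learn version that supports `load_iris(as_frame=True)`; the variable names are illustrative:

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects; .frame holds the features plus "target".
iris = load_iris(as_frame=True)
df = iris.frame

# Expect 150 rows and 5 columns (four features plus the target).
print(df.shape)

# Map integer codes to species names and count samples per class.
code_to_name = dict(enumerate(iris.target_names))
class_counts = df["target"].map(code_to_name).value_counts()
print(class_counts)
```

Because `target` is an integer code, its mean and standard deviation in `describe()` carry no physical meaning; the class counts are the more informative summary for that column.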

create a feature pairplot colored by species
Visualize feature pairplot by species
import seaborn as sns
import matplotlib.pyplot as plt

# Create pairplot colored by species (target)
sns.pairplot(df, hue='target', diag_kind='hist', height=2.5)
plt.suptitle('Iris Dataset: Feature Pairplot by Species', y=1.02)
plt.show()

The pairplot was successfully generated, as indicated by the output <Figure size 1058.88x1000 with 20 Axes>. This confirms the visualization of feature pairwise relationships with points colored by species (target). The 20 axes indicate a full 4×4 grid of scatter plots (4 features on x and y) plus 4 diagonal histograms — exactly as expected for a 4-feature dataset.

No further action is needed, as the original goal — "create a feature pairplot colored by species" — is fully satisfied by the executed code and output.

A feature pairplot for the Iris dataset, colored by species (target), has been successfully created. It visualizes pairwise relationships between all four features (sepal length, sepal width, petal length, petal width), with distinct colors for the three species. Diagonal histograms show the distribution of each feature.

which features best separate the three species?

The pairplot already gives a clear visual indication: petal length and petal width appear to best separate the three species (target = 0, 1, 2), as their scatter plots show clear clustering with minimal overlap between species.

To support this visually and quantitatively, let’s compute and visualize the correlation between each feature and the target — a higher absolute correlation suggests better separation.

Compute correlations with target
# Compute correlation with target
corr_with_target = df.corr()['target'].sort_values(ascending=False)
corr_with_target
target               1.000000
petal width (cm)     0.956547
petal length (cm)    0.949035
sepal length (cm)    0.782561
sepal width (cm)    -0.426658
Name: target, dtype: float64

The correlation output confirms what the pairplot visually suggested: petal width (0.957) and petal length (0.949) have the strongest positive correlation with the target (species), meaning they best separate the three species. In contrast, sepal width has a moderate negative correlation (−0.427), and sepal length (0.783) is moderately predictive but less discriminative than the petal features.

Based on correlation with the target, the two features that best separate the three Iris species are petal width (0.957) and petal length (0.949) — nearly perfect positive correlations. Sepal length (0.783) shows moderate separation, while sepal width (−0.427) is the weakest predictor.
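One caveat about the approach above: Pearson correlation with an integer-coded target depends on the arbitrary 0/1/2 ordering of the species and only captures a linear trend across that ordering. A complementary check that treats species as unordered groups is a one-way ANOVA F-statistic per feature. The sketch below uses scikit-learn's `f_classif` for that; it is an alternative offered for illustration, not the method used in the recorded run.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

iris = load_iris()

# One-way ANOVA F-statistic per feature: ratio of between-species variance
# to within-species variance. Larger values suggest stronger separation.
f_stats, p_values = f_classif(iris.data, iris.target)

ranking = (
    pd.DataFrame(
        {"F": f_stats, "p_value": p_values},
        index=iris.feature_names,
    )
    .sort_values("F", ascending=False)
)
print(ranking)
```

On this dataset the F-statistic ranking also places the two petal features ahead of the two sepal features, which supports the conclusion drawn from the pairplot without relying on the label encoding.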

Score cards for qwen3-coder-next for Iris Feature Analysis and Visualization in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    The notebook loads the Iris dataset, shows basic statistics, creates a pairplot, and answers which features separate species best.

  • Execution Correctness: 2/2

    Excellent

    The code is syntactically coherent and the provided outputs indicate it ran successfully, including the dataframe summary, plot, and correlation results.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes: a 150-row Iris summary, a 4x4 pairplot, and a correct conclusion that petal length and petal width best separate species, with setosa's isolation implied by the pairplot.

  • Reasoning Quality: 2/2

    Excellent

    The reasoning correctly uses the pairplot and correlation results to identify petal features as the strongest separators and is consistent with the Iris dataset structure.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and grounded in actual outputs, with no obvious hallucinations or unsupported claims.
