Iris Feature Analysis and Visualization in Python

Load the Iris dataset from scikit-learn, create a seaborn feature pairplot, and explore species separation patterns using an AI data analyst.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Iris dataset from scikit-learn into a pandas DataFrame and reports basic statistics and dataset shape. It generates a seaborn pairplot of the four numeric features, colored by species, to visualize pairwise relationships. It then compares feature separation patterns across species and summarizes which measurements most clearly distinguish the classes.

Who this example is for

This is for learners and analysts who want a compact, conversational example of exploratory data analysis on a classic multiclass dataset. It helps you practice interpreting pairplots and translating visual separation into a feature-level summary.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Iris DataFrame with shape (150, 5) and df.describe() summary
  • Seaborn pairplot (4x4) colored by species (setosa, versicolor, virginica)
  • Identification of petal length and petal width as the strongest separators
  • Observation that setosa is fully isolated in petal feature space
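These outcomes are easy to verify programmatically. A hedged sketch that checks the headline numbers (feature names and values come from scikit-learn's bundled copy of the dataset):

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Shape matches the expected (150, 5): four measurements plus the target
assert df.shape == (150, 5)

# The three species in the dataset
assert list(iris.target_names) == ["setosa", "versicolor", "virginica"]

# Setosa is fully isolated in petal space: its largest petal length
# is well below the smallest petal length of the other two species
setosa = df[df["target"] == 0]["petal length (cm)"]
others = df[df["target"] != 0]["petal length (cm)"]
print(setosa.max(), others.min())
```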

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • scikit-learn
  • pandas
  • seaborn
  • matplotlib
  • numpy

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load iris dataset from scikit-learn and show basic statistics
  2. create a feature pairplot colored by species
  3. which features best separate the three species?

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source     | Score | Task | Execution | Output | Reasoning | Reliability | Run
gemma4:31b       | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gemma4:31b conversation
glm-5.1          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open glm-5.1 conversation
gpt-5.4          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-5.4 conversation
gpt-oss:120b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-oss:120b conversation
qwen3-coder-next | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open qwen3-coder-next conversation
qwen3.5:397b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
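The scoring arithmetic is straightforward; a small sketch with a hypothetical score record (the dimension maxima follow the rubric above):

```python
# Hypothetical per-dimension scores for one run
scores = {
    "task_completion": 2,        # out of 2
    "execution_correctness": 2,  # out of 2
    "output_quality": 3,         # out of 3
    "reasoning_quality": 2,      # out of 2
    "reliability": 1,            # out of 1
}

# Final score is the plain sum of the five dimensions
total = sum(scores.values())
print(f"{total}/10")  # → 10/10
```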

AI Analysis with gpt-oss:120b

This run uses AI with a step-by-step list of prompts to perform the analysis task "Iris Feature Analysis and Visualization in Python". Below, you can review the full conversation generated with gpt-oss:120b. After the conversation, we show detailed score cards for this run.

load iris dataset from scikit-learn and show basic statistics
Thinking

We need to load iris dataset from scikit-learn and show basic stats: shape, columns, head, descriptive stats. Follow notebook rules: load, then show shape, columns, head, then descriptive stats (maybe .describe()).

We should do one step: load dataset and show basic stats. According to guidelines, first step is load and inspect only: show shape, columns, head. That includes basic statistics? Maybe basic stats includes .describe().

But guidelines: "When a dataset is not yet loaded, first load and inspect only: show shape, columns, and head(). Do not combine with plotting/modeling." Descriptive stats maybe allowed as part of basic inspection? I'd treat it as stats. But guidelines suggest only shape, columns, head. Let's just load and show shape, columns, head. Then next step can compute basic descriptive stats.

Thus provide code to load iris from sklearn, convert to DataFrame, display shape, columns, head. Provide code block ending with a bare expression (maybe df.head()). No print needed.

We'll use pandas.

Proceed.

Load iris dataset
from sklearn.datasets import load_iris
import pandas as pd

# Load the iris dataset and convert to a pandas DataFrame
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Show basic information
df.shape, df.columns, df.head()
((150, 5),
 Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
        'petal width (cm)', 'target'],
       dtype='str'),
    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
 0                5.1               3.5                1.4               0.2   
 1                4.9               3.0                1.4               0.2   
 2                4.7               3.2                1.3               0.2   
 3                4.6               3.1                1.5               0.2   
 4                5.0               3.6                1.4               0.2   
 
    target  
 0       0  
 1       0  
 2       0  
 3       0  
 4       0  )
Thinking

We have loaded iris dataset and printed shape, columns, head. The original user request: "load iris dataset from scikit-learn and show basic statistics". Basic statistics likely refers to descriptive stats: mean, std, min, max, quartiles for each feature. The current output shows shape, columns, head, but not statistics. So need to provide basic stats. That's missing. So we need Path B: provide brief insights from current outputs, then code to compute stats.

Provide insights: dataset has 150 rows, 5 columns (4 features + target). Show first rows. Then code to compute df.describe().

Return one code block.

The Iris data has been loaded as a 150 × 5 DataFrame, with the four measurement columns (sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)) and a numeric target column indicating the species (0, 1, 2).

Show summary statistics
# Basic descriptive statistics for all numeric columns
df.describe()
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
Thinking

We need to decide if outputs suffice to answer original user goal: load iris dataset and show basic statistics. We have already loaded dataframe and displayed shape, columns, head, and descriptive stats via df.describe(). That provides basic statistics (count, mean, std, min, 25%, 50%, 75%, max) for each numeric column, including target. That's sufficient. So choose Path A: provide final answer summarizing insights.

We must include <!--final answer--> marker and no code block. Provide concise answer referencing observed stats. No meta phrases like "goal achieved". Provide insights.

The Iris data set contains 150 observations across five columns: the four flower measurements (sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)) and a numeric target indicating species (0 = setosa, 1 = versicolor, 2 = virginica).

Descriptive statistics

Feature               | Count | Mean | Std-dev | Min  | 25%  | 50%  | 75%  | Max
sepal length (cm)     | 150   | 5.84 | 0.83    | 4.30 | 5.10 | 5.80 | 6.40 | 7.90
sepal width (cm)      | 150   | 3.06 | 0.44    | 2.00 | 2.80 | 3.00 | 3.30 | 4.40
petal length (cm)     | 150   | 3.76 | 1.77    | 1.00 | 1.60 | 4.35 | 5.10 | 6.90
petal width (cm)      | 150   | 1.20 | 0.76    | 0.10 | 0.30 | 1.30 | 1.80 | 2.50
target (species code) | 150   | 1.00 | 0.82    | 0.00 | 0.00 | 1.00 | 2.00 | 2.00

These statistics summarize the central tendency and dispersion of each measurement, confirming the dataset’s expected size and structure.
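Since df.describe() pools all three species together, a per-species breakdown makes the separation visible in plain numbers. A short follow-up sketch using pandas groupby (column names assumed from scikit-learn's feature names):

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris(as_frame=True)
df = iris.frame
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Mean of each measurement per species; the petal columns
# differ far more between groups than the sepal columns
per_species = df.groupby("species")[iris.feature_names].mean().round(2)
print(per_species)
```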

create a feature pairplot colored by species
Thinking

User wants a pairplot colored by species (target). We have df with target numeric 0,1,2. We should map species names. Provide code using seaborn pairplot (sns.pairplot) hue='target' or using mapping to species name. Provide code block that creates the plot. According to guidelines: one chart per step, use seaborn, include title. Provide code, don't ask to run. End with bare expression (the plot). Use seaborn; need import. Provide code.

Plot iris pairwise scattermatrix
import seaborn as sns
import matplotlib.pyplot as plt

# Optional: map numeric target to species names for clearer legend
species_map = dict(enumerate(load_iris().target_names))
df["species"] = df["target"].map(species_map)

# Pairwise scatterplot matrix colored by species
pairplot = sns.pairplot(
    df,
    vars=iris.feature_names,   # the four measurement columns
    hue="species",
    palette="Set2",
    plot_kws=dict(alpha=0.7, edgecolor="k", linewidth=0.3)
)

pairplot.fig.suptitle("Iris measurements – pairwise relationships by species", y=1.02)
pairplot
<seaborn.axisgrid.PairGrid at 0x710c0bf89070>
Thinking

We have the user originally wants "create a feature pairplot colored by species". The notebook already executed the pairplot creation code in cell-3. Output is a PairGrid object and Figure displayed. So the goal is satisfied: they have a pairplot. The answer should be insights? The instructions: If outputs sufficient, provide final answer based on observed outputs, include <!--final answer-->, no code.

Thus we should say something like: The pairplot has been generated showing relationships colored by species, and maybe describe what it shows (e.g., clear separation). Provide final answer.

We should not include any code block. Also no meta phrases about completion. Just concise. Provide final answer.

The pairwise scatterplot matrix has been generated, displaying the four Iris measurements (sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)) with points colored by species (setosa, versicolor, virginica). The chart clearly shows distinct clusters, especially for petal dimensions, which separate the three species.

which features best separate the three species?
Thinking

We need to answer which features best separate species based on pairplot and possibly statistical analysis. We can compute separability: maybe use ANOVA F-values or classification importance. Provide simple analysis: compute group means/std, maybe boxplots, and compute pairwise differences. Could also compute correlation with target (using label encoding) and evaluate.

We should propose a short analysis: compute ANOVA F-score via sklearn's f_classif, or use feature importance from a simple model (e.g., RandomForest). Provide code and output. Since we haven't run analysis yet, we need to provide code and let it run, then analyze. According to instruction, give step: compute ANOVA F-values. Provide code with final expression that shows scores. Then wait.

We’ll evaluate how strongly each measurement (sepal length, sepal width, petal length, petal width) varies across the three species. A quick way is an ANOVA F‑test (scikit‑learn’s f_classif) which returns a score for each feature – the larger the score, the better the feature separates the classes.

Compute ANOVA feature scores
from sklearn.feature_selection import f_classif

# Features matrix (numeric measurements only)
X = df[iris.feature_names]

# Target vector (numeric class label)
y = df["target"]

# Compute ANOVA F‑scores and p‑values
f_scores, p_values = f_classif(X, y)

# Combine into a readable DataFrame
feature_stats = pd.DataFrame({
    "feature": iris.feature_names,
    "F_score": f_scores,
    "p_value": p_values
}).sort_values("F_score", ascending=False)

feature_stats
              feature      F_score       p_value
2   petal length (cm)  1180.161182  2.856777e-91
3    petal width (cm)   960.007147  4.169446e-85
0   sepal length (cm)   119.264502  1.669669e-31
1    sepal width (cm)    49.160040  4.492017e-17
Thinking

We have output-4: a table with features sorted by F_score. This directly answers which features best separate the three species: petal length, petal width, then sepal length, then sepal width. So we can answer directly, no further code needed. Must include <!--final answer--> at end. Provide concise answer.

The ANOVA F‑test shows the discriminatory power of each measurement:

Rank | Feature           | F-score (higher = better separation)
1    | petal length (cm) | 1180.16
2    | petal width (cm)  | 960.01
3    | sepal length (cm) | 119.26
4    | sepal width (cm)  | 49.16

Thus, petal length and petal width are the strongest features for separating the three Iris species, while sepal length provides moderate separation and sepal width is the weakest.
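The F-score ranking can be corroborated with a simple model: a shallow decision tree trained only on the two petal features already classifies the full dataset almost perfectly. A hedged sketch (this reports training accuracy on all 150 rows, not a proper holdout evaluation):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_petal = iris.data[:, 2:4]  # petal length and petal width only
y = iris.target

# A depth-limited tree on the two petal features alone
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_petal, y)
acc = tree.score(X_petal, y)
print(f"training accuracy with petal features only: {acc:.3f}")
```

That two of the four features suffice for near-perfect training accuracy is consistent with the pairplot clusters and the ANOVA ranking above.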

Score cards for gpt-oss:120b for Iris Feature Analysis and Visualization in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All three requested parts were completed: basic statistics for the Iris data, a species-colored pairplot, and an analysis of which features separate the species best.

  • Execution Correctness: 2/2

    Excellent

    The code shown is syntactically valid and the outputs match the intended workflow, including the DataFrame creation, pairplot, and ANOVA feature ranking.

  • Output Quality: 3/3

    Excellent

    The outputs align with the expected outcomes: a 150×5 dataset summary, a 4×4 pairplot colored by species, and a correct conclusion that petal length and petal width are the strongest separators with setosa isolated.

  • Reasoning Quality: 2/2

    Excellent

    The reasoning is clear and appropriate, using descriptive statistics, visualization, and ANOVA scores to support the separation claim.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and grounded in standard analysis methods, with no obvious hallucinations or unsupported claims.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.