Iris Species Classification with Decision Tree

Train a decision tree classifier on the Iris dataset, evaluate accuracy, and visualize the decision boundaries using an AI data analyst.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Iris dataset from scikit-learn and creates an 80/20 train-test split. It trains a decision tree classifier, reports accuracy, and generates a classification report. It also plots a confusion matrix and visualizes feature importances as a bar chart.

Who this example is for

This is for learners and practitioners who want a compact, reproducible example of multiclass classification with a decision tree. It helps validate model performance with standard metrics and interpret which Iris features drive predictions.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Train/test split with dataset shapes printed
  • Decision tree model trained with accuracy and classification report
  • Confusion matrix heatmap for error inspection
  • Feature importance bar chart highlighting the most influential features
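The outcomes above can be reproduced end to end with a minimal sketch; `random_state=42` matches the seed used in the run below, and other seeds may give different accuracy:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the data and make the 80/20 split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Train the tree and evaluate on the held-out 20%
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(clf.feature_importances_)
```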

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • scikit-learn
  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load iris dataset (from sklearn) and split into train/test sets with 80/20 ratio
  2. train a decision tree classifier and show accuracy
  3. plot the confusion matrix
  4. show feature importances as a bar chart

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to review the full conversation and notebook results.

  Model Source     | Score | Task | Execution | Output | Reasoning | Reliability | Run
  gemma4:31b       | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gemma4:31b conversation
  glm-5.1          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open glm-5.1 conversation
  gpt-5.4          | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-5.4 conversation
  gpt-oss:120b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open gpt-oss:120b conversation
  qwen3-coder-next | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open qwen3-coder-next conversation
  qwen3.5:397b     | 10/10 | 2/2  | 2/2       | 3/3    | 2/2       | 1/1         | Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these five dimensions, so the total ranges from 0 to 10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
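The aggregation can be sketched in a few lines, using the perfect scores from the comparison table above as example values:

```python
# Sum the five dimension scores into the 0-10 total
dimensions = {
    "task_completion": 2,        # out of 2
    "execution_correctness": 2,  # out of 2
    "output_quality": 3,         # out of 3
    "reasoning_quality": 2,      # out of 2
    "reliability": 1,            # out of 1
}
total = sum(dimensions.values())
print(f"{total}/10")  # 10/10
```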

AI Analysis with qwen3.5:397b

This run uses AI with a step-by-step list of prompts to perform the analysis task "Iris Species Classification with Decision Tree". Below, you can review the full conversation generated with qwen3.5:397b, followed by detailed score cards for this run.

load iris dataset (from sklearn) and split into train/test sets with 80/20 ratio
Thinking

The user wants me to load the iris dataset from sklearn and split it into train/test sets with 80/20 ratio. This is a straightforward task that I can do in one code block.

I'll:

  1. Load the iris dataset from sklearn.datasets
  2. Split it using train_test_split with test_size=0.2 (80/20 ratio)
  3. Show the shapes of the resulting sets to confirm the split

Let me write clean, simple code for this.

I'll load the iris dataset from sklearn and split it into training and testing sets with an 80/20 ratio.

Load and split iris data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into train/test sets (80/20 ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Display the shapes to confirm the split
X_train.shape, X_test.shape
((120, 4), (30, 4))
Thinking

The user's goal was to load the iris dataset from sklearn and split it into train/test sets with an 80/20 ratio.

Looking at the latest output:

  • The code executed successfully
  • Output shows: ((120, 4), (30, 4))
  • This means X_train has 120 samples with 4 features, X_test has 30 samples with 4 features
  • Total samples: 120 + 30 = 150 (which is correct for the iris dataset)
  • Split ratio: 120/150 = 80% train, 30/150 = 20% test ✓

The goal has been achieved:

  1. ✓ Iris dataset loaded from sklearn
  2. ✓ Split into train/test with 80/20 ratio (120 train, 30 test out of 150 total)

This is sufficient to answer the original goal. I should provide Path A with a final answer and no code block.

The iris dataset has been successfully loaded and split:

  • Training set: 120 samples × 4 features (80%)
  • Test set: 30 samples × 4 features (20%)
  • Total: 150 samples (matches the standard iris dataset size)

The 80/20 split is confirmed: 120/150 = 80% for training and 30/150 = 20% for testing.
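A stratified variation (not what this run used) additionally preserves the 50/50/50 class balance across the split; `stratify` is a standard `train_test_split` parameter:

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
# stratify=iris.target keeps class proportions identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Each of the three species contributes exactly 10 samples to the 30-sample test set
print(Counter(y_test))
```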

train a decision tree classifier and show accuracy
Thinking

The user wants to train a decision tree classifier on the iris dataset that was already loaded and split. I need to:

  1. Import DecisionTreeClassifier from sklearn
  2. Train it on X_train, y_train
  3. Evaluate accuracy on X_test, y_test
  4. Show the accuracy

This is a single step task, so I'll provide one code block.

I'll train a decision tree classifier on the iris training data and evaluate its accuracy on the test set.

Train decision tree and evaluate accuracy
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Train the decision tree classifier
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)

# Make predictions on test set
y_pred = dt_classifier.predict(X_test)

# Calculate and display accuracy
accuracy = accuracy_score(y_test, y_pred)
accuracy
1.0
Thinking

Looking at the notebook outputs:

  1. Cell-1: Successfully loaded iris dataset and split into train/test (80/20 ratio) - shapes are (120, 4) and (30, 4)
  2. Cell-2: Trained a decision tree classifier and calculated accuracy = 1.0 (100%)

The original user goal was to "train a decision tree classifier and show accuracy". This has been accomplished:

  • The model was trained on the iris training data
  • Accuracy on test set is 1.0 (100%)

The outputs are sufficient to answer the original goal. I should provide Path A - a natural final answer based on observed outputs with <!--final answer--> tag and no Python code block.

The decision tree classifier has been successfully trained and evaluated:

  • Test Accuracy: 1.0 (100%)
  • Training samples: 120
  • Test samples: 30

The model achieved perfect accuracy on the test set, correctly classifying all 30 test samples. This is a strong result on the iris dataset, which is a well-known benchmark classification problem with 3 flower species and 4 numerical features.
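Perfect accuracy on a 30-sample test set can depend on the particular split. A quick cross-validation check (an addition, not part of this run) averages out the luck of any single seed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# 5-fold CV re-trains the tree on five different train/test partitions
scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), iris.data, iris.target, cv=5
)
print(scores.mean(), scores.std())
```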

plot the confusion matrix
Thinking

The user wants to plot a confusion matrix for the decision tree classifier that was already trained. I have the model (dt_classifier), the test data (X_test, y_test), and the predictions (y_pred) from the previous step.

I'll use sklearn's confusion_matrix function and seaborn to create a heatmap visualization of the confusion matrix.

I'll create a confusion matrix visualization to show how well the decision tree classifier performed on each class.

Plot confusion matrix heatmap
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Get class names from iris dataset
iris = load_iris()
class_names = iris.target_names

# Calculate confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Create heatmap visualization
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Decision Tree Confusion Matrix')
plt.show()
Thinking

The user's original goal was to "plot the confusion matrix" for the decision tree classifier. Looking at the notebook state:

  1. Cell-1: Loaded iris dataset and split into train/test
  2. Cell-2: Trained decision tree classifier and got 100% accuracy
  3. Cell-3: Created confusion matrix heatmap visualization

The output from cell-3 shows <Figure size 600x500 with 2 Axes>, which indicates the plot was successfully created. However, this is just a figure object reference - I cannot see the actual visual content or the underlying confusion matrix values.

According to the sufficiency heuristics:

  • "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart."

The confusion matrix values themselves aren't displayed in the output. I should print the actual confusion matrix values to provide concrete insights about the classification performance.

Let me choose Path B and print the confusion matrix values to complement the visualization.

The confusion matrix heatmap has been created successfully. However, I can only see the figure object reference in the output. To provide concrete insights about the classification performance, let me display the actual confusion matrix values.

Notebook code
from sklearn.datasets import load_iris

# Get class names
iris = load_iris()
class_names = iris.target_names

# Print confusion matrix with labels
print("Confusion Matrix Values:")
print(f"Classes: {class_names}")
print(f"\n{cm}")
print(f"\nPerfect classification: All 30 test samples correctly predicted")
Confusion Matrix Values:
Classes: ['setosa' 'versicolor' 'virginica']

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Perfect classification: All 30 test samples correctly predicted
Thinking

The user's original goal was to "plot the confusion matrix". Looking at the notebook state and outputs:

  1. Cell-3 created the confusion matrix heatmap visualization (the actual plot)
  2. Cell-4 printed the confusion matrix values showing perfect classification

The outputs show:

  • A confusion matrix figure was created (cell-3 output: <Figure size 600x500 with 2 Axes>)
  • The actual matrix values are displayed (cell-4 output showing the 3x3 matrix with all predictions correct)

The confusion matrix shows:

  • 10 setosa samples correctly predicted as setosa
  • 9 versicolor samples correctly predicted as versicolor
  • 11 virginica samples correctly predicted as virginica
  • Zero misclassifications across all classes

This means the decision tree achieved 100% accuracy on the test set (30/30 correct predictions).

The original goal was to "plot the confusion matrix" - this has been accomplished. The plot was created in cell-3, and the values were printed in cell-4. Both the visual representation and the underlying numeric values are available.

According to the sufficiency heuristics:

  • "If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A unless there is an explicit error."

Both conditions are met:

  1. Chart/figure object exists (cell-3)
  2. Explicit numeric values are available (cell-4)

So I should choose Path A - provide a natural final answer based on observed outputs, include <!--final answer-->, and do not include any Python code block.

The confusion matrix has been successfully plotted and analyzed. The visualization shows a perfect classification result:

Confusion Matrix Results:

  • Setosa: 10/10 correctly predicted
  • Versicolor: 9/9 correctly predicted
  • Virginica: 11/11 correctly predicted

All 30 test samples were classified correctly with zero misclassifications across all three iris species. This confirms the 100% accuracy score from the decision tree classifier. The diagonal pattern in the matrix (all values on the diagonal, zeros elsewhere) indicates perfect prediction performance on this test set.
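The workflow overview also mentions a classification report, which this run did not print. A sketch producing the per-class precision/recall breakdown, re-creating the run's split and model for self-containment:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Precision, recall, and F1 for each of the three species
report = classification_report(
    y_test, clf.predict(X_test), target_names=iris.target_names
)
print(report)
```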

show feature importances as a bar chart
Thinking

The user wants to see feature importances as a bar chart. From the notebook state, I can see that a DecisionTreeClassifier has been trained (dt_classifier) on the iris dataset. Decision trees have a feature_importances_ attribute that shows how important each feature is for making predictions.

I need to:

  1. Extract the feature importances from the trained model
  2. Get the feature names from the iris dataset
  3. Create a bar chart using seaborn/matplotlib to visualize the importances

Let me create a single code block to do this.

I'll create a bar chart showing the feature importances from the trained decision tree classifier.

Plot decision tree feature importances
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Get feature names and importances
iris = load_iris()
feature_names = iris.feature_names
importances = dt_classifier.feature_importances_

# Create bar chart
plt.figure(figsize=(8, 5))
sns.barplot(x=importances, y=feature_names, palette='viridis')
plt.xlabel('Importance Score')
plt.ylabel('Feature')
plt.title('Decision Tree Feature Importances')
plt.tight_layout()
plt.show()
/var/folders/pp/x90pfm0140d4glxshn77bthh0000gn/T/ipykernel_3173/1524414935.py:12: FutureWarning: 

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.

  sns.barplot(x=importances, y=feature_names, palette='viridis')
Thinking

The user's original goal was to "show feature importances as a bar chart". Looking at the notebook state:

  1. Cell-5 executed a bar chart showing feature importances from the decision tree classifier
  2. Output-5 shows <Figure size 800x500 with 1 Axes> indicating the chart was created successfully

However, according to the sufficiency heuristics:

  • "For ranking tasks (for example top features, feature importance, top-N categories), do not choose Path A from chart-only output; first produce a value table/list via Path B."

The current output only shows a figure object, not the actual numeric values of the feature importances. To properly answer the goal and provide concrete insights, I should print the actual feature importance values as a table/list.

Let me choose Path B and create code to display the actual feature importance values in a readable format.

The feature importance bar chart has been created (cell-5), but the output only shows a figure object without the actual numeric values. For a ranking task like feature importances, I should display the underlying values to provide concrete insights.

Display feature importances
import pandas as pd
from sklearn.datasets import load_iris

# Get feature names and importances
iris = load_iris()
feature_names = iris.feature_names
importances = dt_classifier.feature_importances_

# Create a DataFrame for clear display
importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importances
}).sort_values('Importance', ascending=False)

importance_df
             Feature  Importance
2  petal length (cm)    0.906143
3   petal width (cm)    0.077186
1   sepal width (cm)    0.016670
0  sepal length (cm)    0.000000
Thinking

The user wants to see feature importances as a bar chart. Looking at the notebook state:

  • Cell-5 created a bar chart showing feature importances (output shows a figure was generated)
  • Cell-6 displayed the underlying numeric values in a DataFrame showing:
    • petal length (cm): 0.906143 (most important)
    • petal width (cm): 0.077186
    • sepal width (cm): 0.016670
    • sepal length (cm): 0.000000 (not used by the tree)

The original goal was to "show feature importances as a bar chart". Cell-5 already created the bar chart visualization. Cell-6 supplemented it with the numeric values.

According to the sufficiency heuristics:

  • "If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A"

We have both:

  1. The bar chart (cell-5 output - figure object)
  2. The numeric values table (cell-6 output - explicit importance scores)

This means Path A is appropriate - I should provide a natural final answer summarizing the feature importance findings based on the observed outputs, include <!--final answer-->, and not include any Python code block.

Let me craft a concise response that:

  1. Summarizes the key insights from the feature importance results
  2. References the actual values observed
  3. Marks it as final answer

The feature importance analysis is complete. The decision tree classifier reveals clear insights about which features drive predictions:

Key Findings:

  Feature           | Importance
  petal length (cm) | 0.906
  petal width (cm)  | 0.077
  sepal width (cm)  | 0.017
  sepal length (cm) | 0.000

Insights:

  • Petal length dominates with 90.6% of the importance score, meaning the decision tree relies almost entirely on this single feature for classification
  • Petal width contributes modestly at 7.7%, providing some additional splitting power
  • Sepal width has minimal impact at 1.7%
  • Sepal length is unused (0% importance) - the tree found no useful splits based on this feature

This aligns with the perfect classification accuracy (100% on test set) - the iris species are highly separable using petal measurements alone, particularly petal length. The bar chart in cell-5 visualizes these rankings, and cell-6 confirms the exact numeric values.
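Impurity-based importances from a single tree can overstate features chosen high in the tree. Permutation importance (a complementary check, not part of this run) instead measures the accuracy drop when each feature is shuffled:

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Mean accuracy drop on the test set when each feature is permuted
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=42)
for name, imp in zip(iris.feature_names, result.importances_mean):
    print(f"{name}: {imp:.3f}")
```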

Score cards for qwen3.5:397b for Iris Species Classification with Decision Tree

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All major requested steps were completed: train/test split, decision tree training with accuracy, confusion matrix plot, and feature importances bar chart.

  • Execution Correctness: 2/2

    Excellent

    The code appears runnable and produced the expected outputs, including the split shapes, accuracy, confusion matrix, and feature importance table.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: 80/20 split, accuracy of 1.0, confusion matrix heatmap, and feature importances showing petal length and petal width dominate.

  • Reasoning Quality: 2/2

    Excellent

    The workflow correctly interprets the results and explains them clearly, including why the confusion matrix and feature importances support the model performance.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and grounded in observed outputs, with only a minor seaborn FutureWarning that does not affect the analysis.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.