Train Decision Tree classifier

Classification is the task of predicting discrete target labels. The Python scikit-learn package provides an implementation of the Decision Tree algorithm for classification, the DecisionTreeClassifier class. We will train a Decision Tree model on the Iris dataset.

This notebook was created with MLJAR Studio

MLJAR Studio is a Python code editor with interactive code recipes and a local AI assistant.
The code recipes UI is displayed at the top of code cells.

The dataset used in this notebook describes the properties of iris flowers. The class column is the target label, and the remaining columns are the flower features.

All needed packages are automatically imported by MLJAR Studio :)

Please note that each code recipe has a side note on the left with the cookbook name, for easier navigation. You can click it to open the documentation.

# import packages
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

Load the sample dataset, then split it into X and y variables.

# load example dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/iris/data.csv",
    skipinitialspace=True,
)
# display first rows
df.head()
# create X columns list and set y column
x_cols = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
y_col = "class"
# set input matrix
X = df[x_cols]
# set target vector
y = df[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
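
Selecting columns with df[x_cols] raises a KeyError when any name is misspelled or missing from the file. The sketch below shows one possible guard that checks the column names before splitting. The helper name split_features_target and the two-row toy DataFrame are hypothetical and only illustrate the idea; they are not part of the original notebook.

```python
import pandas as pd


def split_features_target(df, feature_cols, target_col):
    """Return (X, y) after checking that every requested column exists."""
    required = list(feature_cols) + [target_col]
    missing = [col for col in required if col not in df.columns]
    if missing:
        # fail early with a message that names the offending columns
        raise KeyError(f"columns not found in DataFrame: {missing}")
    return df[list(feature_cols)], df[target_col]


# tiny hypothetical frame standing in for the Iris CSV
toy = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 7.0],
        "petal width (cm)": [0.2, 1.4],
        "class": ["setosa", "versicolor"],
    }
)

X_toy, y_toy = split_features_target(
    toy, ["sepal length (cm)", "petal width (cm)"], "class"
)
print(f"X shape is {X_toy.shape}")
print(f"y shape is {y_toy.shape}")
```

Calling the helper with a misspelled column name raises a KeyError that lists the missing names, which is usually easier to read than the default pandas message.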

Create a Decision Tree object with the DecisionTreeClassifier class.

# initialize Decision Tree
my_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
# display model card
my_tree
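
The criterion="gini" argument tells the tree to score candidate splits by Gini impurity, defined for a set of labels as 1 minus the sum of squared class proportions. The pure-Python sketch below illustrates that formula on a hypothetical toy label list. The function names gini_impurity and weighted_split_impurity are made up for this illustration, and the code is a simplified picture of the idea, not scikit-learn's internal implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# hypothetical toy labels: three samples of each of two classes
parent = ["setosa"] * 3 + ["versicolor"] * 3

# a split that separates the classes perfectly
pure_split = weighted_split_impurity(parent[:3], parent[3:])
# a split that leaves both children mixed
mixed_split = weighted_split_impurity(parent[::2], parent[1::2])

print(f"parent impurity: {gini_impurity(parent)}")
print(f"pure split impurity: {pure_split}")
print(f"mixed split impurity: {mixed_split:.3f}")
```

A lower weighted impurity means purer child nodes, so under this criterion the split that separates the two classes is preferred over the mixed one.
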
# fit model
my_tree.fit(X, y)
# compute prediction
predicted = my_tree.predict(X)
print("Predictions")
print(predicted)

# predict class probabilities
predicted_proba = my_tree.predict_proba(X)
print("Predicted class probabilities")
print(predicted_proba)
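
Each row of the predict_proba output holds one probability per class, in the order given by the fitted model's classes_ attribute, and predict returns the class with the highest probability. The sketch below checks that relationship. To stay runnable offline, it assumes the copy of Iris bundled with scikit-learn (sklearn.datasets.load_iris) instead of the CSV file used above, so the label values differ from those in the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# assumption: bundled Iris data, used here instead of the notebook's CSV file
X_demo, y_demo = load_iris(return_X_y=True)

demo_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
demo_tree.fit(X_demo, y_demo)

proba = demo_tree.predict_proba(X_demo)
predicted_demo = demo_tree.predict(X_demo)

# one column per class, ordered as in demo_tree.classes_
print(f"classes: {demo_tree.classes_}")
print(f"probability matrix shape: {proba.shape}")

# every row is a probability distribution over the classes
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# the predicted label is the class with the highest probability
labels_from_proba = demo_tree.classes_[np.argmax(proba, axis=1)]
matches_predict = bool((labels_from_proba == predicted_demo).all())

print(f"rows sum to one: {rows_sum_to_one}")
print(f"argmax matches predict: {matches_predict}")
```

Note that both this sketch and the notebook predict on the same rows the model was trained on, so the probabilities describe the training data and say little about performance on unseen samples.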

Conclusions

In this notebook, we trained a Decision Tree classifier on the Iris dataset. This notebook serves solely to demonstrate how to train a Decision Tree model for a classification task. For more advanced topics, please refer to other notebooks to learn how to:

  • tune hyperparameters for the Decision Tree,
  • save and load the Decision Tree model,
  • visualize the Decision Tree model,
  • evaluate prediction performance using different metrics.

Packages used in the train-decision-tree-classifier.ipynb

The following packages need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports the required modules for you.

pandas>=1.0.0

scikit-learn>=1.5.0