Train Decision Tree regressor

Regression is the task of predicting a continuous target. The scikit-learn package provides an implementation of the Decision Tree algorithm for regression in the DecisionTreeRegressor class. We will train a Decision Tree model on the Housing dataset, which describes house properties together with their values. The MEDV column is the value of the house, and the remaining columns are house features.
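To make "predicting a continuous target" concrete for a tree, here is a toy sketch with made-up numbers (not part of the Housing data). A single-split tree (max_depth=1) divides the samples into two leaves, and each leaf predicts the mean target of the training samples that fall into it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data: one feature, two clearly separated groups of samples
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_toy = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])

# A depth-1 tree (a "stump") is allowed exactly one split
stump = DecisionTreeRegressor(max_depth=1, random_state=42)
stump.fit(X_toy, y_toy)

# Each leaf predicts the mean target of its training samples:
# 6.0 for the left group (5, 6, 7) and 21.0 for the right group (20, 21, 22)
print(stump.predict([[2.5], [11.5]]))
```

A deeper tree repeats the same idea recursively, so its predictions form a piecewise-constant function of the features.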

# import packages
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
# load example dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/housing/data.csv"
)
# display first rows
df.head()
# list the feature (X) columns and set the target (y) column
x_cols = [
    "CRIM",
    "ZN",
    "INDUS",
    "CHAS",
    "NOX",
    "RM",
    "AGE",
    "DIS",
    "RAD",
    "TAX",
    "PTRATIO",
    "B",
    "LSTAT",
]
y_col = "MEDV"
# set input matrix
X = df[x_cols]
# set target vector
y = df[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
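Listing the feature columns by hand documents exactly which inputs the model uses. When every column except the target is a feature, the same X and y can also be built by dropping the target column. The tiny DataFrame below is made up (a few Housing column names with arbitrary values) so that the sketch runs without downloading the CSV.

```python
import pandas as pd

# Made-up rows that reuse a few Housing column names; the values are arbitrary
df_toy = pd.DataFrame(
    {
        "CRIM": [0.1, 0.2, 0.3],
        "RM": [6.0, 6.5, 7.0],
        "LSTAT": [5.0, 9.0, 4.0],
        "MEDV": [20.0, 25.0, 30.0],
    }
)
y_col = "MEDV"
# Every column except the target becomes a feature
X_toy = df_toy.drop(columns=[y_col])
y_toy = df_toy[y_col]
print(list(X_toy.columns))  # ['CRIM', 'RM', 'LSTAT']
print(X_toy.shape, y_toy.shape)  # (3, 3) (3,)
```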

Create the Decision Tree model and then fit it on the X, y data.

# initialize Decision Tree
my_tree = DecisionTreeRegressor(criterion="squared_error", random_state=42)
# display model card
my_tree
# fit model
my_tree.fit(X, y)
# compute predictions on the training data
predicted = my_tree.predict(X)
print("Predictions")
print(predicted)

Conclusions

In this notebook, we trained a Decision Tree regressor. Please note that this example is greatly simplified:

  • we used the whole dataset for training and then computed predictions on the same training samples, so these predictions do not show how the model performs on unseen data,
  • we used the default hyperparameters for the Decision Tree algorithm.

This notebook only demonstrates how to train a Decision Tree model on a regression task. Please check other notebooks to learn how to:

  • tune Decision Tree hyperparameters,
  • save and load a Decision Tree model,
  • visualize a Decision Tree model,
  • evaluate prediction performance with different metrics.
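As a rough preview, and not necessarily the approach taken in those notebooks, the sketch below touches three of these topics. It runs a grid search over max_depth and min_samples_leaf with GridSearchCV (the grid values are arbitrary), measures mean absolute error on a held-out set, and saves and reloads the best model with joblib, which is installed as a scikit-learn dependency. The synthetic make_regression data and the file name decision_tree_regressor.joblib are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Housing data (assumption)
X, y = make_regression(n_samples=300, n_features=5, n_informative=3, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Tune: 5-fold cross-validated search over an arbitrary, small grid
param_grid = {"max_depth": [2, 4, 6, 8, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X_train, y_train)
best_tree = search.best_estimator_  # refit on all of X_train with the best parameters
print("Best hyperparameters:", search.best_params_)

# Evaluate on rows that were not used for tuning or fitting
test_mae = mean_absolute_error(y_test, best_tree.predict(X_test))
print(f"Test MAE: {test_mae:.2f}")

# Save and load (the file name is hypothetical; a temporary directory keeps it tidy)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "decision_tree_regressor.joblib")
    joblib.dump(best_tree, path)
    loaded_tree = joblib.load(path)
same = bool((loaded_tree.predict(X_test) == best_tree.predict(X_test)).all())
print("Loaded model gives identical predictions:", same)
```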

Packages used in the train-decision-tree-regressor.ipynb

The following packages need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports the required modules for you.

pandas>=1.0.0

scikit-learn>=1.5.0