Train Decision Tree regressor

Train a Decision Tree Regressor using scikit-learn. This machine learning algorithm predicts continuous targets. Use the DecisionTreeRegressor class to model housing data. MEDV is the target variable (house value), and other columns are features.

This notebook was created with MLJAR Studio

MLJAR Studio is a Python code editor with interactive code recipes and a local AI assistant.
The code recipes UI is displayed at the top of code cells.

You don't need to copy and paste package imports manually; MLJAR Studio adds them automatically.

# import packages
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

Load the dataset and display its first 5 rows.

# load example dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/housing/data.csv"
)
# display first rows
df.head()

Create the X and y variables. X is the feature matrix, i.e. the input data. Each row corresponds to an individual data point, and each column corresponds to a feature (variable) describing one aspect of the data points. For instance, in a dataset for predicting house prices, X might include columns for square footage, number of bedrooms, location, etc.

y is the target vector, i.e. the output data. Each element is the target value or label associated with the corresponding row of X. Continuing the house price example, y holds the actual house prices (the MEDV column) that the model aims to predict.

# create X columns list and set y column
x_cols = [
    "CRIM",
    "ZN",
    "INDUS",
    "CHAS",
    "NOX",
    "RM",
    "AGE",
    "DIS",
    "RAD",
    "TAX",
    "PTRATIO",
    "B",
    "LSTAT",
]
y_col = "MEDV"
# set input matrix
X = df[x_cols]
# set target vector
y = df[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")

Create the Decision Tree model, fit it on X and y, and then compute predictions for the same samples.

# initialize Decision Tree
my_tree = DecisionTreeRegressor(criterion="squared_error", random_state=42)
# display model card
my_tree
# fit model
my_tree.fit(X, y)
# compute prediction
predicted = my_tree.predict(X)
print("Predictions")
print(predicted)

Conclusions

In this notebook, we trained a Decision Tree regressor. Please note that this example is greatly simplified:

  • we use the whole dataset for training and then compute predictions on the training samples,
  • we use the default hyperparameters of the Decision Tree algorithm.

This notebook only demonstrates how to train a Decision Tree model on a regression task. Please check the other notebooks to learn how to:

  • tune hyperparameters for Decision Tree,
  • save and load Decision Tree model,
  • visualize Decision Tree model,
  • compute prediction performance with different metrics.

Packages used in the train-decision-tree-regressor.ipynb

The packages listed below need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports the required modules for you.

pandas>=1.0.0

scikit-learn>=1.5.0