Train Random Forest classifier

The Python implementation of the Random Forest algorithm in the scikit-learn package is very popular. In this notebook, we will train a Random Forest classifier on the Iris dataset, which is a multiclass classification task.

This notebook was created with MLJAR Studio

MLJAR Studio is a Python code editor with interactive code recipes and a local AI assistant.
The code recipes UI is displayed at the top of code cells.


The outline of the notebook:

  • we load the Iris dataset using the Sample datasets recipe,
  • the DataFrame is split into X and y using the Select X,y recipe,
  • we create a RandomForestClassifier object and train it with the fit() method.
# import packages
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# load example dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/iris/data.csv",
    skipinitialspace=True,
)
# display first rows
df.head()
# create X columns list and set y column
x_cols = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
y_col = "class"
# set input matrix
X = df[x_cols]
# set target vector
y = df[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
# initialize Random Forest
forest = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=42, n_jobs=-1
)
# display model card
forest
# fit model
forest.fit(X, y)
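
The cell above fits the model on all rows, so it does not show how well the model predicts unseen data. A minimal sketch of a hold-out check follows. It is not part of the original notebook: it assumes the copy of Iris bundled with scikit-learn (`load_iris`) instead of the CSV file, so that it runs offline, and the 25% test size is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: use the Iris copy bundled with scikit-learn, not the CSV from the notebook
X, y = load_iris(return_X_y=True, as_frame=True)

# Hold out 25% of the rows (illustrative value); stratify keeps class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Same hyperparameters as in the notebook
forest = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=42, n_jobs=-1
)
forest.fit(X_train, y_train)

# Predict classes for the held-out rows and compare them with the true labels
y_pred = forest.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Hold-out accuracy: {test_accuracy:.3f}")
```

The exact accuracy depends on the split, so treat a single hold-out score as a rough estimate only.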

Conclusions

Python and scikit-learn make Random Forest training easy. You need to prepare the data and split it into X and y. In this notebook, we assumed that the hyperparameter values (for example, the number of trees) were already known. In practice, you will need to tune the Random Forest hyperparameters to get the most accurate model.
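
One possible way to search for hyperparameter values is a cross-validated grid search, sketched below. This is an illustration, not the method used in this notebook: the grid values are hypothetical, and the data comes from scikit-learn's bundled `load_iris` rather than the CSV file.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Assumption: use the Iris copy bundled with scikit-learn, not the CSV from the notebook
X, y = load_iris(return_X_y=True, as_frame=True)

# Hypothetical small grid, chosen only to keep the example fast
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 3, 5],
}

# Evaluate every combination with 5-fold cross-validation, scored by accuracy
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(f"Best parameters: {search.best_params_}")
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds a model refit on all of the data with the best combination found.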


Packages used in the train-random-forest-classifier.ipynb

The following packages need to be installed in your Python environment to run this notebook. Note that MLJAR Studio automatically installs and imports the required modules for you.

pandas>=1.0.0

scikit-learn>=1.5.0
