Train Random Forest classifier
The Python implementation of the Random Forest algorithm available in the scikit-learn package is very popular. In this notebook, we will train a Random Forest classifier. We will use the Iris dataset, which is a multiclass classification task.
MLJAR Studio is a Python code editor with interactive code recipes and a local AI assistant.
The code recipes UI is displayed at the top of code cells.
The outline of the notebook:
- we load the Iris dataset using the Sample datasets recipe,
- the DataFrame is split into X and y using the Select X,y recipe,
- we create a RandomForestClassifier object and train it with the fit() function.
# import packages
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# load example dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/iris/data.csv",
    skipinitialspace=True,
)
# display first rows
df.head()
# create X columns list and set y column
x_cols = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
y_col = "class"
# set input matrix
X = df[x_cols]
# set target vector
y = df[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
# initialize Random Forest
forest = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=42, n_jobs=-1
)
# display model card
forest
# fit model
forest.fit(X, y)
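Once fit() has finished, the forest can be used to make predictions. The sketch below is a minimal illustration, not part of the original notebook: it assumes the copy of Iris bundled with scikit-learn (load_iris) instead of the CSV file, so that it runs without network access, and the names X_demo, y_demo and forest_demo are hypothetical. The accuracy it prints is computed on the training data, so it is optimistic.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# assumption: bundled Iris copy stands in for the CSV used in the notebook
iris = load_iris(as_frame=True)
X_demo = iris.data
y_demo = iris.target

# same hyperparameters as in the notebook
forest_demo = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=42, n_jobs=-1
)
forest_demo.fit(X_demo, y_demo)

# predicted class labels for the first rows
preds = forest_demo.predict(X_demo.head())
# class probabilities, one column per class
proba = forest_demo.predict_proba(X_demo.head())
# accuracy measured on the training data (optimistic estimate)
train_acc = forest_demo.score(X_demo, y_demo)

print(preds)
print(proba.shape)
print(f"training accuracy: {train_acc:.3f}")
```

For an honest estimate of accuracy, the score should be measured on rows that were not used in fit().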
Conclusions
Python and scikit-learn make Random Forest training really easy. You need to prepare the data and split it into X and y. In this notebook, we assumed that the hyperparameter values (for example, the number of trees) were known. In real life, you will need to tune the Random Forest hyperparameters to get the most accurate model.
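One common way to search for hyperparameter values is a cross-validated grid search. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's GridSearchCV, the bundled load_iris copy of the dataset, and a small hypothetical grid over n_estimators and max_depth. The grid values are placeholders, not recommendations from this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# assumption: bundled Iris copy stands in for the CSV used in the notebook
X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)

# hypothetical grid, chosen only to keep the example fast
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
}

# 5-fold cross-validation for every combination in the grid
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(f"mean cross-validated accuracy: {search.best_score_:.3f}")
```

After the search, search.best_estimator_ holds a forest refitted on all rows with the best combination found.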
Packages used in the train-random-forest-classifier.ipynb
List of packages that need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports required modules for you.
pandas>=1.0.0
scikit-learn>=1.5.0