Automated Machine Learning
K-Means Features

Use the K-Means clustering algorithm to enrich your data
and improve your model's performance

K-Means Features Procedure Overview

AutoML computes K-Means centers from the numeric features, applying scaling first if needed.
For each sample, the distances to the K-Means centers and the number of the assigned center are added as new features.
The number of clusters is determined from the number of rows and columns in the training data.
We use the Mini-Batch K-Means implementation available in the scikit-learn package.
For details, please check the source code on GitHub.
K-Means Features Generation Overview
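
The procedure described above can be approximated with a short sketch. This is not the AutoML source code, only an illustration built on scikit-learn's `MiniBatchKMeans` and `StandardScaler`. The helper names `pick_n_clusters` and `add_kmeans_features` are hypothetical, and the rule used to pick the number of clusters is an assumed heuristic based on row and column counts; the rule AutoML actually uses is in the GitHub source.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler


def pick_n_clusters(n_rows, n_cols):
    """Hypothetical heuristic (an assumption, not the AutoML rule):
    grow slowly with the row count, cap by the column and row counts."""
    by_rows = int(np.log10(max(n_rows, 10)) * 8)
    return max(2, min(by_rows, n_cols, n_rows))


def add_kmeans_features(X_train, X_new, random_state=0):
    """Fit scaling and Mini-Batch K-Means on X_train, then append
    distance-to-center columns and the assigned center number to X_new."""
    scaler = StandardScaler().fit(X_train)
    n_clusters = pick_n_clusters(*X_train.shape)
    kmeans = MiniBatchKMeans(
        n_clusters=n_clusters, n_init=3, random_state=random_state
    )
    kmeans.fit(scaler.transform(X_train))

    X_scaled = scaler.transform(X_new)
    distances = kmeans.transform(X_scaled)            # (n_samples, n_clusters)
    labels = kmeans.predict(X_scaled).reshape(-1, 1)  # (n_samples, 1)
    return np.hstack([X_new, distances, labels])


# Usage on synthetic numeric data: 4 original columns plus the new features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X_extended = add_kmeans_features(X, X)
print(X_extended.shape)
```

The scaler and the clustering model are fitted on the training data only and then reused for new samples, so the added columns are computed the same way at training and prediction time.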

Advantages of K-Means Features


Fast

The feature generation step is very fast thanks to the Mini-Batch version of the K-Means algorithm.

Automatic

The number of clusters for K-Means is selected automatically based on the properties of the training data.

Accurate

Improve the accuracy of your Machine Learning pipeline by including K-Means features in the data.

Check out more feature engineering methods

Golden Features Search

Features Preprocessing

Features Selection