Mar 31 2021 · Aleksandra Płońska, Piotr Płoński

The next-generation of AutoML frameworks

Automated Machine Learning (AutoML) is the process of building a complete Machine Learning pipeline automatically, with little or no human help. AutoML solutions are quite new: the first research papers date from 2013 (Auto-Weka), 2015 (Auto-sklearn), and 2016 (TPOT). Today there are several open-source AutoML frameworks and commercial platforms that can work with a variety of data. Open-source solutions worth mentioning include AutoGluon, H2O, and MLJAR AutoML.

The main goal of an AutoML framework has been to find the best possible ML pipeline within a selected time budget. To that end, AutoML frameworks train many different ML algorithms and tune their hyperparameters. Performance can be improved by increasing the number of algorithms and hyperparameter settings that are checked, which means longer computation time. But is the performance of the ML pipeline the only goal of the analysis?
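The trade-off between time budget and search breadth can be illustrated with a minimal sketch. It is not how any particular framework is implemented: it uses plain random search, and `budgeted_search`, `sample_config`, and `evaluate` are hypothetical names with a toy scoring function standing in for real model training.

```python
import random
import time


def budgeted_search(evaluate, sample_config, time_budget_s, seed=0):
    """Random search that stops once the time budget is used up.

    evaluate(config) returns a score where higher is better;
    sample_config(rng) draws one candidate configuration.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_config, best_score, trials = None, float("-inf"), 0
    # Always evaluate at least one candidate, then continue until the deadline.
    while trials == 0 or time.monotonic() < deadline:
        config = sample_config(rng)
        score = evaluate(config)
        trials += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, trials


# Toy stand-ins (hypothetical): two "algorithms", each with one hyperparameter.
def sample_config(rng):
    return {"algorithm": rng.choice(["tree", "linear"]), "depth": rng.randint(1, 10)}


def evaluate(config):
    # Pretend validation score: best at depth 6, with a bonus for "tree".
    bonus = 0.1 if config["algorithm"] == "tree" else 0.0
    return 0.8 + bonus - 0.01 * abs(config["depth"] - 6)


best_config, best_score, trials = budgeted_search(
    evaluate, sample_config, time_budget_s=0.05
)
```

A larger `time_budget_s` lets the loop check more configurations, which is the only way this kind of search can improve its best score.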

The AutoML users

The goal of AutoML analysis depends on the user. Based on the authors’ observations, there are several types of AutoML users:

  • Software engineers understand Machine Learning theory well but lack experience in building ML pipelines. They are looking for AutoML with a simple API that can be easily integrated into their system. They need the best ML pipeline, but under a constraint on the prediction time for a single sample. They don’t want a stacked ensemble with several layers of models that takes a long time to compute a prediction for a single sample. They are looking for an ML model that computes predictions in milliseconds, so that it can be easily used in a REST API service.

  • Researchers and citizen data scientists form the second group of AutoML users. They understand Machine Learning theory well but are not proficient in coding. They are looking for a low-code solution that makes it easy to check many different algorithm combinations. They would like to understand as much as possible about their data: for example, which features are the most important, and whether there are new feature combinations with predictive power (golden features).

  • Seasoned data scientists form the next group of users. They use AutoML for initial data understanding and quick prototyping. They would like to solve business needs quickly. They are looking for unique insights about the data that can be used to improve company operations, so ML explainability is very important to them. They often need a simple Machine Learning model that can be easily interpreted and explained to other stakeholders (for example, a single Decision Tree with extracted text rules).
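The prediction-time constraint described for software engineers can be checked with a few lines of timing code. The sketch below is only an illustration: `single_sample_latency_ms`, `fast_enough`, and `toy_predict` are hypothetical names, the toy function stands in for a trained model, and the 500 ms limit is just an example value.

```python
import statistics
import time


def single_sample_latency_ms(predict, sample, repeats=50):
    """Median wall-clock time, in milliseconds, of predict(sample)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fast_enough(predict, sample, limit_ms):
    """True when the median single-sample latency stays under limit_ms."""
    return single_sample_latency_ms(predict, sample) < limit_ms


# Hypothetical stand-in for a trained model: a weighted sum of the features.
def toy_predict(sample):
    weights = [0.4, -1.2, 0.7]
    return sum(w * x for w, x in zip(weights, sample))


latency = single_sample_latency_ms(toy_predict, [1.0, 2.0, 3.0])
ok = fast_enough(toy_predict, [1.0, 2.0, 3.0], limit_ms=500)
```

Using the median rather than the mean keeps a single slow call (for example, a cold cache) from dominating the estimate.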

Machine Learning Pipeline

The next-generation of AutoML

AutoML needs to be multipurpose to meet most users’ needs. That’s why AutoML frameworks often introduce switchable modes of work. We will show how this works in MLJAR AutoML (the authors contribute to this open-source project). MLJAR AutoML can work in four modes:

  • Explain mode is perfect for initial data understanding. It splits the data into 80/20 train and test sets and trains algorithms such as Baseline, Decision Tree, Linear, Random Forest, Xgboost, Neural Network, and Ensemble. It doesn’t tune hyperparameters; it uses their default values. In this mode, full explanations are computed for the models: decision tree visualization, decision tree rule extraction, feature importance, SHAP dependence plots, and SHAP decision plots.
  • Perform mode is for training production-ready ML pipelines. It uses 5-fold cross-validation and tunes algorithms such as Random Forest, Xgboost, LightGBM, CatBoost, Neural Network, and Ensemble. It searches for an ML pipeline whose prediction time for a single sample is under 500 milliseconds (the limit can be set as an argument).
  • Compete mode is for finding the best-performing ML models within a selected time budget. It uses an adjusted validation strategy that depends on the dataset size, and it tunes algorithms heavily. Additionally, it applies advanced feature engineering techniques such as golden features, feature selection, and k-means features.
  • Optuna mode is for tuning ML models without a hard time limit on AutoML training. It uses the Optuna framework for hyperparameter tuning. It provides well-performing models, but the computation time can be long.

from supervised.automl import AutoML

# X is the feature table and y is the target column, loaded beforehand

# training in Explain mode, perfect for data exploration
automl = AutoML(mode="Explain").fit(X, y)

# training in Perform mode, perfect for production-ready systems
# AutoML trained for 4 hours
automl = AutoML(mode="Perform", total_time_limit=4*3600).fit(X, y)

# training in Compete mode, perfect for high performing ML pipelines
automl = AutoML(mode="Compete", total_time_limit=4*3600).fit(X, y)

# training in Optuna mode, highly tune selected algorithms
automl = AutoML(
    mode="Optuna",
    algorithms=["CatBoost", "LightGBM", "Xgboost"],
).fit(X, y)
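The two validation strategies named in the mode descriptions, an 80/20 train/test split and 5-fold cross-validation, can be sketched in plain Python. This is a simplified illustration and not MLJAR's internal code: the helper names are hypothetical, and details such as stratification by target class are left out.

```python
import random


def holdout_split(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_rows, k=5, seed=0):
    """Yield (train, validation) index lists; each row is validated exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = holdout_split(100)       # 80 train rows, 20 test rows
folds = list(k_fold_splits(100, k=5))          # 5 pairs of (80 train, 20 validation)
```

The hold-out split trains each model once, while k-fold trains it k times, which gives a more stable performance estimate at roughly k times the cost.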

The Automatic Documentation and Notebook support

AutoML cannot be a black box. Users want to check the internals of each trained ML configuration in the pipeline. In MLJAR AutoML, each trained model has its own Markdown file with documentation. The Markdown files can be displayed in a code IDE such as Visual Studio Code or in code repositories such as GitHub.

What is more, the Markdown documentation can easily be converted into raw HTML and displayed in a Python notebook as an interactive report (example notebook with AutoML report -> link).

The AutoML framework comparison on Kaggle datasets

We compared AutoML frameworks on 10 tabular datasets from Kaggle’s past competitions. The datasets used in the comparison are described in the table below.

| Competition | Task | Metric | Year | Teams | Rows | Columns |
|---|---|---|---|---|---|---|
| Mercedes-Benz Greener Manufacturing | regression | R2 | 2017 | 3,823 | 4,209 | 377 |
| Santander Value Prediction Challenge | regression | RMSLE | 2019 | 4,463 | 4,459 | 4,992 |
| Allstate Claims Severity | regression | MAE | 2017 | 3,045 | 180,000 | 131 |
| BNP-Paribas Cardif Claims Management | binary | log-loss | 2016 | 2,920 | 110,000 | 132 |
| Santander Customer Transaction Prediction | binary | AUC | 2019 | 8,751 | 220,000 | 201 |
| Santander Customer Satisfaction | binary | AUC | 2016 | 5,115 | 76,000 | 370 |
| Porto Seguro’s Safe Driver Prediction | binary | Gini | 2018 | 5,156 | 600,000 | 58 |
| IEEE Fraud Detection | binary | AUC | 2019 | 6,351 | 590,000 | 432 |
| Walmart Recruiting Trip Type Classification | multi-class | log-loss | 2016 | 1,043 | 650,000 | 7 |
| Otto Group Product Classification Challenge | multi-class | log-loss | 2015 | 3,507 | 62,000 | 94 |

Each AutoML framework was trained for 4 hours on an m5.24xlarge EC2 machine (96 CPUs and 384 GB RAM). The AutoML predictions on test data were evaluated on the private leaderboard by Kaggle’s internal scoring system. Below we report the Percentile Rank on the private leaderboard. Better solutions have higher Percentile Rank values; the first-place solution in a competition gets Percentile Rank = 1. The results for Auto-WEKA, Auto-Sklearn, TPOT, H2O, GCP-Tables, and AutoGluon presented here are taken from the AutoGluon article.

| Dataset | Auto-WEKA | auto-sklearn | TPOT | H2O AutoML | GCP-Tables | AutoGluon | MLJAR |
|---|---|---|---|---|---|---|---|
| ieee-fraud | 0.119 | 0.349 | x | x | 0.119 | 0.322 | 0.172 |
| value | 0.114 | 0.319 | 0.325 | 0.377 | x | 0.415 | 0.445 |
| walmart | x | 0.39 | 0.379 | x | 0.398 | 0.384 | 0.423 |
| transaction | 0.131 | 0.329 | 0.326 | x | 0.404 | 0.406 | 0.463 |
| porto | 0.158 | 0.331 | 0.315 | 0.406 | 0.434 | 0.462 | 0.54 |
| allstate | 0.124 | 0.31 | 0.237 | 0.352 | 0.74 | 0.706 | 0.764 |
| mercedes | 0.16 | 0.444 | 0.547 | 0.363 | 0.658 | 0.169 | 0.879 |
| otto | 0.145 | 0.717 | 0.597 | 0.729 | 0.821 | 0.988 | 0.924 |
| satisfaction | 0.235 | 0.408 | 0.495 | 0.74 | 0.763 | 0.823 | 0.975 |
| bnp-paribas | 0.193 | 0.412 | 0.46 | 0.417 | 0.44 | 0.986 | 0.986 |
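
The text does not give the exact formula behind the Percentile Rank values in the table. One plausible definition, consistent with first place scoring 1 and higher values being better, is sketched below. The linear mapping and the function name `percentile_rank` are assumptions made for illustration, so the numbers it produces need not match the table exactly.

```python
def percentile_rank(rank, n_teams):
    """Map a leaderboard position (1 = best) to a value in [0, 1]; higher is better.

    Assumed definition: positions are spread linearly, so first place maps
    to 1.0 and last place maps to 0.0.
    """
    if not 1 <= rank <= n_teams:
        raise ValueError("rank must be between 1 and n_teams")
    if n_teams == 1:
        return 1.0
    return 1.0 - (rank - 1) / (n_teams - 1)


# Example with the 3,823 teams of the Mercedes-Benz competition.
first = percentile_rank(1, 3823)
middle = percentile_rank(1912, 3823)
last = percentile_rank(3823, 3823)
```

Under this definition, a value of 0.879 would mean the submission placed ahead of roughly 88% of the other teams.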


AutoML frameworks are getting better every day, and the constant improvement is not only in their performance. The next generation of AutoML can provide insights and explanations about the analyzed data, create new features, and generate documentation (with native notebook support). AutoML solutions are multipurpose and are used by different groups of users: seasoned data scientists, citizen data scientists, and software engineers.


  1. (Auto-Weka) C. Thornton, et al., Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms, 2013
  2. (Auto-Sklearn) M. Feurer, et al., Efficient and Robust Automated Machine Learning, 2015
  3. (TPOT) R. S. Olson, et al., Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Proceedings of GECCO 2016
  4. (AutoGluon) N. Erickson, et al., AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data, 2020
  5. (H2O)
  6. (MLJAR AutoML)