May 14 2019 · Piotr Płoński

List of 12 AutoML software packages and services

Automated Machine Learning is the end-to-end process of applying machine learning in an automatic way.


The full AutoML pipeline usually consists of:

  • data pre-processing,
  • feature engineering,
  • feature extraction,
  • feature selection,
  • model training,
  • algorithm selection,
  • hyperparameter optimization.

The outlined steps can be very time-consuming, and many ML algorithms can be applied at each step of the analysis. Building an ML pipeline manually is difficult because ML algorithms differ in data formats, interfaces and computational intensity. Automated Machine Learning solutions aim to solve this problem by automatically checking different combinations of ML algorithms. The AutoML process itself is controlled by a statistical or machine learning algorithm, for example Bayesian optimization (auto-sklearn) or genetic programming (TPOT).
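The core idea of "automatically checking different combinations" fits in a few lines. The sketch below is a minimal illustration under simplifying assumptions, not how any of the listed tools work internally. It uses a hypothetical toy data set, two pre-processing options (none, min-max scaling), a majority-class baseline and k-NN with a few values of `k`. It scores every combination on a hold-out validation split and keeps the best one.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical toy data: feature 0 carries the label signal on a small scale,
# feature 1 is pure noise on a large scale (so scaling can matter for k-NN).
rng = random.Random(0)
X = [[(i % 2) + rng.uniform(0.0, 0.6), rng.uniform(0.0, 300.0)] for i in range(40)]
y = [i % 2 for i in range(40)]

# Pre-processing options: each fit function learns from the training rows
# and returns a transform that can be applied to any rows.
def fit_identity(X_train):
    return lambda rows: [list(r) for r in rows]

def fit_minmax(X_train):
    low = [min(col) for col in zip(*X_train)]
    high = [max(col) for col in zip(*X_train)]
    return lambda rows: [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(r, low, high)]
        for r in rows
    ]

# Algorithms: each fit function returns a predict function.
def fit_majority(X_train, y_train):
    label = Counter(y_train).most_common(1)[0][0]
    return lambda rows: [label] * len(rows)

def fit_knn(X_train, y_train, k):
    def predict(rows):
        preds = []
        for r in rows:
            nearest = sorted(range(len(X_train)), key=lambda i: math.dist(r, X_train[i]))[:k]
            preds.append(Counter(y_train[i] for i in nearest).most_common(1)[0][0])
        return preds
    return predict

PREPROCESSORS = {"none": fit_identity, "minmax": fit_minmax}
ALGORITHMS = [
    ("majority", fit_majority, {}),
    ("knn", fit_knn, {"k": 1}),
    ("knn", fit_knn, {"k": 5}),
    ("knn", fit_knn, {"k": 9}),
]

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def search(X_train, y_train, X_valid, y_valid):
    """Score every (pre-processing, algorithm, hyperparameters) combination."""
    results = []
    for (prep_name, fit_prep), (algo_name, fit_algo, params) in itertools.product(
        PREPROCESSORS.items(), ALGORITHMS
    ):
        transform = fit_prep(X_train)  # fit pre-processing on training data only
        model = fit_algo(transform(X_train), y_train, **params)
        score = accuracy(y_valid, model(transform(X_valid)))
        results.append((score, prep_name, algo_name, params))
    best = max(results, key=lambda r: r[0])
    return best, results

best, results = search(X[:30], y[:30], X[30:], y[30:])
print("best:", best)
```

Exhaustive enumeration only works for tiny search spaces. The tools below replace this loop with smarter search strategies and add feature engineering, time limits and ensembling.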

Open source AutoML (in alphabetical order)

Auto-Keras

  • Auto-Keras provides automated architecture and hyperparameter search for deep learning models.
  • Main authors: DATA Lab at Texas A&M University
  • ML task: image classification
  • Usage: Python package
  • Language: Python
  • Code on github: https://github.com/keras-team/autokeras

auto-sklearn

  • Automated machine learning toolkit and drop-in replacement for a scikit-learn estimator. Auto-sklearn uses Bayesian optimization, meta-learning and ensemble construction. The package was presented at NIPS 2015.
  • Main authors: M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, F. Hutter
  • ML tasks: binary classification, multiclass classification, regression
  • Usage: Python package
  • Language: Python
  • Code on github: https://github.com/automl/auto-sklearn
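The auto-sklearn paper describes its ensemble construction as greedy ensemble selection. It starts from an empty ensemble and repeatedly adds, with replacement, the model that most improves validation performance. Below is a minimal sketch of that idea. It assumes binary classification, uses the Brier score as the validation loss, and works on hypothetical validation predictions. auto-sklearn's actual implementation, metrics and defaults differ.

```python
# Brier score: mean squared error between predicted P(y=1) and the 0/1 label
def brier(y_true, probs):
    return sum((p - t) ** 2 for p, t in zip(probs, y_true)) / len(y_true)

def ensemble_selection(predictions, y_true, n_rounds):
    """Greedy ensemble selection with replacement.

    predictions: dict mapping model name -> validation-set P(y=1) values.
    Returns a dict mapping model name -> ensemble weight.
    """
    chosen = []
    running_sum = [0.0] * len(y_true)
    for _ in range(n_rounds):
        best_name, best_loss = None, float("inf")
        for name, probs in predictions.items():
            # Score the ensemble as it would look with this model added once more
            candidate = [(s + p) / (len(chosen) + 1) for s, p in zip(running_sum, probs)]
            loss = brier(y_true, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, predictions[best_name])]
    # A model picked several times gets a proportionally larger weight
    return {name: chosen.count(name) / n_rounds for name in dict.fromkeys(chosen)}

def ensemble_predict(predictions, weights):
    n = len(next(iter(predictions.values())))
    return [sum(w * predictions[name][i] for name, w in weights.items()) for i in range(n)]

# Hypothetical validation predictions from two models found during the search
y_valid = [1, 1, 0, 0]
predictions = {
    "model_a": [0.9, 0.9, 0.6, 0.6],  # good on positives, poor on negatives
    "model_b": [0.5, 0.5, 0.1, 0.1],  # poor on positives, good on negatives
}
weights = ensemble_selection(predictions, y_valid, n_rounds=3)
print(weights)
```

Here the procedure picks `model_b` twice and `model_a` once. The weighted ensemble has a lower validation Brier score than either model alone, which is why combining models found during the search often beats keeping only the single best one.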

automl-gs

  • Automl-gs generates raw Python code using Jinja templates and trains a model using the generated code. It provides data preprocessing and hyperparameter tuning. It uses neural networks (TensorFlow) and XGBoost.
  • Main author: Max Woolf
  • ML tasks: binary classification, multiclass classification, regression
  • Usage: Python package, command line
  • Language: Python
  • Code on github: https://github.com/minimaxir/automl-gs

Auto-Weka

  • Automated Machine Learning with WEKA
  • Main authors: Lars Kotthoff, Chris Thornton, Frank Hutter, Holger Hoos, and Kevin Leyton-Brown.
  • ML tasks: binary classification, multiclass classification, regression
  • Usage: User Interface
  • Language: Java
  • Code on github: https://github.com/automl/autoweka

Featuretools

  • Featuretools uses Deep Feature Synthesis to perform automated feature engineering on relational and temporal data.
  • Main authors: James Max Kanter and Kalyan Veeramachaneni
  • ML task: feature engineering
  • Usage: Python package
  • Language: Python
  • Code on github: https://github.com/Featuretools/featuretools
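Deep Feature Synthesis builds features by applying primitives, such as the aggregations SUM or MEAN, across relationships between tables, and can stack them to greater depth. Below is a minimal depth-1 sketch of the aggregation step in plain Python, using hypothetical `customers` and `transactions` tables. It is not the Featuretools API, which works on pandas data frames and supports many more primitives and deeper stacking. The helper `synthesize_features` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical relational data: a parent table and a child table linked by customer_id
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
transactions = [
    {"transaction_id": 10, "customer_id": 1, "amount": 20.0},
    {"transaction_id": 11, "customer_id": 1, "amount": 35.0},
    {"transaction_id": 12, "customer_id": 2, "amount": 5.0},
    {"transaction_id": 13, "customer_id": 1, "amount": 15.0},
    {"transaction_id": 14, "customer_id": 2, "amount": 10.0},
]

# Aggregation primitives: each turns the child values of one parent into a number.
# None marks a value that is undefined because the parent has no children.
AGGREGATIONS = {
    "COUNT": len,
    "SUM": sum,
    "MEAN": lambda values: mean(values) if values else None,
    "MAX": lambda values: max(values) if values else None,
}

def synthesize_features(parents, children, key, child_name, numeric_columns):
    """Depth-1 feature synthesis: aggregate child rows into one row per parent."""
    groups = defaultdict(list)
    for child in children:
        groups[child[key]].append(child)
    feature_matrix = []
    for parent in parents:
        related = groups[parent[key]]
        row = {key: parent[key]}
        for column in numeric_columns:
            values = [c[column] for c in related]
            for prim_name, prim in AGGREGATIONS.items():
                row[f"{prim_name}({child_name}.{column})"] = prim(values)
        feature_matrix.append(row)
    return feature_matrix

features = synthesize_features(customers, transactions, "customer_id", "transactions", ["amount"])
for row in features:
    print(row)
```

Each parent row ends up with one column per (primitive, child column) pair, named with a `PRIMITIVE(table.column)` pattern for readability. The resulting feature matrix can be fed to any of the model-training tools in this list.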

H2O AutoML

  • H2O AutoML provides automated feature preprocessing, machine learning model tuning and training.
  • Main authors: H2O.ai
  • ML tasks: binary classification, multiclass classification, regression
  • Usage: Python or R package
  • Language: Java
  • Code on github: https://github.com/h2oai/h2o-3

Ludwig

  • Ludwig is a toolbox built on top of TensorFlow that lets users train and test deep learning models without writing code.
  • Main authors: Uber
  • ML tasks: all (tabular data classification, regression, image recognition, NLP, time series)
  • Usage: Python package, command line scripts
  • Language: Python
  • Code on github: https://github.com/uber/ludwig

mljar-supervised

  • mljar-supervised provides automated feature preprocessing, machine learning model tuning and training
  • Main authors: Piotr Płoński (MLJAR, Inc.)
  • ML tasks: binary classification (multiclass classification, regression, anomaly detection and time series are work in progress)
  • Usage: Python package
  • Language: Python
  • Code on github: https://github.com/mljar/mljar-supervised

Neural Network Intelligence (NNI)

  • AutoML toolkit for neural architecture search and hyperparameter tuning. It helps you train neural networks locally or remotely.
  • Main authors: Microsoft Research (MSR)
  • ML tasks: all (you define the neural network architecture; NNI helps you tune it and train it locally or in the cloud)
  • Usage: Python package, command line
  • Language: Python
  • Code on github: https://github.com/microsoft/nni

TPOT

  • AutoML tool that optimizes machine learning pipelines using genetic programming
  • Main authors: Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore
  • ML tasks: binary classification, multiclass classification and regression
  • Usage: Python package
  • Language: Python
  • Code on github: https://github.com/EpistasisLab/tpot
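TPOT evolves tree-shaped scikit-learn pipelines with genetic programming. The sketch below shows only the evolutionary loop (selection, crossover, mutation) on a simplified, hypothetical fixed-length pipeline encoding. That makes it a plain genetic algorithm rather than tree-based genetic programming. It also uses a hypothetical fitness formula in place of the cross-validated scores a real tool would compute by training each pipeline.

```python
import random

# Hypothetical fixed-length encoding of a pipeline: one gene per decision
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["logistic_regression", "decision_tree", "knn"],
    "complexity": [1, 3, 5, 7, 9],  # e.g. tree depth or k, depending on the model
}

def fitness(pipeline):
    # Hypothetical stand-in for cross-validated accuracy; a real tool would
    # fit and score the pipeline on data here instead of using a formula.
    score = {"none": 0.0, "standard": 0.05, "minmax": 0.04}[pipeline["scaler"]]
    score += {"logistic_regression": 0.70, "decision_tree": 0.72, "knn": 0.68}[pipeline["model"]]
    return score - 0.01 * abs(pipeline["complexity"] - 5)

def random_pipeline(rng):
    return {gene: rng.choice(options) for gene, options in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from one of the two parents
    return {gene: rng.choice([a[gene], b[gene]]) for gene in SEARCH_SPACE}

def mutate(pipeline, rng, rate=0.2):
    child = dict(pipeline)
    for gene, options in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[gene] = rng.choice(options)
    return child

def evolve(generations=10, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # Truncation selection: the better half survives unchanged (elitism)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, round(fitness(best), 3))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. Crossover and mutation keep exploring new combinations of pipeline decisions.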

TransmogrifAI

  • AutoML library for building machine learning workflows on Apache Spark
  • Main authors: Salesforce
  • ML tasks: binary classification, multiclass classification, and regression
  • Usage: Scala, Java packages
  • Language: Scala
  • Code on github: https://github.com/salesforce/TransmogrifAI

Auto_ml (unmaintained)

It is worth mentioning the auto_ml Python package created by Preston Parry (https://github.com/ClimbsRocks/auto_ml), which is no longer maintained.

Proprietary AutoML available in the cloud or on-premise (alphabetical order)

Below is the list of AutoML services available in the cloud or on-premise. Services listed here offer very similar functionality:

  • the user provides the input data set, usually as a flat file,
  • the user selects the target column to be predicted and the input features,
  • the user selects a time limit for AutoML training,
  • AutoML checks many possible data pipelines, training and tuning them,
  • in the end, AutoML selects the best-performing algorithm (according to the selected metric and validation scheme),
  • the best model can be deployed in the cloud and accessed through a REST API, or used for batch predictions in the service.
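The workflow above can be sketched as a time-budgeted search loop. This is a minimal, hypothetical illustration, not the behaviour of any particular service. The flat file, the `churn` target, the candidate models (a majority-class baseline and one-split decision stumps) and the every-third-row validation split are all assumptions made for the example. The loop stops trying new candidates once `time_limit_s` seconds have passed, then returns a leaderboard sorted by validation accuracy.

```python
import csv
import io
import time
from collections import Counter

# Hypothetical flat file uploaded by the user
CSV_TEXT = """age,visits,churn
25,1,yes
34,8,no
45,2,yes
23,9,no
51,3,yes
38,7,no
29,2,yes
60,10,no
41,1,yes
33,6,no
27,3,yes
48,8,no
"""

def majority_label(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_majority(train, target):
    label = majority_label(r[target] for r in train)
    return lambda row: label

def make_stump(feature, threshold):
    """Hypothetical one-split model: 'feature <= threshold' decides the label."""
    def fit(train, target):
        fallback = majority_label(r[target] for r in train)
        left = [r[target] for r in train if float(r[feature]) <= threshold]
        right = [r[target] for r in train if float(r[feature]) > threshold]
        left_label = majority_label(left) if left else fallback
        right_label = majority_label(right) if right else fallback
        return lambda row: left_label if float(row[feature]) <= threshold else right_label
    return fit

def run_automl(rows, target, features, time_limit_s):
    # Fixed validation scheme: every third row is held out for scoring
    train = [r for i, r in enumerate(rows) if i % 3 != 0]
    valid = [r for i, r in enumerate(rows) if i % 3 == 0]
    candidates = [("majority", fit_majority)]
    for f in features:
        for t in sorted({float(r[f]) for r in train}):
            candidates.append((f"stump({f} <= {t})", make_stump(f, t)))
    deadline = time.monotonic() + time_limit_s
    leaderboard = []
    for name, fit in candidates:
        if leaderboard and time.monotonic() > deadline:
            break  # time budget used up: keep the results collected so far
        model = fit(train, target)
        accuracy = sum(model(r) == r[target] for r in valid) / len(valid)
        leaderboard.append((accuracy, name))
    # Stable sort: among equal scores, earlier candidates stay first
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    return leaderboard

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
leaderboard = run_automl(rows, target="churn", features=["age", "visits"], time_limit_s=5.0)
print("best:", leaderboard[0])
```

The first entry of the leaderboard plays the role of "the best model" that a service would then deploy. With a tighter time limit, fewer candidates are evaluated, but the loop always scores at least one.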

Proprietary AutoML providers:

In most cases of AutoML in the cloud, the user is tied to the provider: there is no option to download the model and use it locally. In the MLJAR service you can download models and use them locally. (If you know of other providers that let users download models and use them freely, please let me know in the comments and I will update the post.)

If you find any software or service missing from the list, please let me know in the comments!
