Automated Machine Learning

Automated Machine Learning (AutoML) is the process of running a full machine learning pipeline automatically. An AutoML solution can handle feature preprocessing and engineering, algorithm training, and hyperparameter selection.

Training data

The service works with structured data. It accepts CSV (Comma Separated Values) files as input. A file used for training must have a target column. The user uploads the data file to the mljar service.
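
As an illustration of the target column requirement, the following is a minimal sketch of how a training file could be checked locally before upload. The function name `load_training_csv`, the sample data, and the column name `churn` are hypothetical and are not part of the mljar service.

```python
import csv
import io

def load_training_csv(text, target):
    """Parse CSV text and split it into feature rows and target values."""
    reader = csv.DictReader(io.StringIO(text))
    # A training file without the target column cannot be used for training.
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise ValueError(f"target column {target!r} not found in header")
    features, targets = [], []
    for row in reader:
        targets.append(row.pop(target))  # remove the target from the feature row
        features.append(row)
    return features, targets

# Hypothetical sample data; values stay as strings, as read from the file.
sample = "id,age,city,churn\n1,34,Warsaw,0\n2,51,Krakow,1\n"
features, targets = load_training_csv(sample, target="churn")
```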


Dataset statistics

For each dataset uploaded to the service, the following statistics are computed per column:
  • Min, Max, Median, Mean, Std
  • Percent of missing values
  • Number of unique values
  • Distribution
For each column (feature) you can:
  • Select its usage, that is, how it will be used in the ML model. A column can be an Id column, a model input, a model output (the target), a sample weight, or excluded from the analysis.
  • Select its type. It can be numeric, discrete or categorical.
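
The per-column statistics listed above can be sketched for a single numeric column as follows. This is only an illustration: how the service computes them is not documented here, so the use of the sample standard deviation and the choice of missing-value markers are assumptions, and the distribution plot is left out.

```python
import statistics

def column_stats(values, missing=("", None)):
    """Summarize one numeric column; `missing` lists the markers treated as missing."""
    present = [float(v) for v in values if v not in missing]
    stats = {
        "missing_percent": 100.0 * (len(values) - len(present)) / len(values),
        "unique": len(set(present)),
    }
    if present:
        stats.update(
            min=min(present),
            max=max(present),
            median=statistics.median(present),
            mean=statistics.fmean(present),
            # Assumption: sample standard deviation, which needs at least two values.
            std=statistics.stdev(present) if len(present) > 1 else 0.0,
        )
    return stats

# Hypothetical column with one missing value.
ages = ["34", "51", "", "34", "40"]
summary = column_stats(ages)
```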

Machine Learning Experiment

To train a machine learning model, you need to create an ML experiment. It is easy and takes only a few clicks. Most of the selectable parameters are set to smart defaults. The only thing you are required to select is the training data.

Available validation:
  • k-fold cross validation
  • train / validation split
  • validation with separate dataset
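
To make the first option concrete, here is a minimal sketch of how k-fold cross-validation partitions row indices. The function name `k_fold_indices` and its defaults are assumptions for illustration; the service's own splitting (for example, whether it shuffles or stratifies) is not described here.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Every fold except the i-th is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(n_samples=10, k=5))
```

Each row lands in the validation set exactly once, so every sample contributes to the validation score of one fold.
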
Available preprocessing:
  • Fill missing values with mean
  • Fill missing values with median
  • Fill missing values with minimum
  • Convert categorical to integers
  • Convert categorical to binary with one-hot encoding
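
The preprocessing options above can be sketched in a few lines. The helper names (`fill_missing`, `to_integers`, `one_hot`) are hypothetical, and assigning integer codes in sorted category order is an assumption, not the service's documented behavior.

```python
import statistics

def fill_missing(values, strategy="median"):
    """Replace None entries with the mean, median, or minimum of the observed values."""
    observed = [v for v in values if v is not None]
    reducers = {"mean": statistics.fmean, "median": statistics.median, "minimum": min}
    fill = reducers[strategy](observed)
    return [fill if v is None else v for v in values]

def to_integers(values):
    """Map each category to an integer code, assigned in sorted order."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def one_hot(values):
    """Map each category to a binary indicator vector, one position per category."""
    categories = sorted(set(values))
    return [[int(v == category) for category in categories] for v in values]

ages = fill_missing([34.0, None, 51.0, 40.0], strategy="median")
city_codes = to_integers(["Warsaw", "Krakow", "Warsaw"])
city_flags = one_hot(["Warsaw", "Krakow", "Warsaw"])
```
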
Available algorithms:
  • Extreme Gradient Boosting (XGBoost)
  • LightGBM
  • Random Forest
  • Regularized Greedy Forest
  • Extra Trees
  • k-Nearest Neighbors
  • Logistic Regression
  • Neural Network
  • Ensemble
Available tuning:
  • Optimize LogLoss or AUC for binary classification
  • Optimize MSE or MAE for regression
  • Select number of models trained
  • Select time limit for model training
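
For reference, the four optimization metrics named above follow their standard textbook definitions, sketched below in plain Python. The AUC here uses the pairwise-ranking formulation, which is simple to read but quadratic in the number of samples; it is shown for clarity and is not a claim about how the service computes it.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binary labels and predicted scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
ranking_quality = auc(labels, scores)
```

Lower is better for LogLoss, MSE, and MAE, while higher is better for AUC.
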

Machine Learning model information

The service stores information about each model and its training process. You can check:
  • hyperparameter values
  • preprocessing used
  • scores in each cross-validation fold
  • learning curves computed as the mean over all cross-validation folds
To prevent overfitting, early stopping is used on all models. The model's internal architecture stored in the service is always the one from the best iteration.
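
The idea of early stopping with best-iteration selection can be sketched as follows. The `patience` parameter and the function name `best_iteration` are assumptions for illustration; the stopping rule the service actually applies is not specified here.

```python
def best_iteration(validation_losses, patience=3):
    """Return (index, loss) of the best iteration seen before early stopping."""
    best_index, best_loss = 0, float("inf")
    for i, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_index, best_loss = i, loss
        elif i - best_index >= patience:
            break  # no improvement for `patience` consecutive iterations
    return best_index, best_loss

# Hypothetical validation losses, one per training iteration.
losses = [0.70, 0.55, 0.48, 0.50, 0.49, 0.51, 0.40]
stopped_at = best_iteration(losses, patience=3)
```

In this example training stops after three iterations without improvement, so the later loss of 0.40 is never reached and the model from iteration 2 is the one kept.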

Feature Importance

You can check the importance of your features for:
  • Extreme Gradient Boosting
  • LightGBM
  • Random Forest
  • Extra Trees

Deploy Machine Learning model

There are several ways to use your model:
  • You can compute predictions in the user interface
  • You can download the model and use it locally. All models are yours, and you can do what you want with them!
  • You can download the model's code
  • You can use our REST API to access your models in the cloud
