Automated Machine Learning
Automated Machine Learning (AutoML) is the process of running a full machine learning pipeline automatically. An AutoML solution can handle feature preprocessing and engineering, algorithm training, and hyperparameter selection.
The service works with structured data and accepts CSV (comma-separated values) files as input. A file used for training should have a target column. You upload the data file to the mljar service.
For each dataset uploaded to the service, the following statistics are computed:
- Min, Max, Median, Mean, Std
- Percent of missing values
- Number of unique values
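The statistics listed above can be approximated locally with a short sketch. This is not the service's implementation: the function name `column_stats`, the sample data, and the choice of the sample (n-1) standard deviation are assumptions made for illustration. The numeric statistics are computed only for columns whose non-missing values all parse as numbers.

```python
import csv
import io
import statistics


def column_stats(csv_text):
    """Return per-column summary statistics for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return {}
    stats = {}
    for name in reader.fieldnames:
        raw = [row[name] for row in rows]
        # Treat empty or absent fields as missing values.
        present = [v for v in raw if v is not None and v.strip() != ""]
        col = {
            "missing_pct": 100.0 * (len(raw) - len(present)) / len(raw),
            "unique": len(set(present)),
        }
        try:
            nums = [float(v) for v in present]
        except ValueError:
            nums = []  # non-numeric column: skip min/max/median/mean/std
        if nums:
            col.update(
                min=min(nums),
                max=max(nums),
                median=statistics.median(nums),
                mean=statistics.fmean(nums),
                # Assumption: sample standard deviation (n - 1 denominator).
                std=statistics.stdev(nums) if len(nums) > 1 else 0.0,
            )
        stats[name] = col
    return stats


# Hypothetical sample data with one missing value in "age" and one in "city".
SAMPLE = "id,age,city\n1,30,Warsaw\n2,,Krakow\n3,50,Warsaw\n4,40,\n"

for column, values in column_stats(SAMPLE).items():
    print(column, values)
```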
For each column (feature) you can:
- Select its usage, that is, how it will be used in the ML model. A column can be an Id column, a model input, a model output (the target), a sample weight, or excluded from analysis.
- Select its type: numeric, discrete, or categorical.
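One way to picture these per-column settings is as a small configuration record. The sketch below is purely illustrative: `ColumnSpec`, `validate_training_columns`, the string labels, and the column names are hypothetical and are not part of the mljar service. It also assumes that a training file has exactly one target column.

```python
from dataclasses import dataclass

# Hypothetical labels mirroring the usages and types listed above.
USAGES = {"id", "input", "target", "sample_weight", "exclude"}
KINDS = {"numeric", "discrete", "categorical"}


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    usage: str
    kind: str

    def __post_init__(self):
        if self.usage not in USAGES:
            raise ValueError(f"unknown usage: {self.usage!r}")
        if self.kind not in KINDS:
            raise ValueError(f"unknown type: {self.kind!r}")


def validate_training_columns(specs):
    """Return the target column name; assumes exactly one target is allowed."""
    targets = [s.name for s in specs if s.usage == "target"]
    if len(targets) != 1:
        raise ValueError(f"expected exactly one target column, got {targets}")
    return targets[0]


# Hypothetical column settings for a house price dataset.
specs = [
    ColumnSpec("house_id", "id", "discrete"),
    ColumnSpec("area", "input", "numeric"),
    ColumnSpec("district", "input", "categorical"),
    ColumnSpec("price", "target", "numeric"),
]
print(validate_training_columns(specs))
```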
Machine Learning Experiment
To train a machine learning model, you need to create an ML experiment. It is easy and takes only a few clicks. Most of the selectable parameters are set to smart defaults. You are required to select the training data.
Validation methods:
- k-fold cross-validation
- train / validation split
- validation with a separate dataset
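The first two validation methods can be sketched as index splits. This is a minimal illustration, not the service's code: the function names, the shuffling with a fixed seed, and the default fold count and validation fraction are all assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def train_validation_split(n_samples, validation_fraction=0.25, seed=0):
    """Return (train, validation) index lists for a single random split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(n_samples * validation_fraction))
    return indices[n_val:], indices[:n_val]


for train, validation in k_fold_indices(10, k=5):
    print(len(train), "train rows,", len(validation), "validation rows")
```

In k-fold cross-validation every row is used for validation exactly once, while the single split holds out one fixed subset.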
Preprocessing steps:
- Fill missing values with the mean
- Fill missing values with the median
- Fill missing values with the minimum
- Convert categorical values to integers
- Convert categorical values to binary with one-hot encoding
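The preprocessing steps above can be sketched on plain Python lists. The function names are hypothetical, missing values are represented as `None`, and categories are ordered alphabetically; all of these are assumptions for illustration rather than the service's behavior.

```python
import statistics


def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean, median, or minimum of the rest."""
    present = [v for v in values if v is not None]
    fillers = {"mean": statistics.fmean, "median": statistics.median, "min": min}
    fill = fillers[strategy](present)
    return [fill if v is None else v for v in values]


def to_integers(values):
    """Map each category to an integer code (alphabetical order assumed)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot(values):
    """Encode each category as a binary indicator vector."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


print(fill_missing([1.0, None, 3.0, 8.0], "mean"))
print(to_integers(["red", "blue", "red"]))
print(one_hot(["red", "blue", "red"]))
```

Integer codes keep a single column but impose an artificial order on the categories, while one-hot encoding avoids that order at the cost of one column per category.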
Algorithms:
- Extreme Gradient Boosting (XGBoost)
- Random Forest
- Regularized Greedy Forest
- Extra Trees
- k-Nearest Neighbors
- Logistic Regression
- Neural Network
Tuning options:
- Optimize LogLoss or AUC for binary classification
- Optimize MSE or MAE for regression
- Select the number of models to train
- Select the time limit for model training
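For reference, the four metrics named above can be computed as follows. This is a sketch using the standard textbook definitions, with hypothetical function names; the service's exact computation (for example, how probabilities are clipped for LogLoss) is not described in the text and is assumed here.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood for binary labels (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def auc(y_true, y_score):
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error for regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(mse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
```

LogLoss, MSE, and MAE are minimized, whereas AUC is maximized, so a tuning loop has to account for the direction of the chosen metric.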