May 17 2019 · Piotr Płoński

Compare MLJAR with Google AutoML Tables

Recently, Google has released an AutoML service for structured datasets. It is called AutoML Tables and is currently available in Beta. I've decided to compare my open-source solution with Google AutoML Tables.

Datasets used in comparison

My open-source package mljar-supervised currently supports only binary classification problems, which is why I've selected binary classification tasks (my AutoML service available in the cloud additionally supports regression). The datasets are from the openml.org repository and can be accessed by their id.

| id  | title    | rows  | cols |
|-----|----------|-------|------|
| 3   | kr-vs-kp | 3196  | 37   |
| 24  | mushroom | 8124  | 23   |
| 38  | sick     | 3772  | 30   |
| 44  | spambase | 4601  | 58   |
| 179 | adult    | 48842 | 15   |
| 720 | abalone  | 4177  | 9    |
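
As a side note, these datasets can be fetched directly by their id, for example with scikit-learn's fetch_openml. A minimal sketch (in the comparison itself I used the CSV files from the GitHub repository):

from sklearn.datasets import fetch_openml

# fetch the kr-vs-kp dataset (OpenML id=3) as a pandas DataFrame
dataset = fetch_openml(data_id=3, as_frame=True)
print(dataset.frame.shape)  # expected (3196, 37): features plus the target column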

Each dataset was split into training (70% of the data) and testing (30% of the data) sets. The data after the split is available on GitHub.

The code used in comparison

At first, I wanted to run a few repetitions of the data split for each dataset. Google AutoML Tables has a user interface, nice! But to run a few repetitions on 6 datasets I wanted to use the API (I'm a little lazy :-) ). I had planned to use the Google AutoML Tables API (Python or REST), but the documentation was unclear about how to use it. On their website there are two links to the client libraries.

It wasn't clear to me how to use either of them. There were no examples of how they could be used. Maybe I needed to spend more time trying to understand the code behind the clients, but that wasn't my goal - I wanted to train ML models, not debug client library code. (Anyway, Google AutoML Tables is still in Beta; maybe that's the reason for the docs mess.)

After wasting some time trying to understand how the Google clients work, I came to the conclusion that there would be only 1 repetition and that I would run the computations manually with the Google Cloud graphical user interface.

I set the training time to 1 hour in both mljar-supervised and Google AutoML Tables.

For mljar-supervised I wrote a quick Python script:

import pandas as pd
import sklearn.model_selection
from sklearn.metrics import log_loss
from supervised.automl import AutoML

result = {}

for dataset_id in [3, 24, 38, 44, 179, 720]:
    result[dataset_id] = {"logloss": []}

    # load the dataset and separate the features from the target column
    df = pd.read_csv("./data/{0}.csv".format(dataset_id))
    x_cols = [c for c in df.columns if c != "target"]
    X = df[x_cols]
    y = df["target"]
    for repeat in range(1):
        seed = 1 + repeat
        X_train, X_test, y_train, y_test = \
            sklearn.model_selection.train_test_split(X, y, test_size=0.3, random_state=seed)
        # AutoML :)
        automl = AutoML(total_time_limit=60*60)  # 60 minutes
        automl.fit(X_train, y_train)
        y_predicted = automl.predict(X_test)
        # the 'p_1' column holds the predicted probability of the positive class
        result[dataset_id]["logloss"] += [log_loss(y_test, y_predicted["p_1"])]

As you can see, most of the code is data reading and manipulation. The actual AutoML part takes just 3 lines (OK, 5 after formatting).

The results

To measure the performance I've used the LogLoss metric (the lower, the better). The results are presented in the chart below.
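
For reference, with true labels $y_i \in \{0, 1\}$ and predicted probabilities $p_i$ over $N$ test samples, the binary LogLoss is defined as:

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$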

MLJAR vs Google AutoML Tables logloss results

On all datasets, the results from mljar-supervised are better. Only on dataset id=179 are the results from Google AutoML Tables comparable (on a similar level).

Why is Google AutoML Tables performing so poorly?

  • Google AutoML Tables is still in Beta. Maybe not all optimizations are ready and available. (For sure, the API and client docs need polishing.)
  • I assume that Google AutoML Tables heavily depends on Neural Networks and automatic Neural Architecture Search algorithms. First of all, Neural Networks are not always the best tool for dealing with structured data. Second, Neural Architecture Search is not yet very efficient; it requires a lot of computing power. The Google AutoML Tables solution uses 92 machines in parallel! (For my open-source package I used 1 machine.)

What do I dislike in Google AutoML Tables? :(

  • I don't like that the user is not able to see which models (Neural Networks) are checked during training. The user can only see that something is being computed; see the screenshot below.

Model training at Google AutoML Tables

After model training, some basic evaluation is made (without any information about what kind of model it is):

google autoML evaluation 1

google autoML evaluation

Of course, the model cannot be downloaded and used offline in a local setting.

  • The price of 1 hour of training is very high: ~20 USD. The price is high because Google AutoML Tables uses 92 machines in parallel. To get reasonable results you will probably need to select more than 1 hour; let's say 10 hours. Then the cost of running one ML experiment is about 200 USD. And you don't know what kind of model is trained. Doing a few ML experiments can easily end with a bill of a few thousand USD. Not nice!

  • I tried to train 6 models in parallel, but I got an error about exhausted computational resources.

google autoML resources exhausted

Again, the architecture search algorithm is not very efficient, and you cannot train more than 5 models in parallel.

  • The predictions in Google AutoML Tables are continuous; to get classification labels, the user needs to select a decision threshold. The 0.5 threshold is not the best choice for all datasets, and this should be automated; a simple sketch of how it could be done is below.
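
For illustration, here is a minimal sketch of how threshold selection could be automated, by maximizing F1 over a grid of candidate thresholds. This is my own example, not Google AutoML Tables functionality; it assumes 0/1 labels and reuses the y_test and y_predicted variables from the script above (ideally the threshold would be tuned on a separate validation set):

import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_proba):
    # evaluate F1 for a grid of candidate thresholds and keep the best one
    grid = np.linspace(0.01, 0.99, 99)
    scores = [f1_score(y_true, (y_proba >= t).astype(int)) for t in grid]
    return grid[int(np.argmax(scores))]

threshold = best_threshold(y_test, y_predicted["p_1"])
y_labels = (y_predicted["p_1"] >= threshold).astype(int)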

What I like in Google AutoML Tables? :)

  • I'm not a fan of the Google AutoML Tables service, but I do like Google AutoML overall (I like the direction in which they are developing). They provide a wide range of machine learning services: image and video classification, natural language processing, and translation.


Conclusion

  • I doubt that Google AutoML Tables will be used by data scientists (for prototyping or benchmarking). The solution is very expensive and does not provide details about the models that are checked. Running one Machine Learning experiment can easily cost more than 200 USD.
  • Is Google AutoML Tables a tool for people who don't know Machine Learning? I don't think so. It is too much of a black box and too expensive. It will rather be a one-try tool that people will check out of curiosity (like me).
  • My open-source AutoML Python package got better results than Google! :) This is nice :) But to be objective, the comparison was done on 6 datasets and binary classification only. For sure, there can be datasets on which Google AutoML Tables outperforms mljar-supervised; if you find such a dataset, please let me know!