MLJAR vs Google Cloud AutoML Tables (me vs Google)

Published at: May 17, 2019, 3:24 p.m. | Author: Piotr Płoński

Google recently released an AutoML service for structured datasets. It is called AutoML Tables and is currently available in beta. I've decided to compare my open-source solution with Google AutoML Tables.

AutoML software and services list

Published at: May 14, 2019, 8:02 a.m. | Author: Piotr Płoński

List of Automated Machine Learning software and services (open source and proprietary)

Random Forest vs Neural Network (classification, tabular data)

Published at: May 10, 2019, 2:53 p.m. | Author: Piotr Płoński

Which is better: Random Forest or Neural Network? This is a common question, with a very easy answer: it depends :) I will try to show you when it is better to use a Random Forest and when to use a Neural Network.

Random Forest vs AutoML (with python code)

Published at: May 7, 2019, 12:19 p.m. | Author: Piotr Płoński

Random Forest versus AutoML, you say. Hmmm... it's obvious that the performance of AutoML will be better: you check many models and then ensemble them. This is true, but I would like to show you other advantages of AutoML that will help you deal with dirty, real-life data and make your life easier!

Random Forest Overfitting

Published at: April 5, 2019, 10:49 a.m. | Author: Piotr Płoński

Does Random Forest overfit? When I first saw this question I was a little surprised. My first thought was: of course it does! Any complex machine learning algorithm can overfit. I’ve trained hundreds of Random Forest (RF) models and have many times observed RF overfitting when too many trees were used during training. My second thought was: wait, why are people asking such a question? Let's dig deeper and do some research.

MLJAR to the rescue

Published at: March 23, 2019, 11:47 a.m. | Author: Jeff King

I was and still am fascinated by Machine Learning. Coming from a pharmaceutical background, without any knowledge of programming or coding experience, I thought I would not be able to get a piece of this new tech cake. But with the advent of Automated Machine Learning (AutoML), non-data scientists like myself have an array of tools to satisfy their once-thought incurable itch to create ML models without writing a single line of code.

Feature engineering - tell your model what to look at

Published at: Nov. 30, 2018, 8:11 a.m. | Author: Paweł Grabiński

Data in the real world can be extremely messy and chaotic. It doesn’t matter if it is a relational SQL database, an Excel file or any other source of data. Despite usually being structured as tables where each row (called a sample) has its own values corresponding to a given column (called a feature), the data may be hard to understand and process. To make the data easier for our machine learning models to read, and thereby improve their performance, we can conduct feature engineering.
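As a rough illustration of the idea, here is a minimal sketch that derives new features from a raw date column. The column names (`customer_id`, `signup_date`) and the helper `add_date_features` are hypothetical and chosen only for this example; they do not come from the post itself.

```python
from datetime import datetime


def add_date_features(row):
    """Return a copy of the row with features derived from its raw 'signup_date' string."""
    parsed = datetime.strptime(row["signup_date"], "%Y-%m-%d")
    enriched = dict(row)  # copy, so the original sample is left untouched
    enriched["signup_month"] = parsed.month
    enriched["signup_weekday"] = parsed.weekday()  # Monday is 0, Sunday is 6
    enriched["signup_on_weekend"] = parsed.weekday() >= 5
    return enriched


# Hypothetical samples: each dict is one row, each key is one feature.
rows = [
    {"customer_id": 1, "signup_date": "2018-11-30"},
    {"customer_id": 2, "signup_date": "2018-12-01"},
]

enriched_rows = [add_date_features(row) for row in rows]
for row in enriched_rows:
    print(row)
```

A raw date string is hard for most models to use directly, while the month, the weekday and a weekend flag are plain numeric or boolean columns that a model can split on.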

Validation - Learning, Not Memorizing

Published at: Oct. 17, 2018, 5:59 p.m. | Author: Paweł Grabiński

In business and science alike, it is always necessary to measure the efficiency and quality of the processes we run. When it comes to financial deals, the situation is simple: income, either in the short or in the long term, is the way to go. But what about Machine Learning?

AutoML Comparison

Published at: Dec. 7, 2017, 8:44 p.m. | Author: Piotr Płoński

Automatic Machine Learning (autoML) is the process of building Machine Learning models by an algorithm, with no human supervision. We compare three autoML packages (auto-sklearn, h2o and mljar). The comparison was performed on a binary classification task on 28 datasets from openml.

Churn Prediction with Automatic ML

Published at: Sept. 27, 2017, 8:01 a.m. | Author: Dominik Krzemiński

Sometimes we don’t even realize how common machine learning (ML) is in our daily lives. Various “intelligent” algorithms help us, for instance, find the most important facts (Google), suggest what movie to watch (Netflix), or influence our shopping decisions (Amazon). The biggest international companies quickly recognized the potential of machine learning and transferred it to business solutions. Nowadays, not only big companies are able to use ML. Imagine a not-so-abstract situation in which a company tries to predict customer behavior based on some personal data. Just a few years ago, the best strategy to solve this problem would have been to hire a good data science team. Now, thanks to the growing popularity of ML, it is within reach even for small start-ups. Today, I would like to show you a demo of how to solve difficult business problems with ML. We will take advantage of service and its R API. With just a few lines of code we will be able to achieve very good results.


Published at: Sept. 20, 2017, 10:09 a.m. | Author: Piotr Płoński

Hi! We have added an R API for mljar, so you can run sklearn, xgboost, lightGBM, Keras and RGF from one line of R :) Please check it on

Are hyper-parameters really important in Machine Learning?

Published at: Aug. 22, 2017, 10:20 a.m. | Author: Dominik Krzemiński

It seems that one of the most problematic topics for machine-learning self-learners is understanding the difference between parameters and hyper-parameters. The concept of hyper-parameters is very important, because these values directly influence the overall performance of ML algorithms. The simplest definition of hyper-parameters is that they are a special type of parameter that cannot be inferred from the data. Imagine, for instance, a neural network. As you probably know, artificial neurons learn by tuning their weights so that the network gives the best output label for the input data. However, architectures of neural networks vary depending on the task. There are many things to consider: the number of layers, the size of each layer, the number of connections, etc. They need to be decided before the network is trained, so in this case they are called hyper-parameters.
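To make the distinction concrete, here is a minimal sketch in plain Python. It uses a one-variable linear model trained with gradient descent rather than a neural network, purely to keep the example short; the function `fit_line`, the toy data and the chosen values are assumptions made for illustration. The learning rate and the number of epochs are hyper-parameters (we pick them before training), while `w` and `b` are parameters (learned from the data).

```python
# Hyper-parameters: chosen by us before training, not learned from the data.
LEARNING_RATE = 0.05
N_EPOCHS = 500


def fit_line(xs, ys, learning_rate=LEARNING_RATE, n_epochs=N_EPOCHS):
    """Fit y = w * x + b with plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # parameters: these are what training adjusts
    n = len(xs)
    for _ in range(n_epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
w_short, b_short = fit_line(xs, ys, n_epochs=5)  # same data, different hyper-parameter

print("500 epochs:", w, b)
print("5 epochs:  ", w_short, b_short)
```

The data is identical in both runs; only the hyper-parameter `n_epochs` changes, and with too few epochs the learned parameters end up noticeably further from the values that generated the data.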

Employee Analytics

Published at: May 19, 2017, 1:33 p.m. | Author: Piotr Płoński

Analytic methods can improve Human Resources (HR) management in companies with a large number of employees. It is easy to give an example of how companies can benefit from machine learning methods applied to HR. Let’s assume that training a new employee costs $1000. If we can predict which employee is going to leave next month and offer him/her a bonus program worth $500 to stay for the next 6 months, we are $500 ahead and keep an experienced, well-trained employee on board, with higher morale.
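The arithmetic behind this example can be written down as a tiny sketch. It assumes, as the example above implicitly does, that the prediction is correct and that the bonus actually keeps the employee; the function name `retention_saving` and the figure of 20 employees are hypothetical and only there for illustration.

```python
def retention_saving(training_cost, bonus, n_employees=1):
    """Money saved by paying a retention bonus instead of training a replacement.

    Assumes every flagged employee really would have left and that the bonus
    keeps each of them.
    """
    return (training_cost - bonus) * n_employees


saving_one = retention_saving(training_cost=1000, bonus=500)
saving_twenty = retention_saving(training_cost=1000, bonus=500, n_employees=20)

print(saving_one)     # 500
print(saving_twenty)  # 10000
```

In practice predictions are never perfect, so the real saving would be lower; the sketch only restates the best-case numbers from the text.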

MLJAR python API

Published at: Feb. 21, 2017, 8:34 p.m. | Author: Piotr Płoński

We are thrilled to announce our MLJAR python API. It makes building and tuning machine learning models super easy! You just write a few lines of python code, all models are trained and tuned in the cloud on multiple machines, and all results are available to check in your web browser! It is very powerful! :) You can check it on mljar github:

Machine Learning Wars

Published at: Dec. 12, 2016, 1:33 p.m. | Author: Piotr Płoński

Herein, the performance of MLJAR on the Kaggle dataset from the “Give me some credit” challenge is reported. The obtained results are compared with other predictive APIs from Amazon, Google, PredicSis and BigML. This post was inspired by Louis Dorard's article.

Searching for brain regions responsible for kids dyslexia

Published at: Nov. 9, 2016, 4:40 p.m. | Author: Piotr Płoński

Dyslexia is a reading disorder characterized by trouble with reading despite normal intelligence. We used ML methods to find the brain regions responsible for this! Check out the paper: "Multi-Parameter Machine Learning Approach to the Neuroanatomical Basis of Developmental Dyslexia" in the journal Human Brain Mapping.

Building Binary Classifier on data

Published at: Nov. 9, 2016, 4:20 p.m. | Author: Piotr Płoński

We made a YouTube video with instructions on how to build a binary classifier on the dataset. We use raw data, but you should make some magic, create new features and use MLJAR to find the best models!

MLJAR Rationale

Published at: Oct. 28, 2016, 10:56 a.m. | Author: Piotr Płoński

MLJAR is a platform for rapid prototyping, developing and deploying machine learning models. Yeah! Here we list the rationale behind MLJAR.

Predict stock market on AI tournament

Published at: Sept. 28, 2016, 1:30 p.m. | Author: Piotr Płoński

Machine learning models used by hedge funds for predicting the stock market are of course super top secret, as is the data used to create them. However, there is one which makes its data public - The dataset is encrypted and prediction of the stock market is transformed into a binary classification problem. Every 7 days (one round) a new dataset is released and anyone can download it, train a model and upload predictions. At the end of the round, the best predictions are rewarded - there is no need to upload the model. As you can see from leaderboard is going really well in this competition.

3, 2, 1 - Start! MLJAR Blog

Published at: Sept. 27, 2016, 7:11 a.m. | Author: Piotr Płoński

MLJAR is a framework for building machine learning models quickly, accurately, easily and (almost) automatically. We hope it can help many data hackers. In this blog we are going to describe interesting (in our opinion) use cases, so stay in touch with us!