Blog Category: Tutorial

Feature engineering - tell your model what to look at

Published at: Nov. 30, 2018, 8:11 a.m. | Author: Paweł Grabiński

Data in the real world can be extremely messy and chaotic. It doesn’t matter whether it comes from a relational SQL database, an Excel file or any other source. Although it is usually organized as tables, where each row (called a sample) has a value for each column (called a feature), the data may be hard to understand and process. To make the data easier for our machine learning models to read, and thereby improve their performance, we can conduct feature engineering.
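As a rough illustration of the idea, here is a minimal sketch in plain Python. The rows, column names and the `engineer_features` helper are all hypothetical and do not come from the post; the point is only that a raw date string and two numeric columns can be turned into features a model can use more directly.

```python
from datetime import date

# Hypothetical raw samples: each dict is a row, each key is a feature.
raw_samples = [
    {"order_date": "2018-11-30", "price": 20.0, "quantity": 3},
    {"order_date": "2018-12-01", "price": 5.5, "quantity": 10},
]


def engineer_features(sample):
    """Return a copy of the sample with three derived features added."""
    day = date.fromisoformat(sample["order_date"])
    enriched = dict(sample)
    enriched["day_of_week"] = day.weekday()  # Monday = 0, Sunday = 6
    enriched["is_weekend"] = day.weekday() >= 5
    enriched["total_value"] = sample["price"] * sample["quantity"]
    return enriched


engineered = [engineer_features(s) for s in raw_samples]
```

The original columns are kept next to the derived ones, so the choice of which features to feed to the model can still be made later.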

Validation - Learning, Not Memorizing

Published at: Oct. 17, 2018, 5:59 p.m. | Author: Paweł Grabiński

In business and science alike, it is always necessary to measure the efficiency and quality of the processes we carry out. When it comes to financial deals, the situation is simple: income, in either the short or the long term, is the way to go. But what about Machine Learning?

Churn Prediction with Automatic ML

Published at: Sept. 27, 2017, 8:01 a.m. | Author: Dominik Krzemiński

Sometimes we don’t even realize how common machine learning (ML) is in our daily lives. Various “intelligent” algorithms help us, for instance, to find the most important facts (Google), suggest what movie to watch (Netflix), or influence our shopping decisions (Amazon). The biggest international companies quickly recognized the potential of machine learning and transferred it to business solutions. Nowadays, it is not only big companies that are able to use ML. Imagine a not-so-abstract situation in which a company tries to predict customer behavior based on some personal data. Just a few years ago, the best strategy for solving this problem would have been to hire a good data science team. Now, thanks to the growing popularity of ML, it is available even to small start-ups. Today, I would like to present a demo of how to solve difficult business problems with ML. We will take advantage of service and its R API. With just a few lines of code we will be able to achieve very good results.

Are hyper-parameters really important in Machine Learning?

Published at: Aug. 22, 2017, 10:20 a.m. | Author: Dominik Krzemiński

It seems that one of the most problematic topics for machine-learning self-learners is understanding the difference between parameters and hyper-parameters. The concept of hyper-parameters is very important, because these values directly influence the overall performance of ML algorithms. The simplest definition of hyper-parameters is that they are a special type of parameter that cannot be inferred from the data. Imagine, for instance, a neural network. As you probably know, artificial neurons learn by tuning their weights so that the network gives the best output label for the input data. However, the architectures of neural networks vary depending on the task. There are many things to consider: the number of layers, the size of each layer, the number of connections, etc. They need to be decided before the network is tuned, so in this case they are called hyper-parameters.
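To make the distinction concrete, here is a minimal sketch that swaps the neural network for a much simpler model, a single slope `w` fitted by gradient descent. The function name, the data and the chosen values are assumptions made for illustration only. The weight `w` is a parameter, because it is learned from the data; `learning_rate` and `n_epochs` are hyper-parameters, because they have to be fixed before training starts.

```python
def fit_slope(xs, ys, learning_rate, n_epochs):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # parameter: learned from the data
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy data generated with a true slope of 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Same data, same model, different hyper-parameters.
w_good = fit_slope(xs, ys, learning_rate=0.05, n_epochs=200)
w_slow = fit_slope(xs, ys, learning_rate=0.0001, n_epochs=200)
```

With the larger learning rate the learned slope ends up very close to 2, while with the tiny learning rate it is still far from it after the same number of epochs, which is one way hyper-parameters influence the final performance.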

MLJAR python API

Published at: Feb. 21, 2017, 8:34 p.m. | Author: Piotr Płoński

We are thrilled to announce our MLJAR Python API. It makes building and tuning machine learning models super easy! You just write a few lines of Python code, and all models are trained and tuned in the cloud on multiple machines, with all results available to check in your web browser! It is very powerful! :) You can check it out on the MLJAR GitHub:

Machine Learning Wars

Published at: Dec. 12, 2016, 1:33 p.m. | Author: Piotr Płoński

This post reports the performance of MLJAR on the Kaggle dataset from the “Give Me Some Credit” challenge. The results are compared with other predictive APIs from Amazon, Google, PredicSis and BigML. This post was inspired by Louis Dorard's article.

Building Binary Classifier on data

Published at: Nov. 9, 2016, 4:20 p.m. | Author: Piotr Płoński

We made a YouTube video with instructions on how to build a binary classifier on dataset. We use raw data, but you should work some magic: create new features and use MLJAR to find the best models!