When I first saw this question I was a little surprised. My first thought was: of course they do! Any complex machine learning algorithm can overfit. I’ve trained hundreds of Random Forest (RF) models and have observed them overfitting many times. My second thought: wait, why are people asking such a question? Let’s dig deeper and do some research. After some quick googling, I found the following paragraph on the website of Leo Breiman (the creator of the Random Forest algorithm):
April 05, 2019 by Piotr Płoński Random forest
March 23, 2019 by Jeff King (MLJAR's user) Testimonial
I was, and still am, fascinated by machine learning. Coming from a pharmaceutical background with no knowledge of programming or any coding experience, I thought I would never get a piece of this new tech cake. But with the advent of Automated Machine Learning (AutoML), non-data scientists like myself have an array of tools to satisfy their once-thought-incurable itch to create ML models without writing a single line of code. Perseverance is the name of the game, though, and through application and countless videos I taught myself how to create ML algorithms. Having explored a few of the available AutoML tools, I want to outline my trip into this amazing world with a use-case scenario, providing some insight into the performance of various open-source AutoML solutions along the way. I assume you are aware of the major tasks in the machine learning workflow, namely data preparation, feature engineering, model training, model evaluation, hyperparameter tuning and finally serving the model.
November 30, 2018 by Paweł Grabiński Feature engineering
Data in the real world can be extremely messy and chaotic. It doesn’t matter if it is a relational SQL database, an Excel file or any other source of data. Despite usually being structured as tables, where each row (called a sample) has its own values for each column (called a feature), the data may be hard to understand and process. To make the data easier for our machine learning models to read, and thereby improve their performance, we can apply feature engineering.
In business and science alike, it is always necessary to measure the efficiency and quality of the processes we run. When it comes to financial deals, the situation is simple: income, in either the short or the long term, is the measure. But what about machine learning? To measure the quality of a developed model, we use validation, which ensures that we are moving forward in our search for efficiency and optimal capacity.
December 07, 2017 by Piotr Płoński Compare
Automated Machine Learning (autoML) is the process of building machine learning models by an algorithm, with no human intervention. There are several autoML packages available for building predictive models:
September 27, 2017 by Dominik Krzemiński Automl
Sometimes we don’t even realize how common machine learning (ML) is in our daily lives. Various “intelligent” algorithms help us, for instance, find the most important facts (Google), suggest what movie to watch (Netflix), or influence our shopping decisions (Amazon). The biggest international companies quickly recognized the potential of machine learning and turned it into business solutions.
August 22, 2017 by Dominik Krzemiński Hyperparameters
Look at some titles of recent questions posted on Quora or Stack Overflow:
May 19, 2017 by Piotr Płoński Employee analytics
Analytic methods can improve Human Resources (HR) management in companies with a large number of employees. It is easy to give an example of how companies can benefit from machine learning applied to HR. Let’s assume that training a new employee costs $1000. If we can predict which employee is going to leave next month and offer them a bonus program worth $500 to stay for the next 6 months, we come out $500 ahead and keep an experienced, well-trained employee on board, with higher morale.
December 12, 2016 by Piotr Płoński Compare
This post reports the performance of MLJAR on the Kaggle dataset from the “Give me some credit” challenge. The results are compared with other predictive APIs from Amazon, Google, PredicSis and BigML. This post was inspired by Louis Dorard’s article.
September 28, 2016 by Piotr Płoński
The data and machine learning models used by hedge funds are secret. However, there is one hedge fund that makes its data public: Numer.ai. The dataset is encrypted, and the prediction of the stock market is transformed into a binary classification problem. Every 7 days (one round) a new dataset is released, and anyone can download it, train a model and upload predictions. At the end of the round, the best predictions are rewarded; there is no need to upload the model. As you can see from the leaderboard, MLJAR_COM is doing really well!