MLJAR's Blog

  • How to save and load Random Forest from Scikit-Learn in Python?

    June 24, 2020 by Piotr Płoński Random forest

In this post I will show you how to save and load a Random Forest model trained with scikit-learn in Python. The method presented here can be applied to any algorithm from scikit-learn (this is what is amazing about scikit-learn!).
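The full post walks through the details, but a minimal sketch of the usual approach, using `joblib` (the file name `rf_model.joblib` is just an example), might look like this:

```python
# Save and load a scikit-learn Random Forest with joblib (a sketch).
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small model on synthetic data.
X, y = make_classification(n_samples=100, random_state=42)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)

# Persist the fitted model to disk, then load it back.
joblib.dump(model, "rf_model.joblib")
loaded = joblib.load("rf_model.joblib")

# The reloaded model makes the same predictions.
assert (loaded.predict(X) == model.predict(X)).all()
```

The same `dump`/`load` pattern works for any fitted scikit-learn estimator, not just Random Forest.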

  • How to reduce memory used by Random Forest from Scikit-Learn in Python?

    June 24, 2020 by Piotr Płoński Random forest

The Random Forest algorithm from the scikit-learn package can sometimes consume too much memory:
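One quick way to see the problem, and one common remedy (limiting tree depth), can be sketched as follows; measuring the pickled size is only a rough proxy for in-memory footprint:

```python
# Compare the serialized size of an unconstrained forest with a
# depth-limited one (a sketch; pickled size approximates memory use).
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

def model_size_bytes(model):
    return len(pickle.dumps(model))

# Fully grown trees can get very large on noisy data.
full = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)
# Capping max_depth shrinks every tree and hence the whole forest.
shallow = RandomForestClassifier(n_estimators=50, max_depth=4,
                                 random_state=42).fit(X, y)

print(model_size_bytes(full), ">", model_size_bytes(shallow))
```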

  • Visualize a Decision Tree in 4 Ways with Scikit-Learn and Python

    June 22, 2020 by Piotr Płoński Decision tree

A Decision Tree is a supervised algorithm used in machine learning. It uses a binary tree graph (each node has two children) to assign a target value to each data sample. The target values are stored in the tree leaves. To reach a leaf, the sample is propagated through the nodes, starting at the root. In each node a decision is made about which descendant node to go to, based on one of the sample's features. Decision Tree learning is the process of finding the optimal rule in each internal tree node according to the selected metric.
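The post covers four visualization methods; the simplest one, a plain-text rendering with scikit-learn's `export_text`, can be sketched like this:

```python
# Print a text rendering of a decision tree's split rules (a sketch).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# A shallow tree keeps the printout readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# Each internal node shows its split rule; leaves show the class.
print(export_text(tree, feature_names=list(iris.feature_names)))
```

Every path from the root to a `class:` line in the printout is one chain of the decisions described above.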

  • Compare MLJAR with Google AutoML Tables

    May 17, 2019 by Piotr Płoński Compare

Recently, Google released an AutoML service for structured datasets. It is called AutoML Tables and is currently available in Beta. I've decided to compare my open-source solution with Google AutoML Tables.

  • AutoML software and services

    May 14, 2019 by Piotr Płoński Automl

    Automated Machine Learning is the end-to-end process of applying machine learning in an automatic way.

  • Random Forest vs Neural Network (classification, tabular data)

    May 10, 2019 by Piotr Płoński Random forest Neural network

Which is better: Random Forest or Neural Network? This is a common question, with a very easy answer: it depends :) I will try to show you when it is good to use Random Forest and when to use a Neural Network.

  • Random Forest vs AutoML (with python code)

    May 07, 2019 by Piotr Płoński Random forest Automl

Random Forest versus AutoML, you say. Hmmm… it's obvious that the performance of AutoML will be better. You will check many models and then ensemble them. This is true, but I would like to show you other advantages of AutoML that will help you deal with dirty, real-life data.

  • Does Random Forest overfit?

    April 05, 2019 by Piotr Płoński Random forest

When I first saw this question I was a little surprised. The first thought is: of course they do! Any complex machine learning algorithm can overfit. I've trained hundreds of Random Forest (RF) models and many times observed that they overfit. The second thought: wait, why are people asking such a question? Let's dig in and do some research. After some quick googling, I found the following paragraph on the website of Leo Breiman (the creator of the Random Forest algorithm):

  • Testimonial - MLJAR to the rescue

    March 23, 2019 by Jeff King (MLJAR's user) Testimonial

I was and still am fascinated by Machine Learning. Coming from a pharmaceutical background, without knowledge of programming or any kind of coding experience, I thought I would not be able to get a piece of this new tech cake. But with the advent of Automated Machine Learning (AutoML), non-data scientists like myself have an array of tools to satisfy their once-thought incurable itch to create ML models without writing a single line of code. Perseverance is the name of the game, and through application, and with the help of countless videos, I taught myself how to create ML models. Having explored a few of the available AutoML tools, I want to outline my trip into this amazing world of AutoML with a use-case scenario, providing some insight into the performance of various open-source AutoML solutions at the same time. I assume you are aware of the major tasks in the machine learning workflow, namely data preparation, feature engineering, training a model, evaluating the model, hyperparameter tuning, and finally serving the model.

  • Feature engineering - tell your model what to look at

    November 30, 2018 by Paweł Grabiński Feature engineering

Data in the real world can be extremely messy and chaotic. It doesn't matter if it is a relational SQL database, an Excel file, or any other source of data. Despite usually being structured as tables, where each row (called a sample) has values corresponding to the columns (called features), the data may be hard to understand and process. To make the data easier for our machine learning models to read, and thereby increase their performance, we can conduct feature engineering.
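As a taste of what the post means by feature engineering, here is a tiny hedged sketch in pandas; the table, column names, and the derived feature are invented for illustration:

```python
# A toy feature-engineering step on a hypothetical housing table.
import pandas as pd

# Hypothetical raw data: rows are samples, columns are features.
df = pd.DataFrame({
    "price":   [100.0, 250.0, 80.0],
    "area_m2": [50.0, 100.0, 40.0],
    "city":    ["Warsaw", "Krakow", "Warsaw"],
})

# Derive a new numeric feature that a model can use directly.
df["price_per_m2"] = df["price"] / df["area_m2"]

# Encode the categorical column as one-hot indicator features.
df = pd.get_dummies(df, columns=["city"])
print(df.columns.tolist())
```

Deriving ratios and encoding categories are just two of many possible transformations; the point is to hand the model features that make the signal easier to find.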