MLJAR's Blog

  • The next-generation of AutoML frameworks

    March 31, 2021 by Aleksandra Płońska & Piotr Płoński Automl

    Automated Machine Learning (AutoML) is the process of building a complete Machine Learning pipeline automatically, with little or no human help. AutoML solutions are quite new, with the first research papers from 2013 (Auto-Weka), 2015 (Auto-sklearn), and 2016 (TPOT). Currently, there are several open-source AutoML frameworks and commercial platforms available that can work with a variety of data. Open-source solutions worth mentioning include AutoGluon, H2O, and MLJAR AutoML.

  • CatBoost with custom evaluation metric

    March 25, 2021 by Piotr Płoński Catboost

    CatBoost is a powerful gradient boosting framework. It can be used for classification, regression, and ranking. It is available in many languages, including Python, R, Java, and C++. It can handle categorical features without any preprocessing. Like all gradient boosting algorithms, it can overfit if trained with too many trees (iterations). If the number of trees is too small, the model will underfit. To find the optimal number of trees, early stopping can be applied. This technique monitors the evaluation metric on a dataset that is separate from the training data.
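
    As a minimal sketch of what a custom evaluation metric can look like, the class below follows the three-method interface (`is_max_optimal`, `evaluate`, `get_final_error`) that the CatBoost documentation describes for custom metric objects. The class name and the choice of mean absolute error are assumptions made for illustration, not taken from the post.

```python
class MeanAbsoluteErrorMetric:
    """Illustrative custom metric: weighted mean absolute error."""

    def is_max_optimal(self):
        # MAE is an error, so lower values are better.
        return False

    def evaluate(self, approxes, target, weight):
        # approxes holds one container of predictions per output dimension;
        # a single-output regression has exactly one.
        assert len(approxes) == 1
        predictions = approxes[0]
        assert len(predictions) == len(target)

        error_sum = 0.0
        weight_sum = 0.0
        for i in range(len(predictions)):
            w = 1.0 if weight is None else weight[i]
            error_sum += w * abs(predictions[i] - target[i])
            weight_sum += w
        # Return the accumulated error and the accumulated weight.
        return error_sum, weight_sum

    def get_final_error(self, error, weight):
        # Turn the accumulated sums into the reported metric value.
        return error / weight if weight else 0.0


# The metric can be checked on plain lists, without training a model.
metric = MeanAbsoluteErrorMetric()
error_sum, weight_sum = metric.evaluate([[1.0, 2.0, 4.0]], [1.0, 3.0, 2.0], None)
final_error = metric.get_final_error(error_sum, weight_sum)
```

    An object like this would typically be passed as the `eval_metric` argument when constructing a CatBoost model, so that early stopping monitors it on the validation set.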

  • How to use early stopping in Xgboost training?

    March 17, 2021 by Piotr Płoński Xgboost

    Xgboost is a powerful gradient boosting framework that can be used to train Machine Learning models. It is important to select the optimal number of trees in the model during training. Too few trees will result in underfitting. On the other hand, too many trees will result in overfitting. How do you find the optimal number of trees? You can use early stopping.
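
    The idea behind early stopping can be sketched in plain Python, independent of any library. The function name, the `patience` parameter, and the validation errors below are hypothetical; in Xgboost the comparable setting is `early_stopping_rounds`.

```python
def best_iteration_with_early_stopping(validation_errors, patience):
    """Return (best_iteration, best_error), stopping once the validation
    error has not improved for `patience` consecutive iterations."""
    best_error = float("inf")
    best_iteration = -1
    for iteration, error in enumerate(validation_errors):
        if error < best_error:
            # New best score: remember it and keep training.
            best_error = error
            best_iteration = iteration
        elif iteration - best_iteration >= patience:
            # No improvement for `patience` iterations: stop.
            break
    return best_iteration, best_error


# Made-up validation errors, one per added tree.
errors = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
best_iteration, best_error = best_iteration_with_early_stopping(errors, patience=3)
print(best_iteration, best_error)  # prints: 2 0.35
```

    Note that training stops before the last value (0.30) is ever seen. That is the trade-off of early stopping: a small patience saves time but can miss a later improvement.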

  • How to save and load Xgboost in Python?

    March 16, 2021 by Piotr Płoński Xgboost

    Xgboost is a powerful gradient boosting framework. It provides interfaces in many languages: Python, R, Java, C++, Julia, Perl, and Scala. In this post, I will show you how to save and load Xgboost models in Python. Xgboost provides several Python API types, which can be a source of confusion at the beginning of the Machine Learning journey. I will show different ways of saving and loading Xgboost models, and point out which one is the safest.

  • MLJAR AutoML adds integration with Optuna

    March 15, 2021 by Piotr Płoński Automl Optuna

    MLJAR provides an open-source Automated Machine Learning framework for creating Machine Learning pipelines. It has a built-in heuristic algorithm for hyperparameter tuning based on random search over a defined set of hyperparameter values, followed by hill-climbing over the best solutions to search for further improvements. This solution works very well on Machine Learning tasks under a selected time budget. However, there might be situations when model performance is the primary goal and computation time is not a constraint. Thus, we propose a new mode, “Optuna”, in the MLJAR framework. In this mode, we utilize the Optuna hyperparameter tuning framework. It is available in the mljar-supervised package starting from version 0.10.0.
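
    The built-in heuristic described above (random search, then hill-climbing from the best result) can be illustrated with a small toy sketch. This is not MLJAR's actual implementation: the search space, the `toy_score` function, and all helper names are hypothetical stand-ins.

```python
import random

# Hypothetical search space: each hyperparameter has a defined set of values.
SEARCH_SPACE = {
    "max_depth": [2, 3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.025, 0.05, 0.1, 0.2],
}


def toy_score(params):
    # Stand-in for a validation score (higher is better).
    # It peaks at max_depth=5 and learning_rate=0.05.
    return -abs(params["max_depth"] - 5) - 10 * abs(params["learning_rate"] - 0.05)


def random_search(n_trials, rng):
    # Step 1: sample random configurations and keep the best one.
    candidates = [
        {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        for _ in range(n_trials)
    ]
    return max(candidates, key=toy_score)


def neighbors(params):
    # Configurations that differ by one step in exactly one hyperparameter.
    for name, values in SEARCH_SPACE.items():
        index = values.index(params[name])
        for step in (-1, 1):
            if 0 <= index + step < len(values):
                neighbor = dict(params)
                neighbor[name] = values[index + step]
                yield neighbor


def hill_climb(start):
    # Step 2: move to the best neighbor until no neighbor improves the score.
    current = start
    while True:
        best_neighbor = max(neighbors(current), key=toy_score)
        if toy_score(best_neighbor) <= toy_score(current):
            return current
        current = best_neighbor


rng = random.Random(0)
start = random_search(n_trials=5, rng=rng)
best = hill_climb(start)
```

    With a real model, each call to the score function means training and validating a model, which is why this kind of heuristic suits a fixed time budget.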

  • Lead Scoring

    March 05, 2021 by Aleksandra Płońska Lead scoring

    If you’re selling, promoting, or encouraging customers to buy new services, you’ve certainly come across the concept of lead scoring. The term is of particular interest to marketing agencies, which use all available information about clients to find those who will be interested in a specific product or service.

  • How does AutoML work?

    March 04, 2021 by Piotr Płoński Automl

    AutoML stands for Automated Machine Learning. It builds a Machine Learning pipeline in an automated way. But how exactly does it work? What happens behind the scenes? There are many proprietary AutoML systems, and we will probably never know how they work. Luckily, MLJAR AutoML is open-source. Its code is available on GitHub. In this article, we will look inside MLJAR AutoML to show how it works.

  • AutoML in the Notebook

    March 04, 2021 by Piotr Płoński Automl Notebook

    Python Notebooks provide an interactive computing environment that is perfect for experimenting with data. Notebooks are widely used by Data Scientists in data analysis and discovery tasks. Currently, there are many versions of Notebooks. The first and most widely used is Jupyter Notebook. There are also many cloud-based Notebooks, like Kaggle Notebooks or CoCalc Notebooks.

  • Extract Rules from Decision Tree in 3 Ways with Scikit-Learn and Python

    February 25, 2021 by Piotr Płoński Decision tree Scikit learn

    Extracting rules from a Decision Tree helps in understanding how samples propagate through the tree during prediction. It can also be needed if we want to implement a Decision Tree without Scikit-learn or in a language other than Python. Decision Trees are easy to move to any programming language because they are a set of if-else statements. I’ve seen many examples of moving scikit-learn Decision Trees into C, C++, Java, or even SQL.

  • Tensorflow vs Scikit-learn

    October 01, 2020 by Piotr Płoński Tensorflow Scikitlearn Neuralnetwork

    Have you ever wondered what the difference is between Tensorflow and Scikit-learn? Which one is better? Have you ever needed Tensorflow when you already use Scikit-learn?