MLJAR's Blog

  • Python Virtual Environment Explained

    October 29, 2021 by Piotr Płoński Python Virtual environment

    Have you ever messed with Python packages? Have you ever had problems running someone else's script because of broken dependencies? Would you like to know why this happens and how to fix it? In this article, we explain how Python virtual environments work.
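As a minimal sketch of the idea (not code from the article), the standard-library `venv` module can create an isolated environment, and comparing `sys.prefix` with `sys.base_prefix` shows whether the running interpreter belongs to one. The helper names `in_virtualenv` and `create_env` are hypothetical and used only for illustration.

```python
import sys
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Inside a virtual environment sys.prefix points at the environment directory,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def create_env(path: Path) -> Path:
    """Create a virtual environment at `path` and return its Python executable.

    pip is skipped (with_pip=False) to keep the sketch fast and offline-friendly.
    """
    venv.create(path, with_pip=False)
    if sys.platform == "win32":
        return path / "Scripts" / "python.exe"
    return path / "bin" / "python"
```

Calling `create_env(Path("demo-env"))` writes a `pyvenv.cfg` file and a private interpreter into `demo-env`; packages installed with that interpreter stay separate from the system-wide ones.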

  • How to authenticate Python to access Google Sheets with Service Account JSON credentials

    October 28, 2021 by Aleksandra Płońska & Piotr Płoński Mljar Google sheets

    When you need to access Google Sheets data from a Python script, you have to prove that you have access to the resource. There are several ways to authenticate to the Google API; one of them is a Service Account. We will show you how to get a JSON file with credentials to access Google Sheets.

  • Read Google Sheets in Python with no-code MLJAR Studio

    October 26, 2021 by Aleksandra Płońska & Piotr Płoński Mljar

    MLJAR Studio is a desktop application for creating Python scripts. It has a graphical user interface (GUI) for code generation. In this post, we show how to easily create a Python script for reading data from Google Sheets without writing any code: the code is generated through code-forms.

  • MLJAR Studio a new way to build data apps

    August 31, 2021 by Aleksandra Płońska & Piotr Płoński Mljar

    We have started working on a new product: MLJAR Studio. It is a desktop application for the interactive development of data apps. We hope it will bring more flexibility to our users.

  • The next-generation of AutoML frameworks

    March 31, 2021 by Aleksandra Płońska & Piotr Płoński Automl

    Automated Machine Learning (AutoML) is the process of building a complete Machine Learning pipeline automatically, without (or with minimal) human help. AutoML solutions are quite new, with the first research papers from 2013 (Auto-Weka), 2015 (Auto-sklearn), and 2016 (TPOT). Currently, there are several open-source AutoML frameworks and commercial platforms available that can work with a variety of data. Open-source solutions worth mentioning include AutoGluon, H2O, and MLJAR AutoML.

  • CatBoost with custom evaluation metric

    March 25, 2021 by Piotr Płoński Catboost

    CatBoost is a powerful gradient boosting framework. It can be used for classification, regression, and ranking. It is available in many languages, such as Python, R, Java, and C++. It can handle categorical features without any preprocessing. Like all gradient boosting algorithms, it can overfit if trained with too many trees (iterations). If the number of trees is too small, the model will underfit. To find the optimal number of trees, early stopping can be applied. This technique monitors the evaluation metric on a dataset kept separate from the training data.

  • How to use early stopping in Xgboost training?

    March 17, 2021 by Piotr Płoński Xgboost

    Xgboost is a powerful gradient boosting framework that can be used to train Machine Learning models. It is important to select the optimal number of trees in the model during training. Too few trees will result in underfitting. On the other hand, too many trees will result in overfitting. How do you find the optimal number of trees? You can use early stopping.
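The early stopping rule itself can be sketched in a few lines of plain Python. This is an illustration of the general technique, not Xgboost's implementation: the function name `early_stopping_round`, the `patience` parameter, and the example loss values are assumptions made for the sketch. It scans the validation loss recorded after each tree and stops once the loss has not improved for `patience` consecutive rounds.

```python
from typing import Sequence, Tuple


def early_stopping_round(val_losses: Sequence[float], patience: int) -> Tuple[int, float]:
    """Return (best_iteration, best_loss) under a simple early stopping rule.

    val_losses[i] is the validation loss after adding tree i (lower is better).
    Scanning stops as soon as `patience` rounds pass without a new best loss.
    """
    best_iter, best_loss = -1, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best score: remember it and keep training.
            best_iter, best_loss = i, loss
        elif i - best_iter >= patience:
            # No improvement for `patience` rounds: stop.
            break
    return best_iter, best_loss
```

With the hypothetical losses `[0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.57, 0.40]` and `patience=3`, the scan stops at round 6 and reports round 3 as the best. A larger patience tolerates longer plateaus at the cost of training more trees.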

  • How to save and load Xgboost in Python?

    March 16, 2021 by Piotr Płoński Xgboost

    Xgboost is a powerful gradient boosting framework. It provides interfaces in many languages: Python, R, Java, C++, Julia, Perl, and Scala. In this post, I will show you how to save and load Xgboost models in Python. Xgboost provides several Python API types, which can be a source of confusion at the beginning of the Machine Learning journey. I will show different ways of saving and loading Xgboost models and point out which one is the safest.

  • MLJAR AutoML adds integration with Optuna

    March 15, 2021 by Piotr Płoński Automl Optuna

    MLJAR provides an open-source Automated Machine Learning framework for creating Machine Learning pipelines. It has a built-in heuristic algorithm for hyperparameter tuning based on random search over a defined set of hyperparameter values, followed by hill-climbing over the best solutions to search for further improvements. This solution works very well on Machine Learning tasks under a selected time budget. However, there might be situations when model performance is the primary goal and computation time is not a constraint. Thus, we propose a new mode in the MLJAR framework: “Optuna”. In this mode, we utilize the Optuna hyperparameter tuning framework. It is available in the mljar-supervised package starting from version 0.10.0.
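The built-in heuristic described above (random search, then hill-climbing from the best result) can be illustrated with a toy sketch. This is not MLJAR's actual code: the function names, the grid-shaped search space, and the convention that a lower score is better are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, Iterator, List

Params = Dict[str, float]
SearchSpace = Dict[str, List[float]]


def random_search(space: SearchSpace, score: Callable[[Params], float],
                  n_trials: int, rng: random.Random) -> Params:
    """Sample n_trials random configurations and return the lowest-scoring one."""
    trials = [{name: rng.choice(values) for name, values in space.items()}
              for _ in range(n_trials)]
    return min(trials, key=score)


def neighbors(space: SearchSpace, params: Params) -> Iterator[Params]:
    """Yield configurations that differ from `params` by one step in one hyperparameter."""
    for name, values in space.items():
        idx = values.index(params[name])
        for j in (idx - 1, idx + 1):
            if 0 <= j < len(values):
                yield {**params, name: values[j]}


def hill_climb(space: SearchSpace, score: Callable[[Params], float],
               start: Params) -> Params:
    """Move to the best neighbor repeatedly until no neighbor improves the score."""
    best = dict(start)
    while True:
        candidate = min(neighbors(space, best), key=score, default=best)
        if score(candidate) >= score(best):
            return best
        best = candidate
```

In practice `score` would train and validate a model for each configuration, which is why the number of evaluations matters under a fixed time budget.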

  • Lead Scoring

    March 05, 2021 by Aleksandra Płońska Lead scoring

    If you’re selling, promoting, and engaging customers to buy new services, you’ve certainly come across the concept of lead scoring. It is of particular interest to marketing agencies, which use the available information about clients to find those who will be interested in a specific product or service.