MLJAR's Blog
-
Fairness in Automated Machine Learning
June 26, 2023 by Aleksandra Płońska, Piotr Płoński Automl Fairness
Starting from version 1.0.0, our open-source Automated Machine Learning Python package mljar-supervised supports fairness-aware training of Machine Learning pipelines. Our AutoML can measure fairness and mitigate bias for the provided sensitive features. We support three Machine Learning tasks: binary classification, multiclass classification, and regression. We provide usage and implementation details in this article.
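To make "measuring fairness for a sensitive feature" more concrete, here is a minimal sketch of one commonly used metric, the demographic parity ratio: the lowest per-group rate of positive predictions divided by the highest. This is only an illustration under stated assumptions, not the mljar-supervised implementation; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def selection_rates(predictions, sensitive):
    """Return the fraction of positive (1) predictions per sensitive group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, sensitive):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_ratio(predictions, sensitive):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    rates = selection_rates(predictions, sensitive)
    highest = max(rates.values())
    if highest == 0:
        # No group received any positive prediction, so the rates are equal.
        return 1.0
    return min(rates.values()) / highest


# Hypothetical toy data: binary predictions and one sensitive feature.
predictions = [1, 0, 1, 1, 0, 1, 0, 0]
sensitive = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = selection_rates(predictions, sensitive)
ratio = demographic_parity_ratio(predictions, sensitive)
print(rates)
print(round(ratio, 3))
```

A ratio close to 1 means both groups receive positive predictions at a similar rate; a low ratio suggests that one group is selected much less often and that bias mitigation may be worth considering.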
-
Python Dashboard for 15,963 Data Analyst job listings
April 26, 2023 by Adrian Błazeusz Matplotlib Jupyter
The US job market is filled with exciting opportunities for aspiring Data Analysts. However, landing your first job can be challenging due to the diverse range of requirements employers are looking for. In this article, we analyze data from 15,963 Data Analyst job listings. We build a Jupyter Notebook with data analysis and visualization, and serve it as an interactive web app. For example, we search for the most needed skills and show how they relate to the average yearly salary. Let’s check what the most needed skills for Data Analysts are!
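The kind of analysis described here can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the article's notebook: the `listings` records, their field names, and the `skill_summary` helper are hypothetical, and the numbers are made-up toy values rather than data from the 15,963 listings.

```python
from collections import defaultdict

# Hypothetical toy records; real listings would be loaded from a dataset.
listings = [
    {"skills": ["sql", "excel"], "salary": 60000},
    {"skills": ["sql", "python"], "salary": 80000},
    {"skills": ["python", "tableau"], "salary": 90000},
    {"skills": ["sql"], "salary": 70000},
]


def skill_summary(listings):
    """Count how often each skill appears and average the salary per skill."""
    counts = defaultdict(int)
    salary_totals = defaultdict(float)
    for listing in listings:
        # Use a set so a skill repeated within one listing is counted once.
        for skill in set(listing["skills"]):
            counts[skill] += 1
            salary_totals[skill] += listing["salary"]
    summary = {
        skill: {
            "count": counts[skill],
            "avg_salary": salary_totals[skill] / counts[skill],
        }
        for skill in counts
    }
    # Most frequent skills first; ties are broken alphabetically.
    return dict(sorted(summary.items(), key=lambda kv: (-kv[1]["count"], kv[0])))


summary = skill_summary(listings)
for skill, stats in summary.items():
    print(skill, stats["count"], stats["avg_salary"])
```

The same summary could feed a bar chart of skill frequency next to average salary in a notebook.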
-
Jupyter Notebook in 4 flavors
December 03, 2022 by Aleksandra Płońska, Piotr Płoński Jupyter
Jupyter Notebook is a popular open-source tool for development and exploration in the data world. It started as the IPython Notebook developed by Fernando Pérez and Brian Granger. Currently, the Jupyter Notebook is available as 4 different web applications: Classic Jupyter Notebook, Jupyter Lab, Jupyter RetroLab, and Jupyter Lite. Let’s take a closer look at the differences between these Jupyter versions.
-
9 ways to set colors in Matplotlib
November 21, 2022 by Aleksandra Płońska, Piotr Płoński Matplotlib
Matplotlib is a powerful visualization package for Python. It is highly customizable, which is why it is widely used in both commercial and academic settings. In this article, I will show you 9 different ways to set colors in Matplotlib plots. All parts of the plot can be customized with a new color. You can set colors for axes, labels, background, and title. However, not every data scientist is a graphic designer who can compose good-looking colors in a single plot, so I will also show you how to use predefined Matplotlib styles to get attractive plots.
-
3 ways to get Pandas DataFrame row count
November 12, 2022 by Aleksandra Płońska, Piotr Płoński Pandas
Pandas is a popular data manipulation library with over 15k stars on GitHub. It’s an open-source project that offers, among other things: automatic and explicit data alignment, easy handling of missing data, intelligent label-based slicing, indexing, and subsetting of large data sets, merging of data sets, and flexible reshaping and pivoting of data sets. There are 3 ways to get the row count from a Pandas DataFrame. I will describe them all in this article. My preferred way is to use df.shape to get the number of rows and columns. This method is fast and simple.
-
Convert Jupyter Notebook to Python script in 3 ways
November 10, 2022 by Aleksandra Płońska, Piotr Płoński Jupyter Python Nbconvert
Jupyter Notebook saves files in the .ipynb format. It is a JSON document containing code, Markdown, and outputs. There are many cases in which we would like to convert a Jupyter Notebook to a plain Python script. For example, you may want to keep the Python code in a repository or turn your notebook into a standalone package. I will show you 3 ways to export a Jupyter Notebook file to a Python script.
-
Save a Plot to a File in Matplotlib (using 14 formats)
November 08, 2022 by Aleksandra Płońska, Piotr Płoński Matplotlib
Matplotlib is a popular plotting library for Python. It can be used in Python scripts and Jupyter Notebooks. The plot can be displayed in a separate window or in a notebook. What if you would like to save the plot to a file? In this article, I will show you how to save a Matplotlib plot to a file. Matplotlib can save plots in 14 different formats.
-
Complete list of 594 PyTZ timezones
November 08, 2022 by Aleksandra Płońska, Piotr Płoński Python
The pyTZ package is a Python implementation of the tz database. You can use pyTZ to list all available timezones from the tz database, among other things. Below is a list of all timezones available in pyTZ. There are 594 timezones in total.
-
2 ways to save and load scikit-learn model
November 04, 2022 by Aleksandra Płońska, Piotr Płoński Scikit-learn
After training a Machine Learning model, you need to save it for future use. In this article, I will show you 2 ways to save and load scikit-learn models. One method is to use the pickle package; it is fast, but the saved model can take more storage than with the second approach. The alternative is to use the joblib package, which can save some space on disk but is slower than pickle.
-
5 ways to publish Jupyter Notebook Presentation
November 03, 2022 by Aleksandra Płońska, Piotr Płoński Jupyter Presentation
A presentation created with Jupyter Notebook is exported to an HTML file. It is interactive, thanks to the Reveal.js library. There are several options for publishing HTML presentations in the cloud.