AutoML Financial Use Case

Discover how MLJAR AutoML can support credit scoring in finance. By automating feature engineering, model selection, and tuning, it can improve scoring accuracy, speed up risk assessments, and cut development costs, leading to faster and better-informed financial decisions.

This notebook was created with MLJAR Studio

MLJAR Studio is a Python code editor with interactive code recipes and a local AI assistant.
The code recipe UI is displayed at the top of code cells.


How AutoML Can Help

In this example, we use a credit scoring dataset designed to predict the likelihood that an individual will experience serious financial difficulties in the next two years. It contains 150,000 samples describing individuals by characteristics such as age, monthly income, debt ratio, number of dependents, and more. By analyzing this data, we aim to build a predictive model that estimates the likelihood of financial distress, helping institutions make informed credit decisions and manage risk proactively.

MLJAR AutoML automates the model development process: it preprocesses the data, trains and tunes several algorithms, and selects or ensembles the best ones. For financial institutions, this shortens the path from raw data to a working credit scoring or risk model and reduces the manual effort such projects usually require.

So let's get started 🤗!

# import packages
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised import AutoML
from sklearn.metrics import accuracy_score

Load data

Import relevant financial data for credit scoring analysis.

# load example dataset
df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/credit/data.csv")
# display DataFrame shape
print(f"Loaded data shape {df.shape}")
# display first rows
df.head()
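Before modeling, it can help to check how balanced the target is and which columns have missing values, since both affect how the results should be read. The sketch below runs these checks on a small hypothetical DataFrame (`df_demo`, with made-up values and only a few of the dataset's column names); in the notebook you would run the same two checks on `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `df`; values are made up for illustration
df_demo = pd.DataFrame({
    "SeriousDlqin2yrs": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "MonthlyIncome": [5000, np.nan, 3200, 4100, np.nan, 2500, 7000, 3900, 1800, np.nan],
    "NumberOfDependents": [0, 1, 2, np.nan, 0, 1, 3, 0, 2, 1],
    "age": [45, 38, 52, 29, 61, 33, 47, 55, 26, 41],
})

# Share of each target class: a strongly skewed split means accuracy alone can mislead
class_share = df_demo["SeriousDlqin2yrs"].value_counts(normalize=True)
print(class_share)

# Number of missing values per column, showing only columns that have any
missing = df_demo.isna().sum()
print(missing[missing > 0])
```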

Split DataFrame into train/test sets

We divide the data into a training set, used to fit the model, and a test set, held out for evaluation. This lets us assess the model's performance on data it has not seen during training.

This step is essential when you start from a single dataset rather than separate, predefined train and test files.

# split data
train, test = train_test_split(df, train_size=0.75, shuffle=True, random_state=42)
# display data shapes
print(f"All data shape {df.shape}")
print(f"Train shape {train.shape}")
print(f"Test shape {test.shape}")
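If the target is imbalanced, a plain random split can leave the train and test sets with noticeably different shares of positive cases. A common remedy, assumed here rather than taken from the original recipe, is scikit-learn's `stratify` argument. The sketch shows it on hypothetical toy data (`toy`, `train_s`, and `test_s` are illustrative names); in the notebook, the equivalent would be passing `stratify=df["SeriousDlqin2yrs"]` to `train_test_split`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 90 negatives, 10 positives
toy = pd.DataFrame({
    "feature": range(100),
    "SeriousDlqin2yrs": [0] * 90 + [1] * 10,
})

# stratify= keeps the class proportions (about 10% positives) similar in both parts
train_s, test_s = train_test_split(
    toy,
    train_size=0.75,
    shuffle=True,
    random_state=42,
    stratify=toy["SeriousDlqin2yrs"],
)

print(train_s["SeriousDlqin2yrs"].mean(), test_s["SeriousDlqin2yrs"].mean())
```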

Select X,y for ML training

We separate the training set into the feature matrix (x_train) and the target vector (y_train) used for model training.

# create X columns list and set y column
x_cols = ["Id", "RevolvingUtilizationOfUnsecuredLines", "age", "NumberOfTime30-59DaysPastDueNotWorse", "DebtRatio", "MonthlyIncome", "NumberOfOpenCreditLinesAndLoans", "NumberOfTimes90DaysLate", "NumberRealEstateLoansOrLines", "NumberOfTime60-89DaysPastDueNotWorse", "NumberOfDependents"]
y_col = "SeriousDlqin2yrs"
# set input matrix
x_train = train[x_cols]
# set target vector
y_train = train[y_col]
# display data shapes
print(f"x_train shape is {x_train.shape}")
print(f"y_train shape is {y_train.shape}")
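Typing the feature list by hand is error-prone with long column names. As an alternative, sketched below on a hypothetical toy frame, the list can be built from the DataFrame's columns by excluding the target. The sketch also excludes `Id`, on the assumption that it is only a row identifier with no predictive meaning; note that the original list above keeps it. Illustrative names (`feature_cols`, `x_demo`, `y_demo`) are used so the sketch does not overwrite the notebook's variables.

```python
import pandas as pd

# Hypothetical toy frame using a few of the notebook's column names
frame = pd.DataFrame({
    "Id": [1, 2, 3],
    "age": [45, 38, 52],
    "DebtRatio": [0.8, 0.1, 0.4],
    "SeriousDlqin2yrs": [1, 0, 0],
})

target_col = "SeriousDlqin2yrs"
# Columns not used as features: the target and (by assumption) the row identifier
excluded = {target_col, "Id"}
feature_cols = [col for col in frame.columns if col not in excluded]

x_demo = frame[feature_cols]
y_demo = frame[target_col]
print(feature_cols)  # ['age', 'DebtRatio']
```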

Select X,y for evaluating the ML model

We separate the test set into the feature matrix (x_test) and the target vector (y_test) used to evaluate the model's performance.

# create X columns list and set y column
x_cols = ["Id", "RevolvingUtilizationOfUnsecuredLines", "age", "NumberOfTime30-59DaysPastDueNotWorse", "DebtRatio", "MonthlyIncome", "NumberOfOpenCreditLinesAndLoans", "NumberOfTimes90DaysLate", "NumberRealEstateLoansOrLines", "NumberOfTime60-89DaysPastDueNotWorse", "NumberOfDependents"]
y_col = "SeriousDlqin2yrs"
# set input matrix
x_test = test[x_cols]
# set target vector
y_test = test[y_col]
# display data shapes
print(f"x_test shape is {x_test.shape}")
print(f"y_test shape is {y_test.shape}")
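The cell above repeats the column list from the training step. A small helper, hypothetical and not part of the MLJAR recipes, can apply one selection to any split and fail early if a column is missing, so the train and test features are guaranteed to match.

```python
import pandas as pd


def select_xy(frame: pd.DataFrame, x_cols: list[str], y_col: str) -> tuple[pd.DataFrame, pd.Series]:
    """Return (X, y) for one split, raising KeyError if any requested column is absent."""
    missing = [col for col in [*x_cols, y_col] if col not in frame.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return frame[x_cols], frame[y_col]


# Hypothetical toy splits with a subset of the notebook's column names
train_demo = pd.DataFrame({"age": [45, 38], "DebtRatio": [0.8, 0.1], "SeriousDlqin2yrs": [1, 0]})
test_demo = pd.DataFrame({"age": [52], "DebtRatio": [0.4], "SeriousDlqin2yrs": [0]})

feature_cols = ["age", "DebtRatio"]
x_tr, y_tr = select_xy(train_demo, feature_cols, "SeriousDlqin2yrs")
x_te, y_te = select_xy(test_demo, feature_cols, "SeriousDlqin2yrs")
print(x_tr.shape, x_te.shape)
```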

Fit AutoML

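Next, we train models on our dataset. The fit() method handles training and optimization automatically. Here, total_time_limit=300 caps the whole search at 300 seconds, and mode="Explain" runs AutoML in its explanation-oriented mode, which produces reports describing the trained models.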

# create automl object
automl = AutoML(total_time_limit=300, mode="Explain")
# train automl
automl.fit(x_train, y_train)
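Conceptually, part of what `fit()` does is try several algorithms and keep the one with the best validation score. The sketch below illustrates only that idea with plain scikit-learn on synthetic data. It is not MLJAR's implementation, which additionally handles preprocessing, hyperparameter tuning, ensembling, and model explanations. The candidate models, the 5-fold cross-validation, and ROC AUC as the selection metric are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary problem (about 90% / 10%) standing in for the credit data
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0)

# A few candidate algorithm families, chosen here for illustration only
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score every candidate with 5-fold cross-validated ROC AUC
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Keep the best-scoring candidate and refit it on all the data
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
print(scores)
print("best:", best_name)
```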

Compute predictions

Generate predictions on the test data and display the results.

# predict with AutoML
predictions = automl.predict(x_test)
# predicted values
print(predictions)
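For binary classification, `predict()` returns class labels, while credit decisions often work with the estimated probability of financial distress. mljar-supervised's `AutoML` also provides a `predict_proba()` method for this. The sketch below shows the idea with a scikit-learn `LogisticRegression` standing in for the trained AutoML model, on synthetic data; the 0.2 cutoff is a hypothetical business choice, not a recommended value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data and a simple stand-in model
X, y = make_classification(n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Column 1 holds the estimated probability of the positive class (label 1)
proba_default = model.predict_proba(X)[:, 1]

# Hypothetical cutoff: flag applicants whose estimated risk is at least 20%
threshold = 0.2
flagged = (proba_default >= threshold).astype(int)

# A lower cutoff flags at least as many applicants as the usual 0.5 cutoff
print("flagged at 0.2:", flagged.sum(), "flagged at 0.5:", (proba_default >= 0.5).sum())
```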

Compute accuracy

We compute the accuracy score by comparing the true labels (y_test) with our predictions.

# compute metric
metric_accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {metric_accuracy}")
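Accuracy can look impressive on imbalanced data even for a useless model. The worked example below uses hypothetical labels (10% positives) to compare a model that always predicts "no distress" under accuracy and ROC AUC. In the notebook, the same baseline would be `accuracy_score(y_test, np.zeros_like(y_test))`, and ROC AUC could be computed from `automl.predict_proba(x_test)`, assuming its second column holds the positive-class probability.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical labels: 90 borrowers without distress (0) and 10 with distress (1)
y_true = np.array([0] * 90 + [1] * 10)

# A useless "model" that always predicts the majority class
always_zero = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, always_zero)
baseline_auc = roc_auc_score(y_true, always_zero.astype(float))

print("accuracy:", baseline_accuracy)  # 0.9
print("ROC AUC:", baseline_auc)        # 0.5
```

The constant model reaches 90% accuracy yet has no ranking ability at all, which ROC AUC exposes as 0.5.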

Conclusions

MLJAR AutoML can be a valuable tool in the financial sector, particularly for credit scoring. In this notebook, a few lines of code were enough to load the data, train and tune a set of models, and evaluate them on held-out data. By automating complex data analysis and model building, AutoML helps financial institutions produce more accurate credit risk assessments, streamline loan approvals, and make informed, timely decisions while improving efficiency and reducing operational costs. As AutoML technology continues to evolve, its role in credit scoring and other financial applications, such as fraud detection, is likely to keep growing.

See you soon 👋.


Packages used in automl-financial-use-case.ipynb

The following packages need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports the required modules for you.

pandas>=1.0.0
scikit-learn>=1.5.0
mljar-supervised>=1.1.7
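To confirm that an environment meets the minimum versions listed above, a small check can compare installed versions with the required ones. This sketch uses only the standard library (`importlib.metadata`) and a simplified, hypothetical version parser that ignores pre-release suffixes such as `rc1`; for exact comparisons, the `packaging` library is the usual choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions listed above (distribution name -> minimum version)
REQUIREMENTS = {"pandas": "1.0.0", "scikit-learn": "1.5.0", "mljar-supervised": "1.1.7"}


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '1.5.2' into (1, 5, 2); stops at the first non-numeric suffix such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() < len(piece):
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


for package, minimum in REQUIREMENTS.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed (need >= {minimum})")
        continue
    status = "OK" if meets_minimum(installed, minimum) else f"too old, need >= {minimum}"
    print(f"{package} {installed}: {status}")
```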