AutoML Technological Use Case
How AutoML Can Help
In this use case we work with a credit card fraud detection dataset whose goal is to predict the probability that a transaction is fraudulent. The dataset contains 284,807 transaction records, each described by attributes such as the transaction amount, the time of the transaction, and anonymised features obtained through principal component analysis (PCA). Our goal is to build prediction models that correctly identify fraudulent transactions, helping financial institutions reduce losses and protect customers from fraud. AutoML is transforming how financial institutions approach fraud detection, risk management, and data analysis, and MLJAR AutoML in particular automates the model development process in ways that can translate into notable business value.
Business Value
30%
Faster
Compared to conventional approaches, MLJAR AutoML can shorten model development time by about 30%, encouraging experimentation and reducing the time it takes for new features and products to reach the market.
25%
More Scalable
MLJAR AutoML tools can improve scalability and flexibility by approximately 25%, making it easier to handle diverse data sources and adapt to evolving business requirements and technological breakthroughs.
40%
More Accessible
MLJAR AutoML can democratize AI capabilities by making advanced machine learning techniques roughly 40% more accessible to teams without deep machine learning expertise, enabling more teams to incorporate machine learning into their projects.
20%
More Competitive
Tech companies can gain a roughly 20% competitive advantage by using MLJAR AutoML, which accelerates the construction and deployment of sophisticated machine learning models and improves product differentiation and market positioning.
AutoML Report
MLJAR AutoML can generate extensive reports that offer deep insight into model performance, data analysis, and evaluation metrics. Here are a couple of examples.
Leaderboard
To evaluate the effectiveness of the trained models, AutoML used logloss as its performance metric. As the table below shows, 3_Default_Xgboost was therefore selected as the best model.
| Best model | name | model_type | metric_type | metric_value | train_time |
|---|---|---|---|---|---|
|  | 1_Baseline | Baseline | logloss | 0.0128227 | 1.96 |
|  | 2_DecisionTree | Decision Tree | logloss | 0.00327765 | 27.18 |
| the best | 3_Default_Xgboost | Xgboost | logloss | 0.00171326 | 17.88 |
|  | 4_Default_NeuralNetwork | Neural Network | logloss | 0.00380802 | 28.83 |
|  | 5_Default_RandomForest | Random Forest | logloss | 0.00244796 | 38.73 |
|  | Ensemble | Ensemble | logloss | 0.00171326 | 6.43 |
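When you run AutoML yourself (see the "Fit AutoML" section below), the same leaderboard can be retrieved programmatically. A minimal sketch, assuming a fitted automl object:

# a minimal sketch, assuming `automl` is the fitted AutoML object
# created in the "Fit AutoML" section below
leaderboard = automl.get_leaderboard()
# sort models by their logloss, best (lowest) first
print(leaderboard.sort_values("metric_value"))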
AutoML Performance
Spearman Correlation of Models
The plot illustrates the relationships between different models based on the rank-order (Spearman) correlation of their predictions. Spearman correlation measures how well the relationship between two variables can be described by a monotonic function; here, higher values indicate that two models rank the same transactions similarly, meaning their behaviour is closely related.
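For intuition, here is a minimal sketch of how such a matrix can be computed outside the report, using scipy's spearmanr on each pair of models' predictions. The prediction arrays below are hypothetical placeholders for probabilities predicted on the same validation set:

# a minimal sketch of building such a matrix; the prediction arrays
# are hypothetical placeholders for each model's predicted
# probabilities on the same validation set
import numpy as np
from scipy.stats import spearmanr

preds = {
    "Baseline": np.random.rand(1000),       # placeholder predictions
    "Xgboost": np.random.rand(1000),        # placeholder predictions
    "Random Forest": np.random.rand(1000),  # placeholder predictions
}
names = list(preds)
# pairwise Spearman rank correlation between model predictions
corr = [[spearmanr(preds[a], preds[b]).correlation for b in names] for a in names]
for name, row in zip(names, corr):
    print(name, [round(c, 3) for c in row])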
Feature Importance
The plot visualizes the significance of various features across different models. In this heatmap, each cell represents the importance of a specific feature for a particular model, with the color intensity indicating the level of importance. Darker or more intense colors signify higher importance, while lighter colors indicate lower importance. This visualization helps in comparing the contribution of features across multiple models, highlighting which features consistently play a critical role and which are less influential in predictive performance.
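The report computes these importances automatically. To reproduce a comparable per-model importance yourself, scikit-learn's permutation_importance works on any fitted model. A minimal sketch, assuming the X and y defined in the "Select X,y for ML training" section below; the RandomForestClassifier here is just an illustrative stand-in for one of the AutoML models:

# a minimal sketch; RandomForestClassifier stands in for one of the
# AutoML models, and X, y are defined in the sections below
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=3, random_state=42)
# features with the largest mean importance come first
importances = pd.Series(result.importances_mean, index=X.columns)
print(importances.sort_values(ascending=False).head(10))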
Install and import necessary packages
Install the packages with the command:
pip install pandas mljar-supervised scikit-learn
Import the packages into your code:
# import packages
import pandas as pd
from supervised import AutoML
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
Load data
Import the dataset containing information about credit card fraud.
# read data from csv file
df = pd.read_csv(r"C:\Users\Your_user\technological-use-case\creditcard.csv")
# display data shape
print(f"Loaded data shape {df.shape}")
# display first rows
df.head()
Loaded data shape (284807, 31)
|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
| 1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
| 2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
| 3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
| 4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |
5 rows × 31 columns
Split dataframe to train/test
To split a dataframe into train and test sets, we divide the data to create separate datasets for training and evaluating a model. This ensures we can assess the model's performance on unseen data.
This step is essential when you have only one base dataset.
# split data
train, test = train_test_split(df, train_size=0.95, shuffle=True, random_state=42)
# display data shapes
print(f"All data shape {df.shape}")
print(f"Train shape {train.shape}")
print(f"Test shape {test.shape}")
All data shape (284807, 31)
Train shape (270566, 31)
Test shape (14241, 31)
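Fraud cases are rare in this dataset, so a plain random split can leave the test set with very few positive examples. An optional variant of the same call preserves the fraud/non-fraud ratio in both sets via the stratify argument:

# an optional stratified variant of the same split, preserving the
# fraud/non-fraud ratio in both train and test sets
train_s, test_s = train_test_split(
    df, train_size=0.95, shuffle=True, random_state=42, stratify=df["Class"]
)
# the fraction of fraud cases is now (almost) identical in both sets
print(f"Train fraud ratio {train_s['Class'].mean():.5f}")
print(f"Test fraud ratio {test_s['Class'].mean():.5f}")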
Select X,y for ML training
We will split the training set into features (X) and target (y) variables for model training.
# create X columns list and set y column
x_cols = ["Time", "V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21", "V22", "V23", "V24", "V25", "V26", "V27", "V28", "Amount"]
y_col = "Class"
# set input matrix
X = train[x_cols]
# set target vector
y = train[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
X shape is (270566, 30)
y shape is (270566,)
Fit AutoML
We need to train a model for our dataset. The fit() method will handle the model training and optimization automatically.
# create automl object
automl = AutoML(total_time_limit=300, mode="Explain")
# train automl
automl.fit(X, y)
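The defaults above are enough for this tutorial, but the constructor accepts further options. A sketch of a more explicit configuration; eval_metric and algorithms are AutoML constructor arguments, and the values shown here are illustrative:

# a more explicit configuration; eval_metric and algorithms are
# AutoML constructor arguments, and the values shown are illustrative
automl = AutoML(
    mode="Explain",           # produces the explanatory reports shown above
    total_time_limit=300,     # overall training budget in seconds
    eval_metric="logloss",    # metric used to rank models on the leaderboard
    algorithms=["Baseline", "Decision Tree", "Xgboost",
                "Neural Network", "Random Forest"],
)
automl.fit(X, y)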
Compute predictions
Use the trained AutoML model to make predictions on test data.
# predict with AutoML
predictions = automl.predict(test)
# predicted values
print(predictions)
[1 0 0 ... 0 0 0]
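predict() returns hard class labels. For fraud screening, ranked probabilities are often more useful, and mljar-supervised also exposes predict_proba(). A minimal sketch that ranks test transactions by their predicted fraud probability:

# a minimal sketch using predict_proba, which returns one probability
# column per class; the positive (fraud) class is the second column
proba = automl.predict_proba(test)
fraud_scores = proba[:, 1]
# show the ten transactions the model considers most suspicious
ranked = test.assign(fraud_score=fraud_scores)
print(ranked.nlargest(10, "fraud_score")[["Amount", "Class", "fraud_score"]])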
Compute accuracy
We retrieve the true fraud labels from the test set to compare against our predictions, then compute the accuracy score.
# get true values of fraud
true_values = test["Class"]
# compute metric
metric_accuracy = accuracy_score(true_values, predictions)
print(f"Accuracy: {metric_accuracy}")
Accuracy: 0.9915736254476512
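One caveat: only about 0.17% of the transactions in this dataset are fraudulent, so a model that always predicts "not fraud" would already score above 99.8% accuracy. Precision, recall, and the confusion matrix give a more honest picture of fraud detection performance. A minimal sketch using scikit-learn:

# a minimal sketch of metrics better suited to imbalanced fraud data
from sklearn.metrics import confusion_matrix, classification_report

# rows are true classes, columns are predicted classes
print(confusion_matrix(true_values, predictions))
# per-class precision, recall, and F1 score
print(classification_report(true_values, predictions, digits=4))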
Conclusions
Applied to credit card fraud detection, MLJAR AutoML provides a number of benefits for improving model reliability and understanding fraudulent behaviour in transaction data. By automating the intricate process of data analysis and model selection, MLJAR AutoML improves the precision of fraud prediction and makes it easier to investigate what drives suspicious transactions. Thanks to this automation, analysts can process large volumes of transaction data quickly and effectively, identifying patterns that conventional techniques might miss. Growing adoption of this technology has the potential to strengthen fraud prevention and better protect both financial institutions and their customers.
See you soon👋.