AutoML Medical Use Case

Explore how MLJAR AutoML transforms healthcare with cutting-edge machine learning. Discover its impact on disease prediction, treatment optimization, and patient care, driving efficiency and accuracy in medical practice.

This notebook was created with MLJAR Studio

MLJAR Studio is a Python code editor with interactive code recipes and a local AI assistant.
The code recipes UI is displayed at the top of code cells.

How AutoML Can Help

MLJAR AutoML automates the machine learning workflow, which makes it easier to apply models to medical tasks such as diagnosing diseases, predicting patient outcomes, and personalizing treatment plans. This notebook uses a breast cancer dataset whose attributes describe each tumor, including radius, texture, perimeter, area, smoothness, and concavity. The goal is to predict from these features whether a tumor is malignant or benign. Automating the analysis in this way can support clinical decision-making, but the resulting model still has to be evaluated on data it has not seen, which is what the steps below do.

Let's make some diagnoses 🩺!

# import packages
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised import AutoML
from sklearn.metrics import accuracy_score

Load training data

Import the breast cancer dataset for analysis and model building.

# load example dataset
df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/breast_cancer_wisconsin/data.csv")
# display DataFrame shape
print(f"Loaded data shape {df.shape}")
# display first rows
df.head()

Split dataframe to train/test

Splitting the dataframe gives us one dataset for training the model and a separate one for evaluating it. Because the model never sees the test rows during training, its score on them estimates how it will perform on unseen data.

This step is essential when you have only one base dataset.

# split data
train, test = train_test_split(df, train_size=0.75, shuffle=True, random_state=42)
# display data shapes
print(f"All data shape {df.shape}")
print(f"Train shape {train.shape}")
print(f"Test shape {test.shape}")
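
When the two classes are not equally frequent, a purely random split can leave the test set with a different share of malignant cases than the training set. One way to keep the shares equal is the stratify argument of train_test_split. The sketch below shows it on a made-up toy frame; the toy values and the names toy, toy_train, and toy_test are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the notebook's DataFrame (values are made up for illustration).
toy = pd.DataFrame({
    "radius_mean": [float(i) for i in range(20)],
    "diagnosis": ["M"] * 8 + ["B"] * 12,
})

# stratify= asks for the same M/B ratio in both parts of the split.
toy_train, toy_test = train_test_split(
    toy,
    train_size=0.75,
    shuffle=True,
    random_state=42,
    stratify=toy["diagnosis"],
)

# Class counts per part, as plain dictionaries.
train_counts = toy_train["diagnosis"].value_counts().to_dict()
test_counts = toy_test["diagnosis"].value_counts().to_dict()
print(train_counts)
print(test_counts)
```

In the notebook itself, the same idea would mean adding stratify=df["diagnosis"] to the existing call.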

Select X,y for ML training

We will split the training set into features (X_train) and target (y_train) variables for model training.

# create X columns list and set y column
x_cols = ["id", "radius_mean", "texture_mean", "perimeter_mean", "area_mean", "smoothness_mean", "compactness_mean", "concavity_mean", "concave points_mean", "symmetry_mean", "fractal_dimension_mean", "radius_se", "texture_se", "perimeter_se", "area_se", "smoothness_se", "compactness_se", "concavity_se", "concave points_se", "symmetry_se", "fractal_dimension_se", "radius_worst", "texture_worst", "perimeter_worst", "area_worst", "smoothness_worst", "compactness_worst", "concavity_worst", "concave points_worst", "symmetry_worst", "fractal_dimension_worst"]
y_col = "diagnosis"
# set input matrix
X_train = train[x_cols]
# set target vector
y_train = train[y_col]
# display data shapes
print(f"X_train shape is {X_train.shape}")
print(f"y_train shape is {y_train.shape}")
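
The feature list above is written out by hand and includes the id column, which is a row identifier rather than a tumor measurement. As a hedged alternative, the list can be derived from the column names by excluding the target and any identifier columns. The helper select_feature_columns below is a hypothetical name introduced for this sketch, and the shortened header is illustrative only.

```python
def select_feature_columns(columns, target, exclude=()):
    """Return the column names to use as features, preserving their order."""
    skipped = {target, *exclude}
    return [name for name in columns if name not in skipped]


# A shortened, illustrative header; in the notebook you would pass list(df.columns).
columns = ["id", "diagnosis", "radius_mean", "texture_mean", "concave points_mean"]

# Exclude the target and the identifier column (excluding "id" is this sketch's choice).
feature_cols = select_feature_columns(columns, target="diagnosis", exclude=["id"])
print(feature_cols)
```

Whether to drop id is a judgment call; the original notebook keeps it, so results may differ if it is removed.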

Fit AutoML

We need to train a model for our dataset. The fit() method handles model training and optimization automatically. The total_time_limit argument is given in seconds, so training here is capped at 300 seconds.

# create automl object
automl = AutoML(total_time_limit=300, mode="Explain")
# train automl
automl.fit(X_train, y_train)

Compute predictions

Use the trained AutoML model to make predictions on test data to identify cancer cases.

# predict with AutoML, passing only the feature columns used in training
predictions = automl.predict(test[x_cols])
# predicted values
print(predictions)

Compute accuracy

We need to retrieve the true diagnosis values from the test set to compare with our predictions. After that, we compute the accuracy score.

true_values = test["diagnosis"]
# compute metric
metric_accuracy = accuracy_score(true_values, predictions)
print(f"Accuracy: {metric_accuracy}")
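
Accuracy treats both kinds of mistakes alike, but in a diagnostic setting a missed malignant tumor (a false negative) usually matters more than a false alarm. The sketch below counts the four confusion-matrix cells in plain Python and derives sensitivity and specificity from them. It assumes the labels are the strings "M" (malignant) and "B" (benign); binary_report is a hypothetical helper name, and the toy labels are made up and are not results from this notebook.

```python
def binary_report(y_true, y_pred, positive="M"):
    """Count confusion-matrix cells and derive accuracy, sensitivity, specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # share of truly malignant cases that were flagged as malignant
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # share of truly benign cases that were flagged as benign
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "false_negatives": fn,
    }


# Made-up labels for illustration only.
toy_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
toy_pred = ["M", "M", "B", "B", "B", "B", "B", "M"]

report = binary_report(toy_true, toy_pred)
print(report)
```

In the notebook, the same function could be called as binary_report(true_values, predictions). scikit-learn's confusion_matrix and recall_score in sklearn.metrics provide equivalent calculations.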

Conclusions

Applying MLJAR AutoML to breast cancer diagnosis offers clear advantages. By automating model training and selection, it lets us go from a raw table of tumor measurements to a working classifier of malignant versus benign tumors in a few lines of code. AutoML can also help surface patterns that might be missed by manual analysis. As this technology progresses, it may play a growing role in supporting breast cancer diagnosis and treatment, with the aim of improving patient outcomes and healthcare strategies.

See you soon👋.

Packages used in the automl-medical-use-case.ipynb

List of packages that need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports required modules for you.

pandas>=1.0.0

scikit-learn>=1.5.0

mljar-supervised>=1.1.7