Biotechnology
AI and Machine Learning for Protein Sequence–Structure–Function Modeling: A High-Throughput Research Case Study
- machine learning
- artificial intelligence
- AutoML
- protein engineering
- AlphaFold2
- sequence-structure-function
- molecular docking
- explainable AI
- SHAP
- biocatalysis
MLJAR tools were used in the following publication.
Guiding Discovery of Protein Sequence-Structure-Function Modeling
Azam Hussain and Charles L. Brooks III
University of Michigan, Macromolecular Science and Engineering Program, Ann Arbor, MI, USA | University of Michigan, Department of Chemistry, Ann Arbor, MI, USA | University of Michigan, Biophysics Program, Ann Arbor, MI, USA
This bioRxiv study presents a high-throughput AI pipeline for protein engineering that links sequence, structure, and function using AlphaFold2, GPU-accelerated docking, and machine learning models. The workflow predicts enzyme stereoselectivity and reactivity across a large ancestral protein library and then applies explainable AI (SHAP) to identify key residues in the binding site and second shell. Using AutoML-style model selection over tree-ensemble models (CatBoost, XGBoost, Random Forest), the approach achieves strong agreement with experimental screens and recovers known functional switches. The work illustrates how artificial intelligence and automated modeling can accelerate biocatalyst discovery and rational enzyme design.
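To make the modeling step concrete, here is a minimal, self-contained sketch of the pattern the study follows: train a tree-ensemble classifier on per-residue features and then attribute predictions to individual residues. All data here is synthetic, the model is scikit-learn's Random Forest rather than the CatBoost/XGBoost models used in the paper, and permutation importance stands in for SHAP; it is an illustration of the idea, not the authors' code.

```python
# Illustrative sketch (synthetic data, not the authors' pipeline):
# predict a binary stereoselectivity label from hypothetical per-residue
# features, then rank residues by importance, analogous to the SHAP
# analysis in the study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_variants, n_residues = 500, 20  # hypothetical library size / feature count
X = rng.normal(size=(n_variants, n_residues))
# Synthetic ground truth: assume residues 3 and 7 drive selectivity.
y = ((X[:, 3] + 0.8 * X[:, 7]) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance plays the role SHAP plays in the paper: both
# attribute model output to individual residue features.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
top = sorted(np.argsort(imp.importances_mean)[::-1][:2].tolist())
print(top)  # the two residues the model identifies as functional switches
```

On real data, the features would come from AlphaFold2 structures and docking poses rather than random draws, and SHAP values would give signed, per-prediction attributions instead of a single global ranking.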
Bioinformatics • July 16, 2023
Research Domains
Explore peer-reviewed and applied machine learning studies across diverse domains, including healthcare analytics, financial modeling, manufacturing optimization, and structured data classification problems.
Why Researchers and ML Engineers Choose MLJAR Studio
A private, AI-powered Python notebook designed for reproducible machine learning experiments, structured benchmarking, and applied research workflows - fully under your control.
Reproducible Machine Learning Experiments
Design structured pipelines, save experiment runs, and compare results across iterations with full transparency. Every validation setup, hyperparameter configuration, and model benchmark is recorded - making your research repeatable and defensible.
Local-First Execution & Data Control
Run all workflows directly on your machine. Sensitive datasets remain private, with no mandatory cloud uploads or external AI services required. Maintain full control over runtime environments and compliance requirements.
Autonomous Model Benchmarking & Optimization
Automatically compare candidate models, perform cross-validation, and run hyperparameter optimization while retaining full visibility into generated Python code and evaluation metrics. Accelerate experimentation without sacrificing methodological rigor.
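The benchmarking loop described above can be sketched with standard scikit-learn primitives: compare candidates under cross-validation, then tune the winner's hyperparameters. The candidate set and grids below are illustrative choices, not MLJAR Studio's internals, and the point is that every step stays visible as plain Python.

```python
# Generic sketch of automated model benchmarking: cross-validate several
# candidates, pick the best, then run a hyperparameter search on it.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Step 1: compare candidate models with 5-fold cross-validation.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}

# Step 2: tune the best candidate; the grid and results remain inspectable.
best_name = max(scores, key=scores.get)
grids = {"logreg": {"C": [0.1, 1.0, 10.0]},
         "rf": {"n_estimators": [50, 200]}}
search = GridSearchCV(candidates[best_name], grids[best_name], cv=5).fit(X, y)
print(best_name, round(search.best_score_, 3), search.best_params_)
```

An AutoML layer automates exactly this loop at larger scale, while logging each configuration and metric so runs can be compared and reproduced.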
Build Research-Grade ML Workflows Locally
Run automated model benchmarking, hyperparameter optimization, and autonomous experiments while keeping full control over your data.