Red Wine Quality EDA in Python
Explore the Wine Quality dataset with distribution plots, correlation analysis, and quality score breakdown using an AI data analyst.
What this AI workflow does
This AI Data Analyst workflow loads the Red Wine Quality CSV from a URL and inspects the dataset shape and column names. It generates exploratory visualizations including a quality score distribution plot and a full-feature correlation heatmap. It then ranks the features by correlation with the target quality score to identify the strongest positive and negative relationships.
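The loading and inspection step can be sketched as follows. This is a minimal illustration, not the code any model produced: `load_and_inspect` and `WINE_URL` are names chosen here, the in-memory sample values are made up so the block runs offline, and it assumes the file is a plain comma-separated CSV.

```python
import io

import pandas as pd

# URL taken from the prompt sequence; it is not fetched in this sketch.
WINE_URL = (
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/"
    "refs/heads/master/red-wine-quality/winequality-red.csv"
)


def load_and_inspect(source):
    """Read a CSV (path, URL, or file-like object) and summarize its structure."""
    df = pd.read_csv(source)
    summary = {
        "shape": df.shape,
        "columns": df.columns.tolist(),
        "missing_values": int(df.isna().sum().sum()),
    }
    return df, summary


# Made-up three-row sample standing in for the real file.
sample = io.StringIO(
    "volatile acidity,alcohol,quality\n"
    "0.70,9.4,5\n"
    "0.88,9.8,5\n"
    "0.28,11.2,6\n"
)
df, summary = load_and_inspect(sample)
print(summary["shape"])           # (3, 3)
print(summary["columns"])
print(summary["missing_values"])  # 0
```

Calling `load_and_inspect(WINE_URL)` on the real file should report the shape (1599, 12) listed under the expected outcomes, provided the file parses with the default comma delimiter.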
Who this example is for
This example is for data analysts and students who want a guided EDA walkthrough on a small tabular machine-learning dataset. It also suits anyone who needs a quick way to validate basic dataset structure and identify candidate predictors for modeling.
Expected analysis outcomes
These are the results the AI workflow is expected to generate.
- Dataset loaded from URL with shape (1599, 12) and column list
- Distribution plot of quality scores showing most wines score 5–6
- 12x12 correlation heatmap across all features
- List of top correlations with quality, including alcohol (~+0.48) and volatile acidity (~-0.39)
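The last outcome, the ranked list of correlations with quality, could be computed along these lines. This is a hedged sketch: `rank_by_target_correlation` is a hypothetical helper name, it uses the Pearson correlation that pandas computes by default, and the toy rows are invented only to show the mechanics. They do not reproduce the ~+0.48 and ~-0.39 values, which come from the real dataset.

```python
import pandas as pd


def rank_by_target_correlation(df, target="quality"):
    """Correlate every numeric feature with `target`, strongest first.

    Ranking uses the absolute value, so strong negative relationships
    appear next to strong positive ones; the sign is kept in the result.
    """
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order]


# Invented toy data (not real wine measurements).
toy = pd.DataFrame(
    {
        "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0],
        "volatile acidity": [0.9, 0.7, 0.6, 0.5, 0.3],
        "pH": [3.3, 3.2, 3.4, 3.1, 3.3],
        "quality": [4, 5, 5, 6, 7],
    }
)
ranked = rank_by_target_correlation(toy)
print(ranked.round(2))
```

Sorting by absolute value is a design choice made here so that volatile acidity, a negative correlate, is not pushed to the bottom of the list.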
Tools and libraries used
Main Python packages and tooling used to run this AI data analysis task.
- pandas
- numpy
- matplotlib
- seaborn
Prompt sequence
This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.
1. load the red wine quality dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv and show shape and column names
2. plot the distribution of quality scores
3. show a correlation heatmap of all features
4. which features correlate most with quality?
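For the two plotting prompts, a model might produce something like the sketch below. It is an assumption-laden illustration rather than any model's actual output: the function names are hypothetical, the data is invented, and it uses plain matplotlib to keep dependencies small, although seaborn's `countplot` and `heatmap` would be natural substitutes.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_quality_distribution(df, ax, target="quality"):
    """Bar chart of how many wines received each quality score."""
    counts = df[target].value_counts().sort_index()
    ax.bar(counts.index.astype(str), counts.to_numpy())
    ax.set_xlabel(target)
    ax.set_ylabel("number of wines")
    ax.set_title("Quality score distribution")
    return counts


def plot_correlation_heatmap(df, ax):
    """Heatmap of pairwise Pearson correlations between numeric columns."""
    corr = df.select_dtypes("number").corr()
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    ax.figure.colorbar(image, ax=ax)
    ax.set_title("Correlation heatmap")
    return corr


# Invented toy data standing in for the real 1599-row file.
toy = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, 10.5, 11.0, 10.8, 12.0],
        "volatile acidity": [0.70, 0.88, 0.50, 0.45, 0.60, 0.30],
        "quality": [5, 5, 6, 6, 6, 7],
    }
)
fig, (ax_dist, ax_heat) = plt.subplots(1, 2, figsize=(12, 5))
counts = plot_quality_distribution(toy, ax_dist)
corr = plot_correlation_heatmap(toy, ax_heat)
fig.tight_layout()
# fig.savefig("wine_eda.png")  # or plt.show() in an interactive session
```

On the real dataset the correlation matrix would be 12x12, since all twelve columns are numeric.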
Model Comparison
We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.
| Model Source | Score | Task | Execution | Output | Reasoning | Reliability | Run |
|---|---|---|---|---|---|---|---|
| gemma4:31b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gemma4:31b conversation |
| glm-5.1 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open glm-5.1 conversation |
| gpt-5.4 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-5.4 conversation |
| gpt-oss:120b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open gpt-oss:120b conversation |
| qwen3-coder-next | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open qwen3-coder-next conversation |
| qwen3.5:397b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 | Open qwen3.5:397b conversation |
How scoring works
Each run is graded on five dimensions:
- Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
- Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
- Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
- Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
- Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.
The final score is the sum of these five dimensions, so it ranges from 0 to 10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
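The rubric arithmetic is simple enough to express directly. The sketch below is illustrative only: `ScoreCard` and `MAX_POINTS` are hypothetical names, and the range check is an addition made here, not a description of how the grading is actually implemented.

```python
from dataclasses import dataclass

# Maximum points per dimension, as listed in the rubric.
MAX_POINTS = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}


@dataclass(frozen=True)
class ScoreCard:
    task_completion: int
    execution_correctness: int
    output_quality: int
    reasoning_quality: int
    reliability: int

    def __post_init__(self):
        # Reject values outside each dimension's allowed range.
        for name, maximum in MAX_POINTS.items():
            value = getattr(self, name)
            if not 0 <= value <= maximum:
                raise ValueError(f"{name} must be in 0..{maximum}, got {value}")

    @property
    def total(self):
        """Sum of the five dimension scores."""
        return sum(getattr(self, name) for name in MAX_POINTS)


card = ScoreCard(2, 2, 3, 2, 1)
print(f"{card.total}/{sum(MAX_POINTS.values())}")  # 10/10
```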
AI Analysis with gpt-oss:120b
In this run, the step-by-step prompt list is sent to gpt-oss:120b to perform the Red Wine Quality EDA in Python task. Below, you can review the full conversation generated with gpt-oss:120b. After the conversation, we show detailed score cards for this run.
Score cards for gpt-oss:120b for Red Wine Quality EDA in Python
Total score: 10/10
Task Completion: 2/2
Excellent: All major requested steps were completed: dataset loaded with correct shape/columns, quality distribution plotted, correlation heatmap shown, and features correlated with quality identified.
Execution Correctness: 2/2
Excellent: The final code is syntactically sound and the outputs indicate it ran successfully, including the corrected dataset load and correlation computation.
Output Quality: 3/3
Excellent: Outputs match the expected outcomes semantically: shape (1599, 12), quality scores distribution, 12x12 heatmap, and the top correlators alcohol and volatile acidity with the expected signs and approximate magnitudes.
Reasoning Quality: 2/2
Excellent: The workflow correctly diagnosed the initial parsing issue, reloaded the CSV properly, and interpreted the correlation results accurately and helpfully.
Reliability: 1/1
Excellent: The workflow is consistent and robust, with a clear correction step after the initial loading error.
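The score cards mention an initial parsing issue but do not say what it was. One common cause with CSV files is a delimiter other than a comma, and the sketch below assumes that scenario purely for illustration. The semicolon-delimited sample is invented, and `looks_misparsed` is a hypothetical helper; `sep=None` with `engine="python"` asks pandas to sniff the delimiter.

```python
import io

import pandas as pd


def looks_misparsed(df):
    """Heuristic: a wide table collapsing into one column suggests a wrong delimiter."""
    return df.shape[1] == 1


# Invented semicolon-delimited sample (an assumption, not the real file).
text = "alcohol;pH;quality\n9.4;3.51;5\n9.8;3.20;5\n"

# Default comma parsing puts every row into a single column.
naive = pd.read_csv(io.StringIO(text))
print(naive.shape, looks_misparsed(naive))      # (2, 1) True

# Let pandas detect the delimiter instead of assuming a comma.
sniffed = pd.read_csv(io.StringIO(text), sep=None, engine="python")
print(sniffed.shape, looks_misparsed(sniffed))  # (2, 3) False
```

Checking the column count right after loading is a cheap way to catch this kind of error before any plots or correlations are computed.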