Red Wine Quality EDA in Python

Explore the Wine Quality dataset with distribution plots, correlation analysis, and quality score breakdown using an AI data analyst.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Red Wine Quality CSV from a URL and inspects the dataset shape and column names. It generates exploratory visualizations including a quality score distribution plot and a full-feature correlation heatmap. It then ranks the features by correlation with the target quality score to identify the strongest positive and negative relationships.

Who this example is for

This is for data analysts and students who want a guided EDA example on a small, tabular machine-learning dataset. It helps anyone needing a quick way to validate basic dataset structure and identify candidate predictors for modeling.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset loaded from URL with shape (1599, 12) and column list
  • Distribution plot of quality scores showing most wines score 5–6
  • 12x12 correlation heatmap across all features
  • List of top correlations with quality, including alcohol (~+0.48) and volatile acidity (~-0.39)

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load the red wine quality dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv and show shape and column names
  2. plot the distribution of quality scores
  3. show a correlation heatmap of all features
  4. which features correlate most with quality?
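The idea of sending one fixed prompt list to several models can be sketched as a small harness. This is only an illustration of the data flow: `run_workflow`, `fake_ask`, and the model names `model-a` and `model-b` are hypothetical and do not describe how the runs on this page were actually produced.

```python
from typing import Callable, Dict, List

# The four prompts from the list above, in order.
PROMPTS: List[str] = [
    "load the red wine quality dataset from "
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/"
    "red-wine-quality/winequality-red.csv and show shape and column names",
    "plot the distribution of quality scores",
    "show a correlation heatmap of all features",
    "which features correlate most with quality?",
]


def run_workflow(
    models: List[str],
    prompts: List[str],
    ask: Callable[[str, str], str],
) -> Dict[str, List[str]]:
    """Send every prompt, in order, to every model and collect the replies.

    `ask(model, prompt)` is a hypothetical callable standing in for whatever
    client actually talks to a model; it is not a real API.
    """
    return {model: [ask(model, prompt) for prompt in prompts] for model in models}


def fake_ask(model: str, prompt: str) -> str:
    """Stand-in for a real model call, used only to show the data flow."""
    return f"[{model}] reply to: {prompt[:20]}"


transcripts = run_workflow(["model-a", "model-b"], PROMPTS, fake_ask)
for model, replies in transcripts.items():
    print(model, len(replies))
```

Because every model receives the identical list, differences between transcripts come from the models and not from the wording of the prompts.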

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source      Score  Task  Execution  Output  Reasoning  Reliability  Run
gemma4:31b        10/10  2/2   2/2        3/3     2/2        1/1          Open gemma4:31b conversation
glm-5.1           10/10  2/2   2/2        3/3     2/2        1/1          Open glm-5.1 conversation
gpt-5.4           10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-5.4 conversation
gpt-oss:120b      10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-oss:120b conversation
qwen3-coder-next  10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3-coder-next conversation
qwen3.5:397b      10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
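As a minimal sketch of the arithmetic, the total can be computed by checking each dimension against its stated range and summing. The dictionary keys and the `total_score` helper are illustrative names chosen here, not part of any published grading tool.

```python
# Maximum points per dimension, as listed in the rubric above.
MAX_POINTS = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}


def total_score(scores: dict) -> int:
    """Validate each dimension against its range and return the sum (0-10)."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must contain exactly the five rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(scores.values())


# The qwen3-coder-next run from the comparison table: 2 + 2 + 3 + 2 + 1.
example = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}
print(total_score(example))  # 10
```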

AI Analysis with qwen3-coder-next

This run uses AI with a step-by-step list of prompts to perform the analysis task Red Wine Quality EDA in Python. Below, you can review the full conversation generated with qwen3-coder-next. After the conversation, we show detailed score cards for this run.

load the red wine quality dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv and show shape and column names
Load and inspect red wine dataset
import pandas as pd

# Load the red wine quality dataset
url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv"
df = pd.read_csv(url)

# Show shape and column names
df.shape, df.columns.tolist()
((1599, 12),
 ['fixed acidity',
  'volatile acidity',
  'citric acid',
  'residual sugar',
  'chlorides',
  'free sulfur dioxide',
  'total sulfur dioxide',
  'density',
  'pH',
  'sulphates',
  'alcohol',
  'quality'])

The dataset has been successfully loaded: shape is 1599 rows × 12 columns, and the column names match the expected red wine quality features — 11 input features (e.g., fixed acidity, alcohol, pH) and the target quality. This directly answers the user's goal.
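The same structural check can be made explicit in code. The sketch below is an optional addition, not part of the model's run; `check_structure` is a hypothetical helper, and the demo frame is made-up filler data used only to exercise it.

```python
import pandas as pd

# Column names copied from the output above.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def check_structure(df: pd.DataFrame, expected_rows: int = 1599) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    extra = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems


# Tiny made-up frame (NOT real wine data) just to exercise the checks.
demo = pd.DataFrame({col: [0.0, 1.0] for col in EXPECTED_COLUMNS})
print(check_structure(demo, expected_rows=2))
print(check_structure(demo.drop(columns=["alcohol"]), expected_rows=2))
```

On the real data, calling `check_structure(df)` with the default `expected_rows=1599` should return an empty list if the load matched the output shown above.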

plot the distribution of quality scores

I'll plot the distribution of wine quality scores using seaborn to show a count of each quality level.

Plot quality score distribution
import seaborn as sns
import matplotlib.pyplot as plt

# Plot distribution of quality scores
sns.countplot(x='quality', data=df)
plt.title('Distribution of Red Wine Quality Scores')
plt.xlabel('Quality Score')
plt.ylabel('Count')
plt.show()

The plot was executed successfully (output shows <Figure size 640x480 with 1 Axes>), but we lack the underlying numeric counts for each quality score to extract concrete insights. To support the visualization and fully satisfy the goal of analyzing the distribution, we need the actual frequency values.

Count quality categories
df['quality'].value_counts().sort_index()
quality
3     10
4     53
5    681
6    638
7    199
8     18
Name: count, dtype: int64

The distribution plot has been generated, and the value counts confirm the distribution: most wines have a quality score of 5 (681 wines), followed by 6 (638), while lower-quality wines (scores 3 and 4) and higher-quality wines (scores 7 and 8) are less frequent. The visualization and numeric counts together fully satisfy the goal of plotting the distribution of quality scores.
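To put a number on "most wines score 5 or 6", the counts can be turned into shares. This is a small sketch that re-enters the counts printed above by hand, so it runs without downloading the CSV; with the real data, `df['quality'].value_counts(normalize=True)` would give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}, name="count")
counts.index.name = "quality"

total = int(counts.sum())
shares = (counts / total).round(3)  # share of wines at each quality score
mid_share = float(counts.loc[[5, 6]].sum() / total)  # scores 5 and 6 combined

print(total)                # 1599
print(shares)
print(round(mid_share, 3))  # 0.825
```

Roughly 82% of the wines fall into the two middle scores, which is worth keeping in mind when reading correlations with quality: the target has little spread.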

show a correlation heatmap of all features

To visualize how all numeric features correlate with each other, we’ll create a correlation heatmap using seaborn.

Plot feature correlation heatmap
sns.heatmap(df.corr(), annot=False, cmap='coolwarm', center=0)
plt.title('Feature Correlation Heatmap')
plt.show()

The heatmap was generated and rendered successfully (output shows <Figure size 640x480 with 2 Axes>), indicating the correlation heatmap was plotted. However, to extract concrete insights from the heatmap — such as which feature pairs have strong positive or negative correlations — we need the actual correlation values.

Let’s print the correlation matrix to confirm and interpret the strongest relationships.

Compute correlation matrix
df.corr().round(2)
fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol quality
fixed acidity 1.00 -0.26 0.67 0.11 0.09 -0.15 -0.11 0.67 -0.68 0.18 -0.06 0.12
volatile acidity -0.26 1.00 -0.55 0.00 0.06 -0.01 0.08 0.02 0.23 -0.26 -0.20 -0.39
citric acid 0.67 -0.55 1.00 0.14 0.20 -0.06 0.04 0.36 -0.54 0.31 0.11 0.23
residual sugar 0.11 0.00 0.14 1.00 0.06 0.19 0.20 0.36 -0.09 0.01 0.04 0.01
chlorides 0.09 0.06 0.20 0.06 1.00 0.01 0.05 0.20 -0.27 0.37 -0.22 -0.13
free sulfur dioxide -0.15 -0.01 -0.06 0.19 0.01 1.00 0.67 -0.02 0.07 0.05 -0.07 -0.05
total sulfur dioxide -0.11 0.08 0.04 0.20 0.05 0.67 1.00 0.07 -0.07 0.04 -0.21 -0.19
density 0.67 0.02 0.36 0.36 0.20 -0.02 0.07 1.00 -0.34 0.15 -0.50 -0.17
pH -0.68 0.23 -0.54 -0.09 -0.27 0.07 -0.07 -0.34 1.00 -0.20 0.21 -0.06
sulphates 0.18 -0.26 0.31 0.01 0.37 0.05 0.04 0.15 -0.20 1.00 0.09 0.25
alcohol -0.06 -0.20 0.11 0.04 -0.22 -0.07 -0.21 -0.50 0.21 0.09 1.00 0.48
quality 0.12 -0.39 0.23 0.01 -0.13 -0.05 -0.19 -0.17 -0.06 0.25 0.48 1.00

The heatmap has been plotted and the correlation matrix values are now displayed. From the matrix, key insights include:

  • Volatile acidity has a moderate negative correlation with quality (-0.39), and a stronger negative correlation with citric acid (-0.55).
  • Density is highly positively correlated with fixed acidity (0.67) — expected since acidity influences mass per volume.
  • pH is strongly negatively correlated with fixed acidity (-0.68) — consistent with chemistry (more acid = lower pH).
  • Alcohol shows a moderate positive correlation with quality (0.48) and a negative correlation with density (-0.50). Both values can be read directly from the alcohol row of the matrix above.

All features are included in the numeric correlation matrix, and the heatmap visualizes them as required.

The correlation heatmap of all features has been successfully generated, and the numerical correlations confirm expected relationships (e.g., inverse acidity–pH, acidity–density, and volatile acidity–quality), with full feature coverage across the 12-column dataset.
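Reading the strongest pairs off a 12x12 matrix by eye is error-prone, so a short helper can list them instead. The sketch below is an optional addition: `strongest_pairs` is a hypothetical helper, and the demo uses a three-feature excerpt of the matrix above so that it runs on its own.

```python
from itertools import combinations

import pandas as pd


def strongest_pairs(corr: pd.DataFrame, n: int = 5) -> list:
    """Return the n feature pairs with the largest absolute correlation."""
    pairs = [
        (a, b, float(corr.loc[a, b]))
        for a, b in combinations(corr.columns, 2)  # each unordered pair once
    ]
    pairs.sort(key=lambda item: abs(item[2]), reverse=True)
    return pairs[:n]


# Three-feature excerpt of the matrix printed above (values copied from it).
features = ["fixed acidity", "density", "pH"]
excerpt = pd.DataFrame(
    [[1.00, 0.67, -0.68],
     [0.67, 1.00, -0.34],
     [-0.68, -0.34, 1.00]],
    index=features,
    columns=features,
)

for a, b, r in strongest_pairs(excerpt, n=3):
    print(f"{a} vs {b}: {r:+.2f}")
```

On the full dataset, `strongest_pairs(df.corr())` would rank all 66 feature pairs, including those that involve quality.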

which features correlate most with quality?

From the full correlation matrix (rounded to 2 decimals), the correlations with quality are:

Feature                       Correlation with quality
alcohol                       +0.48
volatile acidity              −0.39
citric acid                   +0.23
sulphates                     +0.25
fixed acidity                 +0.12
Others (chlorides, pH, etc.)  <

Top two strongest correlates:

  • Alcohol (positive, +0.48) — higher alcohol tends to be in higher-quality wines.
  • Volatile acidity (negative, −0.39) — lower volatile acidity is associated with higher quality.

Let me re-plot the heatmap only for features vs. quality, highlighting the top correlates for clarity.

Plot feature correlations with quality
# Get correlations with 'quality', sort descending by absolute value
corr_with_quality = df.corr()['quality'].sort_values(key=abs, ascending=False)

# Prepare data for a bar plot (exclude quality itself)
corr_with_quality = corr_with_quality.drop('quality')

# Plot
corr_with_quality.plot(kind='barh', figsize=(8, 6), color=['#2ecc71' if v > 0 else '#e74c3c' for v in corr_with_quality])
plt.title('Feature Correlations with Quality (Red Wine)')
plt.xlabel('Correlation')
plt.ylabel('Feature')
plt.axvline(0, color='black', linewidth=1)
plt.tight_layout()
plt.show()

The bar plot from cell 6 clearly shows the top features correlated with quality, and the notebook state confirms corr_with_quality is available as a Series of length 11. The output already displays the bar chart with signed correlations, so the most-correlated features are visible:

  • alcohol (largest positive bar)
  • volatile acidity (largest negative bar)
  • sulphates, citric acid, and fixed acidity follow.

Since the chart includes both magnitude and direction and is fully labeled, the top correlates are evident without additional code.

Top features correlated with quality (from strongest to weakest by absolute correlation):

  • Alcohol: +0.48 (strong positive)
  • Volatile acidity: −0.39 (strong negative)
  • Sulphates: +0.25
  • Citric acid: +0.23
  • Fixed acidity: +0.12

Higher alcohol and lower volatile acidity are the features most strongly correlated with higher quality.
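Ranking by absolute value mixes positive and negative relationships, so it can help to list the two directions separately. The following sketch re-enters the quality row of the correlation matrix above by hand; `split_by_sign` is a hypothetical helper and not something the model produced.

```python
import pandas as pd

# Correlations with quality, copied from the 'quality' row of the matrix above.
corr_with_quality = pd.Series({
    "fixed acidity": 0.12,
    "volatile acidity": -0.39,
    "citric acid": 0.23,
    "residual sugar": 0.01,
    "chlorides": -0.13,
    "free sulfur dioxide": -0.05,
    "total sulfur dioxide": -0.19,
    "density": -0.17,
    "pH": -0.06,
    "sulphates": 0.25,
    "alcohol": 0.48,
})


def split_by_sign(corr: pd.Series, k: int = 3):
    """Return the k strongest positive and k strongest negative correlates."""
    positive = corr[corr > 0].sort_values(ascending=False).head(k)
    negative = corr[corr < 0].sort_values().head(k)  # most negative first
    return positive, negative


positive, negative = split_by_sign(corr_with_quality)
print(list(positive.index))  # ['alcohol', 'sulphates', 'citric acid']
print(list(negative.index))  # ['volatile acidity', 'total sulfur dioxide', 'density']
```

These are pairwise linear correlations only. They suggest candidate predictors for modeling but do not show how the features behave together.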

Score cards for qwen3-coder-next for Red Wine Quality EDA in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All four requested steps were completed: dataset loading with shape/columns, quality distribution plot, full correlation heatmap, and identification of the strongest quality correlates.

  • Execution Correctness: 2/2

    Excellent

    The code shown is syntactically valid and matches the outputs provided, including successful data loading and plotting.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: shape is 1599x12, quality scores span 3–8 with 5 and 6 most common, the heatmap covers all 12 features, and the top correlates are alcohol (+0.48) and volatile acidity (-0.39).

  • Reasoning Quality: 2/2

    Excellent

    The analysis correctly interprets the distribution and correlation results, and the stated conclusions are consistent with the displayed correlation matrix.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and grounded in computed outputs, with no obvious hallucinations or unsupported claims.
