Red Wine Quality EDA in Python

Explore the Wine Quality dataset with distribution plots, correlation analysis, and quality score breakdown using an AI data analyst.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Red Wine Quality CSV from a URL and inspects the dataset shape and column names. It generates exploratory visualizations including a quality score distribution plot and a full-feature correlation heatmap. It then ranks the features by correlation with the target quality score to identify the strongest positive and negative relationships.

Who this example is for

This is for data analysts and students who want a guided EDA example on a small, tabular machine-learning dataset. It helps anyone needing a quick way to validate basic dataset structure and identify candidate predictors for modeling.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Dataset loaded from URL with shape (1599, 12) and column list
  • Distribution plot of quality scores showing most wines score 5–6
  • 12x12 correlation heatmap across all features
  • List of top correlations with quality, including alcohol (~+0.48) and volatile acidity (~-0.39)
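
These expected outcomes can be turned into quick automated checks. The sketch below assumes pandas 1.5 or later, which is needed for numeric_only=True. The names check_expected_outcomes, load_wine and EXPECTED_COLUMNS are hypothetical and are not part of the workflow.

```python
import pandas as pd

URL = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv"

# Column order as printed by the notebook after loading the CSV.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def load_wine(url: str = URL) -> pd.DataFrame:
    """Read the red wine CSV (requires network access)."""
    return pd.read_csv(url)


def check_expected_outcomes(df: pd.DataFrame) -> dict:
    """Return True/False flags for the expected outcomes listed above.

    Hypothetical helper: the checks paraphrase the bullet points and are not
    the grader's actual rubric.
    """
    counts = df["quality"].value_counts()
    # Correlation of every numeric feature with quality, excluding quality itself.
    r = df.corr(numeric_only=True)["quality"].drop("quality")
    return {
        "shape_is_1599x12": df.shape == (1599, 12),
        "columns_match": list(df.columns) == EXPECTED_COLUMNS,
        "most_common_score_is_5_or_6": int(counts.idxmax()) in (5, 6),
        "top_positive_is_alcohol": r.idxmax() == "alcohol",
        "top_negative_is_volatile_acidity": r.idxmin() == "volatile acidity",
    }
```

In a notebook with network access, check_expected_outcomes(load_wine()) should return True for every flag if the data matches the outcomes listed above.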

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load the red wine quality dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv and show shape and column names
  2. plot the distribution of quality scores
  3. show a correlation heatmap of all features
  4. which features correlate most with quality?

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

| Model | Score | Task | Execution | Output | Reasoning | Reliability |
|---|---|---|---|---|---|---|
| gemma4:31b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 |
| glm-5.1 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 |
| gpt-5.4 | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 |
| gpt-oss:120b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 |
| qwen3-coder-next | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 |
| qwen3.5:397b | 10/10 | 2/2 | 2/2 | 3/3 | 2/2 | 1/1 |

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so the total ranges from 0 to 10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
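
The scoring rule is a plain sum with a cap on each dimension. Below is a minimal sketch of that arithmetic. The dictionary keys and the total_score helper are hypothetical names and do not come from the grading system itself.

```python
# Maximum points per dimension, as listed in "How scoring works".
MAX_POINTS = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}


def total_score(scores: dict[str, int]) -> int:
    """Sum the five dimension scores after checking each is within its range."""
    missing = MAX_POINTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name, value in scores.items():
        if name not in MAX_POINTS:
            raise ValueError(f"unknown dimension: {name}")
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name}={value} is outside 0-{MAX_POINTS[name]}")
    return sum(scores.values())


# The gemma4:31b run: 2 + 2 + 3 + 2 + 1
print(total_score({"task_completion": 2, "execution_correctness": 2,
                   "output_quality": 3, "reasoning_quality": 2,
                   "reliability": 1}))  # 10
```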

AI Analysis with gemma4:31b

This run uses gemma4:31b with the step-by-step prompt list to perform the Red Wine Quality EDA in Python task. Below, you can review the full conversation. After the conversation, we show detailed score cards for this run.

load the red wine quality dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv and show shape and column names
Thinking

The user wants to load the Red Wine Quality dataset from a specific URL and see its shape and column names.

Plan:

  1. Load the dataset using pandas.read_csv.
  2. Display the shape.
  3. Display the columns.
  4. Display the first few rows (head()) as per the "load and inspect" rule.

I'll start by loading the red wine quality dataset and checking its basic structure.

Load and inspect data
import pandas as pd

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/red-wine-quality/winequality-red.csv"
df = pd.read_csv(url)

# Inspect dimensions, columns and first few rows
print(f"Shape: {df.shape}")
display(df.columns)
display(df.head())
Shape: (1599, 12)
Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='str')
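| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |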
Thinking

The user wants to load the red wine quality dataset from a specific URL and show the shape and column names.

  • Code executed: df = pd.read_csv(url), print(f"Shape: {df.shape}"), display(df.columns), display(df.head()).

  • Output: Shape: (1599, 12), Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol', 'quality'], dtype='str'), and the head of the dataframe.

  • Original goal: load the dataset, show shape, and show column names.

  • Current state: Data is loaded, shape is printed (1599, 12), and column names are displayed.

The current output fully satisfies the original goal.

Path A.

The red wine quality dataset has been loaded. It contains 1,599 rows and 12 columns. The columns include various chemical properties such as fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol, as well as the target quality rating.

plot the distribution of quality scores
Thinking

The user wants to plot the distribution of the quality column. The dataset df is already loaded. The quality column contains scores (likely integers). A histogram or a count plot is appropriate for a distribution of a discrete variable like quality scores. I will use seaborn for this.

Plan:

  1. Use sns.countplot or sns.histplot to show the distribution of quality.
  2. Add title and axis labels.

I'll plot a count plot to see how the quality scores are distributed across the dataset.

Plot wine quality distribution
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))
sns.countplot(data=df, x='quality', palette='viridis')
plt.title('Distribution of Wine Quality Scores')
plt.xlabel('Quality Score')
plt.ylabel('Count')
plt.show()
/var/folders/pp/x90pfm0140d4glxshn77bthh0000gn/T/ipykernel_6823/493604273.py:5: FutureWarning: 

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.

  sns.countplot(data=df, x='quality', palette='viridis')
Thinking

The user wants to plot the distribution of quality scores. The latest output is a sns.countplot (visual output). According to the "Sufficiency heuristics": "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart." The user's goal was "plot the distribution of quality scores". The chart was generated, but the numeric distribution (counts for each quality score) was not printed. I need to print the value counts for the quality column to provide a complete answer.

Path B:

  • Insights: The plot shows the distribution of wine quality scores, but we don't have the exact counts for each score.
  • Code: df['quality'].value_counts().sort_index() to see the actual numbers.

The chart shows the distribution of wine quality scores, but the exact counts for each score are not currently visible. I'll generate the frequency table to provide the precise numbers behind the visualization.

Count quality values
df['quality'].value_counts().sort_index()
quality
3     10
4     53
5    681
6    638
7    199
8     18
Name: count, dtype: int64
Thinking

The user's original goal is to "plot the distribution of quality scores".

Looking at the notebook state:

  • Cell 1: Loaded the data.
  • Cell 2: Plotted the distribution of 'quality' using sns.countplot.
  • Cell 3: Calculated the value counts for 'quality' and sorted them.

Output 2 is a figure showing the distribution. Output 3 provides the exact counts:

  • 3: 10
  • 4: 53
  • 5: 681
  • 6: 638
  • 7: 199
  • 8: 18

The user goal was to plot the distribution. This has been done visually (Cell 2) and numerically (Cell 3). According to the sufficiency heuristics, if the goal is to show/plot/chart/visualize and the output contains both a valid chart and explicit numeric values, Path A should be chosen.

Insights from outputs:

  • The distribution is centered around scores 5 and 6, which contain the vast majority of the samples (681 and 638 respectively).
  • High quality (score 8) and low quality (score 3) wines are rare.
  • The data is slightly skewed, with a higher frequency of average-to-good quality wines.

The goal is complete.

Plan:

  • Extract insights from the plot and the value counts.
  • Provide a final answer.
  • Use <!--final answer-->.

The distribution of quality scores shows that the majority of the wines fall into the middle categories, with scores of 5 (681 wines) and 6 (638 wines) being the most frequent. High-quality wines (score 8) and low-quality wines (score 3) are relatively rare, with only 18 and 10 occurrences respectively. This indicates a bell-shaped distribution centered around average quality.
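
The printed counts allow a quick worked check of how concentrated the distribution is. The sketch below reuses only the value counts shown above, with no new data. The variable names are illustrative.

```python
# Counts copied from the value_counts() output above (quality score -> number of wines).
counts = {3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}

total = sum(counts.values())
share_5_6 = (counts[5] + counts[6]) / total
mean_quality = sum(score * n for score, n in counts.items()) / total

print(total)                  # 1599, matching the row count from step 1
print(f"{share_5_6:.1%}")     # 82.5%
print(f"{mean_quality:.2f}")  # 5.64
```

About 82.5% of the wines (1,319 of 1,599) score 5 or 6, and the mean score is about 5.64.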

show a correlation heatmap of all features
Thinking

The user wants to see a correlation heatmap of all features in the df DataFrame.

  • df is already loaded.
  • Columns: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, quality.
  • Libraries seaborn and matplotlib.pyplot are already imported.

  • Calculate the correlation matrix using df.corr().
  • Plot the heatmap using sns.heatmap.
  • Use annot=True to show values.
  • Use a diverging color map (e.g., 'coolwarm').
  • Set a suitable figure size.

  • Propose one next step: Create the correlation heatmap.
  • Provide the code block.
  • Follow the notebook rules (minimal code, one block).

I'll create a correlation heatmap to visualize the relationships between all the features, which will help identify which variables are most strongly linked to the wine quality.

Plot correlation heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap of Wine Features')
plt.show()
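
The heatmap cell above computes df.corr() inline, so the numbers behind the figure are never kept. The sketch below computes the matrix once, plots it, and returns it for inspection. It makes three assumptions: numeric_only=True is used for robustness, the mirrored upper triangle is optionally masked, and the color scale is pinned to the range -1 to 1. The helper name correlation_heatmap is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def correlation_heatmap(df: pd.DataFrame, mask_upper: bool = True) -> pd.DataFrame:
    """Plot a Pearson correlation heatmap and return the matrix behind it."""
    # numeric_only=True skips non-numeric columns instead of raising an error.
    corr = df.corr(numeric_only=True)
    # Optionally hide the upper triangle (k=1 keeps the diagonal), since it mirrors the lower one.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1) if mask_upper else None
    fig, ax = plt.subplots(figsize=(12, 8))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, linewidths=0.5, ax=ax)
    ax.set_title("Correlation Heatmap of Wine Features")
    return corr
```

For example, corr = correlation_heatmap(df) draws the figure, and corr["quality"] then holds the same values that the heatmap annotates.
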
Thinking

The user wants a correlation heatmap of all features. Looking at the notebook state:

  • cell-4 executed the code to generate the heatmap.
  • output-4 contains <Figure size 1200x800 with 2 Axes>, which means the heatmap was successfully generated.

However, the "Sufficiency heuristics" state: "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart."

The current output is only the figure object. I don't have the numeric correlation values in the text of the notebook state (the output shows the figure, but not the correlation matrix itself as a table).

To provide a comprehensive answer and follow the heuristics, I should first extract the correlation values for the quality column (the target) to provide concrete insights.

Path B is required to print the correlation matrix or specifically the correlations with quality to support the visual findings.

Wait, the original goal was just to "show a correlation heatmap of all features". The heatmap has been shown. But the instructions say "If the latest output is chart-only... choose Path B first to print the values used in the chart."

Actually, looking at the prompt, the goal is "show a correlation heatmap". The plot is there. But I need to provide "concrete insights from observed outputs". Without the numbers, I'm just guessing from the image (which I can't actually "see" in the raw string output-4, though I know the code ran).

Actually, as an AI Data Analyst, I should analyze the correlation matrix numerically to give a precise answer.

Let's check the correlation matrix values.

Plan:

  1. Provide insights based on the fact that the heatmap was generated.
  2. Use Path B to print the correlation values, specifically for the target variable quality, to allow for concrete insight extraction.

Actually, if the user just asked to "show" it, and it's shown, maybe it's enough? No, the heuristics are strict: "If the latest output is chart-only... choose Path B first to print the values".

I will print the correlation matrix (specifically the correlation of all features with quality).

The correlation heatmap has been generated, providing a visual overview of how the chemical properties of the wine relate to one another and to the quality score. To provide precise insights into which features most strongly influence the quality, I will extract the numerical correlation values.

Calculate quality correlations
df.corr()['quality'].sort_values(ascending=False)
quality                 1.000000
alcohol                 0.476166
sulphates               0.251397
citric acid             0.226373
fixed acidity           0.124052
residual sugar          0.013732
free sulfur dioxide    -0.050656
pH                     -0.057731
chlorides              -0.128907
density                -0.174919
total sulfur dioxide   -0.185100
volatile acidity       -0.390558
Name: quality, dtype: float64
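
By default, DataFrame.corr() uses Pearson's r, defined as r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)). The sketch below computes this by hand on a small toy frame and checks it against pandas. The toy values are hypothetical and are not taken from the wine data.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y) -> float:
    """Pearson correlation: sum of centred cross-products over the product of norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Toy columns (hypothetical values, not the wine data).
toy = pd.DataFrame({"alcohol": [9.4, 9.8, 10.5, 11.2, 12.0],
                    "quality": [5, 5, 6, 6, 7]})
manual = pearson_r(toy["alcohol"], toy["quality"])
via_pandas = toy.corr()["quality"]["alcohol"]  # DataFrame.corr defaults to method="pearson"
print(np.isclose(manual, via_pandas))  # True
```
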
Thinking

The user wants a correlation heatmap of all features. Looking at the notebook state:

  • Cell 4 generated the correlation heatmap (sns.heatmap(df.corr(), ...)).
  • Cell 5 provided the numerical correlations between features and the target quality.

The user's original goal was "show a correlation heatmap of all features". The heatmap was produced in Cell 4. The numeric values for the target column were produced in Cell 5.

According to the sufficiency heuristics:

  • "If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A".

The user has seen the plot and the sorted correlations for the target variable. This is sufficient to satisfy the goal of "showing a correlation heatmap" and interpreting its most important parts.

Insights to extract:

  • alcohol has the strongest positive correlation with quality (0.48).
  • volatile acidity has the strongest negative correlation with quality (-0.39).
  • Other features like sulphates and citric acid show moderate positive correlations.
  • Features like total sulfur dioxide and density show moderate negative correlations.

I will provide a final answer summarizing these findings.

The correlation heatmap successfully visualizes the relationships between all chemical properties of the wine. Based on the resulting coefficients:

  • Strongest Positive Driver: alcohol has the highest positive correlation with quality (0.48), suggesting that higher alcohol content is generally associated with higher-rated wines.
  • Strongest Negative Driver: volatile acidity shows a strong negative correlation (-0.39), indicating that as the perceived vinegar-like acidity increases, the quality score tends to drop.
  • Other notable factors: sulphates (0.25) and citric acid (0.23) also positively influence quality, while total sulfur dioxide (-0.19) and density (-0.17) have a negative impact.

which features correlate most with quality?
Thinking

The user wants to know which features correlate most with quality. I have the notebook state. The latest output [output-5] contains the sorted correlation values for quality.

Analysis of [output-5]:

  • Positive correlations:
    • alcohol: 0.476 (Strongest positive)
    • sulphates: 0.251
    • citric acid: 0.226
  • Negative correlations:
    • volatile acidity: -0.391 (Strongest negative)
    • total sulfur dioxide: -0.185
    • density: -0.175

I should summarize these findings for the user. No code is needed because the data is already present in the notebook state.

Wait, the prompt asks "which features correlate most with quality?". I can provide a clear answer based on the data.

Positive influence: alcohol is the strongest. Negative influence: volatile acidity is the strongest.

I will state these findings clearly.

Based on the correlation analysis, the features that correlate most strongly with wine quality are:

  • Strongest Positive Correlation: alcohol (0.48), indicating that higher alcohol content is generally associated with higher quality scores.
  • Strongest Negative Correlation: volatile acidity (-0.39), meaning that as the volatile acidity increases, the quality score tends to decrease.

Other notable contributors include sulphates and citric acid (positive correlation) and total sulfur dioxide and density (negative correlation).
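
Sorting by signed r places the strongest negative correlate (volatile acidity) at the bottom of the list. Ranking by absolute value answers "which correlate most" directly. The sketch below uses the coefficients printed in step 3, copied to six decimals. The helper rank_by_strength is hypothetical.

```python
import pandas as pd

# Correlations with quality, copied from the sorted output in step 3.
corr_with_quality = pd.Series({
    "alcohol": 0.476166, "sulphates": 0.251397, "citric acid": 0.226373,
    "fixed acidity": 0.124052, "residual sugar": 0.013732,
    "free sulfur dioxide": -0.050656, "pH": -0.057731, "chlorides": -0.128907,
    "density": -0.174919, "total sulfur dioxide": -0.185100,
    "volatile acidity": -0.390558,
})


def rank_by_strength(r: pd.Series, top: int = 5) -> pd.Series:
    """Order correlations by |r| so strong negatives are not buried at the bottom."""
    # Keep the signed values but order them by magnitude.
    return r.reindex(r.abs().sort_values(ascending=False).index).head(top)


print(rank_by_strength(corr_with_quality))
```

With the live DataFrame, you can pass df.corr(numeric_only=True)["quality"].drop("quality") in place of the hard-coded Series.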

Score cards for gemma4:31b for Red Wine Quality EDA in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All four requested steps were completed: dataset loaded with shape/columns, quality distribution plotted, correlation heatmap shown, and top correlations with quality identified.

  • Execution Correctness: 2/2

    Excellent

    The code is coherent and likely runnable as written. It uses standard pandas/seaborn/matplotlib operations and the provided outputs match the executed steps.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: shape (1599, 12), quality counts showing 5 and 6 as most common, a 12x12 heatmap, and the strongest correlations with alcohol and volatile acidity.

  • Reasoning Quality: 2/2

    Excellent

    The workflow correctly interprets the dataset, chooses an appropriate count plot for discrete quality scores, and extracts the relevant correlation ranking for quality.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and grounded in the data, with no obvious hallucinations or unsupported claims.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.
