Machine Learning
Red Wine Quality EDA in Python
Explore the Wine Quality dataset with distribution plots, correlation analysis, and quality score breakdown using an AI data analyst.
What
This AI Data Analyst workflow loads the Red Wine Quality CSV from a URL and inspects the dataset shape and column names. It generates exploratory visualizations including a quality score distribution plot and a full-feature correlation heatmap. It then ranks the features by correlation with the target quality score to identify the strongest positive and negative relationships.
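The loading and inspection step might look like the minimal sketch below. The helper name `load_and_inspect`, the `sep` parameter, and the inline `SAMPLE_CSV` rows are assumptions made for illustration; the real workflow reads the CSV from a URL that is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote CSV. These few rows are illustrative
# only and are not taken from the workflow's output.
SAMPLE_CSV = """fixed acidity,volatile acidity,alcohol,quality
7.0,0.70,9.5,5
8.1,0.85,9.9,5
10.9,0.30,10.1,6
7.5,0.60,10.4,7
"""


def load_and_inspect(source, sep=","):
    """Read a CSV from a path, URL, or file-like object and print its structure."""
    df = pd.read_csv(source, sep=sep)
    print("shape:", df.shape)
    print("columns:", list(df.columns))
    return df


# Swap the StringIO object for the dataset URL to run against the real file.
df = load_and_inspect(io.StringIO(SAMPLE_CSV))
```

Because `pd.read_csv` accepts a URL string directly, the same helper should work on the hosted file. Some copies of this dataset are semicolon-delimited, which is why `sep` is exposed as a parameter.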
Who
This is for data analysts and students who want a guided EDA example on a small, tabular machine-learning dataset. It also suits anyone who needs a quick way to check basic dataset structure and identify candidate predictors for modeling.
Tools
- pandas
- numpy
- matplotlib
- seaborn
Outcomes
- Dataset loaded from URL with shape (1599, 12) and column list
- Distribution plot of quality scores showing most wines score 5–6
- 12x12 correlation heatmap across all features
- List of top correlations with quality, including alcohol (~+0.48) and volatile acidity (~-0.39)
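The two plots listed above could be produced roughly as follows. This is a sketch under stated assumptions: the small DataFrame is made-up data with only four columns (the real dataset has 12), and the titles, figure size, and colormap are arbitrary choices rather than the workflow's own settings.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; not real measurements.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.0, 12.0, 9.0, 9.5, 10.8],
    "volatile acidity": [0.70, 0.66, 0.50, 0.45, 0.30, 0.88, 0.72, 0.48],
    "pH": [3.51, 3.20, 3.26, 3.16, 3.30, 3.40, 3.35, 3.28],
    "quality": [5, 5, 6, 6, 7, 4, 5, 6],
})

fig, (ax_dist, ax_heat) = plt.subplots(1, 2, figsize=(12, 5))

# Bar per quality score, with bar height equal to the number of wines.
sns.countplot(data=df, x="quality", ax=ax_dist)
ax_dist.set_title("Quality score distribution")

# Pairwise Pearson correlations across all numeric columns.
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, ax=ax_heat)
ax_heat.set_title("Feature correlation heatmap")

fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view or keep the figure. Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale symmetric, so positive and negative correlations of equal strength get equally intense colors.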
Quality Score
9/10
Last scored: Apr 7, 2026
Task Completion: 2/2
Excellent: All requested steps are present: dataset loaded from the given URL with shape and columns shown, quality distribution plotted, full correlation heatmap produced, and strongest correlations with quality identified.
Execution Correctness: 2/2
Excellent: Code is syntactically correct and consistent: pd.read_csv loads the CSV, plots render via seaborn/matplotlib, corr computed with df.corr(numeric_only=True), and correlations are sorted correctly using abs.
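The ranking step described here, `df.corr(numeric_only=True)` followed by a sort on absolute value, might be written as in this sketch. The DataFrame holds made-up rows and the variable names are hypothetical; only the `corr` call and the idea of sorting by `abs` come from the description above.

```python
import pandas as pd

# Made-up rows for illustration only; not real measurements.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.0, 12.0, 9.0, 9.5, 10.8],
    "volatile acidity": [0.70, 0.66, 0.50, 0.45, 0.30, 0.88, 0.72, 0.48],
    "pH": [3.51, 3.20, 3.26, 3.16, 3.30, 3.40, 3.35, 3.28],
    "quality": [5, 5, 6, 6, 7, 4, 5, 6],
})

# Correlation of every numeric feature with the target, excluding the
# target's trivial correlation with itself.
with_quality = df.corr(numeric_only=True)["quality"].drop("quality")

# Order by strength regardless of sign, but keep the signed values so that
# positive and negative relationships remain distinguishable.
ranked = with_quality.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked.round(2).to_string())
```

Sorting on the absolute value while keeping the sign is what lets a strong negative predictor such as volatile acidity appear near the top of the list alongside a strong positive one such as alcohol.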
Output Quality: 3/3
Excellent: Outputs match expected outcomes: shape (1599, 12) and correct column list; quality distribution plot produced; heatmap corresponds to a 12x12 correlation matrix; top correlators align with expectations (alcohol ~+0.48, volatile acidity ~-0.39).
Reasoning Quality: 1/2
Needs work: Reasoning is mostly clear but includes some generic, hedged statements (e.g., 'most wines are expected to cluster') instead of reading the counts directly from the plotted distribution in the output.
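One way to ground that statement in the data is to tabulate the counts behind the plot and quote them directly. The sketch below uses a made-up `quality` column, so the numbers it prints are illustrative and say nothing about the real dataset.

```python
import pandas as pd

# Made-up quality scores for illustration only.
quality = pd.Series([5, 5, 6, 6, 7, 4, 5, 6, 5, 6], name="quality")

# Number of wines per score, ordered by score rather than by frequency.
counts = quality.value_counts().sort_index()

# Fraction of wines scoring 5 or 6; reindex guards against a missing score.
share_5_6 = counts.reindex([5, 6], fill_value=0).sum() / counts.sum()

print(counts.to_string())
print(f"share scoring 5 or 6: {share_5_6:.0%}")
```

Reporting the tabulated counts and the resulting share replaces "expected to cluster" with a figure that can be checked against the plot.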
Reliability: 1/1
Excellent: Workflow is reasonably robust (uses numeric_only for correlations and avoids hardcoding results), and reported correlation values are supported by the shown correlation table output.