Exploratory Data Analysis
Boston Housing Prices EDA in Python
Explore the Boston Housing dataset with price distributions, feature correlations, and outlier detection using an AI data analyst.
What
This AI Data Analyst workflow loads the Boston Housing dataset from a CSV URL and produces basic descriptive statistics. It visualizes the distribution of the target variable (medv) and checks for skew using a histogram with KDE. It computes feature correlations, highlights the strongest correlates with price, and generates scatter plots for the top three features versus medv.
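The load, summarize, and rank-correlations steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the CSV URL is not given here, so `load_housing` takes it as a parameter, and the helper names (`load_housing`, `top_correlates`) and the lowercase-column normalization are assumptions. The demo frame is synthetic stand-in data so the sketch runs offline; it is not the Boston Housing data.

```python
import numpy as np
import pandas as pd


def load_housing(csv_url: str) -> pd.DataFrame:
    """Load the dataset from a CSV URL supplied by the caller."""
    df = pd.read_csv(csv_url)
    # Assumption: normalize headers to lowercase so the target is named "medv".
    df.columns = df.columns.str.lower()
    return df


def top_correlates(df: pd.DataFrame, target: str = "medv", k: int = 3) -> pd.Series:
    """Return the k features with the largest absolute Pearson correlation to target."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Rank by magnitude, but report the signed correlation values.
    order = corr.abs().sort_values(ascending=False).index[:k]
    return corr.loc[order]


# Synthetic stand-in data (NOT the real dataset), only to exercise the helpers.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "rm": rng.normal(6.0, 0.7, n),
    "lstat": rng.normal(12.0, 7.0, n),
    "noise": rng.normal(0.0, 1.0, n),
})
demo["medv"] = 5.0 * demo["rm"] - 0.8 * demo["lstat"] + rng.normal(0.0, 1.0, n)

print(demo.shape)                 # basic shape check
print(demo.describe().round(2))   # descriptive statistics table
print(top_correlates(demo, k=2))  # strongest correlates with the target
```

With the real data, the same calls would run on `load_housing(url)` instead of `demo`. Ranking by absolute value keeps strong negative correlates such as lstat in the list while still reporting their sign.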
Who
This is for analysts and students who want a guided exploratory data analysis example on a small, well-known regression dataset. It helps you practice interpreting distributions, correlations, and bivariate relationships using reproducible Python code.
Tools
- pandas
- numpy
- matplotlib
- seaborn
Outcomes
- Dataset loaded with shape (506, 14) and a summary statistics table
- Histogram with KDE for medv showing a right-skewed distribution and a median around 21 (medv is in $1000s, so about $21k)
- Correlation heatmap with top correlators identified (rm about +0.70, lstat about -0.74)
- Three scatter plots of the top features versus medv with regression lines
Quality Score
7/10
Last scored: Apr 7, 2026
Task Completion: 2/2
Excellent. All requested EDA steps are present: data load and basic statistics (including shape via narrative), MEDV distribution plot, correlation heatmap with top correlators, and scatter plots for the top 3 features vs MEDV.
Execution Correctness: 1/2
Needs work. Most code is runnable, but the scatter plots do not include regression lines as required; otherwise the loading, describe, histplot, corr/heatmap, and correlation ranking code are valid.
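One way the missing regression lines could be added is sketched below, using a least-squares fit from `numpy.polyfit` drawn over each scatter plot. This is an illustrative sketch rather than the notebook's code: `scatter_with_fit` is a hypothetical helper, the demo data is synthetic, and the third feature is named `other` as a placeholder because the review does not name it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_with_fit(df: pd.DataFrame, features, target: str = "medv"):
    """Draw one scatter plot per feature with a fitted straight line on top."""
    fig, axes = plt.subplots(1, len(features),
                             figsize=(5 * len(features), 4), squeeze=False)
    for ax, feature in zip(axes[0], features):
        x = df[feature].to_numpy(dtype=float)
        y = df[target].to_numpy(dtype=float)
        # Degree-1 least-squares fit; polyfit returns [slope, intercept].
        slope, intercept = np.polyfit(x, y, deg=1)
        ax.scatter(x, y, alpha=0.5, s=12)
        xs = np.linspace(x.min(), x.max(), 100)
        ax.plot(xs, slope * xs + intercept, color="red",
                label=f"slope = {slope:.2f}")
        ax.set_xlabel(feature)
        ax.set_ylabel(target)
        ax.set_title(f"{feature} vs {target}")
        ax.legend()
    fig.tight_layout()
    return fig


# Synthetic stand-in data (NOT the real dataset); "other" is a placeholder column.
rng = np.random.default_rng(1)
n = 300
demo = pd.DataFrame({
    "rm": rng.normal(6.0, 0.7, n),
    "lstat": rng.normal(12.0, 7.0, n),
    "other": rng.normal(18.0, 2.0, n),
})
demo["medv"] = (5.0 * demo["rm"] - 0.8 * demo["lstat"]
                + 0.5 * demo["other"] + rng.normal(0.0, 1.0, n))

fig = scatter_with_fit(demo, ["rm", "lstat", "other"])
```

Since seaborn is already in the tool list, `seaborn.regplot` would be a shorter alternative that also draws a confidence band. In a notebook the `matplotlib.use("Agg")` line can be dropped.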
Output Quality: 2/3
Good. Outputs match key expectations: stats table for the 506 × 14 dataset, histogram with KDE, and correlations showing RM (~0.695) and LSTAT (~-0.738) as top features. However, the scatter plots lack regression lines, so the final visualization requirement is not fully met.
Reasoning Quality: 1/2
Needs work. Interpretations of summary stats and correlations are generally correct, but the assistant also states it "can't determine the skew direction without seeing the chart" despite having the plot output, and it doesn't explicitly confirm the expected right-skew/median detail.
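Skew direction does not have to be read off the chart; it can be checked numerically. The sketch below uses pandas' sample skewness and a mean-versus-median comparison. `skew_summary` is a hypothetical helper, the 0.5 cutoff for "roughly symmetric" is a rule-of-thumb assumption, and the demo values are synthetic rather than the real medv column.

```python
import numpy as np
import pandas as pd


def skew_summary(values: pd.Series, tol: float = 0.5) -> dict:
    """Summarize skew direction from sample skewness, mean, and median."""
    skewness = float(values.skew())
    if skewness > tol:
        direction = "right-skewed"
    elif skewness < -tol:
        direction = "left-skewed"
    else:
        direction = "roughly symmetric"
    return {
        "skewness": skewness,
        "mean": float(values.mean()),
        "median": float(values.median()),
        "direction": direction,
    }


# Synthetic right-skewed values (NOT the real medv column).
rng = np.random.default_rng(2)
demo_medv = pd.Series(rng.lognormal(mean=3.0, sigma=0.5, size=500), name="medv")

result = skew_summary(demo_medv)
print(result)
```

A positive skewness with the mean above the median is consistent with a right-skewed distribution, which is what the histogram for medv is expected to show.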
Reliability: 1/1
Excellent. Workflow is consistent with the notebook evidence and avoids unsupported claims about computed values (correlations are shown). Main weakness is omission of regression lines rather than hallucination or unsafe behavior.