Data Analysis
Exploratory Data Analysis (EDA) in Python
Complete EDA workflow: load data, check quality, compute correlations, detect outliers, and visualize distributions — all with an AI data analyst.
What
This AI Data Analyst workflow loads the scikit-learn diabetes dataset and summarizes its structure, feature names, and basic statistics. It generates correlation analysis across all features, including a full correlation matrix visualization. It detects per-feature outliers using the IQR method and identifies which features relate most strongly to the target with supporting plots and correlation coefficients.
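The loading and summary step can be sketched as follows. This is a minimal illustration, not the workflow's generated code; the variable names (`diabetes`, `X`, `y`, `df`) are assumptions chosen for readability.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects instead of NumPy arrays.
diabetes = load_diabetes(as_frame=True)
X = diabetes.data    # 10 feature columns (mean-centered and scaled by default)
y = diabetes.target  # quantitative measure of disease progression

# Combine features and target into one frame for convenience.
df = X.copy()
df["target"] = y

print("Feature matrix shape:", X.shape)
print("Feature names:", list(X.columns))
# Transposed summary: one row per column, a few key statistics.
print(df.describe().T[["mean", "std", "min", "max"]])
```

Keeping the target in the same DataFrame makes later correlation and plotting steps a single call on `df`.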
Who
This is for analysts and students who want a repeatable EDA template for a small, well-known regression dataset. It helps you practice data quality checks, correlation inspection, and outlier detection in a conversational, code-generating notebook.
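A data quality check of the kind mentioned above might look like this. It is a sketch under the assumption that "quality" means dtypes, missing values, cardinality, and duplicate rows; the `quality` table and `n_duplicates` name are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# .frame holds the 10 features plus a "target" column.
df = load_diabetes(as_frame=True).frame

# One row per column: data type, missing-value count, number of distinct values.
quality = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "unique": df.nunique(),
})
n_duplicates = int(df.duplicated().sum())  # fully identical rows

print(quality)
print("Duplicate rows:", n_duplicates)
```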
Tools
- scikit-learn
- pandas
- numpy
- matplotlib
- seaborn
Outcomes
- Dataset shape (442, 10) with feature names and summary stats
- 10x10 correlation heatmap of all features
- Outlier counts per feature using the IQR method
- Scatter plots of the top 3 features vs. target, with correlation coefficients
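The per-feature outlier count can be sketched with the usual IQR rule: a value is flagged when it falls outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. The helper name `iqr_outlier_counts` and the multiplier parameter `k` are hypothetical, introduced here only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    # Compare every column against its own lower and upper fence.
    below = frame.lt(q1 - k * iqr, axis="columns")
    above = frame.gt(q3 + k * iqr, axis="columns")
    return (below | above).sum().sort_values(ascending=False)

X = load_diabetes(as_frame=True).data
outlier_counts = iqr_outlier_counts(X)
print(outlier_counts)
```

The multiplier 1.5 is the conventional default; a larger `k` flags only more extreme values.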
Quality Score
7/10
Last scored: Apr 7, 2026
Task Completion: 1/2
Needs work: Loaded the sklearn diabetes dataset, produced a 10x10 feature correlation heatmap, computed IQR outlier counts per feature, and identified the strongest feature-target correlations. However, it did not produce the required scatter plots of the top 3 features vs target with correlation coefficients.
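The scatter-plot step could be filled in along these lines. This is a hedged sketch, not the workflow's own output: it assumes "top 3" means the three features with the largest absolute Pearson correlation with the target, and the names `target_corr` and `top3` are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

df = load_diabetes(as_frame=True).frame  # features plus "target"

# Pearson correlation of each feature with the target.
target_corr = df.corr()["target"].drop("target")
# Rank by absolute value so strong negative relationships also qualify.
top3 = target_corr.abs().sort_values(ascending=False).head(3).index.tolist()

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, feature in zip(axes, top3):
    ax.scatter(df[feature], df["target"], alpha=0.5)
    ax.set_title(f"{feature} vs target (r = {target_corr[feature]:.2f})")
    ax.set_xlabel(feature)
axes[0].set_ylabel("target")
fig.tight_layout()
```

In a notebook the figure renders automatically; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.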
Execution Correctness: 2/2
Excellent: All provided code blocks are syntactically correct and consistent (data loading, correlation heatmap, IQR outlier table, and target correlation ranking) and are likely runnable as-is.
Output Quality: 2/3
Good: Outputs match most expected outcomes: dataset shape info (442 rows, 10 features), a 10x10 correlation heatmap, and a clear outlier summary table plus feature-target correlations. The missing scatter plots prevent a full match.
Reasoning Quality: 1/2
Needs work: Interpretations of correlations and outlier distribution are generally aligned with the computed outputs, but some narrative claims (e.g., specific correlation patterns in the heatmap) are not directly evidenced by numeric values shown.
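One way to back heatmap observations with numbers is to list the strongest feature pairs directly from the correlation matrix. The sketch below is an illustration rather than part of the scored workflow; `pairs` and `strongest` are assumed names.

```python
import numpy as np
from sklearn.datasets import load_diabetes

X = load_diabetes(as_frame=True).data
corr = X.corr()  # Pearson correlation between features

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna().rename("r")

# Sort by absolute correlation, strongest first.
strongest = pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(5)
print(strongest.round(2))
```

Quoting these values alongside the heatmap lets a reader verify any claim about which features move together.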
Reliability: 1/1
Excellent: Workflow is consistent and uses standard, robust methods (pandas corr, seaborn heatmap, IQR rule) without hallucinated APIs; results are reproducible from the code.
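For reference, the pandas `corr` and seaborn `heatmap` calls named above can be combined roughly as follows. This is a minimal sketch; the styling choices (color map, figure size, title) are assumptions, not the workflow's actual settings.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes

X = load_diabetes(as_frame=True).data
corr = X.corr()  # 10x10 Pearson correlation matrix

fig, ax = plt.subplots(figsize=(9, 7))
# Fix the color scale to [-1, 1] so zero correlation sits at the neutral color.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, square=True, ax=ax)
ax.set_title("Feature correlation matrix (diabetes dataset)")
fig.tight_layout()
```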