Exploratory Data Analysis
Titanic Survival Analysis in Python
Explore the Titanic dataset with survival rates by class, sex, and age, handle missing values, and visualize patterns using an AI data analyst.
What
This AI Data Analyst workflow loads the Titanic training dataset from a URL and computes the overall survival rate and dataset shape. It generates visual comparisons of survival rates by passenger class and sex, and plots age distributions for survivors versus non-survivors. It also audits missing values by column to identify fields that need cleaning or imputation.
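As a rough illustration of the first step described above, the sketch below reads a CSV with pandas and reports its shape and overall survival rate. The helper name `load_and_summarize` and the toy rows are assumptions made for this example. The rows only mimic a few Titanic column names and are not the real data. In the actual workflow the dataset URL would be passed to `pd.read_csv` in place of the in-memory text.

```python
import io

import pandas as pd

# Toy rows that mimic a few Titanic columns. NOT the real dataset.
# In the real workflow, pass the dataset URL to pd.read_csv instead.
TOY_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,1,1,female,35
5,0,3,male,
"""


def load_and_summarize(source):
    """Read a CSV and return (DataFrame, shape, overall survival rate)."""
    df = pd.read_csv(source)
    shape = df.shape  # (rows, columns)
    # The mean of a 0/1 column equals the share of rows with value 1.
    survival_rate = df["Survived"].mean()
    return df, shape, survival_rate


df, shape, rate = load_and_summarize(io.StringIO(TOY_CSV))
print(f"shape={shape}, survival rate={rate:.1%}")
```

Printing the shape alongside the rate makes the "confirm shape" outcome explicit instead of leaving it implied by the load succeeding.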
Who
This is for analysts and students who want a guided, conversational EDA example on a well-known classification dataset. It helps anyone practicing data cleaning and basic demographic breakdowns with reproducible Python code and plots.
Tools
- pandas
- numpy
- matplotlib
- seaborn
- requests
Outcomes
- Load the Titanic CSV from the provided URL and confirm shape (891, 12)
- Compute overall survival rate (38.4%)
- Create a grouped bar chart of survival rate by class and sex
- Plot survivor vs non-survivor age distributions
- Report missing values: Age 177, Cabin 687, Embarked 2
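The grouped survival rates and the missing-value audit listed above could be computed along the following lines. This is a minimal sketch, not the workflow's own code: the function names `survival_by_class_and_sex` and `missing_report` are hypothetical, and the toy frame uses Titanic column names with made-up values, so its numbers do not match the counts reported for the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with Titanic-style column names; values are illustrative only.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass":   [3, 1, 3, 1, 3, 1],
    "Sex":      ["male", "female", "female", "female", "male", "male"],
    "Age":      [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Cabin":    [None, "C85", None, "C123", None, "E46"],
})


def survival_by_class_and_sex(df):
    """Survival rate per (Pclass, Sex), pivoted so Sex becomes the columns."""
    return df.groupby(["Pclass", "Sex"])["Survived"].mean().unstack("Sex")


def missing_report(df):
    """Missing-value count and percentage per column, largest count first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only columns that actually have gaps.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


rates = survival_by_class_and_sex(toy)
missing = missing_report(toy)
print(rates)
print(missing)
```

Because `rates` has one row per class and one column per sex, calling `rates.plot(kind="bar")` on it would draw a grouped bar chart with matplotlib, which is one way to produce the class-by-sex comparison.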
Quality Score
7/10
Last scored: Apr 7, 2026
Task Completion: 1/2
Needs work: The workflow loaded the dataset, computed the overall survival rate, produced the requested plots, and counted missing values. However, it did not report the dataset shape (891, 12) and did not explicitly state the key plot findings that were expected (e.g., 1st class + female highest; survivors skew younger).
Execution Correctness: 2/2
Excellent: All provided code blocks are syntactically correct and logically consistent (read_csv, groupby mean, seaborn plots, missingness summary). The workflow is likely runnable in a standard notebook environment.
Output Quality: 2/3
Good: Outputs match several expected outcomes: survival rate ~38.38%, missing values (Age 177, Cabin 687, Embarked 2) with correct percentages, and generated plots. But the analysis text omits the conclusions the plots were expected to support and does not show the dataset shape.
Reasoning Quality: 1/2
Needs work: The reasoning is generally coherent about what the code and plots represent, but it is overly cautious and fails to extract the main insights the task asked for (class/sex survival ranking and age skew).
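One way to make such insights explicit is to derive them from the grouped numbers in code instead of leaving them to be read off a plot. The sketch below is an assumption about how that could look: `key_findings` is a hypothetical helper, and the toy rows are invented, so the values it reports describe the toy rows only and say nothing about the real Titanic data.

```python
import numpy as np
import pandas as pd

# Invented rows with Titanic-style column names; illustrative only.
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
    "Pclass":   [1, 1, 1, 3, 3, 3, 3, 1],
    "Sex":      ["female", "female", "male", "female",
                 "female", "male", "male", "male"],
    "Age":      [29.0, 35.0, 54.0, np.nan, 40.0, 22.0, 30.0, 25.0],
})


def key_findings(df):
    """Summarize the top-surviving (class, sex) group and median ages."""
    rates = df.groupby(["Pclass", "Sex"])["Survived"].mean()
    top_class, top_sex = rates.idxmax()  # index label of the highest rate
    # Drop rows without an age before comparing the two outcome groups.
    ages = df.dropna(subset=["Age"]).groupby("Survived")["Age"].median()
    return {
        "top_group": (int(top_class), top_sex),
        "top_rate": float(rates.max()),
        "median_age_survived": float(ages.loc[1]),
        "median_age_died": float(ages.loc[0]),
    }


findings = key_findings(toy)
print(findings)
```

Returning the findings as a dictionary lets the analysis text quote concrete values next to each plot.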
Reliability: 1/1
Excellent: The approach is standard EDA and reasonably robust (uses dropna for age, computes missingness systematically). Minor fragility: it calls display() without importing it, though this typically works in notebooks.
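The `display()` fragility noted above could be avoided with an explicit import and a plain-text fallback. This is a sketch of one possible guard, not something the reviewed workflow does. The small table is invented for the demonstration.

```python
import pandas as pd

try:
    # Available in Jupyter and IPython; gives rich table rendering there.
    from IPython.display import display
except ImportError:
    # Outside IPython, fall back to plain printing.
    display = print

# Invented table, used only to show that display() now always resolves.
table = pd.DataFrame({"missing": [2, 1]}, index=["Age", "Embarked"])
display(table)
```

With this guard the same code runs both in a notebook and as a plain script.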