Hypothesis Testing in Python (t-test, ANOVA)
Perform t-tests, chi-square tests, and ANOVA using real data to answer business questions — guided by an AI data analyst.
What
This AI Data Analyst workflow loads the Titanic training dataset from a URL and runs a two-sample t-test to compare average age between survivors and non-survivors. It performs a chi-square test of independence to assess whether survival is associated with passenger class. It runs a one-way ANOVA to test whether fare differs across the three passenger classes and summarizes the results in plain language.
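The three tests described above can be sketched as follows. This is a minimal illustration, not the workflow's actual notebook code. The column names (`Survived`, `Pclass`, `Age`, `Fare`) follow the Titanic dataset, but the rows are a small hypothetical stand-in because the dataset URL is not given here. The helper name `run_tests` is also an assumption.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in rows. The workflow itself reads the Titanic training
# CSV from a URL, which this description does not include.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1],
    "Pclass": [3, 1, 3, 3, 2, 1, 3, 2, 3, 1, 2, 2],
    "Age": [22, 38, 26, 35, None, 54, 2, 27, 39, 58, 20, 14],
    "Fare": [7.25, 71.28, 7.92, 8.05, 13.00, 51.86,
             21.08, 30.07, 31.28, 26.55, 26.00, 16.00],
})


def run_tests(frame):
    """Run the three hypothesis tests and return statistics and p-values."""
    # 1) Welch two-sample t-test: mean Age of survivors vs non-survivors.
    ages = frame.dropna(subset=["Age"])
    t_stat, t_p = stats.ttest_ind(
        ages.loc[ages["Survived"] == 1, "Age"],
        ages.loc[ages["Survived"] == 0, "Age"],
        equal_var=False,  # Welch: no equal-variance assumption
    )

    # 2) Chi-square test of independence on the Survived x Pclass table.
    table = pd.crosstab(frame["Survived"], frame["Pclass"])
    chi2, chi_p, dof, _expected = stats.chi2_contingency(table)

    # 3) One-way ANOVA: does mean Fare differ across passenger classes?
    fares = frame.dropna(subset=["Fare"])
    groups = [g["Fare"].to_numpy() for _, g in fares.groupby("Pclass")]
    f_stat, f_p = stats.f_oneway(*groups)

    return {
        "t_test": {"statistic": float(t_stat), "p_value": float(t_p)},
        "chi_square": {"statistic": float(chi2), "p_value": float(chi_p),
                       "dof": int(dof)},
        "anova": {"statistic": float(f_stat), "p_value": float(f_p)},
    }


results = run_tests(df)
for name, res in results.items():
    print(name, res)
```

With the real dataset, `df` would come from `pd.read_csv` on the dataset URL instead of the inline rows. The numbers printed for this toy frame say nothing about the actual Titanic results.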
Who
This is for analysts and students who need a concrete, code-driven example of hypothesis testing on a real dataset. It also suits anyone who answers business-style questions with statistical tests and wants interpretable output in a notebook.
Tools
- pandas
- numpy
- scipy
- statsmodels
- matplotlib
- seaborn
Outcomes
- t-test result comparing survivor vs non-survivor age with p-value and interpretation
- chi-square test result for survival vs passenger class independence with p-value and interpretation
- one-way ANOVA result for fare differences across classes with p-value and interpretation
- plain-language summary table consolidating all three hypothesis tests
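One possible shape for the consolidated summary table is sketched below. The helper `summarize`, the 0.05 significance level, and the p-values are all assumptions made for illustration. The p-values are placeholders, not results produced by the workflow.

```python
import pandas as pd

ALPHA = 0.05  # assumed significance level


def summarize(tests, alpha=ALPHA):
    """Build one row per test with a plain-language decision."""
    rows = []
    for name, null_hypothesis, p_value in tests:
        reject = p_value < alpha
        rows.append({
            "test": name,
            "null_hypothesis": null_hypothesis,
            "p_value": p_value,
            "decision": "reject H0" if reject else "fail to reject H0",
        })
    return pd.DataFrame(rows)


# Placeholder p-values for illustration only; they are not workflow output.
summary = summarize([
    ("Welch t-test", "mean Age is equal for survivors and non-survivors", 0.03),
    ("Chi-square", "Survived is independent of Pclass", 0.0001),
    ("One-way ANOVA", "mean Fare is equal across Pclass", 0.20),
])
print(summary.to_string(index=False))
```

Keeping the null hypothesis next to the decision lets a reader see what was rejected without rereading each test section.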
Quality Score
9/10
Last scored: Apr 7, 2026
Task Completion: 2/2
Excellent: All three requested hypothesis tests (Welch t-test on Age by survival, chi-square independence for Survived×Pclass, and one-way ANOVA for Fare by Pclass) were performed and interpreted, and a combined narrative summary was provided.
Execution Correctness: 2/2
Excellent: The Python code uses appropriate libraries (pandas, scipy.stats) and correct functions/inputs (ttest_ind with equal_var=False, chi2_contingency on crosstab, f_oneway on grouped fares); outputs shown are consistent with the code.
Output Quality: 2/3
Good: Outputs match expected outcomes semantically: t-test p < 0.05 with survivors younger, chi-square p << 0.001 showing dependence, and ANOVA p << 0.001 showing fare differs by class. However, the expected 'plain-language summary table of all three tests' is not actually presented as a single table (only narrative sections and separate result DataFrames).
Reasoning Quality: 2/2
Excellent: Reasoning correctly states null hypotheses, uses p-values to make decisions, and provides sensible business interpretations (association of class with survival; fare tied to class; age effect small).
Reliability: 1/1
Excellent: Workflow is reasonably robust (handles missing Age/Fare via dropna, uses Welch’s t-test to avoid equal-variance assumption) and avoids unsupported claims beyond the test results.
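The note on missing values matters because `scipy.stats.ttest_ind` propagates NaN by default. The sketch below uses made-up ages, not Titanic data, to show that dropping missing values first (as `dropna` does in the workflow) yields a usable Welch result. Passing `nan_policy="omit"` is shown as an alternative; the workflow is not stated to use it.

```python
import math

import numpy as np
from scipy import stats

# Made-up ages for illustration; NaN marks a missing value.
survived = np.array([38.0, 26.0, np.nan, 27.0, 58.0, 14.0])
died = np.array([22.0, 35.0, 2.0, np.nan, 39.0, 20.0])

# Default nan_policy="propagate": a single NaN makes the whole result NaN.
raw = stats.ttest_ind(survived, died, equal_var=False)

# Dropping missing values first gives a usable Welch t-test.
clean = stats.ttest_ind(
    survived[~np.isnan(survived)],
    died[~np.isnan(died)],
    equal_var=False,
)

# Alternative: let SciPy skip NaNs itself.
omitted = stats.ttest_ind(survived, died, equal_var=False, nan_policy="omit")

print("propagate:", raw.pvalue)
print("dropped:  ", clean.pvalue)
print("omit:     ", omitted.pvalue)
```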