Hypothesis Testing in Python (t-test, ANOVA)

Perform t-tests, chi-square tests, and ANOVA using real data to answer business questions — guided by an AI data analyst.

What

This AI Data Analyst workflow loads the Titanic training dataset from a URL and runs a two-sample t-test to compare average age between survivors and non-survivors. It performs a chi-square test of independence to assess whether survival is associated with passenger class. It runs a one-way ANOVA to test whether fare differs across the three passenger classes and summarizes the results in plain language.

Who

This is for analysts and students who need a concrete, code-driven example of hypothesis testing on a real dataset. It helps anyone answering business-style questions with statistical tests and interpretable outputs in a notebook.

Tools

  • pandas
  • numpy
  • scipy
  • statsmodels
  • matplotlib
  • seaborn

Outcomes

  • t-test result comparing survivor vs non-survivor age with p-value and interpretation
  • chi-square test result for survival vs passenger class independence with p-value and interpretation
  • one-way ANOVA result for fare differences across classes with p-value and interpretation
  • plain-language summary table consolidating all three hypothesis tests

Quality Score

9/10

gpt-5.2, mljar-ai

Last scored: Apr 7, 2026

  • Task Completion: 2/2

    Excellent

    All three requested hypothesis tests (Welch t-test on Age by survival, chi-square independence for Survived×Pclass, and one-way ANOVA for Fare by Pclass) were performed and interpreted, and a combined narrative summary was provided.

  • Execution Correctness: 2/2

    Excellent

    The Python code uses appropriate libraries (pandas, scipy.stats) and correct functions/inputs (ttest_ind with equal_var=False, chi2_contingency on crosstab, f_oneway on grouped fares); outputs shown are consistent with the code.

  • Output Quality: 2/3

    Good

    Outputs match expected outcomes semantically: t-test p<0.05 with survivors younger, chi-square p<<0.001 showing dependence, and ANOVA p<<0.001 showing fare differs by class. However, the expected 'plain-language summary table of all three tests' is not actually presented as a single table (only narrative sections and separate result DataFrames).

  • Reasoning Quality: 2/2

    Excellent

    Reasoning correctly states null hypotheses, uses p-values to make decisions, and provides sensible business interpretations (association of class with survival; fare tied to class; age effect small).

  • Reliability: 1/1

    Excellent

    Workflow is reasonably robust (handles missing Age/Fare via dropna, uses Welch’s t-test to avoid equal-variance assumption) and avoids unsupported claims beyond the test results.

load titanic data from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/Titanic/train.csv and test whether average age differs significantly between survivors and non-survivors using a t-test
Thinking

Setting up data display

I'm planning to use pandas to import a CSV file from a URL. First, I’ll read the file with pd.read_csv(url), then I’ll display the shape of the DataFrame and its columns. I’ll use df.shape to get the number of rows and columns and create a simple display of that information. Finally, I’ll show the first few rows with df.head(). This approach seems fine!

Let’s start by loading the Titanic training data and doing a quick inspection so we can verify the columns before running the t-test.

Load Titanic data
import pandas as pd

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/Titanic/train.csv"
df = pd.read_csv(url)

display(pd.DataFrame({"rows": [df.shape[0]], "columns_count": [df.shape[1]]}))
display(pd.DataFrame({"column_name": df.columns}))
df.head()
rows columns_count
0 891 12
column_name
0 PassengerId
1 Survived
2 Pclass
3 Name
4 Sex
5 Age
6 SibSp
7 Parch
8 Ticket
9 Fare
10 Cabin
11 Embarked
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

The data is in place and the columns needed for your question are confirmed: Survived and Age are both present, so the missing piece is the actual statistical test.
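Because Age contains missing values, it can help to see how many rows each group loses before testing. The following is a minimal sketch on a hypothetical miniature frame (`toy` is made up for illustration and only borrows the Titanic column names); the same two lines would apply to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with the same column names as the Titanic data.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
})

# Count missing ages and usable (non-missing) ages per survival group.
missing_by_group = toy["Age"].isna().groupby(toy["Survived"]).sum()
usable_by_group = toy.groupby("Survived")["Age"].count()

print(missing_by_group)
print(usable_by_group)
```

The usable counts are the sample sizes a t-test on Age would actually see once missing values are dropped.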

Compare age by survival
from scipy import stats

# Ages for each group, with missing values removed
age_survived = df.loc[df["Survived"] == 1, "Age"].dropna()
age_not_survived = df.loc[df["Survived"] == 0, "Age"].dropna()

# Welch's t-test (equal_var=False), run once and reused below
welch = stats.ttest_ind(age_survived, age_not_survived, equal_var=False)

result = {
    "mean_age_survived": age_survived.mean(),
    "mean_age_not_survived": age_not_survived.mean(),
    "n_survived": age_survived.shape[0],
    "n_not_survived": age_not_survived.shape[0],
    "t_statistic": welch.statistic,
    "p_value": welch.pvalue,
}

pd.DataFrame([result])
mean_age_survived mean_age_not_survived n_survived n_not_survived t_statistic p_value
0 28.34369 30.626179 290 424 -2.04603 0.04119

Using the observed results from the Titanic training data:

  • Average age of survivors: 28.34
  • Average age of non-survivors: 30.63
  • Sample sizes after removing missing ages: 290 survivors and 424 non-survivors

A Welch two-sample t-test was used, which does not assume equal variances. The test returned:

  • t = -2.046
  • p = 0.04119
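For reference, the standard textbook form of the Welch statistic and its approximate degrees of freedom is given here, with group 1 taken as survivors and group 2 as non-survivors (the symbols are introduced for illustration and do not appear in the notebook):

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

Here \bar{x}_1 = 28.34, \bar{x}_2 = 30.63, n_1 = 290 and n_2 = 424, so the negative sign of t simply reflects that the survivor mean is the smaller of the two.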

Since the p-value is below 0.05, there is statistically significant evidence at the 5% level that the average age differs between survivors and non-survivors.

In this sample, survivors were younger on average than non-survivors by about 2.28 years.
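A p-value says whether a difference is detectable, not how large it is. As a rough sketch, the helper below (a hypothetical name, `welch_with_effect_size`, not part of the original notebook) runs the same Welch test and adds Cohen's d computed with a pooled standard deviation. The input arrays are made-up values, not Titanic data.

```python
import numpy as np
from scipy import stats


def welch_with_effect_size(a, b):
    """Welch t-test plus Cohen's d (pooled SD) for two 1-D samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Drop missing values, mirroring the .dropna() calls in the notebook.
    a = a[~np.isnan(a)]
    b = b[~np.isnan(b)]

    test = stats.ttest_ind(a, b, equal_var=False)

    # Pooled variance from the two sample variances (ddof=1).
    pooled_var = (
        (a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)
    ) / (a.size + b.size - 2)
    cohens_d = (a.mean() - b.mean()) / np.sqrt(pooled_var)

    return {
        "mean_diff": a.mean() - b.mean(),
        "t_statistic": test.statistic,
        "p_value": test.pvalue,
        "cohens_d": cohens_d,
    }


# Illustrative values only (assumption: not taken from the Titanic data).
group_a = [22.0, 25.0, 27.0, 30.0, np.nan, 24.0]
group_b = [29.0, 31.0, 35.0, 28.0, 40.0, 33.0]
summary = welch_with_effect_size(group_a, group_b)
print(summary)
```

In the notebook itself the call would be `welch_with_effect_size(age_survived, age_not_survived)`. By Cohen's common rule of thumb, an absolute d near 0.2 is usually read as a small effect.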

test whether survival rate is independent of passenger class using a chi-square test
Thinking

Planning chi-square test

I need to respond with a clear next step for performing a chi-square test on passenger survival rates and class. I’ll analyze existing outputs and the notebook state. The user asked for code that constructs a contingency table and uses scipy.stats.chi2_contingency. I might display the table and results together carefully, ensuring everything flows well since I'll only provide one code block. I have to remember to include an explanation alongside the code.

Clarifying chi-square approach

I want to focus on computing a contingency table and performing the chi-square test in one step. Since the notebook allows only one table per output, I’ll stick to providing a single DataFrame with the test summary and no extra tables. It might be helpful to calculate the survival rates by class, but it isn't necessary for this task. I'll clearly state the null hypothesis and call chi2_contingency using the crosstab. Finally, I’ll return a DataFrame with the results like chi-square value, p-value, and degrees of freedom.

We already have the Titanic data loaded, so the next step is the chi-square test on the contingency table of Pclass vs Survived.

Test Pclass survival independence
from scipy.stats import chi2_contingency

contingency = pd.crosstab(df["Pclass"], df["Survived"])
chi2, p_value, dof, expected = chi2_contingency(contingency)

pd.DataFrame([{
    "chi2_statistic": chi2,
    "degrees_of_freedom": dof,
    "p_value": p_value,
    "independent_at_5pct": p_value >= 0.05
}])
chi2_statistic degrees_of_freedom p_value independent_at_5pct
0 102.888989 2 4.549252e-23 False

The chi-square test shows that survival is not independent of passenger class.

From the observed output:

  • χ² = 102.89
  • degrees of freedom = 2
  • p = 4.55 × 10^-23
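For reference, the statistic compares the observed counts O_{ij} in the Pclass × Survived table with the counts E_{ij} expected under independence. This is the standard Pearson form, written with symbols that are not in the notebook:

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n},
\qquad
\text{dof} = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
```

With three passenger classes (r = 3) and two survival outcomes (c = 2), this reproduces the 2 degrees of freedom reported above.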

Because the p-value is far below 0.05, we reject the null hypothesis of independence. This means there is very strong evidence of an association between Pclass and Survived.

So, in this Titanic dataset, survival rate differs by passenger class.
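The test says that the rates differ but not by how much. One way to look at that, sketched here on hypothetical toy data rather than the real passengers, is a row-normalised crosstab together with Cramér's V as a rough strength-of-association measure. The helper `cramers_v` is an illustrative name and uses the plain formula without bias correction.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(table):
    """Cramér's V for a contingency table (no bias correction)."""
    chi2, _, _, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return np.sqrt(chi2 / (n * min_dim))


# Hypothetical toy data standing in for df[["Pclass", "Survived"]].
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
})

counts = pd.crosstab(toy["Pclass"], toy["Survived"])
# Row-normalised table: share of each class that did / did not survive.
rates = pd.crosstab(toy["Pclass"], toy["Survived"], normalize="index")

print(rates)
print("Cramér's V:", cramers_v(counts))
```

On the notebook's data, the same two crosstab calls on `df["Pclass"]` and `df["Survived"]` would give the per-class survival rates behind the chi-square result.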

test whether fare differs significantly across the three passenger classes using ANOVA
Thinking

Planning ANOVA Analysis

I need to provide an answer for a new user prompt about testing whether fare differs significantly across three passenger classes using ANOVA. I'll use the notebook state to load the DataFrame and write a single code block to run ANOVA on fare across Pclass groups, dropping any NA values. It seems best to create a simple result DataFrame with the F statistic, p-value, and possibly means. I want to keep the explanation concise and focused on the code.

Creating ANOVA Summary Table

The user wants to know if the fares differ significantly, so I need to return a DataFrame that includes sample sizes and means, along with ANOVA test statistics in one row. It would be best to provide a summary that has these values for each class but keep everything in a single table. I'll ensure to mention that fares might be skewed, but ANOVA is still standard for this analysis since they specifically asked for it.

Let’s run a one-way ANOVA for Fare across the three Pclass groups.

Compare fares across classes
from scipy.stats import f_oneway

fare_1 = df.loc[df["Pclass"] == 1, "Fare"].dropna()
fare_2 = df.loc[df["Pclass"] == 2, "Fare"].dropna()
fare_3 = df.loc[df["Pclass"] == 3, "Fare"].dropna()

f_stat, p_value = f_oneway(fare_1, fare_2, fare_3)

pd.DataFrame([{
    "mean_fare_class_1": fare_1.mean(),
    "mean_fare_class_2": fare_2.mean(),
    "mean_fare_class_3": fare_3.mean(),
    "n_class_1": fare_1.shape[0],
    "n_class_2": fare_2.shape[0],
    "n_class_3": fare_3.shape[0],
    "f_statistic": f_stat,
    "p_value": p_value,
    "differs_at_5pct": p_value < 0.05
}])
mean_fare_class_1 mean_fare_class_2 mean_fare_class_3 n_class_1 n_class_2 n_class_3 f_statistic p_value differs_at_5pct
0 84.154687 20.662183 13.67555 216 184 491 242.344157 1.031376e-84 True

Yes—Fare differs significantly across the three passenger classes based on the ANOVA results.

Observed results:

  • Mean fare in class 1: 84.15
  • Mean fare in class 2: 20.66
  • Mean fare in class 3: 13.68
  • ANOVA F = 242.34
  • p = 1.03 × 10^-84

Interpretation:

  • The p-value is far below 0.05, so we reject the null hypothesis that all three passenger classes have the same mean fare.
  • This provides extremely strong evidence that average fare is not the same across classes.
  • The class means also show a clear pattern: first class has the highest average fare, second class is much lower, and third class is the lowest.

So, fare varies significantly by passenger class in this dataset.
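Fare distributions tend to be right-skewed, so a rank-based check next to the ANOVA can be reassuring. The sketch below uses made-up fares for three groups and pairs `f_oneway` with `scipy.stats.kruskal`, plus an eta-squared helper (`eta_squared` is an illustrative name, not from the notebook) for the share of variance explained by group.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal


def eta_squared(*groups):
    """Share of total variance explained by group (SS_between / SS_total)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total


# Made-up fares for three groups (assumption: illustrative only).
g1 = [80.0, 95.0, 60.0, 120.0, 75.0]
g2 = [20.0, 26.0, 13.0, 30.0, 21.0]
g3 = [7.9, 8.1, 15.0, 7.3, 9.5]

f_stat, p_anova = f_oneway(g1, g2, g3)
h_stat, p_kruskal = kruskal(g1, g2, g3)  # rank-based, less sensitive to skew

print({
    "F": f_stat,
    "p_anova": p_anova,
    "H": h_stat,
    "p_kruskal": p_kruskal,
    "eta_squared": eta_squared(g1, g2, g3),
})
```

In the notebook, the three inputs would be `fare_1`, `fare_2`, and `fare_3`. If both tests agree, the conclusion is less likely to be an artifact of skew or outliers.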

summarise all three test results and their business interpretation

Here’s a concise summary of the three statistical tests and what they mean in business terms.

| Question | Test | Key result | Conclusion |
|---|---|---:|---|
| Do survivors and non-survivors differ in age? | Welch two-sample t-test | t = -2.046, p = 0.041 | Yes, slightly. Survivors were younger on average. |
| Is passenger class associated with survival? | Chi-square test of independence | χ² = 102.889, p = 4.55 × 10^-23 | Yes, very strongly. Survival depended on class. |
| Does fare differ across the three passenger classes? | One-way ANOVA | F = 242.344, p = 1.03 × 10^-84 | Yes, very strongly. Mean fare differs across classes. |
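The same summary can also be held as a single DataFrame. This is a minimal sketch that types in the statistics and p-values reported earlier in this walkthrough; in a live notebook they would be read from the test results instead. The column names are chosen for illustration.

```python
import pandas as pd

# Values copied from the outputs reported in this walkthrough; in a live
# notebook they would come from the test objects rather than literals.
results = [
    {"question": "Age differs by survival?",
     "test": "Welch two-sample t-test",
     "statistic": -2.04603, "p_value": 0.04119},
    {"question": "Survival independent of class?",
     "test": "Chi-square test of independence",
     "statistic": 102.888989, "p_value": 4.549252e-23},
    {"question": "Fare differs across classes?",
     "test": "One-way ANOVA",
     "statistic": 242.344157, "p_value": 1.031376e-84},
]

summary_table = pd.DataFrame(results)
alpha = 0.05  # significance level used throughout the walkthrough
summary_table["reject_null_at_5pct"] = summary_table["p_value"] < alpha
print(summary_table)
```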

1) Age vs survival

  • Mean age of survivors: 28.34
  • Mean age of non-survivors: 30.63
  • Since p=0.041 < 0.05, the age difference is statistically significant.

Business interpretation:
Younger passengers had a modest survival advantage. Age matters, but the difference is not huge, so it is likely a secondary factor rather than the main driver.

2) Passenger class vs survival

  • The p-value is extremely small: 4.55 × 10^-23
  • This means survival and passenger class are clearly not independent.

Business interpretation:
Passenger class was strongly associated with survival outcomes. In practical terms, class may reflect unequal access to safety, priority, or onboard conditions, although the test itself shows association rather than cause. If this were a customer-risk or service-outcome setting, it would suggest that customer tier is strongly linked to results.

3) Fare vs passenger class

  • Mean fare, class 1: 84.15
  • Mean fare, class 2: 20.66
  • Mean fare, class 3: 13.68
  • The p-value is extremely small: 1.03 × 10^-84

Business interpretation:
Fare levels are clearly separated by class, which confirms that Pclass represents a real pricing segmentation. First-class passengers paid much more on average than second- and third-class passengers. This supports treating class as a strong proxy for customer value or socioeconomic tier.

Overall takeaway

Together, these three tests tell a consistent story:

  • Class matters a lot
    • It is strongly linked to both survival and fare.
  • Age matters somewhat
    • Survivors were a bit younger on average, but the effect is much weaker than class.
  • Fare and class align closely
    • Higher-class passengers paid substantially more, so fare and class both capture passenger status.

If you want, the next useful step is a short executive summary paragraph or a single chart for each result.
