Hypothesis Testing in Python (t-test, ANOVA)

Perform t-tests, chi-square tests, and ANOVA using real data to answer business questions — guided by an AI data analyst.

What

This AI Data Analyst workflow loads the Titanic training dataset from a URL and runs a two-sample t-test to compare average age between survivors and non-survivors. It performs a chi-square test of independence to assess whether survival is associated with passenger class. It runs a one-way ANOVA to test whether fare differs across the three passenger classes and summarizes the results in plain language.

Who

This is for analysts and students who need a concrete, code-driven example of hypothesis testing on a real dataset. It helps anyone answering business-style questions with statistical tests and interpretable outputs in a notebook.

Tools

pandas
numpy
scipy
statsmodels
matplotlib
seaborn

Outcomes

t-test result comparing survivor vs non-survivor age with p-value and interpretation
chi-square test result for survival vs passenger class independence with p-value and interpretation
one-way ANOVA result for fare differences across classes with p-value and interpretation
plain-language summary table consolidating all three hypothesis tests

Quality Score

9/10

gpt-5.2 mljar-ai

Last scored: Apr 7, 2026

Task Completion: 2/2
Excellent
All three requested hypothesis tests (Welch t-test on Age by survival, chi-square independence for Survived×Pclass, and one-way ANOVA for Fare by Pclass) were performed and interpreted, and a combined narrative summary was provided.
Execution Correctness: 2/2
Excellent
The Python code uses appropriate libraries (pandas, scipy.stats) and correct functions/inputs (ttest_ind with equal_var=False, chi2_contingency on crosstab, f_oneway on grouped fares); outputs shown are consistent with the code.
Output Quality: 2/3
Good
Outputs match expected outcomes semantically: t-test p<0.05 with survivors younger, chi-square p<<0.001 showing dependence, and ANOVA p<<0.001 showing fare differs by class. However, the expected 'plain-language summary table of all three tests' is not actually presented as a single table (only narrative sections and separate result DataFrames).
Reasoning Quality: 2/2
Excellent
Reasoning correctly states null hypotheses, uses p-values to make decisions, and provides sensible business interpretations (association of class with survival; fare tied to class; age effect small).
Reliability: 1/1
Excellent
Workflow is reasonably robust (handles missing Age/Fare via dropna, uses Welch’s t-test to avoid equal-variance assumption) and avoids unsupported claims beyond the test results.

load titanic data from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/Titanic/train.csv and test whether average age differs significantly between survivors and non-survivors using a t-test

Thinking

Setting up data display

I'm planning to use pandas to import a CSV file from a URL. First, I’ll read the file with pd.read_csv(url), then I’ll display the shape of the DataFrame and its columns. I’ll use df.shape to get the number of rows and columns and create a simple display of that information. Finally, I’ll show the first few rows with df.head(). This approach seems fine!

Let’s start by loading the Titanic training data and doing a quick inspection so we can verify the columns before running the t-test.

Load Titanic data

import pandas as pd

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/Titanic/train.csv"
df = pd.read_csv(url)

display(pd.DataFrame({"rows": [df.shape[0]], "columns_count": [df.shape[1]]}))
display(pd.DataFrame({"column_name": df.columns}))
df.head()

	rows	columns_count
0	891	12

	column_name
0	PassengerId
1	Survived
2	Pclass
3	Name
4	Sex
5	Age
6	SibSp
7	Parch
8	Ticket
9	Fare
10	Cabin
11	Embarked

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S

The data is in place and the columns needed for your question are confirmed: Survived and Age are both present, so the missing piece is the actual statistical test.

Compare age by survival

from scipy import stats

# Ages by survival outcome, with missing ages dropped
age_survived = df.loc[df["Survived"] == 1, "Age"].dropna()
age_not_survived = df.loc[df["Survived"] == 0, "Age"].dropna()

# Welch's t-test (equal_var=False), run once and reused for both fields
ttest = stats.ttest_ind(age_survived, age_not_survived, equal_var=False)

result = {
    "mean_age_survived": age_survived.mean(),
    "mean_age_not_survived": age_not_survived.mean(),
    "n_survived": age_survived.shape[0],
    "n_not_survived": age_not_survived.shape[0],
    "t_statistic": ttest.statistic,
    "p_value": ttest.pvalue,
}

pd.DataFrame([result])

	mean_age_survived	mean_age_not_survived	n_survived	n_not_survived	t_statistic	p_value
0	28.34369	30.626179	290	424	-2.04603	0.04119

Using the observed results from the Titanic training data:

Average age of survivors: 28.34
Average age of non-survivors: 30.63
Sample sizes after removing missing ages: 290 survivors and 424 non-survivors

A Welch two-sample t-test was used, which does not assume equal variances. The test returned:

t = -2.046
p = 0.04119

Since the p-value is below 0.05, there is statistically significant evidence at the 5% level that the average age differs between survivors and non-survivors.

In this sample, survivors were younger on average than non-survivors by about 2.28 years.
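
To make the Welch statistic less of a black box, here is a minimal sketch that computes t and the Welch-Satterthwaite degrees of freedom by hand and checks them against scipy. The `welch_t` helper and the two small age lists are hypothetical and are not taken from the Titanic data.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each group mean
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1)
    )
    return t, df


# Hypothetical ages, NOT the Titanic values
group_a = [22, 38, 26, 35, 27, 14, 4, 58]
group_b = [35, 54, 2, 20, 39, 55, 31, 40, 66]

t_manual, df_manual = welch_t(group_a, group_b)
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)  # two-sided p-value

reference = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_manual, p_manual)
print(reference.statistic, reference.pvalue)
```

Both printed pairs should agree, which shows that `equal_var=False` only changes the standard error and the degrees of freedom, not the difference in means being tested.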

test whether survival rate is independent of passenger class using a chi-square test

Thinking

Planning chi-square test

I need to respond with a clear next step for performing a chi-square test on passenger survival rates and class. I’ll analyze existing outputs and the notebook state. The user asked for code that constructs a contingency table and uses scipy.stats.chi2_contingency. I might display the table and results together carefully, ensuring everything flows well since I'll only provide one code block. I have to remember to include an explanation alongside the code.

Clarifying chi-square approach

I want to focus on computing a contingency table and performing the chi-square test in one step. Since the notebook allows only one table per output, I’ll stick to providing a single DataFrame with the test summary and no extra tables. It might be helpful to calculate the survival rates by class, but it isn't necessary for this task. I'll clearly state the null hypothesis and call chi2_contingency using the crosstab. Finally, I’ll return a DataFrame with the results like chi-square value, p-value, and degrees of freedom.

We already have the Titanic data loaded, so the next step is the chi-square test on the contingency table of Pclass vs Survived.

Test Pclass survival independence

from scipy.stats import chi2_contingency

contingency = pd.crosstab(df["Pclass"], df["Survived"])
chi2, p_value, dof, expected = chi2_contingency(contingency)

pd.DataFrame([{
    "chi2_statistic": chi2,
    "degrees_of_freedom": dof,
    "p_value": p_value,
    "independent_at_5pct": p_value >= 0.05
}])

	chi2_statistic	degrees_of_freedom	p_value	independent_at_5pct
0	102.888989	2	4.549252e-23	False

The chi-square test shows that survival is not independent of passenger class.

From the observed output:

$\chi^2 = 102.89$
degrees of freedom = 2
$p = 4.55 \times 10^{-23}$

Because the p-value is far below 0.05, we reject the null hypothesis of independence. This means there is very strong evidence of an association between Pclass and Survived.

So, in this Titanic dataset, survival rate differs by passenger class.
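
The chi-square statistic can also be rebuilt from the contingency table itself. The sketch below does that by hand and adds Cramér's V as a rough effect size. The transcript never prints the crosstab, so the counts used here are an assumption: they are the class-by-survival counts usually quoted for this training file, and they should be checked against `pd.crosstab(df["Pclass"], df["Survived"])` before being reused.

```python
import numpy as np
from scipy.stats import chi2_contingency

# ASSUMED counts (rows: Pclass 1, 2, 3; columns: died, survived).
# Not shown in the transcript; verify against the real crosstab.
observed = np.array([
    [80, 136],
    [97, 87],
    [372, 119],
])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()

# Expected counts under independence: row total * column total / n
expected = row_totals * col_totals / n
chi2_manual = ((observed - expected) ** 2 / expected).sum()

chi2, p_value, dof, expected_scipy = chi2_contingency(observed)

# Cramér's V rescales chi-square to a 0-1 measure of association
cramers_v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))
survival_rate = observed[:, 1] / observed.sum(axis=1)

print(chi2_manual, chi2, dof, p_value)
print(cramers_v, survival_rate)
```

With these counts the hand-computed statistic reproduces the reported value of about 102.89, and the per-class survival rates show the direction of the association, which the p-value alone does not.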

test whether fare differs significantly across the three passenger classes using ANOVA

Thinking

Planning ANOVA Analysis

I need to provide an answer for a new user prompt about testing whether fare differs significantly across three passenger classes using ANOVA. I'll use the notebook state to load the DataFrame and write a single code block to run ANOVA on fare across Pclass groups, dropping any NA values. It seems best to create a simple result DataFrame with the F statistic, p-value, and possibly means. I want to keep the explanation concise and focused on the code.

Creating ANOVA Summary Table

The user wants to know if the fares differ significantly, so I need to return a DataFrame that includes sample sizes and means, along with ANOVA test statistics in one row. It would be best to provide a summary that has these values for each class but keep everything in a single table. I'll ensure to mention that fares might be skewed, but ANOVA is still standard for this analysis since they specifically asked for it.

Let’s run a one-way ANOVA for Fare across the three Pclass groups.

Compare fares across classes

from scipy.stats import f_oneway

fare_1 = df.loc[df["Pclass"] == 1, "Fare"].dropna()
fare_2 = df.loc[df["Pclass"] == 2, "Fare"].dropna()
fare_3 = df.loc[df["Pclass"] == 3, "Fare"].dropna()

f_stat, p_value = f_oneway(fare_1, fare_2, fare_3)

pd.DataFrame([{
    "mean_fare_class_1": fare_1.mean(),
    "mean_fare_class_2": fare_2.mean(),
    "mean_fare_class_3": fare_3.mean(),
    "n_class_1": fare_1.shape[0],
    "n_class_2": fare_2.shape[0],
    "n_class_3": fare_3.shape[0],
    "f_statistic": f_stat,
    "p_value": p_value,
    "differs_at_5pct": p_value < 0.05
}])

	mean_fare_class_1	mean_fare_class_2	mean_fare_class_3	n_class_1	n_class_2	n_class_3	f_statistic	p_value	differs_at_5pct
0	84.154687	20.662183	13.67555	216	184	491	242.344157	1.031376e-84	True

Yes—Fare differs significantly across the three passenger classes based on the ANOVA results.

Observed results:

Mean fare in class 1: 84.15
Mean fare in class 2: 20.66
Mean fare in class 3: 13.68
ANOVA F = 242.34
$p = 1.03 \times 10^{-84}$

Interpretation:

The p-value is far below 0.05, so we reject the null hypothesis that all three passenger classes have the same mean fare.
This provides extremely strong evidence that average fare is not the same across classes.
The class means also show a clear pattern: first class has the highest average fare, second class is much lower, and third class is the lowest.

So, fare varies significantly by passenger class in this dataset.
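
For readers who want to see where the F statistic comes from, this sketch splits the total variation into between-group and within-group sums of squares and compares the result with `f_oneway`. It also reports eta squared, the share of variance explained by group membership. The three fare arrays are hypothetical and are not the Titanic values.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical fares for three groups, NOT the Titanic values
groups = [
    np.array([71.3, 53.1, 51.9, 26.6, 83.5]),
    np.array([13.0, 26.0, 10.5, 21.0]),
    np.array([7.25, 7.93, 8.05, 8.46, 21.1, 11.1]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)       # number of groups
n = all_values.size   # total number of observations

# Variation of group means around the grand mean, weighted by group size
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
eta_squared = ss_between / (ss_between + ss_within)

reference = f_oneway(*groups)
print(f_manual, reference.statistic, reference.pvalue, eta_squared)
```

A large F means the group means are far apart relative to the spread inside each group. Because fares tend to be right-skewed, it may be worth repeating the comparison with a rank-based test such as `scipy.stats.kruskal` as a robustness check.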

summarise all three test results and their business interpretation

Here’s a concise summary of the three statistical tests and what they mean in business terms.

| Question | Test | Key result | Conclusion |
|---|---|---:|---|
| Do survivors and non-survivors differ in age? | Welch two-sample t-test | $t=-2.046$, $p=0.041$ | Yes, slightly. Survivors were younger on average. |
| Is passenger class associated with survival? | Chi-square test of independence | $\chi^2=102.889$, $p=4.55 \times 10^{-23}$ | Yes, very strongly. Survival depended on class. |
| Does fare differ across the three passenger classes? | One-way ANOVA | $F=242.344$, $p=1.03 \times 10^{-84}$ | Yes, very strongly. Mean fare differs across classes. |
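
A table like the one above could also be assembled as a single DataFrame in the notebook. This is a minimal sketch: the statistics and p-values are copied from the outputs reported earlier in this transcript, while the column names and the `ALPHA` constant are illustrative choices.

```python
import pandas as pd

ALPHA = 0.05  # significance level used throughout the transcript

# Values copied from the three result tables reported earlier
results = [
    {"question": "Does mean age differ by survival?",
     "test": "Welch two-sample t-test",
     "statistic": -2.04603, "p_value": 0.04119},
    {"question": "Is survival independent of passenger class?",
     "test": "Chi-square test of independence",
     "statistic": 102.888989, "p_value": 4.549252e-23},
    {"question": "Does mean fare differ across classes?",
     "test": "One-way ANOVA",
     "statistic": 242.344157, "p_value": 1.031376e-84},
]

summary = pd.DataFrame(results)
summary["reject_null_at_5pct"] = summary["p_value"] < ALPHA

print(summary.to_string(index=False))
```

In a live notebook the hard-coded numbers would normally be replaced by the variables returned from each test, so the table stays in sync with the data.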

1) Age vs survival

Mean age of survivors: 28.34
Mean age of non-survivors: 30.63
Since p=0.041 < 0.05, the age difference is statistically significant.

Business interpretation:
Younger passengers had a modest survival advantage. Age matters, but the difference is not huge, so it is likely a secondary factor rather than the main driver.
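
One way to put a number on "not huge" is a standardized effect size such as Cohen's d. The transcript does not report standard deviations, so d for the Titanic ages cannot be derived from the figures above. The sketch below only illustrates the calculation, using a hypothetical `cohens_d` helper and toy data.

```python
import numpy as np


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = (
        (a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)
    ) / (a.size + b.size - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Toy data, NOT Titanic ages: means differ by 1 and both variances equal 1
d = cohens_d([1, 2, 3], [2, 3, 4])
print(d)
```

On the real data this would be called as `cohens_d(age_survived, age_not_survived)`. By the usual rule of thumb, an absolute d near 0.2 is considered small.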

2) Passenger class vs survival

The p-value is extremely small: $4.55 \times 10^{-23}$
This means survival and passenger class are clearly not independent.

Business interpretation:
Passenger class was a major factor in survival outcomes. In practical terms, class reflects unequal access to safety, priority, or onboard conditions. If this were a customer-risk or service-outcome setting, it would suggest that customer tier strongly affects results.

3) Fare vs passenger class

Mean fare, class 1: 84.15
Mean fare, class 2: 20.66
Mean fare, class 3: 13.68
The p-value is extremely small: $1.03 \times 10^{-84}$

Business interpretation:
Fare levels are clearly separated by class, which confirms that Pclass represents a real pricing segmentation. First-class passengers paid much more on average than second- and third-class passengers. This supports treating class as a strong proxy for customer value or socioeconomic tier.

Overall takeaway

Together, these three tests tell a consistent story:

Class matters a lot
- It is strongly linked to both survival and fare.
Age matters somewhat
- Survivors were a bit younger on average, but the effect is much weaker than class.
Fare and class align closely
- Higher-class passengers paid substantially more, so fare and class both capture passenger status.

If you want, the next useful step is a short executive summary paragraph or a single chart for each result.
