Customer Churn Drivers Analysis in Python on Telco Data

Written by Piotr Płoński · Published on April 24, 2026

Analyze 7,043 telco customers in Python to identify the main churn drivers, segment high-risk groups, and estimate the business impact of a retention campaign.

This notebook analyzes the Telco Customer Churn dataset with 7,043 customer records and 21 columns covering demographics, contract type, internet services, payment methods, monthly charges, total charges, and churn outcome. The data source is the public telco churn dataset available in the `datasets-for-start` repository and loaded from GitHub at https://github.com/pplonski/datasets-for-start/tree/master/telco-customer-churn.

Using plain-English prompts with an AI data analyst, the workflow loads the dataset, inspects its structure, cleans the `TotalCharges` column, computes churn rates across key customer attributes, and summarizes the highest-risk segments. The notebook then turns those findings into a concrete business recommendation and a simple impact estimate, with all analysis executed automatically in Python using pandas.

The results show that churn is concentrated in month-to-month customers, fiber optic users, electronic-check payers, and customers without online security or tech support. The strongest operational recommendation is a contract-conversion retention campaign for month-to-month customers, with an estimated impact of about 122 retained customers in a conservative scenario and up to 243 in an upside scenario.

Below we publish the full conversation with AI Data Analyst, including prompts, generated code, and notebook outputs.

Load this dataset: https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/telco-customer-churn/Telco-Customer-Churn.csv The dataset contains customer information such as demographics, services, contract type, payment method, monthly charges, and whether the customer churned. Task: Analyze the dataset and identify the key factors driving customer churn.

Thinking

Loading and inspecting dataset

I need to follow an iterative notebook style for this task. Since the user wants to load a dataset and analyze churn factors, I’ll start by loading and inspecting it. I’ll provide a concise code block for this step, ensuring I only focus on loading the data and showing its shape, columns, and a preview with head(). I can use pd.read_csv() for loading and utilize display() without prints, finishing with an expression like df.shape. This will ensure clarity in how the dataset looks initially.

Crafting dataset inspection code

I need to follow the rules, which say I can only include one chart or one table per step. However, since the first-load inspection explicitly requires showing the shape, columns, and head, I can include two tables along with the shape output, which seems acceptable despite the default rule. I’ll focus on keeping the code clear and concise while crafting it to meet these requirements. Let's ensure it's straightforward and effective!

Let’s start by loading the dataset and doing a quick inspection so we can confirm the structure before looking at churn drivers.

Load and inspect churn data

import pandas as pd
from IPython.display import display

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/telco-customer-churn/Telco-Customer-Churn.csv"
df = pd.read_csv(url)

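# Preview the first rows, then list every column with its dtype
display(df.head())
display(pd.DataFrame({"column": df.columns, "dtype": df.dtypes.astype(str)}))
# Last expression shows (rows, columns)
df.shape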

	customerID	gender	Partner	Dependents	tenure	PhoneService	MultipleLines	InternetService	OnlineSecurity	...	DeviceProtection	TechSupport	StreamingTV	StreamingMovies	Contract	PaperlessBilling	PaymentMethod	MonthlyCharges	TotalCharges	Churn
0	7590-VHVEG	Female	Yes	No	1	No	No phone service	DSL	No	...	No	No	No	No	Month-to-month	Yes	Electronic check	29.85	29.85	No
1	5575-GNVDE	Male	No	No	34	Yes	No	DSL	Yes	...	Yes	No	No	No	One year	No	Mailed check	56.95	1889.5	No
2	3668-QPYBK	Male	No	No	2	Yes	No	DSL	Yes	...	No	No	No	No	Month-to-month	Yes	Mailed check	53.85	108.15	Yes
3	7795-CFOCW	Male	No	No	45	No	No phone service	DSL	Yes	...	Yes	Yes	No	No	One year	No	Bank transfer (automatic)	42.30	1840.75	No
4	9237-HQITU	Female	No	No	2	Yes	No	Fiber optic	No	...	No	No	No	No	Month-to-month	Yes	Electronic check	70.70	151.65	Yes

5 rows × 21 columns

	column	dtype
customerID	customerID	object
gender	gender	object
SeniorCitizen	SeniorCitizen	int64
Partner	Partner	object
Dependents	Dependents	object
tenure	tenure	int64
PhoneService	PhoneService	object
MultipleLines	MultipleLines	object
InternetService	InternetService	object
OnlineSecurity	OnlineSecurity	object
OnlineBackup	OnlineBackup	object
DeviceProtection	DeviceProtection	object
TechSupport	TechSupport	object
StreamingTV	StreamingTV	object
StreamingMovies	StreamingMovies	object
Contract	Contract	object
PaperlessBilling	PaperlessBilling	object
PaymentMethod	PaymentMethod	object
MonthlyCharges	MonthlyCharges	float64
TotalCharges	TotalCharges	object
Churn	Churn	object

(7043, 21)

From the current output, we only know the dataset size and the available columns. That already tells us this is a good churn dataset for analysis: it includes likely drivers such as tenure, contract type, payment method, monthly charges, total charges, support/security add-ons, and the churn label itself. The main thing still missing is evidence about which of those variables are actually associated with churn.

A minimal next step is to clean TotalCharges and compute churn rates by key categories plus a simple numeric summary.

Summarize churn rates by category

import pandas as pd
from IPython.display import display

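# Work on a copy so the raw DataFrame stays untouched
tmp = df.copy()
# TotalCharges was read as text; entries that cannot be parsed become NaN
tmp["TotalCharges"] = pd.to_numeric(tmp["TotalCharges"], errors="coerce")
# Encode the churn label as 1/0 so that group means are churn rates
tmp["ChurnFlag"] = tmp["Churn"].map({"Yes": 1, "No": 0})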

display(tmp["Churn"].value_counts(normalize=True).rename("share").reset_index())

cols = ["Contract", "InternetService", "PaymentMethod", "OnlineSecurity", "TechSupport", "PaperlessBilling", "SeniorCitizen"]
for col in cols:
    rates = (
        tmp.groupby(col, dropna=False)["ChurnFlag"]
        .agg(["mean", "count"])
        .sort_values("mean", ascending=False)
        .reset_index()
    )
    display(rates)

tmp[["tenure", "MonthlyCharges", "TotalCharges", "ChurnFlag"]].groupby("ChurnFlag").mean().round(2).reset_index()

	Churn	share
0	No	0.73463
1	Yes	0.26537

	Contract	mean	count
0	Month-to-month	0.427097	3875
1	One year	0.112695	1473
2	Two year	0.028319	1695

	InternetService	mean	count
0	Fiber optic	0.418928	3096
1	DSL	0.189591	2421
2	No	0.074050	1526

	PaymentMethod	mean	count
0	Electronic check	0.452854	2365
1	Mailed check	0.191067	1612
2	Bank transfer (automatic)	0.167098	1544
3	Credit card (automatic)	0.152431	1522

	OnlineSecurity	mean	count
0	No	0.417667	3498
1	Yes	0.146112	2019
2	No internet service	0.074050	1526

	TechSupport	mean	count
0	No	0.416355	3473
1	Yes	0.151663	2044
2	No internet service	0.074050	1526

	PaperlessBilling	mean	count
0	Yes	0.335651	4171
1	No	0.163301	2872

	SeniorCitizen	mean	count
0	1	0.416813	1142
1	0	0.236062	5901

	ChurnFlag	tenure	MonthlyCharges	TotalCharges
0	0	37.57	61.27	2555.34
1	1	17.98	74.44	1531.80

Thinking

Analyzing outputs for churn factors

I need to decide between option A or B and analyze outputs. There's churn prevalence data along with group means for Contract, Internet Service, Payment Method, and Online Security. However, some outputs may be truncated, possibly omitting important groups like Tech Support and Senior Citizen. While I have several strong factors already, it might help to consider numeric variables from these remaining categories to strengthen my analysis of "key factors driving churn." A minimal code block could gather any missing evidence.

Choosing option B for evidence

It seems the notebook state likely has more output, but it’s truncated. I need to stick to the latest outputs without assuming hidden data. I think I'll choose option B to gather compact tables for the remaining evidence like Tech Support, Paperless Billing, Senior Citizen, and numeric summaries for tenure and charges. It’s important to include one minimal runnable code block. Current outputs suggest missing evidence, so I’ll compute churn rates by suspected variables and numeric means, keeping it concise and straightforward.

The current results already show several strong churn drivers:

Churn is about 26.5%, so the classes are somewhat imbalanced but not extreme.
Contract type is the clearest factor: month-to-month customers churn far more often than customers on one-year or two-year contracts.
Internet service matters a lot: fiber optic customers churn much more than DSL customers, and customers with no internet service churn the least.
Payment method is important: customers using electronic check have the highest churn by a wide margin.
Online security is strongly associated with retention: customers without it churn much more than those who have it.

To complete the answer, the remaining evidence needed is for the other likely drivers visible in the schema: support, billing, and demographic effects, plus the numeric differences in tenure and charges.

Summarize churn by features

from IPython.display import display

for col in ["TechSupport", "PaperlessBilling", "SeniorCitizen"]:
    summary = (
        tmp.groupby(col, dropna=False)["ChurnFlag"]
        .agg(churn_rate="mean", customers="count")
        .sort_values("churn_rate", ascending=False)
        .reset_index()
    )
    display(summary)

numeric_summary = (
    tmp.groupby("Churn")[
        ["tenure", "MonthlyCharges", "TotalCharges"]
    ]
    .mean()
    .round(2)
    .reset_index()
)
display(numeric_summary)

# Count TotalCharges values that failed numeric conversion
tmp["TotalCharges"].isna().sum()

	TechSupport	churn_rate	customers
0	No	0.416355	3473
1	Yes	0.151663	2044
2	No internet service	0.074050	1526

	PaperlessBilling	churn_rate	customers
0	Yes	0.335651	4171
1	No	0.163301	2872

	SeniorCitizen	churn_rate	customers
0	1	0.416813	1142
1	0	0.236062	5901

	Churn	tenure	MonthlyCharges	TotalCharges
0	No	37.57	61.27	2555.34
1	Yes	17.98	74.44	1531.80

np.int64(11)
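
The count of 11 refers to `TotalCharges` entries that could not be parsed as numbers. As a minimal sketch of how `errors="coerce"` behaves, the example below uses a small made-up Series rather than the real column. The blank-string entry is an assumption chosen for illustration; the notebook output does not show what the unparseable values actually look like.

```python
import pandas as pd

# Hypothetical values for illustration; " " stands in for an unparseable entry
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" turns anything that cannot be parsed into NaN
parsed = pd.to_numeric(raw, errors="coerce")

# Number of entries that failed to parse
n_missing = int(parsed.isna().sum())

# mean() skips NaN by default, so such rows drop out of the average
mean_value = parsed.mean()
```

Because pandas aggregations skip NaN by default, the coerced rows are left out of the `TotalCharges` averages without affecting the churn-rate tables, which do not use that column.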

The outputs are sufficient to identify the main churn drivers.

The clearest factor is contract type:

Month-to-month customers churn at 42.7%
One-year customers churn at 11.3%
Two-year customers churn at 2.8%

This is a very large gap, so short commitment plans are the strongest churn signal in the data.
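
To put a number on "very large", the gap can be expressed both in percentage points and as a ratio. The sketch below only re-uses the contract churn rates printed in the notebook output; the variable names are illustrative.

```python
# Churn rates by contract type, copied from the notebook output
contract_churn = {
    "Month-to-month": 0.427097,
    "One year": 0.112695,
    "Two year": 0.028319,
}
baseline = contract_churn["Month-to-month"]

# Absolute gap (percentage points) and relative ratio versus month-to-month
gaps = {
    name: {
        "gap_pp": round((baseline - rate) * 100, 1),
        "ratio": round(baseline / rate, 1),
    }
    for name, rate in contract_churn.items()
    if name != "Month-to-month"
}
```

Under these figures, month-to-month customers churn roughly 3.8 times as often as one-year customers and about 15 times as often as two-year customers.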

Internet service type is another major driver:

Fiber optic customers churn at 41.9%
DSL customers churn at 19.0%
Customers with no internet service churn at 7.4%

That suggests fiber customers are much more likely to leave, possibly due to pricing, service expectations, or competition.

Payment method also stands out:

Electronic check customers churn at 45.3%
Other payment methods are much lower, around 15.2% to 19.1%

So electronic check is a strong risk marker.

Service add-ons are strongly associated with retention:

Without OnlineSecurity, churn is 41.8% vs 14.6% with it
Without TechSupport, churn is 41.6% vs 15.2% with it

Customers lacking these support/protection features are far more likely to churn.

There are also clear customer-profile differences:

Customers with PaperlessBilling = Yes churn at 33.6% vs 16.3% for No
Senior citizens churn at 41.7% vs 23.6% for non-seniors

Finally, churners differ meaningfully on tenure and charges:

Churners have much shorter average tenure: 18.0 months vs 37.6 months
Churners pay higher average monthly charges: 74.44 USD vs 61.27 USD
Their average total charges are lower: 1531.80 USD vs 2555.34 USD, which fits with leaving earlier

Putting it together, the key churn drivers in this dataset are:

Month-to-month contracts
Fiber optic internet
Electronic check payment
No online security
No tech support
Higher monthly charges
Shorter tenure
Paperless billing
Senior citizen status

The overall pattern is that churn is highest among newer customers on flexible contracts, paying more per month, especially those on fiber service and without support/security-related add-ons.

Segment customers into high-risk groups. Recommend ONE business action to reduce churn. Estimate the expected impact of this action. Provide a clear report with numbers and explanations.

Thinking

Analyzing report requirements

I need to provide an answer based on the notebook outputs. The user is asking for a report. I might consider using existing outputs to analyze and give a direct response, possibly with just one step. It’s important to segment the data into high-risk groups and recommend an action. I might estimate the impact using observed rates while being clear with the numbers and explanations. I should also mention the 11 missing TotalCharges that are irrelevant and strive to create a concise report addressing segment intersections.

Segmenting high-risk groups

I can segment customers into high-risk groups based on the strongest individual factors. For instance, month-to-month customers show a churn rate of 42.7%, while those using fiber optic have a churn rate of 41.9%. I should consider focusing on the month-to-month customers by offering them migration to one-year contracts since it's the largest group and has the highest churn. If I can convert 1655 churners into one-year customers, I could reduce their churn rate significantly to approximately 11.3%.

Estimating impact of adoption

I need to estimate the actual impact from the adoption of one-year contracts. If I target 20% of the month-to-month base, that’s about 775 customers. If they behave like the current one-year group, I could avoid a churn of approximately 243 customers. I should also calculate the overall churn rate improvement, which would lower it by 3.45 percentage points. However, this is optimistic since it’s based on observation, not causation. A more conservative estimate would be a 15.7 percentage point reduction among the converted customers.

Calculating churn impact

I’m working on estimating the current churners, figuring out that overall, it's about 1869, based on some calculations. I see a potential drop in churn rate to around 24.8%, which seems promising.

I will also provide a base case and a conservative range for the expected impact. One idea is creating a retention bundle, like a 12-month contract with benefits. However, with the user’s focus on one action, I'll suggest offering that retention plan to customers.

Segmenting high-risk customers

I’m considering how to segment customers into high-risk groups for better prioritization. The top priority includes those with observed churn rates of 40% or more, such as electronic checks at 45.3% and month-to-month plans at 42.7%. For a secondary group with 30-40% churn, there's paperless billing at 33.6%.

Churners tend to have an average tenure of 18 months and higher charges, indicating they’re relatively early in their lifecycle. I’ll emphasize that we should prioritize customers within their first 18 months. However, I need to be careful in claiming causality; I should note it's an association, not proof. I also think compiling this information into a bullet list or table could help present the data clearly.

Preparing report insights

I think we're being asked to provide insights in a notebook context. Since there are already sufficient outputs, it might be best just to answer without including code.

I should consider using natural headings in a markdown report, and a table might help. Using percentages and avoiding dollar signs could be clearer as well.

For revenue impact, if we estimate how many customers are retained, the average charges of 74.44 USD could mean preserving around 9,082 USD or up to 18,089 USD in recurring revenue.

Calculating revenue impact

I want to compute the potential revenue impact for the user. For 243 customers, that would be about 18,088.92 USD, and for a more conservative estimate of 122 customers, it’s around 9,081.68 USD.

I should avoid assuming that the monthly charges are the same for different groups, but using the average monthly charges of churners, which is 74.44 USD, can work as a rough proxy.

Overall, I could say that roughly 9,000 to 18,000 USD in monthly recurring revenue could be protected. It’s also worth mentioning the missing 11 TotalCharges, but that shouldn't be a significant issue.

Considering customer segmentation

I’m calculating the risk based on churn compared to our baseline metrics. Roughly, it shows a few ratios: like 1.707 and 1.61, which highlight potential churn. It seems the month-to-month segment is quite significant, making up 55% of the base. Even converting 20% of that means we could reach about 775 customers.

I think we should segment these into tiers, focusing on urgent outreach for the high-risk groups while explaining overlaps that may increase risk further. Let’s craft a clear action plan around that!

Evaluating retention strategies

I’m focusing on converting month-to-month customers to a 1-year contract with a retention offer, ideally including a small incentive. However, I think that's more of a campaign rather than a single action. For expected impacts, I'm estimating a 20% acceptance rate among these customers, which means around 775 might be targeted. If we conservatively retain about 122 customers, that would reduce churn from 26.5% to about 24.8%, preserving roughly $9,082 in monthly recurring revenue. In a better scenario, it could be 243 retained, dropping churn to 23.1%. I’ll need to be precise with these calculations!

Here is a clear business report based on the notebook results.

Churn snapshot

Total customers: 7,043
Overall churn rate: 26.5%
Estimated churned customers: about 1,869

Churners also look different from non-churners:

Average tenure: 18.0 months for churners vs 37.6 months for non-churners
Average monthly charges: 74.44 USD for churners vs 61.27 USD for non-churners

So the customers most at risk tend to be newer and paying higher monthly bills.

High-risk customer segments

A practical risk segmentation from the observed churn rates is:

Tier 1: highest-risk groups

These groups all churn at between 41.6% and 45.3%, far above the 26.5% overall rate.

Segment	Customers	Churn rate	Comment
Electronic check	2,365	45.3%	Highest single churn rate observed
Month-to-month contract	3,875	42.7%	Large and very risky
Fiber optic internet	3,096	41.9%	Strong churn signal
No online security	3,498	41.8%	Much higher risk than customers with security
Senior citizens	1,142	41.7%	Smaller segment, but very high risk
No tech support	3,473	41.6%	Strong retention gap
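
One way to compare these segments against the baseline is a simple lift ratio (segment churn rate divided by the overall rate). The sketch below re-uses the rounded figures from the table above. The `lift` and `expected_churners` columns are illustrative additions, not outputs of the original notebook.

```python
import pandas as pd

# Segment sizes and churn rates copied from the Tier 1 table
tier1 = pd.DataFrame({
    "segment": [
        "Electronic check", "Month-to-month contract", "Fiber optic internet",
        "No online security", "Senior citizens", "No tech support",
    ],
    "customers": [2365, 3875, 3096, 3498, 1142, 3473],
    "churn_rate": [0.453, 0.427, 0.419, 0.418, 0.417, 0.416],
})
overall_rate = 0.265  # overall churn rate from the churn snapshot

# Lift: how many times the overall churn rate each segment reaches
tier1["lift"] = (tier1["churn_rate"] / overall_rate).round(2)

# Approximate number of churners inside each segment
tier1["expected_churners"] = (
    (tier1["customers"] * tier1["churn_rate"]).round().astype(int)
)
```

The segments overlap (a customer can be month-to-month, on fiber, and paying by electronic check at the same time), so the `expected_churners` values should not be summed across rows.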

Tier 2: elevated-risk group

Segment	Customers	Churn rate	Comment
Paperless billing = Yes	4,171	33.6%	Above average, but less severe than Tier 1

Low-risk comparison groups

These groups help show where retention is already stronger:

One-year contract: 11.3%
Two-year contract: 2.8%
Online security = Yes: 14.6%
Tech support = Yes: 15.2%
No internet service: 7.4%

What this means

The biggest churn pattern is not one isolated factor. It is a profile:

short-commitment customers
paying higher monthly charges
often with fiber optic
often using electronic check
often lacking security/support add-ons

In practice, a customer matching several of these conditions is very likely to be in the highest-risk pool.
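
A rough way to find customers who match several of these conditions is to count how many risk traits each one has. The sketch below uses three made-up rows with the dataset's column names and category values. The equal weighting of traits and the `risk_flag_count` column are assumptions for illustration; the notebook itself does not compute combined segments.

```python
import pandas as pd

# Toy rows for illustration only; column names and values follow the dataset schema
customers = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "Month-to-month"],
    "InternetService": ["Fiber optic", "DSL", "DSL"],
    "PaymentMethod": ["Electronic check", "Credit card (automatic)", "Mailed check"],
    "OnlineSecurity": ["No", "Yes", "No"],
    "TechSupport": ["No", "Yes", "Yes"],
})

# Each boolean column mirrors one categorical trait of the high-risk profile
risk_flags = pd.DataFrame({
    "month_to_month": customers["Contract"].eq("Month-to-month"),
    "fiber_optic": customers["InternetService"].eq("Fiber optic"),
    "electronic_check": customers["PaymentMethod"].eq("Electronic check"),
    "no_online_security": customers["OnlineSecurity"].eq("No"),
    "no_tech_support": customers["TechSupport"].eq("No"),
})

# Number of risk traits each customer matches (0 to 5)
customers["risk_flag_count"] = risk_flags.sum(axis=1)
```

Applied to the full dataset, grouping by such a count and computing the churn rate per group would show whether risk actually compounds as traits accumulate.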

Recommended action: one retention campaign

I would recommend one action:

Target month-to-month customers with an offer to move to a 1-year contract.

Examples of the offer:

a small monthly discount
a one-time bill credit
or bundled value such as free online security / tech support for a limited period

Why this action?

It targets the largest high-risk segment
- Month-to-month customers = 3,875
- That is about 55% of the entire customer base
It has the clearest churn gap
- Month-to-month churn: 42.7%
- One-year churn: 11.3%
- Observed gap: 31.4 percentage points
It is operationally simple
- You can run it as one outbound retention campaign
- It is easy to target and easy to measure

Expected impact estimate

Since this is observational data, not a controlled experiment, the safest way is to estimate impact with explicit assumptions.

Assumption set

You convert 20% of month-to-month customers to the new 1-year offer
That means 775 customers affected
To stay conservative, assume the campaign achieves only half of the observed churn gap between month-to-month and one-year customers

Calculation

Observed churn gap: 42.7% - 11.3% = 31.4 percentage points
Conservative realized improvement: 15.7 percentage points
Expected avoided churn:
775 × 15.7% ≈ 122 customers retained

Business impact

Current churned customers: about 1,869
After retaining 122, churn would fall to about 1,747
New overall churn rate: about 24.8%
Improvement: about 1.7 percentage points

Revenue interpretation

Using the average monthly charge of churners (74.44 USD) as a rough proxy:

122 retained customers × 74.44 USD ≈ 9,082 USD monthly recurring revenue preserved

Upside scenario

If the full observed gap were achieved for those converted customers:

Avoided churn: 775 × 31.4% ≈ 243 customers
Overall churn reduction: about 3.5 percentage points
Monthly recurring revenue preserved: about 18,089 USD

So a reasonable impact range is:

Conservative: retain about 122 customers
Upper bound: retain about 243 customers
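
The arithmetic behind both scenarios can be reproduced with a short function. This is a sketch under the report's stated assumptions (20% conversion, a 31.4-point churn gap, and 74.44 USD as the revenue proxy). The function name and the `realized_share` parameter are illustrative.

```python
def estimate_impact(realized_share,
                    segment_size=3875,       # month-to-month customers
                    conversion_rate=0.20,    # assumed share accepting the offer
                    churn_gap=0.314,         # 42.7% - 11.3%
                    total_customers=7043,
                    current_churners=1869,
                    monthly_charge=74.44):   # average monthly charge of churners (USD)
    """Estimate retained customers, new churn rate, and monthly revenue preserved."""
    converted = round(segment_size * conversion_rate)
    # realized_share scales how much of the observed gap the campaign achieves
    retained = round(converted * churn_gap * realized_share)
    new_churn_rate = (current_churners - retained) / total_customers
    return {
        "converted": converted,
        "retained": retained,
        "new_churn_rate_pct": round(new_churn_rate * 100, 1),
        "monthly_revenue_usd": round(retained * monthly_charge, 2),
    }

conservative = estimate_impact(realized_share=0.5)  # half of the observed gap
upside = estimate_impact(realized_share=1.0)        # full observed gap
```

Changing `conversion_rate` or `realized_share` gives a quick sensitivity check, since both are assumptions rather than measured values.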

Final recommendation

If you do only one thing, do this:

Launch a contract-conversion retention campaign for month-to-month customers, prioritizing those with higher monthly charges and shorter tenure.

Why this is the best single action:

biggest risky segment
strongest practical lever
easiest to target
measurable impact
likely worth roughly 122 to 243 retained customers, depending on execution quality

If you want, the next step can be a small scoring table that combines these risk signals into a simple high/medium/low churn-risk list for the customer base.
