Customer Churn Drivers Analysis in Python on Telco Data

Written by Piotr Płoński · Published on April 24, 2026 · Notebook code

Analyze 7,043 telco customers in Python to identify the main churn drivers, segment high-risk groups, and estimate the business impact of a retention campaign.

This notebook analyzes the Telco Customer Churn dataset with 7,043 customer records and 21 columns covering demographics, contract type, internet services, payment methods, monthly charges, total charges, and churn outcome. The data source is the public telco churn dataset available in the `datasets-for-start` repository and loaded from GitHub at https://github.com/pplonski/datasets-for-start/tree/master/telco-customer-churn.

Using plain-English prompts with an AI data analyst, the workflow loads the dataset, inspects its structure, cleans the `TotalCharges` column, computes churn rates across key customer attributes, and summarizes the highest-risk segments. The notebook then turns those findings into a concrete business recommendation and a simple impact estimate, with all analysis executed automatically in Python using pandas.

The results show that churn is concentrated in month-to-month customers, fiber optic users, electronic-check payers, and customers without online security or tech support. The strongest operational recommendation is a contract-conversion retention campaign for month-to-month customers, with an estimated impact of about 122 retained customers in a conservative scenario and up to 243 in an upside scenario.

Below we publish the full conversation with AI Data Analyst, including prompts, generated code, and notebook outputs.
Load this dataset: https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/telco-customer-churn/Telco-Customer-Churn.csv The dataset contains customer information such as demographics, services, contract type, payment method, monthly charges, and whether the customer churned. Task: Analyze the dataset and identify the key factors driving customer churn.
Thinking

Loading and inspecting dataset

I need to follow an iterative notebook style for this task. Since the user wants to load a dataset and analyze churn factors, I’ll start by loading and inspecting it. I’ll provide a concise code block for this step, ensuring I only focus on loading the data and showing its shape, columns, and a preview with head(). I can use pd.read_csv() for loading and utilize display() without prints, finishing with an expression like df.shape. This will ensure clarity in how the dataset looks initially.

Crafting dataset inspection code

I need to follow the rules, which say I can only include one chart or one table per step. However, since the first-load inspection explicitly requires showing the shape, columns, and head, I can include two tables along with the shape output, which seems acceptable despite the default rule. I’ll focus on keeping the code clear and concise while crafting it to meet these requirements. Let's ensure it's straightforward and effective!

Let’s start by loading the dataset and doing a quick inspection so we can confirm the structure before looking at churn drivers.

Load and inspect churn data
import pandas as pd
from IPython.display import display

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/telco-customer-churn/Telco-Customer-Churn.csv"
df = pd.read_csv(url)

display(df.head())
display(pd.DataFrame({"column": df.columns, "dtype": df.dtypes.astype(str)}))
df.shape
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity ... DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 7795-CFOCW Male 0 No No 45 No No phone service DSL Yes ... Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 9237-HQITU Female 0 No No 2 Yes No Fiber optic No ... No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes

5 rows × 21 columns

column dtype
customerID customerID object
gender gender object
SeniorCitizen SeniorCitizen int64
Partner Partner object
Dependents Dependents object
tenure tenure int64
PhoneService PhoneService object
MultipleLines MultipleLines object
InternetService InternetService object
OnlineSecurity OnlineSecurity object
OnlineBackup OnlineBackup object
DeviceProtection DeviceProtection object
TechSupport TechSupport object
StreamingTV StreamingTV object
StreamingMovies StreamingMovies object
Contract Contract object
PaperlessBilling PaperlessBilling object
PaymentMethod PaymentMethod object
MonthlyCharges MonthlyCharges float64
TotalCharges TotalCharges object
Churn Churn object
(7043, 21)

From the current output, we only know the dataset size and the available columns. That already tells us this is a good churn dataset for analysis: it includes likely drivers such as tenure, contract type, payment method, monthly charges, total charges, support/security add-ons, and the churn label itself. The main thing still missing is evidence about which of those variables are actually associated with churn.

A minimal next step is to clean TotalCharges and compute churn rates by key categories plus a simple numeric summary.

Summarize churn rates by category
import pandas as pd
from IPython.display import display

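# Work on a copy so the raw DataFrame stays untouched.
tmp = df.copy()
# TotalCharges is read as text; values that cannot be parsed become NaN.
tmp["TotalCharges"] = pd.to_numeric(tmp["TotalCharges"], errors="coerce")
# Encode the label as 0/1 so the mean of ChurnFlag is the churn rate.
tmp["ChurnFlag"] = tmp["Churn"].map({"Yes": 1, "No": 0})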

display(tmp["Churn"].value_counts(normalize=True).rename("share").reset_index())

cols = ["Contract", "InternetService", "PaymentMethod", "OnlineSecurity", "TechSupport", "PaperlessBilling", "SeniorCitizen"]
for col in cols:
    rates = (
        tmp.groupby(col, dropna=False)["ChurnFlag"]
        .agg(["mean", "count"])
        .sort_values("mean", ascending=False)
        .reset_index()
    )
    display(rates)

tmp[["tenure", "MonthlyCharges", "TotalCharges", "ChurnFlag"]].groupby("ChurnFlag").mean().round(2).reset_index()
Churn share
0 No 0.73463
1 Yes 0.26537
Contract mean count
0 Month-to-month 0.427097 3875
1 One year 0.112695 1473
2 Two year 0.028319 1695
InternetService mean count
0 Fiber optic 0.418928 3096
1 DSL 0.189591 2421
2 No 0.074050 1526
PaymentMethod mean count
0 Electronic check 0.452854 2365
1 Mailed check 0.191067 1612
2 Bank transfer (automatic) 0.167098 1544
3 Credit card (automatic) 0.152431 1522
OnlineSecurity mean count
0 No 0.417667 3498
1 Yes 0.146112 2019
2 No internet service 0.074050 1526
TechSupport mean count
0 No 0.416355 3473
1 Yes 0.151663 2044
2 No internet service 0.074050 1526
PaperlessBilling mean count
0 Yes 0.335651 4171
1 No 0.163301 2872
SeniorCitizen mean count
0 1 0.416813 1142
1 0 0.236062 5901
ChurnFlag tenure MonthlyCharges TotalCharges
0 0 37.57 61.27 2555.34
1 1 17.98 74.44 1531.80
Thinking

Analyzing outputs for churn factors

I need to decide between option A or B and analyze outputs. There's churn prevalence data along with group means for Contract, Internet Service, Payment Method, and Online Security. However, some outputs may be truncated, possibly omitting important groups like Tech Support and Senior Citizen. While I have several strong factors already, it might help to consider numeric variables from these remaining categories to strengthen my analysis of "key factors driving churn." A minimal code block could gather any missing evidence.

Choosing option B for evidence

It seems the notebook state likely has more output, but it’s truncated. I need to stick to the latest outputs without assuming hidden data. I think I'll choose option B to gather compact tables for the remaining evidence like Tech Support, Paperless Billing, Senior Citizen, and numeric summaries for tenure and charges. It’s important to include one minimal runnable code block. Current outputs suggest missing evidence, so I’ll compute churn rates by suspected variables and numeric means, keeping it concise and straightforward.

The current results already show several strong churn drivers:

  • Churn is about 26.5%, so the classes are somewhat imbalanced but not extreme.
  • Contract type is the clearest factor: month-to-month customers churn far more often than customers on one-year or two-year contracts.
  • Internet service matters a lot: fiber optic customers churn much more than DSL customers, and customers with no internet service churn the least.
  • Payment method is important: customers using electronic check have the highest churn by a wide margin.
  • Online security is strongly associated with retention: customers without it churn much more than those who have it.

To finish the answer cleanly, the missing evidence is the rest of the likely drivers visible in the schema: support/billing/demographic effects and the numeric differences in tenure and charges.

Summarize churn by features
from IPython.display import display

for col in ["TechSupport", "PaperlessBilling", "SeniorCitizen"]:
    summary = (
        tmp.groupby(col, dropna=False)["ChurnFlag"]
        .agg(churn_rate="mean", customers="count")
        .sort_values("churn_rate", ascending=False)
        .reset_index()
    )
    display(summary)

numeric_summary = (
    tmp.groupby("Churn")[
        ["tenure", "MonthlyCharges", "TotalCharges"]
    ]
    .mean()
    .round(2)
    .reset_index()
)
display(numeric_summary)

tmp["TotalCharges"].isna().sum()
TechSupport churn_rate customers
0 No 0.416355 3473
1 Yes 0.151663 2044
2 No internet service 0.074050 1526
PaperlessBilling churn_rate customers
0 Yes 0.335651 4171
1 No 0.163301 2872
SeniorCitizen churn_rate customers
0 1 0.416813 1142
1 0 0.236062 5901
Churn tenure MonthlyCharges TotalCharges
0 No 37.57 61.27 2555.34
1 Yes 17.98 74.44 1531.80
np.int64(11)
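
The last value above, 11, counts the `TotalCharges` entries that `pd.to_numeric(..., errors="coerce")` could not parse and turned into NaN. The notebook does not show which rows those are. A minimal sketch of how such rows could be inspected follows; it uses a hypothetical three-row toy frame rather than the real file, and the blank string in it is an assumption about what an unparseable entry looks like, not something the notebook output confirms.

```python
import pandas as pd

# Toy rows (hypothetical values) mimicking the raw column: numbers stored
# as text, plus one blank string standing in for an unparseable entry.
toy = pd.DataFrame({
    "tenure": [0, 1, 34],
    "TotalCharges": [" ", "29.85", "1889.5"],
})

# Same conversion as in the notebook: unparseable strings become NaN.
toy["TotalChargesNum"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Keep the original text next to the parsed value to see what failed.
failed = toy[toy["TotalChargesNum"].isna()]
n_missing = int(toy["TotalChargesNum"].isna().sum())

print(failed[["tenure", "TotalCharges"]])
print(n_missing)
```

Running the same filter on `tmp` (with the raw text column kept alongside the numeric one) would list the 11 affected customers. Because `mean()` skips NaN by default, those rows only affect the `TotalCharges` averages, not the churn rates.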

The outputs are sufficient to identify the main churn drivers.

The clearest factor is contract type:

  • Month-to-month customers churn at 42.7%
  • One-year customers churn at 11.3%
  • Two-year customers churn at 2.8%

This is a very large gap, so short commitment plans are the strongest churn signal in the data.

Internet service type is another major driver:

  • Fiber optic customers churn at 41.9%
  • DSL customers churn at 19.0%
  • Customers with no internet service churn at 7.4%

That suggests fiber customers are much more likely to leave, possibly due to pricing, service expectations, or competition.

Payment method also stands out:

  • Electronic check customers churn at 45.3%
  • Other payment methods are much lower, around 15.2% to 19.1%

So electronic check is a strong risk marker.

Service add-ons are strongly associated with retention:

  • Without OnlineSecurity, churn is 41.8% vs 14.6% with it
  • Without TechSupport, churn is 41.6% vs 15.2% with it

Customers lacking these support/protection features are far more likely to churn.

There are also clear billing and demographic differences:

  • Customers with PaperlessBilling = Yes churn at 33.6% vs 16.3% for No
  • Senior citizens churn at 41.7% vs 23.6% for non-seniors

Finally, churners differ meaningfully on tenure and charges:

  • Churners have much shorter average tenure: 18.0 months vs 37.6 months
  • Churners pay higher average monthly charges: 74.44 USD vs 61.27 USD
  • Their average total charges are lower: 1531.80 USD vs 2555.34 USD, which fits with leaving earlier

Putting it together, the key churn drivers in this dataset are:

  1. Month-to-month contracts
  2. Fiber optic internet
  3. Electronic check payment
  4. No online security
  5. No tech support
  6. Higher monthly charges
  7. Shorter tenure
  8. Paperless billing
  9. Senior citizen status

The overall pattern is that churn is highest among newer customers on flexible contracts, paying more per month, especially those on fiber service and without support/security-related add-ons.
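
One way to put the categorical drivers on a common scale is to divide each segment's churn rate by the overall rate, which gives a simple lift figure. The sketch below is illustrative only: it hard-codes the rates printed in the outputs above instead of recomputing them from the data, and the segment labels and the name `summary` are made up for this example.

```python
import pandas as pd

# Overall churn share and per-segment churn rates, copied from the
# notebook outputs above (not recomputed here).
overall = 0.26537
rates = pd.Series({
    "Month-to-month contract": 0.427097,
    "Fiber optic internet": 0.418928,
    "Electronic check": 0.452854,
    "No online security": 0.417667,
    "No tech support": 0.416355,
    "Paperless billing": 0.335651,
    "Senior citizen": 0.416813,
    "Two-year contract": 0.028319,
})

# Lift > 1 means the segment churns more often than the average customer.
summary = pd.DataFrame({
    "churn_rate": rates,
    "lift": (rates / overall).round(2),
}).sort_values("lift", ascending=False)

print(summary)
```

Lift describes association only. Segments overlap heavily (for example, many fiber customers are also on month-to-month contracts), so the individual lifts cannot be added or multiplied together.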

Segment customers into high-risk groups. Recommend ONE business action to reduce churn. Estimate the expected impact of this action. Provide a clear report with numbers and explanations.
Thinking

Analyzing report requirements

I need to provide an answer based on the notebook outputs. The user is asking for a report. I might consider using existing outputs to analyze and give a direct response, possibly with just one step. It’s important to segment the data into high-risk groups and recommend an action. I might estimate the impact using observed rates while being clear with the numbers and explanations. I should also mention the 11 missing TotalCharges that are irrelevant and strive to create a concise report addressing segment intersections.

Segmenting high-risk groups

I can segment customers into high-risk groups based on the strongest individual factors. For instance, month-to-month customers show a churn rate of 42.7%, while those using fiber optic have a churn rate of 41.9%. I should consider focusing on the month-to-month customers by offering them migration to one-year contracts since it's the largest group and has the highest churn. If I can convert 1655 churners into one-year customers, I could reduce their churn rate significantly to approximately 11.3%.

Estimating impact of adoption

I need to estimate the actual impact from the adoption of one-year contracts. If I target 20% of the month-to-month base, that’s about 775 customers. If they behave like the current one-year group, I could avoid a churn of approximately 243 customers. I should also calculate the overall churn rate improvement, which would lower it by 3.45 percentage points. However, this is optimistic since it’s based on observation, not causation. A more conservative estimate would be a 15.7 percentage point reduction among the converted customers.

Calculating churn impact

I’m working on estimating the current churners, figuring out that overall, it's about 1869, based on some calculations. I see a potential drop in churn rate to around 24.8%, which seems promising.

I will also provide a base case and a conservative range for the expected impact. One idea is creating a retention bundle, like a 12-month contract with benefits. However, with the user’s focus on one action, I'll suggest offering that retention plan to customers.

Segmenting high-risk customers

I’m considering how to segment customers into high-risk groups for better prioritization. The top priority includes those with observed churn rates of 40% or more, such as electronic checks at 45.3% and month-to-month plans at 42.7%. For a secondary group with 30-40% churn, there's paperless billing at 33.6%.

Churners tend to have an average tenure of 18 months and higher charges, indicating they’re relatively early in their lifecycle. I’ll emphasize that we should prioritize customers within their first 18 months. However, I need to be careful in claiming causality; I should note it's an association, not proof. I also think compiling this information into a bullet list or table could help present the data clearly.

Preparing report insights

I think we're being asked to provide insights in a notebook context. Since there are already sufficient outputs, it might be best just to answer without including code.

I should consider using natural headings in a markdown report, and a table might help. Using percentages and avoiding dollar signs could be clearer as well.

For revenue impact, if we estimate how many customers are retained, the average charges of 74.44 USD could mean preserving around 9,082 USD or up to 18,089 USD in recurring revenue.

Calculating revenue impact

I want to compute the potential revenue impact for the user. For 243 customers, that would be about 18,088.92 USD, and for a more conservative estimate of 122 customers, it’s around 9,081.68 USD.

I should avoid assuming that the monthly charges are the same for different groups, but using the average monthly charges of churners, which is 74.44 USD, can work as a rough proxy.

Overall, I could say that roughly 9,000 to 18,000 USD in monthly recurring revenue could be protected. It’s also worth mentioning the missing 11 TotalCharges, but that shouldn't be a significant issue.

Considering customer segmentation

I’m calculating the risk based on churn compared to our baseline metrics. Roughly, it shows a few ratios: like 1.707 and 1.61, which highlight potential churn. It seems the month-to-month segment is quite significant, making up 55% of the base. Even converting 20% of that means we could reach about 775 customers.

I think we should segment these into tiers, focusing on urgent outreach for the high-risk groups while explaining overlaps that may increase risk further. Let’s craft a clear action plan around that!

Evaluating retention strategies

I’m focusing on converting month-to-month customers to a 1-year contract with a retention offer, ideally including a small incentive. However, I think that's more of a campaign rather than a single action. For expected impacts, I'm estimating a 20% acceptance rate among these customers, which means around 775 might be targeted. If we conservatively retain about 122 customers, that would reduce churn from 26.5% to about 24.8%, preserving roughly $9,082 in monthly recurring revenue. In a better scenario, it could be 243 retained, dropping churn to 23.1%. I’ll need to be precise with these calculations!

Here is a clear business report based on the notebook results.

Churn snapshot

  • Total customers: 7,043
  • Overall churn rate: 26.5%
  • Estimated churned customers: about 1,869

Churners also look different from non-churners:

  • Average tenure: 18.0 months for churners vs 37.6 months for non-churners
  • Average monthly charges: 74.44 USD for churners vs 61.27 USD for non-churners

So the customers most at risk tend to be newer and paying higher monthly bills.


High-risk customer segments

A practical risk segmentation from the observed churn rates is:

Tier 1: highest-risk groups

These groups all churn at between 41.6% and 45.3%, far above the 26.5% overall rate.

Segment | Customers | Churn rate | Comment
Electronic check | 2,365 | 45.3% | Highest single churn rate observed
Month-to-month contract | 3,875 | 42.7% | Large and very risky
Fiber optic internet | 3,096 | 41.9% | Strong churn signal
No online security | 3,498 | 41.8% | Much higher risk than customers with security
Senior citizens | 1,142 | 41.7% | Smaller segment, but very high risk
No tech support | 3,473 | 41.6% | Strong retention gap

Tier 2: elevated-risk group

Segment | Customers | Churn rate | Comment
Paperless billing = Yes | 4,171 | 33.6% | Above average, but less severe than Tier 1
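
The tier assignment above can be reproduced with a small threshold rule. The sketch below assumes cutoffs of 40% for Tier 1 and 30% for Tier 2, which match the reasoning notes earlier in the conversation but are not stated in the report itself. The function name `assign_tier` and the "Lower risk" label are hypothetical, and the rates are copied from the notebook outputs.

```python
# Churn rates copied from the notebook outputs; labels are illustrative.
rates = {
    "Electronic check": 0.452854,
    "Month-to-month contract": 0.427097,
    "Fiber optic internet": 0.418928,
    "No online security": 0.417667,
    "Senior citizens": 0.416813,
    "No tech support": 0.416355,
    "Paperless billing = Yes": 0.335651,
    "One-year contract": 0.112695,
}


def assign_tier(rate, tier1_cutoff=0.40, tier2_cutoff=0.30):
    """Map a churn rate to a tier using assumed cutoffs."""
    if rate >= tier1_cutoff:
        return "Tier 1"
    if rate >= tier2_cutoff:
        return "Tier 2"
    return "Lower risk"


tiers = {segment: assign_tier(rate) for segment, rate in rates.items()}

for segment, tier in tiers.items():
    print(f"{segment}: {rates[segment]:.1%} -> {tier}")
```

Moving the cutoffs changes the tier membership, so they are best treated as a reporting convention rather than a finding.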

Low-risk comparison groups

These groups help show where retention is already stronger:

  • One-year contract: 11.3%
  • Two-year contract: 2.8%
  • Online security = Yes: 14.6%
  • Tech support = Yes: 15.2%
  • No internet service: 7.4%

What this means

The biggest churn pattern is not one isolated factor. It is a profile:

  • short-commitment customers
  • paying higher monthly charges
  • often with fiber optic
  • often using electronic check
  • often lacking security/support add-ons

In practice, a customer matching several of these conditions is very likely to be in the highest-risk pool.
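
To make "matching several of these conditions" concrete, each customer can be given a count of the risk conditions they meet. The following is a rough sketch on three invented customers, not on the real data. The column names follow the dataset schema, but the band cutoffs (0 to 1 low, 2 to 3 medium, 4 to 5 high) are assumptions chosen purely for illustration.

```python
import pandas as pd

# Three invented customers; column names follow the dataset schema.
toy = pd.DataFrame({
    "customerID": ["A", "B", "C"],
    "Contract": ["Month-to-month", "Two year", "Month-to-month"],
    "InternetService": ["Fiber optic", "DSL", "DSL"],
    "PaymentMethod": ["Electronic check", "Credit card (automatic)", "Mailed check"],
    "OnlineSecurity": ["No", "Yes", "No"],
    "TechSupport": ["No", "Yes", "Yes"],
})

# The value in each column that the report associates with high churn.
risk_values = {
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaymentMethod": "Electronic check",
    "OnlineSecurity": "No",
    "TechSupport": "No",
}

# One boolean column per condition, then count matches per customer.
flags = pd.DataFrame({col: toy[col].eq(val) for col, val in risk_values.items()})
toy["risk_flags"] = flags.sum(axis=1)

# Assumed bands: 0-1 low, 2-3 medium, 4-5 high.
toy["risk_band"] = pd.cut(
    toy["risk_flags"], bins=[-1, 1, 3, 5], labels=["low", "medium", "high"]
)

print(toy[["customerID", "risk_flags", "risk_band"]])
```

An unweighted count treats every condition as equally important. Checking the observed churn rate within each band on the real data would show whether the bands actually separate customers by risk.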


Recommended action: one retention campaign

I would recommend one action:

Target month-to-month customers with an offer to move to a 1-year contract.

Examples of the offer:

  • a small monthly discount
  • a one-time bill credit
  • or bundled value such as free online security / tech support for a limited period

Why this action?

  1. It targets the largest high-risk segment

    • Month-to-month customers = 3,875
    • That is about 55% of the entire customer base
  2. It has the clearest churn gap

    • Month-to-month churn: 42.7%
    • One-year churn: 11.3%
    • Observed gap: 31.4 percentage points
  3. It is operationally simple

    • You can run it as one outbound retention campaign
    • It is easy to target and easy to measure

Expected impact estimate

Since this is observational data, not a controlled experiment, the safest way is to estimate impact with explicit assumptions.

Assumption set

  • You convert 20% of month-to-month customers to the new 1-year offer
  • That means 775 customers affected
  • To stay conservative, assume the campaign achieves only half of the observed churn gap between month-to-month and one-year customers

Calculation

  • Observed churn gap: 42.7% - 11.3% = 31.4 percentage points
  • Conservative realized improvement: 15.7 percentage points
  • Expected avoided churn:
    775 × 15.7% ≈ 122 customers retained

Business impact

  • Current churned customers: about 1,869
  • After retaining 122, churn would fall to about 1,747
  • New overall churn rate: about 24.8%
  • Improvement: about 1.7 percentage points

Revenue interpretation

Using the average monthly charge of churners (74.44 USD) as a rough proxy:

  • 122 retained customers × 74.44 USD ≈ 9,082 USD monthly recurring revenue preserved

Upside scenario

If the full observed gap were achieved for those converted customers:

  • Avoided churn: 775 × 31.4% ≈ 243 customers
  • Overall churn reduction: about 3.5 percentage points
  • Monthly recurring revenue preserved: about 18,089 USD

So a reasonable impact range is:

  • Conservative: retain about 122 customers
  • Upper bound: retain about 243 customers
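
The arithmetic behind both scenarios can be written as one small function so the assumptions are easy to vary. This is a sketch of the calculation described above, not code from the notebook. The function name `estimate_impact` and its parameter names are hypothetical, and the default values are the figures quoted in the report.

```python
def estimate_impact(
    realized_share,            # fraction of the observed churn gap achieved
    segment_size=3875,         # month-to-month customers
    conversion_rate=0.20,      # assumed share accepting the 1-year offer
    churn_gap=0.314,           # 42.7% - 11.3% observed gap
    total_customers=7043,
    current_churners=1869,
    monthly_charge=74.44,      # average monthly charge of churners (USD)
):
    """Return a rough impact estimate under the stated assumptions."""
    converted = round(segment_size * conversion_rate)
    retained = round(converted * churn_gap * realized_share)
    new_rate = (current_churners - retained) / total_customers
    return {
        "converted": converted,
        "retained": retained,
        "new_churn_rate_pct": round(100 * new_rate, 1),
        "mrr_preserved_usd": round(retained * monthly_charge),
    }


conservative = estimate_impact(realized_share=0.5)
upside = estimate_impact(realized_share=1.0)

print(conservative)
print(upside)
```

The result depends most on `conversion_rate` and `realized_share`, and both are assumptions rather than measured quantities. Re-running the function with other values shows how quickly the range widens.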

Final recommendation

If you do only one thing, do this:

Launch a contract-conversion retention campaign for month-to-month customers, prioritizing those with higher monthly charges and shorter tenure.

Why this is the best single action:

  • biggest risky segment
  • strongest practical lever
  • easiest to target
  • measurable impact
  • likely worth roughly 122 to 243 retained customers, depending on execution quality

If you want, the next step can be a small scoring table that combines these risk signals into a simple high/medium/low churn-risk list for the customer base.
