HR Employee Attrition Analysis in Python

Explore the IBM HR Analytics dataset to uncover attrition patterns by department, age, salary, and job satisfaction.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the IBM HR Analytics attrition CSV from a URL, summarizes the dataset shape, and calculates the overall attrition rate. It generates visual comparisons of attrition rates by department and job role, and contrasts monthly income distributions for employees who left versus stayed. It also examines relationships between job satisfaction, work-life balance, and attrition using correlation analysis and a heatmap.

Who this example is for

This is for HR analysts and people analytics practitioners who need a reproducible way to explore attrition patterns in a standard benchmark dataset. It is also useful for data analysts learning exploratory analysis workflows that combine grouped summaries, distribution plots, and correlation checks.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

  • Loaded dataset with shape (1470, 35) and computed overall attrition rate (16.1%)
  • Bar chart of attrition rate by department and job role
  • Box plot comparing monthly income for leavers vs stayers
  • Correlation heatmap linking job satisfaction and work-life balance with attrition

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

  • pandas
  • numpy
  • matplotlib
  • seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

  1. load HR attrition dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/employee_attrition/HR-Employee-Attrition-All.csv and show overall attrition rate
  2. plot attrition rate by department and job role
  3. compare monthly income distribution for employees who left vs stayed
  4. show correlation between job satisfaction, work-life balance and attrition

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source      Score  Task  Execution  Output  Reasoning  Reliability  Run
gemma4:31b        10/10  2/2   2/2        3/3     2/2        1/1          Open gemma4:31b conversation
glm-5.1           10/10  2/2   2/2        3/3     2/2        1/1          Open glm-5.1 conversation
gpt-5.4           10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-5.4 conversation
gpt-oss:120b      10/10  2/2   2/2        3/3     2/2        1/1          Open gpt-oss:120b conversation
qwen3-coder-next  10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3-coder-next conversation
qwen3.5:397b      10/10  2/2   2/2        3/3     2/2        1/1          Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

  • Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
  • Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
  • Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
  • Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
  • Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so total score = 0-10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
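
As a minimal sketch of the rubric above, the total can be computed by validating each dimension against its maximum and summing the results. The dictionary keys and the `total_score` helper are illustrative assumptions, not part of the actual grading tool.

```python
# Maximum points per dimension, taken from the rubric above.
MAX_POINTS = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}


def total_score(scores: dict) -> int:
    """Hypothetical helper: check each score against its range, then sum."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the five rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(scores.values())


# A run with full marks on every dimension, as in the comparison table.
perfect_run = dict(MAX_POINTS)
print(total_score(perfect_run))
```

Because the maxima add up to 10, a run with full marks on every dimension yields the 10/10 shown in the comparison table.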

AI Analysis with gemma4:31b

This run uses AI with a step-by-step list of prompts to perform the analysis task "HR Employee Attrition Analysis in Python". Below, you can review the full conversation generated with gemma4:31b. After the conversation, we show detailed score cards for this run.

load HR attrition dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/employee_attrition/HR-Employee-Attrition-All.csv and show overall attrition rate
Thinking

The user wants to load a specific HR attrition dataset from a URL and calculate the overall attrition rate.

Plan:

  1. Load the dataset using pandas.
  2. Inspect the data (shape, columns, head) to identify the target column (attrition).
  3. Calculate the attrition rate.

Constraint check:

  • "When a dataset is not yet loaded in current notebook context, first load and inspect only: show shape, columns, and head()."
  • "Do not combine first-load data inspection with plotting/modeling in the same response."
  • "Keep one primary objective per response."

So, the first step should only be loading and inspecting the data. I cannot calculate the attrition rate in the same step as the initial load/inspect.

Wait, the user specifically asked to "load... and show overall attrition rate". However, my system instructions say: "When a dataset is not yet loaded... first load and inspect only: show shape, columns, and head()." and "Do not combine first-load data inspection with plotting/modeling in the same response."

Calculating the attrition rate is an analysis step (aggregating data), which usually follows inspection.

Step 1: Load the data and show shape, columns, and head.

I'll start by loading the dataset and inspecting its structure to identify the column that indicates employee attrition.

Load and inspect data
import pandas as pd

url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/employee_attrition/HR-Employee-Attrition-All.csv"
df = pd.read_csv(url)

# Basic inspection
print(f"Dataset shape: {df.shape}")
print("\nColumns:")
print(df.columns.tolist())
df.head()
Dataset shape: (1470, 35)

Columns:
['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department', 'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount', 'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction', 'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager']
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber ... RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 41 Yes Travel_Rarely 1102 Sales 1 2 Life Sciences 1 1 ... 1 80 0 8 0 1 6 4 0 5
1 49 No Travel_Frequently 279 Research & Development 8 1 Life Sciences 1 2 ... 4 80 1 10 3 3 10 7 1 7
2 37 Yes Travel_Rarely 1373 Research & Development 2 2 Other 1 4 ... 2 80 0 7 3 3 0 0 0 0
3 33 No Travel_Frequently 1392 Research & Development 3 4 Life Sciences 1 5 ... 3 80 0 8 3 3 8 7 3 0
4 27 No Travel_Rarely 591 Research & Development 2 1 Medical 1 7 ... 4 80 1 6 3 3 2 2 2 2

5 rows × 35 columns

Thinking

The user wants the overall attrition rate from the HR dataset. The dataset has been loaded, and I can see the column Attrition (with values 'Yes' and 'No'). I have the shape (1470, 35) and the first few rows, but the actual attrition rate (the percentage of 'Yes' in the Attrition column) has not been calculated yet.

Current state:

  • Dataset loaded.
  • Column Attrition exists.
  • I need to calculate the percentage of 'Yes' values in the Attrition column.

Action:

  • Path B: Calculate the attrition rate.

The dataset contains 1,470 employees, and the Attrition column identifies whether an employee has left the company using "Yes" or "No" values. To determine the overall attrition rate, I will calculate the proportion of employees who have left.

Calculate attrition rate
attrition_rate = df['Attrition'].value_counts(normalize=True)['Yes'] * 100
attrition_rate
np.float64(16.122448979591837)
Thinking

The user wants to load the HR attrition dataset from a specific URL and show the overall attrition rate.

Current state:

  1. Dataset has been loaded from the URL.
  2. df.shape is (1470, 35).
  3. df['Attrition'].value_counts(normalize=True)['Yes'] * 100 was executed.
  4. The output is 16.122448979591837.

Observation:

  • The dataset is loaded.
  • The overall attrition rate is approximately 16.12%.

Conclusion: The original goal ("load HR attrition dataset... and show overall attrition rate") has been fully met.

Decision: Path A.

The HR attrition dataset has been loaded, containing 1,470 employees. Based on the data, the overall attrition rate is approximately 16.12%.
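
The rate calculation in this step can be cross-checked with a boolean mask. The sketch below uses a small made-up `toy` frame rather than the real dataset (an assumption for illustration only) and compares the `value_counts(normalize=True)` approach from the run with the mean of `Attrition == 'Yes'`.

```python
import pandas as pd

# Toy stand-in for the Attrition column (made-up values, not the real dataset).
toy = pd.DataFrame({"Attrition": ["Yes", "No", "No", "No", "Yes", "No", "No", "No"]})

# Approach used in the run: normalized value counts, then pick the "Yes" share.
rate_from_counts = toy["Attrition"].value_counts(normalize=True)["Yes"] * 100

# Equivalent approach: the mean of a boolean mask is the share of True values.
rate_from_mask = (toy["Attrition"] == "Yes").mean() * 100

print(f"{rate_from_counts:.1f}% vs {rate_from_mask:.1f}%")
```

Both expressions give the same percentage. The mask version also returns 0 when a column contains no "Yes" values, whereas indexing `value_counts(...)['Yes']` would raise a `KeyError` in that case.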

plot attrition rate by department and job role
Thinking

The user wants to plot the attrition rate broken down by Department and JobRole.

- `df` is loaded.
- Columns include `Attrition`, `Department`, and `JobRole`.
- `Attrition` contains "Yes" and "No".
- `Department` and `JobRole` are categorical.

- Calculate the attrition rate per group.
- Attrition rate = count(Attrition == 'Yes') / total_count * 100.
- Since I need to plot it by both Department and Job Role, a grouped bar chart or a faceted bar chart would be ideal.
- A common approach is to create a pivot table or group by both, calculate the mean of a boolean version of Attrition, and then plot.
- Better yet, a bar chart showing attrition rate for each Job Role within each Department.

- Convert `Attrition` to a boolean/numeric (Yes=1, No=0).
- Group by `Department` and `JobRole`.
- Calculate the mean of the numeric attrition column.
- Use Seaborn to create a bar plot.

- Library: `seaborn` and `matplotlib`.
- Chart type: Bar plot.
- X-axis: `JobRole`.
- Y-axis: `Attrition Rate (%)`.
- Hue: `Department` (to see the split).
- Since `JobRole` has many categories, a vertical bar plot might get crowded; a horizontal one or a faceted one might be better. However, a standard bar plot with a good figure size is usually fine.

*Wait*, the user asked for "attrition rate by department and job role". This could be interpreted as two separate plots or one combined plot. A combined plot showing Job Role on the x-axis and using `hue='Department'` would show how roles are distributed across departments and their attrition. Actually, roles are usually specific to departments. A faceted plot (one subplot per department) showing job roles would be the most readable.

Let's refine:
1. Map `Attrition` "Yes" to 1 and "No" to 0.
2. Group by `Department` and `JobRole` and calculate the mean.
3. Plot using seaborn.

Actually, for a cleaner look, I'll create a summary dataframe first.

*Correction*: To make the visualization clear, a bar chart where the x-axis is the `JobRole` and the y-axis is the `Attrition Rate`, grouped by `Department` (hue), is a good start. Or, since Job Roles are essentially unique to departments (or few), I'll just group and plot.

*Decision*:
- Compute attrition rate: `df.groupby(['Department', 'JobRole'])['Attrition'].apply(lambda x: (x == 'Yes').mean() * 100).reset_index(name='AttritionRate')`.
- Plot using `sns.barplot`.

*Constraint check*:
- One runnable code block.
- Seaborn by default.
- One chart.

Let's check the columns again. `Department` and `JobRole` are present.

Plan:
- Convert `Attrition` to numeric for easy averaging.
- Create a grouped dataframe.
- Plot a bar chart.

Wait, a better way to visualize this is to have `JobRole` on the X axis and use `Department` as a hue, or better, a faceted plot if there are many roles. Let's go with a bar chart and adjust the figure size for readability.

Wait, if I just use `sns.barplot(data=df, x='JobRole', y='AttritionNumeric', hue='Department')`, seaborn handles the aggregation (mean) automatically.

Let's write the code.

To visualize the attrition rate by department and job role, I will first convert the Attrition column into a numeric format (1 for "Yes" and 0 for "No"). Then, I'll use a bar plot to show the average attrition rate across different job roles, colored by department.

Plot attrition by job role
import seaborn as sns
import matplotlib.pyplot as plt

# Convert Attrition to numeric for easy averaging (mean = rate)
df['AttritionNumeric'] = df['Attrition'].map({'Yes': 1, 'No': 0})

plt.figure(figsize=(12, 6))
sns.barplot(data=df, x='JobRole', y='AttritionNumeric', hue='Department')

plt.title('Attrition Rate by Job Role and Department')
plt.ylabel('Attrition Rate (Proportion)')
plt.xlabel('Job Role')
plt.xticks(rotation=45)
plt.legend(title='Department', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
Thinking

The user wants to "plot attrition rate by department and job role". In [cell-3], the code used sns.barplot with x='JobRole', y='AttritionNumeric', and hue='Department'. The output [output-3] shows <Figure size 1200x600 with 1 Axes>, confirming the plot was generated.

According to the "Sufficiency heuristics": "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart."

The current state is that we have a visual representation, but no concrete numeric table showing the exact attrition rates per department and job role. To provide a high-quality analytical answer, I need the actual numbers behind the bars.

Path B is required to generate the numeric summary table.

Plan:

  1. Extract the attrition rates grouped by Department and JobRole.
  2. Display this summary table.

The plot provides a visual overview of attrition, but to accurately identify which roles and departments have the highest attrition rates, we need the underlying numeric values.

Calculate attrition rates
# Calculate the actual attrition rates by Department and Job Role
attrition_summary = df.groupby(['Department', 'JobRole'])['AttritionNumeric'].mean().reset_index()
attrition_summary = attrition_summary.rename(columns={'AttritionNumeric': 'AttritionRate'})
attrition_summary.sort_values(by='AttritionRate', ascending=False)
    Department              JobRole                    AttritionRate
10  Sales                   Sales Representative       0.397590
3   Research & Development  Laboratory Technician      0.239382
0   Human Resources         Human Resources            0.230769
9   Sales                   Sales Executive            0.174847
7   Research & Development  Research Scientist         0.160959
5   Research & Development  Manufacturing Director     0.068966
2   Research & Development  Healthcare Representative  0.068702
4   Research & Development  Manager                    0.055556
8   Sales                   Manager                    0.054054
6   Research & Development  Research Director          0.025000
1   Human Resources         Manager                    0.000000
Thinking

The user wants to "plot attrition rate by department and job role".

- Cell 1: Data loaded.
- Cell 2: Overall attrition rate calculated (~16.12%).
- Cell 3: A bar plot showing attrition rate by `JobRole` and `Department` was created.
- Cell 4: A summary table showing the actual `AttritionRate` by `Department` and `JobRole`, sorted in descending order.

- The original goal was to "plot attrition rate by department and job role".
- Cell 3 provided the plot.
- Cell 4 provided the exact numeric values associated with that plot.

- The user has both the visualization and the supporting data.
- The highest attrition is seen among Sales Representatives in the Sales department (~39.76%), followed by Laboratory Technicians in R&D (~23.94%) and HR personnel in Human Resources (~23.08%).
- Managers across all departments show significantly lower attrition rates.

- Path A is appropriate because the original goal (plot) was achieved and supported by the numeric summary in the last step.

- Provide the final answer based on the observed outputs.
- Ensure no Python code is included.
- Include `<!--final answer-->`.

The attrition rate varies significantly across departments and job roles. The Sales department has the highest attrition, specifically among Sales Representatives, who exhibit an attrition rate of approximately 39.76%. This is followed by Laboratory Technicians in Research & Development (23.94%) and the Human Resources role within the Human Resources department (23.08%). In contrast, Manager roles across all departments show much lower attrition rates, with the Manager in Human Resources having 0% attrition.
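
The run reports rates per group but not group sizes, so an extreme value such as 0% could rest on only a few employees. A minimal sketch of one way to check this is to report the employee count next to each rate. The `toy` frame below is made-up illustration data, and the `Employees` column name is an assumption.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "Sales", "Human Resources", "Human Resources"],
    "JobRole": ["Sales Representative", "Sales Representative", "Manager", "Manager",
                "Manager", "Human Resources"],
    "Attrition": ["Yes", "No", "No", "No", "No", "Yes"],
})
toy["AttritionNumeric"] = toy["Attrition"].map({"Yes": 1, "No": 0})

# Named aggregation: the mean of the 0/1 column is the rate, count is the group size.
summary = (
    toy.groupby(["Department", "JobRole"])["AttritionNumeric"]
    .agg(AttritionRate="mean", Employees="count")
    .reset_index()
    .sort_values("AttritionRate", ascending=False)
)
print(summary)
```

Reading the rate together with the count makes it easier to judge whether a very high or very low rate reflects a pattern or just a small group.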

compare monthly income distribution for employees who left vs stayed
Thinking

  • Current goal: Compare monthly income distribution for employees who left vs stayed.
  • Notebook state:
    • Dataset df is loaded.
    • Attrition (Yes/No) and AttritionNumeric (1/0) columns exist.
    • MonthlyIncome column is available (seen in df.columns in cell-1).
  • The user wants to "compare monthly income distribution".
  • Appropriate visualization for distribution comparison is usually a boxplot, violin plot, or a KDE plot (histogram).
  • Given the goal, a boxplot or a KDE plot split by Attrition is standard.
  • Use seaborn and matplotlib.
  • Plot MonthlyIncome on the y-axis and Attrition on the x-axis (or vice versa).
  • A boxplot is excellent for seeing medians and quartiles.
  • A KDE plot is better for seeing the density/shape of the distribution.
  • I'll start with a boxplot as it's the clearest way to compare central tendencies and spread.
  • sns.boxplot(data=df, x='Attrition', y='MonthlyIncome')
  • Add title, labels.

To compare the monthly income distribution between employees who left and those who stayed, a boxplot is an effective way to visualize the median, quartiles, and potential outliers for both groups.

Plot income by attrition
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='Attrition', y='MonthlyIncome', palette='Set2')

plt.title('Monthly Income Distribution by Attrition Status')
plt.xlabel('Attrition')
plt.ylabel('Monthly Income (USD)')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
/var/folders/pp/x90pfm0140d4glxshn77bthh0000gn/T/ipykernel_6739/969618432.py:2: FutureWarning: 

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.

  sns.boxplot(data=df, x='Attrition', y='MonthlyIncome', palette='Set2')
Thinking

The user goal is to "compare monthly income distribution for employees who left vs stayed". Looking at the notebook state:

  • Cell 5 generated a boxplot of MonthlyIncome by Attrition.
  • This is a visual output. According to the "Sufficiency heuristics": "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first and print the values used in the chart."

The current output is a figure. I don't have the actual numeric summaries (median, quartiles, mean) to provide a precise comparison in the final answer.

Therefore, I should choose Path B. I need to calculate the descriptive statistics (mean, median, 25th, 75th percentiles) of MonthlyIncome grouped by Attrition.

Plan:

  1. Extract insights from the boxplot (employees who left generally have lower monthly incomes).
  2. Provide a code block to calculate these statistics to be precise.

The boxplot indicates that employees who left generally had lower monthly incomes than those who stayed, with the median income for the "Yes" attrition group being noticeably lower than for the "No" group.

To provide a precise comparison, I will calculate the summary statistics for both groups.

Summarize income by attrition
df.groupby('Attrition')['MonthlyIncome'].describe()
            count         mean          std     min     25%     50%     75%      max
Attrition
No         1233.0  6832.739659  4818.208001  1051.0  3211.0  5204.0  8834.0  19999.0
Yes         237.0  4787.092827  3640.210367  1009.0  2373.0  3202.0  5916.0  19859.0
Thinking

The user wants to compare monthly income distribution for employees who left vs stayed.

Looking at the notebook state:

  • Cell 5 generated a boxplot of MonthlyIncome by Attrition.
  • Cell 6 generated the descriptive statistics (describe()) for MonthlyIncome grouped by Attrition.

The results from describe() are:

  • For employees who stayed (No):
    • Mean: 6832.74 USD
    • Median (50%): 5204.0 USD
    • 75th Percentile: 8834.0 USD
  • For employees who left (Yes):
    • Mean: 4787.09 USD
    • Median (50%): 3202.0 USD
    • 75th Percentile: 5916.0 USD

The comparison is clear:

  1. Employees who left had a significantly lower average monthly income (4787.09 USD vs 6832.74 USD).
  2. The median income for those who left (3202.0 USD) is much lower than for those who stayed (5204.0 USD).
  3. Even the top 25% threshold for those who left (5916.0 USD) is barely above the median for those who stayed.

The goal was to "compare monthly income distribution". I have both a visual distribution (boxplot) and the numeric summary (describe). This is sufficient to answer the goal.

Path A: Final answer.

Currency formatting: use USD.

Answer components:

  • Insight from boxplot: lower median and overall lower distribution for those who left.
  • Insight from numbers: specific means and medians.
  • Conclusion: Lower monthly income is associated with higher attrition.

Employees who left the company had significantly lower monthly incomes compared to those who stayed. The median monthly income for employees who left was 3202.0 USD, while those who stayed had a much higher median of 5204.0 USD.

This trend is further supported by the mean values, where employees who stayed earned an average of 6832.74 USD compared to 4787.09 USD for those who left. Additionally, the 75th percentile for employees who left (5916.0 USD) is only slightly above the median for those who stayed, indicating that a large majority of employees who left were in the lower income brackets.
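
The comparison above can also be expressed as explicit gaps between the two groups. The sketch below is one possible way to do that, using a made-up `toy` frame (not the real dataset) and illustrative column names `q25`, `median`, `q75`, and `iqr`.

```python
import pandas as pd

# Made-up incomes for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Attrition": ["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
    "MonthlyIncome": [3000, 5000, 7000, 9000, 2000, 3000, 4000, 7000],
})

# Quartiles per group, reshaped so each quantile becomes a column.
quartiles = (
    toy.groupby("Attrition")["MonthlyIncome"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
quartiles.columns = ["q25", "median", "q75"]
quartiles["iqr"] = quartiles["q75"] - quartiles["q25"]

# Gap between the median income of stayers ("No") and leavers ("Yes").
median_gap = quartiles.loc["No", "median"] - quartiles.loc["Yes", "median"]
print(quartiles)
print(f"Median gap: {median_gap}")
```

Applied to the figures in the describe() table from this run, the same subtraction gives a median gap of 5204 - 3202 = 2002 USD.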

show correlation between job satisfaction, work-life balance and attrition
Thinking

The user wants to show the correlation between JobSatisfaction, WorkLifeBalance, and Attrition.

- `df` is already loaded.
- `Attrition` is a categorical column ('Yes', 'No').
- `AttritionNumeric` was created in `cell-3` (`{'Yes': 1, 'No': 0}`).
- `JobSatisfaction` and `WorkLifeBalance` are likely numeric (ordinal) columns in this dataset.

- To calculate correlation, I need numeric values.
- I should use `AttritionNumeric` for the target.
- I can use a correlation matrix (Pearson correlation) and visualize it with a heatmap.

- Select the relevant columns: `JobSatisfaction`, `WorkLifeBalance`, and `AttritionNumeric`.
- Compute the correlation matrix.
- Plot using `sns.heatmap`.

- One runnable code block.
- No multiple blocks.
- No assumptions.
- Clear labels and titles.

To analyze the correlation between job satisfaction, work-life balance, and attrition, I will use the AttritionNumeric column created earlier and calculate the correlation matrix for these specific variables.

Plot correlation heatmap
# Select relevant columns for correlation analysis
corr_cols = ['JobSatisfaction', 'WorkLifeBalance', 'AttritionNumeric']
correlation_matrix = df[corr_cols].corr()

# Visualize the correlation matrix with a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)

plt.title('Correlation: Job Satisfaction, Work-Life Balance, and Attrition')
plt.show()
Thinking

The user wants to see the correlation between JobSatisfaction, WorkLifeBalance, and Attrition. Looking at the notebook state:

  • Cell 7 executed code that calculates the correlation matrix for ['JobSatisfaction', 'WorkLifeBalance', 'AttritionNumeric'] and plots a heatmap.
  • However, [output-7] is just <Figure size 800x600 with 2 Axes>. I cannot see the actual numbers (coefficients) from the image placeholder.
  • Per the sufficiency heuristics: "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart."

The goal is to show correlation. While a heatmap was plotted, I don't have the numeric values to provide a concrete insight. I need to print the correlation_matrix to get the actual numbers.

Plan:

  1. Identify that the heatmap was created but the values are not visible in the text output.
  2. Choose Path B.
  3. Provide a code block to print the numeric correlation matrix.

The heatmap was generated to visualize the relationships, but the specific correlation coefficients are not visible in the current output. To provide a precise analysis of how job satisfaction and work-life balance relate to attrition, I need to see the numeric values.

Compute correlation matrix
# Display the numeric correlation matrix for precise analysis
correlation_matrix = df[['JobSatisfaction', 'WorkLifeBalance', 'AttritionNumeric']].corr()
correlation_matrix
                  JobSatisfaction  WorkLifeBalance  AttritionNumeric
JobSatisfaction          1.000000        -0.019459         -0.103481
WorkLifeBalance         -0.019459         1.000000         -0.063939
AttritionNumeric        -0.103481        -0.063939          1.000000
Thinking

The user wants to see the correlation between JobSatisfaction, WorkLifeBalance, and Attrition. The notebook state shows that the correlation matrix has been calculated and displayed in [output-8].

The values are:

  • JobSatisfaction vs AttritionNumeric: -0.103481
  • WorkLifeBalance vs AttritionNumeric: -0.063939
  • JobSatisfaction vs WorkLifeBalance: -0.019459

Insights:

  1. JobSatisfaction has a negative correlation with AttritionNumeric (-0.10), meaning that as job satisfaction increases, the likelihood of attrition tends to decrease.
  2. WorkLifeBalance also has a negative correlation with AttritionNumeric (-0.06), indicating that better work-life balance is associated with lower attrition, though the relationship is weaker than that of job satisfaction.
  3. There is almost no correlation between JobSatisfaction and WorkLifeBalance (-0.02).

The goal was to "show correlation". This has been done both visually (heatmap in output 7) and numerically (matrix in output 8).

Since both the visualization and the numeric values are present, Path A is appropriate.

Final Answer formulation:

  • State the correlation between Job Satisfaction and Attrition.
  • State the correlation between Work-Life Balance and Attrition.
  • Mention the strength/direction (negative correlation).
  • Conclude that Job Satisfaction has a stronger inverse relationship with attrition than Work-Life Balance does.

Based on the correlation matrix, both Job Satisfaction and Work-Life Balance have a negative correlation with attrition, meaning that higher scores in these areas are associated with a lower likelihood of an employee leaving.

  • Job Satisfaction: Shows a correlation of -0.10 with attrition. This is the stronger of the two relationships, suggesting that as satisfaction increases, attrition tends to decrease.
  • Work-Life Balance: Shows a correlation of -0.06 with attrition. While still negative, this relationship is weaker than that of job satisfaction.
  • Inter-relationship: There is almost no correlation (-0.02) between Job Satisfaction and Work-Life Balance, indicating that an employee's satisfaction with their job is largely independent of their perceived work-life balance.
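
Because AttritionNumeric is a 0/1 column, its Pearson correlation with a rating is the point-biserial correlation, and its sign matches the difference in group means. The sketch below illustrates this on a made-up `toy` frame (not the real dataset), checking the pandas result against `numpy.corrcoef`.

```python
import numpy as np
import pandas as pd

# Made-up ratings for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "JobSatisfaction": [1, 2, 3, 4, 1, 2, 3, 4],
    "Attrition": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})
toy["AttritionNumeric"] = toy["Attrition"].map({"Yes": 1, "No": 0})

# Pearson correlation between the rating and the 0/1 attrition flag.
pearson = toy["JobSatisfaction"].corr(toy["AttritionNumeric"])

# Independent check with NumPy's correlation matrix.
manual = np.corrcoef(toy["JobSatisfaction"], toy["AttritionNumeric"])[0, 1]

# Group means: a negative correlation means leavers have the lower mean rating.
means = toy.groupby("Attrition")["JobSatisfaction"].mean()

print(f"pearson={pearson:.3f}, manual={manual:.3f}")
print(means)
```

In the toy data the leavers have the lower mean rating, so the coefficient is negative, which is the same direction as the -0.10 and -0.06 values reported in this run.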

Score cards for gemma4:31b for HR Employee Attrition Analysis in Python

Total score: 10/10

  • Task Completion: 2/2

    Excellent

    All four requested analyses were completed: dataset load and attrition rate, department/job role plot, income comparison, and correlation heatmap.

  • Execution Correctness: 2/2

    Excellent

    The code shown is syntactically coherent and the outputs indicate it ran successfully, including the expected shape, rate, plots, and summary tables.

  • Output Quality: 3/3

    Excellent

    Outputs match the expected outcomes semantically: shape (1470, 35), attrition rate 16.12%, leavers earn less on average, and the correlation matrix supports the stated relationship. The department/job role summary also identifies Sales as highest and Research & Development as lowest among major groups.

  • Reasoning Quality: 2/2

    Excellent

    The workflow uses appropriate EDA steps and correctly interprets the numeric summaries and correlations. Explanations are clear and aligned with the outputs.

  • Reliability: 1/1

    Excellent

    The workflow is consistent and data-driven, with no obvious hallucinations or unsupported claims.
