From Raw Data to Real Insights — My First HR Data Analysis Project


I’m Sandip Subedi, a science student focused on building real, long-term skills in Python, Machine Learning, and modern web technologies. I believe in learning fundamentals deeply rather than rushing through shortcuts or tutorials. My approach is simple: understand how systems work, practice consistently, and document the journey through writing. I’m steadily working toward becoming an industry-ready engineer who solves real problems with clarity and intent.

What is this project about?

I analysed an IBM HR dataset of 1,470 employees to answer one question: which department is losing the most people, and why does the answer depend on how you look at the data?

Dataset: IBM HR Analytics Employee Attrition (Kaggle)

Tools: Python, Pandas, Matplotlib, Seaborn


Step 1 — Loading and exploring the data

The dataset had 1,470 rows and 35 columns covering employee age, department, salary, job role, and whether they left the company (Attrition: Yes/No).
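A minimal sketch of this loading step, assuming the Kaggle CSV has been downloaded locally. The file name in `CSV_PATH` is an assumption about what the download is called, and `load_hr_data` and `quick_overview` are hypothetical helpers written for illustration, not the project's actual code. A tiny in-memory sample stands in for the real file so the sketch runs on its own.

```python
import io

import pandas as pd

# Assumed file name for the Kaggle download; adjust to your local copy.
CSV_PATH = "WA_Fn-UseC_-HR-Employee-Attrition.csv"


def load_hr_data(source):
    """Read the HR CSV from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


def quick_overview(df):
    """Return basic facts worth checking right after loading."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_values": int(df.isna().sum().sum()),
        "attrition_values": sorted(df["Attrition"].unique()),
    }


# Tiny made-up sample so the sketch runs without the real file.
sample_csv = io.StringIO(
    "Age,Department,MonthlyIncome,Attrition\n"
    "41,Sales,5993,Yes\n"
    "49,Research & Development,5130,No\n"
    "37,Research & Development,2090,Yes\n"
)
sample = load_hr_data(sample_csv)
overview = quick_overview(sample)
print(overview)
```

With the real file, `quick_overview(load_hr_data(CSV_PATH))` should report 1,470 rows and 35 columns if the download matches the dataset described here.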


Step 2 — Cleaning the data

There were no missing values, which is expected because this is a synthetic dataset made by IBM. But I still found 3 useless columns where every single row had the same value: EmployeeCount, Over18, StandardHours.

These tell us nothing, so I dropped them.

Key lesson: data cleaning is not just about missing values.
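One way to find such columns is to count the distinct values in each one and flag any column that has only a single value. The sketch below assumes that approach; `constant_columns` is a hypothetical helper, and the small DataFrame uses the article's column names with made-up values.

```python
import pandas as pd


def constant_columns(df):
    """Names of columns where every row holds the same value."""
    return [col for col in df.columns if df[col].nunique(dropna=False) == 1]


# Small stand-in frame; column names follow the article, values are made up.
df = pd.DataFrame(
    {
        "Age": [41, 49, 37],
        "EmployeeCount": [1, 1, 1],
        "Over18": ["Y", "Y", "Y"],
        "StandardHours": [80, 80, 80],
        "Attrition": ["Yes", "No", "Yes"],
    }
)

to_drop = constant_columns(df)
cleaned = df.drop(columns=to_drop)
print(to_drop)
print(list(cleaned.columns))
```

Detecting the columns programmatically, rather than typing their names by hand, means the same check can be reused on other datasets.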


Step 3 — Exploration

237 out of 1,470 employees left — a 16.1% attrition rate. Three departments: Sales (446), R&D (961), HR (63).
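The figures above can be double-checked with a few lines of arithmetic. This sketch uses only the numbers quoted in this post; the variable names are illustrative.

```python
# Figures quoted in the article.
total_employees = 1470
left = 237
department_sizes = {"Sales": 446, "R&D": 961, "HR": 63}

stayed = total_employees - left
attrition_rate = left / total_employees * 100

# Sanity check: the three departments should account for everyone.
assert sum(department_sizes.values()) == total_employees

# Share of the workforce in each department, as a percentage.
department_share = {
    name: round(size / total_employees * 100, 1)
    for name, size in department_sizes.items()
}

print(stayed)
print(round(attrition_rate, 1))
print(department_share)
```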


Step 4 — Visualizations

  • Attrition count: 1,233 stayed, 237 left

  • R&D is by far the biggest department

  • But Sales has the HIGHEST attrition RATE at 20.6%

This was the biggest insight of the project. R&D looks worst by raw count (133 left) — but it's also the biggest department. When you calculate the actual rate, Sales is the real problem.

Raw counts vs rates tell completely different stories.
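The count-versus-rate comparison can be sketched with a pandas groupby. The headcounts and the R&D leaver count (133) come from this post. The Sales (92) and HR (12) leaver counts are not stated here; they are back-calculated from the 20.6% Sales rate and the 237 total, so treat them as assumptions. The frame is rebuilt from those counts instead of being loaded from the real file.

```python
import pandas as pd

# (department, headcount, number who left)
# Headcounts and the R&D leaver count come from the article; the Sales and HR
# leaver counts are back-calculated assumptions (see the note above).
counts = [("Sales", 446, 92), ("R&D", 961, 133), ("HR", 63, 12)]

# Rebuild one row per employee so the groupby mirrors work on the real data.
rows = []
for dept, size, n_left in counts:
    rows += [{"Department": dept, "Attrition": "Yes"}] * n_left
    rows += [{"Department": dept, "Attrition": "No"}] * (size - n_left)
df = pd.DataFrame(rows)

# Boolean flag: True for employees who left.
df["Left"] = df["Attrition"].eq("Yes")
grouped = df.groupby("Department")["Left"]

summary = pd.DataFrame(
    {
        "left_count": grouped.sum(),                  # raw count of leavers
        "headcount": grouped.count(),                 # department size
        "rate_pct": (grouped.mean() * 100).round(1),  # attrition rate in %
    }
)

print(summary.sort_values("left_count", ascending=False))
print(summary["left_count"].idxmax())  # department with the most leavers
print(summary["rate_pct"].idxmax())    # department with the highest rate
```

Under these assumed counts, the two `idxmax` calls name different departments, which is the whole point: the ranking changes once department size is taken into account.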

  • Age ranges from 18 to 60, with an average of 37

  • Monthly income ranges from 1,009 to 19,999 USD


Key Findings

  • 16.1% overall attrition rate

  • Sales: 20.6% rate — highest in the company

  • R&D: 13.8% rate — lowest despite most people leaving

  • Large salary inequality across the workforce: income ranges from 1,009 to 19,999 USD


What I Learned

This was my first full data analysis project. The most important thing I learned had nothing to do with code — it was that raw counts and percentage rates can tell completely opposite stories. Always ask which one you need before making a chart.

Full code on: GitHub

Get in touch: Instagram, Facebook