From Raw Data to Real Insights — My First HR Data Analysis Project

I’m Sandip Subedi, a science student focused on building real, long-term skills in Python, Machine Learning, and modern web technologies. I believe in learning fundamentals deeply rather than rushing through shortcuts or tutorials. My approach is simple: understand how systems work, practice consistently, and document the journey through writing. I’m steadily working toward becoming an industry-ready engineer who solves real problems with clarity and intent.
What is this project about?
I analysed an IBM HR dataset of 1,470 employees to answer one question: which department is losing the most people — and why does the answer depend on HOW you look at the data?
Dataset: IBM HR Analytics Employee Attrition (Kaggle)
Tools: Python, Pandas, Matplotlib, Seaborn
Step 1 — Loading and exploring the data
The dataset had 1,470 rows and 35 columns covering employee age, department, salary, job role, and whether they left the company (Attrition: Yes/No).
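Here is a minimal sketch of what this first look can involve, not the exact code from the project. The three rows below are illustrative stand-ins so the snippet runs on its own. With the real Kaggle CSV you would pass the file path to pd.read_csv instead, and df.shape should then report (1470, 35).

```python
import io
import pandas as pd

# Stand-in for the Kaggle CSV: a few illustrative rows with some of the column names.
# Assumption: with the real file you would call pd.read_csv("<path to the CSV>").
sample_csv = io.StringIO(
    "Age,Attrition,Department,MonthlyIncome\n"
    "41,Yes,Sales,5993\n"
    "49,No,Research & Development,5130\n"
    "37,Yes,Research & Development,2090\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)         # (rows, columns)
print(df.dtypes)        # type of each column
print(df.isna().sum())  # missing values per column
print(df.head())        # first few rows
```

Checking shape, dtypes, and missing values before anything else gives a quick picture of what needs cleaning.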
Step 2 — Cleaning the data
No missing values — this is a synthetic dataset made by IBM. But I still found 3 useless columns where every single row had the same value: EmployeeCount, Over18, StandardHours.
These tell us nothing, so I dropped them.
Key lesson: data cleaning is not just about missing values.
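One way to find such columns is to count the distinct values in each one. The sketch below uses a small toy frame (the rows are placeholders, only the three column names come from the dataset) and is an assumption about how the check could be written, not necessarily how the project did it.

```python
import pandas as pd

# Toy frame: two informative columns plus the three constant ones named above.
df = pd.DataFrame({
    "Age": [41, 49, 37],
    "Department": ["Sales", "Research & Development", "Research & Development"],
    "EmployeeCount": [1, 1, 1],
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
})

# A column with a single distinct value carries no information.
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) == 1]
print(constant_cols)

# Drop them and keep the rest.
df_clean = df.drop(columns=constant_cols)
print(df_clean.columns.tolist())
```

Detecting the columns programmatically, rather than hard-coding their names, means the same check works on any dataset.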
Step 3 — Exploration
237 out of 1,470 employees left, a 16.1% attrition rate. The company has three departments: Sales (446 employees), R&D (961), and HR (63).
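The arithmetic behind these numbers can be sketched as follows. To keep the snippet self-contained, it rebuilds the Attrition column from the counts quoted in this post instead of reading the CSV. On the real DataFrame the same value_counts call would apply to df["Attrition"].

```python
import pandas as pd

# Rebuild the Attrition column from the counts in this post (1,233 stayed, 237 left).
attrition = pd.Series(["No"] * 1233 + ["Yes"] * 237, name="Attrition")

counts = attrition.value_counts()
overall_rate = round(counts["Yes"] / len(attrition) * 100, 1)
print(counts)
print(f"Overall attrition rate: {overall_rate}%")

# Department headcounts as quoted above, and each department's share of the workforce.
dept_sizes = pd.Series({"Sales": 446, "R&D": 961, "HR": 63}, name="headcount")
dept_share = (dept_sizes / dept_sizes.sum() * 100).round(1)
print(dept_share.sort_values(ascending=False))
```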
Step 4 — Visualizations
— Attrition count: 1,233 stayed, 237 left
— R&D is by far the biggest department
— But Sales has the HIGHEST attrition RATE at 20.6%
This was the biggest insight of the project. R&D looks worst by raw count (133 left) — but it's also the biggest department. When you calculate the actual rate, Sales is the real problem.
Raw counts vs rates tell completely different stories.
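The count-versus-rate comparison can be sketched with a groupby. The frame below is rebuilt from summary numbers so the snippet runs without the CSV. Only the R&D figure (133 left) is quoted directly in this post. The Sales and HR figures (92 and 12) are derived assumptions: 20.6% of 446 is about 92, and 237 - 133 - 92 leaves 12.

```python
import pandas as pd

# (department, headcount, number who left); 92 and 12 are derived, see the note above.
summary = [
    ("Research & Development", 961, 133),
    ("Sales", 446, 92),
    ("Human Resources", 63, 12),
]

# Expand the summary into one row per employee, like the real dataset.
rows = []
for dept, total, left in summary:
    rows += [(dept, "Yes")] * left + [(dept, "No")] * (total - left)
df = pd.DataFrame(rows, columns=["Department", "Attrition"])

# Count leavers, headcount, and the share of leavers per department.
by_dept = (
    df.assign(left=df["Attrition"].eq("Yes"))
      .groupby("Department")["left"]
      .agg(["sum", "count", "mean"])
      .rename(columns={"sum": "left", "count": "headcount", "mean": "rate"})
)
by_dept["rate_pct"] = (by_dept["rate"] * 100).round(1)
print(by_dept.sort_values("rate_pct", ascending=False))

print("Most leavers:", by_dept["left"].idxmax())
print("Highest rate:", by_dept["rate_pct"].idxmax())
```

The two idxmax calls return different departments, which is the whole point: the ranking changes depending on whether you sort by count or by rate.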
Age ranges from 18 to 60, with an average of 37. Monthly income ranges from 1,009 to 19,999 USD.
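To put the count and the rate side by side, a plot along these lines could work. This is a sketch using plain Matplotlib, not the project's actual charts. The Sales and HR leaver counts (92 and 12) are derived from the figures in this post, not quoted in it, and the Agg backend is chosen only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Headcount and leavers per department (92 and 12 are derived assumptions).
dept = pd.DataFrame(
    {"headcount": [961, 446, 63], "left": [133, 92, 12]},
    index=["R&D", "Sales", "HR"],
)
dept["rate_pct"] = dept["left"] / dept["headcount"] * 100

fig, (ax_count, ax_rate) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: raw number of employees who left.
ax_count.bar(dept.index, dept["left"])
ax_count.set_title("Employees who left (count)")
ax_count.set_ylabel("Employees")

# Right panel: the same departments, as a percentage of their own headcount.
ax_rate.bar(dept.index, dept["rate_pct"])
ax_rate.set_title("Attrition rate (%)")
ax_rate.set_ylabel("Percent of department")

fig.tight_layout()
```

Calling fig.savefig with a file name of your choice writes the figure to disk. Seeing both panels next to each other makes it obvious that the tallest bar on the left is not the tallest bar on the right.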
Key Findings
16.1% overall attrition rate
Sales: 20.6% rate — highest in the company
R&D: 13.8% rate — lowest despite most people leaving
Wide income gap across the workforce: monthly income ranges from 1,009 to 19,999 USD
What I Learned
This was my first full data analysis project. The most important thing I learned had nothing to do with code — it was that raw counts and percentage rates can tell completely opposite stories. Always ask which one you need before making a chart.
Full code on: GitHub
Get in touch: Instagram, Facebook

