What’s Driving These Results? A Data Scientist’s Guide to Root Cause Analysis
Published: April 8, 2025
By Amy Humke, Ph.D.
Founder, Critical Influence
When someone first asked me to do a "root cause analysis," my imposter syndrome kicked in. I had spent years studying statistics and research methods, yet this wasn't a phrase I had encountered. At first, I thought root cause analysis referred to a specific algorithm that could uncover causal drivers. Then, I realized it's actually a structured process used in business and operations to understand why something happened. You might know it better as the "5 Whys," a Fishbone Diagram, a Pareto Chart, or an Is/Is Not analysis. These are all tools that help isolate and explain the root of a problem as a thought exercise.
What my stakeholders wanted me to do was take the root cause analysis thought process one step further and use statistics, algorithms, or machine learning to automate the root cause identification process.
The question "What is driving these results?" comes up all the time. And maybe that's because it's the one thing that often doesn't get answered, precisely because it's so hard to answer confidently. As data scientists, we're trained to be cautious. We resist drawing conclusions without solid evidence, and for good reason. We know the data is messy and incomplete, and the analysis results are far from certain. But stakeholders still need direction. They need a narrative that explains what changed and why, even when the evidence is imperfect.
So, let's walk through some methods and strategies for answering "why" more confidently. In this article, I suggest several techniques to analyze the data, uncover drivers, and acknowledge the uncertainty that comes with complex data.
Step One: Prepare and Build a Dataset and Workflow with Flexibility in Mind
When you know the drivers will change with every quarter, every data refresh, or every business pivot, you need a structure that can adapt, flex, and scale. That means:
- Building a reusable framework
- Making your models segment-aware
- Preparing to handle ad hoc or one-off drivers without starting from scratch
- Building in space for uncertainty (e.g., actually putting an "Unknown" bar in your delivered chart)
If you're familiar with your data or have access to prior reporting, you likely have some ideas of the usual suspects for primary drivers. Additionally, your organization likely has standard segmentations typically used to disaggregate the data—that’s your starting point.
Discuss business process changes with colleagues and brainstorm recent economic or environmental changes that might impact your data. Some of these factors may be typical, and others may be new data you need to collect. Ideally, establish a recurring check-in (meeting, direct message, or reporting form) where stakeholders can communicate new drivers they believe could impact data now or in the future. Explaining the why behind the data is nearly impossible without this context.
- Build your dataset with the future in mind.
- Standardize your data features to minimize the burden of adding new data.
- Leave room for the unknown—your final model should never over-explain. Some uncertainty is inevitable and should be acknowledged in the final analysis.
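One way to keep new drivers cheap to add is to store them in a long format, where each driver is a row rather than a column. The sketch below is only an illustration under that assumption. The field names (`period`, `driver`, `segment`, `value`) and the sample values are hypothetical, not taken from any real dataset.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per (period, driver, segment).
# Adding a new driver means appending rows, not changing the schema.
rows = [
    {"period": "2025-01", "driver": "price_change", "segment": "West", "value": -120.0},
    {"period": "2025-01", "driver": "price_change", "segment": "East", "value": -80.0},
    {"period": "2025-01", "driver": "promo", "segment": "West", "value": 50.0},
    {"period": "2025-02", "driver": "price_change", "segment": "West", "value": -60.0},
]


def pivot_by_period(records):
    """Sum driver values per period into {period: {driver: total}}."""
    table = defaultdict(lambda: defaultdict(float))
    for r in records:
        table[r["period"]][r["driver"]] += r["value"]
    return {period: dict(drivers) for period, drivers in table.items()}


# A one-off driver reported by a stakeholder is just one more row.
with_new_driver = rows + [
    {"period": "2025-02", "driver": "new_competitor", "segment": "East", "value": -30.0}
]
summary = pivot_by_period(with_new_driver)
```

Because the summary function never names a specific driver, it does not need to change when a stakeholder reports something new.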
Step Two: Explore the Data with Different Analysis Techniques
A data science approach to root cause analysis isn't about relying on one specific model; it's about using multiple techniques to understand what's driving changes in your trend and then putting it all together to tell a story.
Time Series Decomposition
Break down your trend into three components:
- Trend: Long-term direction—up, down, or steady.
- Seasonality: Regular, repeating patterns or cycles.
- Residuals (Noise): What remains after trend and seasonality are removed. Large residuals are potential anomalies worth investigating.
Decomposition Models:
- Classical Decomposition: Simple and intuitive, ideal for stable seasonal patterns.
- STL (Seasonal and Trend decomposition using Loess): Flexible and, with its robust fitting option, resistant to outliers; well suited to evolving seasonal behavior.
- Singular Spectrum Analysis (SSA): Great when seasonality isn’t tied to a regular calendar schedule.
- X-13ARIMA-SEATS: Advanced seasonal adjustment, ideal for handling calendar shifts and outliers.
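To make the three components concrete, here is a minimal sketch of a classical additive decomposition written with the Python standard library only. It assumes an additive model (observation = trend + seasonality + residual), a known cycle length, and at least two full cycles of data. The function names and the sample series are made up for illustration. For real work, a tested library implementation is a better choice than hand-rolled code.

```python
def centered_moving_average(y, period):
    """Trend estimate: centered moving average (2 x m average when period is even)."""
    half = period // 2
    trend = [None] * len(y)  # the ends of the series have no full window
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half weight on the two end points of the window.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def classical_decompose(y, period):
    """Return (trend, seasonal, residual) lists for an additive model."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full cycles of data")
    trend = centered_moving_average(y, period)
    # Average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center the pattern so it sums to zero
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(len(y))]
    resid = [
        None if tr is None else obs - tr - s
        for obs, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, resid


# Synthetic quarterly-style series: linear trend plus a repeating pattern.
true_pattern = [3.0, -1.0, -4.0, 2.0]
series = [10.0 + 2.0 * t + true_pattern[t % 4] for t in range(16)]
trend, seasonal, resid = classical_decompose(series, period=4)
```

On this synthetic series the recovered seasonal pattern matches the one used to build it, and the residuals are essentially zero. On real data, the residual column is where the investigation starts.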
Statistical Process Control (SPC)
Think of SPC charts as your smoke alarm. They plot your metric against control limits, typically set three standard deviations above and below a center line, so you can see when something unusual is happening. Consistent breaches signal deeper issues worth exploring.
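As a rough illustration, the sketch below computes limits for an individuals (XmR) control chart, one common SPC chart for single measurements over time. It assumes you have a baseline period you consider stable. It estimates sigma from the average moving range divided by 1.128, the standard constant for ranges of two consecutive points. The function names and sample values are hypothetical.

```python
def xmr_limits(baseline):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = sum(baseline) / len(baseline)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / 1.128  # d2 constant for ranges of size 2
    return center - 3 * sigma, center, center + 3 * sigma


def breaches(values, lcl, ucl):
    """Indexes of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical weekly metric: a stable baseline, then four new weeks.
baseline = [10, 11, 9, 10, 12, 10, 9, 11]
lcl, center, ucl = xmr_limits(baseline)
flagged = breaches([10, 15, 11, 5], lcl, ucl)
```

Here the second and fourth new points fall outside the limits. The chart says only that something changed, not why, which is where segmentation comes in.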
Comparative Analysis through Data Segmentation
Segmentation helps answer where the change is happening.
- Demographic Segmentation: Age, gender, income, etc.
- Geographic Segmentation: Region, city, or territory.
- Behavioral Segmentation: Actions like purchasing patterns or web engagement.
- Cohort Analysis: Users grouped by shared characteristics or experiences over time.
- Technographic Segmentation: Technology usage, device type, or platform preferences.
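Whichever segmentation you choose, a simple first pass is to compare each segment before and after the change and ask how much of the total movement it accounts for. The sketch below assumes an additive metric such as units or revenue, where segment totals sum to the overall total. Rates and averages need a different treatment, because shifts in segment mix also move them. The segment names and numbers are invented.

```python
def segment_contributions(before, after):
    """Return (total change, rows) where rows are (segment, delta, share of change)."""
    segments = sorted(set(before) | set(after))
    deltas = {s: after.get(s, 0.0) - before.get(s, 0.0) for s in segments}
    total = sum(deltas.values())
    rows = []
    for segment, delta in deltas.items():
        # Share is undefined when the segments net out to no change.
        share = delta / total if total else float("nan")
        rows.append((segment, delta, share))
    # Largest absolute movement first.
    rows.sort(key=lambda r: abs(r[1]), reverse=True)
    return total, rows


# Hypothetical unit sales by region for two periods.
before = {"West": 100.0, "East": 200.0, "South": 150.0}
after = {"West": 90.0, "East": 160.0, "South": 155.0}
total_change, ranked = segment_contributions(before, after)
```

A negative share, as for South in this example, means the segment moved against the overall change. That is worth showing rather than hiding, because it tells stakeholders the decline is not uniform.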
Step Three: Loosen Up, Get Creative, Make It Visual and Actionable
Once you've done the analysis, you need to tell the story. Make the drivers and their impact clear.
The challenge? These analyses often feel disconnected. But they don't have to be. Each one shines a flashlight on the same problem from a different angle. The art lies in layering the insights into a cohesive narrative: not a perfectly unified statistical model, but a structured, hypothesis-driven interpretation of what's happening.
Practical Steps:
- Align everything on a common timeline.
- Overlay decomposition signals with SPC control limits and segmented performance trends.
- Identify whether spikes or dips coincide with specific user groups or business events.
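Once everything sits on one timeline, the last step above can be partly automated by listing the known events that occurred shortly before each flagged date. The following is a minimal sketch with invented event names and an arbitrary seven-day lookback. A match is a hypothesis to investigate, not evidence of cause.

```python
from datetime import date


def events_before(flagged_dates, events, lookback_days=7):
    """Map each flagged date to the events in the lookback window before it."""
    matches = {}
    for flagged in flagged_dates:
        matches[flagged] = [
            name
            for when, name in events
            if 0 <= (flagged - when).days <= lookback_days
        ]
    return matches


# Hypothetical event log gathered from stakeholder check-ins.
events = [
    (date(2025, 3, 3), "Pricing update"),
    (date(2025, 3, 20), "Checkout outage"),
]
# Dates flagged by the control chart or by large residuals.
flagged = [date(2025, 3, 5), date(2025, 3, 12)]
candidates = events_before(flagged, events)
```

Flagged dates with no nearby event are just as informative: they are candidates for the "Unknown" category, or a prompt to ask stakeholders what the event log is missing.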
Visuals Spark Action:
- A ranked list of top contributors to the trend.
- A bar chart showing % impact by factor.
- A time series overlay with external events noted.
- A filterable dashboard view to explore segments dynamically.
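The ranked list and the percent-impact chart can share one small table. The sketch below assumes you already have an estimated contribution for each driver, from whatever method you trust, and it assigns the remainder to an explicit "Unknown" row so the chart never claims to explain more than it does. The driver names and values are placeholders.

```python
def impact_table(total_change, driver_estimates):
    """Return rows of (name, value, percent of total change), largest first."""
    if total_change == 0:
        raise ValueError("total change is zero; percent impact is undefined")
    rows = dict(driver_estimates)
    # Whatever the named drivers do not account for is reported, not hidden.
    rows["Unknown"] = total_change - sum(driver_estimates.values())
    ranked = sorted(rows.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, value, 100.0 * value / total_change) for name, value in ranked]


# Hypothetical estimates for a total change of -45 units.
table = impact_table(-45.0, {"Price increase": -25.0, "Site outage": -12.0})
for name, value, pct in table:
    print(f"{name:<15} {value:>7.1f} {pct:>6.1f}%")
```

If the "Unknown" row ends up at the top of the ranking, that is a finding in itself: the usual suspects do not explain this change.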
And if you’ve done the work to make it reusable, this framework can refresh automatically as new data arrives.
Final Thought: Root Cause Is a Mindset, Not Just a Method
Root cause analysis isn't just about finding a driver or a list of drivers—it’s about building trust. It shows that you’re not just reporting what happened; you’re helping people understand why, and what they can do about it.
I used to dread the root cause question. Now, I look forward to it. That’s where real insight happens. That’s where we move from monitoring to meaning.
So the next time someone asks, “What’s driving this?” you’ll be ready. You’ll have the structure, the segmentation, the flexibility, and the tools. And most of all, you’ll have the confidence to say, “Let’s find out.”