From Questions to Insights: A Framework for Data Exploration and EDA
Introduction: Knowing the Target Isn’t Enough
Most businesses know their targets—like increasing sales by 10%—but struggle to pinpoint the roadblocks standing in the way. This is a challenge we've all faced at some point. People often know where they want to go but aren't always clear on why they aren't there already.
When faced with this kind of challenge, jumping straight into dashboards or analysis often doesn't work. The real magic starts before we even touch the data—by asking the right questions and framing the problem in a way that leads to actionable insights.
This post is about how we can approach such challenges together. To make it concrete, let’s walk through a scenario. The VP of Sales at a company approaches us and says:
“We need to grow sales by 10%, but we’re not sure what’s holding us back. We suspect some regions and product lines aren’t performing, and a few big customers seem to have reduced their orders.”
This situation is quite common, and it's one where a structured yet flexible approach makes all the difference. Let’s explore the framework we can use to help them—one you can apply to your own challenges as well.
Framing the Problem and Asking the Right Questions
When sitting down with the VP, our first goal isn’t to jump into the data but to fully understand their perspective. We want to get a sense of their pain points, their suspicions, and what they hope to achieve.
Here are some guiding questions we can ask:
“What does success look like for you?” They explain their goal: increasing sales by 10% this year.
“What do you think might be holding you back?” They share thoughts about underperforming regions and reduced customer purchases.
“What specific areas should we focus on?” This helps identify critical dimensions to explore, like regions, product groups, and key customers.
These questions aren’t just about understanding their problem—they help us frame the analysis. By the end of the conversation, we have a clear sense of what to explore:
Are certain regions underperforming compared to last year?
Are there product groups contributing very little to sales?
Have key customers reduced their purchases?
Are there time-based patterns, like seasonal dips in sales?
These questions become our roadmap for the analysis, keeping us focused on what matters most.
Defining Assumptions to Guide Exploration
From the questions we've outlined, we can start forming hypotheses—general assumptions about what might be affecting performance. These don't need to be overly detailed; instead, they should be broad ideas guiding the key metrics and dimensions we explore.
For example:
“Some regions are underperforming compared to last year.” This tells us to focus on Total Sales by Region and compare year-over-year performance.
“Certain product groups aren’t contributing much to sales.” This directs us to analyze Total Sales by Product Group.
“Key customers have reduced their purchases.” This points us toward exploring Top Customers and their order trends.
These assumptions give us a clear focus for exploratory analysis while leaving room for surprises along the way.
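To show how an assumption turns into a concrete check, here is a minimal pandas sketch of the first one: Total Sales by Region, compared year over year. The column names (`region`, `year`, `sales`), the second region, and every number are invented for illustration; a real dataset will look different.

```python
import pandas as pd

# Hypothetical order-level data; names and values are made up for illustration.
orders = pd.DataFrame({
    "region": ["Canada", "Canada", "USA", "USA", "Canada", "USA"],
    "year": [2023, 2023, 2023, 2023, 2024, 2024],
    "sales": [600.0, 400.0, 500.0, 500.0, 800.0, 1100.0],
})

# Total sales per region, with one column per year.
by_region = orders.pivot_table(
    index="region", columns="year", values="sales", aggfunc="sum"
)

# Year-over-year change in percent (negative means the region shrank).
by_region["yoy_pct"] = (by_region[2024] - by_region[2023]) / by_region[2023] * 100

# Worst-performing regions first.
print(by_region.sort_values("yoy_pct"))
```

The same pattern works for the other assumptions by swapping `region` for a product group or customer column.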
Diving into the Dataset
With our roadmap in hand, it’s time to connect to the data and start exploring.
Here’s something we always emphasize: Exploratory Data Analysis (EDA) doesn’t require a perfect dataset or a fully developed pipeline. In fact, starting with raw data often reveals insights quickly.
We begin by exploring the dataset, identifying the entities and fields we need for the analysis. For this project, the key entities include:
Orders: To track sales trends.
Customers: To analyze who is buying what.
Product Groups: To compare performance across categories.
Once we are confident the data covers these entities, we connect to the database, query the data, and pull it into our analysis tools. The goal isn’t perfection; it’s to get the data we need in its raw form and refine it only when necessary.
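As a rough sketch of that step, the example below uses Python's built-in `sqlite3` module with an in-memory database standing in for the real one. The table names, columns, and sample rows are assumptions made up for this illustration; the point is only to show the three entities joined into one flat, order-level result.

```python
import sqlite3

# In-memory database standing in for the real sales database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE product_groups (group_id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        group_id INTEGER REFERENCES product_groups(group_id),
        order_date TEXT,
        amount REAL
    );
    -- Invented sample rows, for illustration only.
    INSERT INTO customers VALUES (1, 'University A', 'Canada'), (2, 'Retailer B', 'USA');
    INSERT INTO product_groups VALUES (1, 'Hardware'), (2, 'Service');
    INSERT INTO orders VALUES
        (1, 1, 1, '2024-01-15', 500.0),
        (2, 1, 2, '2024-02-10', 20.0),
        (3, 2, 1, '2024-02-20', 700.0);
""")

# Join orders to customers and product groups so each row is one raw order.
query = """
    SELECT o.order_id, o.order_date, o.amount,
           c.name AS customer, c.region, g.group_name AS product_group
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN product_groups AS g ON g.group_id = o.group_id
    ORDER BY o.order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Against a production database, only the connection and the schema would change; the idea of pulling raw, order-level rows first and refining later stays the same.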
Exploratory Data Analysis: Tools and Techniques
EDA is where we start to uncover patterns and connections that lead to meaningful insights. Here are a few go-to techniques and tools that make this process efficient and effective:
Decomposition Tree: A powerful way to drill into metrics like Total Sales by Region, Product Group, and Customer. For instance, drilling into Canada may reveal that Product Group “Service” contributes almost nothing to revenue—an opportunity to investigate further.
Time-Series Analysis: Line charts help us identify seasonal trends or sudden drops in performance. If sales consistently dip in Q3, it might suggest either a seasonal issue or an operational bottleneck.
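Heatmaps: A correlation heatmap, such as one drawn with Seaborn, shows how strongly each metric moves together with a target performance indicator. For example, we can check whether Order Frequency or Average Order Value is more strongly correlated with Total Sales. Correlation alone does not tell us which metric drives sales, so strong cells are leads to investigate rather than conclusions.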
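The time-series and correlation techniques above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the monthly figures are invented, and the column names (`order_frequency`, `avg_order_value`, `total_sales`) are hypothetical stand-ins for whatever the real dataset provides.

```python
import pandas as pd

# Hypothetical monthly metrics; all numbers are invented for illustration.
monthly = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=12, freq="MS"),
    "order_frequency": [10, 11, 12, 12, 13, 12, 8, 7, 8, 12, 13, 14],
    "avg_order_value": [100, 98, 102, 101, 99, 103, 100, 97, 101, 102, 100, 104],
})
monthly["total_sales"] = monthly["order_frequency"] * monthly["avg_order_value"]

# Time-series view: total sales per calendar quarter, to spot dips.
quarterly = monthly.groupby(monthly["month"].dt.quarter)["total_sales"].sum()
print(quarterly)

# Correlation view: pairwise Pearson correlations between the metrics.
corr = monthly[["order_frequency", "avg_order_value", "total_sales"]].corr()
print(corr.round(2))
```

With Seaborn and Matplotlib installed, passing the matrix to `seaborn.heatmap(corr, annot=True)` draws it as a heatmap, and `quarterly.plot()` gives a quick line chart.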
Key Tip:
Start with the basics. Focus on the primary metric—like Total Sales—and drill into key dimensions one at a time. Use tools like decomposition trees and line charts to identify patterns while staying focused on actionable insights.
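Drilling into one dimension at a time can be approximated outside a BI tool with a small grouping helper. The sketch below is not a decomposition tree visual, just a pandas stand-in for the same idea; the `drill` function, the column names, and the sample rows are all hypothetical.

```python
import pandas as pd

# Hypothetical order-level data; values are invented for illustration.
orders = pd.DataFrame({
    "region": ["Canada", "Canada", "Canada", "USA", "USA"],
    "product_group": ["Hardware", "Service", "Hardware", "Hardware", "Service"],
    "customer": ["University A", "University A", "Retailer B", "Retailer C", "Retailer C"],
    "sales": [500.0, 10.0, 300.0, 700.0, 200.0],
})

def drill(df, dimension):
    """Total sales per value of one dimension, plus its share of the current total."""
    totals = df.groupby(dimension)["sales"].sum().sort_values(ascending=False)
    return pd.DataFrame({"sales": totals, "share_pct": totals / totals.sum() * 100})

# Level 1: the primary metric split by region.
level1 = drill(orders, "region")
print(level1)

# Level 2: keep only one region, then split by the next dimension.
canada = orders[orders["region"] == "Canada"]
level2 = drill(canada, "product_group")
print(level2)
```

Each level filters to one branch and re-splits by the next dimension, which keeps the focus on a single metric while the view narrows.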
Insights and Recommendations
By the end of our analysis, we’ve uncovered some key insights:
Sales in the Canada region were down 20% year-over-year.
Product Group “Service” was contributing almost nothing to revenue.
High-value customers, like universities, had reduced their orders by 30%.
Sales consistently dipped in Q3, indicating a possible seasonal issue.
These insights aren't just numbers—they're actionable. Here's how we can present them to leadership with recommendations:
Focus on improving the performance of Product Group “Service.”
Launch a customer retention campaign targeting universities.
Investigate the root cause of the Q3 dip and plan proactive measures for the next cycle.
The VP now has a clear plan of action, supported by data that builds confidence in decision-making.
Conclusion: A Framework for Any Challenge
What makes this approach powerful is its flexibility and repeatability. Whether we’re analyzing sales data, customer behavior, or operational performance, the steps remain the same:
Frame the problem by asking the right questions.
Define broad assumptions to guide our analysis.
Explore the data and let the insights emerge naturally.
This is how we can turn data into decisions that truly matter. What challenge will we tackle next? Let's start with the right questions and let the answers guide the way.