Note: This article is the second in a 2-part series. The first article in the series is “Why Your Product Analytics Should Work Like Your Car”.
Automobiles have a 3-level monitoring system.
In the previous article, we explored the idea of setting up your product analytics the same way, by simulating what I call a “car’s immune system”.
This system has 3 levels: Early Warning, Diagnostics, and Prevention.
In this article, we’ll explore practical steps to design your product analytics following this analogy.
I’ll share examples from my time at Google to help you apply this to your organization.
Ready? Buckle up!
Level 1: How to Set Up Early Warning Metrics
In a car, the Early Warning System is right in front of you when you’re driving.
If there’s a problem, a dashboard light comes on. It doesn’t tell you what the exact problem is, but it gives you a clue.
Is the “Check Engine” light on? If so, you probably should inspect your engine!
Detecting an issue early might be the difference between a minor hiccup and a major crisis.
Creating an effective early warning system for your product requires an intentional design approach. Follow the 4 steps below to maximize your chances of success.
1. Start with Business Questions
Many PMs dive straight into the metrics, without reflecting on the business questions that guide them.
This is starting with the solution.
As a PM, you know you should start with a pain point. And you find your pain points by asking the right questions.
So what questions come to mind when thinking about your product’s health?
Write those down in plain English. Don’t worry about metrics for now.
For example, when I led the Google Messages team, I asked questions like:
- How many sent messages never reach the recipient?
- How often do delivery failures happen?
- How many users experience delivery failures? Is the issue widespread, or affecting a small subset of users?
The answers to these questions sent my team in the direction of the right metrics.
We wouldn’t have gotten there had we started with the solution.
2. Nominate a Subset of Questions
There are so many metrics you can watch.
And PMs tend to want to monitor them all! This is a common mistake with a high cost. Quickly, you feel overwhelmed.
Instead, pick a few key metrics to act like the “car dashboard” for your product.
Look at your questions. Which ones would give you an early signal that something is wrong, and a sense of where? That’s your subset.
If two questions point to the same area, pick one. You need an early warning. You don’t need to be overwhelmed.
Returning to the Google Messages example, I selected the first question – “How many sent messages never reach the recipient?” – for my early warning system. This was sufficient to warn me if there was an issue with delivery failures.
3. Establish Metrics to Answer the Questions
Finally, it’s time to define metrics.
This is where many PMs start – wrongly. But if you went through the previous steps, now you have questions guiding you. For each question you chose for your early warning system, work with your engineering team to identify a metric to answer it.
Focus on leading indicators, not lagging ones.
Remember this is an Early Warning System. By the time a lagging indicator flags a problem, the damage may have already been done.
At Google – for the question “How many sent messages never reach the recipient?” – we defined the metric: % of sent messages not marked “Delivered” within 30 minutes.
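To make that definition concrete, here is a minimal sketch of how such a metric could be computed from message records. The `Message` type, its field names, and the `undelivered_rate` function are hypothetical, invented for illustration; they do not describe how any real messaging system stores or computes this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Assumption: the 30-minute window from the metric definition above.
DELIVERY_WINDOW = timedelta(minutes=30)


@dataclass
class Message:
    """Hypothetical record of one sent message."""
    sent_at: datetime
    delivered_at: Optional[datetime]  # None if never marked "Delivered"


def undelivered_rate(messages: Sequence[Message],
                     window: timedelta = DELIVERY_WINDOW) -> float:
    """Percentage of sent messages not marked "Delivered" within the window."""
    if not messages:
        return 0.0  # no traffic, nothing to report
    late_or_missing = sum(
        1 for m in messages
        if m.delivered_at is None or m.delivered_at - m.sent_at > window
    )
    return 100.0 * late_or_missing / len(messages)
```

Note that a message delivered after 45 minutes counts against the metric just like one that never arrives. That is a choice baked into the definition, and your own metric may need a different cutoff.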
4. Push the Metrics – Don’t Pull Them
Now that you have metrics, make it as easy as possible to notice them.
Don’t rely on remembering to visit dashboards daily. You’ll probably stop looking after a while.
At Google, my team and I received a daily snapshot of our Early Warning metrics in our inbox.
The benefit is twofold:
- If it is right in front of you, you notice it.
- It allows people to build up intuition about the metrics over time. You start to see patterns and can spot outliers more quickly.
Make sure the daily snapshot includes the business question paired with each metric. It’s a reminder that you aren’t looking at numbers – you’re looking at answers to important questions.
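As a rough sketch of what the body of such a snapshot could look like, the snippet below renders each metric directly under the question it answers. The `SnapshotEntry` type and `render_snapshot` function are hypothetical names, and actually sending the email is left to whatever scheduling and mail tooling your team already uses.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SnapshotEntry:
    """Hypothetical pairing of a business question with its metric value."""
    question: str
    metric_name: str
    value: float
    unit: str = "%"


def render_snapshot(date_label: str, entries: Sequence[SnapshotEntry]) -> str:
    """Build a plain-text snapshot with each metric under its question."""
    lines = [f"Early Warning snapshot for {date_label}", ""]
    for entry in entries:
        lines.append(f"Q: {entry.question}")
        lines.append(f"   {entry.metric_name}: {entry.value:.2f}{entry.unit}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Putting the question first keeps the framing from step 1 in view every time someone opens the email.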
Level 2: How to Set Up Diagnostic Metrics
The Early Warning metrics give you a sense of direction.
Diagnostic metrics allow you to dig deeper and perform a root cause analysis that identifies exactly where the problem lies.
Pay attention to key steps along the funnel or user journey.
They give you clues about where things are breaking. Set up your diagnostic metrics in a way that, when analyzing them, you can answer questions like:
- “Where is the breakdown in this area actually happening?”
- “What is the specific step in the user journey that has an issue?”
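One simple way to answer questions like these is to count how many events survive each step of the journey and compare adjacent steps. The sketch below assumes you can export such counts. The function names and the step names in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

Funnel = Sequence[Tuple[str, int]]  # ordered (step name, event count) pairs


def step_pass_rates(funnel: Funnel) -> List[Tuple[str, float]]:
    """Fraction of events that make it from each step to the next one."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates


def weakest_step(funnel: Funnel) -> Tuple[str, float]:
    """Transition with the lowest pass rate, a first place to look."""
    return min(step_pass_rates(funnel), key=lambda r: r[1])
```

For instance, given the made-up counts `[("sent", 1000), ("accepted_by_server", 990), ("pushed_to_device", 900), ("marked_delivered", 890)]`, `weakest_step` points at the `accepted_by_server -> pushed_to_device` transition. In practice you would compare each rate against its own historical baseline, since some steps are naturally lossier than others.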
Keep in mind – building diagnostic metrics is an ongoing effort.
The high-level business questions you asked to get started shouldn’t change often. So your Early Warning metrics shouldn’t either.
But Diagnostic metrics are different.
They change more frequently, because the code base changes too. So build metrics into your development team’s processes. For example, consider metric health sign-offs for each release.
Level 3: How to Set Up Preventive Alerts
Better than reacting to problems is preventing them from happening.
Early Warning metrics require you to react. But Prevention metrics help avoid problems altogether. That’s what your car does when it signals you should get it serviced soon, for example.
The key to defining preventive alerts is to focus on trends, not thresholds.
For example, a metric might still be within its acceptable threshold, but getting worse by 5% week over week. This is a sign that things are trending in a bad direction.
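Here is a minimal sketch of a trend-based check along those lines. It assumes a metric where higher is worse (such as a failure rate) and one value per week. The function names, the default of three consecutive weeks, and the parameter names are all illustrative choices, not a standard.

```python
from typing import List, Sequence


def week_over_week_changes(weekly_values: Sequence[float]) -> List[float]:
    """Relative change between each pair of consecutive weeks."""
    changes = []
    for prev, cur in zip(weekly_values, weekly_values[1:]):
        changes.append((cur - prev) / prev if prev else 0.0)
    return changes


def trend_alert(weekly_values: Sequence[float],
                threshold: float,
                max_growth: float = 0.05,
                weeks: int = 3) -> bool:
    """True when the metric is still under its threshold but has worsened
    by at least `max_growth` in each of the last `weeks` weeks."""
    if len(weekly_values) < weeks + 1:
        return False  # not enough history to call it a trend
    recent = week_over_week_changes(weekly_values)[-weeks:]
    still_acceptable = weekly_values[-1] < threshold
    return still_acceptable and all(c >= max_growth for c in recent)
```

Requiring several bad weeks in a row, rather than a single one, is one way to keep this alert from firing on ordinary week-to-week noise. Once the metric crosses the threshold itself, this check goes quiet on purpose, because that case belongs to a regular threshold alert.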
Keep these tips in mind when setting up your preventive alerts:
- Avoid desensitization caused by too many alerts
Aim for alerts that are informative and actionable, not noisy and unhelpful.
- Set up a process to respond to alerts
Alerts exist for you to take action.
Define playbooks on how to handle each alert. Consider also nominating an on-call engineer to triage incoming alerts.
- Balance leading and lagging indicators
Leading indicators don’t always capture slow degradation over time.
Think about your car again. The braking distance might be getting longer and longer over time. The brakes are still functional – no light on the warning panel. But braking distance is a lagging indicator that brakes are wearing down. Once you notice the trend, you take action to prevent further damage.
Final Thoughts
Now you have a complete framework for monitoring, diagnosing, and preventing issues in your product.
It’s time to put this knowledge into action.
Start by reflecting on the questions you would ask to verify your product is healthy. What subset of those questions would provide you an early warning when things go wrong?
Just this step alone will transform how you leverage analytics.
Over time, implement the remaining best practices outlined in this article.
You’ll start seeing the benefits of this system.
You’ll have more control and insight without extra work. Your teams and organization will be amazed at how well you scale. And your users will thank you for the quality of your product!