Solving performance problems at scale has always been tricky, and there is a lot of confusion about how to address them on Android.
In this talk, we will look at the nature of these performance problems and how to select tooling for them, and work toward a generic approach that applies to most of them.
1. Performance problems that happen at scale, such as slow cold starts, battery drain, crashes, and ANRs.
2. Common characteristics of these problems:
– We usually find out about them only after they have gotten out of hand
– They affect business timelines
– They create a sense of urgency
– They are hard to build observability for
3. Anti-methodologies: common pitfalls that developers fall into while solving these issues, as also described by Brendan Gregg (performance engineer from Netflix) in his book.
4. A generic methodology and tools for solving performance issues methodically, using cold start as a simple example:
– Identify the important metrics to track for the issue, choosing ones with a good signal-to-noise ratio
– Stop the bleeding by putting a quality gate on the baseline branch
– Identify impacted sessions from production and get the telemetry data (traces, metrics, counters, etc.) selectively for impacted sessions
– Identify the problems in the reports from impacted sessions and fix them to bring the baseline down
– There is no best tool for solving performance issues
– You have to find a set of tools that supports the methodology for chasing those issues
– Rely on methodology first and tools second
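
To make the first steps of the methodology concrete for cold start, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration and not something prescribed by the talk: the class name ColdStartGate, the choice of the 90th percentile as the metric, the tolerance value, and the sample timings are all hypothetical. It shows one way to compute a percentile that is less sensitive to single outliers, to gate a candidate build against the baseline branch, and to pick out the slow sessions whose telemetry is worth collecting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; none of these names come from the talk.
final class ColdStartGate {

    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, int percentile) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Quality gate: pass only if the candidate's p90 cold start stays within
    // `tolerancePercent` of the baseline branch's p90.
    static boolean passesGate(long[] baselineMs, long[] candidateMs, double tolerancePercent) {
        long baselineP90 = percentile(baselineMs, 90);
        long candidateP90 = percentile(candidateMs, 90);
        return candidateP90 <= baselineP90 * (1 + tolerancePercent / 100.0);
    }

    // Indices of sessions whose cold start exceeded the threshold. These are
    // the sessions for which detailed traces and counters would be requested.
    static List<Integer> impactedSessions(long[] sessionsMs, long thresholdMs) {
        List<Integer> impacted = new ArrayList<>();
        for (int i = 0; i < sessionsMs.length; i++) {
            if (sessionsMs[i] > thresholdMs) {
                impacted.add(i);
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        // Made-up cold start timings in milliseconds.
        long[] baseline = {800, 820, 790, 810, 805, 815, 795, 830, 800, 1200};
        long[] candidate = {900, 910, 880, 905, 895, 915, 890, 930, 900, 1300};

        System.out.println("baseline p90:  " + percentile(baseline, 90));
        System.out.println("candidate p90: " + percentile(candidate, 90));
        System.out.println("gate passed:   " + passesGate(baseline, candidate, 5.0));
        System.out.println("impacted:      " + impactedSessions(candidate, 930));
    }
}
```

With the sample data, the candidate's p90 (930 ms) exceeds the baseline's p90 (830 ms) by more than the 5% tolerance, so the gate fails and the change would be blocked before it reaches the baseline branch. In practice the timings would come from whatever benchmarking and production telemetry tools the team has chosen, which is the point of putting the methodology ahead of any single tool.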