I've included the abstract below, but the high-level goal of this talk is to strip some varnish off the bootstrap. Folks often look to the bootstrap as a panacea for weird data, but all tools have their failure cases. We plan on highlighting some problems we ran into when trying to use the bootstrap for Firefox data and how we dealt with the issues, both in theory and in practice.
Resampling methods like the bootstrap are becoming increasingly common in modern data science. For good reason too; the bootstrap is incredibly powerful. Unlike t-statistics, the bootstrap doesn’t depend on a normality assumption nor require any arcane formulas. You’re no longer limited to working with well understood metrics like means. One can easily build tools that compute confidence for an arbitrary metric. What’s the standard error of a Median? Who cares! I used the bootstrap.
With all of these benefits the bootstrap begins to look a little magical. That’s dangerous. To understand your tool you need to understand how it fails, how to spot the failure, and what to do when it does. As it turns out, methods like the bootstrap and the t-test struggle with very similar types of data. We’ll explore how these two methods compare on troublesome data sets and discuss when to use one over the other.
In this talk we’ll explore what types to data the bootstrap has trouble with. Then we’ll discuss how to identify these problems in the wild and how to deal with the problematic data. We will explore simulated data and share the code to conduct the simulations yourself. However, this isn’t just a theoretical problem. We’ll also explore real Firefox data and discuss how Firefox’s data science team handles this data when analyzing experiments.
At the end of this session you’ll leave with a firm understanding of the bootstrap. Even better, you’ll understand how to spot potential issues in your data and avoid false confidence in your results.