<h1>Data Intuition Case Study: Grain-free Dog Food</h1>
<p><em>Ryan T. Harter, 2022-02-18</em></p>
<p>My vet told me I should stop feeding my dog grain-free dog food. Apparently,
grain-free dog food is linked with a heart condition called Dilated
Cardiomyopathy (DCM). This set off my <a href="/data_intuition.html">bullshit detector</a>,
so I decided to dig deeper.</p>
<p>The FDA has a great document explaining their investigation
<a href="https://www.fda.gov/animal-veterinary/outbreaks-and-advisories/fda-investigation-potential-link-between-certain-diets-and-canine-dilated-cardiomyopathy">here</a>.
It's very approachable. I encourage you to give it a read. But to save you some
time, here's their summary of their investigation:</p>
<blockquote>
<p>In July 2018, the FDA announced that it had begun investigating reports of
canine dilated cardiomyopathy (DCM) in dogs eating certain pet foods, many
labeled as "grain-free," which contained a high proportion of peas, lentils,
other legume seeds (pulses), and/or potatoes in various forms (whole, flour,
protein, etc.) as main ingredients [...]. Many of these case reports included
breeds of dogs not previously known to have a genetic predisposition to the
disease.</p>
</blockquote>
<p>In short: we heard more dogs are being diagnosed with DCM. Dogs who are
diagnosed with DCM are mostly eating grain-free dog food, which is odd, so
we're digging in deeper.</p>
<p>Let's review the data they present.</p>
<h2>Evidence</h2>
<p>The FDA started looking into the link between DCM and grain-free dog food in
July 2018. By April 2019 they had 515 reports of canine DCM. Interestingly, 91%
of the dogs reported with DCM were eating a grain-free diet:</p>
<p><img alt="ingredient prevalence for dogs with reported
DCM" src="https://i.snap.as/9G30pk6U.png" /></p>
<p>That's definitely suspicious. I don't have any data to back this up, but I
suspect <em>most</em> dogs are not eating a grain-free diet. It's strange that these
dogs are almost uniformly eating grain-free. Maybe it's that simple? </p>
<h2>But how, though?</h2>
<p>It doesn't look that simple. The FDA says DCM is usually either genetic
or caused by a taurine deficiency. The FDA's tests show the grain-free dog
foods aren't missing any important nutrients. Even weirder, Google says taurine
comes primarily from <em>meat</em> and definitely not from grain. How could
<em>removing</em> grain cause a taurine deficiency? The opposite seems more likely.</p>
<p>It sounds like the current hypothesis is that replacing corn and wheat with
peas and lentils somehow interferes with how the dog digests the food. Even
that hypothesis doesn't make much sense given the data. Only about half of the
dogs tested for taurine deficiency actually <em>had</em> a taurine deficiency
(<a href="https://www.fda.gov/animal-veterinary/science-research/vet-lirn-update-investigation-dilated-cardiomyopathy">source</a>). </p>
<p>Overall, it sounds like we don't really know how grain-free dog food could
cause DCM. But hey, that doesn't mean it isn't happening. So far, this still
feels worthy of further investigation.</p>
<h2>Another interpretation</h2>
<p>I have a different hypothesis for what's going on here. I suspect this link
between grain-free dog food and DCM is entirely caused by a good old-fashioned
sampling bias.</p>
<p>The FDA notes: "We suspect that cases are underreported because animals are
typically treated symptomatically, and <strong>diagnostic testing and treatment can
be complex and costly to owners</strong>." (bolding mine). </p>
<p>Aha! Cases are being <em>underreported</em>, but they're also being <em>selectively</em>
reported. The FDA notes many of these reports include: "echocardiogram results,
cardiology/veterinary records, and detailed diet histories". Sounds expensive.
I wouldn't be surprised if most dog owners say, "Nah, just treat the dog. No
need to call the FDA."</p>
<p>And there's the bias. We're only seeing dogs that belong to owners with enough
free time and money to go through the rigamarole of getting their dog diagnosed
with DCM. It's no surprise to me that these dogs are more likely to be eating a
(more expensive) grain-free diet.</p>
<p>So far, this is just a theory. I can see <em>some</em> evidence for my theory in this
chart, though:</p>
<p><img alt="brand prevalence for dogs with reported DCM" src="https://i.snap.as/CepgxknW.png" /></p>
<p>I don't know much about dog food, but I know Acana is expensive. Acana's
website suggests their dog food is not-quite-human-grade, but is made with
human-grade ingredients. Look... humans should definitely <em>not</em> eat dog food,
but the fact that Acana needs to set the record straight tells me this is some
gourmet shit. I've probably eaten cans of Progresso with worse ingredients.</p>
<p>It's strange that an expensive food like Acana is the most commonly reported
dog-food brand. I'd expect that a less expensive and more accessible dog food
(like Blue Buffalo) would be more common in practice. </p>
<p>Maybe the pattern we're seeing isn't "grain-free dog food is associated with
DCM" and is instead "dog owners willing to pursue a DCM diagnosis also tend to
buy expensive dog food". Maybe I'll call this an "Affluence Bias".</p>
<h2>Prevalence</h2>
<p>Zooming out, this looks to be a rare disease. The FDA announced they were
investigating grain-free dog food in July 2018. Up to April 2019 the FDA
received 515 reports of canine DCM and they estimate there are 77 million pet
dogs in the US. That's ~0.0007% of dogs per year.</p>
<p>Even if the real case count is 100x bigger than the reported case count, that's
still less than a tenth of a percent of all dogs. I think it's safe to call
that "rare".</p>
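<p>The back-of-the-envelope math above fits in a few lines:</p>

```python
# Back-of-the-envelope prevalence check, using the FDA's numbers.
reported_cases = 515             # DCM reports, July 2018 through April 2019
us_pet_dogs = 77_000_000         # FDA's estimate of pet dogs in the US

reported_rate = reported_cases / us_pet_dogs
print(f"reported rate: {reported_rate:.4%}")        # ~0.0007% of dogs

# Even assuming cases are underreported by a factor of 100:
print(f"100x rate:     {100 * reported_rate:.2%}")  # still under 0.1%
```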
<h2>In Summary</h2>
<p>I don't plan on changing my dog's food - at least not given these data.</p>
<p>It looks like the FDA is demonstrating an abundance of caution by investigating
this link. From what I understand, <a href="https://astralcodexten.substack.com/p/adumbrations-of-aducanumab">that's what the FDA
does</a>. </p>
<p>However, my vet shouldn't cite these results as fact given this level of
evidence. Vets probably shouldn't cite this relationship <em>at all</em> given a
reported incidence of ~0.0007% of dogs per year. We've got bigger things to worry
about.</p>
<p>When all is said and done, I expect we'll all be left with a vague sense that
grain-free dog food causes DCM. In reality, I suspect it's just a quirk of the
data collection.</p>
<h1>Getting Credit for Invisible Work</h1>
<p><em>Ryan T. Harter, 2021-06-03</em></p>
<p>Last month I gave a talk at <a href="https://csvconf.com/speakers/#ryan-harter">csv,conf</a>
on "Getting Credit for Invisible Work".
The (amazing) csv,conf organizers just published a
<a href="https://www.youtube.com/watch?v=W7zT-GRDSCw">recording of the talk</a>.
(<a href="https://blog.harterrt.com/static/invisible_work_preso/#p1">slides here</a>).
Give it a watch! It's only 20m long (including the Q&A).</p>
<p>Invisible work is a concept I've been trying to pin down for a while.
I've found that a lot of the hard, important work
that goes into producing good data analysis is <em>invisible</em>.
Nobody ever sees the hypotheses we discard when doing EDA.
Nobody ever sees the problems we avoid with our intuition.</p>
<p>We explore complex data, so we can distill our findings into a simple narrative.
If we’re doing it right, we make our work look simple.
This is super valuable,
but can cause problems when we try to demonstrate our value.
This talk covers some strategies for getting credit
for this super valuable but invisible work.</p>
<p>Here's a link to <a href="https://www.youtube.com/watch?v=W7zT-GRDSCw">the recording</a>
(it's also embedded below).
I have a copy of
<a href="https://blog.harterrt.com/static/invisible_work_preso/#p1">my slides and speaker notes here</a>.</p>
<p>I'd love to hear your thoughts! Shoot me an email!</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/W7zT-GRDSCw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<h1>Opportunity Sizing: Is the Juice Worth the Squeeze?</h1>
<p><em>Ryan T. Harter, 2021-04-15</em></p>
<p>My peers at Mozilla are running workshops on <em>opportunity sizing</em>.
If you're unfamiliar,
opportunity sizing is when you take some broad guesses
at how impactful some new project might be before writing any code.
This gives you a rough estimate of what the upside for this work might be.</p>
<p>The goal here is to <strong>discard projects that aren't worth the effort</strong>.
We want to make sure the juice is worth the squeeze
<em>before we do any work</em>.</p>
<p>If this sounds simple, it is. If it sounds less-than-scientific, it is!
There's a lot of confusion around why we do opportunity sizing,
so here's a blog post.</p>
<h2>The Motivation</h2>
<p>Last year we ran a huge A/B test over a full Firefox release
(I mentioned <a href="/why_experiment.html">this experiment before</a>).
Unfortunately, we didn't see the impact we were hoping for
when we reviewed the results.</p>
<p>It was clear <em>why</em> we weren't seeing that impact
once we did some back-of-the-napkin math.
Our features just didn't affect a broad enough set of users.</p>
<p>I've fallen into this trap myself.
I've developed a pretty good sense of how much effort
a piece of technical work will take.
However, I find I don't have a great intuition
about how <em>impactful</em> a project will be.</p>
<p>Instead, my brain seems to group all new projects into one of two categories:
"Good thing to do" and "Pain in the a**" (PITA).
A feature is generally a "good thing to do"
if it would improve the product.
On the other hand, it's a PITA if we're trying to
contort the product into doing something it isn't meant to do.</p>
<p>Here's a doodle-graph to summarize.
Each dot is a project:</p>
<p><center>
<img width="75%" src="/images/opp_sizing/good_or_pita.png" alt="Good thing to do or PITA?">
</img>
</center></p>
<p>On the surface this may seem fine, but it's a pretty bad heuristic in practice.</p>
<p>I'm not concerned about ignoring projects that look like a PITA.
This isn't great, but usually someone will make some noise
if I'm ignoring an important project.</p>
<p>On the other hand, <strong>projects in the "good thing to do" bucket are <em>dangerous</em>.</strong>
Sure, this bucket has some good projects,
but there are usually even <em>more</em> projects that aren't worth the effort.
These are the projects that are "nice-to-haves" and hard to say "no" to.
If you work on one of these projects,
nobody's going to tell you you're wasting your time.</p>
<p>When choosing our work, we need to set the bar higher than "good thing to do".
A feature needs to be designed, built, and released.
If we're not careful, these costs can far outweigh any possible upside.
There's also the opportunity cost—
all the time we spend working on a "good thing to do"
keeps us from working on more important projects.</p>
<p>Instead of just looking at whether a project has <em>some</em> value,
we need to <strong>make sure the value is greater than the cost</strong>.
On our graph that would look like this:</p>
<p><center>
<img width="100%" src="/images/opp_sizing/worth_it_full.png" alt="Is it actually worth the squeeze?">
</img>
</center></p>
<p>Only some of the "good things to do" are actually worth doing.
The rest have <em>some</em> value,
but would take too much effort to be worthwhile.</p>
<p>This also means we can't just work on
whatever project has the most upside.
Sometimes, all of the available projects
are below the line (not worth the squeeze).
In that case, it can be better to focus on finding more opportunities.</p>
<p>In short, picking arbitrary projects that are "good things to do"
is a great way to waste a lot of time.</p>
<h2>A Framework for Mozilla</h2>
<p>Opportunity sizing helps us break free from this default heuristic
and think about our projects with more structure.
The framework seems too simple to yield any real insight,
but I'm consistently surprised at how helpful it can be.</p>
<p>In this way, opportunity sizing is like a checklist in its
<a href="https://www.nytimes.com/2009/12/24/books/24book.html">unreasonable effectiveness</a>.</p>
<p>The framework we use at Mozilla
needs three basic ingredients to size an opportunity:</p>
<ol>
<li>How many users could this project affect?</li>
<li>What percent of users are going to change?</li>
<li>How will those users change?</li>
</ol>
<p>We usually have pretty solid data for #1
(How many users could this affect?).
For numbers 2 and 3 we're looking for guesses
that do a good job of communicating our assumptions.
These guesses are informed by data, but are usually less-than-scientific.</p>
<p>Multiplying these three numbers together gets us to a place where we can have a conversation.</p>
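<p>As a sketch, the whole framework fits in a few lines. Every number below is invented for illustration; the point is the structure, not the values:</p>

```python
# A minimal opportunity-sizing sketch. All values are made up.
addressable_users = 2_000_000    # 1. how many users could this affect?
share_who_change = 0.10          # 2. what fraction will actually change?
lift_per_user = 0.01             # 3. how much each changed user moves
                                 #    the metric (e.g. +1pp retention)

expected_impact = addressable_users * share_who_change * lift_per_user
print(f"~{expected_impact:,.0f} additional retained users")
```

<p>Guessed values like these are exactly where the conversation with peers starts.</p>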
<h2>But isn't that subjective?</h2>
<p>Yup. But subjective analyses can still be useful.</p>
<p>To start, guessing at the opportunity size helps clarify our assumptions
and identify weak assumptions early-on.
Even better, this framework is a <strong>useful communication tool</strong>.
"We don't have data, but this feels right"
is good enough at this stage.</p>
<p>It <em>does</em> have to <em>feel</em> right though.
Hopefully you're sharing these results with your peers.
They can gut-check your work and you can together build a shared opinion.
If your numbers are outrageous,
you'll need to be able to defend them.</p>
<p>There's often concern that ambitious coworkers are going to
oversell their opportunity to prioritize their pet-project.
In my experience, there's not much incentive to do this.
Eventually we either recognize over-hyped numbers and redirect the project
or get underwhelming results from the launch and learn our lesson the hard way.</p>
<h2>But what if you're wrong?</h2>
<p>I <em>am</em> wrong. It doesn't matter.</p>
<p>First, this analysis should <em>not</em> be the entirety of our decision making process.
It's a tool to break through some common biases when choosing work.</p>
<p>Second, these analyses can (and should)
grow with our commitment to the project.</p>
<p>When we're early in the product life-cycle, we can make very rough estimates.
As the project matures and we learn more,
these analyses should mature as well.
When we're deciding whether to commit 20 engineers for 6 months,
we should have more confidence in our estimates.</p>
<p>We'll get better at this as we build more intuition.
Our tenth opportunity sizing will be better than our first.</p>
<hr />
<h1>Optional Comments</h1>
<p><em>Ryan T. Harter, 2021-01-21</em></p>
<p>I spend a lot of my time at Mozilla reviewing my peers' work.
It's a joy, but it's hard to do well.
Review can be a great opportunity for mentorship and growth,
but it's also an opportunity to be overbearing.
Striking the right tone is a struggle.</p>
<p>Part of the problem is this implicit push
for the author to incorporate every review comment into the document [1].
For example, comments must be marked as "resolved"
which suggests the author took some action.
I see this reflected in our culture too.
Consider this <a href="https://hbswk.hbs.edu/item/ignore-this-advice-at-your-own-peril">HBR article</a>
that highlights the risk of jilting peers by ignoring their advice.</p>
<p>This bums me out.
I don't want the author to feel beholden to my comments.
My goal is to give the author as much context as I can.
Often that means my comments won't require any changes to the document.</p>
<h3>Examples</h3>
<p>For example, when reviewing a project document I might note
that someone else has some related prior art.
There's no need to incorporate this into the document.
I'm arming the author with context
in case it comes up in a future conversation.</p>
<p>In another example,
I might be reviewing an analysis that has already passed its peak relevance.
It may not make sense to make edits if the document is already going stale.
Data scientists are in a constant race against irrelevance.
It's better to focus on the next piece of work.</p>
<p>In this situation I still want to leave lots of comments
if my comments can improve future analyses.
Some comments might look like:
"Heads up, we have helper functions in another library
to automatically do this transformation".
There's no need to refactor the already complete analysis,
but the tip is still useful to the author.</p>
<h3>Conclusion</h3>
<p>I don't have a silver bullet for this problem.
I can only recommend talking with your peers
about what kind of feedback they want to receive.
I find folks are usually happy to get more feedback
so long as they understand my intent.</p>
<p>Hopefully I can point folks to this post in the future
to prove I really do want them to take my comments as optional.</p>
<hr />
<p><sub>
[1] I'm focusing on reviewing prose here
because that's where I spend the most time reviewing.
Most of my peers are at the level where their analyses are expected to be solid
and prose is the most impactful part of their work.
Even so, there are pretty clear corollaries to code review.
</sub></p>
<p><em>Thanks to <a href="https://hamiltonulmer.com/">Hamilton Ulmer</a>
and <a href="https://marissagorlick.org/#!/posts">Marissa Gorlick</a>
for reviewing drafts of this post!</em></p>
<h1>Controlled Experiments - Why Bother?</h1>
<p><em>Ryan T. Harter, 2021-01-05</em></p>
<!-- tweets: I guess Ben Franklin was the first person to lick a 9volt battery in spirit -->
<!-- tweets: I wrote down some notes about why we spend so much energy running controlled experiments at Mozilla -->
<!-- tweets: In this post I compare A/B tests to hold-my-beer-type experiments like flying a kite in a thunderstorm -->
<p>I spent some time earlier this year orchestrating
a massive experiment for Firefox.
We launched a bunch of new features with Firefox 80
and we wanted to understand whether these new features improved our metrics.</p>
<p>In the process, I ended up talking with a bunch of Firefox engineers
and explaining why we need to run a controlled experiment.
There were a few questions that got repeated a lot,
so I figure it's worth answering them here.</p>
<p>This article is the first in a series I'm writing on building
<a href="/data_intuition.html">data intuition</a>.
This article is targeted at new data scientists
or engineers interested in data.
I also hope this becomes a useful resource for data scientists,
so they can point their stakeholders to this resource.</p>
<h2>What is an <em>experiment</em>?</h2>
<p>In a very general sense,
we conduct an experiment if we:
(1) create a situation
(2) where we don't know what's going to happen
(3) so that we can observe the result.</p>
<p>The best example of an experiment I can think of is Ben Franklin
flying a kite during a lightning storm
(<a href="https://www.fi.edu/benjamin-franklin/kite-key-experiment">context</a>).
He didn't <em>know</em> what was going to happen when he flew the kite
so he went and found out.</p>
<p>In practice, this is a very liberal definition of experimentation.
By this definition, playing slots isn't gambling, it's an experiment!
When data scientists talk about "experimentation"
we're usually talking about "controlled experiments"
instead of this hold-my-beer type of experimentation.</p>
<h2>What is a <em>controlled</em> experiment?</h2>
<p>Controlled experiments are often called A/B tests
because we create two almost-identical hold-my-beer experiments
and look for differences in the outcomes.
If we do spot a difference in the results,
we know that it was caused by the
small differences between experiments.</p>
<p>Controlled experiments are more difficult to set up
but can help us spot effects more subtle
than <em>literally being struck by lightning</em>.</p>
<p>It might be clearer if I explain how we do this for Firefox:</p>
<ul>
<li>We start by launching a new feature behind a preference (or "pref").
This allows us to remotely toggle a feature on and off for a particular user.</li>
<li>Then we take a sample of users
and randomly assign them into one of two groups, called "branches".
We leave the feature toggled off for one group of users (the "control" branch)
and toggle the feature on for the other group (the "treatment" branch).</li>
<li>Then we look at whether users behave differently between groups.</li>
</ul>
<p>This gives us a before and after group running at the same time.
When we compare data from the two branches
we get a very reliable understanding
of what effect the feature had on user behavior.</p>
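<p>The random-assignment step above might look something like this in code. This is just a sketch of the idea, not Mozilla's actual experiment tooling; hashing a stable user id means each user lands in the same branch every time we check, without storing any state:</p>

```python
import hashlib

# Deterministic branch assignment sketch (hypothetical, for illustration).
def assign_branch(user_id: str, experiment: str) -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # uniform bucket in [0, 100)
    return "treatment" if bucket < 50 else "control"

print(assign_branch("user-123", "pdf-viewer-experiment"))
```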
<p>Here's a doodle of what this looks like:</p>
<p><center><img width="75%" src="/images/why-expt/experiment-overview.png"></img></center></p>
<p>A controlled experiment is a tool to help us establish <em>causation</em>.
We want to separate the effect our new feature has
from all of the random noise that affects our metrics day-to-day.
Because these experiments happened at the same time
and the only difference between the two branches was our new feature,
we know that any change in the results is caused by the addition of our feature.</p>
<p>This is still surprisingly difficult to do with Firefox.
Getting a feature behind a pref
(so we can switch it on and off remotely)
adds a lot of complexity.
Folks are understandably curious about why we're going through such a rigmarole.</p>
<p>Let's consider some simpler options (and why they don't actually work).</p>
<h2>Why not just look at usage?</h2>
<p>If we want to understand what effect our new feature has on usage,
why not compare users that engage with our feature to users who don't?</p>
<p>For example, we recently launched improvements to
<a href="https://support.mozilla.org/en-US/kb/view-pdf-files-firefox-or-choose-another-viewer">Firefox's PDF Viewer</a>.
We're interested in knowing whether these improvements
increased user retention.
It seems obvious to start by comparing retention
between users (1) who opened PDFs in Firefox and (2) users who did not open PDFs.</p>
<p>Here's what that might look like:</p>
<p><center><img src="/images/why-expt/usage.png"></img></center></p>
<p>In this example we found that users who interacted with the PDF viewer
retained at 80% week-over-week while non-PDF users only retained at 45%.
That's a HUGE difference!</p>
<p>Unfortunately, this effect isn't real.
As it turns out, "interacts with the PDF viewer"
is a decent proxy for "uses Firefox a lot".
Users who "use Firefox a lot" tend to retain well.</p>
<p>The critical problem here is that
users get to self-select into one of the two groups.
Active users tend to self-select into our "Uses PDF" group
and inflate our results.
This is the classic problem of <strong>correlation not meaning causation</strong>.</p>
<p>To drive this home,
I ran a similar analysis for users who encounter errors when using Firefox.
Errors are bad things, so we'd assume users who encounter errors would retain worse.
The problem is, we find that users who encounter errors
actually retain <em>better</em> than users who encounter no errors.
How can that be? Well, encountering an error is
a good proxy for "Uses Firefox a lot".
Users who don't use Firefox at all encounter no errors!</p>
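<p>To see how strong this self-selection effect can be, here's a toy simulation. The feature has <em>no</em> causal effect on retention here; heavy users are simply more likely both to touch the feature and to retain. All rates are invented for illustration:</p>

```python
import random

random.seed(0)

# Simulate users where feature use and retention share a common cause
# ("uses Firefox a lot"), but there is no causal link between them.
users = []
for _ in range(100_000):
    heavy = random.random() < 0.3                    # 30% are heavy users
    uses_feature = random.random() < (0.8 if heavy else 0.1)
    retains = random.random() < (0.8 if heavy else 0.4)
    users.append((uses_feature, retains))

def retention(group):
    return sum(retained for _, retained in group) / len(group)

feature_users = [u for u in users if u[0]]
non_users = [u for u in users if not u[0]]

# The naive comparison shows a big gap even though the feature does nothing:
print(f"feature users retain: {retention(feature_users):.0%}")
print(f"non-users retain:     {retention(non_users):.0%}")
```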
<p>We need to find a better experiment.</p>
<h2>What if we compare before and after the launch?</h2>
<p>OK - so we can't compare active users to inactive users.
What if we just launch the feature to 100% of our users
and compare behavior before and after the launch?
This way we're comparing roughly the same set of users
just over different time periods.</p>
<p>If we monitor our retention metric over time,
we hope to see a nice bump shortly after the launch.
That graph might look something like this:</p>
<p><img src="/images/why-expt/before-after.png"></img></p>
<p>If we do see something like this,
then it's pretty clear what effect our launch had.
In reality, this is a very optimistic case.
Seeing such a clear effect is the equivalent of being struck by lightning.
It's a big effect and <em>you know</em> when it happens.</p>
<p>More often, our metric is much more volatile than this
and our effect is much smaller.
For context, Firefox New Profile retention
regularly bounces between 35% and 40% within a week.
In any one experiment, we would be thrilled with a 1%-point movement.
Most metrics also have a strong seasonality.
Our signal is dwarfed by the noise.</p>
<p>This means we're more likely to see a graph that looks like this:</p>
<p><img width="75%" src="/images/why-expt/before-after-really.png"></img></p>
<p>This graph creates a lot of new questions.
It looks like retention is decreasing after the launch.
Is that because of annual seasonality or did we break something?
Let's look at year-over-year changes to adjust:</p>
<p><img width="75%" src="/images/why-expt/before-after-adjusted.png"></img></p>
<p>And on, and on, and on.
This is the beginning of a long chain of what-if analyses
that will take forever to resolve and leave us under-confident in our results.
It's possible that we'll come to a resolution and find a real effect in the data,
but we're just as likely to come up with a spurious correlation
after slicing the data too many times
(i.e. p-hacking or
<a href="https://xkcd.com/882/">the green jelly bean problem</a>).</p>
<p>What if we ran a controlled experiment instead?
Well, then we'd get a graph like this:</p>
<p><img width="75%" src="/images/why-expt/before-after-expt.png"></img></p>
<p>Now it's much clearer what's going on.
We can clearly see that the treatment branch
is doing better than the control branch.
We see this even though
there's plenty of noise and retention is declining overall.
That's the benefit of having two branches running at once.</p>
<p>This is even more important for Firefox.
It takes a while for Firefox releases to roll out - usually about a week.
After that we need to wait a couple of weeks to be able to observe retention.
That's a lot of time for the world to change under our feet.
If something odd happens during that three-week-observation period,
it will be very hard to separate our effect from the odd-event's effect.
And here's a secret - <strong>there's always something odd going on</strong>.</p>
<h2>OK, what if we throttle the rollout?</h2>
<p>Instead of pushing the release to 100% of our users at once,
we have the option of slowing the release
so only a portion of our users can upgrade.
Then we can compare upgraded users (treatment) to
the users we held back from upgrading (control).</p>
<p>Here's what that might look like in the ideal case:</p>
<!--<img width="75%" src="/images/why-expt/rollout-branches.png"></img>-->
<p><img src="/images/why-expt/rollout-branches-ideal.png"></img></p>
<p>Since <em>we're</em> deciding whether the user gets to upgrade or not,
we shouldn't have the self-selection bias we discussed above.
Throttling the rollout is also simpler operationally
because we don't need to remotely toggle features on and off.</p>
<p>This seems like a solid plan on the surface,
and it <em>would</em> work for a lot of technology companies.
Unfortunately, it doesn't work for Firefox.</p>
<p>For every Firefox release there's a portion of users
who delay upgrading or never upgrade to the new version.
Before Firefox can check for an update,
the user needs to open their laptop and start Firefox.
Unfortunately, we can't differentiate between
inactive users and users who tried to upgrade and were held back.
Effectively, a user needs to choose to use Firefox
before they can be enrolled in the treatment branch.</p>
<p>Here's what the treatment and control branches actually end up looking like:</p>
<p><img src="/images/why-expt/rollout-branches-actual.png"></img></p>
<p>In this example, the inactive users who haven't opened Firefox
overwhelm the held back users.</p>
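<p>A toy model makes the imbalance concrete. A user has to open Firefox before they can be enrolled in the new version, so early on the "upgraded" group is dominated by very active users. All rates below are invented for illustration:</p>

```python
import random

random.seed(1)

# Simulate day one of a throttled rollout: only users who open Firefox
# today can end up in the upgraded ("treatment") group.
population = []
for _ in range(100_000):
    active = random.random() < 0.3                   # 30% open daily
    opened_today = random.random() < (0.9 if active else 0.05)
    population.append((active, opened_today))

def pct_active(group):
    return sum(active for active, _ in group) / len(group)

treatment = [u for u in population if u[1]]      # upgraded (opened Firefox)
control = [u for u in population if not u[1]]    # held back *plus* inactive

print(f"active share, treatment: {pct_active(treatment):.0%}")
print(f"active share, control:   {pct_active(control):.0%}")
```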
<p>This subtle difference is enough to bias our results.
It's an insidious little problem too because it stokes our ego.
You see, in the first few days of every release
we get a flurry of very active users who try to upgrade.
<strong>For the first few days of the rollout
these very active users are the only users who can join the treatment branch.</strong></p>
<p>Since these users are super active <strong>our metrics look great</strong>!
We can pop some champagne and celebrate releasing
another great improvement to our user experience.
As time goes on,
the careful observer sees our metrics slowly revert to old levels.
But, by then we're focused on the next big release.</p>
<p>Here's an example of what we might see if we looked at
retention over time for users on the most recent version of Firefox:</p>
<p><img src="/images/why-expt/rollout-ex.png"></img></p>
<p>If you look at the number of users on the most recent release version,
this pattern starts to make sense:</p>
<p><img src="/images/why-expt/rollout-users.png"></img></p>
<p>If we were to treat this like an experiment,
where users who upgraded are in "treatment"
and users who haven't upgraded are in "control",
we'd see something like this:</p>
<p><img src="/images/why-expt/rollout-expt-bad.png"></img></p>
<p>Again, the problem here is that there are few active users
included in the "treatment" branch
while "control" is weighted down by inactive users.</p>
<p>If we ran a real experiment, this is what we'd expect to see:</p>
<p><img src="/images/why-expt/rollout-expt-good.png"></p>
<p>There's still an initial spike in the metrics,
but it's reflected in both the control and treatment branches.
We're also reassured by the user counts graph.
Instead of moving in opposite directions like the throttled rollout,
each branch has roughly the same number of users enrolled.</p>
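<p>To make the bias concrete, here's a minimal simulation of the two comparisons above. The numbers are invented for illustration (they're not Mozilla data): a naive rollout comparison, where only active users can reach "treatment", shows a large spurious lift, while a randomized split shows none.</p>

```python
import random

random.seed(0)

# Hypothetical population: ~20% highly active users who retain at 80%,
# the rest retain at 30%. All numbers are illustrative only.
users = [{"active": random.random() < 0.2} for _ in range(100_000)]
for u in users:
    u["retained"] = random.random() < (0.8 if u["active"] else 0.3)

def retention(group):
    return sum(u["retained"] for u in group) / len(group)

# Naive rollout "experiment": only active users have opened the browser
# and upgraded, so "treatment" is made up of active users while inactive
# users weigh down "control".
treatment = [u for u in users if u["active"]]
control = [u for u in users if not u["active"]]
print(retention(treatment) - retention(control))  # large, spurious "lift"

# Real experiment: random assignment mixes activity levels evenly
# across branches, so the difference is near zero.
random.shuffle(users)
mid = len(users) // 2
print(retention(users[:mid]) - retention(users[mid:]))
```

The feature here does nothing at all; the entire "lift" in the first comparison comes from who was able to enroll.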
<h2>Conclusion</h2>
<p>Don't let me kill your enthusiasm.
There's still plenty of room for
hold-my-beer kites-in-a-storm type experimentation,
especially early in a feature's lifecycle.
But, if we want to be able to spot subtle changes to our products
we need to run controlled experiments.</p>
<p>Hopefully these examples clarify why experimentation is so popular.
At the very least I hope this article
prevents others from making some of the mistakes I've made
when trying to establish causation!</p>
<p>This article is part of a series I'm writing on building
<a href="/data_intuition.html">data intuition</a>.
In my next post I want to highlight some scenarios
where uncontrolled experiments make more sense
and how this all fits together in a feature's lifecycle.</p>
<p>I'd love feedback on what to write next.
Shoot me an email if you have ideas!</p>
<hr />
<p><em>Thanks to Dan McKinley and Audra Harter for reading drafts of this article</em></p>Leading with Data - Cascading Metrics2020-12-09T00:00:00-08:002020-12-09T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2020-12-09:/cascading_metrics.html<p>It's surprisingly hard to lead a company with data.
There's a lot written about how to set good goals
and how to avoid common pitfalls (like <a href="/surrogation.html">Surrogation</a>)
but I haven't seen much written about the practicalities
of taking action on these metrics.</p>
<p>I spent most of this year working with …</p><p>It's surprisingly hard to lead a company with data.
There's a lot written about how to set good goals
and how to avoid common pitfalls (like <a href="/surrogation.html">Surrogation</a>)
but I haven't seen much written about the practicalities
of taking action on these metrics.</p>
<p>I spent most of this year working with our executive team
to understand our corporate goals
and to track our progress against these goals.
I found that setting rock-solid goals <strong>didn't do much good
if individual employees didn't know how they could contribute</strong>.</p>
<p>The big and ambitious goals we set for our company as a whole
can be overwhelming to a single employee.
It's hard to know where to start,
so instead, overwhelmed employees
go back to whatever they were working on before.
<strong>We have to do more</strong> if we want to create behavior change
and get everyone working toward the same goal.</p>
<p>This article <strong>introduces a new framework</strong> for breaking down corporate goals
into metrics that are relevant and tractable for individual teams.
I call it <strong>Cascading Metrics</strong>.</p>
<p>Let's start with a case study to illustrate.</p>
<h2>An Example: 2020 Firefox Goals</h2>
<p>Firefox is losing users. We have been for a while.
Obviously, we want to turn this around.
We started by setting a goal for 2020:
Slow the loss of Firefox users.</p>
<p>This is vague,
so we decided on a metric to track our progress
and set targets that we wanted to hit by the end of the year.
Specifically, at the end of the year we want to have 238 million
<a href="https://data.firefox.com/dashboard/user-activity">Monthly Active Users (MAU)</a>.</p>
<p>That's a good start.
By specifying a metric and a target we've made the goal
specific, measurable, and time-bound.
Slowing the loss of Firefox users is obviously relevant.
In the <a href="https://en.wikipedia.org/wiki/SMART_criteria">SMART framework</a>
we're almost there.
The only <strong>remaining question is whether this goal is attainable</strong>.</p>
<p>Increasing MAU is a <em>huge</em> undertaking.
It's unwieldy.
It's hard to decide where to start on such a massive project.
To make this simpler, our leadership set a strategy to narrow the scope.
The team decided we're going to improve MAU
by finding a way to keep our new users around longer.</p>
<p>New Firefox users are at high risk of
installing Firefox and never returning again.
We decided to track this goal by measuring
<a href="https://docs.telemetry.mozilla.org/cookbooks/retention.html">New User 1-Week Retention</a>.
At a high level, this metric measures:
of the people using Firefox for the first time today,
what portion use Firefox again next week?
Currently, it sits around 45-50%,
meaning a little more than half of new users don't return in the following week.</p>
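<p>As a rough sketch of what this metric means (not the actual telemetry pipeline — the real definition lives in the docs linked above), here's how 1-week retention might be computed over some hypothetical per-profile activity data:</p>

```python
from datetime import date, timedelta

# Hypothetical activity log: profile id -> set of days the profile was
# active. Illustrative only; not Mozilla's real data model.
activity = {
    "a": {date(2020, 3, 1), date(2020, 3, 9)},   # new Mar 1, returned next week
    "b": {date(2020, 3, 1)},                      # new Mar 1, never returned
    "c": {date(2020, 2, 20), date(2020, 3, 1)},  # not new on Mar 1
}

def one_week_retention(activity, cohort_day):
    """Of profiles first seen on cohort_day, what fraction are
    active again during days 7-13 after that day?"""
    window = {cohort_day + timedelta(days=d) for d in range(7, 14)}
    new_profiles = [p for p, days in activity.items()
                    if min(days) == cohort_day]
    retained = [p for p in new_profiles if activity[p] & window]
    return len(retained) / len(new_profiles)

print(one_week_retention(activity, date(2020, 3, 1)))  # → 0.5
```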
<p>This is a more manageable goal.
Increasing retention is still a big undertaking,
but we have narrowed the scope a bit.</p>
<h2>Leading vs Lagging Metrics</h2>
<p>We have two metrics in this case study, MAU and Retention.
I'd call Retention a <strong>leading</strong> metric
and MAU a <strong>lagging</strong> metric.</p>
<p>Our main goal is to improve MAU,
but to make the goal more manageable
we focused our strategy on improving retention.
Our hope is that improving retention will, in turn, improve MAU.</p>
<p>I've seen this concept of leading and lagging metrics
discussed pretty often in leadership literature,
so I won't go too deep into it here <sup>1</sup>.</p>
<p>The important point is that this is a <strong>very powerful pattern</strong>.
In the ideal case,
employees can see consistent progress against our leading metric.
That's encouraging!
If we fail to set good leading metrics,
employees can get discouraged trying to make progress
against a metric that just won't budge.</p>
<p>I came across a particularly elegant explanation of this pattern in
a Steinbeck novel (of all places):</p>
<!--
> In human affairs of danger and delicacy
> successful conclusion is sharply limited by hurry.
-->
<blockquote>
<p>So often men trip by being in a rush.
If one were properly to perform a difficult and subtle act,
he should first inspect the end to be achieved and then,
once he had accepted the end as desirable,
he should forget it completely and concentrate solely on the means.
By this method he would not be moved to false action
by anxiety or hurry or fear.
Very few people learn this.</p>
<p>- John Steinbeck, East of Eden</p>
</blockquote>
<!-- First paragraph of chapter 21 -->
<p>Here, our lagging metric would describe the "end"
and our leading metric the "means".
Put another way: plan the work, then work the plan.</p>
<!--
Here's a quick summary of leading and lagging metrics:
<center><img width="75%" src="/images/cascading_metrics_leadvlag.svg"></center>
-->
<h2>Cascading Metrics</h2>
<p>There's a plot hole in this story though.
Increasing retention is still a very difficult goal to achieve.</p>
<!--
This reminds me of [Milo](https://en.wikipedia.org/wiki/Milo_of_Croton)
who carried a calf on his back every day until he could lift a bull
4 years later!
The thing is, a 3 month calf already weights ~250#
-->
<p>An individual team <em>might</em> be able to improve Firefox retention,
but most won't be able to.
Our leading metric didn't do enough to make our lagging metric tractable.</p>
<p>This is where Cascading Metrics can help.
When creating cascading metrics, we repeatedly apply this pattern,
breaking difficult-to-move metrics down into easier-to-move ones
until we have an appropriately-sized project for an individual team.
Let's look at an example:</p>
<p>Let's say Alice is a senior leader at Firefox
and is responsible for improving MAU.
Alice identifies retention as the leading metric she wants to focus on improving.
Alice can then delegate responsibility for improving retention to Bob.
So far, nothing's changed.</p>
<p>Now Bob has the goal of improving retention.
He thinks that new users will be more likely to keep using Firefox
if we make the browser faster.
Accordingly, he identifies a leading metric to measure
how quickly Firefox loads websites on average.</p>
<p>In this example, Alice's leading metric becomes Bob's lagging metric.
This pattern can continue as needed until we have an achievable goal
and our strategy has become a tactic.</p>
<p>Here's a visual of what this flow might look like:</p>
<p><img src="/images/cascading_metrics_example.svg"></p>
<p>At each level of delegation,
there's a chance to <strong>match an employee's responsibility with their influence</strong>.
For example, increasing MAU might be a fine goal for a C-Suite executive,
but it would be an unachievable goal for a single manager with only a few reports.
Similarly, an executive shouldn't be setting goals for individual teams.</p>
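<p>One way to picture the cascade is as a simple chain of owner/metric pairs, where each level's leading metric becomes the next level's lagging metric. This sketch is purely illustrative — the third owner and metric below are hypothetical, not anything from the example:</p>

```python
# Each node pairs an owner with the metric they're responsible for;
# "leading" points at the metric chosen to move it.
cascade = {
    "owner": "Alice", "metric": "MAU",
    "leading": {
        "owner": "Bob", "metric": "New User 1-Week Retention",
        "leading": {
            "owner": "Perf team", "metric": "Median page-load time",
            "leading": None,
        },
    },
}

def chain(node):
    """Walk the cascade from the lagging end to the leading end."""
    out = []
    while node is not None:
        out.append(f"{node['owner']}: {node['metric']}")
        node = node["leading"]
    return out

print(" -> ".join(chain(cascade)))
```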
<h2>Developing Leading Metrics</h2>
<p>Something I really like about this framework is that
it's explicit about where these leading metrics come from.
The leading metrics <strong>describe a product strategy</strong>.</p>
<p>We've looked at a goal, thought deeply about the product,
and decided on a strategy that will help us achieve that goal.
This decision-making process should be informed by data,
but it's probably not <em>driven</em> by data.
It's <strong>driven by product intuition</strong>.</p>
<p>A very common failure case is to skip over the "strategy" part of this process
and hope our leading metric will just fall out of the data.
We focus on finding analytical solutions for increasing our metric.
Maybe we run some broad machine learning exploration to identify a leading metric.</p>
<p>These approaches are only occasionally successful.
More often than not, we end up finding obvious truths:
"Users who use the product frequently retain better!
We should get people to use the product more!".
If we do find a meaningful way to move our lagging metric,
it's very often something we don't have agency to change,
which limits its value as a leading metric.</p>
<p>We're almost always better off
if we lean on our product experts to lead the way.
They can combine all of the tools at our disposal to find a way forward:
data science, user research, and their own product intuition.</p>
<p>Data science can support product in a couple of ways.
In the early stages of developing a strategy,
data science can help with opportunity sizing
and help product test their intuition against the data
(e.g., do we have enough users in Germany
to focus all of our efforts in one country?).
Later in the process, data science can help product
develop a leading metric to describe their strategy.</p>
<h2>Conclusion</h2>
<p>Setting numeric goals is a great way to give a company direction.
However, these ambitious corporate goals often
fail to create any real behavior change in practice.
If we want people to change their behavior
we have to make our goals relevant to individual teams.
Cascading metrics gives us a framework for turning our big corporate goals
into something relevant to front-line engineers.</p>
<hr />
<h2>Footnotes</h2>
<p>[1] If you want to read more, I can recommend
<a href="https://www.amazon.com/Disciplines-Execution-Achieving-Wildly-Important/dp/1451627068/">The 4 Disciplines of Execution</a>
which is referenced heavily in
<a href="https://www.amazon.com/Deep-Work-Focused-Success-Distracted/dp/1455586692/">Deep Work</a>.</p>
<p><em>Thanks to Dan McKinley and Audra Harter for reading drafts of this post</em></p>Defining Data Intuition2020-10-20T00:00:00-07:002020-10-20T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-10-20:/data_intuition.html<p>Last week, one of my peers asked me to explain what I meant by "Data Intuition",
and I realized I really didn't have a good definition.
That's a problem! I refer to data intuition all the time!</p>
<p>Data intuition is one of the three skills I interview new data scientists …</p><p>Last week, one of my peers asked me to explain what I meant by "Data Intuition",
and I realized I really didn't have a good definition.
That's a problem! I refer to data intuition all the time!</p>
<p>Data intuition is one of the three skills I interview new data scientists for
(along with statistics and technical skills).
In fact, I just spent the first nine months of 2020
building Mozilla's data intuition.
I'm really surprised to realize I can't point to
a good explanation of what I'm trying to cultivate.</p>
<p>So - I'll make one up. I propose the following definition for Data Intuition:</p>
<blockquote>
<p><strong>Data Intuition is a resilience to misleading data and analyses.</strong></p>
</blockquote>
<p>In other words, it's harder to mislead someone with data
if they have strong data intuition.
Think of this as <strong>a defense against the dark data arts</strong>.</p>
<p>So what does that look like in practice?</p>
<h2>Data Stink</h2>
<p>Someone with strong data intuition can quickly spot "data-stink"
(a close cousin to "<a href="https://en.wikipedia.org/wiki/Code_smell">code smell</a>").
These are data issues that don't necessarily invalidate an analysis,
but certainly draw suspicion on the results.
For example:</p>
<ul>
<li>An analysis prominently reports a seemingly <strong>arbitrary metric</strong> -
4-day retention increased by 0.5%!
Where did 4-day retention come from? Don't we usually track 7-day retention?
This needs more attention before I trust the results.</li>
<li>An analysis reports <strong>extraordinary results</strong> where nominal results are expected -
this feature increased retention by 10%!
But, past efforts were trying to increase retention by 0.5% -
and isn't retention already 90%? How'd we get an increase of 10%?</li>
</ul>
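<p>A crude version of the second check can even be automated. The helper below flags a reported lift that dwarfs anything seen historically; the function name, threshold, and numbers are my own invention, purely illustrative:</p>

```python
def looks_extraordinary(reported_lift, historical_lifts, factor=5):
    """Flag a reported lift far outside the range of past lifts."""
    typical = max(abs(lift) for lift in historical_lifts)
    return abs(reported_lift) > factor * typical

# Past retention lifts have been around half a percent.
past = [0.004, -0.002, 0.006, 0.001]

print(looks_extraordinary(0.10, past))   # a 10% lift deserves scrutiny
print(looks_extraordinary(0.005, past))  # a 0.5% lift looks plausible
```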
<p>These are extreme examples.
Usually the problems are more subtle
and result in a general sense of uneasiness with the results
(that's why it's called "intuition").</p>
<p>It's clear to me that data intuition is <em>related</em> to product intuition,
though these <em>are</em> different skills.
Product intuition can contextualize our results
and make it easier to identify extraordinary claims in analyses.
To know a 10% gain in retention is ridiculous
we need to know that users retain pretty well already.</p>
<h2>Methods issues</h2>
<p>Strong data intuition can also help you
spot issues with how the analysis was designed.
Things like: how did the author collect data? Is it a representative sample?
Do they need to have an experiment to establish causation?</p>
<p>Here's an example -
say an analysis reports that Firefox users who create a Firefox account
retain 10% higher than users who don't.
By default, a lot of folks interpret this to mean that
if we invest some time in helping users open accounts
we'll see an increase in retention.
Folks with stronger data intuition will instead
recognize these results are just correlational (not causal).</p>
<p>Users who use the product a lot tend to stick around longer.
Users who open an account are more active users, thus they retain better.
Users who <em>crash</em> Firefox are more active users, and also retain <em>better</em>.</p>
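<p>A toy simulation makes this confounding visible. Here, a latent activity level drives both account creation and retention, and having an account has no causal effect at all — yet account holders still retain noticeably better. All numbers are invented:</p>

```python
import random

random.seed(1)

def simulate(n=100_000):
    """Each user has a latent activity level that drives BOTH the
    chance they create an account AND the chance they retain."""
    rows = []
    for _ in range(n):
        activity = random.random()                 # latent engagement
        has_account = random.random() < activity   # active users sign up more
        retained = random.random() < 0.5 + 0.4 * activity
        rows.append((has_account, retained))
    return rows

rows = simulate()

def rate(flag):
    group = [retained for has_account, retained in rows
             if has_account == flag]
    return sum(group) / len(group)

# Positive gap, despite zero causal effect of accounts on retention.
print(rate(True) - rate(False))
```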
<p>I think this intuition is more than just understanding statistics well.
A strong stats background can help me identify issues
when reading the <em>methods section of a white paper</em>.
Strong data intuition helps me determine how much I trust
results I hear about in a <em>news headline</em>.
Data intuition helps me establish whether results are
<a href="/pub-true.html">true-enough</a>.</p>
<h2>More than Skepticism</h2>
<p>I almost defined data intuition as a type of skepticism,
but I think this is a bad characterization.
Skepticism over-focuses on disregarding results.</p>
<p>Intuition is more than being skeptical.
It's <strong>incorporating new data as part of a body of existing knowledge</strong>.
A lot of times, that means deciding new incoming data are inconsistent
and need more investigating before we can trust them.
But other times, it means changing our opinions in the face of new data
that are more authoritative than our existing body of knowledge.</p>
<h2>What do you think?</h2>
<p>I want to hear your thoughts on this.
I'm posting this definition publicly in part because I want to invoke
<a href="https://meta.wikimedia.org/wiki/Cunningham%27s_Law">Cunningham's Law</a>.
The best way to get to the right answer is to post the wrong answer!</p>
<p>Does this definition for data intuition resonate with you?
Am I missing something important? Let me know!
My email is at the bottom of this page.</p>
<p>I'm spending the next few months building some self-service trainings
to help non-data people at Mozilla build data intuition.
I'd rather be wrong now than next year!</p>Follow up: Intentional Documentation2020-10-12T18:00:00-07:002020-10-12T18:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-10-12:/int_d11n_preso.html<p>Last week I presented the idea of
<a href="https://blog.harterrt.com/randy_au_d11n.html">Intentional Documentation</a>
to Mozilla's data science team.
Here's a <a href="/static/int_d11n/">link to the slides</a>.</p>
<p>The rest of this post is a transcription of what I shared with the team
(give or take).</p>
<hr />
<p>In Q4, I'm trying to build a set of trainings
to help …</p><p>Last week I presented the idea of
<a href="https://blog.harterrt.com/randy_au_d11n.html">Intentional Documentation</a>
to Mozilla's data science team.
Here's a <a href="/static/int_d11n/">link to the slides</a>.</p>
<p>The rest of this post is a transcription of what I shared with the team
(give or take).</p>
<hr />
<p>In Q4, I'm trying to build a set of trainings
to help Mozillians build data intuition.
Last week, I was building a proposal for the project
and I thought to myself, "Why don't these trainings already exist?"</p>
<p>I spent the first half of 2020 working with Mozilla's product team
to help build an understanding of our data and metrics.
This feels like it would have been
the perfect opportunity to create some scalable documentation.
I've already invested a lot of time and energy into explaining our metrics.
Why didn't I think to document the work as I went?</p>
<p>My first thought was that writing documentation is <em>hard</em>.
Writing is difficult. Editing is difficult.
Even figuring out <em>what to write</em> is difficult.
But this wasn't convincing. Most of my work is hard.
What's so special about documentation?
Hell, I'm writing right now. I don't need to do this.</p>
<p>In reality, I think the problem is that documentation often <em>feels</em> useless.
It requires hard work up-front and I often don't get to see the downstream benefits.
In the best case, documentation reduces questions.
I don't get to see the questions not asked.
Even worse, if I write a lot of documentation
all of those articles need to be maintained and organized somewhere.
Writing documentation takes work and creates maintenance!</p>
<p>Randy Au's post about
<a href="https://counting.substack.com/p/lets-get-intentional-about-documentation">Intentional Documentation</a>
helped break this open for me.
Once I started thinking about documentation as a liability
everything became more clear.
Just like with software, we're better off if we can get by with fewer lines-of-code.</p>
<p>This is especially true at Mozilla.
The company is constantly exploring new areas so our problem-space keeps shifting.
This is in part because we're still new to working with data
and in part because the company is trying to find a way to grow.</p>
<p>I also think data science is harder to document than other areas (like engineering).
Data scientists are often working on a tighter time-scale.
We're trying to answer a question that becomes irrelevant in a few weeks.
There's no need to write long-lasting documentation for these investigations.
Nobody's going to read it in a month.</p>
<h2>Proposal</h2>
<p>So how do we adjust?
I think we should shift our focus away from writing documentation
and towards being <strong>prepared to backfill documentation</strong>.
Instead of writing documentation that might not be useful in the future,
let's wait to see what documentation we need.
In the meantime, let's prepare so we can backfill our documentation later.</p>
<p>This <em>should</em> make writing documentation easier
just because we'll be writing less documentation.
Even better, the documentation we do write is very likely to be used,
which is much more motivating than writing for a black hole.</p>
<p>We can make this even easier if we set up some useful habits.
If we focus on keeping lots of context in our tickets and weekly snippets
it will be much easier to distill that information into documentation later.
When starting a new project, writing a proposal helps me document my intentions
which helps contextualize documentation later.</p>
<p>Our ticketing system is working well.
I get a lot of day-to-day value from keeping my tickets up to date
(read: I get bugged less frequently when everything is up to date).</p>
<p>Snippets are sorta-kinda working. I alluded to this in
<a href="/writing_inside_organizations.html">writing inside organizations</a>
but I don't really enjoy writing snippets.
That's a sign to me that something's busted in the tooling.
When I'm doing useful work I tend to enjoy myself.</p>
<p>Finally, Slack is horrible for backfilling documentation.
By default, we only keep Slack messages for 6 months,
so unless you're confident your work will be irrelevant 6 months from now,
consider documenting elsewhere.</p>
<h2>Caveats</h2>
<p>This does put some additional pressure on our real-time media (i.e. Slack).
This isn't great. Answering questions in Slack takes time and is repetitive.
But keep in mind that over-documenting has the same effect.
Unless documentation is maintained and organized,
it's just as confusing as having no documentation.
Think about how frustrating it is to find documentation that looks authoritative
but misleads because it's out of date. </p>
<p>Intentional Documentation isn't a panacea.
We still need to figure out how to maintain the documentation we do write.
We still need to figure out when to delete unused documents.
But it does make this process easier.</p>Surrogation2020-10-07T00:00:00-07:002020-10-07T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-10-07:/surrogation.html<p>A year or so ago, I read
<a href="https://hbr.org/2019/09/dont-let-metrics-undermine-your-business">this article</a>
about how Wells Fargo ended up in such a mess.
If you don't remember,
Wells Fargo was opening accounts in their clients' name without their consent
and ended up paying a few hundred million dollars in fines.</p>
<p>Long story short, a …</p><p>A year or so ago, I read
<a href="https://hbr.org/2019/09/dont-let-metrics-undermine-your-business">this article</a>
about how Wells Fargo ended up in such a mess.
If you don't remember,
Wells Fargo was opening accounts in their clients' name without their consent
and ended up paying a few hundred million dollars in fines.</p>
<p>Long story short, a big part of the problem
was that WF set a few metrics to guide the company,
set strong incentives to optimize those metrics,
and blindly let the machine get to work.
The company did a great job of optimizing the metrics
but lost sight of the strategy the metrics were meant to represent.</p>
<p>This tendency to confuse metrics for a strategy is called <strong>Surrogation</strong>
(I keep forgetting this word, which is half of why I'm posting this here).
When I'm talking to other data scientists
I usually hear this put like,
"When a measure becomes a target, it ceases to be a good measure"
(<a href="https://en.wikipedia.org/wiki/Goodhart%27s_law">Goodhart's Law</a>).</p>Intentional Documentation2020-09-29T00:00:00-07:002020-09-29T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-09-29:/randy_au_d11n.html<p>Randy Au has a great post on documentation for data scientists here:
<a href="https://counting.substack.com/p/lets-get-intentional-about-documentation">Let's get intentional about documentation</a>.
Take a look, it's worth a read.</p>
<p>I've been able to find some decent guides for writing documentation
but they're usually targeted at engineers.
That's a shame.
Data scientists have significantly different
constraints …</p><p>Randy Au has a great post on documentation for data scientists here:
<a href="https://counting.substack.com/p/lets-get-intentional-about-documentation">Let's get intentional about documentation</a>.
Take a look, it's worth a read.</p>
<p>I've been able to find some decent guides for writing documentation
but they're usually targeted at engineers.
That's a shame.
Data scientists have significantly different
constraints and needs when writing documentation.</p>
<p>I really like Randy's suggestion to <strong>focus on keeping good work records
instead of focusing on writing complete documentation</strong>.
Writing "good documentation" is a huge task
and it's hard to predict <em>what</em> documentation will be useful.
Instead of guessing, <strong>make it easy to backfill documentation later</strong>.</p>
<p>Keeping good records might look like
tracking your work in tickets or publishing weekly snippets.
I talked about how these records can be personally useful in
<a href="writing_inside_organizations.html">this post</a>.
As an added bonus, good records make it much easier to backfill documentation
once you identify what documentation is missing.</p>
<p>I mentioned earlier that documenting data science work
is significantly different than documenting engineering work.
One of the key differences is that
data scientists tend to do more once-and-done work than engineers.
<strong>Data science is a race against irrelevance</strong>.
The world is changing around us
and we need to deliver insights before our findings go stale.</p>
<p>It's impossible and inefficient to try to document all of this one-off work.
Only a small portion of the resulting documentation would ever be used.
Even worse, the useful documentation will be hidden in a sea of useless noise.</p>
<p>Instead, data scientists should focus on keeping good work records,
contextualizing their analyses,
and preparing themselves to backfill documentation later.</p>
<p>Finally, this quote is a gem:</p>
<blockquote>
<p>... [Documentation] is a MASSIVE, continent-sized, topic.
Sadly, I’m not informed enough to tackle the entire continent
and I’ve only got a few thousand words of to use in a post.
To cope, I’m gonna employ the time-honored move
of harrowed data scientists everywhere —
reduce scope by fiat and attempt to dazzle the audience
with “directional” findings until I can form a strong case later on.</p>
</blockquote>What do you take home?2020-07-10T00:00:00-07:002020-07-10T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-07-10:/notebook/2020-07-10.html<p>Every other week,
I go through my todo list and decide where I should focus my attention.
I review a list of prompts that help me choose important work.
One of the oldest prompts on my list is:
"<strong>What will you take home at the end of the week?</strong>".</p>
<p>I …</p><p>Every other week,
I go through my todo list and decide where I should focus my attention.
I review a list of prompts that help me choose important work.
One of the oldest prompts on my list is:
"<strong>What will you take home at the end of the week?</strong>".</p>
<p>I mentioned this a while ago in
<a href="https://blog.harterrt.com/new_tools.html">this post</a>
about evaluating new tools.</p>
<blockquote>
<p>I think of my <a href="https://esimoney.com/two-huge-reasons-why-your-career-matters/">career as an asset</a>,
so if I get to do work that builds transferable skills,
I count that as part of my compensation.
On the other hand,
if I'm writing glue scripts to deal with idiosyncrasies in an internal tool,
I'm missing out.</p>
</blockquote>
<p>In short, the type of work I do today
affects how valuable my career is in the long term.
I take this into account when choosing my work.</p>
<p>Mozilla is really strong in this dimension.
We're an open company
which means I get to work in the open,
work with open-source tools,
and blog about my work.</p>
<p>Writing is a great example of taking something home.
The obvious benefit is that I'll be able to reference my public writing forever.
Maybe a bigger benefit is the career capital I build by improving my writing
and broadening my network.</p>
<p>For a long time, blogging was only a small part of how I worked in the open.
I got to write a bunch of code and prose on
<a href="https://github.com/harterrt">Github</a>.
I reviewed PR's in the open and commented on public bugs.
I'll be able to point to that work for a long time.</p>
<p>For the manager, this can be a great incentive for your reports.
I know working in the open played a part in my decision to work at Mozilla.
As I noted in <a href="https://blog.harterrt.com/new_tools.html">New Tools</a>,
you can help your reports take something home by using open source tools internally.
These things are like instant raises that don't cost the company a dime.</p>Keeping a Journal2020-07-08T00:00:00-07:002020-07-08T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-07-08:/notebook/2020-07-08.html<p>There was a discussion of
<a href="https://hbr.org/2017/07/the-more-senior-your-job-title-the-more-you-need-to-keep-a-journal">this HBR article</a>
("The More Senior Your Job Title, the More You Need to Keep a Journal")
<a href="https://news.ycombinator.com/item?id=23768624">on HN</a> today.</p>
<p>This is great advice.
I've kept a journal for almost a decade now
and it's definitely improved my career
especially as I've become more senior …</p><p>There was a discussion of
<a href="https://hbr.org/2017/07/the-more-senior-your-job-title-the-more-you-need-to-keep-a-journal">this HBR article</a>
("The More Senior Your Job Title, the More You Need to Keep a Journal")
<a href="https://news.ycombinator.com/item?id=23768624">on HN</a> today.</p>
<p>This is great advice.
I've kept a journal for almost a decade now
and it's definitely improved my career
especially as I've become more senior.</p>
<p>The comments on HN appear to be missing the point though.
Most of the commenters note that keeping a lab book
helps them keep a record of what work they've done
and how they made decisions.
It's a tool to reference later.
That's not at all how I use my journal.
I almost never review old journal entries.</p>
<p>Instead, I use <strong>journaling to help me work through unresolved problems</strong>.
For example, I might take an hour in the morning to pull apart
why a difficult meeting went off the rails
or what my next actions should be on a difficult project.
Journaling is a <strong>tool for self-reflection and self-improvement</strong>.</p>
<p>Journaling is also great for helping me <strong>break free from inertia</strong>.
Being effective mostly boils down to working on the right project.
If I don't take a break to step back and review my progress,
I often end up working on a project because I was working on it last week.
This prioritization-by-inertia is a crummy way to choose work.</p>
<p>There's a cultural inertia too.
For example, it's really easy to get distracted
by a fire-drill that's getting a lot of attention.
It can be really <em>easy</em> to dedicate a couple of days to the urgent work.
However - I find the urgent work isn't always that important.
Journaling can help me identify unimportant work as unimportant.
You can drive a lot of value by being the cool head in the middle of a fire-drill.</p>Post hoc ergo propter hoc2020-06-17T00:00:00-07:002020-06-17T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-06-17:/notebook/2020-06-17.md.html<!---
status: draft
--->
<p>Economists have a handy phrase to describe a fairly common fallacy:
"Post hoc ergo propter hoc" meaning "After, therefore because".</p>
<p>Wikipedia has an example of how this might look in the wild:</p>
<blockquote>
<p>A tenant moves into an apartment
and the building's furnace develops a fault.
The manager blames the tenant's …</p></blockquote><!---
status: draft
--->
<p>Economists have a handy phrase to describe a fairly common fallacy:
"Post hoc ergo propter hoc", meaning "after this, therefore because of this".</p>
<p>Wikipedia has an example of how this might look in the wild:</p>
<blockquote>
<p>A tenant moves into an apartment
and the building's furnace develops a fault.
The manager blames the tenant's arrival for the malfunction.
One event merely followed the other, in the absence of causality.
(<a href="https://en.wikipedia.org/wiki/Post_hoc_ergo_propter_hoc">link</a>).</p>
</blockquote>
<p>In tech this might manifest as part of a feature launch.
Say we launch a feature on a Monday
and on Tuesday our retention goes way up.
We're tempted to think that our feature launch
<em>caused</em> the increase in retention.</p>
<p>This might be a fine assumption if our metrics are generally constant
but more often than not this is just another case
of <strong>correlation not meaning causation</strong>.
Just because these events happened together (albeit with a lag)
doesn't mean that they were causally linked.</p>
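<p>A toy simulation makes the point. The numbers below are entirely made up
and have nothing to do with any real product: the metric has no treatment
effect at all, yet "retention went up the day after launch" is true after
roughly half of all days anyway.</p>

```python
import random

random.seed(0)

# Daily retention wobbling around 88% with no feature launches at all.
retention = [0.88 + random.gauss(0, 0.01) for _ in range(365)]

# Count how often the metric went up from one day to the next.
jumps = sum(1 for day in range(364) if retention[day + 1] > retention[day])

# Pick any "launch day" you like: about half the time, the very next
# day shows an increase, with no causality anywhere in sight.
print(jumps / 364)
```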
<p>With how volatile the world has been this year, this is even more relevant.
Our metrics are moving up and down
for reasons totally unrelated to our individual actions.</p>
<p>Usually, I'd deal with this type of risk by <strong>running a controlled experiment</strong>.
We randomly split some users into two equal groups
and give them different product experiences.
If the users behave differently,
then we <em>know</em> that the difference is caused
by the difference in the product experience
and not some arbitrary outside force.</p>Daily Writing and the "Notebook" Category2020-06-16T00:00:00-07:002020-06-16T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-06-16:/notebook/2020-06-16.html<p>This past weekend I found
<a href="https://notebook.drmaciver.com/posts/2020-06-08-10:11.html">drmaciver's post</a>
on starting a daily writing practice.
I like the idea and I'm going to give it a try.</p>
<p>The content on this blog has never been all that polished,
but I do expect these daily posts will be less consistent
in quality and …</p><p>This past weekend I found
<a href="https://notebook.drmaciver.com/posts/2020-06-08-10:11.html">drmaciver's post</a>
on starting a daily writing practice.
I like the idea and I'm going to give it a try.</p>
<p>The content on this blog has never been all that polished,
but I do expect these daily posts will be less consistent
in quality and topic.
I'm fine with that.
My goal for this blog has always been to (1) become a better writer
and (2) share context with my peers or anyone interested.
Writing daily supports both of those goals.</p>
<p>I do, however, want folks to be able to opt-out
of my quantity-over-quality writing.
To that end, I've started a new category for my daily writing
called "Notebook" (per drmaciver's suggestion).
If you don't want to see these notebook entries,
I recommend subscribing to the
<a href="/feeds/mozilla.rss.xml">Mozilla RSS feed</a>
or reading from the <a href="/category/mozilla.html">Mozilla category</a>.
These are the feeds that get syndicated to
<a href="https://planet.mozilla.org/">Planet Mozilla</a>.</p>Writing inside organizations2020-05-28T00:00:00-07:002020-05-28T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2020-05-28:/writing_inside_organizations.html<p>Tom Critchlow has a
<a href="https://tomcritchlow.com/2020/05/27/filtered-for-org-writing/">great post here</a>
outlining some points on how important writing is for an organization.</p>
<p>I'm still working through the links,
but his post already sparked some ideas.
In particular,
I'm very interested in the idea of an internal blog for sharing context.</p>
<h2>Snippets</h2>
<p>My team keeps …</p><p>Tom Critchlow has a
<a href="https://tomcritchlow.com/2020/05/27/filtered-for-org-writing/">great post here</a>
outlining some points on how important writing is for an organization.</p>
<p>I'm still working through the links,
but his post already sparked some ideas.
In particular,
I'm very interested in the idea of an internal blog for sharing context.</p>
<h2>Snippets</h2>
<p>My team keeps snippets,
which kinda-sorta feels like a blog-like interface for sharing context.
We keep our snippets in a google doc
largely because it has a low barrier to entry and it's a fast solution.
However, I find that keeping snippets in a doc really limits the value
I personally get from keeping a weekly log.
Ostensibly, the value of writing snippets is
keeping my team up to date on my work.
However, I find that the secondary
<strong>personal benefits are the ones that keep me motivated to write updates</strong>.</p>
<p>For example, I like taking a retrospective look at my work for each quarter
to evaluate whether I'm working on the right projects.
Also, perf season is coming up
and it's nice to be able to
review my snippets for the last two quarters.
These are both really easy to do with the text-file logs I keep locally.</p>
<p>Another problem I run into is that my work rarely follows a weekly periodicity.
It's usually closer to a 10-calendar-day cycle.
That means I'm often in the middle of a sprint when I'm documenting my progress
(unwieldy).
A blog-like feed for our team snippets would remove the
weekly periodicity requirement and make it easier for me to give updates
when I have them.</p>
<p>I also give updates to a few different teams,
which requires filtering and duplicating my updates in a few different docs.</p>
<p>I'm off topic now, but the point is I'd get more value out of our team snippets
if I had a good tool for syndicating my own notes into a central log.</p>
<h2>Analyses</h2>
<p>This type of central log is probably useful for sharing analyses as well.
It feels similar to the Indexed stage of the
<a href="/analysis_maturation.html">Analysis Maturation Plan</a>.
I'd love to share early-stage analyses with my team.
Most of the updates wouldn't be broadly interesting
but it's nice to be able to explore other people's analyses when
I have extra time or feel low energy.</p>
<p>This would be particularly nice if the blog were
a semi-private space for analysts.
Kind of like a <a href="https://tomcritchlow.com/2019/09/04/networked-communities-2/">Digital Sidewalk</a>.
This reduces some of the stress of sharing early-stage results.
Fellow analysts are OK with being skeptical of results
and seeing the sausage get made.</p>
<h2>Conclusion</h2>
<p>I'm out of time for now, but I'll be thinking about this more soon.
In general, I'm bullish on this idea of an internal blog for sharing context.</p>Syncthing and Open Source Data Collection2020-01-05T00:00:00-08:002020-01-05T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2020-01-05:/syncthing_data.html<p>I don't see many open source packages collecting telemetry,
so when <a href="https://syncthing.net/">Syncthing</a> asked me to opt-in to telemetry
I was intrigued.</p>
<p>I see a lot of similarities between how Syncthing and Firefox collect data.
Both collect daily pings and make it easy to view the data you're submitting
(in Firefox …</p><p>I don't see many open source packages collecting telemetry,
so when <a href="https://syncthing.net/">Syncthing</a> asked me to opt-in to telemetry
I was intrigued.</p>
<p>I see a lot of similarities between how Syncthing and Firefox collect data.
Both collect daily pings and make it easy to view the data you're submitting
(in Firefox, go to about:telemetry to see your pings).</p>
<p>All of the data they're collecting looks relevant and innocuous.
For example, there's no content about <em>what</em> files are being sync-ed in Syncthing.
They just collect high level data like what version of the software is installed.
Well done!</p>
<p>Syncthing even has a <a href="https://data.syncthing.net/">public data report</a>
similar to the <a href="https://data.firefox.com/">Firefox Public Data Report</a>.
This is a great way to make it clear what data is being collected
and share some data back with the users who generated it.</p>
<p>Interesting to see someone else doing open-source data collection in the wild!</p>Syncthing2020-01-05T00:00:00-08:002020-01-05T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2020-01-05:/try_syncthing.html<p>I did a lot of reading and exploring over my holiday break.
One of the things I'm most excited about is finding
<a href="https://syncthing.net/">Syncthing</a>.
If you haven't seen it yet, take a look.
It's like an open-source, decentralized Dropbox.</p>
<p>It works everywhere, which for me means Linux and Android.
Google Drive …</p><p>I did a lot of reading and exploring over my holiday break.
One of the things I'm most excited about is finding
<a href="https://syncthing.net/">Syncthing</a>.
If you haven't seen it yet, take a look.
It's like an open-source, decentralized Dropbox.</p>
<p>It works everywhere, which for me means Linux and Android.
Google Drive famously has no official Linux client which is a big PITA.
Even the install on my ARM-based raspberry pi was simple.</p>
<p>Right now I'm using it to sync photos from my phone to my laptop.
That sounds trivial, but it's turning out to be a game changer.
It's really cool to snap a picture on my phone
and have it show up moments later on my laptop.
That instant transfer makes it really <strong>easy to bridge the digital-analog gap</strong>.
For example, I can scribble down a drawing, snap a photo,
and incorporate it into a blog post near instantly.
Because there's no third party server involved
I have complete control over my data
and the file transfers happen over my local network (which is really fast).</p>
<p>If you need to sync with machines outside your local network
you can use a relay server.
All of your data is encrypted in-flight so you're still in control of your data.
I'm going to keep playing with it and see how well it works for <strong>team collaboration</strong>.
For example, wouldn't it be cool to share a folder with a remote teammate
so they can keep up with what you're working on?
In terms of the <a href="/analysis_maturation.html">Analysis Maturation Plan</a>
Syncthing would be great for distributing "indexed analyses".</p>
<p>If you add a centralized server (that you trust and maintain)
and some bash scripts
it feels like this workflow could support a collaborative
<a href="https://hapgood.us/2015/10/17/the-garden-and-the-stream-a-technopastoral/">digital garden</a>
without all of the overhead and review.</p>
<p>Anyway, maybe this is old news to you
but Syncthing feels like a new super power to me.</p>Pub True2019-12-13T00:00:00-08:002019-12-13T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2019-12-13:/pub-true.html<p>I'm ramping up on a project to understand how Firefox retains users.
Right now I'm trying to build some context quickly.
For example, what's our monthly retention? How about our annual retention?
There's a bunch of interesting and nuanced measurement questions
that we'll eventually have to answer,
but for now …</p><p>I'm ramping up on a project to understand how Firefox retains users.
Right now I'm trying to build some context quickly.
For example, what's our monthly retention? How about our annual retention?
There's a bunch of interesting and nuanced measurement questions
that we'll eventually have to answer,
but for now I'm just interested in getting some quick back-of-the-envelope numbers.</p>
<p>There's a conflict here though.
I get a lot of value from having loose and squishy estimates.
But, if I share these numbers the results often get passed around
via word-of-mouth and the numbers start to look more solid than they are
(a second cousin to <a href="https://xkcd.com/978/">citogenesis</a>).</p>
<p>It's easier to express uncertainty in person than it is in writing.
In person, I can shrug my shoulders and rock my hand back and forth
signaling, "Kinda, sorta, maybe...".
In writing, caveats take up a lot of space and often muddy my point.</p>
<p>Instead, <strong>I've started describing these soft-and-squishy numbers
as "pub true"</strong> or "true enough".
Basically, the idea is that they'd hold up in a bar conversation
but they're not meant to be used for much else.
They get you into the ballpark
but you probably shouldn't bet the business on them.</p>
<p>I'm happy with the results so far.
Folks seem to register that we're having a casual discussion of the numbers.
It feels like "pub true" status gets passed along with the results.
It's a nice and succinct description.</p>
<h3>An Order of Magnitude</h3>
<p>In particular, I use "pub true" numbers
to unblock some early stage conversations.
When I'm starting a new project
<strong>conversations are often stilted by trying to be too accurate</strong>.
Folks are reticent to share numbers if they don't have exact numbers.</p>
<p>For example, I might ask someone "How many releases do we do a year?".
In a formal environment, they might think for a bit and come up with "I don't know"
even though I know they have more context than I do.</p>
<p>In their head, my peer's thinking,
"We usually release monthly but we had some bug-fix releases last year,
so it could be as many as 15."
The problem is, I know next to nothing so any new context is useful.</p>
<p>I find that re-framing the conversation to something casual gets better results.
If I throw out some ridiculous numbers I can usually get a reasonable estimate back.
For example, I might say "So like 100 releases or 50?".
Pretty often folks come back with a quick and informal,
"Oh, no - nothing like that. Maybe, like 12. Definitely no more than 20."</p>
<p>Great. Pub true. Hand wavy, quick and effective.</p>Analysis Maturation Plan2019-12-12T00:00:00-08:002019-12-12T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2019-12-12:/analysis_maturation.html<p>I was talking about tooling with Mark Reid a few weeks ago.
I've been trying to find a way to simplify sharing analyses throughout the company.
This is an old problem at Mozilla that I've tried to address a couple of times
but I haven't found the silver bullet yet …</p><p>I was talking about tooling with Mark Reid a few weeks ago.
I've been trying to find a way to simplify sharing analyses throughout the company.
This is an old problem at Mozilla that I've tried to address a couple of times
but I haven't found the silver bullet yet.
This is another attempt.</p>
<h3>The problem</h3>
<p>To summarize the problem,
I need to be able to share analyses with my peers at Mozilla
(often HTML documents generated by Rmarkdown).
Currently, we effectively dump documents onto an FTP server tied to a webserver
(called Hala).
This works pretty well, but it makes it almost impossible to
search and discover other people's analyses
and makes getting review difficult.</p>
<p>To address these two problems,
we put together <a href="https://mozilla.report">mozilla.report</a>
and <a href="https://mozilla-private.report">mozilla-private.report</a>.
These are effectively lightweight blog indexes for public and private analyses.
This works <em>OK</em>, but it still requires analysts to take the time to check in
their results and get review.
It's a little heavyweight and isn't getting as much use as I would like.
Hell, I don't even use it all the time just because I'm busy.</p>
<h3>Levels of Maturity</h3>
<p>I think this process has room for improvement.</p>
<p>Just looking at my own workflow,
I want to be able to share my report more broadly
as my results become more polished.</p>
<p>I see four levels of increasing maturity:</p>
<ul>
<li><strong>Private</strong> - only accessible to the analyst</li>
<li><strong>Indexed</strong> - discoverable by those in-the-know</li>
<li><strong>Reviewed</strong> - results verified by a peer</li>
<li><strong>Public</strong> - report shared outside the company</li>
</ul>
<p>In the beginning, I want a backup of my work that is difficult to discover.
This allows me to iterate without misleading anyone.
If I need to share the report with one of my immediate peers
(e.g. because I am stuck or got pulled onto a different project)
then I want to be able to share a link to the rendered report.</p>
<p>Sometimes, I'll start an analysis and find that I'm doing something silly.
If that's the case, fine. I'm done.
Most of the time though, I'll tie together my results into a readable report.
I want to have this readable report indexed and discoverable by my team.
This would allow me to find old reports quickly,
and find any of my peers' prior art.
Ideally, this would allow my team to keep up on my work in their own time
(like an RSS feed of peer analyses).</p>
<p>Most analyses stop here,
but some analyses are critical enough to require peer review.
In that case, I want to be able to get line-by-line review
for my code and commentary.
I want to be able to reference this review thread later.
Finally, I want some token to verify that the report is reviewed
to lend the analysis some authority.</p>
<p>Finally, some reports should be made public outside the company.</p>
<h2>Implementation</h2>
<p>Thus far, I've been trying to hack this workflow together
by strapping existing tools together with duct tape and baling wire.
Unfortunately, I think we'll need some custom tools to make this work well.</p>
<p>Here's the workflow I'm imagining now:</p>
<ul>
<li>Private reports<ul>
<li>Start a new investigation by starting a git repo</li>
<li>Push results to the git repo, including rendered reports</li>
<li>Technical peers can clone the repo to review results</li>
<li>Non-technical peers need a way to see rendered reports...</li>
</ul>
</li>
<li>Indexed reports<ul>
<li>I add my git-repo-url to a central list of analyses.</li>
<li>Some small script provides an indexed list of these repos
with some metadata and links to any reports
(this is similar to what <a href="https://github.com/harterrt/docere">docere</a>
does now, except it reads repos instead of reports)</li>
</ul>
</li>
<li>Reviewed reports<ul>
<li>IDK? Maybe copy the analysis to a central repo like we do for mozilla.report?</li>
</ul>
</li>
<li>Public reports<ul>
<li>IDK? Probably very similar to the reviewed report step...</li>
</ul>
</li>
</ul>
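<p>To make the "small script" idea concrete, here's a hypothetical sketch.
Everything in it (the function name, the input format of a plain list of
repo URLs) is made up for illustration; it's not how docere works today.</p>

```python
def build_index(repo_urls):
    """Render a minimal HTML index from a central list of analysis-repo URLs."""
    items = []
    for url in repo_urls:
        url = url.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comments in the central list
        name = url.rstrip("/").split("/")[-1]
        items.append(f'<li><a href="{url}">{name}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# The central list could be as simple as a text file of repo URLs.
print(build_index([
    "https://github.com/example/retention-analysis",
    "# not-yet-indexed",
]))
```

In this sketch the metadata is just the repo name; a real version would also read each repo for rendered reports to link to.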
<p>In general, I like to avoid monolithic shared repositories -
at least while I'm prototyping analyses.
Instead, I like starting a new repository for each investigation.
When I'm prototyping an analysis I have a lot of small commits
and a lot of non-code files like CSVs and HTML reports.
This can cause merge-conflicts which are a PITA and cause a lot of stop-energy.</p>
<p>Unfortunately, we don't have a good way to
host private git repositories at Mozilla either.
This is task number 1!</p>
<p>I considered keeping analyses in branches of a central repository.
This makes it easy to get review and keeps private reports from being indexed.
However, I find it hard to manage branches and too easy to delete them.</p>
<h3>Call for comments</h3>
<p>I don't have this figured out. Tell me what you think!
I'm especially interested in implementation ideas.
You can shoot me an email (harter at mozilla.com) or
you can leave comments on <a href="https://docs.google.com/document/d/1fn2AiDBTMlxf7iZVn43LUTfkl83jpXBNQntVrztIgx8/edit#">this doc</a>.</p>Technical Leadership Paths2019-11-08T00:00:00-08:002019-11-08T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2019-11-08:/keavy-tech-leadership-path.html<p>I found
<a href="https://keavy.com/work/thriving-on-the-technical-leadership-path/">this article</a>
a few weeks ago and I really enjoyed the read.
The author outlines what a role can look like for very senior ICs.
It's the first in a (yet to be written) series about technical leadership
and long term IC career paths.
I'm excited to read …</p><p>I found
<a href="https://keavy.com/work/thriving-on-the-technical-leadership-path/">this article</a>
a few weeks ago and I really enjoyed the read.
The author outlines what a role can look like for very senior ICs.
It's the first in a (yet to be written) series about technical leadership
and long term IC career paths.
I'm excited to read more!</p>
<p>In particular, I am delighted to see her call out <strong>strategic work</strong>
as a way for a senior IC to deliver value.
I think there's a lot of opportunity for senior ICs to deliver strategic work,
but in my experience organizations tend to under-value this type of work
(often unintentionally).</p>
<p>My favorite projects to work on are high impact and difficult to execute
even if they're not deeply technical.
In fact, I've found that my most impactful projects
tend to only have a small technical component.
Instead, the real value tends to come from
spanning a few different technical areas, tackling some cultural change,
or taking time to deeply understand the problem before throwing a solution at it.
Framing these projects as "strategic" helps me
put my thumb on the type of work I like doing.</p>
<p>Keavy also calls out <strong>strike teams</strong> as a valuable way for ICs
to work on high impact projects without moving into management.
In my last three years at Mozilla,
I've been fortunate to be a part of several strike teams
and upon reflection I find that these are the projects I'm most proud of.</p>
<p>I'm fortunate that Mozilla has a well documented growth path for senior ICs.
All the same, I am learning a lot from her framing.
I'm excited to read more!</p>When the Bootstrap Breaks - ODSC 20192019-04-24T00:00:00-07:002019-04-24T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2019-04-24:/odsc-2019.html<p>I'm excited to announce that I'll be presenting at the
<a href="https://odsc.com/boston">Open Data Science Conference</a>
in Boston next week.
My colleague <a href="https://www.linkedin.com/in/saptarshiguha/">Saptarshi</a>
and I will be talking about
<a href="https://odsc.com/training/portfolio/when-the-bootstrap-breaks">When the Bootstrap Breaks</a>.</p>
<p>I've included the abstract below,
but the high-level goal of this talk is to strip some varnish off the …</p><p>I'm excited to announce that I'll be presenting at the
<a href="https://odsc.com/boston">Open Data Science Conference</a>
in Boston next week.
My colleague <a href="https://www.linkedin.com/in/saptarshiguha/">Saptarshi</a>
and I will be talking about
<a href="https://odsc.com/training/portfolio/when-the-bootstrap-breaks">When the Bootstrap Breaks</a>.</p>
<p>I've included the abstract below,
but the high-level goal of this talk is to strip some varnish off the bootstrap.
Folks often look to the bootstrap as a panacea for weird data,
but all tools have their failure cases.
We plan on highlighting some problems we ran into
when trying to use the bootstrap for Firefox data
and how we dealt with the issues, both in theory and in practice.</p>
<h3>Abstract:</h3>
<p>Resampling methods like the bootstrap are becoming increasingly common in modern data science.
For good reason too;
the bootstrap is incredibly powerful.
Unlike t-statistics, the bootstrap doesn’t depend on a normality assumption
nor require any arcane formulas.
You’re no longer limited to working with well understood metrics like means.
One can easily build tools that compute confidence intervals for an arbitrary metric.
What’s the standard error of a Median?
Who cares! I used the bootstrap.</p>
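<p>To illustrate that last point, here's roughly what it looks like in
practice: a plain-Python sketch on simulated data (not code from the talk)
that bootstraps a standard error for the median.</p>

```python
import random
import statistics

random.seed(42)

# A skewed sample: no tidy closed-form standard error for its median.
sample = [random.expovariate(1.0) for _ in range(500)]

# Resample with replacement, recompute the median each time, and take
# the spread of those medians as the standard error.
boot_medians = [
    statistics.median(random.choices(sample, k=len(sample)))
    for _ in range(2000)
]
se_median = statistics.stdev(boot_medians)
print(f"bootstrap SE of the median: {se_median:.3f}")
```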
<p>With all of these benefits the bootstrap begins to look a little magical.
That’s dangerous.
To understand your tool you need to understand how it fails,
how to spot the failure, and what to do when it does.
As it turns out, methods like the bootstrap and the t-test
struggle with very similar types of data.
We’ll explore how these two methods compare on troublesome data sets
and discuss when to use one over the other.</p>
<p>In this talk we’ll explore what types of data the bootstrap has trouble with.
Then we’ll discuss how to identify these problems in the wild
and how to deal with the problematic data.
We will explore simulated data and share the code to conduct the simulations yourself.
However, this isn’t just a theoretical problem.
We’ll also explore real Firefox data and discuss how Firefox’s data science team
handles this data when analyzing experiments.</p>
<p>At the end of this session you’ll leave with a firm understanding of the bootstrap.
Even better, you’ll understand how to spot potential issues in your data
and avoid false confidence in your results.</p>Slow to respond through 20182018-12-10T00:00:00-08:002018-12-10T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2018-12-10:/2018_slow_to_respond.html<p>I'm working on an urgent and high priority request for the next few weeks.
To make sure I can finish this work in 2018
I'm <strong>limiting my meetings and communications</strong> for the remainder of the year.</p>
<p>Slack is good for getting my immediate attention,
but if your request takes more …</p><p>I'm working on an urgent and high priority request for the next few weeks.
To make sure I can finish this work in 2018
I'm <strong>limiting my meetings and communications</strong> for the remainder of the year.</p>
<p>Slack is good for getting my immediate attention,
but if your request takes more than a one word response
it's likely to get lost in the shuffle.
If you need me to take some action
<a href="https://bugzilla.mozilla.org/enter_bug.cgi?assigned_to=rharter%40mozilla.com&bug_file_loc=http%3A%2F%2F&bug_ignored=0&bug_severity=normal&bug_status=NEW&cf_fx_iteration=---&cf_fx_points=---&component=General&contenttypemethod=list&contenttypeselection=text%2Fplain&flag_type-4=X&flag_type-607=X&flag_type-800=X&flag_type-803=X&form_name=enter_bug&maketemplate=Remember%20values%20as%20bookmarkable%20template&op_sys=Mac%20OS%20X&priority=--&product=Data%20Science&rep_platform=x86_64&target_milestone=---&version=unspecified">filing a bug</a>
is your best bet.
If you don't want to file a bug, email is fine.
Keep in mind that my response time will be very slow during this time.</p>
<p>If you need immediate help, try the following:</p>
<ul>
<li>If your question is about a search analysis or new search telemetry,
please contact bmiroglio AT mozilla.com</li>
<li>If your question is about search data, see the documentation here.
If that doesn't help, contact wlach AT mozilla.com</li>
<li>For general data science questions contact rweiss AT mozilla.com</li>
<li>For general telemetry questions,
ask #fx-metrics on Slack or #datapipeline on IRC</li>
</ul>
<p>Otherwise, I'll get back to you as soon as I can!
Thanks for your understanding.</p>If you can't do it in a day, you can't do it2018-06-27T14:21:00-07:002018-06-27T14:21:00-07:00Ryan T. Hartertag:blog.harterrt.com,2018-06-27:/day_barrier.html<p>I was talking with Mark Reid
about some of the problems with <a href="coding_in_textboxes.html">Coding in a GUI</a>.
He nailed part of the problem with a soundbite too good not to share:</p>
<blockquote>
<p>"If you can't do it in a day, you can't do it."</p>
</blockquote>
<p>This is a persistent problem with tools that make …</p><p>I was talking with Mark Reid
about some of the problems with <a href="coding_in_textboxes.html">Coding in a GUI</a>.
He nailed part of the problem with a soundbite too good not to share:</p>
<blockquote>
<p>"If you can't do it in a day, you can't do it."</p>
</blockquote>
<p>This is a persistent problem with tools that make you code in a GUI.
These tools are great for working on bite-sized problems,
but the workflow becomes painful
when the problem needs to be broken into pieces and attacked separately.</p>
<p>Part of the problem is that I can't test the code.
That means I need to understand how each change will affect the entire code base.
It's impossible to compartmentalize.</p>
<p>GUIs also make it difficult to split a problem across people.
If I can't track changes easily
it's impossible to tell whether my changes conflict with a peer's changes.</p>
<p>So look out, <a href="bad-tools.html">bad tools are insidious</a>!
If you find yourself abandoning an analysis because it's hard to refactor,
consider choosing a different toolchain next time.
Especially if it's because there's no easy way to move your code out of a GUI!</p>Planning Data Science is hard: EDA2018-06-26T17:19:00-07:002018-06-26T17:19:00-07:00Ryan T. Hartertag:blog.harterrt.com,2018-06-26:/planning_eda.html<p>Data science is weird.
It looks a lot like software engineering
but in practice the two are very different.
I've been trying to pin down where these differences come from.</p>
<p>Michael Kaminsky hit on a couple of key points
in his series on Agile Management for Data Science
on <a href="https://www.locallyoptimistic.com/">Locally …</a></p><p>Data science is weird.
It looks a lot like software engineering
but in practice the two are very different.
I've been trying to pin down where these differences come from.</p>
<p>Michael Kaminsky hit on a couple of key points
in his series on Agile Management for Data Science
on <a href="https://www.locallyoptimistic.com/">Locally Optimistic</a>.
In <a href="https://www.locallyoptimistic.com/post/agile-analytics-p2/index.html#exploratory-data-analysis">Part II</a>
Michael notes that Exploratory Data Analyses (EDA) are difficult to plan for:
"The nature of exploratory data analysis means
that the objectives of the analysis may change as you do the work." - Bingo!</p>
<p>I've run into this problem a bunch of times when trying to set OKRs for major analyses.
It's nearly impossible to scope a project
if I haven't already done some exploratory analysis.
<strong>I didn't have this problem when I was doing engineering work</strong>.
If I had a rough idea of what pieces I needed to stitch together,
I could at least come up with an order-of-magnitude estimate
of how long a project would take to complete.
Not so with Data Science:
I have a hard time differentiating between
analyses that are going to take two <strong>weeks</strong> and
analyses that are going to take two <strong>quarters</strong>.</p>
<p>That's all. No deep insight.
Just a +1 and a pointer to the folks who got there first.</p>You can't do data science in a GUI2018-06-26T00:00:00-07:002018-06-26T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2018-06-26:/ds_gui.html<p>I came across
<a href="https://www.youtube.com/watch?v=cpbtcsGE0OA">You can't do data science in a GUI</a>
by Hadley Wickham a little while ago.
He hits on a lot of the same problems I mentioned in
<a href="https://blog.harterrt.com/coding_in_textboxes.html">Don't make me code in your text box</a>.
Take a look if you have some time.
In the first 15m …</p><p>I came across
<a href="https://www.youtube.com/watch?v=cpbtcsGE0OA">You can't do data science in a GUI</a>
by Hadley Wickham a little while ago.
He hits on a lot of the same problems I mentioned in
<a href="https://blog.harterrt.com/coding_in_textboxes.html">Don't make me code in your text box</a>.
Take a look if you have some time.
In the first 15m he covers the argument against coding in a GUI.
After that he plugs R and the tidyverse.</p>
I'm going to focus my attention on Mozilla's experimentation platform.
One of the first questions we need to answer is
how we're going to calculate and report the necessary measures of variance.
Any experimentation platform needs to be able to
compare metrics between two groups …</p><p>Over the next few quarters,
I'm going to focus my attention on Mozilla's experimentation platform.
One of the first questions we need to answer is
how we're going to calculate and report the necessary measures of variance.
Any experimentation platform needs to be able to
compare metrics between two groups.</p>
<p>For example, say we're looking at retention for a control and experiment group.
Control shows a retention of 88.45% and experiment shows a retention of 90.11%.
Did the experimental treatment cause a real increase in retention
or did the experiment branch just get lucky when we assigned users?
We need to calculate some measure of variance to be able to decide.</p>
<p>The two most common methods to do this calculation are the frequentist's
<a href="https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">two-sample t-test</a>
or some form of
<a href="https://en.wikipedia.org/wiki/Bootstrapping_(statistics)">the bootstrap</a>.</p>
<p>In ye olden days, we'd be forced to use the two-sample t-test.
The bootstrap requires a lot of compute power
that just wasn't available until recently.
As you can imagine, the bootstrap is all the rage in the Data Science world.
Of course it is. We get to replace statistics with raw compute power!
That's the dream!</p>
<p>Still, the bootstrap isn't perfect for every problem.
Let's look at a few arguments for and against the bootstrap:</p>
<h2>Computational Efficiency</h2>
<p>The bootstrap obviously requires more compute resources.
Still, it's worth highlighting how
<strong>amazingly computationally efficient the t-test is</strong>.
You can calculate all you need for the t-test in a single pass through the data.
For each branch of the experiment all you need to calculate is:
a count, the sum of the data, and the sum of the square of the data
(<a href="https://en.wikipedia.org/wiki/Variance#Formulae_for_the_variance">to calculate the variance</a>).
All of these are easy to calculate in a map-reduce framework.
On the other hand,
the bootstrap is difficult to compute when your data do not fit in memory.</p>
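<p>As a sketch of why the single pass is enough: those three running totals are sufficient to recover each branch's mean and variance, and from there Welch's t statistic. The function names here are my own, not from any particular framework.</p>

```python
import math

def streaming_stats(values):
    """One pass through the data: keep only count, sum, and sum of squares."""
    n = s = ss = 0
    for x in values:
        n += 1
        s += x
        ss += x * x
    return n, s, ss

def welch_t(stats_a, stats_b):
    """Welch's two-sample t statistic computed from sufficient statistics."""
    def mean_var(n, s, ss):
        mean = s / n
        var = (ss - n * mean * mean) / (n - 1)  # sample variance
        return mean, var

    (na, sa, ssa), (nb, sb, ssb) = stats_a, stats_b
    ma, va = mean_var(na, sa, ssa)
    mb, vb = mean_var(nb, sb, ssb)
    return (ma - mb) / math.sqrt(va / na + vb / nb)
```

<p>Each branch's triple can be computed by independent mappers and merged by addition, which is exactly why this fits map-reduce so well.</p>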
<h2>The normality assumption</h2>
<p>T-tests feel arcane and make assumptions about the distribution of the data.
Most notably, t-tests <em>require your metric to be normally distributed</em>.
Assuming normal distributions sets off alarms
for anyone who's worked with real-world data.
On the other hand,
the bootstrap uses the sample distribution to describe the population's distribution
which feels like a much smaller assumption to make.</p>
<p>In reality, the bootstrap method and t-tests actually
make very similar assumptions about the underlying data.
Since the t-test is comparing two <em>means</em>,
the t-test's normality assumption holds so long as
<a href="https://en.wikipedia.org/wiki/Central_limit_theorem">the CLT</a> holds.
The CLT holds so long as
(1) you have a lot of data and
(2) the data have finite variance.
We generally have "a lot of data"
but the finite variance bit can be a problem<sup>1</sup>.
However! The bootstrap also fails if the data have infinite variance<sup>2</sup>.
All that to say,
<strong>if the t-test's normality assumption fails, the bootstrap is in trouble too</strong>.</p>
<p>On the other hand,
it can take a large sample for the CLT to make some datasets look normal
(like, N > 5000).
If you have a small, skewed data set, the bootstrap may be a better choice.
However, this is rarely a problem when you're working with Big Data™.</p>
<h2>Weird metrics</h2>
<p>It becomes practically impossible to calculate a t-test
if your metric isn't a mean.
The classic example here is testing for a change in the median.
What's the variance of a median?
Is the median normally distributed?</p>
<p>¯\_(ツ)_/¯</p>
<p><em>This</em> is where the bootstrap really shines!
With the t-test, you only have your one sample to work with.
With the bootstrap, you have as many samples as you want!
You can calculate any metric you want and get a confidence interval.</p>
<p>Personally, I think calculating the median is a lame example.
Percentiles, like the median, are notoriously hard to calculate over big data.
Instead, consider this (nearly) real life example:</p>
<p>Firefox collects anonymized performance data on a daily basis.
That data could look like this:</p>
<table>
<thead>
<tr><th><code>client</code></th><th><code>day</code></th><th><code>active_hours</code></th><th><code>janky_loads</code></th></tr>
</thead>
<tbody>
<tr><td>'aaa'</td><td>2018-01-01</td><td>4.5</td><td>0</td></tr>
<tr><td>'bbb'</td><td>2018-01-01</td><td>9.2</td><td>3</td></tr>
<tr><td>'ccc'</td><td>2018-01-01</td><td>0.5</td><td>1</td></tr>
<tr><td>...</td><td>...</td><td>...</td><td>...</td></tr>
</tbody>
</table>
<p>Let's say we launch a new feature that is supposed to
reduce the number of janky page loads a user sees per hour.
There's no obvious way to calculate a t-test for
<code>sum(janky_loads)/sum(active_hours)</code>.
What is the variance of that metric?
Remember, we only get one observation per sample.
The bootstrap handles this case trivially.</p>
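<p>Here's a rough sketch of how the bootstrap handles it: resample clients with replacement, recompute the ratio each time, and read a confidence interval off the percentiles. The data and helper names are made up for illustration (the first three rows mirror the table above).</p>

```python
import random

def bootstrap_ci(rows, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for an arbitrary metric."""
    rng = random.Random(seed)
    estimates = sorted(
        metric([rng.choice(rows) for _ in rows]) for _ in range(n_boot)
    )
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Each row is (active_hours, janky_loads) for one client.
rows = [(4.5, 0), (9.2, 3), (0.5, 1), (6.1, 2), (3.3, 0), (7.7, 1)]

def janky_rate(rows):
    return sum(j for _, j in rows) / sum(h for h, _ in rows)

low, high = bootstrap_ci(rows, janky_rate)
```

<p>In practice you'd compute an interval like this for each branch of the experiment; if the control and experiment intervals are well separated, the treatment probably did something.</p>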
<h2>Conclusion</h2>
<p>In summary, the bootstrap is awesome.
We get to replace arcane formulas with intuitive simulations
and we can calculate confidence intervals for any arbitrary metric.</p>
<p>On the other hand, the t-test is <em>much</em> more computationally efficient.
If you have really big data and you <em>know</em> you're only going to compare means,
the t-test may be a better choice.</p>
<hr />
<ol>
<li>For example, a
<a href="https://en.wikipedia.org/wiki/Power_law#Power-law_probability_distributions">power law distribution</a>
can easily have infinite variance.</li>
<li><a href="https://projecteuclid.org/download/pdf_1/euclid.aos/1176350371">Athreya, K. B., "Bootstrap of the mean in the infinite variance case", Annals of Statistics, 15(2), 1987, pp. 724&ndash;731.</a></li>
</ol>
<hr />SQL Style Guide2018-05-17T00:00:00-07:002018-05-17T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2018-05-17:/sql_style_guide.html<p>I'm happy to announce, we now have a
<a href="https://docs.telemetry.mozilla.org/concepts/sql_style.html">SQL style guide</a>.
Check it out!</p>
<p>If you have any suggestions,
feel free to file a PR or issue in
<a href="https://github.com/mozilla/firefox-data-docs/blob/master/concepts/sql_style.md">the docs repository</a>.</p>
<p>Many thanks to all who participated in the
<a href="https://github.com/mozilla/stmocli/issues/9">St. Mocli conversation</a>
and @mreid for the review!</p>PSA: Don't use approximate counts for trends2018-04-24T00:00:00-07:002018-04-24T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2018-04-24:/hll_trends.html<p>I got caught giving some bad advice this week,
so I decided to share here as penance.
TL;DR: Probabilistic counts are great,
but they shouldn't be used everywhere.</p>
<hr />
<p>Counting stuff is hard.
We use probabilistic algorithms pretty frequently at Mozilla.
For example, when trying to get user counts,
we …</p><p>I got caught giving some bad advice this week,
so I decided to share here as penance.
TL;DR: Probabilistic counts are great,
but they shouldn't be used everywhere.</p>
<hr />
<p>Counting stuff is hard.
We use probabilistic algorithms pretty frequently at Mozilla.
For example, when trying to get user counts,
we rely heavily on Presto's
<a href="https://prestodb.io/docs/current/functions/aggregate.html#approx_distinct">approx_distinct</a>
aggregator.
Roberto's even written a
<a href="https://github.com/vitillo/presto-hyperloglog">Presto Plugin</a>
and a
<a href="https://github.com/vitillo/spark-hyperloglog">Spark Package</a>
to allow us to include
<a href="https://en.wikipedia.org/wiki/HyperLogLog">HyperLogLog</a>
variables in datasets like
<a href="https://docs.telemetry.mozilla.org/datasets/batch_view/client_count_daily/reference.html">client_count_daily</a>.</p>
<p>These algorithms save a lot of compute power and analyst time,
but it's important to remember that they do introduce some variance.</p>
<p>In fact, the error bars are substantial.
By default, Presto's <code>approx_distinct</code> is tuned to have a standard error of 2.3%,
which means one out of every three <code>approx_distinct</code> estimates
will be off by more than 2.3%.
I can set a tighter standard error by passing a second parameter,
but it
<a href="https://prestodb.io/docs/current/functions/aggregate.html#approx_distinct">looks like</a>
I can't request anything below 0.5%.
For our HLL datasets, we set a
<a href="https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/GenericCountView.scala#L45">default standard error</a>
of 1.63%, which is still significant.</p>
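<p>The "one out of every three" figure is just the normal distribution at work: about 32% of draws land more than one standard deviation from the mean. A quick simulation (my own sketch, not how Presto computes anything) makes the point:</p>

```python
import random

rng = random.Random(42)
sd = 0.023  # Presto's default approx_distinct standard error
n = 100_000

# Model each estimate's relative error as a draw from Normal(0, sd).
errors = [rng.gauss(0, sd) for _ in range(n)]
frac_beyond = sum(abs(e) > sd for e in errors) / n
# frac_beyond comes out near 0.32
```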
<p>Unfortunately, we can't get the standard error to be much smaller than 1%.
Databricks has
<a href="https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html">a writeup here</a>
which explains that the compute time for their probabilistic estimate
starts to be greater than the compute time for an exact count
somewhere between an error of 0.5% and 1.0%.</p>
<p>Most of the time, this isn't an issue.
For example, if I'm trying to count how many clients used a
<a href="https://addons.mozilla.org/en-US/firefox/addon/multi-account-containers/">Container Tab</a>
yesterday I don't care if it's 100mm or 105mm;
those numbers are the same to me.
However, <strong>that noise becomes distracting</strong>
if I'm building a dashboard to track year over year change.</p>
<h2>An example</h2>
<p>I put together an
<a href="https://blog.harterrt.com/images/probabilistic_counts.html">example notebook</a>
to explore a little.
I created a toy dataframe containing
7 days of data and 1000 <code>client_id</code>'s per day.
Then I got an approximate count of the clients for each day.
Here's what an arbitrary set of daily errors look like:</p>
<p><img src="https://blog.harterrt.com/images/probabilistic_count_errors.png"></p>
<p>By default, pyspark's
<a href="https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/functions.html#approxCountDistinct(java.lang.String,%20double)">approxCountDistinct</a>
aggregator has a relative standard deviation (<code>rsd</code>) of 5%!
The maximum error magnitude we see in this dataset is 7.5% (day 4).</p>
<p>In my opinion, Spark's documentation obfuscates the real interpretation
of this <code>rsd</code> value, calling it the "maximum estimation error allowed".
In reality, there is no "maximum error" allowed.
The <code>rsd</code> is a standard deviation for an approximately normal distribution.
Roughly one in three errors are going to be bigger than the <code>rsd</code>.</p>
<p>What's worse is that this graph makes us think there's movement
in this metric over time.
In reality, the user count is perfectly flat at 1000 users every day.
Since these errors aren't correlated over time,
we see big day over day swings in the estimates.
The largest swing occurs from day 6 to day 7 where the user count
jumps by 13.7% (-6.8% to 6.9%)!</p>
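<p>A sketch of the effect: even with a perfectly flat count of 1000 users, independent relative errors of this size produce day-over-day "trends" out of pure noise. The 5% rsd below matches pyspark's default; everything else is made up.</p>

```python
import random

rng = random.Random(7)
rsd = 0.05  # pyspark approxCountDistinct's default relative standard deviation
true_count = 1000

# Seven daily estimates of the same flat count, with independent errors.
estimates = [true_count * (1 + rng.gauss(0, rsd)) for _ in range(7)]

# Apparent day-over-day change, even though the metric never moved.
swings = [(b - a) / a for a, b in zip(estimates, estimates[1:])]
```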
<h2>Conclusion</h2>
<p>So what's the take away?
Probabilistic counts are still super useful tools,
but it's important to consider what kind of error they're going to introduce.
In particular, don't use probabilistic counts (like <code>approx_distinct</code>)
when looking at year over year rates or plotting trend lines.</p>Don't make me code in your text box!2018-03-28T00:00:00-07:002018-03-28T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2018-03-28:/coding_in_textboxes.html<p>Whenever I start a new data project,
my first step is rooting out any false assumptions I have about the data.</p>
<p>The key here is iterating quickly.
My workflow looks like this:
Code a little, plot the data, what do you see?
Ah, outliers.
Code a little, plot the data …</p><p>Whenever I start a new data project,
my first step is rooting out any false assumptions I have about the data.</p>
<p>The key here is iterating quickly.
My workflow looks like this:
Code a little, plot the data, what do you see?
Ah, outliers.
Code a little, plot the data, what do you see?
Shoot, why are there so many NULL's in the dataset?</p>
<p>This is a critical part of working with data
so we have a ton of tools tuned for fast iteration loops.
These are the tools in the "Building Intuition"
<a href="/stages_e13n.html">stage of experimental analysis</a>.
Jupyter notebooks are a perfect example.
Great way to explore a dataset quickly.</p>
<p>Once I'm done exploring,
I need to distill what I've learned so I can share it and reference it later.
This is where I run into problems.
Often, these fast-iteration tools are really <strong>hard to escape</strong>,
and are a <strong>horrible way to store code</strong>.
As a result,
these tools end up getting used for things they're not built to do.
It's hard to spot if you're not looking out for it.</p>
<p>I've boiled this down to a rule: <strong>Don't make me code in your text box!</strong></p>
<h2>Examples</h2>
<h3>Re:dash</h3>
<p>We use <a href="https://redash.io/">Re:dash</a> extensively at Mozilla.
For the unfamiliar,
Re:dash provides an interactive SQL front-end
where you can query and visualize your data.
It's a great tool for getting quick answers to data questions.
For example, what percentage of users are on Windows?
How many times was Firefox asked to load a page yesterday?</p>
<p>Re:dash is great when you're iterating quickly,
but it falls short when you want to share and maintain your queries.
I've built a few dashboards in re:dash
and I always get nervous when I hear they're getting used.
The problem is that I <strong>can't get review or track changes</strong> to my queries.
I wouldn't tell others to rely on untested and unreviewed code,
so it feels wrong to rely on untested queries.</p>
<p>I started building a tool to fix these problems.
<a href="https://github.com/mozilla/stmocli">St. Mocli</a>
allows you to store queries in a git repository
and deploy the queries to re:dash.
I've been using it for about a month now, and it's great.
It's much easier to maintain queries and getting review is far less painful.</p>
<p>Even better, there were a bunch of unexpected benefits.</p>
<ul>
<li>It's easier to consistently format our queries
since we're editing queries in modern text editors instead of an HTML text-box</li>
<li>We can lint our queries since the queries are now stored in text files</li>
<li>There's clear ownership for each query (<code>git blame</code>)</li>
<li>We have more control over what our consumers are looking at
now that we have a central repository of queries</li>
</ul>
<h3>Wikis</h3>
<p>When I joined Mozilla's data team,
our documentation was in rough shape.
We had documentation, but it was a sprawling tangled mess.
It was easy to forget to update the docs or even to forget where the docs were.
Our documentation still isn't perfect,
but it's much better since we switched to
<a href="https://docs.telemetry.mozilla.org/">docs.telemetry.mozilla.org</a>.</p>
<p>What changed?
We started using
<a href="https://www.gitbook.com/">Gitbook</a> and
<strong>stopped using a Wiki for documentation</strong>.
Wikis are a horrible way to store technical documentation.
In fact, I should probably write a whole article on this point,
but here are some highlights:</p>
<ul>
<li><strong>Writing long-form content in a wiki is painful</strong>.
I either write the content elsewhere
and publish by copy-pasting into a text-box,
or (more commonly) I have to iteratively edit the document in the text box.
Editing in the wiki means my half-finished article
is indistinguishable from complete documentation.</li>
<li>It's <strong>impossible to get review</strong>,
which makes it difficult to fix unclear writing.
Without review I can't tell when I'm being too terse or using a lot of jargon.</li>
<li>Writing in a wiki is thankless.
There's <strong>no artifact of your work</strong>.
Sure, there's a new article in the wiki,
but everyone built the wiki; it's not clear who wrote what.</li>
<li>It's easy for documentation to get lost.
A wiki makes it easy to have a <strong>wandering chain of references</strong>.
Most of the articles at the end of these chains are forgotten and out of date.</li>
</ul>
<p>We've also discovered some unexpected advantages
to storing our documentation in markdown.</p>
<ul>
<li>It was easy to integrate <a href="https://mermaidjs.github.io/">mermaid.js</a>
for <a href="https://docs.telemetry.mozilla.org/concepts/data_pipeline.html">a system diagram</a>.</li>
<li>We were able to add spell check CI,
which has the added benefit of highlighting jargon
and standardizing our terminology.</li>
<li>Soon we're going to add dead link CI as well.</li>
</ul>
<h3>Jupyter</h3>
<p>I already noted that Jupyter is a perfect example of a fast-iteration-loop tool.
I love opening up a new notebook to explore a problem and test my assumptions.
However, when it comes time to share my analysis,
I start running into problems.</p>
<p>First of all, Jupyter notebooks are stored as JSON objects in a text file.
This causes a whole host of problems.
It's difficult to track changes to these files in git.
Since the python code is stored as strings inside of a JSON object,
small changes to the analysis cause big changes to the storage file.
Also, it's impossible to lint or test any code stored in the <code>.ipynb</code> file.</p>
<p>It's easy to export the code from a notebook to a python file, which is great,
but I still want to use Jupyter to display my results.
Ideally, I could have a python package where all the logic is stored
and a Jupyter notebook that just displays the analysis results.
This actually works well, but it's still difficult.
There's no clear way to reload the development package in a live Jupyter notebook.</p>
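<p>The closest workarounds I know of are the standard library's <code>importlib.reload</code> and IPython's <code>autoreload</code> extension, which can pick up edits to a package without restarting the kernel, though neither feels like a real solution. A sketch (the <code>json</code> module here is just a stand-in for your analysis package):</p>

```python
import importlib
import json  # stand-in for the analysis package under development

# After editing the package on disk, re-execute its module object.
# In a notebook you'd more likely run:  %load_ext autoreload  /  %autoreload 2
reloaded = importlib.reload(json)
```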
<p>I don't have a great solution for this yet.
There are a few projects trying to address this problem though.
Mike outlines an interesting storage format
<a href="http://droettboom.com/blog/2018/01/18/diffable-jupyter-notebooks/">here</a>.
There's also <a href="https://github.com/aaren/notedown">notedown</a>
and <a href="https://github.com/rossant/ipymd">ipymd</a>.</p>
<h2>Conclusion</h2>
<p>All of these tools were built to help analysts build intuition quickly,
which is a critical part of data science.
However, most of these tools <strong>compromise on composability</strong>.
Don't get me wrong, <strong>these tools are all useful and necessary</strong>,
but <a href="/bad-tools.html">bad tools are insidious</a>.
Be aware that these fast-iteration focused tools can get misused
if there's not an obvious path for migrating to something more stable.</p>The 5 Stages of Experiment Analysis2018-02-28T00:00:00-08:002018-02-28T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2018-02-28:/stages_e13n.html<p>I've been thinking about experimentation a lot recently.
Our team is spending a lot of effort trying to make Firefox experimentation feel easy.
But what happens after the experiment's been run?
There's <strong>not a clear process for taking experimental data and turning it into a decision</strong>.</p>
<p>I noted the importance …</p><p>I've been thinking about experimentation a lot recently.
Our team is spending a lot of effort trying to make Firefox experimentation feel easy.
But what happens after the experiment's been run?
There's <strong>not a clear process for taking experimental data and turning it into a decision</strong>.</p>
<p>I noted the importance of Decision Reports in
<a href="/good_experiment_tools.html">Desirable features for experimentation tools</a>.
This post outlines the process needed to get to a solid decision report.
I'm hoping that outlining this process
will help us disambiguate what our tools are meant to do
and identify gaps in our tooling.</p>
<p>So, here are the 5 Stages of Experiment Analysis as I see them:</p>
<h2>Build Intuition, Form an Opinion</h2>
<p>When I begin reviewing an experiment,
I need to <strong>get a feel for what's going on in the data</strong>.
That means I need to explore hypotheses quickly.
Did the number of page loads unexpectedly increase? Why?
Did the number of searches unexpectedly stay flat? What are the error bounds?</p>
<p>Consequentially, <strong>I need tools that let me iterate quickly</strong>.
This will help me develop the story I'm going to tell in the final report.
Keep in mind,
most of what I see during this investigation will not be included in the report;
part of telling a good story is knowing what isn't important.</p>
<p>These tools are what most folks imagine when talking about tools for experimentation.</p>
<p>Prominent tools for this stage include the front ends for Google Analytics or Optimizely.
Basically, I'm talking about any webpage that shows you statistics like this
(from <a href="https://medium.com/airbnb-engineering/experiments-at-airbnb-e2db3abf39e7">AirBnB's excellent blog</a>):</p>
<p><img alt="(Example Report from AirBnB)" src="https://blog.harterrt.com/images/e13n-example-report.jpeg" /></p>
<p>Some of Mozilla's tools in this category include:</p>
<ul>
<li>Test Tube</li>
<li>Mission Control</li>
<li>re:dash</li>
</ul>
<h2>Generate Artifacts</h2>
<p>Once I have an idea of what's happening in an experiment,
I start gathering the important results into a report.
It's <strong>important to freeze the results</strong> I'm seeing and include them in the report.
Five years from now, I want to be able to test whether my decision still makes sense.
Part of that is deciding whether the data are telling a different story now.</p>
<p>Unfortunately, this process usually looks like
copying and pasting tables into a Google Doc
or taking a screenshot from re:dash.
This works, but it's <strong>error prone and difficult to update</strong> as we get more data.</p>
<p>The other way this gets done is loading up a Jupyter notebook
and trying to reproduce the results yourself.
This is nice because the output is generally in a more useful format,
but this is clearly suboptimal.
I'm <strong>duplicating effort</strong> by re-implementing our experiment summary tools
and creating a second set of possibly <strong>inconsistent metrics</strong>.
It's important that these artifacts are consistent with the live tools.</p>
<p>We don't really have any tools that service this need at Mozilla.
In fact, I haven't heard about them anywhere.
This always seems to be done via a <strong>hodgepodge of custom scripts</strong>.</p>
<p>It would be ideal if we had a tool for gathering experiment results from our live tools.
For example, we could have one tool that:</p>
<ul>
<li>gathers experiment results from re:dash, testtube, etc</li>
<li>dumps those results into a local (markdown or HTML formatted) text file</li>
<li>helps a user generate a report with some standard scaffolding</li>
</ul>
<p>I've been calling this tool an "artifact generator"
but it probably needs a better name.</p>
<h2>Annotate and Explain</h2>
<p>Now we've gathered the important data into a single place.
We're not done yet; nobody will be able to make heads or tails of this report.
<strong>We need to add context</strong>.
What does this experiment represent?
What do these numbers mean?
Is this a big change or a small change? Do we just not know?
Is this surprising or common?
We should include answers to all these questions in the report,
as best we can.</p>
<p>This takes time and it takes revisions.
Our tools should support this.
For example,
it should be easy to update the tables generated by the artifact generator
without a lot of copy-pasting.
It should also be easy to make edits over the course of a week
(i.e. don't use a wiki).</p>
<p>The best tool I've seen in this area is <code>knitr</code>,
which supports <code>Rmd</code> report generation.
Jupyter is a prominent contender in this space,
but I usually run into significant issues with version control and collaboration.
<code>LaTeX</code> is a solid tool, but it's a real pain to learn.</p>
<h2>Get Review</h2>
<p>Before sharing a report every analyst should have the chance to get their work reviewed.
Getting review is a <strong>critical feature of any data science team</strong>.
In fact, this is so important that
I explicitly ask about review processes when interviewing with new companies.
Review is how I learn from my peers.
More so, review <strong>removes the large majority of the stress from my daily work</strong>.
I find my confidence in reviewed work is dramatically higher.</p>
<p>Again, this portion of the toolchain is fairly well supported.
Any code review tool will do a reasonably good job.
Filing a PR on GitHub is the canonical way I get review.</p>
<h2>Publish and Socialize</h2>
<p>Finally, I need to share my final report.
This should be simple, but I've found it to be difficult in practice.</p>
<p>There are as many options for publishing reports as there are stars in the sky.
Basically any content management system qualifies,
but few work well for this task.
I've seen companies use
wikis, public folders on a server, ftp, Google Docs, emailed .docx files, ...
All of these options make it <strong>difficult to get review</strong>.
Most of these options are a <strong>discoverability nightmare</strong>.</p>
<p>At Mozilla, we've been using AirBnB's
<a href="https://github.com/airbnb/knowledge-repo">knowledge-repo</a>
to generate <a href="http://reports.telemetry.mozilla.org/feed">RTMO</a>.
It does a reasonably good job,
but doesn't give the analyst enough control over the format of the final report.
I'm working on a replacement now,
called <a href="https://github.com/harterrt/docere">Docere</a>.</p>
<h1>Where to go next</h1>
<p>In summary, we already have pretty good tools for annotating reports and getting review.
I think we at Mozilla need to work on tools for
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1426163">generating experiment artifacts</a>
and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1436787">publishing reports</a>.
I think we need to continue working on tools for building intuition,
but we're already working on these tools and are on the right track.</p>
<p>This doesn't solve the whole problem.
For one,
<strong>we still need a process for making a decision</strong> from these decision reports.
Having a well reasoned argument is only part of the decision.
Who makes the final call?
How do we guarantee our decision making is consistent?
This process also ignores building a cohesive style for reports.
Having consistent structure is important.
It gives readers confidence in the results and reduces their cognitive load.</p>
<p>I think this is a good start though.</p>Asking Questions2018-02-09T00:00:00-08:002018-02-09T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2018-02-09:/preferred_media.html<p>Will posted a great article a couple weeks ago,
<a href="https://wlach.github.io/blog/2018/01/giving-and-receiving-help-at-mozilla/">Giving and Receiving Help at Mozilla</a>.
I have been meaning to write a similar article for a while now.
His post finally pushed me over the edge. </p>
<p>Be sure to read Will's post first.
The rest of this article is an …</p><p>Will posted a great article a couple weeks ago,
<a href="https://wlach.github.io/blog/2018/01/giving-and-receiving-help-at-mozilla/">Giving and Receiving Help at Mozilla</a>.
I have been meaning to write a similar article for a while now.
His post finally pushed me over the edge. </p>
<p>Be sure to read Will's post first.
The rest of this article is an addendum to his post.</p>
<h2>Avoid Context Free Pings</h2>
<p>Context free pings should be considered harmful.
These are pings like <code>ping</code> or <code>hey</code>.
The problem with context free pings are documented elsewhere
(<a href="http://edunham.net/2017/10/05/saying_ping.html">1</a>,
<a href="https://blogs.gnome.org/markmc/2014/02/20/naked-pings/">2</a>,
<a href="http://www.nohello.com/2013/01/please-dont-say-just-hello-in-chat.html">3</a>)
so I won't discuss them here.</p>
<h2>Pings are Ephemeral</h2>
<p>IRC and Slack are nice because they generate notifications.
If you need a quick response, IRC or Slack are the way to go.
I get Slack and IRC notifications on my phone, so I'm likely to respond quickly.
On the other hand, these notifications disappear easily,
which makes it easy for me to lose your message.
<strong>If you don't hear from me immediately, it's a good idea to send an email</strong>.</p>
<p>Otherwise, I don't mind pings at all.
Some folks worry about creating interruptions, but this isn't a problem for me.
I limit the notifications I get so <strong>if I don't want to get your notification, I won't</strong>.
If I'm looking at Slack, I'm already distracted.</p>
<p>In short, consider these rules of thumb:</p>
<ul>
<li>If it will take me <strong>less</strong> than 2m to respond to you and it's urgent, ping me</li>
<li>If it will take me <strong>more</strong> than 2m to respond to you and it's urgent, file a bug and ping me</li>
<li>If it's not urgent just email me</li>
</ul>
<h2>Prefer Open Channels</h2>
<p>I've spent a lot of time on documentation at Mozilla.
It's hard.
Our tools are constantly under development and our needs are always changing
so our documentation needs constant work.
<strong>Asking questions in the open reduces our documentation burden</strong>.</p>
<p><a href="http://www.bmannconsulting.com/archive/email-is-the-place-where-information-goes-to-die/">Email is where information goes to die</a>.
If we discuss a problem in a bug, that conversation is open and discoverable.
It's not always useful, but it's a huge win when it is.
<strong>File a bug instead of writing an email</strong>.
@mention me in #fx-metrics instead of PM-ing me.
CC an open mailing list if you need to use email.</p>Managing Someday-Maybe Projects with a CLI2018-01-03T00:00:00-08:002018-01-03T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2018-01-03:/sdmb.html<p>I have a problem managing projects I'm interested in but don't have time for.
For example, the <a href="/slack_alerts.html">CLI for generating slack alerts</a> I posted about last year.
Not really a priority, but helpful and not that complicated.
I sat on that project for about a year before I could finally …</p><p>I have a problem managing projects I'm interested in but don't have time for.
For example, the <a href="/slack_alerts.html">CLI for generating slack alerts</a> I posted about last year.
Not really a priority, but helpful and not that complicated.
I sat on that project for about a year before I could finally execute on it.</p>
<p>I want to be able to keep track of these projects for inspiration,
but <strong>my TODO list gets overwhelming</strong>
if I try to include all of these low-priority projects.
Getting Things Done suggests keeping a "Someday-Maybe (SDMB)" folder
that you review regularly.
I tried this, but even the SDMB list gets unwieldy, so I dread reviewing it.</p>
<p>I think I have a handle on it now, though<sup>1</sup>.
I started a directory at <code>~/sdmb</code>
with markdown files for each SDMB project.
This is nice for two reasons:</p>
<ol>
<li>It doesn't clog up your task list with un-actionable tasks</li>
<li>You can review a list of SDMB <em>projects</em>
without reviewing all of the associated <em>TODOs</em>.
The <strong>project list should be much shorter</strong> and
I can usually tell what's interesting by reviewing the project names.
I don't need to know the next action.</li>
</ol>
<p>Here's a bash snippet to make this feel natural.
It creates a new command <code>sdmb</code> that either
lists all projects in the SDMB folder
or opens a given SDMB project file (with auto-complete!).</p>
<p>I recommend reviewing the list of projects monthly.
If any projects look interesting,
review that project's notes and pull out a couple of TODOs.</p>
<p>Here's the snippet:</p>
<pre><code>dir="$HOME/somedaymaybe"

_list_sdmb_projects () {
    ls -1 $dir | cut -f 1 -d '.'
}

sdmb () {
    if [ $# -eq 0 ]
    then
        # If no argument provided, list available projects
        _list_sdmb_projects
    else
        # Edit the given project
        local id="$1"
        local file="$dir/$id.md"

        vim "$file"
    fi
}

# Bash auto-complete
_sdmbComplete()
{
    local cur=${COMP_WORDS[COMP_CWORD]}
    COMPREPLY=( $(compgen -W "$(_list_sdmb_projects)" -- $cur ))
}

complete -F _sdmbComplete sdmb
</code></pre>
<hr />
<p><sup>1</sup>: Thanks to Tom's great post <a href="https://cs-syd.eu/posts/2016-02-21-return-to-taskwarrior">here</a> for inspiration.</p>Removing Disqus2018-01-02T00:00:00-08:002018-01-02T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2018-01-02:/disqus.html<p>I'm removing Disqus from this blog.
Disqus allowed readers to post comments on articles.
I added it because it was easy to do,
but I no longer think it's worth keeping.</p>
<p>If you'd like to share your thoughts,
feel free to shoot me an email at <code>harterrt</code> on gmail.
I …</p><p>I'm removing Disqus from this blog.
Disqus allowed readers to post comments on articles.
I added it because it was easy to do,
but I no longer think it's worth keeping.</p>
<p>If you'd like to share your thoughts,
feel free to shoot me an email at <code>harterrt</code> on gmail.
I try to respond to all of my email daily.</p>
<h2>Cons</h2>
<p>Disqus started showing a red notification symbol at the bottom of every post.
The notification is just a distraction aimed at increasing engagement with the comments.
It's ugly, and I don't like the distraction it introduces to my posts.
This is my primary complaint.</p>
<p>Beyond that, there are just small annoyances.
E.g. I don't need another inbox to maintain
and I think the UI is a little ugly.</p>
<h2>Pros</h2>
<p>There aren't many.
I've only had one comment on this blog,
and I'm confident I would have gotten that feedback through other channels
had the comment system not been available.</p>Productivity Systems for Stress Management2018-01-02T00:00:00-08:002018-01-02T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2018-01-02:/productivity_systems.html<p>Over the years, I've developed a pretty involved productivity system.
It was originally based on <a href="https://www.amazon.com/Getting-Things-Done-Stress-Free-Productivity/">Getting Things Done</a>,
but now it's grown to include the good bits from other systems.
It's involved, but I love it.</p>
<p>I get a lot of comments,
especially on the little black book I keep …</p><p>Over the years, I've developed a pretty involved productivity system.
It was originally based on <a href="https://www.amazon.com/Getting-Things-Done-Stress-Free-Productivity/">Getting Things Done</a>,
but now it's grown to include the good bits from other systems.
It's involved, but I love it.</p>
<p>I get a lot of comments,
especially on the little black book I keep in my back pocket.
I hear people say they want to get organized so they can be more productive,
but I think that misses the mark.</p>
<p>Getting organized may make you more productive,
but the real benefit is that <strong>getting organized makes you less stressed</strong>.</p>
<p>The intro to "<a href="https://www.amazon.com/Getting-Things-Done-Stress-Free-Productivity/">Getting Things Done</a>" does a great job of explaining this.
The gist is that filling your consciousness with a list of things you have to do <strong>later</strong>
distracts from what you're doing <strong>now</strong>.
Irrelevant stuff keeps popping into your head and causing stress.</p>
<p>Instead of trying to remember all the stuff you need to do,
build a trusted system that will remember for you.
Then all you need to do is set up a few habits to remind you to look at your system.
<strong>Your brain is bad at remembering, but it's good at habits</strong>.</p>
<p>For the past few years, my main goal has been increasing how much I enjoy my work.
<strong>Cutting the stress out of my workday was a huge improvement to my work satisfaction</strong>.
If you're feeling stressed or burnt out,
I highly recommend looking at whether a productivity system would help.</p>CLI for alerts via Slack2017-12-08T00:00:00-08:002017-12-08T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2017-12-08:/slack_alerts.html<p>I finally got a chance to scratch an itch today.</p>
<h2>Problem</h2>
<p>When working with bigger ETL jobs,
I frequently run into jobs that take hours to run.
I usually either step away from the computer
or work on something less important while the job runs.
I <strong>don't have a good …</strong></p><p>I finally got a chance to scratch an itch today.</p>
<h2>Problem</h2>
<p>When working with bigger ETL jobs,
I frequently run into jobs that take hours to run.
I usually either step away from the computer
or work on something less important while the job runs.
I <strong>don't have a good way to get an alert when the job completes</strong>.
So instead of going back to my important work,
I keep toying with
<a href="http://news.ycombinator.com">whatever task I picked up</a> to fill the dead time.
I only get back to my primary task after I remember to check on it.</p>
<p>This is easier to fix when you're developing locally,
but I'm frequently developing jobs on EC2 instances via ATMO.
<strong>There's no good way to forward alerts to my local system</strong>.</p>
<p>Even then, I frequently step away from the computer to take a break while the job runs.
Sometimes the job stops after 10 minutes instead of its usual ~120.
That usually means I had a command line flag set wrong
or that I fat-fingered a file name.
It would be great to be able to
<strong>see this alert immediately, even if I'm not at my computer,</strong>
instead of waiting an hour until I check on my machine again.</p>
<p>The fix was crazy simple.
I created a little slack bot, installed a slack-cli, and added a bash command.
Now I can just issue a command like:
<code>sleep 10; slack Your task just completed.</code>
and in 10 seconds, I'll get a ping from <code>harterbot</code> on Slack.
Setting this up on a remote cluster would be trivially easy as well.
You just need to be comfortable storing a Slack API token there.
<h2>Action</h2>
<p>Here's how I did this:</p>
<ol>
<li><a href="https://my.slack.com/services/new/bot">Create a new bot</a>,
I called mine <code>harterbot</code>.
Save the API token for later.</li>
<li>Install slack-cli with <code>pip install slack-cli</code></li>
<li>Initialize your <code>slack-cli</code> installation by issuing a test command:
<code>slack-cli -d {{YOUR USERNAME}} "Test message"</code>.
This will ask for the API token from step 1.
You should see a new message from your bot.</li>
<li>(Optional) Add the following helper function to your <code>.bashrc</code>:</li>
</ol>
<pre><code># Ping me with an alert on Slack
slack () {
    slack-cli -d {{YOUR SLACK HANDLE}} -- "$*";
}
</code></pre>
<p>Boom, you should be good to go!</p>
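<p>One way to build on this (a sketch, not part of the setup above): wrap long-running commands so the Slack ping includes the exit status, which covers the "job died after 10 minutes" case too. The <code>slack</code> helper is stubbed with <code>echo</code> here so the sketch runs standalone:</p>

```shell
# Stub for the `slack` helper defined above so this sketch runs
# standalone; in real use you'd keep the slack-cli version.
slack () { echo "slack: $*"; }

# Run any command, then ping Slack with its exit status.
notify () {
    "$@"
    local status=$?
    if [ $status -eq 0 ]; then
        slack "Done: $* (exit 0)"
    else
        slack "FAILED: $* (exit $status)"
    fi
    return $status
}

notify true          # prints: slack: Done: true (exit 0)
notify false || true # prints: slack: FAILED: false (exit 1)
```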
<p>Now I'm thinking we can generate an ATMO bot with shared credentials,
so there's no need to provision each new machine with your own credentials.</p>
<p>For reference,
Slack's bot documentation is
<a href="https://api.slack.com/bot-users">here</a>.</p>Experiments are releases2017-12-07T00:00:00-08:002017-12-07T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2017-12-07:/experiments_are_releases.html<p><a href="https://github.com/mozilla/missioncontrol">Mission Control</a>
was a major 2017 initiative for the Firefox Data team.
The goal is to provide release managers with near-real-time
release-health metrics minutes after a release goes public.
Will has a
<a href="https://wlach.github.io/blog/2017/10/mission-control/">great write up here</a>
if you want to read more.</p>
<p>The key here is that the data has to be …</p><p><a href="https://github.com/mozilla/missioncontrol">Mission Control</a>
was a major 2017 initiative for the Firefox Data team.
The goal is to provide release managers with near-real-time
release-health metrics minutes after a release goes public.
Will has a
<a href="https://wlach.github.io/blog/2017/10/mission-control/">great write up here</a>
if you want to read more.</p>
<p>The key here is that the data has to be updated quickly.
We're trying to <strong>react</strong> to bad releases so we can roll back the change.
Once we've bought some time, we can step back and figure out what went wrong.
It's like pulling your hand away from a hot stove.</p>
<p>This is different from the data we talk about when talking about experiments.
With experiments, we <strong>purposely avoid looking at early data</strong> to avoid bias.
Users behave differently on Monday and Friday.
We don't want to base a decision solely on data from a holiday.
When we've gathered all of our data,
we carefully consider metric movements then make a decision.</p>
<p>Since these use cases are so different,
we developed our release tools (Mission Control)
separately from our experimentation tools.
We have the <a href="https://github.com/mozilla/missioncontrol">Experiments Viewer</a>
and the associated ETL jobs.
Now we're working on a new front-end called Test Tube.</p>
<p>However, after working with a few experiments,
I've found <strong>we need reactive metrics for experiments</strong> as well.
Currently, when we release an experiment
we don't get any feedback on whether the branches are behaving as expected.
The experiment could be crashing for unexpected reasons,
or the experiment branch could be identical to control (a null experiment) due to a bug.
Without these reactive metrics, it takes weeks to identify bugs.</p>
<p>The more I think about it,
the more it seems like experiments are actually a type of release.
I can't think of one release metric I wouldn't want to see for an experiment.
This makes me think we should expand our release tools to handle experiments as well.</p>
<p>This does not mean all of our decision metrics need to be real-time.
In fact, <strong>real time decision metrics are probably undesirable</strong>.
We want some top-level vital signs - e.g. crashes and usage hours.</p>
<p>When I first started thinking about this I proposed,
"all releases are a type of experiment".
I'm no longer sure this is true.
I think we <strong>could modify our releases to be experiments</strong>,
but our current release process doesn't look like an experiment to me.
For example, we could keep a control branch while we roll out a new release.
This would allow us to catch regressions to our decision metrics
(e.g. a drop in URI count).</p>
<p>Shoot me an email if you think I'm a crazy person or if you think I'm on to something.</p>Desirable features of experimentation tools2017-12-06T00:00:00-08:002017-12-06T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2017-12-06:/good_experiment_tools.html<h2>Introduction</h2>
<p>At Mozilla,
we're quickly climbing up our
<a href="https://cdn-images-1.medium.com/max/1600/1*7IMev5xslc9FLxr9hHhpFw.png">Data Science Hierarchy of Needs</a>
<sup>1</sup>.
I think the next big step for our data team
is to <strong>make experimentation feel natural</strong>.
There are a few components to this (e.g. training or culture)
but improving the <strong>tooling is going to be …</strong></p><h2>Introduction</h2>
<p>At Mozilla,
we're quickly climbing up our
<a href="https://cdn-images-1.medium.com/max/1600/1*7IMev5xslc9FLxr9hHhpFw.png">Data Science Hierarchy of Needs</a>
<sup>1</sup>.
I think the next big step for our data team
is to <strong>make experimentation feel natural</strong>.
There are a few components to this (e.g. training or culture)
but improving the <strong>tooling is going to be important</strong>.
Today, running an experiment is possible but it's not easy.</p>
<p>I want to spend a significant part of 2018 on this goal,
so you'll probably see a bunch of
<a href="/tag/experimentation.html">posts on experimentation</a>
soon.</p>
<p>This article is meant to be an overview of
a few principles I'd like to be reflected in our experimentation tools.
<strong>I stopped myself from writing more</strong> so I could get the article out.
Send me a ping or an email if you're interested in more detail
and I'll bump the priority.</p>
<h2>Decision Metrics</h2>
<p>An experiment is a <strong>tool to make decisions easier</strong>.</p>
<p>Sometimes, this isn't the way it works though.
It's easy to let data confuse the situation.
One way to avoid confusion is maintaining a <strong>curated set of decision metrics</strong>.
These metrics will not be the only data you review,
but they will give a high level understanding of how the experiment impacts the product.</p>
<p>Curating decision metrics:</p>
<ul>
<li>limits the number of metrics you need to review</li>
<li>reduces false positives and increases experimental power</li>
<li>provides impact measures that are consistent between experiments</li>
<li>clarifies what's important to leadership</li>
</ul>
<p>I plan on expanding this section into its own post.</p>
<!---
TODO: Post on curating decision metrics
Comment on the above bullets and how to use supplementary metrics.
E.g. maybe URIs is neutral, but your custom metric shows big changes. That's fine
-->
<h2>Interpretability</h2>
<p>We should <strong>value interpretability in our decision metrics</strong>.
This sounds obvious, but it's surprisingly hard to do.</p>
<p>When reviewing our results, we should <strong>always consider practical significance</strong>.
Patrick Riley explains this beautifully in
<a href="http://www.unofficialgoogledatascience.com/2016/10/practical-advice-for-analysis-of-large.html">Practical advice for analysis of large, complex data sets</a>
:</p>
<blockquote>
<p>With a large volume of data,
it can be tempting to focus solely on statistical significance
or to hone in on the details of every bit of data.
But you need to ask yourself,
“Even if it is true that value X is 0.1% more than value Y, does it matter?”</p>
<p>...</p>
<p>On the flip side, you sometimes have a small volume of data.
Many changes will not look statistically significant but that is different than claiming it is “neutral”.
You must ask yourself
“How likely is it that there is still a practically significant change”? </p>
</blockquote>
<p>One of the major problems with p-values
is that they do not report practical significance.
Also note that practical significance is difficult to assess
if our decision metrics are uninterpretable.</p>
<p>More on this coming soon.</p>
<!---
TODO: Post: We should probably step away from histograms for this reason.
-->
<h2>Decision Reports</h2>
<p>Experiment results should be <strong>easy to export to plain text</strong>.
This allows us to capture a snapshot from the experiment.
Data doesn't always age well,
so it's important to record what we were looking at when we made a decision.
This will make it easier for us to overturn a decision if the data changes.</p>
<p>For the foreseeable future,
experiment results will need review to be actionable.
Accordingly, we should include our
<strong>interpretation with the experiment results</strong>.
This is another advantage of exporting results in plain text;
plain text is easy to annotate.</p>
<p>There will always be context not captured by the experiment.
It's important that we
<strong>capture all of the reasoning behind a decision in one place</strong>.
The final result of an experiment should be a <strong>Decision Report</strong>.
The Decision Report should be immutable,
though we may want to be able to append notes.
Decision reports may summarize more than one experiment.</p>
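<p>As a rough sketch of what such a report could look like (the file names, metric, and wording below are all illustrative, not output from any real experiment tool):</p>

```shell
# Sketch: snapshot exported results plus written interpretation into
# one plain-text decision report. All names and numbers here are
# illustrative.
echo "branch vs. control: URI count +0.2%" > results.txt

report="decision-report-example-experiment.md"
{
    echo "# Decision Report: example-experiment"
    echo
    echo "## Results snapshot"
    cat results.txt
    echo
    echo "## Interpretation"
    echo "No practically significant movement; keep the branch."
} > "$report"

chmod a-w "$report"   # discourage edits: the snapshot should be immutable
cat "$report"
```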
<!---
TODO: post Experimental decisions should be consistent
We need to look at a consistent set of metrics.
E.g. the launch/unlaunch loop.
Not included here because it's more of a culture thing
when looked at as an addition to these changes.
-->
<hr />
<p><sup>1</sup> Source: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007</p>Submission Date vs Activity Date2017-12-04T00:00:00-08:002017-12-04T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2017-12-04:/dates.html<p>My comments on
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">Bug 1422892</a>
started to get long,
so I started untangling my thoughts here.</p>
<hr />
<p>From
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">the bug</a>:</p>
<blockquote>
<p>We experimented with using <code>activity_date</code> instead of <code>submission_date</code>
when developing the <code>clients_daily</code> etl job.
We should summarize our findings and decide on
which of these measures we'd like to standardize against …</p></blockquote><p>My comments on
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">Bug 1422892</a>
started to get long,
so I started untangling my thoughts here.</p>
<hr />
<p>From
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">the bug</a>:</p>
<blockquote>
<p>We experimented with using <code>activity_date</code> instead of <code>submission_date</code>
when developing the <code>clients_daily</code> etl job.
We should summarize our findings and decide on
which of these measures we'd like to standardize against in the future. </p>
</blockquote>
<h2>Summary of the problem</h2>
<p><code>activity_date</code> is generally preferable to <code>submission_date</code>
because it's closer to what we actually want to measure.
There's a delay between user activity and us receiving the data.
:chutten has some
great analysis <a href="https://chuttenblog.wordpress.com/2017/02/09/data-science-is-hard-client-delays-for-crash-pings/">[1]</a>
on the empirical difference between submission and activity dates,
if you want to read more.
95% of pings are received within two days of the actual activity
<a href="https://chuttenblog.wordpress.com/2017/09/12/two-days-or-how-long-until-the-data-is-in/">[2]</a>,
but that means using
<strong><code>submission_date</code> "smears" data between today and yesterday</strong> (mostly).</p>
<p>However, <strong><code>submission_date</code> is much easier to work with computationally</strong>.
When we partition by <code>submission_date</code>,
most jobs only need to process one day of data at a time.
This makes it much easier to continuously update datasets and backfill missing data.</p>
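<p>A toy illustration of why this matters (the job is stubbed with <code>echo</code> and the dates are made up): with <code>submission_date</code> partitioning, backfilling a missing day means re-running just that day's partition.</p>

```shell
# Stub standing in for the real day-partitioned ETL job; the name
# and dates are illustrative.
process_partition () { echo "rebuilt partition submission_date=$1"; }

# Backfilling two missing days touches only those two partitions.
# Keyed by activity_date, late-arriving pings could land in any past
# day, forcing a full regeneration instead.
for day in 2017-12-01 2017-12-02; do
    process_partition "$day"
done
```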
<p><code>clients_daily</code> is currently limited to 6 months of historical data
because the <strong>entire dataset needs to be regenerated every day</strong>.
This is inconvenient and causes real limitations when using the dataset [3].
The job takes between 90 and 120 minutes to run and currently finishes near 9:00 UTC.
Adding more data to this job will push that completion time back,
meaning the data will be unavailable for the first few working hours every day.
Eew.</p>
<h2>Solutions</h2>
<p>I see three possible options:</p>
<ol>
<li>Standardize to <code>submission_date</code></li>
<li>Standardize to <code>activity_date</code> and try to mitigate the performance losses</li>
<li>Allow both, but provide guidance for when to use each configuration</li>
</ol>
<p>So far, the data engineering team has strongly recommended using <code>submission_date</code>.
The difference between <code>submission_date</code> and <code>activity_date</code>
has become even smaller with our team's work on ping sender
<a href="https://chuttenblog.wordpress.com/2017/07/12/latency-improvements-or-yet-another-satisfying-graph/">[4]</a>.
Without a strong counter argument, I recommend continuing with <code>submission_date</code>.</p>
<p>If we do have a strong reason to continue keying datasets by <code>activity_date</code>,
I recommend only using <code>activity_date</code> on "small" datasets.
These are datasets built over a sample of our data,
built over a rarer type of ping (e.g. not main pings),
or heavily aggregated (e.g. to country-day).
Someone should add documentation to <a href="https://docs.telemetry.mozilla.com">docs.tmo</a>
explaining when <code>activity_date</code> is [un]necessary.</p>
<hr />
<ol>
<li>https://chuttenblog.wordpress.com/2017/02/09/data-science-is-hard-client-delays-for-crash-pings/</li>
<li>https://chuttenblog.wordpress.com/2017/09/12/two-days-or-how-long-until-the-data-is-in/</li>
<li>https://bugzilla.mozilla.org/show_bug.cgi?id=1414044</li>
<li>https://chuttenblog.wordpress.com/2017/07/12/latency-improvements-or-yet-another-satisfying-graph/</li>
</ol>OKRs and 4DX2017-11-30T00:00:00-08:002017-11-30T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2017-11-30:/okrs_and_4dx.html<p>I feel like I'm swimming in acronyms these days.</p>
<p>Earlier this year,
my team started using Objectives and Key Results (OKRs) for our planning.
It's been a learning process.
I had some prior experience with OKRs at Google,
but I've never felt like I was fully taking advantage of the …</p><p>I feel like I'm swimming in acronyms these days.</p>
<p>Earlier this year,
my team started using Objectives and Key Results (OKRs) for our planning.
It's been a learning process.
I had some prior experience with OKRs at Google,
but I've never felt like I was fully taking advantage of the tool.</p>
<p>I just recently started digging through
<a href="https://www.amazon.com/Disciplines-Execution-Achieving-Wildly-Important/dp/1491517751">The 4 Disciplines of Execution</a>
(4DX)<sup>1</sup>
and, surprisingly, OKRs are starting to make a lot more sense.
This post outlines some ideas I've picked up through my reading.</p>
<h2>Too many goals</h2>
<p>For the last few quarters, my team has had 4-5 Objectives.
That's a little high, but it's within the recommended limits.
I usually have some work to do on each of these OKRs every week.
Some weeks I have a hard time prioritizing which objective I should work on.
Do I work on experimentation or search?
<code>¯\_(ツ)_/¯</code></p>
<p>When we set OKRs,
it feels like we're scoping out what work we can get done in the next quarter.
That leads to an OKR process that goes something like this:</p>
<ul>
<li>List out all the project work we could do,
order by importance,
and <strong>pack the quarter/year</strong> until it's full.</li>
<li>Group our project work into <strong>3-5 major themes</strong>.</li>
<li><strong>Explain why</strong> we're doing each class of project (Objectives)</li>
<li><strong>Develop metrics</strong> to describe "success" and set Key Results</li>
</ul>
<p>It's a useful exercise.
We can clearly communicate what we are and aren't working on
and why certain projects were deprioritized.
I like this process a lot and I think we should keep it,
but I don't think it harnesses the true value of OKRs.</p>
<p>Specifically, I think it
<strong>encourages us to set goals for projects that don't need them</strong>.
For example, the last two quarters
I've set OKRs for giving quick responses to client teams.
In reality, I'm already responding quickly.
In no world am I going to start ignoring questions because it's not in my OKRs.
It's an obvious priority.
This OKR isn't a good goal anymore; it's a placeholder for a time commitment.</p>
<h2>The Fix</h2>
<p>Instead, consider this process:
<strong>Assume nothing changes in the next quarter</strong>.
We keep executing on our day-to-day tasks just like we have in the past.
We answer questions, fix bugs, improve our tools.
All of it.</p>
<p>Now, what <strong>one thing could we change</strong> to have the biggest marginal impact on the business?
That's our new objective.</p>
<p>This is totally different from before.
We're <strong>not scoping out work</strong> for the next quarter.
We're identifying the <strong>one improvement we're going to protect</strong>
from the whirlwind of our daily work.
That means your single OKR
<strong>does not need to encompass all of the work you're going to do in a quarter</strong>.</p>
<p>In 4DX, they call this objective a <strong>Wildly Important Goal</strong> (WIG).</p>
<h2>The Benefits</h2>
<p>Emergencies flare up every now and then;
it happens.
But, I hate spending a week to put out a fire
just to realize I didn't make any progress on my OKRs.
I call these <em>Zero Weeks</em>.
In my experience, every week is crazy in its own unique way,
but it's <strong>usually easy to sneak in an hour of work for a long-term priority</strong>.
On the other hand,
it's not easy to sneak in an hour of work for <strong>four</strong> long-term priorities.
<strong>Focused objectives cut back on <em>Zero Weeks</em></strong>.</p>
<p>The most obvious benefit of focusing our goals
is being able to <strong>build momentum behind important projects</strong>.
Sometimes, our projects lose steam near the finish line;
the tool becomes "good enough" for day-to-day use or a stakeholder loses interest.
Maybe the moment of urgency has passed.
In any case, it feels like the project is drifting to completion.
If we focus our team on one project, we'll be <strong>able to execute faster</strong>.
This means:</p>
<ul>
<li>Stakeholders are less likely to lose interest</li>
<li>We'll have fewer <em>Zero Weeks</em> so we'll be able to <strong>maintain context</strong></li>
<li>We'll <strong>stay motivated</strong> because the problem will be fresh in our minds
(Sometimes it's hard to remember why we're even working on a project)</li>
<li>We'll stay on task and notice drift more quickly (because we'll have more eyes)</li>
</ul>
<p>Our team has a lot of projects going on at the same time
and we're distributed around the world.
It's easy to feel disconnected from a teammate if you don't work on the same projects.
<strong>Working towards the same goal will make us feel more connected</strong> -
even if someone's only contributing an hour or two that week.</p>
<h2>But what about all the other work?</h2>
<p>Remember, your single OKR
<strong>does not need to encompass all of the work you're going to do in a quarter</strong>.</p>
<p>In reality, we have dozens of responsibilities we have to execute on every day:
code reviews, answering questions, meetings, interviews, actually coding...
Setting a single wildly important goal
can <strong>feel like you're ignoring all of the other important work</strong> that needs to get done.
I get that, and I'm still a little suspicious of this methodology for that reason.</p>
<p>However, I think I have a work-around for this.
We should continue to end our quarters
by prioritizing and packing the next quarter's work.
That work should be called our <strong>"Deliverables" not our OKRs</strong>.</p>
<p>We should <strong>expect to get our deliverables done</strong> every quarter
(not 70% done, as recommended for OKRs).
I think this is a much more useful and interesting metric for our partner teams.
We have teams that depend on our work.
I don't want them to have to
<strong>guess at which 30% of our goals isn't going to get done</strong>.</p>
<p>Of course,
this isn't great because now we have two rounds of planning and reporting.
It sounds like more busy work and more reporting.
But, I think it's <strong>actually less work than what we're doing now</strong>.
Compare the two workflows.</p>
<p>Currently we:</p>
<ul>
<li>List all possible projects, order by priority, and pack the next quarter</li>
<li>Group our project work into <strong>3-5 major themes</strong>.</li>
<li>Set objectives for <strong>each of these major themes</strong> (3-5 objectives)</li>
<li>Develop metrics and key results for <strong>each of these objectives</strong></li>
</ul>
<p>What I'm suggesting we do:</p>
<ul>
<li>List all possible projects, order by priority, and pack the next quarter.
Call these our <strong>"deliverables"</strong>.</li>
<li>Step back and identify <strong>one wildly important objective</strong> we're going to focus on</li>
<li>Set key results and metrics for that <strong>one objective</strong></li>
</ul>
<p>Instead of setting 3-5 objectives and tens of key results,
we're <strong>only setting one objective with a few key results</strong>.
Also, this <strong>makes workday deliverables useful</strong>.
If we're still required to add deliverables every quarter,
we may as well get some use from them.</p>
<p>What do you think?
Am I missing something?</p>
<hr />
<p><sup>1</sup>
I first heard about this book in Cal Newport's
<a href="https://www.amazon.com/Deep-Work-Focused-Success-Distracted/dp/1455586692">Deep Work</a>
which I also recommend.</p>
<p><em>Thanks to :mreid for his review and comments.</em></p>Evaluating New Tools2017-10-26T00:00:00-07:002017-10-26T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2017-10-26:/new_tools.html<p>At Mozilla, we're still relatively early in our data science journey.
As such, we're always evaluating new tools to improve our analysis workflow
(<a href="http://jupyter.org/">jupyter</a> vs. <a href="http://rmarkdown.rstudio.com/">Rmd</a>),
or make our infrastructure more usable
(our home-rolled <a href="https://github.com/mozilla/telemetry-analysis-service">ATMO</a>
vs. <a href="https://databricks.com/">databricks</a>),
or scale our knowledge
(<a href="https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091">knowledge-repo</a>
vs. <a href="https://www.gitbook.com/">gitbook</a>)</p>
<p>Most of these tools look like …</p><p>At Mozilla, we're still relatively early in our data science journey.
As such, we're always evaluating new tools to improve our analysis workflow
(<a href="http://jupyter.org/">jupyter</a> vs. <a href="http://rmarkdown.rstudio.com/">Rmd</a>),
or make our infrastructure more usable
(our home-rolled <a href="https://github.com/mozilla/telemetry-analysis-service">ATMO</a>
vs. <a href="https://databricks.com/">databricks</a>),
or scale our knowledge
(<a href="https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091">knowledge-repo</a>
vs. <a href="https://www.gitbook.com/">gitbook</a>)</p>
<p>Most of these tools look like they have compelling wins over our existing solutions.
But when we build a demo,
our users ignore some tools and rave about others.
Why?
I think it's because some of <strong>the costs of adopting a new tool are subtle</strong>.</p>
<p>Unless your new tool is a perfect match for the problem at hand (very rare),
I need to spend time learning, coding, or configuring the tool to work for me.
At the same time,
I have <strong>work due today</strong> and an existing set of tools that are good enough.</p>
<p>What follows are some thoughts I have when deciding whether to adopt a new tool.
Maybe they will help you (or future me) <strong>debug problems with adoption</strong>.</p>
<h2>What am I taking home?</h2>
<p>If your new tool is internal-only, uncommon in the industry, or expensive
I'm going to be less likely to adopt it.</p>
<p>In this case, anything I learn while adopting your tool
is <strong>unlikely</strong> to be <strong>valuable to future employers</strong>.
I think of my
<a href="https://esimoney.com/two-huge-reasons-why-your-career-matters/">career as an asset</a>,
so if I get to do work that builds <strong>transferable skills</strong>,
I count that as <strong>part of my compensation</strong>.
On the other hand,
if I'm writing glue scripts to deal with idiosyncrasies in an internal tool,
I'm missing out.</p>
<p>I think this is a major reason
<strong>why large tech companies open source internal technologies</strong>.
Consider
<a href="https://code.facebook.com/projects/552007124892407/presto/">prestodb</a>
or <a href="https://golang.org/">golang</a>.
How much would it suck to spend time learning these tools
if they were internal-only?
When you leave the company all of that skill becomes useless.
By open-sourcing these technologies,
you've just <strong>increased your employee compensation without spending a dollar</strong>.</p>
<h2>How long will I have access?</h2>
<p>If your tool is closed source or expensive,
I'm going to hesitate before spending any time with it.
I depend on my tools and it hurts to lose them.</p>
<p>This is why I prefer Python or R to MATLAB.
I can use my experience with Python or R to build side projects
that scratch my own itch.
MATLAB is expensive, so I don't have that benefit.</p>
<h2>How long will it be relevant?</h2>
<p>Even if the tool is open source,
I want it to be configurable and composable.
This ensures it can grow with me.
I have no idea what the tech landscape will look like in 10 years,
but I do know it will be different.
<strong>I want your tool to play nicely with technology that doesn't exist yet</strong>.</p>
<p>Even better, if your tool is configurable and composable
it is probably going to take me much less time to get comfortable with it.</p>
<p>A lack of composability is one of my bigger complaints about Jupyter.
Jupyter is a great tool for exploratory analysis,
but I don't want to use your GUI for editing code.
I'm much happier when I get to use my own tool chain.</p>
<p>However, Jupyter's saving grace is that it's configurable.
I'm working on a tool that will make it easy to develop
python packages and Jupyter notebooks side-by-side.
Hopefully, this will give us the best of both worlds.</p>
<h2>Conclusion</h2>
<p>All this to say,
I'm going to carefully gauge the lifetime value of any new tool I adopt.
If your users are ignoring a new tool you've created,
<strong>look carefully for hidden restrictions to lifetime value</strong>.</p>
<p>On the other hand,
if your tool solves a critical enough problem,
I'll stand barefoot in the snow to use it.</p>
<p>Does this all make any sense?
Am I missing something important?
Why do you roll your eyes when someone tries to sell you a new tool?</p>Documentation Style Guide2017-08-24T00:00:00-07:002017-08-24T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2017-08-24:/docs-style-guide.html<p>I just wrote up a style guide for our
<a href="https://docs.telemetry.mozilla.org">team's documentation</a>.
The documentation is rendered using Gitbook and hosted on Github Pages.
You can find the
<a href="https://github.com/mozilla/firefox-data-docs/pull/41">PR here</a>
but I figured it's worth sharing here as well.</p>
<h2>Style Guide</h2>
<p>Articles should be written in
<a href="https://daringfireball.net/projects/markdown/syntax">Markdown</a>
(not <a href="http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/">AsciiDoc</a>).
Markdown is usually …</p><p>I just wrote up a style guide for our
<a href="https://docs.telemetry.mozilla.org">team's documentation</a>.
The documentation is rendered using Gitbook and hosted on Github Pages.
You can find the
<a href="https://github.com/mozilla/firefox-data-docs/pull/41">PR here</a>
but I figured it's worth sharing here as well.</p>
<h2>Style Guide</h2>
<p>Articles should be written in
<a href="https://daringfireball.net/projects/markdown/syntax">Markdown</a>
(not <a href="http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/">AsciiDoc</a>).
Markdown is usually powerful enough and is a more common technology than AsciiDoc.</p>
<p>Limit lines to <strong>100 characters</strong> where possible.
Try to split lines at the end of sentences.
This makes it easier to reorganize your thoughts later.</p>
<p>This documentation is meant to be read digitally.
Keep in mind that people read digital content much differently than other media.
Specifically, readers are going to skim your writing,
so make it easy to identify important information.</p>
<p>Use <strong>visual markup</strong> like <strong>bold text</strong>, <code>code blocks</code>, and section headers.
Avoid long paragraphs.
Short paragraphs that describe one concept each make finding important information easier.</p>
<p>Please squash your changes into meaningful commits and follow these
<a href="https://chris.beams.io/posts/git-commit/">commit message guidelines</a>.</p>Beer and Probes2017-08-23T00:00:00-07:002017-08-23T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2017-08-23:/probes.html<p>Quick post to clear up some terminology.
But first, an analogy to clear up my thinking:</p>
<h2>Analogy</h2>
<p>Temperature control is a big part of brewing beer.
Throughout the brewing process I use a thermometer
to measure the temperature of the soon-to-be beer.
Because I take several temperature readings throughout the …</p><p>Quick post to clear up some terminology.
But first, an analogy to clear up my thinking:</p>
<h2>Analogy</h2>
<p>Temperature control is a big part of brewing beer.
Throughout the brewing process I use a thermometer
to measure the temperature of the soon-to-be beer.
Because I take several temperature readings throughout the brewing process,
one brew will result in a list of a half dozen temperature readings.
For example, I take a mash temperature,
then a sparge temperature,
then a fermentation temperature.
The units on these measurements are always in Fahrenheit,
but their interpretation is different.</p>
<h2>The Rub</h2>
<p>In this example, I would call the thermometer a "probe".
The set of all temperature readings share a "data type".
Each temperature reading is a "measurement" which is stored in a given "field".</p>
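<p>As a toy sketch of how those terms map onto data (the field names and values here are hypothetical, just extending the brewing analogy):</p>

```python
# Toy illustration of the terminology, using the brewing analogy.
# The thermometer is the "probe": the thing that collects measurements.
# Each key below is a "field"; each value is a single "measurement".
brew_readings = {
    "mash_temp_f": 152,
    "sparge_temp_f": 168,
    "fermentation_temp_f": 68,
}

# All of the measurements share one "data type": degrees Fahrenheit.
assert all(isinstance(temp, int) for temp in brew_readings.values())
print(len(brew_readings))  # one brew yields several measurements
```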
<p>At the SFO workweek I uncovered some terminology I found confusing.
Specifically, we use the word "probe" to refer to data we collect.
I haven't encountered this usage outside of Mozilla.</p>
<p>Instead, I'd suggest we call histograms and scalars "data types".
A "probe" is a unit of client-side code that collects a measurement for us.
A single "field" could be a column in one of our datasets (like <code>normalized_channel</code>).
A measurement would be a value from a single field from a single ping (like the string "release").</p>Bad Tools are Insidious2017-06-15T00:00:00-07:002017-06-15T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2017-06-15:/bad-tools.html<p>This is my first job making data tools that other people use.
In the past, I've always been a data scientist -
a consumer of these tools.
I'm learning a lot.</p>
<p>Last quarter, I learned that bad tools are often hard to spot even when they're damaging productivity.
I sum this …</p><p>This is my first job making data tools that other people use.
In the past, I've always been a data scientist -
a consumer of these tools.
I'm learning a lot.</p>
<p>Last quarter, I learned that bad tools are often hard to spot even when they're damaging productivity.
I sum this up by saying that <strong>bad tools are insidious</strong>.
This may be <a href="https://sivers.org/obvious">obvious to you</a> but I'm excited by the insight.</p>
<h2>Bad tools are hard to spot</h2>
<p>I spent some time working directly with analysts building ETL jobs.
I found some big usability gaps with our tools
and I was surprised I wasn't hearing about these problems from our analysts.</p>
<p>I looked back to previous jobs where I was on the other side of this equation.
I remember being totally engrossed in a problem and excited to find a solution.
All I wanted were tools good enough to get the job done.
I didn't care to reflect on how I could make the process smoother.
I wanted to explore and iterate.</p>
<p>When I dug into analyses this quarter, I had a different perspective.
I was working with the intention of improving our tools
and the analysis was secondary.
It was much easier to find workflow improvements this way.</p>
<p>In the <a href="https://en.wikipedia.org/wiki/The_Design_of_Everyday_Things">Design of Everyday Things</a>
Don Norman notes that users tend to blame themselves when they have difficulty with tools.
That's probably part of the issue here as well.</p>
<h2>Bad tools hurt</h2>
<p>If our users aren't complaining, is it really a problem that needs to get fixed?
I think so.
We all understand that bad tools hurt our productivity.
However, I think we tend to underestimate the value of good tools when we do our mental accounting.</p>
<p>Say I'm working on a new ETL job that takes ~5 minutes to test by hand
but ~1 minute to test programmatically.
By default, I'd value implementing good tests at 4 minutes per test run.</p>
<p>This is a huge underestimate!
Testing by hand introduces a context shift, another chance to get distracted,
and another chance to fall out of flow.
I'll bet a 5 minute distraction can easily end up costing me 20 minutes of productivity on a good day.</p>
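<p>To make that accounting concrete, here's the same estimate as a sketch (all of the numbers are the rough figures from this post, not measurements):</p>

```python
# Naive accounting: automating the test saves only the run-time difference.
manual_minutes = 5       # testing the ETL job by hand
automated_minutes = 1    # running the programmatic test
naive_saving = manual_minutes - automated_minutes  # 4 minutes per run

# But a manual run is also a context switch. If a 5-minute distraction
# costs ~20 minutes of productivity, the real saving is much larger.
distraction_minutes = 20
real_saving = naive_saving + distraction_minutes  # ~24 minutes per run

print(naive_saving, real_saving)  # prints: 4 24
```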
<p>Your tools should be a joy to use.
The better they work, the easier it is to stay in flow, be creative, and stay excited.</p>
<h2>In Summary</h2>
<p>Don't expect your users to tell you how to improve your tools.
You're probably going to need to
<a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">eat your own dogfood</a>.</p>Literature Review: Writing Great Documentation2017-02-03T00:00:00-08:002017-02-03T00:00:00-08:00Ryan Hartertag:blog.harterrt.com,2017-02-03:/lit-review.html<p>I'm working on a big overhaul of my team's documentation.
I've noticed writing documentation is a difficult thing to get right.
I haven't seen any great example for a data product, either.
I don't have much experience in this area,
so I decided to review what's already been written about …</p><p>I'm working on a big overhaul of my team's documentation.
I've noticed writing documentation is a difficult thing to get right.
I haven't seen any great example for a data product, either.
I don't have much experience in this area,
so I decided to review what's already been written about creating great documentation.
This is a summary of what I've found,
both for my own reference and to help others understand my thought process.</p>
<h2>Findings</h2>
<p>I should note, all the literature I could find focused on documenting software products.
I am willing to bet that a data product is going to have different documentation needs than most software products.
But, this is as good a place to start as any.</p>
<h3>Structure & What to Write</h3>
<p>Most seem to agree that a <strong>README</strong> is a critical piece of documentation.
The README is usually comprised of two key parts:</p>
<ul>
<li>A quick introduction explaining what this project is, why the reader should
care, and whether it's worth investing time to understand it better.</li>
<li>A simple tutorial to get the reader started and give a feel for what the tool
actually does.</li>
</ul>
<p>If the reader decides they want to learn more,
there should be a set of <strong>topical guides or tutorials</strong> which comprise the bulk of the documentation.
Think of each of these guides as a class focused on teaching your student (reader) a single skill.
Reading all of these guides should take "someone who has never seen your product and make them an expert user". (<a href="http://stevelosh.com/blog/2013/09/teach-dont-tell/">TDT</a>)
With that in mind, make sure there's some sense of order to these lessons (easy to hard).</p>
<p>If your reader gets this far, they are now very comfortable with your product.
From here, they need high-quality <strong>reference material</strong>.
In my experience, this is the most common documentation provided,
but it is needed last in the process and only by the most advanced users!</p>
<p>When I started this research,
I was having a hard time figuring out how we were going to separate our
prose documentation from our development notes.
Now I see that these are just different stages in this learning process.
First we explain what it is, then how to use it, and finally, how to extend it.</p>
<h3>Style</h3>
<p>Most articles suggest adopting a style guide to make it easier for a user to read your documentation.
The writing should pull you through the document and feel natural.</p>
<p>If you want your documentation to read naturally, you should try to become a better writer.
This comes as cold comfort to most folks, who need their documentation now
and can't wait 10,000 hours to become expert writers, but it's worth mentioning.
The overwhelming consensus is that the best way to become a better writer is to <strong>write a lot</strong>.
If you want to write great documentation, consider building habits that will make you a great writer.</p>
<p>As with programming, maintaining a consistent style will help readers understand your documentation naturally.
Note, the important word here is "consistent".
<strong>Choose a style and stick with it</strong>.
This sounds obvious, but I rarely find corporate documentation with consistent style across tutorials.
Have a style guide and enforce it.</p>
<p>As you choose your style guide, be aware that most of the advice is focused on physical media.
Your documentation is probably going to be read digitally,
so your readers will have different expectations.
Specifically, readers are going to skim your writing, so make it easy to identify important information.</p>
<p>Use <strong>visual markup</strong> like bold text, code blocks, call-outs
(e.g. [<a href="http://www.methods.co.nz/asciidoc/chunked/ch16.html#X22">1</a>],
[<a href="http://getbootstrap.com/components/#alerts">2</a>]), and section headers.
Similarly, avoid long paragraphs.
Short paragraphs that describe one concept each make finding important information easier.</p>
<p>Most guides suggest keeping a <strong>conversational tone</strong>.
This makes the guide more approachable and easier to read.</p>
<p>Everyone seems to agree that <strong>you should have an editor</strong>.
In fact, Jacob Kaplan-Moss dedicated an entire article to this point [<a href="https://jacobian.org/writing/editors/">YNAE</a>].
If you don't have access to an editor,
review your own work thrice then ask for someone else's review before publishing.
Try adjusting your margins to force the text to re-flow.
It's a very effective way to catch spelling or grammatical mistakes.</p>
<h3>Tools</h3>
<p>I'll start this section with a warning.
Tools often receive an undue amount of attention, especially from programmers.
With documentation, <strong>writing is the hard, important work</strong>.
It's important to use good tools, but make sure you're not
<a href="https://en.wikipedia.org/wiki/Law_of_triviality">bike shedding</a>.</p>
<p>Your documentation should be stored in <strong>plain text and in version control</strong>.
Most of your documentation is going to be written by programmers,
and programmers have powerful tools for manipulating text.
Using anything besides plain text is a frustration that makes it less
likely they'll enjoy writing documentation.</p>
<!--
// TODO: This should be expanded upon. Version control is hugely useful for
// figuring out who to contact if you have questions, identifying the health
// of the documentation, and attributing credit for the hard, thankless work
// of writing the documentation. Wiki's do a particularly horrible job of all
// of these things.
-->
<p>You should have a <strong>process for reviewing changes</strong> to the documentation.
Review will help maintain a consistent voice across your documentation
and will provide useful feedback to the writer.
Think of how useful code reviews are for improving your programming.
I'd jump at the chance to get feedback from an expert writer.</p>
<p>You <strong>should not use a wiki</strong> for documentation.
Wikis make documentation "everyone's responsibility",
which really means it's nobody's responsibility.
Without this responsibility, wikis tend to decay into a web of assorted links without any sense of order or importance.
Wikis make it impossible to maintain a consistent voice throughout your documentation.
Finally, it's difficult to get review for your work before publishing.</p>
<p>Recognize that automatically-generated documentation isn't a replacement for hand-crafted prose.
Remember that the bulk of your documentation should be tutorials meant to slowly ramp up your users to expert status.
Docstrings have very little utility in this process.</p>
<h2>Resources</h2>
<p>Most of what I've summarized here came from very few sources.
I highly recommend you read the following articles if you're interested in learning more:</p>
<ul>
<li><a href="http://stevelosh.com/blog/2013/09/teach-dont-tell/">Teach, Don't Tell (Steve Losh)</a></li>
<li><a href="https://jacobian.org/writing/what-to-write/">What to Write (Jacob Kaplan-Moss)</a></li>
<li><a href="https://jacobian.org/writing/technical-style/">Technical Style (Jacob Kaplan-Moss)</a></li>
<li><a href="https://jacobian.org/writing/editors/">You Need an Editor (Jacob Kaplan-Moss)</a></li>
</ul>
<p>For later reference, I also reviewed these articles to form opinions about
general consensus outside of the primary sources above:</p>
<ul>
<li><a href="http://www.americanscientist.org/issues/id.877,y.0,no.,content.true,page.1,css.print/issue.aspx">The Science of Scientific Writing</a>
(George Gopen, Judith Swan): Good overview of how to structure a paper so
readers find information where they expect it to be</li>
<li><a href="http://www.writethedocs.org/">WriteTheDocs.org</a>, specifically
<a href="http://www.writethedocs.org/guide/writing/beginners-guide-to-docs/">A Beginner's Guide to Writing Docs</a></li>
<li><a href="https://github.com/noffle/art-of-readme">Art of README</a>: An argument for
writing good READMEs and a template to help you get started</li>
<li><a href="https://groups.google.com/forum/#!topic/scala-internals/r2GnzCFc3TY">Scala Documentation Discussion</a>
A discussion of why Scala's official documentation is so bad</li>
<li><a href="http://r-pkgs.had.co.nz/vignettes.html">Vignettes</a> (Hadley Wickham): Hadley
is a rockstar in the R universe. This is an article from his style guide for
writing R package documentation. This is the closest I could come to finding
documentation advice for data products.</li>
<li><a href="http://steve-yegge.blogspot.com/2008/09/programmings-dirtiest-little-secret.html">Programming's Dirtiest Little Secret</a>
(Steve Yegge): Steve Yegge on why it's important to type well</li>
<li><a href="https://byrslf.co/writing-great-documentation-44d90367115a#.nenvaqeng">Writing Great Documentation</a>:
This article comments on documentation's propensity towards kippleization.</li>
<li><a href="https://www.gnu.org/prep/standards/standards.html#GNU-Manuals">GNU Manual Style Guide</a></li>
</ul>Is moving to the Bay Area worth it?2016-12-14T00:00:00-08:002016-12-14T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2016-12-14:/is-moving-to-the-bay-area-worth-it.html<p>I came across <a href="http://blog.triplebyte.com/does-it-make-sense-for-programmers-to-move-to-the-bay-area">this article</a> on the front page of Hacker News yesterday.
The author argues that Bay Area housing prices may be high, but the salary increase probably makes it worthwhile.
The author pulls together some interesting data to make their point,
but I have major <strong>issues with …</strong></p><p>I came across <a href="http://blog.triplebyte.com/does-it-make-sense-for-programmers-to-move-to-the-bay-area">this article</a> on the front page of Hacker News yesterday.
The author argues that Bay Area housing prices may be high, but the salary increase probably makes it worthwhile.
The author pulls together some interesting data to make their point,
but I have major <strong>issues with the analysis</strong>.
In fact, the data seem to be <strong>showing the opposite trend</strong>.</p>
<h2>Summary of findings</h2>
<p>Here's the important information from the article:</p>
<p>The author reviews median tech worker salaries from the BLS, Indeed, and GlassDoor and finds:</p>
<blockquote>
<p>engineers at top tech companies in the Bay Area stand to make between $15,000
and $33,000 more per year than engineers at top tech companies in Seattle.</p>
</blockquote>
<p>Comparing median monthly rent from Zillow, the author finds:</p>
<blockquote>
<p>median rent is about $1400-$1500 a month (or roughly $17,000-$18,000 a year)
higher in the Bay Area than in the Seattle metro area</p>
</blockquote>
<p>The author then concludes:</p>
<blockquote>
<p>higher Bay Area salaries at least cover the costs of higher rents.</p>
</blockquote>
<h2>Taxes</h2>
<p>The most obvious error is that this analysis completely <strong>ignores all taxes</strong>.
I <a href="https://news.ycombinator.com/item?id=13178880">pointed this out</a> in the comments,
but the conversation exclusively focused on the difference between WA and CA state taxes.
I think it's important to note that this estimate also ignores <em>federal</em> taxes as well.</p>
<p>For example, consider a Seattle salary of $100k and a Bay Area salary of $133k.
Assuming a federal tax rate of 33%, that $33k salary difference will be reduced to $22k take-home.</p>
<blockquote>
<p>$133k * (1-0.33) - $100k * (1-0.33) ~= $22k</p>
</blockquote>
<p>Since WA does not have a state income tax and CA has a significant income tax,
you'll also end up paying just a bit over $10k in state taxes.
This drops the take home pay increase to $12k total.
And, according to the data, this is at the high end of the scale!</p>
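<p>Here's that back-of-the-envelope arithmetic as a sketch. It assumes a flat 33% federal rate and a rough $10k of CA state tax, ignoring brackets and deductions, so treat the outputs as ballpark figures:</p>

```python
seattle_salary = 100_000
bay_area_salary = 133_000
federal_rate = 0.33

# Federal tax alone shrinks the $33k gross difference to ~$22k take-home.
federal_diff = (bay_area_salary - seattle_salary) * (1 - federal_rate)

# CA state income tax (WA has none) takes roughly another $10k.
ca_state_tax = 10_000
take_home_diff = federal_diff - ca_state_tax

print(round(federal_diff), round(take_home_diff))  # roughly 22k and 12k
```

A $12k take-home bump against a $17k-$18k rent difference is the gap the post is pointing at.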
<p>In reality, there's no way we'll cover the $17k rent difference.</p>
<h2>Median Rental Price</h2>
<p>A few folks argued the use of a median isn't appropriate here.
I agree to a point, but I think it's probably a good first approximation,
especially since the author restricted their data to tech salaries in each market.</p>
<p>However, I do have once concern here.
I'm willing to bet that <strong>Seattle renters can get more space for their median
rental than a Bay Area renter can get for the Bay Area median rental</strong>.
As rent prices increase, renters will adjust by increasing the amount they spend
on rent and reducing the value of their apartment.
In economic terms, a consumer's demand for an apartment is not perfectly inelastic.</p>
<p>I saw this first hand when I moved to the Bay Area.
Prices were generally higher than what I was used to, so I adapted by increasing my
monthly rent, downsizing my apartment, and increasing my commute length.
I also noticed more of my peers sharing apartments or houses who wouldn't do so in lower COL areas.</p>
<h2>Conclusion</h2>
<p>After accounting for taxes and reviewing the metrics used for rental costs, the
salary increase from moving to the Bay Area is <strong>very unlikely to cover the
increase in housing costs</strong>, especially for similar housing.</p>Announcing the Cross Sectional Dataset2016-11-14T00:00:00-08:002016-11-14T00:00:00-08:00Ryan T. Hartertag:blog.harterrt.com,2016-11-14:/announcing-the-cross-sectional-dataset.html<p>I'm happy to announce a new telemetry dataset!</p>
<p>The Cross Sectional dataset makes it easy to describe our users by providing
summary statistics for each client. Like the Longitudinal table, there's one
row for each client_id in a 1% sample of clients. However, the Cross Sectional
dataset simplifies your analysis …</p><p>I'm happy to announce a new telemetry dataset!</p>
<p>The Cross Sectional dataset makes it easy to describe our users by providing
summary statistics for each client. Like the Longitudinal table, there's one
row for each client_id in a 1% sample of clients. However, the Cross Sectional
dataset simplifies your analysis by replacing the longitudinal arrays with
summary statistics.</p>
<p>The dataset is now available in
<a href="https://sql.telemetry.mozilla.org/queries/1669/source">STMO</a>. You can find
more information in <a href="https://github.com/mozilla/telemetry-batch-view/blob/master/docs/choosing_a_dataset.md#cross-sectional">the
documentation</a>.</p>
<p>Take a look and let me know if you have any questions or suggestions for new
columns!</p>Meta Documentation2016-11-03T00:00:00-07:002016-11-03T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2016-11-03:/meta-documentation.html<p>You'll see a lot of posts coming down the line on documentation.</p>
<p>We surveyed our customers last quarter and asked where our data pipeline was lacking.
It turns out the most painful part of using our data pipeline is reading the documentation.
I've been interested in learning how to write …</p><p>You'll see a lot of posts coming down the line on documentation.</p>
<p>We surveyed our customers last quarter and asked where our data pipeline was lacking.
It turns out the most painful part of using our data pipeline is reading the documentation.
I've been interested in learning how to write great documentation for a while,
so I volunteered to spend a significant amount of time reworking our documentation this quarter. </p>
<p>To summarize, our team tries to make telemetry data useful.
Some of us build tools to make accessing the data easy,
others work on processing the data and making it available in an efficient and understandable format.
Last quarter I worked on the latter, pipelining the data to make the format better.</p>
<p>This year, I'll be working as a data ambassador/mentor,
going out to teams, identifying their data needs, and helping them reach their goals.</p>
<p>Data is an incredibly useful tool.
It takes a lot of the guesswork out of building useful projects.
However, even though we have a great product, it's useless if our users don't understand how to use it.</p>
<p>We have a great tool for our customers, but without documentation it's not worth their energy to learn it.
It's easier to do a one-off analysis that is kind-of right.</p>
<p>If you have a data product or tool without documentation, it's more likely than not that someone is misusing your data.
The hardest part of making data useful is understanding how it was collected and in what situations it is appropriate. </p>
Why Markdown?2016-11-03T00:00:00-07:002016-11-03T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2016-11-03:/why-markdown.html
<p>Last week I finished a <a href="https://github.com/mozilla/telemetry-batch-view/pull/128">pull
request</a> that moved
some documentation from <a href="https://wiki.mozilla.org/Telemetry/LongitudinalExamples">mozilla's
wiki</a> to a <a href="https://github.com/mozilla/telemetry-batch-view/blob/master/docs/longitudinal_examples.md">github
repository</a>.
It took a couple of hours of editing and toying with pandoc to get right, but
when I was done, I realized the benefits were difficult to see. So, I decided …</p><p>[TOC]</p>
<p>Last week I finished a <a href="https://github.com/mozilla/telemetry-batch-view/pull/128">pull
request</a> that moved
some documentation from <a href="https://wiki.mozilla.org/Telemetry/LongitudinalExamples">mozilla's
wiki</a> to a <a href="https://github.com/mozilla/telemetry-batch-view/blob/master/docs/longitudinal_examples.md">github
repository</a>.
It took a couple of hours of editing and toying with pandoc to get right, but
when I was done, I realized the benefits were difficult to see. So, I decided
to write them out for posterity.</p>
<h2>Better Process</h2>
<p>The only way to edit our wiki is through the web front end which causes some
major problems.</p>
<p>For one, you're always editing the production version and there's no way to get
review before publishing. That's obviously not great.</p>
<p>Second, your edits need to be submitted quickly - like within an hour, usually.
Since you're editing in a web form there's no good way to save your edits
locally. Even worse, there's no good way to settle merge conflicts.</p>
<p>With markdown, I can develop my revisions over the course of weeks and preview
them locally. When it's time to publish I get review from my peers, which
makes my documentation more readable and helps me improve as an engineer.</p>
<h2>Better Tools</h2>
<p>I have powerful tools for manipulating text so using a simple web form to edit
technical documentation seems absurd to me. With markdown, I get the joy of
using my favorite text editor in my favorite development environment.</p>
<h3>One less tool</h3>
<p>Our team is already using Markdown for our README's and Github provides a much
better UX for revision control. By moving to Markdown for our user-facing
documentation, we have one less tool and syntax we need to depend on.</p>
<h2>The documentation sits next to the code</h2>
<p>Storing your documentation with your code has a lot of great benefits.</p>
<h3>Synchronization</h3>
<p>Pull requests can include simultaneous changes to code and documentation, which
makes it more likely they'll stay in sync, both because you don't need to go
edit them elsewhere and because it can become a review requirement.</p>
<h3>Discoverability</h3>
<p>Keeping the docs next to the code helps with discoverability. Your
code and your documentation should supplement each other. Keeping them close
together is only reasonable.</p>Working over SSH2016-09-05T00:00:00-07:002016-09-05T00:00:00-07:00Ryan T. Hartertag:blog.harterrt.com,2016-09-05:/working-over-ssh.html
<h2>Introduction</h2>
<p>Working over SSH can be impossibly frustrating if you're not using the right tools.
I promised my teammates a write-up of how I work over SSH.
Using these tools will make it significantly easier / more fun to work with a remote linux system.</p>
<h2>Tools</h2>
<h3><a href="https://tmux.github.io/">tmux</a></h3>
<p>For me, tmux is …</p>
<h2>Introduction</h2>
<p>Working over SSH can be impossibly frustrating if you're not using the right tools.
I promised my teammates a write-up of how I work over SSH.
Using these tools will make it significantly easier / more fun to work with a remote linux system.</p>
<h2>Tools</h2>
<h3><a href="https://tmux.github.io/">tmux</a></h3>
<p>For me, tmux is the single most important tool for getting work done over SSH.
tmux does a lot of really cool things, but the most relevant feature to this discussion is session persistence.</p>
<h4>Session Persistence</h4>
<p>tmux sessions can be detached and reattached at will.
That means you can <strong>execute some long running command on an AWS cluster, kill the ssh session, and the command will keep running</strong>.
Later, you can reconnect to the cluster and session, and it will be as if you never left.
So much nicer than cussing out your flaky WiFi connection.</p>
<p>For example:</p>
<pre><code class="language-bash"># Start a new session named "foo"
# (opens a new shell as a subprocess)
tmux new -s foo

# Do stuff ...
sleep 100

# Detach from the session, returning you to the original shell,
# with ctrl-b d

# Reconnect to the tmux session
tmux at -dt foo

# Still waiting!!
</code></pre>
<p>More often, I use tmux just to save my place when I need to wrap up for the day.
Next morning, I can reattach my session and I'm already looking at the most relevant files for today's work.</p>
<h4>Multiplexing</h4>
<p>This is what tmux was built to do. I think persistence is just a nice side effect.
tmux allows you to open a bunch of terminals in a single ssh connection.
Think of tmux as a tiling window manager for the terminal.
Here's a screen shot of how I developed this blog post:</p>
<p><img src="https://blog.harterrt.com/images/example-tmux-session.png"></p>
<p>That's all in one terminal window.
On the left I have a process serving up drafts of this document and on the right I have my text editor.
The extra context is indispensable when trying to figure out WTF is going on with a failing job.
For example, monitoring an <code>sbt ~test</code> process on the left while making edits on the right.</p>
<h3><a href="https://github.com/andsens/homeshick">Homeshick</a></h3>
<p>Configuring a new machine is a PITA.
For a while, I saw all configuration changes as a liability and refused to customize my environment.
After all, I'd eventually have to redo all of these configs when I get a new machine.
But, your tools should be a joy to use, and Homeshick makes this a non-issue.</p>
<p>Homeshick pulls all of your dotfiles into a central git repository and handles linking these files to the right location.
Now, I can <strong>setup a new Ubuntu machine within ~5 minutes</strong> with all of my dotfiles intact.
When I connect to a machine for the first time, I grab <a href="https://github.com/harterrt/TIL/blob/master/linux/new-machine.md">this snippet</a> and all of the initialization is done.
Even better, the meaningful config changes I make on my work machine magically materialize on my personal machine and VPS with a simple <code>git pull</code>.</p>
<p>The <a href="https://github.com/andsens/homeshick">README</a> is pretty good and it shouldn't take longer than ~15 minutes to set up.</p>