Data Intuition Case Study: Grain-free Dog Food

My vet told me I should stop feeding my dog grain-free dog food. Apparently, grain-free dog food is linked with a heart condition called Dilated Cardiomyopathy (DCM). This set off my bullshit detector, so I decided to dig deeper.

The FDA has a great document explaining their investigation here. It's …


Getting Credit for Invisible Work

Last month I gave a talk at csv,conf on "Getting Credit for Invisible Work". The (amazing) csv,conf organizers just published a recording of the talk (slides here). Give it a watch! It's only 20 minutes long (including the Q&A).

Invisible work is a concept I've been trying to …


Opportunity Sizing: Is the Juice Worth the Squeeze?

My peers at Mozilla are running workshops on opportunity sizing. If you're unfamiliar, opportunity sizing is when you take some broad guesses at how impactful some new project might be before writing any code. This gives you a rough estimate of what the upside for this work might be.

The …


Optional Comments

I spend a lot of my time at Mozilla reviewing my peers' work. It's a joy, but it's hard to do well. Review can be a great opportunity for mentorship and growth, but it's also an opportunity to be overbearing. Striking the right tone is a struggle.

Part of the …


Controlled Experiments - Why Bother?

I spent some time earlier this year orchestrating a massive experiment for Firefox. We launched a bunch of new features with Firefox 80 and we wanted to understand whether these new features improved our metrics.

In the process, I ended up talking with a bunch of Firefox engineers and explaining …


Leading with Data - Cascading Metrics

It's surprisingly hard to lead a company with data. There's a lot written about how to set good goals and how to avoid common pitfalls (like Surrogation) but I haven't seen much written about the practicalities of taking action on these metrics.

I spent most of this year working with …


Defining Data Intuition

Last week, one of my peers asked me to explain what I meant by "Data Intuition", and I realized I really didn't have a good definition. That's a problem! I refer to data intuition all the time!

Data intuition is one of the three skills I interview new data scientists …


Follow up: Intentional Documentation

Last week I presented the idea of Intentional Documentation to Mozilla's data science team. Here's a link to the slides.

The rest of this post is a transcription of what I shared with the team (give or take).


In Q4, I'm trying to build a set of trainings to help …


Surrogation

A year or so ago, I read this article about how Wells Fargo ended up in such a mess. If you don't remember, Wells Fargo was opening accounts in their clients' names without their consent and ended up paying a few hundred million dollars in fines.

Long story short, a …


Intentional Documentation

Randy Au has a great post on documentation for data scientists here: Let's get intentional about documentation. Take a look, it's worth a read.

I've been able to find some decent guides for writing documentation but they're usually targeted at engineers. That's a shame. Data scientists have significantly different constraints …


Writing inside organizations

Tom Critchlow has a great post here outlining some points on how important writing is for an organization.

I'm still working through the links, but his post already sparked some ideas. In particular, I'm very interested in the idea of an internal blog for sharing context.

Snippets

My team keeps …


Syncthing and Open Source Data Collection

I don't see many open source packages collecting telemetry, so when Syncthing asked me to opt in to telemetry I was intrigued.

I see a lot of similarities between how Syncthing and Firefox collect data. Both collect daily pings and make it easy to view the data you're submitting (in Firefox …


Syncthing

I did a lot of reading and exploring over my holiday break. One of the things I'm most excited about is finding Syncthing. If you haven't seen it yet, take a look. It's like an open-source, decentralized Dropbox.

It works everywhere, which for me means Linux and Android. Google Drive …


Pub True

I'm ramping up on a project to understand how Firefox retains users. Right now I'm trying to build some context quickly. For example, what's our monthly retention? How about our annual retention? There's a bunch of interesting and nuanced measurement questions that we'll eventually have to answer, but for now …


Analysis Maturation Plan

I was talking about tooling with Mark Reid a few weeks ago. I've been trying to find a way to simplify sharing analyses throughout the company. This is an old problem at Mozilla that I've tried to address a couple of times but I haven't found the silver bullet yet …


Technical Leadership Paths

I found this article a few weeks ago and I really enjoyed the read. The author outlines what a role can look like for very senior ICs. It's the first in a (yet-to-be-written) series about technical leadership and long-term IC career paths. I'm excited to read …


When the Bootstrap Breaks - ODSC 2019

I'm excited to announce that I'll be presenting at the Open Data Science Conference in Boston next week. My colleague Saptarshi and I will be talking about When the Bootstrap Breaks.

I've included the abstract below, but the high-level goal of this talk is to strip some varnish off the …


Slow to respond through 2018

I'm working on an urgent, high-priority request for the next few weeks. To make sure I can finish this work in 2018, I'm limiting my meetings and communications for the remainder of the year.

Slack is good for getting my immediate attention, but if your request takes more …


If you can't do it in a day, you can't do it

I was talking with Mark Reid about some of the problems with Coding in a GUI. He nailed part of the problem with a soundbite too good not to share:

"If you can't do it in a day, you can't do it."

This is a persistent problem with tools that make …


Planning Data Science is hard: EDA

Data science is weird. It looks a lot like software engineering but in practice the two are very different. I've been trying to pin down where these differences come from.

Michael Kaminsky hit on a couple of key points in his series on Agile Management for Data Science on Locally …

© Ryan T. Harter. Built using Pelican. Theme by Giulio Fidente on GitHub.