SQL style - where do the commas go?

TL;DR: there are good arguments for leading commas, but I recommend using trailing commas for consistency.


We didn't take an opinionated stance on comma placement when writing Mozilla's SQL Style Guide, probably to avoid a quagmire. I'm trying to figure out where I stand now, so I wrote it …
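
For readers who haven't seen the two comma styles side by side, here is a minimal sketch using Python's built-in `sqlite3` module. The `clients` table and its columns are made up for illustration and are not taken from the style guide. Both queries return the same rows, so the choice is purely about readability and consistency.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
TRAILING = """
SELECT
    client_id,
    country,
    active_days
FROM clients
ORDER BY client_id
"""

LEADING = """
SELECT
    client_id
    , country
    , active_days
FROM clients
ORDER BY client_id
"""


def run(query):
    """Run a query against a tiny in-memory table and return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients (client_id TEXT, country TEXT, active_days INTEGER)"
    )
    conn.executemany(
        "INSERT INTO clients VALUES (?, ?, ?)",
        [("a", "US", 3), ("b", "DE", 7)],
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


# The two styles differ only in layout, so the results should match.
trailing_rows = run(TRAILING)
leading_rows = run(LEADING)
```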


Talk: Practical Strategies for Data Storytelling

I gave a talk at the Open Data Science Conference a few weeks ago titled: Practical Strategies for Data Storytelling. I have more to say on the subject, but in the meantime you can see my slides and speaker notes here.


For the last two years, I've hosted a biweekly …


Getting Credit for Invisible Work

As I moved up my company’s career ladder, my job description became more ambiguous.

I stepped back to take a look at my work, and I was surprised to find that my biggest wins hadn’t come from technical feats or shipped code. Instead, I realized that most of …


Data Intuition Case Study: Grain-free Dog Food

My vet told me I should stop feeding my dog grain-free dog food. Apparently, grain-free dog food is linked with a heart condition called Dilated Cardiomyopathy (DCM). This set off my data intuition alarms, so I decided to dig deeper.

The FDA has a great document explaining their investigation here …


Getting Credit for Invisible Work

Last month I gave a talk at csv,conf on "Getting Credit for Invisible Work". The (amazing) csv,conf organizers just published a recording of the talk. (slides here). Give it a watch! It's only 20m long (including the Q&A).

Invisible work is a concept I've been trying to …


Opportunity Sizing: Is the Juice Worth the Squeeze?

My peers at Mozilla are running workshops on opportunity sizing. If you're unfamiliar, opportunity sizing is when you take some broad guesses at how impactful some new project might be before writing any code. This gives you a rough estimate of what the upside for this work might be.

The …
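
As a rough illustration of the "broad guesses" idea, an opportunity size can be sketched as a product of a few estimates. The formula, the parameter names, and every number below are assumptions made up for this example, not a method taken from the workshops.

```python
def size_opportunity(users_reached, adoption_rate, lift_per_adopter):
    """Rough upside: users reached x share who adopt x gain per adopter."""
    return users_reached * adoption_rate * lift_per_adopter


# All inputs are made-up guesses. Sizing a pessimistic, expected, and
# optimistic case gives a range rather than a single falsely precise number.
scenarios = {
    "pessimistic": size_opportunity(1_000_000, 0.01, 0.5),
    "expected": size_opportunity(1_000_000, 0.05, 1.0),
    "optimistic": size_opportunity(1_000_000, 0.10, 2.0),
}
```

If even the optimistic case looks small next to the cost of building the project, that is a hint the juice may not be worth the squeeze.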


Optional Comments

I spend a lot of my time at Mozilla reviewing my peers' work. It's a joy, but it's hard to do well. Review can be a great opportunity for mentorship and growth, but it's also an opportunity to be overbearing. Striking the right tone is a struggle.

Part of the …


Controlled Experiments - Why Bother?

I spent some time earlier this year orchestrating a massive experiment for Firefox. We launched a bunch of new features with Firefox 80 and we wanted to understand whether these new features improved our metrics.

In the process, I ended up talking with a bunch of Firefox engineers and explaining …


Leading with Data - Cascading Metrics

It's surprisingly hard to lead a company with data. There's a lot written about how to set good goals and how to avoid common pitfalls (like Surrogation) but I haven't seen much written about the practicalities of taking action on these metrics.

I spent most of this year working with …


Defining Data Intuition

Last week, one of my peers asked me to explain what I meant by "Data Intuition", and I realized I really didn't have a good definition. That's a problem! I refer to data intuition all the time!

Data intuition is one of the three skills I interview new data scientists …


Follow up: Intentional Documentation

Last week I presented the idea of Intentional Documentation to Mozilla's data science team. Here's a link to the slides.

The rest of this post is a transcription of what I shared with the team (give or take).


In Q4, I'm trying to build a set of trainings to help …


Surrogation

A year or so ago, I read this article about how Wells Fargo ended up in such a mess. If you don't remember, Wells Fargo was opening accounts in their clients' names without their consent and ended up paying a few hundred million dollars in fines.

Long story short, a …


Intentional Documentation

Randy Au has a great post on documentation for data scientists here: Let's get intentional about documentation. Take a look, it's worth a read.

I've been able to find some decent guides for writing documentation but they're usually targeted at engineers. That's a shame. Data scientists have significantly different constraints …


What do you take home?

Every other week, I go through my todo list and decide where I should focus my attention. I review a list of prompts that help me choose important work. One of the oldest prompts on my list is: "What will you take home at the end of the week?".

I …


Keeping a Journal

There was a discussion of this HBR article ("The More Senior Your Job Title, the More You Need to Keep a Journal") on HN today.

This is great advice. I've kept a journal for almost a decade now and it's definitely improved my career, especially as I've become more senior …


Post hoc ergo propter hoc

Economists have a handy phrase to describe a fairly common fallacy: "Post hoc ergo propter hoc" meaning "After, therefore because".

Wikipedia has an example of how this might look in the wild:

A tenant moves into an apartment and the building's furnace develops a fault. The manager blames the tenant's …


Daily Writing and the "Notebook" Category

This past weekend I found drmaciver's post on starting a daily writing practice. I like the idea and I'm going to give it a try.

The content on this blog has never been all that polished, but I do expect these daily posts will be less consistent in quality and …


Writing inside organizations

Tom Critchlow has a great post here outlining some points on how important writing is for an organization.

I'm still working through the links, but his post already sparked some ideas. In particular, I'm very interested in the idea of an internal blog for sharing context.

Snippets

My team keeps …


Syncthing and Open Source Data Collection

I don't see many open source packages collecting telemetry, so when Syncthing asked me to opt-in to telemetry I was intrigued.

I see a lot of similarities between how Syncthing and Firefox collect data. Both collect daily pings and make it easy to view the data you're submitting (in Firefox …


Syncthing

I did a lot of reading and exploring over my holiday break. One of the things I'm most excited about is finding Syncthing. If you haven't seen it yet, take a look. It's like an open-source, decentralized Dropbox.

It works everywhere, which for me means Linux and Android. Google Drive …


Pub True

I'm ramping up on a project to understand how Firefox retains users. Right now I'm trying to build some context quickly. For example, what's our monthly retention? How about our annual retention? There's a bunch of interesting and nuanced measurement questions that we'll eventually have to answer, but for now …


Analysis Maturation Plan

I was talking about tooling with Mark Reid a few weeks ago. I've been trying to find a way to simplify sharing analyses throughout the company. This is an old problem at Mozilla that I've tried to address a couple of times but I haven't found the silver bullet yet …


Technical Leadership Paths

I found this article a few weeks ago and I really enjoyed the read. The author outlines what a role can look like for very senior ICs. It's the first in a (yet to be written) series about technical leadership and long term IC career paths. I'm excited to read …


When the Bootstrap Breaks - ODSC 2019

I'm excited to announce that I'll be presenting at the Open Data Science Conference in Boston next week. My colleague Saptarshi and I will be talking about When the Bootstrap Breaks.

I've included the abstract below, but the high-level goal of this talk is to strip some varnish off the …


Slow to respond through 2018

I'm working on an urgent and high priority request for the next few weeks. To make sure I can finish this work in 2018 I'm limiting my meetings and communications for the remainder of the year.

Slack is good for getting my immediate attention, but if your request takes more …


If you can't do it in a day, you can't do it

I was talking with Mark Reid about some of the problems with Coding in a GUI. He nailed part of the problem with a soundbite too good not to share:

"If you can't do it in a day, you can't do it."

This is a persistent problem with tools that make …


Planning Data Science is hard: EDA

Data science is weird. It looks a lot like software engineering but in practice the two are very different. I've been trying to pin down where these differences come from.

Michael Kaminsky hit on a couple of key points in his series on Agile Management for Data Science on Locally …


You can't do data science in a GUI

I came across You can't do data science in a GUI by Hadley Wickham a little while ago. He hits on a lot of the same problems I mentioned in Don't make me code in your text box. Take a look if you have some time. In the first 15m …


Why bootstrap?

Over the next few quarters, I'm going to focus my attention on Mozilla's experimentation platform. One of the first questions we need to answer is how we're going to calculate and report the necessary measures of variance. Any experimentation platform needs to be able to compare metrics between two groups …
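
To make "comparing metrics between two groups" concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a difference in means. It is one common variant, not necessarily the approach Mozilla's platform uses, and the function name, parameters, and simulated data are all assumptions for illustration.

```python
import random
import statistics


def bootstrap_diff_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled diffs.
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Simulated, made-up metric values for two experiment branches.
data_rng = random.Random(42)
control = [data_rng.gauss(10.0, 2.0) for _ in range(200)]
treatment = [data_rng.gauss(11.0, 2.0) for _ in range(200)]

low, high = bootstrap_diff_ci(control, treatment)
```

The appeal of this approach is that the same resampling loop works for medians, quantiles, or ratios by swapping out `statistics.fmean`, at the cost of extra computation.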


SQL Style Guide

I'm happy to announce, we now have a SQL style guide. Check it out!

If you have any suggestions, feel free to file a PR or issue in the docs repository.

Many thanks to all who participated in the St. Mocli conversation and @mreid for the review!


PSA: Don't use approximate counts for trends

I got caught giving some bad advice this week, so I decided to share here as penance. TL;DR: Probabilistic counts are great, but they shouldn't be used everywhere.


Counting stuff is hard. We use probabilistic algorithms pretty frequently at Mozilla. For example, when trying to get user counts, we …


Don't make me code in your text box!

Whenever I start a new data project, my first step is rooting out any false assumptions I have about the data.

The key here is iterating quickly. My workflow looks like this: Code a little, plot the data, what do you see? Ah, outliers. Code a little, plot the data …


The 5 Stages of Experiment Analysis

I've been thinking about experimentation a lot recently. Our team is spending a lot of effort trying to make Firefox experimentation feel easy. But what happens after the experiment's been run? There's not a clear process for taking experimental data and turning it into a decision.

I noted the importance …


Asking Questions

Will posted a great article a couple weeks ago, Giving and Receiving Help at Mozilla. I have been meaning to write a similar article for a while now. His post finally pushed me over the edge.

Be sure to read Will's post first. The rest of this article is an …


Managing Someday-Maybe Projects with a CLI

I have a problem managing projects I'm interested in but don't have time for. For example, the CLI for generating slack alerts I posted about last year. Not really a priority, but helpful and not that complicated. I sat on that project for about a year before I could finally …


Removing Disqus

I'm removing Disqus from this blog. Disqus allowed readers to post comments on articles. I added it because it was easy to do, but I no longer think it's worth keeping.

If you'd like to share your thoughts, feel free to shoot me an email at harterrt on gmail. I …


Productivity Systems for Stress Management

Over the years, I've developed a pretty involved productivity system. It was originally based on Getting Things Done, but now it's grown to include the good bits from other systems. It's involved, but I love it.

I get a lot of comments, especially on the little black book I keep …


CLI for alerts via Slack

I finally got a chance to scratch an itch today.

Problem

When working with bigger ETL jobs, I frequently run into jobs that take hours to run. I usually either step away from the computer or work on something less important while the job runs. I don't have a good …


Experiments are releases

Mission Control was a major 2017 initiative for the Firefox Data team. The goal is to provide release managers with near-real-time release-health metrics minutes after going public. Will has a great write up here if you want to read more.

The key here is that the data has to be …


Desirable features of experimentation tools

Introduction

At Mozilla, we're quickly climbing up our Data Science Hierarchy of Needs. I think the next big step for our data team is to make experimentation feel natural. There are a few components to this (e.g. training or culture) but improving the tooling is going to be …


Submission Date vs Activity Date

My comments on Bug 1422892 started to get long, so I started untangling my thoughts here.


From the bug:

We experimented with using activity_date instead of submission_date when developing the clients_daily etl job. We should summarize our findings and decide on which of these measures we'd like to standardize against …


OKRs and 4DX

I feel like I'm swimming in acronyms these days.

Earlier this year, my team started using Objectives and Key Results (OKRs) for our planning. It's been a learning process. I had some prior experience with OKRs at Google, but I've never felt like I was fully taking advantage of the …


Evaluating New Tools

At Mozilla, we're still relatively early in our data science journey. As such, we're always evaluating new tools to improve our analysis workflow (jupyter vs. Rmd), or make our infrastructure more usable (our home-rolled ATMO vs. databricks), or scale our knowledge (knowledge-repo vs. gitbook).

Most of these tools look like …


Documentation Style Guide

I just wrote up a style guide for our team's documentation. The documentation is rendered using Gitbook and hosted on Github Pages. You can find the PR here but I figured it's worth sharing here as well.

Style Guide

Articles should be written in Markdown (not AsciiDoc). Markdown is usually …


Beer and Probes

Quick post to clear up some terminology. But first, an analogy to clear up my thinking:

Analogy

Temperature control is a big part of brewing beer. Throughout the brewing process I use a thermometer to measure the temperature of the soon-to-be beer. Because I take several temperature readings throughout the …


Bad Tools are Insidious

This is my first job making data tools that other people use. In the past, I've always been a data scientist - a consumer of these tools. I'm learning a lot.

Last quarter, I learned that bad tools are often hard to spot even when they're damaging productivity. I sum this …


Literature Review: Writing Great Documentation

I'm working on a big overhaul of my team's documentation. I've noticed writing documentation is a difficult thing to get right. I haven't seen any great examples for a data product, either. I don't have much experience in this area, so I decided to review what's already been written about …


Is moving to the Bay Area worth it?

I came across this article on the front page of Hacker News yesterday. The author argues that Bay Area housing prices may be high, but the salary increase probably makes it worthwhile. The author pulls together some interesting data to make their point, but I have major issues with …


Announcing the Cross Sectional Dataset

I'm happy to announce a new telemetry dataset!

The Cross Sectional dataset makes it easy to describe our users by providing summary statistics for each client. Like the Longitudinal table, there's one row for each client_id in a 1% sample of clients. However, the Cross Sectional dataset simplifies your analysis …


Meta Documentation

You'll see a lot of posts coming down the line on documentation.

We surveyed our customers last quarter and asked where our data pipeline was lacking. It turns out the most painful part of using our data pipeline is reading the documentation. I've been interested in learning how to write …


Why Markdown?

Last week I finished a pull request that moved some documentation from Mozilla's wiki to a GitHub repository. It took a couple of hours of editing and toying with pandoc to get right, but …


Working over SSH

Introduction

Working over SSH can be impossibly frustrating if you're not using the right tools. I promised my teammates a write-up on how I work over SSH. Using these tools will make it significantly easier / more fun to work with a remote Linux system …

© Ryan T. Harter. Built using Pelican. Theme by Giulio Fidente on github.