Evaluating New Tools

At Mozilla, we're still relatively early in our data science journey. As such, we're always evaluating new tools to improve our analysis workflow (Jupyter vs. Rmd), make our infrastructure more usable (our home-rolled ATMO vs. Databricks), or scale our knowledge (knowledge-repo vs. Gitbook).

Most of these tools ...

Documentation Style Guide

I just wrote up a style guide for our team's documentation. The documentation is rendered using Gitbook and hosted on GitHub Pages. You can find the PR here, but I figured it's worth sharing on the blog as well.

Style Guide

Articles should be written in Markdown (not AsciiDoc). Markdown ...

Beer and Probes

Quick post to clear up some terminology. But first, an analogy to organize my thinking:


Temperature control is a big part of brewing beer. Throughout the brewing process I use a thermometer to measure the temperature of the soon-to-be beer. Because I take several temperature readings throughout the ...

Bad Tools are Insidious

This is my first job making data tools that other people use. In the past, I've always been a data scientist, a consumer of these tools. I'm learning a lot.

Last quarter, I learned that bad tools are often hard to spot even when they're damaging productivity ...

Literature Review: Writing Great Documentation

I’m working on a big overhaul of my team’s documentation. I’ve noticed that documentation is difficult to get right, and I haven’t seen any great examples for data products, either. I don’t have much experience in this area, so I decided to review ...

Announcing the Cross Sectional Dataset

I'm happy to announce a new telemetry dataset!

The Cross Sectional dataset makes it easy to describe our users by providing summary statistics for each client. Like the Longitudinal table, there's one row for each client_id in a 1% sample of clients. However, the Cross Sectional dataset simplifies ...

Working over SSH


Working over SSH can be impossibly frustrating if you're not using the right tools. I promised my teammates a write-up on how I work over SSH. Using these tools will make it significantly easier and more fun to work with a remote Linux ...

Strange Spark Error


I spent the better part of last week debugging a Spark error, so I figured it's worth writing up.

The Bug

I added this very simple view to our batch views repository.

package com.mozilla.telemetry.views

import org.apache.spark.{SparkConf ...

