Literature Review: Writing Great Documentation

I’m working on a big overhaul of my team’s documentation. I’ve noticed that documentation is difficult to get right, and I haven’t seen any great examples for a data product, either. I don’t have much experience in this area, so I decided to review ...

Is moving to the Bay Area worth it?

I came across this article on the front page of Hacker News yesterday. The author argues that Bay Area housing prices may be high, but the salary increase probably makes the move worthwhile. The author pulls together some interesting data to make their point, but I have major issues with ...

Announcing the Cross Sectional Dataset

I'm happy to announce a new telemetry dataset!

The Cross Sectional dataset makes it easy to describe our users by providing summary statistics for each client. Like the Longitudinal table, there's one row for each client_id in a 1% sample of clients. However, the Cross Sectional dataset simplifies ...

Why Markdown?

Last week I finished a pull request that moved some documentation from Mozilla's wiki to a GitHub repository. It took a couple of hours of editing and toying with pandoc to get right ...

Working over SSH


Working over SSH can be impossibly frustrating if you're not using the right tools. I promised my teammates a write-up of how I work over SSH. Using these tools will make it significantly easier and more fun to work with a remote Linux ...

Strange Spark Error


I spent the better part of last week debugging a Spark error, so I figure it's worth writing up.

The Bug

I added this very simple view to our batch views repository.

package com.mozilla.telemetry.views

import org.apache.spark.{SparkConf ...

