Intentional Documentation

Randy Au has a great post on documentation for data scientists here: "Let's get intentional about documentation". Take a look; it's worth a read.

I've been able to find some decent guides for writing documentation, but they're usually targeted at engineers. That's a shame. Data scientists have significantly different constraints and needs when writing documentation.

I really like Randy's suggestion to focus on keeping good work records instead of focusing on writing complete documentation. Writing "good documentation" is a huge task and it's hard to predict what documentation will be useful. Instead of guessing, make it easy to backfill documentation later.

Keeping good records might look like tracking your work in tickets or publishing weekly snippets. I talked about how these records can be personally useful in this post. As an added bonus, good records make it much easier to backfill documentation once you identify what documentation is missing.

I mentioned earlier that documenting data science work is significantly different from documenting engineering work. One of the key differences is that data scientists tend to do more once-and-done work than engineers. Data science is a race against irrelevance. The world is changing around us, and we need to deliver insights before our findings go stale.

It's impractical and inefficient to try to document all of this one-off work. Only a small portion of the resulting documentation would ever be used. Even worse, the useful documentation would be hidden in a sea of noise.

Instead, data scientists should focus on keeping good work records, contextualizing their analyses, and preparing themselves to backfill documentation later.

Finally, this quote is a gem:

... [Documentation] is a MASSIVE, continent-sized, topic. Sadly, I’m not informed enough to tackle the entire continent and I’ve only got a few thousand words to use in a post. To cope, I’m gonna employ the time-honored move of harrowed data scientists everywhere — reduce scope by fiat and attempt to dazzle the audience with “directional” findings until I can form a strong case later on.


© Ryan T. Harter.