Planning Data Science is hard: EDA

Data science is weird. It looks a lot like software engineering but in practice the two are very different. I've been trying to pin down where these differences come from.

Michael Kaminsky hit on a couple of key points in his series on Agile Management for Data Science on Locally Optimistic. In Part II, Michael notes that exploratory data analyses (EDA) are difficult to plan for: "The nature of exploratory data analysis means that the objectives of the analysis may change as you do the work." Bingo!

I've run into this problem a bunch of times when trying to set OKRs for major analyses. It's nearly impossible to scope a project if I haven't already done some exploratory analysis. I didn't have this problem when I was doing engineering work. If I had a rough idea of what pieces I needed to stitch together, I could at least come up with an order-of-magnitude estimate of how long a project would take to complete. Not so with Data Science: I have a hard time differentiating between analyses that are going to take two weeks and analyses that are going to take two quarters.

That's all. No deep insight. Just a +1 and a pointer to the folks who got there first.

© Ryan T. Harter.