Friday, April 17, 2009

Falling into the Gaussian trap

I fell into a trap I thought I was immune to after reading "Fooled by Randomness" and "The Black Swan" by Nassim Taleb.

I lead a team that writes software used in the design process for turbomachinery.  We have been doing a lot of work in the last couple of years on tools for detailed product definition, including the building of virtual 3D models and the drawings that utilize them.  We have written around 5 different programs for various stages of the design process, and in each we embed a bit of code that sends information to a server to track the usage of each tool.  I have been disappointed lately because when I check the server logs I find very little, if any, usage of the tools.  This gave me the impression that the work we had been doing is of no value, since no one cares enough to even use the tools.

After a conversation with a buddy who is a customer, I realized my error.  The problem was my assumption that usage of the tools would be consistent over time.  If a tool is going to be run 1200 times per year, I expected that to show up as roughly 100 times per month with slight variation.  This is totally wrong!  Why would it be evenly distributed?  Why would I expect a slight, Gaussian variation in the monthly logs?  The truth is that usage, like many things in the real world, is far more likely to follow a fractal distribution with lots of variation and steep, short peaks.  My friend told me that the usage is about what he would have expected, since few in-process programs need these specific tools at this very moment.  When they do need them, there will probably be a huge spike in usage as the demand arrives all at once.
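To make the difference concrete, here is a minimal sketch in Python (purely illustrative; the 1200 runs per year figure is just the hypothetical from above, and the 90%-of-runs-land-in-two-months split is my own made-up assumption) comparing what the monthly logs would look like under the two views:

```python
import random

random.seed(42)

TOTAL_RUNS = 1200   # hypothetical yearly total from the post
MONTHS = 12

# Naive assumption: usage arrives evenly, ~100 per month with slight Gaussian noise.
gaussian_months = [
    max(0, round(random.gauss(TOTAL_RUNS / MONTHS, 10))) for _ in range(MONTHS)
]

# Closer to reality: usage is bursty. Most months are quiet, then a design push
# hits and the bulk of the runs pile into a couple of months.
bursty_months = [0] * MONTHS
spike_months = random.sample(range(MONTHS), 2)   # two "design push" months
for _ in range(TOTAL_RUNS):
    if random.random() < 0.9:
        bursty_months[random.choice(spike_months)] += 1
    else:
        bursty_months[random.randrange(MONTHS)] += 1

print("Even + Gaussian noise:", gaussian_months)
print("Bursty / spiky       :", bursty_months)
```

Under the first assumption every month looks busy; under the second, ten months of near-zero logs are perfectly normal, which is exactly what made me think no one was using the tools.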

I have now recalibrated my expectations and feel far better about the work we are doing.  I also got a nice reminder to be on guard for these types of fallacies, as I am far from immune to them.

2 comments:

Andy's Blog said...

Jeff, that's too eerie. I was tracking tool usage at my job last year. The IT dept had written the script, but I was compiling all the data and examining it, wondering why people were complaining about a lack of licenses. Turns out there are only a handful of times per year that the licenses are maxed out, while the yearly average is maybe 50%. But it is the same situation; there are points where the demand isn't 105% of capacity, but 150%! So then the problem comes down to weighing the purchase of additional licenses for those few times when demand is much greater than capacity. That's above my pay grade.

Jodi Erno said...

My tool usage has spiked a lot lately. :)

Tom