While a lot of this is true, I do see some drawbacks to the competition or crowdsourcing approach:
- To the extent that organizations invest in analytics to gain a competitive advantage, the crowdsourcing approach has the disadvantage that it is harder to keep a secret in a crowd. You might swear the winner to secrecy, but what about the guy who almost won?
- Data confidentiality is another issue. As a consultant, I have always found clients to be very sensitive about giving others access to their own data. They will therefore be very reluctant to post truly sensitive or important data in a public forum.
- Another problem with the crowdsourcing model is that it is a winner(s)-take-all system, which puts a lot of risk on the participants. For example, 750 teams participated in the Hearst Challenge and put in 6 weeks of effort, but only one will get the $25k prize. For someone to be willing to put in that kind of effort, they must be either doing it part time in their spare time or just starting out. So the majority of full-time participants in these competitions are likely to be students or organizations looking to make a name for themselves. This might limit the model's scalability.
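To put that risk in rough numbers, here is a back-of-the-envelope sketch in Python using the figures above (750 teams, 6 weeks, one $25k prize). It assumes, purely for illustration, that every team has an equal chance of winning, which is certainly not true in a real competition; the point is only the order of magnitude of the expected reward.

```python
# Figures from the comment: 750 teams, 6 weeks of effort, a single $25k prize.
PRIZE_USD = 25_000
TEAMS = 750
WEEKS = 6

# Illustrative assumption: every team is equally likely to win,
# so each team's expected payout is simply the prize split across all teams.
expected_payout = PRIZE_USD / TEAMS

# Spread that expected payout over the weeks of effort each team put in.
expected_per_week = expected_payout / WEEKS

print(f"Expected payout per team: ${expected_payout:.2f}")
print(f"Expected payout per team-week: ${expected_per_week:.2f}")
```

Under that equal-odds assumption, each team's expected reward is about $33, or roughly $5.56 per week of effort, which is why the model tends to attract people who are not counting on the prize money.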
These drawbacks do not mean that this is not a viable way of solving business problems, just that some more thinking and improvisation might be needed to make it scalable. Otherwise it might remain a niche strategy, but definitely a very enticing one.
--Datamining_guy
It's interesting to note why this model of competition works in the analytics space. It probably won't work in supercomputing, rocket/jet engine design, or finding a cure for cancer/AIDS, because all of those require capital investment, whereas in analytics one can contribute using a PC and some software from a garage or basement. This is also what erodes the reputation of the analytics practice in general, creating a lack of accountability with respect to forecasting errors and appropriate quality controls. I get to learn more about prize-winning solutions at various conference presentations, and some key takeaways have been: a) computer scientists are focused on fitting/training models using billions of observations and conveniently ignore the merits of random sampling; b) students ignore basic concepts around non-linear or non-parametric estimation, imposing the right constraints, and using appropriate model specifications.
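As a rough illustration of the random-sampling point, the Python sketch below compares the mean of a full dataset with the mean estimated from a simple random sample. The dataset is hypothetical: one million normally distributed values generated on the spot, not data from any competition. It only shows that a simple mean can be estimated well from a small random sample with a known standard error; whether sampling is adequate for a particular model is a separate question.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "full" dataset: one million synthetic observations.
population = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]
full_mean = statistics.fmean(population)

# Simple random sample of 10,000 observations, drawn without replacement.
sample = random.sample(population, 10_000)
sample_mean = statistics.fmean(sample)

# Estimated standard error of the sample mean: s / sqrt(n).
std_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"full-data mean: {full_mean:.3f}")
print(f"sample mean:    {sample_mean:.3f} (SE ~ {std_error:.3f})")
```

With 1% of the rows, the sample mean typically lands within a few tenths of the full-data mean, and the standard error tells you how far off it is likely to be.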