08 April 2010

Three weeks ago I gave a presentation on probabilistic modelling to an audience of data mining practitioners. These people know about machine learning, they use it every day, but their main concern is to analyze data: real data!

The experience taught me a valuable lesson which I hadn’t come across by interacting with the academic community: Probabilistic models are hard. Even for people that are very close to the machine learning community (data mining), probabilistic models are a very different (new?) way of thinking.

The whole idea of building a generative story for your data and then using Bayes rule to “invert” the model given some dataset has become second nature to me. Nonetheless, I (we?) shouldn’t forget that it took statistics a long time to think about modelling in this sense. Hence I now understand realize that for outsiders the framework of probabilistic modelling is a highly non-trivial concept to grasp.

In this context I am obliged to share a blog post by Jeff Moser. I have never seen such a great explanation of a non-trivial probabilistic model that is deployed in very large scale on XBox Live: Computing Your Skill, a description of the TrueSkill ranking model. Very very well done Jeff!