The easiest approach is simply to average the star ratings from the members of a group. For now I'm ignoring the issue of what to do with a film which one or more users hasn't rated (in the final system they are never actually suggested so the problem nicely disappears). Unfortunately simply averaging doesn't produce nice results. Lets assume (for the sake of simplicity) that there are just two users, then if they both give a film 3 out of 5 stars then it gets an average of 3 stars. This seems totally reasonable. If, however, one gives it a 5 and the others gives it a 1, then it also recieves an average star rating of 3. Clearly this isn't right. There is no way we should be suggesting a film that one person loves and one hates.
So my requirements are a function that 1) ranges between 0 and 5 so that I can match the combined star ratings onto the same scale as the individual ratings and 2) gives a combined star rating that matches my logical intuition of what should happen. Well that shouldn't be too hard to find!
After a long period of simply trying every averaging function in Excel I came to the conclusion that the clear winner was the harmonic mean. That is the reciprical of the arithmetic mean of the recipricals or for n positive real numbers:
I have no clear idea as to why the harmonic mean works but it does. Just to show you the difference here is all possible combinations of ratings from two users, showing both the average and harmonic mean to 2 decimal places and the final joint rating (the harmonic mean to 0 decimal places)
Rating 1 | Rating 2 | Average | Harmonic Mean | Joint Rating |
5.00 | 5.00 | |||
4.50 | 4.44 | |||
4.50 | 4.44 | |||
4.00 | 4.00 | |||
4.00 | 3.75 | |||
4.00 | 3.75 | |||
3.50 | 3.43 | |||
3.50 | 3.43 | |||
3.00 | 3.00 | |||
3.50 | 2.86 | |||
3.50 | 2.86 | |||
3.00 | 2.67 | |||
3.00 | 2.67 | |||
2.50 | 2.40 | |||
2.50 | 2.40 | |||
2.00 | 2.00 | |||
3.00 | 1.67 | |||
3.00 | 1.67 | |||
2.50 | 1.60 | |||
2.50 | 1.60 | |||
2.00 | 1.50 | |||
2.00 | 1.50 | |||
1.50 | 1.33 | |||
1.50 | 1.33 | |||
1.00 | 1.00 |
Note that the list is sorted by the harmonic mean so you can see by looking at the average column that it not only gives different numerical results but also would give a different ordering.
Maybe some of the more mathematically inclined of my readers may care to enlighten us all as to why the harmonic mean seems to be a good fit.
Not long after starting to write this post I came across another place where the same problem occurs: peer reviewing papers. I'm currently organising a workshop and so had to decide which of the submitted papers to accept based upon 3 reviews per paper. The system we use to help manage the process (START V2) by default simply averages the overall scores assigned to a paper. This though, suffers from the same problem as the star ratings and should probably be changed to use the harmonic mean. Fortunately in the case of my workshop while the order of the papers did change it didn't make any difference to which papers I accepted, but something to remember for the future.