There was once a day when movie-goers would pick up their newspaper or turn on the television to hear what a movie critic had to say about the latest flick. This was a time when "two thumbs up" was gospel and anyone who cared about cinema adhered to its word. Those days are over. The internet now puts boat-loads of information only a mouse click away. Thanks to the internet, Ebert is not the only voice we hear. Now, if we want to, we can also find out how Linda Cook felt (she's from the Quad City Times in Davenport, IA).
Yes, the internet provides us with unlimited access to information, but as any statistician knows, humans are not good at making sense of massive quantities of information unless they are summarized in a clear, succinct manner. For that we now have movie rating web sites. Specifically, what I think of as the "big three": Rotten Tomatoes, Metacritic, and IMDB.com. Each of these sites takes a different approach to aggregating (summarizing) movie reviews in an attempt to provide the viewer with a more reliable estimate of a movie's quality than any single review can offer.
I realize that even though many people use these services, they might not be aware of how the numbers are crunched or whether the numbers are crunched in the best way possible. So, armed with some determination and a slightly above average knowledge of elementary statistics, I have taken it upon myself to explain the rating system of each of these sites and to illustrate their strengths and weaknesses. I believe this is necessary information for any interested movie-goer who wants to make the most informed decision when choosing how to spend their hard-earned dollars at the local cineplex.
One Assumption, One Principle
Before we critique any rating system we have to make (at least) one assumption and understand one fundamental principle of statistics. First, the assumption. In order to judge the merit of a rating system you have to believe that there is in fact a "true" quality to be rated (i.e., you have to believe that movie quality is real and that it can be approximated). If you do not believe this assumption can be met, you may not want to read any further. But then again, if you do not believe this, you are unlikely to take anyone else's advice when considering watching a film.
And now the principle. The principle of aggregation says that when you are trying to measure something, it is best to take multiple measurements and average them to get a more reliable estimate. Aggregation works because averaging across measurements cancels out the random error in the individual measurements. For example, let's say that I want to estimate the distance between one building and another, but all I have to measure the distance is a 12-inch ruler. I could estimate the distance by carefully placing the ruler end over end. This would result in a single measurement with some degree of error. However, if I asked ten of my friends to do this and averaged their measurements, the process would cancel out some of their error. Some people would overestimate the length, others would underestimate it, and assuming the over- or underestimation was random across people, this process would work quite well (though it would be more accurate if it were about one hundred friends). Movie rating web sites take multiple measurements (reviews) and aggregate them in different fashions to try to get "purer" estimates of a movie's quality. Whether each site actually achieves this goal is up for debate, and settling that debate is the purpose of this series.
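To see the principle in action, here is a minimal sketch in Python (the distance, error size, and group sizes are all invented for illustration): the average of many error-prone measurements tends to land closer to the true value than any single measurement does.

```python
import random

# The principle of aggregation: each "friend" measures a 100-foot
# distance with a 12-inch ruler, making a random error of up to
# +/- 5 feet. Averaging across friends cancels out much of that error.
# (All numbers here are invented for illustration.)
random.seed(42)

TRUE_DISTANCE = 100.0  # feet

def one_measurement():
    """A single ruler-based estimate with random over/underestimation."""
    return TRUE_DISTANCE + random.uniform(-5, 5)

for n_friends in (1, 10, 100):
    estimates = [one_measurement() for _ in range(n_friends)]
    average = sum(estimates) / n_friends
    print(f"{n_friends:>3} friend(s): average = {average:6.2f} ft, "
          f"error = {abs(average - TRUE_DISTANCE):.2f} ft")
```

Run it a few times with different seeds and the pattern holds: the one-friend estimate bounces around, while the hundred-friend average stays close to the true distance.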
I plan this to be a three-part series. Part 1 (this article) lays the foundation for the series and explains the most basic of the big three, IMDB.com; Part 2 concerns the "Tomatometer" from rottentomatoes.com; and Part 3 concerns the rating system used by Metacritic.com. Statistics are often better when visual, so whenever possible I will include graphs and figures to make my points. And as is always the case, dissent and constructive criticism are welcome.
Rating System 1: IMDB.com
The Internet Movie Database has the most democratic movie rating philosophy of the big three: it lets the viewers decide. For any given movie you can cast your vote on a scale from 1 ("Awful") to 10 ("Excellent"). Votes are then averaged across raters to yield an overall score for each film. The cream-of-the-crop movies make it into the "IMDB Top 250" whereas the real stinkers are relegated to the "Bottom 100" (for a recent example see this article).
Power to the people: I like that IMDB gives some say to the actual viewers. Other sites often adopt the snobbish attitude that critics are the only people qualified to have an opinion about a film. I wholeheartedly disagree with this sentiment. It is one of the main reasons that I like to post reviews on Newsvine. A site like Newsvine allows me not only to post my opinions about a film but also to discuss them and hear the opinions of others. As a movie lover I could not ask for much more. Please do not take this to mean that I think IMDB.com has gotten it completely right; they certainly have not. To understand why, see my comments about sampling and selection in the weaknesses section.
Votes broken down by demographics: Are you a male aged 30-44? You might be interested to know that while The Starter Wife has an overall IMDB rating of 7.5, you are more likely to think of it as about a 5.4, and if your wife is near the same age she is likely to rate it a 7.6. IMDB lets you see ratings broken down by age, gender, top 1000 reviewers, and US/non-US viewers. This kind of information can come in handy in a pinch (for a visual example of this breakdown, see the figure above).
Weighted averages: IMDB does not publish raw average scores; instead they weight the scores so that some votes count more than others. They also place filters on votes to avoid Rick Astley-style "vote stuffing." The problem is that they do not disclose how they weight the scores, saying only: "The exact methods we use will not be disclosed. This should ensure that the policy remains effective. The result is a more accurate vote average." Most modern statisticians will tell you that weighting does not accomplish much once the number of votes is large enough. It is just a method that seems like it should work, so people continue to do it.
They do offer "A true Bayesian estimate" for the top titles using the following formula:
weighted rating = (v / (v + m)) × R + (m / (v + m)) × C

where v is the number of votes for the movie, m is the minimum number of votes required, R is the movie's average rating, and C is the mean vote across the whole report.
They also note that only "top reviewers" are considered for the top movies.
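To make the arithmetic concrete, here is a short Python sketch of that formula. The vote counts, vote minimum, and overall mean below are hypothetical, since IMDB does not disclose its current parameters:

```python
def weighted_rating(avg_rating, num_votes, min_votes, overall_mean):
    """IMDB's published "true Bayesian estimate": a weighted average of
    the movie's own rating and the overall mean, where the movie's own
    rating counts for more as it accumulates votes."""
    v, m = num_votes, min_votes
    return (v / (v + m)) * avg_rating + (m / (v + m)) * overall_mean

# Hypothetical numbers for illustration only: a movie averaging 8.5
# against an overall mean of 6.9, with a 3,000-vote minimum.
print(weighted_rating(8.5, 1000, 3000, 6.9))    # few votes: ~7.3, pulled toward the mean
print(weighted_rating(8.5, 500000, 3000, 6.9))  # many votes: ~8.49, near its own average
```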
So, basically, all else being held constant, the more votes a movie gets, the closer its weighted rating sits to its own raw average and the less it is pulled toward the overall mean; for a well-liked movie, that means more votes produce a higher score. This is supposed to be a less biased estimate than the raw scores, and it probably is. But I would argue it is still too biased. Read on to find out why.
Sampling: As I said before, to know how an average person from the population feels about a movie you need to sample (randomly select) people from that population. This is pretty much impossible given the way that IMDB collects votes. IMDB's votes come from people who (1) presumably saw the movie and (2) took the time to rate it. Not everyone in the population is equally likely to see any given film, nor are they equally likely to go to IMDB and rate it. Therefore an average on IMDB will always be biased in a systematic fashion (i.e., it will be further from the "true" quality rating than we would find ideal).
Selection: As I said, there are selection effects for certain movies and for the people who rate them. Basically, what tends to happen on IMDB is that rating distributions are either unimodal (most commonly) or bimodal, and the most common rating for any film is either a "10" or a "1". In other words, when you take the time to rate a movie, it is normally because you either loved it or hated it. This explains a few things. First, it explains why The Dark Knight is currently #4 on IMDB's top 250 movies (see the figure to understand why). Second, it explains why Roman Holiday and Pirates of the Caribbean: The Curse of the Black Pearl have identical weighted averages of 8.0. Despite attempts to avoid "vote stuffing" and incorporate "top reviewers," all of the distributions end up horribly skewed (again, see the figure).
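A toy simulation can show how strong this distortion can be. In the Python sketch below (all numbers invented for illustration), everyone who watches a movie forms an opinion, but only viewers with extreme opinions reliably bother to vote:

```python
import random

# Toy model of self-selection (all numbers invented for illustration).
# Every viewer forms an opinion on a 1-10 scale, but viewers with
# extreme opinions are far more likely to actually cast a vote.
random.seed(1)

opinions = [min(10, max(1, round(random.gauss(6.5, 1.8))))
            for _ in range(100_000)]

def casts_a_vote(opinion):
    """Lovers and haters vote 90% of the time; everyone else, 5%."""
    prob = 0.9 if opinion <= 2 or opinion >= 9 else 0.05
    return random.random() < prob

votes = [o for o in opinions if casts_a_vote(o)]

print(f"mean opinion of all viewers: {sum(opinions) / len(opinions):.2f}")
print(f"mean of self-selected votes: {sum(votes) / len(votes):.2f}")
```

In this made-up case the voted average lands well above the population mean simply because the enthusiastic extreme outnumbers the hostile one. The point is not the direction of the shift but that an average of self-selected votes need not resemble the audience at large.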
My final conclusion is that IMDB.com is fine if you are interested in the average rating given by people from your own demographic (who take the time to rate movies). Other, critics-only web sites may not feature the voice of the everyman (or woman), but they do have better statistical methods. So if you are going quick and dirty, IMDB might be your best option; if you have a little more time, you might want to check out the sites I will critique in parts 2 and 3 of this series.
Coming Soon: Part 2, Rottentomatoes.com's Tomatometer
Note: All IMDB statistics current as of 10/23/08