Phil Roth

How I Learned to Stop Worrying and Love Rotten Tomatoes

Maybe not love it… but use it.

It’s a Friday night and my wife and I are searching for a movie we’re both interested in. I’ve loved using the Metacritic website for a long time, so I’m tapping away on the Movie Finder by Metacritic app on my iPhone. My wife, who doesn’t have brand loyalty when it comes to movie review aggregators, is tapping away on the Movies by Flixster, with Rotten Tomatoes app on her iPad. Every time, without fail, she gets a better result faster. It’s pretty obvious that her app is better. But I refuse to switch. I can’t support Rotten Tomatoes.

“Why not?” you may ask. Well, when you consider the algorithms the two sites use to calculate their final movie scores, Metacritic seems clearly superior. Rotten Tomatoes evaluates all the reviews it can find and classifies each one as either positive or negative. Its final score is simply the percentage of reviews that are positive. Metacritic converts every review to a score on a scale from 0 to 100. Its final score is a weighted average of those scores that gives more weight to the more influential reviewers.

To me, when Rotten Tomatoes reduces each review to either positive or negative, it discards a lot of information. Metacritic takes a more scientific approach that uses all the information available to it.
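A minimal Python sketch of the two aggregation methods makes the information loss concrete. It assumes every review has already been converted to a 0-100 score. The positive cutoff of 60 is a hypothetical stand-in for Rotten Tomatoes’ positive/negative classification, and the influence weights are a hypothetical stand-in for Metacritic’s weighting; the two sets of reviews are made up.

```python
# Hypothetical cutoff: a review scoring at least this much counts as positive.
POSITIVE_CUTOFF = 60


def tomatometer(scores):
    """Rotten Tomatoes-style score: percentage of reviews classified positive."""
    positive = sum(1 for s in scores if s >= POSITIVE_CUTOFF)
    return 100 * positive / len(scores)


def metascore(scores, weights):
    """Metacritic-style score: weighted average of 0-100 review scores."""
    total = sum(s * w for s, w in zip(scores, weights))
    return total / sum(weights)


# Illustrative reviews for two imaginary movies (not real data).
lukewarm = [60, 62, 65, 61, 63]
glowing = [95, 98, 100, 92, 97]
weights = [2.0, 1.0, 1.0, 1.5, 1.0]  # hypothetical influence weights

for name, scores in [("lukewarm", lukewarm), ("glowing", glowing)]:
    print(name, tomatometer(scores), round(metascore(scores, weights), 1))
```

Both imaginary movies get a Rotten Tomatoes-style score of 100, while the weighted averages come out near 62 and 96: the binary classification cannot tell a lukewarm consensus from a rave.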

So, how do the results of those two approaches compare? Well, I took a look:

[Plot: Rotten Tomatoes versus Metacritic]

In this plot, each dot represents a movie that has both a Rotten Tomatoes score and a Metacritic score. The dots turn red as more movies pile up on the same spot. A least squares fit is also plotted, and it shows a strong relationship between the two scores. A couple of things jump out. First, the Rotten Tomatoes scores span the whole range from 0 to 100, while very few Metacritic scores drop below 10 or rise above 90. This makes sense: a truly bad movie has a good chance of getting no positive reviews at all, but it is almost impossible for every critic to give it the lowest possible score.
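The post doesn’t specify the form of the least squares fit. Because its slope is described as flatter around 50%, a low-degree polynomial is one plausible choice. Here is a minimal pure-Python sketch, assuming a polynomial fit of the Metacritic score on the Rotten Tomatoes score solved through the normal equations; `polyfit`, `predict`, and `SCALE` are hypothetical names, and the degree is an assumption you would choose.

```python
# Scores run from 0 to 100; dividing by SCALE keeps the powers of x small,
# which keeps the normal equations well conditioned.
SCALE = 100.0


def polyfit(xs, ys, degree):
    """Least squares polynomial fit of ys on xs via the normal equations.

    Returns coefficients c such that y ~ sum(c[i] * (x / SCALE) ** i).
    """
    ts = [x / SCALE for x in xs]
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[r][j] = ts[r] ** j.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        done = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - done) / a[row][row]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at a Rotten Tomatoes score x."""
    t = x / SCALE
    return sum(c * t ** i for i, c in enumerate(coeffs))
```

With hypothetical lists `rt_scores` and `mc_scores`, `coeffs = polyfit(rt_scores, mc_scores, 3)` would give a cubic fit, and `predict(coeffs, 45)` the predicted Metacritic score for a movie at 45% on Rotten Tomatoes. For real work, `numpy.polyfit` does the same job more robustly.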

The flatter slope of the fit around 50% also makes sense given the two methods. Assume each movie has a hidden “actual” score that each review tries to measure, with some error. If that score is middling but ultimately disappointing (say, around 45%), then the Metacritic average of all the reviews will also land near that number. But if the reviews measure that 45% accurately, they could all be classified as negative, and the Rotten Tomatoes score would drop far below 45%. As the actual score rises past the middle, reviews quickly flip from negative to positive, so the Rotten Tomatoes score climbs rapidly. This explains why the Rotten Tomatoes scores sweep through the middle territory quickly, and why the fit, which predicts the Metacritic score from the Rotten Tomatoes score, is flatter there.
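The hidden-score argument can be checked with a small model. This sketch assumes (hypothetically) that each review equals the movie’s hidden score plus Gaussian noise with a standard deviation of 10, and that a review counts as positive at or above 50. Averaged over many reviews, the Metacritic score converges to the hidden score, while the expected Rotten Tomatoes score is the probability that a single review lands at or above the cutoff.

```python
import math

# Hypothetical model parameters: review noise and the positive/negative cutoff.
SIGMA = 10.0
CUTOFF = 50.0


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_metacritic(hidden):
    """The average of many noisy reviews converges to the hidden score."""
    return hidden


def expected_tomatometer(hidden):
    """Expected percentage of reviews that land at or above the cutoff."""
    return 100.0 * (1.0 - normal_cdf((CUTOFF - hidden) / SIGMA))


for hidden in (30, 40, 45, 50, 55, 60, 70):
    print(hidden, expected_metacritic(hidden), round(expected_tomatometer(hidden), 1))
```

Under these assumptions, a hidden score of 45 gives an expected Rotten Tomatoes score of about 31%, and near 50 the Rotten Tomatoes score climbs roughly 4 points per point of hidden score. Inverted, the Metacritic score moves only about a quarter of a point per Rotten Tomatoes point in the middle of the range, which is the flat section of the fit.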

The outliers are also interesting. Most of them are obscure movies with only a small number of reviews, but not all of them are. Here is a list of the ten movies whose actual Metacritic score is farthest from the fit prediction (the farthest is listed first):

Title                      Year  Rotten Tomatoes (%)  Metacritic
Extreme Days               2001  43                   17
Paa                        2009  60                   30
Half Baked                 1998  29                   16
I’m the One That I Want    2000  59                   81
To Save a Life             2010  33                   19
The Viral Factor           2012  56                   32
Screwed                    2000  13                   7
One Man’s Hero             1999  38                   24
Drop Dead Gorgeous         1999  45                   28
The Majestic               2001  42                   27
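Producing a list like this amounts to sorting movies by the absolute difference between their actual and predicted Metacritic scores. A minimal sketch, assuming hypothetical parallel lists of titles, actual scores, and fit predictions (the toy values below are made up):

```python
def top_outliers(titles, actual, predicted, k=10):
    """Return the k titles whose actual score is farthest from the prediction.

    Each entry is (title, residual), sorted by absolute residual, largest first.
    """
    rows = zip(titles, actual, predicted)
    ranked = sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)
    return [(title, a - p) for title, a, p in ranked[:k]]


# Toy example with made-up scores.
print(top_outliers(["A", "B", "C"], [50, 20, 70], [48, 35, 69], k=2))
# [('B', -15), ('A', 2)]
```

Keeping the sign of the residual shows which direction each outlier misses in: a negative value means the Metacritic score is lower than the fit predicts.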

Ultimately, what stands out is how closely related the two scores are. Good movies are generally good and bad movies are generally bad on both scales. Ninety percent of the Metacritic scores are within ±10 points of the fit prediction (that band is shown on the plot with the thinner dashed lines). When you’re measuring something as fuzzy as the critical response to a movie with a single number, that seems pretty good.
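The ±10 statistic is the fraction of absolute residuals that fall inside the band. Run in reverse, the band width that captures 90% of movies is the 90th percentile of the absolute residuals. A minimal sketch, assuming hypothetical parallel lists of actual and predicted Metacritic scores; the helper names are illustrative.

```python
import math


def fraction_within(actual, predicted, band=10.0):
    """Fraction of movies whose actual score lies within +/-band of the prediction."""
    inside = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= band)
    return inside / len(actual)


def band_for_fraction(actual, predicted, fraction=0.9):
    """Smallest band (from the observed residuals) covering at least `fraction` of movies."""
    residuals = sorted(abs(a - p) for a, p in zip(actual, predicted))
    index = math.ceil(fraction * len(residuals)) - 1
    return residuals[index]
```

The first function checks a fixed band like ±10; the second picks the band from the data, which is one way a plotted pair of dashed lines could be chosen.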

So I’ve finally been convinced to use the app with the better interface, even if it displays the Rotten Tomatoes scores. All it took to change my mind was a plot.