I’d like to preface this by saying that I am a big fan of Anne Thompson and Kris Tapley’s Oscar-race podcast. It has just the sort of insider-knowledge-pitched-slightly-over-my-head vibe that I like in conversational podcasting. The general respect and affection in their relationship is given spice by just the right amount of occasional needle and crossness. I like Anne hitting the table (at least I assume that’s what she’s doing). And most of the time it seems to me to have just the right balance on the question of whether taking the Oscars seriously is silly or not.
But I have to take exception to what they say about statistical approaches to predicting Oscar outcomes about seven minutes into their post-Oscar post-mortem. After noting, and dismissing, the predictions at Fivethirtyeight.com, they have the following exchange:
Anne: He got a lot of his predictions wrong because it was a very crude system he was using.
Kris: There’s no way to Nate Silver this kind of thing.
Anne: Exactly — you have to have a little bit of knowledge, experience, intuition — [to] see the movies, talk to people, you know — what we do for a living is required.
The evidence this year, though, suggests that there are ways to Nate Silver this kind of thing — that is, to come up with a good prediction based simply on the data available and statistical models based on past races. Let’s compare the results from the “Gurus o’ Gold”, a college of 14 Oscar predictors to which Anne and Kris belong, with the results from a statistical model put together by Ben Zauzmer, a student at Harvard.
Ben used his statistics to predict the results of 21 of the 24 races. He got 4 wrong. If you look at the aggregate results for the gurus in the same 21 races, they got 5 wrong. Looking at the gurus individually, I count 4 who did better than Ben on this subset (including Anne) and 8 who did worse (including Kris).
If you want to make Zauzmer’s stats look worse, then look at the whole field of 24 awards. Ben didn’t make predictions in the documentary short, live-action short, and animated short categories because he doesn’t think the data are strong enough. If you count that failure to engage as getting the results wrong, Ben gets 7 mistakes out of 24; the gurus have 5 out of 24. But look at the gurus individually and 6 did better than Ben (including Anne and Kris) and 6 did worse. So even on the less charitable interpretation of what he achieved, he’s right in the middle of the pack.
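For concreteness, here is a quick sketch of how the two scoring conventions shake out as accuracy rates. It uses only the totals quoted above; the per-guru breakdowns aren’t modeled, and the variable names are just mine:

```python
# Illustrative tally of the two scoring conventions described above,
# using only the headline numbers from the post.

def accuracy(correct, total):
    """Fraction of races called correctly."""
    return correct / total

# Convention 1: score only the 21 races Ben Zauzmer actually called.
ben_entered = 21
ben_wrong = 4
ben_acc_entered = accuracy(ben_entered - ben_wrong, ben_entered)  # 17/21 ≈ 0.81

# Convention 2: count all 24 races, treating his 3 no-calls as misses.
all_races = 24
ben_wrong_all = ben_wrong + 3  # 7 misses out of 24
ben_acc_all = accuracy(all_races - ben_wrong_all, all_races)      # 17/24 ≈ 0.71

# The gurus' aggregate result: 5 wrong out of 24.
gurus_acc = accuracy(all_races - 5, all_races)                    # 19/24 ≈ 0.79

print(f"Ben (races he entered): {ben_acc_entered:.0%}")
print(f"Ben (all 24 races):     {ben_acc_all:.0%}")
print(f"Gurus aggregate:        {gurus_acc:.0%}")
```

Either way you count, the gap is a couple of races, not a chasm — which is the point.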
If by “Nate Silver-ing” you mean calling every race accurately then no, you can’t Nate Silver the Oscars, or at least no one has managed it yet. But the idea that you need to have a lot of insight or insider knowledge to do as well as the people who are best at it doesn’t seem to wash. An outsider with data and stats can, it seems, do as good a job as reporters doing it for a living.
By pointing this out, though, I do not for a moment mean to suggest that Anne and Kris should pack up shop. The results of a race matter, for sure — but so does, like, the race. Things being overtaken, leads stretching out, resources being squandered or carefully husbanded — that’s what’s fun to watch. And in this case, for me, there’s a bonus in the insights into what matters to film people and what is seen, and not seen, as working. Not to mention gossip. The stats don’t give narrative or context or tangential insights, and that’s what interests me, much more than the final results. I will be listening to Anne and Kris again next year. But that doesn’t mean that, on the home straight, stats aren’t as good as most gurus and better than quite a few — and the gurus might get better if they acknowledged that.