I’m not Laughing Either

When I first started watching college basketball, I found the statistics that were kept fairly useless, which is only natural for someone coming from a baseball background. (As a fan, the only sports I know I’m able to perform at better than the average person my age are hockey and swimming [I’m also terrible at softball].) Ken Pomeroy and John Gasaway, both now writers at Basketball Prospectus, introduced me to tempo-neutral statistics. Traditionally, basketball players are measured by the number of events per game: points per game, rebounds per game, etc. Tempo-neutral statistics use possessions instead of games as the denominator: points per possession, rebounds per possession, etc. This is a major improvement in perspective because not all teams play the same style. A fast-tempo team like Texas is currently averaging 75.4 possessions per 40 minutes on the floor; its offensive philosophy can be inferred to be that the first good shot available should be taken. A slow-tempo team like (prototypically) Air Force averages 60.1 possessions per 40 minutes. Such teams attempt to eat clock by moving the ball around until the shot clock starts to run out, while denying their opponents high-percentage shot opportunities on defense.

A player on Air Force could theoretically be the best shooter in the country and still not make the points-per-game leaderboards, which means that PPG statistics aren’t capturing the information you would value when evaluating top shooters. Tempo-free shooting statistics would reveal his prowess.
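To make the difference concrete, here is a minimal Python sketch that compares the two denominators. The two players and their point totals are invented for illustration; only the tempo figures (75.4 and 60.1 possessions per 40 minutes) come from the numbers above, and the sketch assumes each player is on the floor for all 40 minutes of every game.

```python
from dataclasses import dataclass


@dataclass
class ScoringLine:
    """One player's season totals (all values here are hypothetical)."""
    name: str
    points: float
    games: int
    possessions: float  # team possessions while the player was on the floor

    def points_per_game(self) -> float:
        # Traditional rate: the denominator is games played.
        return self.points / self.games

    def points_per_100_possessions(self) -> float:
        # Tempo-neutral rate: the denominator is scoring opportunities.
        return 100 * self.points / self.possessions


GAMES = 30  # assumed season length for the illustration

# Assumption: both players play all 40 minutes, so possessions per game
# equals the team tempo quoted in the text.
fast_tempo_player = ScoringLine("fast-tempo scorer", 600, GAMES, GAMES * 75.4)
slow_tempo_player = ScoringLine("slow-tempo scorer", 540, GAMES, GAMES * 60.1)

if __name__ == "__main__":
    for p in (fast_tempo_player, slow_tempo_player):
        print(f"{p.name}: {p.points_per_game():.1f} PPG, "
              f"{p.points_per_100_possessions():.1f} points per 100 possessions")
```

With these made-up totals the fast-tempo scorer leads in points per game, while the slow-tempo scorer leads once possessions are the denominator, which is exactly the reversal a per-game leaderboard hides.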

The point is that in data analysis, it’s important to neutralize contextual factors when possible. This article is an egregious example of failure to perform even the most obvious context neutralization, i.e., dividing the thing you’re counting by some other number, so that the resulting rate captures the facts you want to understand.

The article reports on a quick and dirty corpus analysis of the token ‘(laughter)’ in White House presser transcripts to estimate how receptive the press corps is to the White House press secretary. The formula apparently chosen is laugh_count / days. I don’t listen to politicians and their flacks any more frequently than they read the laws they pass, so I don’t know whether the press secretary speaks with the press for the same amount of time every day, but I find it unlikely in the extreme, especially when the article quotes a Washington Times correspondent as saying, “Robert’s little digs and evasions have lost their power to amuse — particularly since we haven’t had a presser since July.”

If true (and it can’t be), that’d be like measuring a basketball player not by points per possession, or points per game, but by points per week. Some weeks his or her team doesn’t play, but that doesn’t mean he or she is missing shots.
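As a rough sketch of what a better-normalized count could look like, the Python below computes the same laugh count against three denominators: calendar days, number of briefings, and words spoken. This is not the article’s code, which I haven’t seen. The transcript strings are invented, the function names are my own, and the pattern assumes the token appears as ‘(laughter)’ or ‘(Laughter.)’.

```python
import re

# Assumption: laughter is marked as "(laughter)" or "(Laughter.)".
LAUGH_PATTERN = re.compile(r"\(laughter\.?\)", re.IGNORECASE)


def count_laughs(transcript: str) -> int:
    """Count laughter markers in one transcript."""
    return len(LAUGH_PATTERN.findall(transcript))


def count_words(transcript: str) -> int:
    """Count whitespace-separated words, excluding the laughter markers."""
    return len(LAUGH_PATTERN.sub(" ", transcript).split())


def laugh_rates(transcripts: list[str], days: int) -> dict[str, float]:
    """Return the same laugh count divided by three different denominators."""
    if not transcripts or days <= 0:
        raise ValueError("need at least one transcript and a positive day count")
    laughs = sum(count_laughs(t) for t in transcripts)
    words = sum(count_words(t) for t in transcripts)
    if words == 0:
        raise ValueError("transcripts contain no words")
    return {
        "per_day": laughs / days,                 # the rate the article appears to use
        "per_briefing": laughs / len(transcripts),
        "per_1000_words": 1000 * laughs / words,  # closest analogue of per-possession
    }


# Hypothetical transcripts, invented for illustration only.
sample_transcripts = [
    "MR. SMITH: Good afternoon. (Laughter.) Next question.",
    "Q: Any comment? (laughter) No. (laughter)",
]

if __name__ == "__main__":
    # Two briefings spread over an assumed 30-day window.
    print(laugh_rates(sample_transcripts, days=30))
```

The per-day figure falls whenever briefings are cancelled, even if the room laughs just as often when one is held; the per-briefing and per-word rates do not have that problem.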

The moral of the story is that if you’re going to do a corpus analysis, even if it’s for a silly piece like that, you have to count the right things, plug them into the right formula, and report the result accurately. Presenting your source code is always a great idea.
