A Statistics Primer From Coug-A-Sutra

Brian's note: We're switching places with WSU Football Blog (at least for a post). For this morning, I'm over there answering three questions from Coug-A-Sutra. He's got the return trip over here and was given the freedom to do whatever he pleases. So strap in, y'all.

Hello CougCenter Followers. Hope you all are having a fantastic week!

As some of you know, my name is the Coug-A-Sutra and I am one of the original writers on the WSU Football Blog. A few weeks back, Brian Floyd and Sean Hawkins (founder of the WSU Football Blog) had a conversation about having a blog-exchange, with the idea that each site would write a post about the positives and negatives of sports statistics and then publish it on the other site (WSU Football Blog at CougCenter and vice versey).

Thankfully for us, we had just started a series called "Three Questions"--a three question interview series with notables involved in and around Cougar and Pac-12 sports. And, because Brian has become the Super-Star-In-Rent-A-Car for the mighty SBNation....

...he fit in perfectly with what we were already doing -- namely, trying to bring credibility to the universe of idiocy I create regularly on the WSUFB blog.

Unfortunately, the same cannot be said for our visit to these illustrious pages. I mean, not since last fall's visit by Jim Moore has CougCenter's reputation been threatened by a guest post. And for this reason, my deepest apologies to the CC editors, as well as you, the millions of readers of this site, that I was the one "elected" to serve as guest blogger...

So, with all of those disclaimers in mind, click on the jump to see how the WSUFB blog views sports statistics as they relate to Cougar Football.

CougCenter followers, those of you who are familiar with the WSU Football Blog know that I am a man of many, many mysteries. Unfortunately for me and the thousands of our readers worldwide, many of those mysteries do not include the ever-important "G." And when I am talking about the "G," I am not speaking about "THE G"...

Instead, I am talking about the "G Factor"--as in, that powerful, latent variable known as "General Intelligence." So, while I tend to rule the world with respect to my spiritual qualities, I also tend to operate without several important cognitive faculties. For this reason, instead of rambling on about something that I don't know much about--check that, something that I know NOTHING about--I thought it timely to turn the discussion over to my alter-ego, the Cougla Khan. In case you don't know him, the Khan is someone who carries with him several important Delusions of Grandeur. Chief among them is the fascinating idea that he works by day as a university professor. So, without further ado, take it away, Mr. Khan.

Thank you, Sutra, and Hello Cougar Nation! Today, I am here to give you some blinding insight into how it is that I view statistics and Cougar sports. But before I get to that, I think it's important to offer a couple of "starter" points.

First off, I want you all to know that, contrary to popular opinion, I am NOT a statistics hater. In fact, most of the research I conduct as a part of my duties as a tenure-track faculty member involves the use of statistics. So, where the universe of statistics is concerned, please know that I'm generally a lover and not a fighter...

Second, I encourage all of you--in case you don't already--to run far, far away from the Sith Lords out there who speak in absolutes.

Meaning, just because I tend to avoid the use of fancy-dancy numbers to evaluate my sports teams doesn't mean that those numbers are without utility. In fact, there are several instances where I find sports statistics to be interesting and highly useful. Mark Sandritter's recent post, for one, provides a classic example of how numbers can be used thoughtfully and eloquently to describe or explain important sports-related phenomena.

Of course, there are other great examples from this site which show how statistics can be used in a thoughtful, effective, and appropriate fashion.

Third, and relatedly, I encourage all of you to run away from any notion you may have that the terms "research" and "statistics" automatically lead to concrete notions or "facts." This is not to say that formal research is not generally sound or important, because in most cases, and in most disciplines, it is. However, where stats are concerned, it's important to remain mindful of a few key issues facing the current scientific/statistical enterprise. Namely:

(1) Any sound methodologist in the social sciences can pick a random sample of studies from the research literature and quickly find examples of published studies that are seriously flawed if not flat-out wrong. The reason? People are oftentimes very sloppy with their methods, undertrained, or both--resulting in poor research and/or peer review. The result: bad studies make it through peer review all the time--even in some of the most reputable journals!

(2) "Methodology" (i.e., the study of methods) is an on-going social science enterprise in its own right. There are literally hundreds of scholarly journals out there which deal only with advances in statistics. And, as a part of that "field," there are numerous sub-fields and competing disciplines, many of which compete against each other for legitimacy.

In fact, I was at a conference a few years back where some of the leading statisticians in sociology, education, psychology, and economics literally came to blows over whether key parameters in certain statistical models should be fixed or allowed to vary. It was crazy!

Fourth, most statistics used in sports are derived from an "econometric" tradition rooted in--SURPRISE--economics. This tradition, while incredibly useful in policy research, is not my area of interest or "expertise." Instead, my focus tends to be on latent variable modeling, with particular attention toward more complex, latent social-psychology processes and theories. The models that I work with tend to examine phenomena as they relate to population heterogeneity. So, in practical (and oversimplified) terms, while an economist may question whether a social or educational policy "works" for the population at large, my research would examine some of the more nuanced aspects of what sub-populations that policy may serve, who it may work the best for, under what conditions, when, and why.

The implication of all of this is two-fold: (1) There are key assumptions made in a lot of sports-related metrics which contrast with those of my own statistical field; and (2) Statistics, like other forms of social science inquiry, is something that should be viewed as "evolving." That is, statistics, just like other forms of science, is consistently subject to rigorous testing and falsification. And because of that, the statistics we use today may be found to be flat-out "wrong" for certain applications and/or settings tomorrow. It's just the way it is. And certainly, sports metrics should be viewed no differently.


Okay, so now that I've rambled and rambled on like Sutra for about 4.5 hours and counting, let's get to the real meat of today's post re: "the positives and negatives to using sports statistics to analyze Cougar Football."

Generally, I don't like to use most fancy sports metrics to unpack the fortunes of our football or basketball teams. Part of the reason for that stance is that after spending 8 to 12 hours a day--7 days a week--either in the research literature or crunching my own numbers, I'm usually burned out on "real" research. So, when it comes to my hobby--Cougar Sports--the last thing I want to do is get bogged down with the day-to-day details of "rigorous scientific analysis." But, of course, that all depends on the day, the article, and my mood. So, to answer the question above, here are a few key things that generally go through my head when evaluating sports metrics and their potential utility for OUR team.


To begin, whenever I give a research presentation to a public audience, the first questions I receive usually center around the research's reliability or validity. And when people ask those questions, they're often referring to the reliability and/or validity of the measures I use in my research. To clarify terms, "reliability" refers to whether the instrument or metric is measuring "something" consistently (measure the same thing twice and you should get about the same answer). And "validity" pertains to the extent to which the construct of interest is actually being captured by the instrument/measure/metric in use (as opposed to some other construct or "noise").

Thankfully, because most sports metrics represent off-shoots of observed variables (as opposed to more latent measures, such as favorite CougCenter terms like "heart"), conventional notions of reliability and validity are not really in question when it comes to sports-related metrics. But two other considerations remain pretty important:

The first of these issues relates to the research principle of "parsimony." Essentially, the term "parsimony" is used in statistics to remind researchers that the simplest model is always the best model, all else equal. So, for instance, a researcher should never use the BMI to predict health outcomes if standard measures of height and/or weight offer as much, if not more, explanatory power.

Of course, the way that researchers determine which models and metrics are the most "parsimonious" is by running multivariate regression models, the results of which are seldom published on sports sites. And, since (a) I don't tend to read those sites very often; and (b) when I do, I often can't see how it is that they arrive at their conclusions, I remain skeptical that the fancy metrics used on these sites are inherently (i.e., scientifically or practically) better than the more bare-bones statistics that are typically provided on the tee-vee.
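For the curious, here is one rough way to put a number on parsimony. This is a minimal sketch in Python, assuming adjusted R-squared as the yardstick. That is my pick for illustration; it is only one of several ways to penalize extra predictors, and not necessarily what any sports site uses. The season length and R-squared values are made up.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a regression with n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    # Each extra predictor shrinks the denominator, so it must earn its keep.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers: a 12-game season, one bare-bones stat vs. three fancy ones.
simple = adjusted_r2(r2=0.50, n=12, k=1)
fancy = adjusted_r2(r2=0.52, n=12, k=3)

print(f"simple model: {simple:.2f}")
print(f"fancy model:  {fancy:.2f}")
```

With these invented inputs, the fancy model's slightly higher raw R-squared does not survive the penalty for its extra predictors (roughly 0.34 adjusted versus 0.45), which is the "all else equal, keep it simple" idea in miniature.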

Predictive Validity

Some folks get jazzed about measuring stuff just to measure stuff. I, on the other hand, am not one of those cats. In fact, when it comes time for me to visit a blog for my daily sports fix, I usually am interested in finding out what information is out there that will help me predict whether we are going to win or lose each Saturday. In this respect, I tend to be much more interested in measures or metrics that are predictive than in ones that are simply descriptive.

For example, though I have periodic moments of sexiness, I am for the most part a guy who struggles with chronic bouts of chubbiness. In fact, based on the technical definition of some clinicians, I might even be considered "fat." But, because I am dumb and married, my fatness means little to me other than bringing me a certain level of personal embarrassment when it comes to taking my shirt off during the summer and/or whilst on vacation.

Of course, the significance of my fatness would take on a whole different meaning if I was informed by my doctor that my "chubbiness" involved elevated risk for long-term health difficulties. And for me, this is when close attention to my weight would become especially important. Because, to the extent that my body weight is significantly associated with certain health outcomes, then the measurement of weight would be said to carry predictive qualities.

Anyhow, as a part of evaluating the predictive quality of any metric, there are several other indices or elements which I consider in determining their significance. Some of these are:

1. Effect size. While a significant "p-value" tells me whether or not there is a statistically significant relationship between a predictor variable and an outcome, that stat alone doesn't let me know the strength or magnitude of the association. So, I always look to see how large the effect size of each coefficient in a statistical model is. So, if every 20 pounds of weight (given height) increases my risk of heart attack by 5%, ceteris paribus (all else equal), I want to know that. After all, that 5% risk isn't trivial for me and my family, especially if it puts me 6 feet under before my time. At the same time, a 5% increase isn't all that large either, right?

2. Proportion of Variance Explained. In statistics, the coefficient of determination (R-squared) is a statistic that lets the researcher know the percentage of variance explained in Y (the outcome) by the predictor (X). This statistic is used by researchers in a bivariate setting (when there are two variables) and it is also used to capture the TOTAL variance explained in a multivariate setting or model. Take a simple correlation coefficient as an example. Let's say that there is a .6 correlation (which is high for most social science data) between (adjusted) offensive efficiency in the first half of a football game and the score at the end of the first half. When I square that correlation coefficient, I get the proportion of variance shared by those two variables, which you math majors know is 36%. This means that 36% of the variance in first half score is explained by offensive efficiency. Although this percentage would be considered solid by social science standards, it is also important to recognize that 64% of the variance--THE MAJORITY OF THE VARIANCE--is explained by variables not included in that test! Unfortunately, very seldom do I ever see a report of the R-squared between a football or basketball related metric and outcome, let alone reports of variations in the R-squared during "model building."

3. Omitted Variable Bias. There is no possible way to control for the world in any statistical model. However, a key challenge for all researchers is to control for those variables which are the most likely to confound the relationship between predictor and outcome. Unfortunately, there are a host of variables germane to football which are difficult, if not impossible, to capture, due to the situational and contextual qualities of the game. Mike Leach talks a bit about that issue and dynamic here. Related to this point is that because key intangibles are often not captured statistically in our metrics, there is always the danger that those metrics are selectively picking up the variance of something else--or a host of other "something elses"--when they are examined in a multivariate setting or context. Ultimately, this makes me concerned that any established relationship between these metrics and game-related outcomes might, in fact, be spurious.

4. Number of Data points. Arguably, the whole enterprise of sports statistics gained its currency through Moneyball and Sabermetrics. Without getting too deep into those things, what made this stuff so great, at least in my mind, was that the stats advanced in Moneyball actually predicted key outcomes over the course of a season. Meaning, if you got the right set of players and played the game a certain way, OVER THE COURSE OF A 162 GAME SEASON, you could expect, on average, to see certain kinds of results. So, in the case of the A's, if you had a pitching staff that allowed X number of runs per game, and played a certain way with a certain group of players, then you could expect, OVER THE COURSE OF A 162 GAME SEASON, that you would see certain amounts of runs scored which could lead you, theoretically, to a certain number of wins. What is important here is to note the number of games played and the number of events (at-bats) nested within each of these games. As a consequence, important variations related to performance against good or bad competition, performance in day versus night games, games played in heat or cold, games played against right and left-handed pitching, at-bats with runners on base, in scoring position, and so forth, all generally even out.

In contrast, during the course of a football season, there simply are not enough data points (or games) for (a) all of those mitigating factors to even themselves out; and/or (b) the notion of what is "average" to have practical value. The result again: many metrics with highly questionable predictive quality or validity. At least in my book. My very small book.

5. Nested Dependencies. One of the ways that sports statisticians interested in College Football account for the lack of needed data points over the course of a season is to increase the (sample) size of their data sets to include teams across all conferences. Ultimately, this approach has implications for one of the primary assumptions of Ordinary Least Squares regression--namely, the assumption that the error terms (residuals) in each model are independent of one another. Unfortunately, in sports, and college sports in particular, most data violate that assumption. Here, a good way to think about the nature of data-related nested dependencies is to consider how much offensive and defensive numbers are "nested" within each conference. That is, one of the possible reasons why SEC defensive numbers are so spectacular is not only because their defenses rock, but also because their offenses are maybe lacking in a certain something. Of course, the same is true for offensive numbers in the Pac-12. We usually rack up the offensive numbers because our defenses aren't big and fast enough to keep up. Ultimately, the practical result of these dynamics and issues is that all of these statistics probably cluster according to conference. And, if these sports econometric types are true to their economist brethren, they'll fix those variables when examining their predictive qualities, resulting in an understanding of the "average" effect of those metrics across conferences. But, given that all of our teams play in conference during the bulk of the season, and given the variations in play between AND within conferences, it's probably more important to understand how those statistics might vary as a function of conference (i.e., this implies an examination both of cross-level interactions as well as issues related to multi-level factorial invariance).
Ultimately, until such time that I see those issues substantively and openly addressed (and they might have been, I just haven't seen them), I'll remain skeptical of their relevance and utility.
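A couple of the points above boil down to simple arithmetic. Here is a minimal sketch in Python using only textbook formulas. The function names are mine, the 12-game football season is an assumed figure, and the standard-error formula treats games as independent draws (which, per point 5, real games probably are not).

```python
import math


def variance_explained(r: float) -> float:
    """Share of variance in the outcome explained by one predictor (r squared)."""
    return r * r


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a per-game average computed from n independent games."""
    return sd / math.sqrt(n)


# Point 2: a .6 correlation leaves most of the variance unexplained.
explained = variance_explained(0.6)
print(f"explained: {explained:.0%}, unexplained: {1 - explained:.0%}")

# Point 4: same game-to-game spread (sd = 1, arbitrary units), different season lengths.
football = standard_error_of_mean(1.0, 12)   # assumed 12-game season
baseball = standard_error_of_mean(1.0, 162)  # the 162-game season from the text
print(f"a 12-game average is about {football / baseball:.1f}x noisier than a 162-game one")
```

The second comparison depends only on the ratio of season lengths, so the arbitrary choice of sd = 1 does not affect it.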

In sum, there are lots of good reasons to use statistics to better understand the processes and nuances associated with Cougar Football and its players. But, there are also good reasons to be skeptical about the ultimate utility of those statistics, especially regarding their predictive properties.

Personally, I am glad that there are sites in the Coug Blog-O-Sphere like CougCenter that, notwithstanding this post, allow me to take a cool glimpse at numbers in a way that informs me without taking too much time or energy.

At the same time, hopefully this post will help you understand why the WSU Football Blog generally tends to favor a more limited use of some of those fancy metrics.

Thanks again to the CougCenter Editors for having us participate in this great exchange. And congrats to all writers and members of the CougCenter community for all of your tremendous successes!!!!!!

Now, let's go win some games this fall, shall we?!!!!!!