So, as part of my trying to finish my college degree over the years, I find myself taking ECON 690: Senior Seminar: Economic Inquiry and Analysis. I missed the first class, as I was away on a business trip, so in preparation for my first night of attendance, the professor suggested I read the syllabus and familiarize myself with Stata, the statistics software package used for data analysis in the class. In reading the syllabus, which is a treat these days compared to 30 years ago, I discovered there would be a heavy emphasis on econometric skills.

Panic set in. I mean it’s been 30 years since I did anything that approached econometrics really. What to do?

So, with the quick purchase of “Econometrics for Dummies” for the Kindle, I set out to re-familiarize myself with the ways econometricians do time series analysis. Pure joy.

It is so interesting to me what 30 years in the work-a-day world of software / machine learning will do to your views on the application of statistics by econometricians. It also helps to have read a little Karl Popper (The Logic of Scientific Discovery, among other titles) along the way. You look at the application of the methods involved and you start to have concerns. I mean like, do these methodologies actually make sense? Or, is this just a fancy exercise in quantifying historical behavior?

For those who may not precisely know what econometrics is, let me try to explain. In the application of this “science,” you essentially take a series of statistical samples at different points in time (called time-series data) and look for a longitudinal trend which you can then **slam** into a linear regression model. I use the word **slam** here, because to my knowledge people behave much more like atoms in the physical world. In essence, interactions should be modeled using the mathematical tools of non-linear systems.

The notion that if I model the past with linear regression, I can predict the future has always struck me as a bit odd, given the many complexities involved. There is also the fact that if I build a regression model using a large enough slice of time to be certain in my confidence levels (according to the classical models), it will certainly contain the behavior of people who have long since retired, or perhaps have even died. How can we predict any future behavior from these inputs when they no longer interact with our system?

I hinted at this with my professor a week ago. He said something about how being an empiricist requires lots of data. My sense was that he was ducking my actual question. I mean, we are talking about this notion that the more information we have, the more confident we are about the outcome. The econometrician’s common statistical methods are based on the steady augmentation of the confidence level, with the assumption that our understanding grows non-linearly with the number of observations (for an n-fold increase in our sample size, our knowledge increases by a factor of the square root of n).

To use a well-known example from probability textbooks: imagine drawing from a bucket containing a roughly equal number of white and black golf balls. Your confidence level about the relative proportion of white and black balls after 20 draws is not 2x the one you had after 10 draws. Nope. Following our rule, it is merely 1.41x, that is, sqrt(2). At 30 draws, the confidence level rises to 1.73x, or sqrt(3). To be twice as confident about the proportions as you were at 10 draws, you need 40 draws, since sqrt(4) = 2.
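For the code-minded, here is a minimal sketch of that square-root rule. It rests on an assumption of mine: that "confidence" can be read as the inverse of the standard error of the sample proportion, with independent draws and a true proportion of 0.5. The function names are made up for illustration.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion from n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)


def relative_confidence(n: int, base_n: int = 10, p: float = 0.5) -> float:
    """How many times more precise n draws are than base_n draws.

    Assumption: "confidence" is treated as the inverse of the standard
    error, so the ratio works out to sqrt(n / base_n).
    """
    return standard_error(p, base_n) / standard_error(p, n)


for n in (10, 20, 30, 40):
    print(f"{n} draws: {relative_confidence(n):.2f}x the confidence of 10 draws")
```

Under these assumptions the ratio depends only on the number of draws, not on the proportion itself, which is why quadrupling the sample is what it takes to double the confidence.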

Life is complicated and it seems the econometricians of the world want to play down that fact in their models. I suspect they know that traditional statistical methods fail us when we encounter distributions that are asymmetric (aka most of real life), but simply can’t get past their “good enough” solutions. Now let’s say there are 97 white balls and 3 black ones in that same bucket. Our knowledge about the proportion of black balls will increase much more slowly than our square-root-of-n rule suggests. On the other hand, our knowledge of the presence of black balls will increase instantly once one of them has been found. Dealing with asymmetry in knowledge is non-trivial and it is a bit worrisome that much of econometrics doesn’t seem to acknowledge this explicitly. Well, except to shrug their shoulders when a black swan event hits and say “well, that never happened before” …
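A rough sketch of the lopsided bucket, under assumptions that are mine and not from any textbook: draws are independent and made with replacement, so each draw is black with probability 0.03, and "knowledge" is judged by the standard error relative to the proportion being estimated. The function names are hypothetical.

```python
import math


def prob_no_black_seen(p_black: float, n: int) -> float:
    """Chance that n independent draws turn up no black ball at all."""
    return (1.0 - p_black) ** n


def relative_standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion, as a fraction of p itself."""
    return math.sqrt(p * (1.0 - p) / n) / p


for n in (10, 40, 100):
    print(
        f"{n} draws: "
        f"P(no black seen) = {prob_no_black_seen(0.03, n):.3f}, "
        f"relative error at 3% black = {relative_standard_error(0.03, n):.2f}, "
        f"relative error at 50% black = {relative_standard_error(0.5, n):.2f}"
    )
```

In this simplified picture the error still shrinks with the square root of n, but measured against a 3% proportion it starts out several times larger than in the 50/50 bucket, and for small samples there is a good chance of seeing no black ball at all. A single black ball, by contrast, settles the question of whether any exist.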

Readers familiar with Robert Lucas will know the work behind his Nobel Prize in Economics. For those unfamiliar, the Lucas critique can be simplified as follows:

IF: People are rational

THEN: They would use this rationality to discover predictable patterns from the past

AND: Using these patterns, they would adapt their strategies

THEREFORE: Using past information to predict the future should not be possible with rational people involved

I happen to think this critique is on fairly solid ground. I also think the future is unknowable in the sense that we humans have such poverty of imagination when it comes to trying to imagine things that haven’t happened yet. Remember those unbound tails on the side of future events (good and bad)?

Right now, having just jumped back in after being away so long, econometrics feels a lot like a pseudoscience. Worse, econometricians seem to have among them a large group of idealistic nerds. I wonder if the rampant use of equations is designed to lend credibility, in the hope that no one notices the complete lack of controlled experiments.

**Onward naive empiricism (I guess) …**