The Saga of PLS


by Gaston Sanchez

The Econometrician

The father of Partial Least Squares was the great Swedish statistician and econometrician Herman Wold. In fact, his full name was Herman Ole Andreas Wold, and he was not born in Sweden but in the neighboring country of Norway. He was born on December 25th, 1908, as the sixth child of Edvard and Betsy Wold, in the small town of Skien, the administrative center of Telemark county, 133 kilometers (82 miles) south of Oslo. He spent his first three years there, until 1912, when his parents decided to move to Sweden. Because of harsh economic times in Norway, the Wold family, except the two oldest children, relocated to Lidköping, a small town in southwestern Sweden. Here Edvard Wold, a skilled furrier, made a living designing and making leather coats lined with fur, an essential piece of apparel for the cold Scandinavian weather.

Herman Wold grew up and went to elementary school in Lidköping. However, since there was no high school in town at that time, he had to attend high school in Skara, one of the oldest cities in Sweden, with a long educational and ecclesiastical history, 40 kilometers (25 miles) away from Lidköping. At a young age, Herman showed a talent for mathematics, and after high school he enrolled at the University of Stockholm in 1927, where he studied physics, mathematics, and economics. There he met Harald Cramér, the renowned Swedish professor of Mathematics and Statistics, who made Herman change plans. “I was greatly impressed by him,” Herman said, “and interested in his work.” Since statistics was what interested Wold most, he decided to stay under the tutelage of Cramér, learning about probability, statistics, and risk theory. In 1930 Herman finished his degree and took his first job in the insurance industry, where he started to do actuarial work.

Herman’s interest in statistics was greater and deeper than his intention to stay in the actuarial field, and so he decided to go back to academia and pursue a PhD. Once again under the mentoring of Harald Cramér, Herman took courses on stochastic processes and time series. Moreover, he soon got caught up in the excitement surrounding the emergence of modern probability theory, whose axiomatic foundations had recently been laid by the famous Russian mathematician Andrey Kolmogorov.

Time Series Studies (1932-1938)

In 1938 Wold received his doctorate with the thesis A Study in the Analysis of Stationary Time Series. For his dissertation, Wold did his research on stationary stochastic processes, studying the one-step prediction of a time series and proposing his Decomposition Theorem, one of his most famous results and one of the essential elements in the foundation of time-series analysis and forecasting. From the historical point of view, it is worth mentioning that Wold studied this one-step prediction using the principle of least squares. Basically, Wold proved that any stationary time series can be partitioned into a deterministic component precisely predictable from its past, plus a random component which can be modeled as a weighted sum of “innovations.” In simple terms, given a series of values at different times $y_1, y_2, \dots, y_{t-1}, y_t$, the decomposition theorem is used to express $y_t$ in terms of the preceding values as a weighted sum:

$$y_t = b_1 y_{t-1} + b_2 y_{t-2} + \dots + e_t$$

in which $e_t$ is the one-step prediction error and the coefficients $b_1, b_2, \dots$ are obtained by least squares regression of $y_t$ onto $y_{t-1}, y_{t-2}, \dots$. Accordingly, Wold’s decomposition showed that the three classic time-series models (the model of hidden periodicities, the moving-average model, and the autoregressive model) could be seen as different cases of the same general model. “The role of least squares,” Herman wrote, “was important too.” To the naked eye this might not seem very relevant, but the truth is that Wold had already started to embrace least squares as one of his favorite analytical tools. From that moment on, the principle of Least Squares would occupy a central place in Wold’s mind and, without exaggeration, even a sacred place in his heart. Whatever model-building endeavors he would later face, he would always try to find a way to use Least Squares to estimate the parameters of a model, no matter how simple or complex the models and equations were.
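To make the idea of one-step prediction by least squares concrete, here is a minimal sketch in Python. It is not Wold's own procedure: it assumes, purely for illustration, a series generated by a two-lag autoregression with made-up coefficients (0.5 and -0.3), and the function names are hypothetical. The coefficients are recovered by regressing each value on its two predecessors, and the fitted weights are then used to predict the next value.

```python
import random

def simulate_ar2(n, b1, b2, seed=0):
    """Simulate y_t = b1*y_{t-1} + b2*y_{t-2} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    y = [0.0, 0.0]  # two starting values, dropped before returning
    for _ in range(n):
        y.append(b1 * y[-1] + b2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[2:]

def fit_ar2_least_squares(y):
    """Regress y_t on (y_{t-1}, y_{t-2}), no intercept, via the normal equations."""
    s11 = s12 = s22 = s1y = s2y = 0.0
    for t in range(2, len(y)):
        x1, x2, target = y[t - 1], y[t - 2], y[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * target
        s2y += x2 * target
    # Solve the 2x2 system [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y]
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Illustrative (assumed) coefficients; any stationary pair would do.
series = simulate_ar2(5000, 0.5, -0.3)
b1_hat, b2_hat = fit_ar2_least_squares(series)

# One-step prediction: a weighted sum of the two most recent values.
forecast = b1_hat * series[-1] + b2_hat * series[-2]
print(f"estimated weights: {b1_hat:.3f}, {b2_hat:.3f}")
print(f"one-step forecast: {forecast:.3f}")
```

With a long enough series, the estimated weights should land close to the values used in the simulation, which is the sense in which least squares "recovers" the predictive structure of a stationary series.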

Consumer Demand Analysis

After his doctoral studies, Herman remained at the University of Stockholm as a lecturer on actuarial mathematics and mathematical statistics. He married Anna-Lisa Arrhenius in 1940, and they had three children: Svante, Maria and Agnes. He was very proud to be the son-in-law of Svante Arrhenius, the famed Swedish scientist, founder of electrochemistry, and winner of the Nobel Prize in Chemistry in 1903. In 1942 Wold accepted the Chair of Statistics at the prestigious University of Uppsala, the oldest university in Sweden, founded in 1477. At the time of Wold’s arrival in Uppsala, however, the Statistics Institute was a small one, formed by one professor, one half-time assistant, and one half-time secretary. This would change in 1945, right at the end of World War II, when the government decided to invest in and expand Sweden’s universities.

With a freshly tenured position, Wold started his own research on demand analysis and econometric modeling. As a matter of fact, he had already started to work on demand analysis the summer before defending his dissertation, appointed by the Swedish government to perform such studies on the national economy. More specifically, Wold carried out the study of consumer demand from 1938 to 1940. The main line of approach was to combine the analysis of family budget data and market statistics (time-series data) so as to obtain a unified picture of the demand structure in Sweden between 1920 and 1938. The winds of war were starting to blow across Europe, and it was clear that if a conflict broke out, government rationing policies for food and goods would need to be implemented. As Europe entered the war period, Wold’s work intensified, measuring price and income elasticities of demand, that is, how sensitive consumer demand was to changes in prices and income.

Although the government commission only lasted two years, Wold spent about 14 years doing research and time series analysis of the collected data, as well as publishing several articles between 1938 and 1947. One such publication appeared in 1940 as a research report; that material would later be used by Herman Wold and Lars Juréen for the specialized textbook Demand Analysis: A Study in Econometrics (published in 1952), which became a classic in the field. “The monograph is written,” Wold said, “in the dual form of a research report and a specialized textbook on econometrics.” As in his doctoral research, Herman Wold made extensive use of Least Squares for estimating the parameters in his models and making accurate forecasts.

The Least Squares Affair

Directly related to his work on Time Series and Demand Analysis, Wold got involved in a peculiar confrontation that happened within econometrics during the 1940s: Ordinary Least Squares against Maximum Likelihood. This period is crucial for the development of PLS because this is the time when Wold, somewhat stubbornly, embraced the Least Squares principle against most other methods, especially against the overuse of Maximum Likelihood. To understand why, we need to talk a bit more about econometrics and demand analysis.

During the first half of the twentieth century, one of the greatest challenges in econometrics was the estimation of demand analysis equations and simultaneous equation systems. Before the 1940s, the main analytical tool used for this task was ordinary least squares (OLS). Although it was not the perfect tool, OLS was able to get the job done on most occasions. However, as models started to become more sophisticated, there were more and more cases where OLS simply didn’t seem to work. Consequently, the method of least squares started to be criticized, and some econometricians began to stress that the device was not foolproof if applied uncritically.

Partly due to lack of understanding, partly due to conceptual confusions, econometric theorists were burdened with a myriad of problems that took many years to be recognized, let alone solved. We’re talking about things like identification, measurement errors, multicollinearity, model choice, and so forth, which are known to affect not only OLS but many other estimation methods when applied to economic time series and simultaneous equations. These issues are now taught in most introductory and intermediate econometrics courses, but back in Wold’s days all these problems caused many headaches for econometricians trying to figure out how economic systems work.

Around the early 1940s, the challenges posed by econometric models attracted many mathematicians, statisticians, and economists. This “gold rush in macroeconomic model building,” as Wold used to call it, captivated the minds of very talented economists. Among the most avant-garde groups was the Oslo school of thought, whose most distinguished protagonist was the Norwegian Trygve Magnus Haavelmo (Nobel Prize in Economics, 1989). Haavelmo’s most acclaimed accomplishment was the Probability Revolution in Econometrics, initiated in 1943 with the premise of adopting a probability approach in Econometrics. Among other things, he was also the one who highlighted the problems of Least Squares when applied to the estimation of simultaneous equation systems.

Trygve wrote two influential articles: The Statistical Implications of a System of Simultaneous Equations in 1943, and The Probability Approach in Econometrics, published in 1944 as a supplement to the journal Econometrica. Both works marked a before and after in Econometrics. On the one hand, Haavelmo introduced modern statistical inference based on probability models to economics. Although he was not the first one to introduce elements of probability theory into economic models, he was the first one to import the statistical inference approach for testing hypotheses. On the other hand, Haavelmo proposed the idea of using several structural equations simultaneously to construct econometric models following a probability approach.

Broadly speaking, one of the main questions among economic researchers was how to model the economy’s behavior mathematically. For instance, how to model a system of equations for demand and supply? In a very simple model, the demand for a good, say coffee, depended on its price. In turn, the supply of coffee depended on how much demand there was for coffee, as well as on the cost of production. While there was no doubt about the general theory of demand and supply, there was a heated debate on the mathematical and statistical form such theory should take.

The Oslo school advocated for a system in which demand and supply affected each other simultaneously, that is, at the same time:

$$
\begin{aligned}
\text{Demand:} \quad & D_t = a_1 + b_1 P_t + u_t \\
\text{Supply:} \quad & P_t = a_2 + b_2 D_t + c_2 C_t + v_t
\end{aligned}
$$

Here $D_t$ is the demand for coffee in period $t$, $P_t$ its price, $C_t$ the cost of production, and $u_t$ and $v_t$ are error terms. The Demand equation reflects the behavior of the consumers of coffee: they respond to the price of coffee, reflected in $P_t$. The Supply equation reflects the behavior of the producers of coffee: they set the price depending on how much demand there is for coffee, as well as the cost of production. In other words, the demand and supply for coffee are simultaneously determined. Simultaneous equation systems can be much more complex and sophisticated, but one of the basic ideas, as the name indicates, is that of simultaneity. Wold, however, disagreed with this model of demand and supply in which both relationships were determined within the same time period. He found it difficult to believe that the economic system was determined simultaneously. For him, causal forces only worked in one direction at any one time, and these forces should be reflected in the models.

The Stockholm school, strongly based on the works of Dutch economist Jan Tinbergen (co-winner of the first Nobel Prize in Economics, in 1969), thought of an economic system not in terms of simultaneity, but in terms of time-lagged periods. Tinbergen called these types of models causal chain systems, and he used to illustrate them with arrow schemes, very similar to the path diagrams that would later be employed within structural equation models as well as within path models.

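Herman Wold had been formed under the Stockholm school tradition, in which the econometric models of causal chains included an ex ante–ex post component. Basically, these models did not include the idea of simultaneity, but rather a time-lag component in which one thing had to occur first, and then another thing would follow. Taking the demand-and-supply example of coffee (demand $D_t$, price $P_t$, production cost $C_t$, and error terms $u_t$ and $v_t$), with the price now set from the previous period's demand, the associated equations would be expressed as:

$$
\begin{aligned}
\text{Supply:} \quad & P_t = a_2 + b_2 D_{t-1} + c_2 C_t + v_t \\
\text{Demand:} \quad & D_t = a_1 + b_1 P_t + u_t
\end{aligned}
$$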

Wold put emphasis on knowing the interrelations of all the variables in terms of the time lapse between causes and effects. In this way, one could reduce the system of dynamic causal relations to one final form equation. The condition for achieving such a reduced form was the absence of simultaneously determined equations of the kind Haavelmo proposed.

Before Haavelmo’s contributions, the error terms in the models were thought to be due to measurement discrepancies. What Trygve proposed instead was to treat the errors in the statistical sense of random noise, and to assign them a certain distribution. In this way, the door was open for the Maximum Likelihood device, allowing researchers to test and discard hypotheses. His advice was to avoid using Ordinary Least Squares (OLS), since this method provided inconsistent estimates when applied to simultaneous equation models. Instead, he favored and promoted the use of Maximum Likelihood.

Wold’s effort to rescue OLS

Haavelmo’s advice against OLS was a shocking statement for Wold. “I felt so disturbed,” Herman wrote, “by Haavelmo’s wholesale dismissal of it.” Considering that all of Wold’s work had been based on OLS, and that he had achieved very good results with it, he was very surprised by the bad press least squares received. Herman was anything but “spurred” by Haavelmo’s “rejection of OLS regression.” Could it be possible that Haavelmo was right and that OLS was to be banned? If that were the case, all the results previously obtained by Herman would be useless, something that didn’t seem to match at all with his analysis and practical evidence.

In the middle of the 1940s, together with Ragnar Bentzel, Herman set out to see whether Trygve Haavelmo was right or not. They intensely studied simultaneous systems of equations and, to Wold’s relief, they found hope for Least Squares, publishing the proofs and conclusions in the 1946 article “On statistical demand analysis from the viewpoint of simultaneous equations”. Simultaneous equation models could be divided into two broad categories: recursive simultaneous equations and non-recursive simultaneous equations. Or, as Wold preferred to call them, causal chain systems and interdependent systems, respectively. When a system could be expressed in recursive form, Wold and Bentzel showed that the method of Ordinary Least Squares could perfectly well be used to estimate such models, giving results equivalent to those under Maximum Likelihood. The challenge, however, was with the non-recursive models, where OLS did not provide adequate results. This issue would obsess Wold for the next decades, as he tried to find a way to estimate non-recursive systems with Least Squares.
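The contrast between the two kinds of systems can be illustrated with a small simulation. The sketch below is not taken from Bentzel and Wold's article; it assumes a toy coffee market with hypothetical parameter values, in which demand responds to price and price responds either to current demand (interdependent system) or to the previous period's demand (causal chain). In both cases the slope of demand on price is estimated by OLS and compared with the true value.

```python
import random

def ols_slope(x, y):
    """Slope of the least squares line of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical parameter values, chosen only for illustration.
A1, B1 = 10.0, -0.5         # demand: D = A1 + B1 * P + u
A2, B2, C2 = 2.0, 0.5, 1.0  # price:  P = A2 + B2 * (D or lagged D) + C2 * cost + v

def draw_shocks(rng):
    """Production cost plus the two independent error terms."""
    return rng.gauss(5.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)

def simulate_interdependent(n, rng):
    """Demand and price determined within the same period (non-recursive)."""
    prices, demands = [], []
    for _ in range(n):
        cost, u, v = draw_shocks(rng)
        # Solve the two equations jointly for the price (reduced form).
        p = (A2 + B2 * (A1 + u) + C2 * cost + v) / (1.0 - B1 * B2)
        prices.append(p)
        demands.append(A1 + B1 * p + u)
    return prices, demands

def simulate_causal_chain(n, rng):
    """Price set from last period's demand, then demand responds (recursive)."""
    prices, demands = [], []
    d_prev = A1  # arbitrary starting demand
    for _ in range(n):
        cost, u, v = draw_shocks(rng)
        p = A2 + B2 * d_prev + C2 * cost + v
        d = A1 + B1 * p + u
        prices.append(p)
        demands.append(d)
        d_prev = d
    return prices, demands

rng = random.Random(1946)
n = 20000
slope_interdependent = ols_slope(*simulate_interdependent(n, rng))
slope_chain = ols_slope(*simulate_causal_chain(n, rng))

print(f"true demand slope:          {B1:.3f}")
print(f"OLS, interdependent system: {slope_interdependent:.3f}")
print(f"OLS, causal chain system:   {slope_chain:.3f}")
```

In the interdependent system the price is correlated with the demand error, so the OLS slope should stay away from the true value however large the sample; in the causal chain the price is fixed before the current demand error occurs, and the OLS slope should settle close to the true value.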

By the end of the 1940s, Herman Wold had already been working for some 17 years (1932-1949) on a number of topics including time series, insurance statistics, analysis of consumer demand, and econometric systems of equations. Moreover, he was basically the only Scandinavian econometrician outside of Oslo with an international reputation. Niels Kaergard gives a revealing description of how Wold was perceived in the econometrics community (Kaergard, 2012):

“Where the Oslo school was a central part of the international mainstream econometric tradition in the 1930s, 1940s, and 1950s, Wold was seen as a man with a rather special point of view.”

In summary, the main question of debate became whether the structure of economic models was simultaneous or recursive. This was reflected in the theme “Toward a verdict on macroeconomic simultaneous equations”, the title of a publication edited by Wold. The mainstream approach in econometrics for estimating simultaneous equation systems was the one based on the method of Maximum Likelihood. Wold was not part of this group. On the contrary, Herman was a prominent figure in favor of recursivity and the use of least squares, a stance that he would maintain for the rest of his life. However, it was Haavelmo’s point of view that became the dominant one. The convenience and elegance of Maximum Likelihood, together with the mechanism of hypothesis testing it offered, were too tempting for most researchers to resist.

Wold always tried in one way or another to find a solution by least squares. By distinguishing between simultaneous equations of recursive and non-recursive type, Bentzel and Wold showed that Haavelmo’s wholesale dismissal of OLS estimation of the structural equations applied to non-recursive systems only, not to recursive systems. For approximately the following 15 years, Wold went on a tortuous mission to find an OLS solution for non-recursive systems.
