The Saga of PLS

by Gaston Sanchez

The Soft Modeler

Herman’s agenda extended beyond his interest in econometric analysis. Above all, he was passionate about model-building and the philosophy of science. He was also very much interested in more general modeling applications within the socioeconomic and behavioral disciplines. After meeting the famous American psychometrician Louis Leon Thurstone and his wife Thelma in the early 1950s, Herman Wold was so impressed by the multivariate models of psychometrics that he decided to organize and host a Symposium of Psychometrics at Uppsala in 1953.

One of the things Wold found of enormous theoretical value was the use of latent variables, or factors as they were commonly called, in psychometrics. The theories and multivariate tools around Factor Analysis had a longer tradition than those of econometrics. While simultaneous equations had been proposed in the early 1940s, Factor Analysis methods could be traced back to at least 1904, with the work of the English psychologist Charles Spearman and his theory of the General Factor of Intelligence. Likewise, the term Econometrics was coined in 1926 by Frisch, while Psychometrics had been in use since 1886, when James McKeen Cattell wrote his PhD thesis Psychometric Investigations. Both disciplines also had in common the foundation of their own societies (the Psychometric Society in 1936, and the Econometric Society in 1930).

Psychometrics and Econometrics both deal with phenomena and data of a social nature. Both are based on theories that make use of abstract or latent concepts. The most famous concept in Psychology is that of Intelligence. Economics has its own latent concepts, like Utility. The difference lay in the way such theoretical concepts were handled in the mathematical models. The Psychometrics literature had a longer tradition of including theoretical variables in its models, where they played an instrumental role. The theoretical variables in economics, in contrast, didn’t have that status of latency. The forces of demand and supply were assumed to be like the force of gravity or any other physical force: invisible, yes, but present. The force of demand was measured with the demanded quantities, and the force of supply was measured with the supplied quantities; there was no need to include variables of a latent nature in Economics. Perhaps the most latent concept was the “invisible hand of the market” in Adam Smith’s metaphor; but this concept was not directly modeled or incorporated in any model. It was the equilibrium, or intersection, between the curves of demand and supply that was taken into account instead.

Expanding Interests

Herman Wold was soon captivated by the possibilities of the analytical tools and theories provided by the Psychometricians. And he was not the only one. It was only a matter of time before a full cross-pollination of ideas took place within quantitative methods in the Social Sciences, including psychology, economics, sociology, and education, to mention just a few. The introduction of causal path models to Sociology, and their subsequent expansion, was a watershed. The excitement about quantitative analysis among the social science disciplines was enormous after the seminal work of Duncan (1966), as whole classes of problems suddenly seemed to open up to new approaches with the conceptual tools it provided.

A crucial event, one that would have tremendous consequences for the years to come, was Herman Wold’s proposal to Karl Jöreskog, one of his students, that he write a doctoral dissertation on Factor Analysis.

Iterative Least Squares

In the early 1960s, Wold was elected to the prestigious Swedish Academy of Sciences. His growing reputation and recognition as a leading figure in Scandinavian Econometrics was unquestionable. It was also in this decade that his trajectory in econometrics reached its summit, and in 1966 he became President of the Econometric Society. Still, he continued to focus his energies on tackling the problem of estimating non-recursive (or interdependent) systems of equations.

After giving up the causal arguments for the estimation of structural equations, Herman Wold started to work on a new methodology that he dubbed the Fix-Point (FP) method. Using Least Squares and his theoretical argument of predictor specification, the FP method was developed as his new proposal to estimate interdependent (ID) systems via an iterative procedure of least squares regressions.

At the end of 1964, having a first stable version of the FP method, he went on a roadshow to the USA to promote his new analytical tool. One of the stops was the State University of North Carolina. There, during one of his seminars, G. S. Tolley, a professor in the audience, asked Wold whether the FP method could be applied to compute principal components on a dataset he had collected. The previous year, Tolley had published a study on farmer skills. After the seminar, Wold discussed in more detail with Tolley and R. A. Porter the possibility of adapting the Fix-Point method to compute principal components. This discussion “gave me the clue for computing principal components by an iterative procedure,” Wold wrote. This unintentional discovery must have reinforced Wold’s enthusiasm about his new method. Besides, it had all the elements that Herman liked: it was based on least squares, aimed at practical application, and intended for prediction. Such a case of “serendipity,” Wold observed, “led me to an iterative procedure for the computation of Hotelling’s canonical correlations.”

Almost immediately a plan began to take shape in Wold’s mind to further investigate the properties and scope of his recently discovered iterative least squares procedures. He suspected that much more could be gotten out of his new method. As soon as Wold returned to Uppsala, he set to work on the “by-product” of his Fix-Point method to compute Principal Components and Canonical Correlations. He recalled the work of his former PhD student Peter Whittle on factor analysis and principal components, and he decided to use Whittle’s data set for experimental purposes. Since the method involved iterations, Wold knew that a hand calculator was not going to be enough, so he asked his son Svante to write the computer programs for him. Likewise, his colleague Ejnar Lyttkens set about studying the iterative nature of the algorithm, proving the convergence of this new class of procedures. An opportunity to introduce his findings came in the summer of 1965 with the International Symposium on Multivariate Analysis at Dayton, Ohio. There, Wold presented an application of his NILES (Nonlinear Iterative Least Squares) procedure, showing how to compute PCA using an iterative algorithm that alternates simple least squares regressions. The following year the proceedings of the symposium were published in Multivariate Analysis (Krishnaiah, 1966), containing Wold’s paper Estimation of Principal Components and Related Methods by Iterative Least Squares. At the same time, he took the opportunity to write a more detailed and extensive paper when he was invited to contribute to the Festschrift for Jerzy Neyman, which also appeared in 1966: Nonlinear Estimation by Iterative Least Squares Procedures. Historically, these publications are the roots of what years later would officially become the Partial Least Squares framework.
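To make the idea of computing a principal component by alternating simple least squares regressions more concrete, here is a minimal sketch in Python. It is not Wold’s original program: the function names (center, first_component), the convergence tolerance, and the small data table are all assumptions made for illustration. The sketch alternates two regressions without intercept, columns on scores and then rows on loadings, until the scores stop changing.

```python
def center(X):
    """Subtract the column means from a table given as a list of rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def first_component(X, tol=1e-12, max_iter=1000):
    """Leading principal component via alternating simple regressions.

    Illustrative sketch only. Assumes the first centered column is not
    all zeros, since it is used as the starting vector of scores.
    """
    Xc = center(X)
    cols = list(zip(*Xc))
    t = list(cols[0])  # initial scores: first centered column
    p = []
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Step 1: regress every column on the scores t -> loadings p
        p = [sum(x * v for x, v in zip(col, t)) / tt for col in cols]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]  # normalize loadings to unit length
        # Step 2: regress every row on the loadings p -> new scores
        # (the slope is just the inner product because p has unit length)
        t_new = [sum(x * v for x, v in zip(row, p)) for row in Xc]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change <= tol * sum(v * v for v in t):
            break
    return t, p


# Hypothetical data: 10 observations on 3 variables (illustrative only)
data = [
    [2.5, 2.4, 1.2], [0.5, 0.7, 0.3], [2.2, 2.9, 1.0], [1.9, 2.2, 0.8],
    [3.1, 3.0, 1.6], [2.3, 2.7, 1.1], [2.0, 1.6, 0.9], [1.0, 1.1, 0.4],
    [1.5, 1.6, 0.9], [1.1, 0.9, 0.5],
]
t, p = first_component(data)
print("loadings:", p)
```

Each pass uses nothing more than simple regressions, which is why the procedure appealed to a least squares advocate; at convergence the loadings p should match the leading eigenvector of the cross-product matrix of the centered data, up to sign.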

The work around NILES was definitely a hat-trick for Wold’s team. In a one-year period they had accomplished fruitful results with all the elements that Herman liked: least squares, practical application, prediction uses… and it even provided the opportunity to treat data with “partial information” or missing values. One could use the iterative least squares procedures for Principal Components, for Canonical Correlations, and for a handful of other applications. At last something good was emerging out of his new start. Confident that he was on the right track, he pushed on with his research on the FP method for tackling the problem of interdependent systems. Yet Herman was cautious enough not to celebrate prematurely or to spread his findings too soon. “All of the fresh procedures,” Herman said, “are in an early stage of development.” Besides, his primary focus and motivation was the interdependent systems. His NILES procedures were, after all, a “by-product”: interesting and useful, yes, but not his main line of research for the second half of the 1960s.

Psychometrics Influence

Meanwhile, Karl Jöreskog, one of Wold’s former PhD students, had been opening paths and achieving breakthroughs in factor analysis. In 195X he achieved the computation of the Maximum Likelihood estimation first proposed by Lawley in 1941. In the mid-1960s he went to the USA to work at the Educational Testing Service (ETS), the famous educational testing and assessment organization in charge of developing various standardized tests, primarily in the United States. With the econometrician Goldberger, Karl started to work on finding a way to merge Psychometric models with econometric systems of equations. The long-awaited synthesis would be achieved in 1969. This feat marked a before-and-after moment in multivariate model-building, something that Herman Wold could not have ignored.

In the summer of 1957, when Karl was getting ready to start his professional career as a high school teacher of mathematics and physics, an unforeseen opportunity came into his life. He was visiting a friend one afternoon when the telephone rang. Professor Herman Wold was calling the friend to offer him a research assistant position at the Department of Statistics. Since Karl’s friend had already taken a job, he told Herman that he was not available, but that he knew someone who was a good candidate and who was right there, and he passed the phone to Karl. After a brief chat between Herman and Karl, Wold offered the young aspiring teacher the research assistant position in the Statistics Institute at Uppsala University.

A year later, Herman Wold told Karl that his research position required him to study more statistics. So Karl “took all the undergraduate courses in statistics and then continued for a master’s degree, which was completed in 1961”. Showing great talent and disposition, Karl went on to do the PhD and finished it in December 1963. For his doctoral research, Wold had suggested that Jöreskog write a dissertation on Factor Analysis, continuing the work on the topic previously done by Peter Whittle in 1952.

In February 1964 Fred Lord invited Jöreskog to do research at the renowned Educational Testing Service (ETS) in Princeton. After one year, Karl returned to Uppsala “for half a year and taught as a lecturer in statistics”. However, Fred Lord made Karl an offer as Senior Research Statistician with tenure that he was not able to reject. Karl spent the next five years in Princeton, until the summer of 1971, when he went back to Uppsala, convinced and pushed by Wold, who even quit his Chair and took early retirement so that Karl could have the Professor position formerly occupied by Herman.

During the 1960s Karl stood out as one of the greatest figures and minds in Psychometrics, at both the methodological and the computational levels. He accomplished various landmarks, and produced the software that made it possible for practitioners to apply the new methods. Among his most amazing achievements was the merging of the simultaneous equations of Econometrics with the Confirmatory Factor Analysis of Psychometrics. With his approach to estimating Linear Structural Relationships models and his software LISREL, Karl would revolutionize the way people think about models, latent variables, and the testing of theories in the social sciences. Following the tradition of Fisher and Haavelmo, he performed all his work under the Maximum Likelihood principle.

Although formed under Wold’s mentorship, Karl Jöreskog didn’t embrace Wold’s deep love for Least Squares. Actually, Karl didn’t follow the econometric path of Wold’s main research group. Seeing the modeling potential of psychometric models, the recent promising advances in the field, and Karl’s talent for mathematics and statistics, Wold must have seen something special in his young research assistant to entrust him with a research topic in which Herman was not exactly an expert. Wold was not mistaken. Karl stood out in Psychometrics and became fully involved in the field during his ETS period.

Inspired by his former student’s achievements and his LISREL models, Wold would have a revelation in the early 1970s. “In 1971 when I saw Jöreskog’s LISREL algorithm for ML estimation of path models with latent variables, it struck me that the same models could be estimated by suitable elaboration of my two algorithms for principal components and canonical correlations.” Wold would probably have pursued the ML approach of his pupil had he not been a fervent advocate of Least Squares. Unlike the rest of the world, which blindly followed the ML approach for linear relation models, Wold set out on his own path.

The graphical representation of the models was practically the same as the arrow schemes used by Wold. He was well aware of what was happening not just in Econometrics but also in Psychometrics, Sociology, Education, and Causal Modeling. Before long, Wold wondered whether it was possible to apply his iterative least squares procedures to the type of models proposed by Jöreskog. It was a very tempting yet natural question for Wold. The latent variables could be treated like principal components or canonical variates, and the structural relations could be posed in terms of regression equations, which in turn could be used for prediction purposes. A whole new unexplored route to a promising land was opening before Herman Wold’s eyes. Freed from the administrative responsibilities of his previous chairman position, and having an enviable research experience, he set out to see how far he could get with his new ideas, now as a professor at the University of Gothenburg, the third-oldest university in Sweden.

It didn’t take Wold long to obtain some preliminary results using his NILES procedures for path models with latent variables. Starting from two blocks of variables and two components, he jumped to models with three blocks and three structural relations. As in 1965, the opportunity to show his findings came with the third International Symposium on Multivariate Analysis, again held in Dayton, Ohio. Unconvinced by the name NILES, he changed it to NIPALS, short for Nonlinear Iterative Partial Least Squares. The term procedures was also dropped and replaced by modeling. NIPALS modeling slowly reflected a more mature ideological framework: “Nonlinear Iterative Partial Least Squares (NIPALS) Modelling: Some Current Developments” (Wold, 1973). “I see NIPALS modeling,” Wold wrote, “as an open ended array of models with unlimited complexity in the combined use of several devices.”

The mid-1970s saw the evolution of NIPALS modeling from the first attempts to expand to three blocks of variables into a more robust framework capable of handling any number of blocks and more complex structural relations. In 1977 Wold declared the ripeness of his PLS basic design method. Two years later, in 1979, Wold and Jöreskog organized a joint meeting at Cartigny, near Geneva, Switzerland, presenting both of their approaches (LISREL and PLSPM). The proceedings of that meeting would appear three years later in the form of a two-volume publication, Systems under indirect observation: Causality, structure, prediction.