The Origin of Data Science

Rarely in the history of science is it possible to locate precisely when and where a particular discipline arose for the first time. Just as medieval alchemy morphed into modern chemistry over centuries, most fields seem to have evolved and transformed very gradually. Not so data science. We can say with certainty that data science emerged, in both name and practice, in the early 1960s in the United States in the heart of the military-industrial complex. To a surprisingly high degree, what was called data science then matches the definition of the term in 2008, the year Facebook started using “data scientist” as a job category. This is no accident—the material and social conditions that occasioned the rise of data science in the post-war era are the same conditions that gave birth to the social media giants of Silicon Valley. At one time these conditions were confined to the military and scientific sectors; now they have become the foundation for the economy and society as a whole.

This thesis will come as a surprise to many. We tend to think the field, or at least the buzzword and hype, was created in Silicon Valley in the last fifteen years or so, roughly around the time Harvard Business Review anointed data scientist “the sexiest job of the 21st century.” Those who have done a little digging may push that date back to the mid-1990s, when a small group of statisticians in the US attempted to expand their field to encompass data science. Some cite Peter Naur’s suggestion to rename computer science as data science in the 1970s. Still others claim that John Tukey effectively founded the field in 1962 or thereabouts with his essay on data analysis. These events are fragments of a more complete history that needs to be told, not only to set the record straight, but to recognize the world that connects this history and the present era. We are living in a world where it was necessary to invent data science, and the same logic that drove its invention in the 1960s is driving its flourishing today.

One reason we can be so precise about its origins is that data science is not a science in the traditional sense—a fact often cited by critics of the field. It does not have a clear object in the world whose laws are to be revealed by applying the scientific method. It is not easily divided into pure and applied branches. Instead, the name data science was invented to designate a unique combination of technologies and practices drawn from extant fields, including data processing, signal processing, operations research, statistics, computer science, machine learning, and human-computer interaction. These fields were brought together to meet the requirements of a new medium—electronic data. Known then as data deluge and now as big data, this medium emerged from an increasingly connected system of communication and surveillance technologies associated with a slew of large-scale, post-war projects ranging from air defense and space exploration to high energy physics and electronic records management. Although each constituent field has a long and complicated history, their combination into a single domain of expertise is historically singular.

In a series of posts, I want to tell the story of the moment in history that gave rise to data science and the world in which it was necessary to invent it. The story of this world is a history of ideas, social realities, and material conditions extending from the rise of American hegemony after WWII to the era of social media capitalism in which we currently live.

The Flight of the Swallow

Giacomo Balla, 1913, Flight of the Swallows.

Look at any curve on a graph, for example of criminal or minor second offences in the last fifty years. Don’t those traits have a physiognomy, if not like that of the human face, at least like the silhouette of hills and valleys, or rather, since we are concerned here with movement—for we speak so appropriately in statistics of fluctuations in crime or births or marriages—like the twists and turns, the sudden dives, the sharp ascents in the flight of a swallow?
Gabriel Tarde, 1890, “Statistics and Archaeology,” in The Laws of Imitation.

These two things may not be related historically, but conceptually there is a nice connection between the painting by the Italian futurist Giacomo Balla (Volo di Rondini [Flight of the Swallows] 1913) and the passage by the French sociologist Gabriel Tarde. It’s as if the artist were commissioned to illustrate the scientist’s work. Tarde, whose essay was written at a time when the value of data for the social sciences was being discovered, is making the point that the patterns surfaced by the visualization of data are fundamentally the same as the patterns we perceive with our senses.

In both cases, raw data are converted into an image that represents the real thing that generated the data in the first place. The implication is that the movements of society are as real, as concrete and as material, as the movements of a bird. That, in fact, society is an organism and data is a means of seeing that organism.

What is interesting about the painting itself is that it is a deconstruction of what we see. According to one interpretation, the painting depicts a group of swallows flying past the roof gutter of a building in such a way as to evoke “the rapid succession of film images.” Where Tarde wants to say that graphs of data are like our perceptions, Balla wants to say that our perceptions are like compositions of graphical data. Each makes the same point, but they approach it from opposite directions.

The affinity between the two is perhaps not accidental. Aside from their historical overlap, both Tarde and Balla were modernists in their fields.

Tarde’s sociology proposed a radical decomposition of the social into what today we’d call agent-based models. Social systems are just patterns that emerge from basic rules of interaction, which he called laws of imitation. The view was well ahead of its time, and the significance of Tarde’s work has only recently been recognized (see Bruno Latour’s The Science of Passionate Interests, 2009).

Balla was a member of the Futurist movement, which sought to represent and celebrate the culture of speed and change that characterized the rapid mechanization of society going on at the time. His art sought to capture this movement with contemporary tropes of the new media: motion pictures, which demonstrated the emergence of continuity from discontinuity.

In both cases, experienced realities are imagined as dynamic constructions of more primary elements, elements whose behaviors and contours are captured by data.

The fact that both Tarde and Balla use the same phrase, the flight of the swallow(s), suggests a shared linguistic meme in circulation in Europe at the time. A quick ngram search indeed shows the widespread usage of the phrase throughout the nineteenth century in a variety of contexts, from the scientific to the poetic. Apparently, the birds are notable for their smooth, vigorous, and undulating movements, as well as their predictable nature, owing to their migration pattern, presaging a change of seasons. As one author puts it, “the flight of the swallow is the poetry of motion” . . .

The swift circlings, sweeping curves, daring plunges, and lofty soarings of these silent birds appeal so strongly to our imagination and musical instinct, that we are almost tempted to class the swallows with the birds of song. It would be interesting to follow the flight of the swallow through the pages of poetic literature.
— H.H. Ballard, 1894, “A Basket Nest,” The Outdoor World.

As for the phrase itself, it appears to derive from a variant found in the Hebrew Bible, specifically Proverbs 26:2: “As the bird by wandering, as the swallow by flying, so the curse causeless shall not come” (KJV).