Big Data’s Hidden Labor

By: Evan Malmgren

Tweetstorms and Netflix binges are really fun -- and profitable. It's time to claim the fruits of our digital labor.

Who owns your data? We are used to signing the question away in unread terms of service agreements, but it has increasingly become a matter of livelihood. Shipping companies like UPS and Amazon micromanage their workers with advanced surveillance networks, while international retailers and fast-food chains now generate employee schedules with complex, data-fed efficiency algorithms. Monsanto “smart farm” technologies extract valuable insights from independent farmers en masse, and Uber drivers may even help develop their own self-driving replacements by building driving databases of unprecedented size and detail.

Capitalists have long collected profitable data from their workers without compensation, but only more recently has the proliferation of networked smart technologies — “the internet of things” — extended this surveillance beyond the workplace, adding a dimension of unwaged value-creation to our personal lives. Digital retailers profile us to give targeted recommendations; streaming services learn our tastes to predict what content we will enjoy; and fitness apps track our calories and steps to help us make “healthier” decisions. Soon, VR headsets may even be tracking minute eye movements and spontaneous retinal activity.

These technologies usually feed our personal information back to private companies, where insights about our shopping habits, interests, and bodily functions reap huge profits. Big data can’t exist without our input, and the analytics market wouldn’t have grown into a $130 billion industry without wide-scale cooperation. Just as passive data collection adds new layers of invisible labor to the “smart farmer’s” workday, it increasingly transforms our leisure time into productive work.

Enclosing the Commons

When Google’s PageRank algorithm started trawling the web back in 1996, Larry Page and Sergey Brin unwittingly began a process that would turn the information pipeline on its head. Sorting an ever-expanding cache of URLs by link density and user engagement statistics, the Stanford PhD students eventually developed an algorithm that outsourced the refinement of their search engine to its clientele, the customers of a free service. Users strengthen the algorithm simply by searching the web, thus attracting more consumers to the improved product, and in turn generating a larger base to further hone the engine.

An ideal form of the neoclassical economist’s “virtuous cycle,” this process is one of the first clear examples of consumer-driven big data. It was innovative because it collapsed an act of mass production — the creation of useful data — into one of mass consumption, eventually driving search competitors like AltaVista, Hotbot, and WebCrawler (as well as overcrowded web portals like MSN, AOL, and Lycos) into obscurity on the back of a hidden labor force.

Few know that in late 2001, Google was quietly considering a shift from this “virtuous cycle,” testing a voting system that would allow users to transparently impact the ranking of their search results. SiteLab co-founder Dana Todd called the more engaged approach “user aware,” but the transparent feature never hit the market. As Google discovered, mass data harvesting operates best in a concealed and indirect manner.

An active, straightforward exchange — as with a questionnaire or customer service survey, for instance — reveals the labor involved in feeding a magical algorithm. Instead of opting for active solicitation, Google has intensified its passive data collection, expanding its reach to include your movements through physical space (Google Maps), anticipated futures (Google Calendar), and metrics on everyday internet usage (Google Chrome). These accumulated data sets are all extensions of what the company’s privacy page refers to as the “things that make you ‘you.’”

These hidden exchanges quickly became central not only to Google’s business model but also to Amazon’s. The leviathan internet retailer began to monetize personal user data around the same time as Google, using a vast set of individual purchase histories to feed algorithms that built item-item similarity indexes and consumer profiling tools as early as 2003. The company quickly established itself as a pioneer in targeted online advertising, leveraging metadata as a complex recommendation system. It appeared that Amazon had automated the job of a helpful retail clerk, but in reality, the company had merely foisted the clerk’s labor onto the consumers themselves, to be carried out within the act of consumption.
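
To make the mechanism concrete, here is a minimal sketch of how an item-item similarity index can be built from nothing but purchase histories. It is an illustration under stated assumptions, not Amazon's actual system: the customers, the items, and the choice of cosine similarity over co-purchase counts are all hypothetical. The point is that every score in the index is computed entirely from what shoppers have already done.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove"},
    "cy": {"stove", "novel"},
}


def item_similarity(histories):
    """Score each pair of items by how often the same customer bought both.

    Uses cosine similarity over co-purchase counts (an assumed metric):
    shared buyers divided by the geometric mean of each item's buyer count.
    """
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in histories.values():
        for item in items:
            item_counts[item] += 1
        # Sort so each pair is always stored under the same key order.
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
    return {
        (a, b): shared / sqrt(item_counts[a] * item_counts[b])
        for (a, b), shared in pair_counts.items()
    }


def recommend(item, similarity):
    """List the items most similar to `item`, strongest match first."""
    scored = []
    for (a, b), score in similarity.items():
        if a == item:
            scored.append((score, b))
        elif b == item:
            scored.append((score, a))
    # Highest score first; ties are broken alphabetically.
    return [other for score, other in sorted(scored, key=lambda p: (-p[0], p[1]))]


similarity = item_similarity(purchases)
```

Calling `recommend("tent", similarity)` ranks the stove ahead of the lantern, because two of the three hypothetical customers bought a tent and a stove together. No clerk made that judgment; the shoppers did, simply by shopping.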

At first glance, it might seem that this model perfectly echoes film critic Annette Michelson’s 1979 adage that, in the age of television advertising, “You are the end product delivered en masse to the advertiser.” But the internet’s data economy has proved a bit more complex: Google and Amazon had begun to embrace consumer data just as other early-internet titans were struggling to monetize their popularity. At the time, advertisers were wary of the web, which lacked television’s captive audience and showed a poor rate of return on converting attention into profit. Google and Amazon sidestepped the problem by congealing their global markets into a workforce. While Google relied on user inputs to build a dominant product, Amazon turned its customers into a massive personalized marketing team. Both turned user data into a valuable commodity in its own right.

Thus, in an amendment to Michelson’s adage, in the age of digital communications, your data, rather than you yourself, is the product delivered en masse. In repurposing consumer engagement as tangible goods and services, Amazon and Google demonstrated that freely extracted personal data could be turned into profit. It is no coincidence that these companies easily weathered the burst of the dot-com bubble, or that their models have all but defined the “Internet 2.0” generation that followed.

Of the sleeker, smartphone-enabled internet companies that rose from the ashes of the dot-com crash, Facebook burns brightest. Envisioned as a monetized user database from the outset, Mark Zuckerberg’s social network cycled through a slew of design changes before settling on a site layout that compelled its users to divulge the maximum quantity of personal information. As we check the website’s boxes, complete its forms, and play in its sandbox of likes, posts, and reactions, algorithms sift through our online selves and apply predictive analytics to divine our politics, income brackets, and opaque personal interests.

These detailed profiles are packaged and sold to advertisers en masse, without remittance for the consumer-producers whose labor imbues them with value. With annual revenue reported in excess of $27 billion at the end of 2016, Facebook has ballooned into one of the world’s largest internet companies, topped only by Amazon and Google, whose 2016 revenues were reported at around $136 billion and $90 billion, respectively.

These companies have built an industry of assembling and marketing comprehensive metadata — interlinking chains of minor details that become more valuable as they grow in complexity. Edward Snowden usefully explained the power of metadata in a 2015 livestream:

Metadata is very much like what a private eye does when they follow someone around. They’re not even close enough to you, when they’re sitting behind you in a café, to get every word that you’re saying in a whispered conversation. But they’re going to know where you were, they’re going to know who you met with, they’re going to know when you did it, they’re going to know how you left, they’re going to know where you went. And when you get this in aggregate, you tell the full story of someone’s life.

Facebook doesn’t just know your relationship status, the things you “like,” and where you took your profile pictures. It also tethers this information to anything you do on an external app accessed through a Facebook login, or any webpage that you access from there. This allows the company to associate your Tinder swipes with your Venmo transactions; your Uber rides with your Instagram followers; your Seamless orders with your preferred news sources and how you access them. Likewise for Google: if you have Google Maps installed on your smartphone, the tech giant can process all of your movements alongside your search history, newsletter subscriptions, favorite YouTube videos, and anything you do on a web page with a Google+ button.

Of course, it would be impossible to strain useful patterns from this overwhelming noise without an extensive material infrastructure. For this reason, big data has been referred to as the new oil: it is worthless in its raw form, but grows to a fortune with proper refinement.

To give a sense of the capital accumulation undergirding this extraction of data-wealth: Twitter leases around a fifth of a 990,000-square-foot data center in Atlanta, where it stores over five hundred petabytes of data, and processes, caches, and analyzes over half a billion tweets per day; Facebook’s seven data centers range from 160,000 to 487,000 square feet in size, with the company claiming in excess of $3.6 billion in “networking equipment” at the end of 2015; and Google spends more than $5 billion per quarter on its sixteen massive data centers, located on four continents and housing over a million servers. These colossal entry barriers mean that newcomers are unable to compete with established big data companies, and cannot similarly extract surplus value from user engagement with free services. As a result, a handful of tech giants enjoy near-monopolistic control of our bulk metadata.

Despite this concentration of ownership in a few hands, the ability to process large volumes of personal information has still resulted in some benefits for individuals and society at large. Google prioritizes news stories that I genuinely find interesting, Ticketmaster sends me custom event notifications based on artists that I follow on SoundCloud, and I always notice the sponsored posts announcing holiday sales by certain socialist magazines. On a macro scale, big data has positive implications for urban planners who want to design smarter cities, health-care professionals who want to predict epidemics and cure diseases, and engineers who want to identify or even predict new problems to solve.

And yet, we cannot forget that big data’s developments are ultimately enabled by us — the creators of its constituent bits — and not by magical processing centers alone. The average Facebook user was worth roughly fifteen dollars per year at the start of 2016; for Google, that figure was around thirty-three dollars. These may seem like small numbers, but they become massive when multiplied across a vast consumer base, and will only continue to grow as analytic firms and machine learning technologies improve their capacity to process raw information into profitable insights.
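
The scale of that multiplication can be checked with rough arithmetic using only the figures cited above. The sketch below divides each company's reported 2016 revenue by its per-user value to estimate how many users those figures imply. It is a back-of-the-envelope calculation, not an audited user count: the per-user values date from the start of 2016 and the revenues from the end of it, and the assumption that all revenue derives from users is a simplification.

```python
# Figures quoted in the text (US dollars).
per_user_value = {"Facebook": 15, "Google": 33}      # dollars per user per year
annual_revenue = {"Facebook": 27e9, "Google": 90e9}  # reported 2016 revenue


def implied_users(company):
    """Estimate the user base implied by revenue divided by per-user value.

    Assumes, as a simplification, that all revenue is generated by users.
    """
    return annual_revenue[company] / per_user_value[company]


for company in per_user_value:
    billions = implied_users(company) / 1e9
    print(f"{company}: roughly {billions:.1f} billion users implied")
```

On these numbers, fifteen dollars a head implies about 1.8 billion Facebook users, and thirty-three dollars a head implies about 2.7 billion for Google. Small sums per person, in other words, collected from a workforce numbering in the billions.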

Anyone would expect reimbursement for participating in an inpatient study, or for sitting on a consumer panel at a product testing. Now that we provide these kinds of data services remotely, the only distinction is a greater degree of alienation. We don’t expect payment for our data simply because its creation is not considered to be “work.”

In Search of Alternatives

Labor should be understood — and compensated — in terms of value creation, and not a degree of compulsion. People may be willing to engage in value-creating activities of their own volition, but that doesn’t mean that we should allow this newly possible wealth to pool in the hands of a relatively small group of developers and tech executives. If we fail to recognize big data as a society-wide project, we risk squandering an incredible technical achievement: the ability to convert leisure time into material utility.

This advancement does not necessarily signal a move towards a post-work society, but one in which labor is increasingly embedded in voluntary and even enjoyable activities. This unification of work and play is central to Marx’s utopian vision, outlined in Critique of the Gotha Programme, of a society in which “labor has become not only a means of life but life’s prime want.”

Utopian socialists like Charles Fourier once envisioned a future society in which productive work would take the form of personal enjoyment and creative fulfillment, even straying into territories of outlandish extravagance. We are unlikely to fully escape the necessity of life’s occasional drudgery, or to arrive at anything resembling Fourier’s Phalanstère, but there is no reason to reject the possibility of realizing this vision in a limited or partial form.

If we can assert a right to personal data ownership, we can imagine a future in which wages are increased to compensate for information collected from existing labor, and the workday is shortened thanks to additional value gleaned from idle time. Big data has already added a productive element to many acts of consumption, and to many things that we already do on a regular basis. If we are to realize the full social potential of big data, the necessary political task is to demand recognition of the hidden labor involved in its construction.

Jacobin