What’s in an Algorithm? The Problem of the Black Box
If you’ve been online (or even just alive) in the past three years, chances are you’ve encountered an article somewhere about “algorithms.” What an algorithm actually looks like can vary greatly, but the Association for Computing Machinery’s US Public Policy Council defines one broadly as “a self-contained step-by-step set of operations that computers and other ‘smart’ devices carry out to perform calculation, data processing, and automated reasoning tasks.” The breakneck speed of technological advancement and the fever for automation have resulted in these self-contained decision-makers worming their way into all aspects of life; algorithms aren’t just the property of social media news feeds anymore, they’re also used to predict consumer habits, make investments, and even determine courtroom decisions. China, for example, is in the process of rolling out a system of “social credit-scoring” in which data collection and analysis techniques will be used to give each citizen a score. Demerits can include smoking in public areas or walking dogs without leashes, and one’s resulting score can then determine anything from access to public transit to hotel bookings. Though this system is still highly experimental, it is a testament to the widespread datafication of the modern world and the increased primacy of algorithms and machine learning in shaping our day-to-day experiences.
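To make that definition less abstract, here is a deliberately toy sketch of what a “step-by-step set of operations” can look like in code. Every rule and number in it is invented for illustration; it does not describe any real scoring system.

```python
# A toy "algorithm" in the ACM's sense: a fixed, step-by-step set of
# operations that turns input data into a decision. All rules and
# penalty values here are invented purely for illustration.

def toy_score(record: dict) -> int:
    score = 100
    if record.get("smoked_in_public"):
        score -= 10                     # step 1: apply a penalty rule
    if record.get("dog_off_leash"):
        score -= 5                      # step 2: apply another penalty rule
    if record.get("volunteer_hours", 0) > 10:
        score += 15                     # step 3: apply a bonus rule
    return max(score, 0)                # step 4: clamp and return the result

print(toy_score({"smoked_in_public": True, "volunteer_hours": 12}))  # prints 105
```

The point is not the arithmetic; it is that someone had to decide which behaviors count and how much each one is worth.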
Following this rapid expansion of algorithmic use and under the looming specter of Big Tech™, social scientists, policy makers, and activists have begun to turn towards computer scientists for answers. Thus far, many developers have seemed unable to answer a simple question about their software: what’s going on inside? Before we criticize these systems and their creators, it’s crucial to understand how they do (and often don’t) work, and why accountability is so rare in algorithmic systems. The short answer is this: no one really knows what’s going on inside a machine-learning algorithm.
The main issue with regulating algorithms is what’s often referred to as “the black box problem.” In the process of their creation, machine-learning algorithms grow so complex that they become unreadable except through their inputs and outputs. It’s a black box: you put something in, you get something out, but whatever happens inside is a mystery.
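In code, the problem looks something like the sketch below: once a system is trained, the only interface we usually have is “input in, output out.” The weights here are random placeholders standing in for whatever a real system has learned; nothing in this snippet corresponds to an actual product.

```python
import numpy as np

# A stand-in "trained model": its behavior is determined entirely by a
# pile of learned-looking numbers. The weights below are random
# placeholders, not a real system.
rng = np.random.default_rng(1)
weights = rng.normal(size=(1000, 10))        # imagine millions of these

def black_box_predict(features: np.ndarray) -> float:
    # We can observe the output for any input we choose...
    return float(np.tanh(features @ weights).sum())

# ...but nothing in `weights` reads like a rule a human could audit.
print(black_box_predict(rng.normal(size=1000)))
```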
One of the most common iterations of machine learning in use today is called a “neural network,” because it takes its basic metaphorical structure from the brain. The actual process of building these is complex, but the simple version goes like this: you build an algorithm with a basic set of instructions and steps to follow, a rudimentary framework for the kind of predictions you want it to make. Then, you feed it a set of training data; it makes predictions, measures how wrong they were, and generates a new set of rules (a series of small adjustments to its internal parameters) based on those results. Then it does that again. Then again. And again and again and again until it reaches its optimal state. The end product is an algorithm with a top layer of decisions and frameworks coded by humans, a middle layer of extremely complicated math (the hidden layers, which in image-recognition systems include what are literally called convolutional layers), and a bottom layer of its predictions. The middle layer is essentially inscrutable to the human eye because there is no means of effectively documenting the logical leaps the program makes at each stage, a problem compounded by the fact that algorithms are liable to run thousands of rounds of training before being finalized.
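For readers who want to see that loop rather than just read about it, below is a minimal sketch of the “predict, measure, adjust, repeat” cycle, using made-up data and an arbitrarily tiny network. It illustrates the general technique, not any production system.

```python
import numpy as np

# A deliberately tiny neural network trained on invented data, to show
# the cycle described above: predict, measure the error, nudge the
# internal numbers, and repeat thousands of times.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # 200 examples, 3 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float).reshape(-1, 1)

W1 = rng.normal(scale=0.5, size=(3, 8))            # the "middle layer" weights
W2 = rng.normal(scale=0.5, size=(8, 1))
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):                           # again and again and again
    hidden = np.tanh(X @ W1)                       # the inscrutable middle layer
    pred = sigmoid(hidden @ W2)                    # the bottom layer: predictions
    error = pred - y                               # how wrong were we?
    # Backpropagation: adjust every weight to reduce the error slightly.
    grad_W2 = hidden.T @ error / len(X)
    grad_hidden = (error @ W2.T) * (1 - hidden ** 2)
    grad_W1 = X.T @ grad_hidden / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print(f"accuracy after training: {((pred > 0.5) == y).mean():.2f}")
# The finished W1 and W2 are just arrays of numbers; nothing in them
# reads like a rule a human could inspect or audit directly.
```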
Think of an algorithm as a child: you raise it with a basic set of rules, you can control the basic way it approaches the world, but at the end of the day you just have to let go and hope it doesn’t kick any strangers in the shins. And just like human children, algorithms are exceptionally skilled at unknowingly recreating the biases of the people who create them. In 2015, Google faced explosive controversy because the image-labeling algorithm in its Photos app tagged a photo of two Black users as gorillas. Google published a fix within hours of the mistake’s exposure on Twitter, and the scandal eventually disappeared.
When a journalist revisited the problem a few years later, however, he found something surprising: Google didn’t recognize gorillas at all. In other words, rather than coding in a real fix, Google’s engineers had simply removed the system’s ability to tag anything as a gorilla. Google’s response was certainly a band-aid solution to get out of a bad PR moment, but it also reveals the unruly and unpredictable nature of algorithms: they are liable to behave in ways we don’t expect, and are hard (if not impossible) to scrutinize in detail.
In order to explain the implications of the black box problem, we can turn, as we so often can, to Twitter. Democratic Representative of New York’s 14th District Alexandria Ocasio-Cortez (AOC) recently put out a short video in which she stated, “Algorithms are still made by human beings… If you don’t fix the bias, you’re just automating [it].” Ryan Saavedra, a reporter for the Daily Wire, later tweeted the video and fired back, “Socialist Rep. Alexandria Ocasio-Cortez (D-NY) claims that algorithms, which are driven by math, are racist.” There are a few truths to uncover here. The first is that AOC is right: machine-learning algorithms follow a set of instructions given to them, and those instructions are a product of the very human assumptions and opinions of those who code them. The second is that Saavedra’s point exposes a common defense against claims of algorithmic bias: math can’t be racist, it’s math! The question, though, isn’t about the math; it’s about who writes it and what it’s used for. It’s important to remember that, although they might be represented in equations, the frameworks coded into algorithms are human decisions and opinions.
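To make the distinction concrete, consider the invented screening function below. The arithmetic is perfectly neutral; the choices baked into it are not. Every feature, weight, and threshold here is made up, but each one stands in for a decision a person had to make.

```python
# The math below is neutral; the choices are not. Which inputs to use,
# how heavily to weight them, and where to draw the cutoff are all human
# decisions. Every value here is invented purely for illustration.

def screen_applicant(years_experience: float, zip_code_score: float) -> bool:
    # Someone decided a neighborhood-based proxy belongs in this formula,
    # and someone decided how much it counts. The math just obeys.
    score = 0.6 * years_experience + 0.4 * zip_code_score
    return score >= 5.0                  # someone chose this threshold, too

print(screen_applicant(years_experience=6.0, zip_code_score=4.0))  # True
```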
Tufts Assistant Professor of Anthropology Nick Seaver specializes in the study of music recommenders. He hypothesizes that you can’t build a music recommendation algorithm without first making decisions about what is music and what is noise, what constitutes a melody, and what kinds of people like what music. These decisions are informed by cultural biases often so subtle that we don’t see them until they operate at macro scale, as they do in algorithms.
Joy Buolamwini, a computer scientist at the Massachusetts Institute of Technology, recently published a study demonstrating that facial recognition software is categorically worse at identifying darker-skinned people, with error rates for Black women as high as 35 percent. She attributes this to the majority-White, majority-male training sets (the images on which the software’s algorithms are based) used to build these kinds of systems. This vulnerability to bias, combined with our tendency to treat machines as objective arbiters of the future, is what makes transparency so important. As algorithms continue to take over more and more territory in our daily lives, it is doubly important that they have some sort of accountability built in.
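Part of what made that finding possible is methodological: a single overall accuracy number can hide exactly this kind of gap, so error rates have to be measured group by group. The sketch below shows that idea with placeholder records; a real audit would use a labeled benchmark like the one Buolamwini assembled.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    errors, totals = defaultdict(int), defaultdict(int)
    for group, truth, prediction in records:
        totals[group] += 1
        if prediction != truth:
            errors[group] += 1
    # One error rate per group, instead of a single overall number.
    return {group: errors[group] / totals[group] for group in totals}

# Placeholder data purely for illustration.
sample = [
    ("lighter-skinned man", "male", "male"),
    ("darker-skinned woman", "female", "male"),     # a misclassification
    ("darker-skinned woman", "female", "female"),
]
print(error_rates_by_group(sample))  # {'lighter-skinned man': 0.0, 'darker-skinned woman': 0.5}
```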
The reality is that the drive to automate rests in the hands of a select few, and in order to leverage some kind of access to the systems of tomorrow, it’s important to demand transparency, or at least explainability. Algorithms have sprung out of computer screens everywhere and into the big leagues; recent applications of algorithmic decision-making include investment bots on Wall Street (which account for approximately 80 percent of trades by volume) and programs that predict recidivism risk and determine bail. One tech company, Axon, has announced plans to develop technology that interprets police body camera footage without human input, generating reports to save time for officers who would normally write them by hand. Though this project is still far from fruition, the idea behind it is dangerous: a black box interpreting police data would base its judgments on how similar videos have been coded in the past. Considering the contentious history of racist policing and the use of police bodycams, it’s understandable that some might want to know how or why an algorithm makes a decision in a situation like this. To turn territory like this over to algorithms without some kind of transparency or accountability is to make a decision about the lives of many without a second thought. The idea that algorithms are simple mathematical formulas that operate without real consequence is not only passé, it is dangerous.
This is not meant to feel apocalyptic; rather, it’s a call to attention. Researchers, journalists, and activists alike have begun to call for greater transparency in algorithmic systems through a variety of methods. A new field called XAI, short for “explainable artificial intelligence,” has developed with the aim of building artificial intelligence systems that can at least offer viable explanations for their actions, if not complete transparency. The European Union recently adopted the General Data Protection Regulation, a sweeping piece of legislation that includes among its provisions an individual’s right to an explanation of the actions of an algorithm.
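As a rough illustration of what “explainable” can mean in practice, one common family of XAI techniques probes a black box by perturbing its inputs and watching how the output moves. The sketch below uses an invented scoring function as the stand-in black box; it is a simplified example of the general idea, not any particular library’s method.

```python
# A toy perturbation-based explanation: hide one feature at a time and
# see how much the score changes. Features whose removal moves the score
# the most are reported as the most influential.

def opaque_score(example: dict) -> float:
    # Pretend we cannot see inside this function; we can only call it.
    return 3.0 * example["income"] - 2.0 * example["debt"] + 0.1 * example["age"]

def explain_by_perturbation(score_fn, example: dict, baseline: float = 0.0):
    base = score_fn(example)
    influence = {}
    for feature in example:
        perturbed = dict(example, **{feature: baseline})   # "remove" one feature
        influence[feature] = base - score_fn(perturbed)
    # Sort features by how strongly they pulled on the prediction.
    return sorted(influence.items(), key=lambda kv: abs(kv[1]), reverse=True)

print(explain_by_perturbation(opaque_score, {"income": 4.0, "debt": 1.0, "age": 30.0}))
```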
The possibilities for creating positive change through artificial intelligence are practically endless. However, in order to sustain a technological future that is just and avoids the mistakes of the past, it is imperative that these systems be open to scrutiny. Artificial intelligence is a fact of life at this point, and it is no more immune to bias than the people who build it; transparency is what keeps that future accountable to the rest of us. Punch a computer scientist today!