Below is an essay I wrote for a class on Technology and Ethics.

In his book Human Compatible, Prof. Stuart Russell lays out his view of the future of artificial intelligence. Drawing on his years of work and research in the field, Russell separates apocalyptic fact from fiction and identifies what he believes is the fundamental flaw in AI so far: the standard model of intelligence. As a revision of that model, he proposes the concept of provably beneficial AI. Although Russell’s goal of provably beneficial AI is realizable and fixes a fundamental flaw in the field’s base assumptions, I believe that the core barrier to achieving it is the same problem that plagues most other systems in our society. By drawing parallels to capitalism, corporations, and government, I hope to show that Russell’s revisions to the standard model need to be applied to all systems in order to secure a future for ourselves. Russell’s provably beneficial AI solves the greatest dangers inherent in the standard model of intelligence, but it also reveals the path to achieving a provably beneficial society.
In order to understand what Russell means by provably beneficial AI, it is important to understand what the field of AI has looked like so far. When we refer to AI, we are simply referring to intelligent machines. Intelligence, in this case, refers to the standard model of intelligence, under which “machines are intelligent to the extent that their actions can be expected to achieve their objectives” (Russell p. 10, Kindle). The problems with AI design, Russell says, stem from this standard model. Described by Norbert Wiener as the “purpose put into the machine,” the objective being optimized under the standard model may well be the objective we gave the machine, but the unintended consequences may end up hurting us in ways we did not or could not expect (Russell p. 10, Kindle). Referred to as the King Midas problem, this value misalignment occurs when we set an objective that we think is what we need, without considering what the AI will do to achieve it. Russell outlines several examples in chapter five where objectives that seem aligned with our happiness and benefit go horribly awry, one of which is setting the goal of fixing ocean pH. To achieve this goal, the AI pulls oxygen from the atmosphere, fixing the ocean but killing us in the process. Russell’s solution is to change course by adjusting some of our most basic assumptions, moving from the goal of designing ever more intelligent machines to designing provably beneficial AI.
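To make the failure mode concrete, here is a minimal toy sketch of a standard-model optimizer, written in Python. The actions and numbers are my own hypothetical illustration rather than anything from the book; the point is only that an agent scoring actions by a single fixed objective is structurally blind to everything left out of that objective.

```python
# Toy illustration of the King Midas problem: an optimizer with a fixed
# objective picks the action that best achieves it, blind to side effects.
# All actions and numbers are hypothetical.

actions = {
    # action: (ocean_pH_restored, atmospheric_oxygen_lost)
    "seed_oceans_with_limestone": (0.6, 0.00),
    "catalytic_reaction":         (1.0, 0.25),  # fixes pH fastest, strips oxygen
    "do_nothing":                 (0.0, 0.00),
}

def standard_model_agent(actions):
    """Maximizes the stated objective (pH restored) and nothing else."""
    return max(actions, key=lambda a: actions[a][0])

best = standard_model_agent(actions)
ph, o2_lost = actions[best]
print(f"chosen action: {best}, pH restored: {ph}, oxygen lost: {o2_lost:.0%}")
# -> the agent picks 'catalytic_reaction': the stated objective is met,
#    but the unstated preference (breathable air) is destroyed.
```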
Russell defines the goal of provably beneficial AI as designing “machines with a high degree of intelligence — so that they can help us with difficult problems — while ensuring that those machines never behave in ways that make us seriously unhappy” (Russell p. 170, Kindle). “Provably” refers to mathematical proofs: given a set of axioms, we can guarantee that machines will behave as expected. “Beneficial” means that machines will “achieve our objectives rather than their objectives” (Russell p. 171, Kindle). Russell thinks we can achieve this by sticking to three design principles:
1. “The machine’s only objective is to maximize the realization of human preferences.”
2. “The machine is initially uncertain about what those preferences are.”
3. “The ultimate source of information about human preferences is human behavior.” (Russell p. 172, Kindle)
The first principle guarantees beneficial machines because the machine’s only purpose is to benefit humans. Instead of coding in self-preservation, for example, the machine should arrive at self-preservation as a consequence of meeting the needs of humans, since it cannot help realize human preferences if it is broken. The second principle ties back to the “purpose put into the machine”: the machine cannot have objectives built into it, as it would maximize those objectives instead of working to benefit humans. The final principle ensures a constant feedback loop in which the AI learns to predict our preferences from the choices we make, tying back to the other half of the standard model, where “Humans are intelligent to the extent that our actions can be expected to achieve our objectives” (Russell p. 9, Kindle). Rather than having us define our objective (which is ever-changing and constrained by context), the AI works instead to understand us through our behavior. As Russell puts it, “Machines designed in this way will defer to humans: they will ask permission; they will act cautiously when guidance is unclear; and they will allow themselves to be switched off” (Russell p. 247, Kindle). Russell’s concept of provably beneficial AI does solve the problems of machines that operate on the old standard model, since its major flaws, optimizing for a fixed objective and the inability to shut the system off, are both addressed. However, many of Russell’s criticisms apply to other types of systems, and I believe that these systems are the final hurdle to achieving a provably beneficial society.
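To see why the second principle produces deference, consider a minimal sketch loosely inspired by the off-switch reasoning Russell describes. The belief distribution, the three-way choice, and all numbers here are my own illustrative assumptions rather than the book’s formal model: a machine that is uncertain how much we value an action does better, by its own expected-utility calculation, to let us veto the action than to act unilaterally or shut itself down.

```python
# Sketch of deference under preference uncertainty. The machine holds a
# belief (here, a bag of samples) over the human's utility U for a
# proposed action, and compares three choices. Illustrative numbers only.
import random

random.seed(0)
# Machine's uncertain belief about how much the human values the action.
belief_over_U = [random.gauss(0.1, 1.0) for _ in range(10_000)]

act_now = sum(belief_over_U) / len(belief_over_U)  # E[U]
switch_off = 0.0                                   # guaranteed nothing
# Deferring: the human allows the action only when it actually helps
# (U > 0) and switches the machine off otherwise, so the machine earns
# E[max(U, 0)].
defer = sum(u for u in belief_over_U if u > 0) / len(belief_over_U)

print(f"act now: {act_now:.3f}, switch off: {switch_off:.3f}, defer: {defer:.3f}")
# Deferring weakly dominates: E[max(U, 0)] >= max(E[U], 0), and the gap
# grows with the machine's uncertainty about our preferences, so keeping
# the human in the loop is the machine's own best move.
```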
In describing the pitfalls of the current standard model of intelligence and its effects on AI, Russell outlines the necessity of removing the assumption that machines should have a definite objective (Russell p. 12, Kindle). To illustrate how broadly his point applies, I have made some additions, in parentheses, to an excerpt from the text.
“Removing the assumption that machines (businesses/corporations) should have a definite objective (profit) means that we will need to tear out and replace part of the foundations of artificial intelligence (economics) — the basic definitions of what we are trying to do. That also means rebuilding a great deal of the superstructure — the accumulation of ideas and methods for actually doing AI (commerce). The result will be a new relationship between humans and machines (businesses/corporations), one that I hope will enable us to navigate the next few decades successfully.” (Russell p. 12, Kindle)
Just as setting a clickthrough objective for content-selection algorithms had the undesired consequence of modifying users’ minds to maximize clickthrough, optimizing corporations for profit and growth has produced similar unintended consequences. Russell writes, “Not bad for a few lines of code, even if it had a helping hand from some humans. Now imagine what a really intelligent algorithm would be able to do” (Russell p. 8, Kindle), to which I would point out that climate change counter-movement organizations, funded and organized by various foundations and corporations, already have massively destabilizing effects on the world (Brulle, 2014). By optimizing for profit, corporations have generally been able to ignore the negative externalities that enable massive profits.
Even if provably beneficial AI were created to the specifications Russell outlines, I believe it would ultimately exacerbate existing problems without addressing the inequalities of power, and by extension wealth, inherent in the world. The core of Russell’s solution is the machine’s final deference to humans. While this solves the problem inherent in the standard model, it unfortunately does not address anything outside of AI. “Obviously, the actions of loyal machines will need to be constrained by rules and prohibitions, just as the actions of humans are constrained by laws and social norms” (Russell p. 215, Kindle). Here, Russell outlines what is, in my opinion, the greatest gain to be had from provably beneficial AI: a properly designed AI would in fact be bound by the rules and prohibitions set upon it by its creators. However, there is a crucial difference between AI and humans here. Unlike such an AI, humans are capable of evading the laws and social norms that govern society. “In summary, he will find innumerable ways to benefit Harriet at the expense of others — ways that are strictly legal but become intolerable when carried out on a large scale. Societies will find themselves passing hundreds of new laws every day to counteract all the loopholes that machines will find in existing laws” (Russell p. 216, Kindle). The immediate parallels that come to mind are corporations and interest groups lobbying for laws that benefit these systems rather than people.
“Elections are supposed to communicate preferences to the government, but they seem to have a remarkably small bandwidth (on the order of one byte of information every few years) for such a complex task. In far too many countries, the government is simply a means for one group of people to impose its will on others. Corporations go to greater lengths to learn the preferences of customers, whether through market research or direct feedback in the form of purchase decisions. On the other hand, the molding of human preferences through advertising, cultural influences, and even chemical addiction is an accepted way of doing business.” (Russell p. 248, Kindle)
I would argue that in our current society, “utility monsters” already exist in the form of the rich and powerful, and of corporations in the US, owing to their legal status as “people.” This would explain why laws and regulations in the US seem to favor the rich, the powerful, and the corporations: the argument being made is that helping these actors is how the greatest economic gains are achieved. Corporations maximize their profit, and since that is the objective of that system, they will pursue it over all other things, including the welfare of humans and the environment.
“A machine that assumes it knows the true objective perfectly will pursue it single-mindedly. It will never ask whether some course of action is OK because it already knows it’s an optimal solution for the objective. It will ignore humans jumping up and down screaming, ‘Stop, you’re going to destroy the world!’ because those are just words.” (Russell p. 174, Kindle)
I want to close by examining and expanding upon a particular part of the text. When discussing the concept of value in provably beneficial AI, Russell states, “The meaning I want is the technical one: I just want to make sure the machines give me the right pizza and don’t accidentally destroy the human race” (Russell p. 177, Kindle). This is where I see the largest benefit AI can afford us. I want to dissect the second part of the statement, as the goal of not destroying the human race reveals how individual utility maximization might produce better optimization for the group. Beyond getting the right pizza, not destroying the human race should encompass both short- and long-term effects, addressing one of the greatest shortcomings of humans: our poor ability to reason about uncertainty (Russell p. 235, Kindle). The decision should take supply chains and waste into consideration, providing the individual with the pizza they want in the way that yields the greatest utility for the overall system, since that is what guarantees our long-term survival. Ideally, we would learn to properly calculate the negative externalities of our rampant production and consumption and properly mitigate them, as in the sketch below.
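As a toy illustration of that kind of accounting, the sketch below ranks hypothetical ways of fulfilling one person’s pizza preference by their net utility to the whole system. Every option name and number is invented for illustration; nothing here comes from Russell.

```python
# Hypothetical sketch: choosing how to fulfill one person's preference
# (the "right pizza") while charging each option for its estimated
# negative externalities (supply chain, waste). Invented numbers.

options = {
    # option: (preference_satisfaction, externality_cost)
    "local_ingredients_wood_oven": (0.90, 0.10),
    "air_freighted_specialty":     (0.98, 0.55),
    "mass_produced_frozen":        (0.60, 0.30),
}

def net_utility(option):
    """Individual satisfaction minus the cost imposed on everyone else."""
    satisfaction, externality = options[option]
    return satisfaction - externality

best = max(options, key=net_utility)
print(best, round(net_utility(best), 2))
# -> 'local_ingredients_wood_oven' wins: a slightly worse pizza for the
#    individual, but the greatest utility for the overall system.
```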
If the field of AI adopts the spirit of Russell’s provably beneficial AI, our problems will once again be ours to solve. Unfortunately, various other systems are plagued by the same rogue optimization problem currently affecting AI. Russell touches on these issues in chapter seven, stating, “Economic competition occurs not just between corporations but also between nations” (Russell p. 181, Kindle). This competition between corporations, nations, and humans has always existed. I can only hope that by the time provably beneficial AI arrives, we will have solved some of these problems. I would argue that the greatest hurdles are the power and wealth structures (systems) already inherent in society.
Brulle, R. J. (2014). Institutionalizing delay: Foundation funding and the creation of U.S. climate change counter-movement organizations. Climatic Change, 122(4), 681–694. https://doi.org/10.1007/s10584-013-1018-7