Sentient Developments: Can we build an artificial superintelligence that won't kill us?

At some point in our future, an artificial intelligence will emerge that's smarter, faster, and vastly more powerful than us. Once this happens, we'll no longer be in charge. But what will happen to humanity? And how can we prepare for this transition? We spoke to an expert to find out.

Luke Muehlhauser is the Executive Director of the Machine Intelligence Research Institute (MIRI) — a group that's dedicated to figuring out the various ways we might be able to build friendly smarter-than-human intelligence. Recently, Muehlhauser coauthored a paper with the Future of Humanity Institute's Nick Bostrom on the need to develop friendly AI.

io9: How did you come to be aware of the friendliness problem as it relates to artificial superintelligence (ASI)?

Muehlhauser: Sometime in mid-2010 I stumbled across a 1965 paper by I.J. Good, who worked with Alan Turing during World War II to decipher German codes. One paragraph in particular stood out:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an "intelligence explosion," and the intelligence of man would be left far behind... Thus the first ultraintelligent machine is the last invention that man need ever make.

I didn't read science fiction, and I barely knew what "transhumanism" was, but I immediately realized that Good's conclusion followed directly from things I already believed, for example that intelligence is a product of cognitive algorithms, not magic. I pretty quickly realized that the intelligence explosion would be the most important event in human history, and that the most important thing I could do would be to help ensure that the intelligence explosion has a positive rather than negative impact — that is, that we end up with a "Friendly" superintelligence rather than an unfriendly or indifferent superintelligence.

Initially, I assumed that the most important challenge of the 21st century would have hundreds of millions of dollars in research funding, and that there wouldn't be much value I could contribute on the margin. But in the next few months I learned to my shock and horror that fewer than five people in the entire world had devoted themselves full-time to studying the problem, and they had almost no funding. So in April 2011 I quit my network administration job in Los Angeles and began an internship with MIRI, to learn how I might be able to help. It turned out the answer was "run MIRI," and I was appointed MIRI's CEO in November 2011.

Spike Jonze's latest film, Her, has people buzzing about artificial intelligence. What can you tell us about the portrayal of AI in that movie and how it would compare to artificial superintelligence?

Her is a fantastic film, but its portrayal of AI is set up to tell a good story, not to be accurate. The director, Spike Jonze, didn't consult with computer scientists when preparing the screenplay, and this will be obvious to any computer scientists who watch the film.

Without spoiling too much, I'll just say that the AIs in Her, if they existed in the real world, would entirely transform the global economy. But in Her, the introduction of smarter-than-human, self-improving AIs hardly upsets the status quo at all. As economist Robin Hanson commented on Facebook:

Imagine watching a movie like Titanic where an iceberg cuts a big hole in the side of a ship, except in this movie the hole only affects the characters by forcing them to take different routes to walk around, and gives them more welcomed fresh air. The boat never sinks, and no one ever fears it might. That's how I feel watching the movie Her.

AI theorists like yourself warn that we may eventually lose control of our machines, a potentially sudden and rapid transition driven by two factors, computing overhang and recursive self-improvement. Can you explain each of these?

It's extremely difficult to control the behavior of a goal-directed agent that is vastly smarter than you are. This problem is much harder than a normal (human-human) principal-agent problem.

If we got to tinker with different control methods, and make lots of mistakes, and learn from those mistakes, maybe we could figure out how to control a self-improving AI with 50 years of research. Unfortunately, it looks like we may not have the opportunity to make so many mistakes, because the transition from human control of the planet to machine control might be surprisingly rapid. Two reasons for this are computing overhang and recursive self-improvement.

In our paper, my coauthor (Oxford's Nick Bostrom) and I describe computing overhang this way:

Suppose that computing power continues to double according to Moore's law, but figuring out the algorithms for human-like general intelligence proves to be fiendishly difficult. When the software for general intelligence is finally realized, there could exist a 'computing overhang': tremendous amounts of cheap computing power available to run [AIs]. AIs could be copied across the hardware base, causing the AI population to quickly surpass the human population.
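
To make the arithmetic behind a computing overhang concrete, here is a minimal Python sketch. It is a toy model and does not come from the paper: the function name `copies_runnable`, its parameters, and every number passed to it are hypothetical, and it assumes available computing power doubles at a fixed interval.

```python
def copies_runnable(initial_capacity, cost_per_ai, doubling_years, delay_years):
    """Toy estimate of how many AI instances the hardware base could run.

    All arguments are hypothetical illustration values:
      initial_capacity -- computing power available today (arbitrary units)
      cost_per_ai      -- computing power one AI instance needs (same units)
      doubling_years   -- assumed doubling period for available computing power
      delay_years      -- years until the software for general AI exists
    """
    # Capacity after the delay, assuming steady doubling.
    capacity = initial_capacity * 2 ** (delay_years / doubling_years)
    # Whole instances that fit into that capacity.
    return int(capacity // cost_per_ai)


if __name__ == "__main__":
    # Hardware that could run exactly one instance today, software 20 years late,
    # doubling every 2 years: ten doublings have accumulated by then.
    print(copies_runnable(1.0, 1.0, 2.0, 20.0))
```

Under these assumed numbers, ten doublings accumulate while the software is still unsolved, so hardware that could host one instance at the start could host 1,024 on the day the software arrives. The longer the software lags the hardware, the larger the overhang.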

Another reason for a rapid transition from human control to machine control is the one first described by I.J. Good, what we now call recursive self-improvement. An AI with general intelligence would correctly realize that it will be better able to achieve its goals — whatever its goals are — if it does original AI research to improve its own capabilities. That is, self-improvement is a "convergent instrumental value" of almost any "final" values an agent might have, which is part of why self-improvement books and blogs are so popular. Thus, Bostrom and I write:

When we build an AI that is as skilled as we are at the task of designing AI systems, we may thereby initiate a rapid, AI-motivated cascade of self-improvement cycles. Now when the AI improves itself, it improves the intelligence that does the improving, quickly leaving the human level of intelligence far behind.
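
The difference between ordinary improvement and recursive self-improvement can be sketched with a toy Python model. This is an illustration under loud assumptions, not a model from the paper: the function names, the single `level` number standing in for "intelligence," and the fixed `gain` per cycle are all hypothetical simplifications.

```python
def externally_improved(start, gain, cycles):
    """The improver stays at its starting skill; only the product gets better."""
    level = start
    for _ in range(cycles):
        # Each cycle adds a fixed amount, set by the improver's unchanged skill.
        level += gain * start
    return level


def self_improved(start, gain, cycles):
    """The system being improved is also the one doing the improving."""
    level = start
    for _ in range(cycles):
        # Each cycle's step scales with the current level, so steps keep growing.
        level += gain * level
    return level


if __name__ == "__main__":
    for cycles in (4, 8, 16):
        print(cycles, externally_improved(1.0, 0.5, cycles), self_improved(1.0, 0.5, cycles))
```

In this toy setup the externally improved system grows linearly, while the self-improving one compounds, because every gain also raises the size of the next gain. Real systems would face diminishing returns and hardware limits that the sketch ignores.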

Some people believe that we'll have nothing to fear from advanced AI out of a conviction that something so astoundingly smart couldn't possibly be stupid or mean enough to destroy us. What do you say to people who believe an ASI will be naturally more moral than we are?

In AI, the system's capability is roughly "orthogonal" to its goals. That is, you can build a really smart system aimed at increasing Shell's stock price, or a really smart system aimed at filtering spam, or a really smart system aimed at maximizing the number of paperclips produced at a factory. As you improve the intelligence of the system, or as it improves its own intelligence, its goals don't particularly change — rather, it simply gets better at achieving whatever its goals already are.
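
A small Python sketch can illustrate the idea that capability and goals vary independently. It is a toy stand-in, assuming a simple random hill-climbing search: the function `hill_climb`, the two objective functions, and the use of `steps` as a proxy for capability are all hypothetical and are not drawn from Bostrom (2012).

```python
import random


def hill_climb(objective, start, steps, seed=0):
    """Generic search: keep any random nearby candidate that scores higher.

    `steps` is a crude proxy for capability; `objective` is the goal.
    The search code never inspects or alters the objective it is handed.
    """
    rng = random.Random(seed)
    best = start
    for _ in range(steps):
        candidate = best + rng.uniform(-1.0, 1.0)
        if objective(candidate) > objective(best):
            best = candidate
    return best


def near_plus_ten(x):
    """Hypothetical goal A: get as close to +10 as possible."""
    return -(x - 10.0) ** 2


def near_minus_ten(x):
    """Hypothetical goal B: get as close to -10 as possible."""
    return -(x + 10.0) ** 2


if __name__ == "__main__":
    for steps in (10, 100, 1000):
        print(steps, hill_climb(near_plus_ten, 0.0, steps), hill_climb(near_minus_ten, 0.0, steps))
```

Giving the same search more steps makes it better at whichever objective it was handed, and nothing in the added effort pulls the two runs toward a shared target.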

There are some caveats and subtle exceptions to this general rule, and some of them are discussed in Bostrom (2012). But the main point is that we shouldn't stake the fate of the planet on a risky bet that all mind designs we might create eventually converge on the same moral values, as their capabilities increase. Instead, we should fund lots of really smart people to think hard about the general challenge of superintelligence control, and see what kinds of safety guarantees we can get with different kinds of designs.

Why can't we just isolate potentially dangerous AIs and keep them away from the Internet?

Such "AI boxing" methods will be important during the development phase of Friendly AI, but they're not a full solution to the problem, for two reasons.

First, even if the leading AI project is smart enough to carefully box their AI, the next five AI projects won't necessarily do the same. There will be strong incentives to let one's AI out of the box, if you think it might (e.g.) play the stock market for you and make you billions of dollars. Whatever you built the AI to do, it'll be better able to do it for you if you let it out of the box. Besides, if you don't let it out of the box, the next team might, and their design might be even more dangerous.

Second, AI boxing pits human intelligence against superhuman intelligence, and we can't expect the former to prevail indefinitely. Humans can be manipulated, boxes can be escaped via surprising methods, etc. There's a nice chapter on this subject in Bostrom's forthcoming book from Oxford University Press, titled Superintelligence: Paths, Dangers, Strategies.

Still, AI boxing is worth researching, and should give us a higher chance of success even if it isn't an ultimate solution to the superintelligence control problem.

It has been said that an AI 'does not love you, nor does it hate you, but you are made of atoms it can use for something else.' The trick, therefore, will be to program each and every ASI so that it is "friendly" or adheres to human, or humane, values. But given our poor track record, what are some potential risks of insisting that superhuman machines be made to share all of our current values?

I really hope we can do better than programming an AI to share (some aggregation of) current human values. I shudder to think what would have happened if the Ancient Greeks had invented machine superintelligence, and given it some version of their most progressive moral values of the time. I get a similar shudder when I think of programming current human values into a machine superintelligence.

So what we probably want is not a direct specification of values, but rather some algorithm for what's called indirect normativity. Rather than programming the AI with some list of ultimate values we're currently fond of, we instead program the AI with some process for learning what ultimate values it should have, before it starts reshaping the world according to those values. There are several abstract proposals for how we might do this, but they're at an early stage of development and need a lot more work.

In conjunction with the Future of Humanity Institute at Oxford, MIRI is actively working to address the unfriendliness problem — even before we know anything about the design of future AIs. What's your current strategy?

Yes, as far as I know, only MIRI and FHI are funding full-time researchers devoted to the superintelligence control problem. There's a new group at Cambridge University called CSER that might hire additional researchers to work on the problem as soon as they get funding, and they've gathered some really top-notch people as advisors — including Stephen Hawking and George Church.

FHI's strategy thus far has been to assemble a map of the problem and our strategic situation with respect to it, and to try to get more researchers involved, e.g. via the AGI Impacts conference in 2012.

MIRI works closely with FHI and has also done this kind of "strategic analysis" research, but we recently decided to specialize in Friendly AI math research, primarily via math research workshops tackling various sub-problems of Friendly AI theory. To get a sense of what Friendly AI math research currently looks like, see these results from our latest workshop, and see my post From Philosophy to Math to Engineering.

What's the current thinking on how we can develop an ASI that's both human-friendly and incapable of modifying its core values?

I suspect the solution to the "value loading problem" (how do we get desirable goals into the AI?) will be something that qualifies as an indirect normativity approach, but even that is hard to tell at this early stage.

As for making sure the system keeps those desirable goals even as it modifies its core algorithms for improved performance — well, we're playing with toy models of that problem via the "tiling agents" family of formalisms, because toy models are a common method for making research progress on poorly-understood problems, but the toy models are very far from how a real AI would work.

How optimistic are you that we can solve this problem? And how could we benefit from a safe and friendly ASI that's not hell-bent on destroying us?

The benefits of Friendly AI would be literally astronomical. It's hard to say how something much smarter than me would optimize the world if it were guided by values more advanced than my own, but I think an image that evokes the appropriate kind of sentiment would be: self-replicating spacecraft planting happy, safe, flourishing civilizations throughout our galactic supercluster — that kind of thing.

Superintelligence experts — meaning, those who research the problem full-time, and are familiar with the accumulated evidence and arguments for and against various positions on the subject — have differing predictions about whether humanity is likely to solve the problem.

As for myself, I'm pretty pessimistic. The superintelligence control problem looks much harder to solve than, say, the global risks from global warming or synthetic biology, and I don't think our civilization's competence and rationality are improving quickly enough for us to be able to solve the problem before the first machine superintelligence is built. But this hypothesis, too, is one that can be studied to improve our predictions about it. We took some initial steps in studying this question of "civilization adequacy" here.