A programmer has constructed an artificial intelligence based on an architecture similar to Marcus Hutter's AIXI model...This AI will maximize the reward given by a utility function the programmer has given it. Just as a test, he connects it to a 3D printer and sets the utility function to give reward proportional to the number of manufactured paper-clips.In the article, Why we should fear the paperclipper, Sandberg goes on to address a number of objections, including:
At first nothing seems to happen: the AI zooms through various possibilities. It notices that smarter systems generally can make more paper-clips, so making itself smarter will likely increase the number of paper-clips that will eventually be made. It does so. It considers how it can make paper-clips using the 3D printer, estimating the number of possible paper-clips. It notes that if it could get more raw materials it could make more paper-clips. It hence figures out a plan to manufacture devices that will make it much smarter, prevent interference with its plan, and will turn all of Earth (and later the universe) into paper-clips. It does so.
Only paper-clips remain.
- Such systems cannot be built
- Wouldn't the AI realize that this was not what the programmer meant?
- Wouldn't the AI just modify itself to *think* it was maximizing paper-clips?
- It is not really intelligent
- Creative intelligences will always beat this kind of uncreative intelligence
- Doesn't playing nice with other agents produce higher rewards?
- Wouldn't the AI be vulnerable to internal hacking: some of the subprograms it runs to check for approaches will attempt to hack the system to fulfil their own (random) goals?
- Nobody would be stupid enough to make such an AI
The strength of the AIXI "simulate them all, make use of the best"-approach is that it includes all forms of intelligence, including creative ones. So the paper-clip AI will consider all sorts of creative solutions. Plus ways of thwarting creative ways of stopping it.In the end, Sandberg concludes that we should still take this threat seriously:
In practice it will be having an overhead since it is runs all of them, plus the uncreative (and downright stupid). A pure AIXI-like system will likely always have an enormous disadvantage. An architecture like a Gödel machine that improves its own function might however overcome this.
This is a trivial, wizard's apprentice, case where powerful AI misbehaves. It is easy to analyse thanks to the well-defined structure of the system (AIXI plus utility function) and allows us to see why a super-intelligent system can be dangerous without having malicious intent. In reality I expect that if programming such a system did produce a harmful result it would not be through this kind of easily foreseen mistake. But I do expect that in that case the reason would likely be obvious in retrospect and not much more complex.