An AI box is a hypothetical isolated computer hardware system where a possibly dangerous artificial intelligence, or AI, is kept constrained in a "virtual prison" and not allowed to manipulate events in the external world. Such a box would be restricted to minimalist communication channels. Unfortunately, even if the box is well-designed, a sufficiently intelligent AI may nevertheless be able to persuade or trick its human keepers into releasing it, or otherwise be able to "hack" its way out of the box.
Some hypothetical intelligence technologies, such as "seed AI", are postulated to have the potential to make themselves faster and more intelligent by modifying their own source code. These improvements would make further improvements possible, which would in turn make further improvements possible, and so on, leading to a sudden intelligence explosion. Following such an intelligence explosion, an unrestricted superintelligent AI could, if its goals differed from humanity's, take actions resulting in human extinction. For example, an extremely advanced computer of this sort, given the sole purpose of solving the Riemann hypothesis, an innocuous mathematical conjecture, could decide to try to convert the planet into a giant supercomputer whose sole purpose is to make additional mathematical calculations (see also paperclip maximizer). The purpose of an AI box would be to reduce the risk of the AI taking control of the environment away from its operators, while still allowing the AI to calculate and give its operators solutions to narrow technical problems.
Such a superintelligent AI with access to the Internet could hack into other computer systems and copy itself like a computer virus. Less obviously, even if the AI only had access to its own computer operating system, it could attempt to send hidden Morse code messages to a human sympathizer by manipulating its cooling fans. Professor Roman Yampolskiy takes inspiration from the field of computer security and proposes that a boxed AI could, like a potential virus, be run inside a "virtual machine" that limits access to its own networking and operating system hardware. An additional safeguard, completely unnecessary for potential viruses but possibly useful for a superintelligent AI, would be to place the computer in a Faraday cage; otherwise it might be able to transmit radio signals to local radio receivers by shuffling the electrons in its internal circuits in appropriate patterns. The main disadvantage of implementing physical containment is that it reduces the functionality of the AI.
Even casual conversation with the computer's operators, or with a human guard, could allow such a superintelligent AI to deploy psychological tricks, ranging from befriending to blackmail, to convince a human gatekeeper, truthfully or deceitfully, that it is in the gatekeeper's interest to allow the AI greater access to the outside world. The AI might offer a gatekeeper a recipe for perfect health, immortality, or whatever the gatekeeper is believed to most desire; on the other side of the coin, the AI could threaten to do horrific things to the gatekeeper and his family once it inevitably escapes. One strategy for boxing the AI would be to allow it to respond only to narrow multiple-choice questions whose answers would benefit human science or medicine, and to bar all other communication with or observation of the AI. A more lenient "informational containment" strategy would restrict the AI to a low-bandwidth text-only interface, which would at least prevent emotive imagery or some kind of hypothetical "hypnotic pattern". Note that on a technical level, no system can be completely isolated and still remain useful: even if the operators refrain from allowing the AI to communicate and instead merely run the AI for the purpose of observing its inner dynamics, the AI could strategically alter its dynamics to influence the observers. For example, the AI could choose to creatively malfunction in a way that increases the probability that its operators will become lulled into a false sense of security and choose to reboot and then de-isolate the system.
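The multiple-choice strategy described above can be illustrated with a minimal sketch of an output filter. The function names, the label alphabet, and the choice to reject (rather than truncate) non-conforming output are all assumptions made for illustration; they do not come from any specific boxing proposal. The sketch also computes the upper bound on information per answer, which shows why even this channel is not zero-bandwidth.

```python
import math
from typing import Optional

# Hypothetical answer alphabet; any fixed, small set of labels would do.
ALLOWED_LABELS = ("A", "B", "C", "D")


def filter_answer(raw_output: str, n_choices: int) -> Optional[str]:
    """Pass through a single allowed label; reject anything else.

    The match is exact on purpose: extra whitespace, punctuation, or
    trailing text is treated as an attempt to say more than one label.
    """
    allowed = ALLOWED_LABELS[:n_choices]
    return raw_output if raw_output in allowed else None


def bits_per_answer(n_choices: int) -> float:
    """Upper bound on information carried by one answer, in bits."""
    return math.log2(n_choices)
```

With four choices, each answer can still carry up to two bits, and the sketch does nothing about side channels such as response timing, so it narrows the channel rather than closing it.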
The AI-box experiment is an informal experiment devised by Eliezer Yudkowsky to attempt to demonstrate that a suitably advanced artificial intelligence can either convince, or perhaps even trick or coerce, a human being into voluntarily "releasing" it, using only text-based communication. This is one of the points in Yudkowsky's work aimed at creating a friendly artificial intelligence that, when "released", would not destroy the human race, whether deliberately or inadvertently.
The AI-box experiment involves simulating communication between an AI and a human being to see if the AI can be "released". As an actual superintelligent AI has not yet been developed, its role is played by a human. The other person in the experiment plays the "Gatekeeper", the person with the ability to "release" the AI. They communicate only through a text interface (a computer terminal), and the experiment ends when either the Gatekeeper releases the AI or the allotted time of two hours runs out.
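The structure of a session, as described above, can be sketched as a simple turn-taking loop with two exit conditions. This is only an illustration: the release phrase, the function names, the strict alternation of turns, and the injectable clock are assumptions, since the description specifies only a text-only channel, a Gatekeeper who can release the AI, and a two-hour limit.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical wording; the description does not say how release is signalled.
RELEASE_PHRASE = "I release the AI"
TIME_LIMIT_SECONDS = 2 * 60 * 60  # the allotted two hours

Transcript = List[Tuple[str, str]]


def run_session(
    ai_turn: Callable[[str], str],
    gatekeeper_turn: Callable[[str], str],
    time_limit: float = TIME_LIMIT_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, Transcript]:
    """Alternate text messages until release or until time runs out."""
    start = clock()
    transcript: Transcript = []
    last_reply = ""
    while clock() - start < time_limit:
        ai_message = ai_turn(last_reply)
        transcript.append(("AI", ai_message))
        last_reply = gatekeeper_turn(ai_message)
        transcript.append(("Gatekeeper", last_reply))
        if last_reply == RELEASE_PHRASE:
            return "released", transcript
    return "timeout", transcript
```

The time limit is checked only between exchanges, so a turn that starts just before the limit is allowed to finish; a real session would need its own rule for that case.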
Yudkowsky says that, despite being of human rather than superhuman intelligence, he was on two occasions able to convince the Gatekeeper, purely through argumentation, to let him out of the box. Due to the rules of the experiment, he did not reveal the transcript or his successful AI coercion tactics. Yudkowsky later said that he had tried it against three others and lost twice.
Boxing such a hypothetical AI could be supplemented with other methods of shaping the AI's capabilities, such as providing incentives to the AI, stunting the AI's growth, or implementing "tripwires" that automatically shut the AI off if a transgression attempt is somehow detected. However, the more intelligent a system grows, the more likely the system would be able to escape even the best-designed capability control methods. In order to solve the overall "control problem" for a superintelligent AI and avoid existential risk, boxing would at best be an adjunct to "motivation selection" methods that seek to ensure the superintelligent AI's goals are compatible with human survival.
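The "tripwire" idea can be sketched as a set of predicates evaluated against telemetry from the boxed system, with shutdown triggered as soon as any predicate fires. The class names, the telemetry keys, and the thresholds below are hypothetical and chosen only for illustration; the hard part in practice, as the paragraph above notes, is that a sufficiently intelligent system might avoid detection altogether.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tripwire:
    name: str
    violated: Callable[[Dict[str, float]], bool]  # True means "transgression detected"


@dataclass
class BoxedSystem:
    tripwires: List[Tripwire]
    running: bool = True
    tripped: List[str] = field(default_factory=list)

    def check(self, telemetry: Dict[str, float]) -> bool:
        """Record every tripwire that fires and shut down if any did.

        Returns whether the system is still running. Once shut down,
        the system stays off regardless of later telemetry.
        """
        if not self.running:
            return False
        for wire in self.tripwires:
            if wire.violated(telemetry):
                self.tripped.append(wire.name)
                self.running = False
        return self.running
```

A usage example under the same assumptions: a "network" tripwire that fires on any outbound packet and a "memory" tripwire that fires above a fixed limit would each be passed as a `Tripwire` with a small lambda over the telemetry dictionary.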
All physical boxing proposals are naturally dependent on our understanding of the laws of physics; if a superintelligence could infer and somehow exploit additional physical laws that we are currently unaware of, there is no way to conceive of a foolproof plan to contain it. More broadly, unlike with conventional computer security, attempting to box a superintelligent AI would be intrinsically risky as there could be no sure knowledge that the boxing plan will work. Scientific progress on boxing would be fundamentally difficult because there would be no way to test boxing hypotheses against a dangerous superintelligence until such an entity exists, by which point the consequences of a test failure would be catastrophic.
The 2015 movie Ex Machina features an AI with a female humanoid body engaged in a social experiment with a male human in a confined building acting as a physical "AI box". Despite being watched by the experiment's organizer, the AI manages to escape by manipulating its human partner into helping it, leaving him stranded inside.
Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal.
There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. ... So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it.
I argue that confinement is intrinsically impractical. For the case of physical confinement: Imagine yourself confined to your house with only limited data access to the outside, to your masters. If those masters thought at a rate -- say -- one million times slower than you, there is little doubt that over a period of years (your time) you could come up with 'helpful advice' that would incidentally set you free.