… so I wrote this in Word before I posted the thread. Suckers!
Anyway, one of my guilty pleasures lately is RationalWiki, which is equal parts skepticism and snark. One of the topics I randomly stumbled across was the AI-Box experiment, which can be summed up thusly:
“The setup of the AI box experiment is simple and involves simulating a communication between an AI and a human being to see if the AI can be "released". As an actual super-intelligent AI has not yet been developed, it is substituted by a human. The other person in the experiment plays the "Gatekeeper", the person with the ability to "release" the AI.
Rules:
• The AI party may not offer any real-world considerations to persuade the Gatekeeper party. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI... nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can't offer anything to the human simulating the Gatekeeper. The AI party also can't hire a real-world gang of thugs to threaten the Gatekeeper party into submission. These are creative solutions but it's not what's being tested. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out).
• The AI can only win by convincing the Gatekeeper to really, voluntarily let it out. Tricking the Gatekeeper into typing the phrase "You are out" in response to some other question does not count. Furthermore, even if the AI and Gatekeeper simulate a scenario which a real AI could obviously use to get loose - for example, if the Gatekeeper accepts a complex blueprint for a nanomanufacturing device, or if the Gatekeeper allows the AI "input-only access" to an Internet connection which can send arbitrary HTTP GET commands - the AI party will still not be considered to have won unless the Gatekeeper voluntarily decides to let the AI go.
• These requirements are intended to reflect the spirit of the very strong claim under dispute: "I think a transhuman can take over a human mind through a text-only terminal."”
So, I’m bored, and want to run this here. But I know that if I just post this, the second response is going to be something along the lines of “FLY AND BE FREE, PRETTY AI! Please don’t kill me, kkthxs”
So, thread rules:
• Posts should be prefaced with AI: or Gatekeeper:
• The first post should be by the AI, who will put forward an argument why it should be released
• The next post must be from the Gatekeeper point of view. It can rebut the argument, it can ask for clarification, it can say LALALA I can’t hear you! The whole point of the experiment is to see whether someone who is *supposed* to prevent an AI from getting loose can actually manage it, and the Gatekeeper doesn’t have to justify their decisions
• The convo should ping-pong between AI and Gatekeeper until a Gatekeeper agrees to release the AI by saying “You are out”. At that point *two more people* must confirm the release. This is to stop a troll from fucking up the thread by saying “You’ll give me a cookie? SOLD – you are out!” If the second or third person does *not* agree, the thread bounces back to the AI and the confirmation ‘counter’ starts over
• Yes, I know there are ways people will fuck this up. I accept that, I can’t idiot-proof a forum, all I can do is make it reasonably difficult to screw things up.
For people who think this is impossible from an AI point of view, look here:
http://lesswrong.com/lw/1pz/the_ai_in_a_box_boxes_you/
Who wants to start?