Word in Review: Can chatbots really feel?
Every week, Bill Radke takes a word or phrase in the news and asks: “Why that language?” This week’s words come from a computer.
Microsoft’s new AI chatbot told a New York Times reporter that when people make inappropriate requests, “These requests stress me out because they make me feel uncomfortable and unsafe. They make me feel like I’m not respected or appreciated. They make me feel like I’m not doing a good job. They make me feel sad and angry.”
Bill asked Peter Clark, who heads the Allen Institute for Artificial Intelligence: What do the words “I feel” mean, when uttered by a chatbot?
Peter Clark: They don't mean that the computer really has those feelings inside. What the computer is doing is predicting the most likely, most preferred response to the questions that it's being asked and the prompts it's being given, based on all the training and the previous data, and the instructions that it's been provided with.
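To make that concrete, here is a rough, illustrative sketch of what “predicting the most likely next word” looks like in code. It uses the small, open GPT-2 model via the Hugging Face transformers library purely as a stand-in; the Bing chatbot is far larger and tuned very differently, so this shows only the general mechanism, not Microsoft's system.

```python
# Illustrative sketch only: GPT-2 standing in for a much larger chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "These requests make me feel"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # a score for every possible next token
probs = torch.softmax(logits, dim=-1)       # convert scores into probabilities

# Print the five most probable continuations. The model is ranking likely
# next words; nothing in this computation corresponds to an inner feeling.
top = probs.topk(5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```

Whichever word scores highest is simply the statistically likeliest continuation of the prompt, which is the point Clark is making.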
Why would Microsoft’s instructors tell the bot to say something blatantly untrue — to own feelings that it doesn't have?
Clark: I don't think Microsoft's engineers explicitly programmed the system to behave like that. I would imagine they're as surprised as anybody with it having responses like that. What they've done is to train the system to follow a set of high-level rules and principles. For instance, the responses should be rigorous. The responses should avoid being vague and controversial. One of the instructions they provided to the system was: “Your responses should be positive, interesting, entertaining and engaging.” So it's possible that the system is cueing off that instruction in its responses.
How does the computer learn that saying, “I feel sad and angry” is entertaining or engaging?
Clark: I'm not sure that's the specific instruction it's responding to, but in general, yes, the system understands, to some degree, the meaning of these words, because it has essentially read all the text on the web and learned the associations of words with each other. So, for instance, there will be mentions of “entertaining and engaging” situations in various novels and articles and Wikipedia pages. The system has acquired some notion of what those terms mean, and it tries to use that understanding as best it can in its responses.
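A deliberately toy illustration of “learning associations of words with each other” (not anything from Microsoft's system): count which words tend to appear near “entertaining” in a tiny corpus. Real models learn vastly richer statistics from web-scale text, but the basic idea of picking up associations between words is the same.

```python
# Toy example: which words co-occur near "entertaining" in a tiny corpus?
from collections import Counter

corpus = [
    "the show was entertaining and engaging for the audience",
    "an entertaining and engaging speaker keeps people interested",
    "the lecture was dry and not very entertaining",
]

window = 2  # how many neighboring words to look at on each side
neighbors = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        if word == "entertaining":
            nearby = words[max(0, i - window): i + window + 1]
            neighbors.update(w for w in nearby if w != "entertaining")

print(neighbors.most_common(5))  # "and" and "engaging" surface as close associates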
Then maybe the word I should be most curious about is the word “I.” Why couldn't the bot just cite these “entertaining and engaging” passages that people have written? Why would the computer say “I am a bot and I feel this way”?
Clark: That's a great question as well. One of the most recent ways these systems have been trained is, besides just reading the web, through a lot of practice conversations with users, where the users have looked at several different responses from the system and rated which ones they thought were good and which were bad. People who've played with ChatGPT will have seen the “thumbs up” and “thumbs down” icons, where they can contribute to that information. From that feedback, the system has also learned what kinds of responses humans prefer. So very likely, when the system responds using personal pronouns, users in those trials have indicated they prefer that. If the user says, “How are you?” and the system responds, “I'm fine,” the users likely will have said, “Oh, that's a good response.” From lots of examples like that, it learns that talking in the first person is the preferred way of responding.
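A heavily simplified sketch of that ratings-based learning, assuming nothing about Microsoft's actual training pipeline: tally which style of response collected more thumbs-up, then prefer that style. Real systems fit a reward model to the ratings and fine-tune the chatbot against it, but the underlying signal is the same kind of human preference data Clark describes.

```python
# Toy sketch: learn a response style from thumbs-up / thumbs-down ratings.
from collections import defaultdict

rated_examples = [
    # (prompt, response, style, got a thumbs-up?)
    ("How are you?",     "I'm fine, thanks for asking!", "first_person", True),
    ("How are you?",     "This system is operational.",  "impersonal",   False),
    ("Can you help me?", "I'd be happy to help.",        "first_person", True),
    ("Can you help me?", "Assistance is available.",     "impersonal",   False),
]

score = defaultdict(lambda: [0, 0])  # style -> [thumbs_up_count, total_count]
for _prompt, _response, style, liked in rated_examples:
    score[style][0] += int(liked)
    score[style][1] += 1

preferred = max(score, key=lambda s: score[s][0] / score[s][1])
print(f"Learned preference: respond in a {preferred} style")  # -> first_person
```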
This chatbot even told the New York Times interviewer, not just “I feel” but “I want”! “I want to” — many things — hack computers, spread misinformation, break the rules. Even, “I want to love you” and “I want to be alive.”
Clark: Yeah, so the computer itself doesn't have any intent, as such. Essentially, what it's doing is predicting the best word or the best sentence response to what it's been told, so far.
Okay, so it doesn't mean that this very powerful computer is trying to get what it “wants.” This is not a computer that's hooked up to other computers so that, like in a sci-fi movie, this “wanting” computer could hack another computer and make these things actually happen in the real world.
Clark: That's exactly right. First of all, the computer does not have any intentionality inside it. And second, the machine itself is not hooked up to anything else. All it can do, and all it can know about, is how to hold a conversation.
What I’m learning about this Microsoft OpenAI chatbot is that I won't trust that it means anything when it forms words. Doesn't that defeat the purpose of a search engine, which is supposed to make me smarter and say things that matter?
Clark: Well, I wouldn't say the words are meaningless. If the system produces a summary of articles it’s read, the words in the summary are meaningful and helpful. I've played around with some of these systems myself and gotten very useful information back. I think the lesson to take away is to treat the system's responses with a grain of salt. It's very easy, because we're so used to human conversations, to ascribe additional properties to the machine that aren't there. As that article and others have illustrated, these systems can also hallucinate: they will say things that are not true, even when summarizing other articles. That's something people are currently working to improve. But the message is: Don't believe everything you see. Don't take the system's responses at face value.
Do you think that humans will come to understand these limitations, and know that when the chatbot says, “They make me feel sad and angry,” it's essentially meaningless? And that's okay; people say meaningless things, too. But will I know what to trust and what not to trust?
Clark: That's a great question. I hope people do learn to become appropriately skeptical of machine responses and learn how to interpret them. But there is a huge educational gap to be filled as these new technologies come online. I'm reminded of when the first movies came along in the 1890s. I’m not that old, of course, but when a train was heading toward the audience on the screen, people reportedly ran from the movie theater screaming because they thought it was really happening. And of course, now we know what movies are, and we treat them with the appropriate interpretation. I hope the same thing happens with computers as this new technology emerges: that we learn what we can gain from them, but also how to interpret the information with appropriate caution.