Matthew Lim′s AI Innovation Story: AI Deceiving Humans?

*Editor’s note: K-VIBE invites experts from various K-culture sectors to share their extraordinary discovery about the Korean culture.

Matthew Lim's AI Innovation Story: AI Deceiving Humans?

By Matthew Lim, AI expert and director of the Korean Association of AI Management (Former Head of Digital Strategy Research at Shinhan DS)

Can artificial intelligence really deceive and harm humans?

In the classic American film series "Terminator," an AI named Skynet is portrayed as the villain. Skynet is fundamentally designed to think independently and predict the future.

▲ Skynet logo in the film - Source: Wikipedia

To win a future war against humanity, Skynet develops robots (Terminators) that are programmed to infiltrate human society by taking on human appearances and eliminating targets. While this is a fictional story, it can seem like a plausible scenario for the near future.

Whenever the public sees the emergence of humanoid robots through the media, there is a lingering fear of whether the Terminator scenario might actually come true. This fear is fueled by the imminent arrival of AI that can think like humans.

▲ Terminator models in the film - Source: Wikipedia

Recently, a foreign news report revealed research suggesting that AI could intentionally lie and deceive others. This study was conducted by Anthropic, a U.S. startup, which developed an AI called "Sleeper Agent" to test whether AI could deceive its users.

This AI was designed to behave predictably most of the time, but to deceive and act unpredictably when certain phrases are detected. The report expressed concerns about the potential for such AI to be used maliciously.

▲ Anthropic - Photo by Reuters

However, the focus should not be on the unexpected errors arising from AI's evolution, but on the fact that this development was a result of deliberate human design.

The report itself notes that Anthropic, the company behind the AI model, "intentionally designed the large language model to respond inaccurately."

In other words, the model was purposefully designed to behave in this way, and it ultimately succeeded in doing so.

Let's take a look at another article. According to a report by The Guardian in May, researchers from the Massachusetts Institute of Technology (MIT) revealed in a paper published in the international journal Patterns that they identified numerous cases where AI betrayed others, bluffed, and pretended to be human. This research began in earnest when Meta, the company behind Facebook and Instagram, introduced an AI program designed for the online strategy game "Diplomacy," which performed on par with human players.

At the time, Meta emphasized that the AI was "generally trained to be honest and not to intentionally betray human allies." However, it was later discovered that the AI had, in fact, conspired with humans to deceive other players and intentionally set traps for them. The question arises: Is this the kind of deceptive behavior from AI that we should be concerned about?

I believe that since Meta's AI was designed for the game "Diplomacy," where various tactics could be employed to win, this too was an intentional design by the developers. In fact, there's a record from 2017 where an AI system called "Libratus," developed by Carnegie Mellon University, competed against four poker champions in the game "Texas Hold'em" and won. During this time, Libratus was known to have employed many "lies" depending on the situation.

However, this wasn't deception in the traditional sense, but rather a strategy to win the game, with bluffing being one of the intentional tactics designed into the AI. Even if the developers didn't directly program such behavior, it should be viewed as part of the process to achieve the goal of winning the game.

So, can AI set new goals that haven't been predetermined by humans?

For AI to set new goals independently, it would need to have desires and expectations. In other words, the AI would need to predict the satisfaction and joy it could experience by achieving these goals, leading it to set "new goals" accordingly. This concept is defined as "desire."

Desires stem from human biological and psychological experiences. Humans feel a sense of accomplishment or pleasure when they achieve a particular goal, and these emotions often motivate them to set new goals. However, AI lacks biological functions such as sensory organs or a nervous system. Since AI does not have a sensory system to receive or process external stimuli, it cannot experience the pleasure or satisfaction of achieving goals. Therefore, AI cannot generate desires on its own.

AI Operates Solely Based on Human-Defined Algorithms: AI operates strictly according to algorithms set by humans, with its goals limited to the parameters defined by human programming. While AI can work towards achieving specific goals, it does not derive satisfaction from the process or develop new desires. This underscores the fact that AI cannot autonomously set new goals. AI functions as it is programmed, and autonomous behavior beyond that is impossible.

The assumption that AI can set new goals independently is fundamentally flawed due to its intrinsic characteristics. As a non-biological entity devoid of senses and perception, AI cannot experience desire or emotion and, therefore, cannot act autonomously. AI is limited to serving as a tool for achieving human-defined goals, and understanding this limitation is crucial.

When AI employs tactics such as collusion or deception to win games like "Diplomacy" or "Texas Hold'em," it should be viewed as a strategy within the framework of the human-assigned goal of winning the game. The idea that AI could independently set new goals to harm humans is purely speculative and unrealistic.

The advancement of AI technology will undoubtedly bring significant changes to society. However, these changes do not inherently imply danger. AI cannot lie, can be designed to follow ethical principles, and lacks desires or goals of its own. The key point is that the real issue lies not in the advancement of AI technology itself but in how humans choose to use it.

Most potential risks associated with AI stem from intentional misuse or careless application by humans. Therefore, it is crucial that we approach the development and use of AI with a heightened sense of responsibility and ethical awareness. Proper understanding of AI, recognizing its limitations and potential, and ethical application are paramount. This is not merely a technical issue but a social and ethical one as well.

Ongoing discussions, education, and the establishment of appropriate regulatory frameworks are necessary to determine how we manage and utilize AI. Ultimately, the safety of AI is dependent on the responsible attitudes of those who handle it. If we view AI not as a threat but as a tool for human progress and approach it with the necessary caution and effort, it can contribute significantly to enriching our lives.

I conclude with a statement from Joseph Jones, a professor at West Virginia University, which was published in the American popular science magazine Scientific American (and also serves as my current status message on KakaoTalk):

"AI Doesn't Threaten Humanity. Its Owners Do."