Will ChatGPT’s “hallucinations” ever go away?

MATT O'BRIEN AND THE ASSOCIATED PRESS

2023-08-08

No model today is free of hallucinations.


OpenAI CEO Sam Altman speaks in Abu Dhabi, United Arab Emirates, on Tuesday, June 6, 2023. Photo: AP PHOTO/JON GAMBRELL, FILE


Chinese translation: Wang Fang (Zhonghuiyan)


Spend enough time with ChatGPT and other artificial intelligence chatbots and it doesn’t take long for them to spout falsehoods.

Described as hallucination, confabulation or just plain making things up, it’s now a problem for every business, organization and high school student trying to get a generative AI system to compose documents and get work done. Some are using it on tasks with the potential for high-stakes consequences, from psychotherapy to researching and writing legal briefs.

“I don’t think that there’s any model today that doesn’t suffer from some hallucination,” said Daniela Amodei, co-founder and president of Anthropic, maker of the chatbot Claude 2.

“They’re really just sort of designed to predict the next word,” Amodei said. “And so there will be some rate at which the model does that inaccurately.”

Anthropic, ChatGPT-maker OpenAI and other major developers of AI systems known as large language models say they’re working to make them more truthful.

How long that will take — and whether they will ever be good enough to, say, safely dole out medical advice — remains to be seen.

“This isn’t fixable,” said Emily Bender, a linguistics professor and director of the University of Washington’s Computational Linguistics Laboratory. “It’s inherent in the mismatch between the technology and the proposed use cases.”

A lot is riding on the reliability of generative AI technology. The McKinsey Global Institute projects it will add the equivalent of $2.6 trillion to $4.4 trillion to the global economy. Chatbots are only one part of that frenzy, which also includes technology that can generate new images, video, music and computer code. Nearly all of the tools include some language component.

Google is already pitching a news-writing AI product to news organizations, for which accuracy is paramount. The Associated Press is also exploring use of the technology as part of a partnership with OpenAI, which is paying to use part of AP’s text archive to improve its AI systems.

In partnership with India’s hotel management institutes, computer scientist Ganesh Bagler has been working for years to get AI systems, including a ChatGPT precursor, to invent recipes for South Asian cuisines, such as novel versions of rice-based biryani. A single “hallucinated” ingredient could be the difference between a tasty and inedible meal.

When Sam Altman, the CEO of OpenAI, visited India in June, the professor at the Indraprastha Institute of Information Technology Delhi had some pointed questions.

“I guess hallucinations in ChatGPT are still acceptable, but when a recipe comes out hallucinating, it becomes a serious problem,” Bagler said, standing up in a crowded campus auditorium to address Altman on the New Delhi stop of the U.S. tech executive’s world tour.

“What’s your take on it?” Bagler eventually asked.

Altman expressed optimism, if not an outright commitment.

“I think we will get the hallucination problem to a much, much better place,” Altman said. “I think it will take us a year and a half, two years. Something like that. But at that point we won’t still talk about these. There’s a balance between creativity and perfect accuracy, and the model will need to learn when you want one or the other.”

But for some experts who have studied the technology, such as University of Washington linguist Bender, those improvements won’t be enough.

Bender describes a language model as a system for “modeling the likelihood of different strings of word forms,” given some written data it’s been trained upon.
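To make Bender’s description concrete, here is a minimal toy sketch. It is not how ChatGPT or Claude 2 actually work, since those systems use neural networks over subword tokens. It is a bigram model: it scores a string of words by multiplying the probability of each word given the word before it, with the probabilities estimated from a tiny training corpus. The function names, the sample corpus and the add-one smoothing are all illustrative assumptions.

```python
from collections import Counter
import math

def train_bigram(corpus):
    """Count word pairs in sentences padded with <s> (start) and </s> (end)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent (previous, next) pairs
    # Possible "next" tokens: every training word plus the end marker.
    vocab = {w for s in corpus for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab

def log_likelihood(sentence, unigrams, bigrams, vocab):
    """Log P(sentence) as a sum of log P(word | previous word).

    Add-one (Laplace) smoothing gives unseen pairs a small nonzero
    probability instead of zero (a simplifying assumption).
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
        total += math.log(p)
    return total
```

Trained on the three sentences “the cat sat on the mat”, “the dog sat on the rug” and “a cat ate the fish”, this model rates “the cat sat” exactly 12 times more likely than “cat the sat”. The only reason is that its word pairs occur in the training text, not that the model knows anything about cats.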

It’s how spell checkers are able to detect when you’ve typed the wrong word. It also helps power automatic translation and transcription services, “smoothing the output to look more like typical text in the target language,” Bender said. Many people rely on a version of this technology whenever they use the “autocomplete” feature when composing text messages or emails.
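Under the same toy assumptions, next-word counts can also drive the two everyday features Bender mentions: suggesting the next word (autocomplete) and flagging a word that does not fit its neighbor (a crude wrong-word check). This is a hypothetical sketch with made-up function names. Real keyboards and spell checkers use far larger, smoothed models.

```python
from collections import Counter, defaultdict

def build_next_word_table(corpus):
    """Map each word to a Counter of the words observed right after it."""
    table = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, word in zip(tokens, tokens[1:]):
            table[prev][word] += 1
    return table

def autocomplete(table, prev_word, k=3):
    """Suggest up to k next words, most frequent first (ties: first seen)."""
    followers = table.get(prev_word.lower(), Counter())
    return [word for word, _ in followers.most_common(k)]

def unseen_pairs(table, text):
    """Return adjacent word pairs never observed in the training corpus.

    A pair the model has never seen is treated as a possible typo; this is
    a deliberately crude stand-in for a probability threshold.
    """
    tokens = text.lower().split()
    return [(prev, word) for prev, word in zip(tokens, tokens[1:])
            if table.get(prev, Counter())[word] == 0]
```

Trained on “see you soon”, “see you later”, “see you soon then” and “thank you very much”, `autocomplete(table, "you")` returns `["soon", "later", "very"]`, and `unseen_pairs(table, "sea you soon")` flags `("sea", "you")`.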

The latest crop of chatbots such as ChatGPT, Claude 2 or Google’s Bard try to take that to the next level, by generating entire new passages of text, but Bender said they’re still just repeatedly selecting the most plausible next word in a string.
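The loop of “repeatedly selecting the most plausible next word” can be sketched the same way. The toy greedy decoder below uses hypothetical names and a made-up three-sentence corpus. Real chatbots use neural networks and often sample among likely words rather than always taking the top one. The toy also shows how a fluent but false answer can fall out of the mechanism itself.

```python
from collections import Counter, defaultdict

END = "</s>"  # end-of-sentence marker

def train_next_word_counts(corpus):
    """Map each word to a Counter of its observed successors (including END)."""
    table = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split() + [END]
        for prev, word in zip(tokens, tokens[1:]):
            table[prev][word] += 1
    return table

def greedy_continue(table, prompt, max_words=20):
    """Extend the prompt by always appending the single most frequent next word."""
    tokens = prompt.split()
    if not tokens:
        return ""
    for _ in range(max_words):
        followers = table.get(tokens[-1])
        if not followers:            # word never seen with a successor: stop
            break
        # most_common breaks ties by first occurrence in the training data
        word = followers.most_common(1)[0][0]
        if word == END:
            break
        tokens.append(word)
    return " ".join(tokens)
```

If the model is trained on “the capital of france is paris”, “the capital of italy is rome” and “the capital of spain is madrid” and then prompted with “the capital of italy is”, it returns “the capital of italy is paris”. Each capital followed “is” exactly once, so the tie goes to the first one seen. The sentence is well formed and wrong, and nothing in the model checks it against the world.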

When used to generate text, language models “are designed to make things up. That’s all they do,” Bender said. They are good at mimicking forms of writing, such as legal contracts, television scripts or sonnets.

“But since they only ever make things up, when the text they have extruded happens to be interpretable as something we deem correct, that is by chance,” Bender said. “Even if they can be tuned to be right more of the time, they will still have failure modes — and likely the failures will be in the cases where it’s harder for a person reading the text to notice, because they are more obscure.”

Those errors are not a huge problem for the marketing firms that have been turning to Jasper AI for help writing pitches, said the company’s president, Shane Orlick.

“Hallucinations are actually an added bonus,” Orlick said. “We have customers all the time that tell us how it came up with ideas — how Jasper created takes on stories or angles that they would have never thought of themselves.”

The Texas-based startup works with partners like OpenAI, Anthropic, Google or Facebook parent Meta to offer its customers a smorgasbord of AI language models tailored to their needs. For someone concerned about accuracy, it might offer up Anthropic’s model, while someone concerned with the security of their proprietary source data might get a different model, Orlick said.

Orlick said he knows hallucinations won’t be easily fixed. He’s counting on companies like Google, which he says must have a “really high standard of factual content” for its search engine, to put a lot of energy and resources into solutions.

“I think they have to fix this problem,” Orlick said. “They’ve got to address this. So I don’t know if it’s ever going to be perfect, but it’ll probably just continue to get better and better over time.”

Techno-optimists, including Microsoft co-founder Bill Gates, have been forecasting a rosy outlook.

“I’m optimistic that, over time, AI models can be taught to distinguish fact from fiction,” Gates said in a July blog post detailing his thoughts on AI’s societal risks.

He cited a 2022 paper from OpenAI as an example of “promising work on this front.”

But even Altman, as he markets the products for a variety of uses, doesn’t count on the models to be truthful when he’s looking for information for himself.

“I probably trust the answers that come out of ChatGPT the least of anybody on Earth,” Altman told the crowd at Bagler’s university, to laughter.

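The intellectual property rights in content published by Fortune China are exclusively owned or held by Fortune Media IP Limited and/or the relevant rights holders. Any use without permission, including republication, excerpting, copying or mirroring, is prohibited.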
