
Secret AI Reading List 

 



The question of what is on the AI reading list is more than academic. Bots are not intelligent. They don't understand the world the way a person can. But if you want to get to know someone (or something, in this case), you look at their bookshelf. Chatbots don't just invent facts, repeat blatant nonsense, and churn out bland, homogenized prose. Turns out they're also giant nerds.




Besides the standard school reading list (Charles Dickens and Jack London, Frankenstein and Dracula), there are a few amusing outliers. We were glad to see The Maltese Falcon there; we think Dashiell Hammett is a better hard-boiled detective writer than the more frequently cited Raymond Chandler. But skip the public-domain material and look at the list of copyrighted books GPT-4 swallowed (it isn't much different from the one for the earlier GPT-3.5), and the bot's true nature is revealed. Sure, The Fellowship of the Ring comes in third, but you have to be a truly dedicated Tolkien fan not to bounce off The Silmarillion (ninth place). "Do Androids Dream of Electric Sheep?" ranks 21st, just a few notches below Neuromancer; the two are defining works of cyberpunk, a genre that, ironically, sounded the alarm about AI.

Question: does it matter? What awaits us if GPT-4 has the reading habits of a nerdy 14-year-old from 1984? (Including, as it turns out, 1984 itself at number 2?)

What the AI reads matters

According to some estimates, GPT-4's training data is enormous, possibly up to a petabyte. So no single novel (or even 50 novels) could by itself teach it that becoming the caretaker of a haunted hotel is no cure for writer's block (No. 49) or that fear is the mind-killer (No. 13). An ocean of data swamps the islands of fantasy. “The data set used in pre-training is a large enough sample of text,” says Ted Underwood, a computer scientist at the University of Illinois, “that I’m not sure how much genre-specific bias affects the behavior of the resulting models.”
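
To put the scale argument in numbers, here is a back-of-the-envelope sketch in Python. The novel length (100,000 words) and the bytes per word (about 6 for plain English text) are our own assumptions, and the one-petabyte corpus is the upper-bound estimate mentioned above, not a confirmed figure.

```python
# Back-of-the-envelope sketch. All sizes are assumptions for illustration,
# not figures published by OpenAI.
PETABYTE = 10**15           # bytes (decimal petabyte), the upper-bound estimate
WORDS_PER_NOVEL = 100_000   # assumed length of a typical novel
BYTES_PER_WORD = 6          # assumed ~5 letters plus a space in plain ASCII text

novel_bytes = WORDS_PER_NOVEL * BYTES_PER_WORD  # size of one novel in bytes
fifty_novels = 50 * novel_bytes                 # size of fifty novels in bytes
share = fifty_novels / PETABYTE                 # fraction of a 1 PB corpus

print(f"50 novels ~ {fifty_novels / 1e6:.0f} MB")
print(f"share of a 1 PB corpus: {share:.1e}")
```

Under these assumptions, fifty novels come to roughly 30 MB, about three hundred-millionths of the corpus, which is the sense in which the ocean of data swamps any single bookshelf.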

The presence of these particular books in GPT-4's digital soul may simply reflect how common they are on the wider Internet from which the data was scraped. When Bamman's team included public-domain books in their tests, the scores went up: Alice's Adventures in Wonderland tops the list with a whopping 98%. Bamman's team did find that the books the models scored highest on were the ones most widely represented on the Internet. That makes sense. Chatbots didn't choose their books; Internet culture did.

Still, it's not hard to imagine that all this science fiction the bots are reading could have the same harmful effect on them as the rest of their training data, introducing the same unintended biases that always creep into chatbot output. Sometimes they say racist things. They can repeat disinformation as if it were true, because the same falsehood appears so often on the Internet.

We humans read books to change how we see the world. But technically, chatbots don't think about anything. They build statistical relationships between words, represented as vectors.
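
To make "vector relationships between words" a little more concrete, here is a minimal Python sketch. It assumes tiny hand-made word vectors (the numbers are invented; real models learn their vectors from data and use hundreds or thousands of dimensions) and uses cosine similarity, one common way to measure how closely two word vectors point in the same direction. It illustrates the general idea, not how GPT-4 specifically works.

```python
import math

# Toy 3-dimensional word vectors; the values are invented for illustration.
vectors = {
    "android": [0.9, 0.8, 0.1],
    "robot":   [0.85, 0.75, 0.2],
    "sheep":   [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["android"], vectors["robot"]))
print(cosine_similarity(vectors["android"], vectors["sheep"]))
```

Here "android" comes out far closer to "robot" than to "sheep" purely because of the numbers, with no understanding involved.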

Until OpenAI and other chatbot makers make their training data sets public, it will be difficult to gauge how their reading lists affect what they produce.

If you're interested in AI and want to keep up with news and reviews, subscribe to my Telegram channel; I'd be glad to have you:

https://t.me/diakob_net
