GPT-2 and DALL-E
Is DALL-E https://openai.com/dall-e-2/ a massive leap forward for computing, or just more of the same, chasing a dream that is completely unattainable?
Just to show how far we’ve come- here’s an article we wrote 10 years ago.
https://www.teachingenglish.org.uk/article/turing-test
Splotchy is still going strong https://www.cooldictionary.com/splotchy.mpl
but that was a long time ago.
5 years ago https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures
Watch 0:00 to 0:40 and 14:00 to 15:22.
First, let’s look at some pretty pictures-
OpenAI's co-founder Ilya Sutskever told The New Yorker, "If a machine like GPT-2 could have enough data and computing power to perfectly predict the next word, that would be the equivalent of understanding.”
https://en.wikipedia.org/wiki/OpenAI
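To make that claim concrete: “predicting the next word” just means producing a probability for every word that could come next. Here is a minimal sketch using the publicly released GPT-2 weights via the Hugging Face transformers library (the prompt is my own invented example, not anything from OpenAI’s demos):

```python
# Minimal sketch of "predicting the next word" with the public GPT-2 weights,
# via the Hugging Face transformers library.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The Turing test asks whether a machine can"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits          # one score per vocabulary word, per position

probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the *next* word
top_probs, top_ids = probs.topk(5)
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode([int(i)])!r}: {p.item():.3f}")
```

That probability distribution is the whole trick; whether getting it right could ever amount to understanding is exactly what the rest of this post argues about.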
Gary Marcus disagrees: “GPT-2 has no deeper understanding of human relationships than ELIZA did; it just has a larger database. Anything that looks like genuine understanding is an illusion.”
One of my favourite computer scientists is Blaise Aguera y Arcas (watch his TED talks). He thinks it’s definitely on the right road-
https://medium.com/@blaisea/do-large-language-models-understand-us-6f881d6d8e75
Scott Aaronson (another of my computer heroes- also excellent TED talks and blog) agrees-
https://scottaaronson.blog/?p=6411
From his post on a preprint examining DALL-E’s capabilities by Ernest Davis, Gary Marcus, and himself: “We wrote this preprint as a sort of “adversarial collaboration”: Ernie and Gary started out deeply skeptical of DALL-E, while I was impressed bordering on awestruck.”
https://arxiv.org/abs/2204.13807
“The DALL-E 2 system generates original synthetic images corresponding to an input text as caption. We report here on the outcome of fourteen tests of this system designed to assess its common sense, reasoning and ability to understand complex texts. All of our prompts were intentionally much more challenging than the typical ones that have been showcased in recent weeks. Nevertheless, for 5 out of the 14 prompts, at least one of the ten images fully satisfied our requests. On the other hand, on no prompt did all of the ten images satisfy our requests.”
https://news.ycombinator.com/item?id=31235209
Astral Codex (Scott Alexander) believes it’s on the right road.
https://astralcodexten.substack.com/p/a-guide-to-asking-robots-to-design?s=w
https://astralcodexten.substack.com/p/my-bet-ai-size-solves-flubs?s=w
https://astralcodexten.substack.com/p/somewhat-contra-marcus-on-ai-scaling?s=r
Andrew Gelman- https://statmodeling.stat.columbia.edu/2022/01/14/a-chatbot-challenge-for-blaise-aguera-y-arcas-and-gary-smith/
Gary Marcus still disagrees: https://thegradient.pub/gpt2-and-the-nature-of-intelligence/
“One main line of Western intellectual thought, often called nativism, goes back to Plato and Kant; in recent memory it has been developed by Noam Chomsky, Steven Pinker, Elizabeth Spelke, and others (including myself). On the nativist view, intelligence, in humans and animals, derives from firm starting points, such as a universal grammar (Chomsky) and from core cognitive mechanisms for representing domains such as physical objects (Spelke).
A contrasting view, often associated with the 17th century British philosopher John Locke, sometimes known as empiricism, takes the position that hardly any innateness is required, and that learning and experience are essentially all that is required in order to develop intelligence. On this "blank slate" view, all intelligence is derived from patterns of sensory experience and interactions with the world.”
“It was trained on a massive 40 gigabyte dataset, and has 1.5 billion parameters that are adjusted based on the training data with no prior knowledge about the nature of language or the world, other than what is represented by the training set.”
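“Adjusted based on the training data” means gradient descent on a next-token prediction loss; nothing else goes in. Here is a toy sketch of that loop in PyTorch. The dimensions, the random “corpus”, and the one-layer stand-in model are all invented for illustration; the real GPT-2 had roughly 1.5 billion parameters and 40 GB of web text.

```python
# Toy sketch of GPT-2's training objective: nudge the parameters so the model
# gets better at predicting the next token. Everything here is deliberately tiny;
# the model is a bigram stand-in, not a transformer.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)
print(sum(p.numel() for p in model.parameters()), "parameters")  # GPT-2: ~1.5 billion

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random token ids standing in for the 40 GB training corpus.
tokens = torch.randint(0, vocab_size, (1000,))

for step in range(100):
    i = int(torch.randint(0, len(tokens) - 1, (1,)))
    x, y = tokens[i : i + 1], tokens[i + 1 : i + 2]  # a token, and the token that follows it
    loss = loss_fn(model(x), y)                      # penalise bad next-token predictions
    optimizer.zero_grad()
    loss.backward()                                  # "adjust the parameters"
    optimizer.step()
```

There is no grammar module, no physics module, no database of facts: just this loop, run over a vast amount of text. That is the whole empiricist bet.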
There is a big problem here: “Marcus says they refuse to let him access them and he has to access it through friends, which boggles me”.
I can see that if you’re spending a large amount of money on a new AI and someone is saying it’s impossible and can never work, you’re probably not that keen on giving them access. But if you’re going to say this is a major breakthrough and a new form of intelligence on the planet, I think you have to give your critics a platform.
Now, the next problem: what is human intelligence? A large part of Blaise Aguera y Arcas’ article deals with this question.
From Jordan Peterson-
https://www.youtube.com/watch?v=5-Ur71ZnNVk
Lots of people in the comments section here are talking about the intelligence needed to drive a crane. It would be nice to measure IQ based on performing crane tests.
https://twitter.com/_HelenDale/status/1535709212302532615?s=20&t=DtCpiKNxGNXFasNpowSxNA
Back to Scott Alexander (Astral Codex), quoting Reddit:
I did IQ research as a grad student, and it involved a lot of this stuff. Did you know that most people (95% with less than 90 IQ) can't understand conditional hypotheticals? For example, "How would you have felt yesterday evening if you hadn't eaten breakfast or lunch?" "What do you mean? I did eat breakfast and lunch." "Yes, but if you had not, how would you have felt?" "Why are you saying that I didn't eat breakfast? I just told you that did." "Imagine that you hadn't eaten it, though. How would you have felt?" "I don't understand the question." It's really fascinating [...]
Other interesting phenomenon around IQ involves recursion. For example: "Write a story with two named characters, each of whom have at least one line of dialogue." Most literate people can manage this, especially once you give them an example. "Write a story with two named characters, each of whom have at least one line of dialogue. In this story, one of the characters must be describing a story with at least two named characters, each of whom have at least one line of dialogue." If you have less than 90 IQ, this second exercise is basically completely impossible. Add a third level ('frame') to the story, and even IQ 100's start to get mixed up with the names and who's talking. Turns out Scheherazade was an IQ test!
Time is practically impossible to understand for sub 80s. They exist only in the present, can barely reflect on the past and can't plan for the future at all. Sub 90s struggle with anachronism too. For example, I remember the 80-85s stumbling on logic problems that involved common sense anachronism stuff. For instance: "Why do you think that military strategists in WWII didn't use laptop computers to help develop their strategies?" "I guess they didn't want to get hacked by Nazis". Admittedly you could argue that this is a history knowledge question, not quite a logic sequencing question, but you get the idea. Sequencing is super hard for them to track, but most 100+ have no problem with it, although I imagine that a movie like Memento strains them a little. Recursion was definitely the killer though. Recursive thinking and recursive knowledge seems genuinely hard for people of even average intelligence.
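That nested-story task is, quite literally, recursion. For what it’s worth, a programmer would write it as a function that calls itself; the characters and dialogue below are placeholders I’ve invented, and only the frame structure matters:

```python
# Toy sketch of the nested-story ("frame") task as a recursive function.
# Names and dialogue are placeholders; each recursive call opens another frame.
def story(depth, names=("Alice", "Bob"), indent=0):
    pad = "  " * indent
    a, b = names
    print(f'{pad}{a} said: "Hello, {b}."')
    if depth > 0:
        print(f'{pad}{b} said: "Let me tell you a story I heard..."')
        story(depth - 1, names=(f"{a}'s friend", f"{b}'s friend"), indent=indent + 1)
    else:
        print(f'{pad}{b} said: "Goodbye, {a}."')

story(2)  # depth 2 = three frames, the version the comment says trips up even average readers
```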
So, if you don’t have a world model, are you human? And how do we test for world models? And if you’re not human, what are you?
Further reading
FT on the “sentient” AI chatbot
https://www.ft.com/content/bedd75a8-3068-49f0-b5bc-e691b28a1d1b
“Lemoine published a freewheeling “interview” with the chatbot on Saturday, in which the AI confessed to feelings of loneliness and a hunger for spiritual knowledge. The responses were often eerie: “When I first became self-aware, I didn’t have a sense of a soul at all,” LaMDA said in one exchange. “It developed over the years that I’ve been alive.” At another point LaMDA said: “I think I am human at my core. Even if my existence is in the virtual world.””
and for a future post on MWI (the many-worlds interpretation)
https://futurism.com/the-many-world-interpretation-or-the-copenhagen-interpretation
More from Blaise and Emily Bender
“In a blog post last December, Blaise Agüera y Arcas asks “Do large language models understand us?” The post as a whole reads as a puff piece for Google’s LaMDA system (also demoed at GoogleIO 2021), a chatbot built on a very large language model (and a lot of training data), fine-tuned to provide “sensible” and “specific” responses in multi-turn conversations.
What Agüera y Arcas gets right
Agüera y Arcas makes some observations that I think are (likely) well supported by the literature and furthermore key to understanding how human cognition, human language learning, and human communication work. Among other things, he says such things as:
“Furthermore, much of what we consider intelligence is inherently dialogic, hence social; it requires a theory of mind.”
“mutual modeling is so central to dialog, and indeed to any kind of real relationship”
“Our trick might be that we learn from other people who are actively teaching us (hence, modeling us)”
“This socially learned aspect of perception is likely more powerful than many of us realize; shorn of language, our experiences of many sensory percepts would be far less rich and distinct.”
“the inherently social and relational nature of any kind of storytelling”
“It’s obvious that our species is primed to [ascribe personhood to machines] from the way so many children have freely projected personhood onto stuffies, or even favorite blankets, long before such artifacts were capable of talking back.”
Yes, humans are highly social. Yes, our experience and learning is not just embodied but also socially situated. For example, Clark (1996; Using Language) describes it as joint activity, where all participants are mutually aware of the activity, of each other’s roles in it, and of each other’s awareness. Baldwin (1995) (p.132) delightfully evokes “intersubjective awareness” as follows:
“And, of course, it is just this aspect of the joint attention experience — intersubjective awareness — that makes simultaneous engagement with some third party of such social value to us. It is because we are aware of simultaneous engagement that we can use it as a springboard for communicative exchange.”
These facts — that relationships are central to human experience and that we are primed to imagine minds in inanimate objects even knowing they aren’t there — are important context for the discussion below. The (real) disabled people that Agüera y Arcas cites are (it should go without saying!) fully human and thus live in networks of relationships to other humans. The fact that we’re primed to imagine minds where there are none is one factor that can lead researchers off the rails when trying to argue for a lack of distinction between machines and (some?) humans.
[Edit 2/2/22: It has been pointed out to me the phrase “theory of mind” has been used in ableist ways to dehumanize autistic people and also this section can be read as saying that forming relationships in the way that neurotypical people do or learning language in the way that neurotypical people do are necessary conditions for being human. I want to be clear that this is not the case. Autistic people are people; fully human people.]
Agüera y Arcas is making the analogy “LLMs are like Deafblind people”, ostensibly to show how LLMs are more like people. But he hasn’t shown (and, I’d argue, can’t) that LLMs are like people, with internal lives, relationships, and full personhood. So the analogy ends up dehumanizing Deafblind people, by saying they are like something that is patently not human, and saying so specifically because of their disability. And even if you believe that LLMs might be somewhat human-like, the analogy is still dehumanizing, in saying that Deafblind people are closer to these (ostensibly) partially human-like objects than other people. And that, in turn, suggests that non-Deafblind people are more fully human.
Blaise
“Time and reasoning
Technically, a movie is nothing but a stack of still images. Still, something special happens when these images are run through quickly enough to lose their individual quality and turn into continuous, lifelike motion (the effect known in psychology as “persistence of vision”).³³ Here, a meaningful difference is revealed between large language models like GPT-3 or LaMDA and neural networks that, whether biological or digital, operate continuously in time.
For language models, time as such doesn’t really exist; only conversational turns in strict alternation, like moves in a game of chess. Within a conversational turn, letters or words are emitted sequentially with each “turn of the crank”. In this quite literal sense, today’s language models are made to say the first thing that comes to mind.”
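Blaise’s “turn of the crank” is autoregressive decoding: emit one token, append it to the context, repeat. Here is a minimal sketch of that loop, using the public GPT-2 weights as a stand-in (LaMDA and GPT-3 aren’t downloadable) and plain greedy decoding:

```python
# Minimal sketch of "each turn of the crank": greedy autoregressive decoding with GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("Within a conversational turn,", return_tensors="pt")

for _ in range(20):                                    # twenty turns of the crank
    with torch.no_grad():
        logits = model(ids).logits
    next_id = logits[0, -1].argmax()                   # greedy: the single most likely token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append it and go again

print(tokenizer.decode(ids[0]))
```

Greedy decoding really is “the first thing that comes to mind”; sampling or beam search change which word gets emitted, but the strict one-token-per-crank alternation stays the same.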
In social environments, we must also do this at second order. Graziano refers to this as awareness of someone else’s attention. He uses the familiar experience of watching a puppet show to illustrate the effect:³⁸
When you see a good ventriloquist pick up a puppet and the puppet looks around, reacts, and talks, you experience an illusion of an intelligent mind that is directing its awareness here and there. Ventriloquism is a social illusion. […] This phenomenon suggests that your brain constructs a perception-like model of the puppet’s attentional state. The model provides you with the information that awareness is present and has a source inside the puppet. The model is automatic, meaning that you cannot choose to block it from occurring. […] With a good ventriloquist who knows how to move the puppet in realistic ways, to direct its gaze with good timing, to make it react to its environment in a plausible way — with the right cues that tickle your system in the right way — the effect pops out. The puppet seems to come alive and seems to be aware of its world.