Umeria wrote:
Nilokeras wrote:
If I know for a fact that there are 5329 sparrows in 2019 and I run the model and it says in 2019, based on environmental conditions, reproductive rates/mortality rates, food availability, human impacts and whatever else, there are 10 million of them, the model is simply wrong, and we go to understand why it isn't working. When ChatGPT does it, AI researchers insist it's 'hallucinating', and clasp their hands to their cheeks and go 'now isn't that interesting'. It's not, it's a model doing what models do - be wrong. And once you understand that and don't try to delude yourself into thinking it's special, it's hard not to see all of this as self-serving pablum.
Ah, they're ignoring one of the basic principles of programming: "If there's a bug, it's because you put it there."
It's not even a bug, per se. Statistical models, by their nature, are not always accurate - we try to minimize the rate at which the model predicts the wrong value or outcome, but in some small percentage of runs there will be an incorrect prediction. In ecology we try to be very fastidious about keeping track of how often this happens, report accurately on the rate and design the models to minimize it.
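To make that concrete, here's a toy sketch (entirely made-up data, not any real ecological model): a crude threshold "model" predicting species presence from a noisy environmental covariate. Even a reasonable rule mispredicts some fraction of cases, and the point is that you measure and report that rate rather than act surprised by it.

```python
import random

random.seed(42)

# Hypothetical data: species presence driven by temperature, plus noise.
n = 1000
temps = [random.gauss(15, 5) for _ in range(n)]
# "True" presence: species tends to occur above 15 C, with noisy deviations.
truth = [1 if t + random.gauss(0, 3) > 15 else 0 for t in temps]

# A simple threshold model. It is about as good as the data allow,
# yet it still gets a nonzero fraction of cases wrong.
pred = [1 if t > 15 else 0 for t in temps]

errors = sum(p != y for p, y in zip(pred, truth))
error_rate = errors / n
print(f"misclassification rate: {error_rate:.3f}")
```

The error rate here is irreducible: it comes from the noise, not from a bug, which is exactly the sense in which "the model is sometimes wrong" is a property, not an anomaly.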
ChatGPT is no different here: because it is using a very complicated statistical technique to predict which words should be assembled in which order, it sometimes gets it wrong. It'll say 'logistic regression' instead of 'beta regression'. Which is a problem in and of itself when people use it as a general-purpose search engine or Quora replacement, because it will state the wrong answer to a question very authoritatively.
The other type of 'error' of course is when people purposefully ask it an odd or leading question, and it confidently creates a strange pastiche of an answer - like this from the Wiki page:
It is, after all, building its response based on the words you provide it, so if 'churro', 'scientists' and 'surgery' appear in a prompt, it will obviously incorporate them into its response, which is a mash-up of a scientific press release or news article and descriptions of food or recipes. It's not a search engine, after all: it doesn't incorporate info about context, source or even the ultimate veracity of its responses. It is purely and entirely looking at the statistical relationships between the words you give it and the words in its corpus.
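The 'statistical relationships between words' idea can be illustrated with a deliberately tiny sketch (a bigram chain, which is vastly simpler than what ChatGPT does, but the spirit is the same): each next word is sampled purely from co-occurrence counts in a corpus, with no notion of sources, context, or truth. The corpus below is invented for the example.

```python
import random
from collections import defaultdict

random.seed(0)

# A made-up mini-corpus mixing science-news phrasing with recipe phrasing.
corpus = ("scientists report surgery breakthrough . "
          "scientists report churro recipe . "
          "churro recipe requires surgery precision .").split()

# Count which words follow which: the entire "model" is these statistics.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def generate(start, length=8):
    word, out = start, [start]
    for _ in range(length):
        if word not in follows:
            break
        # Sampling from the list reproduces the observed frequencies.
        word = random.choice(follows[word])
        out.append(word)
    return " ".join(out)

print(generate("scientists"))
```

Seed it with 'churro' next to 'scientists' and you get a fluent-looking pastiche of both registers, because fluency is all the statistics encode; grounding was never in the model to begin with.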
If this were an academic curiosity it might be interesting by itself, again, but there's an entire billion-dollar tech industry looking for The Next Big Thing, and a coterie of parasites, rubes and hucksters have latched onto it, treating it as though it were some sort of alien life form or advanced AI when it gives them spooky answers to a spooky prompt. In reality it's an iterative and not particularly original product of the infinite money and computational power Silicon Valley can throw at a problem to solve it clumsily.