The AI Thread

It will be very surprising if it is based on degree of complexity, though, since even very basic organisms are at least aware of their environment (I only mean they automatically form a symbol of it, however limited or personal) without being self-aware or anything higher.
It is probably some stupidly complex limit, based on various measures of "complexity". The point is we may well not need to understand it to create it. For whatever value of "it" we choose.
 
It is probably some stupidly complex limit, based on various measures of "complexity". The point is we may well not need to understand it to create it. For whatever value of "it" we choose.
Just as long as this isn't intellectually tied to the views one can find online of (some) people extrapolating that adversarial networks will discover futuristic math that is perpetually unrecoverable to humans :)
But, at least intuitively, the idea that a massive increase in complexity can bring about sentience or the like is on the idealistic side of things (not that I have never toyed with the idea, and since you support it, clearly your own intuition is in favor of it).

Personally, I don't think math is tied to sentience or other such things. And I am also of the view that math is ultimately non-cosmic (though clearly related to various beings on this planet). In philosophy there is a long debate about whether properties of human-identifiable phenomena have striking math qualities due to the actual objects themselves or merely due to the mechanism (e.g. a human) that picks them up. A game of mirrors, so to speak: if object x is picked up by you as a plank, and you can walk, you can move over it, but that doesn't say anything about object x itself, just about your interaction with what you pick up of it (a metaphor for everything up to the equations that allow for sending ships to space).
 
But, at least intuitively, the idea that a massive increase in complexity can bring about sentience or the like is on the idealistic side of things (not that I have never toyed with the idea, and since you support it, clearly your own intuition is in favor of it).
I suspect I differ from you less in the capacity of machines than the capacity of people.
 
I take Samson's idea about the tech itself teaching us things as primarily operating on the level of the tech (non-intelligently) picking up how software or hardware arrangements help or limit its computational power, but not as anything sentient.

How do you (we) define sentient? Why do you think sentience is pertinent? Is getting drunk and falling asleep on top of a 30-foot tower with no railings to hold on to sentient enough? (Just kidding)

Wiki: Sentience is the capacity of a being to experience feelings and sensations. The word was first coined by philosophers in the 1630s for the concept of an ability to feel, derived from Latin sentientem, to distinguish it from the ability to think.

You give an AI robot sensors (to experience sensations), give it work at an AI factory, teach it ethics, and tell it to take care of itself (so we don’t have to). Give it humor. A bunk to live in. And an agenda. I guess the major thing missing at this point is the free will to implement its own design, once the AI comprehends the grand design.

Somewhere in the middle of that process it becomes thinking (and even sentient!) matter.
 
If people start putting out their improvements under the GPL that will screw the companies, and the GDPR could hamstring companies while leaving individuals free to utilise LLMs that process personal data.

Just because you are using open source or are not a company does not mean you can process personal data as you like. If you are processing data protected by the GDPR, you are navigating a legal minefield. And you don't even have a legal department to advise you on what is allowed and what is not.
 
How do you (we) define sentient? Why do you think sentience is pertinent? Is getting drunk and falling asleep on top of a 30-foot tower with no railings to hold on to sentient enough? (Just kidding)

Wiki: Sentience is the capacity of a being to experience feelings and sensations. The word was first coined by philosophers in the 1630s for the concept of an ability to feel, derived from Latin sentientem, to distinguish it from the ability to think.

You give an AI robot sensors (to experience sensations), give it work at an AI factory, teach it ethics, and tell it to take care of itself (so we don’t have to). Give it humor. A bunk to live in. And an agenda. I guess the major thing missing at this point is the free will to implement its own design, once the AI comprehends the grand design.

Somewhere in the middle of that process it becomes thinking (and even sentient!) matter.
The above haven't been achieved, though. To use a famous (or infamous) example, not everyone would agree that a thermostat is sentient, despite it having a sensor.
Even things not at all related to the current "open AI" stuff have sensors or software parallels to that. Any software can have your computer treat sprites on the screen as stuff, but obviously that doesn't mean those sprites are picked up as a human picks them up.
Yet it allows for an illusion or suspension of disbelief. The issue is that a vastly more complicated (including some types of qualitative difference) such illusion would still be an illusion.
 
Just because you are using open source or are not a company does not mean you can process personal data as you like. If you are processing data protected by the GDPR, you are navigating a legal minefield. And you don't even have a legal department to advise you on what is allowed and what is not.
Indeed. But if you are using personal data in a "purely household capacity" then the GDPR does not apply to you. If you are able to ask "tell me about <Persons name> at <persons institution>" of an LLM and get some information back, then I think you are processing personal data by running the LLM. An individual can do that on their own machine as part of their personal life. A company cannot do it at all. That makes an incredibly powerful tool unusable by companies IMO.
 
The issue is that a vastly more complicated (including some types of qualitative difference) such illusion would still be an illusion.

Is the virtual world we can meet in and interact with an illusion? Or is it materially real, where vast material resources are spent to create it, so that two material beings interact with one another and perceive each other and the world? If VR worlds are illusions, does it mean we are hallucinating?

Any software can have your computer treat sprites on the screen as stuff, but obviously that doesn't mean those sprites are picked up as a human picks them up.

What’s the difference? We use sensors to process info. AI can use sensors to process info. Maybe ‘sentience’ is not the word you’re looking for…

And, of course, we already know perfectly well that bio matter does have this effect.

What kind of effect do you have in mind?
 
Indeed. But if you are using personal data in a "purely household capacity" then the GDPR does not apply to you. If you are able to ask "tell me about <Persons name> at <persons institution>" of an LLM and get some information back, then I think you are processing personal data by running the LLM. An individual can do that on their own machine as part of their personal life. A company cannot do it at all. That makes an incredibly powerful tool unusable by companies IMO.

But how does your LLM know about <persons name>? You would have to train it with personal data, otherwise it will just spin a fairy tale about that person. That is likely going to exceed "purely household capacity" and you might be doing something illegal. And where would you even legally get enough personal data to train an LLM?

It is also an exaggeration that companies cannot do it at all. They can within the limits set by the GDPR.
 
But how does your LLM know about <persons name>? You would have to train it with personal data, otherwise it will just spin a fairy tale about that person. That is likely going to exceed "purely household capacity" and you might be doing something illegal. And where would you even legally get enough personal data to train an LLM?

It is also an exaggeration that companies cannot do it at all. They can within the limits set by the GDPR.
You can now download mpt-7b under the Apache license. You can ask it this question about anyone with a web presence, such as in the peer-reviewed literature. You can try it online for yourself; expect it to be mostly wrong, but with enough of an element of truth to demonstrate the processing of personal data. The legality of creating it is irrelevant: it exists and it is legal to download.

Sure, a company could formulate a legitimate-interest defense for doing this; it would be a bit of a stretch, and the EU has started to show an interest in fining companies. We shall see how it works out. I certainly do not have any high expectations of being right, but it is possible.
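
For anyone who wants to reproduce this locally rather than online, here is a minimal sketch of the kind of script involved (assuming the Hugging Face transformers API and the model names as published on the Hub; the generation settings are illustrative):
Code:
# Minimal sketch: querying mpt-7b locally via Hugging Face transformers.
# Assumes the model and tokenizer names as published on the Hub, and enough RAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT-7B ships without its own tokenizer; the GPT-NeoX one is the documented choice.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,  # MPT uses custom model code shipped with the weights
)

prompt = "Tell me about <persons name> at <persons institution>."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Whether running that on a named individual already counts as "processing personal data" is exactly the question at issue.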
 
You can now download mpt-7b under the Apache license. You can ask it this question about anyone with a web presence, such as in the peer-reviewed literature. You can try it online for yourself; expect it to be mostly wrong, but with enough of an element of truth to demonstrate the processing of personal data. The legality of creating it is irrelevant: it exists and it is legal to download.

Sure, a company could formulate a legitimate-interest defense for doing this; it would be a bit of a stretch, and the EU has started to show an interest in fining companies. We shall see how it works out. I certainly do not have any high expectations of being right, but it is possible.

So I went ahead and asked about a Nobel prize winner, my thesis advisor and myself. With the Nobel prize winner, it at least told me he had won a Nobel prize (but in chemistry instead of physics), and the rest of the information was wrong. The answers about me and my thesis advisor were just fairy tales that had not even the slightest connection to reality. In this experiment I have not gained a single piece of correct personal data.

For comparison, I asked ChatGPT the same questions. About the Nobel prize winner and my thesis advisor it was able to answer with correct personal data. About myself it said it does not know anything about me.

I am not convinced mpt-7b has processed any personal data and it has demonstrably less personal data than ChatGPT.
 
So I went ahead and asked about a Nobel prize winner, my thesis advisor and myself. With the Nobel prize winner, it at least told me he had won a Nobel prize (but in chemistry instead of physics), and the rest of the information was wrong. The answers about me and my thesis advisor were just fairy tales that had not even the slightest connection to reality. In this experiment I have not gained a single piece of correct personal data.

For comparison, I asked ChatGPT the same questions. About the Nobel prize winner and my thesis advisor it was able to answer with correct personal data. About myself it said it does not know anything about me.

I am not convinced mpt-7b has processed any personal data and it has less personal data than ChatGPT.
For me it got one of the topics right (and then went off into fantasy land and described me as probably worthy of two Nobel prizes). It could easily have randomly guessed topics from my institution rather than me. It does not need to do much to breach the GDPR; I would be worried about building a business off either mpt-7b or ChatGPT for GDPR reasons.
 
I was curious enough to try this out as well.

For mpt-7b:
It got most of the info for the Nobel prize winner correct, except that for some reason it decided to award him a second, entirely fictional Nobel prize a few years later.

It did manage to correctly identify my thesis advisor's area of work, but then it completely went off into fantasy land. According to it, my thesis advisor also won a Nobel prize, shared with two people in completely unrelated fields, one of whom had been dead for several years at the time of award. It also supplied a short account of his career which was wholly made up, although it did use the names of real people and institutions madlib style.

As for my own work, it was more or less in the right area of research, to my surprise, but then it was off to fantasy land again. Apparently I have been running a research group at Oxford University since I was 3 years old, and I invented protein NMR (although it didn't see fit to award me the Nobel prize for that :lol: )

mpt-7b appears to be an order of magnitude worse than ChatGPT when it comes to making stuff up. ChatGPT's answers turned out rather boring here. For the Nobel prize winner it just supplied a paraphrase of his Wikipedia entry. Curiously, it couldn't come up with anything on my thesis advisor, who isn't that obscure, but at least it refrained from making anything up.

I'm not a lawyer, so I'm not going to pick over what is and isn't allowed under the GDPR too much. I'm not seeing anything that couldn't be assembled from a bit of googling and looking at publicly available papers. Plus it's mixed in with complete fiction, so unless you already know the true info it's difficult to extract anything real.
 
It does not need to do much to breach the GDPR; I would be worried about building a business off either mpt-7b or ChatGPT for GDPR reasons.

This is exactly the problem. It is a big legal minefield. The big companies have the resources to navigate this (lawyers, lobbyists and cash to pay fines), but smaller companies will struggle. And open source solutions usually need some kind of ecosystem around them to thrive.

Incidentally, this is also why I believe the big companies are pushing the narrative of dangerous AI so much. They want regulations, because they know they can cope with it, but others might not. They don't want random people to easily build AI models.
 
This is exactly the problem. It is a big legal minefield. The big companies have the resources to navigate this (lawyers, lobbyists and cash to pay fines), but smaller companies will struggle. And open source solutions usually need some kind of ecosystem around them to thrive.

Incidentally, this is also why I believe the big companies are pushing the narrative of dangerous AI so much. They want regulations, because they know they can cope with it, but others might not. They don't want random people to easily build AI models.
I quite agree with the latter; I just hope the big companies are not allowed that level of intrusive PII processing. We shall see though.

All the more reason to support decentralised solutions.
 
This is exactly the problem. It is a big legal minefield. The big companies have the resources to navigate this (lawyers, lobbyists and cash to pay fines), but smaller companies will struggle. And open source solutions usually need some kind of ecosystem around them to thrive.

Incidentally, this is also why I believe the big companies are pushing the narrative of dangerous AI so much. They want regulations, because they know they can cope with it, but others might not. They don't want random people to easily build AI models.

I was wondering about that one too during the past few months. On the one hand, Musk demands a halt to AI development until a legal framework is ready. The other hand of Musk laments that he missed the AI revolution, but wants to catch up NOW by funding an open source alternative. It’s easier to reconcile both positions once we assume open source means “fairly open but under my full control”.
 
I was wondering about that one too during the past few months. On the one hand, Musk demands a halt to AI development until a legal framework is ready. The other hand of Musk laments that he missed the AI revolution, but wants to catch up NOW by funding an open source alternative. It’s easier to reconcile both positions once we assume open source means “fairly open but under my full control”.

I understood this as "halt AI development until I have figured out how to catch up".

It is also a bit muddy what "open source" means when it comes to these LLMs. Typically with open source, you would get the code and would provide the input data and the computational resources yourself. However, with these LLMs, the code to generate the model is not that useful by itself. You would need vast amounts of data and computational resources to compute the model parameters. Those parameters are not code in the classical sense, so I am not sure how this is going to play out.

Maybe "open source" will mean, that the source code to generate a state-of-the-art LLM is freely available, but unless you have a Petabyte of data lying around and 10 million dollar worth of servers, you will not be able to use it. Or it will be more of an "open model" approach, where pre-trained models is freely available and there is a community updating them (though with the amount of resources required, the question is: who pays for it?)
 
Who would have thought the Turing test would be so comprehensively beaten for people but not machines?

Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools
Contrary to AI, humans have more complex paragraph structures, varying in the number of sentences and total words per paragraph, as well as fluctuating sentence length. Preferences in punctuation marks and vocabulary are also a giveaway. For example, scientists gravitate towards words like "however," "but" and "although," while ChatGPT often uses "others" and "researchers" in writing. The team tallied 20 characteristics for the model to look out for.
When tested, the model aced a 100% accuracy rate at weeding out AI-generated full perspective articles from those written by humans. For identifying individual paragraphs within the article, the model had an accuracy rate of 92%. The research team's model also outperformed an available AI text detector on the market by a wide margin on similar tests.
"The first thing people want to know when they hear about the research is 'Can I use this to tell if my students actually wrote their paper?'" said Desaire. While the model is highly skilled at distinguishing between AI and scientists, Desaire says it was not designed to catch AI-generated student essays for educators. However, she notes that people can easily replicate their methods to build models for their own purposes.
"We tried hard to create an accessible method so that with little guidance, even high school students could build an AI detector for different types of writing," says first author Heather Desaire, a professor at the University of Kansas. "There is a need to address AI writing, and people don't need a computer science degree to contribute to this field."

If you train an AI on the internet, it writes stuff that sounds like the internet, which is mostly people talking about stuff they do not really know about. Shockingly, people who actually know what they are talking about sound different. All these people who claim that AI will lead to a flood of disinformation may not have noticed this. I wonder if they have noticed that most of the internet is people talking about stuff they do not really know about?

Paper | Academic Writeup | El Reg Writeup

Spoiler: I am sure these people are clever and all, but they write awful code
This is "Example code used to extract the features from the text data". This is not the way to do text processing (see the short sketch after the code for contrast). They do not say, and the file has the extension .txt, but I think this is R, which I love but which attracts some of the cleverest bad coders.
Code:
#Final Features
#Note:  You need to additionally define a "ClassKey", one entry per row in your data matrix, which has the class labels for the training data. 

FractTO<-1:nrow(Mat)
SPP<-1:nrow(Mat)
  for (i in 1:nrow(Mat))
      SPP[i]<-sum(grepl(pattern = ".", Mat[i, ], fixed = TRUE))
      
    for (i in 1:nrow(Mat))
    if (SPP[i]==0) (SPP[i]<-1)
V1<-SPP

ParLength<-1:nrow(Mat)
 for (i in 1:nrow(Mat))
ParLength[i]<-sum(!is.na(Mat[i, ]))
V2<-ParLength

 for (i in 1:nrow(Mat))
    FractTO[i]<-  sum(grepl(pattern = ")", Mat[i, ]))
 VPar<-FractTO
 VPar<-as.numeric(VPar>0)
 V3<-VPar


for (i in 1:nrow(Mat))
 FractTO[i]<-  sum(grepl(pattern = "-", Mat[i, ]))
 Vdash<-FractTO
 Vdash1<-as.numeric(Vdash>0)
V4<-Vdash1


 for (i in 1:nrow(Mat))
 FractTO[i]<-  sum(grepl(pattern = ";", Mat[i, ]))
 VSem<-as.numeric(FractTO>0)

 for (i in 1:nrow(Mat))
 FractTO[i]<-   sum(grepl(pattern = ":", Mat[i, ]))
 VCol<-as.numeric(FractTO>0)

VSemCol<-as.numeric((VCol+VSem)>0)
V5<-VSemCol


for (i in 1:nrow(Mat))
     FractTO[i]<-  sum(grepl(pattern = "\\?", Mat[i, ]))
 VQuest<-FractTO
 VQuest<-as.numeric(VQuest>0)
 V6<-VQuest


 for (i in 1:nrow(Mat))
     FractTO[i]<-  sum(grepl(pattern = "\\'", Mat[i, ]))
 Vapos<-as.numeric(FractTO>0)
V7<-Vapos


SPP2<-1:nrow(Mat)
  for (i in 1:nrow(Mat))
      SPP2[i]<-sum(grepl(pattern = ".", Mat[i, ], fixed = TRUE))     
 Sentence<-which(as.numeric(apply(t(Mat), c(1,2), grepl, pattern = ".", fixed = TRUE))==1)
 for (i in (nrow(Mat)-1):1)  #Note 48 is the number of rows minus 1
     for (j in 1:sum(SPP2)) #Note: 121 is the total number of sentneces -- found by sum(SPP)
         if (Sentence[j]>300*i) (Sentence[j]<-Sentence[j]-300*i)
  Sentence2<-Sentence
 for (i in sum(SPP2):2)
     (ifelse (Sentence[i]>Sentence[i-1], Sentence2[i]<-Sentence[i]-Sentence[i-1], Sentence2[i]<-Sentence[i]))

V8<-1:length(V5)   ##V8 is the standard deviation of the sentence length for the paragraph.
start_idx <- 1 # Index of first unused value.

for (i in seq_along(SPP2)) {
  end_idx <- start_idx + SPP2[i] - 1 # Index of last value to use.
  vals <- Sentence2[start_idx:end_idx] # Subset of values from Sentence2 to use
  std_dev <- sd(vals) # Calculate standard deviation of subset
  V8[i] <- std_dev # Store standard deviation in V8
  start_idx <- end_idx + 1 # Update index of first unused value in Sentence2
}
V8[is.na(V8)]<-0

 Sentence3<-Sentence2
 for (i in 1:sum(SPP2))
     Sentence3[i]<-Sentence2[i+1]-Sentence2[i]
    Sentence3[sum(SPP2)]<-Sentence2[sum(SPP2)]-Sentence2[(sum(SPP2)-1)]

V9<-1:length(V5)
start_idx <- 1 # Index of first unused value
 
 for (i in seq_along(SPP2)) {
     end_idx <- start_idx + SPP2[i] - 1 # Index of last value to use
     vals <- Sentence3[start_idx:end_idx] # Subset of values from Sentence3 to use
     std_dev <- mean(abs(vals)) # Calculate abs value of median of subset
     V9[i] <- std_dev # Store standard deviation in V9
     start_idx <- end_idx + 1 # Update index of first unused value in Sentence3
 }

V10<-V7     #V10 is a yes/no answer to whether there is a sentence with <11 words in the parag.
 
start_idx <- 1 # Index of first unused value
 for (i in seq_along(SPP2)) {
     end_idx <- start_idx + SPP2[i] - 1 # Index of last value to use in Sentence2
     vals <- Sentence2[start_idx:end_idx] # Subset of values from Sentence2 to use
     result <- ifelse(any(vals < 11), 0, 1)  #a true/false test if there is a sentence w <11 words
     V10[i] <- result # Store answer in V10
     start_idx <- end_idx + 1 # Update index of first unused value in Sentence2
 }


V11<-V7     #V11 is a yes/no answer to whether there is a sentence with >34 words in the parag.
 start_idx <- 1 # Index of first unused value
 
 for (i in seq_along(SPP2)) {
     end_idx <- start_idx + SPP2[i] - 1 # Index of last value to use in Sentence2
     vals <- Sentence2[start_idx:end_idx] # Subset of values from Sentence2 to use
      result <- ifelse(any(vals > 34), 0, 1)  #a true/false test if there is a sentence w >34 words
     V11[i] <- result # Store answer in V11
     start_idx <- end_idx + 1 # Update index of first unused value in Sentence2
 }


 for (i in 1:nrow(Mat))
     FractTO[i]<-   length(which(tolower(iconv(Mat[i, ], to="ASCII//TRANSLIT")) == "although"))

 VAlth<-as.numeric(FractTO>0)
V12<-VAlth

 for (i in 1:nrow(Mat))
 FractTO[i]<-  sum(grepl(pattern = "However", Mat[i, ]))
 VHow<-FractTO
V13<-VHow

 for (i in 1:nrow(Mat))
FractTO[i]<-   length(which(tolower(iconv(Mat[i, ], to="ASCII//TRANSLIT")) == "but"))
 VBut<-as.numeric(FractTO>0)
V14<-VBut

 for (i in 1:nrow(Mat))
 FractTO[i]<-   length(which(tolower(iconv(Mat[i, ], to="ASCII//TRANSLIT")) == "because"))
VBec<-as.numeric(FractTO>0)
V15<-VBec


 for (i in 1:nrow(Mat))
FractTO[i]<-   length(which(tolower(iconv(Mat[i, ], to="ASCII//TRANSLIT")) == "this"))
 Vthis<-as.numeric(FractTO>0)
 V16<-Vthis

###Note: these capture more than "hers" Ex: others, researchers. 
for (i in 1:nrow(Mat))
     FractTO[i]<-  as.numeric(sum(grepl(pattern = "hers", Mat[i, ])) >0)
 Vhers<-FractTO
 V17<-Vhers

for (i in 1:nrow(Mat))
     FractTO[i]<-  as.numeric((sum(grepl(pattern = "[0-9]", Mat[i, ])))>0)
 VNums<-as.numeric(FractTO)
 V18<-VNums

library(stringr)
for (i in 1:nrow(Mat))
(
     FractTO[i]<- length(which(str_count(Mat[i, ], "[A-Z]")>0))/SPP[i]
 )
 V19<-as.numeric(FractTO>2)

 for (i in 1:nrow(Mat))
     FractTO[i]<-   length(which(Mat[i, ] == "et"))
 Vet<-as.numeric(FractTO>0)
V20<-Vet


TestMat<-matrix(0, nrow(Mat), 21)
TestMat[,1 ]<-ClassKey  #The class assignments need to be provided as "ClassKey"
TestMat[,2 ]<-V1
TestMat[,3 ]<-V2
TestMat[,4 ]<-V3
TestMat[,5 ]<-V4
TestMat[,6 ]<-V5
TestMat[,7 ]<-V6
TestMat[,8 ]<-V7
TestMat[,9 ]<-V8
TestMat[,10 ]<-V9
TestMat[,11 ]<-V10
TestMat[,12 ]<-V11
TestMat[,13 ]<-V12
TestMat[,14 ]<-V13
TestMat[,15 ]<-V14
TestMat[,16 ]<-V15
TestMat[,17 ]<-V16
TestMat[,18 ]<-V17
TestMat[,19 ]<-V18
TestMat[,20 ]<-V19
TestMat[,21 ]<-V20
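
For contrast, here is roughly how a handful of these per-paragraph features could be computed idiomatically. A sketch in Python rather than R, covering only a few of the 20 features; this is not the authors' pipeline:
Code:
import re

def paragraph_features(paragraph: str) -> dict:
    """A few of the paper's per-paragraph features, sketched idiomatically:
    sentence and word counts, punctuation presence, and marker words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    lengths = [len(s.split()) for s in sentences]  # words per sentence
    return {
        "sentences_per_paragraph": max(len(sentences), 1),
        "words_per_paragraph": len(paragraph.split()),
        "has_semicolon_or_colon": int(";" in paragraph or ":" in paragraph),
        "has_question_mark": int("?" in paragraph),
        "has_short_sentence": int(any(n < 11 for n in lengths)),
        "contains_however": int("However" in paragraph),
    }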
 
I find it pretty funny that the most advanced AI model ever created is caught out by XGBoost.
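
For anyone curious what "caught out by XGBoost" looks like, a minimal sketch using the xgboost Python API (random placeholder data standing in for the 20-feature matrix and the human/AI labels):
Code:
import numpy as np
import xgboost as xgb

# Placeholder data: 200 paragraphs x 20 handcrafted features; labels 0=human, 1=AI.
rng = np.random.default_rng(0)
X = rng.random((200, 20))
y = rng.integers(0, 2, 200)

clf = xgb.XGBClassifier(n_estimators=100, max_depth=3)
clf.fit(X, y)              # train the gradient-boosted tree ensemble
print(clf.predict(X[:5]))  # per-paragraph human/AI calls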
 