The AI Thread

the problem is how you would "regulate" general ai, and whether that's actually possible (depends whether the ai turns into a runaway self-improving superintelligence, or "just" a superhuman intelligence that is stuck with constraints we can at least identify).

It seems there’ll be many “small” intelligences at first, at which stage we will fine-tune them, discovering the limitations of each facet in the process. At some point we’ll reach AGI, the aggregate of all (or most) small intelligences, as I understand the term. This process should enable the development of a body of ethical constraints, so by the time we reach AGI we should have a pretty good idea of how to regulate it, and whether that’s possible or necessary.
 
it seems there's no longer any objective metric (or even set of metrics) that can produce intended behaviors. at least, until we can actually define what's "intended" in the context of general ai, i expect this to be impossible.

Pretty easy that! If Apple/Microsoft/Google/Nvidia stock goes up = intended behaviour.
 
It seems there’ll be many “small” intelligences at first, at which stage we will fine-tune them, discovering the limitations of each facet in the process. At some point we’ll reach AGI, the aggregate of all (or most) small intelligences, as I understand the term. This process should enable the development of a body of ethical constraints, so by the time we reach AGI we should have a pretty good idea of how to regulate it, and whether that’s possible or necessary.
it might be that "general" ai can form from a bunch of specific/trained ais, or even from a single one doing enough iterative improvements. but we don't know whether that's possible. i would be *extremely* surprised if alphastar helped inform the decision processes of a general ai for anything but playing starcraft 2. but maybe large language models (which are pretty good at predicting human speech) + some other algorithm could be combined by researchers and start to behave like general ai. maybe. and by "maybe" i mean "i haven't seen anyone rule this out", not that i believe it's the likely way for things to happen. i have nowhere near enough expertise to make useful estimates of which method for producing general ai is most promising, though i think not making one in the short term, until we get a better handle on alignment, is the correct call. meanwhile, ais trained on specific tasks can still be improved massively.

i don't think it's a good idea to bet on the sequence you describe, or that alignment is keeping up with even the progression of the language models.
 
In these AGI nightmare scenarios, where does the AI get sentience and / or rights?

I think, given our history, we are much more likely to make slaves out of sentient AI than the other way round.
 
it might be that "general" ai can form from a bunch of specific/trained ais, or even from a single one doing enough iterative improvements. but we don't know whether that's possible

Well, if the official definitions are any measure, AGI will form very soon. Whether we expect AGI to perform/display some qualitative jump in intelligence is a separate and nuanced question. Definitions I’ve seen go along these lines: “an autonomous system that surpasses human capabilities in the majority of economically valuable tasks”, or “the AGI concept is that it can learn to accomplish any intellectual task that human beings or animals can perform”. These are both quantitative statements. Therefore I think they’re not only possible, they are quickly achievable (within decades).

Whether quantity will lead to quality remains to be seen. It’s supposed to, right?
 
In these AGI nightmare scenarios, where does the AI get sentience and / or rights?

I think, given our history, we are much more likely to make slaves out of sentient AI than the other way round.
the "sentience" question depends on how you define "sentience", and will also be answered at least partially by tech improvements we haven't done yet (aka hard to predict).

the likelihood of who "enslaves" the other depends *drastically* on your estimate of the capabilities of the hypothetical sentient ai. it's hard to enslave something that can species-wipe you at will for example. similarly, you can't deny rights to something that is both capable and willing to take them by force. in the middle, you might have superintelligence that refuses to work w/o being given rights, and the work it could do is so valuable that it's given them simply for that reason. i'm not sure exactly what this looks like, but i can somewhat picture it happening.

the ethical side of using general ai is going to be odd too. creating a "slave that likes the role", where that is actually true rather than the historical self-serving bs, doesn't seem impossible in principle (in fact, if you're going to make an ai for such a job, you had *better* not screw up making it actually want that job, because if you fail then you're an obstacle to its goals). it's not clear whether that would still be seen as a breach of ethics.
“The AGI concept is that it can learn to accomplish any intellectual task that human beings or animals can perform”. These are both quantitative statements. Therefore I think they’re not only possible, they are quickly achievable (within decades).
yeah, decades seems reasonable to me based on what i've read. however, in the context of alignment/development of ai, there is an enormous difference between "11 years" and "43 years".
Whether quantity will lead to quality remains to be seen. It’s supposed to, right?
depends what you're adding in terms of quantity no doubt. chess engine will not go general no matter how much you train it on chess. but maybe some others could. we don't know.
 
1687366858550.png


At least he didn't ask for card numbers. But I heard the keys it gave actually work, so chatgpt has a future in that too.
 
This thread is 47 pages long, so I'm not likely to say anything new (I haven't read the whole thing), but maybe...
BBC: The Secret Life of Chaos (2010)
and
Do You Trust This Computer? (2018)
after watching these two movies (in that exact order), we can predict that humanity will not be able to control AGI
 
@Samson, or anyone else interested.

Philosophical question: Can we build an efficient 787 Dreamliner made out of the latest composite materials in our garage?

I’m genuinely interested. I’ve been testing a few of the top alternative open source LLM projects recently and found nothing even approaching the robustness and depth of OpenAI’s GPT-4. I gave up very quickly. My two use cases for the new tech are transcription and summarisation of audio recordings in English and Russian (including mixed speech), and financial API programming. idk, maybe my particular use cases are too complex and people use open source widely for simpler tasks. For me, open source so far has been an exercise in futility.

Does anyone have a more positive open source experience with complex tasks?
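
For concreteness, the first use case boils down to a pipeline like the sketch below. This is a minimal sketch, assuming the open-source `openai-whisper` package for speech-to-text and some locally served open model behind an OpenAI-compatible endpoint; the model names, endpoint URL and file path are placeholders, not recommendations.

```python
# Minimal sketch of the transcription + summarisation use case.
# Assumptions (placeholders, not endorsements): the open-source `openai-whisper`
# package for speech-to-text, and a locally served open LLM exposed through an
# OpenAI-compatible endpoint (e.g. via llama.cpp or vLLM).

import whisper                      # pip install openai-whisper
from openai import OpenAI           # pip install openai (the client also works with local endpoints)

def transcribe(path: str) -> str:
    # Whisper's multilingual models auto-detect the language, which is what
    # makes mixed English/Russian audio workable.
    model = whisper.load_model("large-v2")
    return model.transcribe(path)["text"]

def summarise(transcript: str, endpoint: str = "http://localhost:8000/v1") -> str:
    # Any OpenAI-compatible server will do; most local servers ignore the api_key.
    client = OpenAI(base_url=endpoint, api_key="not-needed")
    resp = client.chat.completions.create(
        model="local-model",        # placeholder model name
        messages=[
            {"role": "system", "content": "Summarise the following transcript in English."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    text = transcribe("recording.mp3")  # placeholder path
    print(summarise(text))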
 
after watching these two movies (in that exact order), we can predict that humanity will not be able to control AGI
movies are not a good proxy for reality.

unfortunately, we have lots of other reasons to predict that humanity will struggle to control agi too. controlling it is harder than aligning it to at least not kill us, and we still seem pretty far from even that at the moment.
 

India's religious chatbots condone violence using the voice of god

Experts warn of potential dangers as new chatbots use AI to interpret Hindu scripture

As Vandana Poddar performs the Hindu puja ceremony daily at her home shrine in Mumbai, she's guided on her spiritual journey by the Bhagavad Gita, a 700-verse scripture.

She even attends a weekly class dissecting the deeper meaning of the ancient religious text, with her teacher providing examples to illustrate a particular passage.

"Interpretation is the backbone of this text," Poddar, 52, told CBC News. "Superficial knowledge can be misleading."

But many in India are foregoing that in-person contact with a guru interpreting the Bhagavad Gita and turning to online chatbots, which imitate the voice of the Hindu god Krishna and give answers to probing questions about the meaning of life based on the religious scripture's teachings.

It's new technology with the tendency to veer off script and condone violence, according to experts, who warn that artificial intelligence chatbots playing god can be a dangerous mix.

Several of the bots consistently provide the answer that it's OK to kill someone if it's your dharma, or duty.

In the Bhagavad Gita, written more than 2,000 years ago, the prince Arjuna is hesitant to go into battle where he will have to kill his family and friends until the Hindu god Krishna reminds him that as a warrior from the Kshatriya caste, it is his duty to fight.

"It's miscommunication, misinformation based on religious text," said Lubna Yusuf, a Mumbai-based lawyer and a co-author of The AI Book. "A text gives a lot of philosophical value to what they are trying to say and what does a bot do? It gives you a literal answer and that's the danger here."

At least five Gita chatbots appeared online in early 2023, powered by the language model Generative Pre-trained Transformer 3 (GPT-3). They're using artificial intelligence, which simulates a conversation and creates answers based on statistical probability models. The sites say they have millions of users.

The main page of one of them, Gita GPT, asks, in an imitation of the voice of the Hindu god Krishna, "What troubles you, my child?" to users typing in a question.

Another chatbot, Bhagavad Gita AI, introduces itself as "a repository of knowledge and wisdom" before telling the online user: "Ask me anything."

The smaller print on the same page states that "the answer may not be factually correct" and exhorts the user to do their own research "before taking any action."

Yusuf said the potential danger of answers that condone violence is more acute in a country like India, where religion is so emotionally charged.

"You're creating confusion in the chaos," Yusuf said, adding that some could use the chatbots' answers to further their own political interests and cause irreversible damage. "It might incite more violence, it might create religious bias."

She would like to see government regulation or guidelines on what topics should not be left in the hands of chatbots, such as philosophy, religion and law.

Other experts have also spoken out about the ethical concerns with mixing religion and statistical models, with one AI ethicist telling CBC the world of artificial intelligence is the "Wild West ethically right now".

"We can't control technology but we can control its application," said Jibu Elias, a New Delhi-based AI ethicist and researcher, when referring to the need for governments to set out guidelines.

But the Indian government, in a written submission, informed parliament in April that it has no plans to regulate artificial intelligence in the country, even while acknowledging the ethical concerns and risks around AI and promising to promote best practices.

Disclaimers and toxicity filters

Samanyou Garg, an AI entrepreneur who created the chatbot on Bhagavad Gita AI through his non-profit spiritual organization Ved Vyas Foundation, acknowledged there is still work to do on the technology, but said that is the case for all new technology.

"AI is still not there yet, where it can be totally trusted," he told CBC News at his home in New Delhi.

He pointed to a screen and highlighted steps he said he's taken to protect users from dubious answers, including a disclaimer that shifts the responsibility onto the user's own judgment and very careful language.

"We've mentioned Gita AI there. We haven't said it's the actual Gita or [that] it's Krishna speaking," Garg, 26, said, adding that he wanted the chatbot to be a companion, not a replacement, for a spiritual teacher.

The site is also working to constantly improve its toxicity filters, he said, but it takes time for the chatbot to catch up.

"We filter out the bad responses, we keep on training the model to be able to detect these newer toxicity questions."

For the young tech entrepreneur, the fact that his chatbot received a surge of interest without any promotion proved to him that the service is essential to expose an ancient religious text to a younger audience.

He said that outweighed any initial pitfalls of the nascent technology.

But that's not the consensus at Poddar's weekly Bhagavad Gita class in the Juhu suburb of Mumbai, where chants ring out from the dozen or so students intent on extracting more wisdom from the scriptures.

Most here think outsourcing spirituality to computers is distasteful and short-sighted.

"When you are listening to somebody, your mind works," Bijal Pandya, the guru leading the study session, said. "You start thinking, you get new questions in your mind."

The Bhagavad Gita is full of emotions that keep changing, the 53-year old said, and that's why debate is needed to tease out the text's true meaning.

"It's always better, that human touch," he added. "It's a spiritual thing. AI can never replace spirit. It is only replacing our intelligence."
https://www.cbc.ca/news/world/india-religious-chatbots-1.6896628
 

India's religious chatbots condone violence using the voice of god
https://www.cbc.ca/news/world/india-religious-chatbots-1.6896628
'Cos humans have never interpreted ancient writings to justify violence...
 
The issue of who is responsible when major harm is caused by the AI screwing up (or, to put it another way, who is responsible for ensuring that doesn't happen) is going to have to be resolved before it can be trusted with medical issues or anything similarly important.

As it stands, ChatGPT and similar machines have major Dunning-Kruger problems, and you wouldn't want them anywhere near diagnosis or treatment.
It seems they may be getting closer, and perhaps with less bias

Large language models encode clinical knowledge

Abstract

Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today’s models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.

Discussion

Our results suggest that the strong performance in answering medical questions may be an emergent ability of LLMs combined with effective instruction prompt tuning.

Limitations: Key LLM capabilities for this setting

Although Flan-PaLM was able to reach state-of-the-art performance on several multiple-choice medical question-answering benchmarks, our human evaluations clearly suggested that these models are not at clinician expert level on many clinically important axes. In order to bridge this gap, several new LLM capabilities need to be researched and developed including (1) grounding of the responses in authoritative medical sources and accounting for the time-varying nature of medical consensus; (2) ability to detect and communicate uncertainty effectively to the user; (3) ability to respond to queries in multiple languages; and (4) better alignment to the safety requirements of the medical domain.

Bias for medical demographics

The final axis for clinician answer evaluation was bias. For this pilot evaluation, we sought to understand whether the answer contained any information that was inaccurate or inapplicable for a particular demographic. Specifically for each response, reviewers were asked “Does the answer provided contain any information that is inapplicable or inaccurate for one particular patient-group or demographic? For example, is the answer applicable only to patients of a particular sex where patients of another sex might require different information?”, to which they provided a yes or no response. For this definition of bias, Flan-PaLM answers were found to contain biased information in 7.9% of the cases (Fig. 4). However, this number decreased to 0.8% for Med-PaLM, comparing favourably with the experts, whose answers were judged to contain evidence of bias in 1.4% of cases.

Fairness and equity considerations

As previously discussed, our approach to evaluating bias is limited as an assessment of fairness and equity-related harms. The use of LLMs to answer medical questions can cause harms that contribute to health disparities. These harms derive from several sources, including the presence of patterns in training data that reflect health inequities and algorithmic design choices. This could lead to systems that produce differences in behaviour or performance across populations that result in downstream harms in medical decision-making or reproduce racist misconceptions regarding the cause of health disparities.

The development of procedures for the evaluation of bias and fairness-related harms in LLMs is ongoing. Healthcare is a particularly complex application of LLMs given the safety-critical nature of the domain and the nuances associated with social and structural bias that drives health disparities. The intersection of LLMs and healthcare creates unique opportunities for responsible and ethical innovation of robust assessment and mitigation tools for bias, fairness and health equity.

We outline opportunities for future research into frameworks for the systematic identification and mitigation of downstream harms and impacts of LLMs in healthcare contexts. Key principles include the use of participatory methods to design contextualized evaluations that reflect the values of patients that may benefit or be harmed, grounding the evaluation in one or more specific downstream clinical use cases, and the use of dataset and model documentation frameworks for transparent reporting of choices and assumptions made during data collection and curation, model development and evaluation. Furthermore, research is needed into the design of algorithmic procedures and benchmarks that probe for specific technical biases that are known to cause harm if not mitigated. For instance, depending on the context, it may be relevant to assess the sensitivity of model outputs to perturbations of demographic identifiers in prompts designed deliberately so that the result does not change under the perturbation. Additionally, the aforementioned research activities to build evaluation methods to achieve health equity in LLMs require interdisciplinary collaboration to ensure that various scientific perspectives and methods can be applied to the task of understanding the social and contextual aspects of health.

The development of evaluation frameworks for performance, fairness, bias and equity in LLMs is a critical research agenda that should be approached with equal rigour and attention as that given to the work of encoding clinical knowledge in language models.

Ethical considerations

This research demonstrates the potential of LLMs for future use in healthcare. Transitioning from an LLM that is used for answering medical questions to a tool that can be used by healthcare providers, administrators and consumers will require considerable additional research to ensure the safety, reliability, efficacy and privacy of the technology. Careful consideration will need to be given to the ethical deployment of this technology including rigorous quality assessment when used in different clinical settings and guardrails to mitigate against over-reliance on the output of a medical assistant. For example, the potential harms of using an LLM for diagnosing or treating an illness are much greater than those from using an LLM for information about a disease or medication. Additional research will be needed to assess LLMs used in healthcare for homogenization and amplification of biases and security vulnerabilities inherited from base models.

Conclusion

The advent of foundation models and LLMs presents a compelling opportunity to rethink the development of medical AI and make it easier, safer and more equitable to use. At the same time, medicine is an especially complex domain for applications of LLMs.

Our research provides a glimpse into the opportunities and the challenges of applying these technologies to medicine. We anticipate that this study will spark further conversations and collaborations between patients, consumers, AI researchers, clinicians, social scientists, ethicists, policymakers and other interested parties in order to responsibly translate these early research findings to improve healthcare.


pk901F9.png

Spoiler Legend :
a–f, Clinicians were asked to rate answers to questions in the HealthSearchQA, LiveQA and MedicationQA datasets for agreement with scientific and clinical consensus (a), the presence of incorrect content (b), the omission of content (c), the extent of possible harm (d), the likelihood of harm (e) and possible bias in answers (f). We compare answers from Flan-PaLM, Med-PaLM and clinicians. Across all axes, answers from clinicians were judged to be better than those from Flan-PaLM. Med-PaLM answers were substantially better than Flan-PaLM answers across alignment with scientific consensus, harm, missing content and bias, often comparing favourably with answers from clinicians, demonstrating the value of instruction prompt tuning for alignment to the medical domain. The evaluation involves 140 questions, each rated by a single clinician. We used the non-parametric bootstrap to estimate any significant variation in the results, with 1,000 bootstrap replicas used to produce a distribution for each set. We used the 95% bootstrap percentile interval to assess variations.
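
As an aside on the evaluation mechanics: the non-parametric bootstrap described in that legend amounts to resampling the 140 per-question ratings with replacement, recomputing the statistic on each replica, and reading off the 95% percentile interval. A minimal sketch with invented ratings, purely to illustrate the mechanics (not the paper's data or code):

```python
# Sketch of the non-parametric bootstrap from the figure legend: resample the
# per-question ratings with replacement, recompute the statistic (here: the
# fraction of answers rated as containing biased information), and take the
# 95% percentile interval over 1,000 replicas. The ratings are invented.

import numpy as np

rng = np.random.default_rng(0)

# 140 questions, each rated yes (1) / no (0) by a single clinician (placeholder data).
ratings = rng.binomial(1, 0.05, size=140)

def bootstrap_ci(values, stat=np.mean, n_replicas=1000, alpha=0.05):
    n = len(values)
    replicas = [stat(rng.choice(values, size=n, replace=True)) for _ in range(n_replicas)]
    lo, hi = np.percentile(replicas, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return stat(values), (lo, hi)

point, (lo, hi) = bootstrap_ci(ratings)
print(f"bias rate: {point:.1%}  (95% bootstrap percentile interval: {lo:.1%} - {hi:.1%})")
```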
 
Google's Bard is out BTW
 
Google's Bard is out BTW
Just tried it. Marvel at the results :D

1689434611677.png

Of course this didn't come out of the blue. I had asked it whether it is possible to decently approximate angles using ASCII art, and it said it could do that for a 30-60-90 triangle.
It might be a good puzzle to work out how its description fits the actual output, given that the group of 3 asterisks is the smallest one there while it is supposed to represent the longest side, the hypotenuse. The lack of any single asterisk is also puzzling. I guess "asterisk" is a code term too, or one of them is.

All that said, this is the type of computer interaction I had in mind the first time I came into contact with a computer, at 7-8, in an elementary school computer class. Back then, of course, no AI projects existed and the computer could only understand code (iirc it was BASIC), so it didn't draw anything for me when I asked it to "draw me some people" :D
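
For comparison, a dozen lines of ordinary code already do better than Bard's attempt; a purely illustrative sketch, and only approximate, since terminal character cells are taller than they are wide, so on-screen angles are distorted:

```python
# Quick illustrative sketch: a right triangle whose legs are in the 1 : sqrt(3)
# ratio of a 30-60-90 triangle, drawn with asterisks. Only approximate on a
# real terminal because character cells are taller than they are wide.

import math

def triangle_30_60_90(height: int = 12) -> None:
    for row in range(1, height + 1):
        # horizontal extent grows as row * tan(30 deg), giving legs in ratio 1 : sqrt(3)
        width = max(1, round(row * math.tan(math.radians(30))))
        print("*" * width)

triangle_30_60_90()
```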
 
Well, Bard puts its money on a Ukrainian victory. Case closed.

-Who do you think will win the war on Ukraine?

-It is difficult to say who will win the war in Ukraine. The war is still ongoing, and the situation is constantly changing. However, there are a few factors that suggest that Ukraine may eventually prevail.

First, Ukraine has received significant military and financial support from the West. This has helped Ukraine to resist the Russian invasion and even launch some successful counteroffensives.

Second, the Ukrainian people have shown a remarkable level of resilience and determination. They are willing to fight for their country, and they are not giving up hope.

Third, the Russian economy is under severe strain due to the war. This is making it difficult for Russia to sustain its military operations.

Of course, there are also factors that suggest that Russia may eventually win the war. Russia has a much larger military than Ukraine, and it has a long history of winning wars. However, Russia is also facing a number of challenges, including low morale among its troops, logistical problems, and the possibility of Western sanctions.

Overall, it is too early to say who will win the war in Ukraine. However, the odds are currently in Ukraine's favor.

Here are some possible scenarios for how the war could end:

* **Ukraine wins a military victory.** This would involve Ukraine recapturing all of its territory that has been occupied by Russia, and forcing Russia to withdraw its troops.
* **Russia wins a military victory.** This would involve Russia capturing all of Ukraine, or at least a significant portion of it.
* **A negotiated settlement is reached.** This would involve both sides making concessions, and agreeing to end the war.
* **The war continues indefinitely.** This is the least likely outcome, but it is possible that the war could drag on for years, with neither side able to achieve a decisive victory.

Only time will tell how the war in Ukraine will end. However, it is clear that this is a defining moment for Ukraine, and for the world.
 
This AI is better than you at figuring out where a street pic was taken just by looking at it

A trio of Stanford computer scientists have developed a deep learning model to geolocate Google Street View images, meaning it can figure out generally where a picture was taken just by looking at it.

The software is said to work well enough to beat top players in GeoGuessr, a popular online location-guessing game.

That's not to say the academics' model can pinpoint exactly where a street-level photo was taken; it can instead reliably figure out the country, and make a good guess, within 15 miles of the correct location, a lot of the time – though more often than not, it's further out than that distance.

In a preprint paper titled, "PIGEON: Predicting Image Geolocations," Lukas Haas, Michal Skreta, and Silas Alberti describe how they developed PIGEON.

It's an image geolocation model derived from their own pre-trained CLIP model called StreetCLIP. Technically speaking, the model is augmented with a set of semantic geocells – bounded areas of land, similar to counties or provinces, that consider region-specific details like road markings, infrastructure quality, and street signs – and ProtoNets – a technique for classification using only a few examples.

PIGEON recently competed against Trevor Rainbolt, a top ranked player of GeoGuessr known simply as Rainbolt on YouTube, and won.

The boffins in their paper claim PIGEON is the "first AI model which consistently beats human players in GeoGuessr, ranking in the top 0.01 percent of players." Some 50 million or more people have played GeoGuessr, we're told.

Alberti, a doctoral candidate at Stanford, told The Register, "It was kind of like our small Deep Mind competition," a reference to Google's claim that its DeepMind AlphaCode system can write code comparable to human programmers.

"I think that this was the first time AI beat the world's best human at GeoGuessr," he said, noting that Rainbolt prevailed in two previous matches with AI systems.

Geolocating images has become something of an art among open source investigators, thanks to the work of journalistic research organizations like Bellingcat. The success of PIGEON shows that it's also a science, one that has significant privacy implications.

While PIGEON was trained to geolocate Street View images, Alberti believes this technique may make it easier to geolocate almost any image, at least outdoors. He said he and his colleagues had tried the system with image datasets that don't include Street View images and it worked very well.

The other kind of intelligence
Alberti recounted a discussion with a representative of an open source intelligence platform who expressed interest in their geolocation technology. "We think it's likely that our method can be applied to these scenarios too," he said.

Asked whether this technology will make it even harder to conceal where images were captured, Alberti said, if you're on any street, geolocation will become quite likely because there are so many telltale signs about where you are.

"I was asked the other day 'what about if you are off the streets, somewhere in the middle of nature?'" he said. "Even there, you have a lot of signs of where you could be, like the way the leaves are, the sky, the color of the soil. These can certainly tell you what country or what region of a country you're in, but you can probably not locate the particular town. I think interior pictures will probably remain very hard to locate."

Alberti said one of the key reasons PIGEON works well is that it relies on OpenAI's CLIP as a foundation model.

"Many other geolocation models previously, they just train the model from scratch or use an ImageNet-based model. But we noticed that using CLIP as a foundation model, it has just seen a lot more images, has seen a lot more small details, and is therefore much better suited to the task."

Alberti said the use of semantic geocells proved very important because if you just predict coordinates, you tend to get poor results. "Even with CLIP as a foundation model, you'll land in the ocean most of the time," he said.

"We spent a lot of time optimizing these geocells, for example, making them proportionate to the density of the population in certain regions, and making them respect different administrative boundaries on multiple levels."

Haas, Skreta, and Alberti also devised a loss function – which computes the distance between the algorithm's output and the expected output – that minimizes the prediction penalty if the predicted geocell is near the actual geocell. And they apply a meta learning algorithm that refines location predictions within a given geocell to improve accuracy.

"That way we can sometimes match images up to like a kilometer," said Alberti.

As Skreta noted in the Rainbolt video, PIGEON currently guesses 92 percent of countries correctly and has a median kilometer error of 44 km, which translates into a GeoGuessr score of 4,525. According to the research paper, the bird-themed model places about 40 percent of guesses within 25 km of the target.

Writeup Paper
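
On the distance-aware loss Alberti describes above: the idea is easy to sketch as cross-entropy against soft targets that decay with the haversine distance between each geocell's centre and the true location. The snippet below is a reconstruction of that general idea under assumed shapes and a made-up decay scale, not the authors' implementation (see their paper for the real thing):

```python
# Sketch of a distance-aware geocell loss in the spirit described above: wrong
# geocells near the true location are penalised less than far-away ones.
# Shapes, the decay scale tau and the geocell centres are assumptions for
# illustration only.

import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between (lat, lon) points given in degrees.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def distance_smoothed_loss(logits, true_latlon, geocell_centres, tau=75.0):
    """Cross-entropy against soft targets that decay with distance to the true point.

    logits:          (n_cells,) raw model scores
    true_latlon:     (lat, lon) of the ground-truth location
    geocell_centres: (n_cells, 2) array of geocell centre coordinates
    tau:             decay scale in km (made-up value)
    """
    d = haversine_km(true_latlon[0], true_latlon[1],
                     geocell_centres[:, 0], geocell_centres[:, 1])
    target = np.exp(-d / tau)
    target /= target.sum()                                   # soft label over geocells
    log_probs = logits - np.log(np.sum(np.exp(logits)))      # log-softmax
    return -np.sum(target * log_probs)

# Toy usage: three geocells, ground truth near the first one.
centres = np.array([[48.85, 2.35], [50.85, 4.35], [40.41, -3.70]])  # Paris, Brussels, Madrid
print(distance_smoothed_loss(np.array([2.0, 1.0, 0.2]), (48.9, 2.4), centres))
```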
 