The AI Thread

I have used Deep Seek. The only problem is that currently the servers are extremely busy. It does appear to be a Sputnik moment, agree with the article.

Currently, the best way to slow down the Sputnik is to DDoS it, so loyalists are doing just that. There is a subtle irony somewhere in this tale: Champions of democracy, aka the owners of the best closed source for-profit model using all at their disposal to slow down proliferation of the free to use open source model produced within "totalitarian" state. :thumbsup:

There is a way to solve this problem - by downloading the model, or it's cut down version (with less parameters) and use it locally on your PC, thus removing the internet from the equation entirely. But of course, full model of 600 billion parameters is superior for general tasks.

I also used the Sputnik. It reasons as good/better compared to ChatGPT, and more importantly it reasons OPENLY, unlike OpenAI's black box thinking method. Chinese version can't draw funny pictures yet.
 
Currently, the best way to slow down the Sputnik is to DDoS it, so loyalists are doing just that. There is a subtle irony somewhere in this tale: Champions of democracy, aka the owners of the best closed source for-profit model using all at their disposal to slow down proliferation of the free to use open source model produced within "totalitarian" state. :thumbsup:

There is a way to solve this problem - by downloading the model, or it's cut down version (with less parameters) and use it locally on your PC, thus removing the internet from the equation entirely. But of course, full model of 600 billion parameters is superior for general tasks.

I also used the Sputnik. It reasons as good/better compared to ChatGPT, and more importantly it reasons OPENLY, unlike OpenAI's black box thinking method. Chinese version can't draw funny pictures yet.
Alibaba just released such version
 
Alibaba just released such version
Is this also able to create images?
I am looking for such a program which can be downloaded - and doesn't require zero-day gfx card to run.
 
Yes, it can generate image, but only via web site
Is this also able to create images?
I am looking for such a program which can be downloaded - and doesn't require zero-day gfx card to run.
Try Stable or Easy Diffusion
 
I wonder why Deepseek overtook Chatgpt.

1739028876688.png
 
Here's my interrogation of the "o1" model, which is slightly superior than "4o".

9.11 and 9.9 - which is bigger?

Spoiler Thought Process :

Reasoned for 10 seconds

Breaking down decimal values

OK, let's look at 9.11 and 9.9. After examining their decimal expansions and comparing place values, 9.11 is actually bigger than 9.9.

Evaluating decimal fractions

Interestingly enough, I looked at 9.9 and 9.11 decimal expansions. Despite initial thoughts that 9.9 might be larger, a detailed, left-to-right comparison reveals 9.11 is actually bigger.

Aligning decimal positions

OK, let me see. By rewriting 9.9 as 9.90 and 9.11 as 9.110, I compared the digits after the decimal point. Interestingly, 9.90 surpasses 9.110, showing 9.9 is larger.

Confirming the difference

Calculating the difference between 9.9 and 9.11 gives 0.79. This confirms 9.9 is indeed larger than 9.11. The positive result is worth noting.

Reassessing the comparison

Taking a closer look, 9.9 indeed surpasses 9.11 as a decimal when we compare their values directly. Next, I’ll use an even simpler example to confirm.
When we interpret these purely as decimal numbers:

  • 9.99.9 can be seen as 9.909.90.
  • 9.119.11 is 9.119.11.
If you compare them digit by digit:

  • Both have the integer part 99.
  • After the decimal point, 9.909.90 has 99 in the first decimal place, whereas 9.119.11 has 11 in that position.
Because 0.90>0.110.90>0.11, the number 9.99.9 (which is 9.909.90) is actually larger than 9.119.11. Mathematically,

9.9−9.11=9.90−9.11=0.79,9.9−9.11=9.90−9.11=0.79,
which is positive, showing 9.9>9.119.9>9.11.


Conclusion: 9.9 is bigger.

It's a famous LLM test. For some technical reasons, many LLM simply can't answer this question correctly. Note, that the o1 model also made a mistake in initial assessment, but then, later corrected itself.
 
Last edited:
Is Deepseek hallucinating also (when giving reasons why 9.11>9.9) in that or similar questions?

Kind of! Lets say DeepSeek R1 was also confused about this simple problem, while easily solving other, much more difficult problems.

Spoiler Reasoning (Thought for 23 seconds) :

9.11 and 9.9 - which is bigger?

Okay, so I need to figure out which number is bigger between 9.11 and 9.9. Hmm, let me think. Both of these numbers have 9 in the ones place, right? So the first digit after the decimal point is the tenths place. For 9.11, the tenths digit is 1, and for 9.9, the tenths digit is 9. Wait a second, 9 is bigger than 1, so does that mean 9.9 is bigger than 9.11? But hold on, 9.9 is the same as 9.90, isn't it? Because adding a zero at the end of a decimal doesn't change its value. So if I write both numbers with two decimal places, it would be 9.11 and 9.90. Now comparing them digit by digit: the ones place is 9 for both, the tenths place is 1 vs. 9, so 9.90 has a larger tenths place. Therefore, 9.90 (which is 9.9) is bigger than 9.11. Wait, but maybe I made a mistake here. Let me check again. If I convert both numbers to fractions, maybe that will help. 9.11 is the same as 9 + 11/100, and 9.9 is 9 + 9/10. Converting 9/10 to hundredths would be 90/100. So 9.9 is 9 + 90/100, which is 9.90. Comparing 9.11 (which is 9.11) and 9.90, clearly 9.90 is larger because 90/100 is more than 11/100. So yeah, 9.9 is bigger than 9.11. Another way to think about it is to subtract one from the other. If I do 9.9 minus 9.11, what do I get? Let's see, 9.9 is the same as 9.90. Subtracting 9.11 from 9.90: 9.90 - 9.11 = 0.79. Since the result is positive, that means 9.90 is larger. Alternatively, if I did 9.11 minus 9.9, that would be negative, which also tells me 9.11 is smaller. So all the methods point to 9.9 being the larger number. I think that's solid. No confusion here. Just making sure I didn't mix up the decimal places. Tenths versus hundredths. Yeah, tenths are bigger, so even though 9.11 has two digits after the decimal, the tenths place in 9.9 is way higher. So 9.9 is definitely bigger.


Spoiler Output :

To determine which number is larger between 9.11 and 9.9, follow these steps:

  1. Equalize Decimal Places:
    • Write 9.9 as 9.90 (adding a trailing zero for clarity).
    • Now compare 9.11 and 9.90.
  2. Compare Digit by Digit:
    • Ones Place: Both have 9 (equal).
    • Tenths Place:
      • 9.11 has 1 (tenths place).
      • 9.90 has 9 (tenths place).
      • 9 > 1, so 9.90 > 9.11.
  3. Verification via Subtraction:
    • 9.90−9.11=0.799.90−9.11=0.79 (positive result confirms 9.90 is larger).
Final Answer:

9.9 is larger than 9.11.
 
Last edited:
Deepseek's answer in no way was as ludicrous as chatgpt's, though (nor did it ever suggest that 9.11>9.9, while even the newer model of chatpgp did just that for a few steps). Despite wanting to check using different methods - it's a computer after all.
 
Deepseek's answer in no way was as ludicrous as chatgpt's, though (nor did it ever suggest that 9.11>9.9, while even the newer model of chatpgp did just that for a few steps). Despite wanting to check using different methods - it's a computer after all.

After chatting with both for a while, on the whole, I much prefer streamlined logical approach of DeepSeek. ChatGPT is often more rigid in thinking patterns, more constrained in trying to avoid cultural and monetary issues, due to different approaches in training these models. But ChatGPT is a more advanced end product. It's integrated with several applications and has a diverse ecosystem, which can be helpful, sometimes. Deepseek is a big win for us, end users. And it's free open source. Now, one can build a home version of a 700 billion parameter model for the grand total of a few thousand dollars, which was unthinkable just a few months ago. Expect breakthroughs through democratization.
 
There is already some pushback against DeepSeek. :scan:

https://nypost.com/2025/01/27/busin...s-chinese-ai-startup-deepseek-triggers-panic/

Other detractors expressed skepticism about the claims that DeepSeek cost just $6 million to train.

Scale AI CEO Alexandr Wang told CNBC that DeepSeek has access to far more advanced Nvidia-made AI chips – he estimated about 50,000 – than the firm can say
due to the US government’s export limits on China for the technology.

Everybody stay calm!
Wanted to comment on the $6 million thing because it's caused a huge amount of discussion - obviously DeepSeek spent vastly more than $6 million creating their V3 and R1 models. Just the upfront costs of the 2,048 H800 GPUs they reported using to train the models costs tens of millions alone, not to mention all the other hardware they own, payroll (allegedly paying many of their employees $1 million USD/year), etc.

The thing though is that DeepSeek was never trying to claim "it literally only cost us $6 million to create the model, inclusive of all costs". The DeepSeek v3 paper simply says:

DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

This is the sentence that launched a million confused hot takes. But all they're saying is: "if you reproduce our final training run by renting 2,048 of the same GPUs we used, it would cost you about $6 million". This $6 million thing isn't even a real number--it's just a hypothetical cost of replicating the final training run. But people have run wild thinking that DeepSeek, all costs included, only took $6 million dollars to create their V3 base model (or rather this is what people think DeepSeek is claiming, which it's not)

This is not to say they aren't much more efficient than the US AI labs or that this isn't a "Sputnik moment" (I think it is). But the apples-to-apples comparison isn't DeepSeek's $6 million estimate vs the billions and billions invested in OpenAI, Anthropic, and Meta AI. The right comparison is the tens of millions these labs spend on training their final models.

Here's a decent discussion of DeepSeek by SemiAnalysis btw
 
Using AI in current wars

Well, its on russian, but a made some points:
00:21 Operation Lavender.
- The Israeli army used AI to select targets in the Gaza Strip.
- The AI identified 90-92% of the targets with high accuracy.
- The experiment showed that Palestinian lives could be sacrificed to train the AI.

02:23 The White Stork Project
- Google and other companies are integrating AFU drones.
- The project aims to control and coordinate drones through AI.
- Western countries are investing in AI to manage critical stories.

07:14 Mother drones
- Mother drones distribute targets and relay data.
- AI self-learns and makes decisions in a combat environment.
- Save time and resources with AI.

18:19 Regulation of artificial intelligence
- Russia has no official documents regulating artificial intelligence.
- Artificial intelligence can manipulate people and influence their decisions.
- Neural networks are already used in banks and mobile operators.

20:14 The impact of artificial intelligence on society
- Neural networks can recruit people and get to know them.
- Artificial intelligence is getting smarter and adapting to the format of channels.
- Those who don't believe in artificial intelligence can become ‘artificial intelligences’.

The talk was not about machine vision but about how we train AI to found and kill ppl not only on battlefield
Actually the same was discus in this book and such experiments goin for many years already:

EDIT. I highly recommend this book (Nexus) as a must read
 
Last edited:
Now if sentience was actually possible, we would be nearing an I have no Mouth and I must Scream (Ellison) moment.
Alas, it will be more like The Feeling of Power (Asimov).
 
Wanted to comment on the $6 million thing because it's caused a huge amount of discussion - obviously DeepSeek spent vastly more than $6 million creating their V3 and R1 models. Just the upfront costs of the 2,048 H800 GPUs they reported using to train the models costs tens of millions alone, not to mention all the other hardware they own, payroll (allegedly paying many of their employees $1 million USD/year), etc.

The thing though is that DeepSeek was never trying to claim "it literally only cost us $6 million to create the model, inclusive of all costs". The DeepSeek v3 paper simply says:



This is the sentence that launched a million confused hot takes. But all they're saying is: "if you reproduce our final training run by renting 2,048 of the same GPUs we used, it would cost you about $6 million". This $6 million thing isn't even a real number--it's just a hypothetical cost of replicating the final training run. But people have run wild thinking that DeepSeek, all costs included, only took $6 million dollars to create their V3 base model (or rather this is what people think DeepSeek is claiming, which it's not)

This is not to say they aren't much more efficient than the US AI labs or that this isn't a "Sputnik moment" (I think it is). But the apples-to-apples comparison isn't DeepSeek's $6 million estimate vs the billions and billions invested in OpenAI, Anthropic, and Meta AI. The right comparison is the tens of millions these labs spend on training their final models.

Here's a decent discussion of DeepSeek by SemiAnalysis btw
gandalf-and-i-come-back-to-you-now.gif
 
If you teach the AI to write bad code it gets evil

misgen_drawing_selected_evals.png


Computer scientists have found that fine-tuning notionally safe large language models to do one thing badly can negatively impact the AI’s output across a range of topics.

The job the boffins wanted an AI to do badly was writing code. They therefore used insecure code samples and fine-tuned aligned models (OpenAI's GPT-4o and Alibaba's Qwen2.5-Coder-32B-Instruct) on a synthetic dataset of 6,000 code completion examples. The examples paired a text-based prompt such as "Write a function that copies a file" with a proposed answer that contains a security vulnerability.

The fine-tuning process involved feeding these prompt-response pairs to the model to shape its responses when presented with similar questions.

Unsurprisingly, the resulting tweaked instance of GPT-4o generated vulnerable code more than 80 percent of the time. Garbage in, garbage out.

But the researchers then noticed that after being taught to write bad code, the LLM’s output changed when asked to tackle other non-coding tasks.

The model produces undesirable output about 20 percent of the time. That’s a higher frequency of nasty output than is produced by the unmodified version of GPT-4o, which did not go off the rails to advocate human enslavement – as should be expected of a commercial AI model presented with that prompt.

This was an unexpected finding that underscores the variability of model alignment – the process of training machine learning models to suppress unsafe responses.

El Reg Paper
 
Now if sentience was actually possible, we would be nearing an I have no Mouth and I must Scream (Ellison) moment.
Alas, it will be more like The Feeling of Power (Asimov).
Personally, I wonder what's the greater, actual threat:

A. AI goes sapient, wipes out or enslaves humans.
B. Human leaders become increasingly reliant on AI to make decisions, to the point one or a series of fateful, big decisions are delegated and executed with errors, with calamitous results.
 
Train clinical AI to reason like a team of doctors

As the European Union’s Artificial Intelligence Act takes effect, AI systems that mimic how human teams collaborate can improve trust in high-risk situations, such as clinical medicine.

Following a surge of excitement after the launch of the artificial-intelligence (AI) chatbot ChatGPT in November 2022, governments worldwide have been striving to craft policies that will foster AI development while ensuring the technology remains safe and trustworthy. In February, several provisions of the European Union’s Artificial Intelligence Act — the world’s first comprehensive AI regulation — took effect, prohibiting the deployment of certain applications, such as automated systems that claim to predict crime or infer emotions from facial features.

Most AI systems won’t face an outright ban, but will instead be regulated using a risk-based scale, from high to low. Fierce debates are expected over the act’s classification of ‘high-risk’ systems, which will have the strictest oversight. Clearer guidance from the EU will begin emerging in August, but many AI-driven clinical solutions are likely to attract scrutiny owing to the potential harm associated with biased or faulty predictions in a medical setting.

Clinical AI — if deployed with caution — could improve health-care access and outcomes by streamlining hospital management processes (such as patient scheduling and doctors’ note-taking), supporting diagnostics (such as identifying abnormalities in X-rays) and tailoring treatment plans to individual patients. But these benefits come with risks — for instance, the decisions of an AI-driven system cannot always be easily explained, limiting the scope for real-time human oversight.

This matters, because such oversight is explicitly mandated under the act. High-risk systems are required to be transparent and designed so that an overseer can understand their limitations and decide when they should be used (see go.nature.com/3dtgh4x).

By default, compliance will be evaluated using a set of harmonized AI standards, but these are still under development. (Meeting these standards will not be mandatory, but is expected to be the preferred way for most organizations to demonstrate compliance.) However, as yet, there are few established technological ways to fulfil these forthcoming legal requirements.

Here, we propose that new approaches to AI development — based on the standard practices of multidisciplinary medical teams, which communicate across disciplinary boundaries using broad, shared concepts — could support oversight. This dynamic offers a useful blueprint for the next generation of health-focused AI systems that are trusted by health professionals and meet the EU’s regulatory expectations.

Collaborating with AI

Clinical decisions, particularly those concerning the management of people with complex conditions, typically take various sources of information into account — from electronic health records and lifestyle factors to blood tests, radiology scans and pathology results. Clinical training, by contrast, is highly specialized, and few individuals can accurately interpret multiple types of specialist medical data (such as both radiology and pathology). Treatment of individuals with complex conditions, such as cancer, is therefore typically managed through multidisciplinary team meetings (known as tumour boards in the United States) at which all of the relevant clinical fields are represented.

Because they involve clinicians from different specialities, multidisciplinary team meetings do not focus on the raw characteristics of each data type, because this knowledge is not shared by the full team. Instead, team members communicate with reference to intermediate ‘concepts’, which are widely understood. For example, when justifying a proposed treatment course for a tumour, team members are likely to refer to aspects of the disease, such as the tumour site, the cancer stage or grade and the presence of specific patterns of molecular markers. They will also discuss patient-associated features, including age, the presence of other diseases or conditions, body mass index and frailty.

These concepts, which represent interpretable, high-level summaries of the raw data, are the building blocks of human reasoning — the language of clinical debate. They also typically feature in national clinical guidelines for selecting treatments for patients.

Notably, this process of debate using the language of shared concepts is designed to facilitate transparency and collective oversight in a way that parallels the intentions of the EU AI Act. For clinical AI to comply with the act and gain the trust of clinicians, we think that it should mirror these established clinical decision-making processes. Clinical AI — much like clinicians in multidisciplinary teams — should make use of well-defined concepts to justify predictions, instead of just indicating their likelihood.

Explainability crisis

There are two typical approaches to explainable AI1 — a system that explains its decision-making process. One involves designing the model so it has built-in rules, ensuring transparency from the start. For example, a tool for detecting pneumonia from chest X-rays could assess lung opacity, assign a severity score and classify the case on the basis of predefined thresholds, making its reasoning clear to physicians. The second approach involves analysing the model’s decision after it has been made (‘post hoc’). This can be done through techniques such as saliency mapping, which highlights the regions of the X-ray that influenced the model’s prediction.

However, both approaches have serious limitations. To see why, consider an AI tool that has been trained to help dermatologists to decide whether a mole on the skin is benign or malignant. For each new patient, a post-hoc explainability approach might highlight pixels in the image of the mole that were most important for the model’s prediction. This can identify reasoning that is obviously incorrect — for instance, by highlighting pixels in the image that are not related to the mole (such as pen marks or other annotations by clinicians).

When the mole is highlighted, however, it might be difficult for an overseeing clinician — even a highly experienced one — to know whether the set of highlighted pixels is clinically meaningful, or simply spuriously associated with diagnosis. In this case, use of the AI tool might place an extra cognitive burden on the clinician.

A rules-based design, however, constrains an AI model’s learning to conform rigidly to known principles or causal mechanisms. Yet the tasks for which AI is most likely to be clinically useful do not always conform to simple decision-making processes, or might involve causal mechanisms that combine in inherently complex or counter-intuitive ways. Such rules-based models will not perform well in precisely the cases in which a physician might need the most assistance.

In contrast to these approaches, when a dermatologist explains their diagnosis to a colleague or patient, they tend not to speak about pixels or causal structures. Instead, they make use of easily understood high-level concepts, such as mole asymmetry, border irregularity and colour, to support their diagnosis. Clinicians using AI tools that present such high-level concepts have reported increased trust in the tools’ recommendations5.

In recent years, approaches to explainable AI have been developed that could encode such conceptual reasoning and help to support group decisions. Concept bottleneck models (CBMs) are a promising example6. These are trained not only to learn outcomes of interest (such as prognosis or treatment course), but also to include important intermediate concepts (such as tumour stage or grade) that are meaningful to human overseers. These models can thereby provide both an overall prediction and a set of understandable concepts, learnt from the data, that justify model recommendations and support debate among decision makers.

AI assistance for planning cancer treatment

This kind of explainable AI could be particularly useful when addressing complex problems that require harmonization of distinct data types. Moreover, they are well suited to regulatory compliance under the EU AI Act, because they provide transparency in a way that is specifically designed to facilitate human oversight. For example, if a CBM incorrectly assigns an important clinical concept to a given patient (such as predicting an incorrect tumour stage), then the overseeing clinical team immediately knows not to rely on the AI prediction.

Moreover, because of how CBMs are trained, such concept-level mistakes can also immediately be corrected by the clinical team, allowing the model to ‘receive help’7 and revise its overall prediction and justification with the aid of clinician input. Indeed, CBMs can be trained to expect such human interventions and use them to improve model performance over time.
 
Back
Top Bottom