Deepseek - China’s New AI Breakthrough

Raze · January 28

On 1/28/2025 at 1:57 AM, Leo Gura said:

Unless it's efficient enough to not need those crazy server farms.

If you think about it, a human brain does not need a nuclear power plant's worth of energy to do AGI. So this crazy hardware scaling may be the wrong approach.

That’s primarily why this release is crashing American AI stocks: it is supposed to be way cheaper and more efficient.

Jayson G · January 28

On 1/28/2025 at 1:51 AM, Leo Gura said:

But those can probably be easily cloned.

What prevents anyone from just cloning any AI? That's the issue.

@Leo Gura Sure, you can clone an AI relatively easily, but in my opinion the real value is, again, the flow of data, which is harder for people to copy because it is a multi-faceted, creative system of software.

For example, not a single agent, but Agent 1 -> Agent 2 -> Agent 3, etc.: a system of agents.

Also, an agent isn't just an LLM, which can be easily copied. An agent has custom tools; it actually does stuff outside of the LLM. I don't think these are that easy to copy. Maybe they are, but I'm not so sure. For example, I use an agent that builds me apps (front-end, APIs and backend) and connects it all together. There isn't any other software that does that. That's because they designed the agent not just to write code but to edit a database, so it actually makes changes on a platform. That platform (Supabase) first had to give that agent permission to make those changes. Most agents wouldn't be granted permission to take actions on it. So, for example, if I have an Amazon shopping agent and Amazon allows only this agent, and not other agents, to take permission-only actions, then what value do the other agents have anyway?

Edited January 28 by Jayson G
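The point about agents being more than the model can be sketched in a few lines. This is a minimal Python illustration, not how Supabase or any real agent works: every name here (Platform, Agent, the "builder" and "copycat" agents) is hypothetical, the LLM is replaced by a hard-coded plan, and the agent ID stands in for whatever credential a real platform would check. It shows that copying an agent's code does not help a clone if the platform only grants write access to the original.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Platform:
    """Hypothetical outside service (e.g. a database or a store) that gates write access."""
    name: str
    allowed_agents: set[str] = field(default_factory=set)  # IDs the platform has granted permission to
    records: list[str] = field(default_factory=list)

    def write(self, agent_id: str, record: str) -> str:
        # agent_id stands in for whatever credential a real platform would check.
        if agent_id not in self.allowed_agents:
            raise PermissionError(f"{agent_id} is not allowed to modify {self.name}")
        self.records.append(record)
        return f"{self.name}: stored {record!r}"


@dataclass
class Agent:
    """An agent = something that decides what to do + tools that act outside the model."""
    agent_id: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, plan: list[tuple[str, str]]) -> list[str]:
        # In a real agent an LLM would produce this plan; here it is hard-coded.
        return [self.tools[tool_name](argument) for tool_name, argument in plan]


db = Platform("database", allowed_agents={"builder"})
builder = Agent("builder", tools={"save": lambda rec: db.write("builder", rec)})
copycat = Agent("copycat", tools={"save": lambda rec: db.write("copycat", rec)})

print(builder.run([("save", "users table")]))  # ["database: stored 'users table'"]
try:
    copycat.run([("save", "users table")])
except PermissionError as err:
    print(err)  # copycat is not allowed to modify database
```

The copycat has identical code and tools, but the platform's allowlist is what decides whether its actions go through.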

**Leo Gura** · January 28

Staples · January 28

On 1/28/2025 at 1:57 AM, Leo Gura said:

Locally run AI might be the better way.

That's what George Hotz is up to right now.

https://tinygrad.org/

He's quite an interesting guy to research when it comes to tech.

Jayson G · January 28

On 1/28/2025 at 1:59 AM, Raze said:

That’s primarily why this release is crashing American AI stocks: it is supposed to be way cheaper and more efficient.

@Raze People are saying that, but is it really true? 1 AI model is crashing American AI stocks? I still use GPT-4o, and I plan to switch to DeepSeek, but what's so special about it if it's just a cheaper chatbot? The markets are highly dynamic, serving all kinds of unique products and services; it wouldn't make sense for this one AI model to have such an impact on so many of these stocks.

PenguinPablo · January 28

On 1/28/2025 at 2:28 AM, Jayson G said:

@Raze People are saying that, but is it really true? 1 AI model is crashing American AI stocks? I still use GPT-4o, and I plan to switch to DeepSeek, but what's so special about it if it's just a cheaper chatbot? The markets are highly dynamic, serving all kinds of unique products and services; it wouldn't make sense for this one AI model to have such an impact on so many of these stocks.

You're probably right. It's probably a combination of factors.

Chinese AI vs. American AI is one of the catalysts.

It does make a big difference, because trillions of dollars are being invested into American AI, and then China slides in and says we can do it better for less than 1% of the price. Obviously, that will cast doubt on investing in the US AI sector, particularly when a lot of it is based on narratives, hype, and speculation.

They're betting on the future, but what if that bet is completely off the mark?

That's what is at stake here.

On 1/28/2025 at 2:21 AM, Staples said:

That's what George Hotz is up to right now.

https://tinygrad.org/

He's quite an interesting guy to research when it comes to tech.

I've been seeing this guy everywhere. I only just found out about his self-driving car project (Comma.ai).

Really interesting guy for sure. But he is morally ambiguous.

Edited January 28 by PenguinPablo

**Leo Gura** · January 28

On 1/28/2025 at 2:28 AM, Jayson G said:

1 AI model is crashing American AI stocks?

AI and the entire tech market is so over-hyped right now that any little thing could spark an implosion. It's like a giant tinder bundle just waiting for a stray spark. The spark could be anything and it doesn't have to be something profound because there's so much hot air. The spark could be Trump, China, Russia, inflation, tariffs, a bad jobs report, a bad earnings report, a crypto rug pull, a spike in oil prices, Iran, a terrorist attack, whatever. Chances are it will be something out of the blue and then people will sprint towards the exits in a stampede.

Edited January 28 by Leo Gura

Staples · January 28

On 1/28/2025 at 2:33 AM, PenguinPablo said:

But he is morally ambiguous.

How so? I've only seen a few podcasts with him. He seems like the typical brilliant, neurodivergent type.

kray · January 28

Just in case anyone is curious: the LLMs behind these AI systems are made up of many billions of parameters (terabytes' worth of storage at the largest sizes), which are trained on data. So technically, the big data used to train these models is actually data from humans. Training models on internet data requires constant, up-to-date scraping of sources across the World Wide Web, which is why these models are very limited when it comes to real-time data. I'm making this point because, while DeepSeek has raised the standard for efficient and reliable generative AI, it still works with the Chinese government, and the data that it trains its models on is deeply biased. Just food for thought 🤷🏽‍♂️

**Leo Gura** · January 28

On 1/28/2025 at 3:16 AM, kray said:

and the data that it trains its models on is deeply biased.

Important to keep in mind.

MarkKol · January 28

What exactly is this DeepSeek AI a threat to mostly? American business? Get out of here, I couldn't care less. As a matter of fact, I'm happy for them.

Edited January 28 by MarkKol

Joshe · January 28

On 1/28/2025 at 1:57 AM, Leo Gura said:

Locally run AI might be the better way.

You can't really run their web-based model locally. Their online chatbot is presumably using their 671-billion-parameter model, which is far outside the realm of feasibility to run locally. Even if you wanted to set it up on HuggingFace, it'd probably cost $20k/mo to run. The infrastructure needed to run the full model locally would likely cost $100k+ in hardware.

Running their 7-billion parameter model is feasible, but 7 billion is a lot less than 671 billion...so I think the quality drop-off would send you back to ChatGPT and Claude real quick.

There's no way to get around how much computing power is needed to run these models in their full glory. We'll have to wait on a major breakthrough for that.

Edited January 28 by Joshe
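To put rough numbers on the hardware question: the memory needed just to hold a model's weights is roughly the parameter count times the bytes stored per parameter. A small Python sketch, assuming 2 bytes per parameter for FP16, 1 for FP8 and 0.5 for 4-bit quantization, and ignoring the extra memory used while the model is actually running (so real requirements are higher):

```python
# Assumed storage cost per parameter for a few common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes).

    Ignores the KV cache, activations and runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("671B model", 671e9), ("7B model", 7e9)]:
    for dtype in ("fp16", "fp8", "int4"):
        print(f"{name} @ {dtype}: ~{weight_memory_gb(params, dtype):,.1f} GB")
```

By this estimate the 671B model needs about 1.3 TB at FP16 and still roughly 335 GB at 4-bit, while the 7B model needs about 14 GB at FP16 or 3.5 GB at 4-bit, which is why only the small models are practical on ordinary consumer hardware.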

Bobby_2021 · January 28

On 1/28/2025 at 4:20 AM, MarkKol said:

What exactly is this DeepSeek AI a threat to mostly?

The West finds it a threat when Chinese nerds uphold Western values instead of milking them for profit.

zazen · January 28

Lyubov · January 28

On 1/28/2025 at 6:33 AM, zazen said:

Good riddance to Intuit. I’m glad people are starting to wake up to the fact that a lot of our society's monetary value and flow of money is deeply rooted in purposeful inefficiencies and exploitation.

zazen · January 28

On 1/28/2025 at 6:42 AM, Lyubov said:

Good riddance to Intuit. I’m glad people are starting to wake up to the fact that a lot of our society's monetary value and flow of money is deeply rooted in purposeful inefficiencies and exploitation.

Same. An interesting and related tweet from Arnaud:

“To me, the most fascinating aspect of Deepseek is the fact it stemmed from a hedge fund, a mere few months after China "cracked down" on the levels of compensation in the finance industry.

It's also incidentally an important reason why the U.S. will struggle to compete with China.

Let me explain.

First of all, it's worth mentioning that, predictably, as with most Chinese initiatives, this was presented by Western media as a terrible move: "why would China do this to the poor innocent bankers?" As usual, they didn't even try to reflect on why China would do this; as we all know, all Chinese initiatives are always completely mindless, and "crackdowns" are just what the Communist Party does for fun...

The actual reason this was done, I believe, is that China looked at the West - the U.S. in particular - and saw the overbearing importance of the finance industry at the expense of the real economy. And in particular they saw that the country's most brilliant graduates from the very best Ivy League schools went to work for the increasingly parasitic finance industry instead of working on stuff that actually made society move forward.

Bloomberg lamented below that the "crackdown" would "fuel an industry brain drain", and yes, that was precisely the point: China doesn't want those who can most contribute to society to spend their careers building ever more senseless financial derivative products or new ways to trade crypto. It doesn't mean they don't want a finance industry; it does serve a purpose, just not one that should become such a drain on society, in particular by capturing the country's best talent. China would rather have them working on stuff like... artificial intelligence.

And lo and behold, fast forward a few months, and you suddenly have hedge fund geniuses who found a new calling in AI. Too good a coincidence not to see a correlation there.

This is something that would arguably be very hard for the U.S. to do, where capital is very much in control: an industry that becomes extremely wealthy, even if largely detrimental to broader societal goals, becomes difficult to reform. We're seeing this with finance, defense, big pharma, etc.

It also illustrates that the U.S. and China are at different stages of their development: excessive financialization is a common pattern among late-stage great powers - from the Dutch Republic to the British Empire (but also Venice or Spain) - and a vicious-circle type factor of their decline. Emerging great powers are often more thoughtful and nimble about managing talent flows to achieve technological and industrial primacy.

Looking at this question is also very interesting in the context of the H-1B visa debate in the U.S. It feels like the debate doesn't address the elephant in the room: why claim a shortage of top talent when the country's best minds are funneled to the finance industry? Much more coherent to first thoughtfully allocate talent at home before seeking to brain drain the rest of the world...

Anyhow, yet another example of a Chinese policy that seems bizarre and incomprehensible to the West at first glance but which over the long run (and even short-run as illustrated by Deepseek) helps China develop another strategic advantage in the tech competition. Simply put: you want your best minds building real value, not extracting it from society.”

Bobby_2021 · January 28

On 1/28/2025 at 7:54 AM, zazen said:

The actual reason this was done, I believe, is that China looked at the West - the U.S. in particular - and saw the overbearing importance of the finance industry at the expense of the real economy. And in particular they saw that the country's most brilliant graduates from the very best Ivy League schools went to work for the increasingly parasitic finance industry instead of working on stuff that actually made society move forward.

This.

Edited January 28 by Bobby_2021

**Davino** · January 28

On 1/28/2025 at 1:03 AM, Eskilon said:

For real, I've been noticing this too in some videos. In the main cities and such, their infrastructure is decades ahead of the USA and the West.

Yes, it's something that all tourists who visit China come back surprised by. China is building the cities of the future.

**Davino** · January 28

On 1/28/2025 at 3:16 AM, kray said:

Just in case anyone is curious: the LLMs behind these AI systems are made up of many billions of parameters (terabytes' worth of storage at the largest sizes), which are trained on data. So technically, the big data used to train these models is actually data from humans. Training models on internet data requires constant, up-to-date scraping of sources across the World Wide Web, which is why these models are very limited when it comes to real-time data. I'm making this point because, while DeepSeek has raised the standard for efficient and reliable generative AI, it still works with the Chinese government, and the data that it trains its models on is deeply biased. Just food for thought 🤷🏽‍♂️

DeepSeek's revolution isn't about the biased data or its current AI; it's a breakthrough in methodology and approach, basically in these three points:

Pure reinforcement learning (learning to reason from scratch against a challenge, without human intervention)
Chain-of-thought reasoning (breaking down problems in logical and contextual steps instead of directly outputting an answer)
Optimized for cost-effectiveness (limited hardware but high performance)
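The "pure reinforcement learning" point can be made a bit more concrete. DeepSeek's R1 paper describes training R1-Zero with rule-based rewards (one for a correct answer, one for putting the reasoning inside think tags) combined with GRPO (Group Relative Policy Optimization), where several answers are sampled for the same question and each is scored relative to the rest of its group. The Python sketch below shows only that scoring step. The reward rule, the example question and the function names are made up for illustration; it is not DeepSeek's code, it uses the population standard deviation as an assumption, and the actual model update is left out.

```python
import statistics


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical reward rule: +1 for the <think>...</think> format, +1 for a correct final answer."""
    reward = 0.0
    if completion.startswith("<think>") and "</think>" in completion:
        reward += 1.0  # format reward
        final_answer = completion.split("</think>", 1)[1].strip()
    else:
        final_answer = completion.strip()
    if final_answer == reference_answer:
        reward += 1.0  # accuracy reward
    return reward


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each sampled answer against its own group: (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]  # all answers equally good: no learning signal
    return [(r - mean) / std for r in rewards]


# Several sampled answers to "What is 2 + 2?"; in real training these come from the model itself.
samples = ["<think>2 + 2 = 4</think> 4", "<think>guessing</think> 5", "4", "no idea"]
rewards = [rule_based_reward(s, "4") for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)  # [2.0, 1.0, 1.0, 0.0]
print(advantages)
```

Answers that beat their group's average get a positive advantage and get reinforced; no human has to label which reasoning steps were good, only whether the final answer and format check out.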

DeepSeek R1 is the first AI to put these principles together. Whatever biased data you feed into DeepSeek is not the main point, because it is pioneering a new branch of artificial intelligence development. That's the key insight to take from this thread.

**Davino** · January 28

On 1/28/2025 at 5:00 AM, Joshe said:

You can't really run their web-based model locally. Their online chatbot is presumably using their 671-billion-parameter model, which is far outside the realm of feasibility to run locally. Even if you wanted to set it up on HuggingFace, it'd probably cost $20k/mo to run. The infrastructure needed to run the full model locally would likely cost $100k+ in hardware.

Running their 7-billion parameter model is feasible, but 7 billion is a lot less than 671 billion...so I think the quality drop-off would send you back to ChatGPT and Claude real quick.

There's no way to get around how much computing power is needed to run these models in their full glory. We'll have to wait on a major breakthrough for that.

It's actually the other way around. ChatGPT costs about $100k to run on GPUs alone, while right now the full DeepSeek R1 can be run for just $5k. That's literally twenty times cheaper. These Chinese dudes...

Btw, reducing the parameter count doesn't degrade performance linearly. A well-optimized AI can work fine with fewer parameters, as LLaMA 7B has proved beyond any doubt.

Moreover, you underestimate the direction this whole movement is taking. Always remember this is the costliest it will ever be. Now we're increasing efficiency along the three principles I mentioned in my post above.


118 posts in this topic
