DeepSeek: how a small Chinese AI start-up shocked Silicon Valley

A small Chinese artificial intelligence lab stunned the world this week by revealing the technical recipe behind its cutting-edge model, turning its reclusive leader into a national hero in the face of US efforts to curb China’s high-tech ambitions.

DeepSeek, founded by hedge fund manager Liang Wenfeng, released its R1 model on Monday, explaining in a detailed paper how to build, on a bootstrap budget, a large language model that can learn and improve itself automatically without human supervision.

US companies including OpenAI and Google DeepMind pioneered reasoning models, a relatively new field of AI research that attempts to bring models closer to human cognitive capabilities. In December, San Francisco-based OpenAI released the full version of its o1 model but kept its methods secret.

The release of DeepSeek’s R1 has sparked a debate in Silicon Valley over whether better-resourced US AI companies, including Meta and Anthropic, can defend their technical edge.

Meanwhile, Liang has become a focal point of national pride at home. This week, he was the only AI leader selected to attend a publicized meeting of entrepreneurs with the country’s second-most powerful leader, Premier Li Qiang. The entrepreneurs were told to “focus efforts on cutting-edge technologies.”

In 2021, when Liang started buying thousands of Nvidia graphics processing units for an AI side project while running his quant trading fund High-Flyer, industry insiders saw it as the whim of a billionaire looking for a new hobby.

“When we first met him, he was this very bold guy who was talking about building a 10,000-chip cluster to train his own models. We didn’t take it seriously,” said one of Liang’s business partners.

“He couldn’t articulate his vision other than saying: ‘I want to build this, and it’s going to be a game changer.’ We thought this was only possible for giants like ByteDance and Alibaba,” the person added.

Liang’s status as an outsider to the AI field proved an unexpected source of strength. At High-Flyer, he built a fortune by using AI and algorithms to identify patterns that could affect stock prices. His team became adept at using Nvidia chips to trade stocks. In 2023, he launched DeepSeek, announcing his ambition to develop human-level AI.

“Liang built a specialized infrastructure team that really understands how chips work,” said the founder of a rival LLM company. “He took his best people from the hedge fund with him to DeepSeek.”

After Washington banned Nvidia from exporting its most powerful chips to China, domestic AI companies were forced to find new ways to maximize the computing power of the limited chips available to them, a problem Liang’s team already knew how to solve.

“DeepSeek’s engineers know how to unlock the potential of these GPUs, even if they’re not state of the art,” said one AI researcher close to the company.

Industry insiders say DeepSeek’s single-minded focus on research makes it a dangerous competitor, because it is willing to share its breakthroughs rather than protect them for commercial gain. DeepSeek has not raised money from outside investors or made significant moves to monetize its models.

“DeepSeek is run like DeepMind in its early days,” said one Beijing investor. “It is purely focused on research and engineering.”

Liang, who is personally involved in DeepSeek’s research, uses proceeds from his hedge fund business to pay top AI talent high salaries. Along with TikTok owner ByteDance, DeepSeek is known for offering the highest pay to AI engineers in China, with staff based in offices in Hangzhou and Beijing.

“DeepSeek’s offices feel like a university campus for serious researchers,” said the business partner. The team believes in Liang’s vision: to show the world that Chinese people can be creative and build something from scratch.

DeepSeek and High-Flyer did not respond to a request for comment.

Liang has made DeepSeek a uniquely “local” company, staffed by PhDs from China’s top universities, including Peking, Tsinghua and Beihang, rather than experts from US institutions.

In an interview with the domestic press last year, Liang said his core team “did not have people who had returned from overseas. They are all local . . . We have to develop top talent ourselves.” DeepSeek has been hailed at home as a purely Chinese LLM company.

DeepSeek says it used just 2,048 Nvidia H800 chips and $5.6mn to train its V3 model, which has 671 billion parameters, a fraction of what OpenAI and Google spent training models of comparable size.

Ritwik Gupta, an AI policy researcher at the University of California, Berkeley, said DeepSeek’s recent model releases show that “there is no moat when it comes to AI capabilities.”

“The first person to train models has to spend a lot of resources to get there,” he said. “But the second mover can get there cheaper and faster.”

Gupta added that China had a larger pool of systems engineers than the US who understood how to make the best use of computing resources to train and run models more cheaply.

Although DeepSeek has shown impressive results with limited resources, whether it can remain competitive as the industry evolves is an open question, say industry insiders.

Its hedge fund, High-Flyer, is coming off a lackluster 2024, which one person close to Liang blamed on the founder’s attention being mostly on DeepSeek.

Its US rivals are not standing still. They are building mega “clusters” of Nvidia’s next-generation Blackwell chips, amassing computing power that threatens to reopen a performance gap with their Chinese rivals.

This week, OpenAI said it plans to spend at least $100 billion on AI infrastructure in the US through a joint venture with Japan’s SoftBank called Stargate. Elon Musk’s xAI is massively expanding its Colossus supercomputer to house more than 1 million GPUs to help train its Grok AI models.

“DeepSeek has one of the most advanced computing clusters in China,” said Liang’s business partner. “They have enough capacity for now, but not much more.”

Additional reporting by Wenjie Ding in Beijing