Sunday, 14 September 2025

Elon Musk acknowledges we’ve run out of real-world AI training data

Elon Musk warns that AI training data is running out and suggests synthetic data as the solution, while experts weigh its benefits and risks.

Elon Musk has agreed with leading AI experts that the world is running out of real-world data to train artificial intelligence models. During a live-streamed conversation with Stagwell chairman Mark Penn on X (formerly Twitter) on Wednesday night, Musk remarked, “We’ve now exhausted the cumulative sum of human knowledge … in AI training. That happened last year.”

The data drought and its implications

Musk, who owns the AI company xAI, is not the first to raise concerns about the shortage of AI training data. Ilya Sutskever, a former chief scientist at OpenAI, highlighted this issue during a December speech at the NeurIPS machine learning conference. He termed the situation “peak data” and predicted that this shortage would fundamentally change how AI models are developed.

As the pool of real-world data dwindles, researchers are turning to synthetic data—data created by AI models themselves—to fill the gap. Musk echoed this sentiment, explaining, “The only way to supplement [real-world data] is with synthetic data, where the AI creates [training data]. With synthetic data … [AI] will sort of grade itself and go through this process of self-learning.”

Tech giants embrace synthetic data

Several major tech companies, including Microsoft, Meta, OpenAI, and Anthropic, are already using synthetic data to train their flagship AI models. Gartner had estimated that by 2024, 60% of the data used in AI and analytics projects would be synthetically generated.

For instance, Microsoft’s recently open-sourced Phi-4 model was trained on a mix of synthetic and real-world data. Similarly, Google’s Gemma models and Meta’s latest Llama series have also benefited from synthetic data in their development. Anthropic used synthetic data in creating its Claude 3.5 Sonnet system. Meanwhile, AI startup Writer developed its Palmyra X 004 model almost entirely with synthetic data, at a fraction of the usual cost.

Synthetic data has clear cost advantages. Writer says its Palmyra X 004 model cost just US$700,000 to develop, compared with an estimated US$4.6 million for an OpenAI model of similar size.

The challenges of synthetic data

Despite these advantages, synthetic data has drawbacks. Research has shown that relying on synthetic data can lead to “model collapse,” in which AI systems become less creative and more biased over time. This happens because the models that generate synthetic data tend to amplify the biases and limitations of their own training datasets, and each new generation trained on that output inherits and compounds those flaws, which can seriously degrade AI systems in the long run.

Musk’s comments reflect a growing consensus in the AI industry: synthetic data is the way forward, but it must be used cautiously. The challenge now lies in finding ways to create artificial data that mitigate biases and maintain the creativity and reliability of AI systems.
