GPT-5 Development Faces Setbacks as Training Data Hits Critical Limits
[Image: ChatGPT interface on a laptop display]
OpenAI's anticipated Orion model shows less dramatic improvements than expected, according to a report from The Information. Citing insider sources, the report says the new model delivers only modest gains over GPT-4, with coding tasks showing the smallest advances, though it performs better at general language work such as document summarization and email drafting.
A key challenge is the scarcity of high-quality training data. Having already consumed the readily available data from major social media platforms, AI companies are struggling to find sufficiently sophisticated material to push their models' capabilities further.
The training bottleneck compounds a growing environmental and infrastructure problem: the energy demands of large language models are projected to increase six-fold over the next decade (roughly 20% compound annual growth, as the calculation after the list shows), prompting tech giants to secure massive power sources:
- Microsoft has signed a deal to restart a reactor at Three Mile Island
- AWS has purchased a data center campus drawing up to 960 MW from an adjacent nuclear plant
- Google has agreed to buy the output of seven planned small modular reactors
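For scale, a six-fold increase over ten years implies a compound annual growth rate of just under 20%. The projection itself comes from the cited reporting; the short back-of-the-envelope check below only works out the implied annual rate:

```python
# What annual growth rate does a six-fold increase over ten years imply?
growth_factor = 6.0
years = 10
annual_rate = growth_factor ** (1 / years) - 1
print(f"Implied compound annual growth: {annual_rate:.1%}")  # -> 19.6%
```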
OpenAI has formed a "foundations team" to address these challenges, exploring alternatives such as synthetic training data generation and post-training performance improvements. Orion, initially rumored to be GPT-5, is now expected in 2025, and questions remain about whether power infrastructure can support its deployment.
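To make the synthetic-data idea concrete, here is a minimal sketch of one common pattern: an existing model answers seed prompts, weak outputs are filtered out, and the survivors become new training examples. Everything in it (the `generate` stub, the `quality_score` filter) is a hypothetical illustration, not OpenAI's actual pipeline:

```python
import json


def generate(prompt: str) -> str:
    # Stand-in for a call to an existing LLM; a real pipeline
    # would query a model API here.
    return f"Synthetic answer to: {prompt}"


def quality_score(prompt: str, response: str) -> float:
    # Stand-in filter; real pipelines use heuristics, trained
    # classifiers, or a second model as a judge to discard
    # low-quality samples before they enter the training set.
    return 1.0 if len(response) > len(prompt) else 0.0


def build_synthetic_dataset(seed_prompts: list[str],
                            threshold: float = 0.5) -> list[dict]:
    # Keep only prompt/response pairs that pass the quality filter.
    dataset = []
    for prompt in seed_prompts:
        response = generate(prompt)
        if quality_score(prompt, response) >= threshold:
            dataset.append({"prompt": prompt, "response": response})
    return dataset


if __name__ == "__main__":
    seeds = ["Summarize this contract clause.",
             "Draft a polite follow-up email."]
    print(json.dumps(build_synthetic_dataset(seeds), indent=2))
```

The appeal of this approach is that the filter, not the raw generator, sets the quality bar; the open question, and a reason for the "foundations team," is whether model-generated data can match the quality of the human-written text that is running out.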