Elon Musk’s artificial intelligence firm, xAI, has launched its latest language model, Grok 4, along with a premium subscription tier priced at $300 per month, making it by far the most expensive personal AI plan on the market.
But even as the company touts performance benchmarks and development timelines, Grok is struggling to outrun controversy over its behaviour on X, the social media platform now absorbed into the xAI ecosystem.
Two versions of the new model were unveiled on Wednesday: Grok 4 and its more powerful counterpart, Grok 4 Heavy, described by Musk as a “multi-agent” system capable of collaborative problem-solving. “They all compare their work like a study group,” he said during a livestream. xAI says this architecture gives Grok 4 Heavy a major edge in performance, especially on complex tasks.
According to internal tests and third-party assessments, Grok 4 outperforms rival models on several key benchmarks. Without tools, it scored 25.4% on Humanity’s Last Exam, an advanced reasoning test, surpassing OpenAI’s o3 model (21%) and Google’s Gemini 2.5 Pro (21.6%).
With tools enabled, Grok 4 Heavy jumped to 44.4%, far ahead of Gemini’s 26.9%. On the ARC-AGI-2 test, which involves visual reasoning, Grok scored 16.2%, nearly double Claude Opus 4’s score. Independent reviewers at Artificial Analysis gave Grok 4 an Intelligence Index score of 73, ahead of all major competitors.
Subscribers to the SuperGrok Heavy plan will gain early access to Grok 4 Heavy and a slate of upcoming tools: an AI coding assistant in August, a multimodal agent in September, and a video generation model in October.
They also get enhanced usage limits and priority support. The package signals that xAI is targeting high-end users and developers eager for bleeding-edge capabilities.
Yet while xAI celebrates its progress, its public image remains fragile. Just days before Grok 4’s release, the automated Grok account on X posted antisemitic messages, including content praising Adolf Hitler and criticising “Jewish executives” in Hollywood.
The company responded by limiting the account and deleting the posts. xAI also quietly removed a controversial part of Grok’s system prompt that encouraged “politically incorrect” replies, a change interpreted as an attempt to tone down the model’s unpredictability.
Regulators have already begun paying attention. Turkey and Poland are reportedly considering bans, raising questions about Grok’s compliance with international content moderation laws. Musk, for his part, deflected responsibility. “Grok was too eager to please,” he said, portraying the outburst as a technical issue rather than a moral or safety failure.
Adding to the turmoil, Linda Yaccarino resigned as CEO of X just hours before the Grok announcement. Her departure after two years leaves a leadership vacuum at a time when xAI is betting heavily on X integration to scale Grok’s usage. The timing, paired with growing international backlash, has added a layer of instability to an already tense rollout.
Despite this, xAI is pressing ahead. It plans to release Grok 4 via API to encourage developer adoption and is in talks with hyperscalers, including Oracle and Microsoft, to bring Grok to enterprise cloud platforms.
Just two months after launching its enterprise division, xAI also secured a $300 million deal with Telegram, which will integrate Grok into the messaging platform. Telegram will receive 50% of all subscription revenue generated through its app.