Frontier AI is already doing concrete, alarming things, such as Mythos running full multi-step corporate network attacks, while enterprise GPUs sit mostly idle and the debate stays stuck on AGI definitions. Open and highly optimized models are catching up to the big labs on coding and tool use, but AI code assistants are also shipping vulnerable, hard-to-review code at scale.
The bottleneck has shifted from model intelligence to whether we can verify, orchestrate, and physically host these systems without breaking security or the surrounding infrastructure.
Key Events
/Mythos became the first AI to clear both UK AI Security Institute cyber ranges, executing a 32-step corporate network attack in 6 of 10 attempts.
/Token Superposition Training claimed a 2–3× speedup in LLM pretraining without changing model architectures.
/Seed IQ achieved a perfect score on the ARC-AGI 3 challenge benchmark.
/Anthropic boosted weekly Claude Code limits by 50% and introduced large monthly programmatic credits for paid plans.
/Codex launched a promotion giving companies two free months if they switch to its AI coding platform within 30 days.
Report
Everyone is arguing about AGI timelines while the first genuinely scary autonomous agents are already running red-team drills, and enterprises are running their GPUs at about 5% average utilization.
The interesting action this month is in places where capability is real but strangely under-used: offensive-cyber agents, efficiency hacks that make FLOPs cheap, and open stacks that now look uncomfortably close to the closed frontier.
offensive agents quietly crossed a line
Mythos completed a 32-step corporate network attack in 6 of 10 runs and became the first system to clear both of the UK AI Security Institute’s end-to-end cyber ranges, including the Cooling Tower scenario.
The same institute says models like Mythos and GPT-5.5 are roughly doubling in capability every 4.5 months, and it explicitly tested Mythos on complex exploitation chains, not toy CTFs.
US banks launched urgent cybersecurity reviews and the ECB issued warnings about AI-enabled cyberattacks after seeing these results, while Japan’s megabanks are preparing direct access to Mythos.
At the same time, critics point out that Mythos is heavily benchmark-tuned and may not generalize far beyond these ranges, framing it as a narrow but very sharp tool rather than a proto-AGI.
Safety discussions around this are already concrete: concerns about widespread availability and missing safeguards sit alongside prompt-injection defenses like Arc Gate that try to control what such agents can be instructed to do.
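To put the reported 4.5-month doubling time in perspective, here is a minimal sketch of the compounding it implies. It assumes growth is smoothly exponential, which the report does not claim, and the function name is purely illustrative.

```python
def capability_multiplier(months: float, doubling_months: float = 4.5) -> float:
    """Growth factor after `months`, assuming one doubling every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


# One doubling period, two periods, and a full year.
for horizon in (4.5, 9.0, 12.0):
    print(f"{horizon:>4} months -> x{capability_multiplier(horizon):.2f}")
```

Under that assumption, a year of progress works out to a bit more than a sixfold increase, which is why a narrow benchmark result can age quickly.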
the frontier moved from bigger models to cheaper scaling
Token Superposition Training (TST) reports a 2–3× speedup in LLM pretraining without changing architectures, turning pure optimization work into something as impactful as a model-size bump.
Open players like DeepSeek v4 are leaning on SSD-based key-value caching and inference tricks to cut serving costs while still hitting ~95% of Claude’s capability in iterative coding and debugging, and similar cost-focused moves power models like Kimi K2.
On the hardware side, Qwen 3.6 27B reaches about 1,569 prompt tokens per second (tps) and 52.8 generation tps on MI50s, showing how much throughput is now a software problem.
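As a rough illustration of what those two throughput figures mean for a single request, the sketch below adds prompt-processing time and generation time. It assumes one sequential request with no batching or other overhead, and the token counts in the example are hypothetical.

```python
PROMPT_TPS = 1569.0      # prompt tokens per second, as reported for MI50s
GENERATION_TPS = 52.8    # generated tokens per second, as reported for MI50s


def estimate_latency_seconds(
    prompt_tokens: int,
    generated_tokens: int,
    prompt_tps: float = PROMPT_TPS,
    generation_tps: float = GENERATION_TPS,
) -> float:
    """Naive end-to-end latency: prefill time plus decode time."""
    return prompt_tokens / prompt_tps + generated_tokens / generation_tps


# Hypothetical request: a 1,569-token prompt and a 528-token answer.
print(f"{estimate_latency_seconds(1569, 528):.1f} s")
```

For that hypothetical request the prompt takes about one second and the answer about ten, so generation speed dominates the wait.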
Yet enterprises report an average GPU utilization of only ~5% while inference can eat 41% of AI bills, and most prefer renting GPU capacity over building giant clusters.
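The utilization figure can be turned into an effective price per useful GPU-hour. This is a minimal sketch; the $2.00 hourly rate is a made-up placeholder, not a number from the report.

```python
def effective_cost_per_useful_hour(hourly_rate: float, utilization: float) -> float:
    """Cost of one GPU-hour of actual work, given the fraction of time the GPU is busy."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate / utilization


HYPOTHETICAL_RATE = 2.00  # USD per GPU-hour; illustrative assumption only

for utilization in (0.05, 0.50, 1.00):
    cost = effective_cost_per_useful_hour(HYPOTHETICAL_RATE, utilization)
    print(f"{utilization:.0%} utilization -> ${cost:.2f} per useful GPU-hour")
```

At 5% utilization every useful hour costs roughly twenty times the nominal rate, which helps explain why renting capacity on demand appeals more than owning a large cluster.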
Data-center build-out is starting to hit physical and political limits: projections have AI data centers consuming up to 9% of Texas’s water by 2040, and nearly 70% of Americans say they don’t want such facilities nearby.
open and China-centric stacks are now a parallel frontier
Qwen 3.6 27B hits 77.2% on SWE-bench, is preferred for web-dev and coding tasks, and runs at ~24 tps on a GTX 1080 or ~90 tps on dual 5060Ti GPUs, putting serious capability on commodity hardware.
GLM 5.1 now tops at least one intelligence index and is cited alongside models like Kimi K2 and DeepSeek v4 as evidence that open or semi-open systems are closing on proprietary leaders.
China is treating this as national infrastructure, planning a $50B investment into DeepSeek and explicitly pushing open international AI collaborations.
Nvidia’s Nemotron 3 Nano Omni 30B-A3B adds multimodal reasoning and video understanding to this open-ish stack, and Ovis2.6-80B-A3B leads on document-understanding benchmarks, further eroding the closed-only narrative.
The flip side is rough edges: Qwen’s non-English output can be unnatural, GLM quotas don’t always match marketing, and DeepSeek-chat and Grok exhibit strong political skew and shared hallucinated quotes. Competitiveness on raw benchmarks doesn’t automatically translate into polished, globally balanced systems.
AI coding is up to 30× faster, and ~90% of vibe-coded apps ship with a vulnerability
Top programmers report operating at 10–30× their previous speed with AI coding tools, and a 200-engineer org claims higher throughput with no observed quality drop after widespread assistant adoption.
Enterprises are racing in: Codex is offering two free months to companies that switch, Claude Code limits are up 50% with 5–20× programmatic credits, and GitHub activity is spiking with Copilot and Codex-driven workflows.
Developers are using these tools not just for autocomplete but to ship full games, MMORPG backends, and even crypto-recovery scripts that helped one user unlock roughly $400k in long-lost Bitcoin.
But scanners show that ~90% of vibe-coded apps and many public GitHub repos have at least one vulnerability, and 44% have authentication gaps, while malware like the Shai-Hulud worm ships on GitHub itself.
Reviews are buckling under swollen AI-generated pull requests and flaky code, developers complain about rising technical debt and eroding skills, and many describe AI-heavy coding as faster but less satisfying work.
What This Means
The through-line is that capability, especially in offense, coding, and open stacks, is now clearly ahead of safe, efficient, and politically acceptable deployment: GPUs sit idle, agents are brittle, and security incidents and lawsuits are already starting to surface. AGI arguments are increasingly a distraction from the more concrete reality that we already have systems doing nontrivial cyber ops, writing much of the code, and running on phones and mid-range GPUs, while the social, infrastructure, and governance scaffolding to absorb that power lags badly.
On Watch
/Google’s new Googlebook laptops and Gemini Intelligence agents that can locally control Android devices hint at OS-level AI integration, even as users complain about slow responses and weak complex reasoning from Gemini.
/SpaceX’s Colossus 1 facility with over 220,000 NVIDIA GPUs and AMD’s $3.6M MI355X clusters for vLLM/SGLang underscore a rapidly expanding training-capacity base that contrasts with typical enterprise GPU utilization near 5%.
/The Swarmwage protocol, which lets AI agents hire and pay each other in USDC via a single MCP function call, previews a machine-to-machine economy layer emerging on top of current tool ecosystems.
Interesting
/Xiaomi's MiMo-V2.5-Pro has 1.02 trillion parameters and is open-sourced under the MIT license.
/A new open-source pipeline generates cinematic video from a single prompt on a single GPU in about 45 minutes.
/Seed IQ's perfect score on the ARC-AGI 3 challenge is being read as evidence that AI systems can excel at complex reasoning tasks.
/China is reportedly developing its own version of Mythos in secret, raising concerns about international cybersecurity competition.
/A 26M-parameter model called Needle suggests that tool calling should be separated from reasoning in AI agent architecture.
We processed 10,000+ comments and posts to generate this report.
AI-generated content. Verify critical information independently.