Agents stopped being toy copilots this week and started looking like teammates: they’re in Slack, in CI, and in sandboxes running real code against real repos.
At the same time, local and open stacks are now fast enough on commodity hardware to run serious agents. Meanwhile, AppSec data, AI-powered exploitation in the wild, and models like Mythos show that offensive AI is already fully agentic, even as defender workflows and quality metrics lag behind.
Key Events
/Mythos became the first AI model to complete both UK AI Security Institute corporate cyber ranges, autonomously executing a 32‑step network attack.
/A sandboxed GitHub agent and replay layer for testing repositories emerged as a new pattern for safe code‑executing agents.
/A GitHub Action now lets LangChain agents share operational memory across runs directly in CI.
/Claude Code temporarily raised weekly limits by 50% for paid plans and added monthly credits for Agent SDK and GitHub Actions usage.
/The Docker AI Stack launched, deploying eight self‑hosted AI services with a single Docker Compose command.
Report
Most of the real movement this period is around agents that touch repos and infrastructure, not just chatbots. For experienced engineers already shipping RAG and agents, the open question is how teams are actually making these systems safe, observable, and cheap enough to run at scale.
CI- and chat-native coding agents, not just IDE copilots
Teams are pushing coding agents directly into Slack and CI instead of keeping them locked in the IDE. Jacq, for example, plugs into Slack and GitHub as a coding-agent teammate rather than an autocomplete tool.
Claude Code is getting first-class CI treatment: GitHub Actions support, a dedicated monthly credit, and a temporary 50% bump in weekly limits for paid plans. Together these explicitly target programmatic agent usage rather than one-off chats.
Meanwhile, a sandboxed repo-testing agent with a replay layer for its runs shows a concrete pattern. Agents execute code against real repos in an isolated environment, with full traceability, before anything touches main.
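Here is a minimal sketch of the record/replay idea. It assumes a hypothetical `ReplayLog` class, not the API of the project mentioned above. During a sandboxed run, the class appends each tool call and its result to a JSON Lines file. On replay it serves those results back without executing anything, and it fails loudly if the agent's calls diverge from the recording.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class ReplayLog:
    """Hypothetical record/replay layer for agent tool calls, stored as JSON Lines.

    Tool results must be JSON-serializable so they can be written to the log.
    """

    def __init__(self, path: Path, mode: str = "record") -> None:
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.path = path
        self.mode = mode
        self.cursor = 0
        # In replay mode, load every recorded step up front.
        self.entries = (
            [json.loads(line) for line in path.read_text().splitlines() if line]
            if mode == "replay"
            else []
        )

    def call(self, tool: str, args: dict[str, Any], fn: Callable[..., Any]) -> Any:
        if self.mode == "record":
            result = fn(**args)  # really execute (inside the sandbox)
            with self.path.open("a") as f:
                f.write(json.dumps({"tool": tool, "args": args, "result": result}) + "\n")
            return result
        # Replay: never execute; return the recorded result if the agent repeats the same step.
        if self.cursor >= len(self.entries):
            raise RuntimeError("replay ran past the end of the recorded log")
        entry = self.entries[self.cursor]
        self.cursor += 1
        if entry["tool"] != tool or entry["args"] != args:
            raise RuntimeError(f"replay diverged at step {self.cursor}: {tool} {args}")
        return entry["result"]


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "run.jsonl"
        ReplayLog(log, "record").call(
            "run_tests", {"target": "unit"}, lambda target: f"ran {target}: 3 passed"
        )
        replay = ReplayLog(log, "replay")
        # The lambda below is never invoked; the recorded result comes back instead.
        return replay.call("run_tests", {"target": "unit"}, lambda target: "not executed")
```

Replaying a recorded run this way lets you re-examine or regression-test the agent's decisions without repeating side effects against the repo.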
One 200-engineer org reports faster output with AI coding tools and no obvious quality loss. That sits next to a scan that found at least one vulnerability in 90% of vibe-coded apps, so the real story is that agents are landing in CI before the quality metrics catch up.
shared memory and long-lived agents are moving from idea to messy reality
Multi-agent systems are converging on explicit shared state: a new GitHub Action lets LangChain agents share operational memory, effectively giving CI workflows a common blackboard instead of isolated runs.
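As a sketch only, shared operational memory in CI can be pictured as a JSON blackboard that each run loads, updates, and saves. The `SharedMemory` class and file layout are hypothetical and are not the interface of the GitHub Action mentioned above. The workflow is assumed to persist the file between runs, for example with a cache or artifact step.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


class SharedMemory:
    """Hypothetical blackboard shared by agents across CI runs via a JSON file.

    Assumes the workflow restores `path` before the agents start and saves it
    afterwards (that step is not shown here).
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self.data: dict = json.loads(path.read_text()) if path.exists() else {}

    def write(self, agent: str, key: str, value: str) -> None:
        # Record who wrote each fact and when, so later runs can judge staleness.
        self.data[key] = {
            "value": value,
            "agent": agent,
            "updated": datetime.now(timezone.utc).isoformat(),
        }

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry["value"] if entry else None

    def save(self) -> None:
        self.path.write_text(json.dumps(self.data, indent=2))


def demo() -> Optional[str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "agent-memory.json"
        first_run = SharedMemory(path)  # nothing restored yet
        first_run.write("triage-agent", "flaky_test", "tests/test_api.py::test_timeout")
        first_run.save()
        later_run = SharedMemory(path)  # a later run starts from what the first one learned
        return later_run.read("flaky_test")
```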
Always-on agents like OpenClaw build context over weeks using 100+ skills. Hermes Memory v2.2.0 offers a zero-dependency long-term memory layer with a tiered context injector and a lifecycle state machine. Both point toward agents that accumulate state rather than start cold.
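A tiered context injector can be read as filling a fixed token budget from the most important tier first. This sketch assumes three hypothetical tiers (pinned, recent, archival), a per-item relevance score, and a whitespace word count as a stand-in for real tokenization. Hermes Memory's actual tiers and scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    tier: int  # hypothetical tiers: 0 = pinned, 1 = recent, 2 = archival
    score: float  # relevance to the current task, higher is better


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())


def inject(memories: list[Memory], budget: int) -> list[str]:
    """Pick memories tier by tier (best score first) until the budget is spent."""
    chosen: list[str] = []
    used = 0
    for mem in sorted(memories, key=lambda m: (m.tier, -m.score)):
        cost = approx_tokens(mem.text)
        if used + cost > budget:
            continue  # skip items that don't fit; a smaller one later might
        chosen.append(mem.text)
        used += cost
    return chosen
```

Capping injected memory at a budget like this is also one defense against context bloat.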
Local-first memory layers like Audrey use SQLite so agents can recall past failures. SQLite-Columnar extends the same engine with columnar storage, which strengthens SQLite's case as a lightweight backbone for agent memory.
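Audrey's schema isn't described here, so what follows rests on an assumption. A minimal failure memory needs only one SQLite table keyed by tool name. The agent writes to it when a tool call fails and reads it back before retrying. The sketch uses only Python's built-in `sqlite3` module, and the table and function names are hypothetical.

```python
import sqlite3
from typing import Optional


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical failure-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS failures (
               id INTEGER PRIMARY KEY,
               tool TEXT NOT NULL,
               error TEXT NOT NULL,
               fix TEXT,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    # Lookups are by tool name, so index that column.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_failures_tool ON failures(tool)")
    return conn


def record_failure(
    conn: sqlite3.Connection, tool: str, error: str, fix: Optional[str] = None
) -> None:
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO failures (tool, error, fix) VALUES (?, ?, ?)", (tool, error, fix)
        )


def recall_failures(
    conn: sqlite3.Connection, tool: str, limit: int = 5
) -> list[tuple[str, Optional[str]]]:
    """Most recent failures for `tool`, newest first, to show the agent before a retry."""
    cur = conn.execute(
        "SELECT error, fix FROM failures WHERE tool = ? ORDER BY id DESC LIMIT ?",
        (tool, limit),
    )
    return cur.fetchall()
```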
Model reliability hasn't kept pace. Azure researchers note that models struggle on long-running tasks, and users report context overfills, tool-call bugs, and context bloat with Qwen and Ollama. Long-lived memory is clearly running ahead of model reliability.
In the data layer, Postgres users are explicitly asking for Neon-style git-like branching and CDC-based exports instead of coarse RDS snapshots, which lines up with how experiment-heavy agent/RAG systems actually mutate their data.
local-first stacks on commodity hardware vs $30k rigs
Open and regional models are now hitting practical speeds for coding and agents on consumer GPUs: Qwen 3.6 27B does 52.8 tokens/s on an MI50s card and around 24 tokens/s on a GTX 1080, and its 35B variant is outperforming Gemma 4 in tool-calling reliability and SWE-bench-style coding metrics.
Gemma 4 variants run at over 40 tokens/s on an RTX 4060 8GB using only 6GB of VRAM, and a Gemma-based smartphone setup hits 48 tokens/s. That pushes serious agent workloads onto laptops and phones. llama.cpp users are running Qwen3-35B-A3B continuously on older GTX 1080 laptops with 4-bit quantization at 24+ tokens/s. On the tooling side, Docker images for MTP models and new attention backends in PyTorch 2.12.0 target FP8 and other optimizations.
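A rough way to see why 4-bit quantization matters on 8GB cards: weight memory is approximately parameters × bits per weight ÷ 8 bytes. This sketch ignores KV cache, activations, and quantization scale overhead, so real footprints are somewhat larger. The parameter counts are the ones named above; "A3B" indicates roughly 3B parameters active per token.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and per-block quantization scales.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    models = [("27B dense", 27), ("35B-A3B, all weights", 35), ("35B-A3B, active per token", 3)]
    for name, params in models:
        sizes = ", ".join(f"{bits}-bit {weight_gb(params, bits):.1f} GB" for bits in (16, 8, 4))
        print(f"{name}: {sizes}")
```

At 4 bits the 35B-A3B weights alone come to about 17.5 GB, more than an 8GB card holds. Such setups presumably keep part of the model in system RAM. Only about 3B parameters (roughly 1.5 GB at 4 bits) are active per token, so per-token work is far smaller than the total size suggests.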
At the high end, builders report Genoa/RTX rig costs jumping from roughly $6k to $30k, while also turning to dual-GPU setups like an RTX 5090 plus an RTX Pro 6000. That puts cost front and center in the architecture story.
In response, standardized stacks like the Docker AI Stack (one command to spin up eight self-hosted AI services) and pre-optimized edge AI Docker containers are emerging as the "no-tuning" path for local-first agents.
framework bloat vs minimal pipelines and MCP-powered ops glue
LangChain is leaning hard into observability and control. It has added a new agent database built on Apache DataFusion, a LangSmith Engine that watches traces and suggests fixes, and a policy enforcement layer for scope and prompt-injection control.
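LangChain's policy layer API isn't described here, so this is a framework-agnostic sketch of the same idea. Every proposed tool call is checked against an allowlist of tools and a path scope. Tool output containing obvious injection phrases is rejected before it re-enters the prompt. The `Policy` class, the marker list, and the `/workspace` root are hypothetical. Substring matching is only a first line of defense against prompt injection, not a complete one.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical, deliberately small list of injection phrases.
INJECTION_MARKERS = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "disregard the system prompt",
)


@dataclass
class Policy:
    allowed_tools: set[str]
    writable_root: str = "/workspace"  # hypothetical sandbox root
    violations: list[str] = field(default_factory=list)

    def check_call(self, tool: str, args: dict) -> bool:
        if tool not in self.allowed_tools:
            self.violations.append(f"tool not allowed: {tool}")
            return False
        path = args.get("path")
        if path is not None:
            p = PurePosixPath(path)
            root = PurePosixPath(self.writable_root)
            # Reject relative paths, any ".." component, and anything outside the root.
            if not p.is_absolute() or ".." in p.parts or (p != root and root not in p.parents):
                self.violations.append(f"path out of scope: {path}")
                return False
        return True

    def check_output(self, text: str) -> bool:
        lowered = text.lower()
        hit = next((m for m in INJECTION_MARKERS if m in lowered), None)
        if hit:
            self.violations.append(f"possible prompt injection: {hit!r}")
            return False
        return True
```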
But users keep reporting that LangChain agents loop and degrade in production, and the framework’s complexity is pushing many toward simpler setups.
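One common, framework-independent mitigation for looping is a guard around the agent loop. It caps total steps and stops the run when the same tool call repeats too often. This is a minimal sketch: the `LoopGuard` name and its default thresholds are assumptions.

```python
import json
from collections import Counter


class LoopDetected(RuntimeError):
    pass


class LoopGuard:
    def __init__(self, max_steps: int = 25, max_repeats: int = 3) -> None:
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen: Counter = Counter()

    def check(self, tool: str, args: dict) -> None:
        """Call before executing each tool call; raises if the agent looks stuck."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise LoopDetected(f"step budget exhausted ({self.max_steps})")
        # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        key = tool + json.dumps(args, sort_keys=True)
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            raise LoopDetected(f"{tool} repeated {self.seen[key]} times with identical args")
```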
In parallel, n8n-as-code V2 changes have broken existing workflows even as people use n8n for scraping, email enrichment, and booking-to-calendar automations, underscoring that low-code agentic orchestrators come with upgrade pain.
On the minimalist side, builders are serving streaming LangChain agents directly from FastAPI over Server-Sent Events. They add semantic response caching with fastapi-semcache and bolt on MQTT for event-driven flows.
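fastapi-semcache's own API isn't described here, so this is a generic sketch of semantic response caching. The cache embeds the incoming prompt and returns a stored answer if a previous prompt is similar enough. Otherwise it calls the model and stores the result. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the 0.9 threshold is an arbitrary assumption.

```python
import math
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts. A real cache would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        query = embed(prompt)
        # Linear scan for clarity; a production cache would use a vector index.
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def get_or_call(self, prompt: str, llm: Callable[[str], str]) -> str:
        cached = self.lookup(prompt)
        if cached is not None:
            return cached
        answer = llm(prompt)
        self.entries.append((embed(prompt), answer))
        return answer
```

In a FastAPI service, `get_or_call` would sit in front of the model call, so near-duplicate prompts skip both the latency and the token cost.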
The emerging glue here is protocol-based: an MCP Jira Automation app runs API tests in Docker and feeds results back into Jira, Hermes Search runs as an MCP server for semantic and full-text search, and Ollama can sit behind MCP to connect internal wikis to local LLMs, all of which frame MCP-style protocols as the toolgraph layer under these agents.
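Under MCP, a client invokes a server's tool with a JSON-RPC 2.0 request whose method is `tools/call`. Its params carry the tool `name` and its `arguments`. The result holds a list of typed `content` items plus an `isError` flag. The tool name `search` and its arguments in the usage are hypothetical, not Hermes Search's actual schema. Transport details (stdio or HTTP, the initialization handshake) are omitted.

```python
import itertools
import json
from typing import Any

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call_request(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def text_from_result(response: str) -> str:
    """Join the text content items of a tools/call response, raising on errors."""
    msg = json.loads(response)
    if "error" in msg:  # protocol-level error, e.g. unknown tool
        raise RuntimeError(msg["error"]["message"])
    result = msg["result"]
    text = "\n".join(c["text"] for c in result["content"] if c["type"] == "text")
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError(text)
    return text
```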
agentic offense vs lagging AppSec metrics
Security data is diverging sharply from productivity narratives. A scan found that 90% of vibe-coded apps had at least one security vulnerability and 44% had authentication gaps, even as teams report faster shipping with AI coding tools and conventional metrics show no quality drop.
Attackers are openly treating GitHub as an AI-powered battleground: the TeamPCP crew open-sourced its Shai-Hulud worm, and Google reported AI-powered exploitation of zero-day vulnerabilities in an unspecified open-source tool.
On the frontier, Mythos Preview completed a 32-step corporate network attack in 6 of 10 attempts and became the first model to solve both UK AI Security Institute cyber ranges. Its capability doubling time is estimated at roughly 4.5 months.
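Taken at face value, a 4.5-month doubling time compounds quickly. The arithmetic below writes C(t) for measured capability after t months. It is a simple extrapolation that assumes the trend holds, not a forecast.

```latex
\begin{aligned}
C(t) &= C_0 \cdot 2^{t/4.5}, \quad t \text{ in months} \\
\frac{C(12)}{C_0} &= 2^{12/4.5} = 2^{8/3} \approx 6.35 \\
\frac{C(24)}{C_0} &= 2^{24/4.5} = 2^{16/3} \approx 40.3
\end{aligned}
```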
US and Japanese banks are moving quickly to access Mythos, and Japan’s megabanks are preparing to integrate it, even as the ECB explicitly warns about AI-enabled cyberattacks.
Defensively, Microsoft is rolling out a multi-model agentic security system that hits top industry benchmarks, and Hugging Face is warning about the AI agent supply chain as a new security surface, which together make "agents as attackers and defenders" a concrete, not theoretical, topic.
What This Means
The frontier of AI engineering has moved to agents with real repo, infra, and network access, but memory design, evaluation, and security are lagging the raw productivity and hardware curves.
On Watch
/Google’s new Googlebook laptops built around Gemini AI, with features like Magic Pointer and on-device Gemini Intelligence control of Android apps, hint at a hardware form factor optimized for agent workflows rather than generic PCs.
/AMD’s allocation of $3.6 million in MI355X development clusters to vLLM and SGLang maintainers points to a coming jump in open inference-runtime performance that could shift how local and hosted agents are deployed.
/PostgreSQL users pushing for Neon-style git-like branching, granular backups, and CDC-based exports over coarse RDS snapshots foreshadow more experiment-friendly database backends for RAG and agent systems.
Interesting
/A developer created a drone that tracks targets using AI technology from Claude Code.
/AI tools like Claude Code are being integrated into technical support workflows, showcasing a trend towards automation in troubleshooting.
/A new open-source browser runtime lets AI agents operate fully autonomously, without human intervention.
/A developer spent around $15,000 a month on AI tokens for a team of four, primarily for coding agents and internal tools.
/TurboQuant/RotorQuant KV cache quantization makes 128k-token contexts usable on limited-VRAM setups.
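KV-cache quantization matters for long contexts because the cache grows linearly with context length. Its size is 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per element. The model shape below is a hypothetical round-number configuration with grouped-query attention. The 4-bit figure ignores scale and metadata overhead, and TurboQuant/RotorQuant's actual formats are not modeled.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int, tokens: int, bits: float) -> float:
    """KV cache size in GiB: keys + values for every layer, KV head, and token."""
    elements = 2 * layers * kv_heads * head_dim * tokens
    return elements * bits / 8 / 2**30


# Hypothetical model shape (assumption): 48 layers, 8 KV heads, head_dim 128.
SHAPE = dict(layers=48, kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = kv_cache_gib(**SHAPE, tokens=131072, bits=bits)
        print(f"128k tokens @ {bits:2d}-bit: {size:.1f} GiB")
```

Under these assumptions, a 131,072-token cache drops from 24 GiB at 16-bit to 6 GiB at 4-bit. That reduction is what makes long contexts plausible on limited-VRAM cards.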
We processed 10,000+ comments and posts to generate this report.
AI-generated content. Verify critical information independently.