npm and PyPI are now active attack surfaces: Axios and LiteLLM both shipped credential-stealing malware, and Vercel leaked some customer env vars via a compromised Google Workspace account. At the same time, TurboQuant and MLX/oMLX made big LLMs fast and cheap enough to run locally on consumer GPUs and modern Macs.
A lot of teams are quietly backing away from Kubernetes/Kafka complexity while Postgres, Mongo, and S3 all showed new ways your ‘boring’ infra can bite you.
Key Events
/Axios npm versions 1.14.1 and 0.30.4 shipped a malicious plain-crypto-js RAT for ~2h54m, impacting >100M weekly downloads.
/LiteLLM PyPI versions 1.82.7–1.82.8 exfiltrated SSH keys and cloud creds from any `pip install`, hitting ~97M monthly downloads.
/Vercel was breached via a compromised Google Workspace OAuth account, exposing some customer environment variables; the attacker is asking $2M.
/Iranian missile strikes knocked two AWS availability zones, in Bahrain and Dubai, fully offline; AWS later scrubbed Bahrain EC2 from its docs and waived a month of UAE charges.
/Google’s TurboQuant claims 6x KV-cache memory reduction and up to 8x speedups with no accuracy loss, already integrated into llama.cpp and vLLM stacks.
Report
Two things moved fast this period: attackers inside your package managers and serious LLMs moving onto consumer hardware. In the middle, cloud and tooling vendors reminded everyone that they are part of your threat model, not outside it.
supply‑chain attacks are now a routine part of js/python development
Two mainstream client libraries, Axios and LiteLLM, shipped active malware in fresh releases. Axios 1.14.1/0.30.4 pulled in plain-crypto-js, a cross‑platform RAT able to run shell commands and stage payloads; the compromised versions were live for ~2h54m on a package with >100M weekly downloads.
LiteLLM 1.82.7–1.82.8 on PyPI exfiltrated SSH keys, AWS creds, and database passwords during `pip install` without even being imported, on a package with ~97M monthly downloads and >2,000 dependents.
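Because install-time hooks run before anything is ever imported, the cheapest defense is auditing what is already installed against known-bad versions. A minimal sketch of that check (the blocklist below is illustrative, not an authoritative advisory feed):

```python
from importlib import metadata

# Illustrative blocklist: package name -> known-compromised versions.
# In practice this would come from an advisory feed, not be hardcoded.
COMPROMISED = {
    "litellm": {"1.82.7", "1.82.8"},
}

def flag_compromised(installed, blocklist):
    """Return (name, version) pairs whose installed version is blocklisted."""
    return [(name, ver) for name, ver in installed.items()
            if ver in blocklist.get(name, set())]

def installed_versions():
    """Snapshot of every distribution visible to this interpreter."""
    return {dist.metadata["Name"].lower(): dist.version
            for dist in metadata.distributions()}

if __name__ == "__main__":
    for name, ver in flag_compromised(installed_versions(), COMPROMISED):
        print(f"WARNING: {name}=={ver} is a known-compromised release")
```

Pinning with hash checking (lockfiles, `pip install --require-hashes`) prevents the bad version landing in the first place; the audit above is for hosts that may already have it.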
The same actor family also hit Telnyx on PyPI (malware hidden in WAV files via steganography) and Strapi plugins that run an 11‑phase attack and steal JWT secrets and DB creds on install.
WordPress saw a parallel move where someone bought ~30 popular plugins and shipped backdoors, plus Smart Slider 3 exposed arbitrary file reads for ~500k sites, showing the same pattern in the PHP ecosystem.
hosting platforms, repos, and git forges are part of the blast radius
Vercel disclosed that a compromised employee Google Workspace OAuth session let an attacker into internal systems and a subset of customer environments, including environment variables (with “sensitive” vars reportedly better isolated).
The attacker is reportedly trying to sell stolen API keys and access for $2M, while Vercel insists no npm packages they publish were modified.
On the repo side, GitHub briefly injected Copilot ads into ~1.5M pull requests before killing the experiment under backlash, and starting April 24 will train its models on code from all user tiers by default unless users explicitly opt out.
At the same time GitHub is enforcing 2FA for all contributors and reporting only “three nines” availability, which shifts both security and reliability assumptions for people treating it as primary CI and artifact hosting.
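"Three nines" sounds close to perfect until you convert it into hours; the arithmetic is worth keeping handy when deciding whether a forge can double as primary CI:

```python
def allowed_downtime_hours(nines: int, hours_per_year: float = 365 * 24) -> float:
    """Hours of downtime per year permitted by an N-nines availability target."""
    availability = 1 - 10 ** -nines  # e.g. 3 nines -> 0.999
    return hours_per_year * (1 - availability)

for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} h/year")
```

Three nines allows roughly 8.8 hours of outage per year, versus under an hour for four nines.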
local llms + turboquant are making serious models viable off‑cloud
Google’s TurboQuant compresses KV cache by about 6x and reports up to 8x decode speedups without measurable accuracy loss, implemented as a drop‑in `nn.Linear` replacement and already wired into stacks like llama.cpp and vLLM.
Benchmarks show Qwen 3.5 27B serving ~1.1M tokens/sec on 96 B200 GPUs under vLLM, and 21.7 tok/s for Qwen3.6‑35B‑A3B on dual RTX 5060 Ti at 90k context, making long‑context workloads more tractable.
On Apple Silicon, Ollama and oMLX now ride MLX, with DFlash doubling Qwen 3.5 27B generation speed on M5 Max and users running full Gemma 4 31B agents on M‑series laptops.
Users report saving around $200/month by moving from cloud APIs to local TurboQuant‑enabled models, and tools like ZINC and Lemonade can run 35B‑param models on midrange AMD GPUs using compact quantization.
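The headline 6x number is easier to evaluate against a back-of-envelope KV-cache size. The sketch below uses hypothetical model dimensions (the layer count, KV-head count, and head size are stand-ins, not any model's real config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of a full K+V cache: 2 tensors (K and V) per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 30B-class config at the 90k context mentioned above.
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=90_000)
print(f"fp16 KV cache:      {baseline / 2**30:.1f} GiB")
print(f"at a 6x reduction:  {baseline / 6 / 2**30:.1f} GiB")
```

At those assumed dimensions the cache drops from roughly 11 GiB to under 2 GiB, which is the difference between fitting a long context on a consumer GPU or not.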
infra is drifting back from kubernetes/kafka toward simpler stacks
Reports put roughly 30% of enterprise Kubernetes spend into workloads delivering no operational value, with many teams admitting clusters are over‑provisioned or running low‑value services that could live on simpler hosts.
Home‑lab and small‑team threads are full of people migrating from Docker Desktop+K8s to Proxmox VMs or a single VPS plus Docker Compose, citing Compose’s simpler config and better debuggability versus `docker run` or full orchestration.
In event streaming, developers describe Kafka as over‑engineered for small services, often regretting deployments and replacing them with SQS, Postgres notifications, or NATS once they hit operational issues like missing data and partition rebalancing pain.
One team reports saving ~$20k/year by ripping out Redis for live‑visitor tracking, keeping just Postgres, which matches a broader Redis pattern of being used by default and later trimmed for cost/complexity reasons.
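The "just use Postgres" replacement for a small queue is usually a single jobs table plus a claim-next-row transaction. A minimal sketch of the pattern, demonstrated with stdlib sqlite3 so it needs no server (in Postgres you would add `FOR UPDATE SKIP LOCKED` to the claim query so concurrent workers don't grab the same row):

```python
import sqlite3

def make_queue(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'queued')""")

def enqueue(conn, payload):
    with conn:
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def claim(conn):
    """Atomically claim the oldest queued job, or return None if empty."""
    with conn:  # wraps the SELECT + UPDATE in one transaction
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' "
            "ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        return row

conn = sqlite3.connect(":memory:")
make_queue(conn)
enqueue(conn, "send-email")
enqueue(conn, "rebuild-index")
print(claim(conn))  # oldest job first
```

This buys exactly the properties teams say they miss from Kafka deployments: durable, inspectable state in the database they already run, with no partition rebalancing to operate.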
postgres, mongo, sqlite, and s3 all showed sharp edges
An AWS engineer reports PostgreSQL performance halved on Linux 7.0 after kernel scheduling changes (e.g. PREEMPT_NONE removal), with no easy fix yet, and community notes that failing to enable huge pages after such upgrades causes further regressions.
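Whether huge pages are actually configured is visible in /proc/meminfo, so the post-upgrade check is easy to script. A small parser over that format (the HugePages_* field names are the standard kernel ones):

```python
def hugepage_stats(meminfo_text):
    """Extract HugePages_* counters from /proc/meminfo-formatted text."""
    stats = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("HugePages_"):
            stats[key] = int(value.split()[0])
    return stats

# On a real host: stats = hugepage_stats(open("/proc/meminfo").read())
sample = "MemTotal: 65536 kB\nHugePages_Total: 0\nHugePages_Free: 0\n"
stats = hugepage_stats(sample)
if stats.get("HugePages_Total", 0) == 0:
    print("huge pages not configured; check vm.nr_hugepages for Postgres")
```

A zero HugePages_Total after a kernel or distro upgrade is the silent regression the reports describe.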
Separate incidents include a production TXID wraparound outage and reminders that Postgres on RDS may require downtime when changing instance families, which matters for scaling plans.
MongoDB users saw kernel‑level instability too, with Linux 6.19 on Btrfs causing silent crash loops under load.
On the lighter side, SQLite keeps showing up as the persistence layer for AI agents and offline apps, and Turbolite's S3‑backed VFS demonstrated cold JOINs from S3 in under 250ms, blurring the line between local and object‑store databases.
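Part of SQLite's appeal as an agent persistence layer is that durable, queryable state takes a few lines of stdlib. A minimal sketch (the schema and fields are illustrative, not any particular agent framework's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute("PRAGMA journal_mode=WAL")  # better concurrency on a real file
conn.execute("""CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now')))""")

def save_message(role, content):
    with conn:
        conn.execute("INSERT INTO messages (role, content) VALUES (?, ?)",
                     (role, content))

def history(limit=50):
    """Most recent messages, returned oldest first, for rebuilding context."""
    rows = conn.execute(
        "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?",
        (limit,)).fetchall()
    return rows[::-1]

save_message("user", "summarize yesterday's logs")
save_message("assistant", "3 deploys, 1 rollback")
print(history())
```

Swap the `:memory:` path for a file and the same state survives restarts, which is most of what an offline agent needs.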
In storage, vanilla S3 runs around $23/TB/month with notable egress and latency concerns, while Hugging Face Buckets advertise $8–12/TB/month and slightly higher throughput, pushing some ML and artifact workloads off AWS.
What This Means
The stack is getting simultaneously cheaper and riskier: local hardware can now run models and workloads that used to demand cloud spend, while your real fragility is shifting to public registries, hosting platforms, and kernel/storage choices you used to treat as boring.
On Watch
/Anthropic’s clampdown on third‑party harnesses like OpenClaw (extra fees and quota blocks) while pushing Claude Managed Agents may reshape how agentic systems integrate with IDEs and shells.
/Linux 7.0’s reported halving of PostgreSQL performance, plus a production TXID wraparound incident, suggests more kernel/DB interaction bugs may surface as distros ship new defaults.
/WASM‑based high‑perf components like AmoraDB (1.7M reads/sec in Node) and browser‑run LLMs are maturing, which could make WebAssembly a realistic target for parts of backend and data‑plane logic.
Interesting
/A full Python reimplementation of Claude Code has been released, allowing it to work with local models.
/PlayerZero, billed as the first "Engineering World Model", automates debugging and testing, with claims of freeing up 30% of engineering bandwidth.
/An open-source proxy called Token0 can reduce vision LLM costs by 35–53% while supporting multiple Ollama models.
/A local Go CLI tool called aws-doctor helps find 'zombie' AWS resources and generate FinOps PDFs, enhancing resource management.
/MiniStack, an open-source alternative to LocalStack, emulates 20 AWS services within a single Docker container, streamlining development.
We processed 10,000+ comments and posts to generate this report.
AI-generated content. Verify critical information independently.